Reasoning Model - Albert Masoliver's learning site

## Definition **Reasoning model** is a model trained to reason and plan *internally* before answering — folding chain-of-thought inward as a learned behaviour rather than something you coax out with a prompt. OpenAI's o1 ("Strawberry") is the archetype: it generates a long private reasoning trace, then emits a comparatively short final answer. ## Internal versus external reasoning The distinction that defines this note: *where does the reasoning live?* | | External reasoning | Reasoning model | |---|---|---| | How elicited | prompt tricks ("think step by step") | trained-in, automatic | | Where the trace lives | in the visible output | in a private/internal phase | | Control knob | [[Prompt Engineering]] | a [[Reasoning Budget]] | | Cost shape | as many tokens as you prompt | scales with internal effort | Classic [[Chain-of-Thought]] is the external case — you *ask* for reasoning and it appears in the answer. A reasoning model does this on its own; the chain-of-thought becomes machinery rather than output. You no longer prompt for reasoning, you *budget* it. ## The reasoning budget Because the internal trace is variable-length, the natural control surface is a [[Reasoning Budget]] — a knob (low / medium / high, or a token allowance) that tells the model how hard to think. This converts "reasoning" from a prompt-engineering art into a deployment dial: spend more inference compute on hard problems, less on easy ones. It is the same idea as [[Self-Consistency]] or [[Tree of Thought]] — buy accuracy with compute — but pushed inside the model where the model itself allocates the spend. ## Anthropic's extended / adaptive thinking Anthropic's [[Extended Thinking]] is the same pressure expressed differently: the model can be allotted a thinking budget and will emit explicit thinking blocks before its answer, with *adaptive* thinking deciding how much to use per request. Whether you call it a reasoning model, extended thinking, or test-time scaling, the convergent insight across the field is one sentence: **spend more inference compute when the task is hard, and less when it isn't.** ## Why it matters for agents Reasoning models change the economics of [[Agent Planning]]: a model that plans well internally needs fewer external scaffolding tricks and fewer loop iterations, which directly attacks [[Compounding Error]] by raising per-step quality. The trade-off is latency and cost — every "hard" turn now carries a long hidden trace — so the budget knob is not a nicety but the primary lever for tuning an agent's reliability-versus-cost curve. ## Related - [[Chain-of-Thought]] - [[Test-Time Compute]] - [[Extended Thinking]] - [[Reasoning Budget]] - [[Agent Planning]]