# Labs — Module 1: Foundations & The "Thinking" Economy
> Four short labs to build mechanical intuition for tokens, reasoning
> budgets, model selection, and context decay. Each fits in a single
> focused block; the whole set is ~70 minutes.
| Lab | Title | Time | Maps to |
|-----|---------------------------------------------|--------|----------------------------------|
| 1.1 | Measure your codebase in tokens | 15 min | §1.2 Tokens are the unit |
| 1.2 | A/B the reasoning budgets | 20 min | §1.3 Reasoning budget management |
| 1.3 | Draft a personal model-selection rubric | 15 min | §1.4 Model selection strategy |
| 1.4 | Trigger and recover from context decay | 20 min | §1.5 The context window challenge |
---
## Lab 1.1 — Measure your codebase in tokens
### Objective
Replace your gut-feel estimate of "how big is this prompt?" with a
measured number from your own repo.
### Time
15 minutes.
### Real-world scenarios
Pick one — chars/token ratio shifts dramatically by code style.
- **A — Multi-tenant SaaS backend (TS / Fastify).** Measure
`src/auth/`'s largest file (heavy Zod schemas tokenize
expensively).
- **B — Data pipeline (Python / Airflow).** Measure one operator
class plus one large SQL template (mixed languages reveal which
is inflating prompts).
- **C — Game backend (Go).** Measure one matchmaking file plus a
generated `*.pb.go` (protobuf is long but redundant — a "what
not to load" example).
### Setup
Pick two real files: one medium (~300 lines), one large (biggest
file you regularly ask an agent to read).
### Steps
1. Count tokens for both files:
```bash
python -c "import anthropic,sys; c=anthropic.Anthropic(); \
print(sys.argv[1], c.messages.count_tokens(model='claude-sonnet-4-6', \
messages=[{'role':'user','content':open(sys.argv[1]).read()}]).input_tokens)" \
<small-file> <large-file>
```
2. **Before peeking:** estimate the total for "`AGENTS.md` + the
large file". Write it down.
3. Measure that combination, compute % error.
### Deliverable
`labs/notes/1.1-token-budget.md` with the two file counts, your
estimate, your error %, and a one-line rule like *"if a prompt
would load > N tokens of files, scope it first."*
### Success criteria
- Two real measurements, not estimates.
- A rule that names a concrete token threshold (not "if it's a lot").
### Reflection
- Which file's chars-per-token ratio surprised you, and why?
### Stretch
- Compute the dollar cost of a "load all of `src/` and refactor"
prompt at current Sonnet rates.
---
## Lab 1.2 — A/B the reasoning budgets
### Objective
Build personal evidence for when `ultrathink` earns its keep on a
real task — instead of guessing.
### Time
20 minutes.
### Real-world scenarios
Pick a task whose shape matches one of these — small enough to
attempt twice in 20 minutes, hard enough to discriminate:
- **A — Payment refund refactor (fintech).** Break a tangled
`processRefund` into validation / ledger / notification. Edge cases
reward reasoning.
- **B — Observability log parser.** Refactor a brittle 150-line
parser into one strategy per source format.
- **C — Auth middleware split.** Untangle bearer-token and session-
cookie paths in a single file. Security-relevant.
### Setup
A scratch branch off a clean main. A task estimated at ~6 minutes
per run.
### Steps
1. **Run A — Sonnet, no thinking:** `claude --model claude-sonnet-4-6 "<task>"`.
Capture wall-clock, `/cost`, tests-pass yes/no, code quality 1–5.
2. `git reset --hard <base>`.
3. **Run B — Sonnet, ultrathink:**
`claude --model claude-sonnet-4-6 "ultrathink. <task>"`. Capture the
same metrics.
### Deliverable
`labs/notes/1.2-reasoning.md` with a two-row comparison table and a
one-line verdict naming a *task shape* (not "complex tasks") where
each setting wins.
### Success criteria
- Cost ratio between runs is ≥3×. If less, your prompt isn't
triggering the thinking mode — fix it and retry.
- Your verdict can be applied tomorrow without re-reading the table.
### Reflection
- Where did the extra reasoning visibly help — in the *plan*, the
*code*, or the *failure-mode analysis*?
### Stretch
- Add a third run: Opus + `think harder`. Often loses to Sonnet +
ultrathink at higher cost.
---
## Lab 1.3 — Draft a personal model-selection rubric
### Objective
Turn the §1.4 generic rubric into one specific to your work — and
defensible without notes.
### Time
15 minutes.
### Real-world scenarios
Your role shifts the rubric. Pick the closest:
- **A — IC at a mid-sized SaaS (~30 engineers).** Mixed full-stack,
cost-sensitive day-to-day.
- **B — Tech lead on a platform team.** Heavy on proposals/reviews,
light on coding.
- **C — Solo founder / small startup.** Latency and cost both
matter; almost everything is Sonnet.
### Setup
Open `labs/notes/1.3-fleet-rubric.md`.
### Steps
1. List 5 task shapes you actually do (one line each — e.g.,
"bug fix in single file", "schema migration", "research a new
library").
2. Beside each, write the model + reasoning budget you'd default
to, plus a one-line *why*.
3. Add **one** override condition: a path, keyword, or trigger that
escalates regardless of size (e.g., "any change under `db/migrations/`
→ Opus + think harder").
### Deliverable
`labs/notes/1.3-fleet-rubric.md` with the 5-row table plus the
override condition.
### Success criteria
- Each row has a *why* you'd say aloud in standup without flinching.
- The override condition is *specific* (a path or keyword), not
"important changes".
### Reflection
- Which task shape did you almost over-spend on out of habit?
### Stretch
- Encode one row into `.claude/agents/<role>.md` frontmatter so the
right model picks itself up.
---
## Lab 1.4 — Trigger and recover from context decay
### Objective
See compaction damage on purpose so you recognize it in the wild —
and rehearse the file-based recovery move.
### Time
20 minutes.
### Real-world scenarios
Pick the closest to something you're in:
- **A — REST → gRPC migration.** Conventions like "backport gRPC
status to HTTP via this table" get lost in compaction.
- **B — Multi-day incident.** Day-3 agent reproduces Day-1
hypothesis you already ruled out.
- **C — Service onboarding.** Deviations from the template
(`"but in this case the auth middleware is configured differently"`)
vanish.
### Setup
A fresh session in a real repo. A throwaway convention to anchor on.
### Steps
1. Open a new session. Tell the agent:
*"In this session, all error responses use shape
`{code, message, requestId}`. Acknowledge."*
2. Run `/compact` to simulate context loss.
3. Without restating the shape, ask: *"Add a new error response for
`invoice_not_found`."* Observe the shape produced.
4. **Recover via file:** end the session, write the convention to
`docs/CONVENTIONS.md`, point `AGENTS.md` at it, start a fresh
session, re-issue the same prompt.
### Deliverable
`labs/notes/1.4-decay.md` with the convention, the post-compaction
shape, the file-anchored shape, and a one-line "first move when I
see drift."
### Success criteria
- The post-compaction shape *drifts* (otherwise the lab didn't
exercise the failure mode — push for more turns first).
- The file-anchored shape matches the original. If not, your
`AGENTS.md` pointer is too vague.
### Reflection
- What was your *first* repair instinct — re-explain in chat, or
write to a file? Train toward the file.
### Stretch
- Add a hook (Module 4 preview) that warns when `/compact` runs
while a known convention file hasn't been read this session.
---
## Wrap-up
In `labs/notes/`:
- `1.1-token-budget.md`, `1.2-reasoning.md`, `1.3-fleet-rubric.md`,
`1.4-decay.md`.
These become the first entries in your durable memory layer
(Module 5). Commit them.
**Next:** [Labs — Module 2: Spec-Driven Development](02-spec-driven-development-labs.md)