01-foundations-labs - Albert Masoliver's learning site

# Labs — Module 1: Foundations & The "Thinking" Economy

> Four short labs to build mechanical intuition for tokens, reasoning
> budgets, model selection, and context decay. Each fits in a single
> focused block; the whole set is ~70 minutes.

| Lab | Title                                   | Time   | Maps to                           |
|-----|-----------------------------------------|--------|-----------------------------------|
| 1.1 | Measure your codebase in tokens         | 15 min | §1.2 Tokens are the unit          |
| 1.2 | A/B the reasoning budgets               | 20 min | §1.3 Reasoning budget management  |
| 1.3 | Draft a personal model-selection rubric | 15 min | §1.4 Model selection strategy     |
| 1.4 | Trigger and recover from context decay  | 20 min | §1.5 The context window challenge |

---

## Lab 1.1 — Measure your codebase in tokens

### Objective

Replace your gut-feel estimate of "how big is this prompt?" with a measured number from your own repo.

### Time

15 minutes.

### Real-world scenarios

Pick one — the chars/token ratio shifts dramatically by code style.

- **A — Multi-tenant SaaS backend (TS / Fastify).** Measure `src/auth/`'s largest file (heavy Zod schemas tokenize expensively).
- **B — Data pipeline (Python / Airflow).** Measure one operator class plus one large SQL template (mixed languages reveal which is inflating prompts).
- **C — Game backend (Go).** Measure one matchmaking file plus a generated `*.pb.go` (protobuf is long but redundant — a "what not to load" example).

### Setup

Pick two real files: one medium (~300 lines), one large (the biggest file you regularly ask an agent to read).

### Steps

1. Count tokens for both files:

   ```bash
   # Prints "<path> <input_tokens>" for every file given; expects ANTHROPIC_API_KEY to be set.
   python -c "
   import anthropic, sys
   c = anthropic.Anthropic()
   for p in sys.argv[1:]:  # loop over all paths, not just the first one
       msg = [{'role': 'user', 'content': open(p).read()}]
       print(p, c.messages.count_tokens(model='claude-sonnet-4-6', messages=msg).input_tokens)
   " <small-file> <large-file>
   ```

2. **Before peeking:** estimate the total for "`AGENTS.md` + the large file". Write it down.
3. Measure that combination and compute your % error.

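
The arithmetic in step 3, and the chars-per-token ratio mentioned in the scenarios, can be sketched as below. This is a minimal sketch, not part of the lab tooling: the function names `percent_error` and `chars_per_token` are hypothetical, and the numbers in the usage example are made up for illustration, so substitute your own estimate and measurement.

```python
def percent_error(estimate: int, measured: int) -> float:
    """Signed error of an estimate relative to the measured token count.

    Negative means you under-estimated, positive means you over-estimated.
    """
    if measured <= 0:
        raise ValueError("measured token count must be positive")
    return 100.0 * (estimate - measured) / measured


def chars_per_token(text: str, tokens: int) -> float:
    """Average number of characters covered by one token for a given file."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return len(text) / tokens


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not measurements).
    guess, actual = 12_000, 15_000
    print(f"error: {percent_error(guess, actual):+.1f}%")
```

A lower chars-per-token ratio means the file is more expensive to load than its size on disk suggests, which is the signal the reflection question asks you to look for.
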
### Deliverable

`labs/notes/1.1-token-budget.md` with the two file counts, your estimate, your error %, and a one-line rule like *"if a prompt would load > N tokens of files, scope it first."*

### Success criteria

- Two real measurements, not estimates.
- A rule that names a concrete token threshold (not "if it's a lot").

### Reflection

- Which file's chars-per-token ratio surprised you, and why?

### Stretch

- Compute the dollar cost of a "load all of `src/` and refactor" prompt at current Sonnet rates.

---

## Lab 1.2 — A/B the reasoning budgets

### Objective

Build personal evidence for when `ultrathink` earns its keep on a real task — instead of guessing.

### Time

20 minutes.

### Real-world scenarios

Pick a task whose shape matches one of these — small enough to attempt twice in 20 minutes, hard enough to discriminate:

- **A — Payment refund refactor (fintech).** Break a tangled `processRefund` into validation / ledger / notification. Edge cases reward reasoning.
- **B — Observability log parser.** Refactor a brittle 150-line parser into one strategy per source format.
- **C — Auth middleware split.** Untangle bearer-token and session-cookie paths in a single file. Security-relevant.

### Setup

A scratch branch off a clean main. A task estimated at ~6 minutes per run.

### Steps

1. **Run A — Sonnet, no thinking:** `claude --model claude-sonnet-4-6 "<task>"`. Capture wall-clock, `/cost`, tests-pass yes/no, code quality 1–5.
2. `git reset --hard <base>`.
3. **Run B — Sonnet, ultrathink:** `claude --model claude-sonnet-4-6 "ultrathink. <task>"`. Capture the same metrics.

### Deliverable

`labs/notes/1.2-reasoning.md` with a two-row comparison table and a one-line verdict naming a *task shape* (not "complex tasks") where each setting wins.

### Success criteria

- Cost ratio between runs is ≥3×. If less, your prompt isn't triggering the thinking mode — fix it and retry.
- Your verdict can be applied tomorrow without re-reading the table.

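
The ≥3× cost-ratio check from the success criteria can be sketched as follows. This is a rough sketch under stated assumptions: the `Run` record, its field names, and the helper functions are hypothetical, and the dollar figures in the usage example are invented placeholders. Fill in the values you captured from `/cost` and your own timing.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One row of the comparison table (field names are illustrative)."""
    label: str
    cost_usd: float      # value reported by /cost
    wall_clock_s: float  # elapsed time for the run
    tests_pass: bool
    quality: int         # subjective 1-5 score


def cost_ratio(baseline: Run, thinking: Run) -> float:
    """How many times more the thinking run cost than the baseline run."""
    if baseline.cost_usd <= 0:
        raise ValueError("baseline cost must be positive")
    return thinking.cost_usd / baseline.cost_usd


def thinking_triggered(baseline: Run, thinking: Run, min_ratio: float = 3.0) -> bool:
    """True when the cost ratio meets the lab's threshold (3x by default)."""
    return cost_ratio(baseline, thinking) >= min_ratio


if __name__ == "__main__":
    # Placeholder numbers, not real measurements.
    run_a = Run("Sonnet, no thinking", 0.10, 300.0, True, 3)
    run_b = Run("Sonnet, ultrathink", 0.45, 420.0, True, 4)
    print(f"ratio: {cost_ratio(run_a, run_b):.1f}x, "
          f"triggered: {thinking_triggered(run_a, run_b)}")
```

If the check returns `False`, that matches the lab's guidance: the prompt probably did not trigger the thinking mode, so fix it and rerun before drawing a verdict.
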
### Reflection

- Where did the extra reasoning visibly help — in the *plan*, the *code*, or the *failure-mode analysis*?

### Stretch

- Add a third run: Opus + `think harder`. This combination often loses to Sonnet + ultrathink while costing more.

---

## Lab 1.3 — Draft a personal model-selection rubric

### Objective

Turn the generic §1.4 rubric into one specific to your work — and defensible without notes.

### Time

15 minutes.

### Real-world scenarios

Your role shifts the rubric. Pick the closest:

- **A — IC at a mid-sized SaaS (~30 engineers).** Mixed full-stack, cost-sensitive day-to-day.
- **B — Tech lead on a platform team.** Heavy on proposals/reviews, light on coding.
- **C — Solo founder / small startup.** Latency and cost both matter; almost everything is Sonnet.

### Setup

Open `labs/notes/1.3-fleet-rubric.md`.

### Steps

1. List 5 task shapes you actually do (one line each — e.g., "bug fix in single file", "schema migration", "research a new library").
2. Beside each, write the model + reasoning budget you'd default to, plus a one-line *why*.
3. Add **one** override condition: a path, keyword, or trigger that escalates regardless of size (e.g., "any change under `db/migrations/` → Opus + think harder").

### Deliverable

`labs/notes/1.3-fleet-rubric.md` with the 5-row table plus the override condition.

### Success criteria

- Each row has a *why* you'd say aloud in standup without flinching.
- The override condition is *specific* (a path or keyword), not "important changes".

### Reflection

- Which task shape did you almost over-spend on out of habit?

### Stretch

- Encode one row into `.claude/agents/<role>.md` frontmatter so the right model is selected automatically.

---

## Lab 1.4 — Trigger and recover from context decay

### Objective

See compaction damage on purpose so you recognize it in the wild — and rehearse the file-based recovery move.

### Time

20 minutes.

### Real-world scenarios

Pick the closest to something you're in:

- **A — REST → gRPC migration.** Conventions like "backport gRPC status to HTTP via this table" get lost in compaction.
- **B — Multi-day incident.** Day-3 agent reproduces Day-1 hypothesis you already ruled out.
- **C — Service onboarding.** Deviations from the template (`"but in this case the auth middleware is configured differently"`) vanish.

### Setup

A fresh session in a real repo. A throwaway convention to anchor on.

### Steps

1. Open a new session. Tell the agent: *"In this session, all error responses use shape `{code, message, requestId}`. Acknowledge."*
2. Run `/compact` to simulate context loss.
3. Without restating the shape, ask: *"Add a new error response for `invoice_not_found`."* Observe the shape produced.
4. **Recover via file:** end the session, write the convention to `docs/CONVENTIONS.md`, point `AGENTS.md` at it, start a fresh session, re-issue the same prompt.

### Deliverable

`labs/notes/1.4-decay.md` with the convention, the post-compaction shape, the file-anchored shape, and a one-line "first move when I see drift."

### Success criteria

- The post-compaction shape *drifts* (otherwise the lab didn't exercise the failure mode — push for more turns first).
- The file-anchored shape matches the original. If not, your `AGENTS.md` pointer is too vague.

### Reflection

- What was your *first* repair instinct — re-explain in chat, or write to a file? Train toward the file.

### Stretch

- Add a hook (Module 4 preview) that warns when `/compact` runs while a known convention file hasn't been read this session.

---

## Wrap-up

In `labs/notes/`:

- `1.1-token-budget.md`, `1.2-reasoning.md`, `1.3-fleet-rubric.md`, `1.4-decay.md`.

These become the first entries in your durable memory layer (Module 5). Commit them.

**Next:** [Labs — Module 2: Spec-Driven Development](02-spec-driven-development-labs.md)