02-spec-driven-development-labs - Albert Masoliver's learning site

# Labs — Module 2: Spec-Driven Development > Five short labs that move you from "I told the agent what I wanted" > to "the agent and I share a contract." Each fits in a single focused > block; the whole set is ~90 minutes. | Lab | Title | Time | Maps to | |-----|------------------------------------------------|--------|----------------------------------------| | 2.1 | Classify five backlog tickets on the spectrum | 15 min | §2.2 The SDD spectrum | | 2.2 | Rewrite one vague ticket as binary ACs | 20 min | §2.3 Writing executable contracts | | 2.3 | Write three Gherkin scenarios | 20 min | §2.3 GIVEN/WHEN/THEN scaffolding | | 2.4 | Draft an OpenSpec proposal + critic pass | 20 min | §2.4 OpenSpec | | 2.5 | Prose-prompt vs spec-as-prompt, head-to-head | 20 min | §2.5 The spec-as-prompt pattern | --- ## Lab 2.1 — Classify five backlog tickets on the spectrum ### Objective Stop applying ceremony uniformly. Per-ticket, decide Spec-First / Spec-Anchored / Spec-as-Source — with reasons. ### Time 15 minutes. ### Real-world scenarios If you don't have your own backlog handy: - **A — E-commerce.** Mix of UI tweaks, PCI remediations, sessions migration, coupon edge cases. - **B — Internal admin tool.** Mostly UI; RBAC is the outlier (Spec-First) among Spec-Anchored CRUD. - **C — Fintech / compliance.** Almost everything trends Spec-First; the lab's value is identifying where Spec-Anchored *still* fits. ### Setup Open `labs/notes/2.1-spectrum.md` with the columns: `Ticket | Reversibility | Surface | Approach | Why`. ### Steps 1. Pick **5 real tickets** (or 5 from a scenario above). 2. Fill Reversibility (low/med/high) and Surface (internal / cross- team / public API / data shape). 3. Pick an approach per row. Write the *why* in one sentence each. ### Deliverable The 5-row table plus a one-line "one ticket whose approach my team currently gets wrong, and the fix." ### Success criteria - Not all 5 rows are the same approach — if they are, you're either over- or under-ceremonialising. - At least one *why* names a concrete system property (a path, a regulator, a blast radius), not a generic principle. ### Reflection - Where is your team's default biased — toward too much ceremony or too little? ### Stretch - Repeat the exercise blind with a teammate; compare. Disagreement is data. --- ## Lab 2.2 — Rewrite one vague ticket as binary ACs ### Objective Convert one prose ticket into 3 acceptance criteria a machine could decide. Discover what the prose was silently leaving you to guess. ### Time 20 minutes. ### Real-world scenarios If your tracker doesn't have a sufficiently vague candidate: - **A — "Improve search relevance on the products page."** Surfaces ACs about median match position, deterministic tiebreaker, latency budget. - **B — "Show user-friendly validation errors."** Surfaces ACs about stable error codes, locale, no PII echo, max errors per field. - **C — "Make the dashboard faster."** Surfaces ACs about p95 TTFB on a defined fixture, Lighthouse score, CSP compliance. ### Setup Open `labs/notes/2.2-executable-spec.md`. Paste the prose ticket under `## Before`. ### Steps 1. Under `## After`, write **3 binary acceptance criteria** (AC1, AC2, AC3). Each names a verification mechanism (test name, log assertion, header check). 2. Include **one security or edge-case AC** that wasn't obvious from the prose (no enumeration, no PII leak, idempotency, etc.). 3. Add an `## Open questions` section for anything you don't actually know. At least one entry — silent guesses are the bug. ### Deliverable The completed before/after note with explicit ACs and open questions. ### Success criteria - Each AC names a *specific* check (a test name, a header, an observable shape). - At least one open question is documented, not silently resolved. ### Reflection - Which AC was hardest to make binary? That's where your team was tacitly agreeing on something nobody had stated. ### Stretch - Feed the spec to an Opus session: *"ultrathink. find three inputs not covered by these ACs where the implementation could fail."* Add the gaps as new ACs. --- ## Lab 2.3 — Write three Gherkin scenarios ### Objective Translate one feature into scenarios precise enough to act as both the agent's plan and the test suite's source. ### Time 20 minutes. ### Real-world scenarios - **A — Coupon engine.** Happy path + expired-code + stacking rule (non-obvious). - **B — Webhook delivery.** Happy path + retry on 500 + idempotency on duplicate `event_id`. - **C — Rate-limit headers.** Under quota + over quota + `Retry-After` semantics across error responses. ### Setup `mkdir -p features && touch features/<feature>.feature`. ### Steps 1. Draft **3 scenarios** in the file: happy path, common failure, one *non-obvious property*. 2. Re-read them as the PM. If two engineers could implement different code and both pass, tighten. 3. Hand the file to an agent in plan mode: *"Propose the test code for each scenario. Don't write product code yet."* Review the proposed tests for tautology. ### Deliverable `features/<feature>.feature` committed + the proposed test file under your usual test directory. ### Success criteria - Each scenario reads aloud unambiguously. - The non-obvious-property scenario's test *would actually fail* against the current code if applicable. ### Reflection - Could a non-engineer add a fourth scenario to this file? If not, it's written too programmer-y for its audience. ### Stretch - Add a scenario you expect to *break the existing code*. Watch the agent identify the regression. --- ## Lab 2.4 — Draft an OpenSpec proposal + critic pass ### Objective Experience proposal-before-code on a real change. Stop at the critic pass; implementation is for another session. ### Time 20 minutes. ### Real-world scenarios - **A — API versioning bump** (`/v1` → `/v2`, deprecation window). - **B — Schema change** (column type bump on a hot table — locks, rollback, replica lag). - **C — Feature flag rollout** (cohort, kill switch, telemetry). ### Setup ```bash mkdir -p openspec/changes/$(date +%Y-%m)-<slug> ``` ### Steps 1. Write `openspec/changes/<slug>/proposal.md` following the §2.4 template: Why / Spec delta / Acceptance criteria / Open questions. Have an agent help with the delta — but defend every bullet yourself. 2. Run a **critic pass** in a fresh session: *"Review this proposal as a skeptic. List every assumption you'd push back on. Don't propose changes."* 3. Iterate the proposal once based on the critic's strongest pushback. Stop. ### Deliverable `openspec/changes/<slug>/proposal.md` committed + a short note `labs/notes/2.4-proposal.md` with the critic's top 3 pushback items. ### Success criteria - Every open question is either *answered* or *deferred with a date* — no silent guesses. - A teammate could read the proposal cold and arrive at the same understanding of what changes and why. ### Reflection - Which open question did you only see because the proposal forced you to look at it? ### Stretch - Run the critic step with **two different reviewer agents**. Compare what each pushes back on. --- ## Lab 2.5 — Prose-prompt vs spec-as-prompt, head-to-head ### Objective Quantify the cost and quality delta on a tiny scope you can run twice in 20 minutes. ### Time 20 minutes. ### Real-world scenarios - **A — New REST endpoint** with validation schema. - **B — Stripe webhook handler** for a new event type. - **C — One-AC accessibility fix** (focus trap, ARIA, return focus). ### Setup A scratch branch. A scope small enough to attempt in ~7 minutes per run — *one acceptance criterion* from the Lab 2.4 proposal, ideally. ### Steps 1. **Prose pass.** Fresh session. Describe the change in chat (no file pointer). Implement. Save `/cost` and the diff. 2. `git reset --hard <base>`. 3. **Spec-as-prompt pass.** Fresh session: ``` Implement openspec/changes/<slug>/proposal.md, scoped to AC<N>. Every AC needs a matching test. STOP and ask if an open question is unresolved. ``` Save `/cost` and the diff. ### Deliverable `labs/notes/2.5-prose-vs-spec.md` with the two runs' tokens, diff size, and one paragraph titled *"When I'd deliberately choose the prose form."* ### Success criteria - Spec pass tokens are lower per *correction-free* turn. (First turn may be higher due to file read.) - The spec pass had fewer mid-session "actually, no…" interjections from you. ### Reflection - Where did the prose form silently *paraphrase* the spec into something subtly wrong? Find at least one. ### Stretch - Add prompt caching (Module 7) to the spec-as-prompt session and re-measure. --- ## Wrap-up In `labs/notes/`: `2.1-spectrum.md` through `2.5-prose-vs-spec.md`. In the repo: `openspec/changes/<slug>/proposal.md`, `features/<feature>.feature`. These artifacts become the contracts the verifier agents in Module 6 work against — without them, the verifiers have nothing to verify. **Next:** [Labs — Module 3: Agentic CLI Tools](03-agentic-cli-tools-labs.md)