02-spec-driven-development - Albert Masoliver's learning site

# Module 2: Advanced Spec-Driven Development

> *"Prose specs describe what you want. Executable specs prove you got it.
> Only the second kind survives an agent."*

---

## Learning objectives

By the end of this module you will be able to:

1. Locate any specification on the **Spec-Driven Development (SDD) spectrum** (Spec-First, Spec-Anchored, Spec-as-Source) and pick the right point for the task.
2. Write **executable contracts** with binary acceptance criteria that an agent can self-verify against.
3. Translate user stories into **GIVEN / WHEN / THEN** scenarios precise enough to fail fast when the implementation drifts.
4. Choose between **OpenSpec**, **BMAD**, and **Kiro** workflows based on team shape and project phase.

---

## 2.1 Why specs got serious again

For a generation, "writing a spec" was a ceremony that produced a Word document, killed a Friday afternoon, and was ignored the following Monday.

Agents changed the economics. A spec is no longer a hand-off artifact between humans; it is the **prompt template** that runs every time an agent picks up the work. Specs that are vague produce code that is vague. Specs that contain binary checks produce code that *fails noisily when it's wrong*, which is the only useful kind of code.

This is the central thesis of SDD in the agentic era:

> **A spec is the program. The code is its current implementation.**

When the spec changes, the code is wrong until proven otherwise, and the spec itself tells you how to prove it.

---

## 2.2 The SDD spectrum

Not every change deserves a spec. Pretending otherwise leads to the worst of both worlds: ceremonious specs that get out of date and unverified code that ships anyway. Pick a point on the spectrum per task.

### Spec-First

You write the spec **before any implementation exists**. The agent reads the spec, asks clarifying questions, then proposes code.

- **Best for:** new features, public APIs, schema changes, anything where a wrong shape costs days to undo.
- **Example artifact:** an OpenSpec change proposal (see §2.4) describing added endpoints, request/response schemas, error semantics, and acceptance scenarios.
- **Trade-off:** higher up-front cost; lower rework cost.

### Spec-Anchored

The code already exists. You write a spec **alongside** it that captures the *intended* behavior, then run the spec against the implementation to find drift.

- **Best for:** legacy code you're about to modify, "I don't trust this module," pre-refactor stabilization.
- **Example artifact:** a Gherkin/`GIVEN-WHEN-THEN` file driving a property-based test suite that the agent runs against the existing module.
- **Trade-off:** the spec must be discovered, not designed; it is slower to write but illuminates hidden assumptions.

### Spec-as-Source

The spec **generates** the implementation. You don't edit the code; you edit the spec, regenerate, and let the diff drive review.

- **Best for:** highly structured artifacts: clients/SDKs from OpenAPI, database migrations from schema diffs, infrastructure-as-code from a declarative model.
- **Example artifact:** an OpenAPI document plus a generation pipeline; or a Kiro visual model that emits the React/TypeScript frontend.
- **Trade-off:** powerful where it fits, brittle where the model can't express what you need.

> **Decision rule:** start *Spec-Anchored* by default. Move toward
> *Spec-First* for new public surfaces and high-blast-radius changes; move
> toward *Spec-as-Source* only where you have a generator you trust.

---

## 2.3 Writing executable contracts

The defining feature of an executable contract is that **a machine can decide whether it's been met**. No prose. No "should be reasonably fast." No "user-friendly error messages." Just predicates that return true or false.

### Anatomy of an executable acceptance criterion

A useful criterion has four parts:

1. **A precondition** the criterion assumes about the system.
2. **A trigger** that is concrete and reproducible.
3. **An observable outcome** that is *binary*.
4. **A verification mechanism**: usually a test, sometimes a script, never a human glance.

**Bad** (prose, unverifiable):

> Users should be able to log in quickly and see a friendly error if their
> password is wrong.

**Good** (executable):

> - **AC1:** Given a registered user with email `[email protected]` and
>   password `correct-horse-battery-staple`, when `POST /auth/login` is
>   called with those credentials, the response is `200 OK` with a JSON body
>   matching schema `LoginSuccess` within 250 ms p95 measured over 100
>   sequential requests on the staging cluster.
> - **AC2:** Given the same user, when `POST /auth/login` is called with
>   password `wrong`, the response is `401 Unauthorized` with body
>   `{"error":"invalid_credentials"}` and the same response is returned for
>   non-existent emails (no username enumeration).
> - **Verification:** `test/auth/login.spec.ts` covers both cases; CI fails
>   if either assertion fails.

Notice the second criterion encodes a *security* property (no username enumeration) that prose would have buried. Executable specs surface this because they force you to name what "good" looks like.

### GIVEN / WHEN / THEN as scaffolding

Gherkin-style scenarios are the cheapest way to get from "I know what I want" to "an agent can implement and verify this." They are not magical; they are a forcing function.

```gherkin
# features/auth/login.feature
Feature: User login

  Background:
    Given a user "[email protected]" exists with password "correct-horse-battery-staple"

  Scenario: Successful login returns a session token
    When I POST to "/auth/login" with:
      | email    | [email protected]            |
      | password | correct-horse-battery-staple |
    Then the response status is 200
    And the response body matches schema "LoginSuccess"
    And the body field "token" is a non-empty string

  Scenario: Wrong password returns a generic error
    When I POST to "/auth/login" with:
      | email    | [email protected] |
      | password | wrong             |
    Then the response status is 401
    And the response body equals:
      """
      { "error": "invalid_credentials" }
      """

  Scenario: Unknown email returns the same generic error
    When I POST to "/auth/login" with:
      | email    | [email protected] |
      | password | whatever          |
    Then the response status is 401
    And the response body equals:
      """
      { "error": "invalid_credentials" }
      """
```

Hand this to an agent and three things happen:

1. The agent has a **plan** (three scenarios, three implementations).
2. The agent has a **definition of done** (the scenarios pass).
3. *You* have a **review surface** that doesn't require reading the implementation: you read the scenarios and the test output.

### The "feed the spec back" pattern

Once a contract is written, the most valuable thing you can do is **feed it back to the model that wrote the code** and ask it to find places where the implementation might violate the spec under inputs the tests didn't cover.

```
You wrote the implementation at src/auth/login.ts.
The spec is features/auth/login.feature.

ultrathink. Identify three inputs not covered by the existing scenarios
where the implementation could violate any of the named properties
(in particular: no username enumeration, generic error body). For each,
give the concrete request, the expected response per the spec, and the
actual response per the current code.
```

This is the seed of Module 6's verifier-agent pattern.
The spec turns a sycophantic "looks good to me" review into a property-based interrogation.

---

## 2.4 SDD frameworks in practice

Three frameworks dominate the current landscape. They are not interchangeable: they encode different theories about *who writes specs* and *when*.

### OpenSpec: change deltas as first-class artifacts

OpenSpec treats every meaningful change as a **proposal** (a versioned diff against the spec) before it touches code. Proposals live in `openspec/changes/` and look like this:

```markdown
# Proposal: Passwordless login via magic link

## Why
Password reset is currently the top support ticket category (~22% in
2026-Q1). Removing passwords entirely for low-risk accounts is projected
to cut tickets by ~15%.

## Spec delta

### Added
- `POST /auth/magic-link/request`: issues a single-use, 15-minute link.
- `POST /auth/magic-link/consume`: exchanges a token for a session.

### Changed
- `POST /auth/login` now accepts `{"method":"password"|"magic_link"}`.

### Removed
- (none)

## Acceptance criteria
- AC1: Requesting a magic link for an unknown email returns 202 with no
  body (no enumeration).
- AC2: A consumed token cannot be reused; second use returns 410 Gone.
- AC3: Tokens older than 15 minutes return 410 Gone with the same body
  as AC2.

## Open questions
- Rate limit shape: per-IP or per-email? (Decision needed before impl.)
```

Workflow:

1. **Propose**: the author writes the change. An agent helps fill in deltas.
2. **Review**: humans (and a reviewer agent) push back on the proposal, not the code.
3. **Implement**: only after the proposal is approved. The proposal *is* the prompt.
4. **Archive**: merged proposals move to `openspec/changes/archive/`, becoming audit trail and onboarding material.

**When to use:** the team is large enough that "what changed and why" is a real question. Public APIs. Compliance-relevant code.
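
To make the proposal's acceptance criteria concrete, here is a minimal Python sketch of how AC1 through AC3 of the magic-link proposal above might be turned into binary checks. Everything in it is an assumption for illustration: the `MagicLinkStore` class, the in-memory dictionary, the `token_invalid` error body, and the caller-supplied token and clock are all hypothetical, and Python is used only for brevity (the paths in this module suggest a TypeScript codebase). It also assumes an unknown token receives the same 410 response, which the proposal does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(minutes=15)  # "15-minute link" from the spec delta
# Hypothetical body: the proposal only requires AC2 and AC3 to share one.
GONE = (410, {"error": "token_invalid"})


@dataclass
class MagicLink:
    email: str
    issued_at: datetime
    consumed: bool = False


class MagicLinkStore:
    """In-memory stand-in for whatever persistence a real service uses."""

    def __init__(self, known_emails):
        self._known = set(known_emails)
        self._links = {}

    def request(self, email, token, now):
        # AC1: always 202 with no body, whether or not the email exists.
        if email in self._known:
            self._links[token] = MagicLink(email, now)
        return (202, None)

    def consume(self, token, now):
        link = self._links.get(token)
        expired = link is not None and now - link.issued_at > TOKEN_TTL
        if link is None or link.consumed or expired:
            return GONE  # AC2 and AC3: identical response
        link.consumed = True
        return (200, {"session_for": link.email})


# Each acceptance criterion becomes an assertion a machine can decide.
t0 = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
store = MagicLinkStore(["known@example.com"])

# AC1: known and unknown emails are indistinguishable to the caller.
assert store.request("known@example.com", "tok-1", t0) == (202, None)
assert store.request("ghost@example.com", "tok-2", t0) == (202, None)

# AC2: the first use succeeds, the second returns 410 Gone.
assert store.consume("tok-1", t0 + timedelta(minutes=1))[0] == 200
assert store.consume("tok-1", t0 + timedelta(minutes=2)) == GONE

# AC3: a token older than 15 minutes returns the same 410 body.
store.request("known@example.com", "tok-3", t0)
assert store.consume("tok-3", t0 + timedelta(minutes=16)) == GONE
```

Passing the token and the clock in as arguments keeps each check reproducible. A real service would generate tokens itself (for example with Python's `secrets.token_urlsafe`) and read the time from its own clock.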

### BMAD: a product team as agent roles

BMAD (the "Breakthrough Method for Agile AI-Driven Development") simulates a product team in agents. Instead of a single spec author, you have:

- A **PM agent** that turns user input into stories.
- An **architect agent** that decomposes stories into tasks.
- A **builder agent** that implements.
- A **reviewer agent** that checks against the original story.

Each role has its own system prompt, model choice, and tool set. The artifact passed between them is structured (typically YAML or JSON) so the next role can parse it deterministically.

```yaml
# .bmad/story-184.yaml
story:
  id: 184
  as: registered user
  i_want: to sign in without a password
  so_that: I never have to reset one again
architect:
  tasks:
    - id: 184-1
      summary: Add magic-link request endpoint
      depends_on: []
      files_touched: [src/auth/magic_link.ts, src/auth/router.ts]
    - id: 184-2
      summary: Add token consumption + session issuance
      depends_on: [184-1]
builder:
  assigned: 184-1
  status: in_progress
```

**When to use:** features that need traceability from user goal to commit. Regulated environments. Teams onboarding new engineers who need to *see* the process.

### Kiro: visual design as the spec

Kiro lets you draw the system (flow diagrams, component trees, state machines) and treats the diagram as the source of truth. The agent reads the diagram and produces code consistent with it.

The strength is **alignment**: a product manager and an engineer can stare at the same canvas and disagree precisely, instead of disagreeing about prose that means different things to each of them.

The weakness is **expressiveness**: not every property is easy to draw. Use Kiro for *structural* specs (component layout, navigation flow, state transitions) and pair it with text-form acceptance criteria for *behavioral* properties (security, performance, error handling).

> **Don't pick a framework. Pick a workflow.** OpenSpec, BMAD, and Kiro can
> coexist in the same project: OpenSpec for API changes, BMAD for feature
> stories, Kiro for the frontend. The thing that matters is that *the
> artifact the agent reads is precise enough to fail loudly when wrong*.

---

## 2.5 The spec-as-prompt pattern

Once a spec is executable, the prompt to the agent collapses to a pointer:

```
Implement openspec/changes/2026-05-add-passwordless-login/proposal.md.

Constraints:
- Touch only files listed under "Files touched" in the proposal.
- Every acceptance criterion must have a corresponding test in test/auth/.
- If you discover an open question is unresolved, STOP and ask. Do not guess.

Run the new tests after each file you create. Stop on the first failure
and explain.
```

Three things are happening here:

1. The spec is **canonical**: no paraphrasing in the prompt.
2. The constraints are **operational**: they shape *how* the agent works, not *what* it builds.
3. The stop condition is **explicit**: open questions are escape hatches, not silently-resolved gotchas.

---

## Lab 2: Convert a vague ticket into an executable spec

**Goal:** experience the leverage of spec-first agent prompting.

**Time:** ~60 minutes.

1. Find a real ticket in your tracker that contains the word "should" or "user-friendly" or "appropriate." (You will not have to look long.)
2. **Step 1: Brute force.** Hand the ticket text verbatim to an agent and ask for an implementation. Save the diff.
3. **Step 2: Rewrite the ticket as an OpenSpec proposal** with at least three named acceptance criteria, each binary and machine-checkable.
4. **Step 3: Re-prompt.** Hand the proposal to a fresh agent session with the spec-as-prompt template from §2.5. Save the diff.
5. Compare the Step 1 and Step 3 runs:
   - Lines changed (proxy for surgical-ness).
   - Number of tests added.
   - Number of clarifying questions the agent asked (zero in Step 1 is a red flag; 1–3 in Step 3 is healthy).
   - Number of acceptance criteria still failing after one iteration.

**What to look for:** the Step 3 agent will refuse to make decisions you delegated by omission in Step 1. That refusal is the spec working.

---

## Common pitfalls

- **"Acceptance criteria" that aren't.** If the criterion requires a human to *judge* whether it's met, it isn't an acceptance criterion; it's a hope.
- **Specs that paraphrase the code.** A spec that says "function X takes Y and returns Z" with no semantics is documentation, not a contract.
- **Frozen specs.** A spec that doesn't change when the code changes is worse than no spec at all: it actively misleads future readers and agents. Treat spec drift as a bug.
- **Skipping the proposal because "it's a small change."** Small changes that break public contracts cause the most expensive incidents. Use the framework even when it feels like overkill; you'll thank yourself when someone asks "why did we do this?" six months later.

---

## Summary

- Specs are now prompts. Vague specs produce vague code; executable specs produce code that fails noisily when wrong.
- Place every task on the SDD spectrum (Spec-First / Spec-Anchored / Spec-as-Source) and pick deliberately.
- Acceptance criteria are binary or they don't exist.
- OpenSpec, BMAD, and Kiro are three lenses on the same idea; combine them per the artifact you're producing.
- The spec-as-prompt pattern turns multi-page prompts into a single pointer plus operational constraints.

---

## Further reading

- *OpenSpec*: project documentation and reference implementations.
- *BMAD-METHOD*: community templates for agent role decomposition.
- *Specification by Example* (Gojko Adzic): the pre-AI canonical text; every page applies tenfold now.

**Next:** [Module 3: Mastery of Agentic CLI Tools](03-agentic-cli-tools.md)