# Module 2 — Advanced Spec-Driven Development
> *"Prose specs describe what you want. Executable specs prove you got it.
> Only the second kind survives an agent."*
---
## Learning objectives
By the end of this module you will be able to:
1. Locate any specification on the **Spec-Driven Development (SDD) spectrum**
— Spec-First, Spec-Anchored, Spec-as-Source — and pick the right point for
the task.
2. Write **executable contracts** with binary acceptance criteria that an
agent can self-verify against.
3. Translate user stories into **GIVEN / WHEN / THEN** scenarios precise
enough to fail fast when the implementation drifts.
4. Choose between **OpenSpec**, **BMAD**, and **Kiro** workflows based on
team shape and project phase.
---
## 2.1 Why specs got serious again
For a generation, "writing a spec" was a ceremony that produced a Word
document, killed a Friday afternoon, and was ignored the following Monday.
Agents changed the economics. A spec is no longer a hand-off artifact between
humans; it is the **prompt template** that runs every time an agent picks up
the work. Specs that are vague produce code that is vague. Specs that contain
binary checks produce code that *fails noisily when it's wrong*, which is the
only useful kind of code.
This is the central thesis of SDD in the agentic era:
> **A spec is the program. The code is its current implementation.**
When the spec changes, the code is wrong until proven otherwise — and the
spec itself tells you how to prove it.
---
## 2.2 The SDD spectrum
Not every change deserves a spec. Pretending otherwise leads to the worst of
both worlds: ceremonious specs that get out of date and unverified code that
ships anyway. Pick a point on the spectrum per task.
### Spec-First
You write the spec **before any implementation exists**. The agent reads the
spec, asks clarifying questions, then proposes code.
- **Best for:** new features, public APIs, schema changes, anything where a
wrong shape costs days to undo.
- **Example artifact:** an OpenSpec change proposal (see §2.4) describing
added endpoints, request/response schemas, error semantics, and acceptance
scenarios.
- **Trade-off:** higher up-front cost; lower rework cost.
### Spec-Anchored
The code already exists. You write a spec **alongside** it that captures the
*intended* behavior, then run the spec against the implementation to find
drift.
- **Best for:** legacy code you're about to modify, "I don't trust this
module," pre-refactor stabilization.
- **Example artifact:** a Gherkin (`GIVEN / WHEN / THEN`) feature file, backed
where useful by property-based tests, that the agent runs against the
existing module.
- **Trade-off:** spec must be discovered, not designed — slower to write but
illuminates hidden assumptions.
### Spec-as-Source
The spec **generates** the implementation. You don't edit the code; you edit
the spec, regenerate, and let the diff drive review.
- **Best for:** highly structured artifacts — clients/SDKs from OpenAPI,
database migrations from schema diffs, infrastructure-as-code from a
declarative model.
- **Example artifact:** an OpenAPI document plus a generation pipeline; or a
Kiro visual model that emits the React/TypeScript frontend.
- **Trade-off:** powerful where it fits, brittle where the model can't
express what you need.
> **Decision rule:** start *Spec-Anchored* by default. Move toward
> *Spec-First* for new public surfaces and high-blast-radius changes; move
> toward *Spec-as-Source* only where you have a generator you trust.
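
The decision rule can be read as a small function. This is a minimal sketch, not part of any SDD framework: the `Task` fields and the `chooseSpecMode` name are hypothetical, and checking the trusted generator first is an assumption about precedence that the rule itself does not state.

```typescript
type SpecMode = "spec-first" | "spec-anchored" | "spec-as-source";

// Hypothetical task descriptor; the field names are assumptions for illustration.
interface Task {
  newPublicSurface: boolean; // new endpoint, schema, or public API
  highBlastRadius: boolean;  // a wrong shape would cost days to undo
  trustedGenerator: boolean; // e.g. an OpenAPI client generator you already rely on
}

// Spec-Anchored is the default; the other two modes need a positive reason.
function chooseSpecMode(task: Task): SpecMode {
  if (task.trustedGenerator) return "spec-as-source";
  if (task.newPublicSurface || task.highBlastRadius) return "spec-first";
  return "spec-anchored";
}
```
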
---
## 2.3 Writing executable contracts
The defining feature of an executable contract is that **a machine can decide
whether it's been met**. No judgment calls. No "should be reasonably fast."
No "user-friendly error messages." Just predicates that return true or false.
### Anatomy of an executable acceptance criterion
A useful criterion has four parts:
1. **A precondition** the criterion assumes about the system.
2. **A trigger** that is concrete and reproducible.
3. **An observable outcome** that is *binary*.
4. **A verification mechanism** — usually a test, sometimes a script, never a
human glance.
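
The four parts map naturally onto a data structure. The sketch below is one possible encoding, assuming a synchronous system under test; `AcceptanceCriterion`, `verify`, and the toy counter example are hypothetical names, not an established API.

```typescript
// A criterion as data: precondition, trigger, and binary outcome.
interface AcceptanceCriterion<State, Result> {
  id: string;
  precondition: () => State;            // builds the assumed system state
  trigger: (state: State) => Result;    // concrete, reproducible action
  outcome: (result: Result) => boolean; // binary observable outcome
}

// The verification mechanism: run the parts in order and return pass or fail.
function verify<S, R>(ac: AcceptanceCriterion<S, R>): boolean {
  return ac.outcome(ac.trigger(ac.precondition()));
}

// Toy example: a counter that must read 1 after a single increment.
const counterCriterion: AcceptanceCriterion<{ count: number }, number> = {
  id: "AC-demo",
  precondition: () => ({ count: 0 }),
  trigger: (s) => s.count + 1,
  outcome: (r) => r === 1,
};
```

Calling `verify(counterCriterion)` yields a boolean, so the same shape can sit behind a test runner or a CI script without anyone having to read prose.
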
**Bad** (prose, unverifiable):
> Users should be able to log in quickly and see a friendly error if their
> password is wrong.
**Good** (executable):
> - **AC1:** Given a registered user with email `[email protected]` and
> password `correct-horse-battery-staple`, when `POST /auth/login` is
> called with those credentials, the response is `200 OK` with a JSON body
> matching schema `LoginSuccess` within 250 ms p95 measured over 100
> sequential requests on the staging cluster.
> - **AC2:** Given the same user, when `POST /auth/login` is called with
> password `wrong`, the response is `401 Unauthorized` with body
> `{"error":"invalid_credentials"}` and the same response is returned for
> non-existent emails (no username enumeration).
> - **Verification:** `test/auth/login.spec.ts` covers both cases; CI fails
> if either assertion fails.
Notice the second criterion encodes a *security* property (no username
enumeration) that prose would have buried. Executable specs surface this
because they force you to name what "good" looks like.
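
To make AC2 concrete, here is a minimal in-memory sketch of the no-enumeration property. Everything in it is an assumption for illustration: the `login` function, the `alice@example.test` account, and the placeholder token stand in for the real handler. A production handler would also compare password hashes in constant time, which this sketch ignores.

```typescript
interface LoginResponse {
  status: number;
  body: Record<string, string>;
}

// Hypothetical user table with a single placeholder account.
const users = new Map<string, string>([
  ["alice@example.test", "correct-horse-battery-staple"],
]);

function login(email: string, password: string): LoginResponse {
  const stored = users.get(email);
  // One branch for both "unknown email" and "wrong password": no enumeration.
  if (stored === undefined || stored !== password) {
    return { status: 401, body: { error: "invalid_credentials" } };
  }
  return { status: 200, body: { token: "session-token-placeholder" } };
}

// AC2 as a predicate: both failure modes must be indistinguishable.
function ac2Holds(): boolean {
  const wrongPassword = login("alice@example.test", "wrong");
  const unknownEmail = login("nobody@example.test", "whatever");
  return (
    wrongPassword.status === 401 &&
    JSON.stringify(wrongPassword) === JSON.stringify(unknownEmail)
  );
}
```
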
### GIVEN / WHEN / THEN as scaffolding
Gherkin-style scenarios are the cheapest way to get from "I know what I
want" to "an agent can implement and verify this." They are not magical —
they are a forcing function.
```gherkin
# features/auth/login.feature
Feature: User login

  Background:
    Given a user "[email protected]" exists with password "correct-horse-battery-staple"

  Scenario: Successful login returns a session token
    When I POST to "/auth/login" with:
      | email    | [email protected]            |
      | password | correct-horse-battery-staple |
    Then the response status is 200
    And the response body matches schema "LoginSuccess"
    And the body field "token" is a non-empty string

  Scenario: Wrong password returns a generic error
    When I POST to "/auth/login" with:
      | email    | [email protected] |
      | password | wrong             |
    Then the response status is 401
    And the response body equals:
      """
      { "error": "invalid_credentials" }
      """

  Scenario: Unknown email returns the same generic error
    When I POST to "/auth/login" with:
      | email    | [email protected] |
      | password | whatever          |
    Then the response status is 401
    And the response body equals:
      """
      { "error": "invalid_credentials" }
      """
```
Hand this to an agent and three things happen:
1. The agent has a **plan** (three scenarios, each a behavior to implement).
2. The agent has a **definition of done** (the scenarios pass).
3. *You* have a **review surface** that doesn't require reading the
implementation — you read the scenarios and the test output.
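
The scenarios become a definition of done only once each step is bound to code. The sketch below shows the idea in miniature. It is deliberately not the Cucumber API: `defineStep`, `runStep`, and the `World` type are hypothetical helpers, and the POST step fakes the HTTP call.

```typescript
type World = { responseStatus?: number };
type StepFn = (world: World, ...args: string[]) => void;

const steps: Array<{ pattern: RegExp; fn: StepFn }> = [];

function defineStep(pattern: RegExp, fn: StepFn): void {
  steps.push({ pattern, fn });
}

// Finds the first matching pattern and calls it with the captured groups.
function runStep(world: World, text: string): void {
  for (const { pattern, fn } of steps) {
    const match = pattern.exec(text);
    if (match) {
      fn(world, ...match.slice(1));
      return;
    }
  }
  throw new Error(`Undefined step: ${text}`);
}

defineStep(/^I POST to "([^"]+)" with wrong credentials$/, (world) => {
  world.responseStatus = 401; // stand-in for a real HTTP call
});

defineStep(/^the response status is (\d+)$/, (world, expected) => {
  if (world.responseStatus !== Number(expected)) {
    throw new Error(`expected ${expected}, got ${world.responseStatus}`);
  }
});
```

An unmatched step throws rather than passing silently, which is the behavior that lets a scenario file double as a checklist.
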
### The "feed the spec back" pattern
Once a contract is written, the most valuable thing you can do is **feed it
back to the model that wrote the code** and ask it to find places where the
implementation might violate the spec under inputs the tests didn't cover.
```
You wrote the implementation at src/auth/login.ts.
The spec is features/auth/login.feature.
ultrathink. Identify three inputs not covered by the existing scenarios
where the implementation could violate any of the named properties (in
particular: no username enumeration, generic error body). For each, give
the concrete request, the expected response per the spec, and the actual
response per the current code.
```
This is the seed of Module 6's verifier-agent pattern. The spec turns a
sycophantic "looks good to me" review into a property-based interrogation.
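
The same interrogation can be partly automated. Below is a hand-rolled property check, written without any testing library, that searches for an input where a failed login looks different from the known-user, wrong-password case. All names are hypothetical, and `leakyLogin` is deliberately buggy so the check has something to find.

```typescript
interface Resp {
  status: number;
  body: string;
}

// Deliberately buggy handler: it reveals whether the email exists.
function leakyLogin(email: string, password: string): Resp {
  if (email !== "alice@example.test") {
    return { status: 404, body: '{"error":"no_such_user"}' };
  }
  if (password !== "correct-horse-battery-staple") {
    return { status: 401, body: '{"error":"invalid_credentials"}' };
  }
  return { status: 200, body: '{"token":"placeholder"}' };
}

// Small deterministic generator so that any failure is reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

// Returns the first email that breaks the property, or null if none was found.
function findEnumerationLeak(
  login: (email: string, password: string) => Resp,
  runs: number,
): string | null {
  const rng = makeRng(42);
  const baseline = login("alice@example.test", "wrong");
  for (let i = 0; i < runs; i++) {
    const email = `user${Math.floor(rng() * 1e6)}@example.test`;
    const resp = login(email, "whatever");
    if (resp.status !== baseline.status || resp.body !== baseline.body) {
      return email; // counterexample
    }
  }
  return null;
}
```
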
---
## 2.4 SDD frameworks in practice
Three frameworks dominate the current landscape. They are not interchangeable
— they encode different theories about *who writes specs* and *when*.
### OpenSpec — change deltas as first-class artifacts
OpenSpec treats every meaningful change as a **proposal** (a versioned diff
against the spec) before it touches code. Proposals live in
`openspec/changes/` and look like this:
```markdown
<!-- openspec/changes/2026-05-add-passwordless-login/proposal.md -->
# Proposal: Passwordless login via magic link
## Why
Password reset is currently the top support ticket category (~22% in
2026-Q1). Removing passwords entirely for low-risk accounts is projected
to cut tickets by ~15%.
## Spec delta
### Added
- `POST /auth/magic-link/request` — issues a single-use, 15-minute link.
- `POST /auth/magic-link/consume` — exchanges a token for a session.
### Changed
- `POST /auth/login` now accepts `{"method":"password"|"magic_link"}`.
### Removed
- (none)
## Acceptance criteria
- AC1: Requesting a magic link for an unknown email returns 202 with no
body (no enumeration).
- AC2: A consumed token cannot be reused; second use returns 410 Gone.
- AC3: Tokens older than 15 minutes return 410 Gone with the same body
as AC2.
## Open questions
- Rate limit shape: per-IP or per-email? (Decision needed before impl.)
```
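
AC2 and AC3 of this proposal are easy to express as a predicate over a token store. The sketch below is one way to do it, under assumptions: `MagicLinkStore` and its methods are hypothetical, time is passed in explicitly so the check is deterministic, and returning 410 for a token that was never issued is a choice the proposal does not specify.

```typescript
const TTL_MS = 15 * 60 * 1000; // 15-minute lifetime from the spec delta

type ConsumeStatus = 200 | 410;

class MagicLinkStore {
  private issuedAt = new Map<string, number>();

  issue(token: string, nowMs: number): void {
    this.issuedAt.set(token, nowMs);
  }

  // 200 once for a fresh token; 410 for reused, expired, or unknown tokens.
  consume(token: string, nowMs: number): ConsumeStatus {
    const issued = this.issuedAt.get(token);
    this.issuedAt.delete(token); // single use: gone after the first attempt
    if (issued === undefined || nowMs - issued > TTL_MS) return 410;
    return 200;
  }
}
```
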
Workflow:
1. **Propose** — author writes the change. Agent helps fill in deltas.
2. **Review** — humans (and a reviewer agent) push back on the proposal,
not the code.
3. **Implement** — only after the proposal is approved. The proposal *is*
the prompt.
4. **Archive** — merged proposals move to `openspec/changes/archive/`,
becoming audit trail and onboarding material.
**When to use:** team is large enough that "what changed and why" is a real
question. Public APIs. Compliance-relevant code.
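
The "only after the proposal is approved" step can be backed by a mechanical gate. The sketch below is a hypothetical helper, not an OpenSpec command: it assumes the proposal uses an `## Open questions` heading with one bullet per question and `(none)` as the empty marker, as in the example above.

```typescript
// Collects the bullets under "## Open questions"; wrapped continuation lines are ignored.
function openQuestions(proposal: string): string[] {
  const lines = proposal.split("\n");
  const start = lines.findIndex(
    (l) => l.trim().toLowerCase() === "## open questions",
  );
  if (start === -1) return [];
  const questions: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // the next section ends the list
    const item = /^\s*-\s+(.*)$/.exec(line);
    if (item === null) continue;
    const text = (item[1] ?? "").trim();
    if (text !== "(none)") questions.push(text);
  }
  return questions;
}

// Implementation may start only when nothing is left to decide.
function readyToImplement(proposal: string): boolean {
  return openQuestions(proposal).length === 0;
}
```
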
### BMAD — a product team as agent roles
BMAD (the BMAD-METHOD project, see Further reading) simulates a product team in
agents. Instead of a single spec author, you have:
- A **PM agent** that turns user input into stories.
- An **architect agent** that decomposes stories into tasks.
- A **builder agent** that implements.
- A **reviewer agent** that checks against the original story.
Each role has its own system prompt, model choice, and tool set. The
artifact passed between them is structured — typically YAML or JSON — so
the next role can parse it deterministically.
```yaml
# .bmad/story-184.yaml
story:
  id: 184
  as: registered user
  i_want: to sign in without a password
  so_that: I never have to reset one again
architect:
  tasks:
    - id: 184-1
      summary: Add magic-link request endpoint
      depends_on: []
      files_touched: [src/auth/magic_link.ts, src/auth/router.ts]
    - id: 184-2
      summary: Add token consumption + session issuance
      depends_on: [184-1]
builder:
  assigned: 184-1
  status: in_progress
```
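
Deterministic parsing pays off when the receiving role validates the artifact before acting on it. The sketch below assumes the YAML has already been parsed into plain objects; the `StoryTask` type mirrors the fields shown above, while `validateTasks` is a hypothetical helper, not part of BMAD.

```typescript
interface StoryTask {
  id: string;
  summary: string;
  depends_on: string[];
  files_touched?: string[]; // optional: the second task above omits it
}

// Returns a list of problems; an empty list means the next role can proceed.
function validateTasks(tasks: StoryTask[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const task of tasks) {
    if (seen.has(task.id)) problems.push(`duplicate task id ${task.id}`);
    for (const dep of task.depends_on) {
      // Dependencies must point at an earlier task, which also rules out cycles.
      if (!seen.has(dep)) {
        problems.push(`${task.id} depends on unknown or later task ${dep}`);
      }
    }
    seen.add(task.id);
  }
  return problems;
}
```
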
**When to use:** features that need traceability from user goal to commit.
Regulated environments. Teams onboarding new engineers who need to *see* the
process.
### Kiro — visual design as the spec
Kiro lets you draw the system — flow diagrams, component trees, state
machines — and treats the diagram as the source of truth. The agent reads
the diagram and produces code consistent with it.
The strength is **alignment**: a product manager and an engineer can stare
at the same canvas and disagree precisely, instead of disagreeing about
prose that means different things to each of them.
The weakness is **expressiveness**: not every property is easy to draw. Use
Kiro for *structural* specs (component layout, navigation flow, state
transitions) and pair it with text-form acceptance criteria for *behavioral*
properties (security, performance, error handling).
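
State transitions are a structural property that also has a compact text form, which makes them easy to check mechanically. The sketch below is an illustration only and does not represent Kiro's output or file format; the states, events, and the `nextState` helper are all hypothetical.

```typescript
type AuthState = "logged_out" | "link_sent" | "logged_in";
type AuthEvent = "request_link" | "consume_link" | "logout";

// The transition table is the spec: anything not listed is not allowed.
const transitions: Record<AuthState, Partial<Record<AuthEvent, AuthState>>> = {
  logged_out: { request_link: "link_sent" },
  link_sent: { consume_link: "logged_in", request_link: "link_sent" },
  logged_in: { logout: "logged_out" },
};

// Returns the next state, or null when the event is not allowed in this state.
function nextState(state: AuthState, event: AuthEvent): AuthState | null {
  return transitions[state][event] ?? null;
}
```
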
> **Don't pick a framework. Pick a workflow.** OpenSpec, BMAD, and Kiro can
> coexist in the same project — OpenSpec for API changes, BMAD for feature
> stories, Kiro for the frontend. The thing that matters is that *the
> artifact the agent reads is precise enough to fail loudly when wrong*.
---
## 2.5 The spec-as-prompt pattern
Once a spec is executable, the prompt to the agent collapses to a pointer:
```
Implement openspec/changes/2026-05-add-passwordless-login/proposal.md.
Constraints:
- Touch only files listed under "Files touched" in the proposal. If the
  proposal has no such section, STOP and ask.
- Every acceptance criterion must have a corresponding test in test/auth/.
- If you discover an open question is unresolved, STOP and ask. Do not guess.
Run the new tests after each file you create. Stop on the first failure
and explain.
```
Three things are happening here:
1. The spec is **canonical** — no paraphrasing in the prompt.
2. The constraints are **operational** — they shape *how* the agent works,
not *what* it builds.
3. The stop condition is **explicit** — open questions are escape hatches,
not silently-resolved gotchas.
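
The second constraint in the prompt can itself be checked by a script. This sketch assumes a convention that the prompt does not state: every test title mentions the criterion id it covers (for example `AC2: ...`). The function names are hypothetical, and the test sources are passed in as strings so the example stays free of file-system calls.

```typescript
// Extracts ids such as "AC1" from bullets of the form "- AC1: ...".
function acceptanceIds(proposal: string): string[] {
  const ids: string[] = [];
  for (const line of proposal.split("\n")) {
    const m = /^\s*-\s+(AC\d+):/.exec(line);
    if (m && m[1]) ids.push(m[1]);
  }
  return ids;
}

// Lists criteria that no test source mentions; an empty result means full coverage.
function uncoveredCriteria(proposal: string, testSources: string[]): string[] {
  return acceptanceIds(proposal).filter(
    (id) => !testSources.some((src) => new RegExp(`\\b${id}\\b`).test(src)),
  );
}
```
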
---
## Lab 2 — Convert a vague ticket into an executable spec
**Goal:** experience the leverage of spec-first agent prompting.
**Time:** ~60 minutes.
1. Find a real ticket in your tracker that contains the word "should" or
"user-friendly" or "appropriate." (You will not have to look long.)
2. **Step 1 — Brute force.** Hand the ticket text verbatim to an agent and
ask for an implementation. Save the diff.
3. **Step 2 — Rewrite the ticket as an OpenSpec proposal** with at least
three named acceptance criteria, each binary and machine-checkable.
4. **Step 3 — Re-prompt.** Hand the proposal to a fresh agent session with
the spec-as-prompt template from §2.5. Save the diff.
5. Compare:
- Lines changed (proxy for surgical-ness).
- Number of tests added.
- Number of clarifying questions the agent asked (zero in Step 1 is a
red flag; 1–3 in Step 3 is healthy).
- Number of acceptance criteria still failing after one iteration.
**What to look for:** the Step 3 agent will refuse to make decisions you
delegated by omission in Step 1. That refusal is the spec working.
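
For the "lines changed" comparison, a small counter over the saved diffs is enough. The sketch below assumes git-style unified diffs (each file section starts with `diff --git`, each hunk with `@@`); `linesChanged` is a hypothetical helper, and `git diff --stat` reports similar numbers if you prefer not to write code.

```typescript
// Counts added and removed lines inside hunks, ignoring file headers.
function linesChanged(diff: string): { added: number; removed: number } {
  let added = 0;
  let removed = 0;
  let inHunk = false;
  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git")) inHunk = false; // new file section
    else if (line.startsWith("@@")) inHunk = true;     // hunk header
    else if (inHunk && line.startsWith("+")) added++;
    else if (inHunk && line.startsWith("-")) removed++;
  }
  return { added, removed };
}
```
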
---
## Common pitfalls
- **"Acceptance criteria" that aren't.** If the criterion requires a human
to *judge* whether it's met, it isn't an acceptance criterion — it's a
hope.
- **Specs that paraphrase the code.** A spec that says "function X takes Y
and returns Z" with no semantics is documentation, not a contract.
- **Frozen specs.** A spec that doesn't change when the code changes is
worse than no spec at all — it actively misleads future readers and
agents. Treat spec drift as a bug.
- **Skipping the proposal because "it's a small change."** Small changes
that break public contracts cause the most expensive incidents. Use the
framework even when it feels like overkill; you'll thank yourself when
someone asks "why did we do this?" six months later.
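
The first pitfall is cheap to lint for. The sketch below flags wording that usually needs a human to judge. The word list is an assumption to extend per team, `vagueTerms` is a hypothetical helper, and a clean result does not prove a criterion is machine-checkable.

```typescript
// Terms that tend to signal a judgment call rather than a predicate.
const VAGUE_TERMS = [
  "should",
  "user-friendly",
  "appropriate",
  "reasonably",
  "quickly",
  "friendly",
];

// Returns the vague terms found in a criterion, in list order.
function vagueTerms(criterion: string): string[] {
  const lower = criterion.toLowerCase();
  return VAGUE_TERMS.filter((term) =>
    new RegExp(`\\b${term}\\b`).test(lower),
  );
}
```
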
---
## Summary
- Specs are now prompts. Vague specs produce vague code; executable specs
produce code that fails noisily when wrong.
- Place every task on the SDD spectrum (Spec-First / Spec-Anchored /
Spec-as-Source) — pick deliberately.
- Acceptance criteria are binary or they don't exist.
- OpenSpec, BMAD, and Kiro are three lenses on the same idea; combine them
per the artifact you're producing.
- The spec-as-prompt pattern turns multi-page prompts into a single pointer
plus operational constraints.
---
## Further reading
- *OpenSpec* — project documentation and reference implementations.
- *BMAD-METHOD* — community templates for agent role decomposition.
- *Specification by Example* (Gojko Adzic) — the pre-AI canonical text;
every page applies tenfold now.
**Next:** [Module 3 — Mastery of Agentic CLI Tools](03-agentic-cli-tools.md)