# Labs — Module 2: Spec-Driven Development
> Five short labs that move you from "I told the agent what I wanted"
> to "the agent and I share a contract." Each fits in a single focused
> block; the whole set is ~90 minutes.
| Lab | Title | Time | Maps to |
|-----|------------------------------------------------|--------|----------------------------------------|
| 2.1 | Classify five backlog tickets on the spectrum | 15 min | §2.2 The SDD spectrum |
| 2.2 | Rewrite one vague ticket as binary ACs | 20 min | §2.3 Writing executable contracts |
| 2.3 | Write three Gherkin scenarios | 20 min | §2.3 GIVEN/WHEN/THEN scaffolding |
| 2.4 | Draft an OpenSpec proposal + critic pass | 20 min | §2.4 OpenSpec |
| 2.5 | Prose-prompt vs spec-as-prompt, head-to-head | 20 min | §2.5 The spec-as-prompt pattern |
---
## Lab 2.1 — Classify five backlog tickets on the spectrum
### Objective
Stop applying ceremony uniformly. Per-ticket, decide Spec-First /
Spec-Anchored / Spec-as-Source — with reasons.
### Time
15 minutes.
### Real-world scenarios
If you don't have your own backlog handy:
- **A — E-commerce.** Mix of UI tweaks, PCI remediations, sessions
migration, coupon edge cases.
- **B — Internal admin tool.** Mostly UI; RBAC is the outlier
(Spec-First) among Spec-Anchored CRUD.
- **C — Fintech / compliance.** Almost everything trends Spec-First;
the lab's value is identifying where Spec-Anchored *still* fits.
### Setup
Open `labs/notes/2.1-spectrum.md` with the columns:
`Ticket | Reversibility | Surface | Approach | Why`.
### Steps
1. Pick **5 real tickets** (or 5 from a scenario above).
2. Fill Reversibility (low/med/high) and Surface (internal / cross-
team / public API / data shape).
3. Pick an approach per row. Write the *why* in one sentence each.
### Deliverable
The 5-row table plus a one-line "one ticket whose approach my team
currently gets wrong, and the fix."
### Success criteria
- Not all 5 rows are the same approach — if they are, you're either
over- or under-ceremonialising.
- At least one *why* names a concrete system property (a path, a
regulator, a blast radius), not a generic principle.
### Reflection
- Where is your team's default biased — toward too much ceremony or
too little?
### Stretch
- Repeat the exercise blind with a teammate; compare. Disagreement
is data.
---
## Lab 2.2 — Rewrite one vague ticket as binary ACs
### Objective
Convert one prose ticket into 3 acceptance criteria a machine could
decide. Discover what the prose was silently leaving you to guess.
### Time
20 minutes.
### Real-world scenarios
If your tracker doesn't have a sufficiently vague candidate:
- **A — "Improve search relevance on the products page."** Surfaces
ACs about median match position, deterministic tiebreaker,
latency budget.
- **B — "Show user-friendly validation errors."** Surfaces ACs
about stable error codes, locale, no PII echo, max errors per
field.
- **C — "Make the dashboard faster."** Surfaces ACs about p95 TTFB
on a defined fixture, Lighthouse score, CSP compliance.
### Setup
Open `labs/notes/2.2-executable-spec.md`. Paste the prose ticket
under `## Before`.
### Steps
1. Under `## After`, write **3 binary acceptance criteria** (AC1,
AC2, AC3). Each names a verification mechanism (test name, log
assertion, header check).
2. Include **one security or edge-case AC** that wasn't obvious
from the prose (no enumeration, no PII leak, idempotency, etc.).
3. Add an `## Open questions` section for anything you don't
actually know. At least one entry — silent guesses are the bug.
### Deliverable
The completed before/after note with explicit ACs and open
questions.
### Success criteria
- Each AC names a *specific* check (a test name, a header, an
observable shape).
- At least one open question is documented, not silently resolved.
### Reflection
- Which AC was hardest to make binary? That's where your team was
tacitly agreeing on something nobody had stated.
### Stretch
- Feed the spec to an Opus session: *"ultrathink. find three inputs
not covered by these ACs where the implementation could fail."*
Add the gaps as new ACs.
---
## Lab 2.3 — Write three Gherkin scenarios
### Objective
Translate one feature into scenarios precise enough to act as both
the agent's plan and the test suite's source.
### Time
20 minutes.
### Real-world scenarios
- **A — Coupon engine.** Happy path + expired-code + stacking rule
(non-obvious).
- **B — Webhook delivery.** Happy path + retry on 500 + idempotency
on duplicate `event_id`.
- **C — Rate-limit headers.** Under quota + over quota +
`Retry-After` semantics across error responses.
### Setup
`mkdir -p features && touch features/<feature>.feature`.
### Steps
1. Draft **3 scenarios** in the file: happy path, common failure,
one *non-obvious property*.
2. Re-read them as the PM. If two engineers could implement
different code and both pass, tighten.
3. Hand the file to an agent in plan mode:
*"Propose the test code for each scenario. Don't write product
code yet."* Review the proposed tests for tautology.
### Deliverable
`features/<feature>.feature` committed + the proposed test file
under your usual test directory.
### Success criteria
- Each scenario reads aloud unambiguously.
- The non-obvious-property scenario's test *would actually fail*
against the current code if applicable.
### Reflection
- Could a non-engineer add a fourth scenario to this file? If not,
it's written too programmer-y for its audience.
### Stretch
- Add a scenario you expect to *break the existing code*. Watch the
agent identify the regression.
---
## Lab 2.4 — Draft an OpenSpec proposal + critic pass
### Objective
Experience proposal-before-code on a real change. Stop at the
critic pass; implementation is for another session.
### Time
20 minutes.
### Real-world scenarios
- **A — API versioning bump** (`/v1` → `/v2`, deprecation window).
- **B — Schema change** (column type bump on a hot table — locks,
rollback, replica lag).
- **C — Feature flag rollout** (cohort, kill switch, telemetry).
### Setup
```bash
mkdir -p openspec/changes/$(date +%Y-%m)-<slug>
```
### Steps
1. Write `openspec/changes/<slug>/proposal.md` following the §2.4
template: Why / Spec delta / Acceptance criteria / Open
questions. Have an agent help with the delta — but defend every
bullet yourself.
2. Run a **critic pass** in a fresh session:
*"Review this proposal as a skeptic. List every assumption
you'd push back on. Don't propose changes."*
3. Iterate the proposal once based on the critic's strongest
pushback. Stop.
### Deliverable
`openspec/changes/<slug>/proposal.md` committed + a short note
`labs/notes/2.4-proposal.md` with the critic's top 3 pushback items.
### Success criteria
- Every open question is either *answered* or *deferred with a
date* — no silent guesses.
- A teammate could read the proposal cold and arrive at the same
understanding of what changes and why.
### Reflection
- Which open question did you only see because the proposal forced
you to look at it?
### Stretch
- Run the critic step with **two different reviewer agents**.
Compare what each pushes back on.
---
## Lab 2.5 — Prose-prompt vs spec-as-prompt, head-to-head
### Objective
Quantify the cost and quality delta on a tiny scope you can run
twice in 20 minutes.
### Time
20 minutes.
### Real-world scenarios
- **A — New REST endpoint** with validation schema.
- **B — Stripe webhook handler** for a new event type.
- **C — One-AC accessibility fix** (focus trap, ARIA, return focus).
### Setup
A scratch branch. A scope small enough to attempt in ~7 minutes per
run — *one acceptance criterion* from the Lab 2.4 proposal, ideally.
### Steps
1. **Prose pass.** Fresh session. Describe the change in chat (no
file pointer). Implement. Save `/cost` and the diff.
2. `git reset --hard <base>`.
3. **Spec-as-prompt pass.** Fresh session:
```
Implement openspec/changes/<slug>/proposal.md, scoped to AC<N>.
Every AC needs a matching test. STOP and ask if an open question
is unresolved.
```
Save `/cost` and the diff.
### Deliverable
`labs/notes/2.5-prose-vs-spec.md` with the two runs' tokens, diff
size, and one paragraph titled *"When I'd deliberately choose the
prose form."*
### Success criteria
- Spec pass tokens are lower per *correction-free* turn. (First
turn may be higher due to file read.)
- The spec pass had fewer mid-session "actually, no…" interjections
from you.
### Reflection
- Where did the prose form silently *paraphrase* the spec into
something subtly wrong? Find at least one.
### Stretch
- Add prompt caching (Module 7) to the spec-as-prompt session and
re-measure.
---
## Wrap-up
In `labs/notes/`: `2.1-spectrum.md` through `2.5-prose-vs-spec.md`.
In the repo: `openspec/changes/<slug>/proposal.md`,
`features/<feature>.feature`.
These artifacts become the contracts the verifier agents in
Module 6 work against — without them, the verifiers have nothing to
verify.
**Next:** [Labs — Module 3: Agentic CLI Tools](03-agentic-cli-tools-labs.md)