# Labs — Module 7: The Human-in-the-Loop & Strategic ROI
> Four short labs that turn the orchestrator role from a metaphor
> into a measurable practice. Each fits in a single focused block;
> the whole set is ~70 minutes. The follow-through afterwards is the
> work.
| Lab | Title | Time | Maps to |
|-----|--------------------------------------------------|--------|--------------------------------------|
| 7.1 | Bucket last week against the orchestrator shape | 15 min | §7.1 Your weekly shape |
| 7.2 | One cost lever, measured before and after | 20 min | §7.2 Cost optimization |
| 7.3 | Stand up a two-metric scorecard | 20 min | §7.3 Measuring velocity |
| 7.4 | Decommission one component | 15 min | §7.4 When to remove an agent |
---
## Lab 7.1 — Bucket last week against the orchestrator shape
### Objective
Compare your actual week against the §7.1 shape. Name one shift
you'll make.
### Time
15 minutes.
### Real-world scenarios
- **A — IC engineer at mid-size SaaS.** Hands-on too high; specs
too low. Shift: "this week, one OpenSpec proposal before code."
- **B — Tech lead on platform.** Tuning eating 25%+; firefighting
prompts. Shift: "promote three `AGENTS.md` rules to hooks."
- **C — Architect at large enterprise.** Meetings eating 40%+
mis-labeled as strategic. Shift: collapse two recurring meetings
to async.
### Setup
Calendar + ticket history + recent PRs visible.
### Steps
1. Bucket last week's working hours into: Specs / Reviews / Tuning /
Strategic / Hands-on / Meetings. Approximations are fine.
2. Tabulate `Category | Hours | % | §7.1 target % | Delta` in
`labs/notes/7.1-week.md`.
3. Write one observable, dated shift: *"This week I'll do X by
Friday."*
### Deliverable
The week table + the one-shift commitment in `labs/notes/7.1-week.md`.
### Success criteria
- All hours accounted for; no "miscellaneous" >10%.
- The shift is *observable* — a teammate could verify on Friday.
### Reflection
- Which category surprised you most — over or under?
### Stretch
- Repeat next week. The shift should show in the numbers, or it
wasn't real.
---
## Lab 7.2 — One cost lever, measured before and after
### Objective
Apply a single cost lever to one real workflow and measure the
delta. Don't claim 90%; report what *you* got.
### Time
20 minutes.
### Real-world scenarios
- **A — CI-heavy team.** Lever: prompt caching on reviewer prefix.
Expected delta: 40–50% per-PR.
- **B — Local-session-heavy.** Lever: bounded sessions (end at task
boundaries). Expected: ~30% daily cost.
- **C — Mixed workload.** Lever: route one Opus usage to Sonnet
where it doesn't earn Opus. Expected: 20–40% on that path.
### Setup
Pull a 5-sample baseline before changing anything (5 PRs or 5
sessions), record cost.
### Steps
1. Apply the lever (one config change, not a rewrite).
2. Run 3–5 more samples on the same workflow.
3. Compute before/after deltas. Compare against expected.
### Deliverable
`labs/notes/7.2-cost.md` with baseline, the lever, after-numbers,
the % delta, and one observed quality regression (if any).
### Success criteria
- Real measurements, before and after, on the *same* workflow type.
- Honesty about regressions: if quality dropped, name it.
### Reflection
- Was the lever you expected to pay most actually the biggest? Or
did something else surprise you?
### Stretch
- Add a recurring cost snapshot (Stop hook → CSV append) so the
data accumulates without effort.
---
## Lab 7.3 — Stand up a two-metric scorecard
### Objective
Pick two metrics, define how each is measured, capture baselines,
publish. Skip target-setting depth; that's the follow-through.
### Time
20 minutes.
### Real-world scenarios
- **A — 5-person product team.** Metrics: lead time, cost per
merged PR.
- **B — 15-person platform team.** Metrics: MTTR, PRs auto-reviewed
on first pass.
- **C — 50-engineer org.** Metrics: change failure rate, specs
authored per engineer per month.
### Setup
Open `labs/notes/7.3-scorecard.md`.
### Steps
1. Pick two metrics from §7.3 that fit your team. Skip vanity
(lines of code, PR count).
2. For each: define data source, query/script, owner, cadence
(weekly).
3. Capture today's baseline value for each. No guesses.
### Deliverable
`labs/notes/7.3-scorecard.md` published somewhere persistent
(wiki, Slack canvas, or repo `dashboard.md`).
### Success criteria
- Each metric has an *operational definition* — a teammate could
compute the same number.
- Baselines are *measured*, not estimated.
### Reflection
- Which metric might *worsen* short-term as you adopt agentic
workflows? Plan for it.
### Stretch
- Add one *leading* metric — something that predicts movement in
the two lagging ones.
---
## Lab 7.4 — Decommission one component
### Objective
Practice the under-discussed discipline: *remove* automation that
isn't earning. Today, retire one thing.
### Time
15 minutes.
### Real-world scenarios
- **A — Early stage (4–6 weeks in).** Many speculative components.
Easy retirement candidates.
- **B — Mature (3–6 months in).** Specific friction points known.
Tune most; retire the worst offenders.
- **C — Year-old platform.** Several prompts now obsolete because
the underlying model improved. Retire eagerly.
### Setup
Open `labs/notes/7.4-decommission.md` with columns:
`Component | What it does | Last earned its keep | Verdict`.
### Steps
1. Inventory: list every sub-agent, skill, MCP server, hook, and
`AGENTS.md` rule you've installed.
2. Per row, name a concrete event in the last month where it
earned its keep. If you can't, mark RETIRE.
3. Execute **one** retirement: delete the file or disable the
config. No commented-out leftovers.
### Deliverable
The audit table + the retirement commit + `labs/notes/7.4-decommission.md`
with the retired component and the *reason*.
### Success criteria
- At least one component is *actually deleted* from the repo. Not
flagged, not deprecated — gone.
- The reason is data ("hasn't fired usefully in 6 weeks"), not
vibes.
### Reflection
- Was the retirement painful? Sunk cost, identity, or actual loss?
### Stretch
- Calendar the next decommission audit for 90 days from today.
---
## Wrap-up: end of the course
In `labs/notes/`: `7.1` through `7.4`.
In your repo / process:
- A published two-metric scorecard with baselines.
- A measurably cheaper workflow on one lever.
- Fewer, sharper automations than before this lab.
You now have, across all seven modules' labs:
- A **configured, governed agentic CLI** (Modules 3–4).
- A **memory stack** that survives sessions and teammates (Module 5).
- A **specialized agent fleet** with verifier pipelines (Module 6).
- A **defensible orchestrator practice** with numbers behind it
(Module 7).
The agents will keep getting better. So should the system you wrap
around them.
---
## Where to go next
- Repeat the labs in a different repo. Different shape, different
gaps.
- Teach one lab to a teammate.
- When the tooling shifts under the instructions, open a PR.
Living documents stay alive only when their readers tend them.