03-agentic-cli-tools-labs - Albert Masoliver's learning site

# Labs — Module 3: Mastery of Agentic CLI Tools > Four short labs that turn an unconfigured repo into a governed, > permission-safe agentic environment. Each lab fits in a single > focused block; the whole set is ~70 minutes. | Lab | Title | Time | Maps to | |-----|--------------------------------------------------|--------|--------------------------------------| | 3.1 | Add a permission policy + verified LSP | 15 min | §3.2 Settings & permissions | | 3.2 | Write a tight AGENTS.md (under 80 lines) | 20 min | §3.4 Governing the project | | 3.3 | Plan mode vs Build mode — one A/B | 20 min | §3.3 Plan mode vs Build mode | | 3.4 | Configure one alternate model route | 15 min | §3.2 Multi-provider configuration | --- ## Lab 3.1 — Add a permission policy + verified LSP ### Objective Stop being prompted for routine commands. Block dangerous ones at the policy layer. Confirm LSP is running. ### Time 15 minutes. ### Real-world scenarios - **A — TypeScript monorepo (pnpm + Turbo).** Allow `Bash(pnpm test:*)`, `Bash(turbo run lint --filter:*)`. Deny `pnpm publish`. - **B — Python data pipeline (poetry + Airflow).** Allow `Bash(poetry run pytest:*)`, `Bash(airflow dags list)`. Deny `airflow dags trigger`. - **C — Go microservices (Make-driven).** Allow `Bash(make test:*)`, `Bash(go test ./...)`. Deny `make deploy`, `kubectl apply`, `terraform apply`. ### Setup ```bash mkdir -p .claude && touch .claude/settings.json echo ".claude/settings.local.json" >> .gitignore ``` ### Steps 1. Write `.claude/settings.json` with `model`, 3 `allow` patterns for routine commands, and 2 `deny` patterns for danger. 2. Run `claude /diagnostics`; confirm LSP for your primary language is listed. If not, install the standard server. 3. Open a fresh session. Trigger one allow-listed command (should run silently) and ask the agent to attempt one denied command (should refuse). ### Deliverable `.claude/settings.json` committed + `labs/notes/3.1-permissions.md` listing the allow/deny lines, the LSP server confirmed, and the denied-command transcript snippet. ### Success criteria - A routine session needs **zero** permission prompts for the allowed patterns. - The denied command is *blocked at the policy layer*, not just unallowed. ### Reflection - Which pattern did you almost wildcard? What restrains you? ### Stretch - Move one personal-only allowance into `.claude/settings.local.json` (gitignored). --- ## Lab 3.2 — Write a tight AGENTS.md (under 80 lines) ### Objective Codify the conventions an agent *can't infer from the code* in one file the harness loads automatically. ### Time 20 minutes. ### Real-world scenarios - **A — Mature SaaS.** Heavy on "use *this* not *that*" with rationale (Zod not class-validator; custom logger not winston). - **B — Greenfield startup.** Short. Protects the few decisions that exist (feature-folder layout, error shape). - **C — Legacy modernization.** Names the migration direction and the rules during transit ("new code goes through the service layer; do not introduce a new ORM"). ### Setup `touch AGENTS.md`. ### Steps 1. Brainstorm 10 things you'd tell a new engineer joining tomorrow. 2. **Cull the inferable.** Cross out anything an agent could derive by reading 2–3 files. 3. Write `AGENTS.md` with three sections: Stack (one-line each), Non-obvious rules (with rationale), How to work with agents here. Stay under 80 lines. 4. Smoke test: fresh session, ask for a change that *should* trigger one of your non-obvious rules. Observe whether the agent honors it without prompting. ### Deliverable `AGENTS.md` committed + `labs/notes/3.2-agents-md.md` with the smoke-test prompt and a yes/no. ### Success criteria - File is **under 80 lines**. - Each non-obvious rule has a *why* (or links to a doc that does). - The smoke test produced behavior consistent with at least one rule the agent had no other way to know. ### Reflection - Which rule was hardest to articulate? That rule lives in your team's head, not the code. ### Stretch - Audit: ask an agent to read `AGENTS.md` and list rules it *thinks* are no longer followed in the codebase. Delete what's dead. --- ## Lab 3.3 — Plan mode vs Build mode, one A/B ### Objective Feel the difference between plan-first and dive-in on the *kind* of change where it matters most: cross-cutting. ### Time 20 minutes. ### Real-world scenarios - **A — Cross-cutting feature.** Add request-id propagation across 3 services and the shared middleware. - **B — Cross-module refactor.** Replace a global constant with an injected config across ~4 files. - **C — Cross-team renaming.** Rename a domain entity in code, tests, and one fixture set. ### Setup A scratch branch off a clean main. ### Steps 1. **Run X — Build mode.** Fresh session. Make the change as you normally would. Save the diff. Count mid-task corrections you had to issue. 2. `git reset --hard <base>`. 3. **Run Y — Plan → Build.** `/plan`, get a plan, approve/edit, then build. Save the diff. Count corrections. ### Deliverable `labs/notes/3.3-plan-vs-build.md` with a two-row table (`Mode | Corrections | Tests pass | Quality 1–5 | Wall-clock`) and a one-line *"My rule for when to enter plan mode."* ### Success criteria - The plan-mode run had **fewer corrections**. If equal or worse, the change wasn't cross-cutting enough — pick a bigger one and redo. - Your rule references the *change shape*, not "important changes." ### Reflection - What was the in-session cue that, retrospectively, should have pushed you into plan mode earlier? ### Stretch - Repeat with a surgical bug fix. Plan mode often *hurts* there — good data for calibrating the rule. --- ## Lab 3.4 — Configure one alternate model route ### Objective Stop typing `--model` every time. One agent role routes automatically to a non-default model. ### Time 15 minutes. ### Real-world scenarios - **A — Cost-conscious startup.** Scout role → Haiku for cheap search. - **B — Reliability-focused.** Default through Bedrock; fallback to direct Anthropic. - **C — Language-specialty workload.** Rust/Elixir tasks route to a model your benchmarks show is best for the language. ### Setup Identify one role to route — `architect`, `scout`, or any agent from your Module 6 plans. ### Steps 1. In `.claude/agents/<role>.md` frontmatter (or `opencode.json` `agents` block), set the `model` for that role to something *different* from your default. 2. Invoke the role with a tiny task: `claude --agent <role> "ping"`. 3. Verify in `/cost` that the expected model was used. ### Deliverable The committed routing config + `labs/notes/3.4-routing.md` with the role, the model, and one line on the use case. ### Success criteria - The smoke invocation actually used the configured model — verified by `/cost`, not assumed. - You can explain in one sentence why this role got this model. ### Reflection - Where did you almost route a role "up" to a bigger model out of caution rather than reason? ### Stretch - Simulate a provider outage by temporarily unsetting that provider's API key. Confirm the failure surfaces *cleanly* rather than silently degrading. --- ## Wrap-up In the repo (committed): `.claude/settings.json`, `AGENTS.md`, one routed agent definition. In `labs/notes/`: `3.1` through `3.4`. You now have a configured, governed, permission-safe project. The next module turns this base into a platform. **Next:** [Labs — Module 4: Extending AI Intelligence](04-extending-intelligence-labs.md)