04-extending-intelligence - Albert Masoliver's learning site

# Module 4 — Extending AI Intelligence > *"The base agent is a generalist. You make it useful by teaching it what > your system actually looks like — and by letting it touch the same tools > you do."* --- ## Learning objectives By the end of this module you will be able to: 1. Package reusable workflows and domain knowledge as **Skills** with discoverable `SKILL.md` files. 2. Connect an agent to external systems (Jira, Notion, Postgres, your own internal API) via the **Model Context Protocol (MCP)**. 3. Wire **hooks** that enforce quality gates and automate repetitive ceremony without a human prompt in the loop. 4. Decide — for any given capability — whether it belongs in a skill, an MCP server, or a hook. --- ## 4.1 The extension trifecta Out of the box, an agentic CLI knows how to read files, edit them, and run shell commands. That's surprisingly powerful, and also where most teams plateau. Three complementary extension mechanisms break through: | Mechanism | Adds… | Best for… | |-----------|---------------------------------------------|----------------------------------------| | Skills | Reusable *procedures* and *domain knowledge*| "When asked to X, do Y the way we do." | | MCP | New *tools* and *data sources* | Talking to systems that aren't files. | | Hooks | *Deterministic* automation around events | Lint on save, block secrets, gate PRs. | You will use all three. The mistake is reaching for the wrong one — building a skill where a hook belongs, or an MCP server where a skill would have done. §4.5 has the decision rubric. --- ## 4.2 Skills — packaging knowledge as discoverable procedures ### Anatomy of a skill A **Skill** is a directory containing at minimum a `SKILL.md` file whose frontmatter describes when the skill should be loaded. The agent's harness indexes the descriptions and offers the matching skill to the model when a relevant request arrives. ``` .claude/skills/ ├── release-checklist/ │ ├── SKILL.md │ ├── CHANGELOG_TEMPLATE.md │ └── scripts/ │ └── verify_version_bump.sh └── seed-test-data/ ├── SKILL.md └── fixtures/ ├── users.json └── orders.json ``` A minimal `SKILL.md`: ```markdown --- name: release-checklist description: | Walk a Claude session through our release process: version bump, changelog update, tag creation, GitHub release notes. Use when the user says "cut a release", "tag v*", or "publish a new version". --- # Release checklist Steps, in order: 1. Confirm the working tree is clean (`git status`). Abort if not. 2. Determine version bump (semver). If unclear, ASK. 3. Update `package.json` and `package-lock.json`. 4. Run `scripts/verify_version_bump.sh`. Abort on failure. 5. Update `CHANGELOG.md` from the commits since the last tag — use the template in `CHANGELOG_TEMPLATE.md`. Group commits by type (feat / fix / chore). 6. Commit with message `chore(release): vX.Y.Z`. 7. Tag `vX.Y.Z` and push the tag. 8. STOP. Do not push to main; the human creates the GitHub release. ## Notes - We *never* skip step 1. A dirty tree is the most common cause of bad releases. - The template references `## Unreleased` — replace it with the new version header, do not delete history. ``` Three properties make this a *good* skill: - **The description is specific.** Not "release stuff" — it lists the trigger phrases. Skills are matched on description quality. - **Steps are operational, not aspirational.** "Run X. If Y, abort." The model executes the recipe; it doesn't have to invent it. - **It hands off to the human at the right moment.** Step 8 stops short of publishing. Skills that try to do too much are unsafe. ### How skills get discovered In Claude Code, skills live in: - `.claude/skills/` (project, committed) — team-shared. - `~/.claude/skills/` (user) — personal. - Plugins — for distribution beyond a single repo. The harness reads each `SKILL.md` frontmatter and surfaces matches to the model based on the *description* string. If your skill never seems to load, the description is the first thing to fix: be explicit about trigger phrases and the artifacts the skill produces. ### Skills vs `AGENTS.md` A natural question: *what goes in a skill and what goes in `AGENTS.md`?* - `AGENTS.md` describes **standing rules** that apply to *every* session. "We use Vitest." - A **skill** describes a **procedure** to invoke *on demand*. "When asked to cut a release, do this." If you find yourself writing "When the user asks to deploy, …" in `AGENTS.md`, promote it to a skill. The agent will load it only when relevant and your standing rules stay short. ### Composing skills Skills can reference other skills. A `cut-release` skill might call a `generate-changelog` skill. Keep the dependency tree shallow — two levels deep at most — and resist the urge to build a framework. A flat folder of focused skills outperforms a clever hierarchy every time. --- ## 4.3 Model Context Protocol (MCP) ### What MCP is MCP is an open protocol that lets an agent talk to external systems via a small set of standardized verbs (list resources, read a resource, call a tool). Each external system is exposed by an **MCP server** — a process the agent launches and converses with. The point: instead of the agent shelling out to ten different CLIs with ten different output formats, it gets a uniform interface. From the model's perspective, querying Jira looks identical to querying Postgres. ### A working `.claude/mcp.json` ```jsonc { "mcpServers": { "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/alice/work"], "type": "stdio" }, "postgres-staging": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://readonly:***@staging-db.internal/app?sslmode=require"], "type": "stdio" }, "linear": { "command": "npx", "args": ["-y", "@tacticlaunch/mcp-linear"], "env": { "LINEAR_API_KEY": "${LINEAR_API_KEY}" }, "type": "stdio" }, "internal-api": { "url": "https://mcp.internal.company.com", "type": "http", "headers": { "Authorization": "Bearer ${INTERNAL_MCP_TOKEN}" } } } } ``` Verify with `claude /mcp`: ``` MCP servers: ✔ filesystem (12 tools, 3 resources) ✔ postgres-staging (2 tools) ✔ linear (7 tools) ✔ internal-api (5 tools, 1 prompt) ``` ### What to put behind MCP **Yes:** - Read-only access to production-shaped data (staging DB, log search, metric queries). - Ticketing systems where the agent files / updates issues. - Internal APIs the agent calls during code-gen ("look up the canonical shape of this entity"). - Documentation systems (Confluence, Notion) used for context. **No, or carefully:** - Anything that mutates production. Default to read-only MCP servers and add write capability surgically. - Secrets vaults. The MCP server itself becomes a tempting attack surface. If you must, use scoped, short-lived credentials. - Replacing things the filesystem already does well. The model can read your repo; you don't need a `repo-reader` MCP server. > **Security:** an MCP server runs with whatever credentials you give it. > The agent calling that server can request *any* tool the server exposes. > Audit the server's tool surface and apply least privilege — read-only, > single-database, single-project where possible. ### Writing your own MCP server A minimal Python MCP server using the official SDK: ```python # servers/feature_flags.py from mcp.server.fastmcp import FastMCP import httpx, os mcp = FastMCP("feature-flags") @mcp.tool() async def get_flag(name: str, env: str = "staging") -> dict: """Return the current value of a feature flag in the given environment.""" async with httpx.AsyncClient() as c: r = await c.get( f"https://flags.internal/api/{env}/{name}", headers={"Authorization": f"Bearer {os.environ['FLAGS_TOKEN']}"}, ) r.raise_for_status() return r.json() @mcp.tool() async def list_flags(env: str = "staging", prefix: str | None = None) -> list[str]: """List flag names in the environment, optionally filtered by prefix.""" async with httpx.AsyncClient() as c: r = await c.get( f"https://flags.internal/api/{env}", headers={"Authorization": f"Bearer {os.environ['FLAGS_TOKEN']}"}, ) r.raise_for_status() names = [f["name"] for f in r.json()] return [n for n in names if not prefix or n.startswith(prefix)] if __name__ == "__main__": mcp.run() ``` Wire it into `.claude/mcp.json`: ```jsonc { "mcpServers": { "feature-flags": { "command": "python", "args": ["servers/feature_flags.py"], "env": { "FLAGS_TOKEN": "${FLAGS_TOKEN}" } } } } ``` Once registered, the agent will, in plan mode, be able to answer "what flags are currently on in staging?" without ever asking you. That round-trip is where MCP earns its keep. --- ## 4.4 Hooks — deterministic automation ### What a hook is A **hook** is a shell command the harness runs on a specific lifecycle event — before a tool call, after a tool call, on session start, on Stop, etc. Hooks are **deterministic**: they don't go through the model. They run because the event fired. Use hooks for things you want **always** to happen, never sometimes. ### Common hook patterns ```jsonc // .claude/settings.json { "hooks": { "PostToolUse": [ { "matcher": "Edit|Write|MultiEdit", "hooks": [ { "type": "command", "command": "npm run lint:fix --silent || true" }, { "type": "command", "command": "scripts/check-secrets.sh" } ] } ], "PreToolUse": [ { "matcher": "Bash", "hooks": [ { "type": "command", "command": "scripts/deny-dangerous-bash.sh" } ] } ], "Stop": [ { "hooks": [ { "type": "command", "command": "scripts/post-session-summary.sh" } ] } ] } } ``` What each is doing: - **PostToolUse → Edit/Write**: every time the agent modifies a file, run the linter (autofix) and a secret-scanner. The lint result is fed back into the conversation so the agent sees its own mess. - **PreToolUse → Bash**: gate every Bash invocation through a script that can refuse (exit code non-zero) commands matching team policy (`rm -rf /`, force pushes, package installs not in `package.json`). - **Stop**: when the session ends, write a one-line summary to a project log. Useful for retros and for billing dispute resolution. ### A real `check-secrets.sh` ```bash #!/usr/bin/env bash # scripts/check-secrets.sh # Block commits containing obvious secret patterns. Called by PostToolUse. set -euo pipefail DIFF=$(git diff --cached --no-color 2>/dev/null || true) PATTERNS=( 'AKIA[0-9A-Z]{16}' # AWS access key 'sk-[A-Za-z0-9]{20,}' # OpenAI / Anthropic style 'github_pat_[A-Za-z0-9_]+' # GitHub PAT '-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----' ) for p in "${PATTERNS[@]}"; do if echo "$DIFF" | grep -E -q "$p"; then echo "BLOCKED: matched secret pattern /$p/" >&2 exit 2 # non-zero feeds the message back into the conversation fi done ``` The exit code matters. **Exit 0** = silent success. **Exit 2** = block and surface the stderr to the model so it can correct. **Other non-zero** = error visible to the user. ### Why hooks beat asking the model to do it If you put "always run the linter after editing" in `AGENTS.md`, the model will *usually* remember. Usually is not good enough for safety-critical behavior. A hook fires *every time*, and you can prove it. > **Heuristic:** every "always" or "never" rule in your `AGENTS.md` that > involves running a command is a candidate for promotion to a hook. --- ## 4.5 Skill vs MCP vs Hook — the decision rubric Same task, three possible homes. How to pick: | Question | If yes → | |-------------------------------------------------------------------|----------| | Is this a **procedure** the model decides *when* to invoke? | Skill | | Is this a **tool / data source** the model calls during a task? | MCP | | Does this need to happen **deterministically** on an event? | Hook | | Is it knowledge that should be **available every session**? | `AGENTS.md` | Worked examples: - *"Walk through our release process when asked."* → **Skill.** Model decides when. Procedure-shaped. - *"Let the agent query our staging database."* → **MCP.** Data source, needed mid-task. - *"Run the linter after every file edit."* → **Hook.** Must happen every time, no model decision involved. - *"We use Vitest for tests."* → **`AGENTS.md`.** Standing fact, every session. When a capability has multiple dimensions, you compose: - *"Run the data-quality SQL suite when asked to verify a migration."* - **MCP** server exposes the SQL queries. - **Skill** documents the procedure (which queries, in what order, how to interpret results). - **Hook** fires the procedure after every migration file edit. --- ## Lab 4 — Build one of each **Goal:** internalize the three extension mechanisms by shipping a useful one of each. **Time:** ~90 minutes. 1. **Skill.** Pick a repetitive procedure you do at least weekly (releasing, onboarding a teammate, generating a new module from a template). Write it as a `SKILL.md`. Verify it loads — ask the agent to invoke it in a fresh session. 2. **MCP server.** Add the official Postgres MCP server with a read-only user pointed at a staging or local database. Confirm with `/mcp` and prompt the agent: *"summarize the schema in plan mode."* 3. **Hook.** Add a PostToolUse hook for `Edit|Write` that runs your formatter. Make a deliberate badly-formatted edit through the agent and watch the hook fix it before the next turn. **What to look for:** the second time you use each one, you'll notice the session is *quieter* — fewer "do you want me to format this?" round-trips, fewer hand-held tool invocations. That quiet is the extension paying off. --- ## Common pitfalls - **Vague skill descriptions.** "Helpful skill for releases" won't match. Name the trigger phrases. - **MCP servers running as god.** A single server with read/write access to production via long-lived credentials is one prompt injection from a bad day. Scope and rotate. - **Hooks that block silently.** A hook that exits non-zero with no stderr produces a session that "just stops working." Always emit a message explaining why the hook blocked. - **Reaching for tooling before the basics work.** If the agent can't read your code well in plain Claude Code, no amount of MCP wiring will fix that. Verify your `AGENTS.md` and LSP setup first. --- ## Summary - Skills, MCP servers, and hooks are the three primary extension surfaces. They are complementary, not interchangeable. - Skills package procedures; MCP exposes tools and data; hooks enforce determinism. - Good extensions reduce the number of words you have to type in your prompts — that's the measure. - Audit and rotate. Every extension is also an attack surface. --- ## Further reading - *Model Context Protocol* — spec, reference servers, and SDKs. - *Claude Code Skills* — official guide and example skills. - *Anthropic — Hooks* documentation, especially the lifecycle event reference. **Next:** [Module 5 — Memory Orchestration & Context Engineering](05-memory-orchestration.md)