# Module 4 — Extending AI Intelligence
> *"The base agent is a generalist. You make it useful by teaching it what
> your system actually looks like — and by letting it touch the same tools
> you do."*
---
## Learning objectives
By the end of this module you will be able to:
1. Package reusable workflows and domain knowledge as **Skills** with
discoverable `SKILL.md` files.
2. Connect an agent to external systems (Jira, Notion, Postgres, your own
internal API) via the **Model Context Protocol (MCP)**.
3. Wire **hooks** that enforce quality gates and automate repetitive ceremony
without a human prompt in the loop.
4. Decide — for any given capability — whether it belongs in a skill, an MCP
server, or a hook.
---
## 4.1 The extension trifecta
Out of the box, an agentic CLI knows how to read files, edit them, and run
shell commands. That's surprisingly powerful, and also where most teams
plateau. Three complementary extension mechanisms break through:
| Mechanism | Adds… | Best for… |
|-----------|---------------------------------------------|----------------------------------------|
| Skills | Reusable *procedures* and *domain knowledge*| "When asked to X, do Y the way we do." |
| MCP | New *tools* and *data sources* | Talking to systems that aren't files. |
| Hooks | *Deterministic* automation around events | Lint on save, block secrets, gate PRs. |
You will use all three. The mistake is reaching for the wrong one — building
a skill where a hook belongs, or an MCP server where a skill would have done.
§4.5 has the decision rubric.
---
## 4.2 Skills — packaging knowledge as discoverable procedures
### Anatomy of a skill
A **Skill** is a directory containing at minimum a `SKILL.md` file whose
frontmatter describes when the skill should be loaded. The agent's harness
indexes the descriptions and offers the matching skill to the model when a
relevant request arrives.
```
.claude/skills/
├── release-checklist/
│   ├── SKILL.md
│   ├── CHANGELOG_TEMPLATE.md
│   └── scripts/
│       └── verify_version_bump.sh
└── seed-test-data/
    ├── SKILL.md
    └── fixtures/
        ├── users.json
        └── orders.json
```
A minimal `SKILL.md`:
```markdown
---
name: release-checklist
description: |
  Walk a Claude session through our release process: version bump,
  changelog update, tag creation, GitHub release notes. Use when the
  user says "cut a release", "tag v*", or "publish a new version".
---
# Release checklist
Steps, in order:
1. Confirm the working tree is clean (`git status`). Abort if not.
2. Determine version bump (semver). If unclear, ASK.
3. Update `package.json` and `package-lock.json`.
4. Run `scripts/verify_version_bump.sh`. Abort on failure.
5. Update `CHANGELOG.md` from the commits since the last tag — use the
template in `CHANGELOG_TEMPLATE.md`. Group commits by type
(feat / fix / chore).
6. Commit with message `chore(release): vX.Y.Z`.
7. Tag `vX.Y.Z` and push the tag.
8. STOP. Do not push to main; the human creates the GitHub release.
## Notes
- We *never* skip step 1. A dirty tree is the most common cause of bad
releases.
- The template references `## Unreleased` — replace it with the new
version header, do not delete history.
```
Three properties make this a *good* skill:
- **The description is specific.** It doesn't say "release stuff"; it lists
the trigger phrases. The description is the only text the harness matches
on, so a vague one means the skill never loads.
- **Steps are operational, not aspirational.** "Run X. If Y, abort." The
model executes the recipe; it doesn't have to invent it.
- **It hands off to the human at the right moment.** Step 8 stops short of
publishing. Skills that try to do too much are unsafe.
### How skills get discovered
In Claude Code, skills live in:
- `.claude/skills/` (project, committed) — team-shared.
- `~/.claude/skills/` (user) — personal.
- Plugins — for distribution beyond a single repo.
The harness reads each `SKILL.md` frontmatter and surfaces matches to the
model based on the *description* string. If your skill never seems to load,
the description is the first thing to fix: be explicit about trigger phrases
and the artifacts the skill produces.
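
One way to catch weak descriptions before they cost you a debugging session is a small lint pass over the skills folder. The sketch below is an assumption-laden illustration, not how the harness itself works: the frontmatter parser handles only `key: value` and `key: |` forms, and the "at least 8 words" and "contains `Use when`" checks are hypothetical house rules you would tune to taste.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse the leading `---` block. Handles `key: value` and `key: |` only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line[:1] in (" ", "\t") and key is not None:
            # Indented line: continuation of the previous key's block scalar.
            fields[key] = (fields[key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            value = value.strip()
            fields[key] = "" if value in ("|", ">") else value
    return fields


def lint_skill(skill_md: Path) -> list[str]:
    """Return human-readable problems for one SKILL.md (empty list = looks fine)."""
    meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
    problems = []
    if not meta.get("name"):
        problems.append("missing `name`")
    description = meta.get("description", "")
    if len(description.split()) < 8:  # hypothetical threshold
        problems.append("description is too short to match on")
    if "use when" not in description.lower():  # hypothetical convention
        problems.append("description names no trigger (no 'Use when ...')")
    return problems


def lint_skills(root: Path) -> dict[str, list[str]]:
    """Lint every `<skill>/SKILL.md` under root, keyed by skill folder name."""
    return {p.parent.name: lint_skill(p) for p in sorted(root.glob("*/SKILL.md"))}


if __name__ == "__main__":
    for skill, problems in lint_skills(Path(".claude/skills")).items():
        print(skill, "OK" if not problems else problems)
```

Run it from the repository root; a skill described only as "Helpful skill for releases" would be reported twice, once for length and once for the missing trigger.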
### Skills vs `AGENTS.md`
A natural question: *what goes in a skill and what goes in `AGENTS.md`?*
- `AGENTS.md` describes **standing rules** that apply to *every* session.
"We use Vitest."
- A **skill** describes a **procedure** to invoke *on demand*. "When asked
to cut a release, do this."
If you find yourself writing "When the user asks to deploy, …" in
`AGENTS.md`, promote it to a skill. The agent will load it only when
relevant and your standing rules stay short.
### Composing skills
Skills can reference other skills. A `cut-release` skill might call a
`generate-changelog` skill. Keep the dependency tree shallow — two levels
deep at most — and resist the urge to build a framework. A flat folder of
focused skills usually outperforms a clever hierarchy.
---
## 4.3 Model Context Protocol (MCP)
### What MCP is
MCP is an open protocol that lets an agent talk to external systems via a
small set of standardized verbs (list resources, read a resource, call a
tool). Each external system is exposed by an **MCP server**: either a local
process the agent launches and talks to over stdio, or a remote endpoint it
reaches over HTTP.
The point: instead of the agent shelling out to ten different CLIs with ten
different output formats, it gets a uniform interface. From the model's
perspective, querying Jira and querying Postgres are the same kind of tool
call; only the tool name and arguments differ.
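
To make the "uniform interface" concrete: MCP messages are JSON-RPC 2.0, and the verbs above map to methods such as `tools/list`, `resources/read`, and `tools/call`. The sketch below only builds the request envelopes so you can compare them. The tool names (`search_issues`, `query`), their argument shapes, and the resource URI are hypothetical; each real server defines its own.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC requests carry a unique id per request


def rpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discovery and reading use fixed method names...
list_tools = rpc_request("tools/list")
read_resource = rpc_request("resources/read", {"uri": "file:///README.md"})

# ...and every tool call has the same shape, whatever system sits behind it.
call_jira = rpc_request(
    "tools/call",
    {"name": "search_issues", "arguments": {"query": "status = Open"}},  # hypothetical tool
)
call_postgres = rpc_request(
    "tools/call",
    {"name": "query", "arguments": {"sql": "SELECT count(*) FROM orders"}},  # hypothetical tool
)

for message in (list_tools, read_resource, call_jira, call_postgres):
    print(message)
```

The two `tools/call` requests differ only in `name` and `arguments`, which is the whole point of putting both systems behind the same protocol.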
### A working `.mcp.json`
Claude Code reads project-scoped servers from `.mcp.json` at the repository
root:
```jsonc
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/alice/work"],
      "type": "stdio"
    },
    "postgres-staging": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://readonly:***@staging-db.internal/app?sslmode=require"],
      "type": "stdio"
    },
    "linear": {
      "command": "npx",
      "args": ["-y", "@tacticlaunch/mcp-linear"],
      "env": { "LINEAR_API_KEY": "${LINEAR_API_KEY}" },
      "type": "stdio"
    },
    "internal-api": {
      "url": "https://mcp.internal.company.com",
      "type": "http",
      "headers": { "Authorization": "Bearer ${INTERNAL_MCP_TOKEN}" }
    }
  }
}
```
Verify by running `/mcp` inside a Claude Code session (or `claude mcp list`
from your shell). Illustrative output:
```
MCP servers:
✔ filesystem (12 tools, 3 resources)
✔ postgres-staging (2 tools)
✔ linear (7 tools)
✔ internal-api (5 tools, 1 prompt)
```
### What to put behind MCP
**Yes:**
- Read-only access to production-shaped data (staging DB, log search,
metric queries).
- Ticketing systems where the agent files / updates issues.
- Internal APIs the agent calls during code-gen ("look up the canonical
shape of this entity").
- Documentation systems (Confluence, Notion) used for context.
**No, or carefully:**
- Anything that mutates production. Default to read-only MCP servers and
add write capability surgically.
- Secrets vaults. The MCP server itself becomes a tempting attack surface.
If you must, use scoped, short-lived credentials.
- Replacing things the filesystem already does well. The model can read
your repo; you don't need a `repo-reader` MCP server.
> **Security:** an MCP server runs with whatever credentials you give it.
> The agent calling that server can request *any* tool the server exposes.
> Audit the server's tool surface and apply least privilege — read-only,
> single-database, single-project where possible.
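
Part of that audit can be automated. The sketch below walks an already-parsed MCP config (a plain `dict` with the `mcpServers` shape used in this module) and flags two things: connection strings with a `user:password@` embedded in `args`, and `env` / `headers` values that look hard-coded rather than expanded from `${VAR}`. The function name, the regexes, and the wording of the findings are all illustrative assumptions, and the check says nothing about what tools a server exposes.

```python
import re

# Matches `user:password@` inside a URL-style connection string.
INLINE_CREDENTIAL = re.compile(r"://[^/\s:@]+:[^/\s@]+@")
# Keys whose values should normally come from the environment.
SENSITIVE_KEY = re.compile(r"(?i)key|token|secret|authorization|password")


def audit_mcp_config(config: dict) -> list[str]:
    """Return a list of findings; an empty list means nothing obvious was found."""
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        if "command" not in server and "url" not in server:
            findings.append(f"{name}: neither `command` nor `url` is set")
        for arg in server.get("args", []):
            if INLINE_CREDENTIAL.search(str(arg)):
                findings.append(
                    f"{name}: credential embedded in args; reference it as ${{VAR}} instead"
                )
        settings = {**server.get("env", {}), **server.get("headers", {})}
        for key, value in settings.items():
            if SENSITIVE_KEY.search(key) and "${" not in str(value):
                findings.append(f"{name}: `{key}` looks hard-coded; use ${{VAR}} expansion")
    return findings


if __name__ == "__main__":
    sample = {
        "mcpServers": {
            "postgres-staging": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-postgres",
                         "postgresql://readonly:***@staging-db.internal/app"],
            },
            "linear": {
                "command": "npx",
                "args": ["-y", "@tacticlaunch/mcp-linear"],
                "env": {"LINEAR_API_KEY": "${LINEAR_API_KEY}"},
            },
        }
    }
    for finding in audit_mcp_config(sample):
        print(finding)
```

On the sample, only `postgres-staging` is flagged, because its connection string carries the password inline while `linear` takes its key from the environment.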
### Writing your own MCP server
A minimal Python MCP server using the official SDK:
```python
# servers/feature_flags.py
import os

import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("feature-flags")


@mcp.tool()
async def get_flag(name: str, env: str = "staging") -> dict:
    """Return the current value of a feature flag in the given environment."""
    async with httpx.AsyncClient() as c:
        r = await c.get(
            f"https://flags.internal/api/{env}/{name}",
            headers={"Authorization": f"Bearer {os.environ['FLAGS_TOKEN']}"},
        )
        r.raise_for_status()
        return r.json()


@mcp.tool()
async def list_flags(env: str = "staging", prefix: str | None = None) -> list[str]:
    """List flag names in the environment, optionally filtered by prefix."""
    async with httpx.AsyncClient() as c:
        r = await c.get(
            f"https://flags.internal/api/{env}",
            headers={"Authorization": f"Bearer {os.environ['FLAGS_TOKEN']}"},
        )
        r.raise_for_status()
        names = [f["name"] for f in r.json()]
        return [n for n in names if not prefix or n.startswith(prefix)]


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```
Wire it into the project's `.mcp.json`:
```jsonc
{
  "mcpServers": {
    "feature-flags": {
      "command": "python",
      "args": ["servers/feature_flags.py"],
      "env": { "FLAGS_TOKEN": "${FLAGS_TOKEN}" }
    }
  }
}
```
Once registered, the agent can answer "what flags are currently on in
staging?" by calling `list_flags` and `get_flag` itself, even in plan mode,
without asking you to look anything up. That saved round-trip is where MCP
earns its keep.
---
## 4.4 Hooks — deterministic automation
### What a hook is
A **hook** is a shell command the harness runs on a specific lifecycle event
— before a tool call, after a tool call, on session start, on Stop, etc.
Hooks are **deterministic**: they don't go through the model. They run
because the event fired.
Use hooks for things you want **always** to happen, never sometimes.
### Common hook patterns
```jsonc
// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint:fix --silent || true"
          },
          {
            "type": "command",
            "command": "scripts/check-secrets.sh"
          }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "scripts/deny-dangerous-bash.sh"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "scripts/post-session-summary.sh"
          }
        ]
      }
    ]
  }
}
```
What each is doing:
- **PostToolUse → Edit/Write**: every time the agent modifies a file, run
the linter (autofix) and a secret scanner. Because of the `|| true`, the
lint command always exits 0, so errors it cannot fix are not reported back.
The secret scanner exits 2 on a match, which feeds its stderr into the
conversation so the agent sees its own mess.
- **PreToolUse → Bash**: gate every Bash invocation through a script that
can refuse (exit code 2) commands matching team policy
(`rm -rf /`, force pushes, package installs not in `package.json`).
- **Stop**: when the agent finishes responding, append a one-line summary to
a project log. Useful for retros and for billing dispute resolution.
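
The Stop hook's logging step can be sketched in a few lines. Hooks receive a JSON payload on stdin; the version below assumes that payload carries `session_id`, `cwd`, and `transcript_path` (it falls back to defaults if any are missing) and appends one JSON line per stop to a log file. The function names, the record fields, and the log path are illustrative choices, not something the harness prescribes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def summary_line(payload: dict, now: datetime) -> str:
    """Build one JSON log line from the hook payload (missing keys get defaults)."""
    record = {
        "stopped_at": now.isoformat(timespec="seconds"),
        "session_id": payload.get("session_id", "unknown"),
        "cwd": payload.get("cwd", ""),
        "transcript": payload.get("transcript_path", ""),
    }
    return json.dumps(record, sort_keys=True)


def append_summary(raw_stdin: str, log_path: Path, now: Optional[datetime] = None) -> str:
    """Parse the hook's stdin and append one summary line to log_path."""
    payload = json.loads(raw_stdin) if raw_stdin.strip() else {}
    line = summary_line(payload, now or datetime.now(timezone.utc))
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


if __name__ == "__main__":
    # Demo with a fabricated payload; a real hook would pass sys.stdin.read().
    sample = {"session_id": "demo-session", "cwd": "/work/app"}
    print(summary_line(sample, datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

To use it as the actual hook command, call `append_summary(sys.stdin.read(), Path(".claude/session-log.jsonl"))` from the script's entry point (the log location is a hypothetical choice) and exit 0 so the log write never interrupts the session.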
### A real `check-secrets.sh`
```bash
#!/usr/bin/env bash
# scripts/check-secrets.sh
# Flag obvious secret patterns in uncommitted changes. Called by PostToolUse.
# `git diff HEAD` covers staged and unstaged edits to tracked files;
# brand-new untracked files are not scanned.
set -euo pipefail
DIFF=$(git diff --no-color HEAD 2>/dev/null || true)
PATTERNS=(
  'AKIA[0-9A-Z]{16}'                  # AWS access key
  'sk-[A-Za-z0-9_-]{20,}'             # OpenAI / Anthropic style
  'github_pat_[A-Za-z0-9_]+'          # GitHub PAT
  '-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----'
)
for p in "${PATTERNS[@]}"; do
  # -e stops grep from parsing the PEM pattern's leading dashes as options;
  # the here-string avoids a SIGPIPE failure under pipefail when grep -q exits early.
  if grep -E -q -e "$p" <<<"$DIFF"; then
    echo "BLOCKED: matched secret pattern /$p/" >&2
    exit 2  # exit 2 feeds stderr back into the conversation
  fi
done
```
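The exit code matters. **Exit 0** = success; nothing is fed back to the
model. **Exit 2** = blocking error: stderr is surfaced to the model so it can
correct. In a PreToolUse hook this also cancels the tool call; a PostToolUse
hook cannot undo an edit that has already happened. **Other non-zero** =
non-blocking error shown to the user; the session continues.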
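
The same exit-code contract drives the PreToolUse gate from the config above. Here is a hedged Python sketch of what a script like `scripts/deny-dangerous-bash.sh` might do. It assumes the hook's stdin JSON carries `tool_name` and `tool_input.command`, and it implements only two of the policies mentioned (recursive delete of `/` and force pushes). The regexes are deliberately simple and will not catch every spelling of a dangerous command.

```python
import json
import re
from typing import Optional

# (pattern, reason) pairs; illustrative policy, not an exhaustive one.
DENY_RULES = [
    (re.compile(r"\brm\s+(-\w+\s+)*/(\s|$)"), "rm targeting the filesystem root"),
    (re.compile(r"\bgit\s+push\b.*(\s--force(?![-\w])|\s-f\b)"), "force push"),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason a command is denied, or None if it is allowed."""
    for pattern, reason in DENY_RULES:
        if pattern.search(command):
            return reason
    return None


def run_hook(raw_stdin: str) -> tuple[int, str]:
    """Map a PreToolUse payload to (exit_code, stderr_message)."""
    try:
        payload = json.loads(raw_stdin)
    except json.JSONDecodeError:
        # Non-blocking error: visible to the user, the session continues.
        return 1, "deny-bash hook: could not parse hook input"
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    reason = check_command(command)
    if reason:
        # Exit 2: block the call and tell the model why.
        return 2, f"BLOCKED by team policy: {reason}"
    return 0, ""


if __name__ == "__main__":
    # Demo on fabricated payloads; a real hook reads sys.stdin instead.
    for cmd in ("ls -la", "git push --force origin main", "rm -rf /"):
        payload = json.dumps({"tool_name": "Bash", "tool_input": {"command": cmd}})
        print(cmd, "->", run_hook(payload))
```

As a real hook, the entry point would call `run_hook(sys.stdin.read())`, print the message to `sys.stderr`, and pass the code to `sys.exit`. Note that `--force-with-lease` is let through on purpose; tighten the rule if your policy forbids it too.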
### Why hooks beat asking the model to do it
If you put "always run the linter after editing" in `AGENTS.md`, the model
will *usually* remember. Usually is not good enough for safety-critical
behavior. A hook fires *every time*, and you can prove it.
> **Heuristic:** every "always" or "never" rule in your `AGENTS.md` that
> involves running a command is a candidate for promotion to a hook.
---
## 4.5 Skill vs MCP vs Hook — the decision rubric
Same task, three possible homes. How to pick:
| Question | If yes → |
|-------------------------------------------------------------------|----------|
| Is this a **procedure** the model decides *when* to invoke? | Skill |
| Is this a **tool / data source** the model calls during a task? | MCP |
| Does this need to happen **deterministically** on an event? | Hook |
| Is it knowledge that should be **available every session**? | `AGENTS.md` |
Worked examples:
- *"Walk through our release process when asked."* → **Skill.** Model
decides when. Procedure-shaped.
- *"Let the agent query our staging database."* → **MCP.** Data source,
needed mid-task.
- *"Run the linter after every file edit."* → **Hook.** Must happen
every time, no model decision involved.
- *"We use Vitest for tests."* → **`AGENTS.md`.** Standing fact, every
session.
When a capability has multiple dimensions, you compose:
- *"Run the data-quality SQL suite when asked to verify a migration."*
- **MCP** server exposes the SQL queries.
- **Skill** documents the procedure (which queries, in what order, how
to interpret results).
- **Hook** runs the scripted part of the suite after every migration file
edit, with no model decision involved.
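
If it helps to see the rubric as logic rather than a table, the sketch below encodes the four questions as boolean fields and returns every home that applies, which is how composition falls out naturally. The `Capability` class and its field names are invented for this illustration; the answers you feed it are still a judgment call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    description: str
    model_chooses_when: bool = False    # a procedure invoked on demand
    tool_or_data_source: bool = False   # something the model calls mid-task
    on_event_every_time: bool = False   # must happen deterministically
    needed_every_session: bool = False  # standing knowledge


def homes(cap: Capability) -> list[str]:
    """Return every mechanism the rubric assigns to this capability."""
    result = []
    if cap.model_chooses_when:
        result.append("Skill")
    if cap.tool_or_data_source:
        result.append("MCP")
    if cap.on_event_every_time:
        result.append("Hook")
    if cap.needed_every_session:
        result.append("AGENTS.md")
    return result


if __name__ == "__main__":
    examples = [
        Capability("Walk through our release process", model_chooses_when=True),
        Capability("Query the staging database", tool_or_data_source=True),
        Capability("Run the linter after every edit", on_event_every_time=True),
        Capability("We use Vitest for tests", needed_every_session=True),
        Capability("Data-quality SQL suite for migrations", model_chooses_when=True,
                   tool_or_data_source=True, on_event_every_time=True),
    ]
    for cap in examples:
        print(f"{cap.description}: {' + '.join(homes(cap))}")
```

A capability that comes back with more than one home is a signal to split it into pieces, one per mechanism, rather than to pick a single winner.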
---
## Lab 4 — Build one of each
**Goal:** internalize the three extension mechanisms by shipping a useful
one of each.
**Time:** ~90 minutes.
1. **Skill.** Pick a repetitive procedure you do at least weekly (releasing,
onboarding a teammate, generating a new module from a template). Write
it as a `SKILL.md`. Verify it loads — ask the agent to invoke it in a
fresh session.
2. **MCP server.** Add the official Postgres MCP server with a read-only
user pointed at a staging or local database. Confirm with
`/mcp` and prompt the agent: *"summarize the schema in plan mode."*
3. **Hook.** Add a PostToolUse hook for `Edit|Write` that runs your
formatter. Make a deliberate badly-formatted edit through the agent and
watch the hook fix it before the next turn.
**What to look for:** the second time you use each one, you'll notice the
session is *quieter* — fewer "do you want me to format this?" round-trips,
fewer hand-held tool invocations. That quiet is the extension paying off.
---
## Common pitfalls
- **Vague skill descriptions.** "Helpful skill for releases" won't match.
Name the trigger phrases.
- **MCP servers running as god.** A single server with read/write access
to production via long-lived credentials is one prompt injection from a
bad day. Scope and rotate.
- **Hooks that block silently.** A hook that exits non-zero with no stderr
produces a session that "just stops working." Always emit a message
explaining why the hook blocked.
- **Reaching for tooling before the basics work.** If the agent can't read
your code well in plain Claude Code, no amount of MCP wiring will fix
that. Verify your `AGENTS.md` and LSP setup first.
---
## Summary
- Skills, MCP servers, and hooks are the three primary extension surfaces.
They are complementary, not interchangeable.
- Skills package procedures; MCP exposes tools and data; hooks enforce
determinism.
- Good extensions reduce the number of words you have to type in your
prompts — that's the measure.
- Audit and rotate. Every extension is also an attack surface.
---
## Further reading
- *Model Context Protocol* — spec, reference servers, and SDKs.
- *Claude Code Skills* — official guide and example skills.
- *Anthropic — Hooks* documentation, especially the lifecycle event
reference.
**Next:** [Module 5 — Memory Orchestration & Context Engineering](05-memory-orchestration.md)