# Module 5 — Memory Orchestration & Context Engineering
> *"Context is what the model sees this turn. Memory is what survives the
> next compaction, the next session, the next teammate. They are different
> problems and they need different tools."*
---
## Learning objectives
By the end of this module you will be able to:
1. Distinguish **context** (per-turn attention surface) from **memory**
(cross-session persistence) and choose the right strategy for each.
2. Build a layered memory stack — file-based notes, auto-memory, structured
stores like Engram — that survives compaction and onboards new sessions
in seconds.
3. Apply **context-engineering** patterns: hierarchical compression, scoped
memories, temporal awareness, just-in-time retrieval.
4. Diagnose and recover from memory failures: stale facts, duplicates,
contradictions, "phantom progress" from compaction.
---
## 5.1 The amnesia problem
An agent's "memory" within a session feels real until the moment it breaks.
Common breakage moments:
- A multi-hour session hits the compaction threshold. You return from lunch
and the agent has forgotten the convention you agreed to in turn 12.
- You start a new session on Tuesday. The agent has no idea you spent
Monday debugging this exact issue, and it confidently re-proposes the
same broken fix you already ruled out.
- A teammate picks up your branch. They get a fresh agent that knows
nothing about the trade-offs you weighed.
These aren't model failures. They're the absence of a memory layer. Building
one is a first-class engineering activity, not a luxury.
---
## 5.2 Context vs memory
### Context — the attention surface for one turn
Everything the model considers when generating its next response:
- The system prompt.
- The `AGENTS.md` and `CLAUDE.md` files loaded at session start.
- The conversation history (compacted if necessary).
- The files explicitly read or attached.
- Tool outputs from this turn.
Context is **ephemeral**. It vanishes when the session ends and is
paraphrased when compaction runs. Even while it is present, the token at
position N has no guaranteed influence over the token at position N+10,000.
### Memory — what survives
Memory is anything **outside the conversation** that an agent can re-load:
- Files in the repo (`DECISIONS.md`, `CHANGELOG.md`, code comments).
- Per-project memory stores managed by the harness.
- External knowledge bases reached via MCP.
- Structured stores like **Engram** that record decisions, entities, and
relationships across sessions.
The crucial property is **re-loadable**. If a future agent in a future
session can't deterministically retrieve it, it isn't memory — it's just
hope.
### Why the distinction matters
Conflating them produces two failure modes:
- **Treating context as memory.** "I told you yesterday we decided on
PostgreSQL." Yesterday's session is gone. The model has no way to know.
- **Treating memory as context.** Loading a 100-page memory store into
every session as context wastes tokens and degrades quality
("lost-in-the-middle" again).
The right shape: **memory is a corpus you retrieve from**; **context is the
small, freshly-curated subset of memory you load right now**.
---
## 5.3 Layered memory architecture
A working memory stack has three layers. Each has a different write
discipline and a different retrieval pattern.
### Layer 1 — Source-of-truth files (in the repo)
The slowest-changing, most durable layer. Lives in the repo, reviewed in
PRs, survives team turnover.
Typical files:
```
docs/
├── ARCHITECTURE.md # high-level system shape
├── DECISIONS.md # architecture decision records (ADRs)
├── CONVENTIONS.md # detailed coding conventions
└── GLOSSARY.md # domain terms with definitions
```
Rules of thumb:
- Write to these files when a decision is **final** and **load-bearing for
future agents**.
- Each entry has a date and a "why." Decisions without context get
re-litigated.
- Prefer ADR format for `DECISIONS.md`:
```markdown
## ADR-0042 — Use PostgreSQL row-level security for tenant isolation
**Date:** 2026-04-19
**Status:** Accepted
### Context
We considered (a) separate databases per tenant, (b) schema-per-tenant,
(c) row-level security (RLS). We have ~3k tenants today and project ~50k
in three years. Per-database overhead at that scale dominates the cost
model.
### Decision
We use Postgres RLS keyed on `tenant_id`. All application connections
run with the `app_user` role, and every transaction begins with
`SET LOCAL app.tenant_id = '<tenant id>'` so the RLS policies can read
the current tenant.
### Consequences
- Bench shows ~3% query overhead vs no RLS. Acceptable.
- ALL new tables containing tenant-scoped data MUST add a `tenant_id`
column and an RLS policy. See `db/migrations/_template_rls.sql`.
- The `pg_dump` flow needs to be re-tested for cross-tenant leakage at
every major Postgres upgrade.
```
An agent reading this on a fresh session knows what to do *and what to
guard against* — the consequences section is where the real value lives.
### Layer 2 — Harness-managed memory (per-project, per-user)
Modern agentic CLIs ship a file-based memory store the agent itself can
write to between turns. In Claude Code, it lives under
`~/.claude/projects/<project>/memory/` and is loaded automatically.
This layer captures things the agent learns *during* sessions:
- The user's role, preferences, expertise.
- Feedback patterns ("don't summarize unprompted").
- Project facts that aren't yet stable enough for `docs/`.
- References to external systems ("incidents are tracked in PagerDuty").
The discipline that makes this work:
- **Write incrementally, one fact per file.** Atomicity makes
contradiction visible.
- **Use frontmatter for type and discoverability.** The harness uses
descriptions to surface only relevant memories per session.
- **Update or delete stale entries actively.** A memory that says "we use
MySQL" when the code uses Postgres is a hallucination factory.
A sample memory file (typical structure):
```markdown
---
name: tenant-isolation
description: Decision and operational rules for tenant isolation; canonical reference is ADR-0042.
metadata:
  type: project
---
Tenant isolation is enforced via Postgres row-level security. All
tenant-scoped tables must include `tenant_id` and an RLS policy.
**Why:** We picked RLS over separate DBs at ~3k tenants; see
docs/DECISIONS.md ADR-0042 for the full reasoning.
**How to apply:** When generating new tables or migrations touching
tenant-scoped data, add the RLS policy template from
db/migrations/_template_rls.sql. Don't propose schema-per-tenant
alternatives without re-checking the ADR.
```
Note the explicit `Why` and `How to apply` — those are what let the model
*generalize* the memory to new situations instead of repeating it
verbatim.
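Keeping every memory in this shape is easier with a small helper. A minimal sketch, assuming the frontmatter fields shown above (`name`, `description`, and `type` nested under `metadata`); `render_memory` is hypothetical, and your harness may expect different fields:

```python
import json

def render_memory(name: str, description: str, mem_type: str,
                  fact: str, why: str, how_to_apply: str) -> str:
    """Render one fact as a memory file: frontmatter, fact, Why, How to apply."""
    # json.dumps yields a double-quoted string that YAML also accepts, so a
    # colon inside the description can't break the frontmatter.
    quoted = json.dumps(description, ensure_ascii=False)
    return "\n".join([
        "---",
        f"name: {name}",
        f"description: {quoted}",
        "metadata:",
        f"  type: {mem_type}",
        "---",
        fact.strip(),
        "",
        f"**Why:** {why.strip()}",
        "",
        f"**How to apply:** {how_to_apply.strip()}",
        "",
    ])
```

Writing each result to its own file preserves the one-fact-per-file rule, and the required `why` and `how_to_apply` arguments make it hard to save a memory without them.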
### Layer 3 — Structured external stores (Engram and friends)
When a project gets big — many engineers, many specs, many decisions — file
notes hit a ceiling. A **structured memory store** like Engram is the next
step. It records entities (specs, decisions, incidents, people) and the
relationships between them, and exposes a query interface (usually MCP).
What you get over plain files:
- **Cross-reference.** "Show me every decision touching the auth module
made since the last incident."
- **Decay.** Entries can be marked stale, deprecated, or superseded — and
the store enforces it.
- **Multi-agent.** The reviewer agent and the builder agent see the *same*
memory store, so they can't disagree about facts.
A typical Engram-style call from an agent (illustrative; the exact call
shape depends on the store and its MCP server):
```
engram.search({
  type: "decision",
  about: "auth.session_token",
  since: "2026-01-01"
}) → [
  { id: "ADR-0038", decided: "tokens are JWT with 1h expiry" },
  { id: "ADR-0041", decided: "refresh tokens stored hashed in DB" }
]
```
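The semantics behind a query like this are simple to model. A minimal in-memory sketch, assuming each record carries a kind, a dotted topic key, a date, and a status; `Record` and `search` are hypothetical stand-ins, not Engram's actual API. It also shows the decay property: superseded entries drop out of results unless you ask for them.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Record:
    id: str
    kind: str                 # "decision", "event", ...
    about: str                # dotted topic key, e.g. "auth.session_token"
    decided: str
    on: date
    status: str = "accepted"  # "proposed" | "accepted" | "superseded"

def search(records: list[Record], kind: str, about: str, since: date,
           include_superseded: bool = False) -> list[Record]:
    """Filter by kind, topic (exact or dotted-prefix), and date.

    Superseded records are hidden by default, so stale decisions stop
    surfacing without anyone having to delete them.
    """
    return [
        r for r in records
        if r.kind == kind
        and (r.about == about or r.about.startswith(about + "."))
        and r.on >= since
        and (include_superseded or r.status != "superseded")
    ]
```

Prefix matching on `about` is what makes broader questions work: a query about `"auth"` also finds records filed under `"auth.session_token"`.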
Whether you need this layer depends on team size and project complexity. A
two-person side project does not. A 30-engineer platform team running a
multi-year program absolutely does — without it, every new engineer's first
month is a rediscovery exercise.
---
## 5.4 Context engineering — choosing what enters the window
### Hierarchical compression
Not every memory needs to enter context as its full text. Compress on the
way in:
- **Title only** for irrelevant items in a search result.
- **Title + summary** for plausibly relevant items.
- **Full text** only for items the agent decided to use.
In practice this looks like a retrieval pipeline:
1. Scout agent (Haiku) reads memory titles + descriptions, returns ~10 IDs.
2. Builder agent (Sonnet) loads the full text of the top ~3.
3. Architect agent (Opus) is only invoked if those ~3 disagree or are
insufficient.
This is *exactly* the model-selection pyramid from Module 1, applied to
memory.
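The compress-on-the-way-in rule can be written as a single rendering function. A minimal sketch, assuming each memory already has a relevance score (from a scout model, embeddings, or keyword match; how it is computed is out of scope here) and using characters as a crude stand-in for tokens. The thresholds and the `Memory`/`render_context` names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    title: str
    summary: str
    body: str
    score: float  # relevance to the current task, 0..1 (assumed precomputed)

def render_context(memories: list[Memory], budget_chars: int = 2000,
                   full_at: float = 0.8, summary_at: float = 0.5,
                   max_full: int = 3) -> str:
    """Full text for at most `max_full` strong matches, title + summary for
    plausible ones, title only for the rest. Stops adding items once the
    character budget would be exceeded (newline separators not counted)."""
    chunks, used, full_count = [], 0, 0
    for m in sorted(memories, key=lambda x: x.score, reverse=True):
        if m.score >= full_at and full_count < max_full:
            chunk = f"## {m.title}\n{m.body}"
            full_count += 1
        elif m.score >= summary_at:
            chunk = f"- {m.title}: {m.summary}"
        else:
            chunk = f"- {m.title}"
        if used + len(chunk) > budget_chars:
            break
        chunks.append(chunk)
        used += len(chunk)
    return "\n".join(chunks)
```

The `max_full` cap mirrors the "top ~3" step: strong matches beyond the cap fall back to title + summary instead of crowding the window.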
### Memory scopes — user vs project vs session
Three scopes, three retention policies:
| Scope | Where it lives | Decay rate | Example |
|-----------|-------------------------------------------|------------|----------------------------------|
| User | `~/.claude/memory/` (per developer) | Slow | "Prefers terse responses." |
| Project | `~/.claude/projects/<id>/memory/` or `docs/` | Medium | "We use RLS for tenant isolation." |
| Session | Conversation history | Fast (compaction) | "We've decided this turn to call the variable `tenantId`." |
Two failure patterns:
- **Promoting session facts to project memory too eagerly.** A decision
made in turn 3 might be reversed in turn 9. Wait until the change ships
or is otherwise *stable* before writing it to a layer with slow decay.
- **Letting project facts pollute user memory.** "User prefers PostgreSQL"
is not a user preference; it's a project fact. Write it to the project
scope.
### Temporal context — the date matters
Two patterns:
1. **Anchor absolute dates in memory.** "We'll freeze on Thursday" rots as
soon as the week turns over. "We'll freeze on 2026-03-05" doesn't.
2. **Tell the model the current date.** Modern harnesses do this
automatically. If yours doesn't, include it in `AGENTS.md` or in the
first message of the session. The model's training cutoff is typically
months behind the current date; without a date anchor, it tends to
reason as if no time has passed since that cutoff.
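Both patterns are small, mechanical steps. A minimal standard-library sketch; `anchor_weekday` and `date_preamble` are hypothetical helpers, and resolving "Thursday" to the first Thursday on or after today is an assumption your team may define differently:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]  # index matches date.weekday()

def anchor_weekday(name: str, today: date) -> date:
    """Resolve a weekday name to the first such day on or after `today`."""
    target = WEEKDAYS.index(name.strip().lower())  # ValueError on a typo
    return today + timedelta(days=(target - today.weekday()) % 7)

def date_preamble(today: date) -> str:
    """A one-line date anchor for AGENTS.md or the session's first message."""
    day_name = WEEKDAYS[today.weekday()].capitalize()  # avoids locale-dependent strftime
    return f"Today's date is {today.isoformat()} ({day_name})."
```

For example, on Monday 2026-03-02, `anchor_weekday("Thursday", date(2026, 3, 2))` returns `date(2026, 3, 5)`, so the memory can record "freeze on 2026-03-05" instead of "freeze on Thursday".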
### Just-in-time retrieval, not just-in-case loading
The temptation: "let me load all the docs at the start of the session so
the agent has everything." The cost: context bloat and
lost-in-the-middle degradation.
The better pattern:
- **At session start, load only standing rules** (`AGENTS.md`,
`DECISIONS.md` index).
- **Let the agent retrieve specifics** as it decides it needs them — via
MCP, file reads, or memory queries.
- **Re-curate the window** when the task changes mid-session ("the
architecture phase is done; drop those files, load the implementation
ones").
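Applied to `DECISIONS.md`, this means loading a one-line-per-ADR index at session start and pulling a full record only when the agent asks for it. A minimal sketch, assuming the `## ADR-NNNN <dash> Title` heading convention used in this module; `adr_index` and `load_adr` are hypothetical functions you might expose to the agent as tools:

```python
import re
from typing import Optional

# "## ADR-0042 <dash> Title"; accepts an em dash (\u2014) or a hyphen.
HEADING = re.compile(r"^## (ADR-\d+)\s*[\u2014-]\s*(.+)$", re.MULTILINE)

def adr_index(text: str) -> list[str]:
    """Standing context: one short line per ADR, no bodies."""
    return [f"{m.group(1)}: {m.group(2).strip()}" for m in HEADING.finditer(text)]

def load_adr(text: str, adr_id: str) -> Optional[str]:
    """Just-in-time retrieval: the full section for one ADR, or None."""
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        if m.group(1) == adr_id:
            # The section runs until the next ADR heading (or end of file).
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            return text[m.start():end].rstrip()
    return None
```

The index costs a few tokens per decision no matter how long the ADRs grow; the full text enters the window only for the decisions the current task touches.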
### Anti-patterns to avoid
- **The "kitchen sink" `AGENTS.md`.** 2,000 lines of standing rules eat
so much of the window that compaction triggers almost as soon as real
work begins. Keep it under ~300 lines and link out to detail files.
- **Memory as changelog.** Writing "today I did X" memories that no future
session will ever query. Use git log; that's its job.
- **Re-summarizing the summary.** Each compaction takes the previous
summary as *the input*, so every additional pass paraphrases a
paraphrase and loses more detail. After two compactions, restart the
session.
---
## 5.5 Engram — a closer look
Engram is one of several emerging structured-memory tools designed for
agentic workflows. It deserves a closer look because its model is broadly
applicable even if you choose a different implementation.
### Core entities
- **Decision** — an immutable record of a choice. Only its status
(`proposed`, `accepted`, `superseded`) changes over time.
- **Entity** — a noun in the project (a service, a model, a customer
segment, a person).
- **Event** — something that happened (deploy, incident, spec change).
- **Reference** — a pointer to canonical source (file path, URL, ticket).
Relationships connect them: decisions *supersede* other decisions; events
*affect* entities; entities *depend on* entities.
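This entity-and-relationship model maps naturally onto a small typed graph. A minimal sketch in which nodes are plain IDs and relationships are stored as directed edges; the kinds and relation names come from the text, while `Graph` and its methods are hypothetical illustrations, not Engram's schema:

```python
from collections import defaultdict

KINDS = {"decision", "entity", "event", "reference"}
RELATIONS = {"supersedes", "affects", "depends_on"}

class Graph:
    def __init__(self) -> None:
        self.kind: dict[str, str] = {}
        self.out = defaultdict(set)  # (source, relation) -> {targets}
        self.inc = defaultdict(set)  # (target, relation) -> {sources}

    def add(self, node_id: str, kind: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self.kind[node_id] = kind

    def relate(self, src: str, rel: str, dst: str) -> None:
        if rel not in RELATIONS:
            raise ValueError(f"unknown relation: {rel}")
        if src not in self.kind or dst not in self.kind:
            raise KeyError(f"add both nodes before relating {src} -> {dst}")
        self.out[(src, rel)].add(dst)
        self.inc[(dst, rel)].add(src)

    def is_superseded(self, decision: str) -> bool:
        return bool(self.inc.get((decision, "supersedes")))

    def events_affecting(self, entity: str) -> set[str]:
        """Events affecting `entity` or anything it (transitively) depends on."""
        seen, stack, events = set(), [entity], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            events |= self.inc.get((node, "affects"), set())
            stack.extend(self.out.get((node, "depends_on"), ()))
        return events
```

Following `depends_on` edges is what makes the graph more useful than a flat list: if `billing` depends on `postgres`, an incident recorded against `postgres` also surfaces when the agent asks about `billing`.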
### Why this shape works
The schema mirrors what good engineering teams already do informally. ADRs
*are* decisions. PagerDuty incidents *are* events. The CMDB is a graph of
entities. Engram simply gives the agent a single interface to all of it.
### Sample workflow
1. **Session starts.** Agent calls `engram.context_for(repo: "billing")`
and gets back the 8 most-relevant active decisions and 3 recent events.
2. **Mid-session.** Agent proposes a tenant-isolation change that would
supersede the currently accepted tenant-isolation decision. It calls
`engram.record_decision_draft(...)` so the proposal itself becomes a
queryable entity.
3. **PR merges.** A hook calls `engram.accept_decision(ADR-0042)`,
marking it active and the decision it replaces superseded.
4. **Next month, a new engineer arrives.** Their first prompt:
`"explain the current state of tenant isolation"`. Engram returns
ADR-0042 (active) with a link back to the superseded decision, giving
them the *history* of the decision, not just the latest answer.
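The lifecycle in this workflow (draft, accept, supersede, then read back the history) fits in a few lines. A minimal sketch; `Decision`, `DecisionLog`, `record_draft`, `accept`, and `history` are hypothetical names that only mirror the calls described above, and the IDs in any usage are placeholders:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    decision_id: str
    summary: str
    status: str = "proposed"            # proposed -> accepted -> superseded
    supersedes: Optional[str] = None    # ID of the decision this one replaces
    superseded_by: Optional[str] = None

class DecisionLog:
    def __init__(self) -> None:
        self.decisions: dict[str, Decision] = {}

    def record_draft(self, decision_id: str, summary: str,
                     supersedes: Optional[str] = None) -> None:
        """Mid-session: the proposal becomes a queryable record immediately."""
        self.decisions[decision_id] = Decision(decision_id, summary, supersedes=supersedes)

    def accept(self, decision_id: str) -> None:
        """On merge: activate the draft and retire its predecessor in one step."""
        d = self.decisions[decision_id]
        if d.status != "proposed":
            raise ValueError(f"{decision_id} is {d.status}, not proposed")
        # Look up the predecessor first so a bad ID fails before anything changes.
        old = self.decisions[d.supersedes] if d.supersedes else None
        d.status = "accepted"
        if old is not None:
            old.status = "superseded"
            old.superseded_by = decision_id

    def history(self, decision_id: str) -> list[tuple[str, str]]:
        """The decision and each predecessor it replaced, newest first."""
        chain, seen = [], set()
        current = self.decisions.get(decision_id)
        while current and current.decision_id not in seen:
            seen.add(current.decision_id)
            chain.append((current.decision_id, current.status))
            current = self.decisions.get(current.supersedes) if current.supersedes else None
        return chain
```

Retiring the predecessor inside `accept` is the point of the design: there is never a moment when both decisions look active to a reader.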
### Where Engram-like stores struggle
- **Free-form prose.** They are not designed for "the long discussion we
had about whether to do X." Keep that in `docs/` and reference it.
- **Ambiguous scope.** Whether a decision applies to *this service* or to
*the whole org* has to be expressed in the entity model. Bad scoping
makes decisions appear to apply where they don't, and appear not to
apply where they do.
---
## 5.6 Auto-memory — letting the agent write its own notes
Most modern harnesses include an **auto-memory** system: the agent observes
the conversation and writes durable memories on its own initiative. Used
well, this is a force multiplier. Used badly, it's an entropy generator.
### When auto-memory helps
- User explicitly asks: "remember that we always use the `slim` Docker
image."
- User corrects the agent's approach: "no, don't mock the database —
we tried that and got burned." That correction should become a memory.
- A non-obvious decision is reached after deliberation. Write it down.
### When auto-memory hurts
- Writing memories for every passing detail. Memory files balloon.
- Writing contradictory memories without reconciling them.
- Writing memories that paraphrase the code. The code is its own memory.
The discipline:
- **Reviewable.** Every auto-memory write should be visible to you. Most
harnesses surface them in the session — don't hide them.
- **Editable.** Memory is a directory of files; treat them as code.
Periodically audit and prune.
- **Decay-aware.** When a memory is contradicted, *delete or update*.
Never let two contradictory memories coexist.
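With one fact per file, part of the audit can be automated. A minimal sketch, assuming each memory's frontmatter `name` identifies the fact it covers, so two files with the same `name` but different bodies are a contradiction to reconcile by hand. The parser reads only simple `key: value` lines (nested keys such as `metadata.type` are flattened to `type`), so it is not a full YAML parser; `parse_memory` and `audit` are hypothetical helpers:

```python
from collections import defaultdict

def parse_memory(text: str) -> tuple[dict[str, str], str]:
    """Split a memory file into (flattened frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()
    end = lines.index("---", 1)  # closing delimiter; ValueError if missing
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():  # skip block headers such as "metadata:"
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()

def audit(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each memory name that appears with conflicting bodies to its files."""
    by_name: dict[str, dict[str, list[str]]] = defaultdict(dict)  # name -> {body: [paths]}
    for path, text in files.items():
        meta, body = parse_memory(text)
        by_name[meta.get("name", path)].setdefault(body, []).append(path)
    return {
        name: sorted(p for paths in bodies.values() for p in paths)
        for name, bodies in by_name.items()
        if len(bodies) > 1
    }
```

To run it over a memory directory, pass `{str(p): p.read_text() for p in Path(memory_dir).glob("*.md")}`. Every name it reports needs one file deleted or both merged.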
---
## Lab 5 — Build a three-layer memory stack
**Goal:** install a memory architecture that demonstrably improves a
second-session experience.
**Time:** ~75 minutes.
1. **Layer 1.** In a real repo, create `docs/DECISIONS.md` and seed it
with two real ADRs (write them from existing decisions you can
articulate). Add a one-line pointer to `AGENTS.md`.
2. **Layer 2.** Have a working session about a non-trivial change. At
three moments — when a convention is agreed, when a non-obvious
trade-off is resolved, when a stale assumption is corrected — ask the
agent to write a memory file. Review and edit each one for clarity.
3. **Layer 3 (optional).** Install an Engram MCP server (or any
structured store of your choice) and record one decision through it.
Note the difference in retrieval ergonomics vs the files.
4. **The test.** Tomorrow, in a brand-new session on a brand-new branch,
start with: *"I want to make a change in the area we discussed
yesterday. Recap your understanding."* Score the recap on a 1–5 scale
for accuracy and completeness.
**What to look for:** a well-built stack produces a recap that surprises
you with how much it remembers — including the *why* behind decisions, not
just the *what*. If the recap is vague or wrong, your layer-1 or layer-2
write discipline needs work; do not blame the model.
---
## Common pitfalls
- **Trusting compaction.** The summary is a paraphrase; load-bearing
content has to live outside the conversation.
- **Memory drift without auditing.** One file says "we use X," the code
uses Y, and another file says "we use Z." Audit monthly.
- **Loading everything every time.** Memory is for *retrieval*, not for
*constant presence in context*. Curate per session.
- **No `Why:` in your memories.** Without the why, the agent can't
generalize; it can only quote. Edge cases will defeat it.
---
## Summary
- Context and memory are different concepts and need different tools.
- Build memory in layers: source-of-truth files, harness-managed memory,
structured external stores.
- Engineer context actively — compress, scope, anchor in time, retrieve
just-in-time.
- Auto-memory is powerful when reviewed and pruned, and corrosive when
ignored.
---
## Further reading
- *Engram* — project documentation.
- Anthropic's *auto-memory* documentation and the patterns in the public
`claude-code` examples.
- *Architecture Decision Records* (Michael Nygard) — the original ADR
template; still the right shape.
**Next:** [Module 6 — The Meta-Agent Factory & Verification Frontier](06-meta-agent-factory.md)