Adversarial Agent - Albert Masoliver's learning site

## Definition

An **adversarial agent** is a verifier prompted to *find ways to break* an implementation rather than to *confirm it works*. It pairs with, but does not replace, a collaborative critic.

## Prompt Shape

```markdown
ultrathink.

You are a security researcher with a grudge. The code in this diff
is going to production. Find at least three concrete attacks: each
must be a specific input and the resulting incorrect behavior.
Cite line numbers. Do not propose fixes; your job is to break.

After three, also consider:
- What about side channels (timing, error messages)?
- What does this trust that it shouldn't?
- What can an authenticated user do that they shouldn't?
```

## Why the Tone Matters

The prompt's *tone* alone shifts findings. A collaborative reviewer surfaces "missing tests" and "minor improvements." An adversarial reviewer surfaces IDORs, log-token echoes, timing leaks, and CSV injection. Neither agent is more accurate; they're tuned for different failure modes.

## When to Use

- Authentication and authorisation surfaces.
- Payment and refund endpoints.
- Data export ("download my data") flows.
- Anything that runs unauthenticated.

## When Not to Use

- Internal refactors with no public surface.
- UI tweaks.
- Throwaway experiments.

The adversarial agent is expensive (Opus + ultrathink) and its output is unpleasant to read. Reserve it for changes whose blast radius justifies the cost.

## Common Finds by Surface

- **Magic-link auth:** token entropy, no browser-binding, log-token echo.
- **Refund endpoint:** uncapped amount, idempotency key under user control, error enumeration.
- **Data export:** IDOR, CSV injection, exported PII at rest.

## Integration

Wire into CI but **only on paths matching sensitive directories** (`src/auth/`, `src/billing/`). Don't run it on every PR.

## Related

- [[Builder-Critic Pattern]]
- [[Verifier Independence]]
- [[Headless Agent in CI]]
- [[Reasoning Budget]]
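
## Sketch: Path Gating

The path gating described under Integration could be sketched roughly as follows. This is a minimal illustration, not a prescribed setup: the function name `should_run_adversarial` and the constant `SENSITIVE_DIRS` are hypothetical, and it assumes the CI job can supply the list of changed file paths, relative to the repository root, from the PR diff.

```python
# Hypothetical gate: run the adversarial agent only when a PR touches
# a sensitive directory. The two prefixes come from the Integration
# section; extend the tuple for your own repository layout.
SENSITIVE_DIRS = ("src/auth/", "src/billing/")


def should_run_adversarial(changed_files):
    """Return True if any changed path lies under a sensitive directory.

    `changed_files` is assumed to be an iterable of repo-relative paths
    using forward slashes, e.g. the output of `git diff --name-only`.
    """
    # str.startswith accepts a tuple and matches any of its prefixes.
    return any(path.startswith(SENSITIVE_DIRS) for path in changed_files)


if __name__ == "__main__":
    touched = ["src/auth/magic_link.py", "README.md"]
    print(should_run_adversarial(touched))
```

The trailing slash in each prefix is deliberate: it keeps a sibling directory such as `src/authentication/` from triggering the expensive run by accident.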