## Definition
**AI test scaffolding** is the workflow of using a generative AI tool (IDE assistant or chat model) to generate the boilerplate structure of a test suite — fixtures, import paths, in-memory databases, and initial test cases — so that the developer begins from a working scaffold rather than a blank file. The developer then reviews, corrects, and extends the AI-generated tests.
## Why It Works Better Than AI-Generated Production Code
Unit tests are naturally scoped to a single function or method. This narrow context limits the ways LLM-generated code can fail: there is less room for hallucinated APIs, IP concerns, and architectural biases. Jeremy Morgan (*Coding with AI*, 2025) observes that AI-generated tests, even when imperfect, require far fewer corrections than AI-generated application code, which makes the time saved more predictable.
## Core Workflow
1. **Frame the prompt specifically.** Specify the testing framework (pytest vs unittest), the file path of the class under test, and any infrastructure constraints (in-memory database, mocking approach).
2. **Generate the scaffold.** Accept the AI's initial output: imports, fixture functions, and a first pass of test cases.
3. **Review assumptions.** Check whether the AI correctly understood the class's purpose and signature. Misunderstandings (e.g., inferring a class's interface from its name alone) are common and must be caught before running tests.
4. **Refine iteratively.** Use follow-up prompts to correct the framework choice, fixture scope, or edge-case coverage.
5. **Run and debug incrementally.** Fix failures one at a time; AI tools can help interpret error messages in context.
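
One way to support step 3 is to print the real method signatures of the class under test and compare them with what the generated tests call. The following is a minimal sketch using the standard-library `inspect` module; `QuestionRepository` and `public_signatures` are hypothetical names introduced for illustration and do not come from Morgan's book.

```python
import inspect


class QuestionRepository:
    """Hypothetical class under test, standing in for a real project class."""

    def __init__(self, connection):
        self.connection = connection

    def get_by_id(self, question_id: int):
        ...

    def list_all(self, limit: int = 10):
        ...


def public_signatures(cls) -> dict[str, str]:
    """Map each public method name to its actual signature string."""
    return {
        name: str(inspect.signature(member))
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }


if __name__ == "__main__":
    # Compare this output with the calls the AI-generated tests make.
    for name, signature in public_signatures(QuestionRepository).items():
        print(f"{name}{signature}")
```

If a generated test calls a method or passes an argument that does not appear in this listing, the AI probably guessed the interface from the class name.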
## In-Memory Database Fixtures
A recurring pattern in Chapter 8 of Morgan's book: instead of mocking, create an in-memory SQLite database that mirrors the production schema and data for each test run.
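```python
import pytest

# create_in_memory_db_from_existing is a project helper defined elsewhere.

@pytest.fixture
def db_connection():
    memory_conn = create_in_memory_db_from_existing('../data/questions.db')
    yield memory_conn
    # Teardown: runs after the test that used the fixture has finished.
    memory_conn.close()
```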
Benefits over mocks: no divergence between test and production schema; tests read like production code; no data cleanup between tests, because the in-memory copy disappears when the fixture closes the connection; fast (memory I/O).
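
The fixture relies on `create_in_memory_db_from_existing`, whose body is not shown here. A minimal sketch of how such a helper could be written with the standard-library `sqlite3` module follows. It is an assumption about the helper's behaviour, not Morgan's implementation: it opens the source file read-only and copies it into a `:memory:` connection with `Connection.backup`.

```python
import sqlite3
from pathlib import Path


def create_in_memory_db_from_existing(db_path: str) -> sqlite3.Connection:
    """Copy an on-disk SQLite database into a fresh in-memory connection."""
    # Build a file: URI so the source can be opened read-only; a test run
    # can then never modify the real database file.
    source_uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    source = sqlite3.connect(source_uri, uri=True)
    try:
        memory_conn = sqlite3.connect(":memory:")
        # Connection.backup copies the schema and all rows into the target.
        source.backup(memory_conn)
    finally:
        source.close()
    return memory_conn
```

Because pytest fixtures are function-scoped by default, each test that requests `db_connection` would get its own copy, so writes made by one test are not visible to the next.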
## Prompting Principles for Test Generation
- Name the framework explicitly: "Create pytest tests …" not "Create unit tests …"
- Reference the file path: helps the AI locate method signatures via context.
- Mention fixtures: "Use the existing `db_connection` fixture" avoids duplicate setup code.
- Request edge cases explicitly: boundary values, null inputs, empty collections.
- Start general then refine: broad scaffold first, specific edge cases in follow-up prompts.
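
As an illustration of what an explicit edge-case request might produce, the sketch below covers an empty collection, a null input, and two boundary values. The function `first_n_questions` is hypothetical and exists only to give the tests something to exercise. The tests are plain functions with bare `assert` statements, which pytest collects by their `test_` prefix.

```python
def first_n_questions(questions, n):
    """Hypothetical function under test: return at most n questions."""
    if not questions or n <= 0:
        return []
    return list(questions[:n])


def test_empty_collection_returns_empty_list():
    assert first_n_questions([], 3) == []


def test_none_input_returns_empty_list():
    assert first_n_questions(None, 3) == []


def test_zero_count_returns_empty_list():
    # Boundary value: asking for zero items.
    assert first_n_questions(["q1", "q2"], 0) == []


def test_count_larger_than_collection_returns_everything():
    # Boundary value: asking for more items than exist.
    assert first_n_questions(["q1", "q2"], 5) == ["q1", "q2"]
```

Whether a null input should return an empty list or raise an error is a design decision the developer has to confirm; the AI will typically pick one without asking.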
## AI Tool Comparison (2025 Snapshot)
Morgan compared GitHub Copilot, Tabnine, and Blackbox AI for test generation on a Python/Flask/SQLite application:
- **Copilot** — strong language coverage; `/tests` shortcut defaults to unittest and often misunderstands context; explicit chat prompt yields better results.
- **Tabnine** — generates a test plan with suggested test names before producing code; high per-test accuracy; multiple model options including privacy-first on-premise.
- **Blackbox AI** — inferred the in-memory database pattern from the existing codebase without being prompted; highest first-shot accuracy in this comparison; supports language-specific agents via `/`.
All three tools require the developer to validate assumptions, correct import paths, and supply domain knowledge the model cannot infer (e.g., the expected row count in a production database).
## Relation to Broader AI-Assisted Development
AI test scaffolding is a low-risk entry point for developers sceptical of AI-generated production code. The same principles — specific prompts, iterative refinement, human validation — apply to production code generation (see [[Prompt Engineering]]), but the blast radius of errors is smaller for tests.
## Related
- [[Prompt Engineering]]
- [[Agentic CLI]]
- [[Plan Mode and Build Mode]]
- [[Spec-Driven Development]]
- [[Executable Acceptance Criterion]]
- [[Vibe Coding]]
## Sources
- [[Coding with AI - Jeremy Morgan]]