## Definition

**AI test scaffolding** is the workflow of using a generative AI tool (an IDE assistant or a chat model) to generate the boilerplate structure of a test suite: fixtures, import paths, in-memory databases, and initial test cases. The developer therefore starts from a working scaffold rather than a blank file, then reviews, corrects, and extends the AI-generated tests.

## Why It Works Better Than AI-Generated Production Code

Unit tests are naturally scoped to a single function or method. This narrow context limits the failure modes of LLM-generated code because there is less surface area for hallucinated APIs, IP concerns, and architectural biases. Jeremy Morgan (*Coding with AI*, 2025) observes that AI-generated tests, even when imperfect, need far fewer corrections than AI-generated application code, which makes the time savings more reliable.

## Core Workflow

1. **Frame the prompt specifically.** Name the testing framework (pytest vs. unittest), the file path of the class under test, and any infrastructure constraints (in-memory database, mocking approach).
2. **Generate the scaffold.** Let the AI produce the initial output: imports, fixture functions, and a first pass of test cases.
3. **Review assumptions.** Check whether the AI correctly understood the class's purpose and signature. Misunderstandings (e.g., inferring a class's interface from its name alone) are common and must be caught before the tests are run.
4. **Refine iteratively.** Use follow-up prompts to correct the framework choice, fixture scope, or edge-case coverage.
5. **Run and debug incrementally.** Fix failures one at a time; AI tools can help interpret error messages in context.

## In-Memory Database Fixtures

A recurring pattern in Chapter 8 of Morgan's book: instead of mocking, create an in-memory SQLite database that mirrors the production schema and data for each test run.
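The fixture in this section calls a helper, `create_in_memory_db_from_existing`, that the source names but does not show. A minimal sketch, assuming the helper copies both schema and rows from the on-disk file using the `backup()` method of Python's standard-library `sqlite3.Connection`; the body is hypothetical, not Morgan's implementation:

```python
import sqlite3
from pathlib import Path


def create_in_memory_db_from_existing(db_path: str) -> sqlite3.Connection:
    """Return a new in-memory SQLite connection holding a copy of db_path.

    Hypothetical sketch: the source only names this helper. This version
    assumes it should copy the full schema and data via the backup API.
    """
    # Open the source read-only through a URI; a missing file raises an error
    # instead of being silently created as an empty database.
    source_uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    source = sqlite3.connect(source_uri, uri=True)
    try:
        memory_conn = sqlite3.connect(":memory:")
        # Copy every page (schema and rows) from the file into memory.
        source.backup(memory_conn)
    finally:
        source.close()
    return memory_conn
```

Each call returns an independent copy, so rows a test inserts or deletes never reach the file on disk. Opening the source with `mode=ro` also guards against a mistyped path: a plain `sqlite3.connect(path)` would create an empty database there, while the read-only URI raises `sqlite3.OperationalError`.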
```python
import pytest

# create_in_memory_db_from_existing is a project helper; the source does not
# say which module defines it, so import it from your own codebase.

@pytest.fixture
def db_connection():
    # Build a fresh in-memory copy of the production database for each test.
    memory_conn = create_in_memory_db_from_existing('../data/questions.db')
    yield memory_conn  # the test body runs here
    memory_conn.close()  # teardown: the in-memory copy is discarded
```

Benefits over mocks: no divergence between the test and production schemas; tests read like production code; no data-cleanup step, because each in-memory copy disappears when its connection closes; and speed, because all I/O happens in memory.

## Prompting Principles for Test Generation

- Name the framework explicitly: "Create pytest tests …", not "Create unit tests …".
- Reference the file path: this helps the AI locate method signatures via context.
- Mention existing fixtures: "Use the existing `db_connection` fixture" avoids duplicate setup code.
- Request edge cases explicitly: boundary values, null inputs, empty collections.
- Start general, then refine: a broad scaffold first, specific edge cases in follow-up prompts.

## AI Tool Comparison (2025 Snapshot)

Morgan compared GitHub Copilot, Tabnine, and Blackbox AI for test generation on a Python/Flask/SQLite application:

- **Copilot**: strong language coverage; the `/tests` shortcut defaults to unittest and often misunderstands the context, whereas an explicit chat prompt yields better results.
- **Tabnine**: generates a test plan with suggested test names before producing code; high per-test accuracy; multiple model options, including a privacy-first on-premises deployment.
- **Blackbox AI**: inferred the in-memory database pattern from the existing codebase without being prompted; highest first-shot accuracy in this comparison; supports language-specific agents invoked with `/`.

All three tools require the developer to validate assumptions, correct import paths, and supply domain knowledge the model cannot infer (e.g., the expected row count in a production database).

## Relation to Broader AI-Assisted Development

AI test scaffolding is a low-risk entry point for developers sceptical of AI-generated production code.
The same principles (specific prompts, iterative refinement, and human validation) apply to production code generation, as described in [[Prompt Engineering]], but the blast radius of errors is smaller for tests.

## Related

- [[Prompt Engineering]]
- [[Agentic CLI]]
- [[Plan Mode and Build Mode]]
- [[Spec-Driven Development]]
- [[Executable Acceptance Criterion]]
- [[Vibe Coding]]

## Sources

- [[Coding with AI - Jeremy Morgan]]