Test-Driven Development

TDD with AI: Writing Specs That Generate Tests

AI tools excel at generating tests from precise specifications — combine human spec-writing with AI test generation.

AI + TDD: The Powerful Combination

AI coding tools are excellent at generating test code — when given the right specifications. The combination of human-written specifications and AI-generated tests is more powerful than either alone.

Without specs: AI generates tests that cover the happy path but miss edge cases, error conditions, and security-relevant scenarios.

With specs: AI generates comprehensive tests because the specifications tell it exactly what behavior to verify.

How to Prompt for Test Generation

Effective prompts include:

  1. The function signature or component interface
  2. The expected behavior for each scenario
  3. Edge cases and error conditions
  4. The testing framework to use
```typescript
// Prompt example:
// "Generate Vitest tests for this function:
//
// function calculateShipping(weightKg: number, distance: number): number
//
// Rules:
// - Throws for negative weight or distance
// - Free shipping for orders under 1kg and distances under 100km
// - $5 flat rate when the order is 1-10kg or the distance is under 100km (but not both)
// - $0.50 per kg for orders over 10kg
// - Additional $0.02 per km for distances over 100km
// - Maximum charge is $50
//
// Use describe/it blocks and cover all rule combinations."
```
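Having a reference implementation in mind makes it easier to judge whether AI-generated tests actually exercise each rule combination. This is a sketch only, under one simple reading of the flat-rate rule ($5 for any order of 1-10kg); the prompt's wording admits other interpretations:

```typescript
// Sketch: one reading of the prompt's rules. Names and thresholds
// come from the prompt; the flat-rate interpretation is an assumption.
function calculateShipping(weightKg: number, distanceKm: number): number {
  if (weightKg < 0 || distanceKm < 0) {
    throw new RangeError('weight and distance must be non-negative');
  }
  if (weightKg < 1 && distanceKm < 100) {
    return 0; // free shipping tier
  }
  let cost = weightKg <= 10 ? 5 : weightKg * 0.5; // flat rate vs per-kg
  if (distanceKm > 100) {
    cost += (distanceKm - 100) * 0.02; // distance surcharge
  }
  return Math.min(cost, 50); // cap at $50
}
```

Reviewing generated tests against a reference like this quickly reveals which rule combinations (e.g. heavy and far, light but far) the AI skipped.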

The Spec-to-Test Pipeline

```text
1. Write requirement in EARS format:
   "When the user submits a login form with invalid credentials,
    the system shall display an error message and remain on the login page."

2. Translate to test cases:
   - "shows error message for wrong password"
   - "remains on login page after failed attempt"
   - "clears error on new submission attempt"
   - "locks account after 5 failed attempts"

3. AI generates test code from test case descriptions

4. Human reviews: Are assertions meaningful? Are edge cases covered?

5. Implement code to pass the tests
```
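The fourth test case above ("locks account after 5 failed attempts") implies state that the implementation must track. A minimal sketch of that state, using hypothetical names (`LoginAttemptTracker` is not part of the requirement), might be:

```typescript
// Hypothetical sketch: the state behind "locks account after 5 failed attempts"
class LoginAttemptTracker {
  private failures = new Map<string, number>();
  private static readonly MAX_FAILURES = 5;

  recordFailure(userId: string): void {
    this.failures.set(userId, (this.failures.get(userId) ?? 0) + 1);
  }

  recordSuccess(userId: string): void {
    this.failures.delete(userId); // a successful login resets the counter
  }

  isLocked(userId: string): boolean {
    return (this.failures.get(userId) ?? 0) >= LoginAttemptTracker.MAX_FAILURES;
  }
}
```

Writing the test case first forces this design question (where does the failure count live, and what resets it?) to surface before implementation begins.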

Reviewing AI-Generated Tests

AI-generated tests need human review. A checklist:

✅ Does it test behavior, not implementation?

✅ Are assertions meaningful (not just "doesn't throw")?

✅ Are error paths tested, not just happy paths?

✅ Are edge cases covered (empty input, boundary values, null)?

✅ If a bug existed, would this test catch it?
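The first checklist item deserves an illustration. A behavior-focused assertion checks observable output and survives refactors; an implementation-coupled assertion (e.g. spying on an internal helper) breaks the moment the internals change. `formatPrice` is a hypothetical function used only for illustration:

```typescript
// Hypothetical function under test
function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Implementation-coupled (brittle): asserting that toFixed was called
// would break if the function switched to Intl.NumberFormat.
// Behavior-focused (robust): assert only the observable output.
const price = formatPrice(1999); // '$19.99' however it is produced internally
```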

```typescript
// AI-generated test (needs improvement):
it('creates a user', async () => {
  const result = await createUser({ email: 'test@example.com' });
  expect(result).toBeDefined(); // Too vague — doesn't test anything specific
});

// Improved after human review:
it('creates a user and returns the user object without password', async () => {
  const result = await createUser({
    email: 'test@example.com',
    password: 'SecurePass1!',
    name: 'Alice',
  });

  expect(result.id).toBeDefined();
  expect(result.email).toBe('test@example.com');
  expect(result.name).toBe('Alice');
  expect(result.password).toBeUndefined(); // Security check
  expect(result.createdAt).toBeInstanceOf(Date);
});
```

What AI Misses (Add These Manually)

  • Security edge cases: What if the user provides a script tag? A SQL injection?
  • Race conditions: What if two requests arrive simultaneously?
  • Business-specific rules: AI doesn't know your domain
  • Authorization failures: AI often skips testing "what users cannot do"
  • Error propagation: What happens when a dependency fails?
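The authorization gap, for example, is closed with a "cannot" test. A minimal sketch, where `deleteUser` and `ForbiddenError` are hypothetical names chosen for illustration:

```typescript
// Hypothetical domain types and function, for illustration only
class ForbiddenError extends Error {}

type User = { id: string; role: 'admin' | 'member' };

function deleteUser(actor: User, targetId: string): void {
  if (actor.role !== 'admin') {
    // the "what users cannot do" path AI-generated tests often skip
    throw new ForbiddenError(`user ${actor.id} may not delete users`);
  }
  // deletion logic would go here
}
```

A matching test asserts that a non-admin caller gets `ForbiddenError`, not merely that the admin path succeeds.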

Key Takeaways

  • AI generates tests best when given precise specifications — vague prompts produce vague tests
  • The pipeline: write requirement → define test cases → AI generates test code → human reviews
  • Review checklist: behavior not implementation, meaningful assertions, error paths, edge cases
  • AI generates excellent happy-path tests; humans must add edge cases, security checks, and domain-specific scenarios
  • The spec-driven approach (write specs, generate tests, implement code) produces the most reliable AI-generated codebases

Example

```typescript
// Effective prompt → quality AI-generated tests
// Prompt: "Write Vitest tests for validateEmail(email: string): boolean
// Rules: must have @, domain must have dot, no spaces, max 254 chars"

describe('validateEmail', () => {
  it('accepts valid email addresses', () => {
    expect(validateEmail('user@example.com')).toBe(true);
    expect(validateEmail('user.name+tag@sub.domain.com')).toBe(true);
  });

  it('rejects emails without @', () => {
    expect(validateEmail('userexample.com')).toBe(false);
  });

  it('rejects emails with domain missing dot', () => {
    expect(validateEmail('user@example')).toBe(false);
  });

  it('rejects emails with spaces', () => {
    expect(validateEmail('user @example.com')).toBe(false);
    expect(validateEmail('user@ example.com')).toBe(false);
  });

  it('rejects emails exceeding 254 characters', () => {
    const longEmail = 'a'.repeat(250) + '@b.com';
    expect(validateEmail(longEmail)).toBe(false);
  });

  it('rejects empty string and null-like inputs', () => {
    expect(validateEmail('')).toBe(false);
  });
});
```
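For completeness, one implementation that passes these generated tests, as a sketch (not the only valid reading of the prompted rules, and far from a full RFC 5322 validator):

```typescript
// Sketch: satisfies the four prompted rules only
function validateEmail(email: string): boolean {
  if (!email || email.length > 254) return false; // empty or too long
  if (/\s/.test(email)) return false;             // no spaces anywhere
  const parts = email.split('@');
  if (parts.length !== 2 || !parts[0] || !parts[1]) return false; // exactly one @
  return parts[1].includes('.');                  // domain must contain a dot
}
```

With the tests written first, this implementation can be iterated on (or regenerated by the AI) until the whole suite passes.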