The Test Quality Review workflow performs comprehensive quality validation of test code using TEA's knowledge base of best practices. It detects flaky patterns, validates structure, and provides actionable feedback to improve test maintainability and reliability.

Overview

This workflow reviews test quality against proven patterns from TEA's knowledge base including fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. It generates a quality score (0-100) with detailed feedback on violations and recommendations.

Key Features:

Knowledge-Based Review: Applies patterns from 19+ knowledge fragments in tea-index.csv
Quality Scoring: 0-100 score with letter grade (A+ to F) based on violations
Multi-Scope Review: Single file, directory, or entire test suite
Pattern Detection: Identifies hard waits, race conditions, shared state, conditionals
Best Practice Validation: BDD format, test IDs, priorities, assertions, test length
Actionable Feedback: Critical issues (must fix) vs recommendations (should fix)
Code Examples: Every issue includes recommended fix with code snippets
Integration: Works with story files, test-design, acceptance criteria context

Usage

bmad tea *test-review

The TEA agent runs this workflow when:

After *atdd workflow → validate generated acceptance tests
After *automate workflow → ensure regression suite quality
After developer writes tests → provide quality feedback
Before *gate workflow → confirm test quality before release
User explicitly requests review: bmad tea *test-review
Periodic quality audits of existing test suite

Typical workflow sequence:

*atdd → Generate failing acceptance tests
*test-review → Validate test quality ⬅️ YOU ARE HERE (option 1)
*dev story → Implement feature with tests passing
*test-review → Review implementation tests ⬅️ YOU ARE HERE (option 2)
*automate → Expand regression suite
*test-review → Validate new regression tests ⬅️ YOU ARE HERE (option 3)
*gate → Final quality gate decision

Inputs

Required Context Files

Test File(s): One or more test files to review (auto-discovered or explicitly provided)
Test Framework Config: playwright.config.ts, jest.config.js, etc. (for context)

Recommended Context Files

Story File: Acceptance criteria for context (e.g., story-1.3.md)
Test Design: Priority context (P0/P1/P2/P3) from test-design.md
Knowledge Base: tea-index.csv with best practice fragments (required for thorough review)

Workflow Variables

Key variables that control review behavior (configured in workflow.yaml):

review_scope: single | directory | suite (default: single)
- single: Review one test file
- directory: Review all tests in a directory
- suite: Review entire test suite
quality_score_enabled: Enable 0-100 quality scoring (default: true)
append_to_file: Add inline comments to test files (default: false)
check_against_knowledge: Use tea-index.csv fragments (default: true)
strict_mode: Fail on any violation vs advisory only (default: false)

Quality Criteria Flags (all default to true):

check_given_when_then: BDD format validation
check_test_ids: Test ID conventions
check_priority_markers: P0/P1/P2/P3 classification
check_hard_waits: Detect sleep(), wait(X)
check_determinism: No conditionals/try-catch abuse
check_isolation: Tests clean up, no shared state
check_fixture_patterns: Pure function → Fixture → mergeTests
check_data_factories: Factory usage vs hardcoded data
check_network_first: Route intercept before navigate
check_assertions: Explicit assertions present
check_test_length: Warn if >300 lines
check_test_duration: Warn if >1.5 min
check_flakiness_patterns: Common flaky patterns

Outputs

Primary Deliverable

Test Quality Review Report (test-review-{filename}.md):

Executive Summary: Overall assessment, key strengths/weaknesses, recommendation
Quality Score: 0-100 score with letter grade (A+ to F)
Quality Criteria Assessment: Table with all criteria evaluated (PASS/WARN/FAIL)
Critical Issues: P0/P1 violations that must be fixed
Recommendations: P2/P3 violations that should be fixed
Best Practices Examples: Good patterns found in tests
Knowledge Base References: Links to detailed guidance

Each issue includes:

Code location (file:line)
Explanation of problem
Recommended fix with code example
Knowledge base fragment reference

Secondary Outputs

Inline Comments: TODO comments in test files at violation locations (if enabled)
Quality Badge: Badge with score (e.g., "Test Quality: 87/100 (A)")
Story Update: Test quality section appended to story file (if enabled)

Validation Safeguards

✅ All knowledge base fragments loaded successfully
✅ Test files parsed and structure analyzed
✅ All enabled quality criteria evaluated
✅ Violations categorized by severity (P0/P1/P2/P3)
✅ Quality score calculated with breakdown
✅ Actionable feedback with code examples provided

Quality Criteria Explained

1. BDD Format (Given-When-Then)

PASS: Tests use clear Given-When-Then structure

// Given: User is logged in
const user = await createTestUser();
await loginPage.login(user.email, user.password);

// When: User navigates to dashboard
await page.goto('/dashboard');

// Then: User sees welcome message
await expect(page.locator('[data-testid="welcome"]')).toContainText(user.name);

FAIL: Tests lack structure, hard to understand intent

await page.goto('/dashboard');
await page.click('.button');
await expect(page.locator('.text')).toBeVisible();

Knowledge: test-quality.md, tdd-cycles.md

2. Test IDs

PASS: All tests have IDs following convention

test.describe('1.3-E2E-001: User Login Flow', () => {
  test('should log in successfully with valid credentials', async ({ page }) => {
    // Test implementation
  });
});

FAIL: No test IDs, can't trace to requirements

test.describe('Login', () => {
  test('login works', async ({ page }) => {
    // Test implementation
  });
});

Knowledge: traceability.md, test-quality.md

3. Priority Markers

PASS: Tests classified as P0/P1/P2/P3

test.describe('P0: Critical User Journey - Checkout', () => {
  // Critical tests
});

test.describe('P2: Edge Case - International Addresses', () => {
  // Nice-to-have tests
});

Knowledge: test-priorities.md, risk-governance.md

4. No Hard Waits

PASS: No sleep(), wait(), hardcoded delays

// ✅ Good: Explicit wait for condition
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });

FAIL: Hard waits introduce flakiness

// ❌ Bad: Hard wait
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();

Knowledge: test-quality.md, network-first.md

5. Determinism

PASS: Tests work deterministically, no conditionals

// ✅ Good: Deterministic test
await expect(page.locator('[data-testid="status"]')).toHaveText('Active');

FAIL: Conditionals make tests unpredictable

// ❌ Bad: Conditional logic
const status = await page.locator('[data-testid="status"]').textContent();
if (status === 'Active') {
  await page.click('[data-testid="deactivate"]');
} else {
  await page.click('[data-testid="activate"]');
}

Knowledge: test-quality.md, data-factories.md

6. Isolation

PASS: Tests clean up, no shared state

test.afterEach(async ({ page, testUser }) => {
  // Cleanup: Delete test user
  await api.deleteUser(testUser.id);
});

FAIL: Shared state, tests depend on order

// ❌ Bad: Shared global variable
let userId: string;

test('create user', async () => {
  userId = await createUser(); // Sets global
});

test('update user', async () => {
  await updateUser(userId); // Depends on previous test
});

Knowledge: test-quality.md, data-factories.md

7. Fixture Patterns

PASS: Pure function → Fixture → mergeTests

// ✅ Good: Pure function fixture
const createAuthenticatedPage = async (page: Page, user: User) => {
  await loginPage.login(user.email, user.password);
  return page;
};

const test = base.extend({
  authenticatedPage: async ({ page }, use) => {
    const user = createTestUser();
    const authedPage = await createAuthenticatedPage(page, user);
    await use(authedPage);
  },
});

FAIL: No fixtures, repeated setup

// ❌ Bad: Repeated setup in every test
test('test 1', async ({ page }) => {
  await page.goto('/login');
  await page.fill('[name="email"]', 'test@example.com');
  await page.fill('[name="password"]', 'password123');
  await page.click('[type="submit"]');
  // Test logic
});

Knowledge: fixture-architecture.md

8. Data Factories

PASS: Factory functions with overrides

// ✅ Good: Factory function
import { createTestUser } from './factories/user-factory';

test('user can update profile', async ({ page }) => {
  const user = createTestUser({ role: 'admin' });
  await api.createUser(user); // API-first setup
  // Test UI interaction
});

FAIL: Hardcoded test data

// ❌ Bad: Magic strings
await page.fill('[name="email"]', 'test@example.com');
await page.fill('[name="phone"]', '555-1234');

Knowledge: data-factories.md

9. Network-First Pattern

PASS: Route intercept before navigate

// ✅ Good: Intercept before navigation
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers }));
await page.goto('/users'); // Navigate after route setup

FAIL: Race condition risk

// ❌ Bad: Navigate before intercept
await page.goto('/users');
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers })); // Too late!

Knowledge: network-first.md

10. Explicit Assertions

PASS: Clear, specific assertions

await expect(page.locator('[data-testid="username"]')).toHaveText('John Doe');
await expect(page.locator('[data-testid="status"]')).toHaveClass(/active/);

FAIL: Missing or vague assertions

await page.locator('[data-testid="username"]').isVisible(); // No assertion!

Knowledge: test-quality.md

11. Test Length

PASS: ≤300 lines per file (ideal: ≤200) WARN: 301-500 lines (consider splitting) FAIL: >500 lines (too large)

Knowledge: test-quality.md

12. Test Duration

PASS: ≤1.5 minutes per test (target: <30 seconds) WARN: 1.5-3 minutes (consider optimization) FAIL: >3 minutes (too slow)

Knowledge: test-quality.md, selective-testing.md

13. Flakiness Patterns

Common flaky patterns detected:

Tight timeouts (e.g., { timeout: 1000 })
Race conditions (navigation before route interception)
Timing-dependent assertions
Retry logic hiding flakiness
Environment-dependent assumptions

Knowledge: test-quality.md, network-first.md, ci-burn-in.md

Quality Scoring

Score Calculation

Starting Score: 100

Deductions:
- Critical Violations (P0): -10 points each
- High Violations (P1): -5 points each
- Medium Violations (P2): -2 points each
- Low Violations (P3): -1 point each

Bonus Points (max +30):
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern consistently used: +5
+ Perfect isolation (all tests clean up): +5
+ All test IDs present and correct: +5

Final Score: max(0, min(100, Starting Score - Violations + Bonus))

Quality Grades

90-100 (A+): Excellent - Production-ready, best practices followed
80-89 (A): Good - Minor improvements recommended
70-79 (B): Acceptable - Some issues to address
60-69 (C): Needs Improvement - Several issues detected
<60 (F): Critical Issues - Significant problems, not production-ready

Example Scenarios

Scenario 1: Excellent Quality (Score: 95)

# Test Quality Review: checkout-flow.spec.ts

**Quality Score**: 95/100 (A+ - Excellent)
**Recommendation**: Approve - Production Ready

## Executive Summary

Excellent test quality with comprehensive coverage and best practices throughout.
Tests demonstrate expert-level patterns including fixture architecture, data
factories, network-first approach, and perfect isolation.

**Strengths:**
✅ Clear Given-When-Then structure in all tests
✅ Comprehensive fixtures for authenticated states
✅ Data factories with faker.js for realistic test data
✅ Network-first pattern prevents race conditions
✅ Perfect test isolation with cleanup
✅ All test IDs present (1.2-E2E-001 through 1.2-E2E-005)

**Minor Recommendations:**
⚠️ One test slightly verbose (245 lines) - consider extracting helper function

**Recommendation**: Approve without changes. Use as reference for other tests.

Scenario 2: Good Quality (Score: 82)

# Test Quality Review: user-profile.spec.ts

**Quality Score**: 82/100 (A - Good)
**Recommendation**: Approve with Comments

## Executive Summary

Solid test quality with good structure and coverage. A few improvements would
enhance maintainability and reduce flakiness risk.

**Strengths:**
✅ Good BDD structure
✅ Test IDs present
✅ Explicit assertions

**Issues to Address:**
⚠️ 2 hard waits detected (lines 34, 67) - use explicit waits instead
⚠️ Hardcoded test data (line 23) - use factory functions
⚠️ Missing cleanup in one test (line 89) - add afterEach hook

**Recommendation**: Address hard waits before merging. Other improvements
can be addressed in follow-up PR.

Scenario 3: Needs Improvement (Score: 68)

# Test Quality Review: legacy-report.spec.ts

**Quality Score**: 68/100 (C - Needs Improvement)
**Recommendation**: Request Changes

## Executive Summary

Test has several quality issues that should be addressed before merging.
Primarily concerns around flakiness risk and maintainability.

**Critical Issues:**
❌ 5 hard waits detected (flakiness risk)
❌ Race condition: navigation before route interception (line 45)
❌ Shared global state between tests (line 12)
❌ Missing test IDs (can't trace to requirements)

**Recommendations:**
⚠️ Test file is 487 lines - consider splitting
⚠️ Hardcoded data throughout - use factories
⚠️ Missing cleanup in afterEach

**Recommendation**: Address all critical issues (❌) before re-review.
Significant refactoring needed.

Scenario 4: Critical Issues (Score: 42)

# Test Quality Review: data-export.spec.ts

**Quality Score**: 42/100 (F - Critical Issues)
**Recommendation**: Block - Not Production Ready

## Executive Summary

CRITICAL: Test has severe quality issues that make it unsuitable for
production. Significant refactoring required.

**Critical Issues:**
❌ 12 hard waits (page.waitForTimeout) throughout
❌ No test IDs or structure
❌ Try/catch blocks swallowing errors (lines 23, 45, 67, 89)
❌ No cleanup - tests leave data in database
❌ Conditional logic (if/else) throughout tests
❌ No assertions in 3 tests (tests do nothing!)
❌ 687 lines - far too large
❌ Multiple race conditions
❌ Hardcoded credentials in plain text (SECURITY ISSUE)

**Recommendation**: BLOCK MERGE. Complete rewrite recommended following
TEA knowledge base patterns. Suggest pairing session with QA engineer.

Integration with Other Workflows

Before Test Review

atdd - Generates acceptance tests → TEA reviews for quality
dev story - Developer implements tests → TEA provides feedback
automate - Expands regression suite → TEA validates new tests

After Test Review

Developer - Addresses critical issues, improves based on recommendations
gate - Test quality feeds into release decision (high-quality tests increase confidence)

Coordinates With

Story File: Review links to acceptance criteria for context
Test Design: Review validates tests align with P0/P1/P2/P3 prioritization
Knowledge Base: All feedback references tea-index.csv fragments

Review Scopes

Single File Review

# Review specific test file
bmad tea *test-review
# Provide test_file_path when prompted: tests/auth/login.spec.ts

Use When:

Reviewing tests just written
PR review of specific test file
Debugging flaky test
Learning test quality patterns

Directory Review

# Review all tests in directory
bmad tea *test-review
# Provide review_scope: directory
# Provide test_dir: tests/auth/

Use When:

Feature branch has multiple test files
Reviewing entire feature test suite
Auditing test quality for module

Suite Review

# Review entire test suite
bmad tea *test-review
# Provide review_scope: suite

Use When:

Periodic quality audit (monthly/quarterly)
Before major release
Identifying patterns across codebase
Establishing quality baseline

Configuration Examples

Strict Review (Fail on Violations)

review_scope: 'single'
quality_score_enabled: true
strict_mode: true # Fail if score <70
check_against_knowledge: true
# All check_* flags: true

Use for: PR gates, production releases

Balanced Review (Advisory)

review_scope: 'single'
quality_score_enabled: true
strict_mode: false # Advisory only
check_against_knowledge: true
# All check_* flags: true

Use for: Most development workflows (default)

Focused Review (Specific Criteria)

review_scope: 'single'
check_hard_waits: true
check_flakiness_patterns: true
check_network_first: true
# Other checks: false

Use for: Debugging flaky tests, targeted improvements

Important Notes

Non-Prescriptive: Review provides guidance, not rigid rules
Context Matters: Some violations may be justified (document with comments)
Knowledge-Based: All feedback grounded in proven patterns
Actionable: Every issue includes recommended fix with code example
Quality Score: Use as indicator, not absolute measure
Continuous Improvement: Review tests periodically as patterns evolve
Learning Tool: Use reviews to learn best practices, not just find bugs

Knowledge Base References

This workflow automatically consults:

test-quality.md - Definition of Done (no hard waits, <300 lines, <1.5 min, self-cleaning)
fixture-architecture.md - Pure function → Fixture → mergeTests pattern
network-first.md - Route intercept before navigate (race condition prevention)
data-factories.md - Factory functions with overrides, API-first setup
test-levels-framework.md - E2E vs API vs Component vs Unit appropriateness
playwright-config.md - Environment-based configuration patterns
tdd-cycles.md - Red-Green-Refactor patterns
selective-testing.md - Duplicate coverage detection
ci-burn-in.md - Flakiness detection patterns
test-priorities.md - P0/P1/P2/P3 classification framework
traceability.md - Requirements-to-tests mapping

See tea-index.csv for complete knowledge fragment mapping.

Troubleshooting

Problem: Quality score seems too low

Solution:

Review violation breakdown - focus on critical issues first
Consider project context - some patterns may be justified
Check if criteria are appropriate for project type
Score is indicator, not absolute - focus on actionable feedback

Problem: No test files found

Solution:

Verify test_dir path is correct
Check test file extensions (_.spec.ts, _.test.js, etc.)
Use glob pattern to discover: tests/**/*.spec.ts

Problem: Knowledge fragments not loading

Solution:

Verify tea-index.csv exists in testarch/ directory
Check fragment file paths are correct in tea-index.csv
Ensure auto_load_knowledge: true in workflow variables

Problem: Too many false positives

Solution:

Add justification comments in code for legitimate violations
Adjust check_* flags to disable specific criteria
Use strict_mode: false for advisory-only feedback
Context matters - document why pattern is appropriate

bmad tea *atdd - Generate acceptance tests (review after generation)
bmad tea *automate - Expand regression suite (review new tests)
bmad tea *gate - Quality gate decision (test quality feeds into decision)
bmad dev story - Implement story (review tests after implementation)

README.md

Test Quality Review Workflow

Overview

Usage

Inputs

Required Context Files

Recommended Context Files

Workflow Variables

Outputs

Primary Deliverable

Secondary Outputs

Validation Safeguards

Quality Criteria Explained

1. BDD Format (Given-When-Then)

2. Test IDs

3. Priority Markers

4. No Hard Waits

5. Determinism

6. Isolation

7. Fixture Patterns

8. Data Factories

9. Network-First Pattern

10. Explicit Assertions

11. Test Length

12. Test Duration

13. Flakiness Patterns

Quality Scoring

Score Calculation

Quality Grades

Example Scenarios

Scenario 1: Excellent Quality (Score: 95)

Scenario 2: Good Quality (Score: 82)

Scenario 3: Needs Improvement (Score: 68)

Scenario 4: Critical Issues (Score: 42)

Integration with Other Workflows

Before Test Review

After Test Review

Coordinates With

Review Scopes

Single File Review

Directory Review

Suite Review

Configuration Examples

Strict Review (Fail on Violations)

Balanced Review (Advisory)

Focused Review (Specific Criteria)

Important Notes

Knowledge Base References

Troubleshooting

Problem: Quality score seems too low

Problem: No test files found

Problem: Knowledge fragments not loading

Problem: Too many false positives

Related Commands