Test Quality Review Workflow
The Test Quality Review workflow performs comprehensive quality validation of test code using TEA's knowledge base of best practices. It detects flaky patterns, validates structure, and provides actionable feedback to improve test maintainability and reliability.
Overview
This workflow reviews test quality against proven patterns from TEA's knowledge base including fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. It generates a quality score (0-100) with detailed feedback on violations and recommendations.
Key Features:
- Knowledge-Based Review: Applies patterns from 19+ knowledge fragments in tea-index.csv
- Quality Scoring: 0-100 score with letter grade (A+ to F) based on violations
- Multi-Scope Review: Single file, directory, or entire test suite
- Pattern Detection: Identifies hard waits, race conditions, shared state, conditionals
- Best Practice Validation: BDD format, test IDs, priorities, assertions, test length
- Actionable Feedback: Critical issues (must fix) vs recommendations (should fix)
- Code Examples: Every issue includes recommended fix with code snippets
- Integration: Works with story files, test-design, acceptance criteria context
Usage
bmad tea *test-review
The TEA agent runs this workflow when:
- After
*atddworkflow → validate generated acceptance tests - After
*automateworkflow → ensure regression suite quality - After developer writes tests → provide quality feedback
- Before
*gateworkflow → confirm test quality before release - User explicitly requests review:
bmad tea *test-review - Periodic quality audits of existing test suite
Typical workflow sequence:
*atdd→ Generate failing acceptance tests*test-review→ Validate test quality ⬅️ YOU ARE HERE (option 1)*dev story→ Implement feature with tests passing*test-review→ Review implementation tests ⬅️ YOU ARE HERE (option 2)*automate→ Expand regression suite*test-review→ Validate new regression tests ⬅️ YOU ARE HERE (option 3)*gate→ Final quality gate decision
Inputs
Required Context Files
- Test File(s): One or more test files to review (auto-discovered or explicitly provided)
- Test Framework Config: playwright.config.ts, jest.config.js, etc. (for context)
Recommended Context Files
- Story File: Acceptance criteria for context (e.g.,
story-1.3.md) - Test Design: Priority context (P0/P1/P2/P3) from test-design.md
- Knowledge Base: tea-index.csv with best practice fragments (required for thorough review)
Workflow Variables
Key variables that control review behavior (configured in workflow.yaml):
-
review_scope:
single|directory|suite(default:single)single: Review one test filedirectory: Review all tests in a directorysuite: Review entire test suite
-
quality_score_enabled: Enable 0-100 quality scoring (default:
true) -
append_to_file: Add inline comments to test files (default:
false) -
check_against_knowledge: Use tea-index.csv fragments (default:
true) -
strict_mode: Fail on any violation vs advisory only (default:
false)
Quality Criteria Flags (all default to true):
check_given_when_then: BDD format validationcheck_test_ids: Test ID conventionscheck_priority_markers: P0/P1/P2/P3 classificationcheck_hard_waits: Detect sleep(), wait(X)check_determinism: No conditionals/try-catch abusecheck_isolation: Tests clean up, no shared statecheck_fixture_patterns: Pure function → Fixture → mergeTestscheck_data_factories: Factory usage vs hardcoded datacheck_network_first: Route intercept before navigatecheck_assertions: Explicit assertions presentcheck_test_length: Warn if >300 linescheck_test_duration: Warn if >1.5 mincheck_flakiness_patterns: Common flaky patterns
Outputs
Primary Deliverable
Test Quality Review Report (test-review-{filename}.md):
- Executive Summary: Overall assessment, key strengths/weaknesses, recommendation
- Quality Score: 0-100 score with letter grade (A+ to F)
- Quality Criteria Assessment: Table with all criteria evaluated (PASS/WARN/FAIL)
- Critical Issues: P0/P1 violations that must be fixed
- Recommendations: P2/P3 violations that should be fixed
- Best Practices Examples: Good patterns found in tests
- Knowledge Base References: Links to detailed guidance
Each issue includes:
- Code location (file:line)
- Explanation of problem
- Recommended fix with code example
- Knowledge base fragment reference
Secondary Outputs
- Inline Comments: TODO comments in test files at violation locations (if enabled)
- Quality Badge: Badge with score (e.g., "Test Quality: 87/100 (A)")
- Story Update: Test quality section appended to story file (if enabled)
Validation Safeguards
- ✅ All knowledge base fragments loaded successfully
- ✅ Test files parsed and structure analyzed
- ✅ All enabled quality criteria evaluated
- ✅ Violations categorized by severity (P0/P1/P2/P3)
- ✅ Quality score calculated with breakdown
- ✅ Actionable feedback with code examples provided
Quality Criteria Explained
1. BDD Format (Given-When-Then)
PASS: Tests use clear Given-When-Then structure
// Given: User is logged in
const user = await createTestUser();
await loginPage.login(user.email, user.password);
// When: User navigates to dashboard
await page.goto('/dashboard');
// Then: User sees welcome message
await expect(page.locator('[data-testid="welcome"]')).toContainText(user.name);
FAIL: Tests lack structure, hard to understand intent
await page.goto('/dashboard');
await page.click('.button');
await expect(page.locator('.text')).toBeVisible();
Knowledge: test-quality.md, tdd-cycles.md
2. Test IDs
PASS: All tests have IDs following convention
test.describe('1.3-E2E-001: User Login Flow', () => {
test('should log in successfully with valid credentials', async ({ page }) => {
// Test implementation
});
});
FAIL: No test IDs, can't trace to requirements
test.describe('Login', () => {
test('login works', async ({ page }) => {
// Test implementation
});
});
Knowledge: traceability.md, test-quality.md
3. Priority Markers
PASS: Tests classified as P0/P1/P2/P3
test.describe('P0: Critical User Journey - Checkout', () => {
// Critical tests
});
test.describe('P2: Edge Case - International Addresses', () => {
// Nice-to-have tests
});
Knowledge: test-priorities.md, risk-governance.md
4. No Hard Waits
PASS: No sleep(), wait(), hardcoded delays
// ✅ Good: Explicit wait for condition
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
FAIL: Hard waits introduce flakiness
// ❌ Bad: Hard wait
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
Knowledge: test-quality.md, network-first.md
5. Determinism
PASS: Tests work deterministically, no conditionals
// ✅ Good: Deterministic test
await expect(page.locator('[data-testid="status"]')).toHaveText('Active');
FAIL: Conditionals make tests unpredictable
// ❌ Bad: Conditional logic
const status = await page.locator('[data-testid="status"]').textContent();
if (status === 'Active') {
await page.click('[data-testid="deactivate"]');
} else {
await page.click('[data-testid="activate"]');
}
Knowledge: test-quality.md, data-factories.md
6. Isolation
PASS: Tests clean up, no shared state
test.afterEach(async ({ page, testUser }) => {
// Cleanup: Delete test user
await api.deleteUser(testUser.id);
});
FAIL: Shared state, tests depend on order
// ❌ Bad: Shared global variable
let userId: string;
test('create user', async () => {
userId = await createUser(); // Sets global
});
test('update user', async () => {
await updateUser(userId); // Depends on previous test
});
Knowledge: test-quality.md, data-factories.md
7. Fixture Patterns
PASS: Pure function → Fixture → mergeTests
// ✅ Good: Pure function fixture
const createAuthenticatedPage = async (page: Page, user: User) => {
await loginPage.login(user.email, user.password);
return page;
};
const test = base.extend({
authenticatedPage: async ({ page }, use) => {
const user = createTestUser();
const authedPage = await createAuthenticatedPage(page, user);
await use(authedPage);
},
});
FAIL: No fixtures, repeated setup
// ❌ Bad: Repeated setup in every test
test('test 1', async ({ page }) => {
await page.goto('/login');
await page.fill('[name="email"]', 'test@example.com');
await page.fill('[name="password"]', 'password123');
await page.click('[type="submit"]');
// Test logic
});
Knowledge: fixture-architecture.md
8. Data Factories
PASS: Factory functions with overrides
// ✅ Good: Factory function
import { createTestUser } from './factories/user-factory';
test('user can update profile', async ({ page }) => {
const user = createTestUser({ role: 'admin' });
await api.createUser(user); // API-first setup
// Test UI interaction
});
FAIL: Hardcoded test data
// ❌ Bad: Magic strings
await page.fill('[name="email"]', 'test@example.com');
await page.fill('[name="phone"]', '555-1234');
Knowledge: data-factories.md
9. Network-First Pattern
PASS: Route intercept before navigate
// ✅ Good: Intercept before navigation
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers }));
await page.goto('/users'); // Navigate after route setup
FAIL: Race condition risk
// ❌ Bad: Navigate before intercept
await page.goto('/users');
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers })); // Too late!
Knowledge: network-first.md
10. Explicit Assertions
PASS: Clear, specific assertions
await expect(page.locator('[data-testid="username"]')).toHaveText('John Doe');
await expect(page.locator('[data-testid="status"]')).toHaveClass(/active/);
FAIL: Missing or vague assertions
await page.locator('[data-testid="username"]').isVisible(); // No assertion!
Knowledge: test-quality.md
11. Test Length
PASS: ≤300 lines per file (ideal: ≤200) WARN: 301-500 lines (consider splitting) FAIL: >500 lines (too large)
Knowledge: test-quality.md
12. Test Duration
PASS: ≤1.5 minutes per test (target: <30 seconds) WARN: 1.5-3 minutes (consider optimization) FAIL: >3 minutes (too slow)
Knowledge: test-quality.md, selective-testing.md
13. Flakiness Patterns
Common flaky patterns detected:
- Tight timeouts (e.g.,
{ timeout: 1000 }) - Race conditions (navigation before route interception)
- Timing-dependent assertions
- Retry logic hiding flakiness
- Environment-dependent assumptions
Knowledge: test-quality.md, network-first.md, ci-burn-in.md
Quality Scoring
Score Calculation
Starting Score: 100
Deductions:
- Critical Violations (P0): -10 points each
- High Violations (P1): -5 points each
- Medium Violations (P2): -2 points each
- Low Violations (P3): -1 point each
Bonus Points (max +30):
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern consistently used: +5
+ Perfect isolation (all tests clean up): +5
+ All test IDs present and correct: +5
Final Score: max(0, min(100, Starting Score - Violations + Bonus))
Quality Grades
- 90-100 (A+): Excellent - Production-ready, best practices followed
- 80-89 (A): Good - Minor improvements recommended
- 70-79 (B): Acceptable - Some issues to address
- 60-69 (C): Needs Improvement - Several issues detected
- <60 (F): Critical Issues - Significant problems, not production-ready
Example Scenarios
Scenario 1: Excellent Quality (Score: 95)
# Test Quality Review: checkout-flow.spec.ts
**Quality Score**: 95/100 (A+ - Excellent)
**Recommendation**: Approve - Production Ready
## Executive Summary
Excellent test quality with comprehensive coverage and best practices throughout.
Tests demonstrate expert-level patterns including fixture architecture, data
factories, network-first approach, and perfect isolation.
**Strengths:**
✅ Clear Given-When-Then structure in all tests
✅ Comprehensive fixtures for authenticated states
✅ Data factories with faker.js for realistic test data
✅ Network-first pattern prevents race conditions
✅ Perfect test isolation with cleanup
✅ All test IDs present (1.2-E2E-001 through 1.2-E2E-005)
**Minor Recommendations:**
⚠️ One test slightly verbose (245 lines) - consider extracting helper function
**Recommendation**: Approve without changes. Use as reference for other tests.
Scenario 2: Good Quality (Score: 82)
# Test Quality Review: user-profile.spec.ts
**Quality Score**: 82/100 (A - Good)
**Recommendation**: Approve with Comments
## Executive Summary
Solid test quality with good structure and coverage. A few improvements would
enhance maintainability and reduce flakiness risk.
**Strengths:**
✅ Good BDD structure
✅ Test IDs present
✅ Explicit assertions
**Issues to Address:**
⚠️ 2 hard waits detected (lines 34, 67) - use explicit waits instead
⚠️ Hardcoded test data (line 23) - use factory functions
⚠️ Missing cleanup in one test (line 89) - add afterEach hook
**Recommendation**: Address hard waits before merging. Other improvements
can be addressed in follow-up PR.
Scenario 3: Needs Improvement (Score: 68)
# Test Quality Review: legacy-report.spec.ts
**Quality Score**: 68/100 (C - Needs Improvement)
**Recommendation**: Request Changes
## Executive Summary
Test has several quality issues that should be addressed before merging.
Primarily concerns around flakiness risk and maintainability.
**Critical Issues:**
❌ 5 hard waits detected (flakiness risk)
❌ Race condition: navigation before route interception (line 45)
❌ Shared global state between tests (line 12)
❌ Missing test IDs (can't trace to requirements)
**Recommendations:**
⚠️ Test file is 487 lines - consider splitting
⚠️ Hardcoded data throughout - use factories
⚠️ Missing cleanup in afterEach
**Recommendation**: Address all critical issues (❌) before re-review.
Significant refactoring needed.
Scenario 4: Critical Issues (Score: 42)
# Test Quality Review: data-export.spec.ts
**Quality Score**: 42/100 (F - Critical Issues)
**Recommendation**: Block - Not Production Ready
## Executive Summary
CRITICAL: Test has severe quality issues that make it unsuitable for
production. Significant refactoring required.
**Critical Issues:**
❌ 12 hard waits (page.waitForTimeout) throughout
❌ No test IDs or structure
❌ Try/catch blocks swallowing errors (lines 23, 45, 67, 89)
❌ No cleanup - tests leave data in database
❌ Conditional logic (if/else) throughout tests
❌ No assertions in 3 tests (tests do nothing!)
❌ 687 lines - far too large
❌ Multiple race conditions
❌ Hardcoded credentials in plain text (SECURITY ISSUE)
**Recommendation**: BLOCK MERGE. Complete rewrite recommended following
TEA knowledge base patterns. Suggest pairing session with QA engineer.
Integration with Other Workflows
Before Test Review
- atdd - Generates acceptance tests → TEA reviews for quality
- dev story - Developer implements tests → TEA provides feedback
- automate - Expands regression suite → TEA validates new tests
After Test Review
- Developer - Addresses critical issues, improves based on recommendations
- gate - Test quality feeds into release decision (high-quality tests increase confidence)
Coordinates With
- Story File: Review links to acceptance criteria for context
- Test Design: Review validates tests align with P0/P1/P2/P3 prioritization
- Knowledge Base: All feedback references tea-index.csv fragments
Review Scopes
Single File Review
# Review specific test file
bmad tea *test-review
# Provide test_file_path when prompted: tests/auth/login.spec.ts
Use When:
- Reviewing tests just written
- PR review of specific test file
- Debugging flaky test
- Learning test quality patterns
Directory Review
# Review all tests in directory
bmad tea *test-review
# Provide review_scope: directory
# Provide test_dir: tests/auth/
Use When:
- Feature branch has multiple test files
- Reviewing entire feature test suite
- Auditing test quality for module
Suite Review
# Review entire test suite
bmad tea *test-review
# Provide review_scope: suite
Use When:
- Periodic quality audit (monthly/quarterly)
- Before major release
- Identifying patterns across codebase
- Establishing quality baseline
Configuration Examples
Strict Review (Fail on Violations)
review_scope: 'single'
quality_score_enabled: true
strict_mode: true # Fail if score <70
check_against_knowledge: true
# All check_* flags: true
Use for: PR gates, production releases
Balanced Review (Advisory)
review_scope: 'single'
quality_score_enabled: true
strict_mode: false # Advisory only
check_against_knowledge: true
# All check_* flags: true
Use for: Most development workflows (default)
Focused Review (Specific Criteria)
review_scope: 'single'
check_hard_waits: true
check_flakiness_patterns: true
check_network_first: true
# Other checks: false
Use for: Debugging flaky tests, targeted improvements
Important Notes
- Non-Prescriptive: Review provides guidance, not rigid rules
- Context Matters: Some violations may be justified (document with comments)
- Knowledge-Based: All feedback grounded in proven patterns
- Actionable: Every issue includes recommended fix with code example
- Quality Score: Use as indicator, not absolute measure
- Continuous Improvement: Review tests periodically as patterns evolve
- Learning Tool: Use reviews to learn best practices, not just find bugs
Knowledge Base References
This workflow automatically consults:
- test-quality.md - Definition of Done (no hard waits, <300 lines, <1.5 min, self-cleaning)
- fixture-architecture.md - Pure function → Fixture → mergeTests pattern
- network-first.md - Route intercept before navigate (race condition prevention)
- data-factories.md - Factory functions with overrides, API-first setup
- test-levels-framework.md - E2E vs API vs Component vs Unit appropriateness
- playwright-config.md - Environment-based configuration patterns
- tdd-cycles.md - Red-Green-Refactor patterns
- selective-testing.md - Duplicate coverage detection
- ci-burn-in.md - Flakiness detection patterns
- test-priorities.md - P0/P1/P2/P3 classification framework
- traceability.md - Requirements-to-tests mapping
See tea-index.csv for complete knowledge fragment mapping.
Troubleshooting
Problem: Quality score seems too low
Solution:
- Review violation breakdown - focus on critical issues first
- Consider project context - some patterns may be justified
- Check if criteria are appropriate for project type
- Score is indicator, not absolute - focus on actionable feedback
Problem: No test files found
Solution:
- Verify test_dir path is correct
- Check test file extensions (_.spec.ts, _.test.js, etc.)
- Use glob pattern to discover:
tests/**/*.spec.ts
Problem: Knowledge fragments not loading
Solution:
- Verify tea-index.csv exists in testarch/ directory
- Check fragment file paths are correct in tea-index.csv
- Ensure auto_load_knowledge: true in workflow variables
Problem: Too many false positives
Solution:
- Add justification comments in code for legitimate violations
- Adjust check_* flags to disable specific criteria
- Use strict_mode: false for advisory-only feedback
- Context matters - document why pattern is appropriate
Related Commands
bmad tea *atdd- Generate acceptance tests (review after generation)bmad tea *automate- Expand regression suite (review new tests)bmad tea *gate- Quality gate decision (test quality feeds into decision)bmad dev story- Implement story (review tests after implementation)