# Test Quality Review - Instructions v4.0
**Workflow:** `testarch-test-review`
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
**Agent:** Test Architect (TEA)
**Format:** Pure Markdown v4.0 (no XML blocks)
---
## Overview
This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.
**Key Capabilities:**
- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
- **Quality Scoring**: 0-100 score based on violations and best practices
- **Multi-Scope**: Review single file, directory, or entire test suite
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
- **Integration**: Works with story files, test-design, acceptance criteria
---
## Prerequisites
**Required:**
- Test file(s) to review (auto-discovered or explicitly provided)
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)
**Recommended:**
- Story file with acceptance criteria (for context)
- Test design document (for priority context)
- Knowledge base fragments available in tea-index.csv
**Halt Conditions:**
- If test file path is invalid or file doesn't exist, halt and request correction
- If test_dir is empty (no tests found), halt and notify user
---
## Workflow Steps
### Step 1: Load Context and Knowledge Base
**Actions:**
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
- `test-quality.md` - Definition of Done (deterministic tests, isolated with cleanup, explicit assertions, <300 lines, <1.5 min, 658 lines, 5 examples)
- `fixture-architecture.md` - Pure function → Fixture → mergeTests composition with auto-cleanup (406 lines, 5 examples)
- `network-first.md` - Route intercept before navigate to prevent race conditions (intercept before navigate, HAR capture, deterministic waiting, 489 lines, 5 examples)
- `data-factories.md` - Factory functions with faker: overrides, nested factories, API-first setup (498 lines, 5 examples)
- `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness with decision matrix (467 lines, 4 examples)
- `playwright-config.md` - Environment-based configuration with fail-fast validation (722 lines, 5 examples)
- `component-tdd.md` - Red-Green-Refactor patterns with provider isolation, accessibility, visual regression (480 lines, 4 examples)
- `selective-testing.md` - Duplicate coverage detection with tag-based, spec filter, diff-based selection (727 lines, 4 examples)
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, anti-patterns, 541 lines, 4 examples)
- `timing-debugging.md` - Race condition prevention and async debugging techniques (370 lines, 3 examples)
- `ci-burn-in.md` - Flaky test detection with 10-iteration burn-in loop (678 lines, 4 examples)
2. Determine review scope:
- **single**: Review one test file (`test_file_path` provided)
- **directory**: Review all tests in directory (`test_dir` provided)
- **suite**: Review entire test suite (discover all test files)
3. Auto-discover related artifacts (if `auto_discover_story: true`):
- Extract test ID from filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
- Search for story file (`story-1.3.md`)
- Search for test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)
4. Read story file for context (if available):
- Extract acceptance criteria
- Extract priority classification
- Extract expected test IDs
**Output:** Complete knowledge base loaded, review scope determined, context gathered
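For illustration, a minimal TypeScript sketch of the auto-discovery convention described above (the helper name, regex, and file-naming assumptions are illustrative, not part of the workflow):
```typescript
// Hypothetical helper: derive related artifact names from a spec filename.
function discoverRelatedArtifacts(specFileName: string): { story?: string; testDesign?: string } {
  // "1.3-E2E-001.spec.ts" -> story id "1.3"
  const match = specFileName.match(/^(\d+\.\d+)-(E2E|API|UNIT|COMP)-\d+/i);
  if (!match) return {};
  const storyId = match[1];
  return {
    story: `story-${storyId}.md`,
    testDesign: `test-design-story-${storyId}.md`,
  };
}

// discoverRelatedArtifacts("1.3-E2E-001.spec.ts")
// -> { story: "story-1.3.md", testDesign: "test-design-story-1.3.md" }
```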
---
### Step 2: Discover and Parse Test Files
**Actions:**
1. **Discover test files** based on scope:
- **single**: Use `test_file_path` variable
- **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
- **suite**: Use `glob` to find all test files recursively from project root
2. **Parse test file metadata**:
- File path and name
- File size (warn if >15 KB or >300 lines)
- Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- Imports and dependencies
- Test structure (describe/context/it blocks)
3. **Extract test structure**:
- Count of describe blocks (test suites)
- Count of it/test blocks (individual tests)
- Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
- Priority markers (if present, e.g., `test.describe.only` for P0)
- BDD structure (Given-When-Then comments or steps)
4. **Identify test patterns**:
- Fixtures used
- Data factories used
- Network interception patterns
- Assertions used (expect, assert, toHaveText, etc.)
- Waits and timeouts (page.waitFor, sleep, hardcoded delays)
- Conditionals (if/else, switch, ternary)
- Try/catch blocks
- Shared state or globals
**Output:** Complete test file inventory with structure and pattern analysis
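A rough sketch of how the pattern analysis above could be approximated; the regexes are deliberately simplified and only indicative, not the workflow's actual implementation:
```typescript
// Heuristic counters for step 2; simplified regexes, not an exhaustive parser.
interface TestFileStats {
  describeBlocks: number;
  testBlocks: number;
  hardWaits: number;
  conditionals: number;
}

function analyzeTestSource(source: string): TestFileStats {
  const count = (re: RegExp) => (source.match(re) ?? []).length;
  return {
    describeBlocks: count(/\bdescribe(\.only|\.skip)?\s*\(/g),
    testBlocks: count(/\b(it|test)(\.only|\.skip)?\s*\(/g),
    // Hard waits: waitForTimeout, setTimeout-based sleeps, bare sleep() helpers
    hardWaits: count(/\b(waitForTimeout|setTimeout|sleep)\s*\(/g),
    conditionals: count(/\b(if|switch)\s*\(/g),
  };
}
```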
---
### Step 3: Validate Against Quality Criteria
**Actions:**
For each test file, validate against quality criteria (configurable via workflow variables):
#### 1. BDD Format Validation (if `check_given_when_then: true`)
- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent
**Knowledge Fragment**: test-quality.md, tdd-cycles.md
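A small illustrative example of the structure this criterion looks for (test ID, route, and selectors are hypothetical):
```typescript
import { test, expect } from '@playwright/test';

test('1.3-E2E-001: user can log in with valid credentials', async ({ page }) => {
  // Given: the user is on the login page
  await page.goto('/login');

  // When: they submit valid credentials
  await page.getByTestId('email').fill('user@example.com');
  await page.getByTestId('password').fill('s3cret');
  await page.getByTestId('submit').click();

  // Then: they land on the dashboard
  await expect(page.getByTestId('user-menu')).toBeVisible();
});
```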
---
#### 2. Test ID Conventions (if `check_test_ids: true`)
- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
- ⚠️ **WARN**: Some test IDs missing or inconsistent
- ❌ **FAIL**: No test IDs, can't trace tests to requirements
**Knowledge Fragment**: traceability.md, test-quality.md
---
#### 3. Priority Markers (if `check_priority_markers: true`)
- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
- ⚠️ **WARN**: Some priority classifications missing
- ❌ **FAIL**: No priority classification, can't determine criticality
**Knowledge Fragment**: test-priorities.md, risk-governance.md
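One possible way to express priority markers, using Playwright's tag option (available in newer Playwright versions); teams may instead encode priority in titles, annotations, or the test-design document:
```typescript
import { test, expect } from '@playwright/test';

// Priority as a tag; test ID and route are illustrative.
test('1.3-E2E-002: session expires after logout', { tag: '@P1' }, async ({ page }) => {
  await page.goto('/logout');
  await expect(page).toHaveURL(/login/);
});
```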
---
#### 4. Hard Waits Detection (if `check_hard_waits: true`)
- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
- ⚠️ **WARN**: Some hard waits used but with justification comments
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)
**Patterns to detect:**
- `sleep(1000)`, `setTimeout()`, `delay()`
- `page.waitForTimeout(5000)` without explicit reason
- `await new Promise(resolve => setTimeout(resolve, 3000))`
**Knowledge Fragment**: test-quality.md, network-first.md
---
#### 5. Determinism Check (if `check_determinism: true`)
- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
- ⚠️ **WARN**: Some conditionals but with clear justification
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)
**Patterns to detect:**
- `if (condition) { test logic }` - tests should work deterministically
- `try { test } catch { fallback }` - tests shouldn't swallow errors
- `Math.random()`, `Date.now()` without factory abstraction
**Knowledge Fragment**: test-quality.md, data-factories.md
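A sketch of replacing ad-hoc randomness with a seeded, centralized factory so failures reproduce; the helper name and seeding point are assumptions:
```typescript
import { faker } from '@faker-js/faker';

// Seed once (e.g., in global setup) so generated data is stable across a run.
faker.seed(42);

export function createOrderNumber(): string {
  // Instead of `Date.now()` or `Math.random()` scattered through tests,
  // generate identifiers in one place where they can be seeded and overridden.
  return `ORD-${faker.string.alphanumeric(8).toUpperCase()}`;
}
```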
---
#### 6. Isolation Validation (if `check_isolation: true`)
- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
- ⚠️ **WARN**: Some cleanup missing but isolated enough
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources behind
**Patterns to check:**
- afterEach/afterAll cleanup hooks present
- No global variables mutated
- Database/API state cleaned up after tests
- Test data deleted or marked inactive
**Knowledge Fragment**: test-quality.md, data-factories.md
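A minimal cleanup sketch; the tracking array and the `/api/users` endpoint are hypothetical:
```typescript
import { test } from '@playwright/test';

// Track anything the test creates so it can be removed afterwards.
const createdUserIds: string[] = [];

test.afterEach(async ({ request }) => {
  // Delete users created during the test so later tests start from a clean slate.
  for (const id of createdUserIds.splice(0)) {
    await request.delete(`/api/users/${id}`);
  }
});
```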
---
#### 7. Fixture Patterns (if `check_fixture_patterns: true`)
- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
- ⚠️ **WARN**: Some fixtures used but not consistently
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)
**Patterns to check:**
- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
- Pure functions used for fixture logic
- mergeTests used to combine fixtures
- No beforeEach with complex setup (should be in fixtures)
**Knowledge Fragment**: fixture-architecture.md
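A condensed sketch of the pure function → fixture → mergeTests shape; fixture names and the env variable are illustrative:
```typescript
import { test as base, mergeTests } from '@playwright/test';

// Pure function: no Playwright types, easy to unit test.
function buildAuthHeader(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

const authTest = base.extend<{ authHeader: Record<string, string> }>({
  authHeader: async ({}, use) => {
    await use(buildAuthHeader(process.env.TEST_TOKEN ?? 'local-dev-token'));
  },
});

const loggingTest = base.extend<{ log: (msg: string) => void }>({
  log: async ({}, use) => {
    await use((msg) => console.log(`[test] ${msg}`));
  },
});

// mergeTests composes independent fixture sets into one test object.
export const test = mergeTests(authTest, loggingTest);
```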
---
#### 8. Data Factories (if `check_data_factories: true`)
- ✅ **PASS**: Uses factory functions with overrides, API-first setup
- ⚠️ **WARN**: Some factories used but also hardcoded data
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)
**Patterns to check:**
- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
- Factories use faker.js or similar for realistic data
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
- API-first setup (create via API, test via UI)
**Knowledge Fragment**: data-factories.md
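An illustrative factory with overrides; the `User` shape and field names are assumptions:
```typescript
import { faker } from '@faker-js/faker';

interface User {
  email: string;
  password: string;
  role: 'admin' | 'member';
}

export function createTestUser(overrides: Partial<User> = {}): User {
  return {
    email: faker.internet.email(),
    password: faker.internet.password({ length: 16 }),
    role: 'member',
    ...overrides, // e.g. createTestUser({ role: 'admin' })
  };
}
```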
---
#### 9. Network-First Pattern (if `check_network_first: true`)
- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
- ❌ **FAIL**: Route interception after navigation (race condition risk)
**Patterns to check:**
- `page.route()` called before `page.goto()`
- `page.waitForResponse()` used with explicit URL pattern
- No navigation followed immediately by route setup
**Knowledge Fragment**: network-first.md
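A minimal network-first sketch showing the ordering this criterion checks for; the endpoint, payload, and selector are illustrative:
```typescript
import { test, expect } from '@playwright/test';

test('dashboard shows stubbed invoices', async ({ page }) => {
  // 1. Intercept BEFORE navigation so the stub is in place when the page loads.
  await page.route('**/api/invoices', (route) =>
    route.fulfill({ status: 200, json: [{ id: 'inv-1', total: 42 }] }),
  );

  // 2. Navigate and wait for the intercepted response deterministically.
  const responsePromise = page.waitForResponse('**/api/invoices');
  await page.goto('/dashboard');
  await responsePromise;

  await expect(page.getByTestId('invoice-row')).toHaveCount(1);
});
```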
---
#### 10. Assertions (if `check_assertions: true`)
- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
- ❌ **FAIL**: Missing assertions, tests don't verify behavior
**Patterns to check:**
- Each test has at least one assertion
- Assertions are specific (not just truthy checks)
- Assertions use framework-provided matchers (toHaveText, toBeVisible)
**Knowledge Fragment**: test-quality.md
---
#### 11. Test Length (if `check_test_length: true`)
- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)
**Knowledge Fragment**: test-quality.md
---
#### 12. Test Duration (if `check_test_duration: true`)
- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)
**Note:** Duration estimation based on complexity analysis if execution data unavailable
**Knowledge Fragment**: test-quality.md, selective-testing.md
---
#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)
- ✅ **PASS**: No known flaky patterns detected
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)
**Patterns to detect:**
- Tight timeouts (e.g., `{ timeout: 1000 }`)
- Race conditions (navigation before route interception)
- Timing-dependent assertions (e.g., checking timestamps)
- Retry logic in tests (hides flakiness)
- Environment-dependent assumptions (hardcoded URLs, ports)
**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md
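One of the patterns above (tight timeouts) in a before/after sketch; the selectors and route are illustrative:
```typescript
import { test, expect } from '@playwright/test';

test('toast appears after save', async ({ page }) => {
  await page.goto('/settings');
  await page.getByTestId('save').click();

  // ❌ Flaky: a tight timeout races against slow CI machines
  // await expect(page.getByTestId('toast')).toBeVisible({ timeout: 1000 });

  // ✅ Deterministic: rely on the framework default (or a generous explicit timeout)
  await expect(page.getByTestId('toast')).toBeVisible();
});
```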
---
### Step 4: Calculate Quality Score
**Actions:**
1. **Count violations** by severity:
- **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
- **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
- **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
- **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests
2. **Calculate quality score** (if `quality_score_enabled: true`):
```
Starting Score: 100
Critical Violations: -10 points each
High Violations: -5 points each
Medium Violations: -2 points each
Low Violations: -1 point each
Bonus Points:
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern: +5
+ Perfect isolation: +5
+ All test IDs present: +5
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
```
3. **Quality Grade**:
- **90-100**: Excellent (A+)
- **80-89**: Good (A)
- **70-79**: Acceptable (B)
- **60-69**: Needs Improvement (C)
- **<60**: Critical Issues (F)
**Output:** Quality score calculated with violation breakdown
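The scoring formula above, transcribed directly into a small helper; the `Violations` shape is illustrative plumbing:
```typescript
interface Violations { critical: number; high: number; medium: number; low: number }

function qualityScore(v: Violations, bonus: number): number {
  const penalty = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low * 1;
  return Math.max(0, Math.min(100, 100 - penalty + bonus));
}

// e.g. qualityScore({ critical: 0, high: 1, medium: 2, low: 0 }, 10) -> 100 - 9 + 10, capped at 100
```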
---
### Step 5: Generate Review Report
**Actions:**
1. **Create review report** using `test-review-template.md`:
**Header Section:**
- Test file(s) reviewed
- Review date
- Review scope (single/directory/suite)
- Quality score and grade
**Executive Summary:**
- Overall assessment (Excellent/Good/Needs Improvement/Critical)
- Key strengths
- Key weaknesses
- Recommendation (Approve/Approve with comments/Request changes)
**Quality Criteria Assessment:**
- Table with all criteria evaluated
- Status for each (PASS/WARN/FAIL)
- Violation count per criterion
**Critical Issues (Must Fix):**
- Priority P0/P1 violations
- Code location (file:line)
- Explanation of issue
- Recommended fix
- Knowledge base reference
**Recommendations (Should Fix):**
- Priority P2/P3 violations
- Code location (file:line)
- Explanation of issue
- Recommended improvement
- Knowledge base reference
**Best Practices Examples:**
- Highlight good patterns found in tests
- Reference knowledge base fragments
- Provide examples for others to follow
**Knowledge Base References:**
- List all fragments consulted
- Provide links to detailed guidance
2. **Generate inline comments** (if `generate_inline_comments: true`):
- Add TODO comments in test files at violation locations
- Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
- Never modify test logic, only add comments
3. **Generate quality badge** (if `generate_quality_badge: true`):
- Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
- Format for inclusion in README or documentation
4. **Append to story file** (if `append_to_story: true` and story file exists):
- Add "Test Quality Review" section to story
- Include quality score and critical issues
- Link to full review report
**Output:** Comprehensive review report with actionable feedback
---
### Step 6: Save Outputs and Notify
**Actions:**
1. **Save review report** to `{output_file}`
2. **Save inline comments** to test files (if enabled)
3. **Save quality badge** to output folder (if enabled)
4. **Update story file** (if enabled)
5. **Generate summary message** for user:
- Quality score and grade
- Critical issue count
- Recommendation
**Output:** All review artifacts saved and user notified
---
## Quality Criteria Decision Matrix
| Criterion | PASS | WARN | FAIL | Knowledge Fragment |
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
| BDD Format | Given-When-Then present | Some structure | No structure | test-quality.md |
| Test IDs | All tests have IDs | Some missing | No IDs | traceability.md |
| Priority Markers | All classified | Some missing | No classification | test-priorities.md |
| Hard Waits | No hard waits | Some justified | Hard waits present | test-quality.md |
| Determinism | No conditionals/random | Some justified | Conditionals/random | test-quality.md |
| Isolation | Clean up, no shared state | Some gaps | Shared state | test-quality.md |
| Fixture Patterns | Pure fn → Fixture → mergeTests | Some fixtures | No fixtures | fixture-architecture.md |
| Data Factories | Factory functions | Some factories | Hardcoded data | data-factories.md |
| Network-First | Intercept before navigate | Some correct | Race conditions | network-first.md |
| Assertions | Explicit assertions | Some implicit | Missing assertions | test-quality.md |
| Test Length | ≤300 lines | 301-500 lines | >500 lines | test-quality.md |
| Test Duration | ≤1.5 min | 1.5-3 min | >3 min | test-quality.md |
| Flakiness Patterns | No flaky patterns | Some potential | Multiple patterns | ci-burn-in.md |
---
## Example Review Summary
````markdown
# Test Quality Review: auth-login.spec.ts
**Quality Score**: 78/100 (B - Acceptable)
**Review Date**: 2025-10-14
**Recommendation**: Approve with Comments
## Executive Summary
Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.
**Strengths:**
- Excellent BDD structure with clear Given-When-Then comments
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
- Comprehensive assertions on authentication state
**Weaknesses:**
- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
- Hardcoded test data (email: 'test@example.com') - use factories instead
- Missing fixture for common login setup - DRY violation
**Recommendation**: Address critical issue (hard wait) before merging. Other improvements can be addressed in follow-up PR.
## Critical Issues (Must Fix)
### 1. Hard Wait Detected (Line 45)
**Severity**: P0 (Critical)
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
**Fix**: Use explicit wait for element or network request instead
**Knowledge**: See test-quality.md, network-first.md
```typescript
// ❌ Bad (current)
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
// ✅ Good (recommended)
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
```
## Recommendations (Should Fix)
### 1. Use Data Factory for Test User (Lines 23, 32, 41)
**Severity**: P1 (High)
**Issue**: Hardcoded email 'test@example.com' - maintainability risk
**Fix**: Create factory function for test users
**Knowledge**: See data-factories.md
```typescript
// ✅ Good (recommended)
import { createTestUser } from './factories/user-factory';
const testUser = createTestUser({ role: 'admin' });
await loginPage.login(testUser.email, testUser.password);
```
### 2. Extract Login Setup to Fixture (Lines 18-28)
**Severity**: P1 (High)
**Issue**: Login setup repeated across tests - DRY violation
**Fix**: Create fixture for authenticated state
**Knowledge**: See fixture-architecture.md
```typescript
// ✅ Good (recommended)
const test = base.extend({
authenticatedPage: async ({ page }, use) => {
const user = createTestUser();
await loginPage.login(user.email, user.password);
await use(page);
},
});
test('user can access dashboard', async ({ authenticatedPage }) => {
// Test starts already logged in
});
```
## Quality Score Breakdown
- Starting Score: 100
- Critical Violations (1 × -10): -10
- High Violations (2 × -5): -10
- Medium Violations (0 × -2): 0
- Low Violations (1 × -1): -1
- Bonus (BDD +5, Test IDs +5): +10
- **Final Score**: 78/100 (B)
````
---
## Integration with Other Workflows
### Before Test Review
- **atdd**: Generate acceptance tests (TEA reviews them for quality)
- **automate**: Expand regression suite (TEA reviews new tests)
- **dev story**: Developer writes implementation tests (TEA reviews them)
### After Test Review
- **Developer**: Addresses critical issues, improves based on recommendations
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)
### Coordinates With
- **Story File**: Review links to acceptance criteria context
- **Test Design**: Review validates tests align with prioritization
- **Knowledge Base**: Review references fragments for detailed guidance
---
## Important Notes
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
2. **Context Matters**: Some violations may be justified for specific scenarios
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
4. **Actionable**: Every issue includes recommended fix with code examples
5. **Quality Score**: Use as indicator, not absolute measure
6. **Continuous Improvement**: Review same tests periodically as patterns evolve
---
## Troubleshooting
**Problem: No test files found**
- Verify test_dir path is correct
- Check test file extensions match glob pattern
- Ensure test files exist in expected location
**Problem: Quality score seems too low/high**
- Review violation counts - may need to adjust thresholds
- Consider context - some projects have different standards
- Focus on critical issues first, not just score
**Problem: Inline comments not generated**
- Check generate_inline_comments: true in variables
- Verify write permissions on test files
- Review append_to_file: false (separate report mode)
**Problem: Knowledge fragments not loading**
- Verify tea-index.csv exists in testarch/ directory
- Check fragment file paths are correct
- Ensure auto_load_knowledge: true in variables