bmad initialization

2025-11-01 19:22:39 +08:00
parent 5b21dc0bd5
commit 426ae41f54
447 changed files with 80633 additions and 0 deletions


@@ -0,0 +1,775 @@
# Test Quality Review Workflow
The Test Quality Review workflow performs comprehensive quality validation of test code using TEA's knowledge base of best practices. It detects flaky patterns, validates structure, and provides actionable feedback to improve test maintainability and reliability.
## Overview
This workflow reviews test quality against proven patterns from TEA's knowledge base including fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. It generates a quality score (0-100) with detailed feedback on violations and recommendations.
**Key Features:**
- **Knowledge-Based Review**: Applies patterns from 19+ knowledge fragments in tea-index.csv
- **Quality Scoring**: 0-100 score with letter grade (A+ to F) based on violations
- **Multi-Scope Review**: Single file, directory, or entire test suite
- **Pattern Detection**: Identifies hard waits, race conditions, shared state, conditionals
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions, test length
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
- **Code Examples**: Every issue includes recommended fix with code snippets
- **Integration**: Works with story files, test-design, acceptance criteria context
---
## Usage
```bash
bmad tea *test-review
```
The TEA agent runs this workflow in these situations:
- After `*atdd` workflow → validate generated acceptance tests
- After `*automate` workflow → ensure regression suite quality
- After developer writes tests → provide quality feedback
- Before `*gate` workflow → confirm test quality before release
- User explicitly requests review: `bmad tea *test-review`
- Periodic quality audits of existing test suite
**Typical workflow sequence:**
1. `*atdd` → Generate failing acceptance tests
2. **`*test-review`** → Validate test quality ⬅️ YOU ARE HERE (option 1)
3. `*dev story` → Implement feature with tests passing
4. **`*test-review`** → Review implementation tests ⬅️ YOU ARE HERE (option 2)
5. `*automate` → Expand regression suite
6. **`*test-review`** → Validate new regression tests ⬅️ YOU ARE HERE (option 3)
7. `*gate` → Final quality gate decision
---
## Inputs
### Required Context Files
- **Test File(s)**: One or more test files to review (auto-discovered or explicitly provided)
- **Test Framework Config**: playwright.config.ts, jest.config.js, etc. (for context)
### Recommended Context Files
- **Story File**: Acceptance criteria for context (e.g., `story-1.3.md`)
- **Test Design**: Priority context (P0/P1/P2/P3) from test-design.md
- **Knowledge Base**: tea-index.csv with best practice fragments (required for thorough review)
### Workflow Variables
Key variables that control review behavior (configured in `workflow.yaml`):
- **review_scope**: `single` | `directory` | `suite` (default: `single`)
- `single`: Review one test file
- `directory`: Review all tests in a directory
- `suite`: Review entire test suite
- **quality_score_enabled**: Enable 0-100 quality scoring (default: `true`)
- **append_to_file**: Add inline comments to test files (default: `false`)
- **check_against_knowledge**: Use tea-index.csv fragments (default: `true`)
- **strict_mode**: Fail on any violation vs advisory only (default: `false`)
**Quality Criteria Flags** (all default to `true`):
- `check_given_when_then`: BDD format validation
- `check_test_ids`: Test ID conventions
- `check_priority_markers`: P0/P1/P2/P3 classification
- `check_hard_waits`: Detect sleep(), wait(X)
- `check_determinism`: No conditionals/try-catch abuse
- `check_isolation`: Tests clean up, no shared state
- `check_fixture_patterns`: Pure function → Fixture → mergeTests
- `check_data_factories`: Factory usage vs hardcoded data
- `check_network_first`: Route intercept before navigate
- `check_assertions`: Explicit assertions present
- `check_test_length`: Warn if >300 lines
- `check_test_duration`: Warn if >1.5 min
- `check_flakiness_patterns`: Common flaky patterns
---
## Outputs
### Primary Deliverable
**Test Quality Review Report** (`test-review-{filename}.md`):
- **Executive Summary**: Overall assessment, key strengths/weaknesses, recommendation
- **Quality Score**: 0-100 score with letter grade (A+ to F)
- **Quality Criteria Assessment**: Table with all criteria evaluated (PASS/WARN/FAIL)
- **Critical Issues**: P0/P1 violations that must be fixed
- **Recommendations**: P2/P3 violations that should be fixed
- **Best Practices Examples**: Good patterns found in tests
- **Knowledge Base References**: Links to detailed guidance
Each issue includes:
- Code location (file:line)
- Explanation of problem
- Recommended fix with code example
- Knowledge base fragment reference
### Secondary Outputs
- **Inline Comments**: TODO comments in test files at violation locations (if enabled)
- **Quality Badge**: Badge with score (e.g., "Test Quality: 87/100 (A)")
- **Story Update**: Test quality section appended to story file (if enabled)
### Validation Safeguards
- ✅ All knowledge base fragments loaded successfully
- ✅ Test files parsed and structure analyzed
- ✅ All enabled quality criteria evaluated
- ✅ Violations categorized by severity (P0/P1/P2/P3)
- ✅ Quality score calculated with breakdown
- ✅ Actionable feedback with code examples provided
---
## Quality Criteria Explained
### 1. BDD Format (Given-When-Then)
**PASS**: Tests use clear Given-When-Then structure
```typescript
// Given: User is logged in
const user = await createTestUser();
await loginPage.login(user.email, user.password);
// When: User navigates to dashboard
await page.goto('/dashboard');
// Then: User sees welcome message
await expect(page.locator('[data-testid="welcome"]')).toContainText(user.name);
```
**FAIL**: Tests lack structure, hard to understand intent
```typescript
await page.goto('/dashboard');
await page.click('.button');
await expect(page.locator('.text')).toBeVisible();
```
**Knowledge**: test-quality.md, tdd-cycles.md
---
### 2. Test IDs
**PASS**: All tests have IDs following convention
```typescript
test.describe('1.3-E2E-001: User Login Flow', () => {
test('should log in successfully with valid credentials', async ({ page }) => {
// Test implementation
});
});
```
**FAIL**: No test IDs, can't trace to requirements
```typescript
test.describe('Login', () => {
test('login works', async ({ page }) => {
// Test implementation
});
});
```
**Knowledge**: traceability.md, test-quality.md
---
### 3. Priority Markers
**PASS**: Tests classified as P0/P1/P2/P3
```typescript
test.describe('P0: Critical User Journey - Checkout', () => {
// Critical tests
});
test.describe('P2: Edge Case - International Addresses', () => {
// Nice-to-have tests
});
```
**Knowledge**: test-priorities.md, risk-governance.md
---
### 4. No Hard Waits
**PASS**: No sleep(), wait(), hardcoded delays
```typescript
// ✅ Good: Explicit wait for condition
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
```
**FAIL**: Hard waits introduce flakiness
```typescript
// ❌ Bad: Hard wait
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
```
**Knowledge**: test-quality.md, network-first.md
---
### 5. Determinism
**PASS**: Tests work deterministically, no conditionals
```typescript
// ✅ Good: Deterministic test
await expect(page.locator('[data-testid="status"]')).toHaveText('Active');
```
**FAIL**: Conditionals make tests unpredictable
```typescript
// ❌ Bad: Conditional logic
const status = await page.locator('[data-testid="status"]').textContent();
if (status === 'Active') {
await page.click('[data-testid="deactivate"]');
} else {
await page.click('[data-testid="activate"]');
}
```
**Knowledge**: test-quality.md, data-factories.md
---
### 6. Isolation
**PASS**: Tests clean up, no shared state
```typescript
test.afterEach(async ({ page, testUser }) => {
// Cleanup: Delete test user
await api.deleteUser(testUser.id);
});
```
**FAIL**: Shared state, tests depend on order
```typescript
// ❌ Bad: Shared global variable
let userId: string;
test('create user', async () => {
userId = await createUser(); // Sets global
});
test('update user', async () => {
await updateUser(userId); // Depends on previous test
});
```
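For contrast, a minimal isolated version of the same flow, assuming the `createUser`/`updateUser` helpers above plus a hypothetical `deleteUser` cleanup call: each test owns its own data, so execution order no longer matters.
```typescript
// ✅ Good: each test creates and removes its own user (order-independent)
test('update user', async () => {
  const userId = await createUser(); // fresh user for this test only
  await updateUser(userId);
  await deleteUser(userId); // cleanup keeps the suite self-contained
});
```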
**Knowledge**: test-quality.md, data-factories.md
---
### 7. Fixture Patterns
**PASS**: Pure function → Fixture → mergeTests
```typescript
// ✅ Good: Pure function fixture
const createAuthenticatedPage = async (page: Page, user: User) => {
await loginPage.login(user.email, user.password);
return page;
};
const test = base.extend({
authenticatedPage: async ({ page }, use) => {
const user = createTestUser();
const authedPage = await createAuthenticatedPage(page, user);
await use(authedPage);
},
});
```
**FAIL**: No fixtures, repeated setup
```typescript
// ❌ Bad: Repeated setup in every test
test('test 1', async ({ page }) => {
await page.goto('/login');
await page.fill('[name="email"]', 'test@example.com');
await page.fill('[name="password"]', 'password123');
await page.click('[type="submit"]');
// Test logic
});
```
**Knowledge**: fixture-architecture.md
---
### 8. Data Factories
**PASS**: Factory functions with overrides
```typescript
// ✅ Good: Factory function
import { createTestUser } from './factories/user-factory';
test('user can update profile', async ({ page }) => {
const user = createTestUser({ role: 'admin' });
await api.createUser(user); // API-first setup
// Test UI interaction
});
```
**FAIL**: Hardcoded test data
```typescript
// ❌ Bad: Magic strings
await page.fill('[name="email"]', 'test@example.com');
await page.fill('[name="phone"]', '555-1234');
```
**Knowledge**: data-factories.md
---
### 9. Network-First Pattern
**PASS**: Route intercept before navigate
```typescript
// ✅ Good: Intercept before navigation
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers }));
await page.goto('/users'); // Navigate after route setup
```
**FAIL**: Race condition risk
```typescript
// ❌ Bad: Navigate before intercept
await page.goto('/users');
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers })); // Too late!
```
**Knowledge**: network-first.md
---
### 10. Explicit Assertions
**PASS**: Clear, specific assertions
```typescript
await expect(page.locator('[data-testid="username"]')).toHaveText('John Doe');
await expect(page.locator('[data-testid="status"]')).toHaveClass(/active/);
```
**FAIL**: Missing or vague assertions
```typescript
await page.locator('[data-testid="username"]').isVisible(); // No assertion!
```
**Knowledge**: test-quality.md
---
### 11. Test Length
**PASS**: ≤300 lines per file (ideal: ≤200)
**WARN**: 301-500 lines (consider splitting)
**FAIL**: >500 lines (too large)
**Knowledge**: test-quality.md
---
### 12. Test Duration
**PASS**: ≤1.5 minutes per test (target: <30 seconds)
**WARN**: 1.5-3 minutes (consider optimization)
**FAIL**: >3 minutes (too slow)
**Knowledge**: test-quality.md, selective-testing.md
---
### 13. Flakiness Patterns
Common flaky patterns detected:
- Tight timeouts (e.g., `{ timeout: 1000 }`)
- Race conditions (navigation before route interception)
- Timing-dependent assertions
- Retry logic hiding flakiness
- Environment-dependent assumptions
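To make these concrete, a hedged before/after sketch of the first two patterns (the route, selector, and timeout values are illustrative, not taken from a real suite):
```typescript
// ❌ Flaky: route registered after navigation, plus a tight 1s timeout
await page.goto('/orders');
await page.route('**/api/orders', (route) => route.fulfill({ json: [] })); // too late
await expect(page.locator('[data-testid="orders"]')).toBeVisible({ timeout: 1000 });

// ✅ Stable: intercept first, then navigate, with a realistic timeout
await page.route('**/api/orders', (route) => route.fulfill({ json: [] }));
await page.goto('/orders');
await expect(page.locator('[data-testid="orders"]')).toBeVisible({ timeout: 10000 });
```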
**Knowledge**: test-quality.md, network-first.md, ci-burn-in.md
---
## Quality Scoring
### Score Calculation
```
Starting Score: 100
Deductions:
- Critical Violations (P0): -10 points each
- High Violations (P1): -5 points each
- Medium Violations (P2): -2 points each
- Low Violations (P3): -1 point each
Bonus Points (max +30):
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern consistently used: +5
+ Perfect isolation (all tests clean up): +5
+ All test IDs present and correct: +5
Final Score: max(0, min(100, Starting Score - Violations + Bonus))
```
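A minimal TypeScript sketch of this calculation; the violation counts and bonus flags are hypothetical inputs used only to show the arithmetic:
```typescript
interface Violations {
  critical: number; // P0: -10 each
  high: number;     // P1: -5 each
  medium: number;   // P2: -2 each
  low: number;      // P3: -1 each
}

function qualityScore(v: Violations, bonusesEarned: number): number {
  const deductions = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low;
  const bonus = Math.min(30, bonusesEarned * 5); // each bonus is +5, capped at +30
  return Math.max(0, Math.min(100, 100 - deductions + bonus));
}

// 1 critical, 2 high, 1 low, with BDD and test-ID bonuses → 89
qualityScore({ critical: 1, high: 2, medium: 0, low: 1 }, 2);
```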
### Quality Grades
- **90-100** (A+): Excellent - Production-ready, best practices followed
- **80-89** (A): Good - Minor improvements recommended
- **70-79** (B): Acceptable - Some issues to address
- **60-69** (C): Needs Improvement - Several issues detected
- **<60** (F): Critical Issues - Significant problems, not production-ready
---
## Example Scenarios
### Scenario 1: Excellent Quality (Score: 95)
```markdown
# Test Quality Review: checkout-flow.spec.ts
**Quality Score**: 95/100 (A+ - Excellent)
**Recommendation**: Approve - Production Ready
## Executive Summary
Excellent test quality with comprehensive coverage and best practices throughout.
Tests demonstrate expert-level patterns including fixture architecture, data
factories, network-first approach, and perfect isolation.
**Strengths:**
✅ Clear Given-When-Then structure in all tests
✅ Comprehensive fixtures for authenticated states
✅ Data factories with faker.js for realistic test data
✅ Network-first pattern prevents race conditions
✅ Perfect test isolation with cleanup
✅ All test IDs present (1.2-E2E-001 through 1.2-E2E-005)
**Minor Recommendations:**
⚠️ One test slightly verbose (245 lines) - consider extracting helper function
**Recommendation**: Approve without changes. Use as reference for other tests.
```
---
### Scenario 2: Good Quality (Score: 82)
```markdown
# Test Quality Review: user-profile.spec.ts
**Quality Score**: 82/100 (A - Good)
**Recommendation**: Approve with Comments
## Executive Summary
Solid test quality with good structure and coverage. A few improvements would
enhance maintainability and reduce flakiness risk.
**Strengths:**
✅ Good BDD structure
✅ Test IDs present
✅ Explicit assertions
**Issues to Address:**
⚠️ 2 hard waits detected (lines 34, 67) - use explicit waits instead
⚠️ Hardcoded test data (line 23) - use factory functions
⚠️ Missing cleanup in one test (line 89) - add afterEach hook
**Recommendation**: Address hard waits before merging. Other improvements
can be addressed in follow-up PR.
```
---
### Scenario 3: Needs Improvement (Score: 68)
```markdown
# Test Quality Review: legacy-report.spec.ts
**Quality Score**: 68/100 (C - Needs Improvement)
**Recommendation**: Request Changes
## Executive Summary
Test has several quality issues that should be addressed before merging.
Primarily concerns around flakiness risk and maintainability.
**Critical Issues:**
❌ 5 hard waits detected (flakiness risk)
❌ Race condition: navigation before route interception (line 45)
❌ Shared global state between tests (line 12)
❌ Missing test IDs (can't trace to requirements)
**Recommendations:**
⚠️ Test file is 487 lines - consider splitting
⚠️ Hardcoded data throughout - use factories
⚠️ Missing cleanup in afterEach
**Recommendation**: Address all critical issues (❌) before re-review.
Significant refactoring needed.
```
---
### Scenario 4: Critical Issues (Score: 42)
```markdown
# Test Quality Review: data-export.spec.ts
**Quality Score**: 42/100 (F - Critical Issues)
**Recommendation**: Block - Not Production Ready
## Executive Summary
CRITICAL: Test has severe quality issues that make it unsuitable for
production. Significant refactoring required.
**Critical Issues:**
❌ 12 hard waits (page.waitForTimeout) throughout
❌ No test IDs or structure
❌ Try/catch blocks swallowing errors (lines 23, 45, 67, 89)
❌ No cleanup - tests leave data in database
❌ Conditional logic (if/else) throughout tests
❌ No assertions in 3 tests (tests do nothing!)
❌ 687 lines - far too large
❌ Multiple race conditions
❌ Hardcoded credentials in plain text (SECURITY ISSUE)
**Recommendation**: BLOCK MERGE. Complete rewrite recommended following
TEA knowledge base patterns. Suggest pairing session with QA engineer.
```
---
## Integration with Other Workflows
### Before Test Review
1. **atdd** - Generates acceptance tests → TEA reviews them for quality
2. **dev story** - Developer implements tests → TEA provides feedback
3. **automate** - Expands regression suite → TEA validates new tests
### After Test Review
1. **Developer** - Addresses critical issues, improves based on recommendations
2. **gate** - Test quality feeds into release decision (high-quality tests increase confidence)
### Coordinates With
- **Story File**: Review links to acceptance criteria for context
- **Test Design**: Review validates tests align with P0/P1/P2/P3 prioritization
- **Knowledge Base**: All feedback references tea-index.csv fragments
---
## Review Scopes
### Single File Review
```bash
# Review specific test file
bmad tea *test-review
# Provide test_file_path when prompted: tests/auth/login.spec.ts
```
**Use When:**
- Reviewing tests just written
- PR review of specific test file
- Debugging flaky test
- Learning test quality patterns
---
### Directory Review
```bash
# Review all tests in directory
bmad tea *test-review
# Provide review_scope: directory
# Provide test_dir: tests/auth/
```
**Use When:**
- Feature branch has multiple test files
- Reviewing entire feature test suite
- Auditing test quality for module
---
### Suite Review
```bash
# Review entire test suite
bmad tea *test-review
# Provide review_scope: suite
```
**Use When:**
- Periodic quality audit (monthly/quarterly)
- Before major release
- Identifying patterns across codebase
- Establishing quality baseline
---
## Configuration Examples
### Strict Review (Fail on Violations)
```yaml
review_scope: 'single'
quality_score_enabled: true
strict_mode: true # Fail if score <70
check_against_knowledge: true
# All check_* flags: true
```
Use for: PR gates, production releases
---
### Balanced Review (Advisory)
```yaml
review_scope: 'single'
quality_score_enabled: true
strict_mode: false # Advisory only
check_against_knowledge: true
# All check_* flags: true
```
Use for: Most development workflows (default)
---
### Focused Review (Specific Criteria)
```yaml
review_scope: 'single'
check_hard_waits: true
check_flakiness_patterns: true
check_network_first: true
# Other checks: false
```
Use for: Debugging flaky tests, targeted improvements
---
## Important Notes
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
2. **Context Matters**: Some violations may be justified (document with comments)
3. **Knowledge-Based**: All feedback grounded in proven patterns
4. **Actionable**: Every issue includes recommended fix with code example
5. **Quality Score**: Use as indicator, not absolute measure
6. **Continuous Improvement**: Review tests periodically as patterns evolve
7. **Learning Tool**: Use reviews to learn best practices, not just find bugs
---
## Knowledge Base References
This workflow automatically consults:
- **test-quality.md** - Definition of Done (no hard waits, <300 lines, <1.5 min, self-cleaning)
- **fixture-architecture.md** - Pure function → Fixture → mergeTests pattern
- **network-first.md** - Route intercept before navigate (race condition prevention)
- **data-factories.md** - Factory functions with overrides, API-first setup
- **test-levels-framework.md** - E2E vs API vs Component vs Unit appropriateness
- **playwright-config.md** - Environment-based configuration patterns
- **tdd-cycles.md** - Red-Green-Refactor patterns
- **selective-testing.md** - Duplicate coverage detection
- **ci-burn-in.md** - Flakiness detection patterns
- **test-priorities.md** - P0/P1/P2/P3 classification framework
- **traceability.md** - Requirements-to-tests mapping
See `tea-index.csv` for complete knowledge fragment mapping.
---
## Troubleshooting
### Problem: Quality score seems too low
**Solution:**
- Review violation breakdown - focus on critical issues first
- Consider project context - some patterns may be justified
- Check if criteria are appropriate for project type
- Score is indicator, not absolute - focus on actionable feedback
---
### Problem: No test files found
**Solution:**
- Verify test_dir path is correct
- Check test file extensions (`*.spec.ts`, `*.test.js`, etc.)
- Use glob pattern to discover: `tests/**/*.spec.ts`
---
### Problem: Knowledge fragments not loading
**Solution:**
- Verify tea-index.csv exists in testarch/ directory
- Check fragment file paths are correct in tea-index.csv
- Ensure auto_load_knowledge: true in workflow variables
---
### Problem: Too many false positives
**Solution:**
- Add justification comments in code for legitimate violations
- Adjust `check_*` flags to disable specific criteria
- Use strict_mode: false for advisory-only feedback
- Context matters - document why pattern is appropriate
---
## Related Commands
- `bmad tea *atdd` - Generate acceptance tests (review after generation)
- `bmad tea *automate` - Expand regression suite (review new tests)
- `bmad tea *gate` - Quality gate decision (test quality feeds into decision)
- `bmad dev story` - Implement story (review tests after implementation)


@@ -0,0 +1,470 @@
# Test Quality Review - Validation Checklist
Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.
---
## Prerequisites
### Test File Discovery
- [ ] Test file(s) identified for review (single/directory/suite scope)
- [ ] Test files exist and are readable
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)
### Knowledge Base Loading
- [ ] tea-index.csv loaded successfully
- [ ] `test-quality.md` loaded (Definition of Done)
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
- [ ] `network-first.md` loaded (Route intercept before navigate)
- [ ] `data-factories.md` loaded (Factory patterns)
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
- [ ] All other enabled fragments loaded successfully
### Context Gathering
- [ ] Story file discovered or explicitly provided (if available)
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)
---
## Process Steps
### Step 1: Context Loading
- [ ] Review scope determined (single/directory/suite)
- [ ] Test file paths collected
- [ ] Related artifacts discovered (story, test-design)
- [ ] Knowledge base fragments loaded successfully
- [ ] Quality criteria flags read from workflow variables
### Step 2: Test File Parsing
**For Each Test File:**
- [ ] File read successfully
- [ ] File size measured (lines, KB)
- [ ] File structure parsed (describe blocks, it blocks)
- [ ] Test IDs extracted (if present)
- [ ] Priority markers extracted (if present)
- [ ] Imports analyzed
- [ ] Dependencies identified
**Test Structure Analysis:**
- [ ] Describe block count calculated
- [ ] It/test block count calculated
- [ ] BDD structure identified (Given-When-Then)
- [ ] Fixture usage detected
- [ ] Data factory usage detected
- [ ] Network interception patterns identified
- [ ] Assertions counted
- [ ] Waits and timeouts cataloged
- [ ] Conditionals (if/else) detected
- [ ] Try/catch blocks detected
- [ ] Shared state or globals detected
### Step 3: Quality Criteria Validation
**For Each Enabled Criterion:**
#### BDD Format (if `check_given_when_then: true`)
- [ ] Given-When-Then structure evaluated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers
- [ ] Examples of good/bad patterns noted
#### Test IDs (if `check_test_ids: true`)
- [ ] Test ID presence validated
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing IDs cataloged
#### Priority Markers (if `check_priority_markers: true`)
- [ ] P0/P1/P2/P3 classification validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing priorities cataloged
#### Hard Waits (if `check_hard_waits: true`)
- [ ] sleep(), waitForTimeout(), hardcoded delays detected
- [ ] Justification comments checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers and recommended fixes
#### Determinism (if `check_determinism: true`)
- [ ] Conditionals (if/else/switch) detected
- [ ] Try/catch abuse detected
- [ ] Random values (Math.random, Date.now) detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Isolation (if `check_isolation: true`)
- [ ] Cleanup hooks (afterEach/afterAll) validated
- [ ] Shared state detected
- [ ] Global variable mutations detected
- [ ] Resource cleanup verified
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Fixture Patterns (if `check_fixture_patterns: true`)
- [ ] Fixtures detected (test.extend)
- [ ] Pure functions validated
- [ ] mergeTests usage checked
- [ ] beforeEach complexity analyzed
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Data Factories (if `check_data_factories: true`)
- [ ] Factory functions detected
- [ ] Hardcoded data (magic strings/numbers) detected
- [ ] Faker.js or similar usage validated
- [ ] API-first setup pattern checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Network-First (if `check_network_first: true`)
- [ ] page.route() before page.goto() validated
- [ ] Race conditions detected (route after navigate)
- [ ] waitForResponse patterns checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Assertions (if `check_assertions: true`)
- [ ] Explicit assertions counted
- [ ] Implicit waits without assertions detected
- [ ] Assertion specificity validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Test Length (if `check_test_length: true`)
- [ ] File line count calculated
- [ ] Threshold comparison (≤300 lines ideal)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Splitting recommendations generated (if >300 lines)
#### Test Duration (if `check_test_duration: true`)
- [ ] Test complexity analyzed (as proxy for duration if no execution data)
- [ ] Threshold comparison (≤1.5 min target)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Optimization recommendations generated
#### Flakiness Patterns (if `check_flakiness_patterns: true`)
- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
- [ ] Race conditions detected
- [ ] Timing-dependent assertions detected
- [ ] Retry logic detected
- [ ] Environment-dependent assumptions detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
---
### Step 4: Quality Score Calculation
**Violation Counting:**
- [ ] Critical (P0) violations counted
- [ ] High (P1) violations counted
- [ ] Medium (P2) violations counted
- [ ] Low (P3) violations counted
- [ ] Violation breakdown by criterion recorded
**Score Calculation:**
- [ ] Starting score: 100
- [ ] Critical violations deducted (-10 each)
- [ ] High violations deducted (-5 each)
- [ ] Medium violations deducted (-2 each)
- [ ] Low violations deducted (-1 each)
- [ ] Bonus points added (max +30):
- [ ] Excellent BDD structure (+5 if applicable)
- [ ] Comprehensive fixtures (+5 if applicable)
- [ ] Comprehensive data factories (+5 if applicable)
- [ ] Network-first pattern (+5 if applicable)
- [ ] Perfect isolation (+5 if applicable)
- [ ] All test IDs present (+5 if applicable)
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
**Quality Grade:**
- [ ] Grade assigned based on score:
- 90-100: A+ (Excellent)
- 80-89: A (Good)
- 70-79: B (Acceptable)
- 60-69: C (Needs Improvement)
- <60: F (Critical Issues)
---
### Step 5: Review Report Generation
**Report Sections Created:**
- [ ] **Header Section**:
- [ ] Test file(s) reviewed listed
- [ ] Review date recorded
- [ ] Review scope noted (single/directory/suite)
- [ ] Quality score and grade displayed
- [ ] **Executive Summary**:
- [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
- [ ] Key strengths listed (3-5 bullet points)
- [ ] Key weaknesses listed (3-5 bullet points)
- [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
- [ ] **Quality Criteria Assessment**:
- [ ] Table with all criteria evaluated
- [ ] Status for each criterion (PASS/WARN/FAIL)
- [ ] Violation count per criterion
- [ ] **Critical Issues (Must Fix)**:
- [ ] P0/P1 violations listed
- [ ] Code location provided for each (file:line)
- [ ] Issue explanation clear
- [ ] Recommended fix provided with code example
- [ ] Knowledge base reference provided
- [ ] **Recommendations (Should Fix)**:
- [ ] P2/P3 violations listed
- [ ] Code location provided for each (file:line)
- [ ] Issue explanation clear
- [ ] Recommended improvement provided with code example
- [ ] Knowledge base reference provided
- [ ] **Best Practices Examples** (if good patterns found):
- [ ] Good patterns highlighted from tests
- [ ] Knowledge base fragments referenced
- [ ] Examples provided for others to follow
- [ ] **Knowledge Base References**:
- [ ] All fragments consulted listed
- [ ] Links to detailed guidance provided
---
### Step 6: Optional Outputs Generation
**Inline Comments** (if `generate_inline_comments: true`):
- [ ] Inline comments generated at violation locations
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
- [ ] Comments added to test files (no logic changes)
- [ ] Test files remain valid and executable
**Quality Badge** (if `generate_quality_badge: true`):
- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
- [ ] Badge format suitable for README or documentation
- [ ] Badge saved to output folder
**Story Update** (if `append_to_story: true` and story file exists):
- [ ] "Test Quality Review" section created
- [ ] Quality score included
- [ ] Critical issues summarized
- [ ] Link to full review report provided
- [ ] Story file updated successfully
---
### Step 7: Save and Notify
**Outputs Saved:**
- [ ] Review report saved to `{output_file}`
- [ ] Inline comments written to test files (if enabled)
- [ ] Quality badge saved (if enabled)
- [ ] Story file updated (if enabled)
- [ ] All outputs are valid and readable
**Summary Message Generated:**
- [ ] Quality score and grade included
- [ ] Critical issue count stated
- [ ] Recommendation provided (Approve/Request changes/Block)
- [ ] Next steps clarified
- [ ] Message displayed to user
---
## Output Validation
### Review Report Completeness
- [ ] All required sections present
- [ ] No placeholder text or TODOs in report
- [ ] All code locations are accurate (file:line)
- [ ] All code examples are valid and demonstrate fix
- [ ] All knowledge base references are correct
### Review Report Accuracy
- [ ] Quality score matches violation breakdown
- [ ] Grade matches score range
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
- [ ] Violations correctly attributed to quality criteria
- [ ] No false positives (violations are legitimate issues)
- [ ] No false negatives (critical issues not missed)
### Review Report Clarity
- [ ] Executive summary is clear and actionable
- [ ] Issue explanations are understandable
- [ ] Recommended fixes are implementable
- [ ] Code examples are correct and runnable
- [ ] Recommendation (Approve/Request changes) is clear
---
## Quality Checks
### Knowledge-Based Validation
- [ ] All feedback grounded in knowledge base fragments
- [ ] Recommendations follow proven patterns
- [ ] No arbitrary or opinion-based feedback
- [ ] Knowledge fragment references accurate and relevant
### Actionable Feedback
- [ ] Every issue includes recommended fix
- [ ] Every fix includes code example
- [ ] Code examples demonstrate correct pattern
- [ ] Fixes reference knowledge base for more detail
### Severity Classification
- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
- [ ] Low (P3) issues are minor style/preference (verbose tests)
### Context Awareness
- [ ] Review considers project context (some patterns may be justified)
- [ ] Violations with justification comments noted as acceptable
- [ ] Edge cases acknowledged
- [ ] Recommendations are pragmatic, not dogmatic
---
## Integration Points
### Story File Integration
- [ ] Story file discovered correctly (if available)
- [ ] Acceptance criteria extracted and used for context
- [ ] Test quality section appended to story (if enabled)
- [ ] Link to review report added to story
### Test Design Integration
- [ ] Test design document discovered correctly (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted and used
- [ ] Review validates tests align with prioritization
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)
### Knowledge Base Integration
- [ ] tea-index.csv loaded successfully
- [ ] All required fragments loaded
- [ ] Fragments applied correctly to validation
- [ ] Fragment references in report are accurate
---
## Edge Cases and Special Situations
### Empty or Minimal Tests
- [ ] If test file is empty, report notes "No tests found"
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
- [ ] Score reflects lack of content appropriately
### Legacy Tests
- [ ] Legacy tests acknowledged in context
- [ ] Review provides practical recommendations for improvement
- [ ] Recognizes that complete refactor may not be feasible
- [ ] Prioritizes critical issues (flakiness) over style
### Test Framework Variations
- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
- [ ] Knowledge fragments applied appropriately for framework
### Justified Violations
- [ ] Violations with justification comments in code noted as acceptable
- [ ] Justifications evaluated for legitimacy
- [ ] Report acknowledges justified patterns
- [ ] Score not penalized for justified violations
---
## Final Validation
### Review Completeness
- [ ] All enabled quality criteria evaluated
- [ ] All test files in scope reviewed
- [ ] All violations cataloged
- [ ] All recommendations provided
- [ ] Review report is comprehensive
### Review Accuracy
- [ ] Quality score is accurate
- [ ] Violations are correct (no false positives)
- [ ] Critical issues not missed (no false negatives)
- [ ] Code locations are correct
- [ ] Knowledge base references are accurate
### Review Usefulness
- [ ] Feedback is actionable
- [ ] Recommendations are implementable
- [ ] Code examples are correct
- [ ] Review helps developer improve tests
- [ ] Review educates on best practices
### Workflow Complete
- [ ] All checklist items completed
- [ ] All outputs validated and saved
- [ ] User notified with summary
- [ ] Review ready for developer consumption
- [ ] Follow-up actions identified (if any)
---
## Notes
Record any issues, observations, or important context during workflow execution:
- **Test Framework**: [Playwright, Jest, Cypress, etc.]
- **Review Scope**: [single file, directory, full suite]
- **Quality Score**: [0-100 score, letter grade]
- **Critical Issues**: [Count of P0/P1 violations]
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]


@@ -0,0 +1,608 @@
# Test Quality Review - Instructions v4.0
**Workflow:** `testarch-test-review`
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
**Agent:** Test Architect (TEA)
**Format:** Pure Markdown v4.0 (no XML blocks)
---
## Overview
This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.
**Key Capabilities:**
- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
- **Quality Scoring**: 0-100 score based on violations and best practices
- **Multi-Scope**: Review single file, directory, or entire test suite
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
- **Integration**: Works with story files, test-design, acceptance criteria
---
## Prerequisites
**Required:**
- Test file(s) to review (auto-discovered or explicitly provided)
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)
**Recommended:**
- Story file with acceptance criteria (for context)
- Test design document (for priority context)
- Knowledge base fragments available in tea-index.csv
**Halt Conditions:**
- If test file path is invalid or file doesn't exist, halt and request correction
- If test_dir is empty (no tests found), halt and notify user
---
## Workflow Steps
### Step 1: Load Context and Knowledge Base
**Actions:**
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
- `test-quality.md` - Definition of Done (deterministic tests, isolated with cleanup, explicit assertions, <300 lines, <1.5 min, 658 lines, 5 examples)
- `fixture-architecture.md` - Pure function → Fixture → mergeTests composition with auto-cleanup (406 lines, 5 examples)
- `network-first.md` - Route intercept before navigate to prevent race conditions (intercept before navigate, HAR capture, deterministic waiting, 489 lines, 5 examples)
- `data-factories.md` - Factory functions with faker: overrides, nested factories, API-first setup (498 lines, 5 examples)
- `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness with decision matrix (467 lines, 4 examples)
- `playwright-config.md` - Environment-based configuration with fail-fast validation (722 lines, 5 examples)
- `component-tdd.md` - Red-Green-Refactor patterns with provider isolation, accessibility, visual regression (480 lines, 4 examples)
- `selective-testing.md` - Duplicate coverage detection with tag-based, spec filter, diff-based selection (727 lines, 4 examples)
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, anti-patterns, 541 lines, 4 examples)
- `timing-debugging.md` - Race condition prevention and async debugging techniques (370 lines, 3 examples)
- `ci-burn-in.md` - Flaky test detection with 10-iteration burn-in loop (678 lines, 4 examples)
2. Determine review scope:
- **single**: Review one test file (`test_file_path` provided)
- **directory**: Review all tests in directory (`test_dir` provided)
- **suite**: Review entire test suite (discover all test files)
3. Auto-discover related artifacts (if `auto_discover_story: true`):
- Extract test ID from filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
- Search for story file (`story-1.3.md`)
- Search for test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)
4. Read story file for context (if available):
- Extract acceptance criteria
- Extract priority classification
- Extract expected test IDs
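As an illustration of the filename-based auto-discovery above, a small sketch; the naming conventions match the examples in this step, while the lookup logic itself is hypothetical:
```typescript
// Derive story and test-design candidates from a spec filename like "1.3-E2E-001.spec.ts"
function relatedArtifacts(specFile: string) {
  const match = specFile.match(/(\d+)\.(\d+)-(?:E2E|API|INT|UNIT)-\d{3}/);
  if (!match) return null;
  const [, epic, story] = match;
  return {
    story: `story-${epic}.${story}.md`,
    testDesign: [`test-design-story-${epic}.${story}.md`, `test-design-epic-${epic}.md`],
  };
}
// relatedArtifacts('1.3-E2E-001.spec.ts')
// → { story: 'story-1.3.md', testDesign: ['test-design-story-1.3.md', 'test-design-epic-1.md'] }
```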
**Output:** Complete knowledge base loaded, review scope determined, context gathered
---
### Step 2: Discover and Parse Test Files
**Actions:**
1. **Discover test files** based on scope:
- **single**: Use `test_file_path` variable
- **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
- **suite**: Use `glob` to find all test files recursively from project root
2. **Parse test file metadata**:
- File path and name
- File size (warn if >15 KB or >300 lines)
- Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- Imports and dependencies
- Test structure (describe/context/it blocks)
3. **Extract test structure**:
- Count of describe blocks (test suites)
- Count of it/test blocks (individual tests)
- Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
- Priority markers (if present, e.g., `test.describe.only` for P0)
- BDD structure (Given-When-Then comments or steps)
4. **Identify test patterns**:
- Fixtures used
- Data factories used
- Network interception patterns
- Assertions used (expect, assert, toHaveText, etc.)
- Waits and timeouts (page.waitFor, sleep, hardcoded delays)
- Conditionals (if/else, switch, ternary)
- Try/catch blocks
- Shared state or globals
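A rough sketch of how discovery and structure counting could look, assuming a Node.js environment with the `fast-glob` package; the real workflow may use different tooling:
```typescript
import fg from 'fast-glob';
import { readFileSync } from 'node:fs';

async function inventory(pattern: string) {
  const files = await fg(pattern); // e.g., 'tests/auth/**/*.{spec,test}.{ts,js}'
  return files.map((file) => {
    const source = readFileSync(file, 'utf8');
    return {
      file,
      lines: source.split('\n').length,
      // Naive regex counts; a real implementation might parse the AST instead
      describeBlocks: (source.match(/\bdescribe(?:\.\w+)?\(/g) ?? []).length,
      testBlocks: (source.match(/\b(?:it|test)(?:\.\w+)?\(/g) ?? []).length,
      testIds: source.match(/\d+\.\d+-(?:E2E|API|INT|UNIT)-\d{3}/g) ?? [],
    };
  });
}
```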
**Output:** Complete test file inventory with structure and pattern analysis
---
### Step 3: Validate Against Quality Criteria
**Actions:**
For each test file, validate against quality criteria (configurable via workflow variables):
#### 1. BDD Format Validation (if `check_given_when_then: true`)
- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent
**Knowledge Fragment**: test-quality.md, tdd-cycles.md
---
#### 2. Test ID Conventions (if `check_test_ids: true`)
- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
- ⚠️ **WARN**: Some test IDs missing or inconsistent
- ❌ **FAIL**: No test IDs, can't trace tests to requirements
**Knowledge Fragment**: traceability.md, test-quality.md
---
#### 3. Priority Markers (if `check_priority_markers: true`)
- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
- ⚠️ **WARN**: Some priority classifications missing
- ❌ **FAIL**: No priority classification, can't determine criticality
**Knowledge Fragment**: test-priorities.md, risk-governance.md
---
#### 4. Hard Waits Detection (if `check_hard_waits: true`)
- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
- ⚠️ **WARN**: Some hard waits used but with justification comments
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)
**Patterns to detect:**
- `sleep(1000)`, `setTimeout()`, `delay()`
- `page.waitForTimeout(5000)` without explicit reason
- `await new Promise(resolve => setTimeout(resolve, 3000))`
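For illustration only, a simple regex-based detector for these patterns; the workflow's actual detection logic may differ:
```typescript
// Hypothetical hard-wait detector: flags lines matching known sleep/delay patterns
const HARD_WAIT_PATTERNS = [
  /\bsleep\(\s*\d+\s*\)/,
  /\bpage\.waitForTimeout\(\s*\d+\s*\)/,
  /new Promise\(\s*(?:\w+|\(\s*\w+\s*\))\s*=>\s*setTimeout\(/,
];

function findHardWaits(source: string): { line: number; text: string }[] {
  return source
    .split('\n')
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => HARD_WAIT_PATTERNS.some((p) => p.test(text)));
}
```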
**Knowledge Fragment**: test-quality.md, network-first.md
---
#### 5. Determinism Check (if `check_determinism: true`)
- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
- ⚠️ **WARN**: Some conditionals but with clear justification
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)
**Patterns to detect:**
- `if (condition) { test logic }` - tests should work deterministically
- `try { test } catch { fallback }` - tests shouldn't swallow errors
- `Math.random()`, `Date.now()` without factory abstraction
**Knowledge Fragment**: test-quality.md, data-factories.md
---
#### 6. Isolation Validation (if `check_isolation: true`)
- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
- ⚠️ **WARN**: Some cleanup missing but isolated enough
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources behind
**Patterns to check:**
- afterEach/afterAll cleanup hooks present
- No global variables mutated
- Database/API state cleaned up after tests
- Test data deleted or marked inactive
**Knowledge Fragment**: test-quality.md, data-factories.md
---
#### 7. Fixture Patterns (if `check_fixture_patterns: true`)
- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
- ⚠️ **WARN**: Some fixtures used but not consistently
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)
**Patterns to check:**
- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
- Pure functions used for fixture logic
- mergeTests used to combine fixtures
- No beforeEach with complex setup (should be in fixtures)
**Knowledge Fragment**: fixture-architecture.md
---
#### 8. Data Factories (if `check_data_factories: true`)
- ✅ **PASS**: Uses factory functions with overrides, API-first setup
- ⚠️ **WARN**: Some factories used but also hardcoded data
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)
**Patterns to check:**
- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
- Factories use faker.js or similar for realistic data
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
- API-first setup (create via API, test via UI)
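A minimal factory sketch covering these checks, assuming `@faker-js/faker`; the `User` shape and helper names mirror the examples elsewhere in this document rather than any real project code:
```typescript
import { faker } from '@faker-js/faker';

interface User {
  email: string;
  name: string;
  role: 'admin' | 'member';
}

// Realistic defaults, with per-test overrides
export function createTestUser(overrides: Partial<User> = {}): User {
  return {
    email: faker.internet.email(),
    name: faker.person.fullName(),
    role: 'member',
    ...overrides,
  };
}

// API-first setup: create via API, then exercise the UI
// const admin = createTestUser({ role: 'admin' });
// await api.createUser(admin);
```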
**Knowledge Fragment**: data-factories.md
---
#### 9. Network-First Pattern (if `check_network_first: true`)
- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
- ❌ **FAIL**: Route interception after navigation (race condition risk)
**Patterns to check:**
- `page.route()` called before `page.goto()`
- `page.waitForResponse()` used with explicit URL pattern
- No navigation followed immediately by route setup
**Knowledge Fragment**: network-first.md
---
#### 10. Assertions (if `check_assertions: true`)
- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
- ❌ **FAIL**: Missing assertions, tests don't verify behavior
**Patterns to check:**
- Each test has at least one assertion
- Assertions are specific (not just truthy checks)
- Assertions use framework-provided matchers (toHaveText, toBeVisible)
**Knowledge Fragment**: test-quality.md
---
#### 11. Test Length (if `check_test_length: true`)
- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)
**Knowledge Fragment**: test-quality.md
---
#### 12. Test Duration (if `check_test_duration: true`)
- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)
**Note:** Duration is estimated from complexity analysis when execution data is unavailable
**Knowledge Fragment**: test-quality.md, selective-testing.md
---
#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)
- ✅ **PASS**: No known flaky patterns detected
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)
**Patterns to detect:**
- Tight timeouts (e.g., `{ timeout: 1000 }`)
- Race conditions (navigation before route interception)
- Timing-dependent assertions (e.g., checking timestamps)
- Retry logic in tests (hides flakiness)
- Environment-dependent assumptions (hardcoded URLs, ports)
**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md
---
### Step 4: Calculate Quality Score
**Actions:**
1. **Count violations** by severity:
- **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
- **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
- **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
- **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests
2. **Calculate quality score** (if `quality_score_enabled: true`):
```
Starting Score: 100
Critical Violations: -10 points each
High Violations: -5 points each
Medium Violations: -2 points each
Low Violations: -1 point each
Bonus Points:
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern: +5
+ Perfect isolation: +5
+ All test IDs present: +5
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
```
3. **Quality Grade**:
- **90-100**: Excellent (A+)
- **80-89**: Good (A)
- **70-79**: Acceptable (B)
- **60-69**: Needs Improvement (C)
- **<60**: Critical Issues (F)
**Output:** Quality score calculated with violation breakdown
---
### Step 5: Generate Review Report
**Actions:**
1. **Create review report** using `test-review-template.md`:
**Header Section:**
- Test file(s) reviewed
- Review date
- Review scope (single/directory/suite)
- Quality score and grade
**Executive Summary:**
- Overall assessment (Excellent/Good/Needs Improvement/Critical)
- Key strengths
- Key weaknesses
- Recommendation (Approve/Approve with comments/Request changes)
**Quality Criteria Assessment:**
- Table with all criteria evaluated
- Status for each (PASS/WARN/FAIL)
- Violation count per criterion
**Critical Issues (Must Fix):**
- Priority P0/P1 violations
- Code location (file:line)
- Explanation of issue
- Recommended fix
- Knowledge base reference
**Recommendations (Should Fix):**
- Priority P2/P3 violations
- Code location (file:line)
- Explanation of issue
- Recommended improvement
- Knowledge base reference
**Best Practices Examples:**
- Highlight good patterns found in tests
- Reference knowledge base fragments
- Provide examples for others to follow
**Knowledge Base References:**
- List all fragments consulted
- Provide links to detailed guidance
2. **Generate inline comments** (if `generate_inline_comments: true`):
- Add TODO comments in test files at violation locations
- Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
- Never modify test logic, only add comments
3. **Generate quality badge** (if `generate_quality_badge: true`):
- Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
- Format for inclusion in README or documentation
4. **Append to story file** (if `append_to_story: true` and story file exists):
- Add "Test Quality Review" section to story
- Include quality score and critical issues
- Link to full review report
**Output:** Comprehensive review report with actionable feedback
---
### Step 6: Save Outputs and Notify
**Actions:**
1. **Save review report** to `{output_file}`
2. **Save inline comments** to test files (if enabled)
3. **Save quality badge** to output folder (if enabled)
4. **Update story file** (if enabled)
5. **Generate summary message** for user:
- Quality score and grade
- Critical issue count
- Recommendation
**Output:** All review artifacts saved and user notified
---
## Quality Criteria Decision Matrix
| Criterion | PASS | WARN | FAIL | Knowledge Fragment |
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
| BDD Format | Given-When-Then present | Some structure | No structure | test-quality.md |
| Test IDs | All tests have IDs | Some missing | No IDs | traceability.md |
| Priority Markers | All classified | Some missing | No classification | test-priorities.md |
| Hard Waits | No hard waits | Some justified | Hard waits present | test-quality.md |
| Determinism | No conditionals/random | Some justified | Conditionals/random | test-quality.md |
| Isolation | Clean up, no shared state | Some gaps | Shared state | test-quality.md |
| Fixture Patterns | Pure fn → Fixture → mergeTests | Some fixtures | No fixtures | fixture-architecture.md |
| Data Factories | Factory functions | Some factories | Hardcoded data | data-factories.md |
| Network-First | Intercept before navigate | Some correct | Race conditions | network-first.md |
| Assertions | Explicit assertions | Some implicit | Missing assertions | test-quality.md |
| Test Length | ≤300 lines | 301-500 lines | >500 lines | test-quality.md |
| Test Duration | ≤1.5 min | 1.5-3 min | >3 min | test-quality.md |
| Flakiness Patterns | No flaky patterns | Some potential | Multiple patterns | ci-burn-in.md |
---
## Example Review Summary
````markdown
# Test Quality Review: auth-login.spec.ts
**Quality Score**: 89/100 (A - Good)
**Review Date**: 2025-10-14
**Recommendation**: Approve with Comments
## Executive Summary
Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.
**Strengths:**
- Excellent BDD structure with clear Given-When-Then comments
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
- Comprehensive assertions on authentication state
**Weaknesses:**
- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
- Hardcoded test data (email: 'test@example.com') - use factories instead
- Missing fixture for common login setup - DRY violation
**Recommendation**: Address critical issue (hard wait) before merging. Other improvements can be addressed in follow-up PR.
## Critical Issues (Must Fix)
### 1. Hard Wait Detected (Line 45)
**Severity**: P0 (Critical)
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
**Fix**: Use explicit wait for element or network request instead
**Knowledge**: See test-quality.md, network-first.md
```typescript
// ❌ Bad (current)
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
// ✅ Good (recommended)
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
```
## Recommendations (Should Fix)
### 1. Use Data Factory for Test User (Lines 23, 32, 41)
**Severity**: P1 (High)
**Issue**: Hardcoded email 'test@example.com' - maintainability risk
**Fix**: Create factory function for test users
**Knowledge**: See data-factories.md
```typescript
// ✅ Good (recommended)
import { createTestUser } from './factories/user-factory';
const testUser = createTestUser({ role: 'admin' });
await loginPage.login(testUser.email, testUser.password);
```
### 2. Extract Login Setup to Fixture (Lines 18-28)
**Severity**: P1 (High)
**Issue**: Login setup repeated across tests - DRY violation
**Fix**: Create fixture for authenticated state
**Knowledge**: See fixture-architecture.md
```typescript
// ✅ Good (recommended)
const test = base.extend({
authenticatedPage: async ({ page }, use) => {
const user = createTestUser();
await loginPage.login(user.email, user.password);
await use(page);
},
});
test('user can access dashboard', async ({ authenticatedPage }) => {
// Test starts already logged in
});
```
## Quality Score Breakdown
- Starting Score: 100
- Critical Violations (1 × -10): -10
- High Violations (2 × -5): -10
- Medium Violations (0 × -2): 0
- Low Violations (1 × -1): -1
- Bonus (BDD +5, Test IDs +5): +10
- **Final Score**: 89/100 (A)
````
---
## Integration with Other Workflows
### Before Test Review
- **atdd**: Generate acceptance tests (TEA reviews them for quality)
- **automate**: Expand regression suite (TEA reviews new tests)
- **dev story**: Developer writes implementation tests (TEA reviews them)
### After Test Review
- **Developer**: Addresses critical issues, improves based on recommendations
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)
### Coordinates With
- **Story File**: Review links to acceptance criteria context
- **Test Design**: Review validates tests align with prioritization
- **Knowledge Base**: Review references fragments for detailed guidance
---
## Important Notes
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
2. **Context Matters**: Some violations may be justified for specific scenarios
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
4. **Actionable**: Every issue includes recommended fix with code examples
5. **Quality Score**: Use as indicator, not absolute measure
6. **Continuous Improvement**: Review same tests periodically as patterns evolve
---
## Troubleshooting
**Problem: No test files found**
- Verify test_dir path is correct
- Check test file extensions match glob pattern
- Ensure test files exist in expected location
**Problem: Quality score seems too low/high**
- Review violation counts - may need to adjust thresholds
- Consider context - some projects have different standards
- Focus on critical issues first, not just score
**Problem: Inline comments not generated**
- Check generate_inline_comments: true in variables
- Verify write permissions on test files
- Review append_to_file: false (separate report mode)
**Problem: Knowledge fragments not loading**
- Verify tea-index.csv exists in testarch/ directory
- Check fragment file paths are correct
- Ensure auto_load_knowledge: true in variables


@@ -0,0 +1,388 @@
# Test Quality Review: {test_filename}
**Quality Score**: {score}/100 ({grade} - {assessment})
**Review Date**: {YYYY-MM-DD}
**Review Scope**: {single | directory | suite}
**Reviewer**: {user_name or TEA Agent}
---
## Executive Summary
**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
### Key Strengths
✅ {strength_1}
✅ {strength_2}
✅ {strength_3}
### Key Weaknesses
❌ {weakness_1}
❌ {weakness_2}
❌ {weakness_3}
### Summary
{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}
---
## Quality Criteria Assessment
| Criterion | Status | Violations | Notes |
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
| BDD Format (Given-When-Then) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test IDs | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Priority Markers (P0/P1/P2/P3) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Hard Waits (sleep, waitForTimeout) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Determinism (no conditionals) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Fixture Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Data Factories | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Network-First Pattern | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Explicit Assertions | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test Length (≤300 lines) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines} | {brief_note} |
| Test Duration (≤1.5 min) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
| Flakiness Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low
---
## Quality Score Breakdown
```
Starting Score: 100
Critical Violations: -{critical_count} × 10 = -{critical_deduction}
High Violations: -{high_count} × 5 = -{high_deduction}
Medium Violations: -{medium_count} × 2 = -{medium_deduction}
Low Violations: -{low_count} × 1 = -{low_deduction}
Bonus Points:
Excellent BDD: +{0|5}
Comprehensive Fixtures: +{0|5}
Data Factories: +{0|5}
Network-First: +{0|5}
Perfect Isolation: +{0|5}
All Test IDs: +{0|5}
--------
Total Bonus: +{bonus_total}
Final Score: {final_score}/100
Grade: {grade}
```
---
## Critical Issues (Must Fix)
{If no critical issues: "No critical issues detected. ✅"}
{For each critical issue:}
### {issue_number}. {Issue Title}
**Severity**: P0 (Critical)
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what the problem is and why it's critical}
**Current Code**:
```typescript
// ❌ Bad (current implementation)
{code_snippet_showing_problem}
```
**Recommended Fix**:
```typescript
// ✅ Good (recommended approach)
{code_snippet_showing_solution}
```
**Why This Matters**:
{Explanation of impact - flakiness risk, maintainability, reliability}
**Related Violations**:
{If similar issue appears elsewhere, note line numbers}
---
## Recommendations (Should Fix)
{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}
{For each recommendation:}
### {rec_number}. {Recommendation Title}
**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what could be improved and why}
**Current Code**:
```typescript
// ⚠️ Could be improved (current implementation)
{code_snippet_showing_current_approach}
```
**Recommended Improvement**:
```typescript
// ✅ Better approach (recommended)
{code_snippet_showing_improvement}
```
**Benefits**:
{Explanation of benefits - maintainability, readability, reusability}
**Priority**:
{Why this is P1/P2/P3 - urgency and impact}
---
## Best Practices Found
{If good patterns found, highlight them}
{For each best practice:}
### {practice_number}. {Best Practice Title}
**Location**: `{filename}:{line_number}`
**Pattern**: {pattern_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Why This Is Good**:
{Explanation of why this pattern is excellent}
**Code Example**:
```typescript
// ✅ Excellent pattern demonstrated in this test
{code_snippet_showing_best_practice}
```
**Use as Reference**:
{Encourage using this pattern in other tests}
---
## Test File Analysis
### File Metadata
- **File Path**: `{relative_path_from_project_root}`
- **File Size**: {line_count} lines, {kb_size} KB
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
- **Language**: {TypeScript | JavaScript}
### Test Structure
- **Describe Blocks**: {describe_count}
- **Test Cases (it/test)**: {test_count}
- **Average Test Length**: {avg_lines_per_test} lines per test
- **Fixtures Used**: {fixture_count} ({fixture_names})
- **Data Factories Used**: {factory_count} ({factory_names})
### Test Coverage Scope
- **Test IDs**: {test_id_list}
- **Priority Distribution**:
- P0 (Critical): {p0_count} tests
- P1 (High): {p1_count} tests
- P2 (Medium): {p2_count} tests
- P3 (Low): {p3_count} tests
- Unknown: {unknown_count} tests
### Assertions Analysis
- **Total Assertions**: {assertion_count}
- **Assertions per Test**: {avg_assertions_per_test} (avg)
- **Assertion Types**: {assertion_types_used}
---
## Context and Integration
### Related Artifacts
{If story file found:}
- **Story File**: [{story_filename}]({story_path})
- **Acceptance Criteria Mapped**: {ac_mapped}/{ac_total} ({ac_coverage}%)
{If test-design found:}
- **Test Design**: [{test_design_filename}]({test_design_path})
- **Risk Assessment**: {risk_level}
- **Priority Framework**: P0-P3 applied
### Acceptance Criteria Validation
{If story file available, map tests to ACs:}
| Acceptance Criterion | Test ID | Status | Notes |
| -------------------- | --------- | -------------------------- | ------- |
| {AC_1} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_2} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_3} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
**Coverage**: {covered_count}/{total_count} criteria covered ({coverage_percentage}%)
---
## Knowledge Base References
This review consulted the following knowledge base fragments:
- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure-function fixtures and the `mergeTests` composition pattern
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Intercept routes before navigating (race-condition prevention)
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
- **[traceability.md](../../../testarch/knowledge/traceability.md)** - Requirements-to-tests mapping
See [tea-index.csv](../../../testarch/tea-index.csv) for complete knowledge base.
---
## Next Steps
### Immediate Actions (Before Merge)
1. **{action_1}** - {description}
- Priority: {P0 | P1 | P2}
- Owner: {team_or_person}
- Estimated Effort: {time_estimate}
2. **{action_2}** - {description}
- Priority: {P0 | P1 | P2}
- Owner: {team_or_person}
- Estimated Effort: {time_estimate}
### Follow-up Actions (Future PRs)
1. **{action_1}** - {description}
- Priority: {P2 | P3}
- Target: {next_sprint | backlog}
2. **{action_2}** - {description}
- Priority: {P2 | P3}
- Target: {next_sprint | backlog}
### Re-Review Needed?
{✅ No re-review needed - approve as-is}
{⚠️ Re-review after critical fixes - request changes, then re-review}
{❌ Major refactor required - block merge, pair programming recommended}
---
## Decision
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
**Rationale**:
{1-2 paragraph explanation of recommendation based on findings}
**For Approve**:
> Test quality is excellent/good with {score}/100 score. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.
**For Approve with Comments**:
> Test quality is acceptable with {score}/100 score. {High-priority recommendations should be addressed but don't block merge.} Critical issues resolved, but improvements would enhance maintainability.
**For Request Changes**:
> Test quality needs improvement with {score}/100 score. {Critical issues must be fixed before merge.} {X} critical violations detected that pose flakiness/maintainability risks.
**For Block**:
> Test quality is insufficient with {score}/100 score. {Multiple critical issues make tests unsuitable for production.} Recommend pairing session with QA engineer to apply patterns from knowledge base.
---
## Appendix
### Violation Summary by Location
{Table of all violations sorted by line number:}
| Line | Severity | Criterion | Issue | Fix |
| ------ | ------------- | ----------- | ------------- | ----------- |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
### Quality Trends
{If reviewing same file multiple times, show trend:}
| Review Date | Score | Grade | Critical Issues | Trend |
| ------------ | ------------- | --------- | --------------- | ----------- |
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1} | Improved |
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2} | Declined |
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3} | Stable |
### Related Reviews
{If reviewing multiple files in directory/suite:}
| File | Score | Grade | Critical | Status |
| -------- | ----------- | ------- | -------- | ------------------ |
| {file_1} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_2} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_3} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
**Suite Average**: {avg_score}/100 ({avg_grade})
---
## Review Metadata
**Generated By**: BMad TEA Agent (Test Architect)
**Workflow**: testarch-test-review v4.0
**Review ID**: test-review-{filename}-{YYYYMMDD}
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
**Version**: 1.0
---
## Feedback on This Review
If you have questions or feedback on this review:
1. Review patterns in knowledge base: `testarch/knowledge/`
2. Consult tea-index.csv for detailed guidance
3. Request clarification on specific violations
4. Pair with QA engineer to apply patterns
This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.

View File

@@ -0,0 +1,53 @@
# Test Architect workflow: test-review
name: testarch-test-review
description: "Review test quality using comprehensive knowledge base and best practices validation"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/bmad/bmm/workflows/testarch/test-review"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/test-review-template.md"
# Variables and inputs
variables:
test_dir: "{project-root}/tests" # Root test directory
review_scope: "single" # single (one file), directory (folder), suite (all tests)
# Output configuration
default_output_file: "{output_folder}/test-review.md"
# Required tools
required_tools:
- read_file # Read test files, story, test-design
- write_file # Create review report
- list_files # Discover test files in directory
- search_repo # Find tests by patterns
- glob # Find test files matching patterns
# Recommended inputs
recommended_inputs:
- test_file: "Test file to review (single file mode)"
- test_dir: "Directory of tests to review (directory mode)"
- story: "Related story for acceptance criteria context (optional)"
- test_design: "Test design for priority context (optional)"
tags:
- qa
- test-architect
- code-review
- quality
- best-practices
execution_hints:
interactive: false # Minimize prompts
autonomous: true # Proceed without user input unless blocked
iterative: true # Can review multiple files