
Test Quality Review - Validation Checklist

Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.


Prerequisites

Test File Discovery

  • Test file(s) identified for review (single/directory/suite scope)
  • Test files exist and are readable
  • Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
  • Test framework configuration found (playwright.config.ts, jest.config.js, etc.)

Knowledge Base Loading

  • tea-index.csv loaded successfully
  • test-quality.md loaded (Definition of Done)
  • fixture-architecture.md loaded (Pure function → Fixture patterns)
  • network-first.md loaded (Route intercept before navigate)
  • data-factories.md loaded (Factory patterns)
  • test-levels-framework.md loaded (E2E vs API vs Component vs Unit)
  • All other enabled fragments loaded successfully

Context Gathering

  • Story file discovered or explicitly provided (if available)
  • Test design document discovered or explicitly provided (if available)
  • Acceptance criteria extracted from story (if available)
  • Priority context (P0/P1/P2/P3) extracted from test-design (if available)

Process Steps

Step 1: Context Loading

  • Review scope determined (single/directory/suite)
  • Test file paths collected
  • Related artifacts discovered (story, test-design)
  • Knowledge base fragments loaded successfully
  • Quality criteria flags read from workflow variables

Step 2: Test File Parsing

For Each Test File:

  • File read successfully
  • File size measured (lines, KB)
  • File structure parsed (describe blocks, it blocks)
  • Test IDs extracted (if present)
  • Priority markers extracted (if present)
  • Imports analyzed
  • Dependencies identified

Test Structure Analysis:

  • Describe block count calculated
  • It/test block count calculated
  • BDD structure identified (Given-When-Then)
  • Fixture usage detected
  • Data factory usage detected
  • Network interception patterns identified
  • Assertions counted
  • Waits and timeouts cataloged
  • Conditionals (if/else) detected
  • Try/catch blocks detected
  • Shared state or globals detected

Step 3: Quality Criteria Validation

For Each Enabled Criterion:

BDD Format (if check_given_when_then: true)

  • Given-When-Then structure evaluated
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with line numbers
  • Examples of good/bad patterns noted
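
Example of the structure this criterion checks, as a minimal Playwright sketch (the scenario, selectors, and test ID are illustrative assumptions, not from the workflow):

```typescript
import { test, expect } from '@playwright/test';

test('1.3-E2E-001: user sees dashboard after login', async ({ page }) => {
  // Given: a user on the login page with valid credentials
  await page.goto('/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('s3cret!');

  // When: the user submits the form
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Then: the dashboard is shown
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```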

Test IDs (if check_test_ids: true)

  • Test ID presence validated
  • Test ID format checked (e.g., 1.3-E2E-001)
  • Status assigned (PASS/WARN/FAIL)
  • Missing IDs cataloged

Priority Markers (if check_priority_markers: true)

  • P0/P1/P2/P3 classification validated
  • Status assigned (PASS/WARN/FAIL)
  • Missing priorities cataloged
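
One way the two conventions above can surface in a Playwright suite; the `{epic}.{story}-{LEVEL}-{seq}` ID format follows the example in this checklist, and the `@p0` tag (Playwright ≥ 1.42 test details) is an assumed priority-marker convention:

```typescript
import { test, expect } from '@playwright/test';

// ID encodes epic 1, story 3, level E2E, sequence 001; the tag carries priority
test('1.3-E2E-001: checkout completes', { tag: '@p0' }, async ({ page }) => {
  await page.goto('/checkout');
  await expect(page.getByRole('heading', { name: 'Checkout' })).toBeVisible();
});
```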

Hard Waits (if check_hard_waits: true)

  • sleep(), waitForTimeout(), and other hardcoded delays detected
  • Justification comments checked
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with line numbers and recommended fixes
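
A sketch of the violation this criterion flags and the usual fix (selectors are illustrative):

```typescript
import { test, expect } from '@playwright/test';

test('order list loads', async ({ page }) => {
  await page.goto('/orders');

  // FAIL: hardcoded delay; flaky when the app is slower, wasted time when faster
  // await page.waitForTimeout(3000);

  // PASS: web-first assertion auto-waits until the condition holds or times out
  await expect(page.getByTestId('order-list')).toBeVisible();
});
```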

Determinism (if check_determinism: true)

  • Conditionals (if/else/switch) detected
  • Try/catch abuse detected
  • Random values (Math.random, Date.now) detected
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with recommended fixes
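
A sketch of one nondeterminism fix, assuming @faker-js/faker is available (any seeded generator works; the route and selector are illustrative):

```typescript
import { test, expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

// FAIL: a fresh random value on every run makes failures unreproducible
// const orderId = Math.floor(Math.random() * 10_000);

// PASS: seed once so every run generates identical data
faker.seed(42);
const orderId = faker.number.int({ max: 10_000 });

test('order lookup is deterministic', async ({ page }) => {
  await page.goto(`/orders/${orderId}`);
  // PASS: assert one known outcome instead of branching on page state
  await expect(page.getByTestId('order-id')).toHaveText(String(orderId));
});
```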

Isolation (if check_isolation: true)

  • Cleanup hooks (afterEach/afterAll) validated
  • Shared state detected
  • Global variable mutations detected
  • Resource cleanup verified
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with recommended fixes
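
A cleanup-hook sketch (the /api/users endpoint is an assumption; the point is that each test removes what it created):

```typescript
import { test, expect } from '@playwright/test';

test.describe('user admin', () => {
  let createdUserId: string | undefined;

  test.afterEach(async ({ request }) => {
    // PASS: clean up whatever this test created, even if it failed midway
    if (createdUserId) {
      await request.delete(`/api/users/${createdUserId}`);
      createdUserId = undefined;
    }
  });

  test('creates a user', async ({ request }) => {
    const res = await request.post('/api/users', { data: { name: 'Ada' } });
    expect(res.ok()).toBeTruthy();
    createdUserId = (await res.json()).id;
  });
});
```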

Fixture Patterns (if check_fixture_patterns: true)

  • Fixtures detected (test.extend)
  • Pure functions validated
  • mergeTests usage checked
  • beforeEach complexity analyzed
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with recommended fixes
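
A minimal pure function → fixture sketch using test.extend (the names and the TEST_TOKEN variable are illustrative):

```typescript
import { test as base, expect } from '@playwright/test';

// Pure function: no runner state, unit-testable in isolation
export function buildAuthHeader(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

// Fixture wraps the pure function and owns setup, keeping per-test
// beforeEach hooks small or unnecessary
export const test = base.extend<{ authHeader: Record<string, string> }>({
  authHeader: async ({}, use) => {
    await use(buildAuthHeader(process.env.TEST_TOKEN ?? 'dev-token'));
  },
});

test('authenticated request', async ({ request, authHeader }) => {
  const res = await request.get('/api/me', { headers: authHeader });
  expect(res.ok()).toBeTruthy();
});
```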

Data Factories (if check_data_factories: true)

  • Factory functions detected
  • Hardcoded data (magic strings/numbers) detected
  • Faker.js or similar usage validated
  • API-first setup pattern checked
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with recommended fixes
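
A factory sketch with overrides, assuming @faker-js/faker (the field names are illustrative):

```typescript
import { faker } from '@faker-js/faker';

interface User {
  email: string;
  name: string;
  role: 'admin' | 'member';
}

// No magic strings in tests: each test states only the fields it cares about
export function createUser(overrides: Partial<User> = {}): User {
  return {
    email: faker.internet.email(),
    name: faker.person.fullName(),
    role: 'member',
    ...overrides,
  };
}

// Usage: const admin = createUser({ role: 'admin' });
```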

Network-First (if check_network_first: true)

  • page.route() before page.goto() validated
  • Race conditions detected (route after navigate)
  • waitForResponse patterns checked
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with recommended fixes
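
The pattern this criterion validates, sketched in Playwright (the route and response shape are illustrative):

```typescript
import { test, expect } from '@playwright/test';

test('dashboard renders stubbed stats', async ({ page }) => {
  // PASS: register the intercept BEFORE navigating so the first request is caught
  await page.route('**/api/stats', (route) => route.fulfill({ json: { visits: 42 } }));
  await page.goto('/dashboard');

  // FAIL variant (race condition): calling page.route() after page.goto()
  // lets the real request escape before the intercept exists

  await expect(page.getByTestId('visits')).toHaveText('42');
});
```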

Assertions (if check_assertions: true)

  • Explicit assertions counted
  • Implicit waits without assertions detected
  • Assertion specificity validated
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with recommended fixes
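
A sketch contrasting an implicit wait with the explicit, specific assertion this criterion expects (selectors are illustrative):

```typescript
import { test, expect } from '@playwright/test';

test('profile update is confirmed', async ({ page }) => {
  await page.goto('/profile');
  await page.getByRole('button', { name: 'Save' }).click();

  // WARN: waiting without asserting proves nothing about the outcome
  // await page.waitForSelector('.toast');

  // PASS: explicit, specific assertion on the observable result
  await expect(page.getByRole('alert')).toHaveText('Profile saved');
});
```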

Test Length (if check_test_length: true)

  • File line count calculated
  • Threshold comparison (≤300 lines ideal)
  • Status assigned (PASS/WARN/FAIL)
  • Splitting recommendations generated (if >300 lines)

Test Duration (if check_test_duration: true)

  • Test complexity analyzed (as a proxy for duration when no execution data is available)
  • Threshold comparison (≤1.5 min target)
  • Status assigned (PASS/WARN/FAIL)
  • Optimization recommendations generated

Flakiness Patterns (if check_flakiness_patterns: true)

  • Tight timeouts detected (e.g., { timeout: 1000 })
  • Race conditions detected
  • Timing-dependent assertions detected
  • Retry logic detected
  • Environment-dependent assumptions detected
  • Status assigned (PASS/WARN/FAIL)
  • Violations recorded with recommended fixes
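
A tight-timeout sketch of the first pattern listed above (the 1000 ms budget mirrors the example in this checklist):

```typescript
import { test, expect } from '@playwright/test';

test('report eventually appears', async ({ page }) => {
  await page.goto('/reports');

  // FAIL: a 1-second budget races slow CI machines and fails intermittently
  // await expect(page.getByTestId('report')).toBeVisible({ timeout: 1000 });

  // PASS: rely on the project-wide default timeout, tuned once in config
  await expect(page.getByTestId('report')).toBeVisible();
});
```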

Step 4: Quality Score Calculation

Violation Counting:

  • Critical (P0) violations counted
  • High (P1) violations counted
  • Medium (P2) violations counted
  • Low (P3) violations counted
  • Violation breakdown by criterion recorded

Score Calculation:

  • Starting score: 100
  • Critical violations deducted (-10 each)
  • High violations deducted (-5 each)
  • Medium violations deducted (-2 each)
  • Low violations deducted (-1 each)
  • Bonus points added (max +30):
    • Excellent BDD structure (+5 if applicable)
    • Comprehensive fixtures (+5 if applicable)
    • Comprehensive data factories (+5 if applicable)
    • Network-first pattern (+5 if applicable)
    • Perfect isolation (+5 if applicable)
    • All test IDs present (+5 if applicable)
  • Final score calculated: max(0, min(100, Starting - Violations + Bonus))
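
The scoring rules above, restated as a small TypeScript sketch (the type and function names are illustrative):

```typescript
interface ViolationCounts {
  critical: number; // P0: -10 each
  high: number;     // P1: -5 each
  medium: number;   // P2: -2 each
  low: number;      // P3: -1 each
}

export function qualityScore(v: ViolationCounts, bonus: number): number {
  const deductions = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low;
  const cappedBonus = Math.min(bonus, 30); // bonus is capped at +30
  return Math.max(0, Math.min(100, 100 - deductions + cappedBonus));
}

// Example: 1 critical + 2 high (-20) with a +10 bonus → 90 (A+)
```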

Quality Grade:

  • Grade assigned based on score:
    • 90-100: A+ (Excellent)
    • 80-89: A (Good)
    • 70-79: B (Acceptable)
    • 60-69: C (Needs Improvement)
    • <60: F (Critical Issues)

Step 5: Review Report Generation

Report Sections Created:

  • Header Section:

    • Test file(s) reviewed listed
    • Review date recorded
    • Review scope noted (single/directory/suite)
    • Quality score and grade displayed
  • Executive Summary:

    • Overall assessment (Excellent/Good/Needs Improvement/Critical)
    • Key strengths listed (3-5 bullet points)
    • Key weaknesses listed (3-5 bullet points)
    • Recommendation stated (Approve/Approve with comments/Request changes/Block)
  • Quality Criteria Assessment:

    • Table with all criteria evaluated
    • Status for each criterion (PASS/WARN/FAIL)
    • Violation count per criterion
  • Critical Issues (Must Fix):

    • P0/P1 violations listed
    • Code location provided for each (file:line)
    • Issue explanation clear
    • Recommended fix provided with code example
    • Knowledge base reference provided
  • Recommendations (Should Fix):

    • P2/P3 violations listed
    • Code location provided for each (file:line)
    • Issue explanation clear
    • Recommended improvement provided with code example
    • Knowledge base reference provided
  • Best Practices Examples (if good patterns found):

    • Good patterns highlighted from tests
    • Knowledge base fragments referenced
    • Examples provided for others to follow
  • Knowledge Base References:

    • All fragments consulted listed
    • Links to detailed guidance provided

Step 6: Optional Outputs Generation

Inline Comments (if generate_inline_comments: true):

  • Inline comments generated at violation locations
  • Comment format: // TODO (TEA Review): [Issue] - See test-review-{filename}.md
  • Comments added to test files (no logic changes)
  • Test files remain valid and executable
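
For example, a generated comment following the format above might read (the report filename is illustrative):

```typescript
// TODO (TEA Review): Hard wait (waitForTimeout), replace with a web-first
// assertion - See test-review-checkout.spec.md
```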

Quality Badge (if generate_quality_badge: true):

  • Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
  • Badge format suitable for README or documentation
  • Badge saved to output folder

Story Update (if append_to_story: true and story file exists):

  • "Test Quality Review" section created
  • Quality score included
  • Critical issues summarized
  • Link to full review report provided
  • Story file updated successfully

Step 7: Save and Notify

Outputs Saved:

  • Review report saved to {output_file}
  • Inline comments written to test files (if enabled)
  • Quality badge saved (if enabled)
  • Story file updated (if enabled)
  • All outputs are valid and readable

Summary Message Generated:

  • Quality score and grade included
  • Critical issue count stated
  • Recommendation provided (Approve/Approve with comments/Request changes/Block)
  • Next steps clarified
  • Message displayed to user

Output Validation

Review Report Completeness

  • All required sections present
  • No placeholder text or TODOs in report
  • All code locations are accurate (file:line)
  • All code examples are valid and demonstrate the fix
  • All knowledge base references are correct

Review Report Accuracy

  • Quality score matches violation breakdown
  • Grade matches score range
  • Violations correctly categorized by severity (P0/P1/P2/P3)
  • Violations correctly attributed to quality criteria
  • No false positives (violations are legitimate issues)
  • No false negatives (critical issues not missed)

Review Report Clarity

  • Executive summary is clear and actionable
  • Issue explanations are understandable
  • Recommended fixes are implementable
  • Code examples are correct and runnable
  • Recommendation (Approve/Request changes) is clear

Quality Checks

Knowledge-Based Validation

  • All feedback grounded in knowledge base fragments
  • Recommendations follow proven patterns
  • No arbitrary or opinion-based feedback
  • Knowledge fragment references accurate and relevant

Actionable Feedback

  • Every issue includes recommended fix
  • Every fix includes code example
  • Code examples demonstrate correct pattern
  • Fixes reference knowledge base for more detail

Severity Classification

  • Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
  • High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
  • Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
  • Low (P3) issues are minor style/preference (verbose tests)

Context Awareness

  • Review considers project context (some patterns may be justified)
  • Violations with justification comments noted as acceptable
  • Edge cases acknowledged
  • Recommendations are pragmatic, not dogmatic

Integration Points

Story File Integration

  • Story file discovered correctly (if available)
  • Acceptance criteria extracted and used for context
  • Test quality section appended to story (if enabled)
  • Link to review report added to story

Test Design Integration

  • Test design document discovered correctly (if available)
  • Priority context (P0/P1/P2/P3) extracted and used
  • Review validates tests align with prioritization
  • Misalignment flagged (e.g., a P0 scenario with no covering tests)

Knowledge Base Integration

  • tea-index.csv loaded successfully
  • All required fragments loaded
  • Fragments applied correctly to validation
  • Fragment references in report are accurate

Edge Cases and Special Situations

Empty or Minimal Tests

  • If test file is empty, report notes "No tests found"
  • If test file has only boilerplate, report notes "No meaningful tests"
  • Score reflects lack of content appropriately

Legacy Tests

  • Legacy tests acknowledged in context
  • Review provides practical recommendations for improvement
  • Recognizes that a complete refactor may not be feasible
  • Prioritizes critical issues (flakiness) over style

Test Framework Variations

  • Review adapts to test framework (Playwright vs Jest vs Cypress)
  • Framework-specific patterns recognized (e.g., Playwright fixtures)
  • Framework-specific violations detected (e.g., Cypress anti-patterns)
  • Knowledge fragments applied appropriately for framework

Justified Violations

  • Violations with justification comments in code noted as acceptable
  • Justifications evaluated for legitimacy
  • Report acknowledges justified patterns
  • Score not penalized for justified violations

Final Validation

Review Completeness

  • All enabled quality criteria evaluated
  • All test files in scope reviewed
  • All violations cataloged
  • All recommendations provided
  • Review report is comprehensive

Review Accuracy

  • Quality score is accurate
  • Violations are correct (no false positives)
  • Critical issues not missed (no false negatives)
  • Code locations are correct
  • Knowledge base references are accurate

Review Usefulness

  • Feedback is actionable
  • Recommendations are implementable
  • Code examples are correct
  • Review helps developer improve tests
  • Review educates on best practices

Workflow Complete

  • All checklist items completed
  • All outputs validated and saved
  • User notified with summary
  • Review ready for developer consumption
  • Follow-up actions identified (if any)

Notes

Record any issues, observations, or important context during workflow execution:

  • Test Framework: [Playwright, Jest, Cypress, etc.]
  • Review Scope: [single file, directory, full suite]
  • Quality Score: [0-100 score, letter grade]
  • Critical Issues: [Count of P0/P1 violations]
  • Recommendation: [Approve / Approve with comments / Request changes / Block]
  • Special Considerations: [Legacy code, justified patterns, edge cases]
  • Follow-up Actions: [Re-review after fixes, pair programming, etc.]