# Test Quality Review - Validation Checklist

Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.

---

## Prerequisites

### Test File Discovery

- [ ] Test file(s) identified for review (single/directory/suite scope)
- [ ] Test files exist and are readable
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)

### Knowledge Base Loading

- [ ] tea-index.csv loaded successfully
- [ ] `test-quality.md` loaded (Definition of Done)
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
- [ ] `network-first.md` loaded (Route intercept before navigate)
- [ ] `data-factories.md` loaded (Factory patterns)
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
- [ ] All other enabled fragments loaded successfully

### Context Gathering

- [ ] Story file discovered or explicitly provided (if available)
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)

---

## Process Steps

### Step 1: Context Loading

- [ ] Review scope determined (single/directory/suite)
- [ ] Test file paths collected
- [ ] Related artifacts discovered (story, test-design)
- [ ] Knowledge base fragments loaded successfully
- [ ] Quality criteria flags read from workflow variables

### Step 2: Test File Parsing

**For Each Test File:**

- [ ] File read successfully
- [ ] File size measured (lines, KB)
- [ ] File structure parsed (describe blocks, it blocks)
- [ ] Test IDs extracted (if present)
- [ ] Priority markers extracted (if present)
- [ ] Imports analyzed
- [ ] Dependencies identified

**Test Structure Analysis:**

- [ ] Describe block count calculated
- [ ] It/test block count calculated
- [ ] BDD structure identified (Given-When-Then)
- [ ] Fixture usage detected
- [ ] Data factory usage detected
- [ ] Network interception patterns identified
- [ ] Assertions counted
- [ ] Waits and timeouts cataloged
- [ ] Conditionals (if/else) detected
- [ ] Try/catch blocks detected
- [ ] Shared state or globals detected

### Step 3: Quality Criteria Validation

**For Each Enabled Criterion:**

#### BDD Format (if `check_given_when_then: true`)

- [ ] Given-When-Then structure evaluated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers
- [ ] Examples of good/bad patterns noted

#### Test IDs (if `check_test_ids: true`)

- [ ] Test ID presence validated
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing IDs cataloged

#### Priority Markers (if `check_priority_markers: true`)

- [ ] P0/P1/P2/P3 classification validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing priorities cataloged

#### Hard Waits (if `check_hard_waits: true`)

- [ ] sleep(), waitForTimeout(), hardcoded delays detected
- [ ] Justification comments checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers and recommended fixes

#### Determinism (if `check_determinism: true`)

- [ ] Conditionals (if/else/switch) detected
- [ ] Try/catch abuse detected
- [ ] Random values (Math.random, Date.now) detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes (see the sketch below)
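To make the hard-wait and determinism checks concrete, here is a minimal Playwright sketch of a violating test and its fix; the route, selectors, and test names are hypothetical:

```typescript
import { test, expect } from '@playwright/test';

// BAD: hard wait plus nondeterministic data - both would be flagged above.
test('shows new order (flaky)', async ({ page }) => {
  await page.goto('/orders');
  await page.waitForTimeout(3000); // hard wait: flaky when slow, wasteful when fast
  const name = `order-${Math.random()}`; // nondeterministic: failures are not reproducible
  await expect(page.getByText(name)).toBeVisible();
});

// GOOD: deterministic data and a web-first assertion that retries until the UI settles.
test('shows new order', async ({ page }) => {
  await page.goto('/orders');
  const name = 'order-001'; // fixed value (or a seeded data factory)
  await expect(page.getByText(name)).toBeVisible();
});
```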
#### Isolation (if `check_isolation: true`)

- [ ] Cleanup hooks (afterEach/afterAll) validated
- [ ] Shared state detected
- [ ] Global variable mutations detected
- [ ] Resource cleanup verified
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Fixture Patterns (if `check_fixture_patterns: true`)

- [ ] Fixtures detected (test.extend)
- [ ] Pure functions validated
- [ ] mergeTests usage checked
- [ ] beforeEach complexity analyzed
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Data Factories (if `check_data_factories: true`)

- [ ] Factory functions detected
- [ ] Hardcoded data (magic strings/numbers) detected
- [ ] Faker.js or similar usage validated
- [ ] API-first setup pattern checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Network-First (if `check_network_first: true`)

- [ ] page.route() before page.goto() validated
- [ ] Race conditions detected (route after navigate)
- [ ] waitForResponse patterns checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Assertions (if `check_assertions: true`)

- [ ] Explicit assertions counted
- [ ] Implicit waits without assertions detected
- [ ] Assertion specificity validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Test Length (if `check_test_length: true`)

- [ ] File line count calculated
- [ ] Threshold comparison (≤300 lines ideal)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Splitting recommendations generated (if >300 lines)

#### Test Duration (if `check_test_duration: true`)

- [ ] Test complexity analyzed (as proxy for duration if no execution data)
- [ ] Threshold comparison (≤1.5 min target)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Optimization recommendations generated

#### Flakiness Patterns (if `check_flakiness_patterns: true`)

- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
- [ ] Race conditions detected
- [ ] Timing-dependent assertions detected
- [ ] Retry logic detected
- [ ] Environment-dependent assumptions detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

---

### Step 4: Quality Score Calculation

**Violation Counting:**

- [ ] Critical (P0) violations counted
- [ ] High (P1) violations counted
- [ ] Medium (P2) violations counted
- [ ] Low (P3) violations counted
- [ ] Violation breakdown by criterion recorded

**Score Calculation:**

- [ ] Starting score: 100
- [ ] Critical violations deducted (-10 each)
- [ ] High violations deducted (-5 each)
- [ ] Medium violations deducted (-2 each)
- [ ] Low violations deducted (-1 each)
- [ ] Bonus points added (max +30):
  - [ ] Excellent BDD structure (+5 if applicable)
  - [ ] Comprehensive fixtures (+5 if applicable)
  - [ ] Comprehensive data factories (+5 if applicable)
  - [ ] Network-first pattern (+5 if applicable)
  - [ ] Perfect isolation (+5 if applicable)
  - [ ] All test IDs present (+5 if applicable)
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus)) (see the sketch below)

**Quality Grade:**

- [ ] Grade assigned based on score:
  - 90-100: A+ (Excellent)
  - 80-89: A (Good)
  - 70-79: B (Acceptable)
  - 60-69: C (Needs Improvement)
  - <60: F (Critical Issues)
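The scoring arithmetic above is mechanical, so it can be sanity-checked with a small pure function. A TypeScript sketch; the names (`ViolationCounts`, `qualityScore`) are illustrative, not part of the workflow:

```typescript
type ViolationCounts = { p0: number; p1: number; p2: number; p3: number };

// Deductions per the checklist: P0 -10, P1 -5, P2 -2, P3 -1.
// `bonuses` is the number of earned +5 bonus criteria (0-6, i.e. max +30).
function qualityScore(v: ViolationCounts, bonuses: number): { score: number; grade: string } {
  const deductions = v.p0 * 10 + v.p1 * 5 + v.p2 * 2 + v.p3 * 1;
  const bonus = Math.min(bonuses, 6) * 5;
  const score = Math.max(0, Math.min(100, 100 - deductions + bonus));
  const grade =
    score >= 90 ? 'A+ (Excellent)' :
    score >= 80 ? 'A (Good)' :
    score >= 70 ? 'B (Acceptable)' :
    score >= 60 ? 'C (Needs Improvement)' :
    'F (Critical Issues)';
  return { score, grade };
}

// Example: 1 critical + 2 high violations with 3 bonuses -> 100 - 20 + 15 = 95 (A+).
console.log(qualityScore({ p0: 1, p1: 2, p2: 0, p3: 0 }, 3));
```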
---

### Step 5: Review Report Generation

**Report Sections Created:**

- [ ] **Header Section**:
  - [ ] Test file(s) reviewed listed
  - [ ] Review date recorded
  - [ ] Review scope noted (single/directory/suite)
  - [ ] Quality score and grade displayed
- [ ] **Executive Summary**:
  - [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
  - [ ] Key strengths listed (3-5 bullet points)
  - [ ] Key weaknesses listed (3-5 bullet points)
  - [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
- [ ] **Quality Criteria Assessment**:
  - [ ] Table with all criteria evaluated
  - [ ] Status for each criterion (PASS/WARN/FAIL)
  - [ ] Violation count per criterion
- [ ] **Critical Issues (Must Fix)**:
  - [ ] P0/P1 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended fix provided with code example
  - [ ] Knowledge base reference provided
- [ ] **Recommendations (Should Fix)**:
  - [ ] P2/P3 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended improvement provided with code example
  - [ ] Knowledge base reference provided
- [ ] **Best Practices Examples** (if good patterns found):
  - [ ] Good patterns highlighted from tests
  - [ ] Knowledge base fragments referenced
  - [ ] Examples provided for others to follow
- [ ] **Knowledge Base References**:
  - [ ] All fragments consulted listed
  - [ ] Links to detailed guidance provided

---

### Step 6: Optional Outputs Generation

**Inline Comments** (if `generate_inline_comments: true`):

- [ ] Inline comments generated at violation locations (see the sketch after this step)
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
- [ ] Comments added to test files (no logic changes)
- [ ] Test files remain valid and executable

**Quality Badge** (if `generate_quality_badge: true`):

- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
- [ ] Badge format suitable for README or documentation
- [ ] Badge saved to output folder

**Story Update** (if `append_to_story: true` and story file exists):

- [ ] "Test Quality Review" section created
- [ ] Quality score included
- [ ] Critical issues summarized
- [ ] Link to full review report provided
- [ ] Story file updated successfully
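As an illustration of the inline-comment output, here is a sketch of a hypothetical annotated test file; the file name, test, and issue text are invented, but the comment follows the stated format and the test logic is untouched:

```typescript
// login.spec.ts (hypothetical) - after inline comment generation
import { test, expect } from '@playwright/test';

test('1.3-E2E-001: user can log in', async ({ page }) => {
  await page.goto('/login');
  // TODO (TEA Review): Hard wait detected, replace with web-first assertion - See test-review-login.spec.md
  await page.waitForTimeout(2000);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```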
---

### Step 7: Save and Notify

**Outputs Saved:**

- [ ] Review report saved to `{output_file}`
- [ ] Inline comments written to test files (if enabled)
- [ ] Quality badge saved (if enabled)
- [ ] Story file updated (if enabled)
- [ ] All outputs are valid and readable

**Summary Message Generated:**

- [ ] Quality score and grade included
- [ ] Critical issue count stated
- [ ] Recommendation provided (Approve/Request changes/Block)
- [ ] Next steps clarified
- [ ] Message displayed to user

---

## Output Validation

### Review Report Completeness

- [ ] All required sections present
- [ ] No placeholder text or TODOs in report
- [ ] All code locations are accurate (file:line)
- [ ] All code examples are valid and demonstrate the fix
- [ ] All knowledge base references are correct

### Review Report Accuracy

- [ ] Quality score matches violation breakdown
- [ ] Grade matches score range
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
- [ ] Violations correctly attributed to quality criteria
- [ ] No false positives (violations are legitimate issues)
- [ ] No false negatives (critical issues not missed)

### Review Report Clarity

- [ ] Executive summary is clear and actionable
- [ ] Issue explanations are understandable
- [ ] Recommended fixes are implementable
- [ ] Code examples are correct and runnable
- [ ] Recommendation (Approve/Request changes) is clear

---

## Quality Checks

### Knowledge-Based Validation

- [ ] All feedback grounded in knowledge base fragments
- [ ] Recommendations follow proven patterns
- [ ] No arbitrary or opinion-based feedback
- [ ] Knowledge fragment references accurate and relevant

### Actionable Feedback

- [ ] Every issue includes a recommended fix
- [ ] Every fix includes a code example
- [ ] Code examples demonstrate the correct pattern
- [ ] Fixes reference the knowledge base for more detail

### Severity Classification

- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
- [ ] Low (P3) issues are minor style/preference (verbose tests)

### Context Awareness

- [ ] Review considers project context (some patterns may be justified)
- [ ] Violations with justification comments noted as acceptable
- [ ] Edge cases acknowledged
- [ ] Recommendations are pragmatic, not dogmatic

---

## Integration Points

### Story File Integration

- [ ] Story file discovered correctly (if available)
- [ ] Acceptance criteria extracted and used for context
- [ ] Test quality section appended to story (if enabled)
- [ ] Link to review report added to story

### Test Design Integration

- [ ] Test design document discovered correctly (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted and used
- [ ] Review validates tests align with prioritization
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)

### Knowledge Base Integration

- [ ] tea-index.csv loaded successfully
- [ ] All required fragments loaded
- [ ] Fragments applied correctly to validation
- [ ] Fragment references in report are accurate

---

## Edge Cases and Special Situations

### Empty or Minimal Tests

- [ ] If test file is empty, report notes "No tests found"
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
- [ ] Score reflects lack of content appropriately

### Legacy Tests

- [ ] Legacy tests acknowledged in context
- [ ] Review provides practical recommendations for improvement
- [ ] Recognizes that a complete refactor may not be feasible
- [ ] Prioritizes critical issues (flakiness) over style

### Test Framework Variations

- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
- [ ] Knowledge fragments applied appropriately for framework

### Justified Violations

- [ ] Violations with justification comments in code noted as acceptable (see the sketch below)
- [ ] Justifications evaluated for legitimacy
- [ ] Report acknowledges justified patterns
- [ ] Score not penalized for justified violations
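For example, a hard wait accompanied by an explicit justification comment should be recorded as a justified pattern rather than a scored violation. A hypothetical Playwright sketch (the widget and ticket reference are invented):

```typescript
import { test, expect } from '@playwright/test';

test('exports a PDF report', async ({ page }) => {
  await page.goto('/reports/42');
  await page.getByRole('button', { name: 'Export PDF' }).click();
  // JUSTIFIED hard wait: the third-party PDF widget exposes no DOM or network
  // signal to await; tracked in TICKET-123 (hypothetical). Reviewed and accepted.
  await page.waitForTimeout(1000);
  await expect(page.getByText('Export complete')).toBeVisible();
});
```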
---

## Final Validation

### Review Completeness

- [ ] All enabled quality criteria evaluated
- [ ] All test files in scope reviewed
- [ ] All violations cataloged
- [ ] All recommendations provided
- [ ] Review report is comprehensive

### Review Accuracy

- [ ] Quality score is accurate
- [ ] Violations are correct (no false positives)
- [ ] Critical issues not missed (no false negatives)
- [ ] Code locations are correct
- [ ] Knowledge base references are accurate

### Review Usefulness

- [ ] Feedback is actionable
- [ ] Recommendations are implementable
- [ ] Code examples are correct
- [ ] Review helps developer improve tests
- [ ] Review educates on best practices

### Workflow Complete

- [ ] All checklist items completed
- [ ] All outputs validated and saved
- [ ] User notified with summary
- [ ] Review ready for developer consumption
- [ ] Follow-up actions identified (if any)

---

## Notes

Record any issues, observations, or important context during workflow execution:

- **Test Framework**: [Playwright, Jest, Cypress, etc.]
- **Review Scope**: [single file, directory, full suite]
- **Quality Score**: [0-100 score, letter grade]
- **Critical Issues**: [Count of P0/P1 violations]
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]