bmad initialization
bmad/bmm/workflows/testarch/README.md (new file, 26 lines)
@@ -0,0 +1,26 @@
# Test Architect Workflows

This directory houses the per-command workflows used by the Test Architect agent (`tea`). Each workflow wraps the standalone instructions that used to live under `testarch/` so they can run through the standard BMAD workflow runner.

## Available workflows

- `framework` – scaffolds Playwright/Cypress harnesses.
- `atdd` – generates failing acceptance tests before coding.
- `automate` – expands regression coverage after implementation.
- `ci` – bootstraps CI/CD pipelines aligned with TEA practices.
- `test-design` – combines risk assessment and coverage planning.
- `trace` – maps requirements to tests (Phase 1) and makes quality gate decisions (Phase 2).
- `nfr-assess` – evaluates non-functional requirements.
- `test-review` – reviews test quality using knowledge base patterns and generates a quality score.

**Note**: The `gate` workflow has been merged into `trace` as Phase 2. The `*trace` command now performs both requirements-to-tests traceability mapping AND the quality gate decision (PASS/CONCERNS/FAIL/WAIVED) in a single atomic operation.

Each subdirectory contains:

- `README.md` – comprehensive workflow documentation with usage, inputs, outputs, and integration notes.
- `instructions.md` – detailed workflow steps in pure markdown v4.0 format.
- `workflow.yaml` – metadata, variables, and configuration for the BMAD workflow runner.
- `checklist.md` – validation checklist for quality assurance and completeness verification.
- `template.md` – output template for workflow deliverables (where applicable).

The TEA agent now invokes these workflows via `run-workflow` rather than executing instruction files directly.
bmad/bmm/workflows/testarch/atdd/README.md (new file, 672 lines)
@@ -0,0 +1,672 @@
|
||||
# ATDD (Acceptance Test-Driven Development) Workflow
|
||||
|
||||
Generates failing acceptance tests BEFORE implementation following TDD's red-green-refactor cycle. Creates comprehensive test coverage at appropriate levels (E2E, API, Component) with supporting infrastructure (fixtures, factories, mocks) and provides an implementation checklist to guide development toward passing tests.
|
||||
|
||||
**Core Principle**: Tests fail first (red phase), guide development to green, then enable confident refactoring.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *atdd
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- User story is approved with clear acceptance criteria
|
||||
- Development is about to begin (before any implementation code)
|
||||
- Team is practicing Test-Driven Development (TDD)
|
||||
- Need to establish test-first contract with DEV team
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Story markdown** (`{story_file}`): User story with acceptance criteria, functional requirements, and technical constraints
|
||||
- **Framework configuration**: Test framework config (playwright.config.ts or cypress.config.ts) from framework workflow
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `story_file`: Path to story markdown with acceptance criteria (required)
|
||||
- `test_dir`: Directory for test files (default: `{project-root}/tests`)
|
||||
- `test_framework`: Detected from framework workflow (playwright or cypress)
|
||||
- `test_levels`: Which test levels to generate (default: "e2e,api,component")
|
||||
- `primary_level`: Primary test level for acceptance criteria (default: "e2e")
|
||||
- `start_failing`: Tests must fail initially - red phase (default: true)
|
||||
- `use_given_when_then`: BDD-style test structure (default: true)
|
||||
- `network_first`: Route interception before navigation to prevent race conditions (default: true)
|
||||
- `one_assertion_per_test`: Atomic test design (default: true)
|
||||
- `generate_factories`: Create data factory stubs using faker (default: true)
|
||||
- `generate_fixtures`: Create fixture architecture with auto-cleanup (default: true)
|
||||
- `auto_cleanup`: Fixtures clean up their data automatically (default: true)
|
||||
- `include_data_testids`: List required data-testid attributes for DEV (default: true)
|
||||
- `include_mock_requirements`: Document mock/stub needs (default: true)
|
||||
- `auto_load_knowledge`: Load fixture-architecture, data-factories, component-tdd fragments (default: true)
|
||||
- `share_with_dev`: Provide implementation checklist to DEV agent (default: true)
|
||||
- `output_checklist`: Path for implementation checklist (default: `{output_folder}/atdd-checklist-{story_id}.md`)
|
||||
|
||||
**Optional Context:**
|
||||
|
||||
- **Test design document**: For risk/priority context alignment (P0-P3 scenarios)
|
||||
- **Existing fixtures/helpers**: For consistency with established patterns
|
||||
- **Architecture documents**: For understanding system boundaries and integration points
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
- **ATDD Checklist** (`atdd-checklist-{story_id}.md`): Implementation guide containing:
|
||||
- Story summary and acceptance criteria breakdown
|
||||
- Test files created with paths and line counts
|
||||
- Data factories created with patterns
|
||||
- Fixtures created with auto-cleanup logic
|
||||
- Mock requirements for external services
|
||||
- Required data-testid attributes list
|
||||
- Implementation checklist mapping tests to code tasks
|
||||
- Red-green-refactor workflow guidance
|
||||
- Execution commands for running tests
|
||||
|
||||
**Test Files Created:**
|
||||
|
||||
- **E2E tests** (`tests/e2e/{feature-name}.spec.ts`): Full user journey tests for critical paths
|
||||
- **API tests** (`tests/api/{feature-name}.api.spec.ts`): Business logic and service contract tests
|
||||
- **Component tests** (`tests/component/{ComponentName}.test.tsx`): UI component behavior tests
|
||||
|
||||
**Supporting Infrastructure:**
|
||||
|
||||
- **Data factories** (`tests/support/factories/{entity}.factory.ts`): Factory functions using @faker-js/faker for generating test data with overrides support
|
||||
- **Test fixtures** (`tests/support/fixtures/{feature}.fixture.ts`): Playwright fixtures with setup/teardown and auto-cleanup
|
||||
- **Mock/stub documentation**: Requirements for external service mocking (payment gateways, email services, etc.)
|
||||
- **data-testid requirements**: List of required test IDs for stable selectors in UI implementation
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- All tests must fail initially (red phase verified by local test run)
|
||||
- Failure messages are clear and actionable
|
||||
- Tests use Given-When-Then format for readability
|
||||
- Network-first pattern applied (route interception before navigation)
|
||||
- One assertion per test (atomic test design)
|
||||
- No hard waits or sleeps (explicit waits only)
|
||||
|
||||
## Key Features
|
||||
|
||||
### Red-Green-Refactor Cycle
|
||||
|
||||
**RED Phase** (TEA Agent responsibility):
|
||||
|
||||
- Write failing tests first, defining the expected behavior
- Tests fail for the right reason (missing implementation, not test bugs)
- All supporting infrastructure (factories, fixtures, mocks) is created
|
||||
|
||||
**GREEN Phase** (DEV Agent responsibility):
|
||||
|
||||
- Implement minimal code to pass one test at a time
|
||||
- Use implementation checklist as guide
|
||||
- Run tests frequently to verify progress
|
||||
|
||||
**REFACTOR Phase** (DEV Agent responsibility):
|
||||
|
||||
- Improve code quality with confidence (tests provide safety net)
|
||||
- Extract duplications, optimize performance
|
||||
- Ensure tests still pass after changes
|
||||
|
||||
### Test Level Selection Framework
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys (login, checkout, core workflows)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- Characteristics: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Business logic validation
|
||||
- Service contracts and data transformations
|
||||
- Backend integration without UI
|
||||
- Characteristics: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing (click, hover, keyboard navigation)
|
||||
- Visual regression and state management
|
||||
- Characteristics: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Pure business logic and algorithms
|
||||
- Edge cases and error handling
|
||||
- Minimal dependencies
|
||||
- Characteristics: Fastest, most granular
|
||||
|
||||
**Selection Strategy**: Avoid duplicate coverage. Use E2E for critical happy path, API for business logic variations, component for UI edge cases, unit for pure logic.
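To make the API level concrete, a test at that level can exercise the contract directly through Playwright's `request` fixture. A minimal sketch, assuming the `POST /api/auth/login` endpoint and the user factory described later in this document:

```typescript
// tests/api/auth.api.spec.ts
import { test, expect } from '@playwright/test';
import { createUser } from '../support/factories/user.factory';

test('POST /api/auth/login should return 401 for invalid credentials', async ({ request }) => {
  // GIVEN: a user generated by the data factory
  const user = createUser();

  // WHEN: logging in with a wrong password
  const response = await request.post('/api/auth/login', {
    data: { email: user.email, password: 'wrong-password' },
  });

  // THEN: the API rejects the request
  expect(response.status()).toBe(401);
});
```

Because no browser page is involved, tests like this give fast feedback on business-logic variations that would be wasteful to cover end-to-end.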
|
||||
|
||||
### Recording Mode (NEW - Phase 2.5)
|
||||
|
||||
**atdd** can record complex UI interactions instead of generating them with AI.

**Activation**: automatic for complex UI when `config.tea_use_mcp_enhancements` is true and Playwright MCP is available
|
||||
|
||||
- Fallback: AI generation (silent, automatic)
|
||||
|
||||
**When to Use Recording Mode:**
|
||||
|
||||
- ✅ Complex UI interactions (drag-drop, multi-step forms, wizards)
|
||||
- ✅ Visual workflows (modals, dialogs, animations)
|
||||
- ✅ Unclear requirements (exploratory, discovering expected behavior)
|
||||
- ✅ Multi-page flows (checkout, registration, onboarding)
|
||||
- ❌ NOT for simple CRUD (AI generation is faster)
|
||||
- ❌ NOT for API-only tests (no UI to record)
|
||||
|
||||
**When to Use AI Generation (Default):**
|
||||
|
||||
- ✅ Clear acceptance criteria available
|
||||
- ✅ Standard patterns (login, CRUD, navigation)
|
||||
- ✅ Need many tests quickly
|
||||
- ✅ API/backend tests (no UI interaction)
|
||||
|
||||
**How Test Generation Works (Default - AI-Based):**
|
||||
|
||||
TEA generates tests using AI by:
|
||||
|
||||
1. **Analyzing acceptance criteria** from story markdown
|
||||
2. **Inferring selectors** from requirement descriptions (e.g., "login button" → `[data-testid="login-button"]`)
|
||||
3. **Synthesizing test code** based on knowledge base patterns
|
||||
4. **Estimating interactions** using common UI patterns (click, type, verify)
|
||||
5. **Applying best practices** from knowledge fragments (Given-When-Then, network-first, fixtures)
|
||||
|
||||
**This works well for:**
|
||||
|
||||
- ✅ Clear requirements with known UI patterns
|
||||
- ✅ Standard workflows (login, CRUD, navigation)
|
||||
- ✅ When selectors follow conventions (data-testid attributes)
|
||||
|
||||
**What MCP Adds (Interactive Verification & Enhancement):**
|
||||
|
||||
When Playwright MCP is available, TEA **additionally**:
|
||||
|
||||
1. **Verifies generated tests** by:
|
||||
- **Launching real browser** with `generator_setup_page`
|
||||
- **Executing generated test steps** with `browser_*` tools (`navigate`, `click`, `type`)
|
||||
- **Seeing actual UI** with `browser_snapshot` (visual verification)
|
||||
- **Discovering real selectors** with `browser_generate_locator` (auto-generate from live DOM)
|
||||
|
||||
2. **Enhances AI-generated tests** by:
|
||||
- **Validating selectors exist** in actual DOM (not just guesses)
|
||||
- **Verifying behavior** with `browser_verify_text`, `browser_verify_visible`, `browser_verify_url`
|
||||
- **Capturing actual interaction log** with `generator_read_log`
|
||||
- **Refining test code** with real observed behavior
|
||||
|
||||
3. **Catches issues early** by:
|
||||
- **Finding missing selectors** before DEV implements (requirements clarification)
|
||||
- **Discovering edge cases** not in requirements (loading states, error messages)
|
||||
- **Validating assumptions** about UI structure and behavior
|
||||
|
||||
**Key Benefits of MCP Enhancement:**
|
||||
|
||||
- ✅ **AI generates tests** (fast, based on requirements) **+** **MCP verifies tests** (accurate, based on reality)
|
||||
- ✅ **Accurate selectors**: Validated against actual DOM, not just inferred
|
||||
- ✅ **Visual validation**: TEA sees what user sees (modals, animations, state changes)
|
||||
- ✅ **Complex flows**: Records multi-step interactions precisely
|
||||
- ✅ **Edge case discovery**: Observes actual app behavior beyond requirements
|
||||
- ✅ **Selector resilience**: MCP generates robust locators from live page (role-based, text-based, fallback chains)
|
||||
|
||||
**Example Enhancement Flow:**
|
||||
|
||||
```
|
||||
1. AI generates test based on acceptance criteria
|
||||
→ await page.click('[data-testid="submit-button"]')
|
||||
|
||||
2. MCP verifies selector exists (browser_generate_locator)
|
||||
→ Found: button[type="submit"].btn-primary
|
||||
→ No data-testid attribute exists!
|
||||
|
||||
3. TEA refines test with actual selector
|
||||
→ await page.locator('button[type="submit"]').click()
|
||||
→ Documents requirement: "Add data-testid='submit-button' to button"
|
||||
```
|
||||
|
||||
**Recording Workflow (MCP-Based):**
|
||||
|
||||
```
|
||||
1. Set generation_mode: "recording"
|
||||
2. Use generator_setup_page to init recording session
|
||||
3. For each acceptance criterion:
|
||||
a. Execute scenario with browser_* tools:
|
||||
- browser_navigate, browser_click, browser_type
|
||||
- browser_select, browser_check
|
||||
b. Add verifications with browser_verify_* tools:
|
||||
- browser_verify_text, browser_verify_visible
|
||||
- browser_verify_url
|
||||
c. Capture log with generator_read_log
|
||||
d. Generate test with generator_write_test
|
||||
4. Enhance generated tests with knowledge base patterns:
|
||||
- Add Given-When-Then comments
|
||||
- Replace selectors with data-testid
|
||||
- Add network-first interception
|
||||
- Add fixtures/factories
|
||||
5. Verify tests fail (RED phase)
|
||||
```
|
||||
|
||||
**Example: Recording a Checkout Flow**
|
||||
|
||||
```markdown
|
||||
Recording session for: "User completes checkout with credit card"
|
||||
|
||||
Actions recorded:
|
||||
|
||||
1. browser_navigate('/cart')
|
||||
2. browser_click('[data-testid="checkout-button"]')
|
||||
3. browser_type('[data-testid="card-number"]', '4242424242424242')
|
||||
4. browser_type('[data-testid="expiry"]', '12/25')
|
||||
5. browser_type('[data-testid="cvv"]', '123')
|
||||
6. browser_click('[data-testid="place-order"]')
|
||||
7. browser_verify_text('Order confirmed')
|
||||
8. browser_verify_url('/confirmation')
|
||||
|
||||
Generated test (enhanced):
|
||||
|
||||
- Given-When-Then structure added
|
||||
- data-testid selectors used
|
||||
- Network-first payment API mock added
|
||||
- Card factory created for test data
|
||||
- Test verified to FAIL (checkout not implemented)
|
||||
```
|
||||
|
||||
**Graceful Degradation:**
|
||||
|
||||
- Recording mode is OPTIONAL (default: AI generation)
|
||||
- Requires Playwright MCP (falls back to AI if unavailable)
|
||||
- Generated tests enhanced with knowledge base patterns
|
||||
- Same quality output regardless of generation method
|
||||
|
||||
### Given-When-Then Structure
|
||||
|
||||
All tests follow BDD format for clarity:
|
||||
|
||||
```typescript
import { test, expect } from '@playwright/test';

test('should display error for invalid credentials', async ({ page }) => {
  // GIVEN: User is on login page
  await page.goto('/login');

  // WHEN: User submits invalid credentials
  await page.fill('[data-testid="email-input"]', 'invalid@example.com');
  await page.fill('[data-testid="password-input"]', 'wrongpassword');
  await page.click('[data-testid="login-button"]');

  // THEN: Error message is displayed
  await expect(page.locator('[data-testid="error-message"]')).toHaveText('Invalid email or password');
});
```
|
||||
|
||||
### Network-First Testing Pattern
|
||||
|
||||
**Critical pattern to prevent race conditions**:
|
||||
|
||||
```typescript
// A stubbed handler (illustrative response body)
const handler = (route) => route.fulfill({ json: { items: [] } });

// ✅ CORRECT: Intercept BEFORE navigation
await page.route('**/api/data', handler);
await page.goto('/page');

// ❌ WRONG: Navigate then intercept (race condition)
await page.goto('/page');
await page.route('**/api/data', handler); // Too late!
```
|
||||
|
||||
Always set up route interception before navigating to pages that make network requests.
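A slightly fuller sketch of the same pattern, combining interception with a deterministic wait; the `/api/orders` endpoint, response body, and `order-row` test id are hypothetical:

```typescript
import { test, expect } from '@playwright/test';

test('shows orders returned by the API', async ({ page }) => {
  // GIVEN: the orders API is stubbed BEFORE navigation (network-first)
  await page.route('**/api/orders', (route) =>
    route.fulfill({ json: [{ id: 1, status: 'shipped' }] }),
  );

  // WHEN: the page that triggers the request is opened
  const responsePromise = page.waitForResponse('**/api/orders');
  await page.goto('/orders');
  await responsePromise; // deterministic wait: the stubbed response has arrived

  // THEN: the stubbed data is rendered
  await expect(page.locator('[data-testid="order-row"]')).toHaveCount(1);
});
```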
|
||||
|
||||
### Data Factory Architecture
|
||||
|
||||
Use faker for all test data generation:
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
```
|
||||
|
||||
**Factory principles:**
|
||||
|
||||
- Use faker for random data (no hardcoded values to prevent collisions)
|
||||
- Support overrides for specific test scenarios
|
||||
- Generate complete valid objects matching API contracts
|
||||
- Include helper functions for bulk creation
|
||||
|
||||
### Fixture Architecture with Auto-Cleanup
|
||||
|
||||
Playwright fixtures with automatic data cleanup:
|
||||
|
||||
```typescript
|
||||
// tests/support/fixtures/auth.fixture.ts
|
||||
import { test as base } from '@playwright/test';
// Assumed imports: createUser is the factory shown above; deleteUser is a
// hypothetical API helper used here only to illustrate cleanup.
import { createUser } from '../factories/user.factory';
import { deleteUser } from '../helpers/user-api';

export const test = base.extend({
|
||||
authenticatedUser: async ({ page }, use) => {
|
||||
// Setup: Create and authenticate user
|
||||
const user = await createUser();
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await page.waitForURL('/dashboard');
|
||||
|
||||
// Provide to test
|
||||
await use(user);
|
||||
|
||||
// Cleanup: Delete user (automatic)
|
||||
await deleteUser(user.id);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Fixture principles:**
|
||||
|
||||
- Auto-cleanup (always delete created data in teardown)
|
||||
- Composable (fixtures can build on other fixtures and be combined via `mergeTests`; see the sketch after this list)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe with TypeScript
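
The composition mentioned above can be expressed with Playwright's `mergeTests`. A minimal sketch, assuming the `auth.fixture.ts` shown earlier plus a hypothetical `api.fixture.ts` built the same way:

```typescript
// tests/support/fixtures/merged.fixture.ts
import { mergeTests } from '@playwright/test';
import { test as authTest } from './auth.fixture';
import { test as apiTest } from './api.fixture'; // hypothetical second fixture file

// Combine independent fixtures into one test object;
// specs importing `test` from here receive fixtures from both files.
export const test = mergeTests(authTest, apiTest);
export { expect } from '@playwright/test';
```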
|
||||
|
||||
### One Assertion Per Test (Atomic Design)
|
||||
|
||||
Each test should verify exactly one behavior:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the first assertion fails, the test stops and you learn nothing about the remaining checks, and a combined test gives a vague failure signal. Split into separate tests for clear failure diagnosis.
|
||||
|
||||
### Implementation Checklist for DEV
|
||||
|
||||
Maps each failing test to concrete implementation tasks:
|
||||
|
||||
```markdown
|
||||
## Implementation Checklist
|
||||
|
||||
### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add `data-testid` attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Implement error handling
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
```
|
||||
|
||||
Provides clear path from red to green for each test.
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before this workflow:**
|
||||
|
||||
- **framework** workflow: Must run first to establish test framework architecture (Playwright or Cypress config, directory structure, base fixtures)
|
||||
- **test-design** workflow: Optional but recommended for P0-P3 priority alignment and risk assessment context
|
||||
|
||||
**After this workflow:**
|
||||
|
||||
- **DEV agent** implements features guided by failing tests and implementation checklist
|
||||
- **test-review** workflow: Review generated test quality before sharing with DEV team
|
||||
- **automate** workflow: After story completion, expand regression suite with additional edge case coverage
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **Story approval process**: ATDD runs after story is approved but before DEV begins implementation
|
||||
- **Quality gates**: Failing tests serve as acceptance criteria for story completion (all tests must pass)
|
||||
|
||||
## Important Notes
|
||||
|
||||
### ATDD is Test-First, Not Test-After
|
||||
|
||||
**Critical timing**: Tests must be written BEFORE any implementation code. This ensures:
|
||||
|
||||
- Tests define the contract (what needs to be built)
|
||||
- Implementation is guided by tests (no over-engineering)
|
||||
- Tests verify behavior, not implementation details
|
||||
- Confidence in refactoring (tests catch regressions)
|
||||
|
||||
### All Tests Must Fail Initially
|
||||
|
||||
**Red phase verification is mandatory**:
|
||||
|
||||
- Run tests locally after creation to confirm RED phase
|
||||
- Failure should be due to missing implementation, not test bugs
|
||||
- Failure messages should be clear and actionable
|
||||
- Document expected failure messages in ATDD checklist
|
||||
|
||||
If a test passes before implementation, it's not testing the right thing.
|
||||
|
||||
### Use data-testid for Stable Selectors
|
||||
|
||||
**Why data-testid?**
|
||||
|
||||
- CSS classes change frequently (styling refactors)
|
||||
- IDs may not be unique or stable
|
||||
- Text content changes with localization
|
||||
- data-testid is explicit contract between tests and UI
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Stable selector
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// ❌ FRAGILE: Class-based selector
|
||||
await page.click('.btn.btn-primary.login-btn');
|
||||
```
|
||||
|
||||
ATDD checklist includes complete list of required data-testid attributes for DEV team.
|
||||
|
||||
### No Hard Waits or Sleeps
|
||||
|
||||
**Use explicit waits only**:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Explicit wait for condition
|
||||
await page.waitForSelector('[data-testid="user-name"]');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
|
||||
// ❌ WRONG: Hard wait (flaky, slow)
|
||||
await page.waitForTimeout(2000);
|
||||
```
|
||||
|
||||
Playwright's auto-waiting is preferred: `expect()` retries automatically until the assertion passes or the configured timeout is reached.
|
||||
|
||||
### Component Tests for Complex UI Only
|
||||
|
||||
**When to use component tests:**
|
||||
|
||||
- Complex UI interactions (drag-drop, keyboard navigation)
|
||||
- Form validation logic
|
||||
- State management within component
|
||||
- Visual edge cases
|
||||
|
||||
**When NOT to use:**
|
||||
|
||||
- Simple rendering (snapshot tests are sufficient)
|
||||
- Integration with backend (use E2E or API tests)
|
||||
- Full user journeys (use E2E tests)
|
||||
|
||||
Component tests are valuable but should complement, not replace, E2E and API tests.
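
For reference, a component-level test in this setup might look like the following. A minimal sketch using `@playwright/experimental-ct-react` (the library referenced in the knowledge base), with a hypothetical `LoginForm` component and import path:

```tsx
// tests/component/LoginForm.test.tsx
import { test, expect } from '@playwright/experimental-ct-react';
import { LoginForm } from '../../src/components/LoginForm'; // hypothetical component

test('should disable submit button while email is empty', async ({ mount }) => {
  // GIVEN: the form is mounted in isolation
  const component = await mount(<LoginForm />);

  // WHEN: only the password is filled in
  await component.getByTestId('password-input').fill('password123');

  // THEN: the submit button stays disabled
  await expect(component.getByTestId('login-button')).toBeDisabled();
});
```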
|
||||
|
||||
### Auto-Cleanup is Non-Negotiable
|
||||
|
||||
**Every test must clean up its data**:
|
||||
|
||||
- Use fixtures with automatic teardown
|
||||
- Never leave test data in database/storage
|
||||
- Each test should be isolated (no shared state)
|
||||
|
||||
**Cleanup patterns:**
|
||||
|
||||
- Fixtures: Cleanup in teardown function
|
||||
- Factories: Provide deletion helpers
|
||||
- Tests: Use `test.afterEach()` for manual cleanup if needed (see the sketch below)
|
||||
|
||||
Without auto-cleanup, tests become flaky and depend on execution order.
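
When a fixture is not practical, manual cleanup can follow the `test.afterEach()` pattern mentioned above. A minimal sketch, assuming a hypothetical `DELETE /api/users/:id` endpoint for removing test data:

```typescript
import { test } from '@playwright/test';

// Track IDs of records created during a test so they can be removed afterwards.
const createdUserIds: number[] = [];

test.afterEach(async ({ request }) => {
  // Cleanup: delete everything this test created, then reset the list
  for (const id of createdUserIds) {
    await request.delete(`/api/users/${id}`); // hypothetical endpoint
  }
  createdUserIds.length = 0;
});
```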
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's test.extend()
|
||||
- **data-factories.md** - Factory patterns using @faker-js/faker for random test data generation with overrides support
|
||||
- **component-tdd.md** - Component test strategies using Playwright Component Testing (@playwright/experimental-ct-react)
|
||||
- **network-first.md** - Route interception patterns (intercept before navigation to prevent race conditions)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, one assertion per test, determinism, isolation)
|
||||
- **test-levels-framework.md** - Test level selection framework (E2E vs API vs Component vs Unit)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping and additional references.
|
||||
|
||||
## Example Output
|
||||
|
||||
After running this workflow, the ATDD checklist will contain:
|
||||
|
||||
````markdown
|
||||
# ATDD Checklist - Epic 3, Story 5: User Authentication
|
||||
|
||||
## Story Summary
|
||||
|
||||
As a user, I want to log in with email and password so that I can access my personalized dashboard.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
1. User can log in with valid credentials
|
||||
2. User sees error message with invalid credentials
|
||||
3. User is redirected to dashboard after successful login
|
||||
|
||||
## Failing Tests Created (RED Phase)
|
||||
|
||||
### E2E Tests (3 tests)
|
||||
|
||||
- `tests/e2e/user-authentication.spec.ts` (87 lines)
|
||||
- ✅ should log in with valid credentials (RED - missing /login route)
|
||||
- ✅ should display error for invalid credentials (RED - error message not implemented)
|
||||
- ✅ should redirect to dashboard after login (RED - redirect logic missing)
|
||||
|
||||
### API Tests (2 tests)
|
||||
|
||||
- `tests/api/auth.api.spec.ts` (54 lines)
|
||||
- ✅ POST /api/auth/login - should return token for valid credentials (RED - endpoint not implemented)
|
||||
- ✅ POST /api/auth/login - should return 401 for invalid credentials (RED - validation missing)
|
||||
|
||||
## Data Factories Created
|
||||
|
||||
- `tests/support/factories/user.factory.ts` - createUser(), createUsers(count)
|
||||
|
||||
## Fixtures Created
|
||||
|
||||
- `tests/support/fixtures/auth.fixture.ts` - authenticatedUser fixture with auto-cleanup
|
||||
|
||||
## Required data-testid Attributes
|
||||
|
||||
### Login Page
|
||||
|
||||
- `email-input` - Email input field
|
||||
- `password-input` - Password input field
|
||||
- `login-button` - Submit button
|
||||
- `error-message` - Error message container
|
||||
|
||||
### Dashboard Page
|
||||
|
||||
- `user-name` - User name display
|
||||
- `logout-button` - Logout button
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add data-testid attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
### Test: Display Error for Invalid Credentials
|
||||
|
||||
- [ ] Add error state management
|
||||
- [ ] Display error message UI
|
||||
- [ ] Add `data-testid="error-message"`
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
### Test: Redirect to Dashboard After Login
|
||||
|
||||
- [ ] Implement redirect logic after successful auth
|
||||
- [ ] Verify authentication token stored
|
||||
- [ ] Add dashboard route protection
|
||||
- [ ] Run test: `npm run test:e2e -- user-authentication.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests
|
||||
npm run test:e2e
|
||||
|
||||
# Run specific test file
|
||||
npm run test:e2e -- user-authentication.spec.ts
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
npm run test:e2e -- --headed
|
||||
|
||||
# Debug specific test
|
||||
npm run test:e2e -- user-authentication.spec.ts --debug
|
||||
```
|
||||
````
|
||||
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
**RED Phase** (Complete):
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created
|
||||
- ✅ data-testid requirements documented
|
||||
|
||||
**GREEN Phase** (DEV Team - Next Steps):
|
||||
|
||||
1. Pick one failing test from checklist
|
||||
2. Implement minimal code to make it pass
|
||||
3. Run test to verify green
|
||||
4. Check off task in checklist
|
||||
5. Move to next test
|
||||
6. Repeat until all tests pass
|
||||
|
||||
**REFACTOR Phase** (DEV Team - After All Tests Pass):
|
||||
|
||||
1. All tests passing (green)
|
||||
2. Improve code quality (extract functions, optimize)
|
||||
3. Remove duplications
|
||||
4. Ensure tests still pass after each refactor
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Review this checklist with team
|
||||
2. Run failing tests to confirm RED phase: `npm run test:e2e`
|
||||
3. Begin implementation using checklist as guide
|
||||
4. Share progress in daily standup
|
||||
5. When all tests pass, run `bmad sm story-done` to move story to DONE
|
||||
|
||||
This comprehensive checklist guides the DEV team from red to green with clear tasks and validation steps.
|
||||
bmad/bmm/workflows/testarch/atdd/atdd-checklist-template.md (new file, 363 lines)
@@ -0,0 +1,363 @@
|
||||
# ATDD Checklist - Epic {epic_num}, Story {story_num}: {story_title}
|
||||
|
||||
**Date:** {date}
|
||||
**Author:** {user_name}
|
||||
**Primary Test Level:** {primary_level}
|
||||
|
||||
---
|
||||
|
||||
## Story Summary
|
||||
|
||||
{Brief 2-3 sentence summary of the user story}
|
||||
|
||||
**As a** {user_role}
|
||||
**I want** {feature_description}
|
||||
**So that** {business_value}
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
{List all testable acceptance criteria from the story}
|
||||
|
||||
1. {Acceptance criterion 1}
|
||||
2. {Acceptance criterion 2}
|
||||
3. {Acceptance criterion 3}
|
||||
|
||||
---
|
||||
|
||||
## Failing Tests Created (RED Phase)
|
||||
|
||||
### E2E Tests ({e2e_test_count} tests)
|
||||
|
||||
**File:** `{e2e_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each E2E test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
### API Tests ({api_test_count} tests)
|
||||
|
||||
**File:** `{api_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each API test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
### Component Tests ({component_test_count} tests)
|
||||
|
||||
**File:** `{component_test_file_path}` ({line_count} lines)
|
||||
|
||||
{List each component test with its current status and expected failure reason}
|
||||
|
||||
- ✅ **Test:** {test_name}
|
||||
- **Status:** RED - {failure_reason}
|
||||
- **Verifies:** {what_this_test_validates}
|
||||
|
||||
---
|
||||
|
||||
## Data Factories Created
|
||||
|
||||
{List all data factory files created with their exports}
|
||||
|
||||
### {Entity} Factory
|
||||
|
||||
**File:** `tests/support/factories/{entity}.factory.ts`
|
||||
|
||||
**Exports:**
|
||||
|
||||
- `create{Entity}(overrides?)` - Create single entity with optional overrides
|
||||
- `create{Entity}s(count)` - Create array of entities
|
||||
|
||||
**Example Usage:**
|
||||
|
||||
```typescript
|
||||
const user = createUser({ email: 'specific@example.com' });
|
||||
const users = createUsers(5); // Generate 5 random users
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Fixtures Created
|
||||
|
||||
{List all test fixture files created with their fixture names and descriptions}
|
||||
|
||||
### {Feature} Fixtures
|
||||
|
||||
**File:** `tests/support/fixtures/{feature}.fixture.ts`
|
||||
|
||||
**Fixtures:**
|
||||
|
||||
- `{fixtureName}` - {description_of_what_fixture_provides}
|
||||
- **Setup:** {what_setup_does}
|
||||
- **Provides:** {what_test_receives}
|
||||
- **Cleanup:** {what_cleanup_does}
|
||||
|
||||
**Example Usage:**
|
||||
|
||||
```typescript
|
||||
import { test } from './fixtures/{feature}.fixture';
|
||||
|
||||
test('should do something', async ({ {fixtureName} }) => {
|
||||
// {fixtureName} is ready to use with auto-cleanup
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Mock Requirements
|
||||
|
||||
{Document external services that need mocking and their requirements}
|
||||
|
||||
### {Service Name} Mock
|
||||
|
||||
**Endpoint:** `{HTTP_METHOD} {endpoint_url}`
|
||||
|
||||
**Success Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
{success_response_example}
|
||||
}
|
||||
```
|
||||
|
||||
**Failure Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
{failure_response_example}
|
||||
}
|
||||
```
|
||||
|
||||
**Notes:** {any_special_mock_requirements}
|
||||
|
||||
---
|
||||
|
||||
## Required data-testid Attributes
|
||||
|
||||
{List all data-testid attributes required in UI implementation for test stability}
|
||||
|
||||
### {Page or Component Name}
|
||||
|
||||
- `{data-testid-name}` - {description_of_element}
|
||||
- `{data-testid-name}` - {description_of_element}
|
||||
|
||||
**Implementation Example:**
|
||||
|
||||
```tsx
|
||||
<button data-testid="login-button">Log In</button>
|
||||
<input data-testid="email-input" type="email" />
|
||||
<div data-testid="error-message">{errorText}</div>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
{Map each failing test to concrete implementation tasks that will make it pass}
|
||||
|
||||
### Test: {test_name_1}
|
||||
|
||||
**File:** `{test_file_path}`
|
||||
|
||||
**Tasks to make this test pass:**
|
||||
|
||||
- [ ] {Implementation task 1}
|
||||
- [ ] {Implementation task 2}
|
||||
- [ ] {Implementation task 3}
|
||||
- [ ] Add required data-testid attributes: {list_of_testids}
|
||||
- [ ] Run test: `{test_execution_command}`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
**Estimated Effort:** {effort_estimate} hours
|
||||
|
||||
---
|
||||
|
||||
### Test: {test_name_2}
|
||||
|
||||
**File:** `{test_file_path}`
|
||||
|
||||
**Tasks to make this test pass:**
|
||||
|
||||
- [ ] {Implementation task 1}
|
||||
- [ ] {Implementation task 2}
|
||||
- [ ] {Implementation task 3}
|
||||
- [ ] Add required data-testid attributes: {list_of_testids}
|
||||
- [ ] Run test: `{test_execution_command}`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
**Estimated Effort:** {effort_estimate} hours
|
||||
|
||||
---
|
||||
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests for this story
|
||||
{test_command_all}
|
||||
|
||||
# Run specific test file
|
||||
{test_command_specific_file}
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
{test_command_headed}
|
||||
|
||||
# Debug specific test
|
||||
{test_command_debug}
|
||||
|
||||
# Run tests with coverage
|
||||
{test_command_coverage}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
### RED Phase (Complete) ✅
|
||||
|
||||
**TEA Agent Responsibilities:**
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created with auto-cleanup
|
||||
- ✅ Mock requirements documented
|
||||
- ✅ data-testid requirements listed
|
||||
- ✅ Implementation checklist created
|
||||
|
||||
**Verification:**
|
||||
|
||||
- All tests run and fail as expected
|
||||
- Failure messages are clear and actionable
|
||||
- Tests fail due to missing implementation, not test bugs
|
||||
|
||||
---
|
||||
|
||||
### GREEN Phase (DEV Team - Next Steps)
|
||||
|
||||
**DEV Agent Responsibilities:**
|
||||
|
||||
1. **Pick one failing test** from implementation checklist (start with highest priority)
|
||||
2. **Read the test** to understand expected behavior
|
||||
3. **Implement minimal code** to make that specific test pass
|
||||
4. **Run the test** to verify it now passes (green)
|
||||
5. **Check off the task** in implementation checklist
|
||||
6. **Move to next test** and repeat
|
||||
|
||||
**Key Principles:**
|
||||
|
||||
- One test at a time (don't try to fix all at once)
|
||||
- Minimal implementation (don't over-engineer)
|
||||
- Run tests frequently (immediate feedback)
|
||||
- Use implementation checklist as roadmap
|
||||
|
||||
**Progress Tracking:**
|
||||
|
||||
- Check off tasks as you complete them
|
||||
- Share progress in daily standup
|
||||
- Mark story as IN PROGRESS in `bmm-workflow-status.md`
|
||||
|
||||
---
|
||||
|
||||
### REFACTOR Phase (DEV Team - After All Tests Pass)
|
||||
|
||||
**DEV Agent Responsibilities:**
|
||||
|
||||
1. **Verify all tests pass** (green phase complete)
|
||||
2. **Review code for quality** (readability, maintainability, performance)
|
||||
3. **Extract duplications** (DRY principle)
|
||||
4. **Optimize performance** (if needed)
|
||||
5. **Ensure tests still pass** after each refactor
|
||||
6. **Update documentation** (if API contracts change)
|
||||
|
||||
**Key Principles:**
|
||||
|
||||
- Tests provide safety net (refactor with confidence)
|
||||
- Make small refactors (easier to debug if tests fail)
|
||||
- Run tests after each change
|
||||
- Don't change test behavior (only implementation)
|
||||
|
||||
**Completion:**
|
||||
|
||||
- All tests pass
|
||||
- Code quality meets team standards
|
||||
- No duplications or code smells
|
||||
- Ready for code review and story approval
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Review this checklist** with team in standup or planning
|
||||
2. **Run failing tests** to confirm RED phase: `{test_command_all}`
|
||||
3. **Begin implementation** using implementation checklist as guide
|
||||
4. **Work one test at a time** (red → green for each)
|
||||
5. **Share progress** in daily standup
|
||||
6. **When all tests pass**, refactor code for quality
|
||||
7. **When refactoring complete**, run `bmad sm story-done` to move story to DONE
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References Applied
|
||||
|
||||
This ATDD workflow consulted the following knowledge fragments:
|
||||
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's `test.extend()`
|
||||
- **data-factories.md** - Factory patterns using `@faker-js/faker` for random test data generation with overrides support
|
||||
- **component-tdd.md** - Component test strategies using Playwright Component Testing
|
||||
- **network-first.md** - Route interception patterns (intercept BEFORE navigation to prevent race conditions)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, one assertion per test, determinism, isolation)
|
||||
- **test-levels-framework.md** - Test level selection framework (E2E vs API vs Component vs Unit)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
---
|
||||
|
||||
## Test Execution Evidence
|
||||
|
||||
### Initial Test Run (RED Phase Verification)
|
||||
|
||||
**Command:** `{test_command_all}`
|
||||
|
||||
**Results:**
|
||||
|
||||
```
|
||||
{paste_test_run_output_showing_all_tests_failing}
|
||||
```
|
||||
|
||||
**Summary:**
|
||||
|
||||
- Total tests: {total_test_count}
|
||||
- Passing: 0 (expected)
|
||||
- Failing: {total_test_count} (expected)
|
||||
- Status: ✅ RED phase verified
|
||||
|
||||
**Expected Failure Messages:**
|
||||
{list_expected_failure_messages_for_each_test}
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
{Any additional notes, context, or special considerations for this story}
|
||||
|
||||
- {Note 1}
|
||||
- {Note 2}
|
||||
- {Note 3}
|
||||
|
||||
---
|
||||
|
||||
## Contact
|
||||
|
||||
**Questions or Issues?**
|
||||
|
||||
- Ask in team standup
|
||||
- Tag @{tea_agent_username} in Slack/Discord
|
||||
- Refer to `testarch/README.md` for workflow documentation
|
||||
- Consult `testarch/knowledge/` for testing best practices
|
||||
|
||||
---
|
||||
|
||||
**Generated by BMad TEA Agent** - {date}
|
||||
bmad/bmm/workflows/testarch/atdd/checklist.md (new file, 373 lines)
@@ -0,0 +1,373 @@
|
||||
# ATDD Workflow Validation Checklist
|
||||
|
||||
Use this checklist to validate that the ATDD workflow has been executed correctly and all deliverables meet quality standards.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting this workflow, verify:
|
||||
|
||||
- [ ] Story approved with clear acceptance criteria (AC must be testable)
|
||||
- [ ] Development sandbox/environment ready
|
||||
- [ ] Framework scaffolding exists (run `framework` workflow if missing)
|
||||
- [ ] Test framework configuration available (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Package.json has test dependencies installed (Playwright or Cypress)
|
||||
|
||||
**Halt if missing:** Framework scaffolding or story acceptance criteria
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Story Context and Requirements
|
||||
|
||||
- [ ] Story markdown file loaded and parsed successfully
|
||||
- [ ] All acceptance criteria identified and extracted
|
||||
- [ ] Affected systems and components identified
|
||||
- [ ] Technical constraints documented
|
||||
- [ ] Framework configuration loaded (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Test directory structure identified from config
|
||||
- [ ] Existing fixture patterns reviewed for consistency
|
||||
- [ ] Similar test patterns searched and found in `{test_dir}`
|
||||
- [ ] Knowledge base fragments loaded:
|
||||
- [ ] `fixture-architecture.md`
|
||||
- [ ] `data-factories.md`
|
||||
- [ ] `component-tdd.md`
|
||||
- [ ] `network-first.md`
|
||||
- [ ] `test-quality.md`
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Test Level Selection and Strategy
|
||||
|
||||
- [ ] Each acceptance criterion analyzed for appropriate test level
|
||||
- [ ] Test level selection framework applied (E2E vs API vs Component vs Unit)
|
||||
- [ ] E2E tests: Critical user journeys and multi-system integration identified
|
||||
- [ ] API tests: Business logic and service contracts identified
|
||||
- [ ] Component tests: UI component behavior and interactions identified
|
||||
- [ ] Unit tests: Pure logic and edge cases identified (if applicable)
|
||||
- [ ] Duplicate coverage avoided (same behavior not tested at multiple levels unnecessarily)
|
||||
- [ ] Tests prioritized using P0-P3 framework (if test-design document exists)
|
||||
- [ ] Primary test level set in `primary_level` variable (typically E2E or API)
|
||||
- [ ] Test levels documented in ATDD checklist
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Failing Tests Generated
|
||||
|
||||
### Test File Structure Created
|
||||
|
||||
- [ ] Test files organized in appropriate directories:
|
||||
- [ ] `tests/e2e/` for end-to-end tests
|
||||
- [ ] `tests/api/` for API tests
|
||||
- [ ] `tests/component/` for component tests
|
||||
- [ ] `tests/support/` for infrastructure (fixtures, factories, helpers)
|
||||
|
||||
### E2E Tests (If Applicable)
|
||||
|
||||
- [ ] E2E test files created in `tests/e2e/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] Tests use `data-testid` selectors (not CSS classes or fragile selectors)
|
||||
- [ ] One assertion per test (atomic test design)
|
||||
- [ ] No hard waits or sleeps (explicit waits only)
|
||||
- [ ] Network-first pattern applied (route interception BEFORE navigation)
|
||||
- [ ] Tests fail initially (RED phase verified by local test run)
|
||||
- [ ] Failure messages are clear and actionable
|
||||
|
||||
### API Tests (If Applicable)
|
||||
|
||||
- [ ] API test files created in `tests/api/`
|
||||
- [ ] Tests follow Given-When-Then format
|
||||
- [ ] API contracts validated (request/response structure)
|
||||
- [ ] HTTP status codes verified
|
||||
- [ ] Response body validation includes all required fields
|
||||
- [ ] Error cases tested (400, 401, 403, 404, 500)
|
||||
- [ ] Tests fail initially (RED phase verified)
|
||||
|
||||
### Component Tests (If Applicable)
|
||||
|
||||
- [ ] Component test files created in `tests/component/`
|
||||
- [ ] Tests follow Given-When-Then format
|
||||
- [ ] Component mounting works correctly
|
||||
- [ ] Interaction testing covers user actions (click, hover, keyboard)
|
||||
- [ ] State management within component validated
|
||||
- [ ] Props and events tested
|
||||
- [ ] Tests fail initially (RED phase verified)
|
||||
|
||||
### Test Quality Validation
|
||||
|
||||
- [ ] All tests use Given-When-Then structure with clear comments
|
||||
- [ ] All tests have descriptive names explaining what they test
|
||||
- [ ] No duplicate tests (same behavior tested multiple times)
|
||||
- [ ] No flaky patterns (race conditions, timing issues)
|
||||
- [ ] No test interdependencies (tests can run in any order)
|
||||
- [ ] Tests are deterministic (same input always produces same result)
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Data Infrastructure Built
|
||||
|
||||
### Data Factories Created
|
||||
|
||||
- [ ] Factory files created in `tests/support/factories/`
|
||||
- [ ] All factories use `@faker-js/faker` for random data generation (no hardcoded values)
|
||||
- [ ] Factories support overrides for specific test scenarios
|
||||
- [ ] Factories generate complete valid objects matching API contracts
|
||||
- [ ] Helper functions for bulk creation provided (e.g., `createUsers(count)`)
|
||||
- [ ] Factory exports are properly typed (TypeScript)
|
||||
|
||||
### Test Fixtures Created
|
||||
|
||||
- [ ] Fixture files created in `tests/support/fixtures/`
|
||||
- [ ] All fixtures use Playwright's `test.extend()` pattern
|
||||
- [ ] Fixtures have setup phase (arrange test preconditions)
|
||||
- [ ] Fixtures provide data to tests via `await use(data)`
|
||||
- [ ] Fixtures have teardown phase with auto-cleanup (delete created data)
|
||||
- [ ] Fixtures are composable (can use other fixtures if needed)
|
||||
- [ ] Fixtures are isolated (each test gets fresh data)
|
||||
- [ ] Fixtures are type-safe (TypeScript types defined)
|
||||
|
||||
### Mock Requirements Documented
|
||||
|
||||
- [ ] External service mocking requirements identified
|
||||
- [ ] Mock endpoints documented with URLs and methods
|
||||
- [ ] Success response examples provided
|
||||
- [ ] Failure response examples provided
|
||||
- [ ] Mock requirements documented in ATDD checklist for DEV team
|
||||
|
||||
### data-testid Requirements Listed
|
||||
|
||||
- [ ] All required data-testid attributes identified from E2E tests
|
||||
- [ ] data-testid list organized by page or component
|
||||
- [ ] Each data-testid has clear description of element it targets
|
||||
- [ ] data-testid list included in ATDD checklist for DEV team
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Implementation Checklist Created
|
||||
|
||||
- [ ] Implementation checklist created with clear structure
|
||||
- [ ] Each failing test mapped to concrete implementation tasks
|
||||
- [ ] Tasks include:
|
||||
- [ ] Route/component creation
|
||||
- [ ] Business logic implementation
|
||||
- [ ] API integration
|
||||
- [ ] data-testid attribute additions
|
||||
- [ ] Error handling
|
||||
- [ ] Test execution command
|
||||
- [ ] Completion checkbox
|
||||
- [ ] Red-Green-Refactor workflow documented in checklist
|
||||
- [ ] RED phase marked as complete (TEA responsibility)
|
||||
- [ ] GREEN phase tasks listed for DEV team
|
||||
- [ ] REFACTOR phase guidance provided
|
||||
- [ ] Execution commands provided:
|
||||
- [ ] Run all tests: `npm run test:e2e`
|
||||
- [ ] Run specific test file
|
||||
- [ ] Run in headed mode
|
||||
- [ ] Debug specific test
|
||||
- [ ] Estimated effort included (hours or story points)
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Deliverables Generated
|
||||
|
||||
### ATDD Checklist Document Created
|
||||
|
||||
- [ ] Output file created at `{output_folder}/atdd-checklist-{story_id}.md`
|
||||
- [ ] Document follows template structure from `atdd-checklist-template.md`
|
||||
- [ ] Document includes all required sections:
|
||||
- [ ] Story summary
|
||||
- [ ] Acceptance criteria breakdown
|
||||
- [ ] Failing tests created (paths and line counts)
|
||||
- [ ] Data factories created
|
||||
- [ ] Fixtures created
|
||||
- [ ] Mock requirements
|
||||
- [ ] Required data-testid attributes
|
||||
- [ ] Implementation checklist
|
||||
- [ ] Red-green-refactor workflow
|
||||
- [ ] Execution commands
|
||||
- [ ] Next steps for DEV team
|
||||
|
||||
### All Tests Verified to Fail (RED Phase)
|
||||
|
||||
- [ ] Full test suite run locally before finalizing
|
||||
- [ ] All tests fail as expected (RED phase confirmed)
|
||||
- [ ] No tests passing before implementation (if passing, test is invalid)
|
||||
- [ ] Failure messages documented in ATDD checklist
|
||||
- [ ] Failures are due to missing implementation, not test bugs
|
||||
- [ ] Test run output captured for reference
|
||||
|
||||
### Summary Provided
|
||||
|
||||
- [ ] Summary includes:
|
||||
- [ ] Story ID
|
||||
- [ ] Primary test level
|
||||
- [ ] Test counts (E2E, API, Component)
|
||||
- [ ] Test file paths
|
||||
- [ ] Factory count
|
||||
- [ ] Fixture count
|
||||
- [ ] Mock requirements count
|
||||
- [ ] data-testid count
|
||||
- [ ] Implementation task count
|
||||
- [ ] Estimated effort
|
||||
- [ ] Next steps for DEV team
|
||||
- [ ] Output file path
|
||||
- [ ] Knowledge base references applied
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Test Design Quality
|
||||
|
||||
- [ ] Tests are readable (clear Given-When-Then structure)
|
||||
- [ ] Tests are maintainable (use factories and fixtures, not hardcoded data)
|
||||
- [ ] Tests are isolated (no shared state between tests)
|
||||
- [ ] Tests are deterministic (no race conditions or flaky patterns)
|
||||
- [ ] Tests are atomic (one assertion per test)
|
||||
- [ ] Tests are fast (no unnecessary waits or delays)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] fixture-architecture.md patterns applied to all fixtures
|
||||
- [ ] data-factories.md patterns applied to all factories
|
||||
- [ ] network-first.md patterns applied to E2E tests with network requests
|
||||
- [ ] component-tdd.md patterns applied to component tests
|
||||
- [ ] test-quality.md principles applied to all test design
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] All TypeScript types are correct and complete
|
||||
- [ ] No linting errors in generated test files
|
||||
- [ ] Consistent naming conventions followed
|
||||
- [ ] Imports are organized and correct
|
||||
- [ ] Code follows project style guide
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With DEV Agent
|
||||
|
||||
- [ ] ATDD checklist provides clear implementation guidance
|
||||
- [ ] Implementation tasks are granular and actionable
|
||||
- [ ] data-testid requirements are complete and clear
|
||||
- [ ] Mock requirements include all necessary details
|
||||
- [ ] Execution commands work correctly
|
||||
|
||||
### With Story Workflow
|
||||
|
||||
- [ ] Story ID correctly referenced in output files
|
||||
- [ ] Acceptance criteria from story accurately reflected in tests
|
||||
- [ ] Technical constraints from story considered in test design
|
||||
|
||||
### With Framework Workflow
|
||||
|
||||
- [ ] Test framework configuration correctly detected and used
|
||||
- [ ] Directory structure matches framework setup
|
||||
- [ ] Fixtures and helpers follow established patterns
|
||||
- [ ] Naming conventions consistent with framework standards
|
||||
|
||||
### With test-design Workflow (If Available)
|
||||
|
||||
- [ ] P0 scenarios from test-design prioritized in ATDD
|
||||
- [ ] Risk assessment from test-design considered in test coverage
|
||||
- [ ] Coverage strategy from test-design aligned with ATDD tests
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
All of the following must be true before marking this workflow as complete:
|
||||
|
||||
- [ ] **Story acceptance criteria analyzed** and mapped to appropriate test levels
|
||||
- [ ] **Failing tests created** at all appropriate levels (E2E, API, Component)
|
||||
- [ ] **Given-When-Then format** used consistently across all tests
|
||||
- [ ] **RED phase verified** by local test run (all tests failing as expected)
|
||||
- [ ] **Network-first pattern** applied to E2E tests with network requests
|
||||
- [ ] **Data factories created** using faker (no hardcoded test data)
|
||||
- [ ] **Fixtures created** with auto-cleanup in teardown
|
||||
- [ ] **Mock requirements documented** for external services
|
||||
- [ ] **data-testid attributes listed** for DEV team
|
||||
- [ ] **Implementation checklist created** mapping tests to code tasks
|
||||
- [ ] **Red-green-refactor workflow documented** in ATDD checklist
|
||||
- [ ] **Execution commands provided** and verified to work
|
||||
- [ ] **ATDD checklist document created** and saved to correct location
|
||||
- [ ] **Output file formatted correctly** using template structure
|
||||
- [ ] **Knowledge base references applied** and documented in summary
|
||||
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data)
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Resolutions
|
||||
|
||||
### Issue: Tests pass before implementation
|
||||
|
||||
**Problem:** A test passes even though no implementation code exists yet.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Review test to ensure it's testing actual behavior, not mocked/stubbed behavior
|
||||
- Check if test is accidentally using existing functionality
|
||||
- Verify test assertions are correct and meaningful
|
||||
- Rewrite test to fail until implementation is complete
|
||||
|
||||
### Issue: Network-first pattern not applied
|
||||
|
||||
**Problem:** Route interception happens after navigation, causing race conditions.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Move `await page.route()` calls BEFORE `await page.goto()`
|
||||
- Review `network-first.md` knowledge fragment
|
||||
- Update all E2E tests to follow network-first pattern
|
||||
|
||||
### Issue: Hardcoded test data in tests
|
||||
|
||||
**Problem:** Tests use hardcoded strings/numbers instead of factories.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Replace all hardcoded data with factory function calls
|
||||
- Use `faker` for all random data generation
|
||||
- Update data-factories to support all required test scenarios
|
||||
|
||||
### Issue: Fixtures missing auto-cleanup
|
||||
|
||||
**Problem:** Fixtures create data but don't clean it up in teardown.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Add cleanup logic after `await use(data)` in fixture
|
||||
- Call deletion/cleanup functions in teardown
|
||||
- Verify cleanup works by checking database/storage after test run
|
||||
|
||||
### Issue: Tests have multiple assertions
|
||||
|
||||
**Problem:** Tests verify multiple behaviors in single test (not atomic).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Split into separate tests (one assertion per test)
|
||||
- Each test should verify exactly one behavior
|
||||
- Use descriptive test names to clarify what each test verifies
|
||||
|
||||
### Issue: Tests depend on execution order
|
||||
|
||||
**Problem:** Tests fail when run in isolation or different order.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove shared state between tests
|
||||
- Each test should create its own test data
|
||||
- Use fixtures for consistent setup across tests
|
||||
- Verify tests can run with `.only` flag
|
||||
|
||||

---

## Notes for TEA Agent

- **Preflight halt is critical:** Do not proceed if story has no acceptance criteria or framework is missing
- **RED phase verification is mandatory:** Tests must fail before sharing with DEV team
- **Network-first pattern:** Route interception BEFORE navigation prevents race conditions
- **One assertion per test:** Atomic tests provide clear failure diagnosis
- **Auto-cleanup is non-negotiable:** Every fixture must clean up data in teardown
- **Use knowledge base:** Load relevant fragments (fixture-architecture, data-factories, network-first, component-tdd, test-quality) for guidance
- **Share with DEV agent:** ATDD checklist provides implementation roadmap from red to green
785
bmad/bmm/workflows/testarch/atdd/instructions.md
Normal file
@@ -0,0 +1,785 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Acceptance Test-Driven Development (ATDD)
|
||||
|
||||
**Workflow ID**: `bmad/bmm/testarch/atdd`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Generates failing acceptance tests BEFORE implementation following TDD's red-green-refactor cycle. This workflow creates comprehensive test coverage at appropriate levels (E2E, API, Component) with supporting infrastructure (fixtures, factories, mocks) and provides an implementation checklist to guide development.
|
||||
|
||||
**Core Principle**: Tests fail first (red phase), then guide development to green, then enable confident refactoring.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Story approved with clear acceptance criteria
|
||||
- ✅ Development sandbox/environment ready
|
||||
- ✅ Framework scaffolding exists (run `framework` workflow if missing)
|
||||
- ✅ Test framework configuration available (playwright.config.ts or cypress.config.ts)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Load Story Context and Requirements
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Read Story Markdown**
|
||||
- Load story file from `{story_file}` variable
|
||||
- Extract acceptance criteria (all testable requirements)
|
||||
- Identify affected systems and components
|
||||
- Note any technical constraints or dependencies
|
||||
|
||||
2. **Load Framework Configuration**
|
||||
- Read framework config (playwright.config.ts or cypress.config.ts)
|
||||
- Identify test directory structure
|
||||
- Check existing fixture patterns
|
||||
- Note test runner capabilities
|
||||
|
||||
3. **Load Existing Test Patterns**
|
||||
- Search `{test_dir}` for similar tests
|
||||
- Identify reusable fixtures and helpers
|
||||
- Check data factory patterns
|
||||
- Note naming conventions
|
||||
|
||||
4. **Load Knowledge Base Fragments**
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to load:
|
||||
- `fixture-architecture.md` - Test fixture patterns with auto-cleanup (pure function → fixture → mergeTests composition, 406 lines, 5 examples)
|
||||
- `data-factories.md` - Factory patterns using faker (override patterns, nested factories, API seeding, 498 lines, 5 examples)
|
||||
- `component-tdd.md` - Component test strategies (red-green-refactor, provider isolation, accessibility, visual regression, 480 lines, 4 examples)
|
||||
- `network-first.md` - Route interception patterns (intercept before navigate, HAR capture, deterministic waiting, 489 lines, 5 examples)
|
||||
- `test-quality.md` - Test design principles (deterministic tests, isolated with cleanup, explicit assertions, length limits, execution time optimization, 658 lines, 5 examples)
|
||||
- `test-healing-patterns.md` - Common failure patterns and healing strategies (stale selectors, race conditions, dynamic data, network errors, hard waits, 648 lines, 5 examples)
|
||||
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, dynamic patterns, anti-patterns, 541 lines, 4 examples)
|
||||
- `timing-debugging.md` - Race condition prevention and async debugging (network-first, deterministic waiting, anti-patterns, 370 lines, 3 examples)
|
||||
|
||||
**Halt Condition:** If story has no acceptance criteria or framework is missing, HALT with message: "ATDD requires clear acceptance criteria and test framework setup"
|
||||
|
||||
---
|
||||
|
||||
## Step 1.5: Generation Mode Selection (NEW - Phase 2.5)
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Detect Generation Mode**
|
||||
|
||||
Determine mode based on scenario complexity:
|
||||
|
||||
**AI Generation Mode (DEFAULT)**:
|
||||
- Clear acceptance criteria with standard patterns
|
||||
- Uses: AI-generated tests from requirements
|
||||
- Appropriate for: CRUD, auth, navigation, API tests
|
||||
- Fastest approach
|
||||
|
||||
**Recording Mode (OPTIONAL - Complex UI)**:
|
||||
- Complex UI interactions (drag-drop, wizards, multi-page flows)
|
||||
- Uses: Interactive test recording with Playwright MCP
|
||||
- Appropriate for: Visual workflows, unclear requirements
|
||||
- Only if config.tea_use_mcp_enhancements is true AND MCP available
|
||||
|
||||
2. **AI Generation Mode (DEFAULT - Continue to Step 2)**
|
||||
|
||||
For standard scenarios:
|
||||
- Continue with existing workflow (Step 2: Select Test Levels and Strategy)
|
||||
- AI generates tests based on acceptance criteria from Step 1
|
||||
- Use knowledge base patterns for test structure
|
||||
|
||||
3. **Recording Mode (OPTIONAL - Complex UI Only)**
|
||||
|
||||
For complex UI scenarios AND config.tea_use_mcp_enhancements is true:
|
||||
|
||||
**A. Check MCP Availability**
|
||||
|
||||
If Playwright MCP tools are available in your IDE:
|
||||
- Use MCP recording mode (Step 3.B)
|
||||
|
||||
If MCP unavailable:
|
||||
- Fallback to AI generation mode (silent, automatic)
|
||||
- Continue to Step 2
|
||||
|
||||
**B. Interactive Test Recording (MCP-Based)**
|
||||
|
||||
Use Playwright MCP test-generator tools:
|
||||
|
||||
**Setup:**
|
||||
|
||||
```
|
||||
1. Use generator_setup_page to initialize recording session
|
||||
2. Navigate to application starting URL (from story context)
|
||||
3. Ready to record user interactions
|
||||
```
|
||||
|
||||
**Recording Process (Per Acceptance Criterion):**
|
||||
|
||||
```
|
||||
4. Read acceptance criterion from story
|
||||
5. Manually execute test scenario using browser_* tools:
|
||||
- browser_navigate: Navigate to pages
|
||||
- browser_click: Click buttons, links, elements
|
||||
- browser_type: Fill form fields
|
||||
- browser_select: Select dropdown options
|
||||
- browser_check: Check/uncheck checkboxes
|
||||
6. Add verification steps using browser_verify_* tools:
|
||||
- browser_verify_text: Verify text content
|
||||
- browser_verify_visible: Verify element visibility
|
||||
- browser_verify_url: Verify URL navigation
|
||||
7. Capture interaction log with generator_read_log
|
||||
8. Generate test file with generator_write_test
|
||||
9. Repeat for next acceptance criterion
|
||||
```
|
||||
|
||||
**Post-Recording Enhancement:**
|
||||
|
||||
```
|
||||
10. Review generated test code
|
||||
11. Enhance with knowledge base patterns:
|
||||
- Add Given-When-Then comments
|
||||
- Replace recorded selectors with data-testid (if needed)
|
||||
- Add network-first interception (from network-first.md)
|
||||
- Add fixtures for auth/data setup (from fixture-architecture.md)
|
||||
- Use factories for test data (from data-factories.md)
|
||||
12. Verify tests fail (missing implementation)
|
||||
13. Continue to Step 4 (Build Data Infrastructure)
|
||||
```
|
||||
|
||||
**When to Use Recording Mode:**
|
||||
- ✅ Complex UI interactions (drag-drop, multi-step forms, wizards)
|
||||
- ✅ Visual workflows (modals, dialogs, animations)
|
||||
- ✅ Unclear requirements (exploratory, discovering expected behavior)
|
||||
- ✅ Multi-page flows (checkout, registration, onboarding)
|
||||
- ❌ NOT for simple CRUD (AI generation faster)
|
||||
- ❌ NOT for API-only tests (no UI to record)
|
||||
|
||||
**When to Use AI Generation (Default):**
|
||||
- ✅ Clear acceptance criteria available
|
||||
- ✅ Standard patterns (login, CRUD, navigation)
|
||||
- ✅ Need many tests quickly
|
||||
- ✅ API/backend tests (no UI interaction)
|
||||
|
||||
4. **Proceed to Test Level Selection**
|
||||
|
||||
After mode selection:
|
||||
- AI Generation: Continue to Step 2 (Select Test Levels and Strategy)
|
||||
- Recording: Skip to Step 4 (Build Data Infrastructure) - tests already generated
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Select Test Levels and Strategy
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Analyze Acceptance Criteria**
|
||||
|
||||
For each acceptance criterion, determine:
|
||||
- Does it require full user journey? → E2E test
|
||||
- Does it test business logic/API contract? → API test
|
||||
- Does it validate UI component behavior? → Component test
|
||||
- Can it be unit tested? → Unit test
|
||||
|
||||
2. **Apply Test Level Selection Framework**
|
||||
|
||||
**Knowledge Base Reference**: `test-levels-framework.md`
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
- Critical user journeys (login, checkout, core workflow)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- **Characteristics**: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
- Business logic validation
|
||||
- Service contracts
|
||||
- Data transformations
|
||||
- **Characteristics**: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing
|
||||
- Visual regression
|
||||
- **Characteristics**: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
- Pure business logic
|
||||
- Edge cases
|
||||
- Error handling
|
||||
- **Characteristics**: Fastest, most granular
|
||||
|
||||
3. **Avoid Duplicate Coverage**
|
||||
|
||||
Don't test same behavior at multiple levels unless necessary:
|
||||
- Use E2E for critical happy path only
|
||||
- Use API tests for complex business logic variations
|
||||
- Use component tests for UI interaction edge cases
|
||||
- Use unit tests for pure logic edge cases
|
||||
|
||||
4. **Prioritize Tests**
|
||||
|
||||
If test-design document exists, align with priority levels:
|
||||
- P0 scenarios → Must cover in failing tests
|
||||
- P1 scenarios → Should cover if time permits
|
||||
- P2/P3 scenarios → Optional for this iteration
|
||||
|
||||
**Decision Point:** Set `primary_level` variable to main test level for this story (typically E2E or API)
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Generate Failing Tests
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Test File Structure**
|
||||
|
||||
```
|
||||
tests/
|
||||
├── e2e/
|
||||
│ └── {feature-name}.spec.ts # E2E acceptance tests
|
||||
├── api/
|
||||
│ └── {feature-name}.api.spec.ts # API contract tests
|
||||
├── component/
|
||||
│ └── {ComponentName}.test.tsx # Component tests
|
||||
└── support/
|
||||
├── fixtures/ # Test fixtures
|
||||
├── factories/ # Data factories
|
||||
└── helpers/ # Utility functions
|
||||
```
|
||||
|
||||
2. **Write Failing E2E Tests (If Applicable)**
|
||||
|
||||
**Use Given-When-Then format:**
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('User Login', () => {
|
||||
test('should display error for invalid credentials', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits invalid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'invalid@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'wrongpassword');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: Error message is displayed
|
||||
await expect(page.locator('[data-testid="error-message"]')).toHaveText('Invalid email or password');
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Critical patterns:**
|
||||
- One assertion per test (atomic tests)
|
||||
- Explicit waits (no hard waits/sleeps)
|
||||
- Network-first approach (route interception before navigation)
|
||||
- data-testid selectors for stability
|
||||
- Clear Given-When-Then structure
|
||||
|
||||
3. **Apply Network-First Pattern**
|
||||
|
||||
**Knowledge Base Reference**: `network-first.md`
|
||||
|
||||
```typescript
|
||||
test('should load user dashboard after login', async ({ page }) => {
|
||||
// CRITICAL: Intercept routes BEFORE navigation
|
||||
await page.route('**/api/user', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify({ id: 1, name: 'Test User' }),
|
||||
}),
|
||||
);
|
||||
|
||||
// NOW navigate
|
||||
await page.goto('/dashboard');
|
||||
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('Test User');
|
||||
});
|
||||
```
|
||||
|
||||
4. **Write Failing API Tests (If Applicable)**
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test.describe('User API', () => {
|
||||
test('POST /api/users - should create new user', async ({ request }) => {
|
||||
// GIVEN: Valid user data
|
||||
const userData = {
|
||||
email: 'newuser@example.com',
|
||||
name: 'New User',
|
||||
};
|
||||
|
||||
// WHEN: Creating user via API
|
||||
const response = await request.post('/api/users', {
|
||||
data: userData,
|
||||
});
|
||||
|
||||
// THEN: User is created successfully
|
||||
expect(response.status()).toBe(201);
|
||||
const body = await response.json();
|
||||
expect(body).toMatchObject({
|
||||
email: userData.email,
|
||||
name: userData.name,
|
||||
id: expect.any(Number),
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
5. **Write Failing Component Tests (If Applicable)**
|
||||
|
||||
**Knowledge Base Reference**: `component-tdd.md`
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/experimental-ct-react';
|
||||
import { LoginForm } from './LoginForm';
|
||||
|
||||
test.describe('LoginForm Component', () => {
|
||||
test('should disable submit button when fields are empty', async ({ mount }) => {
|
||||
// GIVEN: LoginForm is mounted
|
||||
const component = await mount(<LoginForm />);
|
||||
|
||||
// WHEN: Form is initially rendered
|
||||
const submitButton = component.locator('button[type="submit"]');
|
||||
|
||||
// THEN: Submit button is disabled
|
||||
await expect(submitButton).toBeDisabled();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
6. **Verify Tests Fail Initially**
|
||||
|
||||
**Critical verification:**
|
||||
- Run tests locally to confirm they fail
|
||||
- Failure should be due to missing implementation, not test errors
|
||||
- Failure messages should be clear and actionable
|
||||
- All tests must be in RED phase before sharing with DEV
|
||||
|
||||
**Important:** Tests MUST fail initially. If a test passes before implementation, it's not a valid acceptance test.
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Build Data Infrastructure
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Data Factories**
|
||||
|
||||
**Knowledge Base Reference**: `data-factories.md`
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
```
|
||||
|
||||
**Factory principles:**
|
||||
- Use faker for random data (no hardcoded values)
|
||||
- Support overrides for specific scenarios
|
||||
- Generate complete valid objects
|
||||
- Include helper functions for bulk creation
|
||||
|
||||
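
Usage in a test then looks like this (a minimal sketch; the override field is scenario-specific):

```typescript
const admin = createUser({ name: 'Admin User' }); // override only what the scenario needs
const reviewers = createUsers(5); // bulk creation for list/paging scenarios
```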
2. **Create Test Fixtures**
|
||||
|
||||
**Knowledge Base Reference**: `fixture-architecture.md`
|
||||
|
||||
```typescript
|
||||
// tests/support/fixtures/auth.fixture.ts
|
||||
import { test as base } from '@playwright/test';
// Assumed location of the factory helpers; adjust the import path to your project
import { createUser, deleteUser } from '../factories/user.factory';

export const test = base.extend({
|
||||
authenticatedUser: async ({ page }, use) => {
|
||||
// Setup: Create and authenticate user
|
||||
const user = await createUser();
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', 'password123');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await page.waitForURL('/dashboard');
|
||||
|
||||
// Provide to test
|
||||
await use(user);
|
||||
|
||||
// Cleanup: Delete user
|
||||
await deleteUser(user.id);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Fixture principles:**
|
||||
- Auto-cleanup (always delete created data)
|
||||
- Composable (fixtures can use other fixtures)
|
||||
- Isolated (each test gets fresh data)
|
||||
- Type-safe
|
||||
|
||||
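
Composability in practice usually means merging fixture modules with Playwright's `mergeTests`. A minimal sketch (the second fixture module is hypothetical):

```typescript
// tests/support/fixtures/index.ts (illustrative path)
import { mergeTests } from '@playwright/test';
import { test as authTest } from './auth.fixture';
import { test as dataTest } from './data.fixture'; // hypothetical second fixture module

// One merged `test` object exposes every fixture to the spec files
export const test = mergeTests(authTest, dataTest);
export { expect } from '@playwright/test';
```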
3. **Document Mock Requirements**
|
||||
|
||||
If external services need mocking, document requirements:
|
||||
|
||||
```markdown
|
||||
### Mock Requirements for DEV Team
|
||||
|
||||
**Payment Gateway Mock**:
|
||||
|
||||
- Endpoint: `POST /api/payments`
|
||||
- Success response: `{ status: 'success', transactionId: '123' }`
|
||||
- Failure response: `{ status: 'failed', error: 'Insufficient funds' }`
|
||||
|
||||
**Email Service Mock**:
|
||||
|
||||
- Should not send real emails in test environment
|
||||
- Log email contents for verification
|
||||
```
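
If the mock lives at the test layer rather than in the application, a route-level stub along these lines could satisfy the documented contract (endpoint and payload taken from the example above; everything else is illustrative):

```typescript
// Stub the payment gateway before navigating to the page that calls it
await page.route('**/api/payments', (route) =>
  route.fulfill({
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify({ status: 'success', transactionId: '123' }),
  }),
);
```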
|
||||
|
||||
4. **List Required data-testid Attributes**
|
||||
|
||||
```markdown
|
||||
### Required data-testid Attributes
|
||||
|
||||
**Login Page**:
|
||||
|
||||
- `email-input` - Email input field
|
||||
- `password-input` - Password input field
|
||||
- `login-button` - Submit button
|
||||
- `error-message` - Error message container
|
||||
|
||||
**Dashboard Page**:
|
||||
|
||||
- `user-name` - User name display
|
||||
- `logout-button` - Logout button
|
||||
```
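
On the implementation side, the attributes map directly onto the rendered elements. A hypothetical React sketch for the login page (component name and props are assumptions):

```typescript
// LoginForm.tsx (illustrative component)
export function LoginForm({ error }: { error?: string }) {
  return (
    <form>
      <input data-testid="email-input" type="email" />
      <input data-testid="password-input" type="password" />
      <button data-testid="login-button" type="submit">Log in</button>
      {error && <p data-testid="error-message">{error}</p>}
    </form>
  );
}
```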
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Create Implementation Checklist
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Map Tests to Implementation Tasks**
|
||||
|
||||
For each failing test, create corresponding implementation task:
|
||||
|
||||
```markdown
|
||||
## Implementation Checklist
|
||||
|
||||
### Epic X - User Authentication
|
||||
|
||||
#### Test: User Login with Valid Credentials
|
||||
|
||||
- [ ] Create `/login` route
|
||||
- [ ] Implement login form component
|
||||
- [ ] Add email/password validation
|
||||
- [ ] Integrate authentication API
|
||||
- [ ] Add `data-testid` attributes: `email-input`, `password-input`, `login-button`
|
||||
- [ ] Implement error handling
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
|
||||
#### Test: Display Error for Invalid Credentials
|
||||
|
||||
- [ ] Add error state management
|
||||
- [ ] Display error message UI
|
||||
- [ ] Add `data-testid="error-message"`
|
||||
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
|
||||
- [ ] ✅ Test passes (green phase)
|
||||
```
|
||||
|
||||
2. **Include Red-Green-Refactor Guidance**
|
||||
|
||||
```markdown
|
||||
## Red-Green-Refactor Workflow
|
||||
|
||||
**RED Phase** (Complete):
|
||||
|
||||
- ✅ All tests written and failing
|
||||
- ✅ Fixtures and factories created
|
||||
- ✅ Mock requirements documented
|
||||
|
||||
**GREEN Phase** (DEV Team):
|
||||
|
||||
1. Pick one failing test
|
||||
2. Implement minimal code to make it pass
|
||||
3. Run test to verify green
|
||||
4. Move to next test
|
||||
5. Repeat until all tests pass
|
||||
|
||||
**REFACTOR Phase** (DEV Team):
|
||||
|
||||
1. All tests passing (green)
|
||||
2. Improve code quality
|
||||
3. Extract duplications
|
||||
4. Optimize performance
|
||||
5. Ensure tests still pass
|
||||
```
|
||||
|
||||
3. **Add Execution Commands**
|
||||
|
||||
````markdown
|
||||
## Running Tests
|
||||
|
||||
```bash
|
||||
# Run all failing tests
|
||||
npm run test:e2e
|
||||
|
||||
# Run specific test file
|
||||
npm run test:e2e -- login.spec.ts
|
||||
|
||||
# Run tests in headed mode (see browser)
|
||||
npm run test:e2e -- --headed
|
||||
|
||||
# Debug specific test
|
||||
npm run test:e2e -- login.spec.ts --debug
|
||||
```
|
||||
````
---
|
||||
|
||||
## Step 6: Generate Deliverables
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create ATDD Checklist Document**
|
||||
|
||||
Use template structure at `{installed_path}/atdd-checklist-template.md`:
|
||||
- Story summary
|
||||
- Acceptance criteria breakdown
|
||||
- Test files created (with paths)
|
||||
- Data factories created
|
||||
- Fixtures created
|
||||
- Mock requirements
|
||||
- Required data-testid attributes
|
||||
- Implementation checklist
|
||||
- Red-green-refactor workflow
|
||||
- Execution commands
|
||||
|
||||
2. **Verify All Tests Fail**
|
||||
|
||||
Before finalizing:
|
||||
- Run full test suite locally
|
||||
- Confirm all tests in RED phase
|
||||
- Document expected failure messages
|
||||
- Ensure failures are due to missing implementation, not test bugs
|
||||
|
||||
3. **Write to Output File**
|
||||
|
||||
Save to `{output_folder}/atdd-checklist-{story_id}.md`
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Red-Green-Refactor Cycle
|
||||
|
||||
**RED Phase** (TEA responsibility):
|
||||
|
||||
- Write failing tests first
|
||||
- Tests define expected behavior
|
||||
- Tests must fail for the right reason (missing implementation)
|
||||
|
||||
**GREEN Phase** (DEV responsibility):
|
||||
|
||||
- Implement minimal code to pass tests
|
||||
- One test at a time
|
||||
- Don't over-engineer
|
||||
|
||||
**REFACTOR Phase** (DEV responsibility):
|
||||
|
||||
- Improve code quality with confidence
|
||||
- Tests provide safety net
|
||||
- Extract duplications, optimize
|
||||
|
||||
### Given-When-Then Structure
|
||||
|
||||
**GIVEN** (Setup):
|
||||
|
||||
- Arrange test preconditions
|
||||
- Create necessary data
|
||||
- Navigate to starting point
|
||||
|
||||
**WHEN** (Action):
|
||||
|
||||
- Execute the behavior being tested
|
||||
- Single action per test
|
||||
|
||||
**THEN** (Assertion):
|
||||
|
||||
- Verify expected outcome
|
||||
- One assertion per test (atomic)
|
||||
|
||||
### Network-First Testing
|
||||
|
||||
**Critical pattern:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Intercept BEFORE navigation
|
||||
await page.route('**/api/data', handler);
|
||||
await page.goto('/page');
|
||||
|
||||
// ❌ WRONG: Navigate then intercept (race condition)
|
||||
await page.goto('/page');
|
||||
await page.route('**/api/data', handler); // Too late!
|
||||
```
|
||||
|
||||
### Data Factory Best Practices
|
||||
|
||||
**Use faker for all test data:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Random data
|
||||
email: faker.internet.email();
|
||||
|
||||
// ❌ WRONG: Hardcoded data (collisions, maintenance burden)
|
||||
email: 'test@example.com';
|
||||
```
|
||||
|
||||
**Auto-cleanup principle:**
|
||||
|
||||
- Every factory that creates data must provide cleanup
|
||||
- Fixtures automatically cleanup in teardown
|
||||
- No manual cleanup in test code
|
||||
|
||||
### One Assertion Per Test
|
||||
|
||||
**Atomic test design:**
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the second assertion fails, you don't know whether the first is still valid.
|
||||
|
||||
### Component Test Strategy
|
||||
|
||||
**When to use component tests:**
|
||||
|
||||
- Complex UI interactions (drag-drop, keyboard nav)
|
||||
- Form validation logic
|
||||
- State management within component
|
||||
- Visual edge cases
|
||||
|
||||
**When NOT to use:**
|
||||
|
||||
- Simple rendering (snapshot tests are sufficient)
|
||||
- Integration with backend (use E2E or API tests)
|
||||
- Full user journeys (use E2E tests)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Core Fragments (Auto-loaded in Step 1):**
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → mergeTests patterns (406 lines, 5 examples)
|
||||
- `data-factories.md` - Factory patterns with faker, overrides, API seeding (498 lines, 5 examples)
|
||||
- `component-tdd.md` - Red-green-refactor, provider isolation, accessibility, visual regression (480 lines, 4 examples)
|
||||
- `network-first.md` - Intercept before navigate, HAR capture, deterministic waiting (489 lines, 5 examples)
|
||||
- `test-quality.md` - Deterministic tests, cleanup, explicit assertions, length/time limits (658 lines, 5 examples)
|
||||
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
|
||||
- `selector-resilience.md` - Selector hierarchy (data-testid > ARIA > text > CSS), dynamic patterns, anti-patterns (541 lines, 4 examples)
|
||||
- `timing-debugging.md` - Race condition prevention, deterministic waiting, async debugging (370 lines, 3 examples)
|
||||
|
||||
**Reference for Test Level Selection:**
|
||||
|
||||
- `test-levels-framework.md` - E2E vs API vs Component vs Unit decision framework (467 lines, 4 examples)
|
||||
|
||||
**Manual Reference (Optional):**
|
||||
|
||||
- Use `tea-index.csv` to find additional specialized fragments as needed
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## ATDD Complete - Tests in RED Phase
|
||||
|
||||
**Story**: {story_id}
|
||||
**Primary Test Level**: {primary_level}
|
||||
|
||||
**Failing Tests Created**:
|
||||
|
||||
- E2E tests: {e2e_count} tests in {e2e_files}
|
||||
- API tests: {api_count} tests in {api_files}
|
||||
- Component tests: {component_count} tests in {component_files}
|
||||
|
||||
**Supporting Infrastructure**:
|
||||
|
||||
- Data factories: {factory_count} factories created
|
||||
- Fixtures: {fixture_count} fixtures with auto-cleanup
|
||||
- Mock requirements: {mock_count} services documented
|
||||
|
||||
**Implementation Checklist**:
|
||||
|
||||
- Total tasks: {task_count}
|
||||
- Estimated effort: {effort_estimate} hours
|
||||
|
||||
**Required data-testid Attributes**: {data_testid_count} attributes documented
|
||||
|
||||
**Next Steps for DEV Team**:
|
||||
|
||||
1. Run failing tests: `npm run test:e2e`
|
||||
2. Review implementation checklist
|
||||
3. Implement one test at a time (RED → GREEN)
|
||||
4. Refactor with confidence (tests provide safety net)
|
||||
5. Share progress in daily standup
|
||||
|
||||
**Output File**: {output_file}
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture patterns
|
||||
- Data factory patterns with faker
|
||||
- Network-first route interception
|
||||
- Component TDD strategies
|
||||
- Test quality principles
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Story acceptance criteria analyzed and mapped to tests
|
||||
- [ ] Appropriate test levels selected (E2E, API, Component)
|
||||
- [ ] All tests written in Given-When-Then format
|
||||
- [ ] All tests fail initially (RED phase verified)
|
||||
- [ ] Network-first pattern applied (route interception before navigation)
|
||||
- [ ] Data factories created with faker
|
||||
- [ ] Fixtures created with auto-cleanup
|
||||
- [ ] Mock requirements documented for DEV team
|
||||
- [ ] Required data-testid attributes listed
|
||||
- [ ] Implementation checklist created with clear tasks
|
||||
- [ ] Red-green-refactor workflow documented
|
||||
- [ ] Execution commands provided
|
||||
- [ ] Output file created and formatted correctly
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
52
bmad/bmm/workflows/testarch/atdd/workflow.yaml
Normal file
@@ -0,0 +1,52 @@
|
||||
# Test Architect workflow: atdd
|
||||
name: testarch-atdd
|
||||
description: "Generate failing acceptance tests before implementation using TDD red-green-refactor cycle"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
document_output_language: "{config_source}:document_output_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/atdd"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/atdd-checklist-template.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
test_dir: "{project-root}/tests" # Root test directory
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/atdd-checklist-{story_id}.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story markdown, framework config
|
||||
- write_file # Create test files, checklist, factory stubs
|
||||
- create_directory # Create test directories
|
||||
- list_files # Find existing fixtures and helpers
|
||||
- search_repo # Search for similar test patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with acceptance criteria (required)"
|
||||
- framework_config: "Test framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
- existing_fixtures: "Current fixture patterns for consistency"
|
||||
- test_design: "Test design document (optional, for risk/priority context)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- atdd
|
||||
- test-architect
|
||||
- tdd
|
||||
- red-green-refactor
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
869
bmad/bmm/workflows/testarch/automate/README.md
Normal file
@@ -0,0 +1,869 @@
|
||||
# Automate Workflow
|
||||
|
||||
Expands test automation coverage by generating comprehensive test suites at appropriate levels (E2E, API, Component, Unit) with supporting infrastructure. This workflow operates in **dual mode** - works seamlessly WITH or WITHOUT BMad artifacts.
|
||||
|
||||
**Core Principle**: Generate prioritized, deterministic tests that avoid duplicate coverage and follow testing best practices.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *automate
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- **BMad-Integrated**: After story implementation to expand coverage beyond ATDD tests
|
||||
- **Standalone**: Point at any codebase/feature and generate tests independently ("work out of thin air")
|
||||
- **Auto-discover**: No targets specified - scans codebase for features needing tests
|
||||
|
||||
## Inputs
|
||||
|
||||
**Execution Modes:**
|
||||
|
||||
1. **BMad-Integrated Mode** (story available) - OPTIONAL
|
||||
2. **Standalone Mode** (no BMad artifacts) - Direct code analysis
|
||||
3. **Auto-discover Mode** (no targets) - Scan for coverage gaps
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Framework configuration**: Test framework config (playwright.config.ts or cypress.config.ts) - REQUIRED
|
||||
|
||||
**Optional Context (BMad-Integrated Mode):**
|
||||
|
||||
- **Story markdown** (`{story_file}`): User story with acceptance criteria (enhances coverage targeting but NOT required)
|
||||
- **Tech spec**: Technical specification (provides architectural context)
|
||||
- **Test design**: Risk/priority context (P0-P3 alignment)
|
||||
- **PRD**: Product requirements (business context)
|
||||
|
||||
**Optional Context (Standalone Mode):**
|
||||
|
||||
- **Source code**: Feature implementation to analyze
|
||||
- **Existing tests**: Current test suite for gap analysis
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `standalone_mode`: Can work without BMad artifacts (default: true)
|
||||
- `story_file`: Path to story markdown (optional)
|
||||
- `target_feature`: Feature name or directory to analyze (e.g., "user-authentication" or "src/auth/")
|
||||
- `target_files`: Specific files to analyze (comma-separated paths)
|
||||
- `test_dir`: Directory for test files (default: `{project-root}/tests`)
|
||||
- `source_dir`: Source code directory (default: `{project-root}/src`)
|
||||
- `auto_discover_features`: Automatically find features needing tests (default: true)
|
||||
- `analyze_coverage`: Check existing test coverage gaps (default: true)
|
||||
- `coverage_target`: Coverage strategy - "critical-paths", "comprehensive", "selective" (default: "critical-paths")
|
||||
- `test_levels`: Which levels to generate - "e2e,api,component,unit" (default: all)
|
||||
- `avoid_duplicate_coverage`: Don't test same behavior at multiple levels (default: true)
|
||||
- `include_p0`: Include P0 critical path tests (default: true)
|
||||
- `include_p1`: Include P1 high priority tests (default: true)
|
||||
- `include_p2`: Include P2 medium priority tests (default: true)
|
||||
- `include_p3`: Include P3 low priority tests (default: false)
|
||||
- `use_given_when_then`: BDD-style test structure (default: true)
|
||||
- `one_assertion_per_test`: Atomic test design (default: true)
|
||||
- `network_first`: Route interception before navigation (default: true)
|
||||
- `deterministic_waits`: No hard waits or sleeps (default: true)
|
||||
- `generate_fixtures`: Create/enhance fixture architecture (default: true)
|
||||
- `generate_factories`: Create/enhance data factories (default: true)
|
||||
- `update_helpers`: Add utility functions (default: true)
|
||||
- `use_test_design`: Load test-design.md if exists (default: true)
|
||||
- `use_tech_spec`: Load tech-spec.md if exists (default: true)
|
||||
- `use_prd`: Load PRD.md if exists (default: true)
|
||||
- `update_readme`: Update test README with new specs (default: true)
|
||||
- `update_package_scripts`: Add test execution scripts (default: true)
|
||||
- `output_summary`: Path for automation summary (default: `{output_folder}/automation-summary.md`)
|
||||
- `max_test_duration`: Maximum seconds per test (default: 90)
|
||||
- `max_file_lines`: Maximum lines per test file (default: 300)
|
||||
- `require_self_cleaning`: All tests must clean up data (default: true)
|
||||
- `auto_load_knowledge`: Load relevant knowledge fragments (default: true)
|
||||
- `run_tests_after_generation`: Verify tests pass/fail as expected (default: true)
|
||||
- `auto_validate`: Run generated tests after creation (default: true) **NEW**
|
||||
- `auto_heal_failures`: Enable automatic healing (default: false, opt-in) **NEW**
|
||||
- `max_healing_iterations`: Maximum healing attempts per test (default: 3) **NEW**
|
||||
- `fail_on_unhealable`: Fail workflow if tests can't be healed (default: false) **NEW**
|
||||
- `mark_unhealable_as_fixme`: Mark unfixable tests with test.fixme() (default: true) **NEW**
|
||||
- `use_mcp_healing`: Use Playwright MCP if available (default: true) **NEW**
|
||||
- `healing_knowledge_fragments`: Healing patterns to load (default: "test-healing-patterns,selector-resilience,timing-debugging") **NEW**
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
- **Automation Summary** (`automation-summary.md`): Comprehensive report containing:
|
||||
- Execution mode (BMad-Integrated, Standalone, Auto-discover)
|
||||
- Feature analysis (source files analyzed, coverage gaps)
|
||||
- Tests created (E2E, API, Component, Unit) with counts and paths
|
||||
- Infrastructure created (fixtures, factories, helpers)
|
||||
- Test execution instructions
|
||||
- Coverage analysis (P0-P3 breakdown, coverage percentage)
|
||||
- Definition of Done checklist
|
||||
- Next steps and recommendations
|
||||
|
||||
**Test Files Created:**
|
||||
|
||||
- **E2E tests** (`tests/e2e/{feature-name}.spec.ts`): Critical user journeys (P0-P1)
|
||||
- **API tests** (`tests/api/{feature-name}.api.spec.ts`): Business logic and contracts (P1-P2)
|
||||
- **Component tests** (`tests/component/{ComponentName}.test.tsx`): UI behavior (P1-P2)
|
||||
- **Unit tests** (`tests/unit/{module-name}.test.ts`): Pure logic (P2-P3)
|
||||
|
||||
**Supporting Infrastructure:**
|
||||
|
||||
- **Fixtures** (`tests/support/fixtures/{feature}.fixture.ts`): Setup/teardown with auto-cleanup
|
||||
- **Data factories** (`tests/support/factories/{entity}.factory.ts`): Random test data using faker
|
||||
- **Helpers** (`tests/support/helpers/{utility}.ts`): Utility functions (waitFor, retry, etc.)
|
||||
|
||||
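
For example, a small polling helper might look like this (a minimal sketch; the exact utilities depend on the project):

```typescript
// tests/support/helpers/wait-for.ts (illustrative)
export async function waitFor<T>(fn: () => Promise<T>, { retries = 5, delayMs = 200 } = {}): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn(); // succeed as soon as the callback resolves
    } catch (error) {
      lastError = error;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```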
**Documentation Updates:**
|
||||
|
||||
- **Test README** (`tests/README.md`): Test suite overview, execution instructions, priority tagging, patterns
|
||||
- **package.json scripts**: Test execution commands (test:e2e, test:e2e:p0, test:api, etc.)
|
||||
|
||||
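
Since priorities are tagged in test titles (e.g. `[P0]`), these scripts typically filter with the runner's grep option. An illustrative example for Playwright (adjust to your runner and script names):

```bash
# Illustrative commands behind the package.json scripts
npx playwright test --grep "\[P0\]"           # test:e2e:p0 - critical paths only
npx playwright test --grep "\[P0\]|\[P1\]"    # test:e2e:p1 - pre-merge suite
```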
**Validation Safeguards:**
|
||||
|
||||
- All tests follow Given-When-Then format
|
||||
- All tests have priority tags ([P0], [P1], [P2], [P3])
|
||||
- All tests use data-testid selectors (stable, not CSS classes)
|
||||
- All tests are self-cleaning (fixtures with auto-cleanup)
|
||||
- No hard waits or flaky patterns (deterministic)
|
||||
- Test files under 300 lines (lean and focused)
|
||||
- Tests run under 1.5 minutes each (fast feedback)
|
||||
|
||||
## Key Features
|
||||
|
||||
### Dual-Mode Operation
|
||||
|
||||
**BMad-Integrated Mode** (story available):
|
||||
|
||||
- Uses story acceptance criteria for coverage targeting
|
||||
- Aligns with test-design risk/priority assessment
|
||||
- Expands ATDD tests with edge cases and negative paths
|
||||
- Optional - story enhances coverage but not required
|
||||
|
||||
**Standalone Mode** (no story):
|
||||
|
||||
- Analyzes source code independently
|
||||
- Identifies coverage gaps automatically
|
||||
- Generates tests based on code analysis
|
||||
- Works with any project (BMad or non-BMad)
|
||||
|
||||
**Auto-discover Mode** (no targets):
|
||||
|
||||
- Scans codebase for features needing tests
|
||||
- Prioritizes features with no coverage
|
||||
- Generates comprehensive test plan
|
||||
|
||||
### Avoid Duplicate Coverage
|
||||
|
||||
**Critical principle**: Don't test same behavior at multiple levels
|
||||
|
||||
**Good coverage strategy:**
|
||||
|
||||
- **E2E**: User can login → Dashboard loads (critical happy path only)
|
||||
- **API**: POST /auth/login returns correct status codes (variations: 200, 401, 400)
|
||||
- **Component**: LoginForm validates input (UI edge cases: empty fields, invalid format)
|
||||
- **Unit**: validateEmail() logic (pure function edge cases)
|
||||
|
||||
**Bad coverage (duplicate):**
|
||||
|
||||
- E2E: User can login → Dashboard loads
|
||||
- E2E: User can login with different emails → Dashboard loads (unnecessary duplication)
|
||||
- API: POST /auth/login returns 200 (already covered in E2E)
|
||||
|
||||
Use E2E sparingly for critical paths. Use API/Component/Unit for variations and edge cases.
|
||||
|
||||
### Healing Capabilities (NEW - Phase 2.5)
|
||||
|
||||
**automate** validates generated tests after creation and, when healing is enabled (`auto_heal_failures`), automatically heals failures.
|
||||
|
||||
**Configuration**: Controlled by `config.tea_use_mcp_enhancements` (default: true)
|
||||
|
||||
- If true + MCP available → MCP-assisted healing
|
||||
- If true + MCP unavailable → Pattern-based healing
|
||||
- If false → No healing, document failures for manual review
|
||||
|
||||
**Constants**: Max 3 healing attempts, unfixable tests marked as `test.fixme()`
|
||||
|
||||
**How Healing Works (Default - Pattern-Based):**
|
||||
|
||||
TEA heals tests using pattern-based analysis by:
|
||||
|
||||
1. **Parsing error messages** from test output logs
|
||||
2. **Matching patterns** against known failure signatures
|
||||
3. **Applying fixes** from healing knowledge fragments:
|
||||
- `test-healing-patterns.md` - Common failure patterns (selectors, timing, data, network)
|
||||
- `selector-resilience.md` - Selector refactoring (CSS → data-testid, nth() → filter())
|
||||
- `timing-debugging.md` - Race condition fixes (hard waits → event-based waits)
|
||||
4. **Re-running tests** to verify fix (max 3 iterations)
|
||||
5. **Marking unfixable tests** as `test.fixme()` with detailed comments
|
||||
|
||||
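
Conceptually, the pattern matching is a lookup table from failure signatures to candidate fixes. A simplified, illustrative sketch (not the agent's actual implementation):

```typescript
// Illustrative only: map failure signatures to diagnoses and candidate fixes
type HealingRule = { pattern: RegExp; diagnosis: string; fix: string };

const healingRules: HealingRule[] = [
  {
    pattern: /resolved to 0 elements/,
    diagnosis: 'Stale or brittle selector',
    fix: 'Replace CSS class selectors with getByTestId()/getByRole()',
  },
  {
    pattern: /Timeout .* waiting for/,
    diagnosis: 'Race condition or missing wait',
    fix: 'Intercept or await the network response instead of hard waits',
  },
];

export const matchFailure = (errorOutput: string) =>
  healingRules.find((rule) => rule.pattern.test(errorOutput));
```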
**This works well for:**
|
||||
|
||||
- ✅ Common failure patterns (stale selectors, timing issues, dynamic data)
|
||||
- ✅ Text-based errors with clear signatures
|
||||
- ✅ Issues documented in knowledge base
|
||||
- ✅ Automated CI environments without browser access
|
||||
|
||||
**What MCP Adds (Interactive Debugging Enhancement):**
|
||||
|
||||
When Playwright MCP is available, TEA **additionally**:
|
||||
|
||||
1. **Debugs failures interactively** before applying pattern-based fixes:
|
||||
- **Pause test execution** with `playwright_test_debug_test` (step through, inspect state)
|
||||
- **See visual failure context** with `browser_snapshot` (screenshot of failure state)
|
||||
- **Inspect live DOM** with browser tools (find why selector doesn't match)
|
||||
- **Analyze console logs** with `browser_console_messages` (JS errors, warnings, debug output)
|
||||
- **Inspect network activity** with `browser_network_requests` (failed API calls, CORS errors, timeouts)
|
||||
|
||||
2. **Enhances pattern-based fixes** with real-world data:
|
||||
- **Pattern match identifies issue** (e.g., "stale selector")
|
||||
- **MCP discovers actual selector** with `browser_generate_locator` from live page
|
||||
- **TEA applies refined fix** using real DOM structure (not just pattern guess)
|
||||
- **Verification happens in browser** (see if fix works visually)
|
||||
|
||||
3. **Catches root causes** pattern matching might miss:
|
||||
- **Network failures**: MCP shows 500 error on API call (not just timeout)
|
||||
- **JS errors**: MCP shows `TypeError: undefined` in console (not just "element not found")
|
||||
- **Timing issues**: MCP shows loading spinner still visible (not just "selector timeout")
|
||||
- **State problems**: MCP shows modal blocking button (not just "not clickable")
|
||||
|
||||
**Key Benefits of MCP Enhancement:**
|
||||
|
||||
- ✅ **Pattern-based fixes** (fast, automated) **+** **MCP verification** (accurate, context-aware)
|
||||
- ✅ **Visual debugging**: See exactly what user sees when test fails
|
||||
- ✅ **DOM inspection**: Discover why selectors don't match (element missing, wrong attributes, dynamic IDs)
|
||||
- ✅ **Network visibility**: Identify API failures, slow requests, CORS issues
|
||||
- ✅ **Console analysis**: Catch JS errors that break page functionality
|
||||
- ✅ **Robust selectors**: Generate locators from actual DOM (role, text, testid hierarchy)
|
||||
- ✅ **Faster iteration**: Debug and fix in same browser session (no restart needed)
|
||||
- ✅ **Higher success rate**: MCP helps diagnose failures pattern matching can't solve
|
||||
|
||||
**Example Enhancement Flow:**
|
||||
|
||||
```
|
||||
1. Pattern-based healing identifies issue
|
||||
→ Error: "Locator '.submit-btn' resolved to 0 elements"
|
||||
→ Pattern match: Stale selector (CSS class)
|
||||
→ Suggested fix: Replace with data-testid
|
||||
|
||||
2. MCP enhances diagnosis (if available)
|
||||
→ browser_snapshot shows button exists but has class ".submit-button" (not ".submit-btn")
|
||||
→ browser_generate_locator finds: button[type="submit"].submit-button
|
||||
→ browser_console_messages shows no errors
|
||||
|
||||
3. TEA applies refined fix
|
||||
→ await page.locator('button[type="submit"]').click()
|
||||
→ (More accurate than pattern-based guess)
|
||||
```
|
||||
|
||||
**Healing Modes:**
|
||||
|
||||
1. **MCP-Enhanced Healing** (when Playwright MCP available):
|
||||
- Pattern-based analysis **+** Interactive debugging
|
||||
- Visual context with `browser_snapshot`
|
||||
- Console log analysis with `browser_console_messages`
|
||||
- Network inspection with `browser_network_requests`
|
||||
- Live DOM inspection with `browser_generate_locator`
|
||||
- Step-by-step debugging with `playwright_test_debug_test`
|
||||
|
||||
2. **Pattern-Based Healing** (always available):
|
||||
- Error message parsing and pattern matching
|
||||
- Automated fixes from healing knowledge fragments
|
||||
- Text-based analysis (no visual/DOM inspection)
|
||||
- Works in CI without browser access
|
||||
|
||||
**Healing Workflow:**
|
||||
|
||||
```
|
||||
1. Generate tests → Run tests
|
||||
2. IF pass → Success ✅
|
||||
3. IF fail AND auto_heal_failures=false → Report failures ⚠️
|
||||
4. IF fail AND auto_heal_failures=true → Enter healing loop:
|
||||
a. Identify failure pattern (selector, timing, data, network)
|
||||
b. Apply automated fix from knowledge base
|
||||
c. Re-run test (max 3 iterations)
|
||||
d. IF healed → Success ✅
|
||||
e. IF unhealable → Mark test.fixme() with detailed comment
|
||||
```
|
||||
|
||||
**Example Healing Outcomes:**
|
||||
|
||||
```typescript
|
||||
// ❌ Original (failing): CSS class selector
|
||||
await page.locator('.btn-primary').click();
|
||||
|
||||
// ✅ Healed: data-testid selector
|
||||
await page.getByTestId('submit-button').click();
|
||||
|
||||
// ❌ Original (failing): Hard wait
|
||||
await page.waitForTimeout(3000);
|
||||
|
||||
// ✅ Healed: Network-first pattern
|
||||
await page.waitForResponse('**/api/data');
|
||||
|
||||
// ❌ Original (failing): Hardcoded ID
|
||||
await expect(page.getByText('User 123')).toBeVisible();
|
||||
|
||||
// ✅ Healed: Regex pattern
|
||||
await expect(page.getByText(/User \d+/)).toBeVisible();
|
||||
```
|
||||
|
||||
**Unfixable Tests (Marked as test.fixme()):**
|
||||
|
||||
```typescript
|
||||
test.fixme('[P1] should handle complex interaction', async ({ page }) => {
|
||||
// FIXME: Test healing failed after 3 attempts
|
||||
// Failure: "Locator 'button[data-action="submit"]' resolved to 0 elements"
|
||||
// Attempted fixes:
|
||||
// 1. Replaced with page.getByTestId('submit-button') - still failing
|
||||
// 2. Replaced with page.getByRole('button', { name: 'Submit' }) - still failing
|
||||
// 3. Added waitForLoadState('networkidle') - still failing
|
||||
// Manual investigation needed: Selector may require application code changes
|
||||
// TODO: Review with team, may need data-testid added to button component
|
||||
// Original test code...
|
||||
});
|
||||
```
|
||||
|
||||
**When to Enable Healing:**
|
||||
|
||||
- ✅ Enable for greenfield projects (catch generated test issues early)
|
||||
- ✅ Enable for brownfield projects (auto-fix legacy selector patterns)
|
||||
- ❌ Disable if environment not ready (application not deployed/seeded)
|
||||
- ❌ Disable if preferring manual review of all generated tests
|
||||
|
||||
**Healing Report Example:**
|
||||
|
||||
```markdown
|
||||
## Test Healing Report
|
||||
|
||||
**Auto-Heal Enabled**: true
|
||||
**Healing Mode**: Pattern-based
|
||||
**Iterations Allowed**: 3
|
||||
|
||||
### Validation Results
|
||||
|
||||
- **Total tests**: 10
|
||||
- **Passing**: 7
|
||||
- **Failing**: 3
|
||||
|
||||
### Healing Outcomes
|
||||
|
||||
**Successfully Healed (2 tests):**
|
||||
|
||||
- `tests/e2e/login.spec.ts:15` - Stale selector (CSS class → data-testid)
|
||||
- `tests/e2e/checkout.spec.ts:42` - Race condition (added network-first interception)
|
||||
|
||||
**Unable to Heal (1 test):**
|
||||
|
||||
- `tests/e2e/complex-flow.spec.ts:67` - Marked as test.fixme()
|
||||
- Requires application code changes (add data-testid to component)
|
||||
|
||||
### Healing Patterns Applied
|
||||
|
||||
- **Selector fixes**: 1
|
||||
- **Timing fixes**: 1
|
||||
```
|
||||
|
||||
**Graceful Degradation:**
|
||||
|
||||
- Healing is OPTIONAL (default: disabled)
|
||||
- Works without Playwright MCP (pattern-based fallback)
|
||||
- Unfixable tests marked clearly (not silently broken)
|
||||
- Manual investigation path documented
|
||||
|
||||
### Recording Mode (NEW - Phase 2.5)
|
||||
|
||||
**automate** can record complex UI interactions instead of AI generation.
|
||||
|
||||
**Activation**: Automatic for complex UI scenarios when config.tea_use_mcp_enhancements is true and MCP available
|
||||
|
||||
- Complex scenarios: drag-drop, wizards, multi-page flows
|
||||
- Fallback: AI generation (silent, automatic)
|
||||
|
||||
**When to Use Recording Mode:**
|
||||
|
||||
- ✅ Complex UI interactions (drag-drop, multi-step forms, wizards)
|
||||
- ✅ Visual workflows (modals, dialogs, animations, transitions)
|
||||
- ✅ Unclear requirements (exploratory, discovering behavior)
|
||||
- ✅ Multi-page flows (checkout, registration, onboarding)
|
||||
- ❌ NOT for simple CRUD (AI generation faster)
|
||||
- ❌ NOT for API-only tests (no UI to record)
|
||||
|
||||
**When to Use AI Generation (Default):**
|
||||
|
||||
- ✅ Clear requirements available
|
||||
- ✅ Standard patterns (login, CRUD, navigation)
|
||||
- ✅ Need many tests quickly
|
||||
- ✅ API/backend tests (no UI interaction)
|
||||
|
||||
**Recording Workflow (Same as atdd):**
|
||||
|
||||
```
|
||||
1. Set generation_mode: "recording"
|
||||
2. Use generator_setup_page to init recording
|
||||
3. For each test scenario:
|
||||
- Execute with browser_* tools (navigate, click, type, select)
|
||||
- Add verifications with browser_verify_* tools
|
||||
- Capture log and generate test file
|
||||
4. Enhance with knowledge base patterns:
|
||||
- Given-When-Then structure
|
||||
- data-testid selectors
|
||||
- Network-first interception
|
||||
- Fixtures/factories
|
||||
5. Validate (run tests if auto_validate enabled)
|
||||
6. Heal if needed (if auto_heal_failures enabled)
|
||||
```
|
||||
|
||||
**Combination: Recording + Healing:**
|
||||
|
||||
automate can use BOTH recording and healing together:
|
||||
|
||||
- Generate tests via recording (complex flows captured interactively)
|
||||
- Run tests to validate (auto_validate)
|
||||
- Heal failures automatically (auto_heal_failures)
|
||||
|
||||
This is particularly powerful for brownfield projects where:
|
||||
|
||||
- Requirements unclear → Use recording to capture existing behavior
|
||||
- Application complex → Recording captures nuances AI might miss
|
||||
- Tests may fail → Healing fixes common issues automatically
|
||||
|
||||
**Graceful Degradation:**
|
||||
|
||||
- Recording mode is OPTIONAL (default: AI generation)
|
||||
- Requires Playwright MCP (falls back to AI if unavailable)
|
||||
- Works with or without healing enabled
|
||||
- Same quality output regardless of generation method
|
||||
|
||||
### Test Level Selection Framework
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys (login, checkout, core workflows)
|
||||
- Multi-system integration
|
||||
- User-facing acceptance criteria
|
||||
- Characteristics: High confidence, slow execution, brittle
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Business logic validation
|
||||
- Service contracts and data transformations
|
||||
- Backend integration without UI
|
||||
- Characteristics: Fast feedback, good balance, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior (buttons, forms, modals)
|
||||
- Interaction testing (click, hover, keyboard navigation)
|
||||
- State management within component
|
||||
- Characteristics: Fast, isolated, granular
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Pure business logic and algorithms
|
||||
- Edge cases and error handling
|
||||
- Minimal dependencies
|
||||
- Characteristics: Fastest, most granular
|
||||
|
||||
### Priority Classification (P0-P3)
|
||||
|
||||
**P0 (Critical - Every commit)**:
|
||||
|
||||
- Critical user paths that must always work
|
||||
- Security-critical functionality (auth, permissions)
|
||||
- Data integrity scenarios
|
||||
- Run in pre-commit hooks or PR checks
|
||||
|
||||
**P1 (High - PR to main)**:
|
||||
|
||||
- Important features with high user impact
|
||||
- Integration points between systems
|
||||
- Error handling for common failures
|
||||
- Run before merging to main branch
|
||||
|
||||
**P2 (Medium - Nightly)**:
|
||||
|
||||
- Edge cases with moderate impact
|
||||
- Less-critical feature variations
|
||||
- Performance/load testing
|
||||
- Run in nightly CI builds
|
||||
|
||||
**P3 (Low - On-demand)**:
|
||||
|
||||
- Nice-to-have validations
|
||||
- Rarely-used features
|
||||
- Exploratory testing scenarios
|
||||
- Run manually or weekly
|
||||
|
||||
**Priority tagging enables selective execution:**
|
||||
|
||||
```bash
|
||||
npm run test:e2e:p0 # Run only P0 tests (critical paths)
|
||||
npm run test:e2e:p1 # Run P0 + P1 tests (pre-merge)
|
||||
```
|
||||
|
||||
### Given-When-Then Test Structure
|
||||
|
||||
All tests follow BDD format for clarity:
|
||||
|
||||
```typescript
|
||||
test('[P0] should login with valid credentials and load dashboard', async ({ page }) => {
|
||||
// GIVEN: User is on login page
|
||||
await page.goto('/login');
|
||||
|
||||
// WHEN: User submits valid credentials
|
||||
await page.fill('[data-testid="email-input"]', 'user@example.com');
|
||||
await page.fill('[data-testid="password-input"]', 'Password123!');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// THEN: User is redirected to dashboard
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
});
|
||||
```
|
||||
|
||||
### One Assertion Per Test (Atomic Design)
|
||||
|
||||
Each test verifies exactly one behavior:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: One assertion
|
||||
test('[P0] should display user name', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Multiple assertions (not atomic)
|
||||
test('[P0] should display user info', async ({ page }) => {
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
|
||||
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
|
||||
});
|
||||
```
|
||||
|
||||
**Why?** If the second assertion fails, you don't know whether the first is still valid. Split into separate tests for clear failure diagnosis.
|
||||
|
||||
### Network-First Testing Pattern
|
||||
|
||||
**Critical pattern to prevent race conditions**:
|
||||
|
||||
```typescript
|
||||
test('should load user dashboard after login', async ({ page }) => {
|
||||
// CRITICAL: Intercept routes BEFORE navigation
|
||||
await page.route('**/api/user', (route) =>
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
body: JSON.stringify({ id: 1, name: 'Test User' }),
|
||||
}),
|
||||
);
|
||||
|
||||
// NOW navigate
|
||||
await page.goto('/dashboard');
|
||||
|
||||
await expect(page.locator('[data-testid="user-name"]')).toHaveText('Test User');
|
||||
});
|
||||
```
|
||||
|
||||
Always set up route interception before navigating to pages that make network requests.
|
||||
|
||||
### Fixture Architecture with Auto-Cleanup
|
||||
|
||||
Playwright fixtures with automatic data cleanup:
|
||||
|
||||
```typescript
|
||||
// tests/support/fixtures/auth.fixture.ts
|
||||
import { test as base } from '@playwright/test';
|
||||
import { createUser, deleteUser } from '../factories/user.factory';
|
||||
|
||||
export const test = base.extend({
|
||||
authenticatedUser: async ({ page }, use) => {
|
||||
// Setup: Create and authenticate user
|
||||
const user = await createUser();
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', user.email);
|
||||
await page.fill('[data-testid="password"]', user.password);
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await page.waitForURL('/dashboard');
|
||||
|
||||
// Provide to test
|
||||
await use(user);
|
||||
|
||||
// Cleanup: Delete user automatically
|
||||
await deleteUser(user.id);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**Fixture principles:**

- Auto-cleanup (always delete created data in teardown)
- Composable (fixtures can use other fixtures)
- Isolated (each test gets fresh data)
- Type-safe with TypeScript

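As a sketch of the composable principle, a follow-on fixture can build on `authenticatedUser` from the example above. The `dashboardPage` name is illustrative, not something the workflow generates:

```typescript
// tests/support/fixtures/dashboard.fixture.ts: illustrative sketch of fixture composition
import { test as authTest } from './auth.fixture';

export const test = authTest.extend({
  dashboardPage: async ({ page, authenticatedUser }, use) => {
    // Depending on authenticatedUser means auth.fixture.ts handles login and user cleanup
    await page.goto('/dashboard');
    await use(page);
    // No extra teardown here; user deletion already runs in auth.fixture.ts
  },
});
```

Because `dashboardPage` depends on `authenticatedUser`, its setup and auto-cleanup run automatically whenever a test requests the composed fixture.
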
### Data Factory Architecture
|
||||
|
||||
Use faker for all test data generation:
|
||||
|
||||
```typescript
|
||||
// tests/support/factories/user.factory.ts
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export const createUser = (overrides = {}) => ({
|
||||
id: faker.number.int(),
|
||||
email: faker.internet.email(),
|
||||
password: faker.internet.password(),
|
||||
name: faker.person.fullName(),
|
||||
role: 'user',
|
||||
createdAt: faker.date.recent().toISOString(),
|
||||
...overrides,
|
||||
});
|
||||
|
||||
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
|
||||
|
||||
// API helper for cleanup
|
||||
export const deleteUser = async (userId: number) => {
|
||||
await fetch(`/api/users/${userId}`, { method: 'DELETE' });
|
||||
};
|
||||
```
|
||||
|
||||
**Factory principles:**

- Use faker for random data (no hardcoded values to prevent collisions)
- Support overrides for specific test scenarios
- Generate complete valid objects matching API contracts
- Include helper functions for bulk creation and cleanup

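A short usage sketch of the override principle (the specific values are only illustrations):

```typescript
// Overrides keep scenario-specific data explicit while faker fills in the rest
import { createUser, createUsers } from '../factories/user.factory';

const admin = createUser({ role: 'admin' });                      // fixed role, random everything else
const knownEmail = createUser({ email: 'qa+login@example.com' }); // pin only what the scenario needs
const reviewers = createUsers(5);                                 // bulk creation for list and pagination tests
```
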
### No Page Objects
|
||||
|
||||
**Do NOT create page object classes.** Keep tests simple and direct:
|
||||
|
||||
```typescript
|
||||
// ✅ CORRECT: Direct test
|
||||
test('should login', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email"]', 'user@example.com');
|
||||
await page.click('[data-testid="login-button"]');
|
||||
await expect(page).toHaveURL('/dashboard');
|
||||
});
|
||||
|
||||
// ❌ WRONG: Page object abstraction
|
||||
class LoginPage {
|
||||
async login(email, password) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
Use fixtures for setup/teardown, not page objects for actions.
|
||||
|
||||
### Deterministic Tests Only
|
||||
|
||||
**No flaky patterns allowed:**
|
||||
|
||||
```typescript
|
||||
// ❌ WRONG: Hard wait
|
||||
await page.waitForTimeout(2000);
|
||||
|
||||
// ✅ CORRECT: Explicit wait
|
||||
await page.waitForSelector('[data-testid="user-name"]');
|
||||
await expect(page.locator('[data-testid="user-name"]')).toBeVisible();
|
||||
|
||||
// ❌ WRONG: Conditional flow
|
||||
if (await element.isVisible()) {
|
||||
await element.click();
|
||||
}
|
||||
|
||||
// ✅ CORRECT: Deterministic assertion
|
||||
await expect(element).toBeVisible();
|
||||
await element.click();
|
||||
|
||||
// ❌ WRONG: Try-catch for test logic
|
||||
try {
|
||||
await element.click();
|
||||
} catch (e) {
|
||||
// Test shouldn't catch errors
|
||||
}
|
||||
|
||||
// ✅ CORRECT: Let test fail if element not found
|
||||
await element.click();
|
||||
```
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before this workflow:**
|
||||
|
||||
- **framework** workflow: Establish test framework architecture (Playwright/Cypress config, directory structure) - REQUIRED
|
||||
- **test-design** workflow: Optional for P0-P3 priority alignment and risk assessment context (BMad-Integrated mode only)
|
||||
- **atdd** workflow: Optional - automate expands beyond ATDD tests with edge cases (BMad-Integrated mode only)
|
||||
|
||||
**After this workflow:**
|
||||
|
||||
- **trace** workflow: Update traceability matrix with new test coverage (Phase 1) and make quality gate decision (Phase 2)
|
||||
- **CI pipeline**: Run tests in burn-in loop to detect flaky patterns
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **DEV agent**: Tests validate implementation correctness
|
||||
- **Story workflow**: Tests cover acceptance criteria (BMad-Integrated mode only)
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Works Out of Thin Air
|
||||
|
||||
**automate does NOT require BMad artifacts:**
|
||||
|
||||
- Can analyze any codebase independently
|
||||
- User can point TEA at a feature: "automate tests for src/auth/"
|
||||
- Works on non-BMad projects
|
||||
- BMad artifacts (story, tech-spec, PRD) are OPTIONAL enhancements, not requirements
|
||||
|
||||
**Similar to:**
|
||||
|
||||
- **framework**: Can scaffold tests on any project
|
||||
- **ci**: Can generate CI config without BMad context
|
||||
|
||||
**Different from:**
|
||||
|
||||
- **atdd**: REQUIRES story with acceptance criteria (halt if missing)
|
||||
- **test-design**: REQUIRES PRD/epic context (halt if missing)
|
||||
- **trace (Phase 2)**: REQUIRES test results for gate decision (halt if missing)
|
||||
|
||||
### File Size Limits
|
||||
|
||||
**Keep test files lean (under 300 lines):**
|
||||
|
||||
- If file exceeds limit, split into multiple files by feature area
|
||||
- Group related tests in describe blocks
|
||||
- Extract common setup to fixtures
|
||||
|
||||
### Quality Standards Enforced
|
||||
|
||||
**Every test must:**
|
||||
|
||||
- ✅ Use Given-When-Then format
|
||||
- ✅ Have clear, descriptive name with priority tag
|
||||
- ✅ One assertion per test (atomic)
|
||||
- ✅ No hard waits or sleeps
|
||||
- ✅ Use data-testid selectors (not CSS classes)
|
||||
- ✅ Self-cleaning (fixtures with auto-cleanup)
|
||||
- ✅ Deterministic (no flaky patterns)
|
||||
- ✅ Fast (under 90 seconds)
|
||||
|
||||
**Forbidden patterns:**

- ❌ Hard waits: `await page.waitForTimeout(2000)`
- ❌ Conditional flow: `if (await element.isVisible()) { ... }`
- ❌ Try-catch for test logic
- ❌ Hardcoded test data (use factories with faker)
- ❌ Page objects
- ❌ Shared state between tests

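The data-testid rule from the list above is worth a quick illustration; this uses Playwright's built-in `getByTestId` and assumes the markup exposes `data-testid` attributes:

```typescript
// ❌ Fragile: breaks whenever styling or layout classes change
await page.locator('.btn.btn-primary').click();

// ✅ Stable: tied to a dedicated data-testid attribute
await page.getByTestId('login-button').click();
```
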
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **test-levels-framework.md** - Test level selection (E2E vs API vs Component vs Unit) with characteristics and use cases
|
||||
- **test-priorities.md** - Priority classification (P0-P3) with execution timing and risk alignment
|
||||
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's test.extend()
|
||||
- **data-factories.md** - Factory patterns using @faker-js/faker for random test data generation with overrides
|
||||
- **selective-testing.md** - Targeted test execution strategies for CI optimization
|
||||
- **ci-burn-in.md** - Flaky test detection patterns (10 iterations to catch intermittent failures)
|
||||
- **test-quality.md** - Test design principles (Given-When-Then, determinism, isolation, atomic assertions)
|
||||
|
||||
**Healing Knowledge (If `auto_heal_failures` enabled):**
|
||||
|
||||
- **test-healing-patterns.md** - Common failure patterns and automated fixes (selectors, timing, data, network, hard waits)
|
||||
- **selector-resilience.md** - Robust selector strategies and debugging (data-testid hierarchy, filter vs nth, anti-patterns)
|
||||
- **timing-debugging.md** - Race condition identification and deterministic wait fixes (network-first, event-based waits)
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping (22 fragments total).
|
||||
|
||||
## Example Output
|
||||
|
||||
### BMad-Integrated Mode
|
||||
|
||||
````markdown
|
||||
# Automation Summary - User Authentication
|
||||
|
||||
**Date:** 2025-10-14
|
||||
**Story:** Epic 3, Story 5
|
||||
**Coverage Target:** critical-paths
|
||||
|
||||
## Tests Created
|
||||
|
||||
### E2E Tests (2 tests, P0-P1)
|
||||
|
||||
- `tests/e2e/user-authentication.spec.ts` (87 lines)
|
||||
- [P0] Login with valid credentials → Dashboard loads
|
||||
- [P1] Display error for invalid credentials
|
||||
|
||||
### API Tests (3 tests, P1-P2)
|
||||
|
||||
- `tests/api/auth.api.spec.ts` (102 lines)
|
||||
- [P1] POST /auth/login - valid credentials → 200 + token
|
||||
- [P1] POST /auth/login - invalid credentials → 401 + error
|
||||
- [P2] POST /auth/login - missing fields → 400 + validation
|
||||
|
||||
### Component Tests (2 tests, P1)
|
||||
|
||||
- `tests/component/LoginForm.test.tsx` (45 lines)
|
||||
- [P1] Empty fields → submit button disabled
|
||||
- [P1] Valid input → submit button enabled
|
||||
|
||||
## Infrastructure Created
|
||||
|
||||
- Fixtures: `tests/support/fixtures/auth.fixture.ts`
|
||||
- Factories: `tests/support/factories/user.factory.ts`
|
||||
|
||||
## Test Execution
|
||||
|
||||
```bash
|
||||
npm run test:e2e # Run all tests
|
||||
npm run test:e2e:p0 # Critical paths only
|
||||
npm run test:e2e:p1 # P0 + P1 tests
|
||||
```
|
||||
|
||||
|
||||
## Coverage Analysis
|
||||
|
||||
**Total:** 7 tests (P0: 1, P1: 5, P2: 1)
|
||||
**Levels:** E2E: 2, API: 3, Component: 2
|
||||
|
||||
✅ All acceptance criteria covered
|
||||
✅ Happy path (E2E + API)
|
||||
✅ Error cases (API)
|
||||
✅ UI validation (Component)
|
||||
|
||||
````
|
||||
|
||||
### Standalone Mode
|
||||
|
||||
```markdown
|
||||
# Automation Summary - src/auth/
|
||||
|
||||
**Date:** 2025-10-14
|
||||
**Target:** src/auth/ (standalone analysis)
|
||||
**Coverage Target:** critical-paths
|
||||
|
||||
## Feature Analysis
|
||||
|
||||
**Source Files Analyzed:**
|
||||
- `src/auth/login.ts`
|
||||
- `src/auth/session.ts`
|
||||
- `src/auth/validation.ts`
|
||||
|
||||
**Existing Coverage:** 0 tests found
|
||||
|
||||
**Coverage Gaps:**
|
||||
- ❌ No E2E tests for login flow
|
||||
- ❌ No API tests for /auth/login endpoint
|
||||
- ❌ No unit tests for validateEmail()
|
||||
|
||||
## Tests Created
|
||||
|
||||
{Same structure as BMad-Integrated mode}
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **High Priority (P0-P1):**
|
||||
- Add E2E test for password reset flow
|
||||
- Add API tests for token refresh endpoint
|
||||
|
||||
2. **Medium Priority (P2):**
|
||||
- Add unit tests for session timeout logic
|
||||
```
|
||||
|
||||
|
||||
580
bmad/bmm/workflows/testarch/automate/checklist.md
Normal file
580
bmad/bmm/workflows/testarch/automate/checklist.md
Normal file
@@ -0,0 +1,580 @@
|
||||
# Automate Workflow Validation Checklist
|
||||
|
||||
Use this checklist to validate that the automate workflow has been executed correctly and all deliverables meet quality standards.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting this workflow, verify:
|
||||
|
||||
- [ ] Framework scaffolding configured (playwright.config.ts or cypress.config.ts exists)
|
||||
- [ ] Test directory structure exists (tests/ folder with subdirectories)
|
||||
- [ ] Package.json has test framework dependencies installed
|
||||
|
||||
**Halt only if:** Framework scaffolding is completely missing (run `framework` workflow first)
|
||||
|
||||
**Note:** BMad artifacts (story, tech-spec, PRD) are OPTIONAL - workflow can run without them
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Execution Mode Determination and Context Loading
|
||||
|
||||
### Mode Detection
|
||||
|
||||
- [ ] Execution mode correctly determined:
|
||||
- [ ] BMad-Integrated Mode (story_file variable set) OR
|
||||
- [ ] Standalone Mode (target_feature or target_files set) OR
|
||||
- [ ] Auto-discover Mode (no targets specified)
|
||||
|
||||
### BMad Artifacts (If Available - OPTIONAL)
|
||||
|
||||
- [ ] Story markdown loaded (if `{story_file}` provided)
|
||||
- [ ] Acceptance criteria extracted from story (if available)
|
||||
- [ ] Tech-spec.md loaded (if `{use_tech_spec}` true and file exists)
|
||||
- [ ] Test-design.md loaded (if `{use_test_design}` true and file exists)
|
||||
- [ ] PRD.md loaded (if `{use_prd}` true and file exists)
|
||||
- [ ] **Note**: Absence of BMad artifacts does NOT halt workflow
|
||||
|
||||
### Framework Configuration
|
||||
|
||||
- [ ] Test framework config loaded (playwright.config.ts or cypress.config.ts)
|
||||
- [ ] Test directory structure identified from `{test_dir}`
|
||||
- [ ] Existing test patterns reviewed
|
||||
- [ ] Test runner capabilities noted (parallel execution, fixtures, etc.)
|
||||
|
||||
### Coverage Analysis
|
||||
|
||||
- [ ] Existing test files searched in `{test_dir}` (if `{analyze_coverage}` true)
|
||||
- [ ] Tested features vs untested features identified
|
||||
- [ ] Coverage gaps mapped (tests to source files)
|
||||
- [ ] Existing fixture and factory patterns checked
|
||||
|
||||
### Knowledge Base Fragments Loaded
|
||||
|
||||
- [ ] `test-levels-framework.md` - Test level selection
|
||||
- [ ] `test-priorities.md` - Priority classification (P0-P3)
|
||||
- [ ] `fixture-architecture.md` - Fixture patterns with auto-cleanup
|
||||
- [ ] `data-factories.md` - Factory patterns using faker
|
||||
- [ ] `selective-testing.md` - Targeted test execution strategies
|
||||
- [ ] `ci-burn-in.md` - Flaky test detection patterns
|
||||
- [ ] `test-quality.md` - Test design principles
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Automation Targets Identification
|
||||
|
||||
### Target Determination
|
||||
|
||||
**BMad-Integrated Mode (if story available):**
|
||||
|
||||
- [ ] Acceptance criteria mapped to test scenarios
|
||||
- [ ] Features implemented in story identified
|
||||
- [ ] Existing ATDD tests checked (if any)
|
||||
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
|
||||
|
||||
**Standalone Mode (if no story):**
|
||||
|
||||
- [ ] Specific feature analyzed (if `{target_feature}` specified)
|
||||
- [ ] Specific files analyzed (if `{target_files}` specified)
|
||||
- [ ] Features auto-discovered (if `{auto_discover_features}` true)
|
||||
- [ ] Features prioritized by:
|
||||
- [ ] No test coverage (highest priority)
|
||||
- [ ] Complex business logic
|
||||
- [ ] External integrations (API, database, auth)
|
||||
- [ ] Critical user paths (login, checkout, etc.)
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
|
||||
- [ ] E2E tests identified: Critical user journeys, multi-system integration
|
||||
- [ ] API tests identified: Business logic, service contracts, data transformations
|
||||
- [ ] Component tests identified: UI behavior, interactions, state management
|
||||
- [ ] Unit tests identified: Pure logic, edge cases, error handling
|
||||
|
||||
### Duplicate Coverage Avoidance
|
||||
|
||||
- [ ] Same behavior NOT tested at multiple levels unnecessarily
|
||||
- [ ] E2E used for critical happy path only
|
||||
- [ ] API tests used for business logic variations
|
||||
- [ ] Component tests used for UI interaction edge cases
|
||||
- [ ] Unit tests used for pure logic edge cases
|
||||
|
||||
### Priority Assignment
|
||||
|
||||
- [ ] Test priorities assigned using `test-priorities.md` framework
|
||||
- [ ] P0 tests: Critical paths, security-critical, data integrity
|
||||
- [ ] P1 tests: Important features, integration points, error handling
|
||||
- [ ] P2 tests: Edge cases, less-critical variations, performance
|
||||
- [ ] P3 tests: Nice-to-have, rarely-used features, exploratory
|
||||
- [ ] Priority variables respected:
|
||||
- [ ] `{include_p0}` = true (always include)
|
||||
- [ ] `{include_p1}` = true (high priority)
|
||||
- [ ] `{include_p2}` = true (medium priority)
|
||||
- [ ] `{include_p3}` = false (low priority, skip by default)
|
||||
|
||||
### Coverage Plan Created
|
||||
|
||||
- [ ] Test coverage plan documented
|
||||
- [ ] What will be tested at each level listed
|
||||
- [ ] Priorities assigned to each test
|
||||
- [ ] Coverage strategy clear (critical-paths, comprehensive, or selective)
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Test Infrastructure Generated
|
||||
|
||||
### Fixture Architecture
|
||||
|
||||
- [ ] Existing fixtures checked in `tests/support/fixtures/`
|
||||
- [ ] Fixture architecture created/enhanced (if `{generate_fixtures}` true)
|
||||
- [ ] All fixtures use Playwright's `test.extend()` pattern
|
||||
- [ ] All fixtures have auto-cleanup in teardown
|
||||
- [ ] Common fixtures created/enhanced:
|
||||
- [ ] authenticatedUser (with auto-delete)
|
||||
- [ ] apiRequest (authenticated client)
|
||||
- [ ] mockNetwork (external service mocking)
|
||||
- [ ] testDatabase (with auto-cleanup)
|
||||
|
||||
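A minimal sketch of the `mockNetwork` idea from the list above; the fixture name and the stubbed URL pattern are illustrative assumptions, not generated output:

```typescript
// tests/support/fixtures/network.fixture.ts: illustrative sketch
import { test as base } from '@playwright/test';

export const test = base.extend({
  mockNetwork: async ({ page }, use) => {
    // Setup: stub the external service before any navigation happens (network-first)
    await page.route('**/external-payments/**', (route) =>
      route.fulfill({ status: 200, body: JSON.stringify({ ok: true }) }),
    );
    await use(page);
    // Routes are scoped to this page, so no explicit teardown is needed
  },
});
```
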
### Data Factories
|
||||
|
||||
- [ ] Existing factories checked in `tests/support/factories/`
|
||||
- [ ] Factory architecture created/enhanced (if `{generate_factories}` true)
|
||||
- [ ] All factories use `@faker-js/faker` for random data (no hardcoded values)
|
||||
- [ ] All factories support overrides for specific scenarios
|
||||
- [ ] Common factories created/enhanced:
|
||||
- [ ] User factory (email, password, name, role)
|
||||
- [ ] Product factory (name, price, SKU)
|
||||
- [ ] Order factory (items, total, status)
|
||||
- [ ] Cleanup helpers provided (e.g., deleteUser(), deleteProduct())
|
||||
|
||||
### Helper Utilities
|
||||
|
||||
- [ ] Existing helpers checked in `tests/support/helpers/` (if `{update_helpers}` true)
|
||||
- [ ] Common utilities created/enhanced:
|
||||
- [ ] waitFor (polling for complex conditions)
|
||||
- [ ] retry (retry helper for flaky operations)
|
||||
- [ ] testData (test data generation)
|
||||
- [ ] assertions (custom assertion helpers)
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Test Files Generated
|
||||
|
||||
### Test File Structure
|
||||
|
||||
- [ ] Test files organized correctly:
|
||||
- [ ] `tests/e2e/` for E2E tests
|
||||
- [ ] `tests/api/` for API tests
|
||||
- [ ] `tests/component/` for component tests
|
||||
- [ ] `tests/unit/` for unit tests
|
||||
- [ ] `tests/support/` for fixtures/factories/helpers
|
||||
|
||||
### E2E Tests (If Applicable)
|
||||
|
||||
- [ ] E2E test files created in `tests/e2e/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags ([P0], [P1], [P2], [P3]) in test name
|
||||
- [ ] All tests use data-testid selectors (not CSS classes)
|
||||
- [ ] One assertion per test (atomic design)
|
||||
- [ ] No hard waits or sleeps (explicit waits only)
|
||||
- [ ] Network-first pattern applied (route interception BEFORE navigation)
|
||||
- [ ] Clear Given-When-Then comments in test code
|
||||
|
||||
### API Tests (If Applicable)
|
||||
|
||||
- [ ] API test files created in `tests/api/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] API contracts validated (request/response structure)
|
||||
- [ ] HTTP status codes verified
|
||||
- [ ] Response body validation includes required fields
|
||||
- [ ] Error cases tested (400, 401, 403, 404, 500)
|
||||
- [ ] JWT token format validated (if auth tests)
|
||||
|
||||
### Component Tests (If Applicable)
|
||||
|
||||
- [ ] Component test files created in `tests/component/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] Component mounting works correctly
|
||||
- [ ] Interaction testing covers user actions (click, hover, keyboard)
|
||||
- [ ] State management validated
|
||||
- [ ] Props and events tested
|
||||
|
||||
### Unit Tests (If Applicable)
|
||||
|
||||
- [ ] Unit test files created in `tests/unit/`
|
||||
- [ ] All tests follow Given-When-Then format
|
||||
- [ ] All tests have priority tags in test name
|
||||
- [ ] Pure logic tested (no dependencies)
|
||||
- [ ] Edge cases covered
|
||||
- [ ] Error handling tested
|
||||
|
||||
### Quality Standards Enforced
|
||||
|
||||
- [ ] All tests use Given-When-Then format with clear comments
|
||||
- [ ] All tests have descriptive names with priority tags
|
||||
- [ ] No duplicate tests (same behavior tested multiple times)
|
||||
- [ ] No flaky patterns (race conditions, timing issues)
|
||||
- [ ] No test interdependencies (tests can run in any order)
|
||||
- [ ] Tests are deterministic (same input always produces same result)
|
||||
- [ ] All tests use data-testid selectors (E2E tests)
|
||||
- [ ] No hard waits: `await page.waitForTimeout()` (forbidden)
|
||||
- [ ] No conditional flow: `if (await element.isVisible())` (forbidden)
|
||||
- [ ] No try-catch for test logic (only for cleanup)
|
||||
- [ ] No hardcoded test data (use factories with faker)
|
||||
- [ ] No page object classes (tests are direct and simple)
|
||||
- [ ] No shared state between tests
|
||||
|
||||
### Network-First Pattern Applied
|
||||
|
||||
- [ ] Route interception set up BEFORE navigation (E2E tests with network requests)
|
||||
- [ ] `page.route()` called before `page.goto()` to prevent race conditions
|
||||
- [ ] Network-first pattern verified in all E2E tests that make API calls
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Test Validation and Healing (NEW - Phase 2.5)
|
||||
|
||||
### Healing Configuration
|
||||
|
||||
- [ ] Healing configuration checked:
|
||||
- [ ] `{auto_validate}` setting noted (default: true)
|
||||
- [ ] `{auto_heal_failures}` setting noted (default: false)
|
||||
- [ ] `{max_healing_iterations}` setting noted (default: 3)
|
||||
- [ ] `{use_mcp_healing}` setting noted (default: true)
|
||||
|
||||
### Healing Knowledge Fragments Loaded (If Healing Enabled)
|
||||
|
||||
- [ ] `test-healing-patterns.md` loaded (common failure patterns and fixes)
|
||||
- [ ] `selector-resilience.md` loaded (selector refactoring guide)
|
||||
- [ ] `timing-debugging.md` loaded (race condition fixes)
|
||||
|
||||
### Test Execution and Validation
|
||||
|
||||
- [ ] Generated tests executed (if `{auto_validate}` true)
|
||||
- [ ] Test results captured:
|
||||
- [ ] Total tests run
|
||||
- [ ] Passing tests count
|
||||
- [ ] Failing tests count
|
||||
- [ ] Error messages and stack traces captured
|
||||
|
||||
### Healing Loop (If Enabled and Tests Failed)
|
||||
|
||||
- [ ] Healing loop entered (if `{auto_heal_failures}` true AND tests failed)
|
||||
- [ ] For each failing test:
|
||||
- [ ] Failure pattern identified (selector, timing, data, network, hard wait)
|
||||
- [ ] Appropriate healing strategy applied:
|
||||
- [ ] Stale selector → Replaced with data-testid or ARIA role
|
||||
- [ ] Race condition → Added network-first interception or state waits
|
||||
- [ ] Dynamic data → Replaced hardcoded values with regex/dynamic generation
|
||||
- [ ] Network error → Added route mocking
|
||||
- [ ] Hard wait → Replaced with event-based wait
|
||||
- [ ] Healed test re-run to validate fix
|
||||
- [ ] Iteration count tracked (max 3 attempts)
|
||||
|
||||
### Unfixable Tests Handling
|
||||
|
||||
- [ ] Tests that couldn't be healed after 3 iterations marked with `test.fixme()` (if `{mark_unhealable_as_fixme}` true)
|
||||
- [ ] Detailed comment added to test.fixme() tests:
|
||||
- [ ] What failure occurred
|
||||
- [ ] What healing was attempted (3 iterations)
|
||||
- [ ] Why healing failed
|
||||
- [ ] Manual investigation steps needed
|
||||
- [ ] Original test logic preserved in comments
|
||||
|
||||
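A sketch of what a test marked as unhealable might look like; every detail in the comment block is invented for illustration:

```typescript
// tests/e2e/checkout.spec.ts: illustrative only
import { test } from '@playwright/test';

// Unhealable after 3 healing iterations:
// - Failure: payment iframe never attaches in CI (passes locally)
// - Attempted: network-first interception, event-based waits, selector refactor
// - Why healing failed: third-party widget loads non-deterministically in CI
// - Manual follow-up: stub the payment provider or add a test-only flag
test.fixme('[P1] should complete checkout with saved card', async ({ page }) => {
  // Original test logic preserved for manual investigation
  // await page.goto('/checkout');
  // ...
});
```
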
### Healing Report Generated
|
||||
|
||||
- [ ] Healing report generated (if healing attempted)
|
||||
- [ ] Report includes:
|
||||
- [ ] Auto-heal enabled status
|
||||
- [ ] Healing mode (MCP-assisted or Pattern-based)
|
||||
- [ ] Iterations allowed (max_healing_iterations)
|
||||
- [ ] Validation results (total, passing, failing)
|
||||
- [ ] Successfully healed tests (count, file:line, fix applied)
|
||||
- [ ] Unable to heal tests (count, file:line, reason)
|
||||
- [ ] Healing patterns applied (selector fixes, timing fixes, data fixes)
|
||||
- [ ] Knowledge base references used
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Documentation and Scripts Updated
|
||||
|
||||
### Test README Updated
|
||||
|
||||
- [ ] `tests/README.md` created or updated (if `{update_readme}` true)
|
||||
- [ ] Test suite structure overview included
|
||||
- [ ] Test execution instructions provided (all, specific files, by priority)
|
||||
- [ ] Fixture usage examples provided
|
||||
- [ ] Factory usage examples provided
|
||||
- [ ] Priority tagging convention explained ([P0], [P1], [P2], [P3])
|
||||
- [ ] How to write new tests documented
|
||||
- [ ] Common patterns documented
|
||||
- [ ] Anti-patterns documented (what to avoid)
|
||||
|
||||
### package.json Scripts Updated
|
||||
|
||||
- [ ] package.json scripts added/updated (if `{update_package_scripts}` true)
|
||||
- [ ] `test:e2e` script for all E2E tests
|
||||
- [ ] `test:e2e:p0` script for P0 tests only
|
||||
- [ ] `test:e2e:p1` script for P0 + P1 tests
|
||||
- [ ] `test:api` script for API tests
|
||||
- [ ] `test:component` script for component tests
|
||||
- [ ] `test:unit` script for unit tests (if applicable)
|
||||
|
||||
### Test Suite Executed
|
||||
|
||||
- [ ] Test suite run locally (if `{run_tests_after_generation}` true)
|
||||
- [ ] Test results captured (passing/failing counts)
|
||||
- [ ] No flaky patterns detected (tests are deterministic)
|
||||
- [ ] Setup requirements documented (if any)
|
||||
- [ ] Known issues documented (if any)
|
||||
|
||||
---
|
||||
|
||||
## Step 7: Automation Summary Generated
|
||||
|
||||
### Automation Summary Document
|
||||
|
||||
- [ ] Output file created at `{output_summary}`
|
||||
- [ ] Document includes execution mode (BMad-Integrated, Standalone, Auto-discover)
|
||||
- [ ] Feature analysis included (source files, coverage gaps) - Standalone mode
|
||||
- [ ] Tests created listed (E2E, API, Component, Unit) with counts and paths
|
||||
- [ ] Infrastructure created listed (fixtures, factories, helpers)
|
||||
- [ ] Test execution instructions provided
|
||||
- [ ] Coverage analysis included:
|
||||
- [ ] Total test count
|
||||
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
|
||||
- [ ] Test level breakdown (E2E, API, Component, Unit counts)
|
||||
- [ ] Coverage percentage (if calculated)
|
||||
- [ ] Coverage status (acceptance criteria covered, gaps identified)
|
||||
- [ ] Definition of Done checklist included
|
||||
- [ ] Next steps provided
|
||||
- [ ] Recommendations included (if Standalone mode)
|
||||
|
||||
### Summary Provided to User
|
||||
|
||||
- [ ] Concise summary output provided
|
||||
- [ ] Total tests created across test levels
|
||||
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
|
||||
- [ ] Infrastructure counts (fixtures, factories, helpers)
|
||||
- [ ] Test execution command provided
|
||||
- [ ] Output file path provided
|
||||
- [ ] Next steps listed
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Test Design Quality
|
||||
|
||||
- [ ] Tests are readable (clear Given-When-Then structure)
|
||||
- [ ] Tests are maintainable (use factories/fixtures, not hardcoded data)
|
||||
- [ ] Tests are isolated (no shared state between tests)
|
||||
- [ ] Tests are deterministic (no race conditions or flaky patterns)
|
||||
- [ ] Tests are atomic (one assertion per test)
|
||||
- [ ] Tests are fast (no unnecessary waits or delays)
|
||||
- [ ] Tests are lean (files under {max_file_lines} lines)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
|
||||
- [ ] Priority classification applied (from `test-priorities.md`)
|
||||
- [ ] Fixture architecture patterns applied (from `fixture-architecture.md`)
|
||||
- [ ] Data factory patterns applied (from `data-factories.md`)
|
||||
- [ ] Selective testing strategies considered (from `selective-testing.md`)
|
||||
- [ ] Flaky test detection patterns considered (from `ci-burn-in.md`)
|
||||
- [ ] Test quality principles applied (from `test-quality.md`)
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] All TypeScript types are correct and complete
|
||||
- [ ] No linting errors in generated test files
|
||||
- [ ] Consistent naming conventions followed
|
||||
- [ ] Imports are organized and correct
|
||||
- [ ] Code follows project style guide
|
||||
- [ ] No console.log or debug statements in test code
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With Framework Workflow
|
||||
|
||||
- [ ] Test framework configuration detected and used
|
||||
- [ ] Directory structure matches framework setup
|
||||
- [ ] Fixtures and helpers follow established patterns
|
||||
- [ ] Naming conventions consistent with framework standards
|
||||
|
||||
### With BMad Workflows (If Available - OPTIONAL)
|
||||
|
||||
**With Story Workflow:**
|
||||
|
||||
- [ ] Story ID correctly referenced in output (if story available)
|
||||
- [ ] Acceptance criteria from story reflected in tests (if story available)
|
||||
- [ ] Technical constraints from story considered (if story available)
|
||||
|
||||
**With test-design Workflow:**
|
||||
|
||||
- [ ] P0 scenarios from test-design prioritized (if test-design available)
|
||||
- [ ] Risk assessment from test-design considered (if test-design available)
|
||||
- [ ] Coverage strategy aligned with test-design (if test-design available)
|
||||
|
||||
**With atdd Workflow:**
|
||||
|
||||
- [ ] Existing ATDD tests checked (if story had ATDD workflow run)
|
||||
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
|
||||
- [ ] No duplicate coverage with ATDD tests
|
||||
|
||||
### With CI Pipeline
|
||||
|
||||
- [ ] Tests can run in CI environment
|
||||
- [ ] Tests are parallelizable (no shared state)
|
||||
- [ ] Tests have appropriate timeouts
|
||||
- [ ] Tests clean up their data (no CI environment pollution)
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
All of the following must be true before marking this workflow as complete:
|
||||
|
||||
- [ ] **Execution mode determined** (BMad-Integrated, Standalone, or Auto-discover)
|
||||
- [ ] **Framework configuration loaded** and validated
|
||||
- [ ] **Coverage analysis completed** (gaps identified if analyze_coverage true)
|
||||
- [ ] **Automation targets identified** (what needs testing)
|
||||
- [ ] **Test levels selected** appropriately (E2E, API, Component, Unit)
|
||||
- [ ] **Duplicate coverage avoided** (same behavior not tested at multiple levels)
|
||||
- [ ] **Test priorities assigned** (P0, P1, P2, P3)
|
||||
- [ ] **Fixture architecture created/enhanced** with auto-cleanup
|
||||
- [ ] **Data factories created/enhanced** using faker (no hardcoded data)
|
||||
- [ ] **Helper utilities created/enhanced** (if needed)
|
||||
- [ ] **Test files generated** at appropriate levels (E2E, API, Component, Unit)
|
||||
- [ ] **Given-When-Then format used** consistently across all tests
|
||||
- [ ] **Priority tags added** to all test names ([P0], [P1], [P2], [P3])
|
||||
- [ ] **data-testid selectors used** in E2E tests (not CSS classes)
|
||||
- [ ] **Network-first pattern applied** (route interception before navigation)
|
||||
- [ ] **Quality standards enforced** (no hard waits, no flaky patterns, self-cleaning, deterministic)
|
||||
- [ ] **Test README updated** with execution instructions and patterns
|
||||
- [ ] **package.json scripts updated** with test execution commands
|
||||
- [ ] **Test suite run locally** (if run_tests_after_generation true)
|
||||
- [ ] **Tests validated** (if auto_validate enabled)
|
||||
- [ ] **Failures healed** (if auto_heal_failures enabled and tests failed)
|
||||
- [ ] **Healing report generated** (if healing attempted)
|
||||
- [ ] **Unfixable tests marked** with test.fixme() and detailed comments (if any)
|
||||
- [ ] **Automation summary created** and saved to correct location
|
||||
- [ ] **Output file formatted correctly**
|
||||
- [ ] **Knowledge base references applied** and documented (including healing fragments if used)
|
||||
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data, page objects)
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Resolutions
|
||||
|
||||
### Issue: BMad artifacts not found
|
||||
|
||||
**Problem:** Story, tech-spec, or PRD files not found when variables are set.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- **automate does NOT require BMad artifacts** - they are OPTIONAL enhancements
|
||||
- If files not found, switch to Standalone Mode automatically
|
||||
- Analyze source code directly without BMad context
|
||||
- Continue workflow without halting
|
||||
|
||||
### Issue: Framework configuration not found
|
||||
|
||||
**Problem:** No playwright.config.ts or cypress.config.ts found.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- **HALT workflow** - framework is required
|
||||
- Message: "Framework scaffolding required. Run `bmad tea *framework` first."
|
||||
- User must run framework workflow before automate
|
||||
|
||||
### Issue: No automation targets identified
|
||||
|
||||
**Problem:** Neither story, target_feature, nor target_files specified, and auto-discover finds nothing.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Check if source_dir variable is correct
|
||||
- Verify source code exists in project
|
||||
- Ask user to specify target_feature or target_files explicitly
|
||||
- Provide examples: `target_feature: "src/auth/"` or `target_files: "src/auth/login.ts,src/auth/session.ts"`
|
||||
|
||||
### Issue: Duplicate coverage detected
|
||||
|
||||
**Problem:** Same behavior tested at multiple levels (E2E + API + Component).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Review test level selection framework (test-levels-framework.md)
|
||||
- Use E2E for critical happy path ONLY
|
||||
- Use API for business logic variations
|
||||
- Use Component for UI edge cases
|
||||
- Remove redundant tests that duplicate coverage
|
||||
|
||||
### Issue: Tests have hardcoded data
|
||||
|
||||
**Problem:** Tests use hardcoded email addresses, passwords, or other data.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Replace all hardcoded data with factory function calls
|
||||
- Use faker for all random data generation
|
||||
- Update data-factories to support all required test scenarios
|
||||
- Example: `createUser({ email: faker.internet.email() })`
|
||||
|
||||
### Issue: Tests are flaky
|
||||
|
||||
**Problem:** Tests fail intermittently, pass on retry.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove all hard waits (`page.waitForTimeout()`)
|
||||
- Use explicit waits (`page.waitForSelector()`)
|
||||
- Apply network-first pattern (route interception before navigation)
|
||||
- Remove conditional flow (`if (await element.isVisible())`)
|
||||
- Ensure tests are deterministic (no race conditions)
|
||||
- Run burn-in loop (10 iterations) to detect flakiness
|
||||
|
||||
### Issue: Fixtures don't clean up data
|
||||
|
||||
**Problem:** Test data persists after test run, causing test pollution.
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Ensure all fixtures have cleanup in teardown phase
|
||||
- Cleanup happens AFTER `await use(data)`
|
||||
- Call deletion/cleanup functions (deleteUser, deleteProduct, etc.)
|
||||
- Verify cleanup works by checking database/storage after test run
|
||||
|
||||
### Issue: Tests too slow
|
||||
|
||||
**Problem:** Tests take longer than 90 seconds (max_test_duration).
|
||||
|
||||
**Resolution:**
|
||||
|
||||
- Remove unnecessary waits and delays
|
||||
- Use parallel execution where possible
|
||||
- Mock external services (don't make real API calls)
|
||||
- Use API tests instead of E2E for business logic
|
||||
- Optimize test data creation (use in-memory database, etc.)
|
||||
|
||||
---
|
||||
|
||||
## Notes for TEA Agent
|
||||
|
||||
- **automate is flexible:** Can work with or without BMad artifacts (story, tech-spec, PRD are OPTIONAL)
|
||||
- **Standalone mode is powerful:** Analyze any codebase and generate tests independently
|
||||
- **Auto-discover mode:** Scan codebase for features needing tests when no targets specified
|
||||
- **Framework is the ONLY hard requirement:** HALT if framework config missing, otherwise proceed
|
||||
- **Avoid duplicate coverage:** E2E for critical paths only, API/Component for variations
|
||||
- **Priority tagging enables selective execution:** P0 tests run on every commit, P1 on PR, P2 nightly
|
||||
- **Network-first pattern prevents race conditions:** Route interception BEFORE navigation
|
||||
- **No page objects:** Keep tests simple, direct, and maintainable
|
||||
- **Use knowledge base:** Load relevant fragments (test-levels, test-priorities, fixture-architecture, data-factories, healing patterns) for guidance
|
||||
- **Deterministic tests only:** No hard waits, no conditional flow, no flaky patterns allowed
|
||||
- **Optional healing:** auto_heal_failures disabled by default (opt-in for automatic test healing)
|
||||
- **Graceful degradation:** Healing works without Playwright MCP (pattern-based fallback)
|
||||
- **Unfixable tests handled:** Mark with test.fixme() and detailed comments (not silently broken)
|
||||
1303
bmad/bmm/workflows/testarch/automate/instructions.md
Normal file
1303
bmad/bmm/workflows/testarch/automate/instructions.md
Normal file
File diff suppressed because it is too large
61
bmad/bmm/workflows/testarch/automate/workflow.yaml
Normal file
61
bmad/bmm/workflows/testarch/automate/workflow.yaml
Normal file
@@ -0,0 +1,61 @@
|
||||
# Test Architect workflow: automate
|
||||
name: testarch-automate
|
||||
description: "Expand test automation coverage after implementation or analyze existing codebase to generate comprehensive test suite"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
document_output_language: "{config_source}:document_output_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/automate"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: false
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# Execution mode and targeting
|
||||
standalone_mode: true # Can work without BMad artifacts (true) or integrate with BMad (false)
|
||||
coverage_target: "critical-paths" # critical-paths, comprehensive, selective
|
||||
|
||||
# Directory paths
|
||||
test_dir: "{project-root}/tests" # Root test directory
|
||||
source_dir: "{project-root}/src" # Source code directory
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/automation-summary.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read source code, existing tests, BMad artifacts
|
||||
- write_file # Create test files, fixtures, factories, summaries
|
||||
- create_directory # Create test directories
|
||||
- list_files # Discover features and existing tests
|
||||
- search_repo # Find coverage gaps and patterns
|
||||
- glob # Find test files and source files
|
||||
|
||||
# Recommended inputs (optional - depends on mode)
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with acceptance criteria (optional - BMad mode only)"
|
||||
- tech_spec: "Technical specification (optional - BMad mode only)"
|
||||
- test_design: "Test design document with risk/priority (optional - BMad mode only)"
|
||||
- source_code: "Feature implementation to analyze (required for standalone mode)"
|
||||
- existing_tests: "Current test suite for gap analysis (always helpful)"
|
||||
- framework_config: "Test framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- automation
|
||||
- test-architect
|
||||
- regression
|
||||
- coverage
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
493
bmad/bmm/workflows/testarch/ci/README.md
Normal file
493
bmad/bmm/workflows/testarch/ci/README.md
Normal file
@@ -0,0 +1,493 @@
|
||||
# CI/CD Pipeline Setup Workflow
|
||||
|
||||
Scaffolds a production-ready CI/CD quality pipeline with test execution, burn-in loops for flaky test detection, parallel sharding, and artifact collection. This workflow creates platform-specific CI configuration optimized for fast feedback (< 45 min total) and reliable test execution with 20× speedup over sequential runs.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Test framework is configured and tests pass locally
|
||||
- Team is ready to enable continuous integration
|
||||
- Existing CI pipeline needs optimization or modernization
|
||||
- Burn-in loop is needed for flaky test detection
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Framework config** (playwright.config.ts, cypress.config.ts): Determines test commands and configuration
|
||||
- **package.json**: Dependencies and scripts for caching strategy
|
||||
- **.nvmrc**: Node version for CI (optional, defaults to Node 20 LTS)
|
||||
|
||||
**Optional Context Files:**
|
||||
|
||||
- **Existing CI config**: To update rather than create new
|
||||
- **.git/config**: For CI platform auto-detection
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `ci_platform`: Auto-detected (github-actions/gitlab-ci/circle-ci) or explicit
|
||||
- `test_framework`: Detected from framework config (playwright/cypress)
|
||||
- `parallel_jobs`: Number of parallel shards (default: 4)
|
||||
- `burn_in_enabled`: Enable burn-in loop (default: true)
|
||||
- `burn_in_iterations`: Burn-in iterations (default: 10)
|
||||
- `selective_testing_enabled`: Run only changed tests (default: true)
|
||||
- `artifact_retention_days`: Artifact storage duration (default: 30)
|
||||
- `cache_enabled`: Enable dependency caching (default: true)
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverables:**
|
||||
|
||||
1. **CI Configuration File**
|
||||
- `.github/workflows/test.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- Platform-specific optimizations and best practices
|
||||
|
||||
2. **Pipeline Stages**
|
||||
- **Lint**: Code quality checks (<2 min)
|
||||
- **Test**: Parallel execution with 4 shards (<10 min per shard)
|
||||
- **Burn-In**: Flaky test detection with 10 iterations (<30 min)
|
||||
- **Report**: Aggregate results and publish artifacts
|
||||
|
||||
3. **Helper Scripts**
|
||||
- `scripts/test-changed.sh`: Selective testing (run only affected tests)
|
||||
- `scripts/ci-local.sh`: Local CI mirror for debugging
|
||||
- `scripts/burn-in.sh`: Standalone burn-in execution
|
||||
|
||||
4. **Documentation**
|
||||
- `docs/ci.md`: Pipeline guide, debugging, secrets setup
|
||||
- `docs/ci-secrets-checklist.md`: Required secrets and configuration
|
||||
- Inline comments in CI configuration files
|
||||
|
||||
5. **Optimization Features**
|
||||
- Dependency caching (npm + browser binaries): 2-5 min savings
|
||||
- Parallel sharding: 75% time reduction
|
||||
- Retry logic: Handles transient failures (2 retries)
|
||||
- Failure-only artifacts: Cost-effective debugging
|
||||
|
||||
**Performance Targets:**
|
||||
|
||||
- Lint: <2 minutes
|
||||
- Test (per shard): <10 minutes
|
||||
- Burn-in: <30 minutes
|
||||
- **Total: <45 minutes** (20× faster than sequential)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- ✅ Git repository initialized
|
||||
- ✅ Local tests pass before CI setup
|
||||
- ✅ Framework configuration exists
|
||||
- ✅ CI platform accessible
|
||||
|
||||
## Key Features
|
||||
|
||||
### Burn-In Loop for Flaky Test Detection
|
||||
|
||||
**Critical production pattern:**
|
||||
|
||||
```yaml
|
||||
burn-in:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- run: |
|
||||
for i in {1..10}; do
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
```
|
||||
|
||||
**Purpose**: Runs tests 10 times to catch non-deterministic failures before they reach main branch.
|
||||
|
||||
**When to run:**
|
||||
|
||||
- On PRs to main/develop
|
||||
- Weekly on cron schedule
|
||||
- After test infrastructure changes
|
||||
|
||||
**Failure threshold**: Even ONE failure → tests are flaky, must fix before merging.
|
||||
|
||||
### Parallel Sharding
|
||||
|
||||
**Splits tests across 4 jobs:**
|
||||
|
||||
```yaml
|
||||
strategy:
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
steps:
|
||||
- run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 75% time reduction (40 min → 10 min per shard)
|
||||
- Faster feedback on PRs
|
||||
- Configurable shard count
|
||||
|
||||
### Smart Caching
|
||||
|
||||
**Node modules + browser binaries:**
|
||||
|
||||
```yaml
|
||||
- uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.npm
|
||||
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 2-5 min savings per run
|
||||
- Consistent across builds
|
||||
- Automatic invalidation on dependency changes
|
||||
|
||||
### Selective Testing
|
||||
|
||||
**Run only tests affected by code changes:**
|
||||
|
||||
```bash
# scripts/test-changed.sh
CHANGED_FILES=$(git diff --name-only HEAD~1)
# Derive spec-name stems from the changed files (assumes tests are named after source files)
AFFECTED_TESTS=$(echo "$CHANGED_FILES" | sed -n 's|.*/\([^/.]*\)\.[a-z]*$|\1|p' | paste -sd '|' -)
npm run test:e2e -- --grep="${AFFECTED_TESTS:-.}"
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 50-80% time reduction for focused PRs
|
||||
- Faster feedback cycle
|
||||
- Full suite still runs on main branch
|
||||
|
||||
### Failure-Only Artifacts
|
||||
|
||||
**Upload debugging materials only on test failures:**
|
||||
|
||||
- Traces (Playwright): 5-10 MB per test
|
||||
- Screenshots: 100-500 KB each
|
||||
- Videos: 2-5 MB per test
|
||||
- HTML reports: 1-2 MB
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- Reduces storage costs by 90%
|
||||
- Maintains full debugging capability
|
||||
- 30-day retention default
|
||||
|
||||
### Local CI Mirror
|
||||
|
||||
**Debug CI failures locally:**
|
||||
|
||||
```bash
|
||||
./scripts/ci-local.sh
|
||||
# Runs: lint → test → burn-in (3 iterations)
|
||||
```
|
||||
|
||||
**Mirrors CI environment:**
|
||||
|
||||
- Same Node version
|
||||
- Same commands
|
||||
- Reduced burn-in (3 vs 10 for faster feedback)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `ci-burn-in.md` - Burn-in loop patterns and iterations
|
||||
- `selective-testing.md` - Changed test detection strategies
|
||||
- `visual-debugging.md` - Artifact collection best practices
|
||||
- `test-quality.md` - CI-specific quality criteria
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before ci:**
|
||||
|
||||
- **framework**: Sets up test infrastructure and configuration
|
||||
- **test-design** (optional): Plans test coverage strategy
|
||||
|
||||
**After ci:**
|
||||
|
||||
- **atdd**: Generate failing tests that run in CI
|
||||
- **automate**: Expand test coverage that CI executes
|
||||
- **trace (Phase 2)**: Use CI results for quality gate decisions
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **dev-story**: Tests run in CI after story implementation
|
||||
- **retrospective**: CI metrics inform process improvements
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds CI setup to Quality & Testing Progress section
|
||||
|
||||
## Important Notes
|
||||
|
||||
### CI Platform Auto-Detection
|
||||
|
||||
**GitHub Actions** (default):
|
||||
|
||||
- Auto-selected if `github.com` in git remote
|
||||
- Free 2000 min/month for private repos
|
||||
- Unlimited for public repos
|
||||
- `.github/workflows/test.yml`
|
||||
|
||||
**GitLab CI**:
|
||||
|
||||
- Auto-selected if `gitlab.com` in git remote
|
||||
- Free 400 min/month
|
||||
- `.gitlab-ci.yml`
|
||||
|
||||
**Circle CI** / **Jenkins**:
|
||||
|
||||
- User must specify explicitly
|
||||
- Templates provided for both
|
||||
|
||||
### Burn-In Strategy
|
||||
|
||||
**Iterations:**
|
||||
|
||||
- **3**: Quick feedback (local development)
|
||||
- **10**: Standard (PR checks) ← recommended
|
||||
- **100**: High-confidence (release branches)
|
||||
|
||||
**When to run:**
|
||||
|
||||
- ✅ On PRs to main/develop
|
||||
- ✅ Weekly scheduled (cron)
|
||||
- ✅ After test infra changes
|
||||
- ❌ Not on every commit (too slow)
|
||||
|
||||
**Cost-benefit:**
|
||||
|
||||
- 30 minutes of CI time → Prevents hours of debugging flaky tests
|
||||
|
||||
### Artifact Collection Strategy
|
||||
|
||||
**Failure-only collection:**
|
||||
|
||||
- Saves 90% storage costs
|
||||
- Maintains debugging capability
|
||||
- Automatic cleanup after retention period
|
||||
|
||||
**What to collect:**
|
||||
|
||||
- Traces: Full execution context (Playwright)
|
||||
- Screenshots: Visual evidence
|
||||
- Videos: Interaction playback
|
||||
- HTML reports: Detailed results
|
||||
- Console logs: Error messages
|
||||
|
||||
**What NOT to collect:**
|
||||
|
||||
- Passing test artifacts (waste of space)
|
||||
- Large binaries
|
||||
- Sensitive data (use secrets instead)
|
||||
|
||||
### Selective Testing Trade-offs
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- 50-80% time reduction for focused changes
|
||||
- Faster feedback loop
|
||||
- Lower CI costs
|
||||
|
||||
**Risks:**
|
||||
|
||||
- May miss integration issues
|
||||
- Relies on accurate change detection
|
||||
- False positives if detection is too aggressive
|
||||
|
||||
**Mitigation:**
|
||||
|
||||
- Always run full suite on merge to main
|
||||
- Use burn-in loop on main branch
|
||||
- Monitor for missed issues
|
||||
|
||||
### Parallelism Configuration
|
||||
|
||||
**4 shards** (default):
|
||||
|
||||
- Optimal for 40-80 test files
|
||||
- ~10 min per shard
|
||||
- Balances speed vs resource usage
|
||||
|
||||
**Adjust if:**
|
||||
|
||||
- Tests complete in <5 min → reduce shards
|
||||
- Tests take >15 min → increase shards
|
||||
- CI limits concurrent jobs → reduce shards
|
||||
|
||||
**Formula:**
|
||||
|
||||
```
|
||||
Total test time / Target shard time = Optimal shards
|
||||
Example: 40 min / 10 min = 4 shards
|
||||
```
|
||||
|
||||
### Retry Logic
|
||||
|
||||
**2 retries** (default):
|
||||
|
||||
- Handles transient network issues
|
||||
- Mitigates race conditions
|
||||
- Does NOT mask flaky tests (burn-in catches those)
|
||||
|
||||
**When retries trigger:**
|
||||
|
||||
- Network timeouts
|
||||
- Service unavailability
|
||||
- Resource constraints
|
||||
|
||||
**When retries DON'T help:**
|
||||
|
||||
- Assertion failures (logic errors)
|
||||
- Flaky tests (non-deterministic)
|
||||
- Configuration errors
|
||||
|
||||
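For Playwright projects this usually lives in the config; a minimal sketch, assuming the standard `CI` environment variable is set by the pipeline:

```typescript
// playwright.config.ts (excerpt): retry transient failures in CI only
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // keep local runs at 0 so flakiness stays visible
  timeout: 30_000,                 // per-test timeout so hung tests fail fast instead of stalling a shard
});
```
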
### Notification Setup (Optional)
|
||||
|
||||
**Supported channels:**
|
||||
|
||||
- Slack: Webhook integration
|
||||
- Email: SMTP configuration
|
||||
- Discord: Webhook integration
|
||||
|
||||
**Configuration:**
|
||||
|
||||
```yaml
|
||||
notify_on_failure: true
|
||||
notification_channels: 'slack'
|
||||
# Requires SLACK_WEBHOOK secret in CI settings
|
||||
```
|
||||
|
||||
**Best practice:** Enable for main/develop branches only, not PRs.
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
After workflow completion, verify:
|
||||
|
||||
- [ ] CI configuration file created and syntactically valid
|
||||
- [ ] Burn-in loop configured (10 iterations)
|
||||
- [ ] Parallel sharding enabled (4 jobs)
|
||||
- [ ] Caching configured (dependencies + browsers)
|
||||
- [ ] Artifact collection on failure only
|
||||
- [ ] Helper scripts created and executable
|
||||
- [ ] Documentation complete (ci.md, secrets checklist)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
- [ ] First CI run triggered and passes
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario 1: New GitHub Actions setup**
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
|
||||
# TEA detects:
|
||||
# - GitHub repository (github.com in git remote)
|
||||
# - Playwright framework
|
||||
# - Node 20 from .nvmrc
|
||||
# - 60 test files
|
||||
|
||||
# TEA scaffolds:
|
||||
# - .github/workflows/test.yml
|
||||
# - 4-shard parallel execution
|
||||
# - Burn-in loop (10 iterations)
|
||||
# - Dependency + browser caching
|
||||
# - Failure artifacts (traces, screenshots)
|
||||
# - Helper scripts
|
||||
# - Documentation
|
||||
|
||||
# Result:
|
||||
# Total CI time: 42 minutes (was 8 hours sequential)
|
||||
# - Lint: 1.5 min
|
||||
# - Test (4 shards): 9 min each
|
||||
# - Burn-in: 28 min
|
||||
```
|
||||
|
||||
**Scenario 2: Update existing GitLab CI**
|
||||
|
||||
```bash
|
||||
bmad tea *ci
|
||||
|
||||
# TEA detects:
|
||||
# - Existing .gitlab-ci.yml
|
||||
# - Cypress framework
|
||||
# - No caching configured
|
||||
|
||||
# TEA asks: "Update existing CI or create new?"
|
||||
# User: "Update"
|
||||
|
||||
# TEA enhances:
|
||||
# - Adds burn-in job
|
||||
# - Configures caching (cache: paths)
|
||||
# - Adds parallel: 4
|
||||
# - Updates artifact collection
|
||||
# - Documents secrets needed
|
||||
|
||||
# Result:
|
||||
# CI time reduced from 45 min → 12 min
|
||||
```
|
||||
|
||||
**Scenario 3: Standalone burn-in setup**
|
||||
|
||||
```bash
|
||||
# User wants only burn-in, no full CI
|
||||
bmad tea *ci
|
||||
# Set burn_in_enabled: true, skip other stages
|
||||
|
||||
# TEA creates:
|
||||
# - Minimal workflow with burn-in only
|
||||
# - scripts/burn-in.sh for local testing
|
||||
# - Documentation for running burn-in
|
||||
|
||||
# Use case:
|
||||
# - Validate test stability before full CI setup
|
||||
# - Debug intermittent failures
|
||||
# - Confidence check before release
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Git repository not found"**
|
||||
|
||||
- **Cause**: No .git/ directory
|
||||
- **Solution**: Run `git init` and `git remote add origin <url>`
|
||||
|
||||
**Issue: "Tests fail locally but should set up CI anyway"**
|
||||
|
||||
- **Cause**: Workflow halts if local tests fail
|
||||
- **Solution**: Fix tests first, or temporarily skip preflight (not recommended)
|
||||
|
||||
**Issue: "CI takes longer than 10 min per shard"**
|
||||
|
||||
- **Cause**: Too many tests per shard
|
||||
- **Solution**: Increase shard count (e.g., 4 → 8)
|
||||
|
||||
**Issue: "Burn-in passes locally but fails in CI"**
|
||||
|
||||
- **Cause**: Environment differences (timing, resources)
|
||||
- **Solution**: Use `scripts/ci-local.sh` to mirror CI environment
|
||||
|
||||
**Issue: "Caching not working"**
|
||||
|
||||
- **Cause**: Cache key mismatch or cache limit exceeded
|
||||
- **Solution**: Check cache key formula, verify platform limits
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **framework**: Set up test infrastructure → [framework/README.md](../framework/README.md)
|
||||
- **atdd**: Generate acceptance tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand test coverage → [automate/README.md](../automate/README.md)
|
||||
- **trace**: Traceability and quality gate decisions → [trace/README.md](../trace/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, enhanced workflow.yaml, burn-in loop integration
|
||||
- **v3.x**: XML format instructions, basic CI setup
|
||||
- **v2.x**: Legacy task-based approach
|
||||
246
bmad/bmm/workflows/testarch/ci/checklist.md
Normal file
246
bmad/bmm/workflows/testarch/ci/checklist.md
Normal file
@@ -0,0 +1,246 @@
|
||||
# CI/CD Pipeline Setup - Validation Checklist
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- [ ] Git repository initialized (`.git/` exists)
|
||||
- [ ] Git remote configured (`git remote -v` shows origin)
|
||||
- [ ] Test framework configured (`playwright.config.*` or `cypress.config.*`)
|
||||
- [ ] Local tests pass (`npm run test:e2e` succeeds)
|
||||
- [ ] Team agrees on CI platform
|
||||
- [ ] Access to CI platform settings (if updating)
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Preflight Checks
|
||||
|
||||
- [ ] Git repository validated
|
||||
- [ ] Framework configuration detected
|
||||
- [ ] Local test execution successful
|
||||
- [ ] CI platform detected or selected
|
||||
- [ ] Node version identified (.nvmrc or default)
|
||||
- [ ] No blocking issues found
|
||||
|
||||
### Step 2: CI Pipeline Configuration
|
||||
|
||||
- [ ] CI configuration file created (`.github/workflows/test.yml` or `.gitlab-ci.yml`)
|
||||
- [ ] File is syntactically valid (no YAML errors)
|
||||
- [ ] Correct framework commands configured
|
||||
- [ ] Node version matches project
|
||||
- [ ] Test directory paths correct
|
||||
|
||||
### Step 3: Parallel Sharding
|
||||
|
||||
- [ ] Matrix strategy configured (4 shards default)
|
||||
- [ ] Shard syntax correct for framework
|
||||
- [ ] fail-fast set to false
|
||||
- [ ] Shard count appropriate for test suite size
|
||||
|
||||
### Step 4: Burn-In Loop
|
||||
|
||||
- [ ] Burn-in job created
|
||||
- [ ] 10 iterations configured
|
||||
- [ ] Proper exit on failure (`|| exit 1`)
|
||||
- [ ] Runs on appropriate triggers (PR, cron)
|
||||
- [ ] Failure artifacts uploaded
|
||||
|
||||
### Step 5: Caching Configuration
|
||||
|
||||
- [ ] Dependency cache configured (npm/yarn)
|
||||
- [ ] Cache key uses lockfile hash
|
||||
- [ ] Browser cache configured (Playwright/Cypress)
|
||||
- [ ] Restore-keys defined for fallback
|
||||
- [ ] Cache paths correct for platform
|
||||
|
||||
### Step 6: Artifact Collection
|
||||
|
||||
- [ ] Artifacts upload on failure only
|
||||
- [ ] Correct artifact paths (test-results/, traces/, etc.)
|
||||
- [ ] Retention days set (30 default)
|
||||
- [ ] Artifact names unique per shard
|
||||
- [ ] No sensitive data in artifacts
|
||||
|
||||
### Step 7: Retry Logic
|
||||
|
||||
- [ ] Retry action/strategy configured
|
||||
- [ ] Max attempts: 2-3
|
||||
- [ ] Timeout appropriate (30 min)
|
||||
- [ ] Retry only on transient errors
|
||||
|
||||
### Step 8: Helper Scripts
|
||||
|
||||
- [ ] `scripts/test-changed.sh` created
|
||||
- [ ] `scripts/ci-local.sh` created
|
||||
- [ ] `scripts/burn-in.sh` created (optional)
|
||||
- [ ] Scripts are executable (`chmod +x`)
|
||||
- [ ] Scripts use correct test commands
|
||||
- [ ] Shebang present (`#!/bin/bash`)
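A minimal sketch of verifying the executability and shebang items above (script names as listed in this checklist):

```bash
# Make the generated helper scripts executable
chmod +x scripts/test-changed.sh scripts/ci-local.sh scripts/burn-in.sh

# Verify each script starts with a shebang
head -n 1 scripts/*.sh
```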
|
||||
|
||||
### Step 9: Documentation
|
||||
|
||||
- [ ] `docs/ci.md` created with pipeline guide
|
||||
- [ ] `docs/ci-secrets-checklist.md` created
|
||||
- [ ] Required secrets documented
|
||||
- [ ] Setup instructions clear
|
||||
- [ ] Troubleshooting section included
|
||||
- [ ] Badge URLs provided (optional)
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Configuration Validation
|
||||
|
||||
- [ ] CI file loads without errors
|
||||
- [ ] All paths resolve correctly
|
||||
- [ ] No hardcoded values (use env vars)
|
||||
- [ ] Triggers configured (push, pull_request, schedule)
|
||||
- [ ] Platform-specific syntax correct
|
||||
|
||||
### Execution Validation
|
||||
|
||||
- [ ] First CI run triggered (push to remote)
|
||||
- [ ] Pipeline starts without errors
|
||||
- [ ] All jobs appear in CI dashboard
|
||||
- [ ] Caching works (check logs for cache hit)
|
||||
- [ ] Tests execute in parallel
|
||||
- [ ] Artifacts collected on failure
|
||||
|
||||
### Performance Validation
|
||||
|
||||
- [ ] Lint stage: <2 minutes
|
||||
- [ ] Test stage (per shard): <10 minutes
|
||||
- [ ] Burn-in stage: <30 minutes
|
||||
- [ ] Total pipeline: <45 minutes
|
||||
- [ ] Cache reduces install time by 2-5 minutes
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Best Practices Compliance
|
||||
|
||||
- [ ] Burn-in loop follows production patterns
|
||||
- [ ] Parallel sharding configured optimally
|
||||
- [ ] Failure-only artifact collection
|
||||
- [ ] Selective testing enabled (optional)
|
||||
- [ ] Retry logic handles transient failures only
|
||||
- [ ] No secrets in configuration files
|
||||
|
||||
### Knowledge Base Alignment
|
||||
|
||||
- [ ] Burn-in pattern matches `ci-burn-in.md`
|
||||
- [ ] Selective testing matches `selective-testing.md`
|
||||
- [ ] Artifact collection matches `visual-debugging.md`
|
||||
- [ ] Test quality matches `test-quality.md`
|
||||
|
||||
### Security Checks
|
||||
|
||||
- [ ] No credentials in CI configuration
|
||||
- [ ] Secrets use platform secret management
|
||||
- [ ] Environment variables for sensitive data
|
||||
- [ ] Artifact retention appropriate (not too long)
|
||||
- [ ] No debug output exposing secrets
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] `bmm-workflow-status.md` exists
|
||||
- [ ] CI setup logged in Quality & Testing Progress section
|
||||
- [ ] Status updated with completion timestamp
|
||||
- [ ] Platform and configuration noted
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Relevant knowledge fragments loaded
|
||||
- [ ] Patterns applied from knowledge base
|
||||
- [ ] Documentation references knowledge base
|
||||
- [ ] Knowledge base references in README
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] `framework` workflow completed first
|
||||
- [ ] Can proceed to `atdd` workflow after CI setup
|
||||
- [ ] Can proceed to `automate` workflow
|
||||
- [ ] CI integrates with `trace` workflow (Phase 2 quality gate)
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All must be true:**
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All process steps completed
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] First CI run successful
|
||||
- [ ] Performance targets met
|
||||
- [ ] Documentation complete
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Commit CI configuration
|
||||
2. [ ] Push to remote repository
|
||||
3. [ ] Configure required secrets in CI platform
|
||||
4. [ ] Open PR to trigger first CI run
|
||||
5. [ ] Monitor and verify pipeline execution
|
||||
6. [ ] Adjust parallelism if needed (based on actual run times)
|
||||
7. [ ] Set up notifications (optional)
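For the first two items above, a minimal command sketch (paths assume the GitHub Actions default output):

```bash
git add .github/workflows/test.yml scripts/ docs/ci.md docs/ci-secrets-checklist.md
git commit -m "ci: add test pipeline"
git push origin HEAD
```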
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `atdd` workflow for test generation
|
||||
2. [ ] Run `automate` workflow for coverage expansion
|
||||
3. [ ] Run `trace` workflow (Phase 2) for quality gate decisions
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails:
|
||||
|
||||
1. [ ] Delete CI configuration file
|
||||
2. [ ] Remove helper scripts directory
|
||||
3. [ ] Remove documentation (docs/ci.md, etc.)
|
||||
4. [ ] Clear CI platform secrets (if added)
|
||||
5. [ ] Review error logs
|
||||
6. [ ] Fix issues and retry workflow
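A rollback sketch covering the file removals above, assuming the GitHub Actions layout generated by this workflow:

```bash
# Remove CI configuration, helper scripts, and generated docs
rm -f .github/workflows/test.yml
rm -f scripts/test-changed.sh scripts/ci-local.sh scripts/burn-in.sh
rm -f docs/ci.md docs/ci-secrets-checklist.md
```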
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: CI file syntax errors
|
||||
|
||||
- **Solution**: Validate YAML syntax online or with linter
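For example, a local syntax check, assuming the `js-yaml` npm package or Python's `yamllint` is available:

```bash
# Parse the workflow file; a syntax error exits non-zero (js-yaml assumed installed)
npx js-yaml .github/workflows/test.yml > /dev/null && echo "YAML OK"

# Or, with yamllint installed
yamllint .github/workflows/test.yml
```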
|
||||
|
||||
**Issue**: Tests fail in CI but pass locally
|
||||
|
||||
- **Solution**: Use `scripts/ci-local.sh` to mirror CI environment
|
||||
|
||||
**Issue**: Caching not working
|
||||
|
||||
- **Solution**: Check cache key formula, verify paths
|
||||
|
||||
**Issue**: Burn-in too slow
|
||||
|
||||
- **Solution**: Reduce iterations or run on cron only
|
||||
|
||||
### Platform-Specific
|
||||
|
||||
**GitHub Actions:**
|
||||
|
||||
- Secrets: Repository Settings → Secrets and variables → Actions
|
||||
- Runners: Ubuntu latest recommended
|
||||
- Concurrency limits: 20 jobs for free tier
|
||||
|
||||
**GitLab CI:**
|
||||
|
||||
- Variables: Project Settings → CI/CD → Variables
|
||||
- Runners: Shared or project-specific
|
||||
- Pipeline quota: 400 minutes/month free tier
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items validated.
|
||||
|
||||
**Completed by:** ________________________

**Date:** ________________________

**Platform:** ________________________ (GitHub Actions / GitLab CI)

**Notes:** ________________________________________
|
||||
165
bmad/bmm/workflows/testarch/ci/github-actions-template.yaml
Normal file
165
bmad/bmm/workflows/testarch/ci/github-actions-template.yaml
Normal file
@@ -0,0 +1,165 @@
|
||||
# GitHub Actions CI/CD Pipeline for Test Execution
|
||||
# Generated by BMad TEA Agent - Test Architect Module
|
||||
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
|
||||
|
||||
name: Test Pipeline
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main, develop]
|
||||
pull_request:
|
||||
branches: [main, develop]
|
||||
schedule:
|
||||
# Weekly burn-in on Sundays at 2 AM UTC
|
||||
- cron: "0 2 * * 0"
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
# Lint stage - Code quality checks
|
||||
lint:
|
||||
name: Lint
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 5
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run linter
|
||||
run: npm run lint
|
||||
|
||||
# Test stage - Parallel execution with sharding
|
||||
test:
|
||||
name: Test (Shard ${{ matrix.shard }})
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 30
|
||||
needs: lint
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-playwright-
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Install Playwright browsers
|
||||
run: npx playwright install --with-deps chromium
|
||||
|
||||
- name: Run tests (shard ${{ matrix.shard }}/4)
|
||||
run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
|
||||
|
||||
- name: Upload test results
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results-${{ matrix.shard }}
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
# Burn-in stage - Flaky test detection
|
||||
burn-in:
|
||||
name: Burn-In (Flaky Detection)
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 60
|
||||
needs: test
|
||||
# Only run burn-in on PRs to main/develop or on schedule
|
||||
if: github.event_name == 'pull_request' || github.event_name == 'schedule'
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: ".nvmrc"
|
||||
cache: "npm"
|
||||
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Install Playwright browsers
|
||||
run: npx playwright install --with-deps chromium
|
||||
|
||||
- name: Run burn-in loop (10 iterations)
|
||||
run: |
|
||||
echo "🔥 Starting burn-in loop - detecting flaky tests"
|
||||
for i in {1..10}; do
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
echo "✅ Burn-in complete - no flaky tests detected"
|
||||
|
||||
- name: Upload burn-in failure artifacts
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: burn-in-failures
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
|
||||
# Report stage - Aggregate and publish results
|
||||
report:
|
||||
name: Test Report
|
||||
runs-on: ubuntu-latest
|
||||
needs: [test, burn-in]
|
||||
if: always()
|
||||
|
||||
steps:
|
||||
- name: Download all artifacts
|
||||
uses: actions/download-artifact@v4
|
||||
with:
|
||||
path: artifacts
|
||||
|
||||
- name: Generate summary
|
||||
run: |
|
||||
echo "## Test Execution Summary" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Status**: ${{ needs.test.result }}" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Burn-in**: ${{ needs.burn-in.result }}" >> $GITHUB_STEP_SUMMARY
|
||||
echo "- **Shards**: 4" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
|
||||
if [ "${{ needs.burn-in.result }}" == "failure" ]; then
|
||||
echo "⚠️ **Flaky tests detected** - Review burn-in artifacts" >> $GITHUB_STEP_SUMMARY
|
||||
fi
|
||||
128
bmad/bmm/workflows/testarch/ci/gitlab-ci-template.yaml
Normal file
128
bmad/bmm/workflows/testarch/ci/gitlab-ci-template.yaml
Normal file
@@ -0,0 +1,128 @@
|
||||
# GitLab CI/CD Pipeline for Test Execution
|
||||
# Generated by BMad TEA Agent - Test Architect Module
|
||||
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
|
||||
|
||||
stages:
|
||||
- lint
|
||||
- test
|
||||
- burn-in
|
||||
- report
|
||||
|
||||
variables:
|
||||
# Disable git depth for accurate change detection
|
||||
GIT_DEPTH: 0
|
||||
# Use npm ci for faster, deterministic installs
|
||||
npm_config_cache: "$CI_PROJECT_DIR/.npm"
|
||||
# Playwright browser cache
|
||||
PLAYWRIGHT_BROWSERS_PATH: "$CI_PROJECT_DIR/.cache/ms-playwright"
|
||||
|
||||
# Caching configuration
|
||||
cache:
|
||||
key:
|
||||
files:
|
||||
- package-lock.json
|
||||
paths:
|
||||
- .npm/
|
||||
- .cache/ms-playwright/
|
||||
- node_modules/
|
||||
|
||||
# Lint stage - Code quality checks
|
||||
lint:
|
||||
stage: lint
|
||||
image: node:20
|
||||
script:
|
||||
- npm ci
|
||||
- npm run lint
|
||||
timeout: 5 minutes
|
||||
|
||||
# Test stage - Parallel execution with sharding
|
||||
.test-template: &test-template
|
||||
stage: test
|
||||
image: node:20
|
||||
needs:
|
||||
- lint
|
||||
before_script:
|
||||
- npm ci
|
||||
- npx playwright install --with-deps chromium
|
||||
artifacts:
|
||||
when: on_failure
|
||||
paths:
|
||||
- test-results/
|
||||
- playwright-report/
|
||||
expire_in: 30 days
|
||||
timeout: 30 minutes
|
||||
|
||||
test:shard-1:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=1/4
|
||||
|
||||
test:shard-2:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=2/4
|
||||
|
||||
test:shard-3:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=3/4
|
||||
|
||||
test:shard-4:
|
||||
<<: *test-template
|
||||
script:
|
||||
- npm run test:e2e -- --shard=4/4
|
||||
|
||||
# Burn-in stage - Flaky test detection
|
||||
burn-in:
|
||||
stage: burn-in
|
||||
image: node:20
|
||||
needs:
|
||||
- test:shard-1
|
||||
- test:shard-2
|
||||
- test:shard-3
|
||||
- test:shard-4
|
||||
# Only run burn-in on merge requests to main/develop or on schedule
|
||||
rules:
|
||||
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
|
||||
- if: '$CI_PIPELINE_SOURCE == "schedule"'
|
||||
before_script:
|
||||
- npm ci
|
||||
- npx playwright install --with-deps chromium
|
||||
script:
|
||||
- |
|
||||
echo "🔥 Starting burn-in loop - detecting flaky tests"
|
||||
for i in {1..10}; do
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
echo "✅ Burn-in complete - no flaky tests detected"
|
||||
artifacts:
|
||||
when: on_failure
|
||||
paths:
|
||||
- test-results/
|
||||
- playwright-report/
|
||||
expire_in: 30 days
|
||||
timeout: 60 minutes
|
||||
|
||||
# Report stage - Aggregate results
|
||||
report:
|
||||
stage: report
|
||||
image: alpine:latest
|
||||
needs:
|
||||
- test:shard-1
|
||||
- test:shard-2
|
||||
- test:shard-3
|
||||
- test:shard-4
|
||||
- burn-in
|
||||
when: always
|
||||
script:
|
||||
- |
|
||||
echo "## Test Execution Summary"
|
||||
echo ""
|
||||
echo "- Pipeline: $CI_PIPELINE_ID"
|
||||
echo "- Shards: 4"
|
||||
echo "- Branch: $CI_COMMIT_REF_NAME"
|
||||
echo ""
|
||||
echo "View detailed results in job artifacts"
|
||||
517
bmad/bmm/workflows/testarch/ci/instructions.md
Normal file
517
bmad/bmm/workflows/testarch/ci/instructions.md
Normal file
@@ -0,0 +1,517 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# CI/CD Pipeline Setup
|
||||
|
||||
**Workflow ID**: `bmad/bmm/testarch/ci`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Scaffolds a production-ready CI/CD quality pipeline with test execution, burn-in loops for flaky test detection, parallel sharding, artifact collection, and notification configuration. This workflow creates platform-specific CI configuration optimized for fast feedback and reliable test execution.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Git repository is initialized (`.git/` directory exists)
|
||||
- ✅ Local test suite passes (`npm run test:e2e` succeeds)
|
||||
- ✅ Test framework is configured (from `framework` workflow)
|
||||
- ✅ Team agrees on target CI platform (GitHub Actions, GitLab CI, Circle CI, etc.)
|
||||
- ✅ Access to CI platform settings/secrets available (if updating existing pipeline)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Run Preflight Checks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Verify Git Repository**
|
||||
- Check for `.git/` directory
|
||||
- Confirm remote repository configured (`git remote -v`)
|
||||
- If not initialized, HALT with message: "Git repository required for CI/CD setup"
|
||||
|
||||
2. **Validate Test Framework**
|
||||
- Look for `playwright.config.*` or `cypress.config.*`
|
||||
- Read framework configuration to extract:
|
||||
- Test directory location
|
||||
- Test command
|
||||
- Reporter configuration
|
||||
- Timeout settings
|
||||
- If not found, HALT with message: "Run `framework` workflow first to set up test infrastructure"
|
||||
|
||||
3. **Run Local Tests**
|
||||
- Execute `npm run test:e2e` (or equivalent from package.json)
|
||||
- Ensure tests pass before CI setup
|
||||
- If tests fail, HALT with message: "Fix failing tests before setting up CI/CD"
|
||||
|
||||
4. **Detect CI Platform**
|
||||
- Check for existing CI configuration:
|
||||
- `.github/workflows/*.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- `.circleci/config.yml` (Circle CI)
|
||||
- `Jenkinsfile` (Jenkins)
|
||||
- If found, ask user: "Update existing CI configuration or create new?"
|
||||
- If not found, detect platform from the git remote (see the detection sketch after this step's actions):
|
||||
- `github.com` → GitHub Actions (default)
|
||||
- `gitlab.com` → GitLab CI
|
||||
- Ask user if unable to auto-detect
|
||||
|
||||
5. **Read Environment Configuration**
|
||||
- Check for `.nvmrc` to determine Node version
|
||||
- Default to Node 20 LTS if not found
|
||||
- Read `package.json` to identify dependencies (affects caching strategy)
|
||||
|
||||
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
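For the platform detection in step 4, a minimal sketch, assuming an `origin` remote is configured:

```bash
#!/bin/bash
# Guess the CI platform from the git remote URL (fall back to asking the user)
REMOTE_URL=$(git remote get-url origin 2>/dev/null)

case "$REMOTE_URL" in
  *github.com*) echo "github-actions" ;;
  *gitlab.com*) echo "gitlab-ci" ;;
  *)            echo "unknown - ask user" ;;
esac
```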
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Scaffold CI Pipeline
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Select CI Platform Template**
|
||||
|
||||
Based on detection or user preference, use the appropriate template:
|
||||
|
||||
**GitHub Actions** (`.github/workflows/test.yml`):
|
||||
- Most common platform
|
||||
- Excellent caching and matrix support
|
||||
- Free for public repos, generous free tier for private
|
||||
|
||||
**GitLab CI** (`.gitlab-ci.yml`):
|
||||
- Integrated with GitLab
|
||||
- Built-in registry and runners
|
||||
- Powerful pipeline features
|
||||
|
||||
**Circle CI** (`.circleci/config.yml`):
|
||||
- Fast execution with parallelism
|
||||
- Docker-first approach
|
||||
- Enterprise features
|
||||
|
||||
**Jenkins** (`Jenkinsfile`):
|
||||
- Self-hosted option
|
||||
- Maximum customization
|
||||
- Requires infrastructure management
|
||||
|
||||
2. **Generate Pipeline Configuration**
|
||||
|
||||
Use templates from `{installed_path}/` directory:
|
||||
- `github-actions-template.yaml`
- `gitlab-ci-template.yaml`
|
||||
|
||||
**Key pipeline stages:**
|
||||
|
||||
```yaml
|
||||
stages:
|
||||
- lint # Code quality checks
|
||||
- test # Test execution (parallel shards)
|
||||
- burn-in # Flaky test detection
|
||||
- report # Aggregate results and publish
|
||||
```
|
||||
|
||||
3. **Configure Test Execution**
|
||||
|
||||
**Parallel Sharding:**
|
||||
|
||||
```yaml
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
shard: [1, 2, 3, 4]
|
||||
|
||||
steps:
|
||||
- name: Run tests
|
||||
run: npm run test:e2e -- --shard=${{ matrix.shard }}/${{ strategy.job-total }}
|
||||
```
|
||||
|
||||
**Purpose:** Splits tests into N parallel jobs for faster execution (target: <10 min per shard)
|
||||
|
||||
4. **Add Burn-In Loop**
|
||||
|
||||
**Critical pattern from production systems:**
|
||||
|
||||
```yaml
|
||||
burn-in:
|
||||
name: Flaky Test Detection
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Node
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version-file: '.nvmrc'
|
||||
|
||||
- name: Install dependencies
|
||||
run: npm ci
|
||||
|
||||
- name: Run burn-in loop (10 iterations)
|
||||
run: |
|
||||
for i in {1..10}; do
|
||||
echo "🔥 Burn-in iteration $i/10"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
|
||||
- name: Upload failure artifacts
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: burn-in-failures
|
||||
path: test-results/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
**Purpose:** Runs tests multiple times to catch non-deterministic failures before they reach main branch.
|
||||
|
||||
**When to run:**
|
||||
- On pull requests to main/develop
|
||||
- Weekly on cron schedule
|
||||
- After significant test infrastructure changes
|
||||
|
||||
5. **Configure Caching**
|
||||
|
||||
**Node modules cache:**
|
||||
|
||||
```yaml
|
||||
- name: Cache dependencies
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.npm
|
||||
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
|
||||
restore-keys: |
|
||||
${{ runner.os }}-node-
|
||||
```
|
||||
|
||||
**Browser binaries cache (Playwright):**
|
||||
|
||||
```yaml
|
||||
- name: Cache Playwright browsers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: ~/.cache/ms-playwright
|
||||
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
|
||||
```
|
||||
|
||||
**Purpose:** Reduces CI execution time by 2-5 minutes per run.
|
||||
|
||||
6. **Configure Artifact Collection**
|
||||
|
||||
**Failure artifacts only:**
|
||||
|
||||
```yaml
|
||||
- name: Upload test results
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-results-${{ matrix.shard }}
|
||||
path: |
|
||||
test-results/
|
||||
playwright-report/
|
||||
retention-days: 30
|
||||
```
|
||||
|
||||
**Artifacts to collect:**
|
||||
- Traces (Playwright) - full debugging context
|
||||
- Screenshots - visual evidence of failures
|
||||
- Videos - interaction playback
|
||||
- HTML reports - detailed test results
|
||||
- Console logs - error messages and warnings
|
||||
|
||||
7. **Add Retry Logic**
|
||||
|
||||
```yaml
|
||||
- name: Run tests with retries
|
||||
uses: nick-invision/retry@v2
|
||||
with:
|
||||
timeout_minutes: 30
|
||||
max_attempts: 3
|
||||
retry_on: error
|
||||
command: npm run test:e2e
|
||||
```
|
||||
|
||||
**Purpose:** Handles transient failures (network issues, race conditions)
|
||||
|
||||
8. **Configure Notifications** (Optional)
|
||||
|
||||
If `notify_on_failure` is enabled:
|
||||
|
||||
```yaml
|
||||
- name: Notify on failure
|
||||
if: failure()
|
||||
uses: 8398a7/action-slack@v3
|
||||
with:
|
||||
status: ${{ job.status }}
|
||||
text: 'Test failures detected in PR #${{ github.event.pull_request.number }}'
|
||||
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
|
||||
```
|
||||
|
||||
9. **Generate Helper Scripts**
|
||||
|
||||
**Selective testing script** (`scripts/test-changed.sh`):
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Run only tests for changed files
|
||||
|
||||
CHANGED_FILES=$(git diff --name-only HEAD~1)
|
||||
|
||||
if echo "$CHANGED_FILES" | grep -q "src/.*\.ts$"; then
|
||||
echo "Running affected tests..."
|
||||
npm run test:e2e -- --grep="$(echo $CHANGED_FILES | sed 's/src\///g' | sed 's/\.ts//g')"
|
||||
else
|
||||
echo "No test-affecting changes detected"
|
||||
fi
|
||||
```
|
||||
|
||||
**Local mirror script** (`scripts/ci-local.sh`):
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Mirror CI execution locally for debugging
|
||||
|
||||
echo "🔍 Running CI pipeline locally..."
|
||||
|
||||
# Lint
|
||||
npm run lint || exit 1
|
||||
|
||||
# Tests
|
||||
npm run test:e2e || exit 1
|
||||
|
||||
# Burn-in (reduced iterations)
|
||||
for i in {1..3}; do
|
||||
echo "🔥 Burn-in $i/3"
|
||||
npm run test:e2e || exit 1
|
||||
done
|
||||
|
||||
echo "✅ Local CI pipeline passed"
|
||||
```
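
The standalone `scripts/burn-in.sh` listed in the deliverables can reuse the same loop; a minimal sketch, with the iteration count as an assumed positional argument:

```bash
#!/bin/bash
# Standalone burn-in: repeat the suite N times to surface flaky tests
ITERATIONS=${1:-10}

for i in $(seq 1 "$ITERATIONS"); do
  echo "🔥 Burn-in iteration $i/$ITERATIONS"
  npm run test:e2e || exit 1
done

echo "✅ Burn-in complete - no flaky tests detected"
```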
|
||||
|
||||
10. **Generate Documentation**
|
||||
|
||||
**CI README** (`docs/ci.md`):
|
||||
- Pipeline stages and purpose
|
||||
- How to run locally
|
||||
- Debugging failed CI runs
|
||||
- Secrets and environment variables needed
|
||||
- Notification setup
|
||||
- Badge URLs for README
|
||||
|
||||
**Secrets checklist** (`docs/ci-secrets-checklist.md`):
|
||||
- Required secrets list (SLACK_WEBHOOK, etc.)
|
||||
- Where to configure in CI platform
|
||||
- Security best practices
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Deliverables
|
||||
|
||||
### Primary Artifacts Created
|
||||
|
||||
1. **CI Configuration File**
|
||||
- `.github/workflows/test.yml` (GitHub Actions)
|
||||
- `.gitlab-ci.yml` (GitLab CI)
|
||||
- `.circleci/config.yml` (Circle CI)
|
||||
|
||||
2. **Pipeline Stages**
|
||||
- **Lint**: Code quality checks (ESLint, Prettier)
|
||||
- **Test**: Parallel test execution (4 shards)
|
||||
- **Burn-in**: Flaky test detection (10 iterations)
|
||||
- **Report**: Result aggregation and publishing
|
||||
|
||||
3. **Helper Scripts**
|
||||
- `scripts/test-changed.sh` - Selective testing
|
||||
- `scripts/ci-local.sh` - Local CI mirror
|
||||
- `scripts/burn-in.sh` - Standalone burn-in execution
|
||||
|
||||
4. **Documentation**
|
||||
- `docs/ci.md` - CI pipeline guide
|
||||
- `docs/ci-secrets-checklist.md` - Required secrets
|
||||
- Inline comments in CI configuration
|
||||
|
||||
5. **Optimization Features**
|
||||
- Dependency caching (npm, browser binaries)
|
||||
- Parallel sharding (4 jobs default)
|
||||
- Retry logic (2 retries on failure)
|
||||
- Failure-only artifact upload
|
||||
|
||||
### Performance Targets
|
||||
|
||||
- **Lint stage**: <2 minutes
|
||||
- **Test stage** (per shard): <10 minutes
|
||||
- **Burn-in stage**: <30 minutes (10 iterations)
|
||||
- **Total pipeline**: <45 minutes
|
||||
|
||||
**Speedup:** 20× faster than sequential execution through parallelism and caching.
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load relevant knowledge fragments:
|
||||
|
||||
- `ci-burn-in.md` - Burn-in loop patterns: 10-iteration detection, GitHub Actions workflow, shard orchestration, selective execution (678 lines, 4 examples)
|
||||
- `selective-testing.md` - Changed test detection strategies: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
|
||||
- `visual-debugging.md` - Artifact collection best practices: trace viewer, HAR recording, custom artifacts, accessibility integration (522 lines, 5 examples)
|
||||
- `test-quality.md` - CI-specific test quality criteria: deterministic tests, isolated with cleanup, explicit assertions, length/time optimization (658 lines, 5 examples)
|
||||
- `playwright-config.md` - CI-optimized configuration: parallelization, artifact output, project dependencies, sharding (722 lines, 5 examples)
|
||||
|
||||
### CI Platform-Specific Guidance
|
||||
|
||||
**GitHub Actions:**
|
||||
|
||||
- Use `actions/cache` for caching
|
||||
- Matrix strategy for parallelism
|
||||
- Secrets in repository settings
|
||||
- Free 2000 minutes/month for private repos
|
||||
|
||||
**GitLab CI:**
|
||||
|
||||
- Use `.gitlab-ci.yml` in root
|
||||
- `cache:` directive for caching
|
||||
- Parallel execution with `parallel: 4`
|
||||
- Variables in project CI/CD settings
|
||||
|
||||
**Circle CI:**
|
||||
|
||||
- Use `.circleci/config.yml`
|
||||
- Docker executors recommended
|
||||
- Parallelism with `parallelism: 4`
|
||||
- Context for shared secrets
|
||||
|
||||
### Burn-In Loop Strategy
|
||||
|
||||
**When to run:**
|
||||
|
||||
- ✅ On PRs to main/develop branches
|
||||
- ✅ Weekly on schedule (cron)
|
||||
- ✅ After test infrastructure changes
|
||||
- ❌ Not on every commit (too slow)
|
||||
|
||||
**Iterations:**
|
||||
|
||||
- **10 iterations** for thorough detection
|
||||
- **3 iterations** for quick feedback
|
||||
- **100 iterations** for high-confidence stability
|
||||
|
||||
**Failure threshold:**
|
||||
|
||||
- Even ONE failure in burn-in → tests are flaky
|
||||
- Must fix before merging
|
||||
|
||||
### Artifact Retention
|
||||
|
||||
**Failure artifacts only:**
|
||||
|
||||
- Saves storage costs
|
||||
- Maintains debugging capability
|
||||
- 30-day retention default
|
||||
|
||||
**Artifact types:**
|
||||
|
||||
- Traces (Playwright) - 5-10 MB per test
|
||||
- Screenshots - 100-500 KB per screenshot
|
||||
- Videos - 2-5 MB per test
|
||||
- HTML reports - 1-2 MB per run
|
||||
|
||||
### Selective Testing
|
||||
|
||||
**Detect changed files:**
|
||||
|
||||
```bash
|
||||
git diff --name-only HEAD~1
|
||||
```
|
||||
|
||||
**Run affected tests only:**
|
||||
|
||||
- Faster feedback for small changes
|
||||
- Full suite still runs on main branch
|
||||
- Reduces CI time by 50-80% for focused PRs
|
||||
|
||||
**Trade-off:**
|
||||
|
||||
- May miss integration issues
|
||||
- Run full suite at least on merge
|
||||
|
||||
### Local CI Mirror
|
||||
|
||||
**Purpose:** Debug CI failures locally
|
||||
|
||||
**Usage:**
|
||||
|
||||
```bash
|
||||
./scripts/ci-local.sh
|
||||
```
|
||||
|
||||
**Mirrors CI environment:**
|
||||
|
||||
- Same Node version
|
||||
- Same test command
|
||||
- Same stages (lint → test → burn-in)
|
||||
- Reduced burn-in iterations (3 vs 10)
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## CI/CD Pipeline Complete
|
||||
|
||||
**Platform**: GitHub Actions (or GitLab CI, etc.)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Pipeline configuration: .github/workflows/test.yml
|
||||
- ✅ Burn-in loop: 10 iterations for flaky detection
|
||||
- ✅ Parallel sharding: 4 jobs for fast execution
|
||||
- ✅ Caching: Dependencies + browser binaries
|
||||
- ✅ Artifact collection: Failure-only traces/screenshots/videos
|
||||
- ✅ Helper scripts: test-changed.sh, ci-local.sh, burn-in.sh
|
||||
- ✅ Documentation: docs/ci.md, docs/ci-secrets-checklist.md
|
||||
|
||||
**Performance:**
|
||||
|
||||
- Lint: <2 min
|
||||
- Test (per shard): <10 min
|
||||
- Burn-in: <30 min
|
||||
- Total: <45 min (20× speedup vs sequential)
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Commit CI configuration: `git add .github/workflows/test.yml && git commit -m "ci: add test pipeline"`
|
||||
2. Push to remote: `git push`
|
||||
3. Configure required secrets in CI platform settings (see docs/ci-secrets-checklist.md)
|
||||
4. Open a PR to trigger first CI run
|
||||
5. Monitor pipeline execution and adjust parallelism if needed
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Burn-in loop pattern (ci-burn-in.md)
|
||||
- Selective testing strategy (selective-testing.md)
|
||||
- Artifact collection (visual-debugging.md)
|
||||
- Test quality criteria (test-quality.md)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] CI configuration file created and syntactically valid
|
||||
- [ ] Burn-in loop configured (10 iterations)
|
||||
- [ ] Parallel sharding enabled (4 jobs)
|
||||
- [ ] Caching configured (dependencies + browsers)
|
||||
- [ ] Artifact collection on failure only
|
||||
- [ ] Helper scripts created and executable (`chmod +x`)
|
||||
- [ ] Documentation complete (ci.md, secrets checklist)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
53
bmad/bmm/workflows/testarch/ci/workflow.yaml
Normal file
53
bmad/bmm/workflows/testarch/ci/workflow.yaml
Normal file
@@ -0,0 +1,53 @@
|
||||
# Test Architect workflow: ci
|
||||
name: testarch-ci
|
||||
description: "Scaffold CI/CD quality pipeline with test execution, burn-in loops, and artifact collection"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
document_output_language: "{config_source}:document_output_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/ci"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
ci_platform: "auto" # auto, github-actions, gitlab-ci, circle-ci, jenkins - user can override
|
||||
test_dir: "{project-root}/tests" # Root test directory
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{project-root}/.github/workflows/test.yml" # GitHub Actions default
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read .nvmrc, package.json, framework config
|
||||
- write_file # Create CI config, scripts, documentation
|
||||
- create_directory # Create .github/workflows/ (GitLab CI config lives at the repo root as .gitlab-ci.yml)
|
||||
- list_files # Detect existing CI configuration
|
||||
- search_repo # Find test files for selective testing
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- framework_config: "Framework configuration (playwright.config.ts, cypress.config.ts)"
|
||||
- package_json: "Project dependencies and scripts"
|
||||
- nvmrc: ".nvmrc for Node version (optional, defaults to LTS)"
|
||||
- existing_ci: "Existing CI configuration to update (optional)"
|
||||
- git_info: "Git repository information for platform detection"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- ci-cd
|
||||
- test-architect
|
||||
- pipeline
|
||||
- automation
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts, auto-detect when possible
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
340
bmad/bmm/workflows/testarch/framework/README.md
Normal file
340
bmad/bmm/workflows/testarch/framework/README.md
Normal file
@@ -0,0 +1,340 @@
|
||||
# Test Framework Setup Workflow
|
||||
|
||||
Initializes a production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, configuration, and industry best practices. This workflow scaffolds the complete testing infrastructure for modern web applications, providing a robust foundation for test automation.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *framework
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Starting a new project that needs test infrastructure
|
||||
- Migrating from an older testing approach
|
||||
- Setting up testing from scratch
|
||||
- Standardizing test architecture across teams
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **package.json**: Project dependencies and scripts to detect project type and bundler
|
||||
|
||||
**Optional Context Files:**
|
||||
|
||||
- **Architecture docs** (architecture.md, tech-spec.md): Informs framework configuration decisions
|
||||
- **Existing tests**: Detects current framework to avoid conflicts
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `test_framework`: Auto-detected (playwright/cypress) or manually specified
|
||||
- `project_type`: Auto-detected from package.json (react/vue/angular/next/node)
|
||||
- `bundler`: Auto-detected from package.json (vite/webpack/rollup/esbuild)
|
||||
- `test_dir`: Root test directory (default: `{project-root}/tests`)
|
||||
- `use_typescript`: Prefer TypeScript configuration (default: true)
|
||||
- `framework_preference`: Auto-detection or force specific framework (default: "auto")
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverables:**
|
||||
|
||||
1. **Configuration File**
|
||||
- `playwright.config.ts` or `cypress.config.ts` with production-ready settings
|
||||
- Timeouts: action 15s, navigation 30s, test 60s
|
||||
- Reporters: HTML + JUnit XML
|
||||
- Failure-only artifacts (traces, screenshots, videos)
|
||||
|
||||
2. **Directory Structure**
|
||||
|
||||
```
|
||||
tests/
|
||||
├── e2e/ # Test files (organize as needed)
|
||||
├── support/ # Framework infrastructure (key pattern)
|
||||
│ ├── fixtures/ # Test fixtures with auto-cleanup
|
||||
│ │ ├── index.ts # Fixture merging
|
||||
│ │ └── factories/ # Data factories (faker-based)
|
||||
│ ├── helpers/ # Utility functions
|
||||
│ └── page-objects/ # Page object models (optional)
|
||||
└── README.md # Setup and usage guide
|
||||
```
|
||||
|
||||
**Note**: Test organization (e2e/, api/, integration/, etc.) is flexible. The **support/** folder contains reusable fixtures, helpers, and factories - the core framework pattern.
|
||||
|
||||
3. **Environment Configuration**
|
||||
- `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL`, auth credentials
|
||||
- `.nvmrc` with Node version (LTS)
|
||||
|
||||
4. **Test Infrastructure**
|
||||
- Fixture architecture using `mergeTests` pattern
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Sample tests demonstrating best practices
|
||||
- Helper utilities for common operations
|
||||
|
||||
5. **Documentation**
|
||||
- `tests/README.md` with comprehensive setup instructions
|
||||
- Inline comments explaining configuration choices
|
||||
- References to TEA knowledge base
|
||||
|
||||
**Secondary Deliverables:**
|
||||
|
||||
- Updated `package.json` with minimal test script (`test:e2e`)
|
||||
- Sample test demonstrating fixture usage
|
||||
- Network-first testing patterns
|
||||
- Selector strategy guidance (data-testid)
|
||||
|
||||
**Validation Safeguards:**
|
||||
|
||||
- ✅ No existing framework detected (prevents conflicts)
|
||||
- ✅ package.json exists and is valid
|
||||
- ✅ Framework auto-detection successful or explicit choice provided
|
||||
- ✅ Sample test runs successfully
|
||||
- ✅ All generated files are syntactically correct
|
||||
|
||||
## Key Features
|
||||
|
||||
### Smart Framework Selection
|
||||
|
||||
- **Auto-detection logic** based on project characteristics:
|
||||
- **Playwright** recommended for: Large repos (100+ files), performance-critical apps, multi-browser support, complex debugging needs
|
||||
- **Cypress** recommended for: Small teams prioritizing DX, component testing focus, real-time test development
|
||||
- Falls back to Playwright as default if uncertain
|
||||
|
||||
### Production-Ready Patterns
|
||||
|
||||
- **Fixture Architecture**: Pure function → fixture → `mergeTests` composition pattern
|
||||
- **Auto-Cleanup**: Fixtures automatically clean up test data in teardown
|
||||
- **Network-First**: Route interception before navigation to prevent race conditions
|
||||
- **Failure-Only Artifacts**: Screenshots/videos/traces only captured on failure to reduce storage
|
||||
- **Parallel Execution**: Configured for optimal CI performance
|
||||
|
||||
### Industry Best Practices
|
||||
|
||||
- **Selector Strategy**: Prescriptive guidance on `data-testid` attributes
|
||||
- **Data Factories**: Faker-based factories for realistic test data
|
||||
- **Contract Testing**: Recommends Pact for microservices architectures
|
||||
- **Error Handling**: Comprehensive timeout and retry configuration
|
||||
- **Reporting**: Multiple reporter formats (HTML, JUnit, console)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → mergeTests pattern
|
||||
- `data-factories.md` - Faker-based factories with auto-cleanup
|
||||
- `network-first.md` - Network interception before navigation
|
||||
- `playwright-config.md` - Playwright-specific best practices
|
||||
- `test-config.md` - General configuration guidelines
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before framework:**
|
||||
|
||||
- **prd** (Phase 2): Determines project scope and testing needs
|
||||
- **workflow-status**: Verifies project readiness
|
||||
|
||||
**After framework:**
|
||||
|
||||
- **ci**: Scaffold CI/CD pipeline using framework configuration
|
||||
- **test-design**: Plan test coverage strategy for the project
|
||||
- **atdd**: Generate failing acceptance tests using the framework
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **architecture** (Phase 3): Aligns test structure with system architecture
|
||||
- **tech-spec**: Uses technical specifications to inform test configuration
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds framework initialization to Quality & Testing Progress section
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Preflight Checks
|
||||
|
||||
**Critical requirements** verified before scaffolding:
|
||||
|
||||
- package.json exists in project root
|
||||
- No modern E2E framework already configured
|
||||
- Architecture/stack context available
|
||||
|
||||
If any check fails, workflow **HALTS** and notifies user.
|
||||
|
||||
### Framework-Specific Guidance
|
||||
|
||||
**Playwright Advantages:**
|
||||
|
||||
- Worker parallelism (significantly faster for large suites)
|
||||
- Trace viewer (powerful debugging with screenshots, network, console logs)
|
||||
- Multi-language support (TypeScript, JavaScript, Python, C#, Java)
|
||||
- Built-in API testing capabilities
|
||||
- Better handling of multiple browser contexts
|
||||
|
||||
**Cypress Advantages:**
|
||||
|
||||
- Superior developer experience (real-time reloading)
|
||||
- Excellent for component testing
|
||||
- Simpler setup for small teams
|
||||
- Better suited for watch mode during development
|
||||
|
||||
**Avoid Cypress when:**
|
||||
|
||||
- API chains are heavy and complex
|
||||
- Multi-tab/window scenarios are common
|
||||
- Worker parallelism is critical for CI performance
|
||||
|
||||
### Selector Strategy
|
||||
|
||||
**Always recommend:**
|
||||
|
||||
- `data-testid` attributes for UI elements (framework-agnostic)
|
||||
- `data-cy` attributes if Cypress is chosen (Cypress-specific)
|
||||
- Avoid brittle CSS selectors or XPath
|
||||
|
||||
### Standalone Operation
|
||||
|
||||
This workflow operates independently:
|
||||
|
||||
- **No story required**: Can be run at project initialization
|
||||
- **No epic context needed**: Works for greenfield and brownfield projects
|
||||
- **Autonomous**: Auto-detects configuration and proceeds without user input
|
||||
|
||||
### Output Summary Format
|
||||
|
||||
After completion, provides structured summary:
|
||||
|
||||
```markdown
|
||||
## Framework Scaffold Complete
|
||||
|
||||
**Framework Selected**: Playwright (or Cypress)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Configuration file: playwright.config.ts
|
||||
- ✅ Directory structure: tests/e2e/, tests/support/
|
||||
- ✅ Environment config: .env.example
|
||||
- ✅ Node version: .nvmrc
|
||||
- ✅ Fixture architecture: tests/support/fixtures/
|
||||
- ✅ Data factories: tests/support/fixtures/factories/
|
||||
- ✅ Sample tests: tests/e2e/example.spec.ts
|
||||
- ✅ Documentation: tests/README.md
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Copy .env.example to .env and fill in environment variables
|
||||
2. Run npm install to install test dependencies
|
||||
3. Run npm run test:e2e to execute sample tests
|
||||
4. Review tests/README.md for detailed setup instructions
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture pattern (pure functions + mergeTests)
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Network-first testing safeguards
|
||||
- Failure-only artifact capture
|
||||
```
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
After workflow completion, verify:
|
||||
|
||||
- [ ] Configuration file created and syntactically valid
|
||||
- [ ] Directory structure exists with all folders
|
||||
- [ ] Environment configuration generated (.env.example, .nvmrc)
|
||||
- [ ] Sample tests run successfully (npm run test:e2e)
|
||||
- [ ] Documentation complete and accurate (tests/README.md)
|
||||
- [ ] No errors or warnings during scaffold
|
||||
- [ ] package.json scripts updated correctly
|
||||
- [ ] Fixtures and factories follow patterns from knowledge base
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario 1: New React + Vite project**
|
||||
|
||||
```bash
|
||||
# User runs framework workflow
|
||||
bmad tea *framework
|
||||
|
||||
# TEA detects:
|
||||
# - React project (from package.json)
|
||||
# - Vite bundler
|
||||
# - No existing test framework
|
||||
# - 150+ files (recommends Playwright)
|
||||
|
||||
# TEA scaffolds:
|
||||
# - playwright.config.ts with Vite detection
|
||||
# - Component testing configuration
|
||||
# - React Testing Library helpers
|
||||
# - Sample component + E2E tests
|
||||
```
|
||||
|
||||
**Scenario 2: Existing Node.js API project**
|
||||
|
||||
```bash
|
||||
# User runs framework workflow
|
||||
bmad tea *framework
|
||||
|
||||
# TEA detects:
|
||||
# - Node.js backend (no frontend framework)
|
||||
# - Express framework
|
||||
# - Small project (50 files)
|
||||
# - API endpoints in routes/
|
||||
|
||||
# TEA scaffolds:
|
||||
# - playwright.config.ts focused on API testing
|
||||
# - tests/api/ directory structure
|
||||
# - API helper utilities
|
||||
# - Sample API tests with auth
|
||||
```
|
||||
|
||||
**Scenario 3: Cypress preferred (explicit)**
|
||||
|
||||
```bash
|
||||
# User sets framework preference
|
||||
# (in workflow config: framework_preference: "cypress")
|
||||
|
||||
bmad tea *framework
|
||||
|
||||
# TEA scaffolds:
|
||||
# - cypress.config.ts
|
||||
# - tests/e2e/ with Cypress patterns
|
||||
# - Cypress-specific commands
|
||||
# - data-cy selector strategy
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Existing test framework detected"**
|
||||
|
||||
- **Cause**: playwright.config.* or cypress.config.* already exists
|
||||
- **Solution**: Use `upgrade-framework` workflow (TBD) or manually remove existing config
|
||||
|
||||
**Issue: "Cannot detect project type"**
|
||||
|
||||
- **Cause**: package.json missing or malformed
|
||||
- **Solution**: Ensure package.json exists and has valid dependencies
|
||||
|
||||
**Issue: "Sample test fails to run"**
|
||||
|
||||
- **Cause**: Missing dependencies or incorrect BASE_URL
|
||||
- **Solution**: Run `npm install` and configure `.env` with correct URLs
|
||||
|
||||
**Issue: "TypeScript compilation errors"**
|
||||
|
||||
- **Cause**: Missing @types packages or tsconfig misconfiguration
|
||||
- **Solution**: Ensure TypeScript and type definitions are installed
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **ci**: Scaffold CI/CD pipeline → [ci/README.md](../ci/README.md)
|
||||
- **test-design**: Plan test coverage → [test-design/README.md](../test-design/README.md)
|
||||
- **atdd**: Generate acceptance tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand regression suite → [automate/README.md](../automate/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, enhanced workflow.yaml, comprehensive README
|
||||
- **v3.x**: XML format instructions
|
||||
- **v2.x**: Legacy task-based approach
|
||||
321
bmad/bmm/workflows/testarch/framework/checklist.md
Normal file
321
bmad/bmm/workflows/testarch/framework/checklist.md
Normal file
@@ -0,0 +1,321 @@
|
||||
# Test Framework Setup - Validation Checklist
|
||||
|
||||
This checklist ensures the framework workflow completes successfully and all deliverables meet quality standards.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting the workflow:
|
||||
|
||||
- [ ] Project root contains valid `package.json`
|
||||
- [ ] No existing modern E2E framework detected (`playwright.config.*`, `cypress.config.*`)
|
||||
- [ ] Project type identifiable (React, Vue, Angular, Next.js, Node, etc.)
|
||||
- [ ] Bundler identifiable (Vite, Webpack, Rollup, esbuild) or not applicable
|
||||
- [ ] User has write permissions to create directories and files
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Preflight Checks
|
||||
|
||||
- [ ] package.json successfully read and parsed
|
||||
- [ ] Project type extracted correctly
|
||||
- [ ] Bundler identified (or marked as N/A for backend projects)
|
||||
- [ ] No framework conflicts detected
|
||||
- [ ] Architecture documents located (if available)
|
||||
|
||||
### Step 2: Framework Selection
|
||||
|
||||
- [ ] Framework auto-detection logic executed
|
||||
- [ ] Framework choice justified (Playwright vs Cypress)
|
||||
- [ ] Framework preference respected (if explicitly set)
|
||||
- [ ] User notified of framework selection and rationale
|
||||
|
||||
### Step 3: Directory Structure
|
||||
|
||||
- [ ] `tests/` root directory created
|
||||
- [ ] `tests/e2e/` directory created (or user's preferred structure)
|
||||
- [ ] `tests/support/` directory created (critical pattern)
|
||||
- [ ] `tests/support/fixtures/` directory created
|
||||
- [ ] `tests/support/fixtures/factories/` directory created
|
||||
- [ ] `tests/support/helpers/` directory created
|
||||
- [ ] `tests/support/page-objects/` directory created (if applicable)
|
||||
- [ ] All directories have correct permissions
|
||||
|
||||
**Note**: Test organization is flexible (e2e/, api/, integration/). The **support/** folder is the key pattern.
|
||||
|
||||
### Step 4: Configuration Files
|
||||
|
||||
- [ ] Framework config file created (`playwright.config.ts` or `cypress.config.ts`)
|
||||
- [ ] Config file uses TypeScript (if `use_typescript: true`)
|
||||
- [ ] Timeouts configured correctly (action: 15s, navigation: 30s, test: 60s)
|
||||
- [ ] Base URL configured with environment variable fallback
|
||||
- [ ] Trace/screenshot/video set to retain-on-failure
|
||||
- [ ] Multiple reporters configured (HTML + JUnit + console)
|
||||
- [ ] Parallel execution enabled
|
||||
- [ ] CI-specific settings configured (retries, workers)
|
||||
- [ ] Config file is syntactically valid (no compilation errors)
|
||||
|
||||
### Step 5: Environment Configuration
|
||||
|
||||
- [ ] `.env.example` created in project root
|
||||
- [ ] `TEST_ENV` variable defined
|
||||
- [ ] `BASE_URL` variable defined with default
|
||||
- [ ] `API_URL` variable defined (if applicable)
|
||||
- [ ] Authentication variables defined (if applicable)
|
||||
- [ ] Feature flag variables defined (if applicable)
|
||||
- [ ] `.nvmrc` created with appropriate Node version
|
||||
|
||||
### Step 6: Fixture Architecture
|
||||
|
||||
- [ ] `tests/support/fixtures/index.ts` created
|
||||
- [ ] Base fixture extended from Playwright/Cypress
|
||||
- [ ] Type definitions for fixtures created
|
||||
- [ ] mergeTests pattern implemented (if multiple fixtures)
|
||||
- [ ] Auto-cleanup logic included in fixtures
|
||||
- [ ] Fixture architecture follows knowledge base patterns
|
||||
|
||||
### Step 7: Data Factories
|
||||
|
||||
- [ ] At least one factory created (e.g., UserFactory)
|
||||
- [ ] Factories use @faker-js/faker for realistic data
|
||||
- [ ] Factories track created entities (for cleanup)
|
||||
- [ ] Factories implement `cleanup()` method
|
||||
- [ ] Factories integrate with fixtures
|
||||
- [ ] Factories follow knowledge base patterns
|
||||
|
||||
### Step 8: Sample Tests
|
||||
|
||||
- [ ] Example test file created (`tests/e2e/example.spec.ts`)
|
||||
- [ ] Test uses fixture architecture
|
||||
- [ ] Test demonstrates data factory usage
|
||||
- [ ] Test uses proper selector strategy (data-testid)
|
||||
- [ ] Test follows Given-When-Then structure
|
||||
- [ ] Test includes proper assertions
|
||||
- [ ] Network interception demonstrated (if applicable)
|
||||
|
||||
### Step 9: Helper Utilities
|
||||
|
||||
- [ ] API helper created (if API testing needed)
|
||||
- [ ] Network helper created (if network mocking needed)
|
||||
- [ ] Auth helper created (if authentication needed)
|
||||
- [ ] Helpers follow functional patterns
|
||||
- [ ] Helpers have proper error handling
|
||||
|
||||
### Step 10: Documentation
|
||||
|
||||
- [ ] `tests/README.md` created
|
||||
- [ ] Setup instructions included
|
||||
- [ ] Running tests section included
|
||||
- [ ] Architecture overview section included
|
||||
- [ ] Best practices section included
|
||||
- [ ] CI integration section included
|
||||
- [ ] Knowledge base references included
|
||||
- [ ] Troubleshooting section included
|
||||
|
||||
### Step 11: Package.json Updates
|
||||
|
||||
- [ ] Minimal test script added to package.json: `test:e2e`
|
||||
- [ ] Test framework dependency added (if not already present)
|
||||
- [ ] Type definitions added (if TypeScript)
|
||||
- [ ] Users can extend with additional scripts as needed
|
||||
|
||||
---
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Configuration Validation
|
||||
|
||||
- [ ] Config file loads without errors
|
||||
- [ ] Config file passes linting (if linter configured)
|
||||
- [ ] Config file uses correct syntax for chosen framework
|
||||
- [ ] All paths in config resolve correctly
|
||||
- [ ] Reporter output directories exist or are created on test run
|
||||
|
||||
### Test Execution Validation
|
||||
|
||||
- [ ] Sample test runs successfully
|
||||
- [ ] Test execution produces expected output (pass/fail)
|
||||
- [ ] Test artifacts generated correctly (traces, screenshots, videos)
|
||||
- [ ] Test report generated successfully
|
||||
- [ ] No console errors or warnings during test run
|
||||
|
||||
### Directory Structure Validation
|
||||
|
||||
- [ ] All required directories exist
|
||||
- [ ] Directory structure matches framework conventions
|
||||
- [ ] No duplicate or conflicting directories
|
||||
- [ ] Directories accessible with correct permissions
|
||||
|
||||
### File Integrity Validation
|
||||
|
||||
- [ ] All generated files are syntactically correct
|
||||
- [ ] No placeholder text left in files (e.g., "TODO", "FIXME")
|
||||
- [ ] All imports resolve correctly
|
||||
- [ ] No hardcoded credentials or secrets in files
|
||||
- [ ] All file paths use correct separators for OS
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Code Quality
|
||||
|
||||
- [ ] Generated code follows project coding standards
|
||||
- [ ] TypeScript types are complete and accurate (no `any` unless necessary)
|
||||
- [ ] No unused imports or variables
|
||||
- [ ] Consistent code formatting (matches project style)
|
||||
- [ ] No linting errors in generated files
|
||||
|
||||
### Best Practices Compliance
|
||||
|
||||
- [ ] Fixture architecture follows pure function → fixture → mergeTests pattern
|
||||
- [ ] Data factories implement auto-cleanup
|
||||
- [ ] Network interception occurs before navigation
|
||||
- [ ] Selectors use data-testid strategy
|
||||
- [ ] Artifacts only captured on failure
|
||||
- [ ] Tests follow Given-When-Then structure
|
||||
- [ ] No hard-coded waits or sleeps
|
||||
|
||||
### Knowledge Base Alignment
|
||||
|
||||
- [ ] Fixture pattern matches `fixture-architecture.md`
|
||||
- [ ] Data factories match `data-factories.md`
|
||||
- [ ] Network handling matches `network-first.md`
|
||||
- [ ] Config follows `playwright-config.md` or `test-config.md`
|
||||
- [ ] Test quality matches `test-quality.md`
|
||||
|
||||
### Security Checks
|
||||
|
||||
- [ ] No credentials in configuration files
|
||||
- [ ] .env.example contains placeholders, not real values
|
||||
- [ ] Sensitive test data handled securely
|
||||
- [ ] API keys and tokens use environment variables
|
||||
- [ ] No secrets committed to version control
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] `bmm-workflow-status.md` exists
|
||||
- [ ] Framework initialization logged in Quality & Testing Progress section
|
||||
- [ ] Status file updated with completion timestamp
|
||||
- [ ] Status file shows framework: Playwright or Cypress
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] Relevant knowledge fragments identified from tea-index.csv
|
||||
- [ ] Knowledge fragments successfully loaded
|
||||
- [ ] Patterns from knowledge base applied correctly
|
||||
- [ ] Knowledge base references included in documentation
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] Can proceed to `ci` workflow after completion
|
||||
- [ ] Can proceed to `test-design` workflow after completion
|
||||
- [ ] Can proceed to `atdd` workflow after completion
|
||||
- [ ] Framework setup compatible with downstream workflows
|
||||
|
||||
---
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All of the following must be true:**
|
||||
|
||||
- [ ] All prerequisite checks passed
|
||||
- [ ] All process steps completed without errors
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] Sample test executes successfully
|
||||
- [ ] User can run `npm run test:e2e` without errors
|
||||
- [ ] Documentation is complete and accurate
|
||||
- [ ] No critical issues or blockers identified
|
||||
|
||||
---
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Copy `.env.example` to `.env`
|
||||
2. [ ] Fill in environment-specific values in `.env`
|
||||
3. [ ] Run `npm install` to install test dependencies
|
||||
4. [ ] Run `npm run test:e2e` to verify setup
|
||||
5. [ ] Review `tests/README.md` for project-specific guidance
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `ci` workflow to set up CI/CD pipeline
|
||||
2. [ ] Run `test-design` workflow to plan test coverage
|
||||
3. [ ] Run `atdd` workflow when ready to develop stories
|
||||
|
||||
---
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails and needs to be rolled back:
|
||||
|
||||
1. [ ] Delete `tests/` directory
|
||||
2. [ ] Remove test scripts from package.json
|
||||
3. [ ] Delete `.env.example` (if created)
|
||||
4. [ ] Delete `.nvmrc` (if created)
|
||||
5. [ ] Delete framework config file
|
||||
6. [ ] Remove test dependencies from package.json (if added)
|
||||
7. [ ] Run `npm install` to clean up node_modules
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: Config file has TypeScript errors
|
||||
|
||||
- **Solution**: Ensure `@playwright/test` or `cypress` types are installed
|
||||
|
||||
**Issue**: Sample test fails to run
|
||||
|
||||
- **Solution**: Check `BASE_URL` in `.env` and ensure the application under test is running
|
||||
|
||||
**Issue**: Fixture cleanup not working
|
||||
|
||||
- **Solution**: Verify cleanup() is called in fixture teardown
|
||||
|
||||
**Issue**: Network interception not working
|
||||
|
||||
- **Solution**: Ensure route setup occurs before page.goto()
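
A minimal Playwright sketch of the correct ordering (the endpoint, payload, and test id are illustrative assumptions):

```typescript
import { test, expect } from '@playwright/test';

test('stubs the users API before navigating', async ({ page }) => {
  // Register the intercept first...
  await page.route('**/api/users', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([{ id: 1, name: 'Test User' }]),
    }),
  );
  // ...then navigate, so even the very first request hits the stub.
  await page.goto('/users');
  await expect(page.getByTestId('user-row')).toHaveCount(1);
});
```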
|
||||
|
||||
### Framework-Specific Considerations
|
||||
|
||||
**Playwright:**
|
||||
|
||||
- Requires Node.js 18+
|
||||
- Browser binaries installed with `npx playwright install` (run once after `npm install`)
|
||||
- Trace viewer requires running `npx playwright show-trace`
|
||||
|
||||
**Cypress:**
|
||||
|
||||
- Requires Node.js 18+
|
||||
- Cypress app opens on first run
|
||||
- Component testing requires additional setup
|
||||
|
||||
### Version Compatibility
|
||||
|
||||
- [ ] Node.js version matches .nvmrc
|
||||
- [ ] Framework version compatible with Node.js version
|
||||
- [ ] TypeScript version compatible with framework
|
||||
- [ ] All peer dependencies satisfied
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items are checked and validated.
|
||||
|
||||
**Completed by:** ________________
|
||||
**Date:** ________________
|
||||
**Framework:** ________________ (Playwright / Cypress)
|
||||
**Notes:** ________________________________________
|
||||
455
bmad/bmm/workflows/testarch/framework/instructions.md
Normal file
455
bmad/bmm/workflows/testarch/framework/instructions.md
Normal file
@@ -0,0 +1,455 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Test Framework Setup
|
||||
|
||||
**Workflow ID**: `bmad/bmm/testarch/framework`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Initialize a production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, configuration, and best practices. This workflow scaffolds the complete testing infrastructure for modern web applications.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ `package.json` exists in project root
|
||||
- ✅ No modern E2E test harness is already configured (check for existing `playwright.config.*` or `cypress.config.*`)
|
||||
- ✅ Architectural/stack context available (project type, bundler, dependencies)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Run Preflight Checks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Validate package.json**
|
||||
- Read `{project-root}/package.json`
|
||||
- Extract project type (React, Vue, Angular, Next.js, Node, etc.)
|
||||
- Identify bundler (Vite, Webpack, Rollup, esbuild)
|
||||
- Note existing test dependencies
|
||||
|
||||
2. **Check for Existing Framework**
|
||||
- Search for `playwright.config.*`, `cypress.config.*`, `cypress.json`
|
||||
- Check `package.json` for `@playwright/test` or `cypress` dependencies
|
||||
- If found, HALT with message: "Existing test framework detected. Use workflow `upgrade-framework` instead."
|
||||
|
||||
3. **Gather Context**
|
||||
- Look for architecture documents (`architecture.md`, `tech-spec*.md`)
|
||||
- Check for API documentation or endpoint lists
|
||||
- Identify authentication requirements
|
||||
|
||||
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
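
A minimal shell sketch of the existing-framework check from action 2 above, run from the project root (file patterns follow the list in that action):

```bash
# Any hit below means a harness already exists; halt and point the user to `upgrade-framework`.
ls playwright.config.* cypress.config.* cypress.json 2>/dev/null
grep -E '"(@playwright/test|cypress)":' package.json
```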
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Scaffold Framework
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Framework Selection**
|
||||
|
||||
**Default Logic:**
|
||||
- **Playwright** (recommended for):
|
||||
- Large repositories (100+ files)
|
||||
- Performance-critical applications
|
||||
- Multi-browser support needed
|
||||
- Complex user flows requiring video/trace debugging
|
||||
- Projects requiring worker parallelism
|
||||
|
||||
- **Cypress** (recommended for):
|
||||
- Small teams prioritizing developer experience
|
||||
- Component testing focus
|
||||
- Real-time reloading during test development
|
||||
- Simpler setup requirements
|
||||
|
||||
**Detection Strategy:**
|
||||
- Check `package.json` for existing preference
|
||||
- Consider `project_size` variable from workflow config
|
||||
- Use `framework_preference` variable if set
|
||||
- Default to **Playwright** if uncertain
|
||||
|
||||
2. **Create Directory Structure**
|
||||
|
||||
```
|
||||
{project-root}/
|
||||
├── tests/ # Root test directory
|
||||
│ ├── e2e/ # Test files (users organize as needed)
|
||||
│ ├── support/ # Framework infrastructure (key pattern)
|
||||
│ │ ├── fixtures/ # Test fixtures (data, mocks)
|
||||
│ │ ├── helpers/ # Utility functions
|
||||
│ │ └── page-objects/ # Page object models (optional)
|
||||
│ └── README.md # Test suite documentation
|
||||
```
|
||||
|
||||
**Note**: Users organize test files (e2e/, api/, integration/, component/) as needed. The **support/** folder is the critical pattern for fixtures and helpers used across tests.
|
||||
|
||||
3. **Generate Configuration File**
|
||||
|
||||
**For Playwright** (`playwright.config.ts` or `playwright.config.js`):
|
||||
|
||||
```typescript
|
||||
import { defineConfig, devices } from '@playwright/test';
|
||||
|
||||
export default defineConfig({
|
||||
testDir: './tests/e2e',
|
||||
fullyParallel: true,
|
||||
forbidOnly: !!process.env.CI,
|
||||
retries: process.env.CI ? 2 : 0,
|
||||
workers: process.env.CI ? 1 : undefined,
|
||||
|
||||
timeout: 60 * 1000, // Test timeout: 60s
|
||||
expect: {
|
||||
timeout: 15 * 1000, // Assertion timeout: 15s
|
||||
},
|
||||
|
||||
use: {
|
||||
baseURL: process.env.BASE_URL || 'http://localhost:3000',
|
||||
trace: 'retain-on-failure',
|
||||
screenshot: 'only-on-failure',
|
||||
video: 'retain-on-failure',
|
||||
actionTimeout: 15 * 1000, // Action timeout: 15s
|
||||
navigationTimeout: 30 * 1000, // Navigation timeout: 30s
|
||||
},
|
||||
|
||||
reporter: [['html', { outputFolder: 'test-results/html' }], ['junit', { outputFile: 'test-results/junit.xml' }], ['list']],
|
||||
|
||||
projects: [
|
||||
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } },
|
||||
{ name: 'firefox', use: { ...devices['Desktop Firefox'] } },
|
||||
{ name: 'webkit', use: { ...devices['Desktop Safari'] } },
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
**For Cypress** (`cypress.config.ts` or `cypress.config.js`):
|
||||
|
||||
```typescript
|
||||
import { defineConfig } from 'cypress';
|
||||
|
||||
export default defineConfig({
|
||||
e2e: {
|
||||
baseUrl: process.env.BASE_URL || 'http://localhost:3000',
|
||||
specPattern: 'tests/e2e/**/*.cy.{js,jsx,ts,tsx}',
|
||||
supportFile: 'tests/support/e2e.ts',
|
||||
video: false,
|
||||
screenshotOnRunFailure: true,
|
||||
|
||||
setupNodeEvents(on, config) {
|
||||
// implement node event listeners here
|
||||
},
|
||||
},
|
||||
|
||||
retries: {
|
||||
runMode: 2,
|
||||
openMode: 0,
|
||||
},
|
||||
|
||||
defaultCommandTimeout: 15000,
|
||||
requestTimeout: 30000,
|
||||
responseTimeout: 30000,
|
||||
pageLoadTimeout: 60000,
|
||||
});
|
||||
```
|
||||
|
||||
4. **Generate Environment Configuration**
|
||||
|
||||
Create `.env.example`:
|
||||
|
||||
```bash
|
||||
# Test Environment Configuration
|
||||
TEST_ENV=local
|
||||
BASE_URL=http://localhost:3000
|
||||
API_URL=http://localhost:3001/api
|
||||
|
||||
# Authentication (if applicable)
|
||||
TEST_USER_EMAIL=test@example.com
|
||||
TEST_USER_PASSWORD=
|
||||
|
||||
# Feature Flags (if applicable)
|
||||
FEATURE_FLAG_NEW_UI=true
|
||||
|
||||
# API Keys (if applicable)
|
||||
TEST_API_KEY=
|
||||
```
|
||||
|
||||
5. **Generate Node Version File**
|
||||
|
||||
Create `.nvmrc`:
|
||||
|
||||
```
|
||||
20.11.0
|
||||
```
|
||||
|
||||
(Use Node version from existing `.nvmrc` or default to current LTS)
|
||||
|
||||
6. **Implement Fixture Architecture**
|
||||
|
||||
**Knowledge Base Reference**: `testarch/knowledge/fixture-architecture.md`
|
||||
|
||||
Create `tests/support/fixtures/index.ts`:
|
||||
|
||||
```typescript
|
||||
import { test as base } from '@playwright/test';
|
||||
import { UserFactory } from './factories/user-factory';
|
||||
|
||||
type TestFixtures = {
|
||||
userFactory: UserFactory;
|
||||
};
|
||||
|
||||
export const test = base.extend<TestFixtures>({
|
||||
userFactory: async ({}, use) => {
|
||||
const factory = new UserFactory();
|
||||
await use(factory);
|
||||
await factory.cleanup(); // Auto-cleanup
|
||||
},
|
||||
});
|
||||
|
||||
export { expect } from '@playwright/test';
|
||||
```
|
||||
|
||||
7. **Implement Data Factories**
|
||||
|
||||
**Knowledge Base Reference**: `testarch/knowledge/data-factories.md`
|
||||
|
||||
Create `tests/support/fixtures/factories/user-factory.ts`:
|
||||
|
||||
```typescript
|
||||
import { faker } from '@faker-js/faker';
|
||||
|
||||
export class UserFactory {
|
||||
private createdUsers: string[] = [];
|
||||
|
||||
async createUser(overrides = {}) {
|
||||
const user = {
|
||||
email: faker.internet.email(),
|
||||
name: faker.person.fullName(),
|
||||
password: faker.internet.password({ length: 12 }),
|
||||
...overrides,
|
||||
};
|
||||
|
||||
// API call to create user
|
||||
const response = await fetch(`${process.env.API_URL}/users`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(user),
|
||||
});
|
||||
|
||||
const created = await response.json();
|
||||
this.createdUsers.push(created.id);
|
||||
return created;
|
||||
}
|
||||
|
||||
async cleanup() {
|
||||
// Delete all created users
|
||||
for (const userId of this.createdUsers) {
|
||||
await fetch(`${process.env.API_URL}/users/${userId}`, {
|
||||
method: 'DELETE',
|
||||
});
|
||||
}
|
||||
this.createdUsers = [];
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
8. **Generate Sample Tests**
|
||||
|
||||
Create `tests/e2e/example.spec.ts`:
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '../support/fixtures';
|
||||
|
||||
test.describe('Example Test Suite', () => {
|
||||
test('should load homepage', async ({ page }) => {
|
||||
await page.goto('/');
|
||||
await expect(page).toHaveTitle(/Home/i);
|
||||
});
|
||||
|
||||
test('should create user and login', async ({ page, userFactory }) => {
|
||||
// Create test user
|
||||
const user = await userFactory.createUser();
|
||||
|
||||
// Login
|
||||
await page.goto('/login');
|
||||
await page.fill('[data-testid="email-input"]', user.email);
|
||||
await page.fill('[data-testid="password-input"]', user.password);
|
||||
await page.click('[data-testid="login-button"]');
|
||||
|
||||
// Assert login success
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
9. **Update package.json Scripts**
|
||||
|
||||
Add minimal test script to `package.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"scripts": {
|
||||
"test:e2e": "playwright test"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Note**: Users can add additional scripts as needed (e.g., `--ui`, `--headed`, `--debug`, `show-report`).
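
For example, a Playwright setup might grow the scripts block like this (the extra script names are suggestions, not part of the scaffold):

```json
{
  "scripts": {
    "test:e2e": "playwright test",
    "test:e2e:ui": "playwright test --ui",
    "test:e2e:headed": "playwright test --headed",
    "test:e2e:debug": "playwright test --debug",
    "test:e2e:report": "playwright show-report"
  }
}
```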
|
||||
|
||||
10. **Generate Documentation**
|
||||
|
||||
Create `tests/README.md` with setup instructions (see Step 3 deliverables).
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Deliverables
|
||||
|
||||
### Primary Artifacts Created
|
||||
|
||||
1. **Configuration File**
|
||||
- `playwright.config.ts` or `cypress.config.ts`
|
||||
- Timeouts: action 15s, navigation 30s, test 60s
|
||||
- Reporters: HTML + JUnit XML
|
||||
|
||||
2. **Directory Structure**
|
||||
- `tests/` with `e2e/`, `api/`, `support/` subdirectories
|
||||
- `support/fixtures/` for test fixtures
|
||||
- `support/helpers/` for utility functions
|
||||
|
||||
3. **Environment Configuration**
|
||||
- `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL`
|
||||
- `.nvmrc` with Node version
|
||||
|
||||
4. **Test Infrastructure**
|
||||
- Fixture architecture (`mergeTests` pattern)
|
||||
- Data factories (faker-based, with auto-cleanup)
|
||||
- Sample tests demonstrating patterns
|
||||
|
||||
5. **Documentation**
|
||||
- `tests/README.md` with setup instructions
|
||||
- Comments in config files explaining options
|
||||
|
||||
### README Contents
|
||||
|
||||
The generated `tests/README.md` should include:
|
||||
|
||||
- **Setup Instructions**: How to install dependencies, configure environment
|
||||
- **Running Tests**: Commands for local execution, headed mode, debug mode
|
||||
- **Architecture Overview**: Fixture pattern, data factories, page objects
|
||||
- **Best Practices**: Selector strategy (data-testid), test isolation, cleanup
|
||||
- **CI Integration**: How tests run in CI/CD pipeline
|
||||
- **Knowledge Base References**: Links to relevant TEA knowledge fragments
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to identify and load relevant knowledge fragments:
|
||||
|
||||
- `fixture-architecture.md` - Pure function → fixture → `mergeTests` composition with auto-cleanup (406 lines, 5 examples)
|
||||
- `data-factories.md` - Faker-based factories with overrides, nested factories, API seeding, auto-cleanup (498 lines, 5 examples)
|
||||
- `network-first.md` - Network-first testing safeguards: intercept before navigate, HAR capture, deterministic waiting (489 lines, 5 examples)
|
||||
- `playwright-config.md` - Playwright-specific configuration: environment-based, timeout standards, artifact output, parallelization, project config (722 lines, 5 examples)
|
||||
- `test-quality.md` - Test design principles: deterministic, isolated with cleanup, explicit assertions, length/time limits (658 lines, 5 examples)
|
||||
|
||||
### Framework-Specific Guidance
|
||||
|
||||
**Playwright Advantages:**
|
||||
|
||||
- Worker parallelism (significantly faster for large suites)
|
||||
- Trace viewer (powerful debugging with screenshots, network, console)
|
||||
- Multi-language support (TypeScript, JavaScript, Python, C#, Java)
|
||||
- Built-in API testing capabilities
|
||||
- Better handling of multiple browser contexts
|
||||
|
||||
**Cypress Advantages:**
|
||||
|
||||
- Superior developer experience (real-time reloading)
|
||||
- Excellent for component testing (Cypress Component Testing, with Vitest as an alternative)
|
||||
- Simpler setup for small teams
|
||||
- Better suited for watch mode during development
|
||||
|
||||
**Avoid Cypress when:**
|
||||
|
||||
- API chains are heavy and complex
|
||||
- Multi-tab/window scenarios are common
|
||||
- Worker parallelism is critical for CI performance
|
||||
|
||||
### Selector Strategy
|
||||
|
||||
**Always recommend**:
|
||||
|
||||
- `data-testid` attributes for UI elements
|
||||
- `data-cy` attributes if Cypress is chosen
|
||||
- Avoid brittle CSS selectors or XPath (see the example below)
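
A brief Playwright illustration, assuming hypothetical `submit-order` and `order-confirmation` test ids:

```typescript
import { test, expect } from '@playwright/test';

test('submits an order via stable test ids', async ({ page }) => {
  await page.goto('/orders/new');
  await page.getByTestId('submit-order').click(); // resolves [data-testid="submit-order"]
  await expect(page.getByTestId('order-confirmation')).toBeVisible();
});

// Cypress equivalent: cy.get('[data-cy="submit-order"]').click();
// Avoid layout-coupled selectors such as 'div > form .btn:nth-child(3)'.
```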
|
||||
|
||||
### Contract Testing
|
||||
|
||||
For microservices architectures, **recommend Pact** for consumer-driven contract testing alongside E2E tests.
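
As a rough sketch of what a consumer test looks like with `@pact-foundation/pact` (PactV3 API); the service names, endpoint, and Jest/Vitest globals are assumptions:

```typescript
import { PactV3, MatchersV3 } from '@pact-foundation/pact';

const provider = new PactV3({ consumer: 'web-app', provider: 'user-service' });

it('fetches a user by id', async () => {
  provider
    .given('a user with id 1 exists')
    .uponReceiving('a request for user 1')
    .withRequest({ method: 'GET', path: '/users/1' })
    .willRespondWith({
      status: 200,
      headers: { 'Content-Type': 'application/json' },
      body: MatchersV3.like({ id: 1, name: 'Test User' }),
    });

  await provider.executeTest(async (mockServer) => {
    // The consumer calls the mock provider; expectations are recorded into the pact file.
    const res = await fetch(`${mockServer.url}/users/1`);
    expect(res.status).toBe(200);
  });
});
```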
|
||||
|
||||
### Failure Artifacts
|
||||
|
||||
Configure **failure-only** capture:
|
||||
|
||||
- Screenshots: only on failure
|
||||
- Videos: retain on failure (delete on success)
|
||||
- Traces: retain on failure (Playwright)
|
||||
|
||||
This reduces storage overhead while maintaining debugging capability.
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## Framework Scaffold Complete
|
||||
|
||||
**Framework Selected**: Playwright (or Cypress)
|
||||
|
||||
**Artifacts Created**:
|
||||
|
||||
- ✅ Configuration file: `playwright.config.ts`
|
||||
- ✅ Directory structure: `tests/e2e/`, `tests/support/`
|
||||
- ✅ Environment config: `.env.example`
|
||||
- ✅ Node version: `.nvmrc`
|
||||
- ✅ Fixture architecture: `tests/support/fixtures/`
|
||||
- ✅ Data factories: `tests/support/fixtures/factories/`
|
||||
- ✅ Sample tests: `tests/e2e/example.spec.ts`
|
||||
- ✅ Documentation: `tests/README.md`
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Copy `.env.example` to `.env` and fill in environment variables
|
||||
2. Run `npm install` to install test dependencies
|
||||
3. Run `npm run test:e2e` to execute sample tests
|
||||
4. Review `tests/README.md` for detailed setup instructions
|
||||
|
||||
**Knowledge Base References Applied**:
|
||||
|
||||
- Fixture architecture pattern (pure functions + mergeTests)
|
||||
- Data factories with auto-cleanup (faker-based)
|
||||
- Network-first testing safeguards
|
||||
- Failure-only artifact capture
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Configuration file created and valid
|
||||
- [ ] Directory structure exists
|
||||
- [ ] Environment configuration generated
|
||||
- [ ] Sample tests run successfully
|
||||
- [ ] Documentation complete and accurate
|
||||
- [ ] No errors or warnings during scaffold
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
53
bmad/bmm/workflows/testarch/framework/workflow.yaml
Normal file
53
bmad/bmm/workflows/testarch/framework/workflow.yaml
Normal file
@@ -0,0 +1,53 @@
|
||||
# Test Architect workflow: framework
|
||||
name: testarch-framework
|
||||
description: "Initialize production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, and configuration"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
document_output_language: "{config_source}:document_output_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/framework"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
test_dir: "{project-root}/tests" # Root test directory
|
||||
use_typescript: true # Prefer TypeScript configuration
|
||||
framework_preference: "auto" # auto, playwright, cypress - user can override auto-detection
|
||||
project_size: "auto" # auto, small, large - influences framework recommendation
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{test_dir}/README.md" # Main deliverable is test setup README
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read package.json, existing configs
|
||||
- write_file # Create config files, helpers, fixtures, tests
|
||||
- create_directory # Create test directory structure
|
||||
- list_files # Check for existing framework
|
||||
- search_repo # Find architecture docs
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- package_json: "package.json with project dependencies and scripts"
|
||||
- architecture_docs: "Architecture or tech stack documentation (optional)"
|
||||
- existing_tests: "Existing test files to detect current framework (optional)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- setup
|
||||
- test-architect
|
||||
- framework
|
||||
- initialization
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts; auto-detect when possible
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
469
bmad/bmm/workflows/testarch/nfr-assess/README.md
Normal file
469
bmad/bmm/workflows/testarch/nfr-assess/README.md
Normal file
@@ -0,0 +1,469 @@
|
||||
# Non-Functional Requirements Assessment Workflow
|
||||
|
||||
**Workflow ID:** `testarch-nfr`
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Command:** `bmad tea *nfr-assess`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The **nfr-assess** workflow performs a comprehensive assessment of non-functional requirements (NFRs) to validate that the implementation meets performance, security, reliability, and maintainability standards before release. It uses evidence-based validation with deterministic PASS/CONCERNS/FAIL rules and provides actionable recommendations for remediation.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- Assess multiple NFR categories (performance, security, reliability, maintainability, custom)
|
||||
- Validate NFRs against defined thresholds from tech specs, PRD, or defaults
|
||||
- Classify status deterministically (PASS/CONCERNS/FAIL) based on evidence
|
||||
- Never guess thresholds - mark as CONCERNS if unknown
|
||||
- Generate CI/CD-ready YAML snippets for quality gates
|
||||
- Provide quick wins and recommended actions for remediation
|
||||
- Create evidence checklists for gaps
|
||||
|
||||
---
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
Use `*nfr-assess` when you need to:
|
||||
|
||||
- ✅ Validate non-functional requirements before release
|
||||
- ✅ Assess performance against defined thresholds
|
||||
- ✅ Verify security requirements are met
|
||||
- ✅ Validate reliability and error handling
|
||||
- ✅ Check maintainability standards (coverage, quality, documentation)
|
||||
- ✅ Generate NFR assessment reports for stakeholders
|
||||
- ✅ Create gate-ready metrics for CI/CD pipelines
|
||||
|
||||
**Typical Timing:**
|
||||
|
||||
- Before release (validate all NFRs)
|
||||
- Before PR merge (validate critical NFRs)
|
||||
- During sprint retrospectives (assess maintainability)
|
||||
- After performance testing (validate performance NFRs)
|
||||
- After security audit (validate security NFRs)
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Implementation deployed locally or accessible for evaluation
|
||||
- Evidence sources available (test results, metrics, logs, CI results)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- NFR requirements defined in tech-spec.md, PRD.md, or story
|
||||
- Test results from performance, security, reliability tests
|
||||
- Application metrics (response times, error rates, throughput)
|
||||
- CI/CD pipeline results for burn-in validation
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- NFR targets are undefined and cannot be obtained → Halt and request definition
|
||||
- Implementation is not accessible for evaluation → Halt and request deployment
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage (BMad Mode)
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess
|
||||
```
|
||||
|
||||
The workflow will:
|
||||
|
||||
1. Read tech-spec.md for NFR requirements
|
||||
2. Gather evidence from test results, metrics, logs
|
||||
3. Assess each NFR category against thresholds
|
||||
4. Generate NFR assessment report
|
||||
5. Save to `bmad/output/nfr-assessment.md`
|
||||
|
||||
### Standalone Mode (No Tech Spec)
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess --feature-name "User Authentication"
|
||||
```
|
||||
|
||||
### Custom Configuration
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess \
|
||||
--assess-performance true \
|
||||
--assess-security true \
|
||||
--assess-reliability true \
|
||||
--assess-maintainability true \
|
||||
--performance-response-time-ms 500 \
|
||||
--security-score-min 85
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
1. **Load Context** - Read tech spec, PRD, knowledge base fragments
|
||||
2. **Identify NFRs** - Determine categories and thresholds
|
||||
3. **Gather Evidence** - Read test results, metrics, logs, CI results
|
||||
4. **Assess NFRs** - Apply deterministic PASS/CONCERNS/FAIL rules
|
||||
5. **Identify Actions** - Quick wins, recommended actions, monitoring hooks
|
||||
6. **Generate Deliverables** - NFR assessment report, gate YAML, evidence checklist
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### NFR Assessment Report (`nfr-assessment.md`)
|
||||
|
||||
Comprehensive markdown file with:
|
||||
|
||||
- Executive summary (overall status, critical issues)
|
||||
- Assessment by category (performance, security, reliability, maintainability)
|
||||
- Evidence for each NFR (test results, metrics, thresholds)
|
||||
- Status classification (PASS/CONCERNS/FAIL)
|
||||
- Quick wins section
|
||||
- Recommended actions section
|
||||
- Evidence gaps checklist
|
||||
|
||||
### Gate YAML Snippet (Optional)
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
concerns: 1
|
||||
blockers: false
|
||||
```
|
||||
|
||||
### Evidence Checklist (Optional)
|
||||
|
||||
- List of NFRs with missing or incomplete evidence
|
||||
- Owners for evidence collection
|
||||
- Suggested evidence sources
|
||||
- Deadlines for evidence collection
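
An illustrative checklist entry (owner, source, and date are placeholders):

```markdown
- [ ] Performance: load test results for the checkout flow
  - Owner: dev team
  - Suggested source: k6 run in CI
  - Deadline: 2025-10-21
```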
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories
|
||||
|
||||
### Performance
|
||||
|
||||
**Criteria:** Response time, throughput, resource usage, scalability
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Response time p95: <= 500ms
|
||||
- Throughput: >= 100 RPS
|
||||
- CPU usage: < 70%
|
||||
- Memory usage: < 80%
|
||||
|
||||
**Evidence Sources:** Load test results, APM data, Lighthouse reports, Playwright traces
|
||||
|
||||
---
|
||||
|
||||
### Security
|
||||
|
||||
**Criteria:** Authentication, authorization, data protection, vulnerability management
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Security score: >= 85/100
|
||||
- Critical vulnerabilities: 0
|
||||
- High vulnerabilities: < 3
|
||||
- MFA enabled
|
||||
|
||||
**Evidence Sources:** SAST results, DAST results, dependency scanning, pentest reports
|
||||
|
||||
---
|
||||
|
||||
### Reliability
|
||||
|
||||
**Criteria:** Availability, error handling, fault tolerance, disaster recovery
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Uptime: >= 99.9%
|
||||
- Error rate: < 0.1%
|
||||
- MTTR: < 15 minutes
|
||||
- CI burn-in: 100 consecutive runs
|
||||
|
||||
**Evidence Sources:** Uptime monitoring, error logs, CI burn-in results, chaos tests
|
||||
|
||||
---
|
||||
|
||||
### Maintainability
|
||||
|
||||
**Criteria:** Code quality, test coverage, documentation, technical debt
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Test coverage: >= 80%
|
||||
- Code quality: >= 85/100
|
||||
- Technical debt: < 5%
|
||||
- Documentation: >= 90%
|
||||
|
||||
**Evidence Sources:** Coverage reports, static analysis, documentation audit, test review
|
||||
|
||||
---
|
||||
|
||||
## Assessment Rules
|
||||
|
||||
### PASS ✅
|
||||
|
||||
- Evidence exists AND meets or exceeds threshold
|
||||
- No concerns flagged in evidence
|
||||
- Quality is acceptable
|
||||
|
||||
### CONCERNS ⚠️
|
||||
|
||||
- Threshold is UNKNOWN (not defined)
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
|
||||
### FAIL ❌
|
||||
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Quality is unacceptable
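
The rules above can be expressed as a small deterministic helper. A sketch, assuming numeric thresholds and the 10% "close to threshold" margin used for CONCERNS:

```typescript
type Status = 'PASS' | 'CONCERNS' | 'FAIL';

interface Finding {
  threshold?: number;      // undefined = threshold UNKNOWN
  actual?: number;         // undefined = evidence MISSING
  lowerIsBetter?: boolean; // true for response time, false for throughput or coverage
  critical?: boolean;      // a missing critical piece of evidence means FAIL
}

function classify({ threshold, actual, lowerIsBetter = true, critical = false }: Finding): Status {
  if (threshold === undefined) return 'CONCERNS';                  // never guess thresholds
  if (actual === undefined) return critical ? 'FAIL' : 'CONCERNS'; // evidence gap
  const meets = lowerIsBetter ? actual <= threshold : actual >= threshold;
  if (!meets) return 'FAIL';                                       // evidence misses the threshold
  const closeCall = threshold !== 0 && Math.abs(threshold - actual) / Math.abs(threshold) <= 0.1;
  return closeCall ? 'CONCERNS' : 'PASS';                          // within 10% of the threshold
}
```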
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### workflow.yaml Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
# NFR categories to assess
|
||||
assess_performance: true
|
||||
assess_security: true
|
||||
assess_reliability: true
|
||||
assess_maintainability: true
|
||||
|
||||
# Custom NFR categories
|
||||
custom_nfr_categories: '' # e.g., "accessibility,compliance"
|
||||
|
||||
# Evidence sources
|
||||
test_results_dir: '{project-root}/test-results'
|
||||
metrics_dir: '{project-root}/metrics'
|
||||
logs_dir: '{project-root}/logs'
|
||||
include_ci_results: true
|
||||
|
||||
# Thresholds
|
||||
performance_response_time_ms: 500
|
||||
performance_throughput_rps: 100
|
||||
security_score_min: 85
|
||||
reliability_uptime_pct: 99.9
|
||||
maintainability_coverage_pct: 80
|
||||
|
||||
# Assessment configuration
|
||||
use_deterministic_rules: true
|
||||
never_guess_thresholds: true
|
||||
require_evidence: true
|
||||
suggest_monitoring: true
|
||||
|
||||
# Output configuration
|
||||
output_file: '{output_folder}/nfr-assessment.md'
|
||||
generate_gate_yaml: true
|
||||
generate_evidence_checklist: true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base Integration
|
||||
|
||||
This workflow automatically loads relevant knowledge fragments:
|
||||
|
||||
- `nfr-criteria.md` - Non-functional requirements criteria
|
||||
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability
|
||||
- `test-quality.md` - Test quality expectations (maintainability)
|
||||
- `playwright-config.md` - Performance configuration patterns
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Full NFR Assessment Before Release
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
# NFR Assessment - Story 1.3
|
||||
|
||||
**Overall Status:** PASS ✅ (No blockers)
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
- Response Time p95: PASS ✅ (320ms < 500ms threshold)
|
||||
- Throughput: PASS ✅ (250 RPS > 100 RPS threshold)
|
||||
|
||||
## Security Assessment
|
||||
|
||||
- Authentication: PASS ✅ (MFA enforced)
|
||||
- Data Protection: PASS ✅ (AES-256 + TLS 1.3)
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
- Uptime: PASS ✅ (99.95% > 99.9% threshold)
|
||||
- Error Rate: PASS ✅ (0.05% < 0.1% threshold)
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
- Test Coverage: PASS ✅ (87% > 80% threshold)
|
||||
- Code Quality: PASS ✅ (92/100 > 85/100 threshold)
|
||||
|
||||
Gate Status: PASS ✅ - Ready for release
|
||||
```
|
||||
|
||||
### Example 2: NFR Assessment with Concerns
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess --feature-name "User Authentication"
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```markdown
|
||||
# NFR Assessment - User Authentication
|
||||
|
||||
**Overall Status:** CONCERNS ⚠️ (1 HIGH issue)
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** CONCERNS ⚠️
|
||||
- **Threshold:** MFA enabled for all users
|
||||
- **Actual:** MFA optional (not enforced)
|
||||
- **Evidence:** Security audit (security-audit-2025-10-14.md)
|
||||
- **Recommendation:** HIGH - Enforce MFA for all new accounts
|
||||
|
||||
## Quick Wins
|
||||
|
||||
1. **Enforce MFA (Security)** - HIGH - 4 hours
|
||||
- Add configuration flag to enforce MFA
|
||||
- No code changes needed
|
||||
|
||||
Gate Status: CONCERNS ⚠️ - Address HIGH priority issues before release
|
||||
```
|
||||
|
||||
### Example 3: Performance-Only Assessment
|
||||
|
||||
```bash
|
||||
bmad tea *nfr-assess \
|
||||
--assess-performance true \
|
||||
--assess-security false \
|
||||
--assess-reliability false \
|
||||
--assess-maintainability false
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "NFR thresholds not defined"
|
||||
|
||||
- Check tech-spec.md for NFR requirements
|
||||
- Check PRD.md for product-level SLAs
|
||||
- Check story file for feature-specific requirements
|
||||
- If thresholds truly unknown, mark as CONCERNS and recommend defining them
|
||||
|
||||
### "No evidence found"
|
||||
|
||||
- Check evidence directories (test-results, metrics, logs)
|
||||
- Check CI/CD pipeline for test results
|
||||
- If evidence truly missing, mark NFR as "NO EVIDENCE" and recommend generating it
|
||||
|
||||
### "CONCERNS status but no threshold exceeded"
|
||||
|
||||
- CONCERNS is correct when threshold is UNKNOWN or evidence is MISSING/INCOMPLETE
|
||||
- CONCERNS is also correct when evidence is close to threshold (within 10%)
|
||||
- Document why CONCERNS was assigned in assessment report
|
||||
|
||||
### "FAIL status blocks release"
|
||||
|
||||
- This is intentional - FAIL means critical NFR not met
|
||||
- Recommend remediation actions with specific steps
|
||||
- Re-run assessment after remediation
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
- **testarch-test-design** → `*nfr-assess` - Define NFR requirements, then assess
|
||||
- **testarch-framework** → `*nfr-assess` - Set up frameworks, then validate NFRs
|
||||
- **testarch-ci** → `*nfr-assess` - Configure CI, then assess reliability with burn-in
|
||||
- `*nfr-assess` → **testarch-trace (Phase 2)** - Assess NFRs, then apply quality gates
|
||||
- `*nfr-assess` → **testarch-test-review** - Assess maintainability, then review tests
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Never Guess Thresholds**
|
||||
- If threshold is unknown, mark as CONCERNS
|
||||
- Recommend defining threshold in tech-spec.md
|
||||
- Don't infer thresholds from similar features
|
||||
|
||||
2. **Evidence-Based Assessment**
|
||||
- Every assessment must be backed by evidence
|
||||
- Mark NFRs without evidence as "NO EVIDENCE"
|
||||
- Don't assume or infer - require explicit evidence
|
||||
|
||||
3. **Deterministic Rules**
|
||||
- Apply PASS/CONCERNS/FAIL consistently
|
||||
- Document reasoning for each classification
|
||||
- Use same rules across all NFR categories
|
||||
|
||||
4. **Actionable Recommendations**
|
||||
- Provide specific steps, not generic advice
|
||||
- Include priority, effort estimate, owner suggestion
|
||||
- Focus on quick wins first
|
||||
|
||||
5. **Gate Integration**
|
||||
- Enable `generate_gate_yaml` for CI/CD integration
|
||||
- Use YAML snippets in pipeline quality gates
|
||||
- Export metrics for dashboard visualization
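
For instance, a pipeline step can read the generated snippet and block the build on FAIL. A sketch assuming a GitHub Actions-style step, `yq` v4, and an illustrative output path:

```yaml
- name: Enforce NFR quality gate
  run: |
    STATUS=$(yq '.nfr_assessment.overall_status' bmad/output/nfr-gate.yaml)
    echo "NFR gate status: $STATUS"
    [ "$STATUS" != "FAIL" ] || exit 1
```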
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
| Status | Criteria | Action |
|
||||
| ----------- | ---------------------------- | --------------------------- |
|
||||
| PASS ✅ | All NFRs have PASS status | Ready for release |
|
||||
| CONCERNS ⚠️ | Any NFR has CONCERNS status | Address before next release |
|
||||
| FAIL ❌ | Critical NFR has FAIL status | Do not release - BLOCKER |
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *test-design` - Define NFR requirements and test plan
|
||||
- `bmad tea *framework` - Set up performance/security testing frameworks
|
||||
- `bmad tea *ci` - Configure CI/CD for NFR validation
|
||||
- `bmad tea *trace` (Phase 2) - Apply quality gates using NFR assessment metrics
|
||||
- `bmad tea *test-review` - Review test quality (maintainability NFR)
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- [Instructions](./instructions.md) - Detailed workflow steps
|
||||
- [Checklist](./checklist.md) - Validation checklist
|
||||
- [Template](./nfr-report-template.md) - NFR assessment report template
|
||||
- [Knowledge Base](../../testarch/knowledge/) - NFR criteria and best practices
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
405
bmad/bmm/workflows/testarch/nfr-assess/checklist.md
Normal file
405
bmad/bmm/workflows/testarch/nfr-assess/checklist.md
Normal file
@@ -0,0 +1,405 @@
|
||||
# Non-Functional Requirements Assessment - Validation Checklist
|
||||
|
||||
**Workflow:** `testarch-nfr`
|
||||
**Purpose:** Ensure comprehensive and evidence-based NFR assessment with actionable recommendations
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites Validation
|
||||
|
||||
- [ ] Implementation is deployed and accessible for evaluation
|
||||
- [ ] Evidence sources are available (test results, metrics, logs, CI results)
|
||||
- [ ] NFR categories are determined (performance, security, reliability, maintainability, custom)
|
||||
- [ ] Evidence directories exist and are accessible (`test_results_dir`, `metrics_dir`, `logs_dir`)
|
||||
- [ ] Knowledge base is loaded (nfr-criteria, ci-burn-in, test-quality)
|
||||
|
||||
---
|
||||
|
||||
## Context Loading
|
||||
|
||||
- [ ] Tech-spec.md loaded successfully (if available)
|
||||
- [ ] PRD.md loaded (if available)
|
||||
- [ ] Story file loaded (if applicable)
|
||||
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`:
|
||||
- [ ] `nfr-criteria.md`
|
||||
- [ ] `ci-burn-in.md`
|
||||
- [ ] `test-quality.md`
|
||||
- [ ] `playwright-config.md` (if using Playwright)
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories and Thresholds
|
||||
|
||||
### Performance
|
||||
|
||||
- [ ] Response time threshold defined or marked as UNKNOWN
|
||||
- [ ] Throughput threshold defined or marked as UNKNOWN
|
||||
- [ ] Resource usage thresholds defined or marked as UNKNOWN
|
||||
- [ ] Scalability requirements defined or marked as UNKNOWN
|
||||
|
||||
### Security
|
||||
|
||||
- [ ] Authentication requirements defined or marked as UNKNOWN
|
||||
- [ ] Authorization requirements defined or marked as UNKNOWN
|
||||
- [ ] Data protection requirements defined or marked as UNKNOWN
|
||||
- [ ] Vulnerability management thresholds defined or marked as UNKNOWN
|
||||
- [ ] Compliance requirements identified (GDPR, HIPAA, PCI-DSS, etc.)
|
||||
|
||||
### Reliability
|
||||
|
||||
- [ ] Availability (uptime) threshold defined or marked as UNKNOWN
|
||||
- [ ] Error rate threshold defined or marked as UNKNOWN
|
||||
- [ ] MTTR (Mean Time To Recovery) threshold defined or marked as UNKNOWN
|
||||
- [ ] Fault tolerance requirements defined or marked as UNKNOWN
|
||||
- [ ] Disaster recovery requirements defined (RTO, RPO) or marked as UNKNOWN
|
||||
|
||||
### Maintainability
|
||||
|
||||
- [ ] Test coverage threshold defined or marked as UNKNOWN
|
||||
- [ ] Code quality threshold defined or marked as UNKNOWN
|
||||
- [ ] Technical debt threshold defined or marked as UNKNOWN
|
||||
- [ ] Documentation completeness threshold defined or marked as UNKNOWN
|
||||
|
||||
### Custom NFR Categories (if applicable)
|
||||
|
||||
- [ ] Custom NFR category 1: Thresholds defined or marked as UNKNOWN
|
||||
- [ ] Custom NFR category 2: Thresholds defined or marked as UNKNOWN
|
||||
- [ ] Custom NFR category 3: Thresholds defined or marked as UNKNOWN
|
||||
|
||||
---
|
||||
|
||||
## Evidence Gathering
|
||||
|
||||
### Performance Evidence
|
||||
|
||||
- [ ] Load test results collected (JMeter, k6, Gatling, etc.)
|
||||
- [ ] Application metrics collected (response times, throughput, resource usage)
|
||||
- [ ] APM data collected (New Relic, Datadog, Dynatrace, etc.)
|
||||
- [ ] Lighthouse reports collected (if web app)
|
||||
- [ ] Playwright performance traces collected (if applicable)
|
||||
|
||||
### Security Evidence
|
||||
|
||||
- [ ] SAST results collected (SonarQube, Checkmarx, Veracode, etc.)
|
||||
- [ ] DAST results collected (OWASP ZAP, Burp Suite, etc.)
|
||||
- [ ] Dependency scanning results collected (Snyk, Dependabot, npm audit)
|
||||
- [ ] Penetration test reports collected (if available)
|
||||
- [ ] Security audit logs collected
|
||||
- [ ] Compliance audit results collected (if applicable)
|
||||
|
||||
### Reliability Evidence
|
||||
|
||||
- [ ] Uptime monitoring data collected (Pingdom, UptimeRobot, StatusCake)
|
||||
- [ ] Error logs collected
|
||||
- [ ] Error rate metrics collected
|
||||
- [ ] CI burn-in results collected (stability over time)
|
||||
- [ ] Chaos engineering test results collected (if available)
|
||||
- [ ] Failover/recovery test results collected (if available)
|
||||
- [ ] Incident reports and postmortems collected (if applicable)
|
||||
|
||||
### Maintainability Evidence
|
||||
|
||||
- [ ] Code coverage reports collected (Istanbul, NYC, c8, JaCoCo)
|
||||
- [ ] Static analysis results collected (ESLint, SonarQube, CodeClimate)
|
||||
- [ ] Technical debt metrics collected
|
||||
- [ ] Documentation audit results collected
|
||||
- [ ] Test review report collected (from test-review workflow, if available)
|
||||
- [ ] Git metrics collected (code churn, commit frequency, etc.)
|
||||
|
||||
---
|
||||
|
||||
## NFR Assessment with Deterministic Rules
|
||||
|
||||
### Performance Assessment
|
||||
|
||||
- [ ] Response time assessed against threshold
|
||||
- [ ] Throughput assessed against threshold
|
||||
- [ ] Resource usage assessed against threshold
|
||||
- [ ] Scalability assessed against requirements
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, metric name)
|
||||
|
||||
### Security Assessment
|
||||
|
||||
- [ ] Authentication strength assessed against requirements
|
||||
- [ ] Authorization controls assessed against requirements
|
||||
- [ ] Data protection assessed against requirements
|
||||
- [ ] Vulnerability management assessed against thresholds
|
||||
- [ ] Compliance assessed against requirements
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, scan result)
|
||||
|
||||
### Reliability Assessment
|
||||
|
||||
- [ ] Availability (uptime) assessed against threshold
|
||||
- [ ] Error rate assessed against threshold
|
||||
- [ ] MTTR assessed against threshold
|
||||
- [ ] Fault tolerance assessed against requirements
|
||||
- [ ] Disaster recovery assessed against requirements (RTO, RPO)
|
||||
- [ ] CI burn-in assessed (stability over time)
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, monitoring data)
|
||||
|
||||
### Maintainability Assessment
|
||||
|
||||
- [ ] Test coverage assessed against threshold
|
||||
- [ ] Code quality assessed against threshold
|
||||
- [ ] Technical debt assessed against threshold
|
||||
- [ ] Documentation completeness assessed against threshold
|
||||
- [ ] Test quality assessed (from test-review, if available)
|
||||
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
|
||||
- [ ] Evidence source documented (file path, coverage report)
|
||||
|
||||
### Custom NFR Assessment (if applicable)
|
||||
|
||||
- [ ] Custom NFR 1 assessed against threshold with justification
|
||||
- [ ] Custom NFR 2 assessed against threshold with justification
|
||||
- [ ] Custom NFR 3 assessed against threshold with justification
|
||||
|
||||
---
|
||||
|
||||
## Status Classification Validation
|
||||
|
||||
### PASS Criteria Verified
|
||||
|
||||
- [ ] Evidence exists for PASS status
|
||||
- [ ] Evidence meets or exceeds threshold
|
||||
- [ ] No concerns flagged in evidence
|
||||
- [ ] Quality is acceptable
|
||||
|
||||
### CONCERNS Criteria Verified
|
||||
|
||||
- [ ] Threshold is UNKNOWN (documented) OR
|
||||
- [ ] Evidence is MISSING or INCOMPLETE (documented) OR
|
||||
- [ ] Evidence is close to threshold (within 10%, documented) OR
|
||||
- [ ] Evidence shows intermittent issues (documented)
|
||||
|
||||
### FAIL Criteria Verified
|
||||
|
||||
- [ ] Evidence exists BUT does not meet threshold (documented) OR
|
||||
- [ ] Critical evidence is MISSING (documented) OR
|
||||
- [ ] Evidence shows consistent failures (documented) OR
|
||||
- [ ] Quality is unacceptable (documented)
|
||||
|
||||
### No Threshold Guessing
|
||||
|
||||
- [ ] All thresholds are either defined or marked as UNKNOWN
|
||||
- [ ] No thresholds were guessed or inferred
|
||||
- [ ] All UNKNOWN thresholds result in CONCERNS status
|
||||
|
||||
---
|
||||
|
||||
## Quick Wins and Recommended Actions
|
||||
|
||||
### Quick Wins Identified
|
||||
|
||||
- [ ] Low-effort, high-impact improvements identified for CONCERNS/FAIL
|
||||
- [ ] Configuration changes (no code changes) identified
|
||||
- [ ] Optimization opportunities identified (caching, indexing, compression)
|
||||
- [ ] Monitoring additions identified (detect issues before failures)
|
||||
|
||||
### Recommended Actions
|
||||
|
||||
- [ ] Specific remediation steps provided (not generic advice)
|
||||
- [ ] Priority assigned (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- [ ] Estimated effort provided (hours, days)
|
||||
- [ ] Owner suggestions provided (dev, ops, security)
|
||||
|
||||
### Monitoring Hooks
|
||||
|
||||
- [ ] Performance monitoring suggested (APM, synthetic monitoring)
|
||||
- [ ] Error tracking suggested (Sentry, Rollbar, error logs)
|
||||
- [ ] Security monitoring suggested (intrusion detection, audit logs)
|
||||
- [ ] Alerting thresholds suggested (notify before breach)
|
||||
|
||||
### Fail-Fast Mechanisms
|
||||
|
||||
- [ ] Circuit breakers suggested for reliability
|
||||
- [ ] Rate limiting suggested for performance
|
||||
- [ ] Validation gates suggested for security
|
||||
- [ ] Smoke tests suggested for maintainability
|
||||
|
||||
---
|
||||
|
||||
## Deliverables Generated
|
||||
|
||||
### NFR Assessment Report
|
||||
|
||||
- [ ] File created at `{output_folder}/nfr-assessment.md`
|
||||
- [ ] Template from `nfr-report-template.md` used
|
||||
- [ ] Executive summary included (overall status, critical issues)
|
||||
- [ ] Assessment by category included (performance, security, reliability, maintainability)
|
||||
- [ ] Evidence for each NFR documented
|
||||
- [ ] Status classifications documented (PASS/CONCERNS/FAIL)
|
||||
- [ ] Findings summary included (PASS count, CONCERNS count, FAIL count)
|
||||
- [ ] Quick wins section included
|
||||
- [ ] Recommended actions section included
|
||||
- [ ] Evidence gaps checklist included
|
||||
|
||||
### Gate YAML Snippet (if enabled)
|
||||
|
||||
- [ ] YAML snippet generated
|
||||
- [ ] Date included
|
||||
- [ ] Categories status included (performance, security, reliability, maintainability)
|
||||
- [ ] Overall status included (PASS/CONCERNS/FAIL)
|
||||
- [ ] Issue counts included (critical, high, medium, concerns)
|
||||
- [ ] Blockers flag included (true/false)
|
||||
- [ ] Recommendations included
|
||||
|
||||
### Evidence Checklist (if enabled)
|
||||
|
||||
- [ ] All NFRs with MISSING or INCOMPLETE evidence listed
|
||||
- [ ] Owners assigned for evidence collection
|
||||
- [ ] Suggested evidence sources provided
|
||||
- [ ] Deadlines set for evidence collection
|
||||
|
||||
### Updated Story File (if enabled and requested)
|
||||
|
||||
- [ ] "NFR Assessment" section added to story markdown
|
||||
- [ ] Link to NFR assessment report included
|
||||
- [ ] Overall status and critical issues included
|
||||
- [ ] Gate status included
|
||||
|
||||
---
|
||||
|
||||
## Quality Assurance
|
||||
|
||||
### Accuracy Checks
|
||||
|
||||
- [ ] All NFR categories assessed (none skipped)
|
||||
- [ ] All thresholds documented (defined or UNKNOWN)
|
||||
- [ ] All evidence sources documented (file paths, metric names)
|
||||
- [ ] Status classifications are deterministic and consistent
|
||||
- [ ] No false positives (status correctly assigned)
|
||||
- [ ] No false negatives (all issues identified)
|
||||
|
||||
### Completeness Checks
|
||||
|
||||
- [ ] All NFR categories covered (performance, security, reliability, maintainability, custom)
|
||||
- [ ] All evidence sources checked (test results, metrics, logs, CI results)
|
||||
- [ ] All status types used appropriately (PASS, CONCERNS, FAIL)
|
||||
- [ ] All NFRs with CONCERNS/FAIL have recommendations
|
||||
- [ ] All evidence gaps have owners and deadlines
|
||||
|
||||
### Actionability Checks
|
||||
|
||||
- [ ] Recommendations are specific (not generic)
|
||||
- [ ] Remediation steps are clear and actionable
|
||||
- [ ] Priorities are assigned (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- [ ] Effort estimates are provided (hours, days)
|
||||
- [ ] Owners are suggested (dev, ops, security)
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Artifacts
|
||||
|
||||
### With tech-spec.md
|
||||
|
||||
- [ ] Tech spec loaded for NFR requirements and thresholds
|
||||
- [ ] Performance targets extracted
|
||||
- [ ] Security requirements extracted
|
||||
- [ ] Reliability SLAs extracted
|
||||
- [ ] Architectural decisions considered
|
||||
|
||||
### With test-design.md
|
||||
|
||||
- [ ] Test design loaded for NFR test plan
|
||||
- [ ] Test priorities referenced (P0/P1/P2/P3)
|
||||
- [ ] Assessment aligned with planned NFR validation
|
||||
|
||||
### With PRD.md
|
||||
|
||||
- [ ] PRD loaded for product-level NFR context
|
||||
- [ ] User experience goals considered
|
||||
- [ ] Unstated requirements checked
|
||||
- [ ] Product-level SLAs referenced
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates Validation
|
||||
|
||||
### Release Blocker (FAIL)
|
||||
|
||||
- [ ] Critical NFR status checked (security, reliability)
|
||||
- [ ] Performance failures assessed for user impact
|
||||
- [ ] Release blocker flagged if critical NFR has FAIL status
|
||||
|
||||
### PR Blocker (HIGH CONCERNS)
|
||||
|
||||
- [ ] High-priority NFR status checked
|
||||
- [ ] Multiple CONCERNS assessed
|
||||
- [ ] PR blocker flagged if HIGH priority issues exist
|
||||
|
||||
### Warning (CONCERNS)
|
||||
|
||||
- [ ] Any NFR with CONCERNS status flagged
|
||||
- [ ] Missing or incomplete evidence documented
|
||||
- [ ] Warning issued to address before next release
|
||||
|
||||
### Pass (PASS)
|
||||
|
||||
- [ ] All NFRs have PASS status
|
||||
- [ ] No blockers or concerns exist
|
||||
- [ ] Ready for release confirmed
|
||||
|
||||
---
|
||||
|
||||
## Non-Prescriptive Validation
|
||||
|
||||
- [ ] NFR categories adapted to team needs
|
||||
- [ ] Thresholds appropriate for project context
|
||||
- [ ] Assessment criteria customized as needed
|
||||
- [ ] Teams can extend with custom NFR categories
|
||||
- [ ] Integration with external tools supported (New Relic, Datadog, SonarQube, JIRA)
|
||||
|
||||
---
|
||||
|
||||
## Documentation and Communication
|
||||
|
||||
- [ ] NFR assessment report is readable and well-formatted
|
||||
- [ ] Tables render correctly in markdown
|
||||
- [ ] Code blocks have proper syntax highlighting
|
||||
- [ ] Links are valid and accessible
|
||||
- [ ] Recommendations are clear and prioritized
|
||||
- [ ] Overall status is prominent and unambiguous
|
||||
- [ ] Executive summary provides quick understanding
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All NFR categories assessed with evidence (or gaps documented)
|
||||
- [ ] No thresholds were guessed (all defined or UNKNOWN)
|
||||
- [ ] Status classifications are deterministic and justified
|
||||
- [ ] Quick wins identified for all CONCERNS/FAIL
|
||||
- [ ] Recommended actions are specific and actionable
|
||||
- [ ] Evidence gaps documented with owners and deadlines
|
||||
- [ ] NFR assessment report generated and saved
|
||||
- [ ] Gate YAML snippet generated (if enabled)
|
||||
- [ ] Evidence checklist generated (if enabled)
|
||||
- [ ] Workflow completed successfully
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**NFR Assessment Status:**
|
||||
|
||||
- [ ] ✅ PASS - All NFRs meet requirements, ready for release
|
||||
- [ ] ⚠️ CONCERNS - Some NFRs have concerns, address before next release
|
||||
- [ ] ❌ FAIL - Critical NFRs not met, BLOCKER for release
|
||||
|
||||
**Next Actions:**
|
||||
|
||||
- If PASS ✅: Proceed to `*trace` workflow (Phase 2 quality gate) or release
|
||||
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
|
||||
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
|
||||
|
||||
**Critical Issues:** {COUNT}
|
||||
**High Priority Issues:** {COUNT}
|
||||
**Concerns:** {COUNT}
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
722
bmad/bmm/workflows/testarch/nfr-assess/instructions.md
Normal file
722
bmad/bmm/workflows/testarch/nfr-assess/instructions.md
Normal file
@@ -0,0 +1,722 @@
|
||||
# Non-Functional Requirements Assessment - Instructions v4.0
|
||||
|
||||
**Workflow:** `testarch-nfr`
|
||||
**Purpose:** Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Format:** Pure Markdown v4.0 (no XML blocks)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow performs a comprehensive assessment of non-functional requirements (NFRs) to validate that the implementation meets performance, security, reliability, and maintainability standards before release. It uses evidence-based validation with deterministic PASS/CONCERNS/FAIL rules and provides actionable recommendations for remediation.
|
||||
|
||||
**Key Capabilities:**
|
||||
|
||||
- Assess multiple NFR categories (performance, security, reliability, maintainability, custom)
|
||||
- Validate NFRs against defined thresholds from tech specs, PRD, or defaults
|
||||
- Classify status deterministically (PASS/CONCERNS/FAIL) based on evidence
|
||||
- Never guess thresholds - mark as CONCERNS if unknown
|
||||
- Generate gate-ready YAML snippets for CI/CD integration
|
||||
- Provide quick wins and recommended actions for remediation
|
||||
- Create evidence checklists for gaps
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Implementation deployed locally or accessible for evaluation
|
||||
- Evidence sources available (test results, metrics, logs, CI results)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- NFR requirements defined in tech-spec.md, PRD.md, or story
|
||||
- Test results from performance, security, reliability tests
|
||||
- Application metrics (response times, error rates, throughput)
|
||||
- CI/CD pipeline results for burn-in validation
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- If NFR targets are undefined and cannot be obtained, halt and request definition
|
||||
- If implementation is not accessible for evaluation, halt and request deployment
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Load Context and Knowledge Base
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
|
||||
- `nfr-criteria.md` - Non-functional requirements criteria and thresholds (security, performance, reliability, maintainability with code examples, 658 lines, 4 examples)
|
||||
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability validation (10-iteration detection, sharding, selective execution, 678 lines, 4 examples)
|
||||
- `test-quality.md` - Test quality expectations for maintainability (deterministic, isolated, explicit assertions, length/time limits, 658 lines, 5 examples)
|
||||
- `playwright-config.md` - Performance configuration patterns: parallelization, timeout standards, artifact output (722 lines, 5 examples)
|
||||
- `error-handling.md` - Reliability validation patterns: scoped exceptions, retry validation, telemetry logging, graceful degradation (736 lines, 4 examples)
|
||||
|
||||
2. Read story file (if provided):
|
||||
- Extract NFR requirements
|
||||
- Identify specific thresholds or SLAs
|
||||
- Note any custom NFR categories
|
||||
|
||||
3. Read related BMad artifacts (if available):
|
||||
- `tech-spec.md` - Technical NFR requirements and targets
|
||||
- `PRD.md` - Product-level NFR context (user expectations)
|
||||
- `test-design.md` - NFR test plan and priorities
|
||||
|
||||
**Output:** Complete understanding of NFR targets, evidence sources, and validation criteria
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Identify NFR Categories and Thresholds
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Determine which NFR categories to assess (default: performance, security, reliability, maintainability):
|
||||
- **Performance**: Response time, throughput, resource usage
|
||||
- **Security**: Authentication, authorization, data protection, vulnerability scanning
|
||||
- **Reliability**: Error handling, recovery, availability, fault tolerance
|
||||
- **Maintainability**: Code quality, test coverage, documentation, technical debt
|
||||
|
||||
2. Add custom NFR categories if specified (e.g., accessibility, internationalization, compliance)
|
||||
|
||||
3. Gather thresholds for each NFR:
|
||||
- From tech-spec.md (primary source)
|
||||
- From PRD.md (product-level SLAs)
|
||||
- From story file (feature-specific requirements)
|
||||
- From workflow variables (default thresholds)
|
||||
- Mark thresholds as UNKNOWN if not defined
|
||||
|
||||
4. Never guess thresholds - if a threshold is unknown, mark the NFR as CONCERNS
|
||||
|
||||
**Output:** Complete list of NFRs to assess with defined (or UNKNOWN) thresholds
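The gathered thresholds can be captured in a simple structure before assessment begins. The sketch below is illustrative only (the field names are not a BMAD-defined schema); the key point is that an unknown threshold stays undefined rather than being guessed.

```typescript
// Hypothetical shape for the thresholds gathered in this step.
type NfrCategory = 'performance' | 'security' | 'reliability' | 'maintainability' | string;

interface NfrThreshold {
  category: NfrCategory;
  metric: string;              // e.g. "response_time_p95_ms"
  threshold?: number | string; // undefined => UNKNOWN, later assessed as CONCERNS
  source: 'tech-spec' | 'prd' | 'story' | 'default' | 'unknown';
}

const thresholds: NfrThreshold[] = [
  { category: 'performance', metric: 'response_time_p95_ms', threshold: 500, source: 'tech-spec' },
  { category: 'security', metric: 'critical_vulnerabilities', threshold: 0, source: 'default' },
  // No threshold found in any artifact -> left undefined, never guessed
  { category: 'reliability', metric: 'mttr_minutes', source: 'unknown' },
];
```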
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Gather Evidence
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR category, discover evidence sources:
|
||||
|
||||
**Performance Evidence:**
|
||||
- Load test results (JMeter, k6, Lighthouse)
|
||||
- Application metrics (response times, throughput, resource usage)
|
||||
- Performance monitoring data (New Relic, Datadog, APM)
|
||||
- Playwright performance traces (if applicable)
|
||||
|
||||
**Security Evidence:**
|
||||
- Security scan results (SAST, DAST, dependency scanning)
|
||||
- Authentication/authorization test results
|
||||
- Penetration test reports
|
||||
- Vulnerability assessment reports
|
||||
- Compliance audit results
|
||||
|
||||
**Reliability Evidence:**
|
||||
- Error logs and error rates
|
||||
- Uptime monitoring data
|
||||
- Chaos engineering test results
|
||||
- Failover/recovery test results
|
||||
- CI burn-in results (stability over time)
|
||||
|
||||
**Maintainability Evidence:**
|
||||
- Code coverage reports (Istanbul, NYC, c8)
|
||||
- Static analysis results (ESLint, SonarQube)
|
||||
- Technical debt metrics
|
||||
- Documentation completeness
|
||||
- Test quality assessment (from test-review workflow)
|
||||
|
||||
2. Read relevant files from evidence directories:
|
||||
- `{test_results_dir}` for test execution results
|
||||
- `{metrics_dir}` for application metrics
|
||||
- `{logs_dir}` for application logs
|
||||
- CI/CD pipeline results (if `include_ci_results` is true)
|
||||
|
||||
3. Mark NFRs without evidence as "NO EVIDENCE" - never infer or assume
|
||||
|
||||
**Output:** Comprehensive evidence inventory for each NFR
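As a minimal sketch of the evidence-gathering step (directory names are assumptions, not fixed paths), a helper like the following collects JSON result files and lets the caller mark the NFR as "NO EVIDENCE" when nothing is found:

```typescript
import { existsSync, readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Collect JSON result files from an evidence directory; an empty array means
// the corresponding NFR must be marked "NO EVIDENCE" rather than inferred.
function gatherEvidence(evidenceDir: string): Record<string, unknown>[] {
  if (!existsSync(evidenceDir)) return [];
  return readdirSync(evidenceDir)
    .filter((file) => file.endsWith('.json'))
    .map((file) => JSON.parse(readFileSync(join(evidenceDir, file), 'utf8')));
}

const performanceEvidence = gatherEvidence('test-results'); // e.g. k6/JMeter exports
```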
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Assess NFRs with Deterministic Rules
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR, apply deterministic PASS/CONCERNS/FAIL rules:
|
||||
|
||||
**PASS Criteria:**
|
||||
- Evidence exists AND meets defined threshold
|
||||
- No concerns flagged in evidence
|
||||
- Example: Response time is 350ms (threshold: 500ms) → PASS
|
||||
|
||||
**CONCERNS Criteria:**
|
||||
- Threshold is UNKNOWN (not defined)
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
- Example: Response time is 480ms (threshold: 500ms, 96% of threshold) → CONCERNS
|
||||
|
||||
**FAIL Criteria:**
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Example: Response time is 750ms (threshold: 500ms) → FAIL
|
||||
|
||||
2. Document findings for each NFR:
|
||||
- Status (PASS/CONCERNS/FAIL)
|
||||
- Evidence source (file path, test name, metric name)
|
||||
- Actual value vs threshold
|
||||
- Justification for status classification
|
||||
|
||||
3. Classify severity based on category:
|
||||
- **CRITICAL**: Security failures, reliability failures (affect users immediately)
|
||||
- **HIGH**: Performance failures, maintainability failures (affect users soon)
|
||||
- **MEDIUM**: Concerns without failures (may affect users eventually)
|
||||
- **LOW**: Missing evidence for non-critical NFRs
|
||||
|
||||
**Output:** Complete NFR assessment with deterministic status classifications
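The deterministic rules above can be expressed as a small classification helper. This is a sketch for a "lower is better" numeric metric such as p95 response time; other metrics (scores, booleans) would need their own comparison, but the precedence of the rules stays the same.

```typescript
type NfrStatus = 'PASS' | 'CONCERNS' | 'FAIL';

function classifyNfr(actual: number | undefined, threshold: number | undefined): NfrStatus {
  if (threshold === undefined) return 'CONCERNS'; // never guess thresholds
  if (actual === undefined) return 'CONCERNS';    // missing or incomplete evidence
  if (actual > threshold) return 'FAIL';          // does not meet threshold
  if (actual >= threshold * 0.9) return 'CONCERNS'; // within 10% of the limit
  return 'PASS';
}

classifyNfr(350, 500); // 'PASS'
classifyNfr(480, 500); // 'CONCERNS' (96% of threshold)
classifyNfr(750, 500); // 'FAIL'
```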
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Identify Quick Wins and Recommended Actions
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. For each NFR with CONCERNS or FAIL status, identify quick wins:
|
||||
- Low-effort, high-impact improvements
|
||||
- Configuration changes (no code changes needed)
|
||||
- Optimization opportunities (caching, indexing, compression)
|
||||
- Monitoring additions (detect issues before they become failures)
|
||||
|
||||
2. Provide recommended actions for each issue:
|
||||
- Specific steps to remediate (not generic advice)
|
||||
- Priority (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
- Estimated effort (hours, days)
|
||||
- Owner suggestion (dev, ops, security)
|
||||
|
||||
3. Suggest monitoring hooks for gaps:
|
||||
- Add performance monitoring (APM, synthetic monitoring)
|
||||
- Add error tracking (Sentry, Rollbar, error logs)
|
||||
- Add security monitoring (intrusion detection, audit logs)
|
||||
- Add alerting thresholds (notify before thresholds are breached)
|
||||
|
||||
4. Suggest fail-fast mechanisms:
|
||||
- Add circuit breakers for reliability
|
||||
- Add rate limiting for performance
|
||||
- Add validation gates for security
|
||||
- Add smoke tests for maintainability
|
||||
|
||||
**Output:** Actionable remediation plan with prioritized recommendations
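For the circuit-breaker style fail-fast mechanism recommended above, a minimal sketch looks like the following. A production implementation (e.g. an existing resilience library) would add half-open probing and metrics; this only illustrates the idea.

```typescript
// Minimal circuit breaker: after maxFailures consecutive errors, calls fail fast
// until resetMs has elapsed, instead of hammering a degraded dependency.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  constructor(private maxFailures = 5, private resetMs = 30_000) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.maxFailures && Date.now() - this.openedAt < this.resetMs) {
      throw new Error('Circuit open - failing fast');
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      this.openedAt = Date.now();
      throw err;
    }
  }
}
```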
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Generate Deliverables
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Create NFR assessment markdown file:
|
||||
- Use template from `nfr-report-template.md`
|
||||
- Include executive summary (overall status, critical issues)
|
||||
- Add NFR-by-NFR assessment (status, evidence, thresholds)
|
||||
- Add findings summary (PASS count, CONCERNS count, FAIL count)
|
||||
- Add quick wins section
|
||||
- Add recommended actions section
|
||||
- Add evidence gaps checklist
|
||||
- Save to `{output_folder}/nfr-assessment.md`
|
||||
|
||||
2. Generate gate YAML snippet (if enabled):
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
concerns: 2
|
||||
blockers: false
|
||||
```
|
||||
|
||||
3. Generate evidence checklist (if enabled):
|
||||
- List all NFRs with MISSING or INCOMPLETE evidence
|
||||
- Assign owners for evidence collection
|
||||
- Suggest evidence sources (tests, metrics, logs)
|
||||
- Set deadlines for evidence collection
|
||||
|
||||
4. Update story file (if enabled and requested):
|
||||
- Add "NFR Assessment" section to story markdown
|
||||
- Link to NFR assessment report
|
||||
- Include overall status and critical issues
|
||||
- Add gate status
|
||||
|
||||
**Output:** Complete NFR assessment documentation ready for review and CI/CD integration
|
||||
|
||||
---
|
||||
|
||||
## Non-Prescriptive Approach
|
||||
|
||||
**Minimal Examples:** This workflow provides principles and patterns, not rigid templates. Teams should adapt NFR categories, thresholds, and assessment criteria to their needs.
|
||||
|
||||
**Key Patterns to Follow:**
|
||||
|
||||
- Use evidence-based validation (no guessing or inference)
|
||||
- Apply deterministic rules (consistent PASS/CONCERNS/FAIL classification)
|
||||
- Never guess thresholds (mark as CONCERNS if unknown)
|
||||
- Provide actionable recommendations (specific steps, not generic advice)
|
||||
- Generate gate-ready artifacts (YAML snippets for CI/CD)
|
||||
|
||||
**Extend as Needed:**
|
||||
|
||||
- Add custom NFR categories (accessibility, internationalization, compliance)
|
||||
- Integrate with external tools (New Relic, Datadog, SonarQube, JIRA)
|
||||
- Add custom thresholds and rules
|
||||
- Link to external assessment systems
|
||||
|
||||
---
|
||||
|
||||
## NFR Categories and Criteria
|
||||
|
||||
### Performance
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Response time (p50, p95, p99 percentiles)
|
||||
- Throughput (requests per second, transactions per second)
|
||||
- Resource usage (CPU, memory, disk, network)
|
||||
- Scalability (horizontal, vertical)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Response time p95: 500ms
|
||||
- Throughput: 100 RPS
|
||||
- CPU usage: < 70% average
|
||||
- Memory usage: < 80% max
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Load test results (JMeter, k6, Gatling)
|
||||
- APM data (New Relic, Datadog, Dynatrace)
|
||||
- Lighthouse reports (for web apps)
|
||||
- Playwright performance traces
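Where Playwright traces are used as supplementary evidence, a hedged example of asserting the default p95 threshold in a single run might look like this (URL and timing are illustrative; a dedicated load test remains the primary evidence source for percentile thresholds):

```typescript
import { test, expect } from '@playwright/test';

test('dashboard responds within the 500ms default threshold', async ({ page }) => {
  const start = Date.now();
  const response = await page.goto('/dashboard');
  expect(response?.ok()).toBe(true);
  expect(Date.now() - start).toBeLessThan(500);
});
```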
|
||||
|
||||
---
|
||||
|
||||
### Security
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Authentication (login security, session management)
|
||||
- Authorization (access control, permissions)
|
||||
- Data protection (encryption, PII handling)
|
||||
- Vulnerability management (SAST, DAST, dependency scanning)
|
||||
- Compliance (GDPR, HIPAA, PCI-DSS)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Security score: >= 85/100
|
||||
- Critical vulnerabilities: 0
|
||||
- High vulnerabilities: < 3
|
||||
- Authentication strength: MFA enabled
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- SAST results (SonarQube, Checkmarx, Veracode)
|
||||
- DAST results (OWASP ZAP, Burp Suite)
|
||||
- Dependency scanning (Snyk, Dependabot, npm audit)
|
||||
- Penetration test reports
|
||||
- Security audit logs
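As one concrete way to turn dependency-scanning output into evidence, the sketch below parses `npm audit --json` (the `metadata.vulnerabilities` shape assumed here matches npm 7+; adjust for other scanners). The CONCERNS cutoff for "close to threshold" is an assumption, not a fixed rule.

```typescript
import { execSync } from 'node:child_process';

// npm audit exits non-zero when vulnerabilities exist, so capture stdout either way.
function auditCounts(): { critical: number; high: number } {
  let raw: string;
  try {
    raw = execSync('npm audit --json', { encoding: 'utf8' });
  } catch (err: any) {
    raw = err.stdout; // the JSON report is still written to stdout on failure
  }
  const meta = JSON.parse(raw).metadata?.vulnerabilities ?? {};
  return { critical: meta.critical ?? 0, high: meta.high ?? 0 };
}

const { critical, high } = auditCounts();
// Default thresholds above: 0 critical, <3 high
const vulnStatus = critical > 0 || high >= 3 ? 'FAIL' : high > 0 ? 'CONCERNS' : 'PASS';
```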
|
||||
|
||||
---
|
||||
|
||||
### Reliability
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Availability (uptime percentage)
|
||||
- Error handling (graceful degradation, error recovery)
|
||||
- Fault tolerance (redundancy, failover)
|
||||
- Disaster recovery (backup, restore, RTO/RPO)
|
||||
- Stability (CI burn-in, chaos engineering)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Uptime: >= 99.9% (three nines)
|
||||
- Error rate: < 0.1% (1 in 1000 requests)
|
||||
- MTTR (Mean Time To Recovery): < 15 minutes
|
||||
- CI burn-in: 100 consecutive successful runs
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Uptime monitoring (Pingdom, UptimeRobot, StatusCake)
|
||||
- Error logs and error rates
|
||||
- CI burn-in results (see `ci-burn-in.md`)
|
||||
- Chaos engineering test results (Chaos Monkey, Gremlin)
|
||||
- Incident reports and postmortems
|
||||
|
||||
---
|
||||
|
||||
### Maintainability
|
||||
|
||||
**Criteria:**
|
||||
|
||||
- Code quality (complexity, duplication, code smells)
|
||||
- Test coverage (unit, integration, E2E)
|
||||
- Documentation (code comments, README, architecture docs)
|
||||
- Technical debt (debt ratio, code churn)
|
||||
- Test quality (from test-review workflow)
|
||||
|
||||
**Thresholds (Default):**
|
||||
|
||||
- Test coverage: >= 80%
|
||||
- Code quality score: >= 85/100
|
||||
- Technical debt ratio: < 5%
|
||||
- Documentation completeness: >= 90%
|
||||
|
||||
**Evidence Sources:**
|
||||
|
||||
- Coverage reports (Istanbul, NYC, c8, JaCoCo)
|
||||
- Static analysis (ESLint, SonarQube, CodeClimate)
|
||||
- Documentation audit (manual or automated)
|
||||
- Test review report (from test-review workflow)
|
||||
- Git metrics (code churn, commit frequency)
|
||||
|
||||
---
|
||||
|
||||
## Deterministic Assessment Rules
|
||||
|
||||
### PASS Rules
|
||||
|
||||
- Evidence exists
|
||||
- Evidence meets or exceeds threshold
|
||||
- No concerns flagged
|
||||
- Quality is acceptable
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 350ms p95
|
||||
Status: PASS ✅
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### CONCERNS Rules
|
||||
|
||||
- Threshold is UNKNOWN
|
||||
- Evidence is MISSING or INCOMPLETE
|
||||
- Evidence is close to threshold (within 10%)
|
||||
- Evidence shows intermittent issues
|
||||
- Quality is marginal
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 480ms p95 (96% of threshold)
|
||||
Status: CONCERNS ⚠️
|
||||
Recommendation: Optimize before production - very close to threshold
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### FAIL Rules
|
||||
|
||||
- Evidence exists BUT does not meet threshold
|
||||
- Critical evidence is MISSING
|
||||
- Evidence shows consistent failures
|
||||
- Quality is unacceptable
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
NFR: Response Time p95
|
||||
Threshold: 500ms
|
||||
Evidence: Load test result shows 750ms p95 (150% of threshold)
|
||||
Status: FAIL ❌
|
||||
Recommendation: BLOCKER - optimize performance before release
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with BMad Artifacts
|
||||
|
||||
### With tech-spec.md
|
||||
|
||||
- Primary source for NFR requirements and thresholds
|
||||
- Load performance targets, security requirements, reliability SLAs
|
||||
- Use architectural decisions to understand NFR trade-offs
|
||||
|
||||
### With test-design.md
|
||||
|
||||
- Understand NFR test plan and priorities
|
||||
- Reference test priorities (P0/P1/P2/P3) for severity classification
|
||||
- Align assessment with planned NFR validation
|
||||
|
||||
### With PRD.md
|
||||
|
||||
- Understand product-level NFR expectations
|
||||
- Verify NFRs align with user experience goals
|
||||
- Check for unstated NFR requirements (implied by product goals)
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
### Release Blocker (FAIL)
|
||||
|
||||
- Critical NFR has FAIL status (security, reliability)
|
||||
- Performance failure affects user experience severely
|
||||
- Do not release until FAIL is resolved
|
||||
|
||||
### PR Blocker (HIGH CONCERNS)
|
||||
|
||||
- High-priority NFR has FAIL status
|
||||
- Multiple CONCERNS exist
|
||||
- Block PR merge until addressed
|
||||
|
||||
### Warning (CONCERNS)
|
||||
|
||||
- Any NFR has CONCERNS status
|
||||
- Evidence is missing or incomplete
|
||||
- Address before next release
|
||||
|
||||
### Pass (PASS)
|
||||
|
||||
- All NFRs have PASS status
|
||||
- No blockers or concerns
|
||||
- Ready for release
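The gate levels above reduce to a simple aggregation over category statuses, sketched here for clarity (the category names mirror the gate YAML snippet; priority/waiver handling is omitted):

```typescript
type NfrStatus = 'PASS' | 'CONCERNS' | 'FAIL';

// Any FAIL blocks release, any CONCERNS downgrades the overall status,
// otherwise the assessment passes.
function overallGateStatus(categories: Record<string, NfrStatus>): NfrStatus {
  const statuses = Object.values(categories);
  if (statuses.includes('FAIL')) return 'FAIL';
  if (statuses.includes('CONCERNS')) return 'CONCERNS';
  return 'PASS';
}

overallGateStatus({ performance: 'PASS', security: 'CONCERNS', reliability: 'PASS', maintainability: 'PASS' });
// => 'CONCERNS'
```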
|
||||
|
||||
---
|
||||
|
||||
## Example NFR Assessment
|
||||
|
||||
````markdown
|
||||
# NFR Assessment - Story 1.3
|
||||
|
||||
**Feature:** User Authentication
|
||||
**Date:** 2025-10-14
|
||||
**Overall Status:** CONCERNS ⚠️ (1 HIGH issue)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Assessment:** 3 PASS, 1 CONCERNS, 0 FAIL
|
||||
**Blockers:** None
|
||||
**High Priority Issues:** 1 (Security - MFA not enforced)
|
||||
**Recommendation:** Address security concern before release
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
### Response Time (p95)
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 500ms
|
||||
- **Actual:** 320ms (64% of threshold)
|
||||
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
|
||||
- **Findings:** Response time well below threshold across all percentiles
|
||||
|
||||
### Throughput
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 100 RPS
|
||||
- **Actual:** 250 RPS (250% of threshold)
|
||||
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
|
||||
- **Findings:** System handles 2.5x target load without degradation
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** CONCERNS ⚠️
|
||||
- **Threshold:** MFA enabled for all users
|
||||
- **Actual:** MFA optional (not enforced)
|
||||
- **Evidence:** Security audit (security-audit-2025-10-14.md)
|
||||
- **Findings:** MFA is implemented but not enforced by default
|
||||
- **Recommendation:** HIGH - Enforce MFA for all new accounts, provide migration path for existing users
|
||||
|
||||
### Data Protection
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** PII encrypted at rest and in transit
|
||||
- **Actual:** AES-256 at rest, TLS 1.3 in transit
|
||||
- **Evidence:** Security scan (security-scan-2025-10-14.json)
|
||||
- **Findings:** All PII properly encrypted
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
### Uptime
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** 99.9% (three nines)
|
||||
- **Actual:** 99.95% over 30 days
|
||||
- **Evidence:** Uptime monitoring (uptime-report-2025-10-14.csv)
|
||||
- **Findings:** Exceeds target with margin
|
||||
|
||||
### Error Rate
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** < 0.1% (1 in 1000)
|
||||
- **Actual:** 0.05% (1 in 2000)
|
||||
- **Evidence:** Error logs (logs/errors-2025-10.log)
|
||||
- **Findings:** Error rate well below threshold
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
### Test Coverage
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** >= 80%
|
||||
- **Actual:** 87%
|
||||
- **Evidence:** Coverage report (coverage/lcov-report/index.html)
|
||||
- **Findings:** Coverage exceeds threshold with good distribution
|
||||
|
||||
### Code Quality
|
||||
|
||||
- **Status:** PASS ✅
|
||||
- **Threshold:** >= 85/100
|
||||
- **Actual:** 92/100
|
||||
- **Evidence:** SonarQube analysis (sonarqube-report-2025-10-14.pdf)
|
||||
- **Findings:** High code quality score with low technical debt
|
||||
|
||||
## Quick Wins
|
||||
|
||||
1. **Enforce MFA (Security)** - HIGH - 4 hours
|
||||
- Add configuration flag to enforce MFA for new accounts
|
||||
- No code changes needed, only config adjustment
|
||||
|
||||
## Recommended Actions
|
||||
|
||||
### Immediate (Before Release)
|
||||
|
||||
1. **Enforce MFA for all new accounts** - HIGH - 4 hours - Security Team
|
||||
- Add `ENFORCE_MFA=true` to production config
|
||||
- Update user onboarding flow to require MFA setup
|
||||
- Test MFA enforcement in staging environment
|
||||
|
||||
### Short-term (Next Sprint)
|
||||
|
||||
1. **Migrate existing users to MFA** - MEDIUM - 3 days - Product + Engineering
|
||||
- Design migration UX (prompt, incentives, deadline)
|
||||
- Implement migration flow with grace period
|
||||
- Communicate migration to existing users
|
||||
|
||||
## Evidence Gaps
|
||||
|
||||
- [ ] Chaos engineering test results (reliability)
|
||||
- Owner: DevOps Team
|
||||
- Deadline: 2025-10-21
|
||||
- Suggested evidence: Run chaos monkey tests in staging
|
||||
|
||||
- [ ] Penetration test report (security)
|
||||
- Owner: Security Team
|
||||
- Deadline: 2025-10-28
|
||||
- Suggested evidence: Schedule third-party pentest
|
||||
|
||||
## Gate YAML Snippet
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '2025-10-14'
|
||||
story_id: '1.3'
|
||||
categories:
|
||||
performance: 'PASS'
|
||||
security: 'CONCERNS'
|
||||
reliability: 'PASS'
|
||||
maintainability: 'PASS'
|
||||
overall_status: 'CONCERNS'
|
||||
critical_issues: 0
|
||||
high_priority_issues: 1
|
||||
medium_priority_issues: 0
|
||||
concerns: 1
|
||||
blockers: false
|
||||
recommendations:
|
||||
- 'Enforce MFA for all new accounts (HIGH - 4 hours)'
|
||||
evidence_gaps: 2
|
||||
```
|
||||
|
||||
|
||||
## Recommendations Summary
|
||||
|
||||
- **Release Blocker:** None ✅
|
||||
- **High Priority:** 1 (Enforce MFA before release)
|
||||
- **Medium Priority:** 1 (Migrate existing users to MFA)
|
||||
- **Next Steps:** Address HIGH priority item, then proceed to gate workflow
|
||||
|
||||
````
|
||||
|
||||
---
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
Before completing this workflow, verify:
|
||||
|
||||
- ✅ All NFR categories assessed (performance, security, reliability, maintainability, custom)
|
||||
- ✅ Thresholds defined or marked as UNKNOWN
|
||||
- ✅ Evidence gathered for each NFR (or marked as MISSING)
|
||||
- ✅ Status classified deterministically (PASS/CONCERNS/FAIL)
|
||||
- ✅ No thresholds were guessed (marked as CONCERNS if unknown)
|
||||
- ✅ Quick wins identified for CONCERNS/FAIL
|
||||
- ✅ Recommended actions are specific and actionable
|
||||
- ✅ Evidence gaps documented with owners and deadlines
|
||||
- ✅ NFR assessment report generated and saved
|
||||
- ✅ Gate YAML snippet generated (if enabled)
|
||||
- ✅ Evidence checklist generated (if enabled)
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- **Never Guess Thresholds:** If a threshold is unknown, mark as CONCERNS and recommend defining it
|
||||
- **Evidence-Based:** Every assessment must be backed by evidence (tests, metrics, logs, CI results)
|
||||
- **Deterministic Rules:** Use consistent PASS/CONCERNS/FAIL classification based on evidence
|
||||
- **Actionable Recommendations:** Provide specific steps, not generic advice
|
||||
- **Gate Integration:** Generate YAML snippets that can be consumed by CI/CD pipelines
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "NFR thresholds not defined"
|
||||
- Check tech-spec.md for NFR requirements
|
||||
- Check PRD.md for product-level SLAs
|
||||
- Check story file for feature-specific requirements
|
||||
- If thresholds truly unknown, mark as CONCERNS and recommend defining them
|
||||
|
||||
### "No evidence found"
|
||||
- Check evidence directories (test-results, metrics, logs)
|
||||
- Check CI/CD pipeline for test results
|
||||
- If evidence truly missing, mark NFR as "NO EVIDENCE" and recommend generating it
|
||||
|
||||
### "CONCERNS status but no threshold exceeded"
|
||||
- CONCERNS is correct when threshold is UNKNOWN or evidence is MISSING/INCOMPLETE
|
||||
- CONCERNS is also correct when evidence is close to threshold (within 10%)
|
||||
- Document why CONCERNS was assigned
|
||||
|
||||
### "FAIL status blocks release"
|
||||
- This is intentional - FAIL means critical NFR not met
|
||||
- Recommend remediation actions with specific steps
|
||||
- Re-run assessment after remediation
|
||||
|
||||
---
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **testarch-test-design** - Define NFR requirements and test plan
|
||||
- **testarch-framework** - Set up performance/security testing frameworks
|
||||
- **testarch-ci** - Configure CI/CD for NFR validation
|
||||
- **testarch-gate** - Use NFR assessment as input for quality gate decisions
|
||||
- **testarch-test-review** - Review test quality (maintainability NFR)
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
443
bmad/bmm/workflows/testarch/nfr-assess/nfr-report-template.md
Normal file
@@ -0,0 +1,443 @@
|
||||
# NFR Assessment - {FEATURE_NAME}
|
||||
|
||||
**Date:** {DATE}
|
||||
**Story:** {STORY_ID} (if applicable)
|
||||
**Overall Status:** {OVERALL_STATUS} {STATUS_ICON}
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Assessment:** {PASS_COUNT} PASS, {CONCERNS_COUNT} CONCERNS, {FAIL_COUNT} FAIL
|
||||
|
||||
**Blockers:** {BLOCKER_COUNT} {BLOCKER_DESCRIPTION}
|
||||
|
||||
**High Priority Issues:** {HIGH_PRIORITY_COUNT} {HIGH_PRIORITY_DESCRIPTION}
|
||||
|
||||
**Recommendation:** {OVERALL_RECOMMENDATION}
|
||||
|
||||
---
|
||||
|
||||
## Performance Assessment
|
||||
|
||||
### Response Time (p95)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Throughput
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Resource Usage
|
||||
|
||||
- **CPU Usage**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
- **Memory Usage**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
### Scalability
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Security Assessment
|
||||
|
||||
### Authentication Strength
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
- **Recommendation:** {RECOMMENDATION} (if CONCERNS or FAIL)
|
||||
|
||||
### Authorization Controls
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Data Protection
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Vulnerability Management
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION} (e.g., "0 critical, <3 high vulnerabilities")
|
||||
- **Actual:** {ACTUAL_DESCRIPTION} (e.g., "0 critical, 1 high, 5 medium vulnerabilities")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Snyk scan results - scan-2025-10-14.json")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Compliance (if applicable)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Standards:** {COMPLIANCE_STANDARDS} (e.g., "GDPR, HIPAA, PCI-DSS")
|
||||
- **Actual:** {ACTUAL_COMPLIANCE_STATUS}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Reliability Assessment
|
||||
|
||||
### Availability (Uptime)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "99.9%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "99.95%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Uptime monitoring - uptime-report-2025-10-14.csv")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Error Rate
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<0.1%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "0.05%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Error logs - logs/errors-2025-10.log")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### MTTR (Mean Time To Recovery)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<15 minutes")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "12 minutes")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Incident reports - incidents/")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Fault Tolerance
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### CI Burn-In (Stability)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "100 consecutive successful runs")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "150 consecutive successful runs")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CI burn-in results - ci-burn-in-2025-10-14.log")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Disaster Recovery (if applicable)
|
||||
|
||||
- **RTO (Recovery Time Objective)**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
- **RPO (Recovery Point Objective)**
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE}
|
||||
- **Actual:** {ACTUAL_VALUE}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
|
||||
---
|
||||
|
||||
## Maintainability Assessment
|
||||
|
||||
### Test Coverage
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=80%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "87%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Coverage report - coverage/lcov-report/index.html")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Code Quality
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=85/100")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "92/100")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "SonarQube analysis - sonarqube-report-2025-10-14.pdf")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Technical Debt
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<5% debt ratio")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "3.2% debt ratio")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CodeClimate analysis - codeclimate-2025-10-14.json")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Documentation Completeness
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=90%")
|
||||
- **Actual:** {ACTUAL_VALUE} (e.g., "95%")
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Documentation audit - docs-audit-2025-10-14.md")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### Test Quality (from test-review, if available)
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Test review report - test-review-2025-10-14.md")
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Custom NFR Assessments (if applicable)
|
||||
|
||||
### {CUSTOM_NFR_NAME_1}
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
### {CUSTOM_NFR_NAME_2}
|
||||
|
||||
- **Status:** {STATUS} {STATUS_ICON}
|
||||
- **Threshold:** {THRESHOLD_DESCRIPTION}
|
||||
- **Actual:** {ACTUAL_DESCRIPTION}
|
||||
- **Evidence:** {EVIDENCE_SOURCE}
|
||||
- **Findings:** {FINDINGS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Quick Wins
|
||||
|
||||
{QUICK_WIN_COUNT} quick wins identified for immediate implementation:
|
||||
|
||||
1. **{QUICK_WIN_TITLE_1}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
|
||||
- {QUICK_WIN_DESCRIPTION}
|
||||
- No code changes needed / Minimal code changes
|
||||
|
||||
2. **{QUICK_WIN_TITLE_2}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
|
||||
- {QUICK_WIN_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Recommended Actions
|
||||
|
||||
### Immediate (Before Release) - CRITICAL/HIGH Priority
|
||||
|
||||
1. **{ACTION_TITLE_1}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
- {SPECIFIC_STEPS}
|
||||
- {VALIDATION_CRITERIA}
|
||||
|
||||
2. **{ACTION_TITLE_2}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
- {SPECIFIC_STEPS}
|
||||
- {VALIDATION_CRITERIA}
|
||||
|
||||
### Short-term (Next Sprint) - MEDIUM Priority
|
||||
|
||||
1. **{ACTION_TITLE_3}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
2. **{ACTION_TITLE_4}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
### Long-term (Backlog) - LOW Priority
|
||||
|
||||
1. **{ACTION_TITLE_5}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
|
||||
- {ACTION_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Monitoring Hooks
|
||||
|
||||
{MONITORING_HOOK_COUNT} monitoring hooks recommended to detect issues before failures:
|
||||
|
||||
### Performance Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_1} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
- [ ] {MONITORING_TOOL_2} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Security Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_3} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Reliability Monitoring
|
||||
|
||||
- [ ] {MONITORING_TOOL_4} - {MONITORING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
### Alerting Thresholds
|
||||
|
||||
- [ ] {ALERT_DESCRIPTION} - Notify when {THRESHOLD_CONDITION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
|
||||
---
|
||||
|
||||
## Fail-Fast Mechanisms
|
||||
|
||||
{FAIL_FAST_COUNT} fail-fast mechanisms recommended to prevent failures:
|
||||
|
||||
### Circuit Breakers (Reliability)
|
||||
|
||||
- [ ] {CIRCUIT_BREAKER_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Rate Limiting (Performance)
|
||||
|
||||
- [ ] {RATE_LIMITING_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Validation Gates (Security)
|
||||
|
||||
- [ ] {VALIDATION_GATE_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
### Smoke Tests (Maintainability)
|
||||
|
||||
- [ ] {SMOKE_TEST_DESCRIPTION}
|
||||
- **Owner:** {OWNER}
|
||||
- **Estimated Effort:** {EFFORT}
|
||||
|
||||
---
|
||||
|
||||
## Evidence Gaps
|
||||
|
||||
{EVIDENCE_GAP_COUNT} evidence gaps identified - action required:
|
||||
|
||||
- [ ] **{NFR_NAME_1}** ({NFR_CATEGORY})
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
|
||||
- **Impact:** {IMPACT_DESCRIPTION}
|
||||
|
||||
- [ ] **{NFR_NAME_2}** ({NFR_CATEGORY})
|
||||
- **Owner:** {OWNER}
|
||||
- **Deadline:** {DEADLINE}
|
||||
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
|
||||
- **Impact:** {IMPACT_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Findings Summary
|
||||
|
||||
| Category | PASS | CONCERNS | FAIL | Overall Status |
|
||||
| --------------- | ---------------- | -------------------- | ---------------- | ----------------------------------- |
|
||||
| Performance | {P_PASS_COUNT} | {P_CONCERNS_COUNT} | {P_FAIL_COUNT} | {P_STATUS} {P_ICON} |
|
||||
| Security | {S_PASS_COUNT} | {S_CONCERNS_COUNT} | {S_FAIL_COUNT} | {S_STATUS} {S_ICON} |
|
||||
| Reliability | {R_PASS_COUNT} | {R_CONCERNS_COUNT} | {R_FAIL_COUNT} | {R_STATUS} {R_ICON} |
|
||||
| Maintainability | {M_PASS_COUNT} | {M_CONCERNS_COUNT} | {M_FAIL_COUNT} | {M_STATUS} {M_ICON} |
|
||||
| **Total** | **{TOTAL_PASS}** | **{TOTAL_CONCERNS}** | **{TOTAL_FAIL}** | **{OVERALL_STATUS} {OVERALL_ICON}** |
|
||||
|
||||
---
|
||||
|
||||
## Gate YAML Snippet
|
||||
|
||||
```yaml
|
||||
nfr_assessment:
|
||||
date: '{DATE}'
|
||||
story_id: '{STORY_ID}'
|
||||
feature_name: '{FEATURE_NAME}'
|
||||
categories:
|
||||
performance: '{PERFORMANCE_STATUS}'
|
||||
security: '{SECURITY_STATUS}'
|
||||
reliability: '{RELIABILITY_STATUS}'
|
||||
maintainability: '{MAINTAINABILITY_STATUS}'
|
||||
overall_status: '{OVERALL_STATUS}'
|
||||
  critical_issues: {CRITICAL_COUNT}
  high_priority_issues: {HIGH_COUNT}
  medium_priority_issues: {MEDIUM_COUNT}
  concerns: {CONCERNS_COUNT}
  blockers: {BLOCKER_BOOLEAN} # true/false
  quick_wins: {QUICK_WIN_COUNT}
  evidence_gaps: {EVIDENCE_GAP_COUNT}
|
||||
recommendations:
|
||||
- '{RECOMMENDATION_1}'
|
||||
- '{RECOMMENDATION_2}'
|
||||
- '{RECOMMENDATION_3}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Artifacts
|
||||
|
||||
- **Story File:** {STORY_FILE_PATH} (if applicable)
|
||||
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
|
||||
- **PRD:** {PRD_PATH} (if available)
|
||||
- **Test Design:** {TEST_DESIGN_PATH} (if available)
|
||||
- **Evidence Sources:**
|
||||
- Test Results: {TEST_RESULTS_DIR}
|
||||
- Metrics: {METRICS_DIR}
|
||||
- Logs: {LOGS_DIR}
|
||||
- CI Results: {CI_RESULTS_PATH}
|
||||
|
||||
---
|
||||
|
||||
## Recommendations Summary
|
||||
|
||||
**Release Blocker:** {RELEASE_BLOCKER_SUMMARY}
|
||||
|
||||
**High Priority:** {HIGH_PRIORITY_SUMMARY}
|
||||
|
||||
**Medium Priority:** {MEDIUM_PRIORITY_SUMMARY}
|
||||
|
||||
**Next Steps:** {NEXT_STEPS_DESCRIPTION}
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**NFR Assessment:**
|
||||
|
||||
- Overall Status: {OVERALL_STATUS} {OVERALL_ICON}
|
||||
- Critical Issues: {CRITICAL_COUNT}
|
||||
- High Priority Issues: {HIGH_COUNT}
|
||||
- Concerns: {CONCERNS_COUNT}
|
||||
- Evidence Gaps: {EVIDENCE_GAP_COUNT}
|
||||
|
||||
**Gate Status:** {GATE_STATUS} {GATE_ICON}
|
||||
|
||||
**Next Actions:**
|
||||
|
||||
- If PASS ✅: Proceed to `*gate` workflow or release
|
||||
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
|
||||
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
|
||||
|
||||
**Generated:** {DATE}
|
||||
**Workflow:** testarch-nfr v4.0
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
56
bmad/bmm/workflows/testarch/nfr-assess/workflow.yaml
Normal file
@@ -0,0 +1,56 @@
|
||||
# Test Architect workflow: nfr-assess
|
||||
name: testarch-nfr
|
||||
description: "Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
document_output_language: "{config_source}:document_output_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/nfr-assess"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/nfr-report-template.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
# NFR category assessment (defaults to all categories)
|
||||
custom_nfr_categories: "" # Optional additional categories beyond standard (security, performance, reliability, maintainability)
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/nfr-assessment.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read story, test results, metrics, logs, BMad artifacts
|
||||
- write_file # Create NFR assessment, gate YAML, evidence checklist
|
||||
- list_files # Discover test results, metrics, logs
|
||||
- search_repo # Find NFR-related tests and evidence
|
||||
- glob # Find result files matching patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- story: "Story markdown with NFR requirements (optional)"
|
||||
- tech_spec: "Technical specification with NFR targets (recommended)"
|
||||
- test_results: "Test execution results (performance, security, etc.)"
|
||||
- metrics: "Application metrics (response times, error rates, etc.)"
|
||||
- logs: "Application logs for reliability analysis"
|
||||
- ci_results: "CI/CD pipeline results for burn-in validation"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- nfr
|
||||
- test-architect
|
||||
- performance
|
||||
- security
|
||||
- reliability
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
493
bmad/bmm/workflows/testarch/test-design/README.md
Normal file
@@ -0,0 +1,493 @@
|
||||
# Test Design and Risk Assessment Workflow
|
||||
|
||||
Plans comprehensive test coverage strategy with risk assessment (probability × impact scoring), priority classification (P0-P3), and resource estimation. This workflow generates a test design document that identifies high-risk areas, maps requirements to appropriate test levels, and provides execution ordering for optimal feedback.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *test-design
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- Planning test coverage before development starts
|
||||
- Assessing risks for an epic or story
|
||||
- Prioritizing test scenarios by business impact
|
||||
- Estimating testing effort and resources
|
||||
|
||||
## Inputs
|
||||
|
||||
**Required Context Files:**
|
||||
|
||||
- **Story markdown**: Acceptance criteria and requirements
|
||||
- **PRD or epics.md**: High-level product context
|
||||
- **Architecture docs** (optional): Technical constraints and integration points
|
||||
|
||||
**Workflow Variables:**
|
||||
|
||||
- `epic_num`: Epic number for scoped design
|
||||
- `story_path`: Specific story for design (optional)
|
||||
- `design_level`: full/targeted/minimal (default: full)
|
||||
- `risk_threshold`: Score for high-priority flag (default: 6)
|
||||
- `risk_categories`: TECH,SEC,PERF,DATA,BUS,OPS (all enabled)
|
||||
- `priority_levels`: P0,P1,P2,P3 (all enabled)
|
||||
|
||||
## Outputs
|
||||
|
||||
**Primary Deliverable:**
|
||||
|
||||
**Test Design Document** (`test-design-epic-{N}.md`):
|
||||
|
||||
1. **Risk Assessment Matrix**
|
||||
- Risk ID, category, description
|
||||
- Probability (1-3) × Impact (1-3) = Score
|
||||
- Scores ≥6 flagged as high-priority
|
||||
- Mitigation plans with owners and timelines
|
||||
|
||||
2. **Coverage Matrix**
|
||||
- Requirement → Test Level (E2E/API/Component/Unit)
|
||||
- Priority assignment (P0-P3)
|
||||
- Risk linkage
|
||||
- Test count estimates
|
||||
|
||||
3. **Execution Order**
|
||||
- Smoke tests (P0 subset, <5 min)
|
||||
- P0 tests (critical paths, <10 min)
|
||||
- P1 tests (important features, <30 min)
|
||||
- P2/P3 tests (full regression, <60 min)
|
||||
|
||||
4. **Resource Estimates**
|
||||
- Hours per priority level
|
||||
- Total effort in days
|
||||
- Tooling and data prerequisites
|
||||
|
||||
5. **Quality Gate Criteria**
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- High-risk mitigations: 100%
|
||||
- Coverage target: ≥80%
|
||||
|
||||
## Key Features
|
||||
|
||||
### Risk Scoring Framework
|
||||
|
||||
**Probability × Impact = Risk Score**
|
||||
|
||||
**Probability** (1-3):
|
||||
|
||||
- 1 (Unlikely): <10% chance
|
||||
- 2 (Possible): 10-50% chance
|
||||
- 3 (Likely): >50% chance
|
||||
|
||||
**Impact** (1-3):
|
||||
|
||||
- 1 (Minor): Cosmetic, workaround exists
|
||||
- 2 (Degraded): Feature impaired, difficult workaround
|
||||
- 3 (Critical): System failure, no workaround
|
||||
|
||||
**Scores**:
|
||||
|
||||
- 1-2: Low risk (monitor)
|
||||
- 3-4: Medium risk (plan mitigation)
|
||||
- **6-9: High risk** (immediate mitigation required)
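The scoring above can be expressed as a tiny helper, shown here as a sketch (band names are informal labels, not BMAD terminology):

```typescript
type Scale = 1 | 2 | 3;

function riskScore(probability: Scale, impact: Scale): number {
  return probability * impact;
}

function riskBand(score: number): 'low' | 'medium' | 'high' {
  if (score >= 6) return 'high';   // immediate mitigation required
  if (score >= 3) return 'medium'; // plan mitigation
  return 'low';                    // monitor
}

riskBand(riskScore(2, 3)); // 'high' - e.g. a possible payment bypass
```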
|
||||
|
||||
### Risk Categories (6 types)
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
|
||||
- Architecture flaws, integration failures
|
||||
- Scalability issues, technical debt
|
||||
|
||||
**SEC** (Security):
|
||||
|
||||
- Missing access controls, auth bypass
|
||||
- Data exposure, injection vulnerabilities
|
||||
|
||||
**PERF** (Performance):
|
||||
|
||||
- SLA violations, response time degradation
|
||||
- Resource exhaustion, scalability limits
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
|
||||
- Data loss/corruption, inconsistent state
|
||||
- Migration failures
|
||||
|
||||
**BUS** (Business Impact):
|
||||
|
||||
- UX degradation, business logic errors
|
||||
- Revenue impact, compliance violations
|
||||
|
||||
**OPS** (Operations):
|
||||
|
||||
- Deployment failures, configuration errors
|
||||
- Monitoring gaps, rollback issues
|
||||
|
||||
### Priority Classification (P0-P3)
|
||||
|
||||
**P0 (Critical)** - Run on every commit:
|
||||
|
||||
- Blocks core user journey
|
||||
- High-risk (score ≥6)
|
||||
- Revenue-impacting or security-critical
|
||||
|
||||
**P1 (High)** - Run on PR to main:
|
||||
|
||||
- Important user features
|
||||
- Medium-risk (score 3-4)
|
||||
- Common workflows
|
||||
|
||||
**P2 (Medium)** - Run nightly/weekly:
|
||||
|
||||
- Secondary features
|
||||
- Low-risk (score 1-2)
|
||||
- Edge cases
|
||||
|
||||
**P3 (Low)** - Run on-demand:
|
||||
|
||||
- Nice-to-have, exploratory
|
||||
- Performance benchmarks
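A rough sketch of this prioritization as code (the input flags are assumptions the designer derives from the risk matrix and journey analysis; the real decision also weighs revenue and security impact):

```typescript
type Priority = 'P0' | 'P1' | 'P2' | 'P3';

// riskScore is the linked risk's probability x impact, or 0 when no risk is linked.
function assignPriority(blocksCoreJourney: boolean, commonWorkflow: boolean, riskScore = 0): Priority {
  if (blocksCoreJourney || riskScore >= 6) return 'P0';
  if (commonWorkflow || riskScore >= 3) return 'P1';
  if (riskScore >= 1) return 'P2';
  return 'P3';
}

assignPriority(true, false, 6);  // 'P0' - payment security scenario
assignPriority(false, true, 3);  // 'P1' - promo code workflow
assignPriority(false, false, 0); // 'P3' - exploratory/nice-to-have
```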
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
|
||||
- Critical user journeys
|
||||
- Multi-system integration
|
||||
- Highest confidence, slowest
|
||||
|
||||
**API (Integration)**:
|
||||
|
||||
- Service contracts
|
||||
- Business logic validation
|
||||
- Fast feedback, stable
|
||||
|
||||
**Component**:
|
||||
|
||||
- UI component behavior
|
||||
- Visual regression
|
||||
- Fast, isolated
|
||||
|
||||
**Unit**:
|
||||
|
||||
- Business logic, edge cases
|
||||
- Error handling
|
||||
- Fastest, most granular
|
||||
|
||||
**Key principle**: Avoid duplicate coverage - don't test same behavior at multiple levels.
|
||||
|
||||
### Exploratory Mode (NEW - Phase 2.5)
|
||||
|
||||
**test-design** supports UI exploration for brownfield applications with missing documentation.
|
||||
|
||||
**Activation**: Automatic when requirements missing/incomplete for brownfield apps
|
||||
|
||||
- If config.tea_use_mcp_enhancements is true + MCP available → MCP-assisted exploration
|
||||
- Otherwise → Manual exploration with user documentation
|
||||
|
||||
**When to Use Exploratory Mode:**
|
||||
|
||||
- ✅ Brownfield projects with missing documentation
|
||||
- ✅ Legacy systems lacking requirements
|
||||
- ✅ Undocumented features needing test coverage
|
||||
- ✅ Unknown user journeys requiring discovery
|
||||
- ❌ NOT for greenfield projects with clear requirements
|
||||
|
||||
**Exploration Modes:**
|
||||
|
||||
1. **MCP-Assisted Exploration** (if Playwright MCP available):
|
||||
- Interactive browser exploration using MCP tools
|
||||
- `planner_setup_page` - Initialize browser
|
||||
- `browser_navigate` - Explore pages
|
||||
- `browser_click` - Interact with UI elements
|
||||
- `browser_hover` - Reveal hidden menus
|
||||
- `browser_snapshot` - Capture state at each step
|
||||
- `browser_screenshot` - Document visually
|
||||
- `browser_console_messages` - Find JavaScript errors
|
||||
- `browser_network_requests` - Identify API endpoints
|
||||
|
||||
2. **Manual Exploration** (fallback without MCP):
|
||||
- User explores application manually
|
||||
- Documents findings in markdown:
|
||||
- Pages/features discovered
|
||||
- User journeys identified
|
||||
- API endpoints observed (DevTools Network)
|
||||
- JavaScript errors noted (DevTools Console)
|
||||
- Critical workflows mapped
|
||||
- Provides exploration findings to workflow
|
||||
|
||||
**Exploration Workflow:**
|
||||
|
||||
```
|
||||
1. Enable exploratory_mode and set exploration_url
|
||||
2. IF MCP available:
|
||||
- Use planner_setup_page to init browser
|
||||
- Explore UI with browser_* tools
|
||||
- Capture snapshots and screenshots
|
||||
- Monitor console and network
|
||||
- Document discoveries
|
||||
3. IF MCP unavailable:
|
||||
- Notify user to explore manually
|
||||
- Wait for exploration findings
|
||||
4. Convert discoveries to testable requirements
|
||||
5. Continue with standard risk assessment (Step 2)
|
||||
```
|
||||
|
||||
**Example Output from Exploratory Mode:**
|
||||
|
||||
```markdown
|
||||
## Exploration Findings - Legacy Admin Panel
|
||||
|
||||
**Exploration URL**: https://admin.example.com
|
||||
**Mode**: MCP-Assisted
|
||||
|
||||
### Discovered Features:
|
||||
|
||||
1. User Management (/admin/users)
|
||||
- List users (table with 10 columns)
|
||||
- Edit user (modal form)
|
||||
- Delete user (confirmation dialog)
|
||||
- Export to CSV (download button)
|
||||
|
||||
2. Reporting Dashboard (/admin/reports)
|
||||
- Date range picker
|
||||
- Filter by department
|
||||
- Generate PDF report
|
||||
- Email report to stakeholders
|
||||
|
||||
3. API Endpoints Discovered:
|
||||
- GET /api/admin/users
|
||||
- PUT /api/admin/users/:id
|
||||
- DELETE /api/admin/users/:id
|
||||
- POST /api/reports/generate
|
||||
|
||||
### User Journeys Mapped:
|
||||
|
||||
1. Admin deletes inactive user
|
||||
- Navigate to /admin/users
|
||||
- Click delete icon
|
||||
- Confirm in modal
|
||||
- User removed from table
|
||||
|
||||
2. Admin generates monthly report
|
||||
- Navigate to /admin/reports
|
||||
- Select date range (last month)
|
||||
- Click generate
|
||||
- Download PDF
|
||||
|
||||
### Risks Identified (from exploration):
|
||||
|
||||
- R-001 (SEC): No RBAC check observed (any admin can delete any user)
|
||||
- R-002 (DATA): No confirmation on bulk delete
|
||||
- R-003 (PERF): User table loads slowly (5s for 1000 rows)
|
||||
|
||||
**Next**: Proceed to risk assessment with discovered requirements
|
||||
```
|
||||
|
||||
**Graceful Degradation:**
|
||||
|
||||
- Exploratory mode is OPTIONAL (default: disabled)
|
||||
- Works without Playwright MCP (manual fallback)
|
||||
- If exploration fails, can disable mode and provide requirements documentation
|
||||
- Seamlessly transitions to standard risk assessment workflow
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
Automatically consults TEA knowledge base:
|
||||
|
||||
- `risk-governance.md` - Risk classification framework
|
||||
- `probability-impact.md` - Risk scoring methodology
|
||||
- `test-levels-framework.md` - Test level selection
|
||||
- `test-priorities-matrix.md` - P0-P3 prioritization
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
**Before test-design:**
|
||||
|
||||
- **prd** (Phase 2): Creates PRD and epics
|
||||
- **architecture** (Phase 3): Defines technical approach
|
||||
- **tech-spec** (Phase 3): Implementation details
|
||||
|
||||
**After test-design:**
|
||||
|
||||
- **atdd**: Generate failing tests for P0 scenarios
|
||||
- **automate**: Expand coverage for P1/P2 scenarios
|
||||
- **trace (Phase 2)**: Use quality gate criteria for release decisions
|
||||
|
||||
**Coordinates with:**
|
||||
|
||||
- **framework**: Test infrastructure must exist
|
||||
- **ci**: Execution order maps to CI stages
|
||||
|
||||
**Updates:**
|
||||
|
||||
- `bmm-workflow-status.md`: Adds test design to Quality & Testing Progress
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Evidence-Based Assessment
|
||||
|
||||
**Critical principle**: Base risk assessment on **evidence**, not speculation.
|
||||
|
||||
**Evidence sources:**
|
||||
|
||||
- PRD and user research
|
||||
- Architecture documentation
|
||||
- Historical bug data
|
||||
- User feedback
|
||||
- Security audit results
|
||||
|
||||
**When uncertain**: Document assumptions, request user clarification.
|
||||
|
||||
**Avoid**:
|
||||
|
||||
- Guessing business impact
|
||||
- Assuming user behavior
|
||||
- Inventing requirements
|
||||
|
||||
### Resource Estimation Formula
|
||||
|
||||
```
|
||||
P0: 2 hours per test (setup + complex scenarios)
|
||||
P1: 1 hour per test (standard coverage)
|
||||
P2: 0.5 hours per test (simple scenarios)
|
||||
P3: 0.25 hours per test (exploratory)
|
||||
|
||||
Total Days = Total Hours / 8
|
||||
```
|
||||
|
||||
Example:
|
||||
|
||||
- 15 P0 × 2h = 30h
|
||||
- 25 P1 × 1h = 25h
|
||||
- 40 P2 × 0.5h = 20h
|
||||
- **Total: 75 hours (~10 days)**
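The same formula, transcribed directly into a small calculator for convenience:

```typescript
const HOURS_PER_TEST = { P0: 2, P1: 1, P2: 0.5, P3: 0.25 } as const;

function estimateEffort(counts: Partial<Record<keyof typeof HOURS_PER_TEST, number>>): { hours: number; days: number } {
  const hours = (Object.keys(HOURS_PER_TEST) as (keyof typeof HOURS_PER_TEST)[])
    .reduce((sum, priority) => sum + (counts[priority] ?? 0) * HOURS_PER_TEST[priority], 0);
  return { hours, days: hours / 8 };
}

estimateEffort({ P0: 15, P1: 25, P2: 40 }); // { hours: 75, days: 9.375 } -> ~10 days
```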
|
||||
|
||||
### Execution Order Strategy
|
||||
|
||||
**Smoke tests** (subset of P0, <5 min):
|
||||
|
||||
- Login successful
|
||||
- Dashboard loads
|
||||
- Core API responds
|
||||
|
||||
**Purpose**: Fast feedback, catch build-breaking issues immediately.
|
||||
|
||||
**P0 tests** (critical paths, <10 min):
|
||||
|
||||
- All scenarios blocking user journeys
|
||||
- Security-critical flows
|
||||
|
||||
**P1 tests** (important features, <30 min):
|
||||
|
||||
- Common workflows
|
||||
- Medium-risk areas
|
||||
|
||||
**P2/P3 tests** (full regression, <60 min):
|
||||
|
||||
- Edge cases
|
||||
- Performance benchmarks
|
||||
|
||||
### Quality Gate Criteria
|
||||
|
||||
**Pass/Fail thresholds:**
|
||||
|
||||
- P0: 100% pass (no exceptions)
|
||||
- P1: ≥95% pass (2-3 failures acceptable with waivers)
|
||||
- P2/P3: ≥90% pass (informational)
|
||||
- High-risk items: All mitigated or have approved waivers
|
||||
|
||||
**Coverage targets:**
|
||||
|
||||
- Critical paths: ≥80%
|
||||
- Security scenarios: 100%
|
||||
- Business logic: ≥70%
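These thresholds translate into a simple gate check, sketched below (waiver handling is simplified to a single boolean; pass rates and coverage are expressed as fractions):

```typescript
interface GateInput {
  p0PassRate: number;           // 0-1
  p1PassRate: number;           // 0-1
  highRisksMitigated: boolean;  // all score >= 6 risks mitigated or formally waived
  criticalPathCoverage: number; // 0-1
}

function gateDecision(g: GateInput): 'PASS' | 'CONCERNS' | 'FAIL' {
  if (g.p0PassRate < 1 || !g.highRisksMitigated) return 'FAIL';
  if (g.p1PassRate < 0.95 || g.criticalPathCoverage < 0.8) return 'CONCERNS';
  return 'PASS';
}

gateDecision({ p0PassRate: 1, p1PassRate: 0.97, highRisksMitigated: true, criticalPathCoverage: 0.85 }); // 'PASS'
```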
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
After workflow completion:
|
||||
|
||||
- [ ] Risk assessment complete (all categories)
|
||||
- [ ] Risks scored (probability × impact)
|
||||
- [ ] High-priority risks (≥6) flagged
|
||||
- [ ] Coverage matrix maps requirements to test levels
|
||||
- [ ] Priorities assigned (P0-P3)
|
||||
- [ ] Execution order defined
|
||||
- [ ] Resource estimates provided
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file created
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation.
|
||||
|
||||
## Example Execution
|
||||
|
||||
**Scenario: E-commerce checkout epic**
|
||||
|
||||
```bash
|
||||
bmad tea *test-design
|
||||
# Epic 3: Checkout flow redesign
|
||||
|
||||
# Risk Assessment identifies:
|
||||
- R-001 (SEC): Payment bypass, P=2 × I=3 = 6 (HIGH)
|
||||
- R-002 (PERF): Cart load time, P=3 × I=2 = 6 (HIGH)
|
||||
- R-003 (BUS): Order confirmation email, P=2 × I=2 = 4 (MEDIUM)
|
||||
|
||||
# Coverage Plan:
|
||||
P0 scenarios: 12 tests (payment security, order creation)
|
||||
P1 scenarios: 18 tests (cart management, promo codes)
|
||||
P2 scenarios: 25 tests (edge cases, error handling)
|
||||
|
||||
Total effort: ~55 hours (~7 days)
|
||||
|
||||
# Test Levels:
|
||||
- E2E: 8 tests (critical checkout path)
|
||||
- API: 30 tests (business logic, payment processing)
|
||||
- Unit: 17 tests (calculations, validations)
|
||||
|
||||
# Execution Order:
|
||||
1. Smoke: Payment successful, order created (2 min)
|
||||
2. P0: All payment & security flows (8 min)
|
||||
3. P1: Cart & promo codes (20 min)
|
||||
4. P2: Edge cases (40 min)
|
||||
|
||||
# Quality Gates:
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- R-001 mitigated: Add payment validation layer
|
||||
- R-002 mitigated: Implement cart caching
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue: "Unable to score risks - missing context"**
|
||||
|
||||
- **Cause**: Insufficient documentation
|
||||
- **Solution**: Request PRD, architecture docs, or user clarification
|
||||
|
||||
**Issue: "All tests marked as P0"**
|
||||
|
||||
- **Cause**: Over-prioritization
|
||||
- **Solution**: Apply strict P0 criteria (blocks core journey + high risk + no workaround)
|
||||
|
||||
**Issue: "Duplicate coverage at multiple test levels"**
|
||||
|
||||
- **Cause**: Not following test pyramid
|
||||
- **Solution**: Use E2E for critical paths only, API for logic, unit for edge cases
|
||||
|
||||
**Issue: "Resource estimates too high"**
|
||||
|
||||
- **Cause**: Complex test setup or insufficient automation
|
||||
- **Solution**: Invest in fixtures/factories upfront, reduce per-test setup time
|
||||
|
||||
## Related Workflows
|
||||
|
||||
- **atdd**: Generate failing tests → [atdd/README.md](../atdd/README.md)
|
||||
- **automate**: Expand regression coverage → [automate/README.md](../automate/README.md)
|
||||
- **trace**: Traceability and quality gate decisions → [trace/README.md](../trace/README.md)
|
||||
- **framework**: Test infrastructure → [framework/README.md](../framework/README.md)
|
||||
|
||||
## Version History
|
||||
|
||||
- **v4.0 (BMad v6)**: Pure markdown instructions, risk scoring framework, template-based output
|
||||
- **v3.x**: XML format instructions
|
||||
- **v2.x**: Legacy task-based approach
|
||||
234
bmad/bmm/workflows/testarch/test-design/checklist.md
Normal file
234
bmad/bmm/workflows/testarch/test-design/checklist.md
Normal file
@@ -0,0 +1,234 @@
|
||||
# Test Design and Risk Assessment - Validation Checklist
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- [ ] Story markdown with clear acceptance criteria exists
|
||||
- [ ] PRD or epic documentation available
|
||||
- [ ] Architecture documents available (optional)
|
||||
- [ ] Requirements are testable and unambiguous
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] PRD.md read and requirements extracted
|
||||
- [ ] Epics.md or specific epic documentation loaded
|
||||
- [ ] Story markdown with acceptance criteria analyzed
|
||||
- [ ] Architecture documents reviewed (if available)
|
||||
- [ ] Existing test coverage analyzed
|
||||
- [ ] Knowledge base fragments loaded (risk-governance, probability-impact, test-levels, test-priorities)
|
||||
|
||||
### Step 2: Risk Assessment
|
||||
|
||||
- [ ] Genuine risks identified (not just features)
|
||||
- [ ] Risks classified by category (TECH/SEC/PERF/DATA/BUS/OPS)
|
||||
- [ ] Probability scored (1-3 for each risk)
|
||||
- [ ] Impact scored (1-3 for each risk)
|
||||
- [ ] Risk scores calculated (probability × impact)
|
||||
- [ ] High-priority risks (score ≥6) flagged
|
||||
- [ ] Mitigation plans defined for high-priority risks
|
||||
- [ ] Owners assigned for each mitigation
|
||||
- [ ] Timelines set for mitigations
|
||||
- [ ] Residual risk documented
|
||||
|
||||
### Step 3: Coverage Design
|
||||
|
||||
- [ ] Acceptance criteria broken into atomic scenarios
|
||||
- [ ] Test levels selected (E2E/API/Component/Unit)
|
||||
- [ ] No duplicate coverage across levels
|
||||
- [ ] Priority levels assigned (P0/P1/P2/P3)
|
||||
- [ ] P0 scenarios meet strict criteria (blocks core + high risk + no workaround)
|
||||
- [ ] Data prerequisites identified
|
||||
- [ ] Tooling requirements documented
|
||||
- [ ] Execution order defined (smoke → P0 → P1 → P2/P3)
|
||||
|
||||
### Step 4: Deliverables Generation
|
||||
|
||||
- [ ] Risk assessment matrix created
|
||||
- [ ] Coverage matrix created
|
||||
- [ ] Execution order documented
|
||||
- [ ] Resource estimates calculated
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file written to correct location
|
||||
- [ ] Output file uses template structure
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Risk Assessment Matrix
|
||||
|
||||
- [ ] All risks have unique IDs (R-001, R-002, etc.)
|
||||
- [ ] Each risk has category assigned
|
||||
- [ ] Probability values are 1, 2, or 3
|
||||
- [ ] Impact values are 1, 2, or 3
|
||||
- [ ] Scores calculated correctly (P × I)
|
||||
- [ ] High-priority risks (≥6) clearly marked
|
||||
- [ ] Mitigation strategies specific and actionable
|
||||
|
||||
### Coverage Matrix
|
||||
|
||||
- [ ] All requirements mapped to test levels
|
||||
- [ ] Priorities assigned to all scenarios
|
||||
- [ ] Risk linkage documented
|
||||
- [ ] Test counts realistic
|
||||
- [ ] Owners assigned where applicable
|
||||
- [ ] No duplicate coverage (same behavior at multiple levels)
|
||||
|
||||
### Execution Order
|
||||
|
||||
- [ ] Smoke tests defined (<5 min target)
|
||||
- [ ] P0 tests listed (<10 min target)
|
||||
- [ ] P1 tests listed (<30 min target)
|
||||
- [ ] P2/P3 tests listed (<60 min target)
|
||||
- [ ] Order optimizes for fast feedback
|
||||
|
||||
### Resource Estimates
|
||||
|
||||
- [ ] P0 hours calculated (count × 2 hours)
|
||||
- [ ] P1 hours calculated (count × 1 hour)
|
||||
- [ ] P2 hours calculated (count × 0.5 hours)
|
||||
- [ ] P3 hours calculated (count × 0.25 hours)
|
||||
- [ ] Total hours summed
|
||||
- [ ] Days estimate provided (hours / 8)
|
||||
- [ ] Estimates include setup time
|
||||
|
||||
### Quality Gate Criteria
|
||||
|
||||
- [ ] P0 pass rate threshold defined (should be 100%)
|
||||
- [ ] P1 pass rate threshold defined (typically ≥95%)
|
||||
- [ ] High-risk mitigation completion required
|
||||
- [ ] Coverage targets specified (≥80% recommended)
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Evidence-Based Assessment
|
||||
|
||||
- [ ] Risk assessment based on documented evidence
|
||||
- [ ] No speculation on business impact
|
||||
- [ ] Assumptions clearly documented
|
||||
- [ ] Clarifications requested where needed
|
||||
- [ ] Historical data referenced where available
|
||||
|
||||
### Risk Classification Accuracy
|
||||
|
||||
- [ ] TECH risks are architecture/integration issues
|
||||
- [ ] SEC risks are security vulnerabilities
|
||||
- [ ] PERF risks are performance/scalability concerns
|
||||
- [ ] DATA risks are data integrity issues
|
||||
- [ ] BUS risks are business/revenue impacts
|
||||
- [ ] OPS risks are deployment/operational issues
|
||||
|
||||
### Priority Assignment Accuracy
|
||||
|
||||
- [ ] P0: Truly blocks core functionality
|
||||
- [ ] P0: High-risk (score ≥6)
|
||||
- [ ] P0: No workaround exists
|
||||
- [ ] P1: Important but not blocking
|
||||
- [ ] P2/P3: Nice-to-have or edge cases
|
||||
|
||||
### Test Level Selection
|
||||
|
||||
- [ ] E2E used only for critical paths
|
||||
- [ ] API tests cover complex business logic
|
||||
- [ ] Component tests for UI interactions
|
||||
- [ ] Unit tests for edge cases and algorithms
|
||||
- [ ] No redundant coverage
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] risk-governance.md consulted
|
||||
- [ ] probability-impact.md applied
|
||||
- [ ] test-levels-framework.md referenced
|
||||
- [ ] test-priorities-matrix.md used
|
||||
- [ ] Additional fragments loaded as needed
|
||||
|
||||
### Status File Integration
|
||||
|
||||
- [ ] bmm-workflow-status.md exists
|
||||
- [ ] Test design logged in Quality & Testing Progress
|
||||
- [ ] Epic number and scope documented
|
||||
- [ ] Completion timestamp recorded
|
||||
|
||||
### Workflow Dependencies
|
||||
|
||||
- [ ] Can proceed to `atdd` workflow with P0 scenarios
|
||||
- [ ] Can proceed to `automate` workflow with full coverage plan
|
||||
- [ ] Risk assessment informs `gate` workflow criteria
|
||||
- [ ] Integrates with `ci` workflow execution order
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
**All must be true:**
|
||||
|
||||
- [ ] All prerequisites met
|
||||
- [ ] All process steps completed
|
||||
- [ ] All output validations passed
|
||||
- [ ] All quality checks passed
|
||||
- [ ] All integration points verified
|
||||
- [ ] Output file complete and well-formatted
|
||||
- [ ] Team review scheduled (if required)
|
||||
|
||||
## Post-Workflow Actions
|
||||
|
||||
**User must complete:**
|
||||
|
||||
1. [ ] Review risk assessment with team
|
||||
2. [ ] Prioritize mitigation for high-priority risks (score ≥6)
|
||||
3. [ ] Allocate resources per estimates
|
||||
4. [ ] Run `atdd` workflow to generate P0 tests
|
||||
5. [ ] Set up test data factories and fixtures
|
||||
6. [ ] Schedule team review of test design document
|
||||
|
||||
**Recommended next workflows:**
|
||||
|
||||
1. [ ] Run `atdd` workflow for P0 test generation
|
||||
2. [ ] Run `framework` workflow if not already done
|
||||
3. [ ] Run `ci` workflow to configure pipeline stages
|
||||
|
||||
## Rollback Procedure
|
||||
|
||||
If workflow fails:
|
||||
|
||||
1. [ ] Delete output file
|
||||
2. [ ] Review error logs
|
||||
3. [ ] Fix missing context (PRD, architecture docs)
|
||||
4. [ ] Clarify ambiguous requirements
|
||||
5. [ ] Retry workflow
|
||||
|
||||
## Notes
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Issue**: Too many P0 tests
|
||||
|
||||
- **Solution**: Apply strict P0 criteria - must block core AND high risk AND no workaround
|
||||
|
||||
**Issue**: Risk scores all high
|
||||
|
||||
- **Solution**: Differentiate between critical (3) and degraded (2) impact ratings instead of defaulting every risk to the top score
|
||||
|
||||
**Issue**: Duplicate coverage across levels
|
||||
|
||||
- **Solution**: Use test pyramid - E2E for critical paths only
|
||||
|
||||
**Issue**: Resource estimates too high
|
||||
|
||||
- **Solution**: Invest in fixtures/factories to reduce per-test setup time
|
||||
|
||||
### Best Practices
|
||||
|
||||
- Base risk assessment on evidence, not assumptions
|
||||
- High-priority risks (≥6) require immediate mitigation
|
||||
- P0 tests should cover <10% of total scenarios
|
||||
- Avoid testing same behavior at multiple levels
|
||||
- Include smoke tests (P0 subset) for fast feedback
|
||||
|
||||
---
|
||||
|
||||
**Checklist Complete**: Sign off when all items validated.
|
||||
|
||||
**Completed by:** **\*\***\_\_\_**\*\***
|
||||
**Date:** **\*\***\_\_\_**\*\***
|
||||
**Epic:** **\*\***\_\_\_**\*\***
|
||||
**Notes:** \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
621
bmad/bmm/workflows/testarch/test-design/instructions.md
Normal file
621
bmad/bmm/workflows/testarch/test-design/instructions.md
Normal file
@@ -0,0 +1,621 @@
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
|
||||
# Test Design and Risk Assessment
|
||||
|
||||
**Workflow ID**: `bmad/bmm/testarch/test-design`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Plans comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering. This workflow generates a test design document that identifies high-risk areas, maps requirements to test levels, prioritizes scenarios (P0-P3), and provides resource estimates for the testing effort.
|
||||
|
||||
---
|
||||
|
||||
## Preflight Requirements
|
||||
|
||||
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
|
||||
|
||||
- ✅ Story markdown with acceptance criteria available
|
||||
- ✅ PRD or epic documentation exists for context
|
||||
- ✅ Architecture documents available (optional but recommended)
|
||||
- ✅ Requirements are clear and testable
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Load Context and Requirements
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Read Requirements Documentation**
|
||||
- Load PRD.md for high-level product requirements
|
||||
- Read epics.md or specific epic for feature scope
|
||||
- Read story markdown for detailed acceptance criteria
|
||||
- Identify all testable requirements
|
||||
|
||||
2. **Load Architecture Context**
|
||||
- Read architecture.md for system design
|
||||
- Read tech-spec for implementation details
|
||||
- Identify technical constraints and dependencies
|
||||
- Note integration points and external systems
|
||||
|
||||
3. **Analyze Existing Test Coverage**
|
||||
- Search for existing test files in `{test_dir}`
|
||||
- Identify coverage gaps
|
||||
- Note areas with insufficient testing
|
||||
- Check for flaky or outdated tests
|
||||
|
||||
4. **Load Knowledge Base Fragments**
|
||||
|
||||
**Critical:** Consult `{project-root}/bmad/bmm/testarch/tea-index.csv` to load:
|
||||
- `risk-governance.md` - Risk classification framework (6 categories: TECH, SEC, PERF, DATA, BUS, OPS), automated scoring, gate decision engine, owner tracking (625 lines, 4 examples)
|
||||
- `probability-impact.md` - Risk scoring methodology (probability × impact matrix, automated classification, dynamic re-assessment, gate integration, 604 lines, 4 examples)
|
||||
- `test-levels-framework.md` - Test level selection guidance (E2E vs API vs Component vs Unit with decision matrix, characteristics, when to use each, 467 lines, 4 examples)
|
||||
- `test-priorities-matrix.md` - P0-P3 prioritization criteria (automated priority calculation, risk-based mapping, tagging strategy, time budgets, 389 lines, 2 examples)
|
||||
|
||||
**Halt Condition:** If story data or acceptance criteria are missing, check if brownfield exploration is needed. If neither requirements NOR exploration possible, HALT with message: "Test design requires clear requirements, acceptance criteria, or brownfield app URL for exploration"
|
||||
|
||||
---
|
||||
|
||||
## Step 1.5: Mode Selection (NEW - Phase 2.5)
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Detect Planning Mode**
|
||||
|
||||
Determine mode based on context:
|
||||
|
||||
**Requirements-Based Mode (DEFAULT)**:
|
||||
- Have clear story/PRD with acceptance criteria
|
||||
- Uses: Existing workflow (Steps 2-4)
|
||||
- Appropriate for: Documented features, greenfield projects
|
||||
|
||||
**Exploratory Mode (OPTIONAL - Brownfield)**:
|
||||
- Missing/incomplete requirements AND brownfield application exists
|
||||
- Uses: UI exploration to discover functionality
|
||||
- Appropriate for: Undocumented brownfield apps, legacy systems
|
||||
|
||||
2. **Requirements-Based Mode (DEFAULT - Skip to Step 2)**
|
||||
|
||||
If requirements are clear:
|
||||
- Continue with existing workflow (Step 2: Assess and Classify Risks)
|
||||
- Use loaded requirements from Step 1
|
||||
- Proceed with risk assessment based on documented requirements
|
||||
|
||||
3. **Exploratory Mode (OPTIONAL - Brownfield Apps)**
|
||||
|
||||
If exploring brownfield application:
|
||||
|
||||
**A. Check MCP Availability**
|
||||
|
||||
If config.tea_use_mcp_enhancements is true AND Playwright MCP tools available:
|
||||
- Use MCP-assisted exploration (Step 3.B)
|
||||
|
||||
If MCP unavailable OR config.tea_use_mcp_enhancements is false:
|
||||
- Use manual exploration fallback (Step 3.C)
|
||||
|
||||
**B. MCP-Assisted Exploration (If MCP Tools Available)**
|
||||
|
||||
Use Playwright MCP browser tools to explore UI:
|
||||
|
||||
**Setup:**
|
||||
|
||||
```
|
||||
1. Use planner_setup_page to initialize browser
|
||||
2. Navigate to {exploration_url}
|
||||
3. Capture initial state with browser_snapshot
|
||||
```
|
||||
|
||||
**Exploration Process:**
|
||||
|
||||
```
|
||||
4. Use browser_navigate to explore different pages
|
||||
5. Use browser_click to interact with buttons, links, forms
|
||||
6. Use browser_hover to reveal hidden menus/tooltips
|
||||
7. Capture browser_snapshot at each significant state
|
||||
8. Take browser_screenshot for documentation
|
||||
9. Monitor browser_console_messages for JavaScript errors
|
||||
10. Track browser_network_requests to identify API calls
|
||||
11. Map user flows and interactive elements
|
||||
12. Document discovered functionality
|
||||
```
|
||||
|
||||
**Discovery Documentation:**
|
||||
- Create list of discovered features (pages, workflows, forms)
|
||||
- Identify user journeys (navigation paths)
|
||||
- Map API endpoints (from network requests)
|
||||
- Note error states (from console messages)
|
||||
- Capture screenshots for visual reference
|
||||
|
||||
**Convert to Test Scenarios:**
|
||||
- Transform discoveries into testable requirements
|
||||
- Prioritize based on user flow criticality
|
||||
- Identify risks from discovered functionality
|
||||
- Continue with Step 2 (Assess and Classify Risks) using discovered requirements
|
||||
|
||||
**C. Manual Exploration Fallback (If MCP Unavailable)**
|
||||
|
||||
If Playwright MCP is not available:
|
||||
|
||||
**Notify User:**
|
||||
|
||||
```markdown
|
||||
Exploratory mode enabled but Playwright MCP unavailable.
|
||||
|
||||
**Manual exploration required:**
|
||||
|
||||
1. Open application at: {exploration_url}
|
||||
2. Explore all pages, workflows, and features
|
||||
3. Document findings in markdown:
|
||||
- List of pages/features discovered
|
||||
- User journeys identified
|
||||
- API endpoints observed (DevTools Network tab)
|
||||
- JavaScript errors noted (DevTools Console)
|
||||
- Critical workflows mapped
|
||||
|
||||
4. Provide exploration findings to continue workflow
|
||||
|
||||
**Alternative:** Disable exploratory_mode and provide requirements documentation
|
||||
```
|
||||
|
||||
Wait for user to provide exploration findings, then:
|
||||
- Parse user-provided discovery documentation
|
||||
- Convert to testable requirements
|
||||
- Continue with Step 2 (risk assessment)
|
||||
|
||||
4. **Proceed to Risk Assessment**
|
||||
|
||||
After mode selection (Requirements-Based OR Exploratory):
|
||||
- Continue to Step 2: Assess and Classify Risks
|
||||
- Use requirements from documentation (Requirements-Based) OR discoveries (Exploratory)
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Assess and Classify Risks
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Identify Genuine Risks**
|
||||
|
||||
Filter requirements to isolate actual risks (not just features):
|
||||
- Unresolved technical gaps
|
||||
- Security vulnerabilities
|
||||
- Performance bottlenecks
|
||||
- Data loss or corruption potential
|
||||
- Business impact failures
|
||||
- Operational deployment issues
|
||||
|
||||
2. **Classify Risks by Category**
|
||||
|
||||
Use these standard risk categories:
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
- Architecture flaws
|
||||
- Integration failures
|
||||
- Scalability issues
|
||||
- Technical debt
|
||||
|
||||
**SEC** (Security):
|
||||
- Missing access controls
|
||||
- Authentication bypass
|
||||
- Data exposure
|
||||
- Injection vulnerabilities
|
||||
|
||||
**PERF** (Performance):
|
||||
- SLA violations
|
||||
- Response time degradation
|
||||
- Resource exhaustion
|
||||
- Scalability limits
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
- Data loss
|
||||
- Data corruption
|
||||
- Inconsistent state
|
||||
- Migration failures
|
||||
|
||||
**BUS** (Business Impact):
|
||||
- User experience degradation
|
||||
- Business logic errors
|
||||
- Revenue impact
|
||||
- Compliance violations
|
||||
|
||||
**OPS** (Operations):
|
||||
- Deployment failures
|
||||
- Configuration errors
|
||||
- Monitoring gaps
|
||||
- Rollback issues
|
||||
|
||||
3. **Score Risk Probability**
|
||||
|
||||
Rate likelihood (1-3):
|
||||
- **1 (Unlikely)**: <10% chance, edge case
|
||||
- **2 (Possible)**: 10-50% chance, known scenario
|
||||
- **3 (Likely)**: >50% chance, common occurrence
|
||||
|
||||
4. **Score Risk Impact**
|
||||
|
||||
Rate severity (1-3):
|
||||
- **1 (Minor)**: Cosmetic, workaround exists, limited users
|
||||
- **2 (Degraded)**: Feature impaired, workaround difficult, affects many users
|
||||
- **3 (Critical)**: System failure, data loss, no workaround, blocks usage
|
||||
|
||||
5. **Calculate Risk Score**
|
||||
|
||||
```
|
||||
Risk Score = Probability × Impact
|
||||
|
||||
Scores:
|
||||
1-2: Low risk (monitor)
|
||||
3-4: Medium risk (plan mitigation)
|
||||
6-9: High risk (immediate mitigation required)
|
||||
```
|
||||
|
||||
6. **Highlight High-Priority Risks**
|
||||
|
||||
Flag all risks with score ≥6 for immediate attention.
|
||||
|
||||
7. **Request Clarification**
|
||||
|
||||
If evidence is missing or assumptions required:
|
||||
- Document assumptions clearly
|
||||
- Request user clarification
|
||||
- Do NOT speculate on business impact
|
||||
|
||||
8. **Plan Mitigations**
|
||||
|
||||
For each high-priority risk:
|
||||
- Define mitigation strategy
|
||||
- Assign owner (dev, QA, ops)
|
||||
- Set timeline
|
||||
- Update residual risk expectation
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Design Test Coverage
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Break Down Acceptance Criteria**
|
||||
|
||||
Convert each acceptance criterion into atomic test scenarios:
|
||||
- One scenario per testable behavior
|
||||
- Scenarios are independent
|
||||
- Scenarios are repeatable
|
||||
- Scenarios tie back to risk mitigations
|
||||
|
||||
2. **Select Appropriate Test Levels**
|
||||
|
||||
**Knowledge Base Reference**: `test-levels-framework.md`
|
||||
|
||||
Map requirements to optimal test levels (avoid duplication):
|
||||
|
||||
**E2E (End-to-End)**:
|
||||
- Critical user journeys
|
||||
- Multi-system integration
|
||||
- Production-like environment
|
||||
- Highest confidence, slowest execution
|
||||
|
||||
**API (Integration)**:
|
||||
- Service contracts
|
||||
- Business logic validation
|
||||
- Fast feedback
|
||||
- Good for complex scenarios
|
||||
|
||||
**Component**:
|
||||
- UI component behavior
|
||||
- Interaction testing
|
||||
- Visual regression
|
||||
- Fast, isolated
|
||||
|
||||
**Unit**:
|
||||
- Business logic
|
||||
- Edge cases
|
||||
- Error handling
|
||||
- Fastest, most granular
|
||||
|
||||
**Avoid duplicate coverage**: Don't test same behavior at multiple levels unless necessary.
|
||||
|
||||
3. **Assign Priority Levels**
|
||||
|
||||
**Knowledge Base Reference**: `test-priorities-matrix.md`
|
||||
|
||||
**P0 (Critical)**:
|
||||
- Blocks core user journey
|
||||
- High-risk areas (score ≥6)
|
||||
- Revenue-impacting
|
||||
- Security-critical
|
||||
- **Run on every commit**
|
||||
|
||||
**P1 (High)**:
|
||||
- Important user features
|
||||
- Medium-risk areas (score 3-4)
|
||||
- Common workflows
|
||||
- **Run on PR to main**
|
||||
|
||||
**P2 (Medium)**:
|
||||
- Secondary features
|
||||
- Low-risk areas (score 1-2)
|
||||
- Edge cases
|
||||
- **Run nightly or weekly**
|
||||
|
||||
**P3 (Low)**:
|
||||
- Nice-to-have
|
||||
- Exploratory
|
||||
- Performance benchmarks
|
||||
- **Run on-demand**
|
||||
|
||||
4. **Outline Data and Tooling Prerequisites**
|
||||
|
||||
For each test scenario, identify:
|
||||
- Test data requirements (factories, fixtures)
|
||||
- External services (mocks, stubs)
|
||||
- Environment setup
|
||||
- Tools and dependencies
|
||||
|
||||
5. **Define Execution Order**
|
||||
|
||||
Recommend test execution sequence:
|
||||
1. **Smoke tests** (P0 subset, <5 min)
|
||||
2. **P0 tests** (critical paths, <10 min)
|
||||
3. **P1 tests** (important features, <30 min)
|
||||
4. **P2/P3 tests** (full regression, <60 min)
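
If the suite uses tag-based selection (assuming tests are annotated with `@smoke`/`@p0`/`@p1` in their titles — a convention, not something this workflow mandates), one way to express this order is separate Playwright projects filtered by tag. A minimal sketch:

```typescript
// playwright.config.ts (sketch) - assumes tests carry @smoke/@p0/@p1 tags in their titles
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'smoke', grep: /@smoke/ },
    { name: 'p0', grep: /@p0/ },
    { name: 'p1', grep: /@p1/ },
    { name: 'regression', grepInvert: /@smoke|@p0|@p1/ },
  ],
});
```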
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Generate Deliverables
|
||||
|
||||
### Actions
|
||||
|
||||
1. **Create Risk Assessment Matrix**
|
||||
|
||||
Use template structure:
|
||||
|
||||
```markdown
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation |
|
||||
| ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
|
||||
| R-001 | SEC | Auth bypass | 2 | 3 | 6 | Add authz check |
|
||||
```
|
||||
|
||||
2. **Create Coverage Matrix**
|
||||
|
||||
```markdown
|
||||
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
|
||||
| ----------- | ---------- | -------- | --------- | ---------- | ----- |
|
||||
| Login flow | E2E | P0 | R-001 | 3 | QA |
|
||||
```
|
||||
|
||||
3. **Document Execution Order**
|
||||
|
||||
```markdown
|
||||
### Smoke Tests (<5 min)
|
||||
|
||||
- Login successful
|
||||
- Dashboard loads
|
||||
|
||||
### P0 Tests (<10 min)
|
||||
|
||||
- [Full P0 list]
|
||||
|
||||
### P1 Tests (<30 min)
|
||||
|
||||
- [Full P1 list]
|
||||
```
|
||||
|
||||
4. **Include Resource Estimates**
|
||||
|
||||
```markdown
|
||||
### Test Effort Estimates
|
||||
|
||||
- P0 scenarios: 15 tests × 2 hours = 30 hours
|
||||
- P1 scenarios: 25 tests × 1 hour = 25 hours
|
||||
- P2 scenarios: 40 tests × 0.5 hour = 20 hours
|
||||
- **Total:** 75 hours (~10 days)
|
||||
```
|
||||
|
||||
5. **Add Gate Criteria**
|
||||
|
||||
```markdown
|
||||
### Quality Gate Criteria
|
||||
|
||||
- All P0 tests pass (100%)
|
||||
- P1 tests pass rate ≥95%
|
||||
- No high-risk (score ≥6) items unmitigated
|
||||
- Test coverage ≥80% for critical paths
|
||||
```
|
||||
|
||||
6. **Write to Output File**
|
||||
|
||||
Save to `{output_folder}/test-design-epic-{epic_num}.md` using template structure.
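
The effort arithmetic shown in the estimate above is simple enough to automate. A minimal sketch, where the per-priority hour rates mirror the example rather than being a fixed rule:

```typescript
// Illustrative effort calculation using the per-priority rates from the example above.
const HOURS_PER_TEST = { P0: 2, P1: 1, P2: 0.5, P3: 0.25 } as const;

function estimateEffort(counts: Record<keyof typeof HOURS_PER_TEST, number>) {
  const hours = (Object.keys(counts) as (keyof typeof HOURS_PER_TEST)[])
    .reduce((sum, p) => sum + counts[p] * HOURS_PER_TEST[p], 0);
  return { hours, days: Math.ceil(hours / 8) };
}

// estimateEffort({ P0: 15, P1: 25, P2: 40, P3: 0 }) -> { hours: 75, days: 10 }
```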
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
### Risk Category Definitions
|
||||
|
||||
**TECH** (Technical/Architecture):
|
||||
|
||||
- Architecture flaws or technical debt
|
||||
- Integration complexity
|
||||
- Scalability concerns
|
||||
|
||||
**SEC** (Security):
|
||||
|
||||
- Missing security controls
|
||||
- Authentication/authorization gaps
|
||||
- Data exposure risks
|
||||
|
||||
**PERF** (Performance):
|
||||
|
||||
- SLA risk or performance degradation
|
||||
- Resource constraints
|
||||
- Scalability bottlenecks
|
||||
|
||||
**DATA** (Data Integrity):
|
||||
|
||||
- Data loss or corruption potential
|
||||
- State consistency issues
|
||||
- Migration risks
|
||||
|
||||
**BUS** (Business Impact):
|
||||
|
||||
- User experience harm
|
||||
- Business logic errors
|
||||
- Revenue or compliance impact
|
||||
|
||||
**OPS** (Operations):
|
||||
|
||||
- Deployment or runtime failures
|
||||
- Configuration issues
|
||||
- Monitoring/observability gaps
|
||||
|
||||
### Risk Scoring Methodology
|
||||
|
||||
**Probability × Impact = Risk Score**
|
||||
|
||||
Examples:
|
||||
|
||||
- High likelihood (3) × Critical impact (3) = **Score 9** (highest priority)
|
||||
- Possible (2) × Critical (3) = **Score 6** (high priority threshold)
|
||||
- Unlikely (1) × Minor (1) = **Score 1** (low priority)
|
||||
|
||||
**Threshold**: Scores ≥6 require immediate mitigation.
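
A compact sketch of this scoring and classification (the thresholds follow the scale above; the function name is illustrative):

```typescript
// Illustrative risk scoring: probability and impact are each rated 1-3.
type Rating = 1 | 2 | 3;

function classifyRisk(probability: Rating, impact: Rating) {
  const score = probability * impact;
  const level =
    score >= 6 ? 'High (immediate mitigation required)'
    : score >= 3 ? 'Medium (plan mitigation)'
    : 'Low (monitor)';
  return { score, level };
}

// classifyRisk(2, 3) -> { score: 6, level: 'High (immediate mitigation required)' }
```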
|
||||
|
||||
### Test Level Selection Strategy
|
||||
|
||||
**Avoid duplication:**
|
||||
|
||||
- Don't test same behavior at E2E and API level
|
||||
- Use E2E for critical paths only
|
||||
- Use API tests for complex business logic
|
||||
- Use unit tests for edge cases
|
||||
|
||||
**Tradeoffs:**
|
||||
|
||||
- E2E: High confidence, slow execution, brittle
|
||||
- API: Good balance, fast, stable
|
||||
- Unit: Fastest feedback, narrow scope
|
||||
|
||||
### Priority Assignment Guidelines
|
||||
|
||||
**P0 criteria** (all must be true):
|
||||
|
||||
- Blocks core functionality
|
||||
- High-risk (score ≥6)
|
||||
- No workaround exists
|
||||
- Affects majority of users
|
||||
|
||||
**P1 criteria**:
|
||||
|
||||
- Important feature
|
||||
- Medium risk (score 3-4)
|
||||
- Workaround exists but difficult
|
||||
|
||||
**P2/P3**: Everything else, prioritized by value
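
One way to make these criteria mechanical is a small helper that maps a scenario's attributes to a priority. A hedged sketch — the field names are illustrative assumptions, not part of the workflow:

```typescript
// Illustrative priority assignment mirroring the criteria above.
interface Scenario {
  blocksCoreJourney: boolean;
  riskScore: number; // probability x impact
  hasWorkaround: boolean;
  affectsMostUsers: boolean;
}

function assignPriority(s: Scenario): 'P0' | 'P1' | 'P2/P3' {
  if (s.blocksCoreJourney && s.riskScore >= 6 && !s.hasWorkaround && s.affectsMostUsers) {
    return 'P0';
  }
  if (s.riskScore >= 3) return 'P1'; // important but not blocking
  return 'P2/P3';                    // everything else, prioritized by value
}
```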
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
**Core Fragments (Auto-loaded in Step 1):**
|
||||
|
||||
- `risk-governance.md` - Risk classification (6 categories), automated scoring, gate decision engine, coverage traceability, owner tracking (625 lines, 4 examples)
|
||||
- `probability-impact.md` - Probability × impact matrix, automated classification thresholds, dynamic re-assessment, gate integration (604 lines, 4 examples)
|
||||
- `test-levels-framework.md` - E2E vs API vs Component vs Unit decision framework with characteristics matrix (467 lines, 4 examples)
|
||||
- `test-priorities-matrix.md` - P0-P3 automated priority calculation, risk-based mapping, tagging strategy, time budgets (389 lines, 2 examples)
|
||||
|
||||
**Reference for Test Planning:**
|
||||
|
||||
- `selective-testing.md` - Execution strategy: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
|
||||
- `fixture-architecture.md` - Data setup patterns: pure function → fixture → mergeTests, auto-cleanup (406 lines, 5 examples)
|
||||
|
||||
**Manual Reference (Optional):**
|
||||
|
||||
- Use `tea-index.csv` to find additional specialized fragments as needed
|
||||
|
||||
### Evidence-Based Assessment
|
||||
|
||||
**Critical principle:** Base risk assessment on evidence, not speculation.
|
||||
|
||||
**Evidence sources:**
|
||||
|
||||
- PRD and user research
|
||||
- Architecture documentation
|
||||
- Historical bug data
|
||||
- User feedback
|
||||
- Security audit results
|
||||
|
||||
**Avoid:**
|
||||
|
||||
- Guessing business impact
|
||||
- Assuming user behavior
|
||||
- Inventing requirements
|
||||
|
||||
**When uncertain:** Document assumptions and request clarification from user.
|
||||
|
||||
---
|
||||
|
||||
## Output Summary
|
||||
|
||||
After completing this workflow, provide a summary:
|
||||
|
||||
```markdown
|
||||
## Test Design Complete
|
||||
|
||||
**Epic**: {epic_num}
|
||||
**Scope**: {design_level}
|
||||
|
||||
**Risk Assessment**:
|
||||
|
||||
- Total risks identified: {count}
|
||||
- High-priority risks (≥6): {high_count}
|
||||
- Categories: {categories}
|
||||
|
||||
**Coverage Plan**:
|
||||
|
||||
- P0 scenarios: {p0_count} ({p0_hours} hours)
|
||||
- P1 scenarios: {p1_count} ({p1_hours} hours)
|
||||
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
|
||||
- **Total effort**: {total_hours} hours (~{total_days} days)
|
||||
|
||||
**Test Levels**:
|
||||
|
||||
- E2E: {e2e_count}
|
||||
- API: {api_count}
|
||||
- Component: {component_count}
|
||||
- Unit: {unit_count}
|
||||
|
||||
**Quality Gate Criteria**:
|
||||
|
||||
- P0 pass rate: 100%
|
||||
- P1 pass rate: ≥95%
|
||||
- High-risk mitigations: 100%
|
||||
- Coverage: ≥80%
|
||||
|
||||
**Output File**: {output_file}
|
||||
|
||||
**Next Steps**:
|
||||
|
||||
1. Review risk assessment with team
|
||||
2. Prioritize mitigation for high-risk items (score ≥6)
|
||||
3. Run `atdd` workflow to generate failing tests for P0 scenarios
|
||||
4. Allocate resources per effort estimates
|
||||
5. Set up test data factories and fixtures
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
After completing all steps, verify:
|
||||
|
||||
- [ ] Risk assessment complete with all categories
|
||||
- [ ] All risks scored (probability × impact)
|
||||
- [ ] High-priority risks (≥6) flagged
|
||||
- [ ] Coverage matrix maps requirements to test levels
|
||||
- [ ] Priority levels assigned (P0-P3)
|
||||
- [ ] Execution order defined
|
||||
- [ ] Resource estimates provided
|
||||
- [ ] Quality gate criteria defined
|
||||
- [ ] Output file created and formatted correctly
|
||||
|
||||
Refer to `checklist.md` for comprehensive validation criteria.
|
||||
285
bmad/bmm/workflows/testarch/test-design/test-design-template.md
Normal file
285
bmad/bmm/workflows/testarch/test-design/test-design-template.md
Normal file
@@ -0,0 +1,285 @@
|
||||
# Test Design: Epic {epic_num} - {epic_title}
|
||||
|
||||
**Date:** {date}
|
||||
**Author:** {user_name}
|
||||
**Status:** Draft / Approved
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Scope:** {design_level} test design for Epic {epic_num}
|
||||
|
||||
**Risk Summary:**
|
||||
|
||||
- Total risks identified: {total_risks}
|
||||
- High-priority risks (≥6): {high_priority_count}
|
||||
- Critical categories: {top_categories}
|
||||
|
||||
**Coverage Summary:**
|
||||
|
||||
- P0 scenarios: {p0_count} ({p0_hours} hours)
|
||||
- P1 scenarios: {p1_count} ({p1_hours} hours)
|
||||
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
|
||||
- **Total effort**: {total_hours} hours (~{total_days} days)
|
||||
|
||||
---
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
### High-Priority Risks (Score ≥6)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner | Timeline |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- | -------- |
|
||||
| R-001 | SEC | {description} | 2 | 3 | 6 | {mitigation} | {owner} | {date} |
|
||||
| R-002 | PERF | {description} | 3 | 2 | 6 | {mitigation} | {owner} | {date} |
|
||||
|
||||
### Medium-Priority Risks (Score 3-4)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- |
|
||||
| R-003 | TECH | {description} | 2 | 2 | 4 | {mitigation} | {owner} |
|
||||
| R-004 | DATA | {description} | 1 | 3 | 3 | {mitigation} | {owner} |
|
||||
|
||||
### Low-Priority Risks (Score 1-2)
|
||||
|
||||
| Risk ID | Category | Description | Probability | Impact | Score | Action |
|
||||
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------- |
|
||||
| R-005 | OPS | {description} | 1 | 2 | 2 | Monitor |
|
||||
| R-006 | BUS | {description} | 1 | 1 | 1 | Monitor |
|
||||
|
||||
### Risk Category Legend
|
||||
|
||||
- **TECH**: Technical/Architecture (flaws, integration, scalability)
|
||||
- **SEC**: Security (access controls, auth, data exposure)
|
||||
- **PERF**: Performance (SLA violations, degradation, resource limits)
|
||||
- **DATA**: Data Integrity (loss, corruption, inconsistency)
|
||||
- **BUS**: Business Impact (UX harm, logic errors, revenue)
|
||||
- **OPS**: Operations (deployment, config, monitoring)
|
||||
|
||||
---
|
||||
|
||||
## Test Coverage Plan
|
||||
|
||||
### P0 (Critical) - Run on every commit
|
||||
|
||||
**Criteria**: Blocks core journey + High risk (≥6) + No workaround
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | E2E | R-001 | 3 | QA | {notes} |
|
||||
| {requirement} | API | R-002 | 5 | QA | {notes} |
|
||||
|
||||
**Total P0**: {p0_count} tests, {p0_hours} hours
|
||||
|
||||
### P1 (High) - Run on PR to main
|
||||
|
||||
**Criteria**: Important features + Medium risk (3-4) + Common workflows
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | API | R-003 | 4 | QA | {notes} |
|
||||
| {requirement} | Component | - | 6 | DEV | {notes} |
|
||||
|
||||
**Total P1**: {p1_count} tests, {p1_hours} hours
|
||||
|
||||
### P2 (Medium) - Run nightly/weekly
|
||||
|
||||
**Criteria**: Secondary features + Low risk (1-2) + Edge cases
|
||||
|
||||
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
|
||||
| {requirement} | API | R-004 | 8 | QA | {notes} |
|
||||
| {requirement} | Unit | - | 15 | DEV | {notes} |
|
||||
|
||||
**Total P2**: {p2_count} tests, {p2_hours} hours
|
||||
|
||||
### P3 (Low) - Run on-demand
|
||||
|
||||
**Criteria**: Nice-to-have + Exploratory + Performance benchmarks
|
||||
|
||||
| Requirement | Test Level | Test Count | Owner | Notes |
|
||||
| ------------- | ---------- | ---------- | ----- | ------- |
|
||||
| {requirement} | E2E | 2 | QA | {notes} |
|
||||
| {requirement} | Unit | 8 | DEV | {notes} |
|
||||
|
||||
**Total P3**: {p3_count} tests, {p3_hours} hours
|
||||
|
||||
---
|
||||
|
||||
## Execution Order
|
||||
|
||||
### Smoke Tests (<5 min)
|
||||
|
||||
**Purpose**: Fast feedback, catch build-breaking issues
|
||||
|
||||
- [ ] {scenario} (30s)
|
||||
- [ ] {scenario} (45s)
|
||||
- [ ] {scenario} (1min)
|
||||
|
||||
**Total**: {smoke_count} scenarios
|
||||
|
||||
### P0 Tests (<10 min)
|
||||
|
||||
**Purpose**: Critical path validation
|
||||
|
||||
- [ ] {scenario} (E2E)
|
||||
- [ ] {scenario} (API)
|
||||
- [ ] {scenario} (API)
|
||||
|
||||
**Total**: {p0_count} scenarios
|
||||
|
||||
### P1 Tests (<30 min)
|
||||
|
||||
**Purpose**: Important feature coverage
|
||||
|
||||
- [ ] {scenario} (API)
|
||||
- [ ] {scenario} (Component)
|
||||
|
||||
**Total**: {p1_count} scenarios
|
||||
|
||||
### P2/P3 Tests (<60 min)
|
||||
|
||||
**Purpose**: Full regression coverage
|
||||
|
||||
- [ ] {scenario} (Unit)
|
||||
- [ ] {scenario} (API)
|
||||
|
||||
**Total**: {p2p3_count} scenarios
|
||||
|
||||
---
|
||||
|
||||
## Resource Estimates
|
||||
|
||||
### Test Development Effort
|
||||
|
||||
| Priority | Count | Hours/Test | Total Hours | Notes |
|
||||
| --------- | ----------------- | ---------- | ----------------- | ----------------------- |
|
||||
| P0 | {p0_count} | 2.0 | {p0_hours} | Complex setup, security |
|
||||
| P1 | {p1_count} | 1.0 | {p1_hours} | Standard coverage |
|
||||
| P2 | {p2_count} | 0.5 | {p2_hours} | Simple scenarios |
|
||||
| P3 | {p3_count} | 0.25 | {p3_hours} | Exploratory |
|
||||
| **Total** | **{total_count}** | **-** | **{total_hours}** | **~{total_days} days** |
|
||||
|
||||
### Prerequisites
|
||||
|
||||
**Test Data:**
|
||||
|
||||
- {factory_name} factory (faker-based, auto-cleanup)
|
||||
- {fixture_name} fixture (setup/teardown)
|
||||
|
||||
**Tooling:**
|
||||
|
||||
- {tool} for {purpose}
|
||||
- {tool} for {purpose}
|
||||
|
||||
**Environment:**
|
||||
|
||||
- {env_requirement}
|
||||
- {env_requirement}
|
||||
|
||||
---
|
||||
|
||||
## Quality Gate Criteria
|
||||
|
||||
### Pass/Fail Thresholds
|
||||
|
||||
- **P0 pass rate**: 100% (no exceptions)
|
||||
- **P1 pass rate**: ≥95% (waivers required for failures)
|
||||
- **P2/P3 pass rate**: ≥90% (informational)
|
||||
- **High-risk mitigations**: 100% complete or approved waivers
|
||||
|
||||
### Coverage Targets
|
||||
|
||||
- **Critical paths**: ≥80%
|
||||
- **Security scenarios**: 100%
|
||||
- **Business logic**: ≥70%
|
||||
- **Edge cases**: ≥50%
|
||||
|
||||
### Non-Negotiable Requirements
|
||||
|
||||
- [ ] All P0 tests pass
|
||||
- [ ] No high-risk (≥6) items unmitigated
|
||||
- [ ] Security tests (SEC category) pass 100%
|
||||
- [ ] Performance targets met (PERF category)
|
||||
|
||||
---
|
||||
|
||||
## Mitigation Plans
|
||||
|
||||
### R-001: {Risk Description} (Score: 6)
|
||||
|
||||
**Mitigation Strategy:** {detailed_mitigation}
|
||||
**Owner:** {owner}
|
||||
**Timeline:** {date}
|
||||
**Status:** Planned / In Progress / Complete
|
||||
**Verification:** {how_to_verify}
|
||||
|
||||
### R-002: {Risk Description} (Score: 6)
|
||||
|
||||
**Mitigation Strategy:** {detailed_mitigation}
|
||||
**Owner:** {owner}
|
||||
**Timeline:** {date}
|
||||
**Status:** Planned / In Progress / Complete
|
||||
**Verification:** {how_to_verify}
|
||||
|
||||
---
|
||||
|
||||
## Assumptions and Dependencies
|
||||
|
||||
### Assumptions
|
||||
|
||||
1. {assumption}
|
||||
2. {assumption}
|
||||
3. {assumption}
|
||||
|
||||
### Dependencies
|
||||
|
||||
1. {dependency} - Required by {date}
|
||||
2. {dependency} - Required by {date}
|
||||
|
||||
### Risks to Plan
|
||||
|
||||
- **Risk**: {risk_to_plan}
|
||||
- **Impact**: {impact}
|
||||
- **Contingency**: {contingency}
|
||||
|
||||
---
|
||||
|
||||
## Approval
|
||||
|
||||
**Test Design Approved By:**
|
||||
|
||||
- [ ] Product Manager: **\*\***\_\_\_**\*\*** Date: **\*\***\_\_\_**\*\***
|
||||
- [ ] Tech Lead: **\*\***\_\_\_**\*\*** Date: **\*\***\_\_\_**\*\***
|
||||
- [ ] QA Lead: **\*\***\_\_\_**\*\*** Date: **\*\***\_\_\_**\*\***
|
||||
|
||||
**Comments:**
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
## Appendix
|
||||
|
||||
### Knowledge Base References
|
||||
|
||||
- `risk-governance.md` - Risk classification framework
|
||||
- `probability-impact.md` - Risk scoring methodology
|
||||
- `test-levels-framework.md` - Test level selection
|
||||
- `test-priorities-matrix.md` - P0-P3 prioritization
|
||||
|
||||
### Related Documents
|
||||
|
||||
- PRD: {prd_link}
|
||||
- Epic: {epic_link}
|
||||
- Architecture: {arch_link}
|
||||
- Tech Spec: {tech_spec_link}
|
||||
|
||||
---
|
||||
|
||||
**Generated by**: BMad TEA Agent - Test Architect Module
|
||||
**Workflow**: `bmad/bmm/testarch/test-design`
|
||||
**Version**: 4.0 (BMad v6)
|
||||
52
bmad/bmm/workflows/testarch/test-design/workflow.yaml
Normal file
52
bmad/bmm/workflows/testarch/test-design/workflow.yaml
Normal file
@@ -0,0 +1,52 @@
|
||||
# Test Architect workflow: test-design
|
||||
name: testarch-test-design
|
||||
description: "Plan risk mitigation and test coverage strategy before development with risk assessment and prioritization"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
document_output_language: "{config_source}:document_output_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/test-design"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/test-design-template.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
design_level: "full" # full, targeted, minimal - scope of design effort
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/test-design-epic-{epic_num}.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read PRD, epics, stories, architecture docs
|
||||
- write_file # Create test design document
|
||||
- list_files # Find related documentation
|
||||
- search_repo # Search for existing tests and patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- prd: "Product Requirements Document for context"
|
||||
- epics: "Epic documentation (epics.md or specific epic)"
|
||||
- story: "Story markdown with acceptance criteria"
|
||||
- architecture: "Architecture documents (architecture.md, tech-spec)"
|
||||
- existing_tests: "Current test coverage for gap analysis"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- planning
|
||||
- test-architect
|
||||
- risk-assessment
|
||||
- coverage
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true
|
||||
775
bmad/bmm/workflows/testarch/test-review/README.md
Normal file
775
bmad/bmm/workflows/testarch/test-review/README.md
Normal file
@@ -0,0 +1,775 @@
|
||||
# Test Quality Review Workflow
|
||||
|
||||
The Test Quality Review workflow performs comprehensive quality validation of test code using TEA's knowledge base of best practices. It detects flaky patterns, validates structure, and provides actionable feedback to improve test maintainability and reliability.
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow reviews test quality against proven patterns from TEA's knowledge base including fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. It generates a quality score (0-100) with detailed feedback on violations and recommendations.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Knowledge-Based Review**: Applies patterns from 19+ knowledge fragments in tea-index.csv
|
||||
- **Quality Scoring**: 0-100 score with letter grade (A+ to F) based on violations
|
||||
- **Multi-Scope Review**: Single file, directory, or entire test suite
|
||||
- **Pattern Detection**: Identifies hard waits, race conditions, shared state, conditionals
|
||||
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions, test length
|
||||
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
|
||||
- **Code Examples**: Every issue includes recommended fix with code snippets
|
||||
- **Integration**: Works with story files, test-design, acceptance criteria context
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
bmad tea *test-review
|
||||
```
|
||||
|
||||
The TEA agent runs this workflow when:
|
||||
|
||||
- After `*atdd` workflow → validate generated acceptance tests
|
||||
- After `*automate` workflow → ensure regression suite quality
|
||||
- After developer writes tests → provide quality feedback
|
||||
- Before `*gate` workflow → confirm test quality before release
|
||||
- User explicitly requests review: `bmad tea *test-review`
|
||||
- Periodic quality audits of existing test suite
|
||||
|
||||
**Typical workflow sequence:**
|
||||
|
||||
1. `*atdd` → Generate failing acceptance tests
|
||||
2. **`*test-review`** → Validate test quality ⬅️ YOU ARE HERE (option 1)
|
||||
3. `*dev story` → Implement feature with tests passing
|
||||
4. **`*test-review`** → Review implementation tests ⬅️ YOU ARE HERE (option 2)
|
||||
5. `*automate` → Expand regression suite
|
||||
6. **`*test-review`** → Validate new regression tests ⬅️ YOU ARE HERE (option 3)
|
||||
7. `*gate` → Final quality gate decision
|
||||
|
||||
---
|
||||
|
||||
## Inputs
|
||||
|
||||
### Required Context Files
|
||||
|
||||
- **Test File(s)**: One or more test files to review (auto-discovered or explicitly provided)
|
||||
- **Test Framework Config**: playwright.config.ts, jest.config.js, etc. (for context)
|
||||
|
||||
### Recommended Context Files
|
||||
|
||||
- **Story File**: Acceptance criteria for context (e.g., `story-1.3.md`)
|
||||
- **Test Design**: Priority context (P0/P1/P2/P3) from test-design.md
|
||||
- **Knowledge Base**: tea-index.csv with best practice fragments (required for thorough review)
|
||||
|
||||
### Workflow Variables
|
||||
|
||||
Key variables that control review behavior (configured in `workflow.yaml`):
|
||||
|
||||
- **review_scope**: `single` | `directory` | `suite` (default: `single`)
|
||||
- `single`: Review one test file
|
||||
- `directory`: Review all tests in a directory
|
||||
- `suite`: Review entire test suite
|
||||
|
||||
- **quality_score_enabled**: Enable 0-100 quality scoring (default: `true`)
|
||||
- **append_to_file**: Add inline comments to test files (default: `false`)
|
||||
- **check_against_knowledge**: Use tea-index.csv fragments (default: `true`)
|
||||
- **strict_mode**: Fail on any violation vs advisory only (default: `false`)
|
||||
|
||||
**Quality Criteria Flags** (all default to `true`):
|
||||
|
||||
- `check_given_when_then`: BDD format validation
|
||||
- `check_test_ids`: Test ID conventions
|
||||
- `check_priority_markers`: P0/P1/P2/P3 classification
|
||||
- `check_hard_waits`: Detect sleep(), wait(X)
|
||||
- `check_determinism`: No conditionals/try-catch abuse
|
||||
- `check_isolation`: Tests clean up, no shared state
|
||||
- `check_fixture_patterns`: Pure function → Fixture → mergeTests
|
||||
- `check_data_factories`: Factory usage vs hardcoded data
|
||||
- `check_network_first`: Route intercept before navigate
|
||||
- `check_assertions`: Explicit assertions present
|
||||
- `check_test_length`: Warn if >300 lines
|
||||
- `check_test_duration`: Warn if >1.5 min
|
||||
- `check_flakiness_patterns`: Common flaky patterns
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### Primary Deliverable
|
||||
|
||||
**Test Quality Review Report** (`test-review-{filename}.md`):
|
||||
|
||||
- **Executive Summary**: Overall assessment, key strengths/weaknesses, recommendation
|
||||
- **Quality Score**: 0-100 score with letter grade (A+ to F)
|
||||
- **Quality Criteria Assessment**: Table with all criteria evaluated (PASS/WARN/FAIL)
|
||||
- **Critical Issues**: P0/P1 violations that must be fixed
|
||||
- **Recommendations**: P2/P3 violations that should be fixed
|
||||
- **Best Practices Examples**: Good patterns found in tests
|
||||
- **Knowledge Base References**: Links to detailed guidance
|
||||
|
||||
Each issue includes:
|
||||
|
||||
- Code location (file:line)
|
||||
- Explanation of problem
|
||||
- Recommended fix with code example
|
||||
- Knowledge base fragment reference
|
||||
|
||||
### Secondary Outputs
|
||||
|
||||
- **Inline Comments**: TODO comments in test files at violation locations (if enabled)
|
||||
- **Quality Badge**: Badge with score (e.g., "Test Quality: 87/100 (A)")
|
||||
- **Story Update**: Test quality section appended to story file (if enabled)
|
||||
|
||||
### Validation Safeguards
|
||||
|
||||
- ✅ All knowledge base fragments loaded successfully
|
||||
- ✅ Test files parsed and structure analyzed
|
||||
- ✅ All enabled quality criteria evaluated
|
||||
- ✅ Violations categorized by severity (P0/P1/P2/P3)
|
||||
- ✅ Quality score calculated with breakdown
|
||||
- ✅ Actionable feedback with code examples provided
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Explained
|
||||
|
||||
### 1. BDD Format (Given-When-Then)
|
||||
|
||||
**PASS**: Tests use clear Given-When-Then structure
|
||||
|
||||
```typescript
|
||||
// Given: User is logged in
|
||||
const user = await createTestUser();
|
||||
await loginPage.login(user.email, user.password);
|
||||
|
||||
// When: User navigates to dashboard
|
||||
await page.goto('/dashboard');
|
||||
|
||||
// Then: User sees welcome message
|
||||
await expect(page.locator('[data-testid="welcome"]')).toContainText(user.name);
|
||||
```
|
||||
|
||||
**FAIL**: Tests lack structure, hard to understand intent
|
||||
|
||||
```typescript
|
||||
await page.goto('/dashboard');
|
||||
await page.click('.button');
|
||||
await expect(page.locator('.text')).toBeVisible();
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, tdd-cycles.md
|
||||
|
||||
---
|
||||
|
||||
### 2. Test IDs
|
||||
|
||||
**PASS**: All tests have IDs following convention
|
||||
|
||||
```typescript
|
||||
test.describe('1.3-E2E-001: User Login Flow', () => {
|
||||
test('should log in successfully with valid credentials', async ({ page }) => {
|
||||
// Test implementation
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: No test IDs, can't trace to requirements
|
||||
|
||||
```typescript
|
||||
test.describe('Login', () => {
|
||||
test('login works', async ({ page }) => {
|
||||
// Test implementation
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: traceability.md, test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 3. Priority Markers
|
||||
|
||||
**PASS**: Tests classified as P0/P1/P2/P3
|
||||
|
||||
```typescript
|
||||
test.describe('P0: Critical User Journey - Checkout', () => {
|
||||
// Critical tests
|
||||
});
|
||||
|
||||
test.describe('P2: Edge Case - International Addresses', () => {
|
||||
// Nice-to-have tests
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: test-priorities.md, risk-governance.md
|
||||
|
||||
---
|
||||
|
||||
### 4. No Hard Waits
|
||||
|
||||
**PASS**: No sleep(), wait(), hardcoded delays
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Explicit wait for condition
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
|
||||
```
|
||||
|
||||
**FAIL**: Hard waits introduce flakiness
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Hard wait
|
||||
await page.waitForTimeout(2000);
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, network-first.md
|
||||
|
||||
---
|
||||
|
||||
### 5. Determinism
|
||||
|
||||
**PASS**: Tests work deterministically, no conditionals
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Deterministic test
|
||||
await expect(page.locator('[data-testid="status"]')).toHaveText('Active');
|
||||
```
|
||||
|
||||
**FAIL**: Conditionals make tests unpredictable
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Conditional logic
|
||||
const status = await page.locator('[data-testid="status"]').textContent();
|
||||
if (status === 'Active') {
|
||||
await page.click('[data-testid="deactivate"]');
|
||||
} else {
|
||||
await page.click('[data-testid="activate"]');
|
||||
}
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 6. Isolation
|
||||
|
||||
**PASS**: Tests clean up, no shared state
|
||||
|
||||
```typescript
|
||||
test.afterEach(async ({ page, testUser }) => {
|
||||
// Cleanup: Delete test user
|
||||
await api.deleteUser(testUser.id);
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: Shared state, tests depend on order
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Shared global variable
|
||||
let userId: string;
|
||||
|
||||
test('create user', async () => {
|
||||
userId = await createUser(); // Sets global
|
||||
});
|
||||
|
||||
test('update user', async () => {
|
||||
await updateUser(userId); // Depends on previous test
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md, data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 7. Fixture Patterns
|
||||
|
||||
**PASS**: Pure function → Fixture → mergeTests
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Pure function fixture
|
||||
const createAuthenticatedPage = async (page: Page, user: User) => {
|
||||
await loginPage.login(user.email, user.password);
|
||||
return page;
|
||||
};
|
||||
|
||||
const test = base.extend({
|
||||
authenticatedPage: async ({ page }, use) => {
|
||||
const user = createTestUser();
|
||||
const authedPage = await createAuthenticatedPage(page, user);
|
||||
await use(authedPage);
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: No fixtures, repeated setup
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Repeated setup in every test
|
||||
test('test 1', async ({ page }) => {
|
||||
await page.goto('/login');
|
||||
await page.fill('[name="email"]', 'test@example.com');
|
||||
await page.fill('[name="password"]', 'password123');
|
||||
await page.click('[type="submit"]');
|
||||
// Test logic
|
||||
});
|
||||
```
|
||||
|
||||
**Knowledge**: fixture-architecture.md
|
||||
|
||||
---
|
||||
|
||||
### 8. Data Factories
|
||||
|
||||
**PASS**: Factory functions with overrides
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Factory function
|
||||
import { createTestUser } from './factories/user-factory';
|
||||
|
||||
test('user can update profile', async ({ page }) => {
|
||||
const user = createTestUser({ role: 'admin' });
|
||||
await api.createUser(user); // API-first setup
|
||||
// Test UI interaction
|
||||
});
|
||||
```
|
||||
|
||||
**FAIL**: Hardcoded test data
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Magic strings
|
||||
await page.fill('[name="email"]', 'test@example.com');
|
||||
await page.fill('[name="phone"]', '555-1234');
|
||||
```
|
||||
|
||||
**Knowledge**: data-factories.md
|
||||
|
||||
---
|
||||
|
||||
### 9. Network-First Pattern
|
||||
|
||||
**PASS**: Route intercept before navigate
|
||||
|
||||
```typescript
|
||||
// ✅ Good: Intercept before navigation
|
||||
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers }));
|
||||
await page.goto('/users'); // Navigate after route setup
|
||||
```
|
||||
|
||||
**FAIL**: Race condition risk
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Navigate before intercept
|
||||
await page.goto('/users');
|
||||
await page.route('**/api/users', (route) => route.fulfill({ json: mockUsers })); // Too late!
|
||||
```
|
||||
|
||||
**Knowledge**: network-first.md
|
||||
|
||||
---
|
||||
|
||||
### 10. Explicit Assertions
|
||||
|
||||
**PASS**: Clear, specific assertions
|
||||
|
||||
```typescript
|
||||
await expect(page.locator('[data-testid="username"]')).toHaveText('John Doe');
|
||||
await expect(page.locator('[data-testid="status"]')).toHaveClass(/active/);
|
||||
```
|
||||
|
||||
**FAIL**: Missing or vague assertions
|
||||
|
||||
```typescript
|
||||
await page.locator('[data-testid="username"]').isVisible(); // No assertion!
|
||||
```
|
||||
|
||||
**Knowledge**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 11. Test Length
|
||||
|
||||
**PASS**: ≤300 lines per file (ideal: ≤200)
|
||||
**WARN**: 301-500 lines (consider splitting)
|
||||
**FAIL**: >500 lines (too large)
|
||||
|
||||
**Knowledge**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
### 12. Test Duration
|
||||
|
||||
**PASS**: ≤1.5 minutes per test (target: <30 seconds)
|
||||
**WARN**: 1.5-3 minutes (consider optimization)
|
||||
**FAIL**: >3 minutes (too slow)
|
||||
|
||||
**Knowledge**: test-quality.md, selective-testing.md
|
||||
|
||||
---
|
||||
|
||||
### 13. Flakiness Patterns
|
||||
|
||||
Common flaky patterns detected:
|
||||
|
||||
- Tight timeouts (e.g., `{ timeout: 1000 }`)
|
||||
- Race conditions (navigation before route interception)
|
||||
- Timing-dependent assertions
|
||||
- Retry logic hiding flakiness
|
||||
- Environment-dependent assumptions
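
For example, a tight timeout racing against a slow API response is a common source of intermittent failures (a hedged illustration; the selector and endpoint are hypothetical):

```typescript
// ❌ Flaky: 1s timeout races against a slow API response
await expect(page.locator('[data-testid="report-table"]')).toBeVisible({ timeout: 1000 });

// ✅ More robust: wait for the response that populates the table, then assert
await page.waitForResponse((res) => res.url().includes('/api/reports') && res.ok());
await expect(page.locator('[data-testid="report-table"]')).toBeVisible();
```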
|
||||
|
||||
**Knowledge**: test-quality.md, network-first.md, ci-burn-in.md
|
||||
|
||||
---
|
||||
|
||||
## Quality Scoring
|
||||
|
||||
### Score Calculation
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
|
||||
Deductions:
|
||||
- Critical Violations (P0): -10 points each
|
||||
- High Violations (P1): -5 points each
|
||||
- Medium Violations (P2): -2 points each
|
||||
- Low Violations (P3): -1 point each
|
||||
|
||||
Bonus Points (max +30):
|
||||
+ Excellent BDD structure: +5
|
||||
+ Comprehensive fixtures: +5
|
||||
+ Comprehensive data factories: +5
|
||||
+ Network-first pattern consistently used: +5
|
||||
+ Perfect isolation (all tests clean up): +5
|
||||
+ All test IDs present and correct: +5
|
||||
|
||||
Final Score: max(0, min(100, Starting Score - Violations + Bonus))
|
||||
```
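
Expressed as code, the calculation above might look like this — a sketch only; the workflow's actual implementation may differ:

```typescript
// Illustrative quality score calculation following the formula above.
const DEDUCTIONS = { P0: 10, P1: 5, P2: 2, P3: 1 } as const;

function qualityScore(
  violations: Record<keyof typeof DEDUCTIONS, number>,
  bonus: number, // 0-30, +5 per bonus criterion met
): number {
  const deducted = (Object.keys(violations) as (keyof typeof DEDUCTIONS)[])
    .reduce((sum, sev) => sum + violations[sev] * DEDUCTIONS[sev], 0);
  return Math.max(0, Math.min(100, 100 - deducted + Math.min(bonus, 30)));
}

// qualityScore({ P0: 0, P1: 2, P2: 3, P3: 0 }, 10) -> 94
```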
|
||||
|
||||
### Quality Grades
|
||||
|
||||
- **90-100** (A+): Excellent - Production-ready, best practices followed
|
||||
- **80-89** (A): Good - Minor improvements recommended
|
||||
- **70-79** (B): Acceptable - Some issues to address
|
||||
- **60-69** (C): Needs Improvement - Several issues detected
|
||||
- **<60** (F): Critical Issues - Significant problems, not production-ready
|
||||
|
||||
---
|
||||
|
||||
## Example Scenarios
|
||||
|
||||
### Scenario 1: Excellent Quality (Score: 95)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: checkout-flow.spec.ts
|
||||
|
||||
**Quality Score**: 95/100 (A+ - Excellent)
|
||||
**Recommendation**: Approve - Production Ready
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Excellent test quality with comprehensive coverage and best practices throughout.
|
||||
Tests demonstrate expert-level patterns including fixture architecture, data
|
||||
factories, network-first approach, and perfect isolation.
|
||||
|
||||
**Strengths:**
|
||||
✅ Clear Given-When-Then structure in all tests
|
||||
✅ Comprehensive fixtures for authenticated states
|
||||
✅ Data factories with faker.js for realistic test data
|
||||
✅ Network-first pattern prevents race conditions
|
||||
✅ Perfect test isolation with cleanup
|
||||
✅ All test IDs present (1.2-E2E-001 through 1.2-E2E-005)
|
||||
|
||||
**Minor Recommendations:**
|
||||
⚠️ One test slightly verbose (245 lines) - consider extracting helper function
|
||||
|
||||
**Recommendation**: Approve without changes. Use as reference for other tests.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 2: Good Quality (Score: 82)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: user-profile.spec.ts
|
||||
|
||||
**Quality Score**: 82/100 (A - Good)
|
||||
**Recommendation**: Approve with Comments
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Solid test quality with good structure and coverage. A few improvements would
|
||||
enhance maintainability and reduce flakiness risk.
|
||||
|
||||
**Strengths:**
|
||||
✅ Good BDD structure
|
||||
✅ Test IDs present
|
||||
✅ Explicit assertions
|
||||
|
||||
**Issues to Address:**
|
||||
⚠️ 2 hard waits detected (lines 34, 67) - use explicit waits instead
|
||||
⚠️ Hardcoded test data (line 23) - use factory functions
|
||||
⚠️ Missing cleanup in one test (line 89) - add afterEach hook
|
||||
|
||||
**Recommendation**: Address hard waits before merging. Other improvements
|
||||
can be addressed in follow-up PR.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 3: Needs Improvement (Score: 68)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: legacy-report.spec.ts
|
||||
|
||||
**Quality Score**: 68/100 (C - Needs Improvement)
|
||||
**Recommendation**: Request Changes
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Test has several quality issues that should be addressed before merging.
|
||||
Primarily concerns around flakiness risk and maintainability.
|
||||
|
||||
**Critical Issues:**
|
||||
❌ 5 hard waits detected (flakiness risk)
|
||||
❌ Race condition: navigation before route interception (line 45)
|
||||
❌ Shared global state between tests (line 12)
|
||||
❌ Missing test IDs (can't trace to requirements)
|
||||
|
||||
**Recommendations:**
|
||||
⚠️ Test file is 487 lines - consider splitting
|
||||
⚠️ Hardcoded data throughout - use factories
|
||||
⚠️ Missing cleanup in afterEach
|
||||
|
||||
**Recommendation**: Address all critical issues (❌) before re-review.
|
||||
Significant refactoring needed.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 4: Critical Issues (Score: 42)
|
||||
|
||||
```markdown
|
||||
# Test Quality Review: data-export.spec.ts
|
||||
|
||||
**Quality Score**: 42/100 (F - Critical Issues)
|
||||
**Recommendation**: Block - Not Production Ready
|
||||
|
||||
## Executive Summary
|
||||
|
||||
CRITICAL: Test has severe quality issues that make it unsuitable for
|
||||
production. Significant refactoring required.
|
||||
|
||||
**Critical Issues:**
|
||||
❌ 12 hard waits (page.waitForTimeout) throughout
|
||||
❌ No test IDs or structure
|
||||
❌ Try/catch blocks swallowing errors (lines 23, 45, 67, 89)
|
||||
❌ No cleanup - tests leave data in database
|
||||
❌ Conditional logic (if/else) throughout tests
|
||||
❌ No assertions in 3 tests (tests do nothing!)
|
||||
❌ 687 lines - far too large
|
||||
❌ Multiple race conditions
|
||||
❌ Hardcoded credentials in plain text (SECURITY ISSUE)
|
||||
|
||||
**Recommendation**: BLOCK MERGE. Complete rewrite recommended following
|
||||
TEA knowledge base patterns. Suggest pairing session with QA engineer.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Test Review
|
||||
|
||||
1. **atdd** - Generates acceptance tests → TEA reviews for quality
|
||||
2. **dev story** - Developer implements tests → TEA provides feedback
|
||||
3. **automate** - Expands regression suite → TEA validates new tests
|
||||
|
||||
### After Test Review
|
||||
|
||||
1. **Developer** - Addresses critical issues, improves based on recommendations
|
||||
2. **gate** - Test quality feeds into release decision (high-quality tests increase confidence)
|
||||
|
||||
### Coordinates With
|
||||
|
||||
- **Story File**: Review links to acceptance criteria for context
|
||||
- **Test Design**: Review validates tests align with P0/P1/P2/P3 prioritization
|
||||
- **Knowledge Base**: All feedback references tea-index.csv fragments
|
||||
|
||||
---
|
||||
|
||||
## Review Scopes
|
||||
|
||||
### Single File Review
|
||||
|
||||
```bash
|
||||
# Review specific test file
|
||||
bmad tea *test-review
|
||||
# Provide test_file_path when prompted: tests/auth/login.spec.ts
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Reviewing tests just written
|
||||
- PR review of specific test file
|
||||
- Debugging flaky test
|
||||
- Learning test quality patterns
|
||||
|
||||
---
|
||||
|
||||
### Directory Review
|
||||
|
||||
```bash
|
||||
# Review all tests in directory
|
||||
bmad tea *test-review
|
||||
# Provide review_scope: directory
|
||||
# Provide test_dir: tests/auth/
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Feature branch has multiple test files
|
||||
- Reviewing entire feature test suite
|
||||
- Auditing test quality for module
|
||||
|
||||
---
|
||||
|
||||
### Suite Review
|
||||
|
||||
```bash
|
||||
# Review entire test suite
|
||||
bmad tea *test-review
|
||||
# Provide review_scope: suite
|
||||
```
|
||||
|
||||
**Use When:**
|
||||
|
||||
- Periodic quality audit (monthly/quarterly)
|
||||
- Before major release
|
||||
- Identifying patterns across codebase
|
||||
- Establishing quality baseline
|
||||
|
||||
---
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Strict Review (Fail on Violations)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
quality_score_enabled: true
|
||||
strict_mode: true # Fail if score <70
|
||||
check_against_knowledge: true
|
||||
# All check_* flags: true
|
||||
```
|
||||
|
||||
Use for: PR gates, production releases
|
||||
|
||||
---
|
||||
|
||||
### Balanced Review (Advisory)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
quality_score_enabled: true
|
||||
strict_mode: false # Advisory only
|
||||
check_against_knowledge: true
|
||||
# All check_* flags: true
|
||||
```
|
||||
|
||||
Use for: Most development workflows (default)
|
||||
|
||||
---
|
||||
|
||||
### Focused Review (Specific Criteria)
|
||||
|
||||
```yaml
|
||||
review_scope: 'single'
|
||||
check_hard_waits: true
|
||||
check_flakiness_patterns: true
|
||||
check_network_first: true
|
||||
# Other checks: false
|
||||
```
|
||||
|
||||
Use for: Debugging flaky tests, targeted improvements
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
|
||||
2. **Context Matters**: Some violations may be justified (document with comments)
|
||||
3. **Knowledge-Based**: All feedback grounded in proven patterns
|
||||
4. **Actionable**: Every issue includes recommended fix with code example
|
||||
5. **Quality Score**: Use as indicator, not absolute measure
|
||||
6. **Continuous Improvement**: Review tests periodically as patterns evolve
|
||||
7. **Learning Tool**: Use reviews to learn best practices, not just find bugs
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This workflow automatically consults:
|
||||
|
||||
- **test-quality.md** - Definition of Done (no hard waits, <300 lines, <1.5 min, self-cleaning)
|
||||
- **fixture-architecture.md** - Pure function → Fixture → mergeTests pattern
|
||||
- **network-first.md** - Route intercept before navigate (race condition prevention)
|
||||
- **data-factories.md** - Factory functions with overrides, API-first setup
|
||||
- **test-levels-framework.md** - E2E vs API vs Component vs Unit appropriateness
|
||||
- **playwright-config.md** - Environment-based configuration patterns
|
||||
- **tdd-cycles.md** - Red-Green-Refactor patterns
|
||||
- **selective-testing.md** - Duplicate coverage detection
|
||||
- **ci-burn-in.md** - Flakiness detection patterns
|
||||
- **test-priorities.md** - P0/P1/P2/P3 classification framework
|
||||
- **traceability.md** - Requirements-to-tests mapping
|
||||
|
||||
See `tea-index.csv` for complete knowledge fragment mapping.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Problem: Quality score seems too low
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Review violation breakdown - focus on critical issues first
|
||||
- Consider project context - some patterns may be justified
|
||||
- Check if criteria are appropriate for project type
|
||||
- Score is indicator, not absolute - focus on actionable feedback
|
||||
|
||||
---
|
||||
|
||||
### Problem: No test files found
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Verify test_dir path is correct
|
||||
- Check test file extensions (`*.spec.ts`, `*.test.js`, etc.)
|
||||
- Use glob pattern to discover: `tests/**/*.spec.ts`
|
||||
|
||||
---
|
||||
|
||||
### Problem: Knowledge fragments not loading
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Verify tea-index.csv exists in testarch/ directory
|
||||
- Check fragment file paths are correct in tea-index.csv
|
||||
- Ensure auto_load_knowledge: true in workflow variables
|
||||
|
||||
---
|
||||
|
||||
### Problem: Too many false positives
|
||||
|
||||
**Solution:**
|
||||
|
||||
- Add justification comments in code for legitimate violations
|
||||
- Adjust `check_*` flags to disable specific criteria
|
||||
- Use strict_mode: false for advisory-only feedback
|
||||
- Context matters - document why pattern is appropriate
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *atdd` - Generate acceptance tests (review after generation)
|
||||
- `bmad tea *automate` - Expand regression suite (review new tests)
|
||||
- `bmad tea *gate` - Quality gate decision (test quality feeds into decision)
|
||||
- `bmad dev story` - Implement story (review tests after implementation)
|
||||
470
bmad/bmm/workflows/testarch/test-review/checklist.md
Normal file
470
bmad/bmm/workflows/testarch/test-review/checklist.md
Normal file
@@ -0,0 +1,470 @@
|
||||
# Test Quality Review - Validation Checklist
|
||||
|
||||
Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Test File Discovery
|
||||
|
||||
- [ ] Test file(s) identified for review (single/directory/suite scope)
|
||||
- [ ] Test files exist and are readable
|
||||
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
|
||||
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)
|
||||
|
||||
### Knowledge Base Loading
|
||||
|
||||
- [ ] tea-index.csv loaded successfully
|
||||
- [ ] `test-quality.md` loaded (Definition of Done)
|
||||
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
|
||||
- [ ] `network-first.md` loaded (Route intercept before navigate)
|
||||
- [ ] `data-factories.md` loaded (Factory patterns)
|
||||
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
|
||||
- [ ] All other enabled fragments loaded successfully
|
||||
|
||||
### Context Gathering
|
||||
|
||||
- [ ] Story file discovered or explicitly provided (if available)
|
||||
- [ ] Test design document discovered or explicitly provided (if available)
|
||||
- [ ] Acceptance criteria extracted from story (if available)
|
||||
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] Review scope determined (single/directory/suite)
|
||||
- [ ] Test file paths collected
|
||||
- [ ] Related artifacts discovered (story, test-design)
|
||||
- [ ] Knowledge base fragments loaded successfully
|
||||
- [ ] Quality criteria flags read from workflow variables
|
||||
|
||||
### Step 2: Test File Parsing
|
||||
|
||||
**For Each Test File:**
|
||||
|
||||
- [ ] File read successfully
|
||||
- [ ] File size measured (lines, KB)
|
||||
- [ ] File structure parsed (describe blocks, it blocks)
|
||||
- [ ] Test IDs extracted (if present)
|
||||
- [ ] Priority markers extracted (if present)
|
||||
- [ ] Imports analyzed
|
||||
- [ ] Dependencies identified
|
||||
|
||||
**Test Structure Analysis:**
|
||||
|
||||
- [ ] Describe block count calculated
|
||||
- [ ] It/test block count calculated
|
||||
- [ ] BDD structure identified (Given-When-Then)
|
||||
- [ ] Fixture usage detected
|
||||
- [ ] Data factory usage detected
|
||||
- [ ] Network interception patterns identified
|
||||
- [ ] Assertions counted
|
||||
- [ ] Waits and timeouts cataloged
|
||||
- [ ] Conditionals (if/else) detected
|
||||
- [ ] Try/catch blocks detected
|
||||
- [ ] Shared state or globals detected
|
||||
|
||||
### Step 3: Quality Criteria Validation
|
||||
|
||||
**For Each Enabled Criterion:**
|
||||
|
||||
#### BDD Format (if `check_given_when_then: true`)
|
||||
|
||||
- [ ] Given-When-Then structure evaluated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with line numbers
|
||||
- [ ] Examples of good/bad patterns noted
|
||||
|
||||
#### Test IDs (if `check_test_ids: true`)
|
||||
|
||||
- [ ] Test ID presence validated
|
||||
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Missing IDs cataloged
|
||||
|
||||
#### Priority Markers (if `check_priority_markers: true`)
|
||||
|
||||
- [ ] P0/P1/P2/P3 classification validated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Missing priorities cataloged
|
||||
|
||||
#### Hard Waits (if `check_hard_waits: true`)
|
||||
|
||||
- [ ] sleep(), waitForTimeout(), hardcoded delays detected
|
||||
- [ ] Justification comments checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with line numbers and recommended fixes
|
||||
|
||||
#### Determinism (if `check_determinism: true`)
|
||||
|
||||
- [ ] Conditionals (if/else/switch) detected
|
||||
- [ ] Try/catch abuse detected
|
||||
- [ ] Random values (Math.random, Date.now) detected
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Isolation (if `check_isolation: true`)
|
||||
|
||||
- [ ] Cleanup hooks (afterEach/afterAll) validated
|
||||
- [ ] Shared state detected
|
||||
- [ ] Global variable mutations detected
|
||||
- [ ] Resource cleanup verified
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Fixture Patterns (if `check_fixture_patterns: true`)
|
||||
|
||||
- [ ] Fixtures detected (test.extend)
|
||||
- [ ] Pure functions validated
|
||||
- [ ] mergeTests usage checked
|
||||
- [ ] beforeEach complexity analyzed
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Data Factories (if `check_data_factories: true`)
|
||||
|
||||
- [ ] Factory functions detected
|
||||
- [ ] Hardcoded data (magic strings/numbers) detected
|
||||
- [ ] Faker.js or similar usage validated
|
||||
- [ ] API-first setup pattern checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Network-First (if `check_network_first: true`)
|
||||
|
||||
- [ ] page.route() before page.goto() validated
|
||||
- [ ] Race conditions detected (route after navigate)
|
||||
- [ ] waitForResponse patterns checked
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Assertions (if `check_assertions: true`)
|
||||
|
||||
- [ ] Explicit assertions counted
|
||||
- [ ] Implicit waits without assertions detected
|
||||
- [ ] Assertion specificity validated
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
#### Test Length (if `check_test_length: true`)
|
||||
|
||||
- [ ] File line count calculated
|
||||
- [ ] Threshold comparison (≤300 lines ideal)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Splitting recommendations generated (if >300 lines)
|
||||
|
||||
#### Test Duration (if `check_test_duration: true`)
|
||||
|
||||
- [ ] Test complexity analyzed (as proxy for duration if no execution data)
|
||||
- [ ] Threshold comparison (≤1.5 min target)
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Optimization recommendations generated
|
||||
|
||||
#### Flakiness Patterns (if `check_flakiness_patterns: true`)
|
||||
|
||||
- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
|
||||
- [ ] Race conditions detected
|
||||
- [ ] Timing-dependent assertions detected
|
||||
- [ ] Retry logic detected
|
||||
- [ ] Environment-dependent assumptions detected
|
||||
- [ ] Status assigned (PASS/WARN/FAIL)
|
||||
- [ ] Violations recorded with recommended fixes
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Quality Score Calculation
|
||||
|
||||
**Violation Counting:**
|
||||
|
||||
- [ ] Critical (P0) violations counted
|
||||
- [ ] High (P1) violations counted
|
||||
- [ ] Medium (P2) violations counted
|
||||
- [ ] Low (P3) violations counted
|
||||
- [ ] Violation breakdown by criterion recorded
|
||||
|
||||
**Score Calculation:**
|
||||
|
||||
- [ ] Starting score: 100
|
||||
- [ ] Critical violations deducted (-10 each)
|
||||
- [ ] High violations deducted (-5 each)
|
||||
- [ ] Medium violations deducted (-2 each)
|
||||
- [ ] Low violations deducted (-1 each)
|
||||
- [ ] Bonus points added (max +30):
|
||||
- [ ] Excellent BDD structure (+5 if applicable)
|
||||
- [ ] Comprehensive fixtures (+5 if applicable)
|
||||
- [ ] Comprehensive data factories (+5 if applicable)
|
||||
- [ ] Network-first pattern (+5 if applicable)
|
||||
- [ ] Perfect isolation (+5 if applicable)
|
||||
- [ ] All test IDs present (+5 if applicable)
|
||||
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
|
||||
|
||||
**Quality Grade:**
|
||||
|
||||
- [ ] Grade assigned based on score:
|
||||
- 90-100: A+ (Excellent)
|
||||
- 80-89: A (Good)
|
||||
- 70-79: B (Acceptable)
|
||||
- 60-69: C (Needs Improvement)
|
||||
- <60: F (Critical Issues)
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Review Report Generation
|
||||
|
||||
**Report Sections Created:**
|
||||
|
||||
- [ ] **Header Section**:
|
||||
- [ ] Test file(s) reviewed listed
|
||||
- [ ] Review date recorded
|
||||
- [ ] Review scope noted (single/directory/suite)
|
||||
- [ ] Quality score and grade displayed
|
||||
|
||||
- [ ] **Executive Summary**:
|
||||
- [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
|
||||
- [ ] Key strengths listed (3-5 bullet points)
|
||||
- [ ] Key weaknesses listed (3-5 bullet points)
|
||||
- [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
|
||||
|
||||
- [ ] **Quality Criteria Assessment**:
|
||||
- [ ] Table with all criteria evaluated
|
||||
- [ ] Status for each criterion (PASS/WARN/FAIL)
|
||||
- [ ] Violation count per criterion
|
||||
|
||||
- [ ] **Critical Issues (Must Fix)**:
|
||||
- [ ] P0/P1 violations listed
|
||||
- [ ] Code location provided for each (file:line)
|
||||
- [ ] Issue explanation clear
|
||||
- [ ] Recommended fix provided with code example
|
||||
- [ ] Knowledge base reference provided
|
||||
|
||||
- [ ] **Recommendations (Should Fix)**:
|
||||
- [ ] P2/P3 violations listed
|
||||
- [ ] Code location provided for each (file:line)
|
||||
- [ ] Issue explanation clear
|
||||
- [ ] Recommended improvement provided with code example
|
||||
- [ ] Knowledge base reference provided
|
||||
|
||||
- [ ] **Best Practices Examples** (if good patterns found):
|
||||
- [ ] Good patterns highlighted from tests
|
||||
- [ ] Knowledge base fragments referenced
|
||||
- [ ] Examples provided for others to follow
|
||||
|
||||
- [ ] **Knowledge Base References**:
|
||||
- [ ] All fragments consulted listed
|
||||
- [ ] Links to detailed guidance provided
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Optional Outputs Generation
|
||||
|
||||
**Inline Comments** (if `generate_inline_comments: true`):
|
||||
|
||||
- [ ] Inline comments generated at violation locations
|
||||
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
|
||||
- [ ] Comments added to test files (no logic changes)
|
||||
- [ ] Test files remain valid and executable
|
||||
|
||||
**Quality Badge** (if `generate_quality_badge: true`):
|
||||
|
||||
- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
|
||||
- [ ] Badge format suitable for README or documentation
|
||||
- [ ] Badge saved to output folder
|
||||
|
||||
**Story Update** (if `append_to_story: true` and story file exists):
|
||||
|
||||
- [ ] "Test Quality Review" section created
|
||||
- [ ] Quality score included
|
||||
- [ ] Critical issues summarized
|
||||
- [ ] Link to full review report provided
|
||||
- [ ] Story file updated successfully
|
||||
|
||||
---
|
||||
|
||||
### Step 7: Save and Notify
|
||||
|
||||
**Outputs Saved:**
|
||||
|
||||
- [ ] Review report saved to `{output_file}`
|
||||
- [ ] Inline comments written to test files (if enabled)
|
||||
- [ ] Quality badge saved (if enabled)
|
||||
- [ ] Story file updated (if enabled)
|
||||
- [ ] All outputs are valid and readable
|
||||
|
||||
**Summary Message Generated:**
|
||||
|
||||
- [ ] Quality score and grade included
|
||||
- [ ] Critical issue count stated
|
||||
- [ ] Recommendation provided (Approve/Request changes/Block)
|
||||
- [ ] Next steps clarified
|
||||
- [ ] Message displayed to user
|
||||
|
||||
---
|
||||
|
||||
## Output Validation
|
||||
|
||||
### Review Report Completeness
|
||||
|
||||
- [ ] All required sections present
|
||||
- [ ] No placeholder text or TODOs in report
|
||||
- [ ] All code locations are accurate (file:line)
|
||||
- [ ] All code examples are valid and demonstrate fix
|
||||
- [ ] All knowledge base references are correct
|
||||
|
||||
### Review Report Accuracy
|
||||
|
||||
- [ ] Quality score matches violation breakdown
|
||||
- [ ] Grade matches score range
|
||||
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
|
||||
- [ ] Violations correctly attributed to quality criteria
|
||||
- [ ] No false positives (violations are legitimate issues)
|
||||
- [ ] No false negatives (critical issues not missed)
|
||||
|
||||
### Review Report Clarity
|
||||
|
||||
- [ ] Executive summary is clear and actionable
|
||||
- [ ] Issue explanations are understandable
|
||||
- [ ] Recommended fixes are implementable
|
||||
- [ ] Code examples are correct and runnable
|
||||
- [ ] Recommendation (Approve/Request changes) is clear
|
||||
|
||||
---
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Knowledge-Based Validation
|
||||
|
||||
- [ ] All feedback grounded in knowledge base fragments
|
||||
- [ ] Recommendations follow proven patterns
|
||||
- [ ] No arbitrary or opinion-based feedback
|
||||
- [ ] Knowledge fragment references accurate and relevant
|
||||
|
||||
### Actionable Feedback
|
||||
|
||||
- [ ] Every issue includes recommended fix
|
||||
- [ ] Every fix includes code example
|
||||
- [ ] Code examples demonstrate correct pattern
|
||||
- [ ] Fixes reference knowledge base for more detail
|
||||
|
||||
### Severity Classification
|
||||
|
||||
- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
|
||||
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
|
||||
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
|
||||
- [ ] Low (P3) issues are minor style/preference (verbose tests)
|
||||
|
||||
### Context Awareness
|
||||
|
||||
- [ ] Review considers project context (some patterns may be justified)
|
||||
- [ ] Violations with justification comments noted as acceptable
|
||||
- [ ] Edge cases acknowledged
|
||||
- [ ] Recommendations are pragmatic, not dogmatic
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Story File Integration
|
||||
|
||||
- [ ] Story file discovered correctly (if available)
|
||||
- [ ] Acceptance criteria extracted and used for context
|
||||
- [ ] Test quality section appended to story (if enabled)
|
||||
- [ ] Link to review report added to story
|
||||
|
||||
### Test Design Integration
|
||||
|
||||
- [ ] Test design document discovered correctly (if available)
|
||||
- [ ] Priority context (P0/P1/P2/P3) extracted and used
|
||||
- [ ] Review validates tests align with prioritization
|
||||
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)
|
||||
|
||||
### Knowledge Base Integration
|
||||
|
||||
- [ ] tea-index.csv loaded successfully
|
||||
- [ ] All required fragments loaded
|
||||
- [ ] Fragments applied correctly to validation
|
||||
- [ ] Fragment references in report are accurate
|
||||
|
||||
---
|
||||
|
||||
## Edge Cases and Special Situations
|
||||
|
||||
### Empty or Minimal Tests
|
||||
|
||||
- [ ] If test file is empty, report notes "No tests found"
|
||||
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
|
||||
- [ ] Score reflects lack of content appropriately
|
||||
|
||||
### Legacy Tests
|
||||
|
||||
- [ ] Legacy tests acknowledged in context
|
||||
- [ ] Review provides practical recommendations for improvement
|
||||
- [ ] Recognizes that complete refactor may not be feasible
|
||||
- [ ] Prioritizes critical issues (flakiness) over style
|
||||
|
||||
### Test Framework Variations
|
||||
|
||||
- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
|
||||
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
|
||||
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
|
||||
- [ ] Knowledge fragments applied appropriately for framework
|
||||
|
||||
### Justified Violations
|
||||
|
||||
- [ ] Violations with justification comments in code noted as acceptable
|
||||
- [ ] Justifications evaluated for legitimacy
|
||||
- [ ] Report acknowledges justified patterns
|
||||
- [ ] Score not penalized for justified violations
|
||||
|
||||
---
|
||||
|
||||
## Final Validation
|
||||
|
||||
### Review Completeness
|
||||
|
||||
- [ ] All enabled quality criteria evaluated
|
||||
- [ ] All test files in scope reviewed
|
||||
- [ ] All violations cataloged
|
||||
- [ ] All recommendations provided
|
||||
- [ ] Review report is comprehensive
|
||||
|
||||
### Review Accuracy
|
||||
|
||||
- [ ] Quality score is accurate
|
||||
- [ ] Violations are correct (no false positives)
|
||||
- [ ] Critical issues not missed (no false negatives)
|
||||
- [ ] Code locations are correct
|
||||
- [ ] Knowledge base references are accurate
|
||||
|
||||
### Review Usefulness
|
||||
|
||||
- [ ] Feedback is actionable
|
||||
- [ ] Recommendations are implementable
|
||||
- [ ] Code examples are correct
|
||||
- [ ] Review helps developer improve tests
|
||||
- [ ] Review educates on best practices
|
||||
|
||||
### Workflow Complete
|
||||
|
||||
- [ ] All checklist items completed
|
||||
- [ ] All outputs validated and saved
|
||||
- [ ] User notified with summary
|
||||
- [ ] Review ready for developer consumption
|
||||
- [ ] Follow-up actions identified (if any)
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
Record any issues, observations, or important context during workflow execution:
|
||||
|
||||
- **Test Framework**: [Playwright, Jest, Cypress, etc.]
|
||||
- **Review Scope**: [single file, directory, full suite]
|
||||
- **Quality Score**: [0-100 score, letter grade]
|
||||
- **Critical Issues**: [Count of P0/P1 violations]
|
||||
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
|
||||
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
|
||||
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]
|
||||
608
bmad/bmm/workflows/testarch/test-review/instructions.md
Normal file
608
bmad/bmm/workflows/testarch/test-review/instructions.md
Normal file
@@ -0,0 +1,608 @@
|
||||
# Test Quality Review - Instructions v4.0
|
||||
|
||||
**Workflow:** `testarch-test-review`
|
||||
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Format:** Pure Markdown v4.0 (no XML blocks)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.
|
||||
|
||||
**Key Capabilities:**
|
||||
|
||||
- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
|
||||
- **Quality Scoring**: 0-100 score based on violations and best practices
|
||||
- **Multi-Scope**: Review single file, directory, or entire test suite
|
||||
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
|
||||
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
|
||||
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
|
||||
- **Integration**: Works with story files, test-design, acceptance criteria
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**Required:**
|
||||
|
||||
- Test file(s) to review (auto-discovered or explicitly provided)
|
||||
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)
|
||||
|
||||
**Recommended:**
|
||||
|
||||
- Story file with acceptance criteria (for context)
|
||||
- Test design document (for priority context)
|
||||
- Knowledge base fragments available in tea-index.csv
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- If test file path is invalid or file doesn't exist, halt and request correction
|
||||
- If test_dir is empty (no tests found), halt and notify user
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Load Context and Knowledge Base
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. Load relevant knowledge fragments from `{project-root}/bmad/bmm/testarch/tea-index.csv`:
|
||||
- `test-quality.md` - Definition of Done (deterministic tests, isolated with cleanup, explicit assertions, <300 lines, <1.5 min, 658 lines, 5 examples)
|
||||
- `fixture-architecture.md` - Pure function → Fixture → mergeTests composition with auto-cleanup (406 lines, 5 examples)
|
||||
- `network-first.md` - Route interception before navigation to prevent race conditions (HAR capture, deterministic waiting, 489 lines, 5 examples)
|
||||
- `data-factories.md` - Factory functions with faker: overrides, nested factories, API-first setup (498 lines, 5 examples)
|
||||
- `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness with decision matrix (467 lines, 4 examples)
|
||||
- `playwright-config.md` - Environment-based configuration with fail-fast validation (722 lines, 5 examples)
|
||||
- `component-tdd.md` - Red-Green-Refactor patterns with provider isolation, accessibility, visual regression (480 lines, 4 examples)
|
||||
- `selective-testing.md` - Duplicate coverage detection with tag-based, spec filter, diff-based selection (727 lines, 4 examples)
|
||||
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
|
||||
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, anti-patterns, 541 lines, 4 examples)
|
||||
- `timing-debugging.md` - Race condition prevention and async debugging techniques (370 lines, 3 examples)
|
||||
- `ci-burn-in.md` - Flaky test detection with 10-iteration burn-in loop (678 lines, 4 examples)
|
||||
|
||||
2. Determine review scope:
|
||||
- **single**: Review one test file (`test_file_path` provided)
|
||||
- **directory**: Review all tests in directory (`test_dir` provided)
|
||||
- **suite**: Review entire test suite (discover all test files)
|
||||
|
||||
3. Auto-discover related artifacts (if `auto_discover_story: true`):
|
||||
- Extract test ID from filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
|
||||
- Search for story file (`story-1.3.md`)
|
||||
- Search for test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)
|
||||
|
||||
4. Read story file for context (if available):
|
||||
- Extract acceptance criteria
|
||||
- Extract priority classification
|
||||
- Extract expected test IDs
|
||||
|
||||
**Output:** Complete knowledge base loaded, review scope determined, context gathered
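
A minimal sketch of the auto-discovery in action 3, assuming the filename and artifact naming conventions shown in the examples above (the `docsDir` default is a placeholder for wherever story and test-design documents live):

```typescript
// Hypothetical helper: derive related artifact paths from a spec filename such as "1.3-E2E-001.spec.ts".
function discoverArtifacts(specFileName: string, docsDir = 'docs') {
  const match = specFileName.match(/^(\d+)\.(\d+)-(?:E2E|API|COMP|UNIT)-\d+/i);
  if (!match) return null; // filename does not follow the {epic}.{story}-{LEVEL}-{seq} convention
  const [, epic, story] = match;
  return {
    storyFile: `${docsDir}/story-${epic}.${story}.md`,
    testDesignCandidates: [
      `${docsDir}/test-design-story-${epic}.${story}.md`,
      `${docsDir}/test-design-epic-${epic}.md`,
    ],
  };
}
```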
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Discover and Parse Test Files
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Discover test files** based on scope:
|
||||
- **single**: Use `test_file_path` variable
|
||||
- **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
|
||||
- **suite**: Use `glob` to find all test files recursively from project root
|
||||
|
||||
2. **Parse test file metadata**:
|
||||
- File path and name
|
||||
- File size (warn if >15 KB or >300 lines)
|
||||
- Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
|
||||
- Imports and dependencies
|
||||
- Test structure (describe/context/it blocks)
|
||||
|
||||
3. **Extract test structure**:
|
||||
- Count of describe blocks (test suites)
|
||||
- Count of it/test blocks (individual tests)
|
||||
- Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
|
||||
- Priority markers (if present, e.g., `test.describe.only` for P0)
|
||||
- BDD structure (Given-When-Then comments or steps)
|
||||
|
||||
4. **Identify test patterns**:
|
||||
- Fixtures used
|
||||
- Data factories used
|
||||
- Network interception patterns
|
||||
- Assertions used (expect, assert, toHaveText, etc.)
|
||||
- Waits and timeouts (page.waitFor, sleep, hardcoded delays)
|
||||
- Conditionals (if/else, switch, ternary)
|
||||
- Try/catch blocks
|
||||
- Shared state or globals
|
||||
|
||||
**Output:** Complete test file inventory with structure and pattern analysis
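
A rough sketch of the structural scan described above; these are regex heuristics for illustration only, and a real implementation would more likely use the framework's test listing or an AST:

```typescript
import { readFileSync } from 'node:fs';

// Heuristic metrics for a single spec file; the patterns are illustrative, not exhaustive.
function scanSpecFile(path: string) {
  const source = readFileSync(path, 'utf8');
  return {
    lines: source.split('\n').length,
    describeBlocks: (source.match(/\bdescribe\s*\(/g) ?? []).length,
    testBlocks: (source.match(/\b(?:it|test)\s*\(/g) ?? []).length,
    testIds: source.match(/\d+\.\d+-(?:E2E|API|COMP|UNIT)-\d+/gi) ?? [],
    hardWaits: (source.match(/waitForTimeout\s*\(|\bsleep\s*\(/g) ?? []).length,
    conditionals: (source.match(/\bif\s*\(|\bswitch\s*\(/g) ?? []).length,
    tryCatchBlocks: (source.match(/\btry\s*\{/g) ?? []).length,
  };
}
```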
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Validate Against Quality Criteria
|
||||
|
||||
**Actions:**
|
||||
|
||||
For each test file, validate against quality criteria (configurable via workflow variables):
|
||||
|
||||
#### 1. BDD Format Validation (if `check_given_when_then: true`)
|
||||
|
||||
- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
|
||||
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
|
||||
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, tdd-cycles.md
|
||||
|
||||
---
|
||||
|
||||
#### 2. Test ID Conventions (if `check_test_ids: true`)
|
||||
|
||||
- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
|
||||
- ⚠️ **WARN**: Some test IDs missing or inconsistent
|
||||
- ❌ **FAIL**: No test IDs, can't trace tests to requirements
|
||||
|
||||
**Knowledge Fragment**: traceability.md, test-quality.md
|
||||
|
||||
---
|
||||
|
||||
#### 3. Priority Markers (if `check_priority_markers: true`)
|
||||
|
||||
- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
|
||||
- ⚠️ **WARN**: Some priority classifications missing
|
||||
- ❌ **FAIL**: No priority classification, can't determine criticality
|
||||
|
||||
**Knowledge Fragment**: test-priorities.md, risk-governance.md
|
||||
|
||||
---
|
||||
|
||||
#### 4. Hard Waits Detection (if `check_hard_waits: true`)
|
||||
|
||||
- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
|
||||
- ⚠️ **WARN**: Some hard waits used but with justification comments
|
||||
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- `sleep(1000)`, `setTimeout()`, `delay()`
|
||||
- `page.waitForTimeout(5000)` without explicit reason
|
||||
- `await new Promise(resolve => setTimeout(resolve, 3000))`
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, network-first.md
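
A hedged sketch of how this check might be applied line by line; the justification-comment convention is an assumption, not something the workflow mandates:

```typescript
const HARD_WAIT = /\bsleep\s*\(|\bdelay\s*\(|waitForTimeout\s*\(|setTimeout\s*\(\s*resolve\b/;

// FAIL on a hard wait unless the same line carries an explanatory comment (assumed convention).
function classifyHardWait(line: string): 'PASS' | 'WARN' | 'FAIL' {
  if (!HARD_WAIT.test(line)) return 'PASS';
  return /\/\/.*\b(justification|reason|why)\b/i.test(line) ? 'WARN' : 'FAIL';
}

classifyHardWait('await page.waitForTimeout(2000);'); // 'FAIL'
classifyHardWait('await page.waitForTimeout(500); // justification: third-party animation'); // 'WARN'
```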
|
||||
|
||||
---
|
||||
|
||||
#### 5. Determinism Check (if `check_determinism: true`)
|
||||
|
||||
- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
|
||||
- ⚠️ **WARN**: Some conditionals but with clear justification
|
||||
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- `if (condition) { test logic }` - tests should work deterministically
|
||||
- `try { test } catch { fallback }` - tests shouldn't swallow errors
|
||||
- `Math.random()`, `Date.now()` without factory abstraction
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, data-factories.md
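
For example, branching on runtime UI state makes the flow non-deterministic; controlling that state up front keeps every run identical (the selectors and endpoint are hypothetical):

```typescript
// ❌ Bad: test branches on whatever the app happens to show
if (await page.locator('[data-testid="promo-banner"]').isVisible()) {
  await page.click('[data-testid="dismiss-banner"]');
}

// ✅ Good: force the state before navigating so there is nothing to branch on
await page.route('**/api/promotions', (route) => route.fulfill({ json: { banners: [] } }));
await page.goto('/dashboard');
```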
|
||||
|
||||
---
|
||||
|
||||
#### 6. Isolation Validation (if `check_isolation: true`)
|
||||
|
||||
- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
|
||||
- ⚠️ **WARN**: Some cleanup missing but isolated enough
|
||||
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- afterEach/afterAll cleanup hooks present
|
||||
- No global variables mutated
|
||||
- Database/API state cleaned up after tests
|
||||
- Test data deleted or marked inactive
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, data-factories.md
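
For example, registering teardown in `afterEach` keeps each test self-cleaning. A sketch assuming a configured `baseURL`, a hypothetical `createTestUser` factory, and a `/api/users` endpoint:

```typescript
import { test, expect } from '@playwright/test';
import { createTestUser } from './factories/user-factory'; // hypothetical factory module

let createdUserId: string | undefined;

test.afterEach(async ({ request }) => {
  // Remove whatever this test created so tests can run in any order
  if (createdUserId) {
    await request.delete(`/api/users/${createdUserId}`);
    createdUserId = undefined;
  }
});

test('admin can deactivate a user', async ({ page, request }) => {
  const created = await request.post('/api/users', { data: createTestUser() });
  createdUserId = (await created.json()).id;

  await page.goto(`/admin/users/${createdUserId}`);
  await page.getByRole('button', { name: 'Deactivate' }).click();
  await expect(page.getByTestId('status')).toHaveText('Inactive');
});
```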
|
||||
|
||||
---
|
||||
|
||||
#### 7. Fixture Patterns (if `check_fixture_patterns: true`)
|
||||
|
||||
- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
|
||||
- ⚠️ **WARN**: Some fixtures used but not consistently
|
||||
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
|
||||
- Pure functions used for fixture logic
|
||||
- mergeTests used to combine fixtures
|
||||
- No beforeEach with complex setup (should be in fixtures)
|
||||
|
||||
**Knowledge Fragment**: fixture-architecture.md
|
||||
|
||||
---
|
||||
|
||||
#### 8. Data Factories (if `check_data_factories: true`)
|
||||
|
||||
- ✅ **PASS**: Uses factory functions with overrides, API-first setup
|
||||
- ⚠️ **WARN**: Some factories used but also hardcoded data
|
||||
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
|
||||
- Factories use faker.js or similar for realistic data
|
||||
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
|
||||
- API-first setup (create via API, test via UI)
|
||||
|
||||
**Knowledge Fragment**: data-factories.md
|
||||
|
||||
---
|
||||
|
||||
#### 9. Network-First Pattern (if `check_network_first: true`)
|
||||
|
||||
- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
|
||||
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
|
||||
- ❌ **FAIL**: Route interception after navigation (race condition risk)
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- `page.route()` called before `page.goto()`
|
||||
- `page.waitForResponse()` used with explicit URL pattern
|
||||
- No navigation followed immediately by route setup
|
||||
|
||||
**Knowledge Fragment**: network-first.md
|
||||
|
||||
---
|
||||
|
||||
#### 10. Assertions (if `check_assertions: true`)
|
||||
|
||||
- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
|
||||
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
|
||||
- ❌ **FAIL**: Missing assertions, tests don't verify behavior
|
||||
|
||||
**Patterns to check:**
|
||||
|
||||
- Each test has at least one assertion
|
||||
- Assertions are specific (not just truthy checks)
|
||||
- Assertions use framework-provided matchers (toHaveText, toBeVisible)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
#### 11. Test Length (if `check_test_length: true`)
|
||||
|
||||
- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
|
||||
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
|
||||
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md
|
||||
|
||||
---
|
||||
|
||||
#### 12. Test Duration (if `check_test_duration: true`)
|
||||
|
||||
- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
|
||||
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
|
||||
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)
|
||||
|
||||
**Note:** When execution data is unavailable, duration is estimated from complexity analysis.
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, selective-testing.md
|
||||
|
||||
---
|
||||
|
||||
#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)
|
||||
|
||||
- ✅ **PASS**: No known flaky patterns detected
|
||||
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
|
||||
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)
|
||||
|
||||
**Patterns to detect:**
|
||||
|
||||
- Tight timeouts (e.g., `{ timeout: 1000 }`)
|
||||
- Race conditions (navigation before route interception)
|
||||
- Timing-dependent assertions (e.g., checking timestamps)
|
||||
- Retry logic in tests (hides flakiness)
|
||||
- Environment-dependent assumptions (hardcoded URLs, ports)
|
||||
|
||||
**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Calculate Quality Score
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Count violations** by severity:
|
||||
- **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
|
||||
- **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
|
||||
- **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
|
||||
- **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests
|
||||
|
||||
2. **Calculate quality score** (if `quality_score_enabled: true`):
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
|
||||
Critical Violations: -10 points each
|
||||
High Violations: -5 points each
|
||||
Medium Violations: -2 points each
|
||||
Low Violations: -1 point each
|
||||
|
||||
Bonus Points:
|
||||
+ Excellent BDD structure: +5
|
||||
+ Comprehensive fixtures: +5
|
||||
+ Comprehensive data factories: +5
|
||||
+ Network-first pattern: +5
|
||||
+ Perfect isolation: +5
|
||||
+ All test IDs present: +5
|
||||
|
||||
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
|
||||
```
|
||||
|
||||
3. **Quality Grade**:
|
||||
- **90-100**: Excellent (A+)
|
||||
- **80-89**: Good (A)
|
||||
- **70-79**: Acceptable (B)
|
||||
- **60-69**: Needs Improvement (C)
|
||||
- **<60**: Critical Issues (F)
|
||||
|
||||
**Output:** Quality score calculated with violation breakdown
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Generate Review Report
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Create review report** using `test-review-template.md`:
|
||||
|
||||
**Header Section:**
|
||||
- Test file(s) reviewed
|
||||
- Review date
|
||||
- Review scope (single/directory/suite)
|
||||
- Quality score and grade
|
||||
|
||||
**Executive Summary:**
|
||||
- Overall assessment (Excellent/Good/Needs Improvement/Critical)
|
||||
- Key strengths
|
||||
- Key weaknesses
|
||||
- Recommendation (Approve/Approve with comments/Request changes)
|
||||
|
||||
**Quality Criteria Assessment:**
|
||||
- Table with all criteria evaluated
|
||||
- Status for each (PASS/WARN/FAIL)
|
||||
- Violation count per criterion
|
||||
|
||||
**Critical Issues (Must Fix):**
|
||||
- Priority P0/P1 violations
|
||||
- Code location (file:line)
|
||||
- Explanation of issue
|
||||
- Recommended fix
|
||||
- Knowledge base reference
|
||||
|
||||
**Recommendations (Should Fix):**
|
||||
- Priority P2/P3 violations
|
||||
- Code location (file:line)
|
||||
- Explanation of issue
|
||||
- Recommended improvement
|
||||
- Knowledge base reference
|
||||
|
||||
**Best Practices Examples:**
|
||||
- Highlight good patterns found in tests
|
||||
- Reference knowledge base fragments
|
||||
- Provide examples for others to follow
|
||||
|
||||
**Knowledge Base References:**
|
||||
- List all fragments consulted
|
||||
- Provide links to detailed guidance
|
||||
|
||||
2. **Generate inline comments** (if `generate_inline_comments: true`):
|
||||
- Add TODO comments in test files at violation locations
|
||||
- Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
|
||||
- Never modify test logic, only add comments
|
||||
|
||||
3. **Generate quality badge** (if `generate_quality_badge: true`):
|
||||
- Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
|
||||
- Format for inclusion in README or documentation
|
||||
|
||||
4. **Append to story file** (if `append_to_story: true` and story file exists):
|
||||
- Add "Test Quality Review" section to story
|
||||
- Include quality score and critical issues
|
||||
- Link to full review report
|
||||
|
||||
**Output:** Comprehensive review report with actionable feedback
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Save Outputs and Notify
|
||||
|
||||
**Actions:**
|
||||
|
||||
1. **Save review report** to `{output_file}`
|
||||
2. **Save inline comments** to test files (if enabled)
|
||||
3. **Save quality badge** to output folder (if enabled)
|
||||
4. **Update story file** (if enabled)
|
||||
5. **Generate summary message** for user:
|
||||
- Quality score and grade
|
||||
- Critical issue count
|
||||
- Recommendation
|
||||
|
||||
**Output:** All review artifacts saved and user notified
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Decision Matrix
|
||||
|
||||
| Criterion | PASS | WARN | FAIL | Knowledge Fragment |
|
||||
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
|
||||
| BDD Format | Given-When-Then present | Some structure | No structure | test-quality.md |
|
||||
| Test IDs | All tests have IDs | Some missing | No IDs | traceability.md |
|
||||
| Priority Markers | All classified | Some missing | No classification | test-priorities.md |
|
||||
| Hard Waits | No hard waits | Some justified | Hard waits present | test-quality.md |
|
||||
| Determinism | No conditionals/random | Some justified | Conditionals/random | test-quality.md |
|
||||
| Isolation | Clean up, no shared state | Some gaps | Shared state | test-quality.md |
|
||||
| Fixture Patterns | Pure fn → Fixture | Some fixtures | No fixtures | fixture-architecture.md |
|
||||
| Data Factories | Factory functions | Some factories | Hardcoded data | data-factories.md |
|
||||
| Network-First | Intercept before navigate | Some correct | Race conditions | network-first.md |
|
||||
| Assertions | Explicit assertions | Some implicit | Missing assertions | test-quality.md |
|
||||
| Test Length | ≤300 lines | 301-500 lines | >500 lines | test-quality.md |
|
||||
| Test Duration | ≤1.5 min | 1.5-3 min | >3 min | test-quality.md |
|
||||
| Flakiness Patterns | No flaky patterns | Some potential | Multiple patterns | ci-burn-in.md |
|
||||
|
||||
---
|
||||
|
||||
## Example Review Summary
|
||||
|
||||
````markdown
|
||||
# Test Quality Review: auth-login.spec.ts
|
||||
|
||||
**Quality Score**: 89/100 (A - Good)
|
||||
**Review Date**: 2025-10-14
|
||||
**Recommendation**: Approve with Comments
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.
|
||||
|
||||
**Strengths:**
|
||||
|
||||
- Excellent BDD structure with clear Given-When-Then comments
|
||||
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
|
||||
- Comprehensive assertions on authentication state
|
||||
|
||||
**Weaknesses:**
|
||||
|
||||
- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
|
||||
- Hardcoded test data (email: 'test@example.com') - use factories instead
|
||||
- Missing fixture for common login setup - DRY violation
|
||||
|
||||
**Recommendation**: Address critical issue (hard wait) before merging. Other improvements can be addressed in follow-up PR.
|
||||
|
||||
## Critical Issues (Must Fix)
|
||||
|
||||
### 1. Hard Wait Detected (Line 45)
|
||||
|
||||
**Severity**: P0 (Critical)
|
||||
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
|
||||
**Fix**: Use explicit wait for element or network request instead
|
||||
**Knowledge**: See test-quality.md, network-first.md
|
||||
|
||||
```typescript
|
||||
// ❌ Bad (current)
|
||||
await page.waitForTimeout(2000);
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
|
||||
|
||||
// ✅ Good (recommended)
|
||||
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
|
||||
```
|
||||
|
||||
|
||||
## Recommendations (Should Fix)
|
||||
|
||||
### 1. Use Data Factory for Test User (Lines 23, 32, 41)
|
||||
|
||||
**Severity**: P1 (High)
|
||||
**Issue**: Hardcoded email 'test@example.com' - maintainability risk
|
||||
**Fix**: Create factory function for test users
|
||||
**Knowledge**: See data-factories.md
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended)
|
||||
import { createTestUser } from './factories/user-factory';
|
||||
|
||||
const testUser = createTestUser({ role: 'admin' });
|
||||
await loginPage.login(testUser.email, testUser.password);
|
||||
```
|
||||
|
||||
### 2. Extract Login Setup to Fixture (Lines 18-28)
|
||||
|
||||
**Severity**: P1 (High)
|
||||
**Issue**: Login setup repeated across tests - DRY violation
|
||||
**Fix**: Create fixture for authenticated state
|
||||
**Knowledge**: See fixture-architecture.md
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended)
|
||||
const test = base.extend({
|
||||
authenticatedPage: async ({ page }, use) => {
|
||||
const user = createTestUser();
|
||||
await loginPage.login(user.email, user.password);
|
||||
await use(page);
|
||||
},
|
||||
});
|
||||
|
||||
test('user can access dashboard', async ({ authenticatedPage }) => {
|
||||
// Test starts already logged in
|
||||
});
|
||||
```
|
||||
|
||||
## Quality Score Breakdown
|
||||
|
||||
- Starting Score: 100
|
||||
- Critical Violations (1 × -10): -10
|
||||
- High Violations (2 × -5): -10
|
||||
- Medium Violations (0 × -2): 0
|
||||
- Low Violations (1 × -1): -1
|
||||
- Bonus (BDD +5, Test IDs +5): +10
|
||||
- **Final Score**: 89/100 (A)
|
||||
|
||||
````
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Test Review
|
||||
|
||||
- **atdd**: Generate acceptance tests (TEA reviews them for quality)
|
||||
- **automate**: Expand regression suite (TEA reviews new tests)
|
||||
- **dev story**: Developer writes implementation tests (TEA reviews them)
|
||||
|
||||
### After Test Review
|
||||
|
||||
- **Developer**: Addresses critical issues, improves based on recommendations
|
||||
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)
|
||||
|
||||
### Coordinates With
|
||||
|
||||
- **Story File**: Review links to acceptance criteria context
|
||||
- **Test Design**: Review validates tests align with prioritization
|
||||
- **Knowledge Base**: Review references fragments for detailed guidance
|
||||
|
||||
---
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
|
||||
2. **Context Matters**: Some violations may be justified for specific scenarios
|
||||
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
|
||||
4. **Actionable**: Every issue includes recommended fix with code examples
|
||||
5. **Quality Score**: Use as indicator, not absolute measure
|
||||
6. **Continuous Improvement**: Review same tests periodically as patterns evolve
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Problem: No test files found**
|
||||
- Verify test_dir path is correct
|
||||
- Check test file extensions match glob pattern
|
||||
- Ensure test files exist in expected location
|
||||
|
||||
**Problem: Quality score seems too low/high**
|
||||
- Review violation counts - may need to adjust thresholds
|
||||
- Consider context - some projects have different standards
|
||||
- Focus on critical issues first, not just score
|
||||
|
||||
**Problem: Inline comments not generated**
|
||||
- Check generate_inline_comments: true in variables
|
||||
- Verify write permissions on test files
|
||||
- Review append_to_file: false (separate report mode)
|
||||
|
||||
**Problem: Knowledge fragments not loading**
|
||||
- Verify tea-index.csv exists in testarch/ directory
|
||||
- Check fragment file paths are correct
|
||||
- Ensure auto_load_knowledge: true in variables
|
||||
|
||||
388
bmad/bmm/workflows/testarch/test-review/test-review-template.md
Normal file
388
bmad/bmm/workflows/testarch/test-review/test-review-template.md
Normal file
@@ -0,0 +1,388 @@
|
||||
# Test Quality Review: {test_filename}
|
||||
|
||||
**Quality Score**: {score}/100 ({grade} - {assessment})
|
||||
**Review Date**: {YYYY-MM-DD}
|
||||
**Review Scope**: {single | directory | suite}
|
||||
**Reviewer**: {user_name or TEA Agent}
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}
|
||||
|
||||
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
|
||||
|
||||
### Key Strengths
|
||||
|
||||
✅ {strength_1}
|
||||
✅ {strength_2}
|
||||
✅ {strength_3}
|
||||
|
||||
### Key Weaknesses
|
||||
|
||||
❌ {weakness_1}
|
||||
❌ {weakness_2}
|
||||
❌ {weakness_3}
|
||||
|
||||
### Summary
|
||||
|
||||
{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}
|
||||
|
||||
---
|
||||
|
||||
## Quality Criteria Assessment
|
||||
|
||||
| Criterion | Status | Violations | Notes |
|
||||
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
|
||||
| BDD Format (Given-When-Then) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Test IDs | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Priority Markers (P0/P1/P2/P3) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Hard Waits (sleep, waitForTimeout) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Determinism (no conditionals) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Fixture Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Data Factories | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Network-First Pattern | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Explicit Assertions | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
| Test Length (≤300 lines) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines} | {brief_note} |
|
||||
| Test Duration (≤1.5 min) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
|
||||
| Flakiness Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
|
||||
|
||||
**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low
|
||||
|
||||
---
|
||||
|
||||
## Quality Score Breakdown
|
||||
|
||||
```
|
||||
Starting Score: 100
|
||||
Critical Violations: -{critical_count} × 10 = -{critical_deduction}
|
||||
High Violations: -{high_count} × 5 = -{high_deduction}
|
||||
Medium Violations: -{medium_count} × 2 = -{medium_deduction}
|
||||
Low Violations: -{low_count} × 1 = -{low_deduction}
|
||||
|
||||
Bonus Points:
|
||||
Excellent BDD: +{0|5}
|
||||
Comprehensive Fixtures: +{0|5}
|
||||
Data Factories: +{0|5}
|
||||
Network-First: +{0|5}
|
||||
Perfect Isolation: +{0|5}
|
||||
All Test IDs: +{0|5}
|
||||
--------
|
||||
Total Bonus: +{bonus_total}
|
||||
|
||||
Final Score: {final_score}/100
|
||||
Grade: {grade}
|
||||
```
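
The arithmetic behind the breakdown is simply: final score = 100 minus weighted deductions plus any bonuses. A minimal sketch (clamping to the 0-100 range is an assumption, not a documented rule):

```typescript
// Sketch of the quality score arithmetic shown above.
// Deduction weights and bonus values mirror the breakdown; the clamp is an assumption.
interface ViolationCounts {
  critical: number; // -10 each
  high: number;     // -5 each
  medium: number;   // -2 each
  low: number;      // -1 each
}

function qualityScore(violations: ViolationCounts, bonuses: number[]): number {
  const deductions =
    violations.critical * 10 +
    violations.high * 5 +
    violations.medium * 2 +
    violations.low * 1;
  const bonusTotal = bonuses.reduce((sum, b) => sum + b, 0); // each bonus is 0 or 5
  const raw = 100 - deductions + bonusTotal;
  return Math.max(0, Math.min(100, raw)); // assumed clamp to 0-100
}
```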
|
||||
|
||||
---
|
||||
|
||||
## Critical Issues (Must Fix)
|
||||
|
||||
{If no critical issues: "No critical issues detected. ✅"}
|
||||
|
||||
{For each critical issue:}
|
||||
|
||||
### {issue_number}. {Issue Title}
|
||||
|
||||
**Severity**: P0 (Critical)
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Criterion**: {criterion_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Issue Description**:
|
||||
{Detailed explanation of what the problem is and why it's critical}
|
||||
|
||||
**Current Code**:
|
||||
|
||||
```typescript
|
||||
// ❌ Bad (current implementation)
|
||||
{
|
||||
code_snippet_showing_problem;
|
||||
}
|
||||
```
|
||||
|
||||
**Recommended Fix**:
|
||||
|
||||
```typescript
|
||||
// ✅ Good (recommended approach)
|
||||
{
|
||||
code_snippet_showing_solution;
|
||||
}
|
||||
```
|
||||
|
||||
**Why This Matters**:
|
||||
{Explanation of impact - flakiness risk, maintainability, reliability}
|
||||
|
||||
**Related Violations**:
|
||||
{If similar issue appears elsewhere, note line numbers}
|
||||
|
||||
---
|
||||
|
||||
## Recommendations (Should Fix)
|
||||
|
||||
{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}
|
||||
|
||||
{For each recommendation:}
|
||||
|
||||
### {rec_number}. {Recommendation Title}
|
||||
|
||||
**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Criterion**: {criterion_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Issue Description**:
|
||||
{Detailed explanation of what could be improved and why}
|
||||
|
||||
**Current Code**:
|
||||
|
||||
```typescript
|
||||
// ⚠️ Could be improved (current implementation)
|
||||
{
|
||||
code_snippet_showing_current_approach;
|
||||
}
|
||||
```
|
||||
|
||||
**Recommended Improvement**:
|
||||
|
||||
```typescript
|
||||
// ✅ Better approach (recommended)
|
||||
{
|
||||
code_snippet_showing_improvement;
|
||||
}
|
||||
```
|
||||
|
||||
**Benefits**:
|
||||
{Explanation of benefits - maintainability, readability, reusability}
|
||||
|
||||
**Priority**:
|
||||
{Why this is P1/P2/P3 - urgency and impact}
|
||||
|
||||
---
|
||||
|
||||
## Best Practices Found
|
||||
|
||||
{If good patterns found, highlight them}
|
||||
|
||||
{For each best practice:}
|
||||
|
||||
### {practice_number}. {Best Practice Title}
|
||||
|
||||
**Location**: `{filename}:{line_number}`
|
||||
**Pattern**: {pattern_name}
|
||||
**Knowledge Base**: [{fragment_name}]({fragment_path})
|
||||
|
||||
**Why This Is Good**:
|
||||
{Explanation of why this pattern is excellent}
|
||||
|
||||
**Code Example**:
|
||||
|
||||
```typescript
|
||||
// ✅ Excellent pattern demonstrated in this test
|
||||
{
|
||||
code_snippet_showing_best_practice;
|
||||
}
|
||||
```
|
||||
|
||||
**Use as Reference**:
|
||||
{Encourage using this pattern in other tests}
|
||||
|
||||
---
|
||||
|
||||
## Test File Analysis
|
||||
|
||||
### File Metadata
|
||||
|
||||
- **File Path**: `{relative_path_from_project_root}`
|
||||
- **File Size**: {line_count} lines, {kb_size} KB
|
||||
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
|
||||
- **Language**: {TypeScript | JavaScript}
|
||||
|
||||
### Test Structure
|
||||
|
||||
- **Describe Blocks**: {describe_count}
|
||||
- **Test Cases (it/test)**: {test_count}
|
||||
- **Average Test Length**: {avg_lines_per_test} lines per test
|
||||
- **Fixtures Used**: {fixture_count} ({fixture_names})
|
||||
- **Data Factories Used**: {factory_count} ({factory_names})
|
||||
|
||||
### Test Coverage Scope
|
||||
|
||||
- **Test IDs**: {test_id_list}
|
||||
- **Priority Distribution**:
|
||||
- P0 (Critical): {p0_count} tests
|
||||
- P1 (High): {p1_count} tests
|
||||
- P2 (Medium): {p2_count} tests
|
||||
- P3 (Low): {p3_count} tests
|
||||
- Unknown: {unknown_count} tests
|
||||
|
||||
### Assertions Analysis
|
||||
|
||||
- **Total Assertions**: {assertion_count}
|
||||
- **Assertions per Test**: {avg_assertions_per_test} (avg)
|
||||
- **Assertion Types**: {assertion_types_used}
|
||||
|
||||
---
|
||||
|
||||
## Context and Integration
|
||||
|
||||
### Related Artifacts
|
||||
|
||||
{If story file found:}
|
||||
|
||||
- **Story File**: [{story_filename}]({story_path})
|
||||
- **Acceptance Criteria Mapped**: {ac_mapped}/{ac_total} ({ac_coverage}%)
|
||||
|
||||
{If test-design found:}
|
||||
|
||||
- **Test Design**: [{test_design_filename}]({test_design_path})
|
||||
- **Risk Assessment**: {risk_level}
|
||||
- **Priority Framework**: P0-P3 applied
|
||||
|
||||
### Acceptance Criteria Validation
|
||||
|
||||
{If story file available, map tests to ACs:}
|
||||
|
||||
| Acceptance Criterion | Test ID | Status | Notes |
|
||||
| -------------------- | --------- | -------------------------- | ------- |
|
||||
| {AC_1} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
| {AC_2} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
| {AC_3} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
|
||||
|
||||
**Coverage**: {covered_count}/{total_count} criteria covered ({coverage_percentage}%)
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base References
|
||||
|
||||
This review consulted the following knowledge base fragments:
|
||||
|
||||
- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
|
||||
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure function → Fixture → mergeTests pattern
|
||||
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Route intercept before navigate (race condition prevention)
|
||||
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
|
||||
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
|
||||
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
|
||||
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
|
||||
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
|
||||
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
|
||||
- **[traceability.md](../../../testarch/knowledge/traceability.md)** - Requirements-to-tests mapping
|
||||
|
||||
See [tea-index.csv](../../../testarch/tea-index.csv) for complete knowledge base.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### Immediate Actions (Before Merge)
|
||||
|
||||
1. **{action_1}** - {description}
|
||||
- Priority: {P0 | P1 | P2}
|
||||
- Owner: {team_or_person}
|
||||
- Estimated Effort: {time_estimate}
|
||||
|
||||
2. **{action_2}** - {description}
|
||||
- Priority: {P0 | P1 | P2}
|
||||
- Owner: {team_or_person}
|
||||
- Estimated Effort: {time_estimate}
|
||||
|
||||
### Follow-up Actions (Future PRs)
|
||||
|
||||
1. **{action_1}** - {description}
|
||||
- Priority: {P2 | P3}
|
||||
- Target: {next_sprint | backlog}
|
||||
|
||||
2. **{action_2}** - {description}
|
||||
- Priority: {P2 | P3}
|
||||
- Target: {next_sprint | backlog}
|
||||
|
||||
### Re-Review Needed?
|
||||
|
||||
{✅ No re-review needed - approve as-is}
|
||||
{⚠️ Re-review after critical fixes - request changes, then re-review}
|
||||
{❌ Major refactor required - block merge, pair programming recommended}
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
|
||||
|
||||
**Rationale**:
|
||||
{1-2 paragraph explanation of recommendation based on findings}
|
||||
|
||||
**For Approve**:
|
||||
|
||||
> Test quality is excellent/good, with a score of {score}/100. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.
|
||||
|
||||
**For Approve with Comments**:
|
||||
|
||||
> Test quality is acceptable, with a score of {score}/100. {High-priority recommendations should be addressed but don't block merge.} Critical issues are resolved, but improvements would enhance maintainability.
|
||||
|
||||
**For Request Changes**:
|
||||
|
||||
> Test quality needs improvement, with a score of {score}/100. {Critical issues must be fixed before merge.} {X} critical violations were detected that pose flakiness/maintainability risks.
|
||||
|
||||
**For Block**:
|
||||
|
||||
> Test quality is insufficient, with a score of {score}/100. {Multiple critical issues make tests unsuitable for production.} Recommend a pairing session with a QA engineer to apply patterns from the knowledge base.
|
||||
|
||||
---
|
||||
|
||||
## Appendix
|
||||
|
||||
### Violation Summary by Location
|
||||
|
||||
{Table of all violations sorted by line number:}
|
||||
|
||||
| Line | Severity | Criterion | Issue | Fix |
|
||||
| ------ | ------------- | ----------- | ------------- | ----------- |
|
||||
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
|
||||
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
|
||||
|
||||
### Quality Trends
|
||||
|
||||
{If reviewing same file multiple times, show trend:}
|
||||
|
||||
| Review Date | Score | Grade | Critical Issues | Trend |
|
||||
| ------------ | ------------- | --------- | --------------- | ----------- |
|
||||
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1} | ⬆️ Improved |
|
||||
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2} | ⬇️ Declined |
|
||||
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3} | ➡️ Stable |
|
||||
|
||||
### Related Reviews
|
||||
|
||||
{If reviewing multiple files in directory/suite:}
|
||||
|
||||
| File | Score | Grade | Critical | Status |
|
||||
| -------- | ----------- | ------- | -------- | ------------------ |
|
||||
| {file_1} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
| {file_2} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
| {file_3} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
|
||||
|
||||
**Suite Average**: {avg_score}/100 ({avg_grade})
|
||||
|
||||
---
|
||||
|
||||
## Review Metadata
|
||||
|
||||
**Generated By**: BMad TEA Agent (Test Architect)
|
||||
**Workflow**: testarch-test-review v4.0
|
||||
**Review ID**: test-review-{filename}-{YYYYMMDD}
|
||||
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
|
||||
**Version**: 1.0
|
||||
|
||||
---
|
||||
|
||||
## Feedback on This Review
|
||||
|
||||
If you have questions or feedback on this review:
|
||||
|
||||
1. Review patterns in knowledge base: `testarch/knowledge/`
|
||||
2. Consult tea-index.csv for detailed guidance
|
||||
3. Request clarification on specific violations
|
||||
4. Pair with QA engineer to apply patterns
|
||||
|
||||
This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.
|
||||
53
bmad/bmm/workflows/testarch/test-review/workflow.yaml
Normal file
53
bmad/bmm/workflows/testarch/test-review/workflow.yaml
Normal file
@@ -0,0 +1,53 @@
|
||||
# Test Architect workflow: test-review
|
||||
name: testarch-test-review
|
||||
description: "Review test quality using comprehensive knowledge base and best practices validation"
|
||||
author: "BMad"
|
||||
|
||||
# Critical variables from config
|
||||
config_source: "{project-root}/bmad/bmm/config.yaml"
|
||||
output_folder: "{config_source}:output_folder"
|
||||
user_name: "{config_source}:user_name"
|
||||
communication_language: "{config_source}:communication_language"
|
||||
document_output_language: "{config_source}:document_output_language"
|
||||
date: system-generated
|
||||
|
||||
# Workflow components
|
||||
installed_path: "{project-root}/bmad/bmm/workflows/testarch/test-review"
|
||||
instructions: "{installed_path}/instructions.md"
|
||||
validation: "{installed_path}/checklist.md"
|
||||
template: "{installed_path}/test-review-template.md"
|
||||
|
||||
# Variables and inputs
|
||||
variables:
|
||||
test_dir: "{project-root}/tests" # Root test directory
|
||||
review_scope: "single" # single (one file), directory (folder), suite (all tests)
|
||||
|
||||
# Output configuration
|
||||
default_output_file: "{output_folder}/test-review.md"
|
||||
|
||||
# Required tools
|
||||
required_tools:
|
||||
- read_file # Read test files, story, test-design
|
||||
- write_file # Create review report
|
||||
- list_files # Discover test files in directory
|
||||
- search_repo # Find tests by patterns
|
||||
- glob # Find test files matching patterns
|
||||
|
||||
# Recommended inputs
|
||||
recommended_inputs:
|
||||
- test_file: "Test file to review (single file mode)"
|
||||
- test_dir: "Directory of tests to review (directory mode)"
|
||||
- story: "Related story for acceptance criteria context (optional)"
|
||||
- test_design: "Test design for priority context (optional)"
|
||||
|
||||
tags:
|
||||
- qa
|
||||
- test-architect
|
||||
- code-review
|
||||
- quality
|
||||
- best-practices
|
||||
|
||||
execution_hints:
|
||||
interactive: false # Minimize prompts
|
||||
autonomous: true # Proceed without user input unless blocked
|
||||
iterative: true # Can review multiple files
|
||||
802
bmad/bmm/workflows/testarch/trace/README.md
Normal file
802
bmad/bmm/workflows/testarch/trace/README.md
Normal file
@@ -0,0 +1,802 @@
|
||||
# Requirements Traceability & Quality Gate Workflow
|
||||
|
||||
**Workflow ID:** `testarch-trace`
|
||||
**Agent:** Test Architect (TEA)
|
||||
**Command:** `bmad tea *trace`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The **trace** workflow operates in two sequential phases to validate test coverage and deployment readiness:
|
||||
|
||||
**PHASE 1 - REQUIREMENTS TRACEABILITY:** Generates comprehensive requirements-to-tests traceability matrix that maps acceptance criteria to implemented tests, identifies coverage gaps, and provides actionable recommendations.
|
||||
|
||||
**PHASE 2 - QUALITY GATE DECISION:** Makes deterministic release decisions (PASS/CONCERNS/FAIL/WAIVED) based on traceability results, test execution evidence, and non-functional requirements validation.
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- Maps acceptance criteria to specific test cases across all levels (E2E, API, Component, Unit)
|
||||
- Classifies coverage status (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
|
||||
- Prioritizes gaps by risk level (P0/P1/P2/P3)
|
||||
- Applies deterministic decision rules for deployment readiness
|
||||
- Generates gate decisions with evidence and rationale
|
||||
- Supports waivers for business-approved exceptions
|
||||
- Updates workflow status and notifies stakeholders
|
||||
- Creates CI/CD-ready YAML snippets for quality gates
|
||||
- Detects duplicate coverage across test levels
|
||||
- Verifies test quality (assertions, structure, performance)
|
||||
|
||||
---
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
Use `*trace` when you need to:
|
||||
|
||||
### Phase 1 - Traceability
|
||||
|
||||
- ✅ Validate that all acceptance criteria have test coverage
|
||||
- ✅ Identify coverage gaps before release or PR merge
|
||||
- ✅ Generate traceability documentation for compliance or audits
|
||||
- ✅ Ensure critical paths (P0/P1) are fully tested
|
||||
- ✅ Detect duplicate coverage across test levels
|
||||
- ✅ Assess test quality across your suite
|
||||
|
||||
### Phase 2 - Gate Decision (Optional)
|
||||
|
||||
- ✅ Make final go/no-go deployment decision
|
||||
- ✅ Validate test execution results against thresholds
|
||||
- ✅ Evaluate non-functional requirements (security, performance)
|
||||
- ✅ Generate audit trail for release approval
|
||||
- ✅ Handle business waivers for critical deadlines
|
||||
- ✅ Notify stakeholders of gate decision
|
||||
|
||||
**Typical Timing:**
|
||||
|
||||
- After tests are implemented (post-ATDD or post-development)
|
||||
- Before merging a PR (validate P0/P1 coverage)
|
||||
- Before release (validate full coverage and make gate decision)
|
||||
- During sprint retrospectives (assess test quality)
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Phase 1 - Traceability (Required)
|
||||
|
||||
- Acceptance criteria (from story file OR inline)
|
||||
- Implemented test suite (or acknowledged gaps)
|
||||
|
||||
### Phase 2 - Gate Decision (Required if `enable_gate_decision: true`)
|
||||
|
||||
- Test execution results (CI/CD test reports, pass/fail rates)
|
||||
- Test design with risk priorities (P0/P1/P2/P3)
|
||||
|
||||
### Recommended
|
||||
|
||||
- `test-design.md` - Risk assessment and test priorities
|
||||
- `nfr-assessment.md` - Non-functional requirements validation (for release gates)
|
||||
- `tech-spec.md` - Technical implementation details
|
||||
- Test framework configuration (playwright.config.ts, jest.config.js)
|
||||
|
||||
**Halt Conditions:**
|
||||
|
||||
- Story lacks any tests AND gaps are not acknowledged → Run `*atdd` first
|
||||
- Acceptance criteria are completely missing → Provide criteria or story file
|
||||
- Phase 2 enabled but test execution results missing → Warn and skip gate decision
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage (Both Phases)
|
||||
|
||||
```bash
|
||||
bmad tea *trace
|
||||
```
|
||||
|
||||
The workflow will:
|
||||
|
||||
1. **Phase 1**: Read story file, extract acceptance criteria, auto-discover tests, generate traceability matrix
|
||||
2. **Phase 2**: Load test execution results, apply decision rules, generate gate decision document
|
||||
3. Save traceability matrix to `bmad/output/traceability-matrix.md`
|
||||
4. Save gate decision to `bmad/output/gate-decision-story-X.X.md`
|
||||
|
||||
### Phase 1 Only (Skip Gate Decision)
|
||||
|
||||
```bash
|
||||
bmad tea *trace --enable-gate-decision false
|
||||
```
|
||||
|
||||
### Custom Configuration
|
||||
|
||||
```bash
|
||||
bmad tea *trace \
|
||||
--story-file "bmad/output/story-1.3.md" \
|
||||
--test-results "ci-artifacts/test-report.xml" \
|
||||
--min-p0-coverage 100 \
|
||||
--min-p1-coverage 90 \
|
||||
--min-p0-pass-rate 100 \
|
||||
--min-p1-pass-rate 95
|
||||
```
|
||||
|
||||
### Standalone Mode (No Story File)
|
||||
|
||||
```bash
|
||||
bmad tea *trace --acceptance-criteria "AC-1: User can login with email..."
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### PHASE 1: Requirements Traceability
|
||||
|
||||
1. **Load Context** - Read story, test design, tech spec, knowledge base
|
||||
2. **Discover Tests** - Auto-find tests related to the story (by ID, describe blocks, file paths; see the sketch after this list)
|
||||
3. **Map Criteria** - Link acceptance criteria to specific test cases
|
||||
4. **Analyze Gaps** - Identify missing coverage and prioritize by risk
|
||||
5. **Verify Quality** - Check test quality (assertions, structure, performance)
|
||||
6. **Generate Deliverables** - Create traceability matrix, gate YAML, coverage badge
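
Steps 2-3 depend on tests that advertise their mapping. A minimal Playwright sketch of a discoverable test (the story and AC identifiers, priority-marker format, and selectors are illustrative):

```typescript
import { test, expect } from '@playwright/test';

// Test ID follows {STORY_ID}-{LEVEL}-{SEQ}; the describe block names the story so
// discovery can match by ID, by describe text, or by file path (e.g. tests/e2e/story-1.3-login.spec.ts).
test.describe('Story 1.3: User Login', () => {
  test('1.3-E2E-001: [P0] valid credentials reach the dashboard (AC-1)', async ({ page }) => {
    // Given a registered user on the login page
    await page.goto('/login');

    // When they submit valid credentials
    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('correct-horse-battery');
    await page.getByRole('button', { name: 'Sign in' }).click();

    // Then they land on the dashboard
    await expect(page).toHaveURL(/\/dashboard/);
  });
});
```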
|
||||
|
||||
### PHASE 2: Quality Gate Decision (if `enable_gate_decision: true`)
|
||||
|
||||
7. **Gather Evidence** - Load traceability results, test execution reports, NFR assessments
|
||||
8. **Apply Decision Rules** - Evaluate against thresholds (PASS/CONCERNS/FAIL/WAIVED)
|
||||
9. **Document Decision** - Create gate decision document with evidence and rationale
|
||||
10. **Update Status & Notify** - Append to bmm-workflow-status.md, notify stakeholders
|
||||
|
||||
---
|
||||
|
||||
## Outputs
|
||||
|
||||
### Phase 1: Traceability Matrix (`traceability-matrix.md`)
|
||||
|
||||
Comprehensive markdown file with:
|
||||
|
||||
- Coverage summary table (by priority)
|
||||
- Detailed criterion-to-test mapping
|
||||
- Gap analysis with recommendations
|
||||
- Quality assessment for each test
|
||||
- Gate YAML snippet
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
# Traceability Matrix - Story 1.3
|
||||
|
||||
## Coverage Summary
|
||||
|
||||
| Priority | Total | FULL | Coverage % | Status |
|
||||
| -------- | ----- | ---- | ---------- | ------- |
|
||||
| P0 | 3 | 3 | 100% | ✅ PASS |
|
||||
| P1 | 5 | 4 | 80% | ⚠️ WARN |
|
||||
|
||||
Gate Status: CONCERNS ⚠️ (P1 coverage below 90%)
|
||||
```
|
||||
|
||||
### Phase 2: Gate Decision Document (`gate-decision-{type}-{id}.md`)
|
||||
|
||||
**Decision Document** with:
|
||||
|
||||
- **Decision**: PASS / CONCERNS / FAIL / WAIVED with clear rationale
|
||||
- **Evidence Summary**: Test results, coverage, NFRs, quality validation
|
||||
- **Decision Criteria Table**: Each criterion with threshold, actual, status
|
||||
- **Rationale**: Explanation of decision based on evidence
|
||||
- **Residual Risks**: Unresolved issues (for CONCERNS/WAIVED)
|
||||
- **Waiver Details**: Approver, justification, remediation plan (for WAIVED)
|
||||
- **Next Steps**: Action items for each decision type
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 1.3 - User Login
|
||||
|
||||
**Decision**: ⚠️ CONCERNS
|
||||
**Date**: 2025-10-15
|
||||
|
||||
## Decision Criteria
|
||||
|
||||
| Criterion | Threshold | Actual | Status |
|
||||
| ------------ | --------- | ------ | ------- |
|
||||
| P0 Coverage | ≥100% | 100% | ✅ PASS |
|
||||
| P1 Coverage | ≥90% | 88% | ⚠️ FAIL |
|
||||
| Overall Pass | ≥90% | 96% | ✅ PASS |
|
||||
|
||||
**Decision**: CONCERNS (P1 coverage 88% below 90% threshold)
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Deploy with monitoring
|
||||
- Create follow-up story for AC-5 test
|
||||
```
|
||||
|
||||
### Secondary Outputs
|
||||
|
||||
- **Gate YAML**: Machine-readable snippet for CI/CD integration
|
||||
- **Status Update**: Appends decision to `bmm-workflow-status.md` history
|
||||
- **Stakeholder Notification**: Auto-generated summary message
|
||||
- **Updated Story File**: Traceability section added (optional)
|
||||
|
||||
---
|
||||
|
||||
## Decision Logic (Phase 2)
|
||||
|
||||
### PASS Decision ✅
|
||||
|
||||
**All criteria met:**
|
||||
|
||||
- ✅ P0 coverage ≥ 100%
|
||||
- ✅ P1 coverage ≥ 90%
|
||||
- ✅ Overall coverage ≥ 80%
|
||||
- ✅ P0 test pass rate = 100%
|
||||
- ✅ P1 test pass rate ≥ 95%
|
||||
- ✅ Overall test pass rate ≥ 90%
|
||||
- ✅ Security issues = 0
|
||||
- ✅ Critical NFR failures = 0
|
||||
|
||||
**Action:** Deploy to production with standard monitoring
|
||||
|
||||
---
|
||||
|
||||
### CONCERNS Decision ⚠️
|
||||
|
||||
**P0 criteria met, but P1 criteria degraded:**
|
||||
|
||||
- ✅ P0 coverage = 100%
|
||||
- ⚠️ P1 coverage 80-89% (below 90% threshold)
|
||||
- ⚠️ P1 test pass rate 90-94% (below 95% threshold)
|
||||
- ✅ No security issues
|
||||
- ✅ No critical NFR failures
|
||||
|
||||
**Residual Risks:** Minor P1 issues, edge cases, non-critical gaps
|
||||
|
||||
**Action:** Deploy with enhanced monitoring, create backlog stories for fixes
|
||||
|
||||
**Note:** CONCERNS does NOT block deployment but requires acknowledgment
|
||||
|
||||
---
|
||||
|
||||
### FAIL Decision ❌
|
||||
|
||||
**Any P0 criterion failed:**
|
||||
|
||||
- ❌ P0 coverage <100% (missing critical tests)
|
||||
- OR ❌ P0 test pass rate <100% (failing critical tests)
|
||||
- OR ❌ P1 coverage <80% (significant gap)
|
||||
- OR ❌ Security issues >0
|
||||
- OR ❌ Critical NFR failures >0
|
||||
|
||||
**Critical Blockers:** P0 test failures, security vulnerabilities, critical NFRs
|
||||
|
||||
**Action:** Block deployment, fix critical issues, re-run gate after fixes
|
||||
|
||||
---
|
||||
|
||||
### WAIVED Decision 🔓
|
||||
|
||||
**FAIL status + business-approved waiver:**
|
||||
|
||||
- ❌ Original decision: FAIL
|
||||
- 🔓 Waiver approved by: {VP Engineering / CTO / Product Owner}
|
||||
- 📋 Business justification: {regulatory deadline, contractual obligation}
|
||||
- 📅 Waiver expiry: {date - does NOT apply to future releases}
|
||||
- 🔧 Remediation plan: {fix in next release, due date}
|
||||
|
||||
**Action:** Deploy with business approval, aggressive monitoring, fix ASAP
|
||||
|
||||
**Important:** Waivers NEVER apply to P0 security issues or data corruption risks
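
Because the rules are deterministic, they can be sketched directly in code. A minimal illustration using the default thresholds from the Configuration section below (type and field names are assumptions, not the workflow's internal API):

```typescript
type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';

interface GateEvidence {
  p0Coverage: number;        // % of P0 criteria with FULL coverage
  p1Coverage: number;        // % of P1 criteria with FULL coverage
  overallCoverage: number;   // % of all criteria with FULL coverage
  p0PassRate: number;        // % of P0 tests passing
  p1PassRate: number;        // % of P1 tests passing
  overallPassRate: number;   // % of all tests passing
  securityIssues: number;
  criticalNfrFailures: number;
}

// Defaults: min_p0_coverage 100, min_p1_coverage 90, min_overall_coverage 80,
// min_p0_pass_rate 100, min_p1_pass_rate 95, min_overall_pass_rate 90.
function decideGate(e: GateEvidence, approvedWaiver = false): GateDecision {
  const p0Failed =
    e.p0Coverage < 100 ||
    e.p0PassRate < 100 ||
    e.p1Coverage < 80 ||
    e.securityIssues > 0 ||
    e.criticalNfrFailures > 0;

  if (p0Failed) {
    // A business-approved waiver downgrades FAIL to WAIVED,
    // but never for security issues (or data-corruption risks, not modelled here).
    return approvedWaiver && e.securityIssues === 0 ? 'WAIVED' : 'FAIL';
  }

  const degraded =
    e.p1Coverage < 90 ||
    e.p1PassRate < 95 ||
    e.overallCoverage < 80 ||
    e.overallPassRate < 90;

  return degraded ? 'CONCERNS' : 'PASS';
}
```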
|
||||
|
||||
---
|
||||
|
||||
## Coverage Classifications (Phase 1)
|
||||
|
||||
- **FULL** ✅ - All scenarios validated at appropriate level(s)
|
||||
- **PARTIAL** ⚠️ - Some coverage but missing edge cases or levels
|
||||
- **NONE** ❌ - No test coverage at any level
|
||||
- **UNIT-ONLY** ⚠️ - Only unit tests (missing integration/E2E validation)
|
||||
- **INTEGRATION-ONLY** ⚠️ - Only API/Component tests (missing unit confidence)
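
The classification can be derived mechanically from which levels cover a criterion and whether all of its scenarios are exercised. A simplified sketch (real classification also weighs edge cases and level appropriateness):

```typescript
type Level = 'e2e' | 'api' | 'component' | 'unit';
type Coverage = 'FULL' | 'PARTIAL' | 'NONE' | 'UNIT-ONLY' | 'INTEGRATION-ONLY';

function classify(levels: Set<Level>, allScenariosCovered: boolean): Coverage {
  if (levels.size === 0) return 'NONE';
  const hasUnit = levels.has('unit');
  const hasIntegration = levels.has('api') || levels.has('component');
  const hasE2E = levels.has('e2e');

  if (hasUnit && !hasIntegration && !hasE2E) return 'UNIT-ONLY';
  if (hasIntegration && !hasUnit && !hasE2E) return 'INTEGRATION-ONLY';
  return allScenariosCovered ? 'FULL' : 'PARTIAL';
}
```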
|
||||
|
||||
---
|
||||
|
||||
## Quality Gates
|
||||
|
||||
| Priority | Coverage Requirement | Pass Rate Requirement | Severity | Action |
|
||||
| -------- | -------------------- | --------------------- | -------- | ------------------ |
|
||||
| P0 | 100% | 100% | BLOCKER | Do not release |
|
||||
| P1 | 90% | 95% | HIGH | Block PR merge |
|
||||
| P2 | 80% (recommended) | 85% (recommended) | MEDIUM | Address in nightly |
|
||||
| P3 | No requirement | No requirement | LOW | Optional |
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### workflow.yaml Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
# Target specification
|
||||
story_file: '' # Path to story markdown
|
||||
acceptance_criteria: '' # Inline criteria if no story
|
||||
|
||||
# Test discovery
|
||||
test_dir: '{project-root}/tests'
|
||||
auto_discover_tests: true
|
||||
|
||||
# Traceability configuration
|
||||
coverage_levels: 'e2e,api,component,unit'
|
||||
map_by_test_id: true
|
||||
map_by_describe: true
|
||||
map_by_filename: true
|
||||
|
||||
# Gap analysis
|
||||
prioritize_by_risk: true
|
||||
suggest_missing_tests: true
|
||||
check_duplicate_coverage: true
|
||||
|
||||
# Output configuration
|
||||
output_file: '{output_folder}/traceability-matrix.md'
|
||||
generate_gate_yaml: true
|
||||
generate_coverage_badge: true
|
||||
update_story_file: true
|
||||
|
||||
# Quality gates (Phase 1 recommendations)
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 90
|
||||
min_overall_coverage: 80
|
||||
|
||||
# PHASE 2: Gate Decision Variables
|
||||
enable_gate_decision: true # Run gate decision after traceability
|
||||
|
||||
# Gate target specification
|
||||
gate_type: 'story' # story | epic | release | hotfix
|
||||
|
||||
# Gate decision configuration
|
||||
decision_mode: 'deterministic' # deterministic | manual
|
||||
allow_waivers: true
|
||||
require_evidence: true
|
||||
|
||||
# Input sources for gate
|
||||
nfr_file: '' # Path to nfr-assessment.md (optional)
|
||||
test_results: '' # Path to test execution results (required for Phase 2)
|
||||
|
||||
# Decision criteria thresholds
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 95
|
||||
min_overall_pass_rate: 90
|
||||
max_critical_nfrs_fail: 0
|
||||
max_security_issues: 0
|
||||
|
||||
# Risk tolerance
|
||||
allow_p2_failures: true
|
||||
allow_p3_failures: true
|
||||
escalate_p1_failures: true
|
||||
|
||||
# Gate output configuration
|
||||
gate_output_file: '{output_folder}/gate-decision-{gate_type}-{story_id}.md'
|
||||
append_to_history: true
|
||||
notify_stakeholders: true
|
||||
|
||||
# Advanced gate options
|
||||
check_all_workflows_complete: true
|
||||
validate_evidence_freshness: true
|
||||
require_sign_off: false
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Base Integration
|
||||
|
||||
This workflow automatically loads relevant knowledge fragments:
|
||||
|
||||
**Phase 1 (Traceability):**
|
||||
|
||||
- `traceability.md` - Requirements mapping patterns
|
||||
- `test-priorities.md` - P0/P1/P2/P3 risk framework
|
||||
- `risk-governance.md` - Risk-based testing approach
|
||||
- `test-quality.md` - Definition of Done for tests
|
||||
- `selective-testing.md` - Duplicate coverage patterns
|
||||
|
||||
**Phase 2 (Gate Decision):**
|
||||
|
||||
- `risk-governance.md` - Quality gate criteria and decision framework
|
||||
- `probability-impact.md` - Risk scoring for residual risks
|
||||
- `test-quality.md` - Quality standards validation
|
||||
- `test-priorities.md` - Priority classification framework
|
||||
|
||||
---
|
||||
|
||||
## Example Scenarios
|
||||
|
||||
### Example 1: Full Coverage with Gate PASS
|
||||
|
||||
```bash
|
||||
# Validate coverage and make gate decision
|
||||
bmad tea *trace --story-file "bmad/output/story-1.3.md" \
|
||||
--test-results "ci-artifacts/test-report.xml"
|
||||
```
|
||||
|
||||
**Phase 1 Output:**
|
||||
|
||||
```markdown
|
||||
# Traceability Matrix - Story 1.3
|
||||
|
||||
## Coverage Summary
|
||||
|
||||
| Priority | Total | FULL | Coverage % | Status |
|
||||
| -------- | ----- | ---- | ---------- | ------- |
|
||||
| P0 | 3 | 3 | 100% | ✅ PASS |
|
||||
| P1 | 5 | 5 | 100% | ✅ PASS |
|
||||
|
||||
Gate Status: Ready for Phase 2 ✅
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 1.3
|
||||
|
||||
**Decision**: ✅ PASS
|
||||
|
||||
Evidence:
|
||||
|
||||
- P0 Coverage: 100% ✅
|
||||
- P1 Coverage: 100% ✅
|
||||
- P0 Pass Rate: 100% (12/12 tests) ✅
|
||||
- P1 Pass Rate: 98% (45/46 tests) ✅
|
||||
- Overall Pass Rate: 96% ✅
|
||||
|
||||
Next Steps:
|
||||
|
||||
1. Deploy to staging
|
||||
2. Monitor for 24 hours
|
||||
3. Deploy to production
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Example 2: Gap Identification with CONCERNS Decision
|
||||
|
||||
```bash
|
||||
# Find gaps and evaluate readiness
|
||||
bmad tea *trace --story-file "bmad/output/story-2.1.md" \
|
||||
--test-results "ci-artifacts/test-report.xml"
|
||||
```
|
||||
|
||||
**Phase 1 Output:**
|
||||
|
||||
```markdown
|
||||
## Gap Analysis
|
||||
|
||||
### Critical Gaps (BLOCKER)
|
||||
|
||||
- None ✅
|
||||
|
||||
### High Priority Gaps (PR BLOCKER)
|
||||
|
||||
1. **AC-3: Password reset email edge cases**
|
||||
   - Recommend: Add 2.1-API-001 (email service integration)
|
||||
- Impact: Users may not recover accounts in error scenarios
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 2.1
|
||||
|
||||
**Decision**: ⚠️ CONCERNS
|
||||
|
||||
Evidence:
|
||||
|
||||
- P0 Coverage: 100% ✅
|
||||
- P1 Coverage: 88% ⚠️ (below 90%)
|
||||
- Test Pass Rate: 96% ✅
|
||||
|
||||
Residual Risks:
|
||||
|
||||
- AC-3 missing E2E test for email error handling
|
||||
|
||||
Next Steps:
|
||||
|
||||
- Deploy with monitoring
|
||||
- Create follow-up story for AC-3 test
|
||||
- Monitor production for edge cases
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Example 3: Critical Blocker with FAIL Decision
|
||||
|
||||
```bash
|
||||
# Critical issues detected
|
||||
bmad tea *trace --story-file "bmad/output/story-3.2.md" \
|
||||
--test-results "ci-artifacts/test-report.xml"
|
||||
```
|
||||
|
||||
**Phase 1 Output:**
|
||||
|
||||
```markdown
|
||||
## Gap Analysis
|
||||
|
||||
### Critical Gaps (BLOCKER)
|
||||
|
||||
1. **AC-2: Invalid login security validation**
|
||||
- Priority: P0
|
||||
- Status: NONE (no tests)
|
||||
- Impact: Security vulnerability - users can bypass login
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Story 3.2
|
||||
|
||||
**Decision**: ❌ FAIL
|
||||
|
||||
Critical Blockers:
|
||||
|
||||
- P0 Coverage: 80% ❌ (AC-2 missing)
|
||||
- Security Risk: Login bypass vulnerability
|
||||
|
||||
Next Steps:
|
||||
|
||||
1. BLOCK DEPLOYMENT IMMEDIATELY
|
||||
2. Add P0 test for AC-2: 3.2-E2E-004
|
||||
3. Re-run full test suite
|
||||
4. Re-run gate after fixes verified
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Example 4: Business Override with WAIVED Decision
|
||||
|
||||
```bash
|
||||
# FAIL with business waiver
|
||||
bmad tea *trace --story-file "bmad/output/release-2.4.0.md" \
|
||||
--test-results "ci-artifacts/test-report.xml" \
|
||||
--allow-waivers true
|
||||
```
|
||||
|
||||
**Phase 2 Output:**
|
||||
|
||||
```markdown
|
||||
# Quality Gate Decision: Release 2.4.0
|
||||
|
||||
**Original Decision**: ❌ FAIL
|
||||
**Final Decision**: 🔓 WAIVED
|
||||
|
||||
Waiver Details:
|
||||
|
||||
- Approver: Jane Doe, VP Engineering
|
||||
- Reason: GDPR compliance deadline (regulatory, Oct 15)
|
||||
- Expiry: 2025-10-15 (does NOT apply to v2.5.0)
|
||||
- Monitoring: Enhanced error tracking
|
||||
- Remediation: Fix in v2.4.1 hotfix (due Oct 20)
|
||||
|
||||
Business Justification:
|
||||
Release contains critical GDPR features required by law. Failed
|
||||
test affects legacy feature used by <1% of users. Workaround available.
|
||||
|
||||
Next Steps:
|
||||
|
||||
1. Deploy v2.4.0 with waiver approval
|
||||
2. Monitor error rates aggressively
|
||||
3. Fix issue in v2.4.1 (Oct 20)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Phase 1 Issues
|
||||
|
||||
#### "No tests found for this story"
|
||||
|
||||
- Run `*atdd` workflow first to generate failing acceptance tests
|
||||
- Check test file naming conventions (may not match story ID pattern)
|
||||
- Verify test directory path is correct (`test_dir` variable)
|
||||
|
||||
#### "Cannot determine coverage status"
|
||||
|
||||
- Tests may lack explicit mapping (no test IDs, unclear describe blocks)
|
||||
- Add test IDs: `{STORY_ID}-{LEVEL}-{SEQ}` (e.g., `1.3-E2E-001`)
|
||||
- Use Given-When-Then narrative in test descriptions
|
||||
|
||||
#### "P0 coverage below 100%"
|
||||
|
||||
- This is a **BLOCKER** - do not release
|
||||
- Identify missing P0 tests in gap analysis
|
||||
- Run `*atdd` workflow to generate missing tests
|
||||
- Verify P0 classification is correct with stakeholders
|
||||
|
||||
#### "Duplicate coverage detected"
|
||||
|
||||
- Review `selective-testing.md` knowledge fragment
|
||||
- Determine if overlap is acceptable (defense in depth) or wasteful
|
||||
- Consolidate tests at appropriate level (logic → unit, journey → E2E)
|
||||
|
||||
### Phase 2 Issues
|
||||
|
||||
#### "Test execution results missing"
|
||||
|
||||
- Phase 2 gate decision requires `test_results` (CI/CD test reports)
|
||||
- If missing, Phase 2 will be skipped with warning
|
||||
- Provide JUnit XML, TAP, or JSON test report path via `test_results` variable
|
||||
|
||||
#### "Gate decision is FAIL but deployment needed urgently"
|
||||
|
||||
- Request business waiver (if `allow_waivers: true`)
|
||||
- Document approver, justification, mitigation plan
|
||||
- Create follow-up stories to address gaps
|
||||
- Use WAIVED decision only for non-P0 gaps
|
||||
- **Never waive**: Security issues, data corruption risks
|
||||
|
||||
#### "Assessments are stale (>7 days old)"
|
||||
|
||||
- Re-run `*test-design` workflow
|
||||
- Re-run traceability (Phase 1)
|
||||
- Re-run `*nfr-assess` workflow
|
||||
- Update evidence files before gate decision
|
||||
|
||||
#### "Unclear decision (edge case)"
|
||||
|
||||
- Switch to manual mode: `decision_mode: manual`
|
||||
- Document assumptions and rationale clearly
|
||||
- Escalate to tech lead or architect for guidance
|
||||
- Consider waiver if business-critical
|
||||
|
||||
---
|
||||
|
||||
## Integration with Other Workflows
|
||||
|
||||
### Before Trace
|
||||
|
||||
1. **testarch-test-design** - Define test priorities (P0/P1/P2/P3)
|
||||
2. **testarch-atdd** - Generate failing acceptance tests
|
||||
3. **testarch-automate** - Expand regression suite
|
||||
|
||||
### After Trace (Phase 2 Decision)
|
||||
|
||||
- **PASS**: Proceed to deployment workflow
|
||||
- **CONCERNS**: Deploy with monitoring, create remediation backlog stories
|
||||
- **FAIL**: Block deployment, fix issues, re-run trace workflow
|
||||
- **WAIVED**: Deploy with business approval, escalate monitoring
|
||||
|
||||
### Complements
|
||||
|
||||
- `*trace` → **testarch-nfr-assess** - Use NFR validation in gate decision
|
||||
- `*trace` → **testarch-test-review** - Flag quality issues for review
|
||||
- **CI/CD Pipeline** - Use gate YAML for automated quality gates
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Phase 1 - Traceability
|
||||
|
||||
1. **Run Trace After Test Implementation**
|
||||
- Don't run `*trace` before tests exist (run `*atdd` first)
|
||||
- Trace is most valuable after initial test suite is written
|
||||
|
||||
2. **Prioritize by Risk**
|
||||
- P0 gaps are BLOCKERS (must fix before release)
|
||||
- P1 gaps are HIGH priority (block PR merge)
|
||||
- P3 gaps are acceptable (fix if time permits)
|
||||
|
||||
3. **Explicit Mapping**
|
||||
- Use test IDs (`1.3-E2E-001`) for clear traceability
|
||||
- Reference criteria in describe blocks
|
||||
- Use Given-When-Then narrative
|
||||
|
||||
4. **Avoid Duplicate Coverage**
|
||||
- Test each behavior at appropriate level only
|
||||
- Unit tests for logic, E2E for journeys
|
||||
- Only overlap for defense in depth on critical paths
|
||||
|
||||
### Phase 2 - Gate Decision
|
||||
|
||||
5. **Evidence is King**
|
||||
- Never make gate decisions without fresh test results
|
||||
- Validate evidence freshness (<7 days old)
|
||||
- Link to all evidence sources (reports, logs, artifacts)
|
||||
|
||||
6. **P0 is Sacred**
|
||||
- P0 failures ALWAYS result in FAIL (no exceptions except waivers)
|
||||
- P0 = Critical user journeys, security, data integrity
|
||||
- Waivers require VP/CTO approval + business justification
|
||||
|
||||
7. **Waivers are Temporary**
|
||||
- Waiver applies ONLY to specific release
|
||||
- Issue must be fixed in next release
|
||||
- Never waive: security, data corruption, compliance violations
|
||||
|
||||
8. **CONCERNS is Not PASS**
|
||||
- CONCERNS means "deploy with monitoring"
|
||||
- Create follow-up stories for issues
|
||||
- Do not ignore CONCERNS repeatedly
|
||||
|
||||
9. **Automate Gate Integration**
|
||||
- Enable `generate_gate_yaml` for CI/CD integration
|
||||
- Use YAML snippets in pipeline quality gates
|
||||
- Export metrics for dashboard visualization
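
One way to consume the generated artifact in a pipeline is a small guard script. A sketch only; the artifact path and the `decision` field name are assumptions about the generated YAML:

```typescript
import { readFileSync } from 'node:fs';
import { parse } from 'yaml'; // npm "yaml" package

// Fails the CI job unless the gate decision allows deployment.
// Path and field name are illustrative; adjust to the actual generated artifact.
const gate = parse(readFileSync('bmad/output/gate-decision-story-1.3.yaml', 'utf8'));
const decision: string = gate.decision; // assumed field name

if (decision === 'FAIL') {
  console.error(`Quality gate failed: ${decision}`);
  process.exit(1);
}
console.log(`Quality gate: ${decision} - proceeding`);
```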
|
||||
|
||||
---
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Strict Gate (Zero Tolerance)
|
||||
|
||||
```yaml
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 100
|
||||
min_overall_coverage: 90
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 100
|
||||
min_overall_pass_rate: 95
|
||||
allow_waivers: false
|
||||
max_security_issues: 0
|
||||
max_critical_nfrs_fail: 0
|
||||
```
|
||||
|
||||
Use for: Financial systems, healthcare, security-critical features
|
||||
|
||||
---
|
||||
|
||||
### Balanced Gate (Production Standard - Default)
|
||||
|
||||
```yaml
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 90
|
||||
min_overall_coverage: 80
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 95
|
||||
min_overall_pass_rate: 90
|
||||
allow_waivers: true
|
||||
max_security_issues: 0
|
||||
max_critical_nfrs_fail: 0
|
||||
```
|
||||
|
||||
Use for: Most production releases
|
||||
|
||||
---
|
||||
|
||||
### Relaxed Gate (Early Development)
|
||||
|
||||
```yaml
|
||||
min_p0_coverage: 100
|
||||
min_p1_coverage: 80
|
||||
min_overall_coverage: 70
|
||||
min_p0_pass_rate: 100
|
||||
min_p1_pass_rate: 85
|
||||
min_overall_pass_rate: 80
|
||||
allow_waivers: true
|
||||
allow_p2_failures: true
|
||||
allow_p3_failures: true
|
||||
```
|
||||
|
||||
Use for: Alpha/beta releases, internal tools, proof-of-concept
|
||||
|
||||
---
|
||||
|
||||
## Related Commands
|
||||
|
||||
- `bmad tea *test-design` - Define test priorities and risk assessment
|
||||
- `bmad tea *atdd` - Generate failing acceptance tests for gaps
|
||||
- `bmad tea *automate` - Expand regression suite based on gaps
|
||||
- `bmad tea *nfr-assess` - Validate non-functional requirements (for gate)
|
||||
- `bmad tea *test-review` - Review test quality issues flagged by trace
|
||||
- `bmad sm story-done` - Mark story as complete (triggers gate)
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- [Instructions](./instructions.md) - Detailed workflow steps (both phases)
|
||||
- [Checklist](./checklist.md) - Validation checklist
|
||||
- [Template](./trace-template.md) - Traceability matrix template
|
||||
- [Knowledge Base](../../testarch/knowledge/) - Testing best practices
|
||||
|
||||
---
|
||||
|
||||
<!-- Powered by BMAD-CORE™ -->
|
||||
654
bmad/bmm/workflows/testarch/trace/checklist.md
Normal file
654
bmad/bmm/workflows/testarch/trace/checklist.md
Normal file
@@ -0,0 +1,654 @@
|
||||
# Requirements Traceability & Gate Decision - Validation Checklist
|
||||
|
||||
**Workflow:** `testarch-trace`
|
||||
**Purpose:** Ensure complete traceability matrix with actionable gap analysis AND make deployment readiness decision (PASS/CONCERNS/FAIL/WAIVED)
|
||||
|
||||
This checklist covers **two sequential phases**:
|
||||
|
||||
- **PHASE 1**: Requirements Traceability (always executed)
|
||||
- **PHASE 2**: Quality Gate Decision (executed if `enable_gate_decision: true`)
|
||||
|
||||
---
|
||||
|
||||
# PHASE 1: REQUIREMENTS TRACEABILITY
|
||||
|
||||
## Prerequisites Validation
|
||||
|
||||
- [ ] Acceptance criteria are available (from story file OR inline)
|
||||
- [ ] Test suite exists (or gaps are acknowledged and documented)
|
||||
- [ ] Test directory path is correct (`test_dir` variable)
|
||||
- [ ] Story file is accessible (if using BMad mode)
|
||||
- [ ] Knowledge base is loaded (test-priorities, traceability, risk-governance)
|
||||
|
||||
---
|
||||
|
||||
## Context Loading
|
||||
|
||||
- [ ] Story file read successfully (if applicable)
|
||||
- [ ] Acceptance criteria extracted correctly
|
||||
- [ ] Story ID identified (e.g., 1.3)
|
||||
- [ ] `test-design.md` loaded (if available)
|
||||
- [ ] `tech-spec.md` loaded (if available)
|
||||
- [ ] `PRD.md` loaded (if available)
|
||||
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`
|
||||
|
||||
---
|
||||
|
||||
## Test Discovery and Cataloging
|
||||
|
||||
- [ ] Tests auto-discovered using multiple strategies (test IDs, describe blocks, file paths)
|
||||
- [ ] Tests categorized by level (E2E, API, Component, Unit)
|
||||
- [ ] Test metadata extracted:
|
||||
- [ ] Test IDs (e.g., 1.3-E2E-001)
|
||||
- [ ] Describe/context blocks
|
||||
- [ ] It blocks (individual test cases)
|
||||
- [ ] Given-When-Then structure (if BDD)
|
||||
- [ ] Priority markers (P0/P1/P2/P3)
|
||||
- [ ] All relevant test files found (no tests missed due to naming conventions)
|
||||
|
||||
---
|
||||
|
||||
## Criteria-to-Test Mapping
|
||||
|
||||
- [ ] Each acceptance criterion mapped to tests (or marked as NONE)
|
||||
- [ ] Explicit references found (test IDs, describe blocks mentioning criterion)
|
||||
- [ ] Test level documented (E2E, API, Component, Unit)
|
||||
- [ ] Given-When-Then narrative verified for alignment
|
||||
- [ ] Traceability matrix table generated:
|
||||
- [ ] Criterion ID
|
||||
- [ ] Description
|
||||
- [ ] Test ID
|
||||
- [ ] Test File
|
||||
- [ ] Test Level
|
||||
- [ ] Coverage Status
|
||||
|
||||
---
|
||||
|
||||
## Coverage Classification
|
||||
|
||||
- [ ] Coverage status classified for each criterion:
|
||||
- [ ] **FULL** - All scenarios validated at appropriate level(s)
|
||||
- [ ] **PARTIAL** - Some coverage but missing edge cases or levels
|
||||
- [ ] **NONE** - No test coverage at any level
|
||||
- [ ] **UNIT-ONLY** - Only unit tests (missing integration/E2E validation)
|
||||
- [ ] **INTEGRATION-ONLY** - Only API/Component tests (missing unit confidence)
|
||||
- [ ] Classification justifications provided
|
||||
- [ ] Edge cases considered in FULL vs PARTIAL determination
|
||||
|
||||
---
|
||||
|
||||
## Duplicate Coverage Detection
|
||||
|
||||
- [ ] Duplicate coverage checked across test levels
|
||||
- [ ] Acceptable overlap identified (defense in depth for critical paths)
|
||||
- [ ] Unacceptable duplication flagged (same validation at multiple levels)
|
||||
- [ ] Recommendations provided for consolidation
|
||||
- [ ] Selective testing principles applied
|
||||
|
||||
---
|
||||
|
||||
## Gap Analysis
|
||||
|
||||
- [ ] Coverage gaps identified:
|
||||
- [ ] Criteria with NONE status
|
||||
- [ ] Criteria with PARTIAL status
|
||||
- [ ] Criteria with UNIT-ONLY status
|
||||
- [ ] Criteria with INTEGRATION-ONLY status
|
||||
- [ ] Gaps prioritized by risk level using test-priorities framework:
|
||||
- [ ] **CRITICAL** - P0 criteria without FULL coverage (BLOCKER)
|
||||
- [ ] **HIGH** - P1 criteria without FULL coverage (PR blocker)
|
||||
- [ ] **MEDIUM** - P2 criteria without FULL coverage (nightly gap)
|
||||
- [ ] **LOW** - P3 criteria without FULL coverage (acceptable)
|
||||
- [ ] Specific test recommendations provided for each gap:
|
||||
- [ ] Suggested test level (E2E, API, Component, Unit)
|
||||
- [ ] Test description (Given-When-Then)
|
||||
- [ ] Recommended test ID (e.g., 1.3-E2E-004)
|
||||
- [ ] Explanation of why test is needed
|
||||
|
||||
---
|
||||
|
||||
## Coverage Metrics
|
||||
|
||||
- [ ] Overall coverage percentage calculated (FULL coverage / total criteria)
|
||||
- [ ] P0 coverage percentage calculated
|
||||
- [ ] P1 coverage percentage calculated
|
||||
- [ ] P2 coverage percentage calculated (if applicable)
|
||||
- [ ] Coverage by level calculated:
|
||||
- [ ] E2E coverage %
|
||||
- [ ] API coverage %
|
||||
- [ ] Component coverage %
|
||||
- [ ] Unit coverage %
|
||||
|
||||
---
|
||||
|
||||
## Test Quality Verification
|
||||
|
||||
For each mapped test, verify:
|
||||
|
||||
- [ ] Explicit assertions are present (not hidden in helpers)
|
||||
- [ ] Test follows Given-When-Then structure
|
||||
- [ ] No hard waits or sleeps (deterministic waiting only)
|
||||
- [ ] Self-cleaning (test cleans up its data)
|
||||
- [ ] File size < 300 lines
|
||||
- [ ] Test duration < 90 seconds
|
||||
|
||||
Quality issues flagged:
|
||||
|
||||
- [ ] **BLOCKER** issues identified (missing assertions, hard waits, flaky patterns)
|
||||
- [ ] **WARNING** issues identified (large files, slow tests, unclear structure)
|
||||
- [ ] **INFO** issues identified (style inconsistencies, missing documentation)
|
||||
|
||||
Knowledge fragments referenced:
|
||||
|
||||
- [ ] `test-quality.md` for Definition of Done
|
||||
- [ ] `fixture-architecture.md` for self-cleaning patterns
|
||||
- [ ] `network-first.md` for Playwright best practices
|
||||
- [ ] `data-factories.md` for test data patterns
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 Deliverables Generated
|
||||
|
||||
### Traceability Matrix Markdown
|
||||
|
||||
- [ ] File created at `{output_folder}/traceability-matrix.md`
|
||||
- [ ] Template from `trace-template.md` used
|
||||
- [ ] Full mapping table included
|
||||
- [ ] Coverage status section included
|
||||
- [ ] Gap analysis section included
|
||||
- [ ] Quality assessment section included
|
||||
- [ ] Recommendations section included
|
||||
|
||||
### Coverage Badge/Metric (if enabled)
|
||||
|
||||
- [ ] Badge markdown generated
|
||||
- [ ] Metrics exported to JSON for CI/CD integration
|
||||
|
||||
### Updated Story File (if enabled)
|
||||
|
||||
- [ ] "Traceability" section added to story markdown
|
||||
- [ ] Link to traceability matrix included
|
||||
- [ ] Coverage summary included
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 Quality Assurance
|
||||
|
||||
### Accuracy Checks
|
||||
|
||||
- [ ] All acceptance criteria accounted for (none skipped)
|
||||
- [ ] Test IDs correctly formatted (e.g., 1.3-E2E-001)
|
||||
- [ ] File paths are correct and accessible
|
||||
- [ ] Coverage percentages calculated correctly
|
||||
- [ ] No false positives (tests incorrectly mapped to criteria)
|
||||
- [ ] No false negatives (existing tests missed in mapping)
|
||||
|
||||
### Completeness Checks
|
||||
|
||||
- [ ] All test levels considered (E2E, API, Component, Unit)
|
||||
- [ ] All priorities considered (P0, P1, P2, P3)
|
||||
- [ ] All coverage statuses used appropriately (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
|
||||
- [ ] All gaps have recommendations
|
||||
- [ ] All quality issues have severity and remediation guidance
|
||||
|
||||
### Actionability Checks
|
||||
|
||||
- [ ] Recommendations are specific (not generic)
|
||||
- [ ] Test IDs suggested for new tests
|
||||
- [ ] Given-When-Then provided for recommended tests
|
||||
- [ ] Impact explained for each gap
|
||||
- [ ] Priorities clear (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 Documentation
|
||||
|
||||
- [ ] Traceability matrix is readable and well-formatted
|
||||
- [ ] Tables render correctly in markdown
|
||||
- [ ] Code blocks have proper syntax highlighting
|
||||
- [ ] Links are valid and accessible
|
||||
- [ ] Recommendations are clear and prioritized
|
||||
|
||||
---
|
||||
|
||||
# PHASE 2: QUALITY GATE DECISION
|
||||
|
||||
**Note**: Phase 2 executes only if `enable_gate_decision: true` in workflow.yaml
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Evidence Gathering
|
||||
|
||||
- [ ] Test execution results obtained (CI/CD pipeline, test framework reports)
|
||||
- [ ] Story/epic/release file identified and read
|
||||
- [ ] Test design document discovered or explicitly provided (if available)
|
||||
- [ ] Traceability matrix discovered or explicitly provided (available from Phase 1)
|
||||
- [ ] NFR assessment discovered or explicitly provided (if available)
|
||||
- [ ] Code coverage report discovered or explicitly provided (if available)
|
||||
- [ ] Burn-in results discovered or explicitly provided (if available)
|
||||
|
||||
### Evidence Validation
|
||||
|
||||
- [ ] Evidence freshness validated (warn if >7 days old, recommend re-running workflows)
|
||||
- [ ] All required assessments available or user acknowledged gaps
|
||||
- [ ] Test results are complete (not partial or interrupted runs)
|
||||
- [ ] Test results match current codebase (not from outdated branch)
|
||||
|
||||
### Knowledge Base Loading
|
||||
|
||||
- [ ] `risk-governance.md` loaded successfully
|
||||
- [ ] `probability-impact.md` loaded successfully
|
||||
- [ ] `test-quality.md` loaded successfully
|
||||
- [ ] `test-priorities.md` loaded successfully
|
||||
- [ ] `ci-burn-in.md` loaded (if burn-in results available)
|
||||
|
||||
---
|
||||
|
||||
## Process Steps
|
||||
|
||||
### Step 1: Context Loading
|
||||
|
||||
- [ ] Gate type identified (story/epic/release/hotfix)
|
||||
- [ ] Target ID extracted (story_id, epic_num, or release_version)
|
||||
- [ ] Decision thresholds loaded from workflow variables
|
||||
- [ ] Risk tolerance configuration loaded
|
||||
- [ ] Waiver policy loaded
|
||||
|
||||
### Step 2: Evidence Parsing
|
||||
|
||||
**Test Results:**
|
||||
|
||||
- [ ] Total test count extracted
|
||||
- [ ] Passed test count extracted
|
||||
- [ ] Failed test count extracted
|
||||
- [ ] Skipped test count extracted
|
||||
- [ ] Test duration extracted
|
||||
- [ ] P0 test pass rate calculated
|
||||
- [ ] P1 test pass rate calculated
|
||||
- [ ] Overall test pass rate calculated
|
||||
|
||||
**Quality Assessments:**
|
||||
|
||||
- [ ] P0/P1/P2/P3 scenarios extracted from test-design.md (if available)
|
||||
- [ ] Risk scores extracted from test-design.md (if available)
|
||||
- [ ] Coverage percentages extracted from traceability-matrix.md (available from Phase 1)
|
||||
- [ ] Coverage gaps extracted from traceability-matrix.md (available from Phase 1)
|
||||
- [ ] NFR status extracted from nfr-assessment.md (if available)
|
||||
- [ ] Security issues count extracted from nfr-assessment.md (if available)
|
||||
|
||||
**Code Coverage:**
|
||||
|
||||
- [ ] Line coverage percentage extracted (if available)
|
||||
- [ ] Branch coverage percentage extracted (if available)
|
||||
- [ ] Function coverage percentage extracted (if available)
|
||||
- [ ] Critical path coverage validated (if available)
|
||||
|
||||
**Burn-in Results:**
|
||||
|
||||
- [ ] Burn-in iterations count extracted (if available)
|
||||
- [ ] Flaky tests count extracted (if available)
|
||||
- [ ] Stability score calculated (if available)
|
||||
|
||||
### Step 3: Decision Rules Application
|
||||
|
||||
**P0 Criteria Evaluation:**
|
||||
|
||||
- [ ] P0 test pass rate evaluated (must be 100%)
|
||||
- [ ] P0 acceptance criteria coverage evaluated (must be 100%)
|
||||
- [ ] Security issues count evaluated (must be 0)
|
||||
- [ ] Critical NFR failures evaluated (must be 0)
|
||||
- [ ] Flaky tests evaluated (must be 0 if burn-in enabled)
|
||||
- [ ] P0 decision recorded: PASS or FAIL
|
||||
|
||||
**P1 Criteria Evaluation:**
|
||||
|
||||
- [ ] P1 test pass rate evaluated (threshold: min_p1_pass_rate)
|
||||
- [ ] P1 acceptance criteria coverage evaluated (threshold: min_p1_coverage, default 90%)
|
||||
- [ ] Overall test pass rate evaluated (threshold: min_overall_pass_rate)
|
||||
- [ ] Code coverage evaluated (threshold: min_coverage)
|
||||
- [ ] P1 decision recorded: PASS or CONCERNS
|
||||
|
||||
**P2/P3 Criteria Evaluation:**
|
||||
|
||||
- [ ] P2 failures tracked (informational, don't block if allow_p2_failures: true)
|
||||
- [ ] P3 failures tracked (informational, don't block if allow_p3_failures: true)
|
||||
- [ ] Residual risks documented
|
||||
|
||||
**Final Decision:**
|
||||
|
||||
- [ ] Decision determined: PASS / CONCERNS / FAIL / WAIVED
|
||||
- [ ] Decision rationale documented
|
||||
- [ ] Decision is deterministic (follows rules, not arbitrary)
|
||||
|
||||
### Step 4: Documentation
|
||||
|
||||
**Gate Decision Document Created:**
|
||||
|
||||
- [ ] Story/epic/release info section complete (ID, title, description, links)
|
||||
- [ ] Decision clearly stated (PASS / CONCERNS / FAIL / WAIVED)
|
||||
- [ ] Decision date recorded
|
||||
- [ ] Evaluator recorded (user or agent name)
|
||||
|
||||
**Evidence Summary Documented:**
|
||||
|
||||
- [ ] Test results summary complete (total, passed, failed, pass rates)
|
||||
- [ ] Coverage summary complete (P0/P1 criteria, code coverage)
|
||||
- [ ] NFR validation summary complete (security, performance, reliability, maintainability)
|
||||
- [ ] Flakiness summary complete (burn-in iterations, flaky test count)
|
||||
|
||||
**Rationale Documented:**
|
||||
|
||||
- [ ] Decision rationale clearly explained
|
||||
- [ ] Key evidence highlighted
|
||||
- [ ] Assumptions and caveats noted (if any)
|
||||
|
||||
**Residual Risks Documented (if CONCERNS or WAIVED):**
|
||||
|
||||
- [ ] Unresolved P1/P2 issues listed
|
||||
- [ ] Probability × impact estimated for each risk
|
||||
- [ ] Mitigations or workarounds described
|
||||
|
||||
**Waivers Documented (if WAIVED):**
|
||||
|
||||
- [ ] Waiver reason documented (business justification)
|
||||
- [ ] Waiver approver documented (name, role)
|
||||
- [ ] Waiver expiry date documented
|
||||
- [ ] Remediation plan documented (fix in next release, due date)
|
||||
- [ ] Monitoring plan documented
|
||||
|
||||
**Critical Issues Documented (if FAIL or CONCERNS):**
|
||||
|
||||
- [ ] Top 5-10 critical issues listed
|
||||
- [ ] Priority assigned to each issue (P0/P1/P2)
|
||||
- [ ] Owner assigned to each issue
|
||||
- [ ] Due date assigned to each issue
|
||||
|
||||
**Recommendations Documented:**
|
||||
|
||||
- [ ] Next steps clearly stated for decision type
|
||||
- [ ] Deployment recommendation provided
|
||||
- [ ] Monitoring recommendations provided (if applicable)
|
||||
- [ ] Remediation recommendations provided (if applicable)
|
||||
|
||||
### Step 5: Status Updates and Notifications
|
||||
|
||||
**Status File Updated:**
|
||||
|
||||
- [ ] Gate decision appended to bmm-workflow-status.md (if append_to_history: true)
|
||||
- [ ] Format correct: `[DATE] Gate Decision: DECISION - Target {ID} - {rationale}`
|
||||
- [ ] Status file committed or staged for commit
|
||||
|
||||
**Gate YAML Created:**

- [ ] Gate YAML snippet generated with decision and criteria
- [ ] Evidence references included in YAML
- [ ] Next steps included in YAML
- [ ] YAML file saved to output folder

**Stakeholder Notification Generated:**

- [ ] Notification subject line created
- [ ] Notification body created with summary
- [ ] Recipients identified (PM, SM, DEV lead, stakeholders)
- [ ] Notification ready for delivery (if notify_stakeholders: true)

**Outputs Saved:**

- [ ] Gate decision document saved to `{output_file}`
- [ ] Gate YAML saved to `{output_folder}/gate-decision-{target}.yaml`
- [ ] All outputs are valid and readable

---

## Phase 2 Output Validation

### Gate Decision Document

**Completeness:**

- [ ] All required sections present (info, decision, evidence, rationale, next steps)
- [ ] No placeholder text or TODOs left in document
- [ ] All evidence references are accurate and complete
- [ ] All links to artifacts are valid

**Accuracy:**

- [ ] Decision matches applied criteria rules
- [ ] Test results match CI/CD pipeline output
- [ ] Coverage percentages match reports
- [ ] NFR status matches assessment document
- [ ] No contradictions or inconsistencies

**Clarity:**

- [ ] Decision rationale is clear and unambiguous
- [ ] Technical jargon is explained or avoided
- [ ] Stakeholders can understand next steps
- [ ] Recommendations are actionable

### Gate YAML

**Format:**

- [ ] YAML is valid (no syntax errors)
- [ ] All required fields present (target, decision, date, evaluator, criteria, evidence)
- [ ] Field values are correct data types (numbers, strings, dates)

**Content:**

- [ ] Criteria values match decision document
- [ ] Evidence references are accurate
- [ ] Next steps align with decision type

---

## Phase 2 Quality Checks

### Decision Integrity

- [ ] Decision is deterministic (follows rules, not arbitrary)
- [ ] P0 failures result in FAIL decision (unless waived)
- [ ] Security issues result in FAIL decision (unless waived - but should never be waived)
- [ ] Waivers have business justification and approver (if WAIVED)
- [ ] Residual risks are documented (if CONCERNS or WAIVED)

### Evidence-Based

- [ ] Decision is based on actual test results (not guesses)
- [ ] All claims are supported by evidence
- [ ] No assumptions without documentation
- [ ] Evidence sources are cited (CI run IDs, report URLs)

### Transparency

- [ ] Decision rationale is transparent and auditable
- [ ] Criteria evaluation is documented step-by-step
- [ ] Any deviations from standard process are explained
- [ ] Waiver justifications are clear (if applicable)

### Consistency

- [ ] Decision aligns with risk-governance knowledge fragment
- [ ] Priority framework (P0/P1/P2/P3) applied consistently
- [ ] Terminology consistent with test-quality knowledge fragment
- [ ] Decision matrix followed correctly

---

## Phase 2 Integration Points

### BMad Workflow Status

- [ ] Gate decision added to `bmm-workflow-status.md`
- [ ] Format matches existing gate history entries
- [ ] Timestamp is accurate
- [ ] Decision summary is concise (<80 chars)

### CI/CD Pipeline

- [ ] Gate YAML is CI/CD-compatible
- [ ] YAML can be parsed by pipeline automation
- [ ] Decision can be used to block/allow deployments (see the sketch below)
- [ ] Evidence references are accessible to pipeline

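As one illustration of the block/allow item above, a pipeline step could gate on the generated file. This is a minimal sketch, assuming PyYAML is available and that the standalone `gate-decision-{target}.yaml` carries a `decision` field like the `gate_decision` block shown in the trace template; adjust path and keys to your output.

```python
# check_gate.py - minimal sketch: fail the pipeline when the gate decision blocks deployment.
import sys
import yaml

# Per this workflow, only FAIL blocks by default; tune to your team's policy.
ALLOWED = {"PASS", "CONCERNS", "WAIVED"}

with open(sys.argv[1]) as f:
    doc = yaml.safe_load(f)

# Standalone layout is an assumption; fall back to the nested form from the trace template.
gate = doc.get("gate_decision") or doc.get("traceability_and_gate", {}).get("gate_decision") or doc
decision = gate.get("decision")

print(f"Gate decision: {decision}")
sys.exit(0 if decision in ALLOWED else 1)
```

Invoked as, for example, `python check_gate.py output/gate-decision-1.3.yaml`, a non-zero exit code blocks the deploy job.
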
### Stakeholders

- [ ] Notification message is clear and actionable
- [ ] Decision is explained in non-technical terms
- [ ] Next steps are specific and time-bound
- [ ] Recipients are appropriate for decision type

---

## Phase 2 Compliance and Audit

### Audit Trail

- [ ] Decision date and time recorded
- [ ] Evaluator identified (user or agent)
- [ ] All evidence sources cited
- [ ] Decision criteria documented
- [ ] Rationale clearly explained

### Traceability

- [ ] Gate decision traceable to story/epic/release
- [ ] Evidence traceable to specific test runs
- [ ] Assessments traceable to workflows that created them
- [ ] Waiver traceable to approver (if applicable)

### Compliance

- [ ] Security requirements validated (no unresolved vulnerabilities)
- [ ] Quality standards met or waived with justification
- [ ] Regulatory requirements addressed (if applicable)
- [ ] Documentation sufficient for external audit

---

## Phase 2 Edge Cases and Exceptions

### Missing Evidence

- [ ] If test-design.md missing, decision still possible with test results + trace
- [ ] If traceability-matrix.md missing, decision still possible with test results (but Phase 1 should provide it)
- [ ] If nfr-assessment.md missing, NFR validation marked as NOT ASSESSED
- [ ] If code coverage missing, coverage criterion marked as NOT ASSESSED
- [ ] User acknowledged gaps in evidence or provided alternative proof

### Stale Evidence

- [ ] Evidence freshness checked (if validate_evidence_freshness: true)
- [ ] Warnings issued for assessments >7 days old
- [ ] User acknowledged stale evidence or re-ran workflows
- [ ] Decision document notes any stale evidence used

### Conflicting Evidence

- [ ] Conflicts between test results and assessments resolved
- [ ] Most recent/authoritative source identified
- [ ] Conflict resolution documented in decision rationale
- [ ] User consulted if conflict cannot be resolved

### Waiver Scenarios

- [ ] Waiver only used for FAIL decision (not PASS or CONCERNS)
- [ ] Waiver has business justification (not technical convenience)
- [ ] Waiver has named approver with authority (VP/CTO/PO)
- [ ] Waiver has expiry date (does NOT apply to future releases)
- [ ] Waiver has remediation plan with concrete due date
- [ ] Security vulnerabilities are NOT waived (enforced)

---

# FINAL VALIDATION (Both Phases)

## Non-Prescriptive Validation

- [ ] Traceability format adapted to team needs (not rigid template)
- [ ] Examples are minimal and focused on patterns
- [ ] Teams can extend with custom classifications
- [ ] Integration with external systems supported (JIRA, Azure DevOps)
- [ ] Compliance requirements considered (if applicable)

---

## Documentation and Communication

- [ ] All documents are readable and well-formatted
- [ ] Tables render correctly in markdown
- [ ] Code blocks have proper syntax highlighting
- [ ] Links are valid and accessible
- [ ] Recommendations are clear and prioritized
- [ ] Gate decision is prominent and unambiguous (Phase 2)

---

## Final Validation

**Phase 1 (Traceability):**

- [ ] All prerequisites met
- [ ] All acceptance criteria mapped or gaps documented
- [ ] P0 coverage is 100% OR documented as BLOCKER
- [ ] Gap analysis is complete and prioritized
- [ ] Test quality issues identified and flagged
- [ ] Deliverables generated and saved

**Phase 2 (Gate Decision):**

- [ ] All quality evidence gathered
- [ ] Decision criteria applied correctly
- [ ] Decision rationale documented
- [ ] Gate YAML ready for CI/CD integration
- [ ] Status file updated (if enabled)
- [ ] Stakeholders notified (if enabled)

**Workflow Complete:**

- [ ] Phase 1 completed successfully
- [ ] Phase 2 completed successfully (if enabled)
- [ ] All outputs validated and saved
- [ ] Ready to proceed based on gate decision

---

## Sign-Off

**Phase 1 - Traceability Status:**

- [ ] ✅ PASS - All quality gates met, no critical gaps
- [ ] ⚠️ WARN - P1 gaps exist, address before PR merge
- [ ] ❌ FAIL - P0 gaps exist, BLOCKER for release

**Phase 2 - Gate Decision Status (if enabled):**

- [ ] ✅ PASS - Deploy to production
- [ ] ⚠️ CONCERNS - Deploy with monitoring
- [ ] ❌ FAIL - Block deployment, fix issues
- [ ] 🔓 WAIVED - Deploy with business approval and remediation plan

**Next Actions:**

- If PASS (both phases): Proceed to deployment
- If WARN/CONCERNS: Address gaps/issues, proceed with monitoring
- If FAIL (either phase): Run `*atdd` for missing tests, fix issues, re-run `*trace`
- If WAIVED: Deploy with approved waiver, schedule remediation

---

## Notes

Record any issues, deviations, or important observations during workflow execution:

- **Phase 1 Issues**: [Note any traceability mapping challenges, missing tests, quality concerns]
- **Phase 2 Issues**: [Note any missing, stale, or conflicting evidence]
- **Decision Rationale**: [Document any nuanced reasoning or edge cases]
- **Waiver Details**: [Document waiver negotiations or approvals]
- **Follow-up Actions**: [List any actions required after gate decision]

---

<!-- Powered by BMAD-CORE™ -->
1045
bmad/bmm/workflows/testarch/trace/instructions.md
Normal file
File diff suppressed because it is too large

673
bmad/bmm/workflows/testarch/trace/trace-template.md
Normal file
@@ -0,0 +1,673 @@

# Traceability Matrix & Gate Decision - Story {STORY_ID}

**Story:** {STORY_TITLE}
**Date:** {DATE}
**Evaluator:** {user_name or TEA Agent}

---

## PHASE 1: REQUIREMENTS TRACEABILITY

### Coverage Summary

| Priority  | Total Criteria | FULL Coverage | Coverage % | Status       |
| --------- | -------------- | ------------- | ---------- | ------------ |
| P0        | {P0_TOTAL}     | {P0_FULL}     | {P0_PCT}%  | {P0_STATUS}  |
| P1        | {P1_TOTAL}     | {P1_FULL}     | {P1_PCT}%  | {P1_STATUS}  |
| P2        | {P2_TOTAL}     | {P2_FULL}     | {P2_PCT}%  | {P2_STATUS}  |
| P3        | {P3_TOTAL}     | {P3_FULL}     | {P3_PCT}%  | {P3_STATUS}  |
| **Total** | **{TOTAL}**    | **{FULL}**    | **{PCT}%** | **{STATUS}** |

**Legend:**

- ✅ PASS - Coverage meets quality gate threshold
- ⚠️ WARN - Coverage below threshold but not critical
- ❌ FAIL - Coverage below minimum threshold (blocker)

---

### Detailed Mapping

#### {CRITERION_ID}: {CRITERION_DESCRIPTION} ({PRIORITY})

- **Coverage:** {COVERAGE_STATUS} {STATUS_ICON}
- **Tests:**
  - `{TEST_ID}` - {TEST_FILE}:{LINE}
    - **Given:** {GIVEN}
    - **When:** {WHEN}
    - **Then:** {THEN}
  - `{TEST_ID_2}` - {TEST_FILE_2}:{LINE}
    - **Given:** {GIVEN_2}
    - **When:** {WHEN_2}
    - **Then:** {THEN_2}

- **Gaps:** (if PARTIAL or UNIT-ONLY or INTEGRATION-ONLY)
  - Missing: {MISSING_SCENARIO_1}
  - Missing: {MISSING_SCENARIO_2}

- **Recommendation:** {RECOMMENDATION_TEXT}

---

#### Example: AC-1: User can login with email and password (P0)

- **Coverage:** FULL ✅
- **Tests:**
  - `1.3-E2E-001` - tests/e2e/auth.spec.ts:12
    - **Given:** User has valid credentials
    - **When:** User submits login form
    - **Then:** User is redirected to dashboard
  - `1.3-UNIT-001` - tests/unit/auth-service.spec.ts:8
    - **Given:** Valid email and password hash
    - **When:** validateCredentials is called
    - **Then:** Returns user object

---

#### Example: AC-3: User can reset password via email (P1)

- **Coverage:** PARTIAL ⚠️
- **Tests:**
  - `1.3-E2E-003` - tests/e2e/auth.spec.ts:44
    - **Given:** User requests password reset
    - **When:** User clicks reset link in email
    - **Then:** User can set new password

- **Gaps:**
  - Missing: Email delivery validation
  - Missing: Expired token handling (error path)
  - Missing: Invalid token handling (security test)
  - Missing: Unit test for token generation logic

- **Recommendation:** Add `1.3-API-001` for email service integration testing and `1.3-UNIT-003` for token generation logic. Add `1.3-E2E-004` for error path validation (expired/invalid tokens).

---

### Gap Analysis

#### Critical Gaps (BLOCKER) ❌

{CRITICAL_GAP_COUNT} gaps found. **Do not release until resolved.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P0)
   - Current Coverage: {COVERAGE_STATUS}
   - Missing Tests: {MISSING_TEST_DESCRIPTION}
   - Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
   - Impact: {IMPACT_DESCRIPTION}

---

#### High Priority Gaps (PR BLOCKER) ⚠️

{HIGH_GAP_COUNT} gaps found. **Address before PR merge.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P1)
   - Current Coverage: {COVERAGE_STATUS}
   - Missing Tests: {MISSING_TEST_DESCRIPTION}
   - Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
   - Impact: {IMPACT_DESCRIPTION}

---

#### Medium Priority Gaps (Nightly) ⚠️

{MEDIUM_GAP_COUNT} gaps found. **Address in nightly test improvements.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P2)
   - Current Coverage: {COVERAGE_STATUS}
   - Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})

---

#### Low Priority Gaps (Optional) ℹ️

{LOW_GAP_COUNT} gaps found. **Optional - add if time permits.**

1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P3)
   - Current Coverage: {COVERAGE_STATUS}

---

### Quality Assessment

#### Tests with Issues

**BLOCKER Issues** ❌

- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}

**WARNING Issues** ⚠️

- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}

**INFO Issues** ℹ️

- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}

---

#### Example Quality Issues

**WARNING Issues** ⚠️

- `1.3-E2E-001` - 145 seconds (exceeds 90s target) - Optimize fixture setup to reduce test duration
- `1.3-UNIT-005` - 320 lines (exceeds 300 line limit) - Split into multiple focused test files

**INFO Issues** ℹ️

- `1.3-E2E-002` - Missing Given-When-Then structure - Refactor describe block to use BDD format

---

#### Tests Passing Quality Gates

**{PASSING_TEST_COUNT}/{TOTAL_TEST_COUNT} tests ({PASSING_PCT}%) meet all quality criteria** ✅

---

### Duplicate Coverage Analysis

#### Acceptable Overlap (Defense in Depth)

- {CRITERION_ID}: Tested at unit (business logic) and E2E (user journey) ✅

#### Unacceptable Duplication ⚠️

- {CRITERION_ID}: Same validation at E2E and Component level
  - Recommendation: Remove {TEST_ID} or consolidate with {OTHER_TEST_ID}

---

### Coverage by Test Level

| Test Level | Tests             | Criteria Covered     | Coverage %       |
| ---------- | ----------------- | -------------------- | ---------------- |
| E2E        | {E2E_COUNT}       | {E2E_CRITERIA}       | {E2E_PCT}%       |
| API        | {API_COUNT}       | {API_CRITERIA}       | {API_PCT}%       |
| Component  | {COMP_COUNT}      | {COMP_CRITERIA}      | {COMP_PCT}%      |
| Unit       | {UNIT_COUNT}      | {UNIT_CRITERIA}      | {UNIT_PCT}%      |
| **Total**  | **{TOTAL_TESTS}** | **{TOTAL_CRITERIA}** | **{TOTAL_PCT}%** |

---

### Traceability Recommendations

#### Immediate Actions (Before PR Merge)

1. **{ACTION_1}** - {DESCRIPTION}
2. **{ACTION_2}** - {DESCRIPTION}

#### Short-term Actions (This Sprint)

1. **{ACTION_1}** - {DESCRIPTION}
2. **{ACTION_2}** - {DESCRIPTION}

#### Long-term Actions (Backlog)

1. **{ACTION_1}** - {DESCRIPTION}

---

#### Example Recommendations

**Immediate Actions (Before PR Merge)**

1. **Add P1 Password Reset Tests** - Implement `1.3-API-001` for email service integration and `1.3-E2E-004` for error path validation. P1 coverage currently at 80%, target is 90%.
2. **Optimize Slow E2E Test** - Refactor `1.3-E2E-001` to use faster fixture setup. Currently 145s, target is <90s.

**Short-term Actions (This Sprint)**

1. **Enhance P2 Coverage** - Add E2E validation for session timeout (`1.3-E2E-005`). Currently UNIT-ONLY coverage.
2. **Split Large Test File** - Break `1.3-UNIT-005` (320 lines) into multiple focused test files (<300 lines each).

**Long-term Actions (Backlog)**

1. **Enrich P3 Coverage** - Add tests for edge cases in P3 criteria if time permits.

---

## PHASE 2: QUALITY GATE DECISION

**Gate Type:** {story | epic | release | hotfix}
**Decision Mode:** {deterministic | manual}

---

### Evidence Summary

#### Test Execution Results

- **Total Tests**: {total_count}
- **Passed**: {passed_count} ({pass_percentage}%)
- **Failed**: {failed_count} ({fail_percentage}%)
- **Skipped**: {skipped_count} ({skip_percentage}%)
- **Duration**: {total_duration}

**Priority Breakdown:**

- **P0 Tests**: {p0_passed}/{p0_total} passed ({p0_pass_rate}%) {✅ | ❌}
- **P1 Tests**: {p1_passed}/{p1_total} passed ({p1_pass_rate}%) {✅ | ⚠️ | ❌}
- **P2 Tests**: {p2_passed}/{p2_total} passed ({p2_pass_rate}%) {informational}
- **P3 Tests**: {p3_passed}/{p3_total} passed ({p3_pass_rate}%) {informational}

**Overall Pass Rate**: {overall_pass_rate}% {✅ | ⚠️ | ❌}

**Test Results Source**: {CI_run_id | test_report_url | local_run}

---

#### Coverage Summary (from Phase 1)

**Requirements Coverage:**

- **P0 Acceptance Criteria**: {p0_covered}/{p0_total} covered ({p0_coverage}%) {✅ | ❌}
- **P1 Acceptance Criteria**: {p1_covered}/{p1_total} covered ({p1_coverage}%) {✅ | ⚠️ | ❌}
- **P2 Acceptance Criteria**: {p2_covered}/{p2_total} covered ({p2_coverage}%) {informational}
- **Overall Coverage**: {overall_coverage}%

**Code Coverage** (if available):

- **Line Coverage**: {line_coverage}% {✅ | ⚠️ | ❌}
- **Branch Coverage**: {branch_coverage}% {✅ | ⚠️ | ❌}
- **Function Coverage**: {function_coverage}% {✅ | ⚠️ | ❌}

**Coverage Source**: {coverage_report_url | coverage_file_path}

---

#### Non-Functional Requirements (NFRs)

**Security**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}

- Security Issues: {security_issue_count}
- {details_if_issues}

**Performance**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}

- {performance_metrics_summary}

**Reliability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}

- {reliability_metrics_summary}

**Maintainability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}

- {maintainability_metrics_summary}

**NFR Source**: {nfr_assessment_file_path | not_assessed}

---

#### Flakiness Validation

**Burn-in Results** (if available):

- **Burn-in Iterations**: {iteration_count} (e.g., 10)
- **Flaky Tests Detected**: {flaky_test_count} {✅ if 0 | ❌ if >0}
- **Stability Score**: {stability_percentage}%

**Flaky Tests List** (if any):

- {flaky_test_1_name} - {failure_rate}
- {flaky_test_2_name} - {failure_rate}

**Burn-in Source**: {CI_burn_in_run_id | not_available}

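Where CI does not already produce the burn-in numbers above, they can be generated by re-running the suite a fixed number of times and counting failing iterations. A minimal sketch, assuming a Playwright project (substitute your own test command):

```python
# burn_in.py - minimal sketch: repeat the suite and report a rough stability score.
import subprocess
import sys

TEST_CMD = ["npx", "playwright", "test"]  # assumption: Playwright harness; swap for your runner
ITERATIONS = 10

failures = 0
for i in range(1, ITERATIONS + 1):
    result = subprocess.run(TEST_CMD)
    status = "FAILED" if result.returncode != 0 else "passed"
    if result.returncode != 0:
        failures += 1
    print(f"Iteration {i}/{ITERATIONS}: {status}")

print(f"Stability: {(ITERATIONS - failures) / ITERATIONS:.0%} ({failures} failing iterations)")
sys.exit(1 if failures else 0)
```
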
---

### Decision Criteria Evaluation

#### P0 Criteria (Must ALL Pass)

| Criterion             | Threshold | Actual                    | Status               |
| --------------------- | --------- | ------------------------- | -------------------- |
| P0 Coverage           | 100%      | {p0_coverage}%            | {✅ PASS \| ❌ FAIL} |
| P0 Test Pass Rate     | 100%      | {p0_pass_rate}%           | {✅ PASS \| ❌ FAIL} |
| Security Issues       | 0         | {security_issue_count}    | {✅ PASS \| ❌ FAIL} |
| Critical NFR Failures | 0         | {critical_nfr_fail_count} | {✅ PASS \| ❌ FAIL} |
| Flaky Tests           | 0         | {flaky_test_count}        | {✅ PASS \| ❌ FAIL} |

**P0 Evaluation**: {✅ ALL PASS | ❌ ONE OR MORE FAILED}

---

#### P1 Criteria (Required for PASS, May Accept for CONCERNS)

| Criterion              | Threshold                 | Actual               | Status                              |
| ---------------------- | ------------------------- | -------------------- | ----------------------------------- |
| P1 Coverage            | ≥{min_p1_coverage}%       | {p1_coverage}%       | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| P1 Test Pass Rate      | ≥{min_p1_pass_rate}%      | {p1_pass_rate}%      | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Overall Test Pass Rate | ≥{min_overall_pass_rate}% | {overall_pass_rate}% | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Overall Coverage       | ≥{min_coverage}%          | {overall_coverage}%  | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |

**P1 Evaluation**: {✅ ALL PASS | ⚠️ SOME CONCERNS | ❌ FAILED}

---

#### P2/P3 Criteria (Informational, Don't Block)

| Criterion         | Actual          | Notes                                                        |
| ----------------- | --------------- | ------------------------------------------------------------ |
| P2 Test Pass Rate | {p2_pass_rate}% | {allow_p2_failures ? "Tracked, doesn't block" : "Evaluated"} |
| P3 Test Pass Rate | {p3_pass_rate}% | {allow_p3_failures ? "Tracked, doesn't block" : "Evaluated"} |

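The three tables above are the entire decision procedure in deterministic mode. As a non-normative sketch (the field names mirror the keys used in the YAML snippet later in this template and a single merged criteria-plus-thresholds dict is assumed), the rule reduces to:

```python
# Minimal sketch of the deterministic gate rule; not a fixed API.
def gate_decision(c: dict) -> str:
    # P0 criteria: every one must hold, otherwise the gate fails.
    p0_ok = (c["p0_coverage"] == 100 and c["p0_pass_rate"] == 100
             and c["security_issues"] == 0 and c["critical_nfrs_fail"] == 0
             and c["flaky_tests"] == 0)
    if not p0_ok:
        return "FAIL"
    # P1 criteria: all pass -> PASS; any shortfall -> CONCERNS.
    p1_ok = (c["p1_coverage"] >= c["min_p1_coverage"]
             and c["p1_pass_rate"] >= c["min_p1_pass_rate"]
             and c["overall_pass_rate"] >= c["min_overall_pass_rate"]
             and c["overall_coverage"] >= c["min_coverage"])
    return "PASS" if p1_ok else "CONCERNS"
```

WAIVED is never computed by the rule; it is only recorded when an approved business waiver overrides a FAIL.
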
---

### GATE DECISION: {PASS | CONCERNS | FAIL | WAIVED}

---

### Rationale

{Explain decision based on criteria evaluation}

{Highlight key evidence that drove decision}

{Note any assumptions or caveats}

**Example (PASS):**

> All P0 criteria met with 100% coverage and pass rates across critical tests. All P1 criteria exceeded thresholds with 98% overall pass rate and 92% coverage. No security issues detected. No flaky tests in validation. Feature is ready for production deployment with standard monitoring.

**Example (CONCERNS):**

> All P0 criteria met, ensuring critical user journeys are protected. However, P1 coverage (88%) falls below threshold (90%) due to missing E2E test for AC-5 edge case. Overall pass rate (96%) is excellent. Issues are non-critical and have acceptable workarounds. Risk is low enough to deploy with enhanced monitoring.

**Example (FAIL):**

> CRITICAL BLOCKERS DETECTED:
>
> 1. P0 coverage incomplete (80%) - AC-2 security validation missing
> 2. P0 test failures (75% pass rate) in core search functionality
> 3. Unresolved SQL injection vulnerability in search filter (CRITICAL)
>
> Release MUST BE BLOCKED until P0 issues are resolved. Security vulnerability cannot be waived.

**Example (WAIVED):**

> Original decision was FAIL due to P0 test failure in legacy Excel 2007 export module (affects <1% of users). However, release contains critical GDPR compliance features required by regulatory deadline (Oct 15). Business has approved waiver given:
>
> - Regulatory priority overrides legacy module risk
> - Workaround available (use Excel 2010+)
> - Issue will be fixed in v2.4.1 hotfix (due Oct 20)
> - Enhanced monitoring in place

---

### {Section: Delete if not applicable}

#### Residual Risks (For CONCERNS or WAIVED)

List unresolved P1/P2 issues that don't block release but should be tracked:

1. **{Risk Description}**
   - **Priority**: P1 | P2
   - **Probability**: Low | Medium | High
   - **Impact**: Low | Medium | High
   - **Risk Score**: {probability × impact}
   - **Mitigation**: {workaround or monitoring plan}
   - **Remediation**: {fix in next sprint/release}

**Overall Residual Risk**: {LOW | MEDIUM | HIGH}

---

#### Waiver Details (For WAIVED only)

**Original Decision**: ❌ FAIL

**Reason for Failure**:

- {list_of_blocking_issues}

**Waiver Information**:

- **Waiver Reason**: {business_justification}
- **Waiver Approver**: {name}, {role} (e.g., Jane Doe, VP Engineering)
- **Approval Date**: {YYYY-MM-DD}
- **Waiver Expiry**: {YYYY-MM-DD} (**NOTE**: Does NOT apply to next release)

**Monitoring Plan**:

- {enhanced_monitoring_1}
- {enhanced_monitoring_2}
- {escalation_criteria}

**Remediation Plan**:

- **Fix Target**: {next_release_version} (e.g., v2.4.1 hotfix)
- **Due Date**: {YYYY-MM-DD}
- **Owner**: {team_or_person}
- **Verification**: {how_fix_will_be_verified}

**Business Justification**:
{detailed_explanation_of_why_waiver_is_acceptable}

---

#### Critical Issues (For FAIL or CONCERNS)

Top blockers requiring immediate attention:

| Priority | Issue         | Description         | Owner        | Due Date     | Status             |
| -------- | ------------- | ------------------- | ------------ | ------------ | ------------------ |
| P0       | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
| P0       | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
| P1       | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |

**Blocking Issues Count**: {p0_blocker_count} P0 blockers, {p1_blocker_count} P1 issues

---

### Gate Recommendations

#### For PASS Decision ✅

1. **Proceed to deployment**
   - Deploy to staging environment
   - Validate with smoke tests
   - Monitor key metrics for 24-48 hours
   - Deploy to production with standard monitoring

2. **Post-Deployment Monitoring**
   - {metric_1_to_monitor}
   - {metric_2_to_monitor}
   - {alert_thresholds}

3. **Success Criteria**
   - {success_criterion_1}
   - {success_criterion_2}

---

#### For CONCERNS Decision ⚠️

1. **Deploy with Enhanced Monitoring**
   - Deploy to staging with extended validation period
   - Enable enhanced logging/monitoring for known risk areas:
     - {risk_area_1}
     - {risk_area_2}
   - Set aggressive alerts for potential issues
   - Deploy to production with caution

2. **Create Remediation Backlog**
   - Create story: "{fix_title_1}" (Priority: {priority})
   - Create story: "{fix_title_2}" (Priority: {priority})
   - Target sprint: {next_sprint}

3. **Post-Deployment Actions**
   - Monitor {specific_areas} closely for {time_period}
   - Weekly status updates on remediation progress
   - Re-assess after fixes deployed

---

#### For FAIL Decision ❌

1. **Block Deployment Immediately**
   - Do NOT deploy to any environment
   - Notify stakeholders of blocking issues
   - Escalate to tech lead and PM

2. **Fix Critical Issues**
   - Address P0 blockers listed in Critical Issues section
   - Owner assignments confirmed
   - Due dates agreed upon
   - Daily standup on blocker resolution

3. **Re-Run Gate After Fixes**
   - Re-run full test suite after fixes
   - Re-run `bmad tea *trace` workflow
   - Verify decision is PASS before deploying

---

#### For WAIVED Decision 🔓

1. **Deploy with Business Approval**
   - Confirm waiver approver has signed off
   - Document waiver in release notes
   - Notify all stakeholders of waived risks

2. **Aggressive Monitoring**
   - {enhanced_monitoring_plan}
   - {escalation_procedures}
   - Daily checks on waived risk areas

3. **Mandatory Remediation**
   - Fix MUST be completed by {due_date}
   - Issue CANNOT be waived in next release
   - Track remediation progress weekly
   - Verify fix in next gate

---

### Next Steps

**Immediate Actions** (next 24-48 hours):

1. {action_1}
2. {action_2}
3. {action_3}

**Follow-up Actions** (next sprint/release):

1. {action_1}
2. {action_2}
3. {action_3}

**Stakeholder Communication**:

- Notify PM: {decision_summary}
- Notify SM: {decision_summary}
- Notify DEV lead: {decision_summary}

---

## Integrated YAML Snippet (CI/CD)

```yaml
traceability_and_gate:
  # Phase 1: Traceability
  traceability:
    story_id: "{STORY_ID}"
    date: "{DATE}"
    coverage:
      overall: {OVERALL_PCT}%
      p0: {P0_PCT}%
      p1: {P1_PCT}%
      p2: {P2_PCT}%
      p3: {P3_PCT}%
    gaps:
      critical: {CRITICAL_COUNT}
      high: {HIGH_COUNT}
      medium: {MEDIUM_COUNT}
      low: {LOW_COUNT}
    quality:
      passing_tests: {PASSING_COUNT}
      total_tests: {TOTAL_TESTS}
      blocker_issues: {BLOCKER_COUNT}
      warning_issues: {WARNING_COUNT}
    recommendations:
      - "{RECOMMENDATION_1}"
      - "{RECOMMENDATION_2}"

  # Phase 2: Gate Decision
  gate_decision:
    decision: "{PASS | CONCERNS | FAIL | WAIVED}"
    gate_type: "{story | epic | release | hotfix}"
    decision_mode: "{deterministic | manual}"
    criteria:
      p0_coverage: {p0_coverage}%
      p0_pass_rate: {p0_pass_rate}%
      p1_coverage: {p1_coverage}%
      p1_pass_rate: {p1_pass_rate}%
      overall_pass_rate: {overall_pass_rate}%
      overall_coverage: {overall_coverage}%
      security_issues: {security_issue_count}
      critical_nfrs_fail: {critical_nfr_fail_count}
      flaky_tests: {flaky_test_count}
    thresholds:
      min_p0_coverage: 100
      min_p0_pass_rate: 100
      min_p1_coverage: {min_p1_coverage}
      min_p1_pass_rate: {min_p1_pass_rate}
      min_overall_pass_rate: {min_overall_pass_rate}
      min_coverage: {min_coverage}
    evidence:
      test_results: "{CI_run_id | test_report_url}"
      traceability: "{trace_file_path}"
      nfr_assessment: "{nfr_file_path}"
      code_coverage: "{coverage_report_url}"
    next_steps: "{brief_summary_of_recommendations}"
    waiver: # Only if WAIVED
      reason: "{business_justification}"
      approver: "{name}, {role}"
      expiry: "{YYYY-MM-DD}"
      remediation_due: "{YYYY-MM-DD}"
```

---

## Related Artifacts

- **Story File:** {STORY_FILE_PATH}
- **Test Design:** {TEST_DESIGN_PATH} (if available)
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
- **Test Results:** {TEST_RESULTS_PATH}
- **NFR Assessment:** {NFR_FILE_PATH} (if available)
- **Test Files:** {TEST_DIR_PATH}

---

## Sign-Off

**Phase 1 - Traceability Assessment:**

- Overall Coverage: {OVERALL_PCT}%
- P0 Coverage: {P0_PCT}% {P0_STATUS}
- P1 Coverage: {P1_PCT}% {P1_STATUS}
- Critical Gaps: {CRITICAL_COUNT}
- High Priority Gaps: {HIGH_COUNT}

**Phase 2 - Gate Decision:**

- **Decision**: {PASS | CONCERNS | FAIL | WAIVED} {STATUS_ICON}
- **P0 Evaluation**: {✅ ALL PASS | ❌ ONE OR MORE FAILED}
- **P1 Evaluation**: {✅ ALL PASS | ⚠️ SOME CONCERNS | ❌ FAILED}

**Overall Status:** {STATUS} {STATUS_ICON}

**Next Steps:**

- If PASS ✅: Proceed to deployment
- If CONCERNS ⚠️: Deploy with monitoring, create remediation backlog
- If FAIL ❌: Block deployment, fix critical issues, re-run workflow
- If WAIVED 🔓: Deploy with business approval and aggressive monitoring

**Generated:** {DATE}
**Workflow:** testarch-trace v4.0 (Enhanced with Gate Decision)

---

<!-- Powered by BMAD-CORE™ -->
66
bmad/bmm/workflows/testarch/trace/workflow.yaml
Normal file
@@ -0,0 +1,66 @@

# Test Architect workflow: trace (enhanced with gate decision)
name: testarch-trace
description: "Generate requirements-to-tests traceability matrix, analyze coverage, and make quality gate decision (PASS/CONCERNS/FAIL/WAIVED)"
author: "BMad"

# Critical variables from config
config_source: "{project-root}/bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated

# Workflow components
installed_path: "{project-root}/bmad/bmm/workflows/testarch/trace"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/trace-template.md"

# Variables and inputs
variables:
  # Directory paths
  test_dir: "{project-root}/tests" # Root test directory
  source_dir: "{project-root}/src" # Source code directory

  # Workflow behavior
  coverage_levels: "e2e,api,component,unit" # Which test levels to trace
  gate_type: "story" # story | epic | release | hotfix - determines gate scope
  decision_mode: "deterministic" # deterministic (rule-based) | manual (team decision)

  # Output configuration
  default_output_file: "{output_folder}/traceability-matrix.md"

# Required tools
required_tools:
  - read_file # Read story, test files, BMad artifacts
  - write_file # Create traceability matrix, gate YAML
  - list_files # Discover test files
  - search_repo # Find tests by test ID, describe blocks
  - glob # Find test files matching patterns

# Recommended inputs
recommended_inputs:
  - story: "Story markdown with acceptance criteria (required for BMad mode)"
  - test_files: "Test suite for the feature (auto-discovered if not provided)"
  - test_design: "Test design with risk/priority assessment (required for Phase 2 gate)"
  - tech_spec: "Technical specification (optional)"
  - existing_tests: "Current test suite for analysis"
  - test_results: "CI/CD test execution results (required for Phase 2 gate)"
  - nfr_assess: "Non-functional requirements validation (recommended for release gates)"
  - code_coverage: "Code coverage report (optional)"

tags:
  - qa
  - traceability
  - test-architect
  - coverage
  - requirements
  - gate
  - decision
  - release

execution_hints:
  interactive: false # Minimize prompts
  autonomous: true # Proceed without user input unless blocked
  iterative: true