Files
pig-farm-controller/bmad/bmm/testarch/knowledge/timing-debugging.md
2025-11-01 19:22:39 +08:00

373 lines
14 KiB
Markdown

# Timing Debugging and Race Condition Fixes
## Principle
Race conditions arise when tests make assumptions about asynchronous timing (network, animations, state updates). **Deterministic waiting** eliminates flakiness by explicitly waiting for observable events (network responses, element state changes) instead of arbitrary timeouts.
## Rationale
**The Problem**: Tests pass locally but fail in CI (different timing), or pass/fail randomly (race conditions). Hard waits (`waitForTimeout`, `sleep`) mask timing issues without solving them.
**The Solution**: Replace all hard waits with event-based waits (`waitForResponse`, `waitFor({ state })`). Implement network-first pattern (intercept before navigate). Use explicit state checks (loading spinner detached, data loaded). This makes tests deterministic regardless of network speed or system load.
**Why This Matters**:
- Eliminates flaky tests (0 tolerance for timing-based failures)
- Works consistently across environments (local, CI, production-like)
- Faster test execution (no unnecessary waits)
- Clearer test intent (explicit about what we're waiting for)
## Pattern Examples
### Example 1: Race Condition Identification (Network-First Pattern)
**Context**: Prevent race conditions by intercepting network requests before navigation
**Implementation**:
```typescript
// tests/timing/race-condition-prevention.spec.ts
import { test, expect } from '@playwright/test';
test.describe('Race Condition Prevention Patterns', () => {
test('❌ Anti-Pattern: Navigate then intercept (race condition)', async ({ page, context }) => {
// BAD: Navigation starts before interception ready
await page.goto('/products'); // ⚠️ Race! API might load before route is set
await context.route('**/api/products', (route) => {
route.fulfill({ status: 200, body: JSON.stringify({ products: [] }) });
});
// Test may see real API response or mock (non-deterministic)
});
test('✅ Pattern: Intercept BEFORE navigate (deterministic)', async ({ page, context }) => {
// GOOD: Interception ready before navigation
await context.route('**/api/products', (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
products: [
{ id: 1, name: 'Product A', price: 29.99 },
{ id: 2, name: 'Product B', price: 49.99 },
],
}),
});
});
const responsePromise = page.waitForResponse('**/api/products');
await page.goto('/products'); // Navigation happens AFTER route is ready
await responsePromise; // Explicit wait for network
// Test sees mock response reliably (deterministic)
await expect(page.getByText('Product A')).toBeVisible();
});
test('✅ Pattern: Wait for element state change (loading → loaded)', async ({ page }) => {
await page.goto('/dashboard');
// Wait for loading indicator to appear (confirms load started)
await page.getByTestId('loading-spinner').waitFor({ state: 'visible' });
// Wait for loading indicator to disappear (confirms load complete)
await page.getByTestId('loading-spinner').waitFor({ state: 'detached' });
// Content now reliably visible
await expect(page.getByTestId('dashboard-data')).toBeVisible();
});
test('✅ Pattern: Explicit visibility check (not just presence)', async ({ page }) => {
await page.goto('/modal-demo');
await page.getByRole('button', { name: 'Open Modal' }).click();
// ❌ Bad: Element exists but may not be visible yet
// await expect(page.getByTestId('modal')).toBeAttached()
// ✅ Good: Wait for visibility (accounts for animations)
await expect(page.getByTestId('modal')).toBeVisible();
await expect(page.getByRole('heading', { name: 'Modal Title' })).toBeVisible();
});
test('❌ Anti-Pattern: waitForLoadState("networkidle") in SPAs', async ({ page }) => {
// ⚠️ Deprecated for SPAs (WebSocket connections never idle)
// await page.goto('/dashboard')
// await page.waitForLoadState('networkidle') // May timeout in SPAs
// ✅ Better: Wait for specific API response
const responsePromise = page.waitForResponse('**/api/dashboard');
await page.goto('/dashboard');
await responsePromise;
await expect(page.getByText('Dashboard loaded')).toBeVisible();
});
});
```
**Key Points**:
- Network-first: ALWAYS intercept before navigate (prevents race conditions)
- State changes: Wait for loading spinner detached (explicit load completion)
- Visibility vs presence: `toBeVisible()` accounts for animations, `toBeAttached()` doesn't
- Avoid networkidle: Unreliable in SPAs (WebSocket, polling connections)
- Explicit waits: Document exactly what we're waiting for
---
### Example 2: Deterministic Waiting Patterns (Event-Based, Not Time-Based)
**Context**: Replace all hard waits with observable event waits
**Implementation**:
```typescript
// tests/timing/deterministic-waits.spec.ts
import { test, expect } from '@playwright/test';
test.describe('Deterministic Waiting Patterns', () => {
test('waitForResponse() with URL pattern', async ({ page }) => {
const responsePromise = page.waitForResponse('**/api/products');
await page.goto('/products');
await responsePromise; // Deterministic (waits for exact API call)
await expect(page.getByText('Products loaded')).toBeVisible();
});
test('waitForResponse() with predicate function', async ({ page }) => {
const responsePromise = page.waitForResponse((resp) => resp.url().includes('/api/search') && resp.status() === 200);
await page.goto('/search');
await page.getByPlaceholder('Search').fill('laptop');
await page.getByRole('button', { name: 'Search' }).click();
await responsePromise; // Wait for successful search response
await expect(page.getByTestId('search-results')).toBeVisible();
});
test('waitForFunction() for custom conditions', async ({ page }) => {
await page.goto('/dashboard');
// Wait for custom JavaScript condition
await page.waitForFunction(() => {
const element = document.querySelector('[data-testid="user-count"]');
return element && parseInt(element.textContent || '0') > 0;
});
// User count now loaded
await expect(page.getByTestId('user-count')).not.toHaveText('0');
});
test('waitFor() element state (attached, visible, hidden, detached)', async ({ page }) => {
await page.goto('/products');
// Wait for element to be attached to DOM
await page.getByTestId('product-list').waitFor({ state: 'attached' });
// Wait for element to be visible (animations complete)
await page.getByTestId('product-list').waitFor({ state: 'visible' });
// Perform action
await page.getByText('Product A').click();
// Wait for modal to be hidden (close animation complete)
await page.getByTestId('modal').waitFor({ state: 'hidden' });
});
test('Cypress: cy.wait() with aliased intercepts', async () => {
// Cypress example (not Playwright)
/*
cy.intercept('GET', '/api/products').as('getProducts')
cy.visit('/products')
cy.wait('@getProducts') // Deterministic wait for specific request
cy.get('[data-testid="product-list"]').should('be.visible')
*/
});
});
```
**Key Points**:
- `waitForResponse()`: Wait for specific API calls (URL pattern or predicate)
- `waitForFunction()`: Wait for custom JavaScript conditions
- `waitFor({ state })`: Wait for element state changes (attached, visible, hidden, detached)
- Cypress `cy.wait('@alias')`: Deterministic wait for aliased intercepts
- All waits are event-based (not time-based)
---
### Example 3: Timing Anti-Patterns (What NEVER to Do)
**Context**: Common timing mistakes that cause flakiness
**Problem Examples**:
```typescript
// tests/timing/anti-patterns.spec.ts
import { test, expect } from '@playwright/test';
test.describe('Timing Anti-Patterns to Avoid', () => {
test('❌ NEVER: page.waitForTimeout() (arbitrary delay)', async ({ page }) => {
await page.goto('/dashboard');
// ❌ Bad: Arbitrary 3-second wait (flaky)
// await page.waitForTimeout(3000)
// Problem: Might be too short (CI slower) or too long (wastes time)
// ✅ Good: Wait for observable event
await page.waitForResponse('**/api/dashboard');
await expect(page.getByText('Dashboard loaded')).toBeVisible();
});
test('❌ NEVER: cy.wait(number) without alias (arbitrary delay)', async () => {
// Cypress example
/*
// ❌ Bad: Arbitrary delay
cy.visit('/products')
cy.wait(2000) // Flaky!
// ✅ Good: Wait for specific request
cy.intercept('GET', '/api/products').as('getProducts')
cy.visit('/products')
cy.wait('@getProducts') // Deterministic
*/
});
test('❌ NEVER: Multiple hard waits in sequence (compounding delays)', async ({ page }) => {
await page.goto('/checkout');
// ❌ Bad: Stacked hard waits (6+ seconds wasted)
// await page.waitForTimeout(2000) // Wait for form
// await page.getByTestId('email').fill('test@example.com')
// await page.waitForTimeout(1000) // Wait for validation
// await page.getByTestId('submit').click()
// await page.waitForTimeout(3000) // Wait for redirect
// ✅ Good: Event-based waits (no wasted time)
await page.getByTestId('checkout-form').waitFor({ state: 'visible' });
await page.getByTestId('email').fill('test@example.com');
await page.waitForResponse('**/api/validate-email');
await page.getByTestId('submit').click();
await page.waitForURL('**/confirmation');
});
test('❌ NEVER: waitForLoadState("networkidle") in SPAs', async ({ page }) => {
// ❌ Bad: Unreliable in SPAs (WebSocket connections never idle)
// await page.goto('/dashboard')
// await page.waitForLoadState('networkidle') // Timeout in SPAs!
// ✅ Good: Wait for specific API responses
await page.goto('/dashboard');
await page.waitForResponse('**/api/dashboard');
await page.waitForResponse('**/api/user');
await expect(page.getByTestId('dashboard-content')).toBeVisible();
});
test('❌ NEVER: Sleep/setTimeout in tests', async ({ page }) => {
await page.goto('/products');
// ❌ Bad: Node.js sleep (blocks test thread)
// await new Promise(resolve => setTimeout(resolve, 2000))
// ✅ Good: Playwright auto-waits for element
await expect(page.getByText('Products loaded')).toBeVisible();
});
});
```
**Why These Fail**:
- **Hard waits**: Arbitrary timeouts (too short → flaky, too long → slow)
- **Stacked waits**: Compound delays (wasteful, unreliable)
- **networkidle**: Broken in SPAs (WebSocket/polling never idle)
- **Sleep**: Blocks execution (wastes time, doesn't solve race conditions)
**Better Approach**: Use event-based waits from examples above
---
## Async Debugging Techniques
### Technique 1: Promise Chain Analysis
```typescript
test('debug async waterfall with console logs', async ({ page }) => {
console.log('1. Starting navigation...');
await page.goto('/products');
console.log('2. Waiting for API response...');
const response = await page.waitForResponse('**/api/products');
console.log('3. API responded:', response.status());
console.log('4. Waiting for UI update...');
await expect(page.getByText('Products loaded')).toBeVisible();
console.log('5. Test complete');
// Console output shows exactly where timing issue occurs
});
```
### Technique 2: Network Waterfall Inspection (DevTools)
```typescript
test('inspect network timing with trace viewer', async ({ page }) => {
await page.goto('/dashboard');
// Generate trace for analysis
// npx playwright test --trace on
// npx playwright show-trace trace.zip
// In trace viewer:
// 1. Check Network tab for API call timing
// 2. Identify slow requests (>1s response time)
// 3. Find race conditions (overlapping requests)
// 4. Verify request order (dependencies)
});
```
### Technique 3: Trace Viewer for Timing Visualization
```typescript
test('use trace viewer to debug timing', async ({ page }) => {
// Run with trace: npx playwright test --trace on
await page.goto('/checkout');
await page.getByTestId('submit').click();
// In trace viewer, examine:
// - Timeline: See exact timing of each action
// - Snapshots: Hover to see DOM state at each moment
// - Network: Identify slow/failed requests
// - Console: Check for async errors
await expect(page.getByText('Success')).toBeVisible();
});
```
---
## Race Condition Checklist
Before deploying tests:
- [ ] **Network-first pattern**: All routes intercepted BEFORE navigation (no race conditions)
- [ ] **Explicit waits**: Every navigation followed by `waitForResponse()` or state check
- [ ] **No hard waits**: Zero instances of `waitForTimeout()`, `cy.wait(number)`, `sleep()`
- [ ] **Element state waits**: Loading spinners use `waitFor({ state: 'detached' })`
- [ ] **Visibility checks**: Use `toBeVisible()` (accounts for animations), not just `toBeAttached()`
- [ ] **Response validation**: Wait for successful responses (`resp.ok()` or `status === 200`)
- [ ] **Trace viewer analysis**: Generate traces to identify timing issues (network waterfall, console errors)
- [ ] **CI/local parity**: Tests pass reliably in both environments (no timing assumptions)
## Integration Points
- **Used in workflows**: `*automate` (healing timing failures), `*test-review` (detect hard wait anti-patterns), `*framework` (configure timeout standards)
- **Related fragments**: `test-healing-patterns.md` (race condition diagnosis), `network-first.md` (interception patterns), `playwright-config.md` (timeout configuration), `visual-debugging.md` (trace viewer analysis)
- **Tools**: Playwright Inspector (`--debug`), Trace Viewer (`--trace on`), DevTools Network tab
_Source: Playwright timing best practices, network-first pattern from test-resources-for-ai, production race condition debugging_