pig-farm-controller/bmad/bmm/workflows/document-project/README.md

# Document Project Workflow

**Version:** 1.2.0
**Module:** BMM (BMAD Method Module)
**Type:** Action Workflow (Documentation Generator)

## Purpose

Analyzes and documents brownfield projects by scanning codebase, architecture, and patterns to create comprehensive reference documentation for AI-assisted development. Generates a master index and multiple documentation files tailored to project structure and type.

**NEW in v1.2.0:** Context-safe architecture with scan levels, resumability, and write-as-you-go pattern to prevent context exhaustion.

## Key Features

- **Multi-Project Type Support**: Handles web, backend, mobile, CLI, game, embedded, data, infra, library, desktop, and extension projects
- **Multi-Part Detection**: Automatically detects and documents projects with separate client/server or multiple services
- **Three Scan Levels** (NEW v1.2.0): Quick (2-5 min), Deep (10-30 min), Exhaustive (30-120 min)
- **Resumability** (NEW v1.2.0): Interrupt and resume workflows without losing progress
- **Write-as-you-go** (NEW v1.2.0): Documents written immediately to prevent context exhaustion
- **Intelligent Batching** (NEW v1.2.0): Subfolder-based processing for deep/exhaustive scans
- **Data-Driven Analysis**: Uses CSV-based project type detection and documentation requirements
- **Comprehensive Scanning**: Analyzes APIs, data models, UI components, configuration, security patterns, and more
- **Architecture Matching**: Matches projects to 170+ architecture templates from the solutioning registry
- **Brownfield PRD Ready**: Generates documentation specifically designed for AI agents planning new features

## How to Invoke

```bash
workflow document-project
```

Or from BMAD CLI:

```bash
/bmad:bmm:workflows:document-project
```

## Scan Levels (NEW in v1.2.0)

Choose the right scan depth for your needs:

### 1. Quick Scan (Default)

**Duration:** 2-5 minutes
**What it does:** Pattern-based analysis without reading source files
**Reads:** Config files, package manifests, directory structure, README
**Use when:**

- You need a fast project overview
- Initial understanding of project structure
- Planning next steps before deeper analysis

**Does NOT read:** Source code files (_.js, _.ts, _.py, _.go, etc.)

### 2. Deep Scan

**Duration:** 10-30 minutes
**What it does:** Reads files in critical directories based on project type
**Reads:** Files in critical paths defined by documentation requirements
**Use when:**

- Creating comprehensive documentation for brownfield PRD
- Need detailed analysis of key areas
- Want balance between depth and speed

**Example:** For a web app, reads controllers/, models/, components/, but not every utility file

### 3. Exhaustive Scan

**Duration:** 30-120 minutes
**What it does:** Reads ALL source files in project
**Reads:** Every source file (excludes node_modules, dist, build, .git)
**Use when:**

- Complete project analysis needed
- Migration planning requires full understanding
- Detailed audit of entire codebase
- Deep technical debt assessment

**Note:** Deep-dive mode ALWAYS uses exhaustive scan (no choice)

## Resumability (NEW in v1.2.0)

The workflow can be interrupted and resumed without losing progress:

- **State Tracking:** Progress saved in `project-scan-report.json`
- **Auto-Detection:** Workflow detects incomplete runs (<24 hours old)
- **Resume Prompt:** Choose to resume or start fresh
- **Step-by-Step:** Resume from exact step where interrupted
- **Archiving:** Old state files automatically archived

**Example Resume Flow:**

```
> workflow document-project

I found an in-progress workflow state from 2025-10-11 14:32:15.

Current Progress:
- Mode: initial_scan
- Scan Level: deep
- Completed Steps: 5/12
- Last Step: step_5

Would you like to:
1. Resume from where we left off - Continue from step 6
2. Start fresh - Archive old state and begin new scan
3. Cancel - Exit without changes

Your choice [1/2/3]:
```

## What It Does

### Step-by-Step Process

1. **Detects Project Structure** - Identifies if project is single-part or multi-part (client/server/etc.)
2. **Classifies Project Type** - Matches against 12 project types (web, backend, mobile, etc.)
3. **Discovers Documentation** - Finds existing README, CONTRIBUTING, ARCHITECTURE files
4. **Analyzes Tech Stack** - Parses package files, identifies frameworks, versions, dependencies
5. **Conditional Scanning** - Performs targeted analysis based on project type requirements:
   - API routes and endpoints
   - Database models and schemas
   - State management patterns
   - UI component libraries
   - Configuration and security
   - CI/CD and deployment configs
6. **Generates Source Tree** - Creates annotated directory structure with critical paths
7. **Extracts Dev Instructions** - Documents setup, build, run, and test commands
8. **Creates Architecture Docs** - Generates detailed architecture using matched templates
9. **Builds Master Index** - Creates comprehensive index.md as primary AI retrieval source
10. **Validates Output** - Runs 140+ point checklist to ensure completeness

### Output Files

**Single-Part Projects:**

- `index.md` - Master index
- `project-overview.md` - Executive summary
- `architecture.md` - Detailed architecture
- `source-tree-analysis.md` - Annotated directory tree
- `component-inventory.md` - Component catalog (if applicable)
- `development-guide.md` - Local dev instructions
- `api-contracts.md` - API documentation (if applicable)
- `data-models.md` - Database schema (if applicable)
- `deployment-guide.md` - Deployment process (optional)
- `contribution-guide.md` - Contributing guidelines (optional)
- `project-scan-report.json` - State file for resumability (NEW v1.2.0)

**Multi-Part Projects (e.g., client + server):**

- `index.md` - Master index with part navigation
- `project-overview.md` - Multi-part summary
- `architecture-{part_id}.md` - Per-part architecture docs
- `source-tree-analysis.md` - Full tree with part annotations
- `component-inventory-{part_id}.md` - Per-part components
- `development-guide-{part_id}.md` - Per-part dev guides
- `integration-architecture.md` - How parts communicate
- `project-parts.json` - Machine-readable metadata
- `project-scan-report.json` - State file for resumability (NEW v1.2.0)
- Additional conditional files per part (API, data models, etc.)

## Data Files

The workflow uses a single comprehensive CSV file:

**documentation-requirements.csv** - Complete project analysis guide

- Location: `/bmad/bmm/workflows/document-project/documentation-requirements.csv`
- 12 project types (web, mobile, backend, cli, library, desktop, game, data, extension, infra, embedded)
- 24 columns combining:
  - **Detection columns**: `project_type_id`, `key_file_patterns` (identifies project type from codebase)
  - **Requirement columns**: `requires_api_scan`, `requires_data_models`, `requires_ui_components`, etc.
  - **Pattern columns**: `critical_directories`, `test_file_patterns`, `config_patterns`, etc.
- Self-contained: All project detection AND scanning requirements in one file
- Architecture patterns inferred from tech stack (no external registry needed)

## Use Cases

### Primary Use Case: Brownfield PRD Creation

After running this workflow, use the generated `index.md` as input to brownfield PRD workflows:

```
User: "I want to add a new dashboard feature"
PRD Workflow: Loads docs/index.md
→ Understands existing architecture
→ Identifies reusable components
→ Plans integration with existing APIs
→ Creates contextual PRD with epics and stories
```

### Other Use Cases

- **Onboarding New Developers** - Comprehensive project documentation
- **Architecture Review** - Structured analysis of existing system
- **Technical Debt Assessment** - Identify patterns and anti-patterns
- **Migration Planning** - Understand current state before refactoring

## Requirements

### Recommended Inputs (Optional)

- Project root directory (defaults to current directory)
- README.md or similar docs (auto-discovered if present)
- User guidance on key areas to focus (workflow will ask)

### Tools Used

- File system scanning (Glob, Read, Grep)
- Code analysis
- Git repository analysis (optional)

## Configuration

### Default Output Location

Files are saved to: `{output_folder}` (from config.yaml)

Default: `/docs/` folder in project root

### Customization

- Modify `documentation-requirements.csv` to adjust scanning patterns for project types
- Add new project types to `project-types.csv`
- Add new architecture templates to `registry.csv`

## Example: Multi-Part Web App

**Input:**

```
my-app/
├── client/     # React frontend
├── server/     # Express backend
└── README.md
```

**Detection Result:**

- Repository Type: Monorepo
- Part 1: client (web/React)
- Part 2: server (backend/Express)

**Output (10+ files):**

```
docs/
├── index.md
├── project-overview.md
├── architecture-client.md
├── architecture-server.md
├── source-tree-analysis.md
├── component-inventory-client.md
├── development-guide-client.md
├── development-guide-server.md
├── api-contracts-server.md
├── data-models-server.md
├── integration-architecture.md
└── project-parts.json
```

## Example: Simple CLI Tool

**Input:**

```
hello-cli/
├── main.go
├── go.mod
└── README.md
```

**Detection Result:**

- Repository Type: Monolith
- Part 1: main (cli/Go)

**Output (4 files):**

```
docs/
├── index.md
├── project-overview.md
├── architecture.md
└── source-tree-analysis.md
```

## Deep-Dive Mode

### What is Deep-Dive Mode?

When you run the workflow on a project that already has documentation, you'll be offered a choice:

1. **Rescan entire project** - Update all documentation with latest changes
2. **Deep-dive into specific area** - Generate EXHAUSTIVE documentation for a particular feature/module/folder
3. **Cancel** - Keep existing documentation

Deep-dive mode performs **comprehensive, file-by-file analysis** of a specific area, reading EVERY file completely and documenting:

- All exports with complete signatures
- All imports and dependencies
- Dependency graphs and data flow
- Code patterns and implementations
- Testing coverage and strategies
- Integration points
- Reuse opportunities

### When to Use Deep-Dive Mode

- **Before implementing a feature** - Deep-dive the area you'll be modifying
- **During architecture review** - Deep-dive complex modules
- **For code understanding** - Deep-dive unfamiliar parts of codebase
- **When creating PRDs** - Deep-dive areas affected by new features

### Deep-Dive Process

1. Workflow detects existing `index.md`
2. Offers deep-dive option
3. Suggests areas based on project structure:
   - API route groups
   - Feature modules
   - UI component areas
   - Services/business logic
4. You select area or specify custom path
5. Workflow reads EVERY file in that area
6. Generates `deep-dive-{area-name}.md` with complete analysis
7. Updates `index.md` with link to deep-dive doc
8. Offers to deep-dive another area or finish

### Deep-Dive Output Example

**docs/deep-dive-dashboard-feature.md:**

- Complete file inventory (47 files analyzed)
- Every export with signatures
- Dependency graph
- Data flow analysis
- Integration points
- Testing coverage
- Related code references
- Implementation guidance
- ~3,000 LOC documented in detail

### Incremental Deep-Diving

You can deep-dive multiple areas over time:

- First run: Scan entire project → generates index.md
- Second run: Deep-dive dashboard feature
- Third run: Deep-dive API layer
- Fourth run: Deep-dive authentication system

All deep-dive docs are linked from the master index.

## Validation

The workflow includes a comprehensive 160+ point checklist covering:

- Project detection accuracy
- Technology stack completeness
- Codebase scanning thoroughness
- Architecture documentation quality
- Multi-part handling (if applicable)
- Brownfield PRD readiness
- Deep-dive completeness (if applicable)

## Next Steps After Completion

1. **Review** `docs/index.md` - Your master documentation index
2. **Validate** - Check generated docs for accuracy
3. **Use for PRD** - Point brownfield PRD workflow to index.md
4. **Maintain** - Re-run workflow when architecture changes significantly

## File Structure

```
document-project/
├── workflow.yaml                    # Workflow configuration
├── instructions.md                  # Step-by-step workflow logic
├── checklist.md                     # Validation criteria
├── documentation-requirements.csv   # Project type scanning patterns
├── templates/                       # Output templates
│   ├── index-template.md
│   ├── project-overview-template.md
│   └── source-tree-template.md
└── README.md                        # This file
```

## Troubleshooting

**Issue: Project type not detected correctly**

- Solution: Workflow will ask for confirmation; manually select correct type

**Issue: Missing critical information**

- Solution: Provide additional context when prompted; re-run specific analysis steps

**Issue: Multi-part detection missed a part**

- Solution: When asked to confirm parts, specify the missing part and its path

**Issue: Architecture template doesn't match well**

- Solution: Check registry.csv; may need to add new template or adjust matching criteria

## Architecture Improvements in v1.2.0

### Context-Safe Design

The workflow now uses a write-as-you-go architecture:

- Documents written immediately to disk (not accumulated in memory)
- Detailed findings purged after writing (only summaries kept)
- State tracking enables resumption from any step
- Batching strategy prevents context exhaustion on large projects

### Batching Strategy

For deep/exhaustive scans:

- Process ONE subfolder at a time
- Read files → Extract info → Write output → Validate → Purge context
- Primary concern is file SIZE (not count)
- Track batches in state file for resumability

### State File Format

Optimized JSON (no pretty-printing):

```json
{
  "workflow_version": "1.2.0",
  "timestamps": {...},
  "mode": "initial_scan",
  "scan_level": "deep",
  "completed_steps": [...],
  "current_step": "step_6",
  "findings": {"summary": "only"},
  "outputs_generated": [...],
  "resume_instructions": "..."
}
```