
# Multi-Agent System for Claude Code

A lean, pragmatic multi-agent system for software and data engineering tasks. Designed for small teams (1-2 people) building data products.

---

## Philosophy

**Simple, Direct, Procedural Code**
- Functions over classes
- Data-oriented design
- Explicit over clever
- Solve actual problems, not general cases

Inspired by Casey Muratori and Jonathan Blow's approach to software development.

---

## System Structure

```
agent_system/
├── README.md                      # This file
├── coding_philosophy.md           # Core principles (reference for all agents)
├── orchestrator.md                # Lead Engineer Agent
├── code_analysis_agent.md         # Code exploration agent
├── implementation_agent.md        # Code writing agent
└── testing_agent.md               # Testing agent
```

---

## Agents

### 1. Orchestrator (Lead Engineer Agent)
**File:** `orchestrator.md`

**Role:** Coordinates all work, decides when to use workers

**Responsibilities:**
- Analyze task complexity
- Decide: handle directly or spawn workers
- Create worker specifications
- Synthesize worker outputs
- Make architectural decisions

**Use:** This is your main agent for Claude Code

### 2. Code Analysis Agent
**File:** `code_analysis_agent.md`

**Role:** Explore and understand code (read-only)

**Responsibilities:**
- Map code structure
- Trace data flow
- Identify patterns and issues
- Answer specific questions about codebase

**Use:** When you need to understand existing code before making changes

### 3. Implementation Agent
**File:** `implementation_agent.md`

**Role:** Write simple, direct code

**Responsibilities:**
- Implement features
- Modify existing code
- Write SQLMesh models
- Create Robyn routes
- Build evidence.dev dashboards

**Use:** For building and modifying code

### 4. Testing Agent
**File:** `testing_agent.md`

**Role:** Verify code works correctly

**Responsibilities:**
- Write pytest tests
- Create SQL test queries
- Test data transformations
- Validate edge cases

**Use:** For creating test suites

---

## How It Works

### Decision Tree

```
Task received
    ↓
Can I do this directly in <30 tool calls?
    ↓
YES → Handle directly (90% of tasks)
NO  → Continue
    ↓
Is this truly parallelizable?
    ↓
YES → Spawn 2-3 workers (10% of tasks)
NO  → Handle directly anyway
```

**Golden Rule:** Most tasks should be handled directly by the orchestrator. Only use multiple agents when parallelization provides clear benefit.
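
The decision tree can be written as a small function. This is only a sketch: `choose_strategy` and its parameters are illustrative names, and it assumes the orchestrator can estimate the number of tool calls up front. The threshold of 30 comes from the tree above.

```python
def choose_strategy(estimated_tool_calls: int, parallelizable: bool) -> str:
    """Return "direct" or "workers", following the decision tree."""
    # Small enough to finish in one pass: handle directly.
    if estimated_tool_calls < 30:
        return "direct"
    # Large but sequential: workers would only add coordination overhead.
    if not parallelizable:
        return "direct"
    # Large and truly parallelizable: spawn 2-3 workers.
    return "workers"


print(choose_strategy(12, parallelizable=True))   # direct
print(choose_strategy(80, parallelizable=False))  # direct
print(choose_strategy(80, parallelizable=True))   # workers
```

Two of the three branches end in "direct", which mirrors the golden rule: workers are the exception.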

### Example: Simple Task (Direct)

```
User: "Add an API endpoint to get user activity"

Orchestrator: This is straightforward, <20 tool calls
    ↓
    Handles directly:
    - Creates route in src/routes/activity.py
    - Queries data lake
    - Returns JSON
    - Tests manually
    - Done
```

**No workers needed.** Fast and simple.

### Example: Complex Task (Multi-Agent)

```
User: "Migrate ETL pipeline to SQLMesh"

Orchestrator: This is complex, will benefit from parallel work
    ↓
    Phase 1 - Analysis:
    Spawns Code Analysis Agent
    - Maps existing pipeline
    - Identifies transformations
    - Documents dependencies
    → Writes to .agent_work/analysis/
    ↓
    Phase 2 - Implementation:
    Spawns 2 Implementation Agents in parallel
    - Agent A: Extract models
    - Agent B: Transform models
    → Both write to .agent_work/implementation/
    ↓
    Phase 3 - Testing:
    Spawns Testing Agent
    - Validates output correctness
    → Writes to .agent_work/testing/
    ↓
    Orchestrator synthesizes:
    - Reviews all outputs
    - Resolves conflicts
    - Creates migration plan
    - Done
```

**Parallelization saves time** on truly complex work.
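
The phase ordering above can be sketched in plain Python. This does not show how Claude Code actually spawns sub-agents: `run_worker` is a hypothetical stand-in that just returns a string, and threads are used only to illustrate which steps are sequential and which can overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(agent: str, objective: str) -> str:
    """Hypothetical stand-in for spawning a worker; returns a summary string."""
    return f"{agent}: {objective} (done)"


def run_migration() -> list[str]:
    """Run analysis, then two implementation tasks in parallel, then testing."""
    results = []

    # Phase 1: analysis runs alone because later phases depend on it.
    results.append(run_worker("code_analysis", "map existing pipeline"))

    # Phase 2: two independent implementation tasks run side by side.
    tasks = [
        ("implementation", "extract models"),
        ("implementation", "transform models"),
    ]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # pool.map returns results in the same order as the inputs.
        results += list(pool.map(lambda task: run_worker(*task), tasks))

    # Phase 3: testing starts only after both implementation tasks finish.
    results.append(run_worker("testing", "validate output correctness"))
    return results


for line in run_migration():
    print(line)
```

Only Phase 2 runs concurrently. Analysis and testing stay sequential because each depends on the phase before it.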

---

## Tech Stack

### Data Engineering
- **SQLMesh** - Data transformation framework (SQL models)
- **DuckDB** - Analytics database (OLAP queries)
- **Iceberg** - Data lake table format (on R2 storage)
- **ELT** - Extract → Load → Transform (in warehouse)

### SaaS Application
- **Robyn** - Python web framework
  - Hosts landing page, auth, payment
  - Serves evidence.dev build at `/dashboard/`
- **evidence.dev** - BI dashboards (SQL + Markdown → static site)

### Architecture
```
User → Robyn
        ├── / (landing, auth, payment)
        ├── /api/* (API endpoints)
        └── /dashboard/* (evidence.dev build)
```

---

## Working Directory

All agent work goes into `.agent_work/` with feature-specific subdirectories:

```
project_root/
├── README.md                          # Architecture, setup, tech stack
├── CLAUDE.md                          # Memory: decisions, patterns, conventions
├── .agent_work/                       # Agent work (add to .gitignore)
│   ├── feature-user-dashboard/        # Feature-specific directory
│   │   ├── project_state.md           # Track this feature's progress
│   │   ├── analysis/
│   │   │   └── findings.md
│   │   ├── implementation/
│   │   │   ├── feature.py
│   │   │   └── notes.md
│   │   └── testing/
│   │       ├── test_feature.py
│   │       └── results.md
│   └── feature-payment-integration/   # Another feature
│       ├── project_state.md
│       ├── analysis/
│       ├── implementation/
│       └── testing/
├── models/                            # SQLMesh models
├── src/                               # Application code
└── tests/                             # Final test suite
```

**Workflow:**
1. New feature → Create branch: `git checkout -b feature-name`
2. Create `.agent_work/feature-name/` directory
3. Track progress in `.agent_work/feature-name/project_state.md`
4. Update global context in `README.md` and `CLAUDE.md` as needed
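
Steps 2 and 3 of the workflow could be scripted along these lines. This is a minimal sketch: `scaffold_feature` is a hypothetical helper, and the initial contents of `project_state.md` are an assumption, not a required format.

```python
import tempfile
from pathlib import Path

PHASES = ("analysis", "implementation", "testing")


def scaffold_feature(project_root: Path, feature_name: str) -> Path:
    """Create .agent_work/<feature_name>/ with phase directories and a state file."""
    feature_dir = project_root / ".agent_work" / feature_name
    for phase in PHASES:
        (feature_dir / phase).mkdir(parents=True, exist_ok=True)

    # Assumed starter content; do not overwrite an existing state file.
    state_file = feature_dir / "project_state.md"
    if not state_file.exists():
        state_file.write_text(f"## Project: {feature_name}\n## Phase: setup\n")
    return feature_dir


# Demo in a throwaway directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_feature(Path(tmp), "feature-user-dashboard")
    print(sorted(p.name for p in created.iterdir()))
    # ['analysis', 'implementation', 'project_state.md', 'testing']
```

The function is safe to call twice: existing directories and an existing state file are left untouched.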

**Global vs Feature Context:**
- **README.md**: Current architecture, tech stack, how to run
- **CLAUDE.md**: Memory file - decisions, patterns, conventions to follow
- **project_state.md**: Feature-specific progress (in `.agent_work/feature-name/`)

**Why `.agent_work/` instead of `/tmp/`:**
- Persists across sessions
- Easy to review agent work
- Can reference with normal paths
- Keep or discard as needed
- Feature-scoped organization

**Add to `.gitignore`:**
```
.agent_work/
```

---

## Usage in Claude Code

### Setting Up

1. Copy agent system files to your project:
   ```
   mkdir -p .claude/agents/
   cp agent_system/* .claude/agents/
   ```

2. Add to `.gitignore`:
   ```
   .agent_work/
   ```

3. Create a feature directory under `.agent_work/`:
   ```
   mkdir -p .agent_work/feature-name/{analysis,implementation,testing}
   ```

### Using the Orchestrator

In Claude Code, load the orchestrator:

```
@orchestrator.md

[Your request here]
```

The orchestrator will:
1. Analyze the task
2. Decide if workers are needed
3. Spawn workers if beneficial
4. Handle directly if simple
5. Synthesize results
6. Deliver solution

### When Workers Are Spawned

The orchestrator automatically reads the appropriate agent file when spawning:

```
Orchestrator reads: code_analysis_agent.md
    ↓
Creates specific task spec
    ↓
Spawns Code Analysis Agent with:
    - Agent instructions (from file)
    - Task specification
    - Output location
    ↓
Worker executes independently
    ↓
Writes to .agent_work/analysis/
```
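
The three inputs a worker receives can be thought of as one assembled prompt. The sketch below only illustrates that assembly; `build_worker_prompt` and the section headings it emits are assumptions, and the actual spawning mechanism in Claude Code is not shown.

```python
import tempfile
from pathlib import Path


def build_worker_prompt(agent_file: Path, task_spec: str, output_dir: str) -> str:
    """Combine agent instructions, the task spec, and the output location."""
    instructions = agent_file.read_text()
    return (
        f"{instructions.strip()}\n\n"
        f"## Task\n{task_spec.strip()}\n\n"
        f"## Output\nWrite all results to {output_dir}\n"
    )


# Demo with a temporary stand-in for code_analysis_agent.md.
with tempfile.TemporaryDirectory() as tmp:
    agent_file = Path(tmp) / "code_analysis_agent.md"
    agent_file.write_text("You are a read-only code analysis agent.\n")
    prompt = build_worker_prompt(
        agent_file,
        task_spec="OBJECTIVE: Map the existing ETL pipeline",
        output_dir=".agent_work/analysis/",
    )
    print(prompt)
```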

---

## Coding Philosophy

All agents follow these principles (from `coding_philosophy.md`):

### Core Principles
1. **Functions over classes** - Use functions unless you truly need classes
2. **Data is data** - Simple structures (dicts, lists), not objects hiding behavior
3. **Explicit over implicit** - No magic, no hiding
4. **Simple control flow** - Straightforward if/else, early returns
5. **Build minimum that works** - Solve actual problem, not general case

### What to Avoid
❌ Classes wrapping single functions
❌ Inheritance hierarchies
❌ Framework magic
❌ Over-abstraction "for future flexibility"
❌ Configuration-as-code pyramids

### What to Do
✅ Write simple, direct functions
✅ Make data transformations obvious
✅ Handle errors explicitly
✅ Keep business logic in SQL when possible
✅ Think about performance

---

## Examples

### Example 1: Build Dashboard

**Request:** "Create dashboard showing user activity trends"

**Orchestrator Decision:** Moderate complexity, 2 independent tasks

**Execution:**
1. Setup:
   - Create branch: `git checkout -b feature-user-dashboard`
   - Create `.agent_work/feature-user-dashboard/`
   - Read `README.md` and `CLAUDE.md` for context

2. Spawns Implementation Agent A
   - Creates SQLMesh model (user_activity_daily.sql)
   - Writes to `.agent_work/feature-user-dashboard/implementation-data/`

3. Spawns Implementation Agent B (parallel)
   - Creates evidence.dev dashboard
   - Writes to `.agent_work/feature-user-dashboard/implementation-viz/`

4. Orchestrator synthesizes
   - Reviews both outputs
   - Tests evidence build
   - Deploys together
   - Updates `.agent_work/feature-user-dashboard/project_state.md`

**Result:** Working dashboard with data model

### Example 2: Fix Bug

**Request:** "This query is timing out, fix it"

**Orchestrator Decision:** Simple, direct handling

**Execution:**
1. Setup:
   - Create branch: `git checkout -b fix-query-timeout`
   - Create `.agent_work/fix-query-timeout/`

2. Orchestrator handles directly
   - Runs EXPLAIN ANALYZE
   - Identifies missing index
   - Creates index
   - Tests performance
   - Documents in `.agent_work/fix-query-timeout/project_state.md`

**Result:** Query now fast, documented

### Example 3: Large Refactor

**Request:** "Migrate 50 Python files from sync to async"

**Orchestrator Decision:** Complex, phased approach

**Execution:**
1. Phase 1: Analysis
   - Code Analysis Agent maps dependencies
   - Identifies blocking calls
   - Writes to `.agent_work/feature-name/analysis/`

2. Phase 2: Implementation (parallel)
   - Implementation Agent A: Core modules (20 files)
   - Implementation Agent B: API routes (15 files)
   - Implementation Agent C: Utils (15 files)
   - All write to `.agent_work/feature-name/implementation/`

3. Phase 3: Testing
   - Testing Agent validates async behavior
   - Writes to `.agent_work/feature-name/testing/`

4. Orchestrator synthesizes
   - Resolves conflicts
   - Integration testing
   - Migration plan

**Result:** Migrated codebase with tests

---

## Best Practices

### For Orchestrator
- Default to handling directly
- Spawn workers only for truly parallel work
- Give workers focused, non-overlapping tasks
- Use extended thinking for planning
- Document decisions in `project_state.md`

### For Worker Specs
**Good:**
```
AGENT: implementation
OBJECTIVE: Create SQLMesh model for user_activity_daily
SCOPE: Create models/user_activity_daily.sql
CONSTRAINTS: DuckDB SQL, incremental by date, partition by event_date
OUTPUT: .agent_work/implementation/models/
BUDGET: 20 tool calls
```

**Bad:**
```
AGENT: implementation
OBJECTIVE: Help with the data stuff
```
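
A spec can be checked for completeness before a worker is spawned. The sketch below assumes the six fields from the good example are all required; that requirement, and the helper names `parse_spec` and `missing_fields`, are illustrative rather than something the agent files define.

```python
REQUIRED_FIELDS = ("AGENT", "OBJECTIVE", "SCOPE", "CONSTRAINTS", "OUTPUT", "BUDGET")


def parse_spec(text: str) -> dict[str, str]:
    """Parse 'KEY: value' lines into a dict, ignoring blank lines."""
    spec: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'KEY: value' line: {line!r}")
        spec[key.strip()] = value.strip()
    return spec


def missing_fields(spec: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not spec.get(field)]


bad_spec = """
AGENT: implementation
OBJECTIVE: Help with the data stuff
"""
print(missing_fields(parse_spec(bad_spec)))
# ['SCOPE', 'CONSTRAINTS', 'OUTPUT', 'BUDGET']
```

A field check like this catches missing boundaries, but not vague ones: "Help with the data stuff" still passes as an objective, so the spec needs a human or orchestrator review as well.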

### For Long Tasks
- Maintain `.agent_work/feature-name/project_state.md`
- Update after each major phase
- Use compaction if approaching context limits
- Load files just-in-time (not entire codebase)

---

## Context Management

### Just-in-Time Loading

Don't load entire codebases:
```bash
# Good: Survey, then target
find models/ -name "*.sql" | head -10
rg "SELECT.*FROM" models/
cat models/specific_model.sql

# Bad: Load everything
cat models/*.sql
```
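
The same survey-then-target pattern, sketched in Python for cases where a script is more convenient than shell commands. `survey` and `files_mentioning` are hypothetical helpers. The search reads files on disk, but only the matching paths are returned, so the caller decides which files to load in full.

```python
import re
import tempfile
from pathlib import Path


def survey(root: Path, pattern: str = "*.sql", limit: int = 10) -> list[Path]:
    """List at most `limit` matching files without reading them."""
    return sorted(root.rglob(pattern))[:limit]


def files_mentioning(paths: list[Path], regex: str) -> list[Path]:
    """Return the paths whose text matches `regex`."""
    compiled = re.compile(regex)
    return [path for path in paths if compiled.search(path.read_text())]


# Demo with two small files in a temporary models/ directory.
with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp) / "models"
    models.mkdir()
    (models / "user_activity_daily.sql").write_text("SELECT user_id FROM events")
    (models / "notes.sql").write_text("-- placeholder")

    candidates = survey(models)
    targets = files_mentioning(candidates, r"SELECT.*FROM")
    print([path.name for path in targets])  # ['user_activity_daily.sql']
```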

### Project State Tracking

For long tasks (>50 turns), maintain state:

```markdown
## Project: [Name]
## Phase: [Current]

### Completed
- [x] Task 1 - Agent - Outcome

### Current
- [ ] Task 2 - Agent - Status

### Decisions
1. Decision - Rationale

### Next Steps
1. Step 1
2. Step 2
```
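
If the state is kept as plain data, the template above can be rendered with a small function. This is a sketch under that assumption; `render_project_state` and the dict keys it expects are illustrative names.

```python
def render_project_state(state: dict) -> str:
    """Render a state dict as Markdown in the template's layout."""
    lines = [f"## Project: {state['name']}", f"## Phase: {state['phase']}", ""]
    lines += ["### Completed"] + [f"- [x] {task}" for task in state["completed"]] + [""]
    lines += ["### Current"] + [f"- [ ] {task}" for task in state["current"]] + [""]
    lines += ["### Decisions"]
    lines += [f"{i}. {item}" for i, item in enumerate(state["decisions"], 1)] + [""]
    lines += ["### Next Steps"]
    lines += [f"{i}. {item}" for i, item in enumerate(state["next_steps"], 1)]
    return "\n".join(lines) + "\n"


state = {
    "name": "feature-user-dashboard",
    "phase": "implementation",
    "completed": ["Map existing models - analysis - findings.md written"],
    "current": ["Build daily activity model - implementation - in progress"],
    "decisions": ["Incremental by date - keeps rebuilds cheap"],
    "next_steps": ["Write dashboard page", "Run tests"],
}
print(render_project_state(state))
```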

---

## Troubleshooting

### "Workers are duplicating work"
**Cause:** Vague task boundaries
**Fix:** Be more specific, assign non-overlapping files

### "Coordination overhead too high"
**Cause:** Task not parallelizable
**Fix:** Handle directly, don't use workers

### "Context window exceeded"
**Cause:** Loading too much data
**Fix:** Use JIT loading, summarize outputs

### "Workers stepping on each other"
**Cause:** Overlapping responsibilities
**Fix:** Separate by file/module, clear boundaries
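
Overlapping assignments can be detected before any worker starts. A minimal sketch, assuming each worker is given an explicit list of file paths; `find_overlaps` and the paths in the demo are hypothetical.

```python
def find_overlaps(assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each file claimed by more than one worker to the workers claiming it."""
    owners: dict[str, list[str]] = {}
    for worker, files in assignments.items():
        for path in files:
            owners.setdefault(path, []).append(worker)
    return {path: workers for path, workers in owners.items() if len(workers) > 1}


assignments = {
    "agent_a": ["src/core/db.py", "src/utils/io.py"],
    "agent_b": ["src/routes/activity.py", "src/utils/io.py"],
}
print(find_overlaps(assignments))
# {'src/utils/io.py': ['agent_a', 'agent_b']}
```

An empty result means the file boundaries are clean and the workers can proceed independently.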

---

## Summary

**System:**
- 4 agents: Orchestrator + 3 workers
- Orchestrator handles most tasks directly (90%)
- Workers used for truly complex, parallelizable work (10%)

**Philosophy:**
- Simple, direct, procedural code
- Data-oriented design
- Functions over classes
- Build minimum that works

**Tech Stack:**
- Data: SQLMesh, DuckDB, Iceberg, ELT
- SaaS: Robyn, evidence.dev
- Testing: pytest, DuckDB SQL tests

**Working Directory:**
- `.agent_work/` for all agent outputs
- Add to `.gitignore`
- Review, then move to final locations

**Golden Rule:** When in doubt, go simpler. Most tasks don't need multiple agents.

---

## Getting Started

1. Read `coding_philosophy.md` to understand principles
2. Use `orchestrator.md` as your main agent in Claude Code
3. Let orchestrator decide when to spawn workers
4. Review outputs in `.agent_work/`
5. Iterate based on results

Start simple. Add complexity only when needed.