RFC-0009: Multi-Agent Orchestration for Complex Tasks
Status
Superseded by ADR-0020 (via ADR-0019)
Summary
Introduce a multi-agent orchestration pattern that handles complex, multi-concern tasks (backend implementation, extension development, testing, documentation) in parallel. An orchestrator breaks down the task and delegates to up to four subject matter experts (Store & Backend, Extension, Testing, Documentation). Each expert has a mandatory SDD foundation but focuses on its own domain. Post-implementation ripple-checks and CHANGELOG updates are orchestrator tasks, not separate experts.
Motivation
remote-store has grown to ~2 MB across 41 source files, 55 test files, and 72 documentation files, spanning multiple non-trivial concerns:
- Backend implementation — ABC contract, error mapping, capabilities
- Testing — coverage targets, edge cases, conformance fixtures
- Documentation — user guides, API docs, navigation (mkdocs.yml, _nav.yml)
- Spec verification — RFC review, spec completeness, dependency mapping
- Process compliance — BACKLOG entries, CHANGELOG, ripple-checks against CLAUDE-REFERENCE.md
Current bottleneck: A single Claude Code agent must juggle all concerns simultaneously, risking:
- Incomplete work (missed doc updates, incomplete ripple-checks, untested error paths)
- Hallucination (agents inventing design instead of reading specs)
- Context-switching overhead (a large task means many file reads spread over many context windows)
Proposal
Four Subject Matter Experts + Orchestrator Role
| Role | Scope | Domain Focus | SDD Foundation |
|---|---|---|---|
| Orchestrator (meta-role) | Break down task, delegate, post-implementation ripple-checks & CHANGELOG, compile results | Task decomposition, SDD enforcement, cross-file consistency | Mandatory: CLAUDE.md, BACKLOG, ripple-check table |
| Store & Backend Expert | src/remote_store/ + src/remote_store/backends/: Store API, Backend ABC, error mapping, capabilities | Store core logic, backend architecture, error handling, capability invariants | Specs 001 (Store API), 003 (Backend ABC), 005 (error model), backend-specific specs |
| Extension Expert | src/remote_store/ext/: extension implementation, public API contract | Extension API design, Store contract usage, ADR-0008 pattern | Specs 024 (partition), 031 (dagster), 033 (streams), 034 (integrity); ADR-0008 (architecture), DESIGN.md |
| Testing Expert | tests/: test design, coverage targets, edge cases, conformance fixtures | Pytest patterns, spec traceability, coverage rigor | Specs (all via @pytest.mark.spec), DESIGN.md (conventions) |
| Documentation Expert | docs-src/, guides/, docstrings: user guides, API reference, navigation | Diátaxis structure, docstring format, mkdocs nav | DESIGN.md, DOCUMENTATION.md, example docstrings |
Not every task requires all 4 experts. The orchestrator decides which experts to spawn based on the task scope. A pure backend task may skip the Extension Expert; a docs-only task may only need the Documentation Expert.
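The selection rule above can be sketched as a lookup from planned file paths to expert scopes. This is a minimal illustration, not part of the proposal: the `EXPERT_SCOPES` mapping and the `select_experts` function are hypothetical names, and the prefixes are copied from the Scope column of the table. Docstrings live under `src/`, so a real selection would still need the orchestrator's judgment about when to involve the Documentation Expert.

```python
# Path prefixes per expert, taken from the Scope column of the roles table.
# The mapping and function names are illustrative assumptions.
# Order matters: ext/ is listed first so it is not claimed by the broader src/remote_store/ prefix.
EXPERT_SCOPES = {
    "Extension Expert": ("src/remote_store/ext/",),
    "Store & Backend Expert": ("src/remote_store/",),
    "Testing Expert": ("tests/",),
    "Documentation Expert": ("docs-src/", "guides/"),
}


def select_experts(planned_paths):
    """Return the sorted names of experts whose scope covers at least one planned path."""
    selected = set()
    for path in planned_paths:
        for expert, prefixes in EXPERT_SCOPES.items():
            if path.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
                selected.add(expert)
                break  # first matching scope wins
    return sorted(selected)


# A docs-only task needs a single expert.
docs_only = select_experts(["guides/backends/gcs.md"])

# A pure backend task with tests skips the Extension and Documentation experts.
backend_task = select_experts(
    ["src/remote_store/backends/_gcs.py", "tests/test_gcs_conformance.py"]
)
```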
Implementation Approach: Claude Code Native (KISS)
Use Claude Code's built-in Task tool. The orchestrator (the main Claude Code session) breaks down the task and delegates each part to a subagent via the Task tool.
Advantages:
- Native to Claude Code: no custom Python needed (KISS principle)
- Automatic parallelism (independent experts work in parallel)
- Seamless integration with CLAUDE.md workflow
- Subagents inherit branch, git context automatically
- Simple to reason about and maintain
Optional future: Custom Python orchestrator (if parallelism needs become critical and cost optimization is required) — but not the initial approach.
Worked Example: New Backend with Extension
Trigger: User asks to add a GCS backend with an ext.gcs_datasets extension.
Orchestrator workflow:
- Pre-check: BACKLOG entry exists, spec drafted, ripple-check table consulted
- Spawn experts in parallel (Task tool):
    - Store & Backend Expert → `src/remote_store/backends/_gcs.py` (ABC impl, error mapping, capabilities)
    - Extension Expert → `src/remote_store/ext/gcs_datasets.py` (extension impl, ADR-0008 pattern, public API contract)
    - Testing Expert → `tests/test_gcs_conformance.py` + `tests/ext/test_gcs_datasets.py` (conformance fixtures, extension tests, coverage)
    - Documentation Expert → `guides/backends/gcs.md` + `guides/extensions/gcs-datasets.md` (user guides, docstrings)
- Experts execute in parallel, each reading their relevant specs first
- Orchestrator collects results → runs ripple-checks:
    - README backends table, `pyproject.toml` extras, examples, auto-registration
    - CHANGELOG + BACKLOG updates
- Compile PR, push, report
Success criteria:
- Experts work in parallel (no blocking)
- Code passes hatch run all (95% coverage)
- Ripple-checks 100% complete (no forgotten updates)
- PR ready to merge with no follow-ups
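The "no forgotten updates" criterion could be checked by comparing the files a change touched against the files the ripple-check table requires. The sketch below is an assumption-laden illustration: the real table lives in CLAUDE-REFERENCE.md and is not reproduced here, so `NEW_BACKEND_RIPPLE_FILES` is a hypothetical rule assembled from the ripple items in the worked example, and `README.md` is an assumed file name.

```python
# Hypothetical ripple-check rule for a "new backend" change, assembled from the
# ripple items named in the worked example. README.md is an assumed file name.
NEW_BACKEND_RIPPLE_FILES = {
    "README.md",       # backends table
    "pyproject.toml",  # extras
    "CHANGELOG.md",
    "sdd/BACKLOG.md",
}


def missing_ripple_updates(required_files, changed_files):
    """Return required files that the change set did not touch, sorted for stable output."""
    return sorted(set(required_files) - set(changed_files))


# Example change set in which the CHANGELOG and BACKLOG updates were forgotten.
changed = [
    "src/remote_store/backends/_gcs.py",
    "tests/test_gcs_conformance.py",
    "README.md",
    "pyproject.toml",
]
missing = missing_ripple_updates(NEW_BACKEND_RIPPLE_FILES, changed)
```

An empty result would mean every required file was touched. It would not show that the edits themselves are correct, so the orchestrator's manual audit still applies.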
Invariants
All experts and the orchestrator enforce these rules:
Orchestrator (before delegating):
1. Verify BACKLOG entry exists with correct ID and status
2. Read relevant specs — confirm they cover the task
3. Consult CLAUDE-REFERENCE.md ripple-check table — identify affected files
4. Scope each expert: no expert touches files outside their domain
Orchestrator (after experts complete):
1. Run ripple-check audit (verify cross-file consistency)
2. Update CHANGELOG.md and BACKLOG.md
3. Verify hatch run all passes
All experts:
- Read the spec first, before writing code/tests/docs. If spec contradicts code, spec is source of truth. Flag contradictions immediately.
- Follow DESIGN.md conventions for their domain
- Do not invent behavior: all design traces to a spec
Alternatives Considered
- No orchestration: Continue single-agent workflow
    - Rejected: Scales poorly as backlog grows; ID-013 and BK-123 are already context-constrained
- Seven specialist experts (Spec Reviewer, Backend, Extension, Testing, Doc, Ripple Auditor, Async/Perf)
    - Rejected: Over-engineered; ripple-checks and audits are orchestrator tasks, not separate experts
- Custom Python orchestrator (ThreadPoolExecutor + Anthropic SDK)
    - Considered: Needed if cost optimization or fine-grained model selection becomes critical
    - Deferred: Start with Claude Code native (KISS), optional future upgrade
- Per-concern specialized sessions (separate Claude Code session per role)
    - Rejected: No coordination; duplicate work; complex state management
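For reference, the deferred custom-orchestrator alternative might dispatch experts roughly as follows. This is only a sketch under stated assumptions: `run_experts_in_parallel` and `fake_expert` are hypothetical names, and the model call is left as a caller-supplied function so that no SDK usage is implied or invented here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_experts_in_parallel(briefs, run_expert, max_workers=4):
    """Run one call per expert brief concurrently and return results keyed by expert name.

    `briefs` maps expert name to prompt text. `run_expert` is a caller-supplied
    function (hypothetical here) that would wrap the actual model call.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_expert, name, prompt)
            for name, prompt in briefs.items()
        }
        # result() re-raises any exception from the worker, so failures surface here.
        return {name: future.result() for name, future in futures.items()}


def fake_expert(name, prompt):
    """Stand-in for a real model call, used only to exercise the dispatch logic."""
    return f"{name} handled: {prompt}"


results = run_experts_in_parallel(
    {
        "Testing Expert": "write conformance tests",
        "Documentation Expert": "write the GCS guide",
    },
    fake_expert,
)
```

Threads suit this case because each expert call would spend most of its time waiting on network I/O. The default of four workers mirrors the four expert roles.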
Impact
- Public API: No change
- Backwards compatibility: Non-breaking (internal orchestration only)
- Performance: Faster task completion is expected for complex features, because independent experts run in parallel
- Testing: No automated tests; validation is manual via trial runs on a real backlog item
- Docs: New guide in DEVELOPMENT_STORY.md or sdd/000-process.md on using the orchestrator
Open Questions
- First test case: Which task to validate orchestration on?
    - New backend (e.g., GCS): isolated, fast feedback
    - Existing backlog item (ID-013, BK-123): real-world complexity
- Skill or ad-hoc? Should orchestrator be a dedicated skill (e.g., `/orchestrate`), or ad-hoc prompt in main session?
    - Recommendation: Start ad-hoc, formalize as skill if it becomes routine
- Model selection: Which model for expert subagents (Haiku vs Sonnet)?
- Expert prompt templates: Detailed prompt templates for each expert role belong in the implementation (skill definition or CLAUDE.md section), not in this RFC. To be designed during implementation.
References
- SDD workflow: `sdd/000-process.md`
- Ripple-check table: `sdd/CLAUDE-REFERENCE.md` (§ Ripple-check table > Detailed checklist); orchestrator must verify post-implementation
- Code conventions: `sdd/DESIGN.md` (experts follow domain-specific conventions)
- Documentation standards: `sdd/DOCUMENTATION.md`, `CONTRIBUTING.md` (authoritative document format)
- Expert domain specs:
    - Store & Backend: `sdd/specs/001-store-api.md`, `sdd/specs/003-backend-adapter-contract.md`, `sdd/specs/005-error-model.md`, backend-specific specs
    - Extension: `sdd/specs/024-ext-partition.md`, `031-ext-dagster.md`, `033-ext-streams.md`, `034-ext-integrity.md`; `sdd/adrs/0008-extension-architecture.md`
    - Testing: `sdd/DESIGN.md` (test conventions), `@pytest.mark.spec("ID")` traceability
    - Documentation: `sdd/DOCUMENTATION.md` (Diátaxis structure), docstring examples in codebase
- In-progress work: `sdd/BACKLOG.md` (ID-013 async, ID-018 conda-forge, BK-123 memory audit)
- Claude Code instructions: `CLAUDE.md` (ripple-checks, spec discipline, branch workflow)