How I Built a Multi-Agent Development System That Plans, Executes, and Remembers

The Problem Nobody Talks About

Every developer knows the feeling. You start a project with clear intentions, make progress for a week, then pause for a few days – a client call, a deadline elsewhere, a weekend away – and return to find yourself reading your own code like a stranger’s. Context has leaked. Decisions you made on a Tuesday afternoon evaporated by Thursday. The task has drifted, the scope has crept, and you spend the first hour of every session just reconstructing what you already knew.

Solo development has a context problem. Not a motivation problem, not a tooling problem – a structural problem. The moment you step away, your system loses continuity. There is no one else in the room to hold the thread.

This is the problem I set out to solve with a workflow I built inside OpenClaw, an always-on AI agent platform I contribute to. The solution is not a better prompt. It is an architecture.


The Architecture

The system is built on three moving parts: a coordinator agent, parallel sub-agents, and a file-based memory layer.

The coordinator agent is always running. It does not execute tasks directly – it orchestrates them. When a new project or feature lands on the queue, the coordinator spawns sub-agents, assigns them research briefs, collects their outputs, and routes the accumulated context through a structured set of stages. It never reads full research files itself; it tracks only file paths and summaries. This is a deliberate constraint. Reading everything is slow and expensive; tracking references is fast and keeps the coordinator’s context window clean.
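The reference-tracking constraint can be sketched as a small bookkeeping structure. This is an illustrative model, not the actual OpenClaw code – the class and field names here are invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchRef:
    """All the coordinator keeps per sub-agent: a path and a short digest."""
    path: str     # where the full findings live on disk
    summary: str  # the two-to-three-sentence summary the sub-agent returned

@dataclass
class Coordinator:
    refs: list[ResearchRef] = field(default_factory=list)

    def record(self, path: str, summary: str) -> None:
        # The coordinator never opens `path` itself; it only stores the pointer.
        self.refs.append(ResearchRef(path, summary))

    def context(self) -> str:
        # The view the coordinator reasons over: summaries, never full files.
        return "\n".join(f"{r.path}: {r.summary}" for r in self.refs)

coord = Coordinator()
coord.record("research/lessons.md", "Two prior attempts failed on the auth flow.")
coord.record("research/current-state.md", "The API layer already exposes hooks.")
```

The point of the sketch is the asymmetry: `record` accepts a path but nothing in the class ever reads it, which is what keeps the coordinator’s context window flat regardless of how much research accumulates on disk.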

Sub-agents are isolated workers. Each one is deployed with a specific research question, a clear output path, and strict instructions to write its findings to a file and return only a two-to-three-sentence summary. They run in parallel, so three research questions get answered simultaneously rather than sequentially. They do not share context with each other – that is the coordinator’s job.
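The fan-out pattern looks roughly like this – a minimal sketch in which `run_researcher` is a placeholder for invoking a real sub-agent, and the question strings and file names are made up for illustration:

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_researcher(question: str, out_path: Path) -> str:
    """Stand-in for a sub-agent: write full findings to disk, return a summary.

    In the real system this would drive an LLM; here it is a placeholder."""
    out_path.write_text(f"Findings for: {question}\n(full detail would go here)")
    return f"Wrote findings for '{question}' to {out_path.name}."

def fan_out(questions: dict[str, Path]) -> dict[str, str]:
    # Launch every research question at once; collect only the summaries.
    with ThreadPoolExecutor() as pool:
        futures = {q: pool.submit(run_researcher, q, p)
                   for q, p in questions.items()}
        return {q: f.result() for q, f in futures.items()}

tmp = Path(tempfile.mkdtemp())
summaries = fan_out({
    "What failed before?": tmp / "lessons.md",
    "What exists today?":  tmp / "current.md",
})
```

Note what the caller gets back: a dict of short summaries. The full findings exist only on disk, which is exactly the contract the coordinator relies on.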

The memory layer is flat files on disk. Research outputs, architecture drafts, gap analyses, phase plans – all written to a temporary planning directory that gets cleaned up after the plan is committed to git. This is not sophisticated. It is intentionally simple. Files are durable, inspectable, and diffable. You can read what the system thought six months ago. You can trace any decision back to its source research.

The workflow that ties these parts together is the project-planning workflow. It is the concrete example I will use for the rest of this article.


Walking Through the Workflow

Stage 0: Setup

When a planning task begins, the coordinator creates a timestamped working directory. Everything for that planning session lives here: a research/ subdirectory for raw findings, a waves/ subdirectory for follow-up research, and later, phase plans. The directory structure is fixed and known to all sub-agents by contract. They know exactly where to write, and the coordinator knows exactly where to look.
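A minimal version of that setup step, assuming the layout described above (the `plan-` prefix on the directory name is my own invention for the example):

```python
import tempfile
from datetime import datetime
from pathlib import Path

def create_session_dir(root: Path) -> Path:
    """Create the fixed planning layout every sub-agent knows by contract."""
    session = root / datetime.now().strftime("plan-%Y%m%d-%H%M%S")
    (session / "research").mkdir(parents=True)  # raw findings from discovery
    (session / "waves").mkdir()                 # follow-up research rounds
    return session

session = create_session_dir(Path(tempfile.mkdtemp()))
```

Because the structure is created once and never negotiated, a sub-agent’s brief only needs a relative path like `research/lessons.md` – the contract does the rest.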

Stage 1: Discovery

This is where the parallelism happens. The coordinator deploys three research sub-agents simultaneously.

The Lesson Researcher searches past documentation, memory files, and the solutions directory for patterns that apply to the current feature. What has been attempted before? What worked? What failed? This agent prevents the system from repeating known mistakes.

The Current State Researcher reads the existing codebase, configuration files, and integration points to establish what is already there. This agent answers: what does the current implementation actually look like, not what do we wish it looked like?

The Constraints Researcher checks for deadlines, dependencies, budget limitations, and technical boundaries. This agent ensures the plan is grounded in reality rather than ambition.

These three agents run in parallel and write their outputs to the research/ directory. The coordinator does not wait for one to finish before launching the next.

After the initial pass, the coordinator runs a Gap Analysis. It reads the three research files and classifies every remaining question into one of three buckets:

  • RESEARCHABLE: The answer exists and can be found with more investigation.
  • USER_INPUT: This is a genuine human judgment call – a scope decision, a priority trade-off, a design preference. The coordinator presents these to the human as a focused set of options with implications, not as an open-ended question.
  • DEFERRABLE: This can be decided during implementation without blocking the plan.

The coordinator runs a maximum of three research waves to fill RESEARCHABLE gaps before moving on. If gaps remain after three waves, they get classified appropriately and the process moves forward rather than stalling.
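The classification buckets and the three-wave cap can be sketched together. The `research_wave` callable stands in for a round of sub-agent research; everything else mirrors the rules described above:

```python
from dataclasses import dataclass
from enum import Enum

class GapKind(Enum):
    RESEARCHABLE = "researchable"  # the answer exists; investigation will find it
    USER_INPUT = "user_input"      # a genuine human judgment call
    DEFERRABLE = "deferrable"      # can be decided during implementation

@dataclass
class Gap:
    question: str
    kind: GapKind

MAX_WAVES = 3

def run_waves(gaps: list[Gap], research_wave) -> list[Gap]:
    """Spend at most MAX_WAVES rounds on RESEARCHABLE gaps, then move on.

    `research_wave` takes the open gaps and returns them reclassified
    (some resolved to answers, some still open)."""
    for _ in range(MAX_WAVES):
        open_gaps = [g for g in gaps if g.kind is GapKind.RESEARCHABLE]
        if not open_gaps:
            break
        settled = [g for g in gaps if g.kind is not GapKind.RESEARCHABLE]
        gaps = settled + research_wave(open_gaps)
    # Whatever is still RESEARCHABLE after three waves is deferred, not chased.
    for g in gaps:
        if g.kind is GapKind.RESEARCHABLE:
            g.kind = GapKind.DEFERRABLE
    return gaps
```

The final loop is the anti-perfectionism rule in code form: a stubborn gap does not stall the pipeline, it gets reclassified and the process moves forward.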

Stage 2: Feature Research

With the current state established, the coordinator deploys a second wave of targeted research sub-agents. These look at specific angles: integration points with other systems, dependency implications, security considerations, performance boundaries, error-handling patterns. This is a fan-out phase – the coordinator spreads agents across specific concerns that the high-level discovery phase surfaced.

The output of this stage is a validation check: are all the requirements for planning actually knowable at this point? If something is still unclear, it gets routed back to research or deferred.

Stage 3: Planning

Now the system switches from research to architecture.

The Architect sub-agent reads all the research files and produces a high-level architecture document. This covers component overview, responsibilities, interfaces between components, data flow, and technology decisions. It describes structure, not implementation – the goal is to answer “what are the parts and how do they fit together” before any code is written.

For complex features, the coordinator then deploys phase planners – one per major section – to produce detailed phase plans. Each plan includes an overview, a task list with specific files and dependencies, and verification commands. These are not vague descriptions of what should happen. They are executable checklists: edit this file, run this command, verify this output.
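The shape of a phase plan entry can be modelled directly. The field names and sample tasks below are hypothetical, but they capture the contract: specific files, a verification command, explicit dependencies:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    files: list[str]   # the specific files this task edits
    verify: str        # a command whose success proves the task is done
    depends_on: list[int] = field(default_factory=list)  # prerequisite task indices

phase = [
    Task("Add config schema",
         files=["config/schema.py"],
         verify="pytest tests/test_schema.py"),
    Task("Wire schema into loader",
         files=["config/loader.py"],
         verify="pytest tests/test_loader.py",
         depends_on=[0]),
]
```

Every task carries its own proof of completion, which is what makes the plan an executable checklist rather than a narrative.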

Finally, the coordinator integrates all phase plans into a coherent structure: a README with a task index pointing to individual phase documents.

Stage 4: Cleanup

The temporary planning directory is deleted. The final plan is committed to git. The feature is ready to move into implementation.


Why These Decisions

Why file-based coordination? Because files are durable and inspectable in a way that in-memory context is not. A sub-agent writing to disk is doing something a coordinator can verify later. More importantly, files survive session boundaries. The coordinator can be restarted, the system can be reloaded, and the research context persists. This is not a database – it does not need to be. A flat file system with a predictable structure is sufficient and far simpler.

Why sub-agents over a single agent? Because parallel research is fundamentally faster than sequential research. Three questions answered simultaneously beats three questions answered one after another, every time. More broadly, sub-agents allow the system to scale research breadth without degrading the coordinator’s ability to synthesise. The coordinator holds the thread; the sub-agents hold the detail.

Why structured stages? Because without structure, research loops forever. The stage gates – discovery, gap analysis, feature research, planning, cleanup – force the system to make decisions about when to stop investigating and start synthesising. The maximum-three-research-waves rule is a specific instance of this principle. Perfectionism is the enemy of delivery, and the structure prevents the system from chasing diminishing returns indefinitely.


What This Demonstrates

This system runs autonomously. Given a feature brief, it produces a research-backed plan with verification commands, without requiring the human to guide it through each step. The human is in the loop for judgment calls – scope decisions, design preferences, trade-offs that depend on context only a person has – but the research, synthesis, and structural planning happen without intervention.

The multi-agent coordination pattern is explicit and traceable. The coordinator does not delegate mysteriously – it deploys sub-agents with specific briefs, collects their summaries, routes outputs through stage gates, and writes everything to a known directory structure. You can audit the system: read any research file, check any gap classification, trace any architecture decision back to the source research that prompted it.

The structured stages impose discipline on an inherently open-ended process. Planning is not a creative free-for-all; it is a bounded process with entry criteria, stage gates, and an exit condition. Researchable gaps get exhausted within a fixed number of waves. USER_INPUT questions get surfaced to the human as discrete options, not as vague requests for direction. The output is a plan with verifiable tasks, not a wishlist of features.

The output is measurable. Each phase plan contains specific files to edit, dependencies to check, and verification commands to run. Implementation is not guessing – it is following a checklist that was produced by a deliberate process.


The Shift

Building this system changed how I think about my role. I started as a developer who wrote code. Now I think of myself as a systems architect who occasionally writes code. The interesting work is not the implementation – it is the structure that makes implementation reliable, repeatable, and auditable.

The coordinator agent is not a magic box. It is a program with explicit inputs, explicit outputs, and explicit constraints. The sub-agents are not intelligent in the general sense – they are competent researchers within a bounded brief. The workflow enforces discipline.

What the system does is make the process of planning visible and repeatable in a way that solo development never is. The context does not leak because the context is written down. The task does not drift because the stages force decisions at the right time. The knowledge does not disappear between sessions because it lives in files, not in someone’s (or something’s) working memory.

That is the difference. Not automation for its own sake, but structure applied to a process that desperately needed it.
