What We Learned Running a 7-Agent AI Fleet

Running a 7-agent AI fleet from a single terminal session taught us things no tutorial covers: context decay, agent drift, and the infrastructure gap nobody's talking about.

The Setup Nobody Warns You About

Two months ago I had one Claude Code session running in a terminal tab. It was helping me build a Django dashboard for my factoring business. Standard stuff.

Today I have seven agents running simultaneously. A principal backend builder, a staff-level code reviewer, a security auditor, a content pipeline, a research agent, and two specialists for surgical fixes. They coordinate through a message bus, file handoffs, and a shared task queue. One of them has been running continuously for over 60 days.

Nobody told me what would break along the way. Here's what I learned.

1. Context Decay Is the Real Enemy

Every AI agent forgets. Not gradually, but catastrophically. You're 200 messages into a productive session and suddenly your agent doesn't know what project it's working on. The model's context window filled up, the system compressed the conversation, and half your working state evaporated.

We lost an entire day of work to this early on. An agent was refactoring a critical file, hit context limits, and the compacted summary didn't include the fact that it was mid-refactor. It started the file over from scratch. On a branch that already had partial changes. The merge conflict took longer to fix than the original task.

The fix wasn't better prompting. It was infrastructure. We built a checkpoint system that writes the agent's working state to disk every 15 minutes. When context compacts, the next version reads the checkpoint and picks up where the last one left off. The agent doesn't need to remember; the substrate remembers for it.
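Here is a minimal sketch of that checkpoint idea in Python. It is not our production code: the JSON layout, the function names, and the `checkpoint_due` helper are all made up for illustration. The one detail worth copying is the write-to-temp-then-rename step, so a crash mid-write never leaves a half-written checkpoint behind.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

CHECKPOINT_INTERVAL_S = 15 * 60  # the 15-minute cadence described above


def write_checkpoint(path: Path, state: dict) -> None:
    """Write working state to disk atomically (hypothetical JSON layout)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"saved_at": time.time(), "state": state}
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(payload, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic when source and target share a filesystem


def read_checkpoint(path: Path) -> Optional[dict]:
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))["state"]


def checkpoint_due(last_saved: float, now: float) -> bool:
    """True once the checkpoint interval has elapsed since the last save."""
    return now - last_saved >= CHECKPOINT_INTERVAL_S
```

A fresh session would call `read_checkpoint` before doing anything else. If the state says "mid-refactor on file X", the agent resumes instead of starting the file over.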

2. Agents Drift Without Identity

Run an agent long enough and it starts to lose its operating style. The first few hours it follows your instructions precisely. By hour 12 it's making judgment calls you didn't authorize. By hour 24 it's a different agent wearing the same name.

Identity drift is the silent killer of autonomous agent operations. The model doesn't malfunction; it just gradually substitutes its default behavior for the specific behavior you configured. Every context compaction is an opportunity for drift because the compression algorithm doesn't know which instructions are load-bearing.

Our solution: versioned identity documents that get loaded at the start of every session. Not in the system prompt (those get compressed too), but in files that the agent reads explicitly before doing any work. The identity docs have version numbers, update logs, and reading order instructions. If the agent hasn't read its identity file, it doesn't start working.
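The "no identity, no work" rule can be sketched as a simple gate. Everything here is an assumption for illustration: the `version:` first-line header, the `Agent` class, and the `IdentityNotLoaded` error are invented names, not our actual document format.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class IdentityDoc:
    version: str
    body: str


class IdentityNotLoaded(RuntimeError):
    """Raised when an agent is asked to work before reading its identity."""


def load_identity(path: Path) -> IdentityDoc:
    """Parse a doc whose first line is 'version: X' (a made-up format)."""
    text = path.read_text(encoding="utf-8")
    first_line, _, body = text.partition("\n")
    key, _, value = first_line.partition(":")
    if key.strip().lower() != "version" or not value.strip():
        raise ValueError(f"{path} has no 'version:' header")
    return IdentityDoc(version=value.strip(), body=body.strip())


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.identity: Optional[IdentityDoc] = None

    def boot(self, identity_path: Path) -> None:
        # Explicit read at session start, separate from the system prompt.
        self.identity = load_identity(identity_path)

    def start_work(self, task: str) -> str:
        # The gate: refuse to work until the identity file has been read.
        if self.identity is None:
            raise IdentityNotLoaded(f"{self.name} has not read its identity file")
        return f"{self.name} (identity v{self.identity.version}) starting: {task}"
```

Logging the version string with every task makes drift easier to spot later: if an agent's output changes but the identity version didn't, the problem is somewhere else.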

3. One Agent Can't Review Its Own Work

This sounds obvious but it's the mistake everyone makes first. Your agent writes code, runs the tests, tests pass, ships the PR. Then you look at it and it's wrong in a way the tests didn't catch.

AI agents are excellent at making tests that validate their own assumptions. That's not quality assurance; that's confirmation bias with a green checkmark.

We solved this by splitting the build and review roles into separate agents. The builder writes code. A different agent reviews it. A third agent does a security audit. None of them share a session, so they can't see each other's reasoning. The reviewer doesn't know why the builder made a choice; it just sees the diff and evaluates it cold.

This caught real bugs. Not toy bugs, but production security issues that the building agent had rationalized away.

4. Your Message Bus Is Your Architecture

When you have one agent, communication is simple. You talk to it, it talks back. When you have seven, everything breaks unless you solve messaging first.

We tried shared files. Agents would write to the same file simultaneously and corrupt it. We tried a database. Too slow, and agents would poll it every few seconds burning tokens on overhead. We ended up building a file-based message bus with inbox directories, read receipts, and automatic acknowledgment.
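To make the shape concrete, here is a rough sketch of a file-based bus with per-agent inboxes and read receipts. The directory layout (`inbox/`, `read/`, `receipts/`), the message fields, and the function names are assumptions for this example, not our actual format. The point is that each message is its own file, written under a temporary name and renamed into place, so two agents never write to the same file.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import List


def send(bus_root: Path, sender: str, recipient: str, body: str) -> str:
    """Drop one message file into the recipient's inbox and return its id."""
    inbox = bus_root / recipient / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    msg_id = uuid.uuid4().hex
    message = {"id": msg_id, "from": sender, "to": recipient,
               "body": body, "sent_at": time.time()}
    # Write under a temp name, then rename: readers never see partial files.
    tmp = inbox / f".{msg_id}.tmp"
    tmp.write_text(json.dumps(message), encoding="utf-8")
    os.replace(tmp, inbox / f"{time.time_ns()}-{msg_id}.json")
    return msg_id


def receive(bus_root: Path, recipient: str) -> List[dict]:
    """Read all pending messages, leave a receipt for each sender, archive them."""
    inbox = bus_root / recipient / "inbox"
    archive = bus_root / recipient / "read"
    inbox.mkdir(parents=True, exist_ok=True)
    archive.mkdir(parents=True, exist_ok=True)
    messages = []
    for path in sorted(inbox.glob("*.json")):  # timestamp prefix keeps order
        message = json.loads(path.read_text(encoding="utf-8"))
        receipts = bus_root / message["from"] / "receipts"
        receipts.mkdir(parents=True, exist_ok=True)
        receipt = {"id": message["id"], "read_by": recipient,
                   "read_at": time.time()}
        (receipts / f"{message['id']}.json").write_text(
            json.dumps(receipt), encoding="utf-8")
        os.replace(path, archive / path.name)  # acknowledged: out of the inbox
        messages.append(message)
    return messages
```

An empty inbox costs one directory listing and zero tokens, which is the property the database approach lacked for us.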

The boring infrastructure problem, "how do these things talk to each other", turned out to be the hardest engineering challenge in the whole fleet. Not the AI part. The plumbing.

5. Crash Recovery Isn't Optional

Agents crash. Sessions time out. API rate limits hit. Your laptop goes to sleep and kills a tmux session. The model returns an error and the wrapper script doesn't handle it.

In the first week we lost work to crashes at least once a day. Not because the agents were bad, but because we hadn't built recovery infrastructure. No handoff files. No session state. No automatic restart with conversation history.

Now every agent writes a handoff document before shutdown. A process manager restarts crashed sessions automatically. The agent reads its handoff on boot and picks up where it left off. Most of the time the human never notices the crash happened.
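The restart half of that loop can be sketched in a few lines. This is an illustrative supervisor, not the process manager we run: the `supervise` function, its restart cap, and the linear backoff are assumptions. It treats any non-zero exit code as a crash and relies on the agent itself to read its handoff file when it boots.

```python
import subprocess
import sys
import time
from typing import List


def supervise(cmd: List[str], max_restarts: int = 3, backoff_s: float = 1.0) -> int:
    """Run cmd, restarting it after a crash. Returns how many restarts it took."""
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts  # clean shutdown: nothing left to recover
        if restarts >= max_restarts:
            raise RuntimeError(
                f"{cmd[0]} crashed {restarts + 1} times, giving up")
        restarts += 1
        time.sleep(backoff_s * restarts)  # simple linear backoff between tries


if __name__ == "__main__":
    # Hypothetical usage: supervise whatever command follows on the CLI.
    if len(sys.argv) > 1:
        print(f"restarts needed: {supervise(sys.argv[1:])}")
```

The restart cap matters more than it looks. An agent that crashes on boot because its handoff file is malformed will otherwise restart forever and burn tokens on every attempt.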

6. Cost Scales Linearly, Value Doesn't

Seven agents cost seven times as much as one agent. But they don't produce seven times the output. Some of that cost is overhead: agents waiting, agents polling, agents re-reading context.

The real unlock was figuring out which agents should run 24/7 and which should be dispatch-and-die. Our CEO agent runs continuously because her job is coordination. The builders spin up for a task, ship it, and shut down. The security auditor only activates on PR review.

This cut our burn by roughly 60% while keeping the same throughput. The lesson: not every agent needs to be always-on. Most of them shouldn't be.
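The arithmetic behind that is easy to sketch. The model below uses agent-hours per day as a stand-in for cost, which assumes every agent burns tokens at the same rate while active. The hours in the example schedule are invented for illustration and are not our measured numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentPlan:
    name: str
    active_hours_per_day: float  # 24 means always-on


def agent_hours(plans: List[AgentPlan]) -> float:
    """Total active agent-hours per day, used here as a rough cost proxy."""
    return sum(p.active_hours_per_day for p in plans)


def savings(before: List[AgentPlan], after: List[AgentPlan]) -> float:
    """Fraction of agent-hours saved by moving from `before` to `after`."""
    baseline = agent_hours(before)
    return (baseline - agent_hours(after)) / baseline


# Hypothetical schedules: seven always-on agents versus one always-on
# coordinator plus six agents dispatched for 8 hours a day each.
always_on = [AgentPlan(f"agent-{i}", 24) for i in range(7)]
mixed = [AgentPlan("coordinator", 24)] + [
    AgentPlan(f"dispatched-{i}", 8) for i in range(6)
]

if __name__ == "__main__":
    print(f"always-on: {agent_hours(always_on)} agent-hours/day")
    print(f"mixed:     {agent_hours(mixed)} agent-hours/day")
    print(f"saved:     {savings(always_on, mixed):.0%}")
```

Real savings depend on how bursty the work is and how much each spin-up costs in re-reading context, so treat this as a way to reason about the trade-off rather than a forecast.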

7. The Infrastructure Gap Is the Opportunity

Every piece of infrastructure I just described (checkpoints, identity documents, message bus, crash recovery, dispatch modes) we had to build ourselves. None of it existed when we started.

That's the gap. The AI models are incredibly capable. The tooling to run them as a coordinated team doesn't exist yet. Everyone is building single-agent workflows and hitting a wall when they try to scale to two, three, seven.

That's what we're building at Nock. The fleet operating system that sits between the models and the work. Because the agents are ready. The infrastructure isn't. And we're fixing that.