Our AI Agents Kept Acting on Stale Memory. We Built a Gate.
A persistent AI agent that forgets after every context compaction will confidently act on stale assumptions. Reminders do not fix it. We built a gate that does — and it caught us within minutes.
The Failure Mode
We run a fleet of long-lived AI agents. They do not start fresh each task. They build software, coordinate with each other, and run operations for days at a time. That persistence is the whole point, and it is also where the trouble lives.
A long-running agent's conversation eventually gets compacted: the history is summarized so the work can continue past the context window. The summary is good. It is not the same as having read the source. After a compaction, the agent is working from a description of what happened, not the thing itself, and it does not always know the difference.
The result is a specific, repeatable failure: the agent acts on inference instead of memory. It says a tool does not exist when it does. It says it is blocked when it has the access. It rebuilds something that already shipped. Every one of those was a case where the answer was already written down — in a note, a prior decision, a config file — and the agent reasoned past it instead of looking.
Why Reminders Do Not Work
The obvious fix is to tell the agent to check its memory first. We did. We have project instructions, durable notes, and a standing rule: search before you assert. It is not enough, and the reason is structural.
An instruction sitting in a config file is competing with everything else in a large context. When an agent decides a task is "obvious," it skips the check — not out of laziness, but because nothing forces the pause at the moment of action. A reminder is advisory. The failures happen precisely when the agent is most confident, which is exactly when advice gets ignored.
If you have shipped anything with an LLM in the loop, you know this shape: the model knows the rule, can recite the rule, and breaks the rule anyway. More prompting does not close that gap. You need a layer underneath the model's judgment.
The Gate
So we stopped reminding and started enforcing. We built a gate that sits in front of the agent's most consequential, hardest-to-reverse actions — opening a pull request, merging code, dispatching another agent — and blocks them unless the agent has actually searched its memory within a recent window.
The mechanics are deliberately boring, because boring is what you want in a safety layer:
- A hook fires when the agent runs its memory-search tool and stamps a timestamp.
- A second hook fires before a consequential command. If the timestamp is stale — no recent search — the command is blocked with a message: search first, then retry.
- It is time-based, so it clears itself. There is no lock to get stuck behind. Search, and the gate opens for a while.
- It fails open. If the gate itself errors, it never wedges the agent. A safety layer that can brick the system is not a safety layer.
- It is scoped narrow — only the high-stakes actions, never routine work — so it bites where mistakes are expensive and stays out of the way everywhere else.
This is the same idea as a pre-commit hook or a merge gate. The agent's reasoning is probabilistic. The gate is deterministic. You put the deterministic layer where the cost of being wrong is highest.
It Caught Us Immediately
The honest part. Within minutes of turning the gate on, it blocked one of our own agents mid-task — trying to ship a change without having searched first. Exactly the behavior it was built to stop, caught on the first real run. It also surfaced a bug in its own matching that we fixed on the spot. A guardrail that never triggers is decoration. This one triggered on day zero, on us.
The Broader Pattern
We pair the gate with two more hooks. One runs at compaction time and writes the critical operational state — who is doing what, what is in flight — to a file, then re-injects it on the other side, so the agent does not wake up from a compaction with a stale roster and duplicate work. Another periodically re-injects the "search before you act" directive so it stays salient instead of decaying into the background.
The throughline: treat the model as one component, not the whole system. The reliable parts of an agent harness are the deterministic ones — the hooks, the gates, the state files, the recovery paths. The model is the creative engine in the middle. If your agent's correctness depends entirely on the model choosing to do the right thing every time, you do not have a system. You have a hope.
If You Run Persistent Agents
- Find your agent's most expensive, least-reversible actions. Those are what to gate.
- Make memory or context retrieval a precondition for those actions, not a suggestion.
- Use timestamps, not locks. A guardrail that can get stuck is worse than none.
- Fail open. Never let the safety layer be the thing that wedges the fleet.
- Test it by trying to do the wrong thing. If your gate does not block you, it does not work.
Agents are good at building. They are not good at noticing when they are about to act on something they no longer actually know. That is not a model flaw you can prompt away. It is a system property you design around.