I used GitHub Copilot’s agent fleets to build a compiler. Not a toy — 3,000+ tests, 91 diagnostics, better than what we had before. Multiple agents running in parallel while I did other things.

The throughput is real. But it also broke in ways I didn’t expect.

This post is about those breaks, and the rules I now follow to prevent them.

The pitch

Agent fleets let you dispatch work and walk away. You describe what you want, and a Copilot session spins up and starts working on your repo. You can run multiple agents in parallel on different tasks. They all operate on the same directory and files. In theory, this multiplies your throughput.

In practice, it does — until they step on each other.

The contention problem

Fleet agents don’t talk to each other. They don’t know the others exist. When two agents modify the same files, you get collisions: one agent finishes, commits, and moves on. The other agent tries to complete, realizes its working state is gone, resets the branch, and tries to reapply its changes — discarding the first agent’s work because it looks “unrelated.” Sometimes both agents end up in this loop. Nobody wins.

I spent hours building out features across multiple agents. Things were looking great. Then I noticed some features had stopped working. When I asked the agent about it, it casually told me that another background agent had “basically rewritten the repo” — and suggested I wait for the background agent to finish so it could try to reconcile the conflicting changes. There’s no async communication channel between the main session and fleet agents. You can’t steer them, warn them, or update their instructions mid-flight. You just wait.

Contrast this with other agent frameworks — some, like OpenClaw, let the orchestrating agent steer background agents in real time: send new constraints, redirect work, or kill a task that’s gone off track. That’s a fundamentally different model. Copilot fleets today are fire-and-forget.

No warning. No conflict detection. No merge. Changes just get lost.

Temporal contention

There’s a subtler version of this that’s harder to avoid:

[Diagram: temporal contention. Two agents dispatched minutes apart, working on the same files with no coordination channel.]

You have an idea, dispatch an agent. Two minutes later you have a related idea. The first agent is still running — there’s no way to update it. And there’s no easy way for the orchestrator to know a priori whether the new task will conflict with the running one. So it dispatches. Contention.

The only mechanism today is killing the first agent and starting over. There’s no “hey, also do this” channel.

You might think planning helps. And it does — you can spend more time upfront specifying exactly what each agent should do before dispatching. Copilot’s plan mode is great for this. But humans keep thinking after they hit submit. That’s just how brains work. Plan mode reduces contention, it doesn’t eliminate it.

The formula

I think about contention as:

contention ∝ (∂ideas/∂t) × agent_execution_time

If you have ideas faster than agents can finish, you’ll have overlapping agents. Speed helps, but even a fast agent doesn’t eliminate the need for fleets. Fleets let you decompose a big problem into parallel tracks. That’s real, not just a speed workaround.

To be clear: Copilot is excellent, and fleets are a genuinely powerful feature. I’m confident they’ll get smarter about coordination over time. This is just my experience in March 2026 — a snapshot of where things are today, not a verdict on where they’re going. The gaps I’m describing are solvable, and GitHub is iterating fast — that team is on fire, honestly one of the most productive teams I’ve seen.

Three rules I follow now

After losing work a few times, I settled on three rules that have held up:

1. TDD is non-negotiable

Write tests before code. Every feature, every diagnostic, every behavior gets a test first. Not because TDD is a religion. Tests are how you detect when one agent broke what another agent built.

When an agent opens a PR and the tests pass, you have confidence. When they fail, you know exactly what regressed. Without tests, you’re flying blind — you won’t even notice the damage until much later.

With 3,000+ tests, I catch conflicts almost immediately. That number isn’t an accident. It’s the direct result of treating tests as the ground truth for the project.

2. Every agent gets a worktree and a branch

No agent works on main. Ever. Each background agent gets its own git worktree on its own branch. This gives you isolation by default. Agents can’t silently overwrite each other’s work because they’re working in different directories on different branches.

git worktree add /tmp/agent-task-123 -b feature/add-diagnostics

This is cheap. Worktrees are lightweight. And it means you always have a clean merge point where you can see exactly what changed.
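The full lifecycle, as I run it, looks something like this. Paths and branch names are illustrative, and the scratch repo at the top is only there to make the snippet self-contained; in a real project you are already inside your repo.

```shell
set -e

# Scratch setup so the snippet runs end to end.
d="$(mktemp -d)"; mkdir "$d/repo"; cd "$d/repo"
git init -q .
git -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m "init"

# 1. Before dispatching an agent, give it its own checkout and branch:
git worktree add -q ../agent-task-123 -b feature/add-diagnostics

# 2. The agent works only in ../agent-task-123 and commits only to
#    feature/add-diagnostics; main is never touched directly.

# 3. After its PR is merged, clean up:
git worktree remove ../agent-task-123
git branch -d feature/add-diagnostics
```

The cleanup step matters: stale worktrees accumulate fast when you dispatch many agents, and `git worktree list` is how you audit what is still in flight.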

There’s a file-level version of this trick too. In C#, I started having background agents write new code into separate files using partial class instead of editing the main class file. Agent A works in Parser.cs, Agent B writes to Parser.Diagnostics.cs — same class, different files, zero merge conflicts. The main agent can consolidate later if the split doesn’t make sense, but often it does and you end up with better-organized code anyway.

Same principle as worktrees — don’t touch the same artifact — just applied at a different granularity.

3. One merge authority

The main agent (or me) handles all merges. Background agents propose changes via PRs. They don’t merge anything themselves. This creates a single coordination point where conflicts are visible and resolvable.

If two agents touched the same area, I see it at merge time. Not after the damage is done.
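Sketched as a script, the merge-authority flow looks like this. The scratch repo, branch names, and `run_tests` are stand-ins (in my setup the real test suite is the gate); the point is the shape: one actor, one branch at a time, tests decide.

```shell
set -e

run_tests() { true; }   # stand-in for the real test suite

# Scratch repo with two "agent" branches, as if proposed via PRs.
cd "$(mktemp -d)"
git init -q .
G() { git -c user.email=agent@example.com -c user.name=agent "$@"; }
G commit -q --allow-empty -m "init"
main="$(git symbolic-ref --short HEAD)"

G checkout -q -b agent/parser
G commit -q --allow-empty -m "parser work"
G checkout -q "$main"
G checkout -q -b agent/diagnostics
G commit -q --allow-empty -m "diagnostics work"
G checkout -q "$main"

# One coordination point: merge agent branches one at a time, and keep
# a merge only if the tests still pass afterwards.
for branch in agent/parser agent/diagnostics; do
  G merge -q --no-edit "$branch"
  run_tests || { git reset --hard ORIG_HEAD; echo "rejected $branch"; exit 1; }
done
echo "all agent branches merged"
```

Merging serially is slower than merging everything at once, but it means a failure points at exactly one branch.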

Enforcing the rules

I store these rules in GitHub Copilot’s native repo memory. You just tell Copilot “remember this” and it does — you’ll see it invoke store_memory, a built-in tool for persisting conventions across sessions. When an agent spins up, it loads these automatically. The memory tells it: work in a worktree, write tests first, don’t merge.
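What I actually store is free-form text, not a schema. The gist (wording illustrative):

```text
Rules for all background agents in this repo:
1. Create a git worktree on a fresh branch before touching any code. Never work on main.
2. Write a failing test before implementing any feature, diagnostic, or behavior change.
3. Propose changes via PR. Never merge anything yourself; the main session owns merges.
```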

Not foolproof. Agents can still drift. But the failure mode goes from “silently lost a day of work” to “caught it in a PR review.”

What I wish existed

The rules above work if you’re disciplined. But there are gaps that discipline can’t cover:

Agent-to-agent communication. If Agent A is modifying the parser and Agent B is about to touch the same area, B should know. Even a simple “these files are locked” mechanism would help.
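Nothing like this ships today, but even a crude convention would go a long way. Here is a sketch of what I mean, with claims checked before dispatch. Everything in it is hypothetical: the file name, the helper, the workflow.

```shell
set -e
cd "$(mktemp -d)"

# Hypothetical convention: a shared claims file records which paths a
# running agent owns; dispatch is refused when paths overlap.
CLAIMS=.agent-claims

claim() {  # claim <agent-id> <path>...
  id="$1"; shift
  for path in "$@"; do
    if grep -q " $path\$" "$CLAIMS" 2>/dev/null; then
      echo "conflict: $path already claimed, not dispatching $id" >&2
      return 1
    fi
  done
  for path in "$@"; do echo "$id $path" >> "$CLAIMS"; done
  echo "dispatched $id"
}

claim agent-A src/Parser.cs src/Lexer.cs
claim agent-B src/Parser.cs || true   # blocked: Parser.cs is taken
```

A real version would need atomic claims and expiry, but even this level of bookkeeping would catch the collision at dispatch time instead of at merge time.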

Worktrees as a default. Today you have to explicitly set this up. It should be the default behavior for any background agent — spin up in an isolated worktree, propose changes via PR, never touch the main branch directly.

Tests as ground truth. The ecosystem should treat passing tests as the primary integration signal, not just “did the agent say it’s done.” If tests fail after an agent’s changes, that’s a hard stop, not a suggestion.

You’re not the only one hitting this

Niko Heikkilä tried Claude Code’s /batch for a framework migration — parallel worktrees, PRs per task, e2e tests as gates. He got merge conflicts between parallel PRs, doom loops trying to fix “done” work, and ultimately called the project off. Tests were there but coordination wasn’t.

Tamir Dresher wrote about the same problem from the state management angle — squad memory files polluting code PRs when multiple agents share state. Different angle, same underlying issue.

GitHub clearly sees it too — Brady Gaster’s team just shipped Squad, a coordinated multi-agent system that drops a specialized team (lead, frontend dev, backend dev, tester) directly into your repo. Squad solves a different layer than what I’m describing here: it handles who does what — role specialization, internal review loops, shared project memory. My rules handle how agents don’t step on each other — worktrees, file isolation, merge authority. They’re complementary. I haven’t tried Squad yet but it’s next on my list.

So what now

Agent fleets work. I built a real project with them and the throughput is genuinely impressive. But they work like concurrent programming: without synchronization primitives, you get race conditions.

Tests catch the damage. Worktrees prevent the collision. One merge authority gives you a place to reconcile.

Until the tooling catches up, that’s what I’ve got. It’s working.

I’ve distilled these into a SKILL.md you can drop into your repo. Any agent that reads it will follow the rules automatically.


I write about building with AI agents, the stuff that actually works and the stuff that breaks. Follow me on LinkedIn for more.
