LOBUS.WORKS

Brownfield First: Why You Can’t Just Let AI Rewrite Your Codebase

27 Aug 2025 · 9 min

Most AI coding demos start in a clean room.

The repository is tidy. The requirement is clear. The code is either new or well behaved. The model gets a prompt, writes something useful, and the story ends with a neat productivity win.

That is not how most engineering organisations live.

Most of the value is trapped in brownfield systems: old services, awkward integrations, settlement logic, reporting paths, half-trusted tests, and business rules that only become visible when someone breaks them. In that world, "just let AI rewrite it" is not bold. It is careless.

The problem is not code generation

AI is often very good at producing plausible code for an existing system.

The trouble is that plausibility is not the same as safety.

In a brownfield codebase, the ticket is usually only the visible part of the job. The rest lives in exception handling, manual workarounds, downstream jobs, support habits, and weird pieces of logic that nobody would design that way today but that still matter because the business adapted around them.

An agent can write a patch that satisfies the request and still remove the very behaviour that keeps finance reconciled, operators unblocked, or customers out of trouble.

Warning

Before you ask AI to change a brownfield system, you need to know what the system already does that nobody remembered to write down.

Recover before you rewrite

That is the discipline most organisations skip.

They move straight from request to implementation. The old human version of that mistake was edit and pray. The AI version is faster and more convincing, which makes it worse.

The safer sequence is simpler than it sounds.

First, recover current behaviour in the slice you want to touch. Not the whole subsystem. Just the local path that matters. Pull real examples. Trace what happens. Find the exceptions. Work out which outputs, records, and side effects the rest of the organisation actually depends on.

Then pin down that behaviour with characterization tests where you can. These tests are not endorsements of current design. They are a record. They tell the team, "Under these conditions, this is what the system does today." Without that record, every review becomes an argument from memory.
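As a sketch, a characterization test for a hypothetical fee calculation in a legacy billing path might look like this. The function, customer types, and recorded values are all illustrative stand-ins, not a real system:

```python
# Characterization tests: pin down what the system does TODAY,
# not what we think it should do. The function and values below
# are hypothetical stand-ins for a real legacy path.

def calculate_fee(amount_cents: int, customer_type: str) -> int:
    # Stand-in for the legacy implementation under test.
    if customer_type == "legacy_partner":
        # Nobody remembers why partners get floor rounding,
        # but downstream reconciliation depends on it.
        return amount_cents * 3 // 100
    return round(amount_cents * 0.025)

def test_standard_fee_matches_current_behaviour():
    # Recorded from a real production example, not derived from a spec.
    assert calculate_fee(10_000, "standard") == 250

def test_legacy_partner_keeps_floor_rounding():
    # This looks like a bug. It is current behaviour. Do not "fix" it
    # without a decision from the people who reconcile the output.
    assert calculate_fee(10_001, "legacy_partner") == 300
```

Note what the comments record: which values came from observation, and which oddities are load-bearing. That is the "record, not endorsement" distinction made above.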

Then map the boundaries of the change. Where are the likely edit points? Which paths must remain untouched? Where would wrongness show up first: in a queue, a report, a tax entry, a notification, a settlement export, an audit trail?

Only then does AI become useful in the way people hope it will.

Safe delegation zones are discovered, not assumed

One of the most expensive mistakes in AI-enabled modernisation is assuming the system is uniformly understandable. It rarely is.

Some slices are relatively safe. The behaviour is visible. The proof is local. The team can explain what should change and what must stay fixed. Those are good places to let AI help more aggressively.

Other slices are not safe at all. The behaviour crosses teams. The real business consequence shows up somewhere downstream. The tests are thin. The people who understand the odd cases each remember them differently. Those are not good candidates for autonomous rewriting, no matter how ugly the code looks.

The work, then, is to find the safe delegation zones instead of pretending the whole system deserves the same level of automation.

What that looks like in practice

A brownfield-first approach usually means smaller, more conservative changes than leaders first imagine.

Keep the first release narrow. Reuse an existing path before inventing a new one. Add one automated trigger to known behaviour instead of redesigning the whole workflow. Separate cleanup from behavioural change so reviewers can still tell what actually happened. Leave the manual fallback in place until the new path has earned trust.
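One of these patterns, adding an automated trigger while leaving the manual fallback in place, can be sketched as follows. The flag, handlers, and order shape are hypothetical, assumed purely for illustration:

```python
# Sketch: route through the new automated path only when a flag is on,
# and fall back to the existing manual path on any failure. All names
# (AUTO_TRIGGER_ENABLED, run_automated_path, enqueue_for_manual_review)
# are hypothetical, not from a real codebase.

AUTO_TRIGGER_ENABLED = False  # flip per environment once the path earns trust

def run_automated_path(order: dict) -> bool:
    # New behaviour: one automated trigger on top of known behaviour.
    # Returns True on success; anything else falls through to manual.
    return order.get("amount", 0) > 0

def enqueue_for_manual_review(order: dict) -> str:
    # Existing, trusted path. It stays until the new path is proven.
    return f"manual:{order['id']}"

def handle_order(order: dict) -> str:
    if AUTO_TRIGGER_ENABLED:
        try:
            if run_automated_path(order):
                return f"auto:{order['id']}"
        except Exception:
            # Never let the new path strand work: drop to the old one.
            pass
    return enqueue_for_manual_review(order)
```

The design choice is the point: the diff is additive, the old path is still reachable, and a reviewer can see exactly what changed and what was preserved.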

That may feel slower than a big rewrite. It is usually faster than discovering, in production, that the old mess was carrying obligations your elegant new flow quietly dropped.

Why this matters more with AI than without it

Because AI lowers the cost of changing code before you have earned the right to change it.

That is useful in healthy parts of the system. It is dangerous in opaque ones.

A human engineer working through a brownfield area will often feel friction as they go. They hit uncertainty. They slow down. They ask someone. They notice that the logic does not quite make sense. AI reduces that natural drag. It can turn incomplete understanding into a tidy diff long before anybody has recovered the baseline.

So the discipline has to move earlier.

The real acceleration comes after the team has done the recovery work. Once current behaviour is visible, the delta is sharp, and the proof surfaces are known, AI can be very effective. It can draft tests, suggest local changes, help with additive implementation, and reduce the grunt work of operating inside an old system.

But it earns that role. It does not get it by default.

The organisations that learn this early

The teams that succeed with AI in brownfield systems are usually the ones that stop talking about rewriting the whole estate and start talking about making one slice safer to change.

They recover local truth. They name what must remain fixed. They choose a narrow first move. They treat preserved behaviour as something to prove, not something to assume. And they expand only after the slice becomes easier to explain than it was before.

That is not timid.

It is how grown-up engineering works when the code already matters.

AI can help you move faster in brownfield systems. It just cannot tell you, on its own, which parts of the past you are still obliged to keep.