Disassembling slop

Slop isn't bad AI code. It's code that is cheap to produce and expensive to verify, a trade LLMs knocked out of balance. Name the imbalance and you can act on it, instead of just asking people to be more careful.


“Slop” is a useful word. It’s vivid, it’s rude, and every engineer knows what it means. But we don’t often stop to ask what slop actually is.

We all assume we know it when we see it, and move on. That’s a problem, because vague words lead to vague answers. If slop is just “bad AI code,” then the fix is “better AI” or “try harder,” and neither of those is a plan.

I want to stop for a minute and try to pull this term apart. It isn’t about AI. It’s about a trade between two things that used to be balanced and aren’t any more.

Here’s the claim:

Slop is code that is cheap to produce and expensive to verify.

Everything else follows from that. The duplication, the review fatigue, the 2,000-line pull request on a Tuesday afternoon: all of it comes from the same imbalance. Understand it and you can do something about it. Leave it vague and you’ll keep doing what most teams have done for two years: ask people to be more careful and count the wrong things on a dashboard.

What changed

Before LLMs, writing code and checking code cost about the same. Both needed a person paying attention. If you wrote a hundred lines, someone could read a hundred lines. Often that someone was you an hour later, wondering what past-you was thinking. The two sides of the trade were roughly in balance.

LLMs broke that balance. Writing code is now almost free, but checking it isn’t, because a reviewer still reads at the same speed they did five years ago. The ratio has gone from something like one-to-one to something like one-to-a-hundred.

That single shift is the whole problem. Everything we call slop is a downstream effect of it.

When writing is free, volume goes up, because there’s no cost pressure to keep a change small. As changes get bigger and reviewers don’t get faster, the review queue grows, and as the queue grows, review quality drops. People skim, approve, and move on. Defects flow into the codebase. The next AI prompt then reads that codebase as context, picks up the patterns of the slop already there, and produces more of the same. In effect, the codebase trains the tool that degrades the codebase.

This is a loop, and each turn makes the next turn harder. The codebase gets messier, which makes verification more expensive, which widens the gap between writing and checking, which produces more slop.
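
A toy model makes the shape of the loop easier to see. Everything in it is invented for illustration: the weekly volumes, the review capacity, and especially the `drag` knob, which stands in for “messier code is slower to review.” It is a sketch of the argument, not a measurement of any real team.

```python
def unverified_growth(weeks, written, capacity, drag):
    """Toy model: lines merged without a careful review, week by week.

    written  -- lines produced per week (hypothetical)
    capacity -- lines reviewed carefully per week in a clean codebase (hypothetical)
    drag     -- how much each unverified line slows future review (invented knob)
    """
    unverified = 0.0
    history = []
    for _ in range(weeks):
        # The messier the codebase, the fewer lines get a careful read.
        effective_capacity = capacity / (1 + drag * unverified)
        # Whatever exceeds that capacity is skimmed and merged anyway.
        unverified += max(0.0, written - effective_capacity)
        history.append(unverified)
    return history


# Production matched to review capacity versus production at five times capacity.
balanced = unverified_growth(weeks=6, written=1_000, capacity=1_000, drag=0.0001)
flooded = unverified_growth(weeks=6, written=5_000, capacity=1_000, drag=0.0001)

# How much the unverified pile grew in each week of the flooded case.
weekly_increase = [b - a for a, b in zip([0.0] + flooded, flooded)]
```

In the balanced case nothing piles up. In the flooded case the pile grows, and each week adds more than the week before, because the pile itself slows the reviewing down.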

Most of the response to AI coding has assumed we’re heading toward a new equilibrium. We aren’t. We’re in a slide that gets steeper the longer we leave it.

Where slop shows up

Once you see slop as a problem of cheap production and expensive verification, the symptoms start to organise themselves. The same gap shows up in six different places, and each one is a specific kind of verification that got skipped.

Intent. The code does something, but not the thing that was actually needed. This happens when the prompt was vague and the AI filled in the gaps with plausible guesses. The author never had to commit to what they wanted, because the model was happy to commit on their behalf. Slop here looks like a feature that technically works but misses the point of the ticket.

Context. The code is correct in isolation and wrong for this codebase. It uses a different HTTP client than the rest of the system. It defines a type that already exists three directories away. It calls the API directly instead of going through the service layer everyone else uses. The AI didn’t know the conventions, and the author didn’t check.
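
Some of this is checkable by machine once the convention is written down. Here is a minimal sketch, assuming a hypothetical rule that HTTP calls go through the project’s own client module, so direct imports of a few HTTP libraries get flagged. The library list and the rule itself are assumptions for illustration; the parsing uses Python’s standard `ast` module.

```python
import ast

# Hypothetical convention: only the project's own wrapper may import these.
DISALLOWED_IMPORTS = {"requests", "urllib3", "httpx"}


def disallowed_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, library) for each import that breaks the convention."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have no module name.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in DISALLOWED_IMPORTS:
                findings.append((node.lineno, root))
    return sorted(findings)


example = (
    "import requests\n"
    "from app.http import client\n"
    "from urllib3.util import Retry\n"
)
violations = disallowed_imports(example)
```

A check like this doesn’t understand the codebase. It just means the reviewer no longer has to remember this one convention on every pull request.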

Comprehension. The author can’t explain what they shipped. They reviewed the diff, it looked reasonable, and they approved it for themselves. Two weeks later something breaks, and nobody on the team can debug it quickly because nobody understands it. Simon Willison calls this cognitive debt, which is the right name. The code works, but the understanding that should have come with writing it is missing.

Verification. No one proved it works. Tests are absent, or shallow, or written by the same AI that wrote the code and therefore confirm only that the code does what the code does. The pull request passes CI because the bar for passing CI was never high enough to catch this. Slop here is the unverified delta, the bit between “compiles” and “correct.”
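
The difference between a test that confirms “the code does what the code does” and one that checks it against something independent can look small on the page. A sketch, using a hypothetical `apply_discount` function that is not from any real codebase:

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: price after a percentage discount."""
    return price_cents - (price_cents * percent) // 100


def test_tautological():
    # Re-derives the expected value with the same formula as the implementation,
    # so it passes whether or not that formula is what the business wanted.
    price, percent = 1999, 15
    expected = price - (price * percent) // 100
    assert apply_discount(price, percent) == expected


def test_independent():
    # Expected values worked out by hand from the requirement, not from the code.
    assert apply_discount(1000, 10) == 900
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 100) == 0
    # A property the requirement implies: a discount never raises the price
    # and never takes it below zero.
    for percent in range(0, 101):
        assert 0 <= apply_discount(1999, percent) <= 1999
```

Both tests pass and both turn CI green. Only the second one would notice if the formula were wrong.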

Integration. Locally fine, globally wrong. The function is clean. The call site is clean. But the cache layer got bypassed, or the unidirectional data flow got broken, or a state mutation slipped into a place that’s meant to be pure. Integration is much more expensive to verify than units, which is exactly why this is the verification that tends to get skipped.

Hygiene. The leftovers. Dead code paths, console.logs, hallucinated imports, helpful refactors nobody asked for, two new dependencies added in passing, a config file edited for no reason. Each one is small, and together they’re the texture of a codebase being slowly worn down by inattention.
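
Hygiene is the cheapest of the six to check mechanically. A minimal sketch that scans the added lines of a unified diff for a few kinds of leftover follows. The patterns are illustrative assumptions, and a real project would tune them to its own stack or lean on an existing linter.

```python
import re

# Illustrative patterns only, not an exhaustive or authoritative list.
LEFTOVER_PATTERNS = {
    "debug logging": re.compile(r"\bconsole\.log\(|\bprint\("),
    "debugger statement": re.compile(r"\bdebugger\b|\bbreakpoint\(\)"),
    "unresolved marker": re.compile(r"\b(TODO|FIXME|XXX)\b"),
}


def find_leftovers(diff_text: str) -> list[tuple[int, str, str]]:
    """Return (diff line number, label, text) for suspicious added lines."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for label, pattern in LEFTOVER_PATTERNS.items():
            if pattern.search(added):
                findings.append((number, label, added.strip()))
    return findings


# A made-up diff fragment to exercise the scanner.
sample = """\
+++ b/cart.js
@@ -1,2 +1,4 @@
 function total(items) {
+  console.log(items);
+  // TODO: handle empty cart
   return items.reduce((sum, item) => sum + item.price, 0);
"""
leftovers = find_leftovers(sample)
```

On the sample, this flags the `console.log` line and the `TODO` line and ignores the untouched context lines.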

This isn’t meant to be a complete list. Some overlap, and a pull request usually fails in several at once. But each one points at a different kind of verification work that has to happen for code to be trusted, and a different place where that work got pushed onto someone else, or onto future-you, or onto nobody at all.

What they have in common is that things got skipped. The writing happened and the matching verification didn’t. The function got produced; the question of whether it belonged here got skipped. The tests got generated; the question of whether they tested the right thing got skipped.

This is what the imbalance looks like in practice. Six gaps in every pull request that gets waved through, with no single dramatic break, just a slow accumulation that no one quite owns.

Three ways to respond

If slop comes from cheap production meeting expensive verification, there are only three ways to respond. Two of them help. One of them is what most teams are actually doing.

Make verification cheaper. This is the boring answer, and it’s the right one. You can’t make a human read faster, but you can shrink what they have to read. Smaller commits, clearer intent, codified conventions the AI has to follow, tests that prove the interesting thing rather than the easy thing, linters that catch the obvious failures before a person sees them. None of this is new. Most of it predates AI by decades. The point isn’t to invent new disciplines but to apply the old ones harder, because the cost of skipping them has gone up.
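
“Smaller commits” can be enforced rather than requested. As a sketch, the check below sums the changed lines reported by `git diff --numstat` and compares the total with a budget. The 400-line budget is a hypothetical number, and the function takes the command’s text output as a string so it can be tried without a repository.

```python
MAX_CHANGED_LINES = 400  # hypothetical budget; pick one that fits your team


def changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for row in numstat_output.splitlines():
        parts = row.split("\t")
        if len(parts) != 3:
            continue
        added, deleted, _path = parts
        # Binary files are reported as "-" in both columns; skip them.
        if added == "-" or deleted == "-":
            continue
        total += int(added) + int(deleted)
    return total


def within_budget(numstat_output: str, budget: int = MAX_CHANGED_LINES) -> bool:
    """True when the change is small enough for a careful review."""
    return changed_lines(numstat_output) <= budget


# Made-up numstat output: two text files and one binary file.
sample_numstat = "120\t30\tsrc/cart.py\n5\t0\tREADME.md\n-\t-\tassets/logo.png\n"
```

In CI the input would come from something like `git diff --numstat main...HEAD`. The budget is not a law. It is a prompt to split the change, or to explain why it can’t be split.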

Make production more expensive. This sounds counterintuitive, but it works. Require a written spec before the code gets generated. Require a passing test before the pull request can be opened. Require the author to explain, in the commit message, why this change exists and what they verified. Each requirement adds friction to the cheap side of the trade, which rebalances the equation. Most of what gets called “AI governance” is really this category in disguise. The friction is the point, not the side effect.
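
The commit-message requirement is the easiest of these to automate. A minimal sketch, assuming a hypothetical team convention in which every commit message carries a `Why:` line and a `Verified:` line, each followed by some text. The section names are invented for illustration.

```python
import re

REQUIRED_SECTIONS = ("Why", "Verified")  # hypothetical team convention


def missing_sections(message: str) -> list[str]:
    """Return the required sections that are absent or left empty."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # A section counts only if its label is followed by text on the same line.
        pattern = re.compile(rf"^{name}:[ \t]*\S", re.MULTILINE)
        if not pattern.search(message):
            missing.append(name)
    return missing


good_message = (
    "Fix cart total for discounted items\n"
    "\n"
    "Why: totals ignored percentage discounts\n"
    "Verified: added a regression test and ran the suite locally\n"
)
bad_message = "Fix stuff\n\nWhy:\n"
```

Wired into a `commit-msg` hook or a CI step, a check like this can’t judge whether the explanation is any good. It only makes sure someone had to write one.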

Pretend the imbalance isn’t there. Ship more, hope for the best, and reach for another AI tool when the review queue gets too long. This is where most teams are right now. It’s also where the numbers from GitClear and DORA come from: code churn doubling, duplication outpacing refactoring for the first time, delivery stability dropping as adoption climbs. The mechanism is straightforward. When you respond to a verification problem by generating more code to verify, you make the problem worse. The AI doing the reviewing has the same blind spots as the AI doing the writing, and confidence goes up while correctness goes down.

The honest framing for an engineering leader is that the first two responses are work, and the third one is what happens when you don’t do the work. Buying another tool is not a strategy. Hiring more reviewers is not a strategy. The only durable response is to take verification seriously again, which means treating it as engineering rather than as something that happens at the end if there’s time.

What we’re actually dealing with

Every tool we’ve built in software engineering, from the compiler onward, has been a verification tool. The type checker, the test runner, the linter, the code review, the staging environment, the pull request itself. Each one made the question “is this right?” cheaper to answer than it was before.

AI is the first major tool in our history that does the opposite. It makes the question “can I produce something?” almost free, and leaves the question “is it right?” exactly where it was.

We’ve never had a tool that helps with writing but not with checking, at least not at this scale. The discipline of software engineering hasn’t caught up. The slop is the gap between what we can now produce and what we can still verify. Closing it is the work of the next few years, and the teams that take it seriously will be the ones still shipping cleanly when the rest are rewriting.