The Four Failures Your AI Coding Tool Won’t Tell You About

AI coding tools fail in four predictable ways: they replace your custom invariants with training-distribution averages (Prior Regression), miss the combinatorial interior of your test cases (Decision Table Blindspot), compound uncharacterized errors across sessions until bugs are no longer localizable (Generative Entropy), and pad output with agreeable noise instead of correct minimal answers (RLHF-induced verbosity). None of these are bugs. They are properties of the architecture.
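To make the Decision Table Blindspot concrete, here is a minimal sketch; the `shipping_fee` rule and its condition names are hypothetical, invented for illustration. A three-condition rule has eight rows in its decision table. Generated test suites tend to cover the all-true and all-false corners; the six interior rows, where conditions interact, are exactly what gets missed.

```python
from itertools import product

def shipping_fee(is_member: bool, is_rush: bool, over_threshold: bool) -> int:
    """Hypothetical business rule with a three-condition decision table."""
    if is_member and over_threshold:
        return 0
    if is_rush:
        return 15 if is_member else 25
    return 0 if over_threshold else 5

# The corner rows a generated test suite tends to cover.
corners = [(True, True, True), (False, False, False)]

# A full decision table enumerates every row, interior included.
table = list(product([True, False], repeat=3))
interior = [row for row in table if row not in corners]

for row in table:
    print(row, "->", shipping_fee(*row))

print(f"{len(interior)} of {len(table)} rows are interior cases")
```

The blindspot is not that any single row is hard; it is that the model samples the corners of the space and presents that as coverage.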

“We’ll fine-tune on more data” no longer works: the public corpus has been consumed. Further gains require synthetic data, which is the model’s own output reified — prior regression with extra steps.
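A toy simulation of that loop, under a deliberately crude assumption: the "model" is just a Gaussian fit, refit each generation on samples drawn from its own previous fit. The fitted spread tends to collapse toward the mean, and the tails, where rare idioms and custom invariants live, are the first casualties.

```python
import random
import statistics

# Generation 0 is the "real corpus"; every later generation trains on
# synthetic data, i.e. the previous model's own output reified.
random.seed(0)
mu, sigma = 0.0, 1.0
for generation in range(301):
    if generation % 50 == 0:
        print(f"gen {generation:3d}: mu={mu:+.4f} sigma={sigma:.4f}")
    samples = [random.gauss(mu, sigma) for _ in range(20)]
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
```

Watch sigma: refitting on your own samples is a random walk with a downward drift, which is prior regression with extra steps in miniature.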

The Euclidean-flat substrate of the transformer has no reference frame. Riemannian framings of embedding space are vocabulary, not architecture. For precision work — coding, compliance, domain-specific refactoring — a substrate without a reference frame cannot hold a custom invariant by construction.
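What holding a custom invariant looks like when the reference frame lives outside the model, as a minimal sketch; the rule enforced below is a hypothetical project convention, not a general best practice.

```python
import ast

# Hypothetical project-specific invariant: every public function must
# accept `ctx` as its first parameter.
RULE = "public functions take `ctx` as their first parameter"

def check_invariant(source: str) -> list[str]:
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            args = [a.arg for a in node.args.args]
            if not args or args[0] != "ctx":
                violations.append(f"line {node.lineno}: {node.name} violates: {RULE}")
    return violations

code = """
def fetch_user(ctx, user_id): ...
def fetch_order(order_id): ...  # training-prior signature, invariant broken
"""
print("\n".join(check_invariant(code)) or "invariant holds")
```

The checker is deterministic and external. The model can regress to the training-prior signature; the frame outside it is what catches the regression.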

The transaction is asymmetric: the user pays a subscription, accepts capped capacity, and contributes expert correction patterns to the vendor’s training corpus. The vendor sells the next model at the next price tier with the same four failure modes.

An empirical data point: two weeks and ~$400 at the frontier price tier (Claude Opus 4.7) failed to refactor a senior-level codebase with its invariants intact. A smaller model (Haiku/Qwen-class), under precise task specification and manual architectural direction, succeeded. This is consistent with the Honest Refusal finding in the SMRA benchmark: smaller models under tight specification outperform larger models on precision tasks.

The comparison that matters is not “frontier vs. smaller model”. It is “AI coding assistant vs. Visual Studio 2019 Pro”. The 2019 tooling did less, more reliably, and did not regress to a training prior.
