There was a seismic shift in the AI world recently. In case you didn’t know, a Claude Code update was released just before the Christmas break. It could code astonishingly well and had a bigger context window, which is sort of like memory and attention span. Scott Cunningham wrote a series of posts demonstrating the power of Claude Code in ways that made economists take notice. Then ChatGPT Codex was updated and released in January, as if to say ‘we are still on the frontier’. The battle between Claude Code and Codex is active as we speak.
The differentiation is becoming clearer, depending on who you talk to. Claude Code feels architectural. It designs a project or system and thrives when you hand it the blueprint and say “Design this properly.” It’s your amazingly productive partner. Codex feels like it’s for the specialist. You tell it exactly what you want. No fluff. No ornamental abstraction unless you request it.
Codex flourishes with prompts like “Refactor this function to eliminate recursion” or “Take this response data and apply the Bayesian Dawid-Skene method”. It does exactly that. It assumes competence on your part and does not attempt to decorate the output. It assumes that you know what you’re doing. It’s like an RA who can do amazing things if you tell it exactly what task you want completed. Having said all of this, I’ve heard the inverse evaluations too. It probably matters a lot what the programmer brings to the table.
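For the curious, here is roughly what that second prompt amounts to. This is a minimal sketch of the classic Dawid-Skene estimator fit by EM, the non-Bayesian ancestor of the Bayesian version named above (the Bayesian variant adds priors over the same quantities). The data layout, function name, and defaults are my own assumptions for illustration, not anything from the post.

```python
# Minimal Dawid-Skene EM sketch (assumed data format, illustrative only).
# labels[i, j] is annotator j's label (0..n_classes-1) for item i, or -1 if missing.
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50, eps=1e-9):
    n_items, n_annot = labels.shape
    observed = labels >= 0

    # Initialize each item's class posterior with its vote shares.
    post = np.zeros((n_items, n_classes))
    for i in range(n_items):
        for j in range(n_annot):
            if observed[i, j]:
                post[i, labels[i, j]] += 1.0
    post /= post.sum(axis=1, keepdims=True) + eps

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator.
        priors = post.mean(axis=0)
        confusion = np.full((n_annot, n_classes, n_classes), eps)
        for j in range(n_annot):
            for i in range(n_items):
                if observed[i, j]:
                    confusion[j, :, labels[i, j]] += post[i]
        confusion /= confusion.sum(axis=2, keepdims=True)

        # E-step: posterior over each item's true class under current parameters.
        log_post = np.tile(np.log(priors + eps), (n_items, 1))
        for i in range(n_items):
            for j in range(n_annot):
                if observed[i, j]:
                    log_post[i] += np.log(confusion[j, :, labels[i, j]])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, confusion  # item-level class posteriors, per-annotator confusion matrices
```

Called on a small matrix such as np.array([[0, 0, 1], [1, 1, 1], [0, -1, 0]]) with n_classes=2, it returns a posterior over each row’s true label plus an estimated confusion matrix for each annotator. Fittingly for the rest of this post, the entire point of the model is to infer an unobserved truth from noisy observations.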
Both Claude Code and Codex are remarkably adept at catching code and syntax errors. That is not mysterious. Code is valid or invalid. The AI writes something, and the environment immediately reveals whether it conforms to the rules. Truth is embedded in the logical structure. When a single error appears, correction is often trivial.
When multiple errors appear, the problem becomes combinatorial. Fix A? Fix B? Change the type? Modify the loop? There are potentially infinite branching possibilities. Even then, the space is constrained: the code either runs, errors out, or times out, and each outcome is observable. That constraint disciplines the search. The reason these models code so well is that the code itself is the truth. So long as the logic isn’t violated, the axioms lead to the result. The AI anchors on the code’s demand for internal consistency. The model can triangulate because the target is stable and verifiable.
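To make that concrete, here is a toy sketch (my own example, not from the post) of why the anchor is so strong: the runtime itself delivers the verdict, immediately and reproducibly.

```python
# A toy illustration of why code gives the model an anchor: the interpreter is
# the ground truth. Any error is reported immediately and exactly, so a repair
# can be checked simply by running the code again.
def mean(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)   # raises ZeroDivisionError on an empty list

try:
    mean([])
except ZeroDivisionError as e:
    # The failure is explicit and reproducible: there is no ambiguity about
    # whether a fix (say, returning None for empty input) actually worked.
    print("caught:", e)
```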
AI struggles when the anchor disappears
Consider OCR (optical character recognition) of historical documents. My work involves 19th-century census forms: row numbers, names, occupations, disability markers. OCR is our attempt to tag characters or words based on their visual shape. But the shapes are imperfect: smudges, dark spots, ink bleed, faded paper, unusual fonts, cropped scans, resolution noise… These are all ubiquitous and inconveniently random. A “3” can look like an “8” and a stray mark can resemble a “1.” The OCR system produces estimated text. The AI sees only those estimates. It sees the page, but it does not see the underlying true words on the page. That’s by design! If it could read the true text directly, there would be no need for OCR!
If AI doesn’t know the truth, then it cannot know whether it performed well.
Take something simple: the row numbers 1–40 on a historical census form. As humans, we glance at the column and clearly see the numbers one through forty. To us, it feels like direct observation. But the OCR engine is inferring. It might output 1, 2, 3, 8, 5, 6. It might skip a row or duplicate a value. The AI tuning the parameters does not know which entries are wrong. It sees only outputs.
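Here is a tiny sketch of that asymmetry, with made-up values: when the true sequence is supplied, scoring the OCR output is a one-liner; when it is not, the same output is just a list of numbers, and nothing in the list says which entries are wrong.

```python
# Illustrative values only: a census column of row numbers 1..40, with one
# misread ("4" read as "8").
truth = list(range(1, 41))
ocr_output = [1, 2, 3, 8, 5, 6] + list(range(7, 41))

# Supervised evaluation: trivial once the answers are supplied.
errors = sum(t != o for t, o in zip(truth, ocr_output))
print(f"error rate: {errors / len(truth):.1%}")   # prints: error rate: 2.5%

# Unsupervised "evaluation": there is no function of ocr_output alone that
# recovers that number. The anchor is the truth, not the output.
```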
How can AI improve OCR accuracy? What parameter does it change? The possible tweaks are effectively infinite. Without ground truth, there is no anchor. When the search space becomes unbounded and the exceptions become legion, models hesitate. Infinity is not tractable.
We can provide the truth. We can say, “The correct row numbers are 1 through 40.” But then we are supervising the OCR. And if we must supply the answers, we undercut the premise of automation. We can describe the structure: “The numbers are sequential and evenly spaced.” That’s true to our eye, but it’s not literally true down to the pixel.
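Here is roughly what that structural hint buys, sketched with invented values: a consistency check that can flag suspicious rows, but not an accuracy score, because it cannot say which neighbor of a broken step is the wrong one.

```python
# A soft consistency check from the structural prior "row numbers increase by
# one". It flags suspects but cannot adjudicate them, and a pair of offsetting
# errors would pass silently.
def flag_suspects(rows):
    suspects = set()
    for k in range(1, len(rows)):
        if rows[k] - rows[k - 1] != 1:      # expected: consecutive integers
            suspects.update({k - 1, k})     # either neighbor could be wrong
    return sorted(suspects)

ocr_output = [1, 2, 3, 8, 5, 6, 7, 8, 9, 10]   # a "4" misread as "8"
print(flag_suspects(ocr_output))                # prints [2, 3, 4]: positions, not answers
```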
Humans are not as different as we think
When we look at an image or a list of numbers, we feel as though we are seeing something objectively true. In reality, we are inferring the state of the world conditional on our perception and context. We filter noise, interpolate, and impose order to understand what we see. AI does the same thing. The difference is that, in typical coding tasks, it has a highly confident anchor. The syntax rules are stable and errors aren’t debatable. In ambiguous visual or social environments, that anchor disappears. The model struggles to know what is correct and what is merely plausible.
This is the frontier. AI excels where truth is explicit. A calculation on known values is, in the end, an elaborate tautology: the inputs and the rules already determine the answer. It struggles where ambiguity dominates and where the ground truth is hidden behind noisy perception. Given that human life is saturated with social ambiguity like tone, implication, and uncertainty, we retain an advantage for now.
I do not expect that advantage to persist indefinitely. Vision models are improving. The dividing line between humans and AI is still visible. When there is a clear anchor to truth, AI dominates. When truth must be inferred without a human oracle, its performance degrades.
This is where the performance inequality lies: between our skills, constantly honed by a lifetime of unclear facts, and the AI’s skill at logic, which requires true premises.