The Unexamined Code Is Not Worth Shipping

Socrates famously said “The unexamined life is not worth living.” In the era of AI, we might also say “The unexamined code is not worth shipping.”

I used to tell my team that writing a CGATS file parser is a rite of passage for a junior color code jockey. CGATS is a venerable human-readable text file format, generally used to carry the measured characterization data needed to validate and profile color input and output devices like printers and monitors. The thing about the ANSI CGATS.5 format (and later CGATS.11, and then part of ISO 13655) is that it’s deliciously open, both in what’s allowed and in what has been implemented in the past, so writing a CGATS file parser is always an exercise in many if..then..elseif’s.
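To give a feel for the shape of the problem, here is a minimal sketch of such a parser. It assumes only the most common layout: a BEGIN_DATA_FORMAT block naming the fields, a BEGIN_DATA block of whitespace-separated rows, and # for comment lines. The function name parseCgats and the sample file are made up for illustration, and real-world files need many more branches than this (quoted strings containing spaces, multiple data tables, header keywords, and so on).

```javascript
// Minimal CGATS-style parser sketch. Handles only one layout:
// field names between BEGIN_DATA_FORMAT/END_DATA_FORMAT and
// whitespace-separated rows between BEGIN_DATA/END_DATA.
function parseCgats(text) {
  const fields = [];
  const rows = [];
  let section = "header";

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment

    if (line === "BEGIN_DATA_FORMAT") { section = "format"; continue; }
    if (line === "END_DATA_FORMAT") { section = "header"; continue; }
    if (line === "BEGIN_DATA") { section = "data"; continue; }
    if (line === "END_DATA") { section = "header"; continue; }

    if (section === "format") {
      fields.push(...line.split(/\s+/));
    } else if (section === "data") {
      const values = line.split(/\s+/);
      const row = {};
      fields.forEach((name, i) => {
        const num = Number(values[i]);
        // Keep non-numeric values (for example sample IDs) as strings.
        row[name] = Number.isNaN(num) ? values[i] : num;
      });
      rows.push(row);
    }
    // Header keywords such as ORIGINATOR or NUMBER_OF_SETS are ignored here.
  }
  return { fields, rows };
}

// Made-up sample data, not taken from any real measurement file.
const sample = [
  'ORIGINATOR "made-up example"',
  "NUMBER_OF_FIELDS 4",
  "BEGIN_DATA_FORMAT",
  "SAMPLE_ID LAB_L LAB_A LAB_B",
  "END_DATA_FORMAT",
  "NUMBER_OF_SETS 2",
  "BEGIN_DATA",
  "A1 95.2 -0.4 1.8",
  "A2 50.1 62.3 41.0",
  "END_DATA",
].join("\n");

const parsed = parseCgats(sample);
console.log(parsed.fields);
console.log(parsed.rows);
```

Every "but what if the file does X instead" adds another branch to that loop, which is where the if..then..elseif’s come from.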

CGATS parsing apps are also exemplars of feature creep temptation, as there are so many things one wants to be able to do once one has the data in hand — outlier removal, comparison via various ΔE metrics, 3D plotting, data aggregation, and so on. It’s really easy to geek out.

As I didn’t take any code with me from my last job, and I was feeling too ornery to buy a personal copy of ColorThink, I decided to whip out Claude and hack something together. Initial success was fantastic. With 15 minutes’ work, I got a nice-looking Node.js app with tables and data filtering, along with drag-and-drop file selection mechanics and easy 3D Plotly interactivity.

Then I added ΔE calculations based on a sticky dropdown selector from elsewhere in the UI. At first I was impressed that Claude had the moxie to go searching through the Internet looking for specific math formulae, but…

Boom.

All of a sudden, file loading stopped working and everything broke. What on Earth did finding all the matching device colorant lines and calculating ΔE have to do with file load?

After a couple of hours of fruitless manual debugging (the usual: inspection, diff, walk-backs, clearing the conversation, etc.), I finally swallowed my pride and asked Claude itself what was wrong, using an explicitly stated query. The problem turned out to be incorrectly generated Claude code: a power (**2) expression with a minus sign directly in front of the base and no parentheses around it, which JavaScript rejects as a syntax error. Because a syntax error stops the whole script block from parsing, the broken math took down everything else in the block, file loading included. Once the math was fixed, everything was happy again.
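For anyone who hasn’t run into this rule, here is a small sketch of it. The snippets are my own reconstruction, not the code Claude generated, and the parses helper and the fileLoaderReady flag are hypothetical names. The sketch uses new Function purely as a way to ask the engine whether a piece of source text parses.

```javascript
// Returns true if the source text parses as a JavaScript function body.
// new Function() only compiles the text here; the body is never executed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// A unary minus directly in front of the base of ** is ambiguous,
// so the language rejects it outright.
const broken = "const da = 3; return -da ** 2;";

// Either set of parentheses resolves the ambiguity.
const negateBase = "const da = 3; return (-da) ** 2;";
const negateResult = "const da = 3; return -(da ** 2);";

console.log(parses(broken));              // false
console.log(parses(negateBase));          // true
console.log(new Function(negateBase)());  // 9
console.log(new Function(negateResult)()); // -9

// The bad expression also stops unrelated code in the same block:
// parsing fails before the first statement ever gets a chance to run.
const wholeBlock = "globalThis.fileLoaderReady = true; const x = -2 ** 2;";
try {
  new Function(wholeBlock)();
} catch (err) {
  console.log(err.name); // SyntaxError
}
console.log(globalThis.fileLoaderReady); // undefined
```

That last part is the answer to the "what does ΔE have to do with file load" question: nothing, except that they lived in the same script block.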

All this even though I had added a rule about keeping changes limited to the one source file, as part of normal AI coding best practices.

So overall a positive experience, but definitely a cautionary tale about AI code. In the normal course of development, if writing from scratch, the issue with the math wouldn’t have cropped up in the first place, as it’s the sort of thing that is obvious while you are writing it but subtle once it is buried in finished code. In this case, Claude just dumped a pile of inline code all at once as part of the single bad step. It was clear that the new code had caused the problem, but nothing in it indicated what had been built in what order, and there was lots of it, all at once.

Generalizing from this experience, AI coding tools behave like junior coders. They write a whole pile of stuff, commit, and then wonder why the code doesn’t work. Senior developers build up more carefully, and test individual pieces according to criticality and complexity. This helps with ensuring both that the code works right when first delivered and that unit tests and testing in general are easier to achieve. Side effects should be more limited and hence easier to fix if you architect your code right in the first place.

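With AI coding, this takes a slightly different approach than with traditional coding: the code arrives in large batches rather than growing piece by piece, so the checking has to happen after the fact. So build testing into your pipeline, and examine your code.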

Category: Software Development
