Merchants hundreds of years ago needed a reliable way to keep track of their business without second guessing every single ledger entry. They started using double entry accounting to force a strict mathematical balance into their daily record keeping. Every time money moved, the transaction had to be recorded twice, as an equal debit and credit across at least two separate accounts.

This structural redundancy meant the accounting system naturally caught bad math as well as simple typos. If a clerk recorded a cash deposit but forgot to log the corresponding revenue, the entire ledger immediately failed to balance. The total assets simply would not match the liabilities and equity at the end of the month.
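To make the balancing rule concrete, here is a minimal sketch in Python. The `Ledger` class, its method names, and the account names are all hypothetical and exist only for illustration. It uses a signed convention (debits positive, credits negative), so a healthy ledger always sums to zero.

```python
from collections import defaultdict
from decimal import Decimal


class Ledger:
    """Toy double entry ledger (hypothetical, for illustration only)."""

    def __init__(self):
        # account name -> running total of debits minus credits
        self.balances = defaultdict(Decimal)

    def post(self, debit_account, credit_account, amount):
        """Record one transaction twice: an equal debit and credit."""
        amount = Decimal(amount)
        self.balances[debit_account] += amount
        self.balances[credit_account] -= amount

    def post_one_sided(self, account, amount):
        """Simulate the clerk's mistake: a debit with no matching credit."""
        self.balances[account] += Decimal(amount)

    def is_balanced(self):
        """Debits and credits cancel out only if every entry was paired."""
        return sum(self.balances.values()) == 0


ledger = Ledger()
ledger.post("cash", "revenue", "100.00")
balanced_before = ledger.is_balanced()   # every entry so far is paired

ledger.post_one_sided("cash", "50.00")   # deposit logged, revenue forgotten
balanced_after = ledger.is_balanced()    # the missing credit breaks the sum
```

The check never needs to know which entry was wrong. It only needs the two sides to disagree, which is the whole point of the redundancy.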

The Double Keying Era

Early computing built a similar kind of manual redundancy into its keypunch workflows. Operators used a tedious process called two pass verification: one person punched data into cards, and another retyped the exact same information into a verifier machine. If the new keystrokes didn't perfectly match the punched holes, the machine locked up to flag the typo.
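The comparison a verifier performed can be approximated in a few lines of Python. This is only a software analogy of the mechanical process, and the function name `verify_second_pass` is made up for this post.

```python
def verify_second_pass(punched: str, rekeyed: str) -> list[int]:
    """Return the column positions where the second keying disagrees
    with what was originally punched (hypothetical helper)."""
    mismatches = [
        column
        for column, (first, second) in enumerate(zip(punched, rekeyed))
        if first != second
    ]
    # Columns present in only one of the two passes also count as mismatches.
    shorter, longer = sorted((len(punched), len(rekeyed)))
    mismatches.extend(range(shorter, longer))
    return mismatches


clean = verify_second_pass("INVOICE 1042", "INVOICE 1042")
transposed = verify_second_pass("INVOICE 1042", "INVOICE 1024")
```

A real verifier stopped at the first disagreement rather than collecting all of them, so treat the returned list as a convenience of the sketch.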

This brute force approach caught typos before they ruined expensive batch processing jobs on mainframes. As technology improved, though, the practice became a massive financial burden. The cost of all that manual labor eventually far outweighed the cost of computer time.

When interactive visual terminals arrived, operators could instantly spot and correct their own typing mistakes on a screen. Computers also gained the ability to run basic sanity checks on the data as it was entered. Paying two people to type the same document simply stopped making sense.

Balancing the Codebase

Software engineering has a long history of adopting this double entry mindset, in spirit if not in letter. We build structural redundancy directly into our daily workflows to minimize mistakes before they reach a user. Two common practices capture this philosophy in completely different ways.

Automated testing, with test driven development as its strictest form, serves as the most direct equivalent to the financial ledger. Developers rely on a mix of unit tests to verify individual components and integration tests to prove those pieces actually communicate properly. Each method is effective on its own, but the whole is greater than the sum of its parts: together they keep a balanced ledger between what the code actually does and what we expect it to do.

Pair programming tackles the verification problem from a completely human direction. Two developers share a single screen, with one person typing the code while the other continuously reviews the logic to spot potential flaws. Having two sets of eyeballs actively analyzing the exact same problem creates a live verification system that reduces the chance of an error slipping through.

Flipping the Hierarchy

This balancing act becomes critical as artificial intelligence takes over the coding process. AI models can generate thousands of lines of seemingly working logic in a matter of seconds. The sheer volume of output overwhelms traditional human workflows and exceeds our cognitive limits.

Relying on manual code reviews to catch subtle logical flaws in machine generated functions is no longer remotely sustainable. Moving forward, comprehensive behavioral coverage goes from being a luxury to an absolute necessity. We must strictly test actual outcomes and edge cases rather than just hitting an arbitrary line-execution metric.
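A small Python sketch shows the gap between line execution and behavior. The shipping rule here (orders of 5000 cents or more ship free, otherwise a flat 499 cent fee) is invented purely for the example, as are all of the names.

```python
FREE_SHIPPING_THRESHOLD_CENTS = 5000  # hypothetical business rule
FLAT_FEE_CENTS = 499                  # hypothetical business rule


def shipping_fee_buggy(order_total_cents: int) -> int:
    # Bug: the rule says "5000 or more", so this should be >=.
    if order_total_cents > FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_FEE_CENTS


def shipping_fee(order_total_cents: int) -> int:
    if order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_FEE_CENTS


def coverage_style_checks(fee) -> bool:
    """Two inputs that execute every line and both branches."""
    return fee(9000) == 0 and fee(1000) == FLAT_FEE_CENTS


def behavioral_checks(fee) -> bool:
    """The same checks plus the boundary the rule actually cares about."""
    return (
        coverage_style_checks(fee)
        and fee(5000) == 0
        and fee(4999) == FLAT_FEE_CENTS
    )


coverage_passes_on_bug = coverage_style_checks(shipping_fee_buggy)
behavior_passes_on_bug = behavioral_checks(shipping_fee_buggy)
behavior_passes_on_fix = behavioral_checks(shipping_fee)
```

The buggy version reaches full line and branch coverage with two inputs and still charges the wrong customer. Only the boundary assertions expose it.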

Historically, the application code was treated as the primary asset while tests were viewed as a secondary chore. AI turns the actual system code into a cheap and easily replaceable commodity. The test suite has taken its place as the most vital and permanent asset an engineering team owns.

Tests as the Source of Truth

The test suite is rapidly becoming the actual spec for the entire system. It holds the core business rules, defining exactly what the software needs to achieve in clear executable terms. Just as a product requirements document guides an early prototype, a solid test suite will be the foundation that massively accelerates AI driven development for production systems.

Just as with human developers, strict tests give the AI the confidence to safely make changes and refactor code without breaking things. Developers spend their time refining these strict assertions, leaving the actual system code as a temporary implementation detail.

We can recreate double entry verification, at its core at least, by splitting this workload across different systems. One AI model generates the core test suite based on human rules. An entirely different foundation model, ideally one that shares as few training blind spots as possible with the first, is then tasked with writing the code to satisfy those exact requirements.

This prevents a single system from hallucinating a bad solution and then conveniently writing a flawed test to validate its own mistake. The code running in production simply becomes the natural byproduct of a perfectly balanced testing ledger.
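Here is a rough Python sketch of that split. Nothing in it calls a real model API: `Generator`, `ledger_balances`, and the model names are hypothetical, and the two lambdas are stand-ins for whatever services would actually produce the text. The only rules it enforces are that the two writers differ and that the code must satisfy the independently written tests.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Generator:
    """Hypothetical wrapper around a text-producing model."""
    model_name: str
    produce: Callable[[str], str]  # prompt in, generated source text out


def ledger_balances(rules: str, test_writer: Generator, code_writer: Generator) -> bool:
    """Return True when the generated code satisfies the generated tests."""
    if test_writer.model_name == code_writer.model_name:
        raise ValueError("test writer and code writer must be different models")

    tests_src = test_writer.produce(rules)     # tests come from the human rules
    code_src = code_writer.produce(tests_src)  # code is written against the tests

    namespace: dict = {}
    try:
        # Assumption: a real pipeline would run this in a sandbox, not exec().
        exec(code_src, namespace)
        exec(tests_src, namespace)
    except AssertionError:
        return False
    return True


# Stub generators standing in for two different models.
test_stub = Generator(
    "model-a", lambda rules: "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
)
good_code_stub = Generator(
    "model-b", lambda tests: "def add(a, b):\n    return a + b\n"
)
bad_code_stub = Generator(
    "model-b", lambda tests: "def add(a, b):\n    return a - b\n"
)

accepted = ledger_balances("add two integers", test_stub, good_code_stub)
rejected = ledger_balances("add two integers", test_stub, bad_code_stub)
```

Comparing model names is a crude proxy for independence. Two differently named models can still share blind spots, so the human written rules remain the real anchor.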


If you've made it this far, thank you for reading! :)

I thought retiring in my mid 30s after a few exits would be fun but I've just been bored and a bit undersocialized without morning Slacks and emails to wake up to. If you’re building something interesting and could use an extra set of hands to ship, or just want to say hi, feel free to reach out. My inbox is open.