HN Reader

Show HN: TheAuditor v2.0 – A “Flight Computer” for AI Coding Agents

I’m a former Systems Architect (Cisco/VMware) turned builder in Thailand. TheAuditor v2.0 is a complete architectural rewrite (800+ commits) of the prototype I posted three months ago.

The "A-ha" moment for me didn't come from a success; it came from a massive failure. I was trying to use AI to refactor a complex schema change (a foundation change from "Products" to "ProductsVariants"), and due to the scope of it, it failed spectacularly. I realized two things:

* Context Collapse: The AI couldn't keep enough files in its context window to understand the full scope of the refactor, so it started hallucinating, "fixing" superficial issues. If I kept pressing it, it would literally panic and make up problems "so it could fix them," which only resulted in the situation going into a death spiral. That’s the villain origin story of this tool. :D * Stale Knowledge: It kept trying to implement Node 16 patterns in a Node 22 project, or defaulting to obsolete libraries (like glob v7 instead of v11) because its training data was stale.

I realized that AI agents are phenomenal at outputting working code, but they have zero understanding of it. They optimize for "making it run at any cost"—often by introducing security holes or technical debt just to bypass an error. This is a funny paradox because when "cornered/forced" to use cutting-edge versions, syntax, and best practices, it has zero issue executing or coding it. However, it’s so hilariously unaware of its surroundings that it will do anything else unless explicitly babysat.

I built v2 to be the "Sanity Check" that solves a lot of these issues, and it aims to continue solving more of the same and similar issues I face. Instead of letting the AI guess, TheAuditor indexes the entire codebase into a local SQLite Graph Database. This gives the AI a queryable map of reality, allowing it to verify dependencies and imports without needing to load "all" files into context.

A/B Demo: https://www.youtube.com/watch?v=512uqMaZlTg As seen in the demo video, instead of trying to read 10+ full files and/or grepping to make up for the hallucinations, it can now run "aud explain" and get 500 lines of deterministic "facts only" information. It gets just what it needs to see versus reading 10+ files, trying to keep them in context, finding what it was looking for, and trying to remember why it was looking to begin with.

I also learned that regex/string/heuristics don't scale at all and are painfully slow (hours vs minutes). I tried the regex-based rules/parsers approach, but they kept failing silently on complex files and suffered constant limitations (the worst offender was having to read all files per set of rules). I scrapped that approach and built a "Triple-Entry Fidelity" system. Now, the tool acts like a ledger: the parser emits a manifest, the DB emits a receipt. If they don't match, the system crashes intentionally.

It’s no longer just a scanner; it’s a guardrail. In my daily workflow, I don't let the AI write a line of code until the AI (my choice just happens to be CC/Codex) has run a pre-investigation for whatever problem statement I'm facing at the moment. This ensures it's anchored in facts and not inference assumptions or, worse, hallucinations.

With that said, my tool isn't perfect. To support it all, I had to build a pseudo-compiler for Python/JS/TS, and that means preparing extractors for every framework, every syntax—everything, really. Sometimes I don't get it right, and sometimes I just won't have had enough time to build it out to support everything.

So, my recommendation is to integrate the tool WITH your AI agent of choice rather than seeing it as a tool for you, the human. I like to use the tool as a "confirm or deny," where the AI runs the tool, verifies in source code, and presents a pre-implementation audit. Based on that audit, I will create an "aud planning."

Some of the major milestones in v2.0

* Hybrid Taint: I extended the Oracle Labs IFDS research to track data flow across microservice boundaries (e.g., React fetch → Express middleware → Controller).

* Triple-Entry Fidelity: This works across every layer (Indexer -> Extractor -> Parser -> Storage). Every step has fidelity checks working in unison. If there is silent data loss anywhere in the pipeline, the tool crashes intentionally.

* Graph DB: Moved from file-based parsing to a SQLite Graph Database to handle complex relationships that regex missed.

* Scope: Added support for Rust, Go, Bash, AWS CDK, and Terraform (v1 was Python/JS only).

* Agent Capabilities: Added Planning and Refactor engines, allowing AI agents to not just scan code but safely plan and execute architectural changes

Love to see people leveraging static analysis for AI agents. Similar to what we're doing in Brokk but we're more tightly coupled to our own harness. (https://brokk.ai/) Would love to compare notes; if you're interested, hmu at [username]@brokk.ai.

Quick comparison: Auditor does framework-specific stuff that Brokk does not, but Brokk is significantly faster (~1M loc per minute).

1 month agoby jbellis

Happy to inform Ive just created my first pip package to make it bit easier to install :D https://pypi.org/project/theauditor/

1 month agoby ThailandJohn

Cool! I've been playing with the same code -> graph concept for LLM work. Why did you decide to go for a pseudo-compiler with a ton of custom rules rather than try to interact with the AST itself?

1 month agoby digdugdirk

Lots of formal methods and verification submissions this week!

1 month agoby esafak

Looks neat. Can't use it due to the license.

1 month agoby butterisgood

Great idea!

Did you consider using treesitter instead of the pseudo compiler?

1 month agoby ozozozd