
AI-Proofing Theorem: How a $6M Investment is Combatting Bugs in AI-Written Code

Published January 28, 2026

Theorem wants to stop AI-written bugs before they ship — and just raised $6M to do it

AI-Powered Software Development: The Next Frontier in Verification

As artificial intelligence reshapes software development, a young startup is betting on a different part of the industry’s future. Its wager is that the next major bottleneck in software development will not be writing code but trusting it.

Theorem, a San Francisco-based company that emerged from Y Combinator’s Spring 2025 batch, has raised $6 million in seed funding to develop automated tools that verify the correctness of AI-generated software. Khosla Ventures led the round, with participation from Y Combinator, e14, SAIF, and Halcyon, alongside angel investors such as Blake Borgesson and Arthur Breitman.

This investment comes at a crucial juncture as AI-driven coding assistants, including those from GitHub, Amazon, and Google, are churning out billions of lines of code annually. While enterprise adoption of AI in software development is on the rise, the ability to ensure the correctness of AI-generated code has not kept pace. This oversight gap poses a significant threat to critical infrastructure, from financial systems to power grids.

Jason Gross, co-founder of Theorem, points out that the speed at which AI-generated code is being produced far exceeds human review capacity. The challenge lies in verifying that this code functions as intended, highlighting the pressing need for automated verification tools.

The Marriage of Formal Verification and AI

Theorem’s foundational technology combines formal verification, a mathematical method for proving that software behaves exactly as specified, with AI models trained to generate and validate proofs automatically. This approach cuts a process that historically demanded years of PhD-level expertise down to weeks or even days.
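To make the idea of "proving that software behaves as specified" concrete, here is a minimal sketch in Lean 4, one of the proof assistants mentioned later in this article. The function `double` and the theorem `double_eq` are hypothetical names chosen for illustration and have nothing to do with Theorem’s own code. The sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Hypothetical implementation under verification.
def double (n : Nat) : Nat := n + n

-- Specification: for every natural number n, `double n` equals `2 * n`.
-- The proof is checked by Lean's kernel, so it covers all inputs,
-- not just the ones a test suite happens to try.
theorem double_eq (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

Unlike a unit test, which samples a handful of inputs, the theorem is a statement about every `n`. Real systems need far larger specifications and proofs, which is the cost that automated proof generation aims to reduce.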

While formal verification has been around for decades, it was primarily reserved for mission-critical applications like avionics systems and cryptographic protocols due to its high cost. Gross, drawing from his experience in verifying cryptography code at MIT, emphasizes the significant shift brought by AI in making this process more accessible and cost-effective.

Enhancing Software Oversight with Fractional Proof Decomposition

Theorem’s system operates on a principle that Gross calls “fractional proof decomposition.” Instead of exhaustively testing every potential behavior, which is often impractical for complex software, the technology allocates verification resources based on the importance of each code component.
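The article does not describe how this allocation is computed, so the following Python sketch is only one plausible reading: a fixed verification budget split in proportion to a per-component importance score. The `Component` class, the scores, the component names, and `allocate_budget` are all hypothetical and are not Theorem’s method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    importance: float  # hypothetical criticality score; higher means more critical


def allocate_budget(components, total_hours):
    """Split a verification budget in proportion to each component's importance."""
    total = sum(c.importance for c in components)
    if total <= 0:
        raise ValueError("at least one component needs positive importance")
    return {c.name: total_hours * c.importance / total for c in components}


# Assumed example system: the ledger matters most, the UI theme least.
components = [
    Component("payment_ledger", 6.0),
    Component("auth", 3.0),
    Component("ui_theme", 1.0),
]
budget = allocate_budget(components, total_hours=100.0)

for name, hours in budget.items():
    print(f"{name}: {hours:.1f} hours")
```

Under these assumed scores, the ledger receives most of the budget while the cosmetic component receives very little, which is the general trade-off the paragraph above describes.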

This method recently proved its worth by identifying a bug that had evaded detection during testing at Anthropic, the company behind the Claude chatbot. By letting developers catch bugs without excessive computational resources, Theorem’s approach offers a practical route to software validation.

In a demonstration named SFBench, Theorem used AI to translate 1,276 problems from Rocq to Lean and prove the translations equivalent, a feat that would otherwise have required extensive human labor. The company’s architecture also handles interdependent code, a challenge for conventional AI coding agents limited by context windows.

Real-World Application of Theorem’s Technology

Collaborating with clients in AI research, electronic design automation, and GPU-accelerated computing, Theorem has showcased the tangible benefits of its technology. One compelling case study involved transforming a 1,500-page specification into 16,000 lines of reliable code, addressing performance issues without manual review.

By generating production-grade code based on a concise executable specification and conducting rigorous equivalence checks, Theorem empowered a customer to enhance their system’s performance significantly while ensuring error-free deployment.
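The workflow above pairs a concise executable specification with an optimized implementation and checks that the two agree. The Python sketch below illustrates that pattern on a toy problem. All function names are hypothetical, and the check is a bounded exhaustive comparison rather than a formal proof: it only covers the inputs it enumerates, whereas the formal equivalence checking described in this article aims to cover all inputs.

```python
def spec_popcount(x: int) -> int:
    """Executable specification: count set bits in the most obvious way."""
    return bin(x).count("1")


def fast_popcount(x: int) -> int:
    """Optimized candidate: repeatedly clear the lowest set bit (Kernighan's method)."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


def buggy_popcount(x: int) -> int:
    """Deliberately wrong candidate: silently ignores everything above the low 8 bits."""
    return fast_popcount(x & 0xFF)


def find_counterexample(spec, impl, inputs):
    """Return the first input where impl disagrees with spec, or None if they agree."""
    for value in inputs:
        if spec(value) != impl(value):
            return value
    return None


domain = range(2**12)  # bounded domain, chosen arbitrarily for this sketch
good_result = find_counterexample(spec_popcount, fast_popcount, domain)
bad_result = find_counterexample(spec_popcount, buggy_popcount, domain)

print("fast_popcount counterexample:", good_result)
print("buggy_popcount counterexample:", bad_result)
```

The specification is slow but easy to audit, and the implementation is free to be clever as long as no counterexample exists. A bug like the one in `buggy_popcount` would pass any test suite that only tried small inputs, which is why exhaustive or proof-based checks matter.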

Safeguarding Critical Infrastructure from AI-Generated Software Risks

Amid growing concerns about the reliability of AI systems embedded in critical infrastructure, Theorem’s funding announcement underscores the urgency of ensuring software security. With AI systems evolving rapidly and the potential for subtle bugs to cause significant disruptions, the focus on verification becomes paramount.

Gross emphasizes the need for “asymmetric defense” in software security, highlighting the importance of scalable protection mechanisms that can withstand the evolving landscape of AI hacking. As AI continues to reshape software development, the adoption of formal verification becomes a critical safeguard against vulnerabilities.

Theorem’s Unique Position in the AI Code Verification Landscape

Amidst a burgeoning market of AI and formal verification startups, Theorem distinguishes itself through its dedicated focus on scaling software oversight. Rather than limiting its scope to specific domains, the company’s tools cater to systems engineering teams requiring meticulous correctness guarantees before implementing changes.

The founding team’s expertise reflects this technical orientation, with Gross specializing in programming language theory and deploying verified code at scale. Co-founder Rajashree Agrawal, a machine learning research engineer, focuses on training AI models crucial to the verification pipeline.

Paving the Way for Secure AI-Powered Systems

Theorem intends to leverage the recent funding to expand its team, enhance compute resources for training verification models, and explore new sectors like robotics, renewable energy, cryptocurrency, and drug synthesis. Despite its small team, the startup’s trajectory signals a shift in the evaluation criteria for AI coding tools.

While the initial wave of AI-assisted development prioritized speed and efficiency, Theorem’s focus on verification underscores the importance of ensuring safety in software evolution. As AI systems advance exponentially, the imperative for rigorous oversight to prevent uncontrolled deployment becomes increasingly clear.

Gross envisions a future where superhuman software engineering is a reality, necessitating a paradigm shift in the economics of oversight. With the onus on verifying AI-generated code before it assumes control, Theorem emerges as a pivotal player in shaping the future of secure and reliable software development.

The era of machines writing code is already here, underscoring the critical role of meticulous verification in upholding software integrity and functionality.