You Can’t Patch Your Way Out of This

Just last month, Mozilla used Claude Mythos and other AI models to find and fix 423 security bugs in Firefox – more than in the previous 15 months combined. That may sound like great news, but it should terrify you.

This explosion of bug-finding is happening because AI models suddenly got much better at finding existing vulnerabilities. And while the Mozilla team is using their powers for good (patching their system to make it safer), those same capabilities have nightmarish potential in the hands of bad actors. Systems that once looked secure can now be torn apart by automated attackers capable of discovering exploits at a scale no human team could match.

So what happens next? Mozilla patches 450 more bugs next month? Are we entering a world where cybersecurity is just an arms race between whoever has the strongest vulnerability-finding model?

Because if that’s the game, defenders lose.

Remember: attackers only need to find one good bug. Defenders need to find all of them.

Cyberwarfare has always been asymmetric. Attackers move fast, make mistakes, and still succeed. Defenders, meanwhile, are tasked with securing sprawling, deeply complex systems with high accuracy – and somehow, until now, they have (just barely) managed to hold the line.

AI is making this imbalance dramatically worse. The barriers to launching sophisticated cyberattacks are collapsing. Exploits are becoming cheaper, faster, and more scalable. And AI infrastructure itself – increasingly embedded into the systems that power hospitals, energy grids, financial networks, and national defense – introduces entirely new and poorly understood attack surfaces.

This is already beginning to happen. Just this month, attackers used Claude and GPT models during an attempted intrusion into water infrastructure serving the Monterrey metropolitan area, home to more than five million people.

We can’t patch our way out of this.

The solution is not finding bugs faster. It’s building systems where we can mathematically prove critical software is correct – and therefore resistant to entire classes of attacks – before it ever reaches production.

That’s what AI-enabled formal verification makes possible.

Formal Verification

Instead of endlessly finding and patching bugs after deployment, formal verification (FV) allows us to prove that critical components – like memory isolation, access control systems, and cryptographic boundaries – behave exactly as intended.

This changes the game entirely. The goal is no longer just to patch vulnerabilities faster than attackers can discover them. Proving a system correct becomes the last patch you need, reliably eliminating entire classes of vulnerabilities for good.
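
To make that concrete, here is a deliberately tiny sketch of what proving a component correct looks like in practice. The access-control model and `grant` function below are hypothetical, written in Lean 4; the point is that the `no_escalation` property is established by a proof the Lean kernel checks, not by a test suite:

```lean
-- A toy access-control model (hypothetical, for illustration only).
structure Request where
  clearance : Nat       -- the subject's clearance level
  classification : Nat  -- the resource's classification level

-- Grant access only when clearance covers the classification.
def grant (r : Request) : Bool :=
  decide (r.classification ≤ r.clearance)

-- The security property, proved once and for all: a granted request
-- can never exceed the subject's clearance. If this proof were wrong,
-- the file simply would not compile.
theorem no_escalation (r : Request) (h : grant r = true) :
    r.classification ≤ r.clearance :=
  of_decide_eq_true h
```

Real components and properties are far richer than this, but the shape is the same: once the proof compiles, the entire class of policy-bypass bugs for that component is ruled out, not merely untested.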

The cybersecurity world has known about formal verification for decades. The real challenge is whether it can scale to today’s threat landscape. Constructing formal proofs is extraordinarily difficult, even for experts, often requiring immense time, specialized knowledge, and painstaking manual effort. The cost quickly becomes prohibitive.

Today’s formal verification ecosystem is filled with remarkable breakthroughs and powerful tools. However, it remains fragmented and difficult to deploy at the scale this moment demands. In other words, we’ve had the foundations for provably correct software all along. We just haven’t had a practical way to apply them broadly… until now.

Scaling Formal Verification with AI 

Just as an antidote is often derived from the poison it cures, artificial intelligence – the same technology accelerating the cyber threat landscape – may also be the very thing that finally makes formal verification practical at scale.

Formal verification fundamentally depends on constructing proofs. Fortunately, modern models are becoming remarkably capable at exactly this kind of mathematical reasoning. In the past year alone, AI-assisted systems have helped solve long-standing Erdős problems (e.g., problems #728, #281, #1196) and autoformalize modern Fields Medal–winning proofs in Lean – automatically filling gaps and translating human reasoning into machine-checked logic.
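
The results above don’t fit in a blog post, but a toy example shows what “machine-checked logic” means in practice. The informal claim in the comment below is our own stand-in for the kind of statement autoformalization targets (not one of the cited proofs); Lean’s kernel verifies every step mechanically:

```lean
-- Informal claim: "the sum of two even numbers is even."
-- Formalized: evenness as divisibility by 2, with a proof the
-- kernel checks step by step – no room for hand-waving.
theorem even_add_even (m n : Nat) (hm : 2 ∣ m) (hn : 2 ∣ n) :
    2 ∣ (m + n) := by
  cases hm with
  | intro a ha =>
    cases hn with
    | intro b hb =>
      -- Witness: m + n = 2 * (a + b).
      exact ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩
```

A proof like this either compiles or it doesn’t. Scaled up, that binary signal is what lets AI-generated mathematics be trusted without trusting the AI.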

In other words, AI can automate the most difficult and time-consuming parts of formal verification – making the process dramatically faster, cheaper, and easier for non-experts to use.

The implication is profound: If AI can automate theorem proving and software verification, then FV can scale. And if FV can scale, then security may no longer depend on racing to patch the next vulnerability, but on proving the vulnerabilities cannot exist in the first place.

The Three Pillars

To make provably secure systems practical at scale, we believe the community must come together around three core pillars, and we are committed to helping move this work forward:

  1. Measure: To understand what is working (and what isn’t), we first need to evaluate AI systems on formal verification tasks. Our first step is to build benchmarks and metrics that quantify the quality and capabilities of AI-generated proofs over time; a minimal sketch of what such metrics might look like follows this list. These signals will help us identify which domains improve through scale alone, and which require new scaffolding or intermediate training signals.
  2. Hone: Next, we must use AI to accelerate the development of formal methods itself, building on decades of breakthroughs and today’s already-powerful tools to create a more scalable and unified verification ecosystem where formal guarantees can extend across languages, frameworks, and infrastructure layers.
  3. Protect: Finally, we will use our AI-enabled FV Toolchain to secure real-world critical systems, beginning with those where formal methods’ mathematical guarantees of correctness will deliver the most immediate and concrete value. This includes applying formal guarantees to critical AI infrastructure.
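
As a first concrete step on the Measure pillar, here is a minimal sketch of what such metrics might look like, in Python. The ProofAttempt schema and field names are hypothetical (ours, not an existing benchmark’s); the essential idea is that the proof assistant’s kernel supplies a binary, unfakeable ground truth to aggregate over:

```python
from dataclasses import dataclass

@dataclass
class ProofAttempt:
    """One AI attempt at one benchmark theorem (hypothetical schema)."""
    theorem_id: str
    kernel_accepted: bool  # did the proof checker's kernel accept it?
    wall_clock_s: float    # time spent generating and checking the proof

def pass_rate(attempts: list[ProofAttempt]) -> float:
    """Fraction of distinct theorems with at least one accepted proof."""
    attempted = {a.theorem_id for a in attempts}
    solved = {a.theorem_id for a in attempts if a.kernel_accepted}
    return len(solved) / len(attempted) if attempted else 0.0

def median_time_to_proof(attempts: list[ProofAttempt]) -> float | None:
    """Median wall-clock time over accepted attempts (None if none passed)."""
    times = sorted(a.wall_clock_s for a in attempts if a.kernel_accepted)
    if not times:
        return None
    mid = len(times) // 2
    return times[mid] if len(times) % 2 else (times[mid - 1] + times[mid]) / 2
```

Tracked across model generations and proof domains, even metrics this simple would show where scale alone is closing the gap and where new scaffolding is needed.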

Each verified system makes the world safer, but even attempting verification teaches us something. Every FV effort exposes the limits of the tooling, sharpens the benchmarks, and reveals where the stack still falls short.

If we build this effort together, sharing results, connecting tools, and learning from one another’s failures and successes, those lessons compound across the ecosystem in a virtuous cycle. Better benchmarks produce better tools, better tools enable stronger verification, and stronger verification makes the next system easier to secure.

Over time, security will stop being a reactive cycle of patching failures and become a compounding process of engineering correctness.

Looking Forward

Breaking things has always been easier than fixing things. And yet, we find ourselves at a crossroads. Artificial intelligence – this wonderful, fascinating, terrifying technology – has made it dramatically cheaper and easier to find and exploit weak points in the software infrastructure that serves as scaffolding for much of the modern world. 

Now, more than ever, we must break out of the endless cycle of patching exploits. We have the technology. Instead of reacting to threats as they emerge, let’s secure our systems once and for all.

In the weeks ahead, we’ll be publishing a series of blog posts fleshing out these three pillars, as well as some of the ongoing challenges and risks at the intersection of FV and AI. Stay tuned!