What Works (and Doesn't) Selling Formal Methods

This article began life as a talk I gave in late 2024.

I love formal methods—I should say that to begin with, because this article is mostly about what doesn’t work when trying to do FM projects. Over the last 20 years, formal methods have grown and grown, and I’m proud to say that Galois has made its own contributions to this success. I made a lot of mistakes when I was scoping and running formal methods projects. I’d like other people to avoid the pitfalls I fell into, and hopefully make bigger and more interesting mistakes.

Over my years at Galois I’ve been on a lot of sales calls. I didn’t expect this - I thought I was going to work on industry applications of formal methods. But it turns out that doing research for industry requires first actually finding industry projects, and that involves sales calls. The result is I’ve talked to a lot of people who might, just might, want to do a project with Galois. Here’s a prototypical interaction:

This kind of conversation can go two ways: either we successfully scope the project (hooray!), or we can’t find a project the client wants (boo). Either case is educational, although of course the first is more satisfying. The second case is by far the majority—most often, we can’t find a formal methods project that would help the client. That’s disappointing, but also, it’s interesting. Why is this? 

When you’re talking to a potential client, you’re trying to figure out what they care about. Clients have real money on the line, so they care a lot about getting something they want. Quite often I’ve found that my starting intuitions are just plain wrong. I’ll say “well, you as a person in industry <whatever> obviously want this”, and the client says “no we don’t care about that”. I’ve come to realize this is good, actually. Having someone tell you you’re wrong really sharpens your intuitions, once you get over the bruise to your ego. 

Over the years, I’ve formed a few opinions about what works when trying to scope formal methods projects. I certainly don’t think any of these lessons are brand new, but I don’t see them discussed much, and they’re often a bit different from the conventional wisdom in formal methods research. Let’s start with the basics. 

Worldview: Costs vs. Benefits

My basic worldview is pretty simple. Any project has expected costs and benefits. There’s some client-specific break-even line above which a given project makes sense. Below that line, the project is too costly, and it’s not worth doing.  

Here I’m thinking about holistic cost: time, money, people, resources – anything that requires something of your organization; and holistic benefit – security, reliability, economic value, happiness, being able to sleep at night, all those kinds of things. And of course, the break-even line isn’t straight but is probably some fancy multi-dimensional surface. This is a highly abstract and very inaccurate model of the world. But in some sense, it reflects how we make decisions in everyday life. Something cheap and very beneficial is probably something you should do, and something expensive and not very beneficial is probably something you should not do.
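
If it helps to see the model written down, here’s a crude sketch in Python. The numbers and the break-even ratio are invented, and collapsing everything into two scores is exactly the inaccuracy I just admitted to.

```python
# A toy version of the cost/benefit worldview. All values are invented;
# real costs and benefits are multi-dimensional (time, money, people,
# security, sleeping at night), not single numbers.

def worth_doing(expected_cost: float, expected_benefit: float,
                break_even_ratio: float = 1.0) -> bool:
    """A project pencils out when its benefit-to-cost ratio clears the
    client's own break-even line."""
    return expected_benefit / expected_cost >= break_even_ratio

# Cheap and very beneficial: probably do it.
print(worth_doing(expected_cost=1.0, expected_benefit=5.0))   # True
# Expensive and not very beneficial: probably don't.
print(worth_doing(expected_cost=10.0, expected_benefit=2.0))  # False
```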

Potential clients of formal methods are somewhat rational when it comes to costs and benefits. They know what's high priority and low priority. They have a limited budget for projects, and they weigh projects against each other. Projects that make sense to clients in terms of cost-benefit tradeoffs will happen, and projects that don’t, won’t happen. 

This seems almost trivial - how does it help? Well, I also see a lot of confusion about why formal methods aren’t adopted more widely. I think the explanation is actually quite simple: many potential formal methods projects don’t make sense in terms of cost/benefit tradeoffs.

Sometimes in formal methods research, we talk about what’s possible. For example, we now have great projects like seL4, CompCert, and HACL* that show it’s possible to create industrially useful systems with formal guarantees. That’s a great accomplishment, but all of those projects were funded as research efforts. If we want more formally verified projects, those projects have to not just be possible, but also be more valuable to clients than other things they could do with their resources. 

I’m going to give a few specific takes about scoping formal methods projects. But my core claim is that projects have to make sense in terms of costs and benefits. That’s all. 

1: Projects Have to Deliver Value Early

I quite quickly realized that formal methods projects don’t happen in one giant leap. Instead, you put in a bit of cost, get a bit of benefit, until eventually you get to your larger goal. Here’s one kind of cost-benefit curve we often see in theorem proving projects: 

Here you set up the initial definitions, you prove some lemmas, and then you prove the main theorem. The problem is, until you prove the main theorem you don’t get a lot of benefit. This kind of cost-benefit curve really does not work for most clients. Quite far into that curve you’ve spent a lot of money, a lot of time, a lot of people, etc., but you don’t have a lot of benefit to show. Farther along you do have a lot of benefits, but it’s very late and very expensive. 

Often I have this conversation. A client says: “I saw your blog posts and I’m excited about formal methods! What would I get in X months and for Y dollars?”. We look at it and realize: “Oh no, X months, Y dollars. That’s really not very much! We can… write some definitions for that amount of money.”

And so then I go to the potential client and I say: “Hey, what about four times as much time and eight times as much money?” At that point, they mostly don’t call me back. 

Ideally a small cost would lead to a small but real benefit. So for Project 1, we spend X dollars and the client sees some benefit. For example, we find some bugs or give the client confidence in a small system component. Even better, the client is happy, and they say: “Let’s do another project.” And then we do Project 2, which is bigger. And we can ratchet up the curve. 

Formal methods tools often don’t allow us to scope projects like this, because there are lots of up-front costs before we get to the benefits. We have to write specifications, build proofs, understand the domain, train engineers, sometimes train clients, and sometimes even build new tools. It’s very hard to get immediate returns on investment from these tools.

Formal methods people like to make fun of testing. But the advantage of testing is that it’s easy to run a few tests and find some bugs. If you write a few more tests, you’ll probably find a few more bugs. That gives a very attractive cost-benefit curve: 
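
To make the contrast concrete, here’s a toy rendering of the two curve shapes, with invented numbers: in the theorem-proving shape almost all of the benefit arrives with the final theorem, while the testing shape pays back a little with every unit of effort.

```python
# Invented numbers, purely to illustrate the two shapes described above.
effort = range(1, 11)  # units of holistic cost: time, money, people

# Definitions, then lemmas, then the main theorem at the very end.
theorem_proving_benefit = [0, 0, 0, 1, 1, 1, 2, 2, 3, 10]

# Each batch of tests finds a few more bugs, with diminishing returns.
testing_benefit = [3, 5, 6, 7, 7.5, 8, 8.3, 8.5, 8.6, 8.7]

for cost, tp, t in zip(effort, theorem_proving_benefit, testing_benefit):
    print(f"cost={cost:2d}  theorem proving={tp:4.1f}  testing={t:4.1f}")
```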

But you might say: “Ah, but with testing, it’s very difficult to achieve higher levels of correctness. Formal methods wins again!”. Well, I have some bad news for you, which is my second hot take.

2: Correctness Doesn’t Matter

It’s true. No one cares about correctness.

Now, I don’t want to be too provocative. Obviously, if your code doesn’t do anything like what you want it to, that is a problem. Developers spend a lot of effort trying to find and fix bugs. But often in formal methods, we’re aiming for a higher standard—a system that’s completely correct, at least with respect to some specification. 

Unfortunately, a lot of developers don’t care about getting to a higher level of correctness. Their current tools work great, and they have priced in the existence of bugs and security flaws. The value of fewer bugs and more security is basically zero, so if a new tool costs anything at all, it’s too expensive. This is even true for high-assurance systems, which always surprises me. You’d think that for a thing that flies or drives around, you’d want it to be less buggy. But often the companies or people who build and maintain these systems genuinely don’t care much, either because they’ve priced in the bugs or because their system is designed to be robust in the presence of bugs.

A good way to understand this is that developers have a lot of competing demands, and many other things can be more important than increasing correctness or security. For example, a team might prefer to spend limited resources on shipping security features and fixes, hiring more developers for the team, paying down technical debt, or meeting upstream needs from customers. Worse, a correctness technology might make other goals more difficult and expensive. For example, rewriting in Fancy Language X sounds great, but it might mean the team is much harder to staff, which is a significant cost to most organizations. 

Here’s an anecdote to illustrate the point. A Galois colleague was talking to a leader at a high-assurance system maker and they had this exchange:

That response makes my head slightly explode. So it’s worth digging into why this might be the case. Well, this developer believes that their current reliability and security process works well for them. They believe their system is already more reliable and secure than their competitor’s. And they don’t win or lose sales based on reliability or security considerations. A fancy new correctness technology would introduce costs (training, engineering, friction during development) and in their view, they wouldn’t see any benefit from that investment. 

Here’s the kicker: the same colleague followed up with this exchange: 

This also makes my head explode a bit, because isn’t compliance testing supposed to be about correctness and security? Well no, there’s a very specific process needed to achieve compliance, and for this company it’s highly manual. They’ve got to write millions of tests themselves and it’s grinding on their engineering team. It slows down their release schedule, reduces their flexibility, and it’s hard to staff those projects. But without the compliance testing, they can’t sell the product. This matters because it actually impacts sales of their product.

In other words: the benefits that are important to us (like correctness and security) may not match the benefits that are important to the potential client you’re talking to (like speed, cost, and staffing). Clients want a project that solves their problems, which means understanding what success means to them. 

3: It’s Hard to Define and Explain Success

Okay, you find a client that wants to do a project. Now you have to agree on what you’ll actually do. Here’s a caricature of a typical FM result: 

  • You have an artifact which gives the precise technical meaning of some property – maybe a thousand-line theorem in some formal language. 
  • You have high confidence this property holds, because you did formal methods – a proof in Lean or something similar. 

The problem is the exact meaning of this theorem matters a lot, but your client probably can’t understand what it means. 

(Maybe in some cases even you, the expert, don't have a deep enough understanding of what the theorem means. For example, the formal proofs of the seL4 kernel establish, amongst other things, that the implementation C code precisely matches an abstract model of the kernel. This is a really powerful and impressive result, but to understand it, you have to study the formal model in detail. I have not read the model in detail, and I would guess that most people using seL4 have not either. Instead we rely on shorthand intuitions - “the seL4 kernel has been formally verified”.) 
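
To make this concrete, here’s a deliberately tiny, hypothetical stand-in for this kind of deliverable, written in Lean. A real result might run to a thousand lines, but even at this scale, everything the client actually gets lives in the exact wording of the statement.

```lean
-- A toy stand-in for a formal methods deliverable. All names are invented.
def addOne (n : Nat) : Nat := n + 1

-- The delivered "theorem". The precise statement is everything: it says
-- that `addOne` agrees with the successor function on mathematical
-- naturals, and nothing about machine integers, overflow, timing, or the
-- production code this definition is supposed to model.
theorem addOne_correct (n : Nat) : addOne n = Nat.succ n := rfl
```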

This information gap creates several problems when scoping projects. Firstly, how does the client know that they’re getting what they want? Secondly, how do we explain tradeoffs between different choices? And last, a problem we’ve run into on many projects: how do we know when we’re done?

Here’s a way to explain formal methods results that I do not recommend.

The client is very happy at this point. But then 3 months pass and the client calls you: 

The problem here is that the client didn’t understand what they were getting. It might be that the bug isn’t covered by the proof, or that the proof technically allows the behavior that they consider a bug, or even that they changed something important and violated a key assumption of the proof. They didn’t understand the result well enough to expect or avoid these problems. They probably also don’t have a clear enough model for you to explain what’s going on, so all they’ll hear is you quibbling and trying to shift the blame. 

(This also comes back to costs and benefits. The client didn’t correctly understand the benefit of what you gave them.)

Here's a slightly better version of this explanation. 

This is in some sense better because at least the client correctly understands they are confused. This might be okay in some circumstances. Maybe the client trusts you to make a judgement about what’s important. But it’s still not very satisfying. 

I think a slightly better method of explanation still, which we have used on some projects, looks like this:

We did this on the AWS LibCrypto project. There’s a page with 30 or 40 different caveats demarcating what the proofs really mean.

This approach takes a lot of time, and depends on having a technical client with a lot of bandwidth for discussions. It also doesn’t completely eliminate misunderstandings. The caveats are very technical, in that you have to understand AWS LibCrypto before you can understand what a particular caveat means. But the caveats are also much more abstract than the real technical theorems. So there’s some potential for confusion there, but at least the gap between the client’s understanding and the real result has been reduced. 

There’s a more subtle problem we’ve run into: caveats tend to get ‘smoothed off’ as they are explained internally. So you might see this kind of pattern: 

  • Galois: [technical results, technical caveats] 
  • Client engineering team: [simplified results, simplified caveats] 
  • Client CTO: [simplified results] 
  • Client PR team: “Galois has shown there are NO BUGS IN OUR SYSTEM” 

This isn’t a result of anyone acting deceptively. At each stage, a more technical person is giving a good explanation to a less technical person. But the effect is that the client ends up with a profound misunderstanding of what you’ve given them. Then three months later they find a horrible bug and everyone is mad at you.

Suppose you get that call—there’s a horrible bug and the client is mad. Well, now you need to figure out whether it’s a problem you need to fix. More generally, you need to have a clear boundary for the project so you can scope the work and declare the project done at the end. This is why it’s very important to agree on objectives in as much detail as possible—for example, with a list of caveats, as we did for AWS LibCrypto. The worst case scenario is that you tell the client, “hey, good news, we’re finished” and they say “no you aren’t, we need this caveat removed”. 

Unfortunately, my experience is that it’s difficult to predict the cost of formal methods projects. I think the big reason for this is that minor differences can radically alter the cost of a proof. For example, in the AWS LibCrypto proofs, we verified two different variants of the core theorems. The first version worked at a set of fixed key-sizes, while the second worked over any size. Intuitively, these two proofs were extremely similar, but for technical reasons, the second required nearly as much work all over again. Not great! We also often find that the cost of verification is dominated by a few particularly tricky lines of code, and that it’s hard to tell beforehand which lines will be problematic. 

All of this means that whether a project comes in on time and under budget often depends on the particular technical caveats the client wants. But as we’ve already seen, it’s typically very hard for the client to know what those caveats mean for them. 

4: Do Cheap Things First

Okay great, so you have a client that wants to do the project. The timescale is good, they care about correctness, and they swear they understand the theorem. Then they look you in the eye and say “give it to me straight, is this really the best use of the money?”. You’ve seen their test suite (it doesn’t exist), and you say “er…”. 

The problem is that FM projects are expensive, and cheap techniques are really cheap! 

I think there’s a rough hierarchy of techniques you can use to make software more reliable and secure. At the top are things like code review and documentation, testing, and CI/CD-based development. As you go further down, you have things that are a bit more formal, like fuzzing or property-based testing. And then down at the bottom you have serious formal methods:  

The cheap things near the top are actually very effective, and that’s how most systems are built. CI/CD-based software engineering, for example, is very successful at blocking mistakes. And obviously, most SWE processes in the world do not use formal techniques in any way. But it’s also true that many many projects don’t use cheap assurance techniques, or they don’t use them anywhere near enough. Think of all the projects with small test suites, or no code review practices. 
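
To give a flavour of the middle of that hierarchy, here’s a sketch of a property-based test using the Hypothesis library in Python. The function and the properties are hypothetical, but the shape is typical: a dozen lines of effort, and every run of the suite buys a little more confidence.

```python
from hypothesis import given, strategies as st

def dedupe(items: list[int]) -> list[int]:
    """Remove duplicates while preserving first-seen order."""
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

@given(st.lists(st.integers()))
def test_no_duplicates_survive(xs):
    result = dedupe(xs)
    assert len(result) == len(set(result))  # no duplicates remain
    assert set(result) == set(xs)           # nothing is lost or invented

@given(st.lists(st.integers()))
def test_dedupe_is_idempotent(xs):
    assert dedupe(dedupe(xs)) == dedupe(xs)
```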

This often makes it difficult to advise a client that what they need is a formal methods project. Suppose we have two possible projects: “do a formal method” and “write more tests”: 

As well as being cheaper, the project to write more tests has some other advantages. Testing tools are generally more mature than formal methods tools, so the project is likely to be more predictable. It’s probably easier to staff for the same reason. And although the level of assurance is lower, the client can probably try several different assurance methods for the same cost as the formal methods project, or target several subsystems. 

This makes it really hard for a lot of formal methods projects to pencil out. I think there are two possible responses from the perspective of formal methods advocates. 

The first approach (which I call gold plating) is to apply formal methods only after you’ve used all the other, cheaper methods exhaustively. I think this describes a lot of FM projects today. For example, in our AWS LibCrypto proofs, we were verifying code that had mostly been derived from OpenSSL, via BoringSSL, into AWS LibCrypto. A core crypto library is an important piece of security infrastructure, so this particular code had been rigorously analyzed and audited over many years. Our verification project was the capstone on this, which finally gave bulletproof confidence that the library was correct. 

The problem is this really limits the target market for FM to a tiny number of security-critical components used by the most paranoid and well-resourced organizations on earth. 

The second approach (which I call YOLO because I’m extremely cool) is to say that actually formal methods can replace all the other cheaper methods. FM is all you need! This sounds great but isn’t really true in most situations. For one thing, approaches like code review and documentation have beneficial side effects other than merely raising confidence in the system. For another, those annoying certification requirements I mentioned earlier often require particular kinds of testing. And there’s just the fact that formal techniques are new, and so it’s hard for most people to trust them without more familiar techniques as backup. 

I think this is an argument for tighter integration between formal methods and cheaper techniques. Rather than either or, why not both? This is an idea we’re exploring in our C verification tool, CN

Key Levers: Costs vs. Benefits

I sometimes hear people claiming that formal methods are demonstrably better than the techniques software engineers mostly use today. The only reason formal techniques aren’t more popular (according to this theory) is that engineering teams are unaware, conservative, maybe put off by superficial difficulties like poor interfaces and documentation. I don’t think this is quite right. My observation is that engineers are mostly rational when thinking about costs and benefits, at least within the bounds of their particular systems and problems. 

This means that, when we have some formal method that seems under-appreciated, we have basically two levers we can pull. We can make it cheaper, or we can make it more beneficial. Either or both can haul the project across the line into break-even territory. 

Remember, we’re talking holistic costs, holistic benefits. Those benefits could be security, ability to sell things, etc. Costs could be people, time, money, etc. And the break-even line is highly client-specific: something that’s not valuable for one client may be very valuable for another.

The cost line matters a lot, in particular the human cost of learning and using tools. But I think that moving along the benefit line may be easier in some ways, because it really means building tools that solve specific client problems. Every time Galois does a project for a client, we learn a little more, our cost/benefit gets better, and next time around we get to make more interesting mistakes.