Independence and Safe Deployment Decisions
The remedy to Go Fever is independent checks and balances.
In a parallel universe it’s mid-December. The driver-out robotaxi demo deadline of December 31st is looming. The developers are sleep-deprived but think things are going great (all things considered). The testers are reporting problems, but nothing the devs can’t turn around in a day or two. The CEO is losing sleep over needing to meet the deadline to unlock next year’s funding. The test vehicles haven’t tried to crash in the demo part of the city in at least a couple weeks. And everyone desperately needs a week of down time with their families for the holidays to recover from a year-long push. Time to demo, and everyone has Go Fever.
The big meeting comes — the CEO needs input to decide whether to do the big driver-out demo on public roads now.
Here is the case for doing the demo:
If we demo successfully, we get another year of funding by meeting the milestone. We figure 50/50 shot it will work the first time.
If we demo unsuccessfully, we can try again until the end of December to get rid of whatever minor bug comes up. We figure 95% chance we can pull this off.
If we demo and it is catastrophic, and the crash gets on social media, our company dies. But we think this is very unlikely — perhaps 5%. And nobody seriously believes a pedestrian will get hurt by just one measly demo.
But if we do not demo:
The longer we wait, the less time there is to fix any bugs. If we wait until Dec. 31 to improve safety, and get only one shot, perhaps only a 50% chance of success.
If we don’t demo by December 31, the company dies for sure due to lack of funding.
By the way:
The safety team is telling us they won’t sign off on the demo because the safety case is nowhere near complete. This is due to chronic understaffing and developers being too busy fixing bugs to do safety “paperwork.”
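The odds above can be tallied in a quick expected-value sketch. This is only an illustration using the scenario's own made-up probabilities, not real data, and it looks solely at company survival, the way management frames it:

```python
# Sketch of the go/no-go calculus, using the scenario's own guesses
# (illustrative numbers only, not real data).

P_SURVIVE_DEMO_NOW = 0.95   # keep retrying minor bugs until Dec 31
P_CATASTROPHE      = 0.05   # crash goes viral, company dies
P_SURVIVE_WAIT     = 0.50   # one shot on Dec 31 after more safety work
P_SURVIVE_NO_DEMO  = 0.00   # funding milestone missed, company dies

options = {
    "demo now":          P_SURVIVE_DEMO_NOW,
    "wait until Dec 31": P_SURVIVE_WAIT,
    "no demo":           P_SURVIVE_NO_DEMO,
}

for name, p_survive in options.items():
    print(f"{name:>18}: P(company survives) = {p_survive:.0%}")

# Framed this way, "demo now" dominates. Note that harm to third
# parties (the 5% catastrophe branch) never enters the company's
# own survival math -- which is exactly the problem.
```

Seen through this lens, the "no-brainer" conclusion below falls right out of the numbers, because the cost of a catastrophe to people outside the company is external to the calculation.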
From a management point of view, this is a no-brainer. Doing the demo has good up-side. Not doing the demo is only down-side and guarantees the worst case will happen on December 31st. So we do the demo, and sure enough it works. (We had the safety team block all the streets at the demo location, doing it at 3:30 AM with no cars and no pedestrians — which let the safety team approve of the demo despite their concerns.)
The company gets another year of runway. Hooray!
But … the next year we’re on a treadmill. We need to get more cars on the road and run them without drivers. Slowly at first, then more quickly, things spiral out of control. Embarrassing headlines. A small crash that makes the news. Then a bigger one. And now we have a big problem. The C-suite has been sacked. Lots of our colleagues have been sacked too. Our stock options are worthless. We’re hanging on by a thread with our investors. Things look bleak.
We simply did what we had to do for the company to make progress on our mission to save lives. What went wrong to get us here?
I would argue that the second thing that went wrong was doing a driver-out ride without independent safety oversight gating the decision. The first thing that went wrong was not baking safety oversight mechanisms into the initial planning used for fundraising.
I believe you tend to get what you incentivize. If you incentivize risk-taking and aggressive deployment decisions, that is what you will get. And without some sort of independent check and balance you are likely to eventually get an outcome that presents existential risk to the company due to a catastrophic mishap.
If your business plan is to roll the dice and IPO before you get unlucky, then, well, I guess you don’t need independent oversight. … If you can handle potentially living with knowing your autonomous vehicle seriously injured or even killed someone. (I expect that living with that knowledge is worse in practice than it is as a thought experiment. Ask rank and file folks who have been through it.)
But if you want to actually put safety first like it says on the company’s web page, the thing you were missing in the above scenario was the independent safety person at that go/no go meeting. Their job is to moderate Go Fever, ask the hard questions, and be there to say “no” when it needs to be said. Doing so is no small job. They will only succeed if they have robust institutional support.
Any company should have an independence mechanism for approving release of life-critical software — both initially and for updates. The degree of independence for any particular decision might vary based on estimated risk, so to keep it simple let’s just talk about initial removal of a driver from a robotaxi for the first such trip on public roads.
The key question is what pressure is on the independent safety oversight to say “yes” when that isn’t really the right answer. If they say “no” do they get fired? Do their stock options tank? Are they somehow incentivized to give a particular answer? Do they work for an independent organization that specializes in safety assessments?
Ask what incentives someone has to say “yes” when they should be saying “no.” Here are some potential decision gates for making that fateful decision, with an approximate ranking from least independent to most.
CEO unilaterally decides when to deploy, potentially ignoring any safety advice
Internal group majority votes (e.g., CEO, CTO, Chief Safety Officer)
Internal group decides, everyone has veto power
Internal group decides; unfavorable external assessment can be ignored
Internal group decides; favorable external assessment is required against industry consensus standards (i.e., an independent opinion of “best practices”)
Internal group decides first, then external regulatory type approval is required
To be sure, things can get complicated. Regulators or external assessors might not have enough information or expertise to give iron-clad approval quality. The unilateral CEO decider might actually listen to safety advice. The internal group might be persuaded by a dissent even without a veto mechanism. Or not.
To get safety, investors have to be OK with someone saying “no” when things aren’t ready. If safety is treated as an afterthought that’s not going to be the case. So that means serious safety engineering scope needs to be baked into the plans way up front at the fundraising stage. And major business KPIs have to be tracking safety, not just functionality.
There is a moral hazard here. Even if the autonomous vehicle is 10x or 100x more dangerous than a human driver, it might well go a large number of miles without a crash. The team might be able to whip it into shape before that crash has a chance to happen. In fact, premature release followed by fixing safety later might actually be the company's explicit intention. After all, if something goes wrong they can always pin it on a rogue CEO who gets fired, and then be seen to get safety religion.
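A rough back-of-the-envelope calculation shows why crash-free miles prove so little. The numbers here are assumptions for illustration: US human drivers average very roughly one fatal crash per 100 million miles, and the hypothetical AV is taken to be 100x worse. Modeling crashes as a Poisson process:

```python
import math

# Illustrative assumptions, not measurements:
# human drivers: very roughly 1 fatal crash per 100 million miles.
HUMAN_RATE_PER_MILE = 1 / 100_000_000
AV_RATE_PER_MILE = 100 * HUMAN_RATE_PER_MILE  # "100x more dangerous"

def p_no_crash(miles, rate=AV_RATE_PER_MILE):
    """P(zero crashes in `miles` miles) under a Poisson crash model."""
    return math.exp(-rate * miles)

for miles in (10_000, 100_000, 1_000_000):
    print(f"{miles:>9,} miles: P(no fatal crash) = {p_no_crash(miles):.1%}")
```

Under these assumptions, even a vehicle 100x more dangerous than a human driver has about a 90% chance of completing a 100,000-mile pilot with no fatal crash at all, so a clean deployment record is weak evidence of acceptable safety.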
And goodness knows we wouldn’t want to stifle innovation!
But it’s smarter (or at least more ethical) to set up a process that incentivizes the outcome you want, with the right level of checks and balances. That serves to deter irresponsible innovation from putting other road users at risk.
Regulators and the public should treat the transparency a company provides about the independence of its decision process as a critical factor in judging that company. If the company is not talking about how they make the go/no-go decision, or is cagey about the topic, assume that when push comes to shove they will roll the dice, do the demo, hope to get lucky, and do damage control if they don’t. Because without highly visible evidence to the contrary, that is probably what they are incentivized to do.
Phil Koopman is a safety advocate who has been doing autonomous vehicle safety for a really long time, and is a supporter of responsible innovation. He’s wondering how many more catastrophic road user harm incidents the autonomous vehicle industry can survive.
Wow. The safety manager you describe is the lived experience of one of my students (probably more) a few years ago. This clip from one of my lectures was developed based on their experience. Very similar to your post.
https://youtube.com/clip/UgkxtLsfgvyTAonHvKRDAgfLX0LIczF1Njvq?si=D91vlr3fR5se2Nln
Great description of how some of this process could occur in a business - I think it applies as much to a startup as to a public company. Part of the problem for many of these companies is that when they claim to be prioritizing safety, or even measuring it, it's superficial metrics like disengagements or incidents per distance traveled or per trip, rather than really building a safety case from the ground up and ensuring robust design and testing practices.