Statistically Safer Does Not Really Matter for Autonomous Vehicles
Crash-by-crash comparisons to a human driver will be the most important criterion
Statistical safety for autonomous vehicles is not going to be what drives societal acceptance. Instead, for each dramatic crash people will ask “would a human driver have done better?”
The robotaxi industry desperately wants societal acceptance to be all about being statistically “safer than a human driver” via achieving so-called Positive Risk Balance (PRB). They are busy publishing data analysis to argue this point, with marketing that claims victory even though there isn’t enough data about high-severity mishaps to know how it will turn out. To be sure, I think PRB is one of several important safety metrics. Heck, I even wrote a whole book that is largely about PRB. But PRB is simply table stakes, not a victory criterion. PRB will not be THE metric that captures public attention. The industry has chosen the wrong public relations battle to fight here.
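To see why the data is not there yet, consider the statistician's "rule of three": if zero events have been observed across n exposures, the 95% upper confidence bound on the event rate is roughly 3/n. Below is a minimal sketch of what that means for fatality rates, assuming a human-driver benchmark of very roughly one fatality per 100 million miles (the benchmark and the fleet mileages are illustrative round numbers, not any company's actual data):

```python
# Rule of three: with zero observed events in n exposures, the 95%
# upper confidence bound on the event rate is approximately 3 / n.
# Assumed benchmark: ~1 fatality per 100 million miles for human drivers.

HUMAN_FATALITY_RATE = 1 / 100_000_000  # per mile (illustrative round number)

def rate_upper_bound_95(miles_driven: float) -> float:
    """95% upper bound on fatality rate, given zero observed fatalities."""
    return 3.0 / miles_driven

for miles in (10_000_000, 100_000_000, 1_000_000_000):  # hypothetical fleet mileages
    multiple = rate_upper_bound_95(miles) / HUMAN_FATALITY_RATE
    print(f"{miles:>13,} fatality-free miles: "
          f"true rate could still be {multiple:.1f}x the human benchmark")
```

In other words, even a fleet with a hundred million fatality-free miles has only shown it is no more than about three times worse than the human benchmark at 95% confidence; demonstrating "safer" on fatalities takes on the order of a billion miles, and demonstrating a large multiple of improvement takes far more.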
Despite the rhetoric, trust is down. In large part this is due to adverse news cycles, coupled with untrustworthy corporate behavior. Poor behavior by Cruise is still making headlines. The constant drumbeat of Tesla robotaxi-wannabe hype, crashes, and investigations hurts as well. Every high-profile crash makes the news, much to the frustration of the robotaxi public relations folks who desperately want everyone to believe the factually unsupported notion that the technology is already saving lives.
I have previously spoken and written at length about the deep reasons why there are more dimensions to societally acceptable safety than net risk, encompassing risk transfer, negligent driving behavior, standards conformance, absence of unreasonable fine-grained risk, ethics, and equity concerns. Those reasons aren’t going to go away, but many of them will not really gain public attention — if they ever do — until operational scales increase further.
I don’t think PRB is what will drive the public narrative as the technology scales up. Rather, the narrative is already being driven by stories: good rides, bad rides, good driving behavior, bad driving behavior, and crashes. Indeed, everyone wants to compare robotaxis to human drivers — but on an individual ride basis, not on statistical outcomes.
We can boil things down to a simpler story for why the robotaxi safety issue is already so thorny for public acceptance this early in the game:
For each crash, the public will judge safety by whether they think they themselves would have avoided that particular crash as a human driver.
There are so many things for an engineer to dislike about this situation. For example, I didn’t say an “average” driver, but rather each individual driver, most of whom consider themselves to be above average. Moreover, this logic is impervious to any argument about “saving lives,” because the technology doesn’t get credit for saving lives with this framing. Robotaxis are simply judged on whether the loss from each individual crash was avoidable, with essentially no credit awarded for other avoided losses. That might sound unfair, but I think this is going to be the reality for the foreseeable future.
The industry is setting itself up to be vulnerable to this criterion by pounding away on the narrative that they will be safer than a human driver, and disingenuously claiming that they are already saving lives. Their strategy demands everyone compare them to human drivers. And that is what they are getting — but not the way they want it.
The industry wants the comparison to human drivers to be net statistical. But ask yourself: how many general audience members are really going to sit still for a nuanced argument about statistical metrics, confidence intervals, and threats to validity?
Nope. The general public has gotten the main message of “safer than a human driver,” but without the nuances involved in PRB. By default they are going to apply that message to the news they see on a headline-by-headline basis.
This means that if there is a crash that makes headlines due to a fatality or dramatic injury, the question of the day will be whether a human driver could have avoided that same crash. In particular, it will be whether the human driver reading the headline thinks (with no burden of proof whatsoever) that they themselves would have avoided that crash, or would have done better at mitigating harm from an unavoidable one.
Consider how this principle plays out with the Cruise pedestrian dragging mishap, which was a severe real-world injury event. Summarizing: a human-driven vehicle first hit a pedestrian, who was then thrown into the path of a Cruise robotaxi in the adjacent lane. The robotaxi ran over the pedestrian while coming to an initial stop. It then initiated a pull-over maneuver, restarting motion and dragging the pedestrian down the street. The news coverage of that event concentrated on a lack of transparency from Cruise about the pedestrian dragging portion of the mishap, but for our analysis let us assume that a botched public/regulatory relations interaction was not a factor and ask how people might respond to the facts of the mishap sequence.
Consider the following points (see this paper for details):
1. The robotaxi accelerated toward a jaywalking pedestrian in a crosswalk before the crash, figuring the pedestrian would be out of the way by the time it got there. This likely violated a California road rule requiring drivers to slow down for a pedestrian in a crosswalk — which applies even if that person is jaywalking. The human-driven vehicle next to the robotaxi made the same mistake.
2. The robotaxi was oblivious to the likely implications of a pedestrian being hit in an adjacent lane. Although it saw that impact happen, the robotaxi did not react to the results of that collision until the pedestrian showed up several seconds later in its own lane. The computer could have done better, but this was (barely) arguably in line with what a human driver might have done if not paying attention to the adjacent lane. After all, as Cruise stressed in their messaging, the other car, not the robotaxi, hit the pedestrian first.
3. The robotaxi had time to stop completely before impact with the pedestrian, but doing so would have required it to react more quickly than an alert human driver. Instead, it hit the pedestrian at nearly full speed, slowing quickly, but with almost all of the deceleration happening after impact. The computer could have done better, but its performance was about in line with what a human driver could have done if they waited until the pedestrian showed up in their own lane to start braking.
4. The robotaxi did not call e911 emergency services, but a passerby did. A human driver might well have been shocked by the mishap, with a passerby calling e911 first instead. Hopefully a human driver would call e911 after getting over the shock if nobody else said they had already made the call, but as far as we can tell the robotaxi and its support team never called.
5. Immediately after impact, the robotaxi forgot there was a pedestrian trapped under the car. After stopping for a split second, it started moving again down the road, dragging the pedestrian under the car. The mishap report makes it clear no human driver would have done that. Any reasonable human driver would have remembered visibly running over the pedestrian (who started out partly on the hood of the robotaxi before being dragged under) just a handful of seconds earlier, and would not have moved the vehicle. Even if motion had been restarted, kinesthetic cues would have made it obvious that something large (the pedestrian) was being run over by the wheels of the vehicle, causing any reasonable human driver to stop sooner than the robotaxi did.
A robustly designed, safe robotaxi should have gotten all 5 of these points right, but the Cruise robotaxi was 0 for 5.
However, here is the exercise: which of these points stands out as the most egregious? Why?
(… Seriously — pause to ask yourself which of the above five items gives you personally the biggest reaction …)
I’m guessing #5 is the one you would have the most trouble forgiving in a human driver. Needlessly dragging someone down the street under their own car is going to be a problem for any human driver being judged for their behavior.
Is your answer different for a robotaxi than for a human driver? Should it be? Do Cruise’s claims of dramatically lower crash rates than human ride hail drivers change your reaction?
Surely there will be a lot of hindsight bias in pronouncements of whether a crash was avoidable. Coulda/shoulda/woulda is a national pastime. Nonetheless, I believe the standard that will prevail in the public narrative for a long time is: would a human driver have done better? This is not PRB, but rather “safer in the small,” in which an event-by-event comparison is made by the reader against their own fancied driving expertise.
Eventually, if robotaxis amass enough data to show beyond reasonable doubt that they are 10x or 100x safer than human drivers for fatality rates at scale, judgment criteria might change to be more forgiving of specifics in light of statistical improvement. But how that turns out is likely to depend less on the numbers than the optics of newsworthy robotaxi crashes. And for now we don’t even know if robotaxis are as safe as human drivers for fatalities, let alone multiples safer. So this is where we are:
If the reader of a news story thinks “I would never have made that mistake,” the robotaxi company loses.
To illustrate the issue, one might imagine a world in which robotaxis are 1000 times safer than human drivers, but every single fatality makes for a spectacularly awful news photo. What if we’re going from 40,000 fatalities to 40, but all 40 are really horrific things a human driver would never have done? What if all 40 mishaps are computer driver behaviors that, if performed by a human driver, would be attributed to malicious intent to kill? Perhaps the case might be made that there is so much benefit that it’s still OK. I think it unlikely things will turn out this severely, but as a thought experiment it illustrates the potential vulnerability of a safety argument made to the general public solely on the basis of PRB.
If robotaxis are only 10% safer than human drivers — or even twice as safe — there will be an increasing flow of embarrassing and occasionally scary robot-hurts-human headlines. It is hard to see how the industry can get past the negative publicity of human-would-have-avoided-that crashes as fleet sizes increase.
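Some rough arithmetic shows the scale of that headline problem, using the same 40,000 annual US road fatalities from the thought experiment above as an assumed baseline (an illustrative round number, and assuming robotaxis were to replace essentially all of that driving):

```python
# How many fatal-crash headlines per year would remain if robotaxis
# replaced essentially all US driving, at various safety multiples?
# Baseline: ~40,000 annual US road fatalities (illustrative round number).

US_ANNUAL_ROAD_FATALITIES = 40_000

for safety_multiple in (1.1, 2, 10, 100, 1000):
    remaining = US_ANNUAL_ROAD_FATALITIES / safety_multiple
    print(f"{safety_multiple:>6g}x safer than human drivers: "
          f"~{remaining:,.0f} fatal crashes per year, each a potential headline")
```

At a merely 2x improvement, that is tens of thousands of fatal crashes per year for readers to judge one at a time; even at the 1000x level of the thought experiment, it is still dozens of individually newsworthy events per year.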
The current industry strategy for this problem is to blame anyone they can but themselves for a crash. But that has its limits, leaving every robotaxi company vulnerable to even one severe crash for which there is just no way to evade blame. Just ask Cruise how that worked out for them. (Other companies have folded in the wake of a bad crash as well.)
Driving on public roads is not risk-free. Sooner or later each operating company will have a severe crash that can’t be dodged, and companies just don’t seem to be preparing as well as they might for the day that happens.
My preferred way to address this issue is for the industry to stop betting the entire outcome on the “safer than a human driver” narrative and instead operate on the basis of transparency, trust, accountability, and being a responsible actor when deploying immature technology on public roads.
But for now the industry is doubling down on the “saving lives” narrative. And that’s a shame, because in the long term this will hurt the industry.