As you might have heard already, there is a new Waymo safety dashboard as of September 2024. I’ve received many queries about what I think about it, and other reviews have been decidedly mixed. This is a good news/bad news kind of situation. So buckle up, and here we go…
(Image source: https://waymo.com/safety/impact/ )
The good news: Waymo is publishing results of collecting safety outcome data, including miles traveled to pair with SGO data. This is a good thing! I know a lot of work is going into this, and I salute the hard-working engineers at Waymo who are pushing to get this data for analysis.
Their data suggests that for fender-benders and low severity crashes they seem to be doing well enough so far — according to the criteria they have selected. With 22 million miles we’re starting to see more plentiful data for lower severity crashes. We’ll find out if things stay on track in the next year I’d imagine. But this is all subject to the decisions they have made in data analysis. There will be and should be discussion about the data analysis approach. (Note that fatality data is insufficient to draw conclusions. More on that later.)
In reality, as long as Waymo is not dramatically worse at low-severity crashes, societal acceptance probably will not hinge on anything in this data or on the details of their analysis method. Rather, acceptance will be driven by other factors related to trust. (We’ll get to that below.)
Close enough to average human drivers is probably going to work out for them if they are in the ballpark — noting that an “average” human driver includes all the drunk and distracted drivers with vehicles that don’t have all the safety features we expect in new cars like the ones Waymo has. So while it is worthwhile and important to have the discussion about how to collect, organize, weight, and otherwise do this data analysis, I don’t see the outcome being make-or-break for Waymo at this point.
In other words, from a purely data analysis point of view, probably things are still maturing, but generally on track — however…(*)
((*) Having recently been quoted out of context on this topic, I consider any quote of this sentence without the “however” to be a deliberate misquote.)
The needs-more-thought news: Some consider the Waymo data analysis techniques controversial, and a number of conversations I’ve had reveal some skepticism about these numbers. I hope that others will dig deep into their data analysis techniques and provide suggestions that improve their data evaluation criteria checklist and so on. A common theme I’ve heard is that their data collection and reporting is far better than for other companies, but that the industry in general is pretty opaque, so this is only a relative statement.
They have disclaimers on that new data web page such as not saying the data predicts fatality outcomes, and that “There’s no single metric to evaluate the safety of AVs, and an aggregate, retrospective analysis like this may be one important factor in confirming design elements and predictions done in earlier iterations of our safety determination lifecycle.”
Overall, they’re collecting data, they’re publishing numbers, and they’ve gotten a lot better in the last couple years about more clearly stating the assumptions and limitations to their research.
However…
The bad news: The way this data is being deployed by Waymo beyond their research publication machinery ranges from a distraction to outright propaganda. That is not the fault of the researchers (to the degree those researchers do not themselves promote the propaganda). But it is an issue for Waymo as a company.
Waymo’s tactic of presenting tons of safety data as evidence of good intentions is apparently supposed to build trust. But mischaracterizing those research results as Waymo is doing will erode trust — either as more people figure out the game, or as real world mishaps (and even just embarrassing videos) continue to cause people to realize reality diverges from the hype.
Here are some persistent issues with Waymo’s approach:
They keep saying their data proves they are already saving lives — even as their research results make it clear we are a factor of more than 10x (perhaps 100x) in miles away from having any idea how that turns out. (See details below at the end of this post for more.)
Waymo’s strategy is for the most part to operate with as little independent oversight as they can, and tell us how it turns out. We simply have to trust them to do the right thing as they expand operations to new operational environments and push out continual software updates. This is an industry-wide playbook, with the AVIA lobbying organization being their proxy for a state-by-state campaign to make sure manufacturers have the least possible accountability for harm done to other road users. Waymo is just as problematic for their lobbying practices as any other company that supports AVIA. As an additional data point, Waymo is reported to have spent $1.2 million in Sacramento on lobbying, with much of it spent just before a controversial CPUC permission vote. In the real world you get safety only with independent oversight. This concerted effort to minimize effective independent technical oversight is a continuing reason for concern.
Previous data analysis reports provoked trust concerns — such as stopping the data analysis period the day before a passenger injury and then trumpeting “no passenger injuries.” Once burned, twice shy. It is no wonder people are reluctant to trust this very complex data publication based on face value. The onus is on Waymo to earn trust. They need to do better, and they need to make it easy for us to understand they are doing better. (Shouting really loudly that this time they are good actors while failing to show contrition for missteps isn’t enough. Neither is saying they are better than the other actors in a generally problematic field of players.)
Remember that Cruise had safety data that showed they were oh-so-much-better than human drivers. Even as their scaling up suggested the trend wouldn’t hold up. Then one day we all found out things weren’t what they had been saying. Waymo isn’t Cruise. But the AV industry reputation overall is not one that should make us eager to give any company the benefit of the doubt when they are over-claiming safety beyond what their data actually supports.
At a higher level, the issue with the new Waymo data dashboard is that, no matter what you think about the technical merits of the data analysis, it is being used as part of an ongoing Waymo public & government relations campaign to take more credit for safety than is deserved. In practice the dynamic is we are all supposed to forgive negative externalities imposed by this company on other road users and residents of the cities they operate in because they are busy saving lives (with no data to prove that is more than aspirational — yet). If they want forgiveness for making messes, they should earn it on the merits, not on a disinformation campaign about forward looking statements.
What about the traffic jams? There have been recent videos showing Waymos stuck in traffic jams. (See the viral parking lot videos, but there are other instances of Waymo-induced traffic jams.) They are embarrassing to be sure. And the fact that it took someone reporting the behavior to get Waymo to fix a flock of them honking for no reason at 4 AM — and that it took two tries — undermines confidence that Waymo actually knows about and immediately fixes vehicle behavioral problems. But they were not injuries/fatalities.
The bigger issue here is that as long as Waymo decides to sell on “we’re better than human drivers” rather than actual, measurable social benefit, they will be vulnerable to brand tarnish from every embarrassing video clip. And there is no doubt we have not seen the last of these. They are using a misguided and obsolete playbook of selling on saving lives, but they keep doubling down on it. If you want to know the metrics I’d much rather see in play, see my previous post on The Societal Case for Robotaxis.
Below is a repost of a previous social media post on the difference between Waymo research results and what they over-claim in their public/government relations. And in particular why their PR claim of saving lives contradicts their actual research results. (This is new material to substack, so including it here for those who might have missed it. So consider this a 2-for-1 posting.)
Waymo's Misleading Claim of Saving Lives
Waymo claims they are "already saving lives" so often, and people are so taken in by that misleading claim, that I'm going to take a moment to explain why it is misleading. And especially harmful when used as justification for loose regulatory policies as it so often is.
The claim: "The data to date indicates the Waymo Driver is already reducing traffic injuries and fatalities." Here is the claim, which has been at the top of the Waymo Safety landing page for quite a while now (https://waymo.com/safety/ including as of Sept. 19, 2024; highlighting of those words added):
Having had high school English, I would interpret that sentence as also including an unproven claim of "already reducing fatalities" being supported by data. And I would expect that anyone authoring this sentence would reasonably expect a reader or listener to conclude "already reducing fatalities." Those listeners include federal and state regulators and legislators. And journalists.
This claim is absurd for a simple reason. US driving data shows human-driven vehicles have ballpark 1 fatal crash per 100M miles (varies by year, zip code, etc. -- for more nuance see this narrated video slide which is in terms of fatal crashes, noting that some such crashes have multiple fatalities). But their latest study is for only 7.1 million miles. They need something like 40 times more data to prove they are actually saving lives with statistical confidence (almost certainly it will be much more). (NOTE: they are at 22 million miles since this essay was written. Nothing substantive has changed here.)
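For readers who want to check the arithmetic, here is a minimal sketch of one way to arrive at a number like "40 times more data." It assumes fatal crashes follow a Poisson process, that the vehicles log zero fatal crashes over the whole period, and that a 95% confidence level is the bar. Those modeling choices and the function name are assumptions made for illustration, not something taken from Waymo's paper. Only the ballpark rate of 1 fatal crash per 100M miles and the mileage figures come from the text above.

```python
import math

def miles_needed_zero_events(baseline_rate_per_mile: float,
                             confidence: float = 0.95) -> float:
    """Miles that must be driven with ZERO fatal crashes before a Poisson
    model rejects 'no better than the baseline rate' at the given confidence.

    With zero events in N miles, P(zero | rate) = exp(-rate * N).
    Requiring that probability to be <= 1 - confidence gives
    N >= -ln(1 - confidence) / rate.
    """
    return -math.log(1.0 - confidence) / baseline_rate_per_mile

# Ballpark human baseline from the text: 1 fatal crash per 100M miles.
HUMAN_FATAL_CRASH_RATE = 1.0 / 100e6
STUDY_MILES = 7.1e6       # mileage in the Waymo study discussed above
DASHBOARD_MILES = 22e6    # mileage on the newer dashboard

needed = miles_needed_zero_events(HUMAN_FATAL_CRASH_RATE)
print(f"{needed / 1e6:.0f} million miles")        # 300 million miles
print(f"{needed / STUDY_MILES:.0f}x")             # 42x the 7.1M-mile study
print(f"{needed / DASHBOARD_MILES:.0f}x")         # 14x the 22M-mile total
```

This is the best case. Any fatal crash along the way, or a fairer baseline (newer cars, sober drivers, the same road types), pushes the required mileage up, which is why the real number is almost certainly much more.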
What is really going on here seems to be some sort of word game that is essentially guaranteed to mislead readers. Their 7.1 million mile study talks about a bin called "any-injury-reported" crashes that were lower than human-driven vehicles, and fatalities are a subset of that bin. So the claim being made is (apparently) the bin containing fatalities is better than human drivers. Without mention that the sample size is too small for valid conclusions on fatalities. So maybe they have saved about 0.07 or perhaps even 0.10 lives depending on the baseline you use for human drivers -- and maybe not.
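As a rough check on the "0.07 or perhaps even 0.10" figure, the sketch below computes how many fatal crashes an average human-driven fleet would be expected to have over the same 7.1 million miles, and how likely it is that such a fleet would have had none at all. The Poisson model and the function names are assumptions for illustration. The 1.4 per 100M miles baseline is a hypothetical value back-solved to reproduce 0.10, not a figure from the paper.

```python
import math

def expected_fatal_crashes(miles: float, rate_per_100m_miles: float) -> float:
    """Expected number of fatal crashes over `miles` at the given baseline rate."""
    return miles * rate_per_100m_miles / 100e6

def prob_zero_events(expected: float) -> float:
    """Poisson probability of seeing zero events when `expected` are expected."""
    return math.exp(-expected)

STUDY_MILES = 7.1e6  # mileage in the Waymo study discussed above

# 1.0 is the ballpark baseline from the text; 1.4 is a hypothetical
# alternative baseline chosen only because it yields roughly 0.10.
for rate in (1.0, 1.4):
    expected = expected_fatal_crashes(STUDY_MILES, rate)
    p_zero = prob_zero_events(expected)
    print(f"{rate} per 100M miles: expected {expected:.3f} fatal crashes, "
          f"P(zero) = {p_zero:.1%}")
```

Under these assumptions an average human-driven fleet would itself have had better than a 90% chance of zero fatal crashes over that distance, so zero fatal crashes at this mileage says almost nothing about fatality rates.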
But don't just take my word for it, see for yourself this excerpt from Waymo's own paper saying "Serious injury and fatalities are a subset of this any-injury-reported benchmark, but no statement on these outcome levels can be made at this time based on this retrospective data." In other words, Waymo does not have enough data to know how fatalities will turn out. That's the truth. Waymo's safety landing page claim is something other than the full truth.
Waymo paper: "Comparison of Waymo Rider-Only Crash Data to Human Benchmarks at 7.1 Million Miles" https://arxiv.org/pdf/2312.12675 (top of page 15; highlight added)
(Originally posted on blogger, 6/24/2024)
Thanks Phil. A few days back a colleague brought up these stats in a discussion of acceptance criteria and validation targets. I then tried to provide a few analogies for the dashboard stats.
- 84% fewer airbag deployments - These robotaxis operate the majority of their mileage without occupants. In that case, regardless of speed, the airbag doesn't deploy if no passenger is detected.
- 73% fewer injury-causing crashes - same as above, along with the explanation from F perkin.
- 48% fewer police-reported crashes - many crashes remain unreported because there is no direct driver involvement or severe damage, due to low speed and defensive driving.
These are purely misleading numbers. It is advisable to use meaningful, justifiable stats relevant to the use case.
Thanks.
Perhaps also worth noting that Waymo operations to date have all been at low speeds which are only rarely associated with serious injuries or fatalities. Waymo has recently received permission to operate on freeways in San Francisco which will quadruple their (inherently hazardous) kinetic energy and reduce time available for adverse event reaction. Good luck to Waymo and the tens of thousands of unwitting vulnerable test participants in this poorly bounded dangerous experiment.