NTSB Investigates Waymo School Bus Incidents
What does this mean, and what might we do to improve safety while the process plays out?
The US National Transportation Safety Board (NTSB) has opened an investigation into incidents of Waymo robotaxis driving past school buses engaged in student pickup/dropoff in at least two states, with a focus on incidents in Austin, TX. This comes in addition to a previous NHTSA investigation and a subsequent safety recall in December 2025 that failed to fully address the issue.

What We Can Expect
NTSB is all about a deep, thorough dive. Things do not happen quickly. We can expect a preliminary report within 30 days that summarizes initial facts. The detailed final report, issued after a board meeting, is likely to take 12 to 24 months. I anticipate that both the board meeting and the final report will be detailed and instructive, as they tend to be. The report will include safety recommendations that can benefit the whole industry.
This process is not about quick fixes, but rather deep insights that benefit the whole industry by improving safety for the long haul.
What Role Does NTSB Play?
It is important to keep in mind that NTSB is a neutral investigative agency, not a regulator. NHTSA investigations might result in mandatory or voluntary recalls, while NTSB investigations result only in non-binding recommendations.
NTSB investigations are widely considered the gold standard for understanding what happened and what changes should be made going forward to apply lessons learned from transportation incidents. NTSB reports typically include both a diagnosis of the contributing causes of an incident and a set of recommendations. At the industry level, the set of recommendations is what deserves the most attention. Whoever might be blamed for these incidents, what matters most for the ongoing safety of the industry is preventing future loss events.
NTSB typically makes each recommendation to a particular stakeholder. For example, there might be a recommendation to a car maker to improve their technical approach, a regulator to improve their regulatory framework, and a fleet operator to improve their operational practices, all for the same mishap. NTSB tracks the responses to recommendations as to whether a response has been received and whether NTSB deems the response satisfactory.
Perhaps surprisingly, many NTSB recommendations in the highly automated vehicle area do not get satisfactory responses, or even timely responses, from some stakeholders. NHTSA has notably failed to provide satisfactory responses to many NTSB recommendations. A slightly dated list of relevant recommendations can be found in this 2021 response to an NHTSA request for comments, which lists several open and unacceptable responses to previous recommendations. (NTSB rated NHTSA and US DOT responses unacceptable for recommendations going back to 2015: H-15-4, H-17-37, H-17-38, H-17-39, H-19-47, and H-19-48. A few responses were deemed acceptable.) NTSB maintains a database of investigation information that might have updates on these and other recommendations.
Even if stakeholders do not take acceptable action on NTSB recommendations, those recommendations still have significant value in documenting what the Autonomous and Highly Automated Vehicle (AV) industry should be doing. Responses (satisfactory or otherwise) provide transparency to other stakeholders as to the degree the AV industry is (or is not) taking safety recommendations seriously.
While NTSB is famously known for investigating aviation crashes, their scope includes other transportation modes. In fact, 18% of their safety recommendations apply to highway transportation. Investigations frequently involve fatal mishaps, but responding to non-loss incidents is also within scope. They have previous recommendations specific to school bus incidents, which might play a part in motivating their decision to investigate in this case even though a fatality has not (yet) occurred.
I have seen Waymo boosters try to downplay the need for such an investigation by NTSB, but I think their viewpoints misapprehend the value provided. And those trying to argue the incidents are unworthy of NTSB’s attention simply because nobody has gotten killed (yet) need to get a clue.
Even if Waymo figures out and fixes the technical problems involved, a significant benefit of an NTSB investigation is that other stakeholders can also benefit from recommendations that would otherwise be held as Waymo secrets. The point here is for the entire industry to learn important safety lessons, not to help Waymo find its software defects.
Starting Points for Improvement
To be clear, it is up to NTSB to determine the scope and outcomes of this investigation. And as an outsider I do not have the ability to predict which way they will go.1
However, I think it might be helpful to the industry to consider discussion prompts based on these incidents. Some of these topics might play a role in the NTSB report, while others might turn out not to. But I think all are relevant for stakeholder discussion in light of publicly available information. We don’t have to wait 12-24 months for an NTSB report to start these discussions. (But when that report does come in, there will no longer be a plausible excuse for dodging the discussions relevant to its recommendations.)
What decision criteria should an AV company use to stand down vs. keep operating when there is documented proof that their computer driver is breaking road rules in a way that would constitute a moving violation for a human driver? In this case the Austin school district (ISD) asked Waymo to stand down operations during active school bus hours, and Waymo said no. (Waymo could also have installed in-vehicle safety drivers, or used other technical mitigations such as avoiding school bus positions available in on-line tracking software apps, but they elected not to do that as far as we know.) Should there be a stand-down decision process that is more transparent to and involves stakeholders outside the AV company?
How can we have confidence that when an AV company says “it’s fixed” in the future, it actually is fixed? It seems that Waymo has had two failed fixes for this problem. While they say things have improved and that they were better than human drivers to begin with, some back-of-the-envelope analysis shows their track record at school bus traffic violations is dramatically worse than human drivers’ (perhaps 10x worse). How does the industry regain stakeholder trust that fixes are really fixes after this repeated failure to accomplish a real fix? “Trust us bruh” only goes so far, and that’s not as far as it did before these incidents. (Perhaps NTSB will find out it really is fixed at some point, but that does not resolve the trust eroded by the track record to this point.)
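To make the shape of such a back-of-the-envelope comparison concrete, here is a minimal sketch. Every number in it is a hypothetical placeholder chosen for illustration, not a sourced figure about Waymo or human drivers; the point is only the per-mile normalization that any such comparison rests on.

```python
# Illustrative shape of a per-mile violation-rate comparison.
# ALL numbers below are made-up placeholders, NOT sourced data.

def violations_per_million_miles(violations: int, miles: float) -> float:
    """Normalize a raw violation count to a per-million-mile rate."""
    return violations / (miles / 1_000_000)

# Hypothetical robotaxi fleet: 20 documented school-bus-passing
# violations accumulated over 10 million driven miles.
av_rate = violations_per_million_miles(20, 10_000_000)

# Hypothetical human-driver baseline rate per million miles.
human_rate = 0.2

ratio = av_rate / human_rate
print(f"AV: {av_rate:.1f}/M miles, human: {human_rate:.1f}/M miles, "
      f"ratio: {ratio:.0f}x")
```

Under these invented inputs the fleet comes out an order of magnitude worse than the baseline; with real exposure and violation counts the same arithmetic could of course land anywhere.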
We’ll have to see if the school bus incidents were happening all along and not caught (or not acted upon) by Waymo until publicly reported, or if they are new behavior.
Waymo already knows the answer to whether this is an old problem or a new one. If they don’t, that is a problem at a much deeper level. I imagine NTSB will dig into this. But for now, other companies should be asking themselves the analogous question: whether and how they would know of moving-violation driving behavior before they are shamed on social media and an investigation takes place.
If bad behaviors were due to software defects there all along, why did Waymo either not know about them or not address them before the public shaming campaigns started?
If bad behaviors are a new phenomenon for the Fall 2025 school year, then what caused this degraded behavior, and how did it slip through the validation process? If these are new failures, that would undermine claims of safety via lots of miles, because safety on previous miles apparently applied to different software, and clearly did not accurately predict safety around school buses for these new miles.
Either way, what other critical hazards might Waymo (and other companies) have missed that will show up as they scale up to multiple cities?
Do other AV companies have sufficient validation that they will not make similar mistakes when they deploy around school buses? Or for that matter will they struggle with the meaning of flashing lights or other-than-pole-mounted stop signs in general?
Difficulty in getting safe behavior around comparatively rare situations governed by comparatively unique driving rules is exactly the type of problem I would expect to see in a system that uses end-to-end machine learning. (For example, the meaning of a stop sign mounted on a school bus or held by a crossing guard is quite different from that of a stop sign on a signpost.) Regardless of whether this is the root cause of these incidents, have other players worked through how they will mitigate risks like this if their end-to-end systems struggle? Expending special validation effort on school buses early in design will at the very least serve as an early warning of whether this is a problem area.
Should school districts prioritize automated enforcement cameras mounted on school buses when robotaxis come to town? The reports come primarily from Austin because Austin has such cameras. Reports from Atlanta are also coming in, but those depend on observers catching robotaxis in the act. Beyond that, the Austin experience seems to be that the cameras deter human violations as well, with only a 2% recidivism rate, so doing this might be a win regardless of robotaxi behavior.
School districts and municipalities should consider how proactive they need to be in detecting and pushing for accountability for robotaxi misbehaviors. An apparent lesson learned from these incidents is that accountability follows from public pressure campaigns by local stakeholders. How do localities plan to collect, aggregate, and take action on robotaxi misbehaviors?
State regulators should reconsider their “let it rip” approach to autonomous vehicle licensure and regulation. A human driver who violated school bus safety road rules dozens of times would likely have lost their license, but there has been no discernible state regulatory consequence to Waymo from their computer driver doing the same. Consider something analogous to a human-driver point system for autonomous vehicles, one that forces an operational stand-down or ODD exclusion when repeated dangerous driving behavior occurs. We should not have to wait for a child to be killed by a robotaxi to decide that robotaxis should not be given a free pass while the company repeatedly tries (and thus far fails) to fix critically unsafe driving behavior over a period of many months.
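One possible shape for such a fleet-level point system can be sketched in a few lines. The violation names, point values, and stand-down threshold below are all hypothetical placeholders I invented for illustration; a real scheme would need regulator-defined values and a process for clearing points over time.

```python
# Minimal sketch of a point system analogous to human-driver licensing,
# applied to an AV fleet operator. Violation names, point values, and
# the stand-down threshold are hypothetical placeholders.

POINTS = {
    "school_bus_pass": 6,   # passing a stopped school bus
    "red_light": 4,
    "stop_sign_roll": 2,
}
STAND_DOWN_THRESHOLD = 12   # accumulated points forcing a stand-down

def should_stand_down(violation_log: list[str]) -> bool:
    """True when accumulated fleet points reach the stand-down threshold."""
    total = sum(POINTS.get(v, 0) for v in violation_log)
    return total >= STAND_DOWN_THRESHOLD

# Two documented school-bus violations reach the threshold here.
print(should_stand_down(["school_bus_pass", "school_bus_pass"]))  # True
```

The design choice worth debating is not the code but the policy knobs: which violations count, how heavily, and whether the consequence is a full stand-down or a narrower ODD exclusion (e.g., no operation during school bus hours).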
Federal regulators should consider FMVSS criteria for vehicle automation related to high-risk road rule violations such as stopping for school buses, stop sign behavior, red light behavior, pedestrian crosswalk behavior, and rail grade crossing behavior. While a road test for autonomous vehicles does not conclusively prove safety, a test-based FMVSS would provide (a) basic testing to keep clearly incapable vehicles off the road, and (b) a clearer basis for recall authority if those types of incidents happen on public roads despite vehicles having passed the related basic testing.
I’m eager to read NTSB materials on this topic as they are released. I imagine they’ll come up with some recommendations in areas I missed. And perhaps the investigation will show that some of the concerns reflected above missed the mark when the full extent and nature of the failures is understood. But if even one of the ideas above sparks a stakeholder getting ahead of the curve to improve safety sooner rather than later, that will be a win for public safety and ultimate success of AV technology.
Phil Koopman has been working on self-driving car safety for about 30 years, and embedded systems for even longer. For more on applying AI, see his new book: Embodied AI Safety.
This is part 2 of a two-part post on this topic. See part 1 here:
It is important not to try to front-run NTSB root cause analysis, and I am not attempting to do so here. We do not know which of the topics I discuss were contributing causes to these incidents. If there had been fatalities, then speculating as to the root cause would be problematic as a matter of process and of respect for the victims’ families and any operational personnel. But in this case nobody has been hurt (yet), and I believe these suggestions are important to consider regardless of the ultimate NTSB contributing-factor findings. My points should be treated as topics everyone in the industry should be thinking about in light of these incidents, whether or not they ultimately turn out to have been specific contributors to this particular set of incidents.


Thanks for the article, enjoyed it.
Waymo needs to get this fixed and I support the NTSB investigation.
I’ll push back on how you are framing the safety debate overall.
If AVs are much safer overall but less safe in a set of rare edge cases, and the net result is many fewer deaths and injuries, is that a tradeoff worth making?
Almost everyone reading this post commits 10+ moving violations a day. Speeding, rolling through stop signs, failure to signal turns, running stale yellow lights, etc.
That is the baseline that AVs are trying to improve on. I don’t want them to have a blank check, but I do want the approach that saves lives overall.