Time To End Public Beta Road Testing of Automated Driving
It is not beta testing. It is an unsafe deployment of half-baked software.
Origin and evolution of the term beta testing.
Why public beta road testing is largely irrelevant to safety testing
Tester cosplay
China recently banned public beta road testing. Time for the US to consider doing the same.
For decades, conventional vehicles were tested by trained, qualified testers who were the employees of or skilled contractors for car companies.[1] Testing started in engineering labs, then progressed to test tracks. Testing only happened on public roads when the technology was mature. Public road testing was used to perform some final adjustments in conditions impractical to reproduce on test tracks, and make sure there were no surprises in diverse and extreme driving conditions.
Ordinary road users have not signed up to be experimental test subjects, and did not volunteer to be subject to the risks that might stem from a malfunctioning prototype. For decades, car companies have taken substantive steps to avoid exposing public road users to those risks. For decades, still-in-development, immature, prototype software was something that might run on the desktop computers of sophisticated, volunteer testers, but not on public roads.
Somehow the automotive perspective on public road testing changed when computer drivers started controlling steering. Now, suddenly, in some vehicles, retail customers have become “beta testers” of supervised automated driving systems – which abuses that term in multiple ways.
The history of beta testing
Historically, a “beta” tester was a sophisticated close partner to the developer, typically having a signed contract. Testers exchanged early access to the software for a requirement to report defects they might encounter. The presumption was that the software had already been through the manufacturer’s developmental “alpha” testing, and was thought to be ready to release as a product. Beta testers were there to discover issues stemming from unexpected requirements or unexpected usage patterns missing from the product definition. If a beta tester found a straight-up software defect affecting ordinary functionality that manifested in ordinary conditions, that defect escape constituted a major software development process failure. Beta testing was a defense-in-depth technique to mitigate the risk of important software defects being released.[2]
Somewhere along the way, software developers in a rush to market have distorted the meaning of the word “beta” to instead mean that the software was half-baked. This happened over a period of decades. Initial “public beta” processes were approximations of traditional beta testing, but with less rigorous customer screening.
In time, companies in a hurry dispensed with establishing a formal testing relationship and limiting deployment. Instead, they simply slapped the word “beta” on software not to mean that it was part of a purposeful testing phase, but rather that the product is a half-baked thing that is being publicly released to see if they might bootstrap a business with it. If such software is harmless when it fails, that can be a reasonable business strategy. But safety-critical functionality is an entirely different matter.[3]
In practice, the term “beta” no longer means a closely held pool of expert users providing feedback. Rather, it has become a code phrase for “don’t blame us for the bugs; it’s a beta and you should know that means there will be bugs; and it is free software we have not figured out how to monetize yet, so you have no right to complain.” At times, “beta” has also been seen as a badge of honor for being innovative and pushing the boundaries of getting the very latest technology into the hands of users without delay.[4] Both of those meanings can provide considerable business value. But they have fundamental conflicts with deploying products with acceptable safety unless something more is done.
As a result of this changed approach, many developers have dropped any pretense that beta software is close to ready to be sold as a final product, or that public beta testing involves a special, close relationship with sophisticated users.
Rather, public beta versions tend to be a conduit for deploying evolving functionality, often over a period of multiple years. Perhaps that is OK for productivity software and general-purpose computing tools, especially in the world of free-to-try services. However, these applications do not tend to be highly safety-critical.
Public beta testing of safety-critical products sold for use by ordinary retail customers is a perversion of the idea of beta testing. The description of how public beta for Automated Vehicles (AVs) works and the arguments for its utility are seductively similar to narratives about how beta testing works for other products. But in the world of safety-critical products, public beta testing has contributed to severe injuries and even deaths.
AV public beta testing has turned into a way to deploy immature, still-evolving software to retail customers to both get them to think they are getting the latest shiny technology, and also deflect blame for any software defects. “It is only a beta, so bugs are to be expected.” “Sure, that failed, but the next release will blow your mind.”
Beta testing is irrelevant to (responsible) safety testing
A crucial technical issue with public beta releases is that they have little relationship to the type of methodical engineering development and testing process that needs to be in place for safety-critical system development. That type of testing involves a methodical approach defined in a test plan. Each test in a plan is linked to a requirement that is to be tested against. For each test, the system is presented with a specified scenario, the response is observed, and that response is compared against the behavior specified in the requirements.
Real testing works this way: “The system is supposed to do X in condition Y. Let us set up condition Y. We ran the system, and we did indeed see result X (test pass) or something other than X (test fail).” For a safety-critical system, the most important tests follow this pattern to be sure that all the safety requirements (the “does X in condition Y” part) indeed have been met. This specifically includes validating that all hazards have been mitigated as intended. This is the sort of thing that happens in engineering development and alpha testing, not beta testing. Any attempt to replace alpha testing with beta testing in effect is skipping over the core practices of safety validation.
As a concrete example: “The system is supposed to stop at a stop sign. We ran the system, and it failed to stop at the stop sign. That is a test fail.” Once a test fails, you do not need to watch it fail another thousand times to know it has, indeed, failed. Moreover, no number of test passes on unchanged software makes a safety-critical test failure (such as not detecting a stop sign) go away.
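The “does X in condition Y” pattern, and the point that passes do not cancel a failure, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the requirement label REQ-STOP-1, the names PlannedTest, run_test, and verdict, and the toy system that misses the sign on every 1000th run are all hypothetical, not taken from any real test plan or vehicle.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PlannedTest:
    requirement_id: str  # hypothetical requirement label the test traces to
    condition: str       # the "condition Y" to set up
    expected: str        # the required "result X"


def run_test(case: PlannedTest, system: Callable[[str], str]) -> bool:
    """Present the condition, observe the response, compare to the requirement."""
    observed = system(case.condition)
    return observed == case.expected


def verdict(results: List[bool]) -> str:
    """On an unchanged build, one failure means FAIL no matter how many passes."""
    if not results:
        return "NOT RUN"
    return "PASS" if all(results) else "FAIL"


def make_toy_system() -> Callable[[str], str]:
    """Toy stand-in for the system under test (assumption for illustration)."""
    calls = {"n": 0}

    def system(condition: str) -> str:
        calls["n"] += 1
        # Misses the stop sign on every 1000th run.
        if condition == "stop sign ahead" and calls["n"] % 1000 == 0:
            return "continues"
        return "stops"

    return system


stop_sign = PlannedTest("REQ-STOP-1", "stop sign ahead", "stops")
system = make_toy_system()
results = [run_test(stop_sign, system) for _ in range(1000)]
print(sum(results), "passes;", verdict(results))  # 999 passes; FAIL
```

The verdict is deliberately not a pass rate: 999 passes and one miss still report FAIL, because the requirement was not met on that build.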
While some types of testing involve accumulating road miles or otherwise using a product to see what requirements might be missed, those are not the core of safety testing. Public beta mileage accumulation does not follow a test plan, and does not compare outcomes to an engineering behavioral requirement other than checking for reported vehicle crashes. It is more like informally messing around to see whether a crash happens.[5]
In the AV world, Tesla has used a public beta testing strategy to release their still-in-development automated driving software as “beta.” This has included their auto-steer functionality, as well as their so-called Full Self Driving (FSD) feature. For FSD, they added an installation warning: “It may do the wrong thing at the worst time.”[6]
The problem here is that we are not talking about a free, cloud-based e-mail system. We are talking about thousands of pounds of vehicle operating on public roads with “beta” software that the manufacturer freely acknowledges is capable of doing something dangerous in a situation that is difficult to manage. In the hands of a retail customer who has not been to test driver school.
Unlike manufacturer testers, public beta test drivers are not specifically trained, are not following a test plan, and are not operating a vehicle that has all known safety-relevant defects resolved before being placed on public roads. Despite the name, it is not testing in any reasonable sense. It is deployment of immature, prototype software as a retail product in a way that puts other road users at risk.
The high cost of tester cosplay
Some AV beta test drivers glory in their tester cosplay.[7] Some might even have excellent test driver skills. Those drivers might make minimal contributions to long-term product development by mitigating and surviving novel vehicle misbehaviors and reporting them. But most misbehaviors in a large public beta testing fleet can be expected to be repeat reports of known problems, providing little or no value, while each incident imposes avoidable risk on road users.
A serious testing program does not need a fleet of vehicles to blow through hundreds or thousands of stop signs or red traffic lights for developers to know that it has a safety-relevant defect of poor stop sign performance. After the first missed stop sign or red light, public road testing of those specific functions should be shut down until the problem can be fixed. Data collection might continue under manual driving control, but testers should not be asked to intercede to prevent crashes until the problem has been fixed.
Worse still is the risk exposure placed on ordinary folks who buy a vehicle without fully appreciating the exposure placed on them by a disclaimer that an automated driving feature is “beta.” Many likely do not really appreciate just what that means. Others have bought into a narrative that the legal disclaimer is just there to keep annoying lawyers happy, and there is no real reason to worry. And so on. Nonetheless, they are taking on the role of being blamed by default for any crashes caused by malfunctioning automated driving features.
AV beta test drivers have suffered palpable harm from their role as testers. Some have been seriously injured or killed.[8] Some have faced serious legal consequences for injuring or killing other road users.[9] All of them can expect to be blamed for failing to mitigate dangerous computer driver behaviors because, after all, they decided they wanted to cosplay as test drivers.
Time to ban public beta road testing
Beta testing of anything other than a product ready to be shipped as a series production feature, by trained test drivers, has no place on public roads.
China has gotten the message, and has banned public beta testing in the wake of a horrific multi-fatality crash.[10] It is time for the US to get that message as well.
This is a preliminary version of a section from an upcoming book by Phil Koopman. Expected release in 2025.
[1] A typical process would require a design engineer to go through a test driving school held on a closed course before being allowed to take a car on public roads with anything less than production-release software. The expectation was that software had passed closed course tests run by trained testers according to engineering test plans before being allowed on public roads.
[2] See Sharrow, 2025: https://softhandtech.com/why-is-it-called-beta-testing/
[3] This is related to the concept of Minimum Viable Product (MVP), which is the simplest product one can ship that provides value to the customer, with the notion that even buggy software can provide enough value to be worth tolerating minor bugs. Deploying an MVP can be a crucial step in a startup company’s life to help define and refine a product offering. However, Minimum Viable Safety (which is not a standard term) requires more than a proof-of-concept software release that is prone to causing loss events. https://en.wikipedia.org/wiki/Minimum_viable_product
[4] Google’s Gmail was famously “beta” through its user scaling-up period. That started with its limited public release in 2004 via an invitation system. Beta status continued through its general public release in 2007, and on into July 2009. See: https://time.com/43263/gmail-10th-anniversary/
[5] There is an approach of taking huge numbers of road miles and finding scenarios that just happen to match required test scenarios to check off test plan items. But that defers understanding whether software is safe until after deployment, meaning that potentially unsafe software is being put on public roads in the hands of ordinary drivers.
[6] See McFarland, 2020: https://www.cnn.com/2020/10/30/cars/tesla-full-self-driving/index.html
[7] For an explanation of the term cosplay, see: https://en.wikipedia.org/wiki/Cosplay
[8] An early victim was said to be interested in testing the limits of his automated vehicle functionality. He paid with his life. See Abrams & Kurtz, 2016: https://www.nytimes.com/2016/07/02/business/joshua-brown-technology-enthusiast-tested-the-limits-of-his-tesla.html
[9] This is not to say that the drivers supervising the automation were necessarily blameless, but rather to point out that failure to supervise self-driving technology can bring significant legal consequences to the beta tester, even if the technology itself has defects due to being in a less-than-completed development status. Drivers might not appreciate the seriousness of their legal exposure. There have already been multiple drivers facing criminal charges for mishaps.
See Krisher & Dazio, 2022: https://apnews.com/article/tesla-autopilot-fatal-crash-charges-91b4a0341e07244f3f03051b5c2462ae
See Brownell, 2024: https://www.jalopnik.com/tesla-driver-who-trusted-autopilot-charged-with-killing-1851428652/
See O’Kane, 2024: https://techcrunch.com/2024/09/04/woman-who-allegedly-killed-two-people-using-ford-bluecruise-charged-with-dui-homicide/
[10] See Yoshida, 2025:
Thank you for highlighting this important topic. It is also important to emphasize that serious companies (e.g. logic-bearing medical devices, aircraft, navigation infrastructure and industrial machinery including offshore oil rigs, etc.) that expose the public to serious harm from developmental software embrace detailed test protocols and/or physical barriers to assure safety before release. Technology readiness levels published by the US DOD and DOT [1] and CMMI software maturation practices from Carnegie Mellon University [2] (I think you have heard of this institution, Phil) provide useful guidance. Beta testers are demonstrably competent, indemnified, and paid. While not perfect, quality escapes from those protocols are rare and subjected to extensive root cause determination before testing is resumed. AVs and consumer firearms are the only products that rely on customer or third party liability, injury, or death as surrogates for competent test protocols. That needs to change. In favor of public safety.
[1] https://www.gao.gov/products/gao-20-48g
[2] https://cmmiinstitute.com/learning/appraisals/levels
In my country (in Europe), the 'pre-homologation prototype' approval framework requires you to obtain a test permit before testing any supervised / full automation system on public roads.
It's actually been inspired by SAE J3018 and the AVSC publications. Hence, we address some key elements like system maturity acc. to TRL scale, safety driver training program and qualifications, driver monitoring and other risk mitigating controls such as a limited test fleet size, limited test periods and weekly reporting as part of the oversight.