Adapting ANSI/UL 4600 Into A Self-Driving Regulatory Standard
How to make UL 4600 objective and repeatable for regulatory use
Regulators around the world are struggling with how to deal with regulating self-driving cars. Perhaps an existing industry consensus standard will do the job here, although maybe not in the way you might expect. Let's talk about adapting ANSI/UL 4600: Standard for Safety for the Evaluation of Autonomous Products to serve as a regulatory standard for Highly Automated Vehicles (HAVs) such as robotaxis and robotrucks. The secret sauce is a figurative bank audit approach to safety case conformance checks.
Self-certification and driving tests
Currently, US HAV manufacturers are required to "self-certify" to the Federal Motor Vehicle Safety Standards (FMVSS). Those standards apply to conventional vehicle functionality. Adaptations are being made to address robotaxis and eventually robotrucks, mostly in the area of permitting them to omit steering wheels and other in-vehicle human driver requirements. But when the dust settles, the current approach to FMVSS will not ensure that an HAV’s computer driver can drive safely. It will just address conventional vehicle functions: Do the brakes work? Do the headlights work? Do crash safety features work? And so on. Nothing about driving skills beyond plans for automated emergency braking tests.
A common idea to address the gap between FMVSS and safe driving behaviors is to expand FMVSS to include some sort of driving test, “vision” test, and so on, just like a human driver would get. That can certainly keep the worst stuff off the road.
But a human-style driving test will be ineffective on its own at proving safety. The problem is that the most important part of a driving test is the birth certificate proving the driver is a human of sufficient age. That birth certificate is a proxy for being able to reason about how the world works in unstructured situations, a survival instinct that motivates avoiding being hurt in a serious crash, and some level of both social responsibility and maturity. (Sure, go ahead and get snarky about teenage drivers. But there is a minimum driving age, and that is the point here.)
Because computers are not people, a major problem with regulating HAVs is that testing simply won't prove them safe. There are too many edge cases for testing to cover them all. A human-style driving test can play a role, and is certainly better than nothing. But it will not come close to solving the whole safety assurance problem.
Process-based standards
We solve computer system safety in other application domains by requiring engineering rigor, usually in the form of conformance to safety standards. Safety standards involve following a specified engineering process during design, and assessing for compliance to that process. That approach is a big part of the reason aircraft, trains, and many other products are as safe as they are.
In late 2020, NHTSA published an Advance Notice of Proposed Rulemaking (ANPRM) that would have required compliance to three safety standards for HAVs: ISO 26262, ISO 21448, and ANSI/UL 4600. Perhaps today that might be updated to include the newly published ISO 5083 and ISO 8800. The general idea was to apply available industry standards rather than create government-written standards, in keeping with a US government regulatory policy to prefer industry-created standards when available and appropriate.
Public comments were submitted on the ANPRM … and it is apparently going to die of neglect. To be sure, there are reasonable concerns that would need to be hammered out to progress this proposal to a rule, but it seems that is not going to happen. Automotive regulations are not ready for process-based safety standards, at least in the US. Instead it seems we have a new plan that prioritizes removing barriers to deployment and reducing incident reporting requirements.
Right now regulation mostly amounts to hoping there are no bad actors in the chase for a trillion-dollar market, waiting for crashes, and working the recall system. I have no doubt that regulators want to do better. Stakeholders deserve better. And I think the industry wants to do better at providing safety assurance, but they cannot bring themselves to commit to regulatory oversight of process standard compliance (ISO 26262, ISO 21448, and the like).
We need to find a safe space for regulations. Regulators like objective, repeatable, test-based safety standards. Something that anyone with knowledge in the area can apply and get the same result. But we don't have such driving tests for HAVs, and even if we did they would be more about basic driving competence than an assurance of real-world safety outcomes.
So what can we do?
ANSI/UL 4600 and self-certification
The first piece is to have manufacturers self-certify to ANSI/UL 4600.
UL 4600 is not a process standard, and it is not a product construction standard. Rather, it is a standard for how to determine if a safety case is well-constructed. UL 4600 tells you how to evaluate a safety case. It is not how to build the safety case, not how to create the computer driver software, not how to build the product, and not how to test the product.
UL 4600 covers the broad range of technical topics that matter when designing an HAV computer driver. It talks about the engineering techniques you might want to use. But it does not say "thou shalt use technique X." Rather, it says "Did you think of technique X? If you used it, let us know. If you did not use it, let us know why not.” Or “Here is a list of various types of vulnerable road users; did you consider all of them to the degree they are relevant to your system?” Or “Here are the several usual techniques that you might apply. Which of them did you use? And if none of them, what did you do instead, and why was that a reasonable decision?” Also, are there claims being made without supporting arguments? Are there arguments made that are unsupported by either evidence or a stated assumption? And so on.
Most of UL 4600 can be summed up as: #DidYouThinkOfThat?
This means that UL 4600 is not a specification for engineering process as is common for other safety standards. It does not require following a list of engineering steps to create the HAV. Rather, it requires there to be a safety case of some sort, and describes (in detail) a process for evaluating whether the safety case considers relevant topics.
An internal conformance assessment is required that takes into account the technical substance of the safety case. However, that is under the complete control of the manufacturer. So manufacturers have no reason to complain that a conformance assessment will be biased, or performed by people who do not understand the system. They can run it however they like. When their efforts are concluded, they produce a conformance assessment package (UL 4600 section 17.2).
The conformance assessment package defined in UL 4600 provides a written basis for a declaration of self-certification of conformance. No outsider need be involved, and technical discretion is completely controlled by the manufacturer. A regulator might simply require an attestation of self-certification of conformance as documented by a conformance assessment package per UL 4600 section 17.2. That’s a one-paragraph letter saying, essentially “yup, we self-certify to UL 4600, omitting the independent assessment portions.”
That is part 1 of how you can get self-certification of HAVs using UL 4600 right out of the box. The process is all in the standard -- just follow it and sign off. For self-certification the manufacturer does not even have to show anyone the safety case. They just have to say they've done it.
Road tests and regulatory checks
However, we still have a problem. How can a regulator check the homework of the self-assessment? For vehicle tests, regulators get hold of some vehicles and put them on a test track to independently test them as a regular activity. If mishap reports are made, the regulator can spot-run relevant tests related to the reported problems, for example backup camera failures.
But you can't road test a safety case very well if you can’t even see it, since every company is likely to have a different safety case, different operational design domain, different concept of operations, and so on. How in the world do you create standardized road tests that go beyond a simple "stops at stop signs" and "doesn't hit stuff" driving test?
Even worse, you're going to have to re-do any road testing process after every major software update. For end-to-end machine learning, it is easy to argue that all bets are off for safety when you re-train if there are no other safety assurance pieces in place. That means revalidation after every computer driver software update that affects machine learning, no matter how minor. At least for now, we're back to the current situation of every manufacturer being different, often with high opacity on why we should believe any particular software release is safe.
In practice we still end up with a regulator-defined road test, plus ad hoc reactive tests to respond to mishap reports and consumer complaint trends.
Having a simple road test that regulators can run on exactly the same terms as manufacturers will help keep folks honest. But it will take a decade to develop and will be pretty basic. Simulator-based road tests will be even harder, because the technology is still changing quickly. Perhaps two decades.
What can we do in the meantime?
The ANSI/UL 4600 bank audit
Fortunately there is an answer in the form of a figurative "bank audit" of a UL 4600 safety case.
A brief history: In the development of the first edition of UL 4600, one of the controversial points we negotiated in the voting committee had to do with independent assessments. I'll go into the tradeoffs in a section of my upcoming book, but the bottom line discomfort was doubt that anyone other than developers themselves would be able to vet a safety case. The technology was too cutting edge and the skills base was too small. The solution in that version of the standard was to treat independent assessment as a "bank audit".
The idea of a bank audit assessment is pretty straightforward. For each numbered section of UL 4600 (called a prompt element), the independent assessor would ask "point to where in your safety case I can find the information responsive to that prompt element." If there is a non-blank entry in the safety case where the internal safety team points, that prompt element passes audit. Repeat across all prompt elements for the standard.
This has attributes regulators like to see. It is a well-characterized process ("walk down the prompt elements and check each one is addressed somehow"). It is objective ("non-blank entry"). It is technology-neutral for the product being built. And it is highly repeatable. In fact, you could automate it pretty simply. Have whoever does the safety case for the manufacturer export a spreadsheet/CSV file in which each row has both a UL 4600 prompt element number and corresponding text from the safety case. You pass if all the prompt elements are accounted for and none of the text is blank. Simple as that.
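To make the idea concrete, here is a minimal sketch of what that automated check might look like, assuming the manufacturer exports a CSV traceability file. The column names (prompt_element, safety_case_text), the file name, and the example prompt element IDs are illustrative assumptions, not taken from the standard itself.

```python
# Minimal "bank audit" check sketch: every expected prompt element must appear
# in the exported traceability CSV with non-blank safety case text.
# Column names and element IDs below are hypothetical, for illustration only.
import csv

def bank_audit(csv_path, expected_elements):
    """Return (missing, blank): elements absent from the export, and elements
    present but with empty safety case text."""
    found = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            found[row["prompt_element"].strip()] = row["safety_case_text"].strip()
    missing = [e for e in expected_elements if e not in found]
    blank = [e for e in expected_elements if e in found and not found[e]]
    return missing, blank

if __name__ == "__main__":
    expected = ["8.2.1.1", "8.2.1.2", "8.2.2.1"]  # hypothetical IDs
    missing, blank = bank_audit("conformance_traceability.csv", expected)
    if missing or blank:
        print("AUDIT FAIL -- missing:", missing, "blank:", blank)
    else:
        print("AUDIT PASS: every prompt element has a non-blank entry")
```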
If regulators and industry agree on this approach, the bank audit process could even be added to UL 4600 in a relatively simple and quick process as part of the standard itself.
Details and doubts
In reality it probably isn’t quite that simple, and indeed there are some loose ends to clean up. But that idea will generally work. Here is some fine print.
The 2nd and 3rd editions of UL 4600 give independent assessors more power to exercise discretion, so that gets rolled back for this bank audit version. This can be done by exempting section 17.3 of the 3rd edition of the standard, which gives independent assessors the power to apply judgment when assessing the safety case. (Full conformance would still require 17.3, but regulators could make that optional for regulatory compliance.) Additionally, create a requirement that the conformance package include a traceability spreadsheet between UL 4600 prompt elements and safety case elements. So the regulatory requirement is not full conformance to UL 4600, but rather providing materials for a bank audit of the self-assessment.
UL 4600 has a tiered system for deviations, meaning that some prompt elements are mandatory, while others can be waived from requiring a substantive safety effort. A bank audit can be applied without changing the standard at all by auditing Mandatory, Required, and Highly Recommended prompt elements in the traceability table; Recommended elements can be ignored. In particular, it is important to keep in mind that Highly Recommended elements do not require technical action -- they simply require an explanation of why no technical action has been taken, if that is the case. From an audit point of view, as long as there is a non-blank description of something associated with that prompt, the safety case passes. That something could be a substantive safety activity description, or it could be “not done because …” As a practical matter, the deviation details do not matter for such a simple audit. Simply look for non-blank text, and exclude “Recommended” prompt elements from the audit entirely.
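Extending the earlier sketch, tier filtering could look something like this. The tier labels follow the description above, while the tier column name and the element IDs are assumed conventions for the exported spreadsheet, not something defined by the standard.

```python
# Tier filtering sketch: only Mandatory, Required, and Highly Recommended
# prompt elements are in scope for the bank audit; Recommended is ignored.
# The "tier" / "prompt_element" field names and the IDs are hypothetical.
AUDITED_TIERS = {"mandatory", "required", "highly recommended"}

def elements_in_scope(rows):
    """Return prompt element IDs that must have non-blank safety case text."""
    return [
        row["prompt_element"].strip()
        for row in rows
        if row["tier"].strip().lower() in AUDITED_TIERS
    ]

rows = [
    {"prompt_element": "8.2.1.1", "tier": "Mandatory"},
    {"prompt_element": "8.2.1.2", "tier": "Recommended"},  # ignored by the audit
]
print(elements_in_scope(rows))  # -> ['8.2.1.1']
```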
Of course this approach, as any approach, might be gamed. Some enterprising safety engineer might simply turn a chatbot loose to populate a safety case that addresses all the prompt elements. But then too, some banker might use an LLM to create a second set of books to fool auditors. A simple audit approach won't force companies to be honest or safe if they don't want to be. However, it does force them to write something down, and that something can be checked later.
For safety teams that want to do the right thing for safety, this gives them some regulatory pressure on their side to take the time to really dig into safety and visit all the UL 4600 #DidYouThinkOfThat prompts that are relevant to their system. This can help moderate the inevitable timing pressures that would otherwise push the product team to put something on the road with less-than-robust safety assurance.
One check and balance against meaningless safety cases is that a regulator might decide to take a look at a safety case in response to a mishap as part of a recall investigation. If the safety case is nonsense, that might prompt a wider investigation.
Similarly, a product defect lawsuit in the wake of a severe injury or fatality might seek access to the safety case (under a court confidentiality order) as part of the discovery process to see why the designers thought the behavior associated with the mishap was nonetheless acceptably safe. The problem for a manufacturer cutting corners on safety will be that they self-certified conformance. The audit is only a sanity check. The real check only comes later if there is a mishap that presents the possibility the safety case was not as good as it should have been.
On the other hand, a manufacturer with a robust safety case will be in an excellent position to show that they did their homework upfront, and that any issue that arises occurred despite their good faith efforts at safety. This could be a powerful shield against punitive damages in civil litigation. And likely it will reduce the number of court fights overall because the problematic crashes are less likely to happen if safety engineering has been done thoroughly.
We expect HAVs will have continual updates, and those updates might affect the validity of the safety case. Not to worry. UL 4600 section 17.4 covers that topic, including how to think about re-evaluating the safety case after a change.
There are some who will no doubt scream that this idea is regulatory over-reach. That seems to be a standard industry reaction to any proposal for regulators to do anything, so let's just move past the theater on that point and look at the merits.
Yes, this will require some effort on the part of design teams. But companies and regulators alike are already talking about safety cases. I have heard many stakeholders say that everything in UL 4600 matters, and that it is simply a question of time and resources how much of it to do. So this is not a matter of creating a substantial new engineering requirement for developers, but rather an attempt to inject some quality control and support a necessary amount of effort for something they were going to do anyway (subject to available time and resources). This is a way to motivate manufacturers to provide the time and resources necessary for acceptable safety.
UL 4600 is a standard that went through an industry consensus process with input from stakeholders and voting members from around the world. This has happened three times, resulting in the current 3rd edition that encompasses robotaxis, heavy robotrucks, and other vehicles that need to operate on public roads. Any interested party can comment and attend meetings. All the companies in the industry have had multiple opportunities to have their say, and many have. The UL process permits continual updates, so there is no lock-in to any part of the standard companies think becomes obsolete.
The industry has been able to create, in UL 4600, the safety case grading rubric used on its own homework. If companies find a problem with the standard, they can fix it quickly and cheaply (months and a few on-line meetings, not years and multiple international plane flights as is common with some other standards processes).
Some complain that standards are too expensive to access, and I agree with that. However, UL has made the praiseworthy decision to make the entirety of UL 4600 available to view online for free (see note at end of this article for details).
Some complain that standards conformance assessment is a racket. You have to pay an external party to come criticize your product, and there is considerable concern that outsiders won't understand cutting-edge technology. Or that there will be a race to the bottom that panders to inadequate safety, so as to get more assessment business for the external vendors. Or perhaps both. This proposal avoids that by making the assessment a bank audit. The audit can literally be done by sorting a spreadsheet and looking for missing numbers and blank text fields. More sophisticated tools would no doubt be convenient, but are not required to implement this approach right now.
While this is a US-centric description, I imagine these ideas can fit into other systems with appropriate tailoring.
What next?
OK, so there it is, a proposal to take a step forward on regulating HAVs using an objective, repeatable, technology-neutral, measurable process. We simply require a safety case to trace to parts of an industry consensus standard that is five years old and has matured into its third edition.
What do you think?
The author is the originator of ANSI/UL 4600, issued in 2020. The standard has been updated twice over the years to the currently issued 3rd edition. UL 4600 is maintained by an international voting committee of stakeholders representing manufacturers, safety advocates, regulators, insurance companies, and more.
Here is a UL 4600 quick-start landing page: https://users.ece.cmu.edu/~koopman/ul4600/index.html
This has been a special drop-in post. The next Embodied AI Safety Concerns post will appear as scheduled.
Thanks, Phil.
My suggestion is that we (you, somebody) should develop a straw man UL 4600 audit schedule. There is currently a lot of uncertainty as to the schedule and resources needed to complete a review. Such a schedule in my experience promotes industry acceptance, allowing industry to plan and apply resources as needed. For example, assuming that the first day is day zero, such a schedule might include:
Day -14: one-day on-site meeting to introduce audit team leaders, audit team membership and qualifications; discuss logistics (meeting room, needed computer and network resources, parking, passes, lunch payment and preferences, etc.); refresher on pass/fail criteria; expectations for developer expert participation; process for closing liens on open items at the conclusion of the on-site meeting, if any; process for appeal of adverse audit team decisions; allowable audit team expenses; etc.
Day 0: Audit team arrival on site, confirmation of logistics arrangements.
Day 1, 0900 - 1030: Introductions, orientation by audit team, orientation by developer leadership
1030 - 1300: Sections A-B
1330 - 1600: Sections C-D
1600 - 1630: Internal audit team summary preparation
1630 - 1730: Review results with developer, enumeration of closed items, liens on open items.
Day 2-9, 0800 - 0815: Opening remarks by Audit lead
0815 - 0830: Opening remarks by Developer Lead
0830 - 1630: Safety Case presentations
1630 - 1700: Status review and open item/lien identification
Day 10, 0800 - 1200: Audit team review and synopsis, identification of open items and liens
1300 - 1645: Presentation of results to developer team, identification of open items and liens, developer appeal of adverse determinations, residual process to delivery of compliance certification
1645 - 1700: Final remarks and adjournment.
This is only a brief sketch, and there is clearly more to be done, but there has got to be a way for developers seeking UL 4600 certification, and companies desiring certification as UL 4600 auditors, to bound the scope and marshal the necessary resources for a successful audit outcome. UL 4600 audit process definition might be a prerequisite for industry acceptance, but in any case couldn't hurt. The process sketched above would be familiar to companies that have already had, for example, an ISO 9000 audit. It seems to me that the UL 4600 document alone is a fantastic framework for structuring the AV developer safety case, but absent a widespread history of audits (there aren't any yet by companies documenting their competence, and we need to start somewhere), its implementation (especially early on) may benefit from better definition of the corollary expected audit process and closure requirements.
I think that such a plan would alleviate apprehension about developer resources and schedule, and reduce reluctance to engage with UL on UL 4600 certification.