To Lidar or Not To Lidar
That is the trillion-dollar question. With a special guest appearance by end-to-end machine learning.
Is Tesla making the right bet in avoiding Lidar? Are the other robotaxi companies making the right bet by including it? It is not just about the Lidar. Robotaxi success hinges on an interrelated set of engineering tradeoffs involving sensor cost vs. fleet scale vs. operational area vs. machine learning strategy. If you want to pick a winner, you need to understand the complexities involved in the game.
As a brief recap of how we got here, Tesla avoids Lidar and calls it a “crutch.” Over time they dropped radar and ultrasonic sensors too. And they say they avoid expensive-to-maintain high definition maps of the roads they drive on. At the same time, they promise go-anywhere autonomy next year. (This time for sure!) When they eventually deploy, it will be an overwhelming win, because as currently envisioned their deployment will be essentially everywhere all at once. With a high degree of uncertainty as to when “eventually” might be. They are swinging for the fences.
Other robotaxi companies such as Waymo are taking a different, more incremental approach. Let’s call them the “Others” for simplicity, realizing that each company has its own approach. The Others use Lidar, as well as high definition maps, radar, and other sensors. The leaders among them have already gotten to the point where they can operate driverless on public roads. However, they are limited to specific locations and environmental conditions.
Everyone wants to know who will win. I have my prediction below. But understanding what goes behind anyone’s prediction is far more important than yet-another-hot-take-on-Lidar.
Big fleet vs. small fleet
The important thing to realize is that Lidar vs. non-Lidar is only asking a part of the question. And probably not the most important part, at that.
Regardless of how we got here, the industry has largely broken into two camps at this point:
Tesla: Non-Lidar, large data-gathering fleet, no geofence, end-to-end machine learning
Others: Lidar (and other sensors), small data-gathering fleet, incremental deployment, machine learning+other source code
At the next level of detail, the comparison goes as follows. (This is simplifying, especially for some of the Other companies. And real-world solutions tend to be in the middle of the extremes. So this is a broad-brush-stroke look to emphasize the tradeoffs at work.)
Tesla has a huge fleet size. They can’t afford to put expensive sensors in each vehicle due to cost pressure if they want to make a profit. Indeed, radar and ultrasound have been removed over time to save cost. And their cameras are automotive-cheap rather than the best that money can buy. That means they have a platform that is by comparison sensor-poor. Moreover, their system must work across the country (and ideally many countries) or they restrict their customer base.
But there is some very powerful good news here. Customers are paying for each vehicle being used to gather data! And at this point they have accumulated a lot of customers, many of whom are eager to be volunteer data providers. That gets Tesla a huge number of miles on public roads, essentially for free.
The challenge is that because the sensor data feeds are comparatively impoverished and they want to be able to drive everywhere, they need to make smart use of as much of that data as they possibly can. There are far too many miles to touch them all by hand, for example by labeling objects in each image frame. Some manual analysis was done to get started, but that just won’t scale.
Ultimately this ends up pushing Tesla into end-to-end machine learning (E2E) trained via reinforcement learning. With E2E machine learning, the computer driver takes camera images as inputs in one end and produces driving commands (steering, acceleration, braking) out the other. When an E2E system is trained via reinforcement learning, an enormous computing complex is used to iteratively improve the computer driver to act in a way statistically similar to a (hopefully good) human driver. No manual labeling of images. Camera data goes in one end, and good/bad behavior is judged at the vehicle control outputs. We’ll revisit why they are pushed there a bit later.
The Others have a dramatically smaller fleet size, often having tens to hundreds of vehicles. Developmental miles are expensive, requiring trained, paid safety drivers in each vehicle until the system is highly capable. The vehicle hardware cost is not cheap, but is not so huge in the context of per-mile test driver and total engineering costs. So they can cover their cars with sensors without too much effect on the total developmental cash burn rate. That generates great data. But not nearly as many miles of data as for Tesla.
The architectural approach by the Others is using chunks of machine learning for the things that are difficult to program, such as perception (“is that thing a pedestrian?”). Those chunks of machine learning are tied together by other, programming-intensive techniques. That makes it easier to create special rules, deal with exceptions, and check the correctness of some parts of the computer driver’s functionality.
However, creating this type of system tends to take a larger programming team. Beyond that, labeling data to teach the machine learning functions (“that thing is a pedestrian, but this other thing is a bush”) can additionally take thousands of full-time personnel.
The good news is that this approach can squeeze a lot more value out of the less plentiful but comparatively higher quality data that is available. And this approach can be more adept at responding to emergent surprises by updating behavior for some particular situation via human-written source code instead of the much more cumbersome process of retraining a huge end-to-end machine learning system.
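To make the contrast concrete, here is a minimal hypothetical sketch of the modular style: a machine-learned perception chunk feeding human-written planning rules. All function names and the toy logic are invented for illustration; a real perception chunk would be a trained model, not a lookup.

```python
# Hypothetical sketch of the modular architecture: machine-learned
# chunks glued together by ordinary, human-written code.

def perceive(frame: dict) -> list[str]:
    # Stand-in for a learned perception model ("is that a pedestrian?").
    # Here it is a simple lookup so the example runs end to end.
    return frame.get("objects", [])

def plan(objects: list[str]) -> str:
    # Human-written rules tie the ML outputs together. Special cases
    # and exceptions can be patched here without retraining a model.
    if "pedestrian" in objects or "red_light" in objects:
        return "stop"
    return "proceed"

# Each chunk can be validated separately, which is the point:
assert plan(["pedestrian", "bush"]) == "stop"
assert plan(["bush"]) == "proceed"

print(plan(perceive({"objects": ["red_light", "vehicle"]})))  # stop
```

The key property is that when an emergent surprise shows up, an engineer can change `plan` directly; in a pure E2E system the equivalent fix means retraining the whole model.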
For now, the Others’ approach seems to be winning. There are robotaxis operating in multiple cities, while Tesla still requires drivers to intervene when their software makes a mistake.
End-to-end machine learning
While it didn’t really start this way, it seems that the crux of the difference will boil down to who is using end-to-end machine learning. A decision to have a large fleet with less capable sensors means that Tesla must have a highly scalable way to squeeze value out of the data in an almost completely automated way. Trying to scale other techniques such as automated labeling is going to run into problems with edge cases that are known to neither the computer driver nor the auto-labeling system. The path to avoiding those problems guides Tesla to an end-to-end (E2E) reinforcement learning approach.
Revisiting a purely E2E approach, data from cameras and any other available sensors goes into a machine learning system at one end, and driving commands come out the other. Training consists of looking at some camera inputs and seeing if the vehicle is commanded to go left/straight/right and slower/faster. Good vehicle behavior gets a higher score, and bad behavior gets a lower score. For reinforcement learning, that score is a comparison to what a human driver actually did in response to that data. The E2E function is adjusted (via a machine learning mechanism) to improve the score. Repeat that training cycle enough times and the E2E learning process eventually produces a driver that can behave remarkably well for situations that are well-represented in the training data. But bring training data. Bring LOTS of training data. But perhaps that’s OK, because the one thing Tesla has access to is lots of data.
The heavy-tail challenge
The biggest challenge for any type of machine learning is that behavior tends to be difficult to predict in scenarios that are missing from the training data. Rare events can lead to behaviors that would be said to lack common sense if they were coming from a human driver.
We know that to be successful, robotaxis need to be acceptably safe. The catch is that safety outcomes tend to be dominated by rare events with high consequence. At a ballpark rate of one fatality per 100 million miles for human drivers as a comparison baseline, a one-in-a-million high-risk edge case event is far too frequent to ignore. And there are many such events that have to be dealt with (this is the so-called heavy tail ceiling). That means any machine-learning based training needs to incorporate hazardous situations from at least billions of miles into its training data to have any hope of covering enough heavy tail edge cases – and perhaps a lot more.
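The back-of-the-envelope arithmetic behind that claim, using the article’s ballpark rates (illustrative figures, not measured data):

```python
# Ballpark rates from the article; illustrative, not measured data.
miles_per_human_fatality = 100_000_000   # ~1 fatality per 100M miles (baseline)
edge_case_rate = 1 / 1_000_000           # a one-in-a-million-miles event

# Over one baseline fatality interval, the fleet hits this edge case:
encounters = miles_per_human_fatality * edge_case_rate

# If the computer driver fatally mishandles even 1% of those encounters,
# this single edge-case type already matches the whole human fatality
# baseline -- before counting the many other cases in the heavy tail.
fatalities_per_interval = encounters * 0.01

print(f"encounters: {encounters:.0f}, fatalities: {fatalities_per_interval:.0f}")
```

One hundred encounters per fatality interval, from a single rare event type, is why “rare” does not mean “ignorable” at this safety level.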
This would seem to give the advantage to Tesla, because they can get a lot more miles cheaply. However, their miles are less effective for training, because they have lower sensor quality (just mediocre cameras rather than much higher quality sensors), and most of their drivers are not intentionally looking for hazardous rare event edge cases to catch on camera the way a trained test driver might. A billion miles of raw road data might not be enough to uncover what they need to see.
Even worse for Tesla, plain vanilla E2E machine learning is a data hog. One would expect to need a LOT more data to train an E2E system. The difference is that for the Others you can break the data up into chunks for training: that’s a person; that’s a red light; that’s another vehicle, with each individual object recognition function individually validated. That validation is likewise separable for different functions, such as path planning and vehicle control. For an E2E approach all you know is that a picture with a person, a red light, and a vehicle resulted in the computer driver stopping – but not whether the vehicle stopped for any of those things rather than some other unrelated reason such as a missing manhole cover. One expects it will take much more data to train and validate something without the ability to understand what’s going on inside an opaque E2E function.
Even worse, the Tesla E2E system needs to work everywhere. “Everywhere” has a whole lot of edge cases, further increasing the data needs to cover the heavy tail. By comparison, the Others can limit their exposure for at least some types of edge cases by operating in more restricted locations and environmental conditions. This is especially true for the robotruck gang who are betting that interstate highways in the middle of nowhere have a much more manageable number of edge cases than the urban cores travelled by robotaxis.
The net result is that Tesla probably needs much, MUCH more data to put a safe robotaxi on the road compared to the Others. Whether Tesla’s operational fleet size gives them a multiplier big enough to get there is an interesting question for which we don’t have enough information to be able to judge objectively.
A different bet: There is another path, which I understand is the trail being blazed by Waabi and likely others as well. If you have end-to-end machine learning you can get that huge amount of road data from a simulator rather than vehicle miles. But that just kicks the search for edge cases over to ensuring your simulator can come up with all the ones that matter. It seems they have a plan for this based on generative AI coming up with the edge cases automatically. I expect that is a bit too optimistic, but perhaps it would work for very limited deployment scenarios that they envision. The merits of that approach end up being a whole different discussion.
My Prediction
So how will this turn out? Let me look into my crystal ball…
Revisiting the title, these tradeoffs aren’t really about the Lidar. The tradeoff is about how you try to build robotaxi software with a lot of less capable sensor packages vs. fewer but more capable sensor packages driving on the roads. That leads to a set of tradeoffs that ends, at least for now, in choosing whether to use E2E machine learning or a more segmented architectural approach. Lidar is just the first question in a much bigger, more complex, intertwined set of design choices.
Tesla is taking a higher-risk/higher-reward path. My personal thoughts are that even with all the vehicles they have they won’t have the miles they need to get an E2E approach working any time soon. It feels like it is still many years away for them. We’ll know we are getting closer when human driver interventions are so rare that social media gives bragging rights to Tesla fans who manage to find a brand new edge case, rather than bragging rights going to fans who manage to capture a single ride without intervening on video.
Meanwhile, the Other robotaxi companies will continue to grind away at expanding their operational areas and conditions over time. I expect that the Others will find out whether a robotaxi business using a computer driver is viable a long time before Tesla gets to the point of being able to launch one. But, as they say, Your Mileage May Vary.
The tradeoff is also compute power vs. sensors. Maybe, just maybe, Tesla can get to FSD with only cameras, but at what cost in time and compute? Tesla has already had to upgrade, for free, those with Hardware 2.0 or 2.5 who purchased “FSD”. Musk has said those with Hardware 3 may also require an upgrade (TBD), and if so I assume it has to be a free upgrade for FSD purchasers. Even if camera-only is the ultimate goal, you can make a pretty good argument that in the early training and reinforcement-learning development phases the additional sensors provide multi-modal feedback that makes the camera approach better and cuts training time. And the argument that human drivers use their eyes without radar or lidar is flawed because humans make a lot of mistakes and have a lot of accidents. Yes, you can fly a plane on VFR, but the ones with radar and a bevy of instruments are much safer. There’s a time to use a “Cortés burn the boats” strategy to inspire technology advancement. But there are also times when you are being obstinate just to be “right.” The camera-only approach of Musk seems closer to the latter.
The author was being kind to Tesla. They will never have viable robotaxis without Lidar. Look at poor weather events that can fog or blur a camera lens. How about terrain that is changing or dynamic, and construction zones? Having deep data is not going to help much in those environments. Many have been killed driving their Tesla because they trusted its self-driving tech. A LendingTree study revealed Tesla has the highest crash rate of 30 car brands.