
Self-Driving Cars: Comprehensive Guide

Automated cars are just around the corner. But how does the technology work? Who is building them? And how will they change our world?

Tesla, GM Cruise, Google’s Waymo, Comma AI, and many more companies are working on automated vehicles. But who is likely to have the best system? To answer that we need to understand the technology they use. This isn’t your normal overview that just tells you they use LiDAR, computer vision, and radar. Instead, you will learn what these things are and how they work, so you can decide who is going to build the best self-driving car.

First up, before an automated vehicle can do anything it needs to know where it is. So let’s start by blasting into space and talking about perhaps the most underappreciated technology of the modern age: the Global Positioning System.

GPS is made up of a constellation of at least 24 satellites orbiting over 12,500 miles above the Earth. The system was developed for the US military to enable accurate nuclear missile delivery from submarines. While it is still a military system, it was opened up for civilian use in the 1990s. But how does it work?

These satellites are constantly moving in orbit. As they move, they broadcast a radio signal that contains the satellite’s location and the time the signal was sent.

A GPS receiver on Earth fixes its location by analysing signals from four or more GPS satellites. One satellite’s signal is used to synchronise the receiver’s clock, while the other signals are used for location fixing.

The location is found from the time difference between the receiver’s clock and the time embedded in each satellite broadcast.

It works a little like this: the receiver says, “The exact time on my clock is 11:42.33, and I have received a signal which says: ‘This signal was sent at 11:42.30 from some precise coordinates above the Earth.’” Since the speed of light is known, the receiver finds the difference between the time on its clock and the time embedded in each satellite signal it receives, then multiplies each time difference by the speed of light. That gives the distance from the receiver to each satellite.

The receiver now has the exact location of three or more satellites and the distance to each of them. From this, using a process called trilateration, it can plot its own position on a map.
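
To make that concrete, here is a minimal sketch in Python of the two steps described above: turn each time difference into a distance, then find the one position consistent with all of those distances. The satellite coordinates, travel times and the choice of SciPy’s least-squares solver are illustrative assumptions; a real receiver also solves for its own clock error at the same time.

```python
import numpy as np
from scipy.optimize import least_squares

C = 299_792_458.0  # speed of light in m/s

# Hypothetical satellite positions (metres, Earth-centred coordinates)
# and the time each signal took to reach the receiver (seconds).
sats = np.array([
    [15_600e3,  7_540e3, 20_140e3],
    [18_760e3,  2_750e3, 18_610e3],
    [17_610e3, 14_630e3, 13_480e3],
    [19_170e3,    610e3, 18_390e3],
])
travel_times = np.array([0.0679, 0.0703, 0.0691, 0.0686])

# Step 1: each time difference becomes a distance.
ranges = C * travel_times

# Step 2: trilateration as a least-squares problem - find the point
# whose distance to every satellite matches the measured ranges.
def residuals(pos):
    return np.linalg.norm(sats - pos, axis=1) - ranges

fix = least_squares(residuals, x0=np.zeros(3)).x
print("Estimated receiver position (m):", fix)
```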

Now that the car has its location figured out, it aligns its direction with its built-in compass. But it still doesn’t know much about where it is. This is where the approaches of different self-driving companies begin to differ. Google’s Waymo, GM Cruise, Honda and many other companies use a technology called LiDAR, which is short for ‘light detection and ranging’. They use LiDAR to build high-definition 3D maps of the world and to gather distance data while driving. Tesla and Comma AI don’t use LiDAR, but we will talk about them later.

So what is LiDAR? One way to think about LiDAR is that it’s a measuring tape, but instead of tape, it uses an infra-red laser. Imagine the point a laser pen makes: that point is light from the laser which has scattered off an object, returned, and hit your retina. LiDAR uses this same idea.

The most common form of LiDAR uses a spinning mechanism that flashes the laser on and off thousands of times a second. Each time the laser fires, the sensor measures the time between the light leaving and returning. Since distance is just speed multiplied by time, the distance to the laser point can easily be calculated.
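
As a back-of-the-envelope illustration of that time-of-flight calculation (the numbers are made up):

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_distance(round_trip_seconds: float) -> float:
    """Distance to the reflecting surface from one laser pulse.

    The pulse travels out and back, so the one-way distance is
    half of speed multiplied by time.
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# A return that arrives 200 nanoseconds after firing is roughly 30 m away.
print(lidar_distance(200e-9))  # ~29.98 m
```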

The LiDAR can’t function on distance data alone; it needs more information: the angle of each beam it sends out, the precise altitude of the car, and the car’s pitch, roll and yaw. For the LiDAR to work well it therefore needs precision gyroscopes, as well as accurate GPS.

All this information is combined and stored in a point cloud: a giant table holding the X, Y and Z co-ordinates of every point the LiDAR detects. The LiDAR also records the strength of each return signal, so a prediction can be made of what each object is made of. This is shown by the colours in point cloud images.
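
As a rough sketch of what a few rows of such a table might look like (the column layout and values are illustrative, not any particular sensor’s format):

```python
import numpy as np

# A point cloud is just a big table: one row per detected laser return.
# Real sensors add timestamps, beam index, and more columns.
point_cloud = np.array([
    # x (m),  y (m),  z (m),  intensity (return strength, 0-255)
    [ 12.4,   -3.1,   0.2,    210],   # strong return, e.g. a road sign
    [  8.7,    1.5,   0.9,     95],   # weaker return, e.g. clothing
    [ 30.2,    0.4,  -0.1,     40],   # faint return, e.g. dark asphalt
])

# Distance from the sensor to each point:
distances = np.linalg.norm(point_cloud[:, :3], axis=1)
print(distances)
```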

These point clouds can then be built into HD maps, or used to give the machine precise distances to what the cameras see.

LiDAR is excellent for night vision because it has its own infra-red light source, so the automated car will always spot the pedestrian or cyclist dressed in black at night. That level of certainty doesn’t exist with cameras or your eyes. Another advantage of LiDAR is that it is less likely to be blinded by bright sun or rapidly changing exposure. An extra sensor also provides another layer of redundancy to the system, which is good for safety.

However, this redundancy is expensive: currently thousands of dollars per device. That price will fall as the technology improves and solid-state devices are commercialised, perhaps to as little as $100 per unit. But will LiDAR make cars significantly safer in the long term? Only time will tell.

So why don’t Tesla and Comma AI use LiDAR? 

Elon Musk argues that LiDAR doesn’t provide much benefit because its wavelength is so close to visible light. Further, he says that humans don’t drive around with lasers shooting out of their eyes, so why should cars?

Humans can figure out the distance to objects and their sizes from just vision, so we should be able to teach machines to do the same. 

While human drivers function entirely on vision, that doesn’t mean that cars need to. Adding additional technology like LiDAR and radar enables the machine to gather more data and potentially spot the unexpected.  

We won’t know for a few years which approach is right. But it goes to show that there isn’t one solution to automated cars, and we are likely to see many different systems succeed in the coming years.

GPS and LiDAR give the car information about its location, but for the car to drive it needs to understand what is happening around it. While LiDAR, radar and ultrasound sensors help with this, vision is key, which means the car needs sight from cameras.

To understand the complexity of this problem, let’s think about what your brain does. It converts light (electromagnetic waves hitting the rods and cones in your retina) into electrical impulses; these travel down the optic nerve and are processed in the brain. Somehow, in that blob of tissue, electrical impulses become images that you understand at a high level. You can name almost any object in your environment, and know what it does, how you expect it to move, and how much of a risk it is to you. This is the consequence of hundreds of millions of years of evolution.

The computer has none of that ability, initially at least. So some clever programmers have to teach it, which is a hell of a lot more nuanced than you might initially think.

Look at this video of cameras covering every direction around the car. You can see which directions those images correspond to, and you could predict that if you were being overtaken, the other car would appear first in the rear camera, then the side, and finally the front. The self-driving car doesn’t have that ability; initially it doesn’t even understand what an object is.

So before the car can do anything it needs to learn to classify objects, to say what is a car, what is a bike, a sign, or a pedestrian. 

Let’s imagine that a programmer wants to teach a machine to recognise a car. To do that, humans select and label thousands of images of cars. These are fed into an image recognition algorithm. Through machine learning, the algorithm finds commonalities in the pictures and begins to make its own rules to define what a car is.
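
Under the hood this is ordinary supervised learning. A heavily simplified sketch in Python with PyTorch is below; the folder layout, network choice and training settings are placeholders for illustration, not anyone’s production pipeline:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Labelled images arranged as data/train/car/..., data/train/bike/..., etc.
# (hypothetical folder layout for illustration)
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# A small off-the-shelf network; its final layer outputs one score per class.
model = models.resnet18(num_classes=len(train_set.classes))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(images), labels)  # how wrong were the predictions?
        loss.backward()                        # work out how to nudge each weight
        optimiser.step()                       # nudge them; the learned 'rules' improve
```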

Perhaps the best way to understand this is with a real example. When Tesla was teaching their cars to stop at junctions, they found the cars weren’t great at identifying all the different stop signs in the real world. Some are hidden behind bushes, others have lights surrounding them, and often they are faded or damaged.

To solve this, Tesla wrote code which made all the cars in their fleet send back images of things that looked like stop signs. Of the returned images, about 10% were actual stop signs. These were sorted and labelled by humans and used to re-train the model.

That new model, with massively improved stop sign recognition, was then rolled out to the fleet. This technique works for anything from cyclists to cones. The point is that people don’t tell the machine what something looks like; instead they show it labelled images and the computer makes its own identification rules.
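
In spirit, that data loop looks something like the sketch below. Every name in it is hypothetical; it only illustrates the flow of images from the fleet, to human labellers, to a retrained model, and back to the fleet:

```python
def fleet_data_loop(fleet, detector, humans):
    """One round of the 'ask the fleet for hard examples' loop (illustrative)."""
    # 1. Cars flag anything that *might* be a stop sign and send the image back.
    candidates = [img for car in fleet for img in car.recent_images()
                  if detector.looks_like_stop_sign(img)]

    # 2. Humans sort and label the returned images
    #    (roughly 1 in 10 turn out to be real stop signs).
    labelled = [(img, humans.label(img)) for img in candidates]

    # 3. The model is retrained on the old data plus the new hard cases...
    detector.retrain(extra_examples=labelled)

    # 4. ...and the improved model is rolled back out to every car.
    for car in fleet:
        car.update_model(detector)
```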

Knowing what things are is a classification problem, but there are plenty of other problems to solve. An image is 2D, so no depth data is stored in it. Cruise, Waymo and other LiDAR-using companies get depth data by combining point cloud information with video data. But what if your car doesn’t have LiDAR? Somehow depth needs to be extracted from that flat image.

Tesla extracts depth using a neural network. The network isn’t trained on the fly but back at Tesla HQ; once trained, it runs on the cars. It is trained by taking stills from video and making the computer predict how far each object is from the camera. This happens for every pixel in the image, so instead of a flat 2D picture, the neural net creates a rough 3D model. In the next step, the neural net grades its predictions.

Since the video comes from a moving car, the images are constantly changing. So the network takes its first predicted depths and asks, ‘if I’m moving at this speed, what will these depths look like in half a second?’, then adjusts its predictions accordingly. Now it has a new set of depths for each pixel.

This isn’t that useful yet, so the network takes the 3D model it made and converts it back into a 2D image. That generated image is compared to what is called the ‘ground truth’: the real image the camera recorded. Like a good teacher, the machine learning algorithm says ‘I got this pixel right, but this one wrong’, and works out how much it was wrong by for each pixel. Using that information it adjusts the neural net and repeats the process with new images. By repeating this millions of times, it ends up getting it right almost all the time. So it has learned to get depth from 2D images! This is self-supervised machine learning, where the program checks its own results with no human input.
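
For the curious, the heart of such a self-supervised depth check can be sketched roughly as below, assuming a pinhole camera with known intrinsics and a known camera motion between two frames. This follows the general recipe from published monocular-depth research, not any company’s actual code:

```python
import torch
import torch.nn.functional as F

def reprojection_loss(depth, frame_t, frame_t1, K, T):
    """Grade a depth prediction by comparing a reprojected view with reality.

    depth:    predicted depth for frame_t, shape (B, 1, H, W)
    frame_t:  image at time t, frame_t1: image half a second later, (B, 3, H, W)
    K:        3x3 camera intrinsics, T: 4x4 camera motion between the frames
    """
    B, _, H, W = depth.shape

    # 1. Every pixel plus its predicted depth becomes a 3D point.
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pixels = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(3, -1)   # 3 x HW
    rays = torch.inverse(K) @ pixels                                     # 3 x HW
    points = depth.reshape(B, 1, -1) * rays                              # B x 3 x HW
    points = torch.cat([points, torch.ones(B, 1, H * W)], dim=1)         # B x 4 x HW

    # 2. Move those points by the car's motion and project into the later frame.
    projected = K @ (T @ points)[:, :3]                                  # B x 3 x HW
    uv = projected[:, :2] / (projected[:, 2:3] + 1e-7)

    # 3. Sample the later frame at those locations: "what should I see in 0.5 s?"
    grid_x = uv[:, 0] / (W - 1) * 2 - 1
    grid_y = uv[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1).reshape(B, H, W, 2)
    predicted_view = F.grid_sample(frame_t1, grid, align_corners=True)

    # 4. Compare the reconstructed view with the real image, pixel by pixel.
    return (predicted_view - frame_t).abs().mean()
```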

Once the car has depth for its images, a different neural net takes the images from all the cameras surrounding the car and stitches them together. After that, more information is extracted: how curbs connect up, where traffic lights are, and how junctions function. This information is transferred into a birds-eye-view map, onto which the car positions itself.

This is different to Waymo and Cruise, who use pre-made HD maps to locate their cars, so they know precisely where curbs, traffic lights, and junctions are. Tesla does store some map data, but it is simple and only specifies the location of features of interest. This is in contrast to the HD approach, which has centimetre-accurate models of the world.

High-definition maps allow cars to predict the precise route, including lane changes, many junctions ahead, so the car doesn’t get caught out in the wrong lane at the last minute. But they take up a lot of space in memory and may hinder the car’s ability to think on the fly.

A couple of other sensors that automated cars use to understand their environment are ultrasound and radar. 

An ultrasound system emits a high-frequency sound wave with a speaker. That wave bounces off objects near the car and is re-detected. The time difference between emitted and received waves can be used to calculate the distance to objects surrounding the car. This tech is ideal for detecting things close to the vehicle that are difficult for cameras to see.

Automated vehicles also use radar. Like LiDAR, radar uses electromagnetic waves to determine the distance to objects. However, the waves it uses are thousands of times longer than LiDAR’s, and can penetrate fog, rain and bushes, allowing the car to observe more of the world.

With radar the car can see through the dense snow of a blizzard. Radar therefore makes crashes much less likely as obstructions on roads can be detected hundreds of metres before they are visible.

Once the information from all the sensors has been detected and sorted, the car can begin to plan how to drive its route. To do that, the vehicle needs to predict how other road users behave in complex situations, like busy junctions at rush hour. As you might have guessed, machine learning algorithms are used for these predictions. They look for environmental cues such as cars indicating, which lane vehicles are in, and what direction they are pointing. From its observations, the car predicts forward in time what might happen around it. It might observe: “The car ahead is indicating left and is in the left-hand lane; I am 99% sure it will turn left, but there is a small chance it might do something else.” Of course the car doesn’t really think in words; that is just a helpful analogy.
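
A toy illustration of the kind of output a prediction module produces is below. The cues, hand-tuned weights and resulting probabilities are invented for illustration; real systems learn these relationships from millions of miles of driving data:

```python
import numpy as np

def predict_manoeuvre(indicating_left, in_left_lane, heading_offset_deg):
    """Toy predictor: turn a few observed cues into manoeuvre probabilities."""
    # Hand-tuned scores purely for illustration.
    scores = {
        "turn_left":         2.0 * indicating_left + 1.5 * in_left_lane
                             - 0.05 * heading_offset_deg,
        "continue_straight": 1.0 - 1.0 * indicating_left,
        "turn_right":        -1.0 * indicating_left - 0.5 * in_left_lane,
    }
    # A softmax turns the scores into probabilities that sum to 1.
    vals = np.array(list(scores.values()))
    probs = np.exp(vals) / np.exp(vals).sum()
    return dict(zip(scores.keys(), probs))

# "The car ahead is indicating left and is in the left-hand lane..."
print(predict_manoeuvre(indicating_left=1, in_left_lane=1, heading_offset_deg=0))
# -> turn_left dominates with a probability close to 1
```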

Patterns in driving behaviour are found by neural nets analysing millions of miles of driving data; we now know this can’t simply be coded in by hand.

One useful way to show this is to describe how identifying parked cars has changed over the years. Humans used to define a car as parked ‘when a tracked box that surrounds the car doesn’t move over repeated video frames’. Now humans don’t define it; instead, neural nets using supervised learning make their own rules for what defines a parked car. The machine is shown many labelled examples of parked cars and finds commonalities between the images. Some teams are pushing further still, towards approaches where a neural net learns the rules of what cars do with little or no human labelling at all.

When the machine finds its own patterns in complex problems it can improve over time, so this is what all developers are moving towards. The only difficulty with this approach is that it takes an incredible amount of computing power and examples to train the neural nets.

One thing to think about before we move on to the next section is the sensors humans have. We drive using a couple of eyes on a pivoting, rotating head, combined with a lot of processing power. The point is, in future the best self-driving car will be the one with the best brain, not the best sensors.

So now you have an idea of the complexity of the problem, let’s talk about how different companies are tackling self-driving.

—-

First up is Waymo. 

Waymo aren’t rushing things; they are taking a safety-first approach. Their mission is to go straight to Level 4 self-driving, which means the car drives itself in specific areas with no human. With Level 4 driving, passengers can sit back, relax, and never have to jump for the wheel.

As of April 2020 they had driven more than 20 million miles on public roads, and 10 billion in simulation. Their automated vehicles have been in Phoenix for years, and now the public can hop on board by downloading an app. Though the cars are driving themselves, they can’t be said to be fully autonomous, as there are still people monitoring the fleet and intervening when a car doesn’t know what to do. This makes sense: it’s better if the technology you’re testing doesn’t crash.

So let’s take a look at Waymo’s most recent vehicle. They are using modified Jaguar I-Paces, which are electric cars, fitted with an insane number of sensors. Twenty-nine cameras are dotted around the car; some see close by and others up to 500m away. There are five LiDARs: a 360-degree main unit with a 300m range, plus four close-range units fitted at the car’s corners. Six high-resolution radars complete the suite and give Waymo an incredible amount of data to work with.

The car doesn’t just record this information to see its environment; it is gathered so Waymo can build a model of the world. By simulating the world, Waymo can trial updates to their self-driving software without risking crashes. To create the best possible simulation, Waymo watch other cars and see how they behave. This behaviour is used to create ‘agents’ in their simulations. Agents may be aggressive or passive; they may run red lights or cut in without warning. The point is to make sure that the automated car reacts safely to all these scenarios.

Waymo is fairly quiet about their plans for the future, but needless to say we expect them to extend their taxi service into new cities shortly. In addition to cars, they are planning to automate trucking in collaboration with Daimler. Initially, it’s likely that these vehicles will run along specific routes, between distribution centres for example, rather than handle everyday trucking jobs.

—-

Let’s talk about Tesla next. Elon Musk initially hoped to have full self-driving solved in 2017, but as most followers of Tesla and SpaceX know, Elon time can be a little ambitious. Still, he gets the job done.

Making any video which focuses on the specifics of what Tesla is doing is challenging, because by the time it’s finished Tesla have updated something and the video is out of date! As of the time of writing, Tesla has its Full Self-Driving beta rolled out to a group of testers in the US, and they are updating the builds weekly. So let’s look at their self-driving tech, starting with the hardware.

On each car is a forward-facing radar with a 560ft range, twelve ultrasound sensors that can detect out to 16ft, and a total of eight cameras. Pointing forward are three units: a wide-angle camera that sees out to 60m, a mid-range one that sees out to 150m, and a long-range one that sees out to 250m. These are coupled with four side-facing cameras and a single rear-facing camera. Tesla also measures wheel movement, steering angle, and inertia, so the car knows its location when it loses its GPS connection.

Compared to most of Tesla’s competitors, their sensor suite is sparse. This is because Elon believes the primary problem of self-driving is a software problem, so the only thing adding more complex hardware does is add to the cost, the required computing power, and the development time. Their approach is to use the minimum hardware needed to solve the problem. Many other manufacturers will be watching to see whether or not this no-LiDAR, minimal-computing-power approach works.

Tesla are so committed to saving money on hardware that they hired legendary chip designer Jim Keller to design their cars’ processors. By employing Keller and taking the design of their hardware in-house, Tesla has been able to develop a chip that can be retrofitted into older cars to give them the processing power needed for full autonomy.

While Tesla’s custom hardware is cool, their biggest advantage over everyone else is their fleet size. Earlier in this video we said that data is key to self-driving: the more data a company has, the more information it has to train its machine learning models on. These models aren’t trained on the car; the learning is done back at Tesla HQ on supercomputers. What the car’s hardware does is process images, label them, and react to the environment according to the rules the software dictates.

An additional advantage of having a large fleet is that Tesla can test updates in what they call ‘shadow mode’. This means the self-driving program runs behind the scenes and checks what it would do against what the human driver is actually doing. If it and the human disagree on something, that information is recorded and sent back to Tesla, where the models can be adjusted.
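
Conceptually, shadow mode is just a comparison loop. The sketch below uses entirely hypothetical class and function names and invented disagreement thresholds; it only illustrates the idea of silently comparing the software’s plan with the human’s actions:

```python
def shadow_mode_step(sensors, autopilot, human, disagreement_log):
    """Run the self-driving stack silently alongside the human driver."""
    observation = sensors.snapshot()

    planned = autopilot.plan(observation)   # what the software WOULD do
    actual = human.current_controls()       # what the human actually did

    # If the two differ meaningfully, save the moment for engineers to study.
    if (abs(planned.steering - actual.steering) > 0.1
            or abs(planned.braking - actual.braking) > 0.2):
        disagreement_log.record(observation, planned, actual)
```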

No other company can do this on the scale that Tesla does, so they have an enormous advantage when it comes to data gathering and finding edge cases. This might give them the advantage they need to fully solve self-driving first.  

A car company developing autonomous vehicles that you may not have heard much about is Honda. Despite their quietness, they look likely to have the first regulator-approved Level 3 self-driving car on the roads. This means the car is in complete control in certain conditions, and the driver doesn’t have to watch the road, though they must be ready to take over when asked. The technology, called ‘Traffic Jam Pilot’, will only be available on highways initially, but it goes to show that just because someone isn’t publicising their self-driving tech doesn’t mean they aren’t working on it.

Another self-driving player is GM Cruise. 

Cruise, like Waymo and most other automated car programs, plans to use a LiDAR, camera, and radar sensor suite. One of Cruise’s most interesting developments is their Articulating Radar Assembly: rapidly rotating radars that sit on the car’s wing mirrors and swing to monitor risky areas while the car executes turns. These radars give precise speed and distance data for oncoming vehicles in thick fog and rain, and thus provide an extra layer of safety. Interestingly, they were developed and proven in simulation before being tried on vehicles, saving significant testing time.

Earlier this year Cruise revealed their vision of the future: the Origin self-driving vehicle. This quirky concept has been designed to last one million miles with minimal maintenance. Without a driving seat, pedals, steering wheel or centre console, the car is much more spacious and gives passengers room to stretch out. Unfortunately you won’t be able to buy it, as it will only be used by their self-driving taxi service.

To make sure the Origin is fit to drive among humans, Cruise are designing their software not to be pushed around by human drivers. Their vehicles are learning that if there isn’t a gap, you’ve got to do what human drivers do: signal intent and force other drivers to make a gap for you.

Figure: Cruise’s test vehicles carry as many as twenty radars and nearly a dozen LiDARs. Source: https://www.extremetech.com/wp-content/uploads/2020/02/Cruises-test-vehicles-have-as-many-as-twenty-radar-and-nearly-a-dozen-lidar..png

Last up on our list is Comma AI.

Comma AI, maker of Openpilot, was founded by George Hotz, the first person to crack the iPhone. Hotz thinks you shouldn’t have to buy a car with self-driving technology pre-installed; instead you should be able to buy a kit and fit it to your own car. So that’s what they have developed: a $1,000 device that attaches to your windscreen, plugs into your car, and can drive it.

Their current hardware, the Comma Two, is essentially an augmented cellphone: it uses the forward-facing camera for vision, the phone’s processor for compute, and an infra-red driver-facing camera for monitoring attention. Hotz insists that driver monitoring is essential at this time, but he doesn’t want it to be intrusive, so they have built an algorithm that assesses road conditions and decides whether the driver needs to pay attention, rather than alerting the driver at regular intervals.

Their slogan, “make driving chill”, encapsulates what they are trying to do: take the stress out of driving rather than provide an autonomous taxi service. To achieve this they plan to deliver intermediate milestones, that is, real tech you can buy and fit to your car.

The way Comma AI keeps you in lane is pretty impressive. It doesn’t look for lane lines; instead the AI has been trained end-to-end, meaning it has learnt to stay in lane by observing videos of people driving and finding its own rules, rather than being told to ‘look for these lines and stay between them’. That means it isn’t tricked by bad lane markings and should function well even in snow!

There are many other players in the automated car world besides the ones we have talked about: AutoX, Zoox, Uber, BMW, Aurora, and of course Apple, who are testing in secret. But we don’t have enough time to talk about them all.

So how will automated cars change the world?

In the US, human error contributes to around 95% of car accidents. Three of the most common causes of crashes are alcohol intoxication, fatigue, and driver distraction. If automated cars reduce human error, roads will become safer. People could sleep through their daily commute, improving their health and lowering stress. So automated cars might have the secondary benefit of reducing stress-induced deaths, like heart attacks and strokes, while also saving lives by eliminating human error.

But initially, at least, this might be a careful balancing act: if automated cars are too cautious, they may induce accidents by irritating other drivers. We all know that following slow or bad drivers is infuriating. Self-driving companies will have to make sure their cars take adequate risks and aren’t too cautious.

In the early days of self-driving, the novelty of the vehicles may be an issue. Already, some people a little too intoxicated have been testing GM Cruise’s cars. It’s likely this will continue for some time, especially as pedestrians learn that the cars will automatically stop if someone walks out in front of them.

Automated vehicles will also affect industry.

Fifteen years ago, taxi drivers were local experts; they had to know all the streets in a city and the quickest routes around town. Now, with sat navs, anyone can be a taxi driver. The next step is removing the driver altogether.

Long-distance taxi drivers would quickly be rendered obsolete. An automated car doesn’t have to go home and sleep, and instead of being confined to one geographical area, cars could migrate around the country. We might even find states raising speed limits or building exclusive self-driving highways as confidence in the technology grows. It might not just be taxi drivers losing out, but also short-haul airlines.

And it wouldn’t just be long-distance taxi drivers in trouble: truck drivers, delivery drivers and many other industries would be affected. But it wouldn’t be a clear-cut case of these jobs disappearing. Many delivery drivers aren’t just drivers; they knock on doors and get signatures, or unload product off the truck. Automating the human tasks isn’t straightforward.

So automated vehicles won’t replace everything. There will always be jobs like house removals, helping people with wheelchairs, or moving train locomotives on low loaders, which require a human touch. These are the jobs which combine driving with something else. So when it comes to long-distance driving, perhaps what we will see is the role of the driver being altered. The future for long-distance truck drivers might become security more than driving.

Over time, automation will work its way into the design of vehicles. We can expect to see narrow electric vehicles, fitting side by side in a lane, whizzing passengers around cities instead of wide cars blocking up the place. Gone will be the driver, the petrol fumes and the noise. But how far away from this world are we? Only time will tell.

The success of autonomous vehicles, like everything, will eventually come down to cost. With software you can automate hundreds or thousands of jobs with one computer, but automating a vehicle only replaces one job: the driver’s. For a platform to be successful it therefore has to be cheap, which is a huge win for customers, as we can expect the technology to become very affordable in the coming years.

Author: John Ewbank