What’s Under the ‘Hood’ of Self-Driving Cars?

Opinions expressed by Entrepreneur contributors are their own.

Waymo, a unit of Google’s parent company Alphabet Inc., and Yandex Self-Driving Group, a division of the Russia-based Yandex corporation, are among more than a dozen leading names in automated vehicle software and hardware. The former recently launched a pilot self-driving taxi program in San Francisco; the latter has been testing its automated cars worldwide for the past several years and has self-driving rovers on several college campuses in the U.S. Not to be outdone, the Chinese tech company Baidu Inc., along with the Toyota-backed Chinese self-driving startup Pony.ai, is set to debut a 100-car fleet of paid driverless taxis in Beijing in 2022, and has preliminary plans to launch a similar program in California in the same year.

The main challenge to such vehicles’ broader applicability, however, is training the AI that powers self-driving to the point of flawlessness.

It’s all about data

Software and data engineers train the algorithms that lie at the core of artificial intelligence, and for self-driving cars, data relating to the roads and the driving conditions on and around them is key for navigation. To navigate, AI must first learn what this data means, and this is where “data labeling” plays a critical role. Labeling data (essentially attaching a meaning to it) is the first step in creating this kind of full-fledged AI. Such labels, or meanings, must be informative, discriminating and independent. They also need to be precise and correspond to reality (the ground truth), which is why data labeling is a human-based process. It is often a tedious and arduous one, requiring thousands of people to examine and annotate data (to label a vehicle as either a truck or a car, for example, or to distinguish among traffic light colors).
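When several people label the same image, their answers have to be combined into one label. A minimal sketch of one common approach, majority voting with an agreement threshold, might look like the following. The function name `aggregate_labels` and the 0.6 threshold are illustrative assumptions, not details of any company or labeling service mentioned in this article.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Combine several annotators' labels for ONE item by majority vote.

    Hypothetical helper for illustration only. Returns (label, agreement),
    where label is None if the top answer's share of the votes falls
    below min_agreement (an assumed threshold).
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement < min_agreement:
        # Annotators disagree too much: flag the item for another review.
        return None, agreement
    return label, agreement


# Two of three annotators say "truck", so the label is accepted.
print(aggregate_labels(["truck", "truck", "car"]))
# Every annotator disagrees, so the item is sent back (label is None).
print(aggregate_labels(["car", "bike", "truck"]))
```

Items that come back with no label are the ambiguous ones, and they are typically the cases most worth a second look from a human.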

Imagine a self-driving car spotting a vehicle with a bicycle on top of it. It has detected an object, yes, but is it a bike or a car? Is it both? And most importantly, how should the system behave in response to it? This is where human help becomes indispensable, since only people can train computer vision models (the self-driving car’s “brain”) to properly detect complicated objects. People are also the ones fine-tuning AIs so that they understand different landscapes and avoid issues. Yandex, for instance, needed additional images from Las Vegas roads labeled, including traffic lights, which there might be blinking yellow, unlike traffic lights in other countries. This way, Yandex helps its cars “wrap their heads around” a town with little resemblance to Moscow, where they were originally “trained.”

Data labeling types for self-driving cars include segmentation, 2D bounding boxes, lane marking, video tracking annotation, point annotation and 3D object recognition. Each requires careful treatment, as together they teach AI to understand what’s happening on the road. The more consistently data is labeled accurately, the faster the AI learns from it, recognizes patterns and avoids forming a distorted picture that keeps it from telling safe situations from unsafe ones.
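For 2D bounding boxes, one widely used way to measure how accurately a box was drawn is intersection over union (IoU): the area where the labeled box and a reference box overlap, divided by the area they cover together. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) in pixels. The function name `iou` and the sample coordinates are illustrative and are not taken from any tool named in this article.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an
    assumption made for the example. Returns a value from 0.0
    (no overlap) to 1.0 (identical boxes).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if boxes are apart).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


reference_box = (0, 0, 10, 10)   # hypothetical "ground truth" box
annotator_box = (5, 0, 15, 10)   # hypothetical box drawn by an annotator
print(iou(reference_box, annotator_box))
```

A score near 1.0 means the annotator’s box closely matches the reference. A labeling pipeline could send boxes that score below a chosen cutoff back for review, though the right cutoff depends on the task.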

The labeling rush

Sudden high demand for data labeling gave rise to a host of new labeling services. Estimated to reach $8.22 billion by 2028, the so-called “data annotation” market offers a number of services that scale labeling, with the aim of making the process quicker and more affordable. They include Toloka, Scale, Mighty AI, Appen, CloudFactory, Amazon’s Mechanical Turk and many others. Yandex Self-Driving Group, for example, used Toloka to collect data in an expedited manner, saving millions of dollars on its annotation. By now Yandex’s driverless cars have traveled more than 10.5 million miles. Another such company is Scale, a San Francisco-based startup that has created significant buzz in Silicon Valley. In 2018, the company secured $18 million to label raw data from clients such as Lyft, General Motors, Zoox, Voyage, nuTonomy and Embark. Scale’s mission is to improve built-in AIs by reviewing images, radar and lidar data from cars, alongside other sensor data, to better identify objects on the road, including pedestrians and cyclists.

How mainstream will self-driving become?

Although self-driving technology is making enormous leaps forward, it still needs a lot more data to go mainstream. Despite launching driverless taxis in San Francisco and floating the idea of expanding to trucking, logistics and personal vehicles, Waymo is still putting a driver behind the steering wheel, which is a legal requirement as well as an acknowledgment that this technology is in the early stages of development. Yandex Self-Driving Group, meanwhile, is gearing up to launch an unmanned taxi program in Moscow this winter, making it possible to ride in a robotic vehicle within certain areas.

Ultimately, much depends on building stable data pipelines with automated quality control. If such pipelines become ubiquitous, the products of this burgeoning industry could be commonplace in as few as ten years.