Tesla’s AI Day revealed a turning point in the development of autonomous vehicles and neural networks that will have profound implications for our future, an impact that most do not yet comprehend and that many never will.
As excited as I am about the newly unveiled Tesla robot and its future capabilities, more important is what we learned about the state of the technology that will enable autonomous vehicles in the near future, a field in which Tesla is well ahead of everyone else.
The presentation we saw was not a marketing or PR event to educate the public, but a recruiting event to attract talent to Tesla’s AI team, because talent is rare and makes all the difference in solving the world’s toughest problems. Autonomous driving is one of those problems, so it’s no surprise that much of what the audience heard was technical and could only be fully understood by experts.
“Autonomous driving and walking” is the new term to learn, because the same technology can be used for both cars and humanoid robots and, with adaptations, for anything else that moves through a medium. Although the technology is very complex and few people understand the challenges involved, the best way to express Tesla’s progress is that the company is taking the approach closest to the way humans drive cars today. Tesla has vector-based 4D perception and orientation, planning, anticipation, simulation, iteration, learning, and actions similar to those of the human brain. That’s a lot of words that mean nothing to most but mean everything to a few.
All of the above is done in software, but the hardware is also purpose-built for the task and scalable, like a brain and nervous system that can evolve, learn, and change, a property usually called plasticity. Hardware and software together enable us humans to control a vehicle safely, and I’m confident the same will someday be true of Tesla’s FSD. Even the Dojo computer, which trains the neural network in a separate unit to improve the algorithms, mirrors a task the human brain performs at least partially while asleep: human learning does not take place entirely while awake, but also during sleep.
I don’t call myself an expert on autonomous technologies, although others do against my wishes. Some have invited me for interviews, which I reluctantly accept, hoping to make my small contribution to the mysterious world of software and machine learning that some call artificial intelligence, or AI. Intelligence is a term we are still unable to define correctly, yet we dare to use it for a technology we don’t understand either. While I do my best to understand all aspects, and 20 years of experience as an engineer in the software and hardware industries is helpful, my experience means little amid the accelerated innovation of Tesla’s Full Self-Driving software.
While the impressive team Tesla has assembled presented its developments in great detail at AI Day, all the other automakers and software companies trying to develop a working system are silent, keeping most of their work under wraps. If they had overcome the kind of local maxima Tesla found itself in a few times, they could proudly present their achievement; since they are silent, we can assume they have not even understood in depth the task they are trying to solve. If you believe that autonomous technology working on certain defined roads is a sign that others have solved the problem, you’re kidding yourself. Tesla could drive a vehicle from the East Coast to the West Coast today without a driver or supervision and still not be able to claim to have fully solved FSD.
I dare say that all other systems today are working towards their local maximum, without even understanding that the approach they are taking will not enable a fully autonomous system on all roads where the driver can relinquish all responsibility and liability to the system. Using lidar as a sensor is a reflection of not understanding the real task at hand. Once their systems fail to improve beyond a level that remains insufficient for FSD, they will either have to start from scratch or license the much cheaper, and by then already mature, system from Tesla. I predict that most of those claiming to have a working system will disappear in the next few years, and none will have anywhere near the capabilities of Tesla FSD.
That’s a lot of claims and predictions for someone who doesn’t call himself an expert, isn’t it? Allow me to go one level deeper into some of Tesla’s hardware and software components to show what distinguishes them from everyone else.
A vector space, in this context, is a representation of objects in 3-dimensional space. Images describe a 2-dimensional space, and while all other systems work with 2 dimensions, Tesla works with 3 plus the element of time, which is why its representation is also called 4-dimensional, or 4D. It’s comparatively easy to get a vehicle to drive based on 2-dimensional camera images for perception, but locating your vehicle in 3 dimensions is an order of magnitude more challenging and demands far greater precision. This precision is necessary to avoid collisions and other unwanted incidents. Three dimensions allow the system to identify objects that a two-dimensional system cannot, and the time dimension gives it the ability to simulate and plan what will happen next, which we call anticipation. Anticipation through simulation allows dangerous situations to be avoided before they happen. While Tesla works in 4 dimensions, everyone else works in 2.
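The idea of adding time as a fourth dimension can be sketched in a few lines. This is purely a toy illustration of anticipation, with made-up numbers and linear motion, not Tesla's actual implementation:

```python
# Toy illustration: an object as a 4D entity, a 3D position plus time,
# carried by a velocity estimate. The time dimension lets the system
# anticipate where an object will be before it gets there.

def predict_position(position, velocity, dt):
    """Linear anticipation: where will the object be dt seconds from now?"""
    return tuple(p + v * dt for p, v in zip(position, velocity))

def will_collide(pos_a, vel_a, pos_b, vel_b, dt, radius=2.0):
    """Simulate both trajectories dt seconds ahead and check proximity."""
    future_a = predict_position(pos_a, vel_a, dt)
    future_b = predict_position(pos_b, vel_b, dt)
    dist = sum((a - b) ** 2 for a, b in zip(future_a, future_b)) ** 0.5
    return dist < radius

# Our car heads east; another vehicle approaches the same point from the north.
ego_pos, ego_vel = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)      # meters, m/s
other_pos, other_vel = (20.0, 20.0, 0.0), (0.0, -10.0, 0.0)

# Right now the vehicles are far apart; two seconds from now they meet.
print(will_collide(ego_pos, ego_vel, other_pos, other_vel, dt=0.0))  # False
print(will_collide(ego_pos, ego_vel, other_pos, other_vel, dt=2.0))  # True
```

A purely 2D or 3D snapshot would report "no danger" here; only projecting the scene forward in time reveals the conflict.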
Spatially Recurrent Neural Network
Position encodings, together with multi-camera features and the vehicle’s kinematics, allow recurrent cells to be built around the vehicle, adding a temporal dimension to perception. While this sounds complicated and like a strange combination of words, it is nothing more than a recurring identification of the vehicle at its location, which allows time-based changes to be detected and the necessary adjustments to be made, often referred to as collision avoidance. The neural network can not only create a recurrent, changing map; it can also plan a path through that space and correct it in iterations, remembering what it has seen, to converge on an optimized result. While other vendors use high-definition maps that are outdated as soon as they are uploaded, a Tesla vehicle understands its surroundings in real time like no one else.
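One way to picture these recurrent cells is as a grid of memory around the car that is shifted by the vehicle's own motion and refreshed by new observations. The following is my own simplification, not Tesla's code; the grid cells, decay factor, and scenario are all invented for illustration:

```python
# Toy spatially recurrent memory: a grid of cells around the vehicle, each
# keeping a decaying memory of what was seen there. Ego motion from the
# kinematics shifts the grid, so a briefly occluded object is still
# remembered at its last known cell relative to the car.

DECAY = 0.5          # how quickly unconfirmed memories fade per step
THRESHOLD = 0.1      # forget cells once confidence drops below this

def step(memory, observations, ego_motion):
    """One recurrent update: shift cells by ego motion, decay, merge input."""
    dx, dy = ego_motion
    shifted = {(x - dx, y - dy): conf * DECAY for (x, y), conf in memory.items()}
    for cell in observations:          # fresh detections get full confidence
        shifted[cell] = 1.0
    return {c: conf for c, conf in shifted.items() if conf > THRESHOLD}

memory = {}
memory = step(memory, {(3, 0)}, (0, 0))   # pedestrian seen 3 cells ahead
memory = step(memory, set(), (1, 0))      # occluded now, car moved 1 cell forward
print(memory)   # {(2, 0): 0.5}: still remembered, now 2 cells ahead
```

The point is the memory: even with no new detection, the system still knows something was there, at the correct position relative to the moving car.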
Tesla vehicles have 8 cameras around the vehicle. Instead of using each camera individually, a fused view of all 8 is created, providing a holistic perception that understands objects better than any single camera could. Such ambient perception is required to recognize the world around the vehicle; otherwise, the interpretation will always be flawed. Tesla used to call this a bird’s-eye view and moved the fusion step earlier in the pipeline to be able to use its result sooner. Other system developers use even more than 8 cameras, but since they process each 2D image individually, their perception of the environment will always fall short of what a Tesla vehicle can perceive. This fusion is also an essential foundation for making sense of the time dimension, which builds on it as a logical extension.
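Why fusing before interpreting matters can be shown with a toy top-down grid. The geometry and the two-camera scenario below are assumptions of mine for illustration, not Tesla's pipeline:

```python
# Toy camera fusion: each camera reports occupied cells in a shared
# bird's-eye grid. A long truck spans the front and right cameras, so
# neither camera alone sees the whole object.

def fuse(camera_views):
    """Union of occupied cells across all cameras into one top-down view."""
    fused = set()
    for cells in camera_views.values():
        fused |= cells
    return fused

def connected_objects(cells):
    """Group adjacent occupied cells into objects (simple flood fill)."""
    remaining, objects = set(cells), []
    while remaining:
        stack, obj = [remaining.pop()], set()
        while stack:
            x, y = stack.pop()
            obj.add((x, y))
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in remaining:
                    remaining.remove(n)
                    stack.append(n)
        objects.append(obj)
    return objects

views = {
    "front": {(0, 2), (1, 2)},   # front camera sees the truck's rear half
    "right": {(2, 2), (3, 2)},   # right camera sees the truck's cab
}
# Per-camera processing yields two unrelated fragments...
print(sum(len(connected_objects(v)) for v in views.values()))   # 2
# ...while fusing first reveals a single continuous object.
print(len(connected_objects(fuse(views))))                      # 1
```

Systems that interpret each image separately must later reconcile the fragments; fusing first makes the truck one object from the start.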
Labeling static and dynamic objects is done manually at all companies, but Tesla will do it automatically using its Dojo computer, which once again puts the company ahead of everyone else. Correct labeling of objects in images and videos is the basis for correct decision-making by an autonomous vehicle. Today Tesla employs about 1,000 human labelers, but given the complexity and diversity of our world, a manual approach is not sufficient. Automatic labeling is not done by any other system provider and is the holy grail for accelerating processes and improvements. Even more exciting, each vehicle provides input to the Dojo computer for correct map labeling, completing and confirming over time the world map in which all Tesla vehicles drive. The accuracy surpasses by an order of magnitude any high-definition maps that other providers regularly upload to their vehicles.
Data and Simulation
While Tesla uses real-world data on an unprecedented scale, drawing input from more than 1.5 million vehicles on the road, each with 8 cameras, the world is diverse, and edge cases occur that no one has seen before. Other companies try to compensate for missing data with simulations, but a simulation is always worse than reality, an expression of imagination. Simulations will never be able to replace real data, but they can enrich it by creating unusual edge cases that have never been seen before. These edge cases are critical for training the algorithms so that they are prepared before confronting such situations for the first time. Simulations are also very useful for creating scenes in which many objects are labeled not by hand but automatically, as a byproduct of the simulated data, for example a road with 100 moving, pre-labeled objects on it.
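The "labels for free" property of simulation is easy to demonstrate: since the generator places every object itself, it knows each object's class and state exactly. This is a minimal, hypothetical sketch, not any company's actual simulator; the object classes and value ranges are invented:

```python
# Toy scene generator: every object comes with perfect ground-truth labels,
# because the simulator placed it. No human labeler is needed, even for a
# scene with 100 moving objects.
import random

def simulate_scene(n_objects, seed=0):
    """Generate n_objects with class, position (m), and velocity (m/s)."""
    rng = random.Random(seed)            # seeded for reproducible scenes
    classes = ["car", "pedestrian", "cyclist", "deer"]
    return [
        {
            "label": rng.choice(classes),
            "position": (rng.uniform(-50, 50), rng.uniform(-50, 50)),
            "velocity": (rng.uniform(-15, 15), rng.uniform(-15, 15)),
        }
        for _ in range(n_objects)
    ]

scene = simulate_scene(100)
print(len(scene))                                  # 100
print(all("label" in obj for obj in scene))        # True: labeled at zero cost
```

The same seed reproduces the same scene, which is also why simulation is handy for regression-testing an algorithm against a rare edge case over and over.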
Sensors provide the input to your autonomous driving system, but what makes it drive is the output, based on the computation in between. More input from more sensors only yields a better output if the computer can actually resolve it into one. At some point, the system has to merge the inputs processed by the different neural nets into a single output, but how do you decide which output, from lidar, radar, or cameras, is correct when they contradict each other? This problem cannot be solved cleanly, because each of them can be right or wrong. For this reason, nature decided that humans use their eyes as the main input for computation and output when driving. To solve autonomous driving, using cameras is the right approach, because conflicts are eliminated and decisions can be made quickly and reliably. All other system developers using radar, lidar, and additional sensors create conflicts they cannot resolve and are consequently trapped in a local maximum. When they realize that, the time and capital already invested will be enormous, and a new approach that is equally expensive will likely lead to the end of the company.
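The conflict can be stated in a few lines. This is my own illustrative example, not drawn from any real sensor stack; the scenario and the naive majority-vote rule are assumptions for the sake of argument:

```python
# Toy sensor-fusion conflict: three sensors report whether the lane ahead
# is blocked. When they disagree, which do you trust? A majority vote is
# one possible tiebreak, but it has no principled basis: any single sensor
# can be the one that is right in a given situation.

def fuse_majority(readings):
    """Naive tiebreak: obstacle reported only if most sensors agree."""
    votes = list(readings.values())
    return sum(votes) > len(votes) / 2

# Hypothetical foggy night: camera and lidar are degraded, radar sees
# through the fog and correctly reports an obstacle.
foggy_night = {"camera": False, "radar": True, "lidar": False}
print(fuse_majority(foggy_night))   # False: the vote overrules the one correct sensor
```

Weighting the sensors instead of voting just moves the problem: the weights themselves encode a guess about which sensor to trust, and that guess is wrong in exactly the situations where the sensors disagree.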
Dojo D1 Chip
On the hardware side of the equation, Tesla has developed its own chip and computer, the fastest and largest in the world of autonomous driving. It is important to understand that all other companies are working with chips and computers designed for broader use cases, not just autonomous driving and neural networks. A multi-purpose system always makes compromises to be sellable to a broad customer base, be it in heat generation, compute power, size, or price. Having a dedicated system developed for just one purpose rather than many is a crucial difference. For Dojo, it allows neural network training cycles to be accelerated at a rate no other computer in the world can match. For Tesla, this means improvements will come faster than for everyone else, simply because it can train the neural network faster. Even if others took the same approach to neural network development, without Dojo they would fall behind unless they had a faster system. The scalability and flexibility to expand the computer as needed allow it to be upgraded easily, avoiding long development, design, and manufacturing cycles. Dojo is scheduled to launch in 2022 and will take Tesla a huge leap forward.
With all the categories described above, Tesla stands out from all other approaches to autonomous driving and continues to extend its lead. Most importantly, all of the elements described help in one way or another to accelerate the iteration process of the system and thus the rate of improvement. Regardless of which path AI competitors take, if they don’t have elements in their system that speed up iteration, they will fall behind in improvement.
“Moats are lame,” Elon Musk rightly said, because the pace of innovation and improvement determines who leads, not who protects intellectual property. With that in mind, it’s silly to try to determine an innovative leader by counting how many patents someone has. While many argue about whether Tesla’s approach is the right one, no one disputes that no single company is faster at innovation and iterative improvement.
With the winner claiming most of the pie, at the end of the transition to FSD there will be one major supplier that others license from, because it can offer a better system at a lower cost. Consequently, most of the global FSD profits will go to one company.
Since everything Tesla has developed to solve the autonomous vehicle problem can be used for any future neural network development for automated machines, including the Tesla Bot, the system itself is a service that can be monetized as a Neural Network as a Service solution. With this leverage, Tesla opens up an unprecedented opportunity to generate revenue with high profit margins, similar to software companies. It is therefore likely that the majority of future profits from neural networks in various industries will also fall to Tesla. In addition, the Tesla Bot is an incredibly huge business opportunity on its own that will, in my humble opinion, be larger than everything Tesla does today with vehicles and energy.
There are still many challenges to overcome and much to learn, but from what we saw at AI Day, we can conclude that Tesla’s lead is getting bigger and its moat is getting wider.
About the author
Alex Voigt has been a supporter of the mission to transform the world to sustainable, carbon-free energy and transportation for 40 years. As an engineer, he is fascinated by the ability of humankind to develop a better future through technology. As a German, he is sometimes frustrated by the German automotive industry and its slow progress with battery electric vehicles, which is why he started publishing in English and German. With 30 years of experience in the stock market, he is invested in Tesla [TSLA], as well as some other tech companies, for the long term.