Tesla has repeatedly stated that FSD V12 will enable a new “end-to-end Autopilot” and that the model architecture will undergo significant changes, but the market remains confused and divided about the technical details and potential impact of the new architecture. Based on information disclosed by Tesla and Musk's posts on the X platform (formerly Twitter), we speculate that Tesla is pursuing two “end-to-end” routes in parallel: 1) a cascaded end-to-end neural network; and 2) a World Model. We believe FSD V12 is more likely to be the former and is expected to land early next year, better realizing L3 capability. The World Model is still relatively new, but in the medium to long term it is expected to become the foundation model for autonomous driving and even embodied intelligence, comparable to GPT in the LLM field, and could have a disruptive impact on the industry landscape.
- Possibility 1 (more likely): a cascaded end-to-end neural network, which dramatically improves training efficiency and the performance ceiling and is expected to help autonomous driving systems better realize L3 capability; whether it can advance to L4 remains to be seen.
- Possibility 2 (less likely): a World Model, which is expected to become a foundational base for autonomous driving and usher in a “GPT moment” for vision. It may still be at the GPT-1 stage, but its pace of development is worth watching, and in the medium to long term it could have a disruptive impact on the industry landscape.
- Tesla's followers are still a long way from ‘end-to-end’
- Risk factors
Possibility 1 (more likely): a cascaded end-to-end neural network, which dramatically improves training efficiency and the performance ceiling and is expected to help autonomous driving systems better realize L3 capability; whether it can advance to L4 remains to be seen.
Essence of the technology: the system runs neural network algorithms from input to output, with no hand-written rules in between. Most current autonomous driving stacks use a modular architecture, in which tasks such as perception, prediction, planning, and control belong to separate small models, and the downstream planning and control stages are still generally rule-based. An “end-to-end” neural network, by contrast, takes in camera images and directly outputs steering, braking, acceleration, and other control commands. To improve training, the large “end-to-end” network may be cascaded from multiple small sub-networks. Unlike traditional modular architectures that connect modules with “rules”, however, the sub-modules of a cascaded network are trained and stacked in a “neural network” fashion, so the entire end-to-end model can be optimized in a data-driven way, avoiding the dilemma of “locally optimal but globally suboptimal”.
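The idea above can be illustrated with a toy sketch (not Tesla's actual architecture; all module names and sizes are hypothetical): each stage is a separate sub-network, but because the stages are stacked as differentiable modules rather than connected by rules, a single loss on the final control output back-propagates through every stage and optimizes the whole cascade jointly.

```python
# Illustrative sketch of a "cascaded" end-to-end driving model.
# Assumptions: single front camera, three control outputs. Not a real system.
import torch
import torch.nn as nn

class CascadedEndToEnd(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 1: perception — camera image -> scene features
        self.perception = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=5, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),                      # -> 8 * 4 * 4 = 128 features
        )
        # Stage 2: prediction/planning — features -> trajectory embedding
        self.planning = nn.Sequential(nn.Linear(128, 32), nn.ReLU())
        # Stage 3: control head — embedding -> [steering, brake, throttle]
        self.control = nn.Linear(32, 3)

    def forward(self, image):
        # No rule-based glue between stages: pure tensor flow end to end.
        return self.control(self.planning(self.perception(image)))

model = CascadedEndToEnd()
image = torch.randn(1, 3, 64, 64)              # dummy front-camera frame
commands = model(image)                        # shape (1, 3)

# One imitation-learning loss on the control output updates all three
# cascaded stages jointly — the "global, not local, optimum" property.
target = torch.zeros(1, 3)                     # dummy expert commands
loss = nn.functional.mse_loss(commands, target)
loss.backward()
```

Note that after `loss.backward()`, gradients exist even in the perception stage, which is exactly what a rule-connected modular pipeline cannot provide.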
Potential impact: better realization of L3 capability. The core benefit of an end-to-end neural network is that the key to model iteration shifts from “engineers” to the more easily scaled “data and compute”, yielding a significant increase in training efficiency and the performance ceiling. In practical terms, we believe the performance potential demonstrated by the end-to-end solution could dramatically reduce the system's takeover rate, delivering a truly undisputed L3 capability (e.g., as few as one takeover per week). However, the industry has yet to solve the end-to-end model's “black box” problem, so we believe it remains to be seen whether it can ultimately reach fully driverless L4 in pursuit of ultimate safety.
Development progress: the cascaded approach is expected to ship with FSD V12, which is more likely to land officially early next year. On August 26, 2023, Musk live-streamed an FSD V12 demo drive on the X platform (formerly Twitter). Although the capabilities shown in the broadcast are still some way from a formal consumer push, we can basically infer that the current V12 model is very close to end-to-end; coupled with Tesla's huge investment in data and compute, V12's iteration pace is expected to accelerate significantly toward year-end.
Possibility 2 (less likely): a World Model, which is expected to become a foundational base for autonomous driving and usher in a “GPT moment” for vision. It may still be at the GPT-1 stage, but its pace of development is worth watching, and in the medium to long term it could have a disruptive impact on the industry landscape.
Essence of the technology: a self-supervised pre-trained model requiring no annotation. A World Model generates continuous-frame video scenes relevant to autonomous driving; in essence it learns the rich semantics in video and the physical laws behind them, developing a deep understanding of how the physical world evolves. Building on the rich semantic information a World Model provides and its powerful understanding of the world, the perception and prediction capabilities of autonomous driving models are expected to improve significantly, and downstream tasks such as planning and control may be solved as well.
Potential impact: a new, as-yet-unproven path for autonomous driving which, if realized, would cause greater disruption to the industry. In the short to medium term, the World Model may be used mainly for data synthesis and simulation; the importance of a manufacturer's fleet size to algorithm training may decrease, and the data closed-loop framework will also change. In the long run, the World Model has the potential to become a foundation model for autonomous driving and even embodied intelligence, comparable to GPT providing a general solution for all NLP problems. Considering the more standardized solution and the huge capital investment required (several times that of the current-generation BEV+Transformer solution), a handful of powerful platforms are expected to emerge at the World Model base-model layer, providing autonomous driving capability to OEMs/operators in the form of SaaS or APIs; the industry landscape and cooperation model may change considerably.
Development progress: still early days, but the pace of development is worth watching. We believe that Tesla, Wayve, and other companies all launched World Models this year largely inspired by GPT. The World Model may still be at the GPT-1 stage, but considering the industry's strong consensus on the potential of “large models”, the upgrading of compute, and the huge amount of data accumulated by players such as Tesla, we believe the World Model's 0-to-1 breakout may come faster than ChatGPT's (OpenAI took about four years to go from GPT-1 to GPT-3.5).
Tesla's followers are still a long way from ‘end-to-end’
Whether for a cascaded end-to-end neural network or a World Model, we summarize the core barriers as 1) a team of professional AI talent, 2) massive video data, and 3) huge compute investment. Domestic OEMs will only begin putting the current-generation “BEV+Transformer” architecture into mass production on vehicles from 2023, and their current focus is on going “map-free” and “opening cities” as quickly as possible.
For the next-generation “end-to-end” technology route, even the fastest domestic OEMs are still at the early pre-research stage, with a large gap versus Tesla in talent, data, and compute; optimistically, it will take about three years from pre-research to deployment on vehicles.
As for the World Model, whose prospects are more uncertain (or which may need Tesla to show more results), domestic OEMs facing heavy sales and delivery pressure may also be hesitant to commit talent and resources to the new technology. By comparison, we believe industry giants such as Tesla and Huawei have a greater advantage in capital and resources, while startups such as Wayve and Excellent Technology are more focused in their goals and energy and may also attract more AI talent, much like OpenAI in its day.
Tesla's end-to-end technology path develops more slowly than expected; Tesla's FSD V12 rollout progresses more slowly than expected; a significant shortage of cloud training compute.