The new chatbot from DeepSeek, which boldly introduces itself with "Hi, I was created so you can ask anything and get an answer that might even surprise you," has made significant waves in the AI industry. Its debut not only captured attention but also contributed to one of Nvidia's largest stock price drops, underscoring DeepSeek's impact on the market.
Image: ensigame.com
DeepSeek's AI model stands out due to its innovative architecture and training methods. Let's delve into the key technologies that set it apart:
Multi-token Prediction (MTP): Rather than generating text strictly one token at a time, the model is trained to predict several upcoming tokens from each position. This denser training signal improves both the accuracy and the efficiency of the model, making it a more powerful tool for understanding and generating text.
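To make the idea concrete, here is a minimal sketch of a multi-token prediction head in PyTorch. It is an illustrative toy under simplifying assumptions, not DeepSeek's implementation: the class name, dimensions, and the independent-heads design are all hypothetical, and DeepSeek V3's published MTP module instead chains lightweight transformer blocks.

```python
import torch
import torch.nn as nn

class MultiTokenPredictionHead(nn.Module):
    """Toy MTP head: from each hidden state, predict the next K tokens at once.

    Hypothetical illustration only -- DeepSeek V3's actual MTP module
    chains lightweight transformer blocks rather than independent heads.
    """

    def __init__(self, hidden_dim: int, vocab_size: int, num_future_tokens: int = 4):
        super().__init__()
        # One classifier per future offset: t+1, t+2, ..., t+K.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_future_tokens)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim)
        # Output: (batch, seq_len, K, vocab_size) -- a vocabulary distribution
        # for each of the K future tokens at every position.
        return torch.stack([head(hidden_states) for head in self.heads], dim=2)

mtp = MultiTokenPredictionHead(hidden_dim=64, vocab_size=1000)
logits = mtp(torch.randn(2, 16, 64))
print(logits.shape)  # torch.Size([2, 16, 4, 1000])
```

Each extra head gives every training position additional supervision (the token two, three, or four steps ahead), which is where the gains in accuracy and training efficiency come from.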
Mixture of Experts (MoE): In DeepSeek V3, each MoE layer contains 256 expert sub-networks, of which only eight are activated for any given token. Because most of the network sits idle on every step, training and inference require far less compute per token than a dense model of comparable size, making this a standout feature of their technology.
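The routing mechanism behind those numbers can be sketched generically: a small "router" scores all experts for each token, and only the top eight actually run. The code below is a minimal, deliberately unoptimized illustration with hypothetical names; DeepSeek V3's production router adds refinements such as shared experts and load balancing that are omitted here.

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative, not DeepSeek's code)."""

    def __init__(self, hidden_dim: int, num_experts: int = 256, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        # The router produces one score per expert for every token.
        self.router = nn.Linear(hidden_dim, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, 4 * hidden_dim),
                nn.GELU(),
                nn.Linear(4 * hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim)
        scores = self.router(x)                             # (tokens, 256)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep the best 8 per token
        weights = weights.softmax(dim=-1)                   # normalize the 8 gate values
        out = torch.zeros_like(x)
        # Naive per-token loop for clarity; real systems batch tokens by expert.
        for t in range(x.size(0)):
            for s in range(self.top_k):
                expert = self.experts[indices[t, s]]
                out[t] += weights[t, s] * expert(x[t])
        return out

moe = TopKMoELayer(hidden_dim=64)
y = moe(torch.randn(10, 64))  # each of the 10 tokens activates only 8 of 256 experts
```

The key property: although the layer holds 256 experts' worth of parameters, each token pays the compute cost of only eight of them.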
Multi-head Latent Attention (MLA): Instead of caching full key and value vectors for every attention head, MLA compresses them into a compact latent representation and reconstructs them when needed. This sharply reduces memory use during inference while preserving the model's ability to capture nuanced details in the input data.
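One way to picture the saving is as a low-rank bottleneck on the attention cache: store one small latent per token and rebuild keys and values from it on demand. The sketch below shows only that compression idea, with illustrative names and sizes; the real MLA design also handles rotary position embeddings and per-head projections in ways omitted here.

```python
import torch
import torch.nn as nn

class LatentKVCompression(nn.Module):
    """Core MLA idea in isolation: cache one small latent instead of full K/V."""

    def __init__(self, hidden_dim: int = 512, latent_dim: int = 64,
                 num_heads: int = 8, head_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, latent_dim)            # compression: this is cached
        self.up_k = nn.Linear(latent_dim, num_heads * head_dim)  # keys rebuilt on the fly
        self.up_v = nn.Linear(latent_dim, num_heads * head_dim)  # values rebuilt on the fly

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, hidden_dim)
        latent = self.down(x)   # (batch, seq_len, 64): the only tensor kept in the KV cache
        keys = self.up_k(latent)
        values = self.up_v(latent)
        return latent, keys, values

# In this toy configuration the per-token cache shrinks from
# 2 * num_heads * head_dim = 1024 floats (full keys + values)
# to latent_dim = 64 floats, a 16x reduction.
```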
DeepSeek, a prominent Chinese startup, claims to have developed this competitive AI model at a relatively low cost. They assert that training the powerful DeepSeek V3 neural network cost them only $6 million and required just 2,048 GPUs.
Image: ensigame.com
However, analysts from SemiAnalysis have uncovered that DeepSeek's operations involve a much larger computational infrastructure. They estimate that DeepSeek uses approximately 50,000 Nvidia Hopper GPUs, including 10,000 H800 units, 10,000 H100s, and additional H20 GPUs, spread across several data centers. These resources are used for AI training, research, and financial modeling, with the company's total investment in servers reaching around $1.6 billion and operational expenses at $944 million.
DeepSeek grew out of the Chinese hedge fund High-Flyer, which spun it off as a dedicated AI company in 2023. Unlike many startups that rely on cloud computing, DeepSeek owns its data centers, giving it complete control over AI model optimization and letting it deploy innovations faster. Its self-funded status further enhances its agility and decision-making speed.
Image: ensigame.com
Furthermore, DeepSeek attracts top talent from leading Chinese universities, with some researchers earning over $1.3 million annually. Despite these significant investments, the company's claim of training its latest model for just $6 million seems unrealistic, as this figure only accounts for GPU usage during pre-training and excludes other substantial costs such as research, model refinement, data processing, and infrastructure.
Since its founding, DeepSeek has invested over $500 million in AI development. Its compact structure allows it to implement AI innovations quickly and effectively, unlike larger, more bureaucratic companies.
Image: ensigame.com
DeepSeek's example illustrates that a well-funded, independent AI company can compete with industry giants. While the company's success is driven by substantial investments, technical breakthroughs, and a strong team, the notion of a "revolutionary budget" for AI model development may be overstated. Even so, DeepSeek's costs remain dramatically lower than those of its rivals: the reported $100 million spent training OpenAI's GPT-4o dwarfs the roughly $5 million DeepSeek cites for R1.