DeepSeek's Low-Cost AI Models Suspected of Using OpenAI Data, Prompting Accusations of Irony Online

The emergence of DeepSeek, a Chinese-developed AI model that is significantly cheaper than Western offerings like ChatGPT, has sparked intense debate and concern in the tech industry. This week, President Donald Trump called DeepSeek a "wake-up call" for the U.S. tech sector after Nvidia lost roughly $600 billion in market value. Nvidia, a leading maker of the GPUs essential for AI workloads, saw its shares plummet by 16.86%, the largest single-day loss of market value by any company in Wall Street history. Other tech giants, including Microsoft, Meta Platforms, and Google's parent company Alphabet, fell between 2.1% and 4.2%, while AI server manufacturer Dell Technologies dropped 8.7%.

DeepSeek says its R1 model, built on the open-source DeepSeek-V3, requires considerably less computing power than its Western counterparts, and it was reportedly trained for just $6 million. While these claims are disputed, DeepSeek's arrival has raised questions about the massive sums American tech firms are investing in AI and has unsettled investors. The app surged to the top of the U.S. free app download charts amid growing discussion of its capabilities.

Bloomberg reported that OpenAI and Microsoft are investigating whether DeepSeek utilized OpenAI's API to incorporate OpenAI's AI models into its own, a practice known as distillation. OpenAI stated to Bloomberg, "We know PRC (China) based companies — and others — are constantly trying to distill the models of leading U.S. AI companies." Distillation, which involves training AI models by extracting data from larger, more capable ones, violates OpenAI's terms of service. OpenAI emphasized its commitment to protecting its intellectual property and highlighted the importance of collaborating with the U.S. government to safeguard advanced AI models from adversarial and competitive threats.

David Sacks, Trump's artificial intelligence czar, told Fox News, "There’s substantial evidence that what DeepSeek did here is they distilled knowledge out of OpenAI models, and I don’t think OpenAI is very happy about this." He anticipates that leading AI companies will take steps to prevent such distillation in the coming months.

DeepSeek is accused of training a competing model on OpenAI's models through distillation. Image credit: Andrey Rudakov/Bloomberg via Getty Images.

Amid these developments, observers have noted the irony of OpenAI's position, given its own history of using copyrighted materials to train ChatGPT. Tech PR professional and writer Ed Zitron tweeted, "I'm so sorry I can't stop laughing. OpenAI, the company built on stealing literally the entire internet, is crying because DeepSeek may have trained on the outputs from ChatGPT. They're crying their eyes out. What a bunch of hypocritical little babies."

In January 2024, OpenAI admitted in a submission to the UK's House of Lords communications and digital select committee that training AI models like ChatGPT without copyrighted material was "impossible." The company argued that since copyright covers nearly all forms of human expression, excluding copyrighted materials would severely limit the effectiveness and relevance of AI systems.

The issue of training AI on copyrighted materials has become a focal point in the tech industry, especially with the rise of generative AI. In December 2023, the New York Times sued OpenAI and Microsoft for the "unlawful use" of its work to develop their products. OpenAI defended its practices, asserting that such training constitutes "fair use" and emphasizing its support for journalism and its partnerships with news organizations.

That lawsuit followed another filed in September 2023, in which 17 authors, including George R. R. Martin, accused OpenAI of "systematic theft on a mass scale." Separately, in August 2023, District Judge Beryl Howell upheld a U.S. Copyright Office ruling from 2018 that AI-generated art cannot be copyrighted, underscoring that copyright protection requires human creativity.