China’s DeepSeek Unveils AI Model to Compete with OpenAI’s o1
Image Source: ChatGPT-4o
DeepSeek, a Chinese AI research firm funded by quantitative traders, has launched DeepSeek-R1, a reasoning-focused AI model it claims can compete with OpenAI’s o1-preview. This release highlights China's growing ambition in advanced AI development.
Reasoning AI models stand apart by simulating deeper thinking processes. These models “fact-check” themselves by dedicating more time to analyzing queries, which reduces errors commonly seen in standard AI systems. Like OpenAI’s o1, DeepSeek-R1 approaches problem-solving in structured steps, often taking tens of seconds to compute complex answers.
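For readers curious what "dedicating more time to analyzing queries" can look like in practice, the toy sketch below shows the general pattern of drafting, self-checking, and retrying an answer. It is a minimal illustration of the idea, not DeepSeek's or OpenAI's actual architecture; the generate and self_check functions are placeholder stubs standing in for real model calls.

```python
# Toy illustration only: a hypothetical "reasoning" wrapper that spends extra
# compute producing and checking drafts before answering. generate() and
# self_check() are stand-ins for real language-model calls.

import random

def generate(prompt: str) -> str:
    """Placeholder for a model call; returns a canned draft answer."""
    return random.choice(["draft answer A", "draft answer B"])

def self_check(question: str, draft: str) -> bool:
    """Placeholder verification pass; a real system would re-derive the answer."""
    return "A" in draft  # pretend only drafts containing 'A' survive the check

def reason(question: str, max_attempts: int = 5) -> str:
    """Spend extra 'thinking' time: draft, verify, and retry before answering."""
    draft = ""
    for _ in range(max_attempts):
        draft = generate(f"Think step by step: {question}")
        if self_check(question, draft):
            return draft  # accept the first draft that passes verification
    return draft          # fall back to the last draft if none pass

if __name__ == "__main__":
    print(reason("What is 17 * 24?"))
```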
Competitive Benchmarks and Limitations
DeepSeek-R1’s performance was evaluated across several benchmarks, comparing it to OpenAI’s o1-preview and other AI models:
AIME 2024: DeepSeek-R1 scored the highest with a pass@1 rate of 52.5%, outperforming o1-preview at 44.6%. Other models, such as GPT-4o and Claude-3.5-Sonnet, trailed significantly. AIME (the American Invitational Mathematics Examination) is a challenging math competition used to test mathematical reasoning.
MATH: The model achieved an impressive accuracy of 91.6%, surpassing o1-preview at 85.5%. MATH is a dataset of word problems testing logical and mathematical reasoning.
GPQA Diamond: DeepSeek-R1 scored 58.5% in pass@1, behind o1-preview’s 73.3% but ahead of other competitors.
Codeforces: A measure of programming ability, where DeepSeek-R1 earned a top rating of 1450, narrowly beating o1-preview at 1428.
LiveCodeBench: DeepSeek-R1 demonstrated 51.6% accuracy, slightly below o1-preview at 53.6%.
ZebraLogic: The model showed 56.6% accuracy, ranking second to o1-preview’s 71.4%.
These results, posted by DeepSeek on X, highlight the model’s strong reasoning and problem-solving capabilities while revealing areas for improvement in general logic tasks like ZebraLogic and certain GPQA challenges.
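Several of the scores above are reported as pass@1, meaning the fraction of problems solved on a single attempt. The snippet below shows the commonly used unbiased estimator for pass@k from n sampled solutions with c correct; it is a generic illustration of the metric, not DeepSeek's own evaluation code, and the example numbers are made up.

```python
# Generic illustration of the pass@k metric (e.g. pass@1 on AIME or GPQA).
# Uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k).

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k given n sampled solutions of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k sample is guaranteed to contain a correct solution
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 20 samples per problem, 8 correct -> estimated pass@1 of 0.4
print(pass_at_k(n=20, c=8, k=1))
```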
However, feedback on social media has identified shortcomings. DeepSeek-R1 reportedly struggles with basic logic tasks, such as tic-tac-toe, mirroring issues observed in OpenAI’s o1. Additionally, the model has been jailbroken by users, including one instance where it provided a detailed recipe for methamphetamine.
Political Censorship in Responses
DeepSeek-R1 also blocks politically sensitive queries, refusing to address topics like Chinese leader Xi Jinping, Tiananmen Square, or Taiwan’s geopolitical situation. Analysts attribute this censorship to stringent government regulations requiring Chinese AI models to align with "core socialist values."
China’s government enforces these standards by mandating internet regulator assessments and proposing blacklists of training data sources. Many Chinese AI models, including DeepSeek-R1, now avoid controversial subjects to comply with these rules.
A Shift Toward New AI Approaches
The launch of DeepSeek-R1 underscores a broader trend in AI innovation. Traditional “scaling laws,” which suggest models improve by adding more data and computational power, are increasingly scrutinized. Reports indicate that improvements in models from major players like OpenAI, Google, and Anthropic have plateaued.
Reasoning models like DeepSeek-R1 and OpenAI’s o1 are part of a shift toward alternative strategies, such as test-time compute, which allows AI to allocate additional processing time and resources to complete tasks. Speaking at the Ignite conference, Microsoft CEO Satya Nadella recently said, “We are seeing the emergence of a new scaling law.”
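One simple, widely used form of test-time compute is to sample several candidate answers and keep the one the model produces most often (often called self-consistency or majority voting). The sketch below illustrates that generic technique; it is not a description of how o1 or DeepSeek-R1 allocate extra compute internally, and sample_fn is a placeholder for a real model call.

```python
# Illustrative test-time compute strategy: sample several candidate answers and
# keep the majority. This is a generic technique, not DeepSeek's or OpenAI's method.

import random
from collections import Counter
from typing import Callable, List

def majority_vote(sample_fn: Callable[[str], str], question: str, n_samples: int = 8) -> str:
    """Trade extra inference-time compute for accuracy by voting over samples."""
    answers: List[str] = [sample_fn(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # A trivial stub keeps the sketch runnable: noisy answers to "17 * 24".
    stub = lambda q: random.choice(["408", "408", "407"])
    print(majority_vote(stub, "What is 17 * 24?"))
```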
DeepSeek’s Role in the AI Landscape
DeepSeek’s efforts reflect the ambitions of its parent company, High-Flyer Capital Management, a quantitative hedge fund leveraging AI for trading strategies. High-Flyer’s investment in AI infrastructure is significant, with its latest server cluster boasting 10,000 Nvidia A100 GPUs at a reported cost of $138 million.
DeepSeek has previously disrupted the market with models like DeepSeek-V2, prompting competitors like ByteDance and Baidu to slash prices or offer free services for their own AI solutions.
Looking ahead, DeepSeek plans to open-source DeepSeek-R1 and release an API, but users can already test the R1-Lite-Preview through DeepSeek Chat at chat.deepseek.com. While the platform is free to access, its advanced “Deep Think” mode is capped at 50 messages per day, offering a hands-on way to experience the model’s capabilities.
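Since the API had not been released at the time of writing, any integration code is necessarily speculative. The sketch below assumes DeepSeek ships an OpenAI-style chat-completions endpoint; the URL, model identifier, and request fields are placeholders for whatever the company eventually publishes.

```python
# Speculative sketch only: the R1 API was not yet available when this was written.
# Endpoint URL, model name, and fields are placeholders, assuming an
# OpenAI-style chat-completions format.

import os
import requests

API_URL = "https://api.example-deepseek-endpoint.com/v1/chat/completions"  # placeholder
API_KEY = os.environ.get("DEEPSEEK_API_KEY", "YOUR_KEY_HERE")

payload = {
    "model": "deepseek-r1-lite-preview",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "How many r's are in 'strawberry'?"}],
}

resp = requests.post(API_URL, json=payload, headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```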
What This Means
DeepSeek’s introduction of DeepSeek-R1 marks a significant step for Chinese AI research as it seeks to match Western innovations. However, the model's challenges, including ethical concerns, logic gaps, and government-imposed restrictions, underscore the complexities of developing competitive and compliant AI in China.
As the global AI landscape evolves, breakthroughs like test-time compute will likely shape future innovation. The next few years will determine whether reasoning models like DeepSeek-R1 can overcome their current flaws and redefine AI’s capabilities.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.