Hugging Face Replicates DeepSeek-R1 with Open-R1, Launches New Math Dataset
![A sleek digital visualization of an AI neural network solving complex mathematical equations. The interconnected nodes and circuits represent the reasoning process of language models, while floating mathematical symbols highlight AI’s advanced problem-solving capabilities. In the background, subtle representations of datasets and coding scripts reflect Hugging Face’s open-source approach to AI development.](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/52dcc364-4872-42d9-a3bf-1b3e1946016a/Hugging_Face_Replicates_DeepSeek-R1_with_Open-R1__Launches_New_Math_Dataset.jpg?t=1739386833)
Image Source: ChatGPT-4o
Hugging Face has announced Open-R1, a fully open-source replication of DeepSeek-R1, a groundbreaking reasoning model that has stirred excitement in the AI community. DeepSeek-R1 originally gained attention for matching or outperforming OpenAI's o1 model, thanks to its innovative use of reinforcement learning (RL) to develop reasoning skills without relying on human-labeled supervision.
While DeepSeek-R1 impressed with its performance and detailed technical report, it left some gaps—the datasets and training code were not released. This prompted Hugging Face to launch the Open-R1 project, aiming to reconstruct DeepSeek’s training pipeline and data, validate its claims, and make reasoning models more transparent and accessible for the open-source community.
DeepSeek-R1’s Breakthrough and Open-R1’s Mission
DeepSeek-R1 was built on DeepSeek-V3, a powerful base model comparable to heavyweights like GPT-4o and Claude 3.5 Sonnet, yet trained for just $5.5 million thanks to architectural efficiencies. It introduced two versions:
DeepSeek-R1-Zero: Skipped supervised fine-tuning, relying entirely on reinforcement learning (RL) to develop reasoning skills. However, its responses were often unclear.
DeepSeek-R1: Combined an initial “cold start” supervised fine-tuning phase with reinforcement learning (RL). The model was fine-tuned on a small set of carefully crafted examples to enhance clarity and readability. After this, it underwent multiple RL stages, incorporating both human preference-based feedback and verifiable reward systems to reject low-quality outputs. This approach allowed the model to not only develop strong reasoning abilities but also produce polished, coherent, and consistent responses that are easier for users to understand.
Despite its success, key details like data collection methods, training hyperparameters, and scaling laws remain unknown. Hugging Face’s Open-R1 seeks to fill in these gaps, offering a fully transparent framework that allows researchers to build, refine, and expand upon DeepSeek’s model.
Open-R1’s Approach to Open-Source AI Reasoning
The Open-R1 project focuses on three main goals:
Replicate the R1-Distill models: Distill a high-quality reasoning dataset from DeepSeek-R1's outputs.
Recreate the pure RL pipeline: Mirror DeepSeek-R1-Zero's process using new, large-scale datasets for math, reasoning, and code.
Demonstrate multi-stage training: Show how to progress from a base model to fine-tuned models via supervised learning and RL.
Hugging Face hopes this initiative will not only replicate results but also empower the open-source community to contribute and innovate in AI reasoning, extending applications beyond math to fields like coding and medicine.
OpenR1-Math-220k: A New Dataset for Mathematical Reasoning
As part of the Open-R1 project, Hugging Face has released OpenR1-Math-220k, a large-scale dataset designed to enhance mathematical reasoning in language models. This dataset was created by generating two solutions for 400,000 math problems using DeepSeek-R1 and filtering down to 220,000 high-quality correct reasoning traces.
Key Highlights of OpenR1-Math-220k:
Generated Locally with 512 H100 GPUs: Leveraging local infrastructure for fast, large-scale data generation, producing 180k reasoning traces per day.
Multiple Solutions per Problem: Built on NuminaMath 1.5, OpenR1-Math-220k includes multiple solutions per problem, enabling more effective filtering and refinement of reasoning traces.
Automated Filtering with Math Verify: Uses a rule-based system to retain only correct reasoning traces. For cases where answers are malformed or difficult to verify, Llama3.3-70B-Instruct is used as an additional judge to improve dataset accuracy.
Performance Matching: Fine-tuning models like Qwen-7B on OpenR1-Math-220k achieves results comparable to DeepSeek’s distilled models.
This dataset demonstrates scalable, high-quality reasoning data generation, which could be expanded to domains like code generation and scientific research.
For More Technical Details
For readers interested in the deeper technical aspects of OpenR1-Math-220k, including data generation methods, performance benchmarks, and advanced filtering techniques, visit the official Hugging Face Open-R1 repository and the accompanying blog post that describes these techniques.
Looking Ahead
Hugging Face’s Open-R1 project marks a significant step toward transparency in AI reasoning models. By openly replicating DeepSeek-R1 and creating datasets like OpenR1-Math-220k, the project not only validates existing breakthroughs but also lays the groundwork for future innovations in mathematics, coding, and scientific research.
This initiative underscores the growing importance of open-source collaboration in AI, providing researchers and developers the tools to refine and push the boundaries of reasoning models. As more contributions come in from the global community, Open-R1 could help accelerate the development of AI systems that excel at complex reasoning tasks, all while remaining accessible to everyone.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.