DeepSeek Unveils New AI Reasoning Method Amid R2 Model Buzz

Image: A digital lab scene showing DeepSeek and Tsinghua University researchers around a glowing core labeled “DeepSeek-GRM,” with data streams linking it to icons for “Generative Reward Modeling” and “Self-Principled Critique Tuning,” and a glowing question mark labeled “R2” hinting at the anticipated release.

Image Source: ChatGPT-4o

Chinese AI startup DeepSeek, in collaboration with Tsinghua University, has introduced a new technique to enhance the reasoning abilities of large language models (LLMs). The announcement comes as anticipation builds around the possible release of DeepSeek’s next-generation model, R2.

A Dual-Method Breakthrough

The new technique combines generative reward modeling (GRM) with self-principled critique tuning (SPCT), as detailed in a paper published Friday on arXiv, a scientific preprint platform. This hybrid approach aims to help AI systems produce faster, more accurate responses that are better aligned with human intent.

  • GRM trains models using reward signals that reflect human preferences.

  • Self-principled critique tuning teaches the model to generate its own evaluation principles and critiques, refining its outputs through internal evaluation.

According to researchers, the resulting DeepSeek-GRM models have achieved “competitive performance” compared to top public reward models. While DeepSeek plans to open-source these models, a timeline has not been provided.
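DeepSeek has not released reference code alongside this summary, so the Python sketch below is only a rough illustration of the loop described above: the model first proposes evaluation principles tailored to a query, then critiques a candidate response against each principle, and the critiques are aggregated into a single reward. Every name here (propose_principles, critique, Critique, grm_score) is a hypothetical stand-in for an LLM call, not DeepSeek’s actual API.

```python
# Illustrative sketch only -- not DeepSeek's implementation.
# It models the two ideas named above: generative reward modeling
# (rewards expressed as generated critiques with scores) and
# self-principled critique tuning (principles the model writes itself).

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Critique:
    principle: str  # an evaluation principle the model generated for this query
    judgment: str   # the model's written critique of the response
    score: float    # scalar reward extracted from the critique text

def grm_score(
    query: str,
    response: str,
    propose_principles: Callable[[str], List[str]],  # hypothetical LLM call
    critique: Callable[[str, str, str], Critique],   # hypothetical LLM call
    samples: int = 1,
) -> float:
    """Score one response: generate principles, critique the response
    against each principle, and average the scores into one reward.
    Repeating the whole procedure `samples` times and averaging is one
    simple way to spend more inference-time compute for a steadier signal."""
    totals: List[float] = []
    for _ in range(samples):
        principles = propose_principles(query)
        critiques = [critique(query, response, p) for p in principles]
        totals.append(sum(c.score for c in critiques) / max(len(critiques), 1))
    return sum(totals) / len(totals)
```

In the paper’s framing, sampling several principle-and-critique passes and aggregating them is how reward quality can scale with additional inference-time compute; the plain averaging above stands in for whatever aggregation scheme the authors actually use.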

Anticipation Around DeepSeek-R2

The publication arrives amid widespread speculation over the release of DeepSeek-R2, the successor to the company’s well-received R1 reasoning model. While Reuters reported last month that R2 may launch in April, DeepSeek has not confirmed this. A company customer service account reportedly denied the claim in a private group chat with clients.

DeepSeek previously made headlines with DeepSeek-R1, which drew global attention for delivering strong performance at a lower cost than major competitors. The company also upgraded its V3 model in March, with enhancements in reasoning, front-end development, and Chinese writing.

A Research-First Approach

Founded in 2023 by entrepreneur Liang Wenfeng, DeepSeek has emphasized research over publicity. The company has gradually released open-source tools, including five code repositories in February, and pledged “sincere progress with full transparency.”

Liang is also the founder of High-Flyer Quant, the hedge fund backing DeepSeek’s rapid growth. In February, he and his team introduced a technique called native sparse attention to make LLMs more efficient at processing long-context inputs.
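The specifics of native sparse attention are laid out in DeepSeek’s own paper; purely as an illustration of the general idea, each position attending to a small subset of keys rather than all of them, here is a minimal NumPy sketch of causal sliding-window attention. The windowing scheme is a generic stand-in, not DeepSeek’s design.

```python
# Illustrative sketch only: a causal sliding-window sparse attention mask.
# A real sparse-attention kernel skips the masked entries entirely instead
# of computing and discarding them, which is where the efficiency comes from.

import numpy as np

def sliding_window_attention(q, k, v, window: int):
    """Each query position attends only to itself and the `window - 1`
    most recent earlier positions."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (n, n) raw attention scores
    idx = np.arange(n)
    # Keep key j for query i only if j <= i (causal) and i - j < window.
    allowed = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = np.where(allowed, scores, -np.inf)
    # Softmax over the surviving scores; each row has at least one finite entry.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: 8 tokens, 4-dim heads, window of 3.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4))
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 4))
out = sliding_window_attention(q, k, v, window=3)
```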

Looking Ahead

DeepSeek’s latest research marks a significant step toward aligning AI reasoning with human values, an area of growing importance as language models become more capable and autonomous. By combining generative reward modeling with internal critique mechanisms, the company is pushing the boundaries of how AI systems can self-evaluate and refine their outputs at inference time.

If successful, this approach could reduce the need for extensive post-training human feedback, making LLM development more scalable and cost-effective. It also underscores a larger trend in the AI field: the race to build generalist, high-reasoning systems that are not only powerful but also efficient and aligned with user intent.

As global AI players—from OpenAI to Google to DeepSeek—compete to develop the next breakthrough, DeepSeek’s commitment to both performance and transparency could position it as a uniquely influential force in the next generation of language model development.

Whether DeepSeek-R2 lives up to expectations remains to be seen, but the groundwork being laid through research and open collaboration suggests the company is thinking beyond product launches—toward long-term leadership in AI reasoning.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.