Alibaba’s Qwen3 Outperforms OpenAI o1 and DeepSeek R1 in Open Benchmarks

[Image: digital illustration of a humanoid robot beside the words “Qwen3” and “OPEN-SOURCE,” marking Alibaba’s open-source AI release.]

Image Source: ChatGPT-4o

Alibaba has launched Qwen3, a powerful suite of open-source large language models that set a new standard for non-proprietary AI. Developed by the company’s Qwen team, the Qwen3 series includes eight models—six dense and two “mixture-of-experts” (MoE) variants—with capabilities that reportedly exceed open competitors and rival leading proprietary models from OpenAI and Google.

The standout model, Qwen3-235B-A22B, packs 235 billion total parameters, of which roughly 22 billion are activated per query (hence the A22B suffix). It has outperformed DeepSeek’s R1 and OpenAI’s o1 on benchmarks such as Arena-Hard, which includes complex queries in software engineering and mathematics, and it approaches the performance of Google’s Gemini 2.5 Pro, making it one of the most advanced publicly available models to date.

Advanced Architecture and Hybrid Reasoning

Qwen3’s architecture emphasizes hybrid reasoning, a dynamic approach that lets users choose between fast, efficient responses and more detailed, computation-heavy outputs. This flexibility is similar to OpenAI’s “o” series and is designed for complex tasks across science, math, and engineering.

Users can toggle this “Thinking Mode” via the Qwen Chat interface, or by appending tags such as /think or /no_think to a prompt when deploying the model locally or via API (a brief sketch follows below). The MoE design activates only the experts needed for each query, optimizing performance while reducing compute overhead.
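As a minimal illustration of the local toggle, here is a hedged sketch using Hugging Face Transformers. It assumes the published Qwen3 checkpoints expose an enable_thinking switch in their chat template, as the Qwen team’s model cards describe; the model name and generation settings are illustrative, not prescriptive.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # illustrative; any Qwen3 checkpoint should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9? Explain."}]

# enable_thinking=True requests the slower, step-by-step reasoning mode;
# set it to False (or append /no_think to the prompt) for fast replies.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```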

The models are available under the permissive Apache 2.0 license, allowing unrestricted commercial use and deployment across platforms like Hugging Face, ModelScope, Kaggle, and GitHub.

Broad Model Range and Multilingual Power

Qwen3 includes dense models of various sizes:

  • Qwen3-32B

  • Qwen3-14B

  • Qwen3-8B

  • Qwen3-4B

  • Qwen3-1.7B

  • Qwen3-0.6B

These offer scalable options for diverse workloads, from prototyping on a laptop to deployment across large clusters, depending on the user’s needs and compute budget.

Qwen3 also significantly expands multilingual support, now covering 119 languages and dialects, making it well-suited for global research and enterprise applications.

Training and Deployment Flexibility

Qwen3 represents a major leap from its predecessor, Qwen2.5. The team doubled the training dataset to 36 trillion tokens, drawing on web crawls, structured document extractions, and synthetic data focused on math and coding. A seven-stage training pipeline (three pretraining stages, four post-training stages) enables its dynamic reasoning capabilities.

Deployment is flexible and fast:

  • OpenAI-compatible endpoints via SGLang and vLLM (see the sketch after this list)

  • Local deployment tools such as Ollama, LMStudio, MLX, llama.cpp, and KTransformers

  • Agent-based workflows supported by the Qwen-Agent toolkit, which simplifies tool-calling operations

  • Official LoRA and QLoRA support for secure, private fine-tuning that keeps data inside the organization
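
To make the first bullet concrete, here is a hedged sketch of querying a locally served Qwen3 model through vLLM’s OpenAI-compatible API. It assumes a server was started with `vllm serve Qwen/Qwen3-8B` on the default port; the model name, port, and prompt are placeholders.

```python
# Assumes a vLLM server is already running locally, e.g.:
#   vllm serve Qwen/Qwen3-8B
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint; no real API key is required by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # must match the model the server is running
    messages=[
        # The /no_think tag requests the fast, non-reasoning mode.
        {"role": "user", "content": "Summarize mixture-of-experts routing in two sentences. /no_think"}
    ],
)
print(response.choices[0].message.content)
```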

A Strategic Tool for Enterprises

For enterprise users, Qwen3 offers GPT-4-class reasoning at a lower compute cost. Its MoE architecture activates just 22B of its 235B parameters per call, roughly 9% of the weights per token, which cuts per-query compute and serving costs. Running the model on-premises enhances data control and observability, while the Apache 2.0 license removes the licensing friction common in more restrictive alternatives.

While organizations should still review governance implications when deploying China-based models, Qwen3 offers a serious alternative to U.S. models from OpenAI, Google, Anthropic, Meta, and others.

Looking Ahead

The Qwen team is already preparing to scale even further, targeting long-horizon reasoning, extended context windows, broader modality integration, and enhanced reinforcement learning based on environmental feedback.

Qwen3 is more than a model drop; it is a competitive signal in the race toward artificial general intelligence (AGI). The global AI race remains wide open, and with open-source models rapidly advancing, savvy organizations have more reason than ever to stay flexible, experimental, and vendor-agnostic.

What This Means

Qwen3’s release adds significant momentum to open-source AI development at a time when access, transparency, and control are becoming increasingly critical for enterprises and researchers alike. By offering performance that rivals leading proprietary models while maintaining flexible, permissive licensing, Qwen3 positions itself as both a practical tool for today’s deployments and a glimpse of a more democratized AI landscape.

Its arrival also intensifies the broader competition toward Artificial General Intelligence (AGI), signaling that innovation is no longer confined to a handful of large tech companies.

In a world increasingly shaped by who controls access to advanced AI, Qwen3 reminds us: open source doesn’t mean second best—it's leading the way.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.