
Google Launches Gemini 2.5 Flash, a Faster AI Model with Thinking Control

[Image: A developer workspace with Google AI Studio open to Gemini 2.5 Flash, showing a “Thinking Budget” slider set to 24,576 tokens and example prompts grouped into low- and high-reasoning categories.]

Image Source: TCLtv+


Google has released Gemini 2.5 Flash in preview. It is the company's newest AI model and is designed to pair strong performance with low cost and fast responses for developers. The model is available through Google AI Studio, Vertex AI, and the Gemini app. Its headline feature is controllable reasoning, which lets users adjust how much “thinking” the model does before responding.

The new version builds on the fast, lightweight 2.0 Flash and improves reasoning while keeping speed and cost low. Google describes it as its first fully hybrid reasoning model. Developers can turn “thinking” on or off, or set a thinking budget to balance speed, quality, and cost.

What Does “Thinking” Mean?

In Gemini models, thinking refers to an internal reasoning phase before generating a response. When enabled, the model can break down complex problems, plan multi-step answers, and deliver more accurate results—especially for advanced math, logic, or programming tasks.

With thinking turned off, Gemini 2.5 Flash keeps the fast output of 2.0 Flash and still performs better than it. Developers can also set a custom thinking budget of 0 to 24,576 tokens to control how much the model reasons before responding. The budget can be set through a parameter in the API or with the slider in Google AI Studio and Vertex AI. The model is trained to scale its reasoning to the complexity of the prompt, so it does not necessarily use the full budget.
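The following minimal sketch shows what setting the budget through the API might look like. It builds a request body that carries a thinking budget and rejects values outside the 0 to 24,576 range quoted above. Two things here are assumptions rather than facts from the article. First, the REST field names `generationConfig.thinkingConfig.thinkingBudget` are assumed. Second, the helper `build_request_body` is hypothetical. Check both against the official Gemini API docs before relying on them.

```python
import json

# Budget range quoted for the Gemini 2.5 Flash preview (0 turns thinking off).
MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24_576


def build_request_body(prompt: str, thinking_budget: int) -> dict:
    """Return a generateContent-style JSON body with a thinking budget.

    Hypothetical helper: the field names below are assumed from the
    Gemini REST API and should be verified against the official docs.
    """
    if not MIN_THINKING_BUDGET <= thinking_budget <= MAX_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be in [{MIN_THINKING_BUDGET}, {MAX_THINKING_BUDGET}], "
            f"got {thinking_budget}"
        )
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Caps how many tokens the model may spend reasoning before it answers.
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }


if __name__ == "__main__":
    # Thinking off for a simple lookup-style prompt.
    body = build_request_body("How many provinces does Canada have?", 0)
    print(json.dumps(body, indent=2))
```

A body like this would be sent to the model's `generateContent` endpoint. Users of the official SDKs would set the same option through the SDK's own configuration objects instead of hand-built JSON.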

Example prompts by reasoning level:

Low Reasoning:

  • Translate “Thank you” into Spanish

  • How many provinces does Canada have?

Medium Reasoning:

  • You roll two dice. What’s the probability they add up to 7?

  • My gym has pickup hours for basketball between 9-3pm on MWF and between 2-8pm on Tuesday and Saturday. If I work 9-6pm 5 days a week and want to play 5 hours of basketball on weekdays, create a schedule for me to make it all work.

High Reasoning:

  • A cantilever beam of length L=3m has a rectangular cross-section (width b=0.1m, height h=0.2m) and is made of steel (E=200 GPa). It is subjected to a uniformly distributed load w=5 kN/m along its entire length and a point load P=10 kN at its free end. Calculate the maximum bending stress (σ_max).
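The medium- and high-reasoning examples have closed-form answers, which show how much work the model has to do. The sketch below computes both. For the cantilever it assumes standard Euler-Bernoulli beam theory:

- The maximum moment occurs at the fixed end, M = wL²/2 + PL.
- The peak stress is σ_max = M·c/I, with c = h/2 and I = bh³/12.

The elastic modulus E is not needed for the stress.

```python
from fractions import Fraction
from itertools import product

# Medium reasoning: probability that two fair six-sided dice sum to 7.
outcomes = list(product(range(1, 7), repeat=2))
hits = sum(1 for a, b in outcomes if a + b == 7)
p_seven = Fraction(hits, len(outcomes))

# High reasoning: cantilever beam from the example prompt (SI units).
L = 3.0        # length, m
b = 0.1        # width, m
h = 0.2        # height, m
w = 5_000.0    # uniformly distributed load, N/m
P = 10_000.0   # point load at the free end, N

# The bending moment is largest at the fixed support.
M_max = w * L**2 / 2 + P * L   # N*m
I = b * h**3 / 12              # second moment of area, m^4
c = h / 2                      # distance from neutral axis to outer fiber, m
sigma_max = M_max * c / I      # maximum bending stress, Pa

print(f"P(sum = 7) = {p_seven}")
print(f"M_max = {M_max / 1e3:.1f} kN*m, sigma_max = {sigma_max / 1e6:.2f} MPa")
```

The results are P = 1/6 for the dice and, for the beam, M_max = 52.5 kN·m and σ_max = 78.75 MPa.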

Available Now in Preview

Gemini 2.5 Flash is now accessible via:

  • Gemini API (Google AI Studio + Vertex AI)

  • Gemini app, with support for new features like Canvas, a collaborative space for document and code editing

Developers can explore the new thinking budget parameter and test performance using the examples in the Gemini Cookbook or the official API docs. General availability for full production use is expected soon.

What This Means

Gemini 2.5 Flash isn’t just a faster model; it’s a smarter one that gives developers direct control over how much reasoning it performs. Controllable reasoning turns what was once a hidden background process into a programmable setting. Teams can use it to tune latency, cost, and quality to each task’s complexity.
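A team could apply this by mapping rough task tiers to thinking budgets before each call. In the sketch below, the tier names and budget values are illustrative assumptions, not Google recommendations. Only the 24,576-token ceiling comes from the preview limits quoted in this article.

```python
from enum import Enum

MAX_THINKING_BUDGET = 24_576  # upper bound quoted for the 2.5 Flash preview


class Tier(Enum):
    """Hypothetical complexity tiers mirroring the article's example prompts."""
    LOW = "low"        # e.g. short translations, fact lookups
    MEDIUM = "medium"  # e.g. simple probability or scheduling questions
    HIGH = "high"      # e.g. multi-step engineering calculations


# Illustrative budgets only; tune them against your own latency and cost data.
BUDGETS = {
    Tier.LOW: 0,                     # thinking off: fastest and cheapest
    Tier.MEDIUM: 2_048,              # hypothetical middle setting
    Tier.HIGH: MAX_THINKING_BUDGET,  # allow the full reasoning budget
}


def budget_for(tier: Tier) -> int:
    """Return the thinking budget to request for a given task tier."""
    return BUDGETS[tier]


if __name__ == "__main__":
    for tier in Tier:
        print(tier.value, budget_for(tier))
```

A lookup table keeps the latency, cost, and quality trade-off in one place that is easy to audit. In practice a team would probably adjust the middle value after measuring real workloads.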

This makes Gemini 2.5 Flash a hybrid reasoning model that adapts to context and use case. It reflects a broader industry shift toward flexible AI deployment. In that view, performance is measured not only by benchmarks but also by how well a model fits into real-world pipelines.

Google’s move also responds to competitive pressure. OpenAI is rolling out lower-cost options such as Flex processing, and Anthropic is refining Claude’s reasoning depth. Gemini 2.5 Flash aims to meet developers where they are, whether that means cutting costs on simple queries or handling high-stakes reasoning with precision.

In a market where speed and intelligence often come at a tradeoff, Gemini 2.5 Flash says: why not both?

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.