OpenAI Launches Flex Pricing for Slower, Cheaper AI Model Access

Image Source: TCLtv+
OpenAI has launched Flex processing, a new pricing tier for its API that trades speed and guaranteed availability for significantly reduced costs. The option is now available in beta for OpenAI’s o3 and o4-mini models and targets lower-priority use cases like data enrichment, model evaluations, and asynchronous workflows.
Flex processing is designed to serve “non-production” environments where slower response times and occasional resource unavailability are acceptable. In return, developers get 50% off standard API prices.
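On the developer side, Flex is selected per request rather than through a separate endpoint. The sketch below shows roughly what that looks like, assuming the official openai Python SDK and its service_tier request parameter; the long timeout and the fallback to the standard tier are illustrative choices, not details from OpenAI's announcement.

```python
# Minimal sketch: routing a low-priority request through Flex processing.
# Assumes the official `openai` Python SDK; model choice, timeout, and the
# fallback logic are illustrative assumptions.
from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def enrich_record(prompt: str) -> str:
    try:
        # Flex requests can be slow, so allow a generous per-request timeout.
        resp = client.with_options(timeout=900.0).chat.completions.create(
            model="o4-mini",
            messages=[{"role": "user", "content": prompt}],
            service_tier="flex",  # opt in to the cheaper, slower tier
        )
    except RateLimitError:
        # If Flex capacity is temporarily unavailable, retry at the
        # standard tier (full price) rather than failing the job.
        resp = client.chat.completions.create(
            model="o4-mini",
            messages=[{"role": "user", "content": prompt}],
        )
    return resp.choices[0].message.content
```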
Here’s a breakdown of the new pricing:
o3 model:
- $5 per million input tokens (vs. $10 standard)
- $20 per million output tokens (vs. $40 standard)
o4-mini model:
- $0.55 per million input tokens (vs. $1.10 standard)
- $2.20 per million output tokens (vs. $4.40 standard)
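To put the 50% discount in concrete terms, here is a quick back-of-envelope comparison using the o4-mini rates listed above; the token volumes are hypothetical.

```python
# Hypothetical batch job: 40M input tokens and 8M output tokens on o4-mini.
input_tokens, output_tokens = 40_000_000, 8_000_000

flex_cost = (input_tokens / 1_000_000) * 0.55 + (output_tokens / 1_000_000) * 2.20
standard_cost = (input_tokens / 1_000_000) * 1.10 + (output_tokens / 1_000_000) * 4.40

print(f"Flex:     ${flex_cost:,.2f}")      # $39.60
print(f"Standard: ${standard_cost:,.2f}")  # $79.20
```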
The move comes as AI model costs continue to rise and major players like Google and DeepSeek introduce faster, more cost-effective alternatives. Just this week, Google announced Gemini 2.5 Flash, a high-performance, low-cost reasoning model aimed squarely at cost-conscious developers.
OpenAI’s announcement also included a policy update: users in tiers 1–3 (based on spend levels) will now need to complete ID verification to access the o3 model and certain advanced features like reasoning summaries and streaming API support. This is part of OpenAI’s broader effort to limit abuse from bad actors and enforce its usage policies.
What This Means
With Flex processing, OpenAI is offering a middle ground for developers who want access to powerful models but don’t need real-time performance. It’s a direct play to remain competitive in an increasingly price-sensitive market—especially as rivals release leaner, faster models aimed at everyday tasks.
This also signals a future where AI access becomes more segmented, not just by capability, but by speed, reliability, and trust—developers will need to balance price, performance, and identity in every decision.
In the race to scale responsibly, flexibility may prove to be OpenAI’s most valuable feature.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.