
AI Agents’ Task Completion Capacity Doubling Every 7 Months

A conceptual illustration showing AI agents’ increasing ability to complete complex tasks. A glowing AI neural network forms a rising staircase, with icons of clocks, calendars, and human figures placed at intervals, symbolizing increasing task length and human effort. The background features abstract digital code in cool blue and green tones, conveying rapid AI advancement and progress.

Image Source: ChatGPT-4o


A new study from METR (Model Evaluation & Threat Research) sheds light on one of AI’s most critical frontiers: not how well models perform on benchmark tests, but how long and complex the real-world tasks are that AI agents can actually complete. Researchers found that the length of tasks AI models can autonomously complete with 50% reliability has been doubling every 7 months over the past six years—a measurable, exponential trend pointing to significant shifts ahead.

If this trend continues, AI systems could soon manage week-long or month-long projects, transforming not only software development but also administrative, creative, and knowledge-based industries.

How They Measured AI Task Performance

To evaluate AI performance, METR researchers used a diverse set of multi-step software and reasoning tasks, recording how long each task takes human professionals with relevant expertise to complete. They found a strong relationship between task length and success: AI models have nearly 100% success rates on tasks that take humans under four minutes, but their success rate drops to less than 10% for tasks exceeding four hours.

This led to a practical benchmark—measuring AI capability based on task length. By fitting logistic curves to model performance, researchers estimated the length of tasks each model could reliably complete at various success probabilities.

For example:

  • Tasks under 4 minutes: ~100% AI success rate.

  • Tasks over 4 hours: Less than 10% success rate.

  • Claude 3.7 Sonnet: Can handle some tasks that take human experts up to an hour, but only reliably succeeds on shorter tasks.

This “task length” approach gives a real-world, interpretable gauge of AI progress, moving beyond academic tests to something directly relevant to labor and productivity. However, the estimated length of tasks an AI agent can complete is influenced by factors such as the specific tasks selected and the expertise level of the human professionals used for comparison.
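To make the logistic-curve fit concrete, here is a minimal Python sketch (using scikit-learn, with made-up placeholder outcomes rather than METR’s actual task data): it regresses success/failure against the log of each task’s human completion time, then reads off the task length at which predicted success falls to 50%.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder outcomes for illustration only (not METR's data):
# each pair is (human completion time in minutes, 1 if the agent succeeded else 0).
records = [
    (2, 1), (4, 1), (8, 1), (8, 1), (15, 1), (15, 0),
    (30, 1), (30, 0), (60, 1), (60, 0), (120, 0), (240, 0),
]

# Feature: log2 of the human task length; label: success or failure.
X = np.log2([[minutes] for minutes, _ in records])
y = [success for _, success in records]

model = LogisticRegression().fit(X, y)

# The fitted curve is P(success) = sigmoid(w * log2(t) + b); the 50% point
# is where w * log2(t) + b = 0, i.e. the model's time horizon t = 2 ** (-b / w).
w, b = model.coef_[0][0], model.intercept_[0]
print(f"Estimated 50% time horizon: {2 ** (-b / w):.0f} human-minutes")
```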

Exponential Growth Since 2018

Historical data shows a clear exponential trend:

  • AI agents' ability to complete longer tasks has consistently doubled every 7 months since 2018.

  • Extrapolating this trend suggests that by 2028-2029, AI agents could reliably complete projects currently requiring weeks of human effort.

Even accounting for significant error margins in measurement or model comparisons, the researchers note this only shifts the forecast by about two years—underlining the robustness of the trend.
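As a rough back-of-the-envelope sketch of that extrapolation, the snippet below assumes a current 50%-reliability horizon of about one hour (in line with the Claude 3.7 Sonnet figure above) and treats a work week as 40 hours and a work month as roughly 167 hours; those numbers are illustrative assumptions, not METR’s.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling time reported by METR

def months_until_horizon(current_hours: float, target_hours: float,
                         doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months for the 50%-reliability horizon to grow from current to target,
    assuming it keeps doubling every `doubling_months` months."""
    return math.log2(target_hours / current_hours) * doubling_months

# Illustrative assumptions (not METR figures): a ~1-hour horizon today,
# a "week" of human work as 40 hours, a "month" as roughly 167 hours.
print(f"Week-long tasks:  ~{months_until_horizon(1, 40):.0f} months away")
print(f"Month-long tasks: ~{months_until_horizon(1, 167):.0f} months away")

# Even a tenfold error in the measured horizon shifts such a forecast by only
# log2(10) * 7 months, i.e. roughly two years.
print(f"Shift from a 10x measurement error: ~{months_until_horizon(1, 10):.0f} months")
```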

Validating Results Across Datasets

To ensure reliability, the study cross-validated results against additional data and analyses, including:

  • SWE-Bench Verified, a dataset of real-world software engineering tasks, which showed an even faster doubling time of under 3 months (a minimal doubling-time calculation is sketched after this list).

  • Sensitivity analyses, which demonstrated consistent trends regardless of task selection, model variations, or methodological adjustments.
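The doubling times quoted above can be estimated by fitting a straight line to the base-2 logarithm of the measured horizon against time; the inverse of the slope is the implied doubling period. The sketch below shows that calculation on hypothetical placeholder measurements, not METR’s published data points.

```python
import numpy as np

def doubling_time_months(months_since_start, horizons_minutes):
    """Fit log2(horizon) as a linear function of time; the inverse slope
    is the implied doubling period, in months."""
    slope, _intercept = np.polyfit(months_since_start, np.log2(horizons_minutes), 1)
    return 1.0 / slope

# Hypothetical placeholder measurements (NOT METR's published data):
# months since an arbitrary start date, and the 50% horizon in human-minutes.
months = [0, 12, 24, 36, 48]
horizons = [1, 3, 10, 35, 120]

print(f"Estimated doubling time: {doubling_time_months(months, horizons):.1f} months")
```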

For more details on the report, you can visit METR’s website.

Implications for AI Forecasting and Risk Management

The authors argue that focusing on task length as a benchmark offers a more meaningful, real-world gauge of AI progress than traditional academic tests. This approach makes it possible to track model improvement across a broad range of skill levels and task types.

If the exponential growth continues, it could soon enable AI systems to autonomously handle month-long, complex projects—introducing significant benefits but also posing critical questions about oversight, labor impacts, and potential misuse.

What This Means

While much has been said about AI’s superhuman abilities on academic benchmarks, this research adds something crucial: a real-world metric that translates directly to how AI might reshape labor and productivity.

Instead of focusing solely on test scores, it measures the length of tasks, expressed in human effort, that AI agents can reliably complete. The consistent exponential growth shows that AI models are rapidly moving from completing short, isolated tasks to handling more complex, sustained projects.

For business leaders, policymakers, and anyone concerned about workforce impact, this provides a tangible timeline: if current trends hold, AI agents could independently execute week-long or even month-long tasks within the next few years. It signals the need to prepare for a future where AI agents may play a far larger role in sectors reliant on long-duration human labor—from software engineering to administrative work. That shift carries significant implications, from potential automation of entire job functions to the need for new governance, oversight, and ethical frameworks.

The key takeaway: METR’s findings aren’t about abstract progress—they provide a measurable signal of how close we may be to major real-world changes in how work gets done.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.