AiNews.com
Amazon Debuts Nova Act for Building Browser-Based AI Agents

Alicia Shapiro
April 01, 2025 • Estimated Reading Time: 6 minutes

Image: An AI agent autonomously completes tasks in a web browser, such as ordering food, managing a calendar, and playing a simple web game.

Image Source: ChatGPT-4o

Amazon has introduced Nova Act, a new AI model and SDK (software development kit) designed to help developers build agents that interact directly with web browsers—completing tasks with minimal human oversight.

Available now as a research preview through nova.amazon.com, the Nova Act SDK allows developers to experiment with creating agents that can perform real-world browser tasks. Early examples include submitting out-of-office requests, adding calendar holds, or setting automated email responses.

Unlike traditional LLM-based agents focused on language or retrieval, Nova Act is built for real interaction—executing multi-step workflows that involve complex UI (user interface) navigation. The SDK enables developers to break down tasks into reliable atomic commands—simple, precise actions like search, checkout, or clicking a button—that can be combined into more complex workflows. Developers can also add fine-grained instructions to these commands, such as “don’t accept the insurance upsell,” for greater precision and control.
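
As a rough illustration, the Python sketch below splits a small booking workflow into atomic commands and attaches a fine-grained instruction to one of them. It is loosely modeled on the natural-language `act()` calls in Nova Act's published SDK examples, but `RecordingAgent` and `book_rental_car` are hypothetical names, and the stub only records commands instead of driving a browser.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingAgent:
    """Hypothetical stand-in for a browser agent: it records each
    natural-language command instead of driving a real browser."""
    log: list[str] = field(default_factory=list)

    def act(self, command: str) -> None:
        # A real agent would translate this command into clicks and keystrokes.
        self.log.append(command)


def book_rental_car(agent: RecordingAgent, city: str) -> None:
    """Compose a larger workflow from small, single-purpose commands."""
    agent.act(f"search for rental cars in {city}")
    agent.act("select the cheapest compact car")
    # A fine-grained instruction attached to one atomic step.
    agent.act("proceed to checkout; don't accept the insurance upsell")


agent = RecordingAgent()
book_rental_car(agent, "Seattle")
for step in agent.log:
    print(step)
```

Keeping each command narrow makes it clear which step failed when something goes wrong, so a single step can be retried or refined without rewriting the whole workflow.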

To improve reliability, Nova Act lets developers manipulate the browser directly through Playwright, which is helpful for sensitive actions such as entering passwords and for tricky UI elements. Developers can also interleave Python code with agent commands to set breakpoints, write tests, run assertions, or use thread pools for parallel execution. Parallelism matters because page-load times often slow browser-based agents down.
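
The parallelism point can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is not Nova Act code: `check_price`, the store names, and the prices are invented, and `time.sleep` stands in for page loads so the benefit of running tasks concurrently is visible.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_price(store: str) -> tuple[str, float]:
    """Hypothetical agent task; the sleep simulates waiting for a page to load."""
    time.sleep(0.2)  # simulated page load
    prices = {"store-a": 19.99, "store-b": 17.49, "store-c": 18.25}
    return store, prices[store]


stores = ["store-a", "store-b", "store-c"]

start = time.perf_counter()
# In a real setup, each worker thread would drive its own browser session.
with ThreadPoolExecutor(max_workers=len(stores)) as pool:
    results = dict(pool.map(check_price, stores))
elapsed = time.perf_counter() - start

# An ordinary Python assertion interleaved with the agent work.
assert all(price > 0 for price in results.values())
cheapest = min(results, key=results.get)
print(f"cheapest: {cheapest} at {results[cheapest]}, elapsed {elapsed:.2f}s")
```

Because the three simulated page loads overlap, the total elapsed time is close to one load rather than three.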

Key Features of Nova Act

Browser-Based Task Execution: Agents can complete tasks like filling forms, clicking buttons, and navigating popups.
Atomic Commands + Custom Instructions: Developers can break tasks into simple steps and fine-tune them with detailed guidance.
Code Interleaving: Python snippets can be added for flexibility, debugging, and speed.
API Integration: Agents can alternate between browser interactions and direct API calls for reliability.
Headless Operation: Once tasks are stable, agents can run in the background or be turned into asynchronous APIs.
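
One way a blocking, headless agent run could be exposed as an asynchronous API is sketched below with Python's `asyncio`. The article doesn't describe how Nova Act does this; `run_agent_task` is a hypothetical placeholder for a real headless run.

```python
import asyncio


def run_agent_task(task: str) -> str:
    """Hypothetical blocking, headless agent run (placeholder logic)."""
    return f"done: {task}"


async def submit(task: str) -> str:
    # Run the blocking agent call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(run_agent_task, task)


async def main() -> list[str]:
    # Several requests are accepted and processed concurrently.
    return await asyncio.gather(
        submit("file expense report"),
        submit("add calendar hold"),
    )


results = asyncio.run(main())
print(results)
```

`asyncio.gather` returns results in the order the tasks were passed in, so callers can match each result to its request.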

Benchmark Performance

Nova Act is built around reliable building blocks that can be composed into more complex workflows. Many agent benchmarks measure high-level tasks, on which even top models typically score only 30% to 60%; Amazon has instead prioritized reliability on specific, failure-prone UI interactions. Nova Act scores above 90% on internal evaluations involving difficult elements such as date pickers, dropdowns, and popups, and it posts strong results on ScreenSpot and GroundUI Web, benchmarks that directly test a model’s ability to act on real web interfaces.
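
Why per-step reliability matters can be shown with simple arithmetic. Assuming, as a simplification, that each step in a workflow succeeds independently with the same probability p, an n-step workflow succeeds with probability p^n. The 10-step workflow below is an illustrative choice, not a figure from Amazon.

```python
def workflow_success(step_reliability: float, steps: int) -> float:
    """Probability that an n-step workflow finishes, assuming independent steps."""
    return step_reliability ** steps


for p in (0.90, 0.99):
    print(f"per-step {p:.2f} -> 10-step workflow {workflow_success(p, 10):.3f}")
# per-step 0.90 -> 10-step workflow 0.349
# per-step 0.99 -> 10-step workflow 0.904
```

Under this assumption, raising per-step reliability from 90% to 99% takes a 10-step workflow from roughly one success in three to about nine in ten.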

Amazon reports that Nova Act outperforms OpenAI’s CUA and Anthropic’s Claude 3.7 Sonnet on two of three browser-based agent benchmarks:

Table: Amazon Nova Act, Claude 3.7 Sonnet, and OpenAI CUA on three browser-based agent benchmarks
ScreenSpot Web Text: Nova Act 0.939 (highest of the three models)
ScreenSpot Web Icon: Nova Act 0.879 (highest of the three models)
GroundUI Web: Claude 3.7 Sonnet 0.825, OpenAI CUA 0.823, Nova Act 0.805
Note: Amazon ran all benchmarks internally, using simple natural language prompts and the APIs from Bedrock and OpenAI.

Amazon Nova Act Outperforms Competitors in Key Browser-Based Benchmarks. Image Source: Amazon

These benchmarks measure how well a model can locate and act on interface elements from natural language commands, such as setting a font size or identifying an icon.
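
Grounding benchmarks in the ScreenSpot family typically score a prediction as correct when the model’s predicted click point falls inside the target element’s bounding box. The sketch below is a simplified version of that click-accuracy metric; the boxes and clicks are made-up data, and the article doesn’t say exactly how Amazon ran its evaluation.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box of a target UI element, in screen pixels."""
    left: float
    top: float
    right: float
    bottom: float


def click_accuracy(clicks: list[tuple[float, float]], targets: list[Box]) -> float:
    """Fraction of predicted click points that land inside their target element."""
    hits = sum(
        box.left <= x <= box.right and box.top <= y <= box.bottom
        for (x, y), box in zip(clicks, targets)
    )
    return hits / len(targets)


# Hypothetical predictions for three instructions, e.g. "set the font size to 14".
targets = [Box(10, 10, 50, 30), Box(100, 40, 140, 60), Box(200, 200, 220, 220)]
clicks = [(30, 20), (150, 50), (210, 215)]  # the second click misses its box
print(click_accuracy(clicks, targets))
```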

Real-World Use and Future Vision

Automating Routine Tasks with Scheduling

Nova Act’s focus on reliability means agents don’t need constant supervision once they’re set up. Developers can enable headless mode, turn agents into APIs that integrate directly into products, or schedule them to run asynchronously in the background. One early example is an agent that automatically orders a salad for delivery every Tuesday for dinner.
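
The article doesn’t say how Nova Act schedules runs. As a hypothetical sketch, the function below computes the next run time for the weekly salad order, assuming “dinner” means 6:00 p.m.; `next_tuesday_dinner` is an illustrative name, not part of the SDK.

```python
from datetime import datetime, timedelta

TUESDAY = 1       # datetime.weekday() numbering: Monday == 0
DINNER = (18, 0)  # assumption: "dinner" means 6:00 p.m.


def next_tuesday_dinner(now: datetime) -> datetime:
    """Return the next Tuesday at dinner time strictly after `now`."""
    days_ahead = (TUESDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=DINNER[0], minute=DINNER[1], second=0, microsecond=0
    )
    if candidate <= now:
        # Today is Tuesday but dinner time has already passed.
        candidate += timedelta(days=7)
    return candidate


print(next_tuesday_dinner(datetime(2025, 4, 1, 12, 30)))  # 2025-04-01 18:00:00
print(next_tuesday_dinner(datetime(2025, 4, 1, 19, 0)))   # 2025-04-08 18:00:00
```

In practice, a standard scheduler such as cron could trigger the agent instead; the cron expression `0 18 * * 2` fires at 6:00 p.m. every Tuesday.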

Generalizing to Unseen Environments

Although it was not trained on certain types of interfaces, Nova Act has shown promising results in novel environments, including basic web games. Results from these early model checkpoints suggest the model can transfer its UI understanding beyond its original training data.

Integration with Alexa+ for Real-World Browsing

Nova Act is already being used within Alexa+ to handle web-based tasks when traditional APIs fall short. In these cases, the agent navigates websites in a self-directed way to complete actions on behalf of the user, showcasing its ability to adapt and execute across real-world interfaces.
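
The article doesn’t explain how Alexa+ decides when to hand a task to Nova Act. A common pattern, sketched here in Python with entirely hypothetical names (`ApiUnavailable`, `restaurant_api`, `browser_agent`), is to try a direct API call first and fall back to a browser agent only when no API covers the request.

```python
from typing import Callable


class ApiUnavailable(Exception):
    """Raised when a site offers no usable API for the request."""


def complete_task(
    task: str,
    call_api: Callable[[str], str],
    browse: Callable[[str], str],
) -> str:
    """Try a direct API call first; fall back to a browser agent if it fails."""
    try:
        return call_api(task)
    except ApiUnavailable:
        # No API coverage: let the agent navigate the website instead.
        return browse(task)


def restaurant_api(task: str) -> str:
    # Hypothetical: this restaurant's site has no booking API.
    raise ApiUnavailable(task)


def browser_agent(task: str) -> str:
    # Placeholder for a real browser-agent run.
    return f"completed in browser: {task}"


print(complete_task("book a table for two", restaurant_api, browser_agent))
```

Preferring the API path keeps the faster, more predictable route whenever it exists, while the browser agent covers the long tail of sites that lack one.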

What This Means

Nova Act represents a significant leap toward practical AI agents that can operate within real digital environments—not just respond to text. As the limitations of API-only integrations become clear, models that can directly manipulate browser interfaces may become essential to next-generation productivity tools, enterprise automation, and consumer applications.

By focusing on atomic reliability and composability, Amazon is emphasizing stability over hype, aiming to provide dependable building blocks for agents that can eventually handle complex, unsupervised workflows. While Nova Act is still at an early stage, it reflects Amazon’s broader vision of agents that don’t just understand tasks but actually do them.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.