Amazon Debuts Nova Act for Building Browser-Based AI Agents

Amazon has introduced Nova Act, a new AI model and SDK (software development kit) designed to help developers build agents that interact directly with web browsers—completing tasks with minimal human oversight.
Available now as a research preview through nova.amazon.com, the Nova Act SDK lets developers experiment with building agents that perform real-world browser tasks. Early examples include submitting out-of-office requests, adding calendar holds, and setting automated email responses.
Unlike traditional LLM-based agents focused on language or retrieval, Nova Act is built for real interaction—executing multi-step workflows that involve complex UI (user interface) navigation. The SDK enables developers to break down tasks into reliable atomic commands—simple, precise actions like search, checkout, or clicking a button—that can be combined into more complex workflows. Developers can also add fine-grained instructions to these commands, such as “don’t accept the insurance upsell,” for greater precision and control.
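The idea of composing atomic commands can be sketched in Python. The RecordingAgent class below is a hypothetical stand-in, not the Nova Act SDK: its act method only records the natural-language instruction it receives, so the example shows the shape of a workflow rather than real browser control. The prompts and the book_rental_car helper are also illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingAgent:
    """Hypothetical stand-in for a browser agent; it only logs commands."""
    log: list[str] = field(default_factory=list)

    def act(self, instruction: str) -> None:
        # A real agent would drive the browser here; this stub just records.
        self.log.append(instruction)


def book_rental_car(agent: RecordingAgent, city: str) -> None:
    """Compose small, precise commands into one larger workflow."""
    agent.act(f"search for rental cars in {city}")
    agent.act("select the cheapest compact car")
    # Fine-grained guidance rides along with the atomic command.
    agent.act("proceed to checkout; don't accept the insurance upsell")


agent = RecordingAgent()
book_rental_car(agent, "Seattle")
print(agent.log)
```

Keeping each command small makes it easier to see which step failed and to retry or adjust only that step.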
To improve reliability, Nova Act supports direct browser manipulation via Playwright—helpful for more sensitive actions like entering passwords or navigating tricky UI elements. Developers can interleave Python code into their agents to insert breakpoints, write tests, run assertions, or use thread pools for parallel execution—an important feature given that browser-based agents are often slowed down by page load times.
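A minimal sketch of the thread-pool pattern, using only Python's standard library. The check_price function is a hypothetical placeholder for an agent task, and the short sleep stands in for page load time; nothing here calls the Nova Act SDK or Playwright.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_price(url: str) -> dict:
    """Hypothetical agent task; the sleep stands in for page load time."""
    time.sleep(0.05)
    result = {"url": url, "loaded": True}
    # Plain Python assertions can sit between agent steps.
    assert result["loaded"], f"page failed to load: {url}"
    return result


urls = [f"https://example.com/item/{i}" for i in range(4)]

# Run the slow, I/O-bound tasks concurrently instead of one by one.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(check_price, urls))

print([r["url"] for r in results])
```

Because the tasks spend most of their time waiting, four workers finish in roughly the time of one page load rather than four, and pool.map still returns results in the original order.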
Key Features of Nova Act
Browser-Based Task Execution: Agents can complete tasks like filling forms, clicking buttons, and navigating popups.
Atomic Commands + Custom Instructions: Developers can break tasks into simple steps and fine-tune them with detailed guidance.
Code Interleaving: Python snippets can be added for flexibility, debugging, and speed.
API Integration: Agents can alternate between browser interactions and direct API calls for reliability.
Headless Operation: Once tasks are stable, agents can run in the background or be turned into asynchronous APIs.
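One way the last feature could look in practice is to wrap a blocking agent run in an asynchronous function. This is a sketch under assumptions: run_agent_task is a hypothetical placeholder for a headless agent run, and the wrapping uses Python's standard asyncio module rather than anything specific to Nova Act.

```python
import asyncio


def run_agent_task(order: str) -> str:
    """Hypothetical blocking agent run; a real one would drive a headless browser."""
    return f"completed: {order}"


async def submit(order: str) -> str:
    # Off-load the blocking call to a worker thread so the event loop stays free.
    return await asyncio.to_thread(run_agent_task, order)


async def main() -> list[str]:
    # Several requests can be in flight at once.
    return await asyncio.gather(submit("order A"), submit("order B"))


outcomes = asyncio.run(main())
print(outcomes)
```

The caller awaits a result instead of watching a browser window, which is the property that makes a stable agent usable as a background service.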
Benchmark Performance
Nova Act is built around reliable building blocks that can be composed into more complex workflows. While many agent benchmarks focus on high-level tasks—where even top models typically score only 30% to 60%—Amazon has prioritized reliability in specific, failure-prone UI interactions. Nova Act scores above 90% on internal evaluations involving difficult elements like date pickers, dropdowns, and popups, and leads on benchmarks like ScreenSpot and GroundUI Web, which directly test a model’s ability to take action across real web interfaces.
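A short calculation suggests why per-step reliability matters for composed workflows. It assumes each step succeeds independently with the same probability, which is a simplification, and the figures are illustrative rather than Amazon's.

```python
def workflow_success(step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps."""
    return step_success ** steps


for p in (0.90, 0.99):
    print(p, round(workflow_success(p, 10), 3))
```

Under these assumptions, a ten-step workflow built from steps that succeed 90% of the time completes only about 35% of the time, while 99% steps complete about 90% of the time.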
Amazon reports strong performance in several browser-based agent benchmarks, surpassing competitors like OpenAI and Anthropic’s Claude 3.7 Sonnet on two out of three tests.
These scores reflect the model’s ability to understand and manipulate web interfaces based on natural language commands—like setting font size or identifying icons.
Real-World Use and Future Vision
Automating Routine Tasks with Scheduling
Nova Act’s focus on reliability means agents don’t need constant supervision once they’re set up. Developers can enable headless mode, turn agents into APIs that integrate directly into products, or schedule them to run asynchronously in the background. One early example: an agent that automatically orders a salad for delivery every Tuesday at dinnertime.
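The scheduling side of the salad example can be sketched with the standard datetime module. The 18:00 dinner hour and the next_run helper are assumptions made for illustration; the article does not say how Amazon schedules agents.

```python
from datetime import datetime, timedelta

TUESDAY = 1        # datetime.weekday(): Monday is 0
DINNER_HOUR = 18   # assumption: "dinner" means 18:00 local time


def next_run(now: datetime) -> datetime:
    """Return the next Tuesday 18:00 strictly after `now`."""
    days_ahead = (TUESDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=DINNER_HOUR, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Already past this week's slot, so move to next week.
        candidate += timedelta(days=7)
    return candidate


print(next_run(datetime(2025, 4, 2, 9, 30)))
```

A background job could sleep until the returned time, trigger the headless agent, and then compute the following week's slot.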
Generalizing to Unseen Environments
Despite not being trained on certain types of interfaces, Nova Act has shown promising results in novel environments, including basic web games. Results from early model checkpoints suggest that Nova Act can transfer its UI understanding beyond its original training data.
Integration with Alexa+ for Real-World Browsing
Nova Act is already being used within Alexa+ to handle web-based tasks when traditional APIs fall short. In these cases, the agent navigates websites in a self-directed way to complete actions on behalf of the user, showcasing its ability to adapt and execute across real-world interfaces.
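The fallback described here can be sketched as an API-first routine that hands off to a browser agent when no API covers the task. Every name below (ApiUnavailable, complete_task, fake_api, fake_browser) is hypothetical, and the sketch is not a description of how Alexa+ is implemented.

```python
from typing import Callable


class ApiUnavailable(Exception):
    """Raised by the hypothetical API client when no endpoint covers the task."""


def complete_task(
    task: str,
    via_api: Callable[[str], str],
    via_browser: Callable[[str], str],
) -> str:
    """Prefer the direct API; fall back to the browser agent when it falls short."""
    try:
        return via_api(task)
    except ApiUnavailable:
        return via_browser(task)


def fake_api(task: str) -> str:
    # Stand-in: pretend only reservations are exposed through an API.
    if task != "book a table":
        raise ApiUnavailable(task)
    return f"api: {task}"


def fake_browser(task: str) -> str:
    # Stand-in for an agent navigating the website itself.
    return f"browser: {task}"


print(complete_task("book a table", fake_api, fake_browser))
print(complete_task("request a repair visit", fake_api, fake_browser))
```

Trying the API first keeps the fast, structured path where one exists and reserves browser navigation for tasks that have no other route.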
What This Means
Nova Act represents a significant leap toward practical AI agents that can operate within real digital environments—not just respond to text. As the limitations of API-only integrations become clear, models that can directly manipulate browser interfaces may become essential to next-generation productivity tools, enterprise automation, and consumer applications.
By focusing on atomic reliability and composability, Amazon is emphasizing stability over hype, aiming to provide dependable building blocks for agents that can eventually handle complex, unsupervised workflows. While still in its early stages, Nova Act reflects Amazon’s broader vision of agents that don’t just understand tasks but actually do them.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.