Agent-Browser: Revolutionizing Browser Automation for AI Agents

In the rapidly evolving world of AI-driven development, tools that bridge the gap between intelligent agents and real-world web interactions are becoming essential. Enter agent-browser, an open-source headless browser automation CLI developed by Vercel Labs. This innovative tool is specifically designed to empower AI agents with the ability to navigate, interact with, and extract data from web pages in a seamless, programmatic manner.
Hosted on GitHub with over 9.7k stars and 508 forks, it's gaining traction among developers building AI-powered applications that need reliable web automation.
What is Agent-Browser?
Agent-browser is a command-line interface (CLI) that allows AI agents to perform browser automation tasks without the need for a visible browser window. Built with a fast Rust-based core and a Node.js fallback for broader compatibility, it leverages the Playwright library to control browsers like Chromium, Firefox, and WebKit.
What sets it apart is its AI-first design: instead of relying on fragile CSS selectors or XPath, it uses semantic locators and generates snapshots of the page's accessibility tree. These snapshots include stable element references (like @e1 or @e2), making interactions deterministic and reliable for AI systems.
This tool is particularly useful for scenarios where AI agents need to automate web tasks, such as filling forms, clicking buttons, or scraping content. It works with popular AI coding agents, including Claude Code, Cursor, Codex, Copilot, Gemini, and OpenCode, so it integrates easily into existing workflows.
Key Features
Agent-browser boasts an extensive set of over 50 commands, categorized into navigation, interaction, inspection, and control functions. Here's a breakdown of some standout features:
Semantic Locators and Snapshots
Find elements by ARIA roles, text, labels, placeholders, or other attributes. The snapshot command outputs a machine-readable accessibility tree with unique refs, allowing AI agents to reference elements without re-querying the DOM.
# Get accessibility tree snapshot
agent-browser snapshot
# Output includes stable refs like:
# [ref=e1] heading "Welcome"
# [ref=e2] button "Sign In"
# [ref=e3] textbox "Email"
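A script driving the CLI might look a ref up by role and name rather than hard-coding it. The following is a minimal sketch, not part of agent-browser: `ref_for` is a hypothetical helper, and it assumes each snapshot line carries a `[ref=eN]` token together with the role and quoted name, as in the sample output above. The real output layout may differ, so treat the pattern as an assumption.

```shell
# ref_for ROLE NAME
# Hypothetical helper: reads snapshot text on stdin and prints the first
# matching element ref in the @eN form that click/fill accept.
# Assumes each line carries a [ref=eN] token plus: ROLE "NAME".
ref_for() {
  grep -F -- "$1 \"$2\"" |
    sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/@\1/p' |
    head -n 1
}

# Sample text copied from the output shown above (not a live capture).
snapshot_sample='[ref=e1] heading "Welcome"
[ref=e2] button "Sign In"
[ref=e3] textbox "Email"'

printf '%s\n' "$snapshot_sample" | ref_for button "Sign In"   # prints @e2
```

In a live session the same helper could read from the CLI directly, for example `agent-browser snapshot | ref_for button "Sign In"`.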
Interaction Commands
Use click, fill, type, hover, and more to simulate user actions:
# Click using snapshot reference
agent-browser click @e2
# Fill a form field
agent-browser fill @e3 "test@example.com"
# Type with keyboard simulation
agent-browser type @e4 "Hello World"
Waiting Mechanisms
Commands like wait --text "Welcome" or wait --url "**/dashboard" ensure actions proceed only when conditions are met, which helps with dynamic web content:
# Wait for text to appear
agent-browser wait --text "Welcome"
# Wait for URL pattern
agent-browser wait --url "**/dashboard"
# Wait for element to be visible
agent-browser wait --visible @e5
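When a page is slow or flaky, a wait can be wrapped in a small retry loop. This is a generic shell sketch rather than an agent-browser feature: `retry` is a hypothetical helper, and using it with `wait` assumes the command exits non-zero when its condition is not met in time, which the article does not confirm. The demonstration uses `true` and `false` as stand-ins so it runs without a browser.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it exits 0 or ATTEMPTS runs have been used.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Demonstration with stand-in commands so the block runs without a browser.
retry 3 0 true  && echo "succeeded"
retry 2 0 false || echo "gave up after 2 attempts"
```

Under the stated assumption, a real call might look like `retry 3 2 agent-browser wait --text "Welcome"`.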
Session Management
Support for multiple isolated sessions and persistent profiles (via --profile) allows reusing authenticated states, like logged-in sessions, across runs:
# Start with persistent profile
agent-browser open example.com --profile my-profile
# The profile retains cookies, localStorage, etc.
Debugging and Output
Take screenshots, save PDFs, record traces, and stream the browser viewport via WebSocket for live previews. The --json flag outputs responses in a format ideal for AI parsing:
# Screenshot
agent-browser screenshot page.png
# PDF export
agent-browser pdf document.pdf
# JSON output for AI consumption
agent-browser snapshot --json
Advanced Integrations
Connect to cloud browser services like Browserbase or Browser Use for serverless execution, and even integrate with existing browsers via the Chrome DevTools Protocol.
Architecture: Client-Daemon Design
The tool's client-daemon architecture ensures efficiency:
- Rust CLI - Parses commands and communicates with the daemon
- Node.js Daemon - Manages the browser instance via Playwright
- Cross-platform - Works on macOS, Linux, and Windows
This separation means the browser stays warm between commands, making sequential operations fast.
Installation and Setup
Getting started with agent-browser is straightforward:
# Install via npm
npm install -g agent-browser
# Install browser (Chromium)
agent-browser install
For Linux users, additional dependencies might be needed:
agent-browser install --with-deps
Building from source requires cloning the GitHub repo, installing Rust, and running build commands with pnpm.
Installing the Skill with Skilz CLI
The agent-browser skill lives in Vercel's agent-browser repository. The Skilz CLI clones the entire repository and prompts you to select the desired skill. Since this repository features one main skill, the prompt may list it directly or install it automatically.
Important: Direct subpaths like vercel-labs/agent-browser/skills/agent-browser won't work. Use the repository root instead:
# Install Skilz CLI first
pip install skilz
# Install from Vercel's agent-browser repository
skilz install -g vercel-labs/agent-browser/
You'll see an interactive prompt similar to:
Found 1 skill in repository:
[1] agent-browser (skills/agent-browser)
[A] Install all
[Q] Cancel
Select skill(s) [1, A, Q]: 1
Enter 1 to install the Agent-Browser skill.
Agent-Specific Installation
The skill works across all 21+ supported agents. Here are the commands for popular agents:
Claude Code:
# User-level (available in all projects)
skilz install -g vercel-labs/agent-browser/ --agent claude
# Project-level only
skilz install -g vercel-labs/agent-browser/ --project --agent claude
OpenCode:
# User-level
skilz install -g vercel-labs/agent-browser/ --agent opencode
# Project-level
skilz install -g vercel-labs/agent-browser/ --project --agent opencode
Gemini CLI (project-level only):
skilz install -g vercel-labs/agent-browser/ --project --agent gemini
OpenAI Codex:
skilz install -g vercel-labs/agent-browser/ --agent codex
Other Agents (Windsurf, Qwen Code, Aider, etc.):
skilz install -g vercel-labs/agent-browser/ --agent <name>
Note: The -g flag is required when using GitHub shorthand paths like vercel-labs/agent-browser/. You only omit -g when using full URLs starting with https:// or git://.
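The rule in the note can be captured in a small wrapper that decides whether to add -g. This is a sketch under the article's description of the flag, not a documented Skilz feature: `skilz_install_cmd` is a hypothetical helper, and it only prints the command so the result can be reviewed before running it.

```shell
# skilz_install_cmd SOURCE [EXTRA_ARGS...]
# Prints (does not run) the install command for SOURCE, adding -g only for
# GitHub shorthand paths, per the note above.
skilz_install_cmd() {
  src=$1
  shift
  case $src in
    https://*|git://*) set -- skilz install "$src" "$@" ;;
    *)                 set -- skilz install -g "$src" "$@" ;;
  esac
  printf '%s\n' "$*"
}

skilz_install_cmd vercel-labs/agent-browser/ --agent claude
skilz_install_cmd https://github.com/vercel-labs/agent-browser
```

The first call prints the shorthand form with -g and the second prints the full-URL form without it.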
For complete Skilz CLI documentation, see the Skilz CLI docs.
Skill Structure and Command Categories
The agent-browser skill is defined in SKILL.md with YAML frontmatter specifying its name, description, and allowed tools (Bash with agent-browser commands). The skill automates browser interactions for tasks like web testing, form filling, screenshots, and data extraction.
The commands are organized into categories for easy reference by AI agents:
Navigation Commands
Commands for opening URLs, going back/forward, reloading, and closing:
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
Snapshot (Page Analysis)
Generates accessibility trees or interactive element refs:
# Full accessibility tree
agent-browser snapshot
# Interactive elements only with stable refs like @e1
agent-browser snapshot -i
Interaction Commands
Simulates user actions using refs:
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser press Enter
agent-browser hover @e3
agent-browser select @e4 "option"
Get Information
Extracts data from elements or pages:
agent-browser get text @e1
agent-browser get url
agent-browser get title
agent-browser get html @e1
Check State
Verifies element properties:
agent-browser is visible @e1
agent-browser is enabled @e2
agent-browser is checked @e3
Screenshots & PDF
Captures visuals:
agent-browser screenshot path.png
agent-browser screenshot --fullpage full.png
agent-browser pdf output.pdf
Video Recording
Records sessions:
agent-browser record start demo.webm
agent-browser record stop
Wait Commands
Handles timing for dynamic content:
agent-browser wait --text "Success"
agent-browser wait --url "**/dashboard"
agent-browser wait --visible @e1
agent-browser wait --hidden @e2
Mouse Control
Precise mouse actions:
agent-browser mouse move 100 200
agent-browser mouse click
agent-browser mouse drag 100 200 300 400
Semantic Locators
Finds elements without refs:
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." type "query"
Browser Settings
Configures viewport, device emulation, etc.:
agent-browser set viewport 1920 1080
agent-browser set device "iPhone 12"
agent-browser set geolocation 37.7749 -122.4194
Cookies & Storage
Manages browser data:
agent-browser cookies get
agent-browser cookies set name value
agent-browser cookies clear
agent-browser storage get key
agent-browser storage set key value
Network
Intercepts and mocks requests:
agent-browser network route <url> --abort
agent-browser network route <url> --mock '{"data": "test"}'
agent-browser network offline
Tabs & Windows
Handles multiple contexts:
agent-browser tab new [url]
agent-browser tab list
agent-browser tab switch 2
agent-browser tab close
Frames
Switches to iframes:
agent-browser frame "#iframe"
agent-browser frame main
Dialogs
Handles alerts and prompts:
agent-browser dialog accept
agent-browser dialog dismiss
agent-browser dialog accept --text "input"
JavaScript Execution
Executes code in page context:
agent-browser eval "document.title"
agent-browser eval "window.scrollTo(0, 1000)"
Debugging
Tools for headed mode, tracing, and highlighting:
agent-browser --headed open example.com
agent-browser trace start
agent-browser trace stop trace.zip
agent-browser highlight @e1
Global Options
| Option | Description |
|---|---|
| --json | Parsable JSON output for AI consumption |
| --headed | Run with visible browser window |
| --profile <name> | Use persistent browser profile |
| --timeout <ms> | Command timeout in milliseconds |
Usage Examples: A Complete Workflow
Let's walk through a typical AI agent workflow:
# 1. Open a page
agent-browser open example.com
# 2. Get a snapshot (outputs elements with refs)
agent-browser snapshot
# Output: [ref=e1] heading "Example Domain"
# [ref=e2] link "More information..."
# 3. Click a link
agent-browser click @e2
# 4. Fill a form (take a fresh snapshot first if the click changed the page, so @e3 refers to a current element)
agent-browser fill @e3 "test@example.com"
# 5. Extract text
agent-browser get text @e1
# 6. Take a screenshot
agent-browser screenshot page.png
# 7. Close the browser
agent-browser close
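In a script, the same sequence is usually wrapped so that a failed step stops the run and the browser is still closed afterwards. The sketch below is one way to do that, assuming each command exits non-zero on failure. `run_workflow` and the `AB_BIN` variable are hypothetical names introduced here, and `AB_BIN` exists only so the sketch can be dry-run with `echo` in place of the real CLI.

```shell
# AB_BIN is a hypothetical override used only by this sketch; it is not an
# agent-browser setting. It defaults to the real binary name.
AB_BIN=${AB_BIN:-agent-browser}

# run_workflow URL SCREENSHOT_PATH
run_workflow() {
  "$AB_BIN" open "$1" &&
    "$AB_BIN" snapshot -i &&
    "$AB_BIN" screenshot "$2"
  workflow_status=$?
  # Close the browser whether or not the earlier steps succeeded.
  "$AB_BIN" close
  return "$workflow_status"
}

# Dry run in a subshell: echo stands in for the CLI and prints each step.
(AB_BIN=echo; run_workflow example.com page.png)
```

With agent-browser installed, calling `run_workflow example.com page.png` without the override would run the real commands.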
Semantic Interactions
For more natural interactions without needing snapshot refs:
# Find by role and click
agent-browser find role button click --name "Submit"
# Find by label and fill
agent-browser find label "Email" fill "test@test.com"
# Find by placeholder
agent-browser find placeholder "Search..." type "AI agents"
JSON Output for AI Agents
With the --json flag, output is structured for direct consumption by an AI agent:
agent-browser snapshot --json
This returns structured data perfect for chaining with large language models like Claude.
Integration with Claude Code
Agent-browser includes a Claude plugin for easy addition to Claude Code via the plugin marketplace:
# Add agent-browser plugin
/plugin marketplace add vercel-labs/agent-browser
/plugin install agent-browser
Once installed, Claude can directly invoke browser automation commands during coding sessions. This is particularly powerful for:
- Testing workflows - Automate end-to-end testing
- Data extraction - Scrape structured data from websites
- Form automation - Fill and submit forms programmatically
- Screenshot documentation - Capture UI states automatically
Cloud Browser Integration
For serverless execution, agent-browser integrates with cloud browser services:
Browserbase
# Set API key
export BROWSERBASE_API_KEY=your-key
# Connect to Browserbase
agent-browser connect --browserbase
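In CI it helps to fail early with a clear message when the key is missing, instead of letting the connection attempt fail later. A minimal sketch of such a guard follows. `require_browserbase_key` is a hypothetical helper, not part of agent-browser, and it checks only that the variable is non-empty, not that the key is valid.

```shell
# require_browserbase_key
# Returns non-zero with a message on stderr when the key is missing or empty.
require_browserbase_key() {
  if [ -z "${BROWSERBASE_API_KEY:-}" ]; then
    echo "BROWSERBASE_API_KEY is not set; export it before connecting" >&2
    return 1
  fi
}

# Demonstration in subshells so the caller's environment is left untouched.
(unset BROWSERBASE_API_KEY; require_browserbase_key 2>/dev/null) || echo "missing"
(BROWSERBASE_API_KEY=your-key; require_browserbase_key) && echo "ok"
```

A pipeline step could then chain the guard with the command shown above: `require_browserbase_key && agent-browser connect --browserbase`.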
Browser Use
# Connect to Browser Use service
agent-browser connect --browser-use
These integrations enable running browser automation in CI/CD pipelines without local browser installations.
Why Agent-Browser Matters for AI Development
In an era where AI agents are automating more complex tasks, tools like agent-browser provide the missing link for web interactions. By focusing on reliability, speed, and AI compatibility, it lowers the barrier for building sophisticated agents that can handle real-world browsing.
Key Benefits
| Feature | Benefit |
|---|---|
| Semantic Locators | No fragile CSS selectors |
| Accessibility Tree | Stable element references |
| JSON Output | Direct AI consumption |
| Session Persistence | Maintain login states |
| Cross-Platform | Works everywhere |
| Cloud Integration | Serverless execution |
Use Cases
- Automated Testing - E2E tests driven by AI agents
- Data Extraction - Intelligent web scraping
- Form Automation - Automated data entry
- Interactive AI Apps - Agents that browse the web
- Documentation - Automated screenshot capture
Community and Development
The project is actively maintained with:
- 121+ commits on the main branch
- Contributions from 42+ developers
- Apache-2.0 license (free to use, fork, and contribute)
Learn More
SkillzWave Resources:
- SkillzWave Documentation
- Agent Configuration Guide
- All Guides
- Supported Platforms
- Claude Skills Part 1
- Claude Skills Part 2
- Claude Skills Concepts
Getting Started Today
Whether you're developing automated testing scripts, data extraction bots, or interactive AI experiences, agent-browser is worth trying. The combination of semantic locators, accessibility tree snapshots, and AI-first design makes it well suited to building reliable browser automation into your AI workflows.
# Quick start
npm install -g agent-browser
agent-browser install
agent-browser open google.com
agent-browser snapshot
Check out the GitHub repo to dive in and start automating today.