Agent-Browser: Revolutionizing Browser Automation for AI Agents

SkillzWave Team

In the rapidly evolving world of AI-driven development, tools that bridge the gap between intelligent agents and real-world web interactions are becoming essential. Enter agent-browser, an open-source headless browser automation CLI developed by Vercel Labs. It is designed to let AI agents navigate, interact with, and extract data from web pages programmatically.

Hosted on GitHub with over 9.7k stars and 508 forks, it's gaining traction among developers building AI-powered applications that need reliable web automation.

What is Agent-Browser?

Agent-browser is a command-line interface (CLI) that allows AI agents to perform browser automation tasks without the need for a visible browser window. Built with a fast Rust-based core and a Node.js fallback for broader compatibility, it leverages the Playwright library to control browsers like Chromium, Firefox, and WebKit.

What sets it apart is its AI-first design: instead of relying on fragile CSS selectors or XPath, it uses semantic locators and generates snapshots of the page's accessibility tree. These snapshots include stable element references (like @e1 or @e2), making interactions deterministic and reliable for AI systems.

This tool is particularly useful for scenarios where AI agents need to automate web tasks, such as filling forms, clicking buttons, or scraping content. It's compatible with popular AI coding agents, including Claude Code, Cursor, Codex, Copilot, Gemini, and OpenCode, enabling easy integration into existing workflows.

Key Features

Agent-browser boasts an extensive set of over 50 commands, categorized into navigation, interaction, inspection, and control functions. Here's a breakdown of some standout features:

Semantic Locators and Snapshots

Find elements by ARIA roles, text, labels, placeholders, or other attributes. The snapshot command outputs a machine-readable accessibility tree with unique refs, allowing AI agents to reference elements without re-querying the DOM.

# Get accessibility tree snapshot
agent-browser snapshot

# Output includes stable refs like:
# [ref=e1] heading "Welcome"
# [ref=e2] button "Sign In"
# [ref=e3] textbox "Email"
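
As a minimal sketch of how an agent-side script might turn snapshot text into a usable ref, the helper below looks up an element by role and name. It assumes the line shape shown above (`[ref=eN] role "name"`); the real output may be indented or carry extra attributes, and `find_ref` is a hypothetical helper, not part of agent-browser.

```shell
# Sample snapshot text in the format shown above (assumed, not captured from a real run).
snapshot='[ref=e1] heading "Welcome"
[ref=e2] button "Sign In"
[ref=e3] textbox "Email"'

# find_ref ROLE NAME: print the @ref of the first line matching role and quoted name.
# Hypothetical helper for illustration only.
find_ref() {
  printf '%s\n' "$snapshot" | awk -v role="$1" -v name="$2" '
    {
      sub(/^[ \t]+/, "")                       # tolerate indented tree lines
      if (match($0, /^\[ref=[a-z0-9]+\]/)) {
        ref  = substr($0, 6, RLENGTH - 6)      # text between "[ref=" and "]"
        rest = substr($0, RLENGTH + 2)         # remainder: role "name"
        if (rest == role " \"" name "\"") { print "@" ref; exit }
      }
    }'
}

find_ref button "Sign In"   # prints @e2
```

The printed value can then be passed to an interaction command such as `agent-browser click`.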

Interaction Commands

Use click, fill, type, hover, and more to simulate user actions:

# Click using snapshot reference
agent-browser click @e2

# Fill a form field
agent-browser fill @e3 "test@example.com"

# Type with keyboard simulation
agent-browser type @e4 "Hello World"

Waiting Mechanisms

Commands like wait --text "Welcome" or wait --url "**/dashboard" ensure actions proceed only when conditions are met, handling dynamic web content effectively:

# Wait for text to appear
agent-browser wait --text "Welcome"

# Wait for URL pattern
agent-browser wait --url "**/dashboard"

# Wait for element to be visible
agent-browser wait --visible @e5
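
The built-in wait command covers the conditions listed above. For a condition it does not cover, the same idea can be sketched in plain shell as a poll-until-true loop. `poll_until` and `ready_after_three` are hypothetical names used only for this illustration, and the stand-in condition replaces a real check so the sketch runs without a browser.

```shell
# poll_until MAX_TRIES DELAY_SECONDS CMD...: rerun CMD until it succeeds or
# MAX_TRIES checks have failed. Hypothetical helper, not an agent-browser command.
poll_until() {
  max_tries=$1; delay=$2; shift 2
  try=1
  until "$@"; do
    if [ "$try" -ge "$max_tries" ]; then
      return 1            # condition never became true within the budget
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 0
}

# Stand-in condition: becomes true on the third check.
counter=$(mktemp)
echo 0 > "$counter"
ready_after_three() {
  n=$(( $(cat "$counter") + 1 ))
  echo "$n" > "$counter"
  [ "$n" -ge 3 ]
}

poll_until 5 0 ready_after_three && echo "condition met after $(cat "$counter") checks"
rm -f "$counter"
```

In real use the stand-in would be replaced by a command that exits non-zero while the condition is false; whether a given agent-browser subcommand behaves that way is an assumption to verify before relying on it.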

Session Management

Support for multiple isolated sessions and persistent profiles (via --profile) allows reusing authenticated states, like logged-in sessions, across runs:

# Start with persistent profile
agent-browser open example.com --profile my-profile

# The profile retains cookies, localStorage, etc.

Debugging and Output

Take screenshots, save PDFs, record traces, and stream the browser viewport via WebSocket for live previews. The --json flag outputs responses in a format ideal for AI parsing:

# Screenshot
agent-browser screenshot page.png

# PDF export
agent-browser pdf document.pdf

# JSON output for AI consumption
agent-browser snapshot --json

Advanced Integrations

Connect to cloud browser services like Browserbase or Browser Use for serverless execution, and even integrate with existing browsers via the Chrome DevTools Protocol.

Architecture: Client-Daemon Design

The tool's client-daemon architecture ensures efficiency:

  1. Rust CLI - Parses commands and communicates with the daemon
  2. Node.js Daemon - Manages the browser instance via Playwright
  3. Cross-platform - Works on macOS, Linux, and Windows

This separation means the browser stays warm between commands, making sequential operations fast.

Installation and Setup

Getting started with agent-browser is straightforward:

# Install via npm
npm install -g agent-browser

# Install browser (Chromium)
agent-browser install

For Linux users, additional dependencies might be needed:

agent-browser install --with-deps

Building from source requires cloning the GitHub repo, installing Rust, and running build commands with pnpm.

Installing the Skill with Skilz CLI

The agent-browser skill lives in Vercel's agent-browser repository. The Skilz CLI clones the entire repository and prompts you to select the desired skill. Since this repository primarily features one main skill, the prompt may list it directly or install it automatically.

Important: Direct subpaths like vercel-labs/agent-browser/skills/agent-browser won't work. Use the repository root instead:

# Install Skilz CLI first
pip install skilz

# Install from Vercel's agent-browser repository
skilz install -g vercel-labs/agent-browser/

You'll see an interactive prompt similar to:

Found 1 skill in repository:

  [1] agent-browser  (skills/agent-browser)
  [A] Install all
  [Q] Cancel

Select skill(s) [1, A, Q]: 1

Enter 1 to install the Agent-Browser skill.

Agent-Specific Installation

The skill works across all 21+ supported agents. Here are the commands for popular agents:

Claude Code:

# User-level (available in all projects)
skilz install -g vercel-labs/agent-browser/ --agent claude

# Project-level only
skilz install -g vercel-labs/agent-browser/ --project --agent claude

OpenCode:

# User-level
skilz install -g vercel-labs/agent-browser/ --agent opencode

# Project-level
skilz install -g vercel-labs/agent-browser/ --project --agent opencode

Gemini CLI (project-level only):

skilz install -g vercel-labs/agent-browser/ --project --agent gemini

OpenAI Codex:

skilz install -g vercel-labs/agent-browser/ --agent codex

Other Agents (Windsurf, Qwen Code, Aider, etc.):

skilz install -g vercel-labs/agent-browser/ --agent <name>

Note: The -g flag is required when using GitHub shorthand paths like vercel-labs/agent-browser/. You only omit -g when using full URLs starting with https:// or git://.
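
The rule in the note can be sketched as a small shell check. The function names `needs_g_flag` and `install_cmd` are hypothetical, the full repository URL is an assumed example, and the script only prints the command line rather than running skilz.

```shell
# needs_g_flag SOURCE: succeed when SOURCE is a GitHub shorthand path,
# fail when it is a full URL starting with https:// or git://.
needs_g_flag() {
  case "$1" in
    https://*|git://*) return 1 ;;
    *) return 0 ;;
  esac
}

# install_cmd SOURCE AGENT: print the skilz command line, adding -g only when needed.
install_cmd() {
  if needs_g_flag "$1"; then
    echo "skilz install -g $1 --agent $2"
  else
    echo "skilz install $1 --agent $2"
  fi
}

install_cmd vercel-labs/agent-browser/ claude
# prints: skilz install -g vercel-labs/agent-browser/ --agent claude

install_cmd https://github.com/vercel-labs/agent-browser claude
# prints: skilz install https://github.com/vercel-labs/agent-browser --agent claude
```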

For complete Skilz CLI documentation, see the Skilz CLI docs.

Skill Structure and Command Categories

The agent-browser skill is defined in SKILL.md with YAML frontmatter specifying its name, description, and allowed tools (Bash with agent-browser commands). The skill automates browser interactions for tasks like web testing, form filling, screenshots, and data extraction.

The commands are organized into categories for easy reference by AI agents:

Navigation

Commands for opening URLs, going back/forward, reloading, and closing:

agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close

Snapshot (Page Analysis)

Generates accessibility trees or interactive element refs:

# Full accessibility tree
agent-browser snapshot

# Interactive elements only with stable refs like @e1
agent-browser snapshot -i

Interaction Commands

Simulates user actions using refs:

agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser press Enter
agent-browser hover @e3
agent-browser select @e4 "option"

Get Information

Extracts data from elements or pages:

agent-browser get text @e1
agent-browser get url
agent-browser get title
agent-browser get html @e1

Check State

Verifies element properties:

agent-browser is visible @e1
agent-browser is enabled @e2
agent-browser is checked @e3

Screenshots & PDF

Captures visuals:

agent-browser screenshot path.png
agent-browser screenshot --fullpage full.png
agent-browser pdf output.pdf

Video Recording

Records sessions:

agent-browser record start demo.webm
agent-browser record stop

Wait Commands

Handles timing for dynamic content:

agent-browser wait --text "Success"
agent-browser wait --url "**/dashboard"
agent-browser wait --visible @e1
agent-browser wait --hidden @e2

Mouse Control

Precise mouse actions:

agent-browser mouse move 100 200
agent-browser mouse click
agent-browser mouse drag 100 200 300 400

Semantic Locators

Finds elements without refs:

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." type "query"

Browser Settings

Configures viewport, device emulation, etc.:

agent-browser set viewport 1920 1080
agent-browser set device "iPhone 12"
agent-browser set geolocation 37.7749 -122.4194

Cookies & Storage

Manages browser data:

agent-browser cookies get
agent-browser cookies set name value
agent-browser cookies clear
agent-browser storage get key
agent-browser storage set key value

Network

Intercepts and mocks requests:

agent-browser network route <url> --abort
agent-browser network route <url> --mock '{"data": "test"}'
agent-browser network offline

Tabs & Windows

Handles multiple contexts:

agent-browser tab new [url]
agent-browser tab list
agent-browser tab switch 2
agent-browser tab close

Frames

Switches to iframes:

agent-browser frame "#iframe"
agent-browser frame main

Dialogs

Handles alerts and prompts:

agent-browser dialog accept
agent-browser dialog dismiss
agent-browser dialog accept --text "input"

JavaScript Execution

Executes code in page context:

agent-browser eval "document.title"
agent-browser eval "window.scrollTo(0, 1000)"

Debugging

Tools for headed mode, tracing, and highlighting:

agent-browser --headed open example.com
agent-browser trace start
agent-browser trace stop trace.zip
agent-browser highlight @e1

Global Options

| Option | Description |
| --- | --- |
| --json | Parsable JSON output for AI consumption |
| --headed | Run with visible browser window |
| --profile <name> | Use persistent browser profile |
| --timeout <ms> | Command timeout in milliseconds |

Usage Examples: A Complete Workflow

Let's walk through a typical AI agent workflow:

# 1. Open a page
agent-browser open example.com

# 2. Get a snapshot (outputs elements with refs)
agent-browser snapshot
# Output: [ref=e1] heading "Example Domain"
#         [ref=e2] link "More information..."

# 3. Click a link
agent-browser click @e2

# 4. Fill a form
agent-browser fill @e3 "test@example.com"

# 5. Extract text
agent-browser get text @e1

# 6. Take a screenshot
agent-browser screenshot page.png

# 7. Close the browser
agent-browser close
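
When the steps above are scripted, it can help to stop at the first failing command and to preview the sequence before touching a browser. The sketch below wraps each step in a hypothetical `step` helper with a dry-run switch; `DRY_RUN`, `step`, and `workflow` are names invented for this example, and the refs are the illustrative ones from the walkthrough.

```shell
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

# step CMD...: print the command, then run it unless DRY_RUN=1.
# Exits the script on the first failing command.
step() {
  echo "+ $*"
  if [ "$DRY_RUN" = "1" ]; then
    return 0
  fi
  "$@" || { echo "step failed: $*" >&2; exit 1; }
}

# The same seven steps as the walkthrough above.
workflow() {
  step agent-browser open example.com
  step agent-browser snapshot
  step agent-browser click @e2
  step agent-browser fill @e3 "test@example.com"
  step agent-browser get text @e1
  step agent-browser screenshot page.png
  step agent-browser close
}

workflow
```

Running it with the default prints the seven commands prefixed with `+`, which makes the planned sequence easy to review or log.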

Semantic Interactions

For more natural interactions without needing snapshot refs:

# Find by role and click
agent-browser find role button click --name "Submit"

# Find by label and fill
agent-browser find label "Email" fill "test@test.com"

# Find by placeholder
agent-browser find placeholder "Search..." type "AI agents"

JSON Output for AI Agents

With the --json flag, outputs are structured for direct AI consumption:

agent-browser snapshot --json

This returns structured data perfect for chaining with large language models like Claude.

Integration with Claude Code

Agent-browser includes a Claude plugin for easy addition to Claude Code via the plugin marketplace:

# Add agent-browser plugin
/plugin marketplace add vercel-labs/agent-browser
/plugin install agent-browser

Once installed, Claude can directly invoke browser automation commands during coding sessions. This is particularly powerful for:

  • Testing workflows - Automate end-to-end testing
  • Data extraction - Scrape structured data from websites
  • Form automation - Fill and submit forms programmatically
  • Screenshot documentation - Capture UI states automatically

Cloud Browser Integration

For serverless execution, agent-browser integrates with cloud browser services:

Browserbase

# Set API key
export BROWSERBASE_API_KEY=your-key

# Connect to Browserbase
agent-browser connect --browserbase

Browser Use

# Connect to Browser Use service
agent-browser connect --browser-use

These integrations enable running browser automation in CI/CD pipelines without local browser installations.
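
In a CI job, a missing API key is easier to diagnose if the script checks for it before attempting to connect. The following is a minimal sketch of such a guard; `require_env` and `connect_browserbase` are hypothetical helpers, and the connect command is only printed here, not executed.

```shell
# require_env NAME: fail with a message when the named variable is unset or empty.
require_env() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "missing required environment variable: $1" >&2
    return 1
  fi
}

# connect_browserbase: print the connect command only once the key is present.
connect_browserbase() {
  require_env BROWSERBASE_API_KEY || return 1
  echo "agent-browser connect --browserbase"
}

# Without the key the guard fails; with it the command line is printed.
(unset BROWSERBASE_API_KEY; connect_browserbase) || echo "skipping cloud run"
(BROWSERBASE_API_KEY=your-key; connect_browserbase)
```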

Why Agent-Browser Matters for AI Development

In an era where AI agents are automating more complex tasks, tools like agent-browser provide the missing link for web interactions. By focusing on reliability, speed, and AI compatibility, it lowers the barrier for building sophisticated agents that can handle real-world browsing.

Key Benefits

| Feature | Benefit |
| --- | --- |
| Semantic Locators | No fragile CSS selectors |
| Accessibility Tree | Stable element references |
| JSON Output | Direct AI consumption |
| Session Persistence | Maintain login states |
| Cross-Platform | Works everywhere |
| Cloud Integration | Serverless execution |

Use Cases

  • Automated Testing - E2E tests driven by AI agents
  • Data Extraction - Intelligent web scraping
  • Form Automation - Automated data entry
  • Interactive AI Apps - Agents that browse the web
  • Documentation - Automated screenshot capture

Community and Development

The project is actively maintained with:

  • 121+ commits on the main branch
  • Contributions from 42+ developers
  • Apache-2.0 license (free to use, fork, and contribute)

Getting Started Today

Whether you're developing automated testing scripts, data extraction bots, or interactive AI experiences, agent-browser is a game-changer. The combination of semantic locators, accessibility tree snapshots, and AI-first design makes it the ideal tool for building reliable browser automation into your AI workflows.

# Quick start
npm install -g agent-browser
agent-browser install
agent-browser open google.com
agent-browser snapshot

Check out the GitHub repo to dive in and start automating today.
