What it does
Launches a headless Chromium browser that the assistant can control — navigate to URLs, click elements, fill forms, execute JavaScript, scroll pages, and read content. Uses accessibility snapshots to give the LLM numbered element references instead of fragile CSS selectors.
Requirements
Install Playwright and Chromium (one-time):
bunx playwright install chromium
The browser tool is automatically registered at gateway startup when Playwright is available.
How the assistant uses it
The browser exposes seven actions, each registered as a separate tool:
| Action | Tool name | What it does |
|---|---|---|
| Navigate | browser_navigate | Go to a URL |
| Snapshot | browser_snapshot | Get the current page structure as numbered element refs |
| Click | browser_click | Click an element by its ref number |
| Type | browser_type | Type text into an input (append or replace with clear: true) |
| Scroll | browser_scroll | Scroll the page in a direction (up, down, left, right) |
| Wait | browser_wait | Wait for a selector, URL pattern, page state, JS condition, or fixed delay |
| Evaluate | browser_evaluate | Execute JavaScript in the page context and return the result |
Typical browsing flow
- Assistant calls browser_navigate to open a page
- Assistant calls browser_snapshot to see the page structure
- Assistant identifies the right element by its ref number
- Assistant calls browser_click or browser_type to interact
- Assistant takes another snapshot if the page changed
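The flow above can be sketched as a sequence of tool calls. This is only an illustration, not the gateway's actual wire format: the argument names `url`, `ref`, and `text` and the `buildFlow` helper are assumptions. Only the tool names and the `clear: true` option come from this page.

```javascript
// Hypothetical sketch: the argument names (url, ref, text) are assumptions.
// Only the tool names and the `clear` option are taken from the docs above.
function buildFlow(url, ref, text) {
  return [
    // 1. Open the page.
    { tool: "browser_navigate", args: { url } },
    // 2. Read the page structure as numbered element refs.
    { tool: "browser_snapshot", args: {} },
    // 3. Replace the contents of the input identified by `ref`.
    { tool: "browser_type", args: { ref, text, clear: true } },
    // 4. Take a fresh snapshot, since typing may have changed the page.
    { tool: "browser_snapshot", args: {} },
  ];
}

const flow = buildFlow("https://example.com/search", 12, "headphones");
console.log(flow.map((step) => step.tool).join(" -> "));
```

Ref numbers come from the most recent snapshot, so a ref taken before the page changed may no longer point at the same element.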
For JavaScript-heavy pages (product listings, search results, data tables), browser_evaluate is often faster than repeated snapshot/scroll cycles:
return JSON.stringify(
  [...document.querySelectorAll('.product-card')].map(el => ({
    name: el.querySelector('h2')?.textContent?.trim(),
    price: el.querySelector('.price')?.textContent?.trim(),
  }))
);
Scripts with top-level return statements are automatically wrapped in an IIFE, so you can write return ... directly without wrapping in a function.
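To make the wrapping concrete, here is a minimal sketch of how a script with a top-level `return` could be turned into an IIFE. The `wrapScript` function and its detection rule are assumptions for illustration; the tool's real implementation is not documented here and may differ.

```javascript
// Hypothetical sketch of the wrapping step; the real detection logic may differ.
function wrapScript(script) {
  // Naive check: treat any `return` keyword as a top-level return.
  const hasReturn = /\breturn\b/.test(script);
  // Newlines keep a trailing `//` comment in the script from swallowing the closing brace.
  return hasReturn ? `(() => {\n${script}\n})()` : script;
}

const wrapped = wrapScript("return 1 + 2;");
console.log(eval(wrapped)); // 3
```

This naive check also wraps a bare expression that merely contains a `return` inside a nested function, in which case the result would be `undefined`. A real implementation would need a stricter test.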
browser_scroll scrolls the viewport by a pixel amount (default 500px). Useful for loading lazy content or reaching elements below the fold:
| Parameter | Type | Default | Description |
|---|---|---|---|
| direction | "up", "down", "left", or "right" | required | Scroll direction |
| amount | number | 500 | Pixels to scroll |
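As a rough illustration of what these parameters mean, the sketch below maps a direction and amount to a pixel offset of the kind `window.scrollBy(x, y)` accepts in a page. The `scrollDelta` helper is hypothetical; how the tool performs the scroll internally is not stated on this page.

```javascript
// Hypothetical sketch: maps browser_scroll parameters to an (x, y) pixel offset.
// In a page, such an offset could be applied with window.scrollBy(x, y).
function scrollDelta(direction, amount = 500) {
  switch (direction) {
    case "up":
      return { x: 0, y: -amount };
    case "down":
      return { x: 0, y: amount };
    case "left":
      return { x: -amount, y: 0 };
    case "right":
      return { x: amount, y: 0 };
    default:
      // direction is required, so anything else is rejected.
      throw new Error(`Unknown direction: ${direction}`);
  }
}

console.log(scrollDelta("down"));      // { x: 0, y: 500 }
console.log(scrollDelta("left", 200)); // { x: -200, y: 0 }
```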
Wait conditions
browser_wait supports multiple strategies:
| Parameter | What it does |
|---|---|
| timeMs | Simple delay in milliseconds (best for SPAs) |
| selector | Wait for a CSS selector to appear in the DOM |
| url | Wait for the URL to match a pattern |
| state | Wait for a page load state (load, domcontentloaded) |
| jsCondition | Wait for a JavaScript expression to evaluate to truthy |
Avoid state: "networkidle" on single-page applications. SPAs often keep making background requests, so the network may never go idle and the wait will time out.
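The following sketch shows one example argument object per strategy. The parameter names come from the table above, but every value is made up for illustration. In particular, the URL pattern syntax (a glob is assumed here) is not specified on this page, and `isRiskyOnSpa` is a hypothetical helper.

```javascript
// Example browser_wait arguments, one per strategy.
// All values are illustrative; the glob-style URL pattern is an assumption.
const waitExamples = [
  { selector: ".results-list" },
  { url: "**/checkout" },
  { state: "domcontentloaded" },
  { jsCondition: "document.querySelectorAll('.row').length > 0" },
  { timeMs: 1500 },
];

// Hypothetical helper: flags the one setting this page warns against on SPAs.
function isRiskyOnSpa(args) {
  return args.state === "networkidle";
}

console.log(waitExamples.some(isRiskyOnSpa));       // false
console.log(isRiskyOnSpa({ state: "networkidle" })); // true
```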
Configuration
| Setting | Default | Description |
|---|---|---|
| headless | true | Run without a visible browser window |
| maxResultChars | 50,000 | Max characters returned from snapshots and evaluate |
| defaultTimeout | 30,000 ms | Timeout for page actions |
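To show what a character cap like `maxResultChars` implies for large pages, here is a minimal sketch that cuts a result at the limit. The `truncateResult` function is hypothetical; whether the gateway appends a truncation marker or cuts differently is not documented on this page.

```javascript
const MAX_RESULT_CHARS = 50_000; // default from the table above

// Hypothetical sketch: how the gateway actually truncates is an assumption.
function truncateResult(text, maxChars = MAX_RESULT_CHARS) {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars);
}

const big = "x".repeat(60_000);
console.log(truncateResult(big).length); // 50000
```

Under this assumption, a snapshot or evaluate result longer than the cap loses its tail, which is one reason to return only the fields you need from browser_evaluate.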
Live Preview
When enabled, a real-time browser view panel appears on the right side of the chat while the assistant is browsing. You can watch navigation, clicks, form fills, and page transitions as they happen.
Live Preview uses Chrome DevTools Protocol screencast to stream compressed JPEG frames directly from Chromium — the same technology that powers Chrome’s remote debugging. Frames are only pushed when the page visually changes, so bandwidth is naturally throttled.
Enabling Live Preview
Live Preview is disabled by default. To turn it on:
- Go to Settings → Tools → Browser
- Toggle Live Preview on
Or set it via the config file:
{
  tools: {
    browser: {
      enabled: true,
      livePreview: true
    }
  }
}
The setting is hot-applied — no gateway restart required.
Live Preview adds ~20–50 KB per frame over the WebSocket connection. On typical browsing sessions this is well within normal bandwidth, but you may want to disable it on very slow connections.
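For a rough sense of scale, the sketch below multiplies the per-frame size range by a frame rate. The 20 to 50 KB range comes from this page; the frame rate is an assumption, since frames are only sent when the page visually changes, and `estimateBandwidthKBps` is a hypothetical helper.

```javascript
// Frame size range is from the docs; the frame rate is an assumed input,
// because frames are only pushed when the page visually changes.
function estimateBandwidthKBps(framesPerSecond, frameKB = { min: 20, max: 50 }) {
  return {
    min: framesPerSecond * frameKB.min,
    max: framesPerSecond * frameKB.max,
  };
}

console.log(estimateBandwidthKBps(4)); // { min: 80, max: 200 }
```

An idle page sends few or no frames, so sustained usage is likely well below a peak estimate like this.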
When to use browser vs web fetch
| Use browser when | Use web fetch when |
|---|---|
| Page requires JavaScript to render | Page is server-rendered HTML |
| You need to click or fill forms | You just need the text content |
| Content is behind interactions | Content is at a direct URL |
| You need to navigate through pages | Single page read is enough |
| You need to extract structured data from JS-rendered DOM | Static HTML with clean structure |
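The table above can be read as a simple rule: if any browser-only need applies, use the browser, otherwise web fetch is enough. The sketch below encodes that reading; `chooseTool` and its field names are hypothetical and not part of either tool.

```javascript
// Hypothetical helper encoding the table above; the field names are assumptions.
function chooseTool({
  needsJavaScript = false,  // page only renders with JavaScript
  needsInteraction = false, // clicking, form filling, content behind interactions
  multiPage = false,        // navigating through several pages
} = {}) {
  return needsJavaScript || needsInteraction || multiPage ? "browser" : "web fetch";
}

console.log(chooseTool({ needsInteraction: true })); // browser
console.log(chooseTool());                           // web fetch
```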