Browser - spaceduck

What it does

Launches a headless Chromium browser that the assistant can control: it can navigate to URLs, click elements, fill forms, execute JavaScript, scroll pages, and read content. The tool uses accessibility snapshots to give the LLM numbered element references instead of fragile CSS selectors.

Requirements

Install Playwright and Chromium (one-time):

bunx playwright install chromium

The browser tool is automatically registered at gateway startup when Playwright is available.

How the assistant uses it

The browser exposes seven actions, each registered as a separate tool:

Action	Tool name	What it does
Navigate	`browser_navigate`	Go to a URL
Snapshot	`browser_snapshot`	Get the current page structure as numbered element refs
Click	`browser_click`	Click an element by its ref number
Type	`browser_type`	Type text into an input (appends by default; pass `clear: true` to replace the existing value)
Scroll	`browser_scroll`	Scroll the page in a direction (`up`, `down`, `left`, `right`)
Wait	`browser_wait`	Wait for a selector, URL pattern, page state, JS condition, or fixed delay
Evaluate	`browser_evaluate`	Execute JavaScript in the page context and return the result

Typical browsing flow

1. Assistant calls browser_navigate to open a page
2. Assistant calls browser_snapshot to see the page structure
3. Assistant identifies the right element by its ref number
4. Assistant calls browser_click or browser_type to interact
5. Assistant takes another snapshot if the page changed
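
The flow above can be sketched as a sequence of tool calls. This is illustrative only: `callTool` is a hypothetical stand-in that records calls instead of driving a real browser, the snapshot shape is faked, and the argument names `url`, `ref`, and `text` are assumptions. Only the tool names and `clear: true` come from the action table.

```javascript
// Hypothetical stand-in for a tool dispatcher: it only records calls.
const calls = [];
function callTool(name, args = {}) {
  calls.push({ name, args });
  // A real snapshot returns numbered element refs; this shape is made up.
  if (name === "browser_snapshot") {
    return {
      elements: [
        { ref: 1, role: "textbox", name: "Search" },
        { ref: 2, role: "button", name: "Go" },
      ],
    };
  }
  return { ok: true };
}

// 1. Open the page (the `url` argument name is an assumption).
callTool("browser_navigate", { url: "https://example.com" });
// 2. Read the page structure.
const snapshot = callTool("browser_snapshot");
// 3. Identify elements by their ref numbers.
const searchBox = snapshot.elements.find((el) => el.role === "textbox");
const goButton = snapshot.elements.find((el) => el.role === "button");
// 4. Interact (`ref` and `text` are assumed argument names).
callTool("browser_type", { ref: searchBox.ref, text: "rubber duck", clear: true });
callTool("browser_click", { ref: goButton.ref });
// 5. Snapshot again because the page probably changed.
callTool("browser_snapshot");

console.log(calls.map((c) => c.name).join(" -> "));
```

Ref numbers are only meaningful for the snapshot they came from, which is why the sketch takes a fresh snapshot after the click.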

Extracting structured data

For JavaScript-heavy pages (product listings, search results, data tables), browser_evaluate is often faster than repeated snapshot/scroll cycles:

return JSON.stringify(
  [...document.querySelectorAll('.product-card')].map(el => ({
    name: el.querySelector('h2')?.textContent?.trim(),
    price: el.querySelector('.price')?.textContent?.trim(),
  }))
);

Scripts with top-level return statements are automatically wrapped in an IIFE, so you can write return ... directly without wrapping in a function.
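
To make the wrapping concrete, here is a minimal sketch of what such a step could look like. `wrapScript` is a hypothetical helper, and its regex check is a deliberate simplification rather than the tool's actual detection logic.

```javascript
// Naive illustration: wrap a script in an IIFE when it contains `return`.
// Assumption: the real tool may detect top-level returns differently.
function wrapScript(source) {
  const hasReturn = /\breturn\b/.test(source);
  return hasReturn ? `(() => {\n${source}\n})()` : source;
}

const script = "const n = 2;\nreturn n * 21;";
const wrapped = wrapScript(script);

// Indirect eval evaluates the wrapped text as an ordinary expression.
const result = (0, eval)(wrapped);
console.log(result);
```

Without the wrapper, a bare `return` outside a function is a syntax error, which is why the wrapping is needed at all.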

Scrolling

browser_scroll scrolls the viewport by a pixel amount (default 500px). Useful for loading lazy content or reaching elements below the fold:

Parameter	Type	Default	Description
`direction`	`"up" | "down" | "left" | "right"`	required	Scroll direction
`amount`	number	500	Pixels to scroll
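
As a rough sketch of how these parameters could map onto a scroll offset: `scrollDelta` is a hypothetical helper, and the mapping to horizontal and vertical deltas is an assumption about the implementation. The direction names and the 500 px default come from the table.

```javascript
const DEFAULT_AMOUNT = 500; // default from the parameter table

// Convert browser_scroll arguments into pixel deltas (assumed mapping).
function scrollDelta({ direction, amount = DEFAULT_AMOUNT }) {
  switch (direction) {
    case "up":
      return { dx: 0, dy: -amount };
    case "down":
      return { dx: 0, dy: amount };
    case "left":
      return { dx: -amount, dy: 0 };
    case "right":
      return { dx: amount, dy: 0 };
    default:
      throw new Error(`Unknown direction: ${direction}`);
  }
}

console.log(scrollDelta({ direction: "down" }));
console.log(scrollDelta({ direction: "left", amount: 200 }));
```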

Wait conditions

browser_wait supports multiple strategies:

Parameter	What it does
`timeMs`	Simple delay in milliseconds — best for SPAs
`selector`	Wait for a CSS selector to appear in the DOM
`url`	Wait for the URL to match a pattern
`state`	Wait for a page load state (`load`, `domcontentloaded`)
`jsCondition`	Wait for a JavaScript expression to evaluate to truthy

Avoid state: "networkidle" on single-page applications. Many SPAs keep making background requests, so the network may never go idle and the wait will time out.
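
For illustration, here is one example argument object per strategy, plus a small guard for the networkidle caveat. The concrete values (the selector, the URL pattern syntax, the expression) are made up, and `isRiskyForSpa` is a hypothetical helper, not part of the tool.

```javascript
// One example browser_wait argument object per strategy from the table.
// All values are illustrative; the URL pattern syntax is an assumption.
const waitExamples = {
  timeMs: { timeMs: 1500 },
  selector: { selector: ".results-list" },
  url: { url: "**/checkout" },
  state: { state: "domcontentloaded" },
  jsCondition: { jsCondition: "document.querySelectorAll('.row').length > 10" },
};

// Flag the one combination the note above warns about.
function isRiskyForSpa(args) {
  return args.state === "networkidle";
}

for (const [strategy, args] of Object.entries(waitExamples)) {
  console.log(strategy, JSON.stringify(args), "risky for SPA:", isRiskyForSpa(args));
}
```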

Configuration

Setting	Default	Description
`headless`	`true`	Run without a visible browser window
`maxResultChars`	50,000	Max characters returned from snapshots and evaluate
`defaultTimeout`	30,000 ms	Timeout for page actions
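
One plausible reading of `maxResultChars` is a simple cap on the returned text. The sketch below assumes plain truncation; whether the tool truncates, summarizes, or appends a marker is not stated here, and `capResult` is a hypothetical helper.

```javascript
const MAX_RESULT_CHARS = 50000; // default from the configuration table

// Assumed behavior: keep at most `maxChars` characters of the result.
function capResult(text, maxChars = MAX_RESULT_CHARS) {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars);
}

const big = "x".repeat(60000);
console.log(capResult(big).length);
console.log(capResult("short page"));
```

In practice this means a large `browser_evaluate` result is better filtered inside the page script than returned whole.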

Live Preview

When enabled, a real-time browser view panel appears on the right side of the chat while the assistant is browsing. You can watch navigation, clicks, form fills, and page transitions as they happen. Live Preview uses Chrome DevTools Protocol screencast to stream compressed JPEG frames directly from Chromium — the same technology that powers Chrome’s remote debugging. Frames are only pushed when the page visually changes, so bandwidth is naturally throttled.

Enabling Live Preview

Live Preview is disabled by default. To turn it on:

1. Go to Settings → Tools → Browser
2. Toggle Live Preview on

Or set it via the config file:

{
  tools: {
    browser: {
      enabled: true,
      livePreview: true
    }
  }
}

The setting is hot-applied — no gateway restart required.

Live Preview adds ~20–50 KB per frame over the WebSocket connection. In typical browsing sessions this is well within normal bandwidth, but you may want to disable it on very slow connections.
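
A back-of-the-envelope estimate shows what the per-frame figure means for a connection. The frame rate is an assumption: frames are only pushed on visual change, so the real rate varies with page activity. `estimateBandwidthKBps` is a hypothetical helper.

```javascript
// Frame size range from the text above, in kilobytes.
const FRAME_KB = { min: 20, max: 50 };

// Bandwidth range in KB/s for an assumed number of frames per second.
function estimateBandwidthKBps(framesPerSecond) {
  return {
    min: FRAME_KB.min * framesPerSecond,
    max: FRAME_KB.max * framesPerSecond,
  };
}

console.log(estimateBandwidthKBps(5)); // { min: 100, max: 250 }
```

At an assumed 5 changed frames per second, that is roughly 100 to 250 KB/s while the page is actively changing, and close to nothing while it is static.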

When to use browser vs web fetch

Use browser when	Use web fetch when
Page requires JavaScript to render	Page is server-rendered HTML
You need to click or fill forms	You just need the text content
Content is behind interactions	Content is at a direct URL
You need to navigate through pages	Single page read is enough
You need to extract structured data from JS-rendered DOM	Static HTML with clean structure
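
The table can be read as a simple decision rule. The sketch below encodes it; `chooseTool` and its field names are hypothetical and exist only to illustrate the comparison.

```javascript
// Pick a tool based on the criteria in the table (field names are made up).
function chooseTool(page) {
  const needsBrowser =
    page.needsJavaScript || // page requires JavaScript to render
    page.needsInteraction || // clicks, form fills, content behind interactions
    page.multiPage; // navigation across several pages
  return needsBrowser ? "browser" : "web fetch";
}

console.log(chooseTool({ needsJavaScript: true }));
console.log(chooseTool({ needsJavaScript: false, needsInteraction: false, multiPage: false }));
```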
