Control

Kernel browsers expose three control primitives. For agents, we recommend computer use: its primitives match how computer-use models were trained to drive a computer, and they sidestep the bot-detection surface that CDP introduces.

Computer Use
CDP
WebDriver BiDi

Computer Use. Kernel’s Computer Controls API exposes OS-level mouse, keyboard, and screen primitives, the surface a computer-use model already knows how to drive (screenshot, click, type, key, scroll, drag). It requires no CDP or WebDriver connection, so there’s no protocol fingerprint to leak. Ideal for Claude, OpenAI, or Gemini computer-use loops.

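import Kernel from '@onkernel/sdk';

// Create a Kernel client and start a cloud browser session.
const kernel = new Kernel();
const kernelBrowser = await kernel.browsers.create();

// Capture the current screen, for example to hand to a vision model.
const screenshot = await kernel.browsers.computer.captureScreenshot(kernelBrowser.session_id);

// Click at screen coordinates (x, y).
await kernel.browsers.computer.clickMouse(kernelBrowser.session_id, {
  x: 420,
  y: 280,
});

// Send keyboard input as OS-level events.
await kernel.browsers.computer.typeText(kernelBrowser.session_id, {
  text: 'kernel cloud browsers',
});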

CDP. The Chrome DevTools Protocol is the wire format that Playwright, Puppeteer, and most browser frameworks speak. Use cdp_ws_url from the created browser session for deterministic, scripted automation.

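import { chromium } from 'playwright';

// kernelBrowser is the session returned by kernel.browsers.create().
const browser = await chromium.connectOverCDP(kernelBrowser.cdp_ws_url);

// Reuse the existing context and page, creating them only if none exist yet.
const context = browser.contexts()[0] ?? (await browser.newContext());
const page = context.pages()[0] ?? (await context.newPage());

await page.goto('https://example.com');
const title = await page.title();
console.log(title);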

WebDriver BiDi. W3C-standard browser control. Use webdriver_ws_url from the created browser session with Vibium or any other BiDi client.

import { browser } from 'vibium';

const bro = await browser.start(kernelBrowser.webdriver_ws_url);
const page = await bro.page();

await page.goto('https://example.com');
const title = await page.title();
console.log(title);

Why computer use for agents

Kernel’s computer controls are built to match how computer-use models were trained: the primitives the model emits (screenshot, click at coordinates, type, key, scroll, drag) map 1:1 onto the API. No harness has to translate model output into framework calls.
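As a rough sketch of that mapping, the only glue an agent loop should need is a thin switch from the model's action to one control call. The action shapes below are a simplified, hypothetical union (real providers each define their own), and only clickMouse and typeText are method names taken from this page; pressKey, scroll, dragMouse, and their parameter names are assumptions for illustration.

```typescript
// Hypothetical, simplified actions a computer-use model might emit.
type ModelAction =
  | { type: 'click'; x: number; y: number }
  | { type: 'type'; text: string }
  | { type: 'key'; keys: string[] }
  | { type: 'scroll'; x: number; y: number; deltaY: number }
  | { type: 'drag'; path: Array<[number, number]> };

// One control call: which method to invoke and with which parameters.
interface ControlCall {
  method: string;
  params: Record<string, unknown>;
}

// Map a model action to a single control call. Each case is one line,
// with no selector lookup and no framework-specific translation.
function toControlCall(action: ModelAction): ControlCall {
  switch (action.type) {
    case 'click':
      // clickMouse appears in the example above.
      return { method: 'clickMouse', params: { x: action.x, y: action.y } };
    case 'type':
      // typeText appears in the example above.
      return { method: 'typeText', params: { text: action.text } };
    case 'key':
      // Assumed method name.
      return { method: 'pressKey', params: { keys: action.keys } };
    case 'scroll':
      // Assumed method and parameter names.
      return {
        method: 'scroll',
        params: { x: action.x, y: action.y, delta_y: action.deltaY },
      };
    case 'drag':
      // Assumed method name.
      return { method: 'dragMouse', params: { path: action.path } };
  }
}

// Usage: a click emitted by the model becomes one clickMouse call.
console.log(toControlCall({ type: 'click', x: 420, y: 280 }));
```

Keeping the mapping a pure function makes it easy to test without a browser; the caller then invokes the named method on the session, checking the real method names against the Computer Controls reference.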

Native fit. Screenshot, click, type, key, scroll, and drag are the primitives the model already speaks. Kernel uses these same controls in its own managed auth agent.
Faster screenshots. Captures bypass CDP, which removes the largest source of latency in a vision loop.
Better against bot detection. No CDP connection means no CDP fingerprint to leak. Pairs naturally with stealth mode and residential proxies.
Human-like input. OS-level events with Bézier-curve mouse paths, variable typing speed, and a configurable mistype rate.
Not DOM-limited. Screenshots capture the full VM, so the agent can see and interact with native dialogs, canvas elements, iframes, and PDFs, not just elements you can address with a selector.
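To make the Bézier-curve mouse paths mentioned above more concrete, here is a minimal sketch of how a curved pointer path can be sampled with a cubic Bézier curve. It illustrates the general technique only and is not Kernel's implementation: the function names, the bend parameter, and the placement of the control points are all assumptions.

```typescript
interface Point2D {
  x: number;
  y: number;
}

// Evaluate a cubic Bézier curve with control points p0..p3 at t in [0, 1].
function cubicBezier(p0: Point2D, p1: Point2D, p2: Point2D, p3: Point2D, t: number): Point2D {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Sample steps + 1 points from start to end (steps must be >= 1).
// The two inner control points sit at 1/3 and 2/3 of the straight line,
// pushed sideways by `bend` pixels so the path arcs instead of running straight.
function mousePath(start: Point2D, end: Point2D, bend: number, steps: number): Point2D[] {
  const dx = end.x - start.x;
  const dy = end.y - start.y;
  const len = Math.hypot(dx, dy) || 1; // avoid dividing by zero for a zero-length move
  // Unit normal to the start -> end segment.
  const nx = -dy / len;
  const ny = dx / len;
  const c1 = { x: start.x + dx / 3 + nx * bend, y: start.y + dy / 3 + ny * bend };
  const c2 = { x: start.x + (2 * dx) / 3 + nx * bend, y: start.y + (2 * dy) / 3 + ny * bend };
  const points: Point2D[] = [];
  for (let i = 0; i <= steps; i++) {
    points.push(cubicBezier(start, c1, c2, end, i / steps));
  }
  return points;
}

// Usage: an 11-point arc from (0, 0) to (300, 0) that bulges 30px at its midpoint.
const samplePath = mousePath({ x: 0, y: 0 }, { x: 300, y: 0 }, 40, 10);
console.log(samplePath.length); // 11
```

A human-like driver would typically also randomize the bend and the timing between points; those details are left out here.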

Computer use + Playwright execution

Computer controls don’t do two things natively: read the DOM and take a full-page screenshot. The recommended pattern for agents is computer controls for interaction, with Playwright execution as a tool for when the agent needs structured DOM data or a full-page capture. Playwright execution runs arbitrary Playwright code in a fresh context inside the browser’s VM, so the agent can call it and then go right back to driving with computer controls. It ships with Patchright by default, so DOM-side calls are hardened against bot detection too.

const response = await kernel.browsers.playwright.execute(
  kernelBrowser.session_id,
  {
    code: `
      const rows = await page.$$eval('table tr', (trs) =>
        trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent))
      );
      return rows;
    `,
  },
);

console.log(response.result);
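One way to hand this to an agent is to wrap the call as a tool. The sketch below is an assumption about how such a wrapper might look, not part of the Kernel SDK: the AgentTool shape, the read_dom name, and the truncation limit are hypothetical, and the executor is injected so the example runs without a browser. The only detail taken from this page is that the response carries a result field.

```typescript
// Executor the tool depends on. In a real integration this could be
// (code) => kernel.browsers.playwright.execute(kernelBrowser.session_id, { code }).
type ExecutePlaywright = (code: string) => Promise<{ result?: unknown }>;

// Hypothetical tool shape; adapt it to your agent framework.
interface AgentTool {
  name: string;
  description: string;
  run(input: { code: string }): Promise<string>;
}

// Serialize a result for the model, capping its size so a large DOM dump
// does not flood the context window.
function formatToolResult(result: unknown, maxChars: number): string {
  // JSON.stringify(undefined) returns undefined, so fall back to null.
  const json = JSON.stringify(result ?? null);
  return json.length > maxChars ? json.slice(0, maxChars) + '...[truncated]' : json;
}

// Build a DOM-reading tool around any executor.
function makeDomTool(execute: ExecutePlaywright, maxChars = 4000): AgentTool {
  return {
    name: 'read_dom',
    description: 'Run Playwright code against the current page and return the result as JSON.',
    async run({ code }) {
      const response = await execute(code);
      return formatToolResult(response.result, maxChars);
    },
  };
}

// Usage with a stub executor standing in for the real call.
const domTool = makeDomTool(async () => ({ result: [['Name', 'Price']] }));
domTool.run({ code: 'return 1;' }).then((output) => console.log(output));
```

The agent keeps driving with computer controls and calls the tool only when it needs structured data, which matches the pattern described above.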

Going deeper

Computer Controls reference — every mouse, keyboard, and screen primitive.
Playwright Execution reference — the full execution surface, return values, and timeouts.
Computer use integrations — drop-in examples for Anthropic, Gemini, OpenAI, and more.
