Skip to main content

Documentation Index

Fetch the complete documentation index at: https://tbd-6fc993ce-hypeship-intro-create-control-observe.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Kernel browsers expose three control primitives. For agents, we recommend computer use — the primitives match how computer-use models were trained to drive a computer, and they sidestep the bot-detection surface that CDP introduces.
Kernel’s Computer Controls API exposes OS-level mouse, keyboard, and screen primitives — the surface a computer-use model already knows how to drive (screenshot, click, type, key, scroll, drag). No CDP or WebDriver connection required, so there’s no protocol fingerprint to leak. Ideal for Claude, OpenAI, or Gemini computer-use loops.
import Kernel from '@onkernel/sdk';

const kernel = new Kernel();
const kernelBrowser = await kernel.browsers.create();

const screenshot = await kernel.browsers.computer.captureScreenshot(kernelBrowser.session_id);

await kernel.browsers.computer.clickMouse(kernelBrowser.session_id, {
  x: 420,
  y: 280,
});

await kernel.browsers.computer.typeText(kernelBrowser.session_id, {
  text: 'kernel cloud browsers',
});

Why computer use for agents

Kernel’s computer controls are built to match how computer-use models were trained — the same primitives the model emits (screenshot, click at coords, type, key, scroll, drag) map 1:1 onto the API. There’s no harness translating model output into framework calls.
  • Native fit. Screenshot, click, type, key, scroll, drag — the primitives the model already speaks. Kernel uses these same controls in its own managed auth agent.
  • Faster screenshots. Captures bypass CDP, which removes the largest source of latency in a vision loop.
  • Better against bot detection. No CDP connection means no CDP fingerprint to leak. Pairs naturally with stealth mode and residential proxies.
  • Human-like input. OS-level events with Bézier-curve mouse paths, variable typing speed, and configurable mistype rate.
  • Not DOM-limited. Screenshots capture the full VM, so the agent can see and interact with native dialogs, canvas elements, iframes, and PDFs — not just things you can address with a selector.

Computer use + Playwright execution

The two things computer controls don’t do natively: read the DOM, and take a full-page screenshot. The recommended pattern for agents is computer controls for interaction, Playwright execution as a DOM-reading tool when the agent needs structured data. Playwright execution runs arbitrary Playwright code in a fresh context inside the browser’s VM. Your agent can call it as a tool whenever it needs structured DOM data or a full-page capture, then go right back to driving with computer controls. It ships with Patchright by default, so DOM-side calls are hardened against bot detection too.
const response = await kernel.browsers.playwright.execute(
  kernelBrowser.session_id,
  {
    code: `
      const rows = await page.$$eval('table tr', (trs) =>
        trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent))
      );
      return rows;
    `,
  },
);

console.log(response.result);

Going deeper