Kernel browsers expose three control primitives. For agents, we recommend computer use: the primitives match how computer-use models were trained to drive a computer, and they sidestep the bot-detection surface that CDP introduces.
- Computer Use
- CDP
- WebDriver BiDi
Kernel’s Computer Controls API exposes OS-level mouse, keyboard, and screen primitives — the surface a computer-use model already knows how to drive (screenshot, click, type, key, scroll, drag). No CDP or WebDriver connection required, so there’s no protocol fingerprint to leak. Ideal for Claude, OpenAI, or Gemini computer-use loops.
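As a rough sketch of what driving these primitives from a model loop could look like, the dispatcher below routes a model-emitted action to a controls object. The `ComputerControls` interface, its method names, and the action dictionary shape are hypothetical placeholders chosen for illustration; they are not Kernel's SDK, so check the Computer Controls reference for the real calls. `RecordingControls` is an in-memory stand-in so the example runs without a browser.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


class ComputerControls(Protocol):
    """Hypothetical interface; names are placeholders, not Kernel's SDK."""

    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...
    def press_key(self, key: str) -> None: ...
    def scroll(self, x: int, y: int, dx: int, dy: int) -> None: ...
    def drag(self, path: list) -> None: ...


def dispatch(controls: ComputerControls, action: dict) -> Optional[bytes]:
    """Route one model-emitted action to the matching primitive.

    The action shape ({"type": ..., ...}) is an assumption for this sketch.
    Returns screenshot bytes for "screenshot", otherwise None.
    """
    kind: Any = action.get("type")
    if kind == "screenshot":
        return controls.screenshot()
    if kind == "click":
        controls.click(action["x"], action["y"])
    elif kind == "type":
        controls.type_text(action["text"])
    elif kind == "key":
        controls.press_key(action["key"])
    elif kind == "scroll":
        controls.scroll(action["x"], action["y"],
                        action.get("dx", 0), action.get("dy", 0))
    elif kind == "drag":
        controls.drag([tuple(p) for p in action["path"]])
    else:
        raise ValueError(f"unsupported action type: {kind!r}")
    return None


@dataclass
class RecordingControls:
    """Stand-in that records calls instead of touching a real browser."""

    calls: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        self.calls.append(("screenshot",))
        return b"fake-png-bytes"

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.calls.append(("type", text))

    def press_key(self, key: str) -> None:
        self.calls.append(("key", key))

    def scroll(self, x: int, y: int, dx: int, dy: int) -> None:
        self.calls.append(("scroll", x, y, dx, dy))

    def drag(self, path: list) -> None:
        self.calls.append(("drag", path))
```

Because each action type maps to exactly one method, the dispatcher stays a flat lookup with no translation layer in between. In a real loop, the screenshot bytes would be sent back to the model as the next observation.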
Why computer use for agents
Kernel’s computer controls are built to match how computer-use models were trained: the same primitives the model emits (screenshot, click at coords, type, key, scroll, drag) map 1:1 onto the API. There’s no harness translating model output into framework calls.
- Native fit. Screenshot, click, type, key, scroll, drag: the primitives the model already speaks. Kernel uses these same controls in its own managed auth agent.
- Faster screenshots. Captures bypass CDP, which removes the largest source of latency in a vision loop.
- Better against bot detection. No CDP connection means no CDP fingerprint to leak. Pairs naturally with stealth mode and residential proxies.
- Human-like input. OS-level events with Bézier-curve mouse paths, variable typing speed, and configurable mistype rate.
- Not DOM-limited. Screenshots capture the full VM, so the agent can see and interact with native dialogs, canvas elements, iframes, and PDFs — not just things you can address with a selector.
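To make the "Bézier-curve mouse paths" idea above concrete, here is a minimal illustration of how a curved pointer path between two points can be generated with a cubic Bézier curve. This is not Kernel's implementation, which the text does not describe; the function names, the `jitter` parameter, and the choice of control points at one third and two thirds of the way along the line are all assumptions made for the sketch.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u * u * t * p1[0] + 3 * u * t * t * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u * u * t * p1[1] + 3 * u * t * t * p2[1] + t**3 * p3[1]
    return (x, y)


def mouse_path(start: Point, end: Point, steps: int = 20, jitter: float = 80.0,
               rng: Optional[random.Random] = None) -> List[Point]:
    """Return steps + 1 points curving from start to end.

    Control points sit at 1/3 and 2/3 along the straight line and are
    offset by up to `jitter` pixels per axis, so each path bends differently.
    """
    rng = rng or random.Random()

    def control(frac: float) -> Point:
        return (
            start[0] + (end[0] - start[0]) * frac + rng.uniform(-jitter, jitter),
            start[1] + (end[1] - start[1]) * frac + rng.uniform(-jitter, jitter),
        )

    c1, c2 = control(1 / 3), control(2 / 3)
    return [cubic_bezier(start, c1, c2, end, i / steps) for i in range(steps + 1)]
```

The path always begins at `start` and finishes at `end`; only the intermediate points vary with the random control offsets. Passing a seeded `random.Random` makes a path reproducible, which is useful when debugging.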
Computer use + Playwright execution
Computer controls don’t do two things natively: read the DOM and take a full-page screenshot. The recommended pattern for agents is to use computer controls for interaction and Playwright execution as a DOM-reading tool when the agent needs structured data. Playwright execution runs arbitrary Playwright code in a fresh context inside the browser’s VM. Your agent can call it as a tool whenever it needs structured DOM data or a full-page capture, then go right back to driving with computer controls. It ships with Patchright by default, so DOM-side calls are hardened against bot detection too.
Going deeper
- Computer Controls reference — every mouse, keyboard, and screen primitive.
- Playwright Execution reference — the full execution surface, return values, and timeouts.
- Computer use integrations — drop-in examples for Anthropic, Gemini, OpenAI, and more.