computer/README.md at cua-v0.1.1

mirror of https://github.com/trycua/computer.git synced 2026-04-30 12:11:35 -05:00

ddupont 7d1fa31fb6 feat(sandbox-sdk): Cua Sandbox SDK, a unified API for Linux, macOS, Windows, Android (#1218)

* feat(cua-sandbox): Add sandbox SDK with QEMU WSL2/KVM, Hyper-V, and Docker runtimes

- New cua-sandbox package: declarative Image API, layered disk caching, multi-runtime support
- QEMU WSL2 runtime: runs QEMU inside WSL2 with KVM hardware acceleration on Windows
- Hyper-V runtime: builds Windows images from ISO with native Hyper-V Gen2 VMs
- Shared Windows unattended install (builder/windows_unattend.py): Autounattend.xml, ISO creation
- OCI registry push/pull for QEMU disk images
- Computer-server setup script installs cua-computer-server only (no PyTorch/agent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): Add usage examples to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add cloud transport with ephemeral VM support

Cloud sandboxes are now the default path: sandbox() connects to the
CUA platform API, provisions VMs, and delegates control via HTTPTransport.
Ephemeral mode is inferred from the arguments: image= creates and later
destroys a VM, while name= only connects to an existing one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Android emulator runtime, transports, and example sandboxes

Adds AndroidEmulatorRuntime with headless toggle, ADB/VNC/SSH/QMP transports,
cloud transport timeout increase (10min), and example sandbox scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add ephemeral cloud sandbox example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): Remove name from ephemeral cloud example to trigger VM creation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Mobile interface for Android touch, gestures, and hardware keys

Adds sb.mobile.* methods (tap, swipe, scroll, pinch, home, back, etc.)
backed by ADB shell commands, and an ephemeral Android example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(ci): pass SLACK_WEBHOOK to cold start benchmark step

* add benchmark script

* feat(android): true MT Protocol B multitouch, gesture() API, auto port detection

- mobile.py: replace asyncio.gather pinch with single-shell MT Protocol B
  sendevent script; add gesture(*finger_paths) primitive; pinch_in/pinch_out
  delegate to gesture()
- android_emulator.py: make adb_port Optional[int]=None; add
  _find_free_emulator_port() scanning even console ports 5554-5682 via
  socket.bind
- examples/touch_test_app/: Android APK logging every MotionEvent as JSON
  to Logcat under tag "TouchTest"; supports RESET_LOG broadcast
- tests/test_android_multitouch.py: integration test suite using sandbox()
  context manager; Local/Cloud split (Cloud skipped without CUA_TEST_API_KEY)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add get_display_url(share=False) across transports

share=False → vnc://localhost:{port} for local VNC runtimes,
              https://cua.ai/connect/incus/{name} for cloud (auth-gated)
share=True  → noVNC/ws-scrcpy URL with embedded password (cloud only)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add ephemeral android test

* refactor(tests): move TouchTest APK to standalone repo; download from releases

- Remove examples/touch_test_app — now lives at
  https://github.com/trycua/android-touch-test-app
- test_android_multitouch.py: download APK from GitHub Releases by default
  (latest release URL) instead of building from source
- CUA_ANDROID_TEST_APK can still be set to a local path to override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): implement cloud Android multitouch tests

Extract shared test logic into _MultitouchTests mixin so Local and Cloud
classes run identical assertions. Add cloud_android_sb session fixture that
spins up an ephemeral cloud Android VM, installs the TouchTest APK via
curl + pm install, and yields the ready sandbox.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): implement apk_install for cloud transport; simplify root escalation

- CloudTransport._apply_image_layers: applies apk_install/run layers after
  server is ready (curl + pm install on device)
- Replace transport._adb_cmd("root") with sb.shell.run("su root id") in
  local fixture for consistency with cloud
- Cloud fixture now uses Image.android("14").apk_install(url) same as local

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add multitouch_gesture server action; fix cloud multi-touch injection

Move MT Protocol B sendevent injection to a server-side `multitouch_gesture`
action so that `adb root` can be called before injecting events. This fixes
cloud Android VMs where `su root sendevent` runs silently but events are not
delivered to the app (likely SELinux blocking kernel input injection from the
su context).

Changes:
- computer-server: add `multitouch_gesture` to AndroidAutomationHandler — calls
  `adb root`, detects touch device + axis range via `getevent -p`, builds and
  runs MT Protocol B sendevent script as root adbd
- computer-server/main.py: register `multitouch_gesture` in handlers map
- mobile.py: `gesture()` now sends the `multitouch_gesture` action with
  structured JSON params instead of building a shell script client-side;
  remove `_build_two_finger_script` and MT Protocol B helpers (logic in server)
- adb.py: handle `multitouch_gesture` via `adb root` + sendevent (local path)
- tests: `test_true_multitouch_*` use `sb.mobile.gesture()` instead of manual
  sendevent scripts; remove `su root id` escalation from fixtures
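For reference, the sketch below shows one frame of the Linux multi-touch Protocol B sequence written as `sendevent` lines. It illustrates the protocol only and is not the server's `multitouch_gesture` code. The device path, the axis maxima (which the server reads from `getevent -p`), the helper names, and the choice of tracking id = slot are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Event types and codes from linux/input-event-codes.h
EV_SYN, EV_KEY, EV_ABS = 0, 1, 3
SYN_REPORT = 0
BTN_TOUCH = 0x14A
ABS_MT_SLOT = 0x2F
ABS_MT_POSITION_X = 0x35
ABS_MT_POSITION_Y = 0x36
ABS_MT_TRACKING_ID = 0x39


def scale(value: float, screen_size: int, axis_max: int) -> int:
    """Map a screen pixel coordinate onto the touch device's axis range [0, axis_max]."""
    return round(value * axis_max / max(screen_size - 1, 1))


def frame_commands(
    device: str,
    fingers: Sequence[Optional[Tuple[float, float]]],
    screen: Tuple[int, int],
    axis_max: Tuple[int, int],
    first_frame: bool = False,
) -> List[str]:
    """Build sendevent lines for one frame; fingers[i] is None when finger i lifts."""

    def ev(etype: int, code: int, value: int) -> str:
        return f"sendevent {device} {etype} {code} {value}"

    lines: List[str] = []
    for slot, pos in enumerate(fingers):
        lines.append(ev(EV_ABS, ABS_MT_SLOT, slot))
        if pos is None:
            lines.append(ev(EV_ABS, ABS_MT_TRACKING_ID, -1))  # release this contact
            continue
        if first_frame:
            # Simplification: reuse the slot index as the contact's tracking id.
            lines.append(ev(EV_ABS, ABS_MT_TRACKING_ID, slot))
        x, y = pos
        lines.append(ev(EV_ABS, ABS_MT_POSITION_X, scale(x, screen[0], axis_max[0])))
        lines.append(ev(EV_ABS, ABS_MT_POSITION_Y, scale(y, screen[1], axis_max[1])))
    if first_frame:
        lines.append(ev(EV_KEY, BTN_TOUCH, 1))
    if all(p is None for p in fingers):
        lines.append(ev(EV_KEY, BTN_TOUCH, 0))  # last finger lifted
    lines.append(ev(EV_SYN, SYN_REPORT, 0))  # commit the frame
    return lines
```

Joining the lines of every frame with newlines yields a script that can run in a single `adb shell` invocation. That is why the whole gesture has to execute as root adbd rather than as separate calls.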

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add _apply_image_layers to CloudTransport for apk_install support

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): add missing logger in android handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): fix duplicate logger definition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): replace sandbox()/close() with Sandbox.create/connect/ephemeral + disconnect/destroy

- Sandbox.create(image) — provision a persistent sandbox
- Sandbox.connect(name) — attach to an existing sandbox
- Sandbox.ephemeral(image) — async context manager, auto-destroys on exit
- Sandbox.disconnect() — drop connection, sandbox keeps running
- Sandbox.destroy() — disconnect + permanently delete
- Localhost.close() renamed to disconnect()
- sandbox() module-level function kept as deprecated shim
- Updated all tests, examples, conftest, agent docstring, and README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): add Localhost.connect() and make Sandbox.connect() dual-mode

- _ConnectResult supports both await and async with on connect()
- Sandbox.connect("name") works as plain await or context manager (disconnects on exit)
- Localhost.connect() mirrors the same pattern
- localhost() module-level function kept as deprecated shim
- conftest fixtures updated to use Localhost.connect()
- README updated
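The dual-mode `connect()` result is an object that is both awaitable and an async context manager. Below is a minimal, self-contained sketch of that pattern. `DualModeConnect`, `FakeSandbox`, and `connect` are hypothetical stand-ins, not the SDK's `_ConnectResult`.

```python
import asyncio
from typing import Awaitable, Callable, Generator, Optional


class FakeSandbox:
    """Hypothetical stand-in for a connected sandbox."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connected = True

    async def disconnect(self) -> None:
        self.connected = False


class DualModeConnect:
    """Supports both `sb = await connect(...)` and `async with connect(...) as sb:`."""

    def __init__(self, factory: Callable[[], Awaitable[FakeSandbox]]) -> None:
        self._factory = factory
        self._sb: Optional[FakeSandbox] = None

    def __await__(self) -> Generator[object, None, FakeSandbox]:
        # Plain await: hand back the sandbox and leave the connection open.
        return self._factory().__await__()

    async def __aenter__(self) -> FakeSandbox:
        self._sb = await self._factory()
        return self._sb

    async def __aexit__(self, *exc: object) -> None:
        # Context-manager mode: disconnect on exit (the sandbox keeps running).
        if self._sb is not None:
            await self._sb.disconnect()


def connect(name: str) -> DualModeConnect:
    async def _open() -> FakeSandbox:
        await asyncio.sleep(0)  # placeholder for the real handshake
        return FakeSandbox(name)

    return DualModeConnect(_open)
```

Because `__aexit__` returns `None`, exceptions raised inside the `async with` body still propagate after the disconnect.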

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): update README with new API and connect() dual-mode examples

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add JPEG screenshot support and Android RL fleet benchmark

computer-server: add format/quality params to screenshot() on all handlers
(android, linux, macos, windows, base). Defaults to PNG for backwards compat;
pass format="jpeg" to get ~5-10x smaller payloads for RL workloads.
The existing inspect.signature dispatch picks up the new params automatically.
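Signature-based dispatch can be illustrated as follows: the dispatcher forwards only the keyword arguments a handler declares. Under that scheme, adding `format`/`quality` to a handler is enough for them to be passed through, while older handlers keep working. This is a hypothetical sketch (`dispatch`, `legacy_screenshot`, and `screenshot` are illustrative names), not computer-server's code.

```python
import inspect
from typing import Any, Callable, Dict


def dispatch(handler: Callable[..., Any], params: Dict[str, Any]) -> Any:
    """Call handler with only the params its signature accepts."""
    sig = inspect.signature(handler)
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()):
        return handler(**params)  # handler takes **kwargs: pass everything
    filtered = {k: v for k, v in params.items() if k in sig.parameters}
    return handler(**filtered)


def legacy_screenshot() -> str:
    return "png"  # older handler without format/quality params


def screenshot(format: str = "png", quality: int = 95) -> str:
    return f"{format}@{quality}"
```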

cua-sandbox: thread format/quality through Transport.screenshot(),
HTTPTransport, CloudTransport, Screen interface, and Sandbox.screenshot()
so callers can do sb.screenshot(format="jpeg", quality=85).

tests: add android_rps_benchmark.py — provisions N Android sandboxes in
parallel and drives them at a target aggregate RPS with per-command latency
logging, p50/p95/p99 reporting, and PASS/FAIL verdict for RL infra validation.
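A nearest-rank percentile is one simple way to produce p50/p95/p99 figures. The benchmark script's exact method isn't stated, so treat this as a sketch under that assumption.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of data <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_report(latencies_ms: Sequence[float]) -> Dict[str, float]:
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```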

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): update default screenshot quality to 95

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add pwa_install — build & install TWA APK from a PWA manifest URL

- Image.pwa_install(manifest_url) — new Android-only chainable layer that uses
  Bubblewrap to generate a signed debug APK from a Web App Manifest URL and
  install it via adb
- _bw_init.js — Node.js helper that calls @bubblewrap/core directly to generate
  twa-manifest.json non-interactively (bypasses the interactive CLI)
- AndroidEmulatorRuntime._apply_layers: handle pwa_install layer (init → update
  → build → adb install); auto-creates debug keystore; passes passwords via env
  vars; caches built APKs by manifest URL hash
- transport/*: add format/quality params to all screenshot() implementations;
  add convert_screenshot() helper in base.py for png→jpeg conversion
- examples/pwa_install_test.py: end-to-end test — installs Starbucks PWA,
  resolves launcher activity dynamically, launches and screenshots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): refactor android benchmark to measure max RPS

Remove --target-rps / _TokenBucket / PASS-FAIL verdict; workers now loop
as fast as possible so the run measures achievable throughput. Add flush=True
globally for real-time log output, and use JPEG screenshots.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): validate screenshot magic bytes match requested format

Raise ValueError if the returned image magic bytes don't match the requested
format, e.g. requested 'jpeg' but got 'png' (magic bytes: 89504e47).
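A check along these lines compares the payload's first bytes with the standard PNG signature (`89 50 4E 47 0D 0A 1A 0A`) and the JPEG SOI marker (`FF D8 FF`). The function name `check_screenshot_format` is hypothetical; this is not the SDK's implementation.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"

_EXPECTED = {"png": PNG_MAGIC, "jpeg": JPEG_MAGIC, "jpg": JPEG_MAGIC}


def check_screenshot_format(data: bytes, requested: str) -> bytes:
    """Raise ValueError if data's magic bytes don't match the requested format."""
    fmt = requested.lower()
    if fmt not in _EXPECTED:
        raise ValueError(f"unsupported format: {requested!r}")
    if not data.startswith(_EXPECTED[fmt]):
        raise ValueError(
            f"requested {fmt!r} but got data starting with {data[:4].hex()}"
        )
    return data
```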

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): add local android benchmark using AndroidEmulatorRuntime

Mirror of android_rps_benchmark.py but uses local=True + AndroidEmulatorRuntime
for baremetal comparison against cloud.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add JPEG conversion to ADBTransport.screenshot

ADBTransport always returned PNG regardless of the format parameter.
Now converts to JPEG via Pillow when format='jpeg'/'jpg', matching
the behaviour of the server-side android handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): run ADB subprocess calls in thread executor

_adb_cmd was a synchronous subprocess.run that blocked the event loop,
preventing asyncio.sleep timers and task cancellation from firing on time.
Add _adb_cmd_async which runs _adb_cmd via loop.run_in_executor, and switch
screenshot, get_screen_size, and send to use it.
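The sketch below shows the same pattern with a generic command in place of `adb` (the helper names are hypothetical). The blocking `subprocess.run` call moves to the loop's default thread pool, so the event loop stays responsive.

```python
import asyncio
import subprocess
from typing import Sequence


def run_cmd(args: Sequence[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Blocking call: fine in a worker thread, harmful on the event loop."""
    return subprocess.run(list(args), capture_output=True, timeout=timeout, check=False)


async def run_cmd_async(args: Sequence[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run run_cmd in the default executor so timers and cancellation keep firing."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, run_cmd, args, timeout)
```

Cancelling the awaiting task does not stop the subprocess itself. The worker thread keeps running until the command exits or `timeout` expires.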

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): use raw RGBA screencap + simplejpeg for faster JPEG screenshots

Replace PNG screencap + PIL JPEG encode with raw RGBA screencap + simplejpeg
(libjpeg-turbo, fastdct=True). This skips the emulator-side PNG encode entirely
and uses a faster JPEG encoder on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): revert to PNG screencap, keep simplejpeg for host-side encode

Raw RGBA screencap transfers ~10MB over ADB vs ~1-2MB for PNG (emulator
compresses before sending). Revert to -p PNG screencap, but use simplejpeg
(libjpeg-turbo, fastdct) instead of PIL for the host-side JPEG encode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert(cua-sandbox): revert simplejpeg, back to PIL for JPEG encode

simplejpeg showed no measurable improvement over PIL (p50 507ms vs 519ms,
within noise). The bottleneck is ADB transfer (~400ms), not encode time.
PIL produces smaller output (219KB vs 305KB) due to 4:2:0 vs 4:4:4 subsampling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add GRPCEmulatorTransport for fast Android screenshots

The Android emulator's gRPC service (EmulatorController) bypasses ADB entirely,
reducing screenshot latency from ~500ms to ~50ms. Changes:

- Add GRPCEmulatorTransport using getScreenshot(RGB888) + PIL JPEG encode
- Generate protobuf stubs from emulator_controller.proto into transport/_grpc_emulator/
- AndroidEmulatorRuntime now launches with -grpc <port> and sets grpc_port in RuntimeInfo
- sandbox._create picks GRPCEmulatorTransport when grpc_port is set, else falls back to ADB
- Add grpcio>=1.60.0 to cua-sandbox dependencies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add protobuf dependency for gRPC emulator stubs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): fix gRPC stubs and increase max message size to 32MB

- Regenerate emulator_controller stubs with grpcio-tools/_proto include path
  to resolve 'google/protobuf/empty.proto not loaded' error
- Fix relative import in generated grpc stub (bare import → from . import)
- Increase gRPC channel max_receive/send_message_length to 32MB
  (RGB888 screenshot is ~6MB, exceeding the 4MB default)

Result: gRPC screenshot transport now fully functional.
Benchmark: 48.90 RPS / p50=20ms vs ADB baseline 1.80 RPS / p50=519ms (27x faster)
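The message-size numbers follow from RGB888 using 3 bytes per pixel. Assuming a 1080x1920 display (the commit doesn't state the resolution), one frame is about 6 MB, which exceeds gRPC's 4 MB default receive limit. The option keys are standard gRPC channel arguments; the helper names are hypothetical.

```python
from typing import List, Tuple

GRPC_DEFAULT_MAX_RECEIVE = 4 * 1024 * 1024  # gRPC's default receive limit (4 MiB)
MAX_MESSAGE = 32 * 1024 * 1024  # the 32MB limit from the commit


def rgb888_frame_bytes(width: int, height: int) -> int:
    """Raw RGB888 frame size: 3 bytes per pixel, no compression."""
    return width * height * 3


def channel_options(max_bytes: int = MAX_MESSAGE) -> List[Tuple[str, int]]:
    """Options suitable for grpc.insecure_channel(target, options=...)."""
    return [
        ("grpc.max_receive_message_length", max_bytes),
        ("grpc.max_send_message_length", max_bytes),
    ]


frame = rgb888_frame_bytes(1080, 1920)  # 6,220,800 bytes
```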

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(computer-server): note Android emulator gRPC interface and GRPCEmulatorTransport

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): implement touch/click and fix screen_size in GRPCEmulatorTransport

- send() now handles left_click, right_click, double_click, mouse_down, mouse_up
  via EmulatorController.sendTouch() (press + release TouchEvent pair)
- move_cursor is a no-op (no hover concept on Android)
- Fix get_screen_size(): was requesting 1x1 thumbnail which returned 1080x1;
  now requests full PNG so emulator returns native display dimensions
- Regenerate _grpc_emulator stubs with grpcio-tools/_proto include path

Benchmark (--action step = screen_size + tap + screenshot):
  42.2 RPS / p50=22ms / p95=32ms

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): full gRPC transport — multitouch, shell fallback, sync channel

- Switch grpc.aio → sync grpc channel + run_in_executor
  Avoids "Future attached to a different loop" in pytest session fixtures
- Add shell/run_command handler (ADB fallback via _find_adb)
- Add multitouch_gesture: interpolated N-finger sendTouch frames sent
  simultaneously per frame — passes all 17 multitouch tests
- Pass serial + sdk_root to GRPCEmulatorTransport from sandbox._create
- Regenerate _grpc_emulator stubs

All 17 TestAndroidMultitouchLocal tests pass.
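A `gesture(*finger_paths)` primitive can express a pinch as two paths moving toward or away from a center point. The transport then advances all fingers together, frame by frame. The sketch below shows that idea in plain Python. `pinch_paths` and `interpolate_frames` are hypothetical helpers, and linear interpolation with equal time per path segment is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pinch_paths(cx: float, cy: float, start_gap: float, end_gap: float) -> List[List[Point]]:
    """Two horizontal finger paths; end_gap < start_gap is a pinch-in."""
    return [
        [(cx - start_gap / 2, cy), (cx - end_gap / 2, cy)],
        [(cx + start_gap / 2, cy), (cx + end_gap / 2, cy)],
    ]


def _lerp(a: Point, b: Point, t: float) -> Point:
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def _position(path: Sequence[Point], t: float) -> Point:
    """Position along a polyline at normalized time t in [0, 1]."""
    if len(path) == 1:
        return path[0]
    segments = len(path) - 1
    s = min(int(t * segments), segments - 1)
    return _lerp(path[s], path[s + 1], t * segments - s)


def interpolate_frames(paths: Sequence[Sequence[Point]], steps: int) -> List[List[Point]]:
    """One entry per frame; each frame holds every finger's position at that instant."""
    if steps < 2:
        raise ValueError("need at least 2 steps")
    return [[_position(p, i / (steps - 1)) for p in paths] for i in range(steps)]
```

Sending each frame's positions together, rather than one finger after another, is what makes the touches simultaneous from the app's point of view.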

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): pin grpcio==1.78.0 and protobuf==6.31.1

Generated stubs require exact versions — grpcio-tools 1.78.0 was used to
regenerate and emulator_controller_pb2.py calls ValidateProtobufRuntimeVersion
with 6.31.1. Pinning eliminates stub regeneration on venv recreation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add agoda media type backward-compat aliases for ghcr.io images

Existing images on ghcr.io still use vnd.agoda.macosvz.* types. Keep them
as OCI_VM_{CONFIG,DISK,AUX}_LEGACY constants, include in VM_MEDIA_TYPES,
and match them in detect_format/detect_os so pulling those images still works.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): fix VNC backend, port, and pull ref for macos-tahoe-cua

- Change LUME_API_PORT from 8000 to 8443 (setup-cua.sh uses port 8443)
- Fix ConnectTimeout not caught in is_ready — was propagating immediately instead of retrying
- Fix pull payload: split full OCI ref (e.g. ghcr.io/trycua/img:tag) into registry/organization/image components to avoid lume API double-prefixing the org
- Install cua-computer-server[vnc] (includes vncdotool/twisted) in setup-cua.sh — required for VNC backend screenshots
- Add test_lume_macos_tahoe_cua test using Image.from_registry with LumeRuntime
- Replace vnd.agoda.macosvz media types with vnd.trycua.lume, keep legacy as backward-compat constants
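Splitting a full reference such as `ghcr.io/trycua/img:tag` into its parts might look like this. The helper is hypothetical and handles only `registry/organization/image[:tag]` refs with a single organization segment and no digest. That matches the example in the commit but not every OCI reference form.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    organization: str
    image: str
    tag: str


def split_ref(ref: str, default_tag: str = "latest") -> ImageRef:
    """Split 'registry/org/image[:tag]' so the org isn't prefixed twice downstream."""
    if "@" in ref:
        raise ValueError("digest references are not handled by this sketch")
    parts = ref.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected registry/organization/image[:tag], got {ref!r}")
    registry, organization, name = parts
    image, sep, tag = name.partition(":")  # only the last segment carries the tag
    return ImageRef(registry, organization, image, tag if sep else default_tag)
```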

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): auto-runtime, transport selection, macOS versions, error handling

- Fix local=True with no runtime not calling _auto_runtime — now auto-selects
  DockerRuntime/QEMURuntime/LumeRuntime/AndroidEmulatorRuntime/HyperVRuntime
- Fix transport selection preferring VNCTransport over HTTPTransport when both
  api_port and vnc_port are set (e.g. Docker containers, Lume VMs)
- Add MACOS_VERSION_IMAGES dict mapping version strings to OCI refs
  ("15"/"sequoia" → macos-sequoia-cua, "26"/"tahoe" → macos-tahoe-cua)
- Image.macos() now validates version and errors with supported list; default "26"
- LumeRuntime: handle async pull (ReadError on connection close), bump
  _wait_for_ip timeout to 3600s for large image pulls, use version map
- Add httpx.ReadError to is_ready exception handlers in docker/hyperv/lume
- Add auto-runtime tests (linux container, linux vm, macos, android, windows)
- Add cloud ephemeral tests (linux, android) and Sandbox.create persistent tests
- Fix test_macos_vm hardcoded api_port=18005 → LumeRuntime() with default port
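The version map and its validation can be sketched as follows. Only the version/codename-to-image pairs and the default of "26" come from the commit. `resolve_macos_image` is a hypothetical name, and the values are short image names because the commit doesn't spell out the full OCI refs.

```python
from typing import Dict

MACOS_VERSION_IMAGES: Dict[str, str] = {
    "15": "macos-sequoia-cua",
    "sequoia": "macos-sequoia-cua",
    "26": "macos-tahoe-cua",
    "tahoe": "macos-tahoe-cua",
}


def resolve_macos_image(version: str = "26") -> str:
    """Map a version number or codename to an image name, listing options on error."""
    key = version.strip().lower()
    try:
        return MACOS_VERSION_IMAGES[key]
    except KeyError:
        supported = ", ".join(sorted(MACOS_VERSION_IMAGES))
        raise ValueError(
            f"unsupported macOS version {version!r}; supported: {supported}"
        ) from None
```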

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): replace legacy computer SDK examples with Cua Sandbox SDK

- Remove all examples using the old computer/agent SDK imports
- Add 11 new pytest-compatible examples covering all supported runtimes:
  linux/macos/windows/android × local/cloud × container/vm
- Each example is both runnable (if __name__ == "__main__") and a pytest test
- Docstrings optimized for answer engine discoverability
- Wire examples/sandboxes/ into pytest testpaths in pyproject.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox-sdk): persistent sandboxes, auto-ports, pull progress, lume async pull

Python SDK:
- Add random two-word sandbox names (_random_name) instead of "cua-sandbox" fallback
- Add _find_free_port() to docker/qemu runtimes to avoid port conflicts
- Add AndroidEmulatorRuntime with list/stop support, wired into _list_local
- Parallelize cua sb ls across Docker/Lume/QEMU/Android runtimes
- Fix UnboundLocalError for conditional HTTPTransport import
- Fix sandbox name resolution after runtime start (resolved_name)
- Fix Android reconnect to use GRPCEmulatorTransport
- Fix cua sb delete to skip confirmation prompt in non-interactive mode
- Add sandbox_state.py with grpc_port/adb_serial/sdk_root params
- Suppress httpx/cua_sandbox INFO logs in CLI output
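The first two bullets can be sketched with the standard library. The word lists and the `adjective-noun` format are hypothetical. Binding to port 0 so the OS picks a free port is one common way to write a `_find_free_port()`, but not necessarily the SDK's approach.

```python
import random
import socket
from typing import Optional

ADJECTIVES = ["brave", "calm", "eager", "quiet", "swift"]  # hypothetical word list
NOUNS = ["falcon", "maple", "otter", "river", "summit"]  # hypothetical word list


def random_name(rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

The port is released when the socket closes, so another process could take it before the runtime binds it. Retrying on a bind failure covers that race.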

Lume:
- Add POST /lume/pull/start async endpoint (202 immediately, polls via GET /lume/vms/{name})
- Add PullProgressTracker actor tracking download % per VM name
- Add downloadProgress field to GET /lume/vms/{name} during pulls
- Fix setProgress to clear stale errors so retries work
- Add progressHandler to pullImage(), handlePull, and lume pull CLI
- Add setTotal() in pullOCI so progress % is accurate (was always 0%)
- Unify /lume/pull and /lume/pull/start to both use progressHandler
- Add diagnostic logging for OCI config/nvram layer parsing
- Fix _wait_for_ip to raise immediately if VM status is "stopped"
- Reduce _wait_for_ip timeout from 3600s to 300s
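The lume changes are not Python code. The sketch below is only a Python illustration of the tracking logic: a percentage is reported only once a total is known (without `setTotal()` it stays at 0%), and new progress clears a stale error so a retry starts clean. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class _Pull:
    downloaded: int = 0
    total: Optional[int] = None
    error: Optional[str] = None


@dataclass
class PullProgress:
    """Per-VM-name download progress."""

    pulls: Dict[str, _Pull] = field(default_factory=dict)

    def set_total(self, name: str, total_bytes: int) -> None:
        self.pulls.setdefault(name, _Pull()).total = total_bytes

    def add_bytes(self, name: str, n: int) -> None:
        pull = self.pulls.setdefault(name, _Pull())
        pull.downloaded += n
        pull.error = None  # fresh progress clears an error from a failed attempt

    def set_error(self, name: str, message: str) -> None:
        self.pulls.setdefault(name, _Pull()).error = message

    def percent(self, name: str) -> float:
        pull = self.pulls.get(name)
        if pull is None or not pull.total:
            return 0.0  # unknown total: no meaningful percentage yet
        return min(100.0, 100.0 * pull.downloaded / pull.total)
```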

Examples:
- Add examples/sandboxes-cli/ with CLI-based persistent sandbox tests
- Tests assert VM appears in cua sb ls --all after launch and disappears after delete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on sync pull fallback with helpful auth hint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x connection drop on sync pull — check VM exists after ReadError

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on /pull/start for lume v0.3.x compat

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): poll VM status after /pull/start connection drop (lume v0.3.x)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x compat — sync pull + connection drop

lume v0.3.4 doesn't have /pull/start (drops connection immediately)
and also drops the connection on /lume/pull when done. Fall back to
sync /pull, handle the ReadError by verifying VM was created, then
run the VM and return directly instead of falling through to the
async poll path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): find lume binary in ~/.local/bin when not on PATH

lume installs to ~/.local/bin which may not be in PATH for non-interactive
shells (e.g. SSH sessions, LaunchAgents). Fall back to checking the
common install location directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume,tests): redirect progress to stderr; add ~/.local/bin to PATH in tests

- lume.py: all pull progress prints go to sys.stderr so --json output
  is clean JSON on stdout (fixes JSONDecodeError in test_macos_local_vm)
- conftest.py: pytest_configure adds ~/.local/bin to PATH so cua/lume
  binaries installed there are found in non-interactive shells

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): pin macos-tahoe-cua to known-good sha256 digest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): wait for VNC readiness in is_ready(), not just HTTP /status

macOS VNC (Screen Sharing) starts after the HTTP computer-server, so
screenshot() fails immediately after launch. is_ready() now polls
POST /cmd screenshot until VNC accepts connections before returning.
Timeout extended to 180s to cover both phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): deliver VNC config to VM before is_ready check

lume v0.3.x doesn't push VNC port/password to the VM via VirtioFS,
so the computer-server uses a stale ~/.vnc.env from a previous run.
After _wait_for_ip, query the lume API for the current vncUrl, parse
port and password, write ~/.vnc.env via `lume ssh`, and restart the
computer-server LaunchAgent. This makes VNC available immediately.
Also reverts is_ready to HTTP-only check (no VNC phase needed).
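The port and password could be parsed from the lume API's `vncUrl` and rendered into a `~/.vnc.env` file as shown below. The commit specifies neither the URL shape (assumed here to be `vnc://:password@host:port`) nor the variable names in the env file, so both are assumptions, as are the helper names.

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit


def parse_vnc_url(vnc_url: str) -> Tuple[int, str]:
    """Extract (port, password) from a URL like vnc://:secret@127.0.0.1:5900."""
    parts = urlsplit(vnc_url)
    if parts.scheme != "vnc" or parts.port is None:
        raise ValueError(f"not a vnc:// URL with a port: {vnc_url!r}")
    return parts.port, unquote(parts.password or "")


def render_vnc_env(port: int, password: str) -> str:
    # Hypothetical variable names for the env file read by computer-server.
    return f"VNC_PORT={port}\nVNC_PASSWORD={password}\n"
```

If the file is written through `lume ssh`, the content should be shell-quoted (for example with `shlex.quote`), because passwords can contain shell metacharacters.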

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): use pkill to restart computer-server after VNC config update

launchctl kickstart -k fails silently from a non-GUI SSH session.
Kill the python computer_server process directly so launchd revives
it with the new ~/.vnc.env config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): actually delete VM on Sandbox.delete() instead of just stopping

_delete_local called LumeRuntime().suspend() which only stops the VM,
leaving it in lume's registry as 'stopped'. Add LumeRuntime.delete()
which stops then DELETEs via the lume API, and use it in _delete_local.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): use :latest tag for macos-tahoe-cua (lume v0.3.4 can't pull by digest)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): check Homebrew/MacPorts paths on macOS; improve error message

qemu-system-x86_64 may be installed to /opt/homebrew/bin (Apple Silicon)
or /usr/local/bin (Intel) or /opt/local/bin (MacPorts) without those dirs
being on PATH in subprocess envs. Check known locations before failing.
Error message now also mentions MacPorts as an alternative.
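The fallback search can be sketched with `shutil.which`, first on `PATH` and then in known install directories. The directory list mirrors the ones named above, and `find_binary` is a hypothetical helper. Passing `["~/.local/bin"]` covers the lume case from the earlier commit.

```python
import os
import shutil
from typing import Iterable, Optional

QEMU_DIRS = ("/opt/homebrew/bin", "/usr/local/bin", "/opt/local/bin")


def find_binary(name: str, extra_dirs: Iterable[str] = QEMU_DIRS) -> Optional[str]:
    """Return the binary's path from PATH, else from known install dirs, else None."""
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        found = shutil.which(name, path=os.path.expanduser(directory))
        if found:
            return found
    return None
```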

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): remove Windows-host-only guard from windows local VM test

QEMU is cross-platform; the test should run on any host where qemu-system-x86_64 is available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): fall back to bare-metal QEMU for Windows when Docker unavailable

When Docker is not installed or not running, and the image is a Windows VM,
use bare-metal QEMU mode instead of failing with "Docker is not installed".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(bench): add --provision/--continue/--delete modes to android benchmark

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): add pycdlib as a required dependency

pycdlib is used by the Windows ISO builder (windows_unattend.py) to create
the unattended install ISO. Without it, bare-metal Windows VM creation fails
with ModuleNotFoundError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): find OVMF firmware in Homebrew's share/qemu/ layout

When QEMU is installed via Homebrew, the binary is at /opt/homebrew/bin/qemu-system-x86_64
but firmware files are at /opt/homebrew/share/qemu/. The previous search only looked
in <bin_dir>/share/ which doesn't exist. Add <bin_dir>/../share/qemu/ to the search path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): increase bare-metal boot timeout to 600s for Windows/Android

Windows and Android VMs need 3-10 minutes to boot. The previous 120s default
was causing launch to time out before the OS was ready.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(benchmark): add provision resume + lower default parallel to 4

--provision now reads the existing state file and only provisions the
remaining sandboxes to reach --sandboxes N, appending new names.
Default --parallel lowered to 4 (fewer concurrent provisions
to reduce kopf event-loop overload at scale).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): add OrbStack and Homebrew to PATH in conftest

Ensures docker (OrbStack) and qemu (Homebrew) are found in subprocess calls
during pytest collection and test execution on macOS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): use Sandbox.connect(name=) for --continue reconnect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(windows): skip test on macOS ARM; validate cached base image size

- Skip Windows local VM test on macOS Apple Silicon: x86_64 Windows via
  QEMU TCG (no hardware accel) would take hours to install and boot.
- Add minimum size check in ensure_base_image to detect and rebuild
  incomplete/corrupt base images left behind by failed builds.
- Remove unused QEMUBaremetalRuntime assignment in _build_windows_base.
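The minimum-size check can be sketched as a single `stat` call. The commit doesn't say what minimum the SDK uses, so `min_bytes` is a hypothetical parameter here, as are both function names.

```python
from pathlib import Path
from typing import Callable


def base_image_is_usable(path: Path, min_bytes: int) -> bool:
    """False if the cached image is missing or smaller than min_bytes (likely a failed build)."""
    try:
        return path.stat().st_size >= min_bytes
    except FileNotFoundError:
        return False


def ensure_base_image(path: Path, min_bytes: int, build: Callable[[Path], object]) -> Path:
    """Rebuild when the cached image is missing or truncated; build(path) writes the image."""
    if not base_image_is_usable(path, min_bytes):
        path.unlink(missing_ok=True)  # drop the partial leftover, if any
        build(path)
    return path
```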

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cloud-transport): fail fast on 4xx in _wait_for_server_ready + add debug logging
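A readiness poll can fail fast on 4xx. Client errors such as bad credentials or an unknown sandbox won't resolve on their own, so they raise immediately. Responses in the 5xx range and connection errors are retried until a deadline. The function and its `get_status` callable are hypothetical, not CloudTransport's code. The 600 s default follows the 10-minute cloud timeout mentioned earlier.

```python
import time
from typing import Callable, Optional


class ServerNotReady(Exception):
    pass


def wait_for_server_ready(
    get_status: Callable[[], int],
    timeout: float = 600.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() (an HTTP status code) until 200; raise at once on 4xx."""
    deadline = time.monotonic() + timeout
    last: Optional[str] = None
    while time.monotonic() < deadline:
        try:
            status = get_status()
        except ConnectionError as exc:  # server not listening yet: retry
            last = repr(exc)
        else:
            if status == 200:
                return
            if 400 <= status < 500:
                raise ServerNotReady(f"client error {status}; not retrying")
            last = f"status {status}"
        sleep(interval)
    raise ServerNotReady(f"timed out; last result: {last}")
```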

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): skip 401 sandboxes in --continue, reconnect concurrently

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): --continue no longer deletes sandboxes, use --delete explicitly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): fix --delete to use CloudTransport instead of broken Sandbox(name=)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(grpc-emulator): pre-register empty_pb2 to fix protobuf 6.x descriptor load

AddSerializedFile fails on protobuf 6.33+ if google/protobuf/empty.proto
hasn't been loaded yet. Import empty_pb2 before the serialized file to
pre-register it in the descriptor pool.

Also add demo/ scripts for fleet throughput and ephemeral F-Droid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace Computer SDK references with Sandbox SDK throughout

- README.md: update packages table and hero code example to use cua-sandbox
- quickstart.mdx: install cua-sandbox instead of cua-computer; update hello/agent examples
- using-computer-sdk.mdx → using-sandbox-sdk.mdx: new doc with Sandbox SDK API
- using-agent-sdk.mdx: update Python examples to use Sandbox instead of Computer
- reference/sandbox-sdk/: new reference page for cua-sandbox API
- reference/meta.json + get-started/meta.json: update nav to sandbox-sdk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(readme): unified API example + platform support matrix

* docs(readme): replace iOS with BYOI (.qcow2, .iso) in platform matrix

* docs(readme): move Cua SDK section above CuaBot

* docs(readme): new header + add sb.mobile.gesture() to example

* feat(sandbox): add sb.tunnel.forward() port-forwarding interface

Adds Tunnel interface with forward() supporting ADB (Android), gRPC
emulator, and SSH transports. Includes CDP-over-ADB test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): gate android tests on Java only, not pre-installed SDK

SDK auto-installs on first run; only Java is a hard prereq.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(android): check java returncode in _java_env() — macOS stub exits non-zero

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tunnel): support abstract socket forwarding for Chrome DevTools on Android

Use adb forward tcp:0 localabstract:chrome_devtools_remote instead of tcp:9222.
Update test to use socket name and tunnel.port for all CDP URLs.
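With `tcp:0`, `adb forward` chooses a free host port and prints it on stdout. That is why the test reads the port from the tunnel rather than assuming 9222. The sketch below builds the command and parses its output; the helper names and the serial are hypothetical. `/json/version` is a standard Chrome DevTools Protocol HTTP endpoint.

```python
from typing import List


def adb_forward_cmd(serial: str, remote: str = "localabstract:chrome_devtools_remote") -> List[str]:
    """tcp:0 asks adb to pick a free local port."""
    return ["adb", "-s", serial, "forward", "tcp:0", remote]


def parse_forwarded_port(stdout: str) -> int:
    """adb prints the allocated port when the local spec is tcp:0."""
    port = int(stdout.strip())
    if not 0 < port < 65536:
        raise ValueError(f"bad port from adb: {stdout!r}")
    return port


def cdp_version_url(port: int) -> str:
    return f"http://127.0.0.1:{port}/json/version"
```

In practice the output would come from something like `subprocess.run(adb_forward_cmd("emulator-5554"), capture_output=True, text=True, check=True).stdout`.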

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): add gym-pwa end-to-end Android test with CDP bonus

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): disable Chrome FRE before launching gym-pwa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(sandbox-sdk): fill documentation gaps vs Modal-style DX

- Add Sandbox section to guide: lifecycle, images, secrets, scale-out
- Add full sub-interface reference (shell, mouse, keyboard, screen,
  clipboard, tunnel, mobile, terminal, window, Localhost)
- Add migration guide from cua-computer to cua-sandbox
- Deprecate Computer SDK page with red callout + migration link
- Update quickstart with local Docker no-account path
- Update what-is-cua to reference Sandbox SDK instead of Computer Framework
- Wire all new pages into nav meta.json files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): clear Chrome data to bypass first-run wizard on emulator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(pwa_install): accept keystore param, auto-configure bubblewrap, return fingerprint

- Image.android().pwa_install() now accepts keystore, keystore_alias,
  keystore_password params — pass the keystore bundled in your PWA repo
  for deterministic fingerprints baked into assetlinks.json
- _build_pwa_apk auto-installs @bubblewrap/cli via npm if not on PATH
- _build_pwa_apk auto-writes ~/.bubblewrap/config.json from known JDK/SDK
  paths — no manual interactive setup required
- Returns (apk_path, sha256_fingerprint) tuple
- _bw_init.js accepts keystore path/alias/password as positional args
- Remove get_pwa_keystore_fingerprint (keystore in repo is the pattern)
- test_android_local_gym_pwa uses Sandbox.ephemeral + pwa_install with
  the committed android.keystore; launches TWA app instead of Chrome
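Digital Asset Links (`assetlinks.json`) list the signing certificate's SHA-256 fingerprint as colon-separated uppercase hex. The sketch below formats that fingerprint from DER-encoded certificate bytes and builds the matching entry. Extracting the certificate from the keystore is out of scope here, and the package name is a hypothetical placeholder.

```python
import hashlib


def sha256_cert_fingerprint(cert_der: bytes) -> str:
    """Format a certificate's SHA-256 digest as AA:BB:... for assetlinks.json."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i : i + 2] for i in range(0, len(digest), 2))


def assetlinks_entry(package_name: str, fingerprint: str) -> dict:
    return {
        "relation": ["delegate_permission/common.handle_all_urls"],
        "target": {
            "namespace": "android_app",
            "package_name": package_name,
            "sha256_cert_fingerprints": [fingerprint],
        },
    }
```

A keystore committed to the repo is what keeps this fingerprint stable across builds, so the published `assetlinks.json` keeps matching the installed TWA.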

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): rewrite quickstart to fix broken ephemeral/CLI flow

- Remove pre-create sandbox step — Sandbox.ephemeral manages its own lifecycle
- Remove outdated cua sandbox create --os/--size CLI usage
- Add local Docker path (no account needed) as primary hello world
- Fix VNC step to use Sandbox.create so sandbox is alive to open
- Clean up CLI reference to only show commands that are correct

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): update CLI reference to real cua-cli sandbox commands

- Rewrite cli/commands.mdx with actual cua sb launch <image> syntax
  (image as positional arg, --cpu/--memory/--disk/--region as options)
- Document all image shorthands (macos, ubuntu:24.04, windows, android)
- Fix quickstart VNC/cleanup steps to use cua sb vnc / cua sb ls / cua sb delete
- Fix using-sandbox-sdk.mdx CLI comment to show correct launch syntax
- Remove libs/python/cli (old mock CLI replaced by cua-cli)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): use Modal-hosted gym-pwa, fix 10.0.2.2 manifest fetch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples): add sandboxes section from examples/sandboxes/

One page per OS family (linux, macos, windows, android, custom-images),
each showing cloud + local variants with runnable code.

Every code block carries a `# source:` comment pointing to the corresponding
test file in examples/sandboxes/ so a future CI workflow can verify that
every doc example has a live test case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): auto-install Android build-tools required by bubblewrap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(what-is-cua): rewrite as concise Modal-style intro with full example

- Lead with a complete sandbox + agent snippet instead of graphics/diagrams
- Show the full API surface inline (shell, screenshot, mouse, keyboard, mobile, tunnel)
- Show the image builder pattern
- Remove ASCII diagrams and redundant explanation prose
- Keep use cases and next-steps links

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(quickstart): use sb.get_display_url(share=True) for live view

Replace persistent sandbox + CLI vnc with get_display_url(share=True)
inline in the agent script — simpler, no CLI needed, works with ephemeral.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: remove self-hosted sandboxes page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: rename Fundamentals section to Agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): prefer Java 21, fix jdkPath bundle format, create tools/ stub

- Auto-detect openjdk@21 (Gradle 8.x requires Java ≤ 21; openjdk@25 breaks)
- bubblewrap jdkPath must be .jdk bundle root (it appends Contents/Home)
- JAVA_HOME for gradle resolves to Contents/Home from the bundle
- Create sdk/tools/ stub so bubblewrap SDK validation passes
- Install build-tools;34.0.0 if missing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): suppress Chrome FRE via set-debug-app + command-line flags

After installing the TWA APK, use adb to:
1. am set-debug-app --persistent com.android.chrome (enables flag file)
2. Write chrome-command-line with --no-first-run --disable-fre

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): don't override startUrl with manifest path in _bw_init.js

twa.startUrl should come from the Web App Manifest's start_url field,
not from the manifest file URL's pathname (which was /manifest.json).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): add android local gym-pwa e2e test

End-to-end test for the gym-pwa PWA running as a TWA on a local Android
emulator. Uses the Modal-hosted gym at cuaai--todo-gym-web.modal.run.

Flow:
- POST /api/gym/start/add_item → fresh session + task prompt
- Launch TWA, warm-up, re-launch to pick up session
- Agent taps input, types "Buy groceries", taps Add
- GET /api/gym/evaluate (x-session-id header) → reward == 1.0
- CDP verification: query li span text via Chrome DevTools Remote

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): gym-pwa test uses ?session= URL + bgColor for session isolation

- POST /api/gym/start with bgColor; get back sessionId
- CDP Page.navigate to /?session=<id>&bg=<color> after TWA warm-up
- All API calls pass x-session-id header; no shared server state needed
- Pre-agent screenshot saved to /tmp/gym_pwa_pre_agent.png

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(examples): use lighter bg color for gym-pwa test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: expand pwa_install docs with full params, signing flow, and requirements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): auto-generate sandbox SDK reference from source

Add cua-sandbox to SDK_CONFIGS in python-sdk.ts generator so the
reference page is generated from docstrings via griffe, matching the
format of computer-sdk and agent-sdk reference pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): replace cua-computer imports with cua-sandbox across guide and examples

Update all code blocks referencing the deprecated cua-computer SDK to
use cua-sandbox equivalents (Sandbox, Image) across guide, examples,
and reference pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): move interactive-shell + add tunneling to Sandbox section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(examples): move test_android_local_gym_pwa to examples/sandboxes/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: add cua-sandbox to bump/publish pipeline

- Add .bumpversion.cfg for sandbox-v* tag format
- Add cd-py-sandbox.yml workflow triggered by sandbox-v* tags
- Add pypi/sandbox option to release-bump-version.yml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-25 17:53:18 -07:00

9.4 KiB


Build, benchmark, and deploy agents that use computers

Choose Your Path

Cua - Agent-Ready Sandboxes for Any OS

Build agents that see screens, click buttons, and complete tasks autonomously. One API for any VM or container image — cloud or local.

# Requires Python 3.12 or 3.13
import asyncio

from cua_sandbox import Sandbox, Image


async def main():
    # Same API regardless of OS or runtime
    async with Sandbox.ephemeral(Image.linux()) as sb:   # or .macos() .windows() .android()
        result = await sb.shell.run("echo hello")
        screenshot = await sb.screenshot()
        await sb.mouse.click(100, 200)
        await sb.keyboard.type("Hello from Cua!")
        await sb.mobile.gesture((100, 500), (100, 200))  # multi-touch gestures


# async with must run inside a coroutine when used from a script
asyncio.run(main())
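Because every image exposes the same sb interface, agent logic can be written once against that interface and unit-tested without booting a VM. The sketch below illustrates this under stated assumptions: FakeSandbox and fill_text_field are hypothetical names, not part of cua-sandbox, and the fake mimics only the awaitable mouse.click(x, y) and keyboard.type(text) calls used in the example above.

```python
import asyncio


class _FakeMouse:
    def __init__(self, log: list) -> None:
        self._log = log

    async def click(self, x: int, y: int) -> None:
        # Record the click instead of moving a real pointer
        self._log.append(("click", x, y))


class _FakeKeyboard:
    def __init__(self, log: list) -> None:
        self._log = log

    async def type(self, text: str) -> None:
        # Record the text instead of sending real key events
        self._log.append(("type", text))


class FakeSandbox:
    """Hypothetical in-memory stand-in that mimics only sb.mouse and sb.keyboard."""

    def __init__(self) -> None:
        self.log: list = []
        self.mouse = _FakeMouse(self.log)
        self.keyboard = _FakeKeyboard(self.log)


async def fill_text_field(sb, x: int, y: int, text: str) -> None:
    """Click at (x, y), then type text; accepts any object shaped like sb."""
    await sb.mouse.click(x, y)
    await sb.keyboard.type(text)


async def main() -> list:
    sb = FakeSandbox()
    await fill_text_field(sb, 100, 200, "Hello from Cua!")
    return sb.log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Against a real sandbox, the same helper would be awaited inside the async with Sandbox.ephemeral(...) as sb: block as await fill_text_field(sb, 100, 200, "Hello from Cua!"), using the same arguments as the snippet above.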

Runtime	Linux container	Linux VM	macOS	Windows	Android	Bring your own image (.qcow2, .iso)
Cloud (cua.ai)	✅	✅	✅	✅	✅	🔜 soon
Local (QEMU)	✅	✅	✅	✅	✅	✅

Get Started | Examples | API Reference

CuaBot - Co-op computer-use for any agent

cuabot gives any coding agent a seamless sandbox for computer-use. Individual sandbox windows appear natively on your desktop, with H.265 video streaming, a shared clipboard, and audio.

npx cuabot                 # First-run setup and onboarding

# Run any agent in a sandbox
cuabot claude              # Claude Code
cuabot openclaw            # OpenClaw in the sandbox

# Run any GUI workflow in a sandbox
cuabot chromium
cuabot --screenshot
cuabot --type "hello"
cuabot --click <x> <y> [button]

Includes out-of-the-box support for agent-browser and agent-device (iOS, Android).
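The cuabot GUI flags shown above can also be driven from a script. A minimal sketch, assuming cuabot is on PATH and that --click <x> <y> [button], --type and --screenshot behave exactly as listed; the helper names are hypothetical, the button value is passed through verbatim, and what --screenshot writes or prints is not specified here.

```python
import shlex
import subprocess
from typing import Optional


def click_cmd(x: int, y: int, button: Optional[str] = None) -> list[str]:
    # Mirrors: cuabot --click <x> <y> [button]; button is passed through verbatim
    cmd = ["cuabot", "--click", str(x), str(y)]
    if button is not None:
        cmd.append(button)
    return cmd


def type_cmd(text: str) -> list[str]:
    # Mirrors: cuabot --type "hello"; an argv list needs no shell quoting
    return ["cuabot", "--type", text]


def screenshot_cmd() -> list[str]:
    # Mirrors: cuabot --screenshot
    return ["cuabot", "--screenshot"]


def run_steps(steps: list[list[str]], execute: bool = False) -> list[str]:
    """Return each step as a shell-quoted string; if execute is True, also run
    the steps in order (check=True raises on the first non-zero exit)."""
    if execute:
        for step in steps:
            subprocess.run(step, check=True)
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    plan = [click_cmd(100, 200), type_cmd("hello"), screenshot_cmd()]
    for line in run_steps(plan):  # dry run: prints the commands only
        print(line)
```

Building argv lists rather than shell strings keeps typed text with spaces or quotes intact; the dry-run default lets you review a plan before it touches the sandbox.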

Get Started | Installation | First spotted at ClawCon

Cua-Bench - Benchmarks & RL Environments

Evaluate computer-use agents on OSWorld, ScreenSpot, Windows Agent Arena, and custom tasks. Export trajectories for training.

# Install and create base image
cd cua-bench
uv tool install -e . && cb image create linux-docker

# Run benchmark with agent
cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4
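To sweep several datasets, the cb run dataset invocation above can be generated programmatically. A sketch that relies only on the flags shown (--agent, --max-parallel); build_cb_run and run_all are hypothetical helpers, and any dataset path other than datasets/cua-bench-basic would be a placeholder.

```python
import subprocess


def build_cb_run(dataset: str, agent: str = "cua-agent", max_parallel: int = 4) -> list[str]:
    # Mirrors: cb run dataset <dataset> --agent <agent> --max-parallel <n>
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [
        "cb", "run", "dataset", dataset,
        "--agent", agent,
        "--max-parallel", str(max_parallel),
    ]


def run_all(datasets: list[str], execute: bool = False, **options) -> list[list[str]]:
    """Build one cb run command per dataset; run them one after another only
    when execute is True (check=True stops at the first failing run)."""
    commands = [build_cb_run(d, **options) for d in datasets]
    if execute:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_all(["datasets/cua-bench-basic"]):  # dry run
        print(" ".join(cmd))
```

Runs are sequential here on purpose, so each invocation's own --max-parallel value remains the only bound on concurrency.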

Get Started | Partner With Us | Registry | CLI Reference

Lume - macOS Virtualization

Create and manage macOS/Linux VMs with near-native performance on Apple Silicon using Apple's Virtualization.framework.

# Install Lume
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"

# Pull & start a macOS VM
lume run macos-sequoia-vanilla:latest

Get Started | FAQ | CLI Reference

Packages

Package	Description
cuabot	Multi-agent computer-use sandbox CLI
cua-agent	AI agent framework for computer-use tasks
cua-sandbox	SDK for creating and controlling sandboxes
cua-computer-server	Driver for UI interactions and code execution in sandboxes
cua-bench	Benchmarks and RL environments for computer-use
lume	macOS/Linux VM management on Apple Silicon
lumier	Docker-compatible interface for Lume VMs

Resources

Documentation — Guides, examples, and API reference
Blog — Tutorials, updates, and research
Discord — Community support and discussions
GitHub Issues — Bug reports and feature requests

Contributing

We welcome contributions! See our Contributing Guidelines for details.

License

MIT License — see LICENSE for details.

Third-party components have their own licenses:

Kasm (MIT)
OmniParser (CC-BY-4.0)
Optional cua-agent[omni] includes ultralytics (AGPL-3.0)

Trademarks

Apple, macOS, Ubuntu, Canonical, and Microsoft are trademarks of their respective owners. This project is not affiliated with or endorsed by these companies.

Thank you to all our GitHub Sponsors!
