Add GIF for demo

This commit is contained in:
Sarina Li
2025-11-19 17:37:06 -05:00
parent 54c1ba22c0
commit c9751302dd
28 changed files with 233 additions and 103 deletions


@@ -4,7 +4,6 @@ If you've been building computer-use agents, you know the reality: every model p
Today we're launching the **Cua VLM Router**: a managed inference API that gives you unified access to multiple vision-language model providers through a single API key. We're starting with Anthropic's Claude models (Sonnet 4.5 and Haiku 4.5), some of the most loved and widely used computer-use models in the Cua ecosystem, with more providers coming soon.
![Cua VLM Router Banner](https://github.com/user-attachments/assets/1b978f62-2cae-4cf7-932a-55ac8c8f2e06)
## What You Get
@@ -12,21 +11,25 @@ Today we're launching the **Cua VLM Router**: a managed inference API that gives
The Cua VLM Router handles the infrastructure so you can focus on building:
**Single API Key**
- One key for all model providers (no juggling multiple credentials)
- Works for both model inference and sandbox access
- Manage everything from one dashboard at cua.ai
**Smart Routing**
- Automatic provider selection for optimal availability and performance
- For Anthropic models, we route to the best provider (Anthropic, AWS Bedrock, or Microsoft Foundry)
- No configuration needed—just specify the model and we handle the rest
**Cost Tracking & Optimization**
- Unified usage dashboard across all models
- Real-time credit balance tracking
- Detailed cost breakdown per request (gateway cost + upstream cost)
**Production-Ready**
- OpenAI-compatible API (drop-in replacement for existing code)
- Full streaming support with Server-Sent Events
- Metadata about routing decisions in every response
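Because the API is OpenAI-compatible, an existing chat-completions payload works unchanged; a minimal sketch of the request shape (the model string comes from this post, while `build_chat_request` and its defaults are illustrative):

```python
def build_chat_request(model, prompt, max_tokens=100, stream=False):
    """Assemble an OpenAI-compatible chat completion payload for the router."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": stream,
    }

# Point any OpenAI-compatible client at the router with this payload:
payload = build_chat_request("cua/anthropic/claude-sonnet-4.5", "Hello!")
```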
@@ -35,10 +38,10 @@ The Cua VLM Router handles the infrastructure so you can focus on building:
We're starting with Anthropic's latest Claude models:
| Model | Best For |
| --------------------------------- | ---------------------------------- |
| `cua/anthropic/claude-sonnet-4.5` | General-purpose tasks, recommended |
| `cua/anthropic/claude-haiku-4.5` | Fast responses, cost-effective |
## How It Works
@@ -85,12 +88,14 @@ async for result in agent.run(messages):
Already using Anthropic directly? Just add the `cua/` prefix:
**Before:**
```python
# In your shell: export ANTHROPIC_API_KEY="sk-ant-..."
agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929")
```
**After:**
```python
# In your shell: export CUA_API_KEY="sk_cua-api01_..."
agent = ComputerAgent(model="cua/anthropic/claude-sonnet-4.5")


@@ -11,11 +11,13 @@ Today we're launching the **Cua CLI**: a command-line interface that brings the
The Cua CLI handles everything you need to work with Cloud Sandboxes:
**Authentication**
- Browser-based OAuth login with automatic credential storage
- Direct API key support for CI/CD pipelines
- Export credentials to `.env` files for SDK integration
**Sandbox Management**
- Create sandboxes with your choice of OS, size, and region
- List all your sandboxes with status and connection details
- Start, stop, restart, and delete sandboxes
@@ -123,17 +125,20 @@ await computer.run()
Create sandboxes in the size and region that fits your needs:
**Sizes:**
- `small` - 2 cores, 8 GB RAM, 128 GB SSD
- `medium` - 4 cores, 16 GB RAM, 128 GB SSD
- `large` - 8 cores, 32 GB RAM, 256 GB SSD
**Regions:**
- `north-america`
- `europe`
- `asia-pacific`
- `south-america`
**OS Options:**
- `linux` - Ubuntu with XFCE desktop
- `windows` - Windows 11 with Edge and Python
- `macos` - macOS (preview access)
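Those option sets are easy to capture client-side, for example to validate a request before calling `cua sb create`; a sketch in which the lookup tables simply restate the lists above:

```python
# Specs mirror the size list above: (cores, RAM in GB, SSD in GB).
SIZES = {"small": (2, 8, 128), "medium": (4, 16, 128), "large": (8, 32, 256)}
REGIONS = {"north-america", "europe", "asia-pacific", "south-america"}
OSES = {"linux", "windows", "macos"}

def is_valid_config(os_name, size, region):
    """Check a sandbox request against the documented option sets."""
    return os_name in OSES and size in SIZES and region in REGIONS

ok = is_valid_config("linux", "small", "north-america")   # True
bad = is_valid_config("linux", "xlarge", "europe")        # False: unknown size
```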
@@ -141,6 +146,7 @@ Create sandboxes in the size and region that fits your needs:
## Example Workflows
**Quick Testing Environment**
```bash
# Spin up a sandbox, test something, tear it down
cua sb create --os linux --size small --region north-america
@@ -149,6 +155,7 @@ cua sb delete my-sandbox-abc123
```
**Persistent Development Sandbox**
```bash
# Create a sandbox for long-term use
cua sb create --os linux --size medium --region north-america
@@ -221,11 +228,13 @@ Yes. The CLI and dashboard share the same API. Any sandbox you create in the das
<summary><strong>How do I update the CLI?</strong></summary>
If you installed via script:
```bash
curl -LsSf https://cua.ai/cli/install.sh | sh
```
If you installed via npm:
```bash
npm install -g @trycua/cli@latest
```
@@ -235,6 +244,7 @@ npm install -g @trycua/cli@latest
## What's Next
We're actively iterating based on feedback. Planned features include:
- SSH key management for secure sandbox access
- Template-based sandbox creation
- Batch operations (start/stop multiple sandboxes)


@@ -4,7 +4,11 @@ description: Supported computer-using agent loops and models
---
<Callout>
A corresponding{' '}
<a href="https://github.com/trycua/cua/blob/main/notebooks/agent_nb.ipynb" target="_blank">
Jupyter Notebook
</a>{' '}
is available for this documentation.
</Callout>
An agent can be thought of as a loop - it generates actions, executes them, and repeats until done:
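That loop can be sketched in a few lines; the `model` and `execute` callables here are hypothetical stand-ins for illustration, not Cua APIs:

```python
def run_agent_loop(model, execute, max_steps=20):
    """Generate an action, execute it, repeat until the model signals done."""
    history = []
    for _ in range(max_steps):
        action = model(history)   # e.g. a VLM call proposing the next action
        if action == "done":
            break
        execute(action)           # e.g. a click or keypress on the computer
        history.append(action)
    return history

# Scripted stand-in model, purely for illustration:
script = iter(["click login", "type password", "done"])
actions = run_agent_loop(lambda history: next(script), lambda a: None)
# actions == ["click login", "type password"]
```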


@@ -3,7 +3,14 @@ title: Customize ComputerAgent
---
<Callout>
A corresponding{' '}
<a
href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb"
target="_blank"
>
Jupyter Notebook
</a>{' '}
is available for this documentation.
</Callout>
The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration and a powerful framework for extending and building your own agentic systems.


@@ -4,7 +4,11 @@ description: Use ComputerAgent with HUD for benchmarking and evaluation
---
<Callout>
A corresponding{' '}
<a href="https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb" target="_blank">
Jupyter Notebook
</a>{' '}
is available for this documentation.
</Callout>
The HUD integration allows an agent to be benchmarked using the [HUD framework](https://www.hud.so/). Through the HUD integration, the agent controls a computer inside HUD, where tests are run to evaluate the success of each task.


@@ -59,4 +59,8 @@ you will see all the agent execution steps, including computer actions, LLM call
For each step, you will see the LLM call and the computer action. Computer actions are highlighted in yellow in the timeline.
<img
src="/docs/img/laminar_trace_example.png"
alt="Example trace in Laminar showing the litellm.response span and its output."
width="800px"
/>


@@ -55,10 +55,10 @@ async for result in agent.run(messages):
The CUA VLM Router currently supports these models:
| Model ID | Provider | Description | Best For |
| --------------------------------- | --------- | ----------------- | ---------------------------------- |
| `cua/anthropic/claude-sonnet-4.5` | Anthropic | Claude Sonnet 4.5 | General-purpose tasks, recommended |
| `cua/anthropic/claude-haiku-4.5` | Anthropic | Claude Haiku 4.5 | Fast responses, cost-effective |
## How It Works
@@ -95,6 +95,7 @@ GET /v1/models
```
**Response:**
```json
{
"data": [
@@ -117,12 +118,11 @@ Content-Type: application/json
```
**Request:**
```json
{
"model": "anthropic/claude-sonnet-4.5",
"messages": [{ "role": "user", "content": "Hello!" }],
"max_tokens": 100,
"temperature": 0.7,
"stream": false
@@ -130,20 +130,23 @@ Content-Type: application/json
```
**Response:**
```json
{
"id": "gen_...",
"object": "chat.completion",
"created": 1763554838,
"model": "anthropic/claude-sonnet-4.5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 12,
@@ -170,6 +173,7 @@ curl -X POST https://inference.cua.ai/v1/chat/completions \
```
**Response (SSE format):**
```
data: {"id":"gen_...","choices":[{"delta":{"content":"1"}}],"object":"chat.completion.chunk"}
@@ -187,6 +191,7 @@ GET /v1/balance
```
**Response:**
```json
{
"balance": 211689.85,
@@ -201,6 +206,7 @@ CUA VLM Router provides detailed cost information in every response:
### Credit System
Requests are billed in **credits**:
- Credits are deducted from your CUA account balance
- Prices vary by model and usage
- CUA manages all provider API keys and infrastructure
@@ -210,8 +216,8 @@ Requests are billed in **credits**:
```json
{
"usage": {
"cost": 0.01, // CUA gateway cost in credits
"market_cost": 0.000065 // Actual upstream API cost
}
}
```
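With both fields present on every response, aggregate spend can be tracked client-side; a minimal sketch (the sample response dicts below are made up):

```python
def total_gateway_cost(responses):
    """Sum the CUA gateway cost (in credits) over a list of response dicts."""
    return sum(r["usage"]["cost"] for r in responses)

spent = total_gateway_cost([
    {"usage": {"cost": 0.01, "market_cost": 0.000065}},
    {"usage": {"cost": 0.02, "market_cost": 0.00013}},
])
# spent is 0.03 credits across the two (fabricated) responses
```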
@@ -251,19 +257,20 @@ agent = ComputerAgent(
## Benefits Over Direct Provider Access
| Feature | CUA VLM Router | Direct Provider (BYOK) |
| -------------------------- | ---------------------------- | --------------------------------- |
| **Single API Key** | ✅ One key for all providers | ❌ Multiple keys to manage |
| **Managed Infrastructure** | ✅ No API key management | ❌ Manage multiple provider keys |
| **Usage Tracking** | ✅ Unified dashboard | ❌ Per-provider tracking |
| **Model Switching** | ✅ Change model string only | ❌ Change code + keys |
| **Setup Complexity** | ✅ One environment variable | ❌ Multiple environment variables |
## Error Handling
### Common Error Responses
#### Invalid API Key
```json
{
"detail": "Insufficient credits. Current balance: 0.00 credits"
@@ -271,6 +278,7 @@ agent = ComputerAgent(
```
#### Missing Authorization
```json
{
"detail": "Missing Authorization: Bearer token"
@@ -278,6 +286,7 @@ agent = ComputerAgent(
```
#### Invalid Model
```json
{
"detail": "Invalid or unavailable model"
@@ -343,6 +352,7 @@ agent = ComputerAgent(
Switching from direct provider access (BYOK) to CUA VLM Router is simple:
**Before (Direct Provider Access with BYOK):**
```python
import os
# Required: Provider-specific API key
@@ -355,6 +365,7 @@ agent = ComputerAgent(
```
**After (CUA VLM Router - Cloud Service):**
```python
import os
# Required: CUA API key only (no provider keys needed)


@@ -14,6 +14,7 @@ model="cua/anthropic/claude-haiku-4.5" # Claude Haiku 4.5 (faster)
```
**Benefits:**
- Single API key for multiple providers
- Cost tracking and optimization
- Fully managed infrastructure (no provider keys to manage)

View File

@@ -19,6 +19,7 @@ Cua collects anonymized usage and error statistics. We follow [Posthog's ethical
### Disabled by default (opt-in)
**Trajectory logging** captures full conversation history:
- User messages and agent responses
- Computer actions and outputs
- Agent reasoning traces


@@ -3,7 +3,8 @@ title: Computer UI (Deprecated)
---
<Callout type="warn" title="Deprecated">
The Computer UI is deprecated and will be replaced with a revamped playground experience soon. We
recommend using VNC or Screen Sharing for precise control of the computer instead.
</Callout>
The computer module includes a Gradio UI for creating and sharing demonstration data. An upload-to-Hugging-Face feature makes it easy to build community datasets for better computer-use models.


@@ -4,7 +4,14 @@ slug: sandboxed-python
---
<Callout>
A corresponding{' '}
<a
href="https://github.com/trycua/cua/blob/main/examples/sandboxed_functions_examples.py"
target="_blank"
>
Python example
</a>{' '}
is available for this documentation.
</Callout>
You can run Python functions securely inside a sandboxed virtual environment on a remote Cua Computer. This is useful for executing untrusted user code, isolating dependencies, or providing a safe environment for automation tasks.


@@ -473,6 +473,7 @@ python form_filling.py
```
The agent will:
1. Download the PDF resume from Overleaf
2. Extract information from the PDF
3. Fill out the JotForm with the extracted information


@@ -11,6 +11,12 @@ import { Callout } from 'fumadocs-ui/components/callout';
This example demonstrates how to use Google's Gemini 3 models with OmniParser for complex GUI grounding tasks. Gemini 3 Pro achieves exceptional performance on the [ScreenSpot-Pro benchmark](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) with **72.7% accuracy** (compared to Claude Sonnet 4.5's 36.2%), making it ideal for precise UI element location and complex navigation tasks.
<img
src="/docs/img/grounding-with-gemini3.gif"
alt="Demo of Gemini 3 with OmniParser performing complex GUI navigation tasks"
width="800px"
/>
<Callout type="info" title="Why Gemini 3 for UI Navigation?">
According to [Google's Gemini 3 announcement](https://blog.google/products/gemini/gemini-3/),
  Gemini 3 Pro achieves:

  - **72.7%** on ScreenSpot-Pro (vs. Gemini 2.5 Pro's 11.4%)
  -


@@ -441,6 +441,7 @@ python contact_export.py
```
The agent will:
1. Navigate to your LinkedIn connections page
2. Extract data from 20 contacts (first name, last name, role, company, LinkedIn URL)
3. Save contacts to a timestamped CSV file


@@ -11,19 +11,23 @@ import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
This guide demonstrates how to automate Windows desktop applications (like eGecko HR/payroll systems) that run behind corporate VPN. This is a common enterprise scenario where legacy desktop applications require manual data entry, report generation, or workflow execution.
**Use cases:**
- HR/payroll processing (employee onboarding, payroll runs, benefits administration)
- Desktop ERP systems behind corporate networks
- Legacy financial applications requiring VPN access
- Compliance reporting from on-premise systems
**Architecture:**
- Client-side Cua agent (Python SDK or Playground UI)
- Windows VM/Sandbox with VPN client configured
- RDP/remote desktop connection to target environment
- Desktop application automation via computer vision and UI control
<Callout type="info">
**Production Deployment**: For production use, consider workflow mining and custom finetuning to
create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI
automation. This provides better audit trails and higher success rates.
</Callout>
---
@@ -31,7 +35,11 @@ This guide demonstrates how to automate Windows desktop applications (like eGeck
## Video Demo
<div className="rounded-lg border bg-card text-card-foreground shadow-sm p-4 mb-6">
<video
src="https://github.com/user-attachments/assets/8ab07646-6018-4128-87ce-53180cfea696"
controls
className="w-full rounded"
>
Your browser does not support the video tag.
</video>
<div className="text-sm text-muted-foreground mt-2">
@@ -106,7 +114,8 @@ For local development on Windows 10 Pro/Enterprise or Windows 11:
4. Configure your desktop application installation within the sandbox
<Callout type="warn">
**Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For
production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
</Callout>
</Tab>
@@ -421,6 +430,7 @@ python hr_automation.py
```
The agent will:
1. Connect to your Windows environment (with VPN if configured)
2. Launch and navigate the desktop application
3. Execute each workflow step sequentially
@@ -506,6 +516,7 @@ agent = ComputerAgent(
### 1. Workflow Mining
Before deploying, analyze your actual workflows:
- Record user interactions with the application
- Identify common patterns and edge cases
- Map out decision trees and validation requirements
@@ -524,6 +535,7 @@ tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]
```
This provides:
- Better audit trails
- Approval gates at business logic level
- Higher success rates
@@ -547,12 +559,14 @@ agent = ComputerAgent(
Choose your deployment model:
**Managed (Recommended)**
- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
- You get UI/API endpoints for triggering workflows
- Automatic scaling, monitoring, and maintenance
- SLA guarantees and enterprise support
**Self-Hosted**
- You manage Windows VMs, VPN infrastructure, and agent deployment
- Full control over data and security
- Custom network configurations


@@ -5,7 +5,8 @@ title: Introduction
import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react';
<div className="rounded-lg border bg-card text-card-foreground shadow-sm px-4 py-2 mb-6">
Cua is an open-source framework for building **Computer-Use Agents** - AI systems that see,
understand, and interact with desktop applications through vision and action, just like humans do.
</div>
## Why Cua?


@@ -7,7 +7,14 @@ github:
---
<Callout>
A corresponding{' '}
<a
href="https://github.com/trycua/cua/blob/main/notebooks/computer_server_nb.ipynb"
target="_blank"
>
Jupyter Notebook
</a>{' '}
is available for this documentation.
</Callout>
The Computer Server API reference documentation is currently under development.


@@ -15,6 +15,7 @@ The CUA CLI provides commands for authentication and sandbox management.
The CLI supports **two command styles** for flexibility:
**Flat style** (quick & concise):
```bash
cua list
cua create --os linux --size small --region north-america
@@ -22,6 +23,7 @@ cua start my-sandbox
```
**Grouped style** (explicit & clear):
```bash
cua sb list # or: cua sandbox list
cua sb create # or: cua sandbox create
@@ -54,9 +56,11 @@ cua login --api-key sk-your-api-key-here
```
**Options:**
- `--api-key <key>` - Provide API key directly instead of browser flow
**Example:**
```bash
$ cua auth login
Opening browser for CLI auth...
@@ -75,12 +79,14 @@ cua env
```
**Example:**
```bash
$ cua auth env
Wrote /path/to/your/project/.env
```
The generated `.env` file will contain:
```
CUA_API_KEY=sk-your-api-key-here
```
@@ -97,6 +103,7 @@ cua logout
```
**Example:**
```bash
$ cua auth logout
Logged out
@@ -121,6 +128,7 @@ cua ps
```
**Example Output (default, passwords hidden):**
```
NAME STATUS HOST
my-dev-sandbox running my-dev-sandbox.sandbox.cua.ai
@@ -128,6 +136,7 @@ test-windows stopped test-windows.sandbox.cua.ai
```
**Example Output (with --show-passwords):**
```
NAME STATUS PASSWORD HOST
my-dev-sandbox running secure-pass-123 my-dev-sandbox.sandbox.cua.ai
@@ -143,11 +152,13 @@ cua create --os <OS> --size <SIZE> --region <REGION>
```
**Required Options:**
- `--os` - Operating system: `linux`, `windows`, `macos`
- `--size` - Sandbox size: `small`, `medium`, `large`
- `--region` - Region: `north-america`, `europe`, `asia-pacific`, `south-america`
**Examples:**
```bash
# Create a small Linux sandbox in North America
cua create --os linux --size small --region north-america
@@ -162,6 +173,7 @@ cua create --os macos --size large --region asia-pacific
**Response Types:**
**Immediate (Status 200):**
```bash
Sandbox created and ready: my-new-sandbox-abc123
Password: secure-password-here
@@ -169,6 +181,7 @@ Host: my-new-sandbox-abc123.sandbox.cua.ai
```
**Provisioning (Status 202):**
```bash
Sandbox provisioning started: my-new-sandbox-abc123
Job ID: job-xyz789
@@ -184,6 +197,7 @@ cua start <name>
```
**Example:**
```bash
$ cua start my-dev-sandbox
Start accepted
@@ -198,6 +212,7 @@ cua stop <name>
```
**Example:**
```bash
$ cua stop my-dev-sandbox
stopping
@@ -212,6 +227,7 @@ cua restart <name>
```
**Example:**
```bash
$ cua restart my-dev-sandbox
restarting
@@ -226,6 +242,7 @@ cua delete <name>
```
**Example:**
```bash
$ cua delete old-test-sandbox
Sandbox deletion initiated: deleting
@@ -247,6 +264,7 @@ cua open <name>
```
**Example:**
```bash
$ cua vnc my-dev-sandbox
Opening NoVNC: https://my-dev-sandbox.sandbox.cua.ai/vnc.html?autoconnect=true&password=...
@@ -254,7 +272,6 @@ Opening NoVNC: https://my-dev-sandbox.sandbox.cua.ai/vnc.html?autoconnect=true&p
This command automatically opens your default browser to the VNC interface with the correct password pre-filled.
## Global Options
### Help
@@ -273,18 +290,21 @@ cua list --help
The CLI provides clear error messages for common issues:
### Authentication Errors
```bash
$ cua list
Unauthorized. Try 'cua auth login' again.
```
### Sandbox Not Found
```bash
$ cua start nonexistent-sandbox
Sandbox not found
```
### Invalid Configuration
```bash
$ cua create --os invalid --size small --region north-america
Invalid request or unsupported configuration
@@ -293,6 +313,7 @@ Invalid request or unsupported configuration
## Tips and Best Practices
### 1. Use Descriptive Sandbox Names
```bash
# Good
cua create --os linux --size small --region north-america
@@ -304,6 +325,7 @@ cua list # Check the generated name
```
### 2. Environment Management
```bash
# Set up your project with API key
cd my-project
@@ -312,6 +334,7 @@ cua auth env
```
### 3. Quick Sandbox Access
```bash
# Create aliases for frequently used sandboxes
alias dev-sandbox="cua vnc my-development-sandbox"
@@ -319,6 +342,7 @@ alias prod-sandbox="cua vnc my-production-sandbox"
```
### 4. Monitoring Provisioning
```bash
# For sandboxes that need provisioning time
cua create --os windows --size large --region europe


@@ -34,16 +34,19 @@ cua sb list
## Use Cases
### Development Workflow
- Quickly spin up cloud sandboxes for testing
- Manage multiple sandboxes across different regions
- Integrate with CI/CD pipelines
### Team Collaboration
- Share sandbox configurations and access
- Standardize development environments
- Quick onboarding for new team members
### Automation
- Script sandbox provisioning and management
- Integrate with deployment workflows
- Automate environment setup
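For example, provisioning can be scripted by shelling out to the CLI; a sketch assuming `cua` is on your PATH and you are already authenticated (`create_sandbox_cmd` is a hypothetical helper, and the flags mirror the documented `cua sb create` options):

```python
import subprocess

def create_sandbox_cmd(os_name, size, region):
    """Build the argument list for `cua sb create` from the documented flags."""
    return ["cua", "sb", "create", "--os", os_name, "--size", size, "--region", region]

def create_sandbox(os_name, size, region):
    # Invokes the CLI; raises CalledProcessError if creation fails.
    return subprocess.run(create_sandbox_cmd(os_name, size, region), check=True)

cmd = create_sandbox_cmd("linux", "small", "north-america")
```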


@@ -11,24 +11,21 @@ import { Callout } from 'fumadocs-ui/components/callout';
The fastest way to install the CUA CLI is using our installation scripts:
<Tabs items={['macOS / Linux', 'Windows']}>
<Tab value="macOS / Linux">
```bash
curl -LsSf https://cua.ai/cli/install.sh | sh
```
</Tab>
<Tab value="Windows">
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
```
</Tab>
</Tabs>
These scripts will automatically:
1. Install [Bun](https://bun.sh) (a fast JavaScript runtime)
2. Install the CUA CLI via `bun add -g @trycua/cli`
<Callout type="info">
The installation scripts will automatically detect your system and install the appropriate binary
to your PATH.
</Callout>
## Alternative: Install with Bun
@@ -44,8 +41,8 @@ bun add -g @trycua/cli
```
<Callout type="info">
Using Bun provides faster installation and better performance compared to npm. If you don't have
Bun installed, the first command will install it for you.
</Callout>
## Verify Installation
@@ -76,40 +73,21 @@ To update to the latest version:
<Tabs items={['Script Install', 'npm Install']}>
<Tab value="Script Install">
Re-run the installation script:
```bash
# macOS/Linux
curl -LsSf https://cua.ai/cli/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
```
</Tab>
<Tab value="npm Install">
```bash
npm update -g @trycua/cli
```
</Tab>
</Tabs>
## Uninstalling
<Tabs items={['Script Install', 'npm Install']}>
<Tab value="Script Install">
Remove the binary from your PATH:
```bash
# macOS/Linux
rm $(which cua)
# Windows
# Remove from your PATH or delete the executable
```
</Tab>
<Tab value="npm Install">
```bash
npm uninstall -g @trycua/cli
```
</Tab>
</Tabs>
## Troubleshooting
@@ -128,17 +106,12 @@ If you encounter permission issues during installation:
<Tabs items={['macOS / Linux', 'Windows']}>
<Tab value="macOS / Linux">
Try running with sudo (not recommended for the curl method):
```bash
# If using npm
sudo npm install -g @trycua/cli
```
</Tab>
<Tab value="Windows">
Run PowerShell as Administrator:
```powershell
# Right-click PowerShell and "Run as Administrator"
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
```
</Tab>
</Tabs>


@@ -30,13 +30,15 @@ To use with Claude Desktop, add an entry to your Claude Desktop configuration (`
If you're working with the CUA source code:
**Standard VM Mode:**
```json
{
"mcpServers": {
"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -45,13 +47,15 @@ If you're working with the CUA source code:
```
**Host Computer Control Mode:**
```json
{
"mcpServers": {
"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -62,6 +66,7 @@ If you're working with the CUA source code:
**Note**: Replace `/path/to/cua` with the absolute path to your CUA repository directory.
**⚠️ Host Computer Control Setup**: When using `CUA_USE_HOST_COMPUTER_SERVER='true'`, you must also:
1. Install computer server dependencies: `python3 -m pip install uvicorn fastapi`
2. Install the computer server: `python3 -m pip install -e libs/python/computer-server --break-system-packages`
3. Start the computer server: `python -m computer_server --log-level debug`


@@ -4,18 +4,19 @@ title: Configuration
The server is configured using environment variables (can be set in the Claude Desktop config):
| Variable | Description | Default |
| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
| `CUA_MODEL_NAME` | Model string (e.g., "anthropic/claude-sonnet-4-20250514", "anthropic/claude-3-5-sonnet-20240620", "openai/computer-use-preview", "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", "omniparser+litellm/gpt-4o", "omniparser+ollama_chat/gemma3") | anthropic/claude-sonnet-4-20250514 |
| `ANTHROPIC_API_KEY` | Your Anthropic API key (required for Anthropic models) | None |
| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
| `CUA_USE_HOST_COMPUTER_SERVER` | Target your local desktop instead of a VM. Set to "true" to use your host system. **Warning:** AI models may perform risky actions. | false |
## Model Configuration
The `CUA_MODEL_NAME` environment variable supports various model providers through LiteLLM integration:
### Supported Providers
- **Anthropic**: `anthropic/claude-sonnet-4-20250514`, `anthropic/claude-3-5-sonnet-20240620`, `anthropic/claude-3-haiku-20240307`
- **OpenAI**: `openai/computer-use-preview`, `openai/gpt-4o`
- **Local Models**: `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
@@ -25,6 +26,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
### Example Configurations
**Claude Desktop Configuration:**
```json
{
"mcpServers": {
@@ -43,6 +45,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
```
**Local Model Configuration:**
```json
{
"mcpServers": {
@@ -61,6 +64,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
## Session Management Configuration
The MCP server automatically manages sessions with the following defaults:
- **Max Concurrent Sessions**: 10
- **Session Timeout**: 10 minutes of inactivity
- **Computer Pool Size**: 5 instances
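These defaults can be pictured with a small sketch. This is illustrative only, not the server's actual implementation; the constant names are assumptions that simply mirror the documented values:

```python
import time

# Documented defaults (see the list above); names here are illustrative.
MAX_CONCURRENT_SESSIONS = 10
SESSION_TIMEOUT_SECONDS = 10 * 60  # 10 minutes of inactivity
COMPUTER_POOL_SIZE = 5

class Session:
    """Minimal stand-in for a per-client session."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.last_active = time.monotonic()

    def touch(self) -> None:
        # Any client activity resets the inactivity clock.
        self.last_active = time.monotonic()

    def is_expired(self, now: float) -> bool:
        # A session expires after the timeout elapses with no activity.
        return now - self.last_active > SESSION_TIMEOUT_SECONDS

s = Session("abc123")
print(s.is_expired(time.monotonic()))  # prints False for a fresh session
```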

"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash", "-lc",
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
**For host computer control** (development setup):
1. **Install Computer Server Dependencies**:
```bash
python3 -m pip install uvicorn fastapi
python3 -m pip install -e libs/python/computer-server --break-system-packages
```
2. **Start the Computer Server**:
```bash
cd /path/to/cua
python -m computer_server --log-level debug
```
This starts the computer server on `http://localhost:8000`, which controls your actual desktop.
3. **Configure Claude Desktop**:
"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash", "-lc",
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
- Check logs for specific error messages
2. **"Missing Anthropic API Key"** - Add your API key to the configuration:
```json
"env": {
  "ANTHROPIC_API_KEY": "your-api-key-here"
}
```
- **Image size errors**: Use `CUA_MAX_IMAGES='1'` to reduce image context size
**Viewing Logs:**
```bash
tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
```

## Usage Examples
### Basic Task Execution
```
"Open Chrome and navigate to github.com"
"Create a folder called 'Projects' on my desktop"
```
### Multi-Task Execution
```
"Run these tasks: 1) Open Finder, 2) Navigate to Documents, 3) Create a new folder called 'Work'"
```
### Session Management
```
"Take a screenshot of the current screen"
"Show me the session statistics"

View File

## Advanced Features
### Progress Reporting
The MCP server provides real-time progress updates during task execution:
- Task progress is reported as percentages (0-100%)
- Multi-task operations show progress for each individual task
- Progress updates are streamed to the MCP client for real-time feedback
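As a sketch of how per-task percentages could roll up into a single overall figure for a multi-task run (a hypothetical helper, not part of the server API):

```python
def overall_progress(task_index: int, task_progress: float, total_tasks: int) -> float:
    """Combine the current task's 0-100% progress into an overall 0-100% figure.

    task_index is zero-based; task_progress is the current task's own percentage.
    """
    return (task_index + task_progress / 100.0) / total_tasks * 100.0

# Halfway through task 2 of 4 (zero-based index 1) is 37.5% overall.
print(overall_progress(1, 50.0, 4))  # prints 37.5
```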
### Error Handling
Robust error handling ensures reliable operation:
- Failed tasks return error messages with screenshots when possible
- Session state is preserved even when individual tasks fail
- Automatic cleanup prevents resource leaks
- Detailed error logging for troubleshooting
### Concurrent Task Execution
For improved performance, multiple tasks can run concurrently:
- Set `concurrent=true` in `run_multi_cua_tasks` for parallel execution
- Each task runs in its own context with isolated state
- Progress tracking works for both sequential and concurrent modes
- Resource pooling ensures efficient computer instance usage
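The arguments an MCP client might pass to `run_multi_cua_tasks` for concurrent mode could look like the payload below. Only the tool name and the `concurrent` flag come from the docs above; the other field names are assumptions for illustration:

```python
# Hypothetical payload for run_multi_cua_tasks; field names other than
# "concurrent" are assumptions for illustration.
arguments = {
    "tasks": [
        "Open Chrome",
        "Open Safari",
        "Open Finder",
    ],
    "concurrent": True,  # parallel execution instead of sequential
}
print(len(arguments["tasks"]), arguments["concurrent"])
```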
### Session Management
Multi-client support with automatic resource management:
- Each client gets isolated sessions with separate computer instances
- Sessions automatically clean up after 10 minutes of inactivity
- Resource pooling prevents resource exhaustion
### Option: Targeting Your Local Desktop
<Callout type="warn">
**Warning:** When targeting your local system, AI models have direct access to your desktop and may perform risky actions. Use with caution.
</Callout>
To have the MCP server control your local desktop instead of a VM:
<Tabs items={['Claude Desktop', 'Other MCP Clients']}>
<Tab value="Claude Desktop">
Add the `CUA_USE_HOST_COMPUTER_SERVER` environment variable to your MCP client configuration:

```json
{
  "mcpServers": {
    "cua-agent": {
      "command": "/usr/bin/env",
      "args": [
        "bash",
        "-lc",
        "export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
      ]
    }
  }
}
```
</Tab>
<Tab value="Other MCP Clients">
Set the environment variable in your MCP client configuration:
```bash
export CUA_USE_HOST_COMPUTER_SERVER='true'
```
Then start your MCP client as usual.
</Tab>
</Tabs>
Now Claude will control your local desktop directly when you ask it to perform computer tasks.
## Usage Examples
### Single Task Execution
```
"Open Safari and navigate to apple.com"
"Create a new folder on the desktop called 'My Projects'"
```
### Multi-Task Execution (Sequential)
```
"Run these tasks in order: 1) Open Finder, 2) Navigate to Documents folder, 3) Create a new folder called 'Work'"
```
### Multi-Task Execution (Concurrent)
```
"Run these tasks simultaneously: 1) Open Chrome, 2) Open Safari, 3) Open Finder"
```
### Session Management
```
"Show me the current session statistics"
"Take a screenshot using session abc123"
```
### Error Recovery
```
"Try to open a non-existent application and show me the error"
"Find all files with .tmp extension and delete them safely"
```
## First-time Usage Notes
**API Keys**: Ensure you have valid API keys:
- Add your Anthropic API key in the Claude Desktop config (as shown above)
- Or set it as an environment variable in your shell profile
- **Required**: The MCP server needs an API key to authenticate with the model provider
**Model Selection**: Choose the appropriate model for your needs:
- **Claude Sonnet 4**: Latest model with best performance (`anthropic/claude-sonnet-4-20250514`)
- **Claude 3.5 Sonnet**: Reliable performance (`anthropic/claude-3-5-sonnet-20240620`)
- **Computer-Use Preview**: Specialized for computer tasks (`openai/computer-use-preview`)
- **Local Models**: For privacy-sensitive environments
- **Ollama**: For offline usage
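Whichever model you pick, the same `CUA_MODEL_NAME` variable drives the selection. A minimal Python sketch, assuming you want a fallback to the latest Claude Sonnet model named above (illustrative, not required setup):

```python
import os

# Read the model string from the environment, falling back to Claude Sonnet 4.
model = os.environ.get("CUA_MODEL_NAME", "anthropic/claude-sonnet-4-20250514")
print(model)
```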

<Callout>
A corresponding{' '}
<a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">
Python example
</a>{' '}
is available for this documentation.
</Callout>
## Overview

```python
# Excerpt from run_agent_example(): alternative model strings for the agent
# == Omniparser + Any LLM ==
# model="omniparser+anthropic/claude-opus-4-20250514",
# model="omniparser+ollama_chat/gemma3:12b-it-q4_K_M",
# == Omniparser + Vertex AI Gemini 3 (with thinking_level) ==
# model="omni+vertex_ai/gemini-3-flash",
# thinking_level="high",  # or "low"
# media_resolution="medium",  # or "low" or "high"
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.DEBUG,
```