mirror of
https://github.com/trycua/lume.git
synced 2025-12-21 12:20:01 -06:00
add gif for demo
@@ -4,7 +4,6 @@ If you've been building computer-use agents, you know the reality: every model p
|
||||
|
||||
Today we're launching the **Cua VLM Router**: a managed inference API that gives you unified access to multiple vision-language model providers through a single API key. We're starting with Anthropic's Claude models (Sonnet 4.5 and Haiku 4.5), some of the most loved and widely used computer-use models in the Cua ecosystem, with more providers coming soon.
|
||||
|
||||
|
||||

|
||||
|
||||
## What You Get
|
||||
@@ -12,21 +11,25 @@ Today we're launching the **Cua VLM Router**: a managed inference API that gives
|
||||
The Cua VLM Router handles the infrastructure so you can focus on building:
|
||||
|
||||
**Single API Key**
|
||||
|
||||
- One key for all model providers (no juggling multiple credentials)
|
||||
- Works for both model inference and sandbox access
|
||||
- Manage everything from one dashboard at cua.ai
|
||||
|
||||
**Smart Routing**
|
||||
|
||||
- Automatic provider selection for optimal availability and performance
|
||||
- For Anthropic models, we route to the best provider (Anthropic, AWS Bedrock, or Microsoft Foundry)
|
||||
- No configuration needed—just specify the model and we handle the rest
|
||||
|
||||
**Cost Tracking & Optimization**
|
||||
|
||||
- Unified usage dashboard across all models
|
||||
- Real-time credit balance tracking
|
||||
- Detailed cost breakdown per request (gateway cost + upstream cost)
|
||||
|
||||
**Production-Ready**
|
||||
|
||||
- OpenAI-compatible API (drop-in replacement for existing code)
|
||||
- Full streaming support with Server-Sent Events
|
||||
- Metadata about routing decisions in every response
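As a rough illustration of what "OpenAI-compatible" and "Server-Sent Events" mean in practice, the sketch below builds a chat completion request and extracts text deltas from streamed lines using only the Python standard library. The endpoint URL and the `Bearer` authorization scheme are taken from the curl example and error messages in the router docs. The `[DONE]` end marker is an assumption borrowed from the OpenAI streaming convention, and the helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint as it appears in the curl example in the router docs.
CHAT_URL = "https://inference.cua.ai/v1/chat/completions"


def build_chat_request(api_key, model, prompt, stream=False):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_sse_content(lines):
    """Yield text deltas from an iterable of SSE lines ('data: {...}')."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Passing the request to `urllib.request.urlopen` would send it. When `stream` is true, the decoded response lines can be fed to `iter_sse_content`.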
|
||||
@@ -35,10 +38,10 @@ The Cua VLM Router handles the infrastructure so you can focus on building:
|
||||
|
||||
We're starting with Anthropic's latest Claude models:
|
||||
|
||||
| Model | Best For |
|
||||
|-------|----------|
|
||||
| Model | Best For |
|
||||
| --------------------------------- | ---------------------------------- |
|
||||
| `cua/anthropic/claude-sonnet-4.5` | General-purpose tasks, recommended |
|
||||
| `cua/anthropic/claude-haiku-4.5` | Fast responses, cost-effective |
|
||||
| `cua/anthropic/claude-haiku-4.5` | Fast responses, cost-effective |
|
||||
|
||||
## How It Works
|
||||
|
||||
@@ -85,12 +88,14 @@ async for result in agent.run(messages):
|
||||
Already using Anthropic directly? Just add the `cua/` prefix:
|
||||
|
||||
**Before:**
|
||||
|
||||
```python
|
||||
# Set in your shell first: export ANTHROPIC_API_KEY="sk-ant-..."
|
||||
agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929")
|
||||
```
|
||||
|
||||
**After:**
|
||||
|
||||
```python
|
||||
# Set in your shell first: export CUA_API_KEY="sk_cua-api01_..."
|
||||
agent = ComputerAgent(model="cua/anthropic/claude-sonnet-4.5")
|
||||
|
||||
@@ -11,11 +11,13 @@ Today we're launching the **Cua CLI**: a command-line interface that brings the
|
||||
The Cua CLI handles everything you need to work with Cloud Sandboxes:
|
||||
|
||||
**Authentication**
|
||||
|
||||
- Browser-based OAuth login with automatic credential storage
|
||||
- Direct API key support for CI/CD pipelines
|
||||
- Export credentials to `.env` files for SDK integration
|
||||
|
||||
**Sandbox Management**
|
||||
|
||||
- Create sandboxes with your choice of OS, size, and region
|
||||
- List all your sandboxes with status and connection details
|
||||
- Start, stop, restart, and delete sandboxes
|
||||
@@ -123,17 +125,20 @@ await computer.run()
|
||||
Create sandboxes in the size and region that fits your needs:
|
||||
|
||||
**Sizes:**
|
||||
|
||||
- `small` - 2 cores, 8 GB RAM, 128 GB SSD
|
||||
- `medium` - 4 cores, 16 GB RAM, 128 GB SSD
|
||||
- `large` - 8 cores, 32 GB RAM, 256 GB SSD
|
||||
|
||||
**Regions:**
|
||||
|
||||
- `north-america`
|
||||
- `europe`
|
||||
- `asia-pacific`
|
||||
- `south-america`
|
||||
|
||||
**OS Options:**
|
||||
|
||||
- `linux` - Ubuntu with XFCE desktop
|
||||
- `windows` - Windows 11 with Edge and Python
|
||||
- `macos` - macOS (preview access)
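When scripting sandbox creation, it can help to derive the size from resource requirements rather than hard-coding it. The sketch below encodes the size table above as data and picks the smallest size that meets a set of minimums. The `SandboxSize` class and `pick_size` helper are hypothetical names for illustration and are not part of the Cua SDK or CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxSize:
    name: str
    cores: int
    ram_gb: int
    ssd_gb: int


# Values copied from the size list above, ordered smallest first.
SIZES = [
    SandboxSize("small", 2, 8, 128),
    SandboxSize("medium", 4, 16, 128),
    SandboxSize("large", 8, 32, 256),
]


def pick_size(min_cores=1, min_ram_gb=1, min_ssd_gb=1):
    """Return the name of the smallest size meeting every minimum."""
    for size in SIZES:
        if (size.cores >= min_cores
                and size.ram_gb >= min_ram_gb
                and size.ssd_gb >= min_ssd_gb):
            return size.name
    raise ValueError("no sandbox size satisfies the requested minimums")
```

The returned name can be passed directly as the `--size` value.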
|
||||
@@ -141,6 +146,7 @@ Create sandboxes in the size and region that fits your needs:
|
||||
## Example Workflows
|
||||
|
||||
**Quick Testing Environment**
|
||||
|
||||
```bash
|
||||
# Spin up a sandbox, test something, tear it down
|
||||
cua sb create --os linux --size small --region north-america
|
||||
@@ -149,6 +155,7 @@ cua sb delete my-sandbox-abc123
|
||||
```
|
||||
|
||||
**Persistent Development Sandbox**
|
||||
|
||||
```bash
|
||||
# Create a sandbox for long-term use
|
||||
cua sb create --os linux --size medium --region north-america
|
||||
@@ -221,11 +228,13 @@ Yes. The CLI and dashboard share the same API. Any sandbox you create in the das
|
||||
<summary><strong>How do I update the CLI?</strong></summary>
|
||||
|
||||
If you installed via script:
|
||||
|
||||
```bash
|
||||
curl -LsSf https://cua.ai/cli/install.sh | sh
|
||||
```
|
||||
|
||||
If you installed via npm:
|
||||
|
||||
```bash
|
||||
npm install -g @trycua/cli@latest
|
||||
```
|
||||
@@ -235,6 +244,7 @@ npm install -g @trycua/cli@latest
|
||||
## What's Next
|
||||
|
||||
We're actively iterating based on feedback. Planned features include:
|
||||
|
||||
- SSH key management for secure sandbox access
|
||||
- Template-based sandbox creation
|
||||
- Batch operations (start/stop multiple sandboxes)
|
||||
|
||||
@@ -4,7 +4,11 @@ description: Supported computer-using agent loops and models
|
||||
---
|
||||
|
||||
<Callout>
|
||||
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/agent_nb.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
|
||||
A corresponding{' '}
|
||||
<a href="https://github.com/trycua/cua/blob/main/notebooks/agent_nb.ipynb" target="_blank">
|
||||
Jupyter Notebook
|
||||
</a>{' '}
|
||||
is available for this documentation.
|
||||
</Callout>
|
||||
|
||||
An agent can be thought of as a loop - it generates actions, executes them, and repeats until done:
|
||||
|
||||
@@ -3,7 +3,14 @@ title: Customize ComputerAgent
|
||||
---
|
||||
|
||||
<Callout>
|
||||
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
|
||||
A corresponding{' '}
|
||||
<a
|
||||
href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb"
|
||||
target="_blank"
|
||||
>
|
||||
Jupyter Notebook
|
||||
</a>{' '}
|
||||
is available for this documentation.
|
||||
</Callout>
|
||||
|
||||
The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems.
|
||||
|
||||
@@ -4,7 +4,11 @@ description: Use ComputerAgent with HUD for benchmarking and evaluation
|
||||
---
|
||||
|
||||
<Callout>
|
||||
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
|
||||
A corresponding{' '}
|
||||
<a href="https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb" target="_blank">
|
||||
Jupyter Notebook
|
||||
</a>{' '}
|
||||
is available for this documentation.
|
||||
</Callout>
|
||||
|
||||
The HUD integration allows an agent to be benchmarked using the [HUD framework](https://www.hud.so/). Through the HUD integration, the agent controls a computer inside HUD, where tests are run to evaluate the success of each task.
|
||||
|
||||
@@ -59,4 +59,8 @@ you will see all the agent execution steps, including computer actions, LLM call
|
||||
|
||||
For each step, you will see the LLM call and the computer action. The computer actions are highlighted in yellow in the timeline.
|
||||
|
||||
<img src="/docs/img/laminar_trace_example.png" alt="Example trace in Laminar showing the litellm.response span and its output." width="800px" />
|
||||
<img
|
||||
src="/docs/img/laminar_trace_example.png"
|
||||
alt="Example trace in Laminar showing the litellm.response span and its output."
|
||||
width="800px"
|
||||
/>
|
||||
|
||||
@@ -55,10 +55,10 @@ async for result in agent.run(messages):
|
||||
|
||||
The CUA VLM Router currently supports these models:
|
||||
|
||||
| Model ID | Provider | Description | Best For |
|
||||
|----------|----------|-------------|----------|
|
||||
| Model ID | Provider | Description | Best For |
|
||||
| --------------------------------- | --------- | ----------------- | ---------------------------------- |
|
||||
| `cua/anthropic/claude-sonnet-4.5` | Anthropic | Claude Sonnet 4.5 | General-purpose tasks, recommended |
|
||||
| `cua/anthropic/claude-haiku-4.5` | Anthropic | Claude Haiku 4.5 | Fast responses, cost-effective |
|
||||
| `cua/anthropic/claude-haiku-4.5` | Anthropic | Claude Haiku 4.5 | Fast responses, cost-effective |
|
||||
|
||||
## How It Works
|
||||
|
||||
@@ -95,6 +95,7 @@ GET /v1/models
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"data": [
|
||||
@@ -117,12 +118,11 @@ Content-Type: application/json
|
||||
```
|
||||
|
||||
**Request:**
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "anthropic/claude-sonnet-4.5",
|
||||
"messages": [
|
||||
{"role": "user", "content": "Hello!"}
|
||||
],
|
||||
"messages": [{ "role": "user", "content": "Hello!" }],
|
||||
"max_tokens": 100,
|
||||
"temperature": 0.7,
|
||||
"stream": false
|
||||
@@ -130,20 +130,23 @@ Content-Type: application/json
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "gen_...",
|
||||
"object": "chat.completion",
|
||||
"created": 1763554838,
|
||||
"model": "anthropic/claude-sonnet-4.5",
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": "Hello! How can I help you today?"
|
||||
},
|
||||
"finish_reason": "stop"
|
||||
}],
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": "Hello! How can I help you today?"
|
||||
},
|
||||
"finish_reason": "stop"
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 10,
|
||||
"completion_tokens": 12,
|
||||
@@ -170,6 +173,7 @@ curl -X POST https://inference.cua.ai/v1/chat/completions \
|
||||
```
|
||||
|
||||
**Response (SSE format):**
|
||||
|
||||
```
|
||||
data: {"id":"gen_...","choices":[{"delta":{"content":"1"}}],"object":"chat.completion.chunk"}
|
||||
|
||||
@@ -187,6 +191,7 @@ GET /v1/balance
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"balance": 211689.85,
|
||||
@@ -201,6 +206,7 @@ CUA VLM Router provides detailed cost information in every response:
|
||||
### Credit System
|
||||
|
||||
Requests are billed in **credits**:
|
||||
|
||||
- Credits are deducted from your CUA account balance
|
||||
- Prices vary by model and usage
|
||||
- CUA manages all provider API keys and infrastructure
|
||||
@@ -210,8 +216,8 @@ Requests are billed in **credits**:
|
||||
```json
|
||||
{
|
||||
"usage": {
|
||||
"cost": 0.01, // CUA gateway cost in credits
|
||||
"market_cost": 0.000065 // Actual upstream API cost
|
||||
"cost": 0.01, // CUA gateway cost in credits
|
||||
"market_cost": 0.000065 // Actual upstream API cost
|
||||
}
|
||||
}
|
||||
```
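Because every response carries both fields, per-request costs can be aggregated on the client side. The following is a minimal sketch that sums `cost` and `market_cost` over a list of already-parsed response dictionaries. The function name is hypothetical, and the two totals are kept separate because the docs state `cost` in credits but do not give a unit for `market_cost`.

```python
def summarize_costs(responses):
    """Sum gateway and upstream cost across parsed response dicts."""
    gateway = 0.0
    upstream = 0.0
    for response in responses:
        usage = response.get("usage", {})
        # Missing fields count as zero so partial responses do not fail.
        gateway += usage.get("cost", 0.0)
        upstream += usage.get("market_cost", 0.0)
    return {"cost": gateway, "market_cost": upstream}
```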
|
||||
@@ -251,19 +257,20 @@ agent = ComputerAgent(
|
||||
|
||||
## Benefits Over Direct Provider Access
|
||||
|
||||
| Feature | CUA VLM Router | Direct Provider (BYOK) |
|
||||
|---------|---------------|------------------------|
|
||||
| **Single API Key** | ✅ One key for all providers | ❌ Multiple keys to manage |
|
||||
| **Managed Infrastructure** | ✅ No API key management | ❌ Manage multiple provider keys |
|
||||
| **Usage Tracking** | ✅ Unified dashboard | ❌ Per-provider tracking |
|
||||
| **Model Switching** | ✅ Change model string only | ❌ Change code + keys |
|
||||
| **Setup Complexity** | ✅ One environment variable | ❌ Multiple environment variables |
|
||||
| Feature | CUA VLM Router | Direct Provider (BYOK) |
|
||||
| -------------------------- | ---------------------------- | --------------------------------- |
|
||||
| **Single API Key** | ✅ One key for all providers | ❌ Multiple keys to manage |
|
||||
| **Managed Infrastructure** | ✅ No API key management | ❌ Manage multiple provider keys |
|
||||
| **Usage Tracking** | ✅ Unified dashboard | ❌ Per-provider tracking |
|
||||
| **Model Switching** | ✅ Change model string only | ❌ Change code + keys |
|
||||
| **Setup Complexity** | ✅ One environment variable | ❌ Multiple environment variables |
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Common Error Responses
|
||||
|
||||
#### Insufficient Credits
|
||||
|
||||
```json
|
||||
{
|
||||
"detail": "Insufficient credits. Current balance: 0.00 credits"
|
||||
@@ -271,6 +278,7 @@ agent = ComputerAgent(
|
||||
```
|
||||
|
||||
#### Missing Authorization
|
||||
|
||||
```json
|
||||
{
|
||||
"detail": "Missing Authorization: Bearer token"
|
||||
@@ -278,6 +286,7 @@ agent = ComputerAgent(
|
||||
```
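The error bodies above share a single `detail` string, so a client can branch on it. The sketch below maps a response body to a coarse category by matching the message prefixes shown above. The category names are hypothetical, and matching on message text is an assumption that may break if the wording changes; HTTP status codes, where available, are likely a sturdier signal.

```python
import json


def classify_error(body):
    """Map an error response body (JSON text) to a coarse category."""
    detail = json.loads(body).get("detail", "")
    if detail.startswith("Insufficient credits"):
        return "insufficient_credits"
    if detail.startswith("Missing Authorization"):
        return "missing_authorization"
    return "other"  # anything not recognized by its message prefix
```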
|
||||
|
||||
#### Invalid Model
|
||||
|
||||
```json
|
||||
{
|
||||
"detail": "Invalid or unavailable model"
|
||||
@@ -343,6 +352,7 @@ agent = ComputerAgent(
|
||||
Switching from direct provider access (BYOK) to CUA VLM Router is simple:
|
||||
|
||||
**Before (Direct Provider Access with BYOK):**
|
||||
|
||||
```python
|
||||
import os
|
||||
# Required: Provider-specific API key
|
||||
@@ -355,6 +365,7 @@ agent = ComputerAgent(
|
||||
```
|
||||
|
||||
**After (CUA VLM Router - Cloud Service):**
|
||||
|
||||
```python
|
||||
import os
|
||||
# Required: CUA API key only (no provider keys needed)
|
||||
|
||||
@@ -14,6 +14,7 @@ model="cua/anthropic/claude-haiku-4.5" # Claude Haiku 4.5 (faster)
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
|
||||
- Single API key for multiple providers
|
||||
- Cost tracking and optimization
|
||||
- Fully managed infrastructure (no provider keys to manage)
|
||||
|
||||
@@ -19,6 +19,7 @@ Cua collects anonymized usage and error statistics. We follow [Posthog's ethical
|
||||
### Disabled by default (opt-in)
|
||||
|
||||
**Trajectory logging** captures full conversation history:
|
||||
|
||||
- User messages and agent responses
|
||||
- Computer actions and outputs
|
||||
- Agent reasoning traces
|
||||
|
||||
@@ -3,7 +3,8 @@ title: Computer UI (Deprecated)
|
||||
---
|
||||
|
||||
<Callout type="warn" title="Deprecated">
|
||||
The Computer UI is deprecated and will be replaced with a revamped playground experience soon. We recommend using VNC or Screen Sharing for precise control of the computer instead.
|
||||
The Computer UI is deprecated and will be replaced with a revamped playground experience soon. We
|
||||
recommend using VNC or Screen Sharing for precise control of the computer instead.
|
||||
</Callout>
|
||||
|
||||
The computer module includes a Gradio UI for creating and sharing demonstration data. It also makes it easy to build community datasets for better computer-use models, with a built-in option to upload to Hugging Face.
|
||||
|
||||
@@ -4,7 +4,14 @@ slug: sandboxed-python
|
||||
---
|
||||
|
||||
<Callout>
|
||||
A corresponding <a href="https://github.com/trycua/cua/blob/main/examples/sandboxed_functions_examples.py" target="_blank">Python example</a> is available for this documentation.
|
||||
A corresponding{' '}
|
||||
<a
|
||||
href="https://github.com/trycua/cua/blob/main/examples/sandboxed_functions_examples.py"
|
||||
target="_blank"
|
||||
>
|
||||
Python example
|
||||
</a>{' '}
|
||||
is available for this documentation.
|
||||
</Callout>
|
||||
|
||||
You can run Python functions securely inside a sandboxed virtual environment on a remote Cua Computer. This is useful for executing untrusted user code, isolating dependencies, or providing a safe environment for automation tasks.
|
||||
|
||||
@@ -473,6 +473,7 @@ python form_filling.py
|
||||
```
|
||||
|
||||
The agent will:
|
||||
|
||||
1. Download the PDF resume from Overleaf
|
||||
2. Extract information from the PDF
|
||||
3. Fill out the JotForm with the extracted information
|
||||
|
||||
@@ -11,6 +11,12 @@ import { Callout } from 'fumadocs-ui/components/callout';
|
||||
|
||||
This example demonstrates how to use Google's Gemini 3 models with OmniParser for complex GUI grounding tasks. Gemini 3 Pro achieves exceptional performance on the [ScreenSpot-Pro benchmark](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) with **72.7% accuracy** (compared to Claude Sonnet 4.5's 36.2%), making it ideal for precise UI element location and complex navigation tasks.
|
||||
|
||||
<img
|
||||
src="/docs/img/grounding-with-gemini3.gif"
|
||||
alt="Demo of Gemini 3 with OmniParser performing complex GUI navigation tasks"
|
||||
width="800px"
|
||||
/>
|
||||
|
||||
<Callout type="info" title="Why Gemini 3 for UI Navigation?">
|
||||
According to [Google's Gemini 3 announcement](https://blog.google/products/gemini/gemini-3/),
|
||||
Gemini 3 Pro achieves: - **72.7%** on ScreenSpot-Pro (vs. Gemini 2.5 Pro's 11.4%) -
|
||||
|
||||
@@ -441,6 +441,7 @@ python contact_export.py
|
||||
```
|
||||
|
||||
The agent will:
|
||||
|
||||
1. Navigate to your LinkedIn connections page
|
||||
2. Extract data from 20 contacts (first name, last name, role, company, LinkedIn URL)
|
||||
3. Save contacts to a timestamped CSV file
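The final step, saving contacts to a timestamped CSV, might look roughly like the sketch below. The column names follow the fields listed in step 2. The `contacts_<timestamp>.csv` naming pattern and the `save_contacts` helper are assumptions for illustration and are not taken from `contact_export.py`.

```python
import csv
from datetime import datetime
from pathlib import Path

# Columns mirror the fields named in step 2.
FIELDS = ["first_name", "last_name", "role", "company", "linkedin_url"]


def save_contacts(contacts, out_dir=".", now=None):
    """Write contact dicts to a timestamped CSV and return its path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"contacts_{stamp}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(contacts)
    return path
```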
|
||||
|
||||
@@ -11,19 +11,23 @@ import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
|
||||
This guide demonstrates how to automate Windows desktop applications (like eGecko HR/payroll systems) that run behind a corporate VPN. This is a common enterprise scenario where legacy desktop applications require manual data entry, report generation, or workflow execution.
|
||||
|
||||
**Use cases:**
|
||||
|
||||
- HR/payroll processing (employee onboarding, payroll runs, benefits administration)
|
||||
- Desktop ERP systems behind corporate networks
|
||||
- Legacy financial applications requiring VPN access
|
||||
- Compliance reporting from on-premise systems
|
||||
|
||||
**Architecture:**
|
||||
|
||||
- Client-side Cua agent (Python SDK or Playground UI)
|
||||
- Windows VM/Sandbox with VPN client configured
|
||||
- RDP/remote desktop connection to target environment
|
||||
- Desktop application automation via computer vision and UI control
|
||||
|
||||
<Callout type="info">
|
||||
**Production Deployment**: For production use, consider workflow mining and custom finetuning to create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI automation. This provides better audit trails and higher success rates.
|
||||
**Production Deployment**: For production use, consider workflow mining and custom finetuning to
|
||||
create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI
|
||||
automation. This provides better audit trails and higher success rates.
|
||||
</Callout>
|
||||
|
||||
---
|
||||
@@ -31,7 +35,11 @@ This guide demonstrates how to automate Windows desktop applications (like eGeck
|
||||
## Video Demo
|
||||
|
||||
<div className="rounded-lg border bg-card text-card-foreground shadow-sm p-4 mb-6">
|
||||
<video src="https://github.com/user-attachments/assets/8ab07646-6018-4128-87ce-53180cfea696" controls className="w-full rounded">
|
||||
<video
|
||||
src="https://github.com/user-attachments/assets/8ab07646-6018-4128-87ce-53180cfea696"
|
||||
controls
|
||||
className="w-full rounded"
|
||||
>
|
||||
Your browser does not support the video tag.
|
||||
</video>
|
||||
<div className="text-sm text-muted-foreground mt-2">
|
||||
@@ -106,7 +114,8 @@ For local development on Windows 10 Pro/Enterprise or Windows 11:
|
||||
4. Configure your desktop application installation within the sandbox
|
||||
|
||||
<Callout type="warn">
|
||||
**Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
|
||||
**Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For
|
||||
production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
|
||||
</Callout>
|
||||
|
||||
</Tab>
|
||||
@@ -421,6 +430,7 @@ python hr_automation.py
|
||||
```
|
||||
|
||||
The agent will:
|
||||
|
||||
1. Connect to your Windows environment (with VPN if configured)
|
||||
2. Launch and navigate the desktop application
|
||||
3. Execute each workflow step sequentially
|
||||
@@ -506,6 +516,7 @@ agent = ComputerAgent(
|
||||
### 1. Workflow Mining
|
||||
|
||||
Before deploying, analyze your actual workflows:
|
||||
|
||||
- Record user interactions with the application
|
||||
- Identify common patterns and edge cases
|
||||
- Map out decision trees and validation requirements
|
||||
@@ -524,6 +535,7 @@ tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]
|
||||
```
|
||||
|
||||
This provides:
|
||||
|
||||
- Better audit trails
|
||||
- Approval gates at business logic level
|
||||
- Higher success rates
|
||||
@@ -547,12 +559,14 @@ agent = ComputerAgent(
|
||||
Choose your deployment model:
|
||||
|
||||
**Managed (Recommended)**
|
||||
|
||||
- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
|
||||
- You get UI/API endpoints for triggering workflows
|
||||
- Automatic scaling, monitoring, and maintenance
|
||||
- SLA guarantees and enterprise support
|
||||
|
||||
**Self-Hosted**
|
||||
|
||||
- You manage Windows VMs, VPN infrastructure, and agent deployment
|
||||
- Full control over data and security
|
||||
- Custom network configurations
|
||||
|
||||
@@ -5,7 +5,8 @@ title: Introduction
|
||||
import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react';
|
||||
|
||||
<div className="rounded-lg border bg-card text-card-foreground shadow-sm px-4 py-2 mb-6">
|
||||
Cua is an open-source framework for building **Computer-Use Agents** - AI systems that see, understand, and interact with desktop applications through vision and action, just like humans do.
|
||||
Cua is an open-source framework for building **Computer-Use Agents** - AI systems that see,
|
||||
understand, and interact with desktop applications through vision and action, just like humans do.
|
||||
</div>
|
||||
|
||||
## Why Cua?
|
||||
|
||||
@@ -7,7 +7,14 @@ github:
|
||||
---
|
||||
|
||||
<Callout>
|
||||
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/computer_server_nb.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
|
||||
A corresponding{' '}
|
||||
<a
|
||||
href="https://github.com/trycua/cua/blob/main/notebooks/computer_server_nb.ipynb"
|
||||
target="_blank"
|
||||
>
|
||||
Jupyter Notebook
|
||||
</a>{' '}
|
||||
is available for this documentation.
|
||||
</Callout>
|
||||
|
||||
The Computer Server API reference documentation is currently under development.
|
||||
|
||||
@@ -15,6 +15,7 @@ The CUA CLI provides commands for authentication and sandbox management.
|
||||
The CLI supports **two command styles** for flexibility:
|
||||
|
||||
**Flat style** (quick & concise):
|
||||
|
||||
```bash
|
||||
cua list
|
||||
cua create --os linux --size small --region north-america
|
||||
@@ -22,6 +23,7 @@ cua start my-sandbox
|
||||
```
|
||||
|
||||
**Grouped style** (explicit & clear):
|
||||
|
||||
```bash
|
||||
cua sb list # or: cua sandbox list
|
||||
cua sb create # or: cua sandbox create
|
||||
@@ -54,9 +56,11 @@ cua login --api-key sk-your-api-key-here
|
||||
```
|
||||
|
||||
**Options:**
|
||||
|
||||
- `--api-key <key>` - Provide API key directly instead of browser flow
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua auth login
|
||||
Opening browser for CLI auth...
|
||||
@@ -75,12 +79,14 @@ cua env
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua auth env
|
||||
Wrote /path/to/your/project/.env
|
||||
```
|
||||
|
||||
The generated `.env` file will contain:
|
||||
|
||||
```
|
||||
CUA_API_KEY=sk-your-api-key-here
|
||||
```
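If you prefer not to add a dependency for loading the file, the key can be read with a few lines of standard-library Python. This is a minimal sketch that handles only simple `KEY=value` lines like the one above; the `read_env_value` helper is a hypothetical name, and it does not cover the full range of `.env` syntax (for example `export` prefixes or multi-line values).

```python
def read_env_value(text, key="CUA_API_KEY"):
    """Return the value for `key` from .env-style text, or None."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None
```

For example, `read_env_value(open(".env").read())` returns the stored key when the file exists in the current directory.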
|
||||
@@ -97,6 +103,7 @@ cua logout
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua auth logout
|
||||
Logged out
|
||||
@@ -121,6 +128,7 @@ cua ps
|
||||
```
|
||||
|
||||
**Example Output (default, passwords hidden):**
|
||||
|
||||
```
|
||||
NAME STATUS HOST
|
||||
my-dev-sandbox running my-dev-sandbox.sandbox.cua.ai
|
||||
@@ -128,6 +136,7 @@ test-windows stopped test-windows.sandbox.cua.ai
|
||||
```
|
||||
|
||||
**Example Output (with --show-passwords):**
|
||||
|
||||
```
|
||||
NAME STATUS PASSWORD HOST
|
||||
my-dev-sandbox running secure-pass-123 my-dev-sandbox.sandbox.cua.ai
|
||||
@@ -143,11 +152,13 @@ cua create --os <OS> --size <SIZE> --region <REGION>
|
||||
```
|
||||
|
||||
**Required Options:**
|
||||
|
||||
- `--os` - Operating system: `linux`, `windows`, `macos`
|
||||
- `--size` - Sandbox size: `small`, `medium`, `large`
|
||||
- `--region` - Region: `north-america`, `europe`, `asia-pacific`, `south-america`
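In automation scripts, the three required options can be validated before the CLI is invoked, so that a typo fails fast with a clear message. The sketch below builds the argument list for `cua create` from the allowed values listed above. The `build_create_command` helper is hypothetical; only the flags and values documented here are used.

```python
# Allowed values copied from the required options listed above.
ALLOWED = {
    "os": {"linux", "windows", "macos"},
    "size": {"small", "medium", "large"},
    "region": {"north-america", "europe", "asia-pacific", "south-america"},
}


def build_create_command(os_name, size, region):
    """Validate the three required options and return the argv list."""
    chosen = {"os": os_name, "size": size, "region": region}
    for option, value in chosen.items():
        if value not in ALLOWED[option]:
            allowed = ", ".join(sorted(ALLOWED[option]))
            raise ValueError(f"--{option} must be one of: {allowed}")
    return ["cua", "create", "--os", os_name, "--size", size, "--region", region]
```

The returned list can be handed to `subprocess.run(argv, check=True)` to execute the command.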
|
||||
|
||||
**Examples:**
|
||||
|
||||
```bash
|
||||
# Create a small Linux sandbox in North America
|
||||
cua create --os linux --size small --region north-america
|
||||
@@ -162,6 +173,7 @@ cua create --os macos --size large --region asia-pacific
|
||||
**Response Types:**
|
||||
|
||||
**Immediate (Status 200):**
|
||||
|
||||
```bash
|
||||
Sandbox created and ready: my-new-sandbox-abc123
|
||||
Password: secure-password-here
|
||||
@@ -169,6 +181,7 @@ Host: my-new-sandbox-abc123.sandbox.cua.ai
|
||||
```
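Scripts that create a sandbox and then connect to it need the name, password, and host from the output above. The following sketch parses the "created and ready" text into a dictionary. It assumes the output keeps the `Label: value` layout shown in this example, which the CLI does not guarantee, and `parse_create_output` is a hypothetical helper name.

```python
def parse_create_output(text):
    """Extract name, password, and host from the 'ready' output."""
    result = {}
    for raw in text.splitlines():
        label, sep, value = raw.partition(":")
        if not sep:
            continue  # line without a 'Label: value' shape
        label, value = label.strip(), value.strip()
        if label == "Sandbox created and ready":
            result["name"] = value
        elif label in ("Password", "Host"):
            result[label.lower()] = value
    return result
```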
|
||||
|
||||
**Provisioning (Status 202):**
|
||||
|
||||
```bash
|
||||
Sandbox provisioning started: my-new-sandbox-abc123
|
||||
Job ID: job-xyz789
|
||||
@@ -184,6 +197,7 @@ cua start <name>
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua start my-dev-sandbox
|
||||
Start accepted
|
||||
@@ -198,6 +212,7 @@ cua stop <name>
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua stop my-dev-sandbox
|
||||
stopping
|
||||
@@ -212,6 +227,7 @@ cua restart <name>
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua restart my-dev-sandbox
|
||||
restarting
|
||||
@@ -226,6 +242,7 @@ cua delete <name>
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua delete old-test-sandbox
|
||||
Sandbox deletion initiated: deleting
|
||||
@@ -247,6 +264,7 @@ cua open <name>
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ cua vnc my-dev-sandbox
|
||||
Opening NoVNC: https://my-dev-sandbox.sandbox.cua.ai/vnc.html?autoconnect=true&password=...
|
||||
@@ -254,7 +272,6 @@ Opening NoVNC: https://my-dev-sandbox.sandbox.cua.ai/vnc.html?autoconnect=true&p
|
||||
|
||||
This command automatically opens your default browser to the VNC interface with the correct password pre-filled.
|
||||
|
||||
|
||||
## Global Options
|
||||
|
||||
### Help
|
||||
@@ -273,18 +290,21 @@ cua list --help
|
||||
The CLI provides clear error messages for common issues:
|
||||
|
||||
### Authentication Errors
|
||||
|
||||
```bash
|
||||
$ cua list
|
||||
Unauthorized. Try 'cua auth login' again.
|
||||
```
|
||||
|
||||
### Sandbox Not Found
|
||||
|
||||
```bash
|
||||
$ cua start nonexistent-sandbox
|
||||
Sandbox not found
|
||||
```
|
||||
|
||||
### Invalid Configuration
|
||||
|
||||
```bash
|
||||
$ cua create --os invalid --size small --region north-america
|
||||
Invalid request or unsupported configuration
|
||||
@@ -293,6 +313,7 @@ Invalid request or unsupported configuration
|
||||
## Tips and Best Practices
|
||||
|
||||
### 1. Use Descriptive Sandbox Names
|
||||
|
||||
```bash
|
||||
# Good
|
||||
cua create --os linux --size small --region north-america
|
||||
@@ -304,6 +325,7 @@ cua list # Check the generated name
|
||||
```
|
||||
|
||||
### 2. Environment Management
|
||||
|
||||
```bash
|
||||
# Set up your project with API key
|
||||
cd my-project
|
||||
@@ -312,6 +334,7 @@ cua auth env
|
||||
```
|
||||
|
||||
### 3. Quick Sandbox Access
|
||||
|
||||
```bash
|
||||
# Create aliases for frequently used sandboxes
|
||||
alias dev-sandbox="cua vnc my-development-sandbox"
|
||||
@@ -319,6 +342,7 @@ alias prod-sandbox="cua vnc my-production-sandbox"
|
||||
```
|
||||
|
||||
### 4. Monitoring Provisioning
|
||||
|
||||
```bash
|
||||
# For sandboxes that need provisioning time
|
||||
cua create --os windows --size large --region europe
|
||||
|
||||
@@ -34,16 +34,19 @@ cua sb list
|
||||
## Use Cases
|
||||
|
||||
### Development Workflow
|
||||
|
||||
- Quickly spin up cloud sandboxes for testing
|
||||
- Manage multiple sandboxes across different regions
|
||||
- Integrate with CI/CD pipelines
|
||||
|
||||
### Team Collaboration
|
||||
|
||||
- Share sandbox configurations and access
|
||||
- Standardize development environments
|
||||
- Quick onboarding for new team members
|
||||
|
||||
### Automation
|
||||
|
||||
- Script sandbox provisioning and management
|
||||
- Integrate with deployment workflows
|
||||
- Automate environment setup
|
||||
|
||||
@@ -11,24 +11,21 @@ import { Callout } from 'fumadocs-ui/components/callout';
|
||||
The fastest way to install the CUA CLI is using our installation scripts:
|
||||
|
||||
<Tabs items={['macOS / Linux', 'Windows']}>
|
||||
<Tab value="macOS / Linux">
|
||||
```bash
|
||||
curl -LsSf https://cua.ai/cli/install.sh | sh
|
||||
```
|
||||
</Tab>
|
||||
<Tab value="macOS / Linux">
```bash
curl -LsSf https://cua.ai/cli/install.sh | sh
```
</Tab>
|
||||
<Tab value="Windows">
|
||||
```powershell
|
||||
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
|
||||
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
These scripts will automatically:
|
||||
|
||||
1. Install [Bun](https://bun.sh) (a fast JavaScript runtime)
|
||||
2. Install the CUA CLI via `bun add -g @trycua/cli`
|
||||
|
||||
<Callout type="info">
|
||||
The installation scripts will automatically detect your system and install the appropriate binary to your PATH.
|
||||
The installation scripts will automatically detect your system and install the appropriate binary
|
||||
to your PATH.
|
||||
</Callout>
|
||||
|
||||
## Alternative: Install with Bun
|
||||
@@ -44,8 +41,8 @@ bun add -g @trycua/cli
|
||||
```
|
||||
|
||||
<Callout type="info">
|
||||
Using Bun provides faster installation and better performance compared to npm.
|
||||
If you don't have Bun installed, the first command will install it for you.
|
||||
Using Bun provides faster installation and better performance compared to npm. If you don't have
|
||||
Bun installed, the first command will install it for you.
|
||||
</Callout>
|
||||
|
||||
## Verify Installation
|
||||
@@ -76,40 +73,21 @@ To update to the latest version:
|
||||
|
||||
<Tabs items={['Script Install', 'npm Install']}>
|
||||
<Tab value="Script Install">
|
||||
Re-run the installation script:
|
||||
```bash
|
||||
# macOS/Linux
|
||||
curl -LsSf https://cua.ai/cli/install.sh | sh
|
||||
|
||||
# Windows
|
||||
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
|
||||
```
|
||||
</Tab>
|
||||
<Tab value="npm Install">
|
||||
```bash
|
||||
npm update -g @trycua/cli
|
||||
Re-run the installation script:

```bash
# macOS/Linux
curl -LsSf https://cua.ai/cli/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
|
||||
```
|
||||
</Tab>
|
||||
<Tab value="npm Install">
```bash
npm update -g @trycua/cli
```
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Uninstalling
|
||||
|
||||
<Tabs items={['Script Install', 'npm Install']}>
|
||||
<Tab value="Script Install">
|
||||
Remove the binary from your PATH:
|
||||
```bash
|
||||
# macOS/Linux
|
||||
rm $(which cua)
|
||||
|
||||
# Windows
|
||||
# Remove from your PATH or delete the executable
|
||||
```
|
||||
</Tab>
|
||||
<Tab value="npm Install">
|
||||
```bash
|
||||
npm uninstall -g @trycua/cli
|
||||
```
|
||||
Remove the binary from your PATH: ```bash # macOS/Linux rm $(which cua) # Windows # Remove from
|
||||
your PATH or delete the executable ```
|
||||
</Tab>
|
||||
<Tab value="npm Install">```bash npm uninstall -g @trycua/cli ```</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Troubleshooting
|
||||
@@ -128,17 +106,12 @@ If you encounter permission issues during installation:

<Tabs items={['macOS / Linux', 'Windows']}>
<Tab value="macOS / Linux">
Try running with sudo (not recommended for the curl method):
```bash
# If using npm
sudo npm install -g @trycua/cli
```
Try running with sudo (not recommended for the curl method): ```bash # If using npm sudo npm
install -g @trycua/cli ```
</Tab>
<Tab value="Windows">
Run PowerShell as Administrator:
```powershell
# Right-click PowerShell and "Run as Administrator"
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
Run PowerShell as Administrator: ```powershell # Right-click PowerShell and "Run as
Administrator" powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
```
</Tab>
</Tabs>

@@ -30,13 +30,15 @@ To use with Claude Desktop, add an entry to your Claude Desktop configuration (`
If you're working with the CUA source code:

**Standard VM Mode:**

```json
{
"mcpServers": {
"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash", "-lc",
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -45,13 +47,15 @@ If you're working with the CUA source code:
```

**Host Computer Control Mode:**

```json
{
"mcpServers": {
"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash", "-lc",
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -62,6 +66,7 @@ If you're working with the CUA source code:
**Note**: Replace `/path/to/cua` with the absolute path to your CUA repository directory.

**⚠️ Host Computer Control Setup**: When using `CUA_USE_HOST_COMPUTER_SERVER='true'`, you must also:

1. Install computer server dependencies: `python3 -m pip install uvicorn fastapi`
2. Install the computer server: `python3 -m pip install -e libs/python/computer-server --break-system-packages`
3. Start the computer server: `python -m computer_server --log-level debug`

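The `args` entries in the configurations above chain several `export` statements in front of the start script. As a minimal sketch, the same entry could be generated from a dictionary of variables. The helper name `build_mcp_entry` is hypothetical and not part of CUA. `shlex.quote` only adds quotes when a value needs them, so simple values come out unquoted, which the shell treats the same way as the single-quoted form shown in the docs.

```python
import json
import shlex
from typing import Mapping


def build_mcp_entry(repo_path: str, env: Mapping[str, str]) -> dict:
    """Build a `cua-agent` entry shaped like the JSON configs above (illustrative helper)."""
    # One `export NAME=value` per variable, each value shell-quoted if necessary.
    exports = "; ".join(f"export {name}={shlex.quote(value)}" for name, value in env.items())
    script = shlex.quote(f"{repo_path}/libs/python/mcp-server/scripts/start_mcp_server.sh")
    command = f"{exports}; {script}" if exports else script
    return {
        "mcpServers": {
            "cua-agent": {
                "command": "/usr/bin/env",
                "args": ["bash", "-lc", command],
            }
        }
    }


if __name__ == "__main__":
    entry = build_mcp_entry(
        "/path/to/cua",  # placeholder: replace with the absolute path to your checkout
        {"CUA_USE_HOST_COMPUTER_SERVER": "true", "CUA_MAX_IMAGES": "1"},
    )
    print(json.dumps(entry, indent=2))
```

Generating the entry this way keeps values containing spaces or quotes from breaking the `bash -lc` command string.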
@@ -4,18 +4,19 @@ title: Configuration

The server is configured using environment variables (can be set in the Claude Desktop config):

| Variable | Description | Default |
|----------|-------------|---------|
| `CUA_MODEL_NAME` | Model string (e.g., "anthropic/claude-sonnet-4-20250514", "anthropic/claude-3-5-sonnet-20240620", "openai/computer-use-preview", "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", "omniparser+litellm/gpt-4o", "omniparser+ollama_chat/gemma3") | anthropic/claude-sonnet-4-20250514 |
| `ANTHROPIC_API_KEY` | Your Anthropic API key (required for Anthropic models) | None |
| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
| `CUA_USE_HOST_COMPUTER_SERVER` | Target your local desktop instead of a VM. Set to "true" to use your host system. **Warning:** AI models may perform risky actions. | false |

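To make the defaults in the table concrete, here is a sketch of reading these variables in Python. The `ServerConfig` dataclass and `load_config` function are illustrative names, not the server's actual code, and the case-insensitive comparison against "true" is an assumption about how the flag might be parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    model_name: str
    anthropic_api_key: Optional[str]
    max_images: int
    use_host_computer_server: bool


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read the documented variables, falling back to the defaults from the table."""
    return ServerConfig(
        model_name=env.get("CUA_MODEL_NAME", "anthropic/claude-sonnet-4-20250514"),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),  # no default: None when unset
        max_images=int(env.get("CUA_MAX_IMAGES", "3")),
        # Assumption: only the string "true" (any casing) enables host control.
        use_host_computer_server=env.get("CUA_USE_HOST_COMPUTER_SERVER", "false").lower() == "true",
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration before putting it into the Claude Desktop config.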
## Model Configuration

The `CUA_MODEL_NAME` environment variable supports various model providers through LiteLLM integration:

### Supported Providers

- **Anthropic**: `anthropic/claude-sonnet-4-20250514`, `anthropic/claude-3-5-sonnet-20240620`, `anthropic/claude-3-haiku-20240307`
- **OpenAI**: `openai/computer-use-preview`, `openai/gpt-4o`
- **Local Models**: `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
@@ -25,6 +26,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
### Example Configurations

**Claude Desktop Configuration:**

```json
{
"mcpServers": {
@@ -43,6 +45,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
```

**Local Model Configuration:**

```json
{
"mcpServers": {
@@ -61,6 +64,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
## Session Management Configuration

The MCP server automatically manages sessions with the following defaults:

- **Max Concurrent Sessions**: 10
- **Session Timeout**: 10 minutes of inactivity
- **Computer Pool Size**: 5 instances

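The session limit and inactivity timeout listed above can be pictured with a small bookkeeping sketch. This is not the MCP server's implementation: `SessionRegistry` and its methods are hypothetical names, the computer pool is not modelled, and only the two numeric defaults (10 sessions, 10 minutes) come from the documentation.

```python
import time
from typing import Callable, Dict, List

MAX_CONCURRENT_SESSIONS = 10       # documented default
SESSION_TIMEOUT_SECONDS = 10 * 60  # documented default: 10 minutes of inactivity


class SessionRegistry:
    """Tracks last-activity times and drops sessions that have been idle too long."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record activity, creating the session if there is room for it."""
        self.expire_idle()
        is_new = session_id not in self._last_seen
        if is_new and len(self._last_seen) >= MAX_CONCURRENT_SESSIONS:
            raise RuntimeError("session limit reached")
        self._last_seen[session_id] = self._clock()

    def expire_idle(self) -> List[str]:
        """Remove and return the sessions idle for at least the timeout."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen >= SESSION_TIMEOUT_SECONDS]
        for sid in expired:
            del self._last_seen[sid]
        return expired

    def active(self) -> List[str]:
        return sorted(self._last_seen)
```

Injecting the clock keeps the timeout logic testable without waiting ten real minutes.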
@@ -58,7 +58,8 @@ If you're working with the CUA source code directly (like in the CUA repository)
"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash", "-lc",
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -69,16 +70,19 @@ If you're working with the CUA source code directly (like in the CUA repository)
**For host computer control** (development setup):

1. **Install Computer Server Dependencies**:

```bash
python3 -m pip install uvicorn fastapi
python3 -m pip install -e libs/python/computer-server --break-system-packages
```

2. **Start the Computer Server**:

```bash
cd /path/to/cua
python -m computer_server --log-level debug
```

This will start the computer server on `http://localhost:8000` that controls your actual desktop.

3. **Configure Claude Desktop**:
@@ -88,7 +92,8 @@ If you're working with the CUA source code directly (like in the CUA repository)
"cua-agent": {
"command": "/usr/bin/env",
"args": [
"bash", "-lc",
"bash",
"-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -110,6 +115,7 @@ If you're working with the CUA source code directly (like in the CUA repository)
- Check logs for specific error messages

2. **"Missing Anthropic API Key"** - Add your API key to the configuration:

```json
"env": {
"ANTHROPIC_API_KEY": "your-api-key-here"
@@ -130,6 +136,7 @@ If you're working with the CUA source code directly (like in the CUA repository)
- **Image size errors**: Use `CUA_MAX_IMAGES='1'` to reduce image context size

**Viewing Logs:**

```bash
tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
```

@@ -45,17 +45,20 @@ The MCP server supports multi-client sessions with automatic resource management
## Usage Examples

### Basic Task Execution

```
"Open Chrome and navigate to github.com"
"Create a folder called 'Projects' on my desktop"
```

### Multi-Task Execution

```
"Run these tasks: 1) Open Finder, 2) Navigate to Documents, 3) Create a new folder called 'Work'"
```

### Session Management

```
"Take a screenshot of the current screen"
"Show me the session statistics"

@@ -16,27 +16,35 @@ Claude will automatically use your CUA agent to perform these tasks.
## Advanced Features

### Progress Reporting

The MCP server provides real-time progress updates during task execution:

- Task progress is reported as percentages (0-100%)
- Multi-task operations show progress for each individual task
- Progress updates are streamed to the MCP client for real-time feedback

### Error Handling

Robust error handling ensures reliable operation:

- Failed tasks return error messages with screenshots when possible
- Session state is preserved even when individual tasks fail
- Automatic cleanup prevents resource leaks
- Detailed error logging for troubleshooting

### Concurrent Task Execution

For improved performance, multiple tasks can run concurrently:

- Set `concurrent=true` in `run_multi_cua_tasks` for parallel execution
- Each task runs in its own context with isolated state
- Progress tracking works for both sequential and concurrent modes
- Resource pooling ensures efficient computer instance usage

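The difference between the sequential and concurrent modes can be sketched with `asyncio`. This is an illustration of the general pattern only, not the body of `run_multi_cua_tasks`: the `run_tasks` function and its `run_one` callback are hypothetical, and turning failures into `"error: ..."` strings is an assumption standing in for the server's own error reporting.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_tasks(
    tasks: List[str],
    run_one: Callable[[str], Awaitable[str]],
    concurrent: bool = False,
) -> List[str]:
    """Run each task description through `run_one`, either in order or all at once."""
    if concurrent:
        # gather() keeps results in input order; return_exceptions=True lets the
        # remaining tasks finish when one of them raises.
        outcomes = await asyncio.gather(*(run_one(t) for t in tasks), return_exceptions=True)
        return [f"error: {o}" if isinstance(o, Exception) else o for o in outcomes]

    results: List[str] = []
    for task in tasks:
        try:
            results.append(await run_one(task))
        except Exception as exc:  # one failed task does not stop the ones after it
            results.append(f"error: {exc}")
    return results
```

In both modes a failing task yields an error entry while the others still complete, which mirrors the documented behaviour that session state survives individual task failures.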
### Session Management

Multi-client support with automatic resource management:

- Each client gets isolated sessions with separate computer instances
- Sessions automatically clean up after 10 minutes of inactivity
- Resource pooling prevents resource exhaustion
@@ -55,7 +63,8 @@ No additional configuration is needed - this is the default behavior.
### Option: Targeting Your Local Desktop

<Callout type="warn">
**Warning:** When targeting your local system, AI models have direct access to your desktop and may perform risky actions. Use with caution.
**Warning:** When targeting your local system, AI models have direct access to your desktop and
may perform risky actions. Use with caution.
</Callout>

To have the MCP server control your local desktop instead of a VM:
@@ -89,6 +98,7 @@ Add the `CUA_USE_HOST_COMPUTER_SERVER` environment variable to your MCP client c
}
}
```

</Tab>
<Tab value="Other MCP Clients">
Set the environment variable in your MCP client configuration:
@@ -98,6 +108,7 @@ Add the `CUA_USE_HOST_COMPUTER_SERVER` environment variable to your MCP client c
```

Then start your MCP client as usual.

</Tab>
</Tabs>

@@ -108,6 +119,7 @@ Now Claude will control your local desktop directly when you ask it to perform c
## Usage Examples

### Single Task Execution

```
"Open Safari and navigate to apple.com"
"Create a new folder on the desktop called 'My Projects'"
@@ -115,16 +127,19 @@ Now Claude will control your local desktop directly when you ask it to perform c
```

### Multi-Task Execution (Sequential)

```
"Run these tasks in order: 1) Open Finder, 2) Navigate to Documents folder, 3) Create a new folder called 'Work'"
```

### Multi-Task Execution (Concurrent)

```
"Run these tasks simultaneously: 1) Open Chrome, 2) Open Safari, 3) Open Finder"
```

### Session Management

```
"Show me the current session statistics"
"Take a screenshot using session abc123"
@@ -132,6 +147,7 @@ Now Claude will control your local desktop directly when you ask it to perform c
```

### Error Recovery

```
"Try to open a non-existent application and show me the error"
"Find all files with .tmp extension and delete them safely"
@@ -140,13 +156,15 @@ Now Claude will control your local desktop directly when you ask it to perform c
## First-time Usage Notes

**API Keys**: Ensure you have valid API keys:

- Add your Anthropic API key in the Claude Desktop config (as shown above)
- Or set it as an environment variable in your shell profile
- **Required**: The MCP server needs an API key to authenticate with the model provider

**Model Selection**: Choose the appropriate model for your needs:

- **Claude Sonnet 4**: Latest model with best performance (`anthropic/claude-sonnet-4-20250514`)
- **Claude 3.5 Sonnet**: Reliable performance (`anthropic/claude-3-5-sonnet-20240620`)
- **Computer-Use Preview**: Specialized for computer tasks (`openai/computer-use-preview`)
- **Local Models**: For privacy-sensitive environments
- **Ollama**: For offline usage

@@ -7,7 +7,11 @@ github:
---

<Callout>
A corresponding <a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">Python example</a> is available for this documentation.
A corresponding{' '}
<a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">
Python example
</a>{' '}
is available for this documentation.
</Callout>

## Overview

BIN docs/public/img/grounding-with-gemini3.gif (normal file, 5.2 MiB). Binary file not shown.
@@ -53,6 +53,10 @@ async def run_agent_example():
# == Omniparser + Any LLM ==
# model="omniparser+anthropic/claude-opus-4-20250514",
# model="omniparser+ollama_chat/gemma3:12b-it-q4_K_M",
# == Omniparser + Vertex AI Gemini 3 (with thinking_level) ==
# model="omni+vertex_ai/gemini-3-flash",
# thinking_level="high", # or "low"
# media_resolution="medium", # or "low" or "high"
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.DEBUG,