Reworked docs

This commit is contained in:
Dillon DuPont
2025-07-25 10:39:37 -04:00
parent 350795615f
commit 5d598fc027
52 changed files with 1454 additions and 2868 deletions

View File

@@ -1,41 +0,0 @@
---
title: Agent
description: Reference for the current version of the Agent library.
github:
- https://github.com/trycua/cua/tree/main/libs/python/agent
---
## ⚠️ 🚧 Under Construction 🚧 ⚠️
The Agent API reference documentation is currently under development.
## Overview
The Agent library provides programmatic interfaces for AI agent interactions.
## API Documentation
```python
# Import necessary components
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
# UI-TARS-1.5 agent for local execution with MLX
ComputerAgent(loop=AgentLoop.UITARS, model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-6bit"))
# OpenAI Computer-Use agent using OPENAI_API_KEY
ComputerAgent(loop=AgentLoop.OPENAI, model=LLM(provider=LLMProvider.OPENAI, name="computer-use-preview"))
# Anthropic Claude agent using ANTHROPIC_API_KEY
ComputerAgent(loop=AgentLoop.ANTHROPIC, model=LLM(provider=LLMProvider.ANTHROPIC))
# OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM
ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M"))
# OpenRouter example using OAICOMPAT provider
ComputerAgent(
loop=AgentLoop.OMNI,
model=LLM(
provider=LLMProvider.OAICOMPAT,
name="openai/gpt-4o-mini",
provider_base_url="https://openrouter.ai/api/v1"
),
api_key="your-openrouter-api-key"
)
```

View File

@@ -1,24 +0,0 @@
---
title: API Reference
description: Explore API reference for Cua services and libraries.
icon: CodeXml
---
## ⚠️ 🚧 Under Construction 🚧 ⚠️
Please note that the API Reference documentation is currently under construction. Some libraries have limited documentation, while others have none yet.
We're currently working on generating comprehensive class references and definitions for all libraries.
If you need to find anything specific and it's not here, you can visit the repository below to browse implementations.
<Card
title="Cua - GitHub"
icon={
<svg role="img" viewBox="0 0 24 24" fill="currentColor">
<path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path>
</svg>
}
href="https://github.com/trycua/cua/tree/main/libs">
Visit the repository that contains all libraries.
</Card>

View File

@@ -1,18 +0,0 @@
---
title: Lume
description: Reference for the current version of the Lume CLI.
github:
- https://github.com/trycua/cua/tree/main/libs/lume
---
## ⚠️ 🚧 Under Construction 🚧 ⚠️
The Lume API reference documentation is currently under development.
## Overview
The Lume CLI provides command line tools for managing virtual machines with Lume.
## API Documentation
Coming soon.

View File

@@ -1,6 +0,0 @@
{
"title": "API Reference",
"description": "API Reference",
"root": true,
"pages": ["index", "---", "..."]
}

View File

@@ -0,0 +1,38 @@
---
title: Agent Loops
description: Supported computer-using agent loops and models
---
An agent can be thought of as a loop - it generates actions, executes them, and repeats until done:
1. **Generate**: Your `model` generates `output_text`, `computer_call`, `function_call`
2. **Execute**: The `computer` safely executes those items
3. **Complete**: If the model has no more calls, it's done!
To run an agent loop, simply do:
```python
from agent2 import ComputerAgent
from computer import Computer
computer = Computer() # Connect to a c/ua container
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer]
)
prompt = "open github, navigate to trycua/cua"
async for result in agent.run(prompt):
print("Agent:", result["output"][-1]["content"][0]["text"])
```
We currently support 4 computer-using agent loops:
- Anthropic CUAs
- OpenAI CUA Preview
- UI-TARS 1.5
- Omniparser + LLMs
For a full list of supported models and configurations, see the [Supported Agents](./supported-agents) page.

View File

@@ -0,0 +1,52 @@
---
title: Agent Lifecycle
description: Agent callback lifecycle and hooks
---
# Callbacks
Callbacks provide hooks into the agent lifecycle for extensibility. They're called in a specific order during agent execution.
## Callback Lifecycle
### 1. `on_run_start(kwargs, old_items)`
Called once when agent run begins. Initialize tracking, logging, or state.
### 2. `on_run_continue(kwargs, old_items, new_items)` → bool
Called before each iteration. Return `False` to stop execution (e.g., budget limits).
### 3. `on_llm_start(messages)` → messages
Preprocess messages before LLM call. Use for PII anonymization, image retention.
### 4. `on_api_start(kwargs)`
Called before each LLM API call.
### 5. `on_api_end(kwargs, result)`
Called after each LLM API call completes.
### 6. `on_usage(usage)`
Called when usage information is received from LLM.
### 7. `on_llm_end(messages)` → messages
Postprocess messages after LLM call. Use for PII deanonymization.
### 8. `on_responses(kwargs, responses)`
Called when responses are received from agent loop.
### 9. Response-specific hooks:
- `on_text(item)` - Text messages
- `on_computer_call_start(item)` - Before computer actions
- `on_computer_call_end(item, result)` - After computer actions
- `on_function_call_start(item)` - Before function calls
- `on_function_call_end(item, result)` - After function calls
- `on_screenshot(screenshot, name)` - When screenshots are taken
### 10. `on_run_end(kwargs, old_items, new_items)`
Called when agent run completes. Finalize tracking, save trajectories.
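As a sketch of how these hooks fit together, here is a hypothetical step-limit callback (real callbacks subclass `AsyncCallbackHandler` from `agent2.callbacks.base`; the plain class below only illustrates the `on_run_start` / `on_run_continue` contract):

```python
import asyncio

class MaxStepsCallback:
    """Illustrative hook: stop the run after a fixed number of iterations."""
    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.steps = 0

    async def on_run_start(self, kwargs, old_items):
        self.steps = 0  # reset per run

    async def on_run_continue(self, kwargs, old_items, new_items) -> bool:
        self.steps += 1
        return self.steps <= self.max_steps  # returning False stops execution

async def demo():
    cb = MaxStepsCallback(max_steps=2)
    await cb.on_run_start({}, [])
    return [await cb.on_run_continue({}, [], []) for _ in range(3)]

results = asyncio.run(demo())  # [True, True, False]
```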
## Built-in Callbacks
- **ImageRetentionCallback**: Limits recent images in context
- **BudgetManagerCallback**: Stops execution when budget exceeded
- **TrajectorySaverCallback**: Saves conversation trajectories
- **LoggingCallback**: Logs agent activities

View File

@@ -0,0 +1,87 @@
---
title: Cost Optimization
description: Budget management and image retention for cost optimization
---
# Cost Optimization Callbacks
Optimize agent costs with budget management and image retention callbacks.
## Budget Manager Callbacks Example
```python
from agent2.callbacks import BudgetManagerCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[
BudgetManagerCallback(
max_budget=5.0, # $5 limit
reset_after_each_run=False,
raise_error=True
)
]
)
```
## Budget Manager Shorthand
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
max_trajectory_budget=5.0 # Auto-adds BudgetManagerCallback
)
```
**Or with options:**
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
max_trajectory_budget={"max_budget": 5.0, "raise_error": True}
)
```
## Image Retention Callbacks Example
```python
from agent2.callbacks import ImageRetentionCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[
ImageRetentionCallback(only_n_most_recent_images=3)
]
)
```
## Image Retention Shorthand
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
only_n_most_recent_images=3 # Auto-adds ImageRetentionCallback
)
```
## Combined Cost Optimization
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
max_trajectory_budget=5.0, # Budget limit
only_n_most_recent_images=3, # Image retention
trajectory_dir="trajectories" # Track spending
)
```
## Budget Manager Options
- `max_budget`: Dollar limit for trajectory
- `reset_after_each_run`: Reset the budget at the start of each run (default: True)
- `raise_error`: Raise exception vs. graceful stop (default: False)
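The accounting these options describe can be sketched like this (assumed behavior inferred from the options above, not the SDK source):

```python
class BudgetTracker:
    """Illustrative sketch of BudgetManagerCallback-style accounting:
    accumulate per-call costs and stop once the budget is exhausted."""
    def __init__(self, max_budget: float, reset_after_each_run: bool = True,
                 raise_error: bool = False):
        self.max_budget = max_budget
        self.reset_after_each_run = reset_after_each_run
        self.raise_error = raise_error
        self.spent = 0.0

    def on_run_start(self):
        if self.reset_after_each_run:
            self.spent = 0.0

    def on_usage(self, usage: dict):
        self.spent += usage.get("response_cost", 0.0)

    def on_run_continue(self) -> bool:
        if self.spent >= self.max_budget:
            if self.raise_error:
                raise RuntimeError(f"Budget exceeded: ${self.spent:.2f}")
            return False  # graceful stop
        return True

tracker = BudgetTracker(max_budget=5.0)
tracker.on_run_start()
tracker.on_usage({"response_cost": 3.0})
ok_first = tracker.on_run_continue()   # still under budget
tracker.on_usage({"response_cost": 2.5})
ok_second = tracker.on_run_continue()  # $5.50 >= $5.00, stop
```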

View File

@@ -0,0 +1,88 @@
---
title: Logging
description: Agent logging and custom logger implementation
---
# Logging Callback
Built-in logging callback and custom logger creation for agent monitoring.
## Callbacks Example
```python
from agent2.callbacks import LoggingCallback
import logging
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[
LoggingCallback(
logger=logging.getLogger("cua"),
level=logging.INFO
)
]
)
```
## Shorthand
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
verbosity=logging.INFO # Auto-adds LoggingCallback
)
```
## Custom Logger
Create custom loggers by extending AsyncCallbackHandler:
```python
from agent2.callbacks.base import AsyncCallbackHandler
import logging
class CustomLogger(AsyncCallbackHandler):
def __init__(self, logger_name="agent"):
self.logger = logging.getLogger(logger_name)
self.logger.setLevel(logging.INFO)
# Add console handler
handler = logging.StreamHandler()
formatter = logging.Formatter(
'%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
handler.setFormatter(formatter)
self.logger.addHandler(handler)
async def on_run_start(self, kwargs, old_items):
self.logger.info(f"Agent run started with model: {kwargs.get('model')}")
async def on_computer_call_start(self, item):
action = item.get('action', {})
self.logger.info(f"Computer action: {action.get('type')}")
async def on_usage(self, usage):
cost = usage.get('response_cost', 0)
self.logger.info(f"API call cost: ${cost:.4f}")
async def on_run_end(self, kwargs, old_items, new_items):
self.logger.info("Agent run completed")
# Use custom logger
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[CustomLogger("my_agent")]
)
```
## Available Hooks
Log any agent event using these callback methods:
- `on_run_start/end` - Run lifecycle
- `on_computer_call_start/end` - Computer actions
- `on_api_start/end` - LLM API calls
- `on_usage` - Cost tracking
- `on_screenshot` - Screenshot events

View File

@@ -0,0 +1,11 @@
{
"title": "Callbacks",
"description": "Extending agents with callback hooks and built-in handlers",
"pages": [
"agent-lifecycle",
"trajectories",
"logging",
"cost-saving",
"pii-anonymization"
]
}

View File

@@ -0,0 +1,12 @@
---
title: PII Anonymization
description: PII anonymization and data protection callbacks
---
# PII Anonymization Callback
🚧 Coming Soon 🚧
🔒 🕵️ 🛡️ 📝 ✨
🚀 Stay tuned for PII anonymization features! 🚀

View File

@@ -0,0 +1,51 @@
---
title: Trajectories
description: Recording and viewing agent conversation trajectories
---
# Trajectory Saving Callback
The TrajectorySaverCallback records complete agent conversations including messages, actions, and screenshots for debugging and analysis.
## Callbacks Example
```python
from agent2.callbacks import TrajectorySaverCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[
TrajectorySaverCallback(
trajectory_dir="my_trajectories",
save_screenshots=True
)
]
)
```
## Shorthand
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
trajectory_dir="trajectories" # Auto-adds TrajectorySaverCallback
)
```
## View Trajectories Online
View trajectories in the browser at:
**[trycua.com/trajectory-viewer](http://trycua.com/trajectory-viewer)**
The viewer provides:
- Interactive conversation replay
- Screenshot galleries
- No data collection
## Trajectory Structure
Each trajectory contains:
- **metadata.json**: Run info, timestamps, usage stats (`total_tokens`, `response_cost`)
- **turn_000/**: Turn-by-turn conversation history (API calls, responses, computer calls, screenshots)
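For programmatic analysis, you can read a saved trajectory's metadata directly. The snippet below assumes the `metadata.json` layout sketched above (a `usage` object with `total_tokens` and `response_cost`); the exact schema is an assumption, so check a saved trajectory before relying on it:

```python
import json
import pathlib
import tempfile

def total_cost(trajectory_dir: str) -> float:
    """Read metadata.json from a saved trajectory and return its response_cost
    (field names assumed from the structure described above)."""
    meta = json.loads((pathlib.Path(trajectory_dir) / "metadata.json").read_text())
    return meta.get("usage", {}).get("response_cost", 0.0)

# Demo against a synthetic trajectory directory
tmp = tempfile.mkdtemp()
(pathlib.Path(tmp) / "metadata.json").write_text(
    json.dumps({"usage": {"total_tokens": 1234, "response_cost": 0.0421}})
)
cost = total_cost(tmp)
```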

View File

@@ -0,0 +1,84 @@
---
title: Chat History
description: Managing conversation history and message arrays
---
Managing conversation history is essential for multi-turn agent interactions. The agent maintains a messages array that tracks the entire conversation flow.
## Managing History
### Continuous Conversation
```python
history = []
while True:
user_input = input("> ")
history.append({"role": "user", "content": user_input})
async for result in agent.run(history, stream=False):
history += result["output"]
```
## Message Array Structure
The messages array contains different types of messages that represent the conversation state:
```python
messages = [
{
"role": "user",
"content": "go to trycua on gh"
},
{
"summary": [
{
"text": "Searching Firefox for Trycua GitHub",
"type": "summary_text"
}
],
"type": "reasoning"
},
{
"action": {
"text": "Trycua GitHub",
"type": "type"
},
"call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
"status": "completed",
"type": "computer_call"
},
{
"type": "computer_call_output",
"call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
"output": {
"type": "input_image",
"image_url": "[omitted]"
}
}
]
```
## Message Types
- **user**: User input messages
- **computer_call**: Computer actions (click, type, keypress, etc.)
- **computer_call_output**: Results from computer actions (usually screenshots)
- **function_call**: Function calls (e.g., `computer.call`)
- **function_call_output**: Results from function calls
- **reasoning**: Agent's internal reasoning and planning
- **message**: Agent text responses
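A small helper can filter the array by these types, for example to pull out only the agent's text replies (a sketch against the message shapes shown above; field names follow the sample array):

```python
def agent_texts(messages):
    """Collect the text of all `message`-type items from a messages array."""
    texts = []
    for m in messages:
        if m.get("type") == "message":
            for part in m.get("content", []):
                if "text" in part:
                    texts.append(part["text"])
    return texts

sample = [
    {"role": "user", "content": "go to trycua on gh"},
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "Searching"}]},
    {"type": "message", "content": [{"type": "output_text", "text": "Opened trycua/cua."}]},
]
replies = agent_texts(sample)
```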
### Memory Management
For long conversations, consider using the `only_n_most_recent_images` parameter to manage memory:
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
only_n_most_recent_images=3
)
```
This automatically removes old images from the conversation history to prevent context window overflow.
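The pruning can be pictured like this (an illustrative sketch of the assumed behavior, not the SDK's implementation: blank out image payloads in all but the N most recent screenshots):

```python
def prune_images(messages, keep_n):
    """Keep only the last keep_n screenshot payloads; omit the rest."""
    image_idxs = [i for i, m in enumerate(messages)
                  if m.get("type") == "computer_call_output"
                  and m.get("output", {}).get("type") == "input_image"]
    for i in (image_idxs[:-keep_n] if keep_n else image_idxs):
        messages[i]["output"]["image_url"] = "[omitted]"
    return messages

history = [
    {"type": "computer_call_output", "output": {"type": "input_image", "image_url": "img1"}},
    {"type": "computer_call_output", "output": {"type": "input_image", "image_url": "img2"}},
    {"type": "computer_call_output", "output": {"type": "input_image", "image_url": "img3"}},
]
pruned = prune_images(history, keep_n=2)  # only img1 is dropped
```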

View File

@@ -0,0 +1,12 @@
{
"title": "Agent SDK",
"description": "Build computer-using agents with the Agent SDK",
"pages": [
"agent-loops",
"supported-agents",
"chat-history",
"callbacks",
"sandboxed-tools",
"migration-guide"
]
}

View File

@@ -0,0 +1,124 @@
---
title: Migration Guide
---
This guide lists **breaking changes** when migrating from the original `ComputerAgent` (v0.3.x) to the rewritten `ComputerAgent` (v0.4.x) and shows old vs new usage for all four agent loops.
## Breaking Changes
- **Initialization:**
- `ComputerAgent` (v0.4.x) uses `model` as a string (e.g. "anthropic/claude-3-5-sonnet-20241022") instead of `LLM` and `AgentLoop` objects.
- `tools` is a list (can include multiple computers and decorated functions).
- `callbacks` are now first-class for extensibility (image retention, budget, trajectory, logging, etc).
- **No explicit `loop` parameter:**
- Loop is inferred from the `model` string (e.g. `anthropic/`, `openai/`, `omniparser+`, `ui-tars`).
- **No explicit `computer` parameter:**
- Computers are added to `tools` list.
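The loop inference can be pictured roughly like this (illustrative only, based on the prefixes listed above; the real inference logic lives inside the SDK and may differ):

```python
def infer_loop(model: str) -> str:
    """Guess the agent loop from the model string, mirroring the documented prefixes."""
    if model.startswith("omniparser+"):
        return "omni"
    if "ui-tars" in model.lower():
        return "uitars"
    if model.startswith("anthropic/"):
        return "anthropic"
    if model.startswith("openai/"):
        return "openai"
    return "unknown"

loop_a = infer_loop("anthropic/claude-3-5-sonnet-20241022")
loop_b = infer_loop("omniparser+ollama_chat/gemma3")
loop_c = infer_loop("huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B")
```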
---
## Usage Examples: Old vs New
### 1. Anthropic Loop
**Old:**
```python
async with Computer() as computer:
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.ANTHROPIC,
model=LLM(provider=LLMProvider.ANTHROPIC)
)
async for result in agent.run("Take a screenshot"):
print(result)
```
**New:**
```python
async with Computer() as computer:
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer]
)
messages = [{"role": "user", "content": "Take a screenshot"}]
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```
### 2. OpenAI Loop
**Old:**
```python
async with Computer() as computer:
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.OPENAI,
model=LLM(provider=LLMProvider.OPENAI)
)
async for result in agent.run("Take a screenshot"):
print(result)
```
**New:**
```python
async with Computer() as computer:
agent = ComputerAgent(
model="openai/computer-use-preview",
tools=[computer]
)
messages = [{"role": "user", "content": "Take a screenshot"}]
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```
### 3. UI-TARS Loop
**Old:**
```python
async with Computer() as computer:
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.UITARS,
model=LLM(provider=LLMProvider.OAICOMPAT, name="ByteDance-Seed/UI-TARS-1.5-7B", provider_base_url="https://.../v1")
)
async for result in agent.run("Take a screenshot"):
print(result)
```
**New:**
```python
async with Computer() as computer:
agent = ComputerAgent(
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B",
tools=[computer]
)
messages = [{"role": "user", "content": "Take a screenshot"}]
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```
### 4. Omni Loop
**Old:**
```python
async with Computer() as computer:
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.OMNI,
model=LLM(provider=LLMProvider.OLLAMA, name="gemma3")
)
async for result in agent.run("Take a screenshot"):
print(result)
```
**New:**
```python
async with Computer() as computer:
agent = ComputerAgent(
model="omniparser+ollama_chat/gemma3",
tools=[computer]
)
messages = [{"role": "user", "content": "Take a screenshot"}]
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```

View File

@@ -0,0 +1,31 @@
---
title: Sandboxed Tools
slug: sandboxed-tools
---
The Agent SDK supports defining custom Python tools that run securely in sandboxed environments on remote C/ua Computers. This enables safe execution of user-defined functions, isolation of dependencies, and robust automation workflows.
## Example: Defining a Sandboxed Tool
```python
from computer.helpers import sandboxed
@sandboxed()
def read_file(location: str) -> str:
"""Read contents of a file"""
with open(location, 'r') as f:
return f.read()
```
You can then register this as a tool for your agent:
```python
from agent2 import ComputerAgent
from computer import Computer
computer = Computer(...)
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20240620",
tools=[computer, read_file],
)
```

View File

@@ -0,0 +1,32 @@
---
title: Supported Agents
---
This page lists all supported agent loops and their compatible models/configurations in c/ua.
All agent loops are compatible with any LLM provider supported by LiteLLM.
## Anthropic CUAs
- Claude 4: `claude-opus-4-20250514`, `claude-sonnet-4-20250514`
- Claude 3.7: `claude-3-7-sonnet-20250219`
- Claude 3.5: `claude-3-5-sonnet-20240620`
## OpenAI CUA Preview
- Computer-use-preview: `computer-use-preview`
## UI-TARS 1.5
- `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
- `huggingface/ByteDance-Seed/UI-TARS-1.5-7B` (requires TGI endpoint)
## Omniparser + LLMs
- `omniparser+vertex_ai/gemini-pro`
- `omniparser+openai/gpt-4o`
- Any LiteLLM-compatible model combined with Omniparser
---
For details on agent loop behavior and usage, see [Agent Loops](./agent-loops).

View File

@@ -1,77 +0,0 @@
---
title: Compatibility
description: Compatibility information for running cua services.
icon: MonitorCheck
---
# Host OS Compatibility
_This section shows compatibility based on your **host operating system** (the OS you're running Cua on)._
## macOS Host
| Installation Method | Requirements | Lume | Cloud | Notes |
| ------------------------ | ------------------------- | ------- | ------- | --------------------------- |
| **playground-docker.sh** | Docker Desktop | ✅ Full | ✅ Full | Recommended for quick setup |
| **Dev Container** | VS Code/WindSurf + Docker | ✅ Full | ✅ Full | Best for development |
| **PyPI packages** | Python 3.12+ | ✅ Full | ✅ Full | Most flexible |
### macOS Host Requirements:
- macOS 15+ (Sequoia) for local VM support
- Apple Silicon (M1/M2/M3/M4) recommended for best performance
- Docker Desktop for containerized installations
## Ubuntu/Linux Host
| Installation Method | Requirements | Lume | Cloud | Notes |
| ------------------------ | ------------------------- | ------- | ------- | --------------------------- |
| **playground-docker.sh** | Docker Engine | ✅ Full | ✅ Full | Recommended for quick setup |
| **Dev Container** | VS Code/WindSurf + Docker | ✅ Full | ✅ Full | Best for development |
| **PyPI packages** | Python 3.12+ | ✅ Full | ✅ Full | Most flexible |
### Ubuntu/Linux Host Requirements:
- Ubuntu 20.04+ or equivalent Linux distribution
- Docker Engine or Docker Desktop
- Python 3.12+ for PyPI installation
## Windows Host
| Installation Method | Requirements | Lume | Winsandbox | Cloud | Notes |
| ------------------------ | -------------------------------- | ---------------- | ---------------- | ------- | ------------- |
| **playground-docker.sh** | Docker Desktop + WSL2 | ❌ Not supported | ❌ Not supported | ✅ Full | Requires WSL2 |
| **Dev Container** | VS Code/WindSurf + Docker + WSL2 | ❌ Not supported | ❌ Not supported | ✅ Full | Requires WSL2 |
| **PyPI packages** | Python 3.12+ | ❌ Not supported | ✅ Full | ✅ Full | |
### Windows Host Requirements:
- Windows 10/11 with WSL2 enabled for shell script execution
- Docker Desktop with WSL2 backend
- Windows Sandbox feature enabled (for Winsandbox support)
- Python 3.12+ installed in WSL2 or Windows
- **Note**: Lume CLI is not available on Windows - use Cloud or Winsandbox providers
---
# VM Emulation Support
_This section shows which **virtual machine operating systems** each provider can emulate._
| Provider | macOS VM | Ubuntu/Linux VM | Windows VM | Notes |
| -------------- | ---------------- | ------------------ | ------------------ | ------------------------------------------------------ |
| **Lume** | ✅ Full support | ⚠️ Limited support | ⚠️ Limited support | macOS: native; Ubuntu/Linux/Windows: need custom image |
| **Cloud** | 🚧 Coming soon | ✅ Full support | 🚧 Coming soon | Currently Ubuntu only, macOS/Windows in development |
| **Winsandbox** | ❌ Not supported | ❌ Not supported | ✅ Windows only | Windows 10/11 environments only |
# Model Provider Compatibility
_This section shows which **AI model providers** are supported on each host operating system._
| Provider | macOS Host | Ubuntu/Linux Host | Windows Host | Notes |
| --------------------- | --------------- | ----------------- | ---------------- | ----------------------------------------------- |
| **Anthropic** | ✅ Full support | ✅ Full support | ✅ Full support | Cloud-based API |
| **OpenAI** | ✅ Full support | ✅ Full support | ✅ Full support | Cloud-based API |
| **Ollama** | ✅ Full support | ✅ Full support | ✅ Full support | Local model serving |
| **OpenAI Compatible** | ✅ Full support | ✅ Full support | ✅ Full support | Any OpenAI-compatible API endpoint |
| **MLX VLM** | ✅ macOS only | ❌ Not supported | ❌ Not supported | Apple Silicon required. PyPI installation only. |

View File

@@ -1,100 +1,115 @@
---
title: Commands
description: Computer commands and interface methods
---
This page describes the set of supported **commands** you can use to control a C/ua Computer directly via the Python SDK.
These commands map to the same actions available in the [Computer Server API Commands Reference](../libraries/computer-server/Commands), and provide low-level, async access to system operations from your agent or automation code.
## Shell Actions
Execute shell commands and get detailed results:
```python
# Run shell command
result = await computer.interface.run_command(cmd)
# result.stdout, result.stderr, result.returncode
```
## Mouse Actions
Precise mouse control and interaction:
```python
# Basic clicks
await computer.interface.left_click(x, y) # Left click at coordinates
await computer.interface.right_click(x, y) # Right click at coordinates
await computer.interface.double_click(x, y) # Double click at coordinates
# Cursor movement and dragging
await computer.interface.move_cursor(x, y) # Move cursor to coordinates
await computer.interface.drag_to(x, y, duration) # Drag to coordinates
await computer.interface.get_cursor_position() # Get current cursor position
# Advanced mouse control
await computer.interface.mouse_down(x, y, button="left") # Press and hold a mouse button
await computer.interface.mouse_up(x, y, button="left") # Release a mouse button
```
## Keyboard Actions
Text input and key combinations:
```python
# Text input
await computer.interface.type_text("Hello") # Type text
await computer.interface.press_key("enter") # Press a single key
# Key combinations and advanced control
await computer.interface.hotkey("command", "c") # Press key combination
await computer.interface.key_down("command") # Press and hold a key
await computer.interface.key_up("command") # Release a key
```
## Scrolling Actions
Mouse wheel and scrolling control:
```python
# Scrolling
await computer.interface.scroll(x, y) # Scroll the mouse wheel
await computer.interface.scroll_down(clicks) # Scroll down
await computer.interface.scroll_up(clicks) # Scroll up
```
## Screen Actions
Screen capture and display information:
```python
# Screen operations
await computer.interface.screenshot() # Take a screenshot
await computer.interface.get_screen_size() # Get screen dimensions
```
## Clipboard Actions
System clipboard management:
```python
# Clipboard operations
await computer.interface.set_clipboard(text) # Set clipboard content
await computer.interface.copy_to_clipboard() # Get clipboard content
```
## File System Operations
Direct file and directory manipulation:
```python
# File existence checks
await computer.interface.file_exists(path) # Check if file exists
await computer.interface.directory_exists(path) # Check if directory exists
# File content operations
await computer.interface.read_text(path, encoding="utf-8") # Read file content
await computer.interface.write_text(path, content, encoding="utf-8") # Write file content
await computer.interface.read_bytes(path) # Read file content as bytes
await computer.interface.write_bytes(path, content) # Write file content as bytes
# File and directory management
await computer.interface.delete_file(path) # Delete file
await computer.interface.create_dir(path) # Create directory
await computer.interface.delete_dir(path) # Delete directory
await computer.interface.list_dir(path) # List directory contents
```
## Accessibility
Access system accessibility information:
```python
await computer.interface.get_accessibility_tree() # Get accessibility tree
```
## Delay Configuration
Control the pacing between actions:
```python
# Set default delay between all actions (in seconds)
computer.interface.delay = 0.5 # 500ms delay between actions
# Or specify delay for individual actions
await computer.interface.left_click(x, y, delay=1.0) # 1 second delay after click
await computer.interface.type_text("Hello", delay=0.2) # 200ms delay after typing
await computer.interface.press_key("enter", delay=0.5) # 500ms delay after key press
```
## Python Virtual Environment Operations
Install packages and run code inside a sandboxed virtual environment:
```python
await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment
await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'") # Run a shell command in a virtual environment
await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception

# Example: Use sandboxed functions to execute code in a Cua Container
from computer.helpers import sandboxed

@sandboxed("demo_venv")
def greet_and_print(name):
    """Greet the caller and return the HTML of the current Safari tab"""
    import PyXA
    safari = PyXA.Application("Safari")
    html = safari.current_document.source()
    print(f"Hello from inside the container, {name}!")
    return {"greeted": name, "safari_html": html}

# When a @sandboxed function is called, it executes in the container
result = await greet_and_print("Cua")
# Result: {"greeted": "Cua", "safari_html": "<html>...</html>"}
# stdout and stderr are also captured and printed / raised
print("Result from sandboxed function:", result)
```

View File

@@ -0,0 +1,66 @@
---
title: C/ua Computers
description: Understanding c/ua computer types and connection methods
---
Before we can automate apps with AI, we first need to connect to a Computer Server, which gives the AI a safe environment to execute workflows in.
C/ua Computers are preconfigured virtual machines running the Computer Server. They can run macOS, Linux, or Windows, and live either in a cloud-native container or on your host desktop.
# c/ua cloud container
This is a cloud container running the Computer Server. It's the easiest and safest way to get a c/ua computer: sign up at trycua.com to create one.
```python
from computer import Computer
computer = Computer(
os_type="linux",
provider_type="cloud",
name="your-container-name",
api_key="your-api-key"
)
await computer.run() # Connect to the container
```
# c/ua local containers
c/ua also provides local containers, managed with either the Lume CLI (macOS) or the Docker CLI (Linux, Windows).
### Lume (macOS Only):
1. Install the Lume CLI
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
2. Start a local c/ua container
```bash
lume run macos-sequoia-cua:latest
```
3. Connect with Computer
```python
computer = Computer(
os_type="macos",
provider_type="lume",
name="macos-sequoia-cua:latest"
)
await computer.run() # Connect to the container
```
# Your host desktop
You can also have agents control your desktop directly by running Computer Server without any containerization layer. Beware that AI models may perform risky actions.
```bash
pip install cua-computer-server
python -m computer-server
```
Connect with:
```python
computer = Computer(use_host_computer_server=True)
await computer.run() # Connect to the host desktop
```

View File

@@ -0,0 +1,9 @@
{
"title": "Computer SDK",
"description": "Build computer-using agents with the Computer SDK",
"pages": [
"computers",
"commands",
"sandboxed-python"
]
}

View File

@@ -0,0 +1,49 @@
---
title: Sandboxed Python
slug: sandboxed-python
---
You can run Python functions securely inside a sandboxed virtual environment on a remote C/ua Computer. This is useful for executing untrusted user code, isolating dependencies, or providing a safe environment for automation tasks.
## How It Works
The `sandboxed` decorator from the Computer SDK wraps a Python function so that it is executed remotely in a specified virtual environment on the target Computer. The function and its arguments are serialized, sent to the remote, and executed in isolation. Results or errors are returned to the caller.
## Example Usage
```python
from computer import Computer
from computer.helpers import sandboxed
@sandboxed()
def read_file(location: str) -> str:
"""Read contents of a file"""
with open(location, 'r') as f:
return f.read()
async def main():
async with Computer(os_type="linux", provider_type="cloud", name="my-container", api_key="...") as computer:
# Call the sandboxed function (runs remotely)
result = await read_file("/etc/hostname")
print(result)
```
## Configuration
You can specify the virtual environment name, target computer, and maximum retry count:
```python
@sandboxed(venv_name="myenv", computer=my_computer, max_retries=5)
def my_function(...):
    ...
```
## Installing Python Packages
You can install packages into the virtual environment using the `venv_install` method:
```python
await my_computer.venv_install("myenv", ["requests"])
```
## Error Handling
If the remote execution fails, the decorator will retry up to `max_retries` times. If all attempts fail, the last exception is raised locally.
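The retry behavior is conceptually similar to the following standalone sketch (illustrative only — the real decorator also serializes the function and executes it remotely):

```python
import asyncio
import functools

def with_retries(max_retries: int = 3):
    """Retry an async function up to max_retries times; re-raise the last error."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(max_retries):
                try:
                    return await fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
            raise last_exc
        return wrapper
    return decorator

# Usage: a flaky function that succeeds on the third attempt
attempts = 0

@with_retries(max_retries=5)
async def flaky() -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(asyncio.run(flaky()))  # → ok
```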

View File

@@ -1,136 +0,0 @@
---
title: FAQ
description: Find answers to the most common issues or questions when using Cua tools.
icon: CircleQuestionMark
---
### Why a local sandbox?
A local sandbox is a dedicated environment that is isolated from the rest of the system. As AI agents rapidly evolve towards 70-80% success rates on average tasks, having a controlled and secure environment becomes crucial. Cua's Computer-Use AI agents run in a local sandbox to ensure reliability, safety, and controlled execution.
Benefits of using a local sandbox rather than running the Computer-Use AI agent in the host system:
- **Reliability**: The sandbox provides a reproducible environment - critical for benchmarking and debugging agent behavior. Frameworks like [OSWorld](https://github.com/xlang-ai/OSWorld), [Simular AI](https://github.com/simular-ai/Agent-S), Microsoft's [OmniTool](https://github.com/microsoft/OmniParser/tree/master/omnitool), [WindowsAgentArena](https://github.com/microsoft/WindowsAgentArena) and more are using Computer-Use AI agents running in local sandboxes.
- **Safety & Isolation**: The sandbox is isolated from the rest of the system, protecting sensitive data and system resources. As CUA agent capabilities grow, this isolation becomes increasingly important for preventing potential safety breaches.
- **Control**: The sandbox can be easily monitored and terminated if needed, providing oversight for autonomous agent operation.
### Where are the sandbox images stored?
Sandbox images are stored in `~/.lume`, and cached images are stored in `~/.lume/cache`.
### Which image is Computer using?
Computer uses an optimized macOS image for Computer-Use interactions, with pre-installed apps and settings for optimal performance.
The image is available on our [ghcr registry](https://github.com/orgs/trycua/packages/container/package/macos-sequoia-cua).
### Are Sandbox disks taking up all the disk space?
No, macOS uses sparse files, which only allocate space as needed. For example, VM disks totaling 50 GB may only use 20 GB on disk.
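You can see the effect with any sparse file — the apparent size and the actual disk usage differ (the path below is just an example):

```shell
# Create a 1 GB sparse file without writing any data blocks
dd if=/dev/zero of=/tmp/sparse_demo.img bs=1 count=0 seek=1G

ls -lh /tmp/sparse_demo.img   # apparent size: 1.0G
du -h  /tmp/sparse_demo.img   # actual disk usage: ~0

rm /tmp/sparse_demo.img
```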
### How do I delete a VM?
```bash
lume delete <name>
```
### How do I fix EasyOCR `[SSL: CERTIFICATE_VERIFY_FAILED]` errors?
**Symptom:**
When running an agent that uses OCR (e.g., with `AgentLoop.OMNI`), you might encounter an error during the first run or initialization phase that includes:
```
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)
```
**Cause:**
This usually happens when EasyOCR attempts to download its language models over HTTPS for the first time. Python's SSL module cannot verify the server's certificate because it can't locate the necessary root Certificate Authority (CA) certificates in your environment's trust store.
**Solution:**
You need to explicitly tell Python where to find a trusted CA bundle. The `certifi` package provides one. Before running your Python agent script **the first time it needs to download models**, set the following environment variables in the _same terminal session_:
```bash
# Ensure certifi is installed: pip show certifi
export SSL_CERT_FILE=$(python -m certifi)
export REQUESTS_CA_BUNDLE=$(python -m certifi)
# Now run your Python script that uses the agent...
# python your_agent_script.py
```
This directs Python to use the CA bundle provided by `certifi` for SSL verification. **Note:** Once EasyOCR has successfully downloaded its models, you typically do not need to set these environment variables before every subsequent run.
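If you prefer setting this from Python, the equivalent is a small sketch that must run before any library triggers the model download (assumes the `certifi` package is installed):

```python
import os

import certifi

# Point Python's SSL verification at certifi's trusted CA bundle
os.environ["SSL_CERT_FILE"] = certifi.where()
os.environ["REQUESTS_CA_BUNDLE"] = certifi.where()

print(os.environ["SSL_CERT_FILE"])  # path to certifi's CA bundle file
```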
### How do I troubleshoot the agent failing to get the VM IP address or getting stuck on "VM status changed to: stopped"?
**Symptom:**
When running your agent script (e.g., using `Computer().run(...)`), the script might hang during the VM startup phase, logging messages like:
- `Waiting for VM to be ready...`
- `VM status changed to: stopped (after 0.0s)`
- `Still waiting for VM IP address... (elapsed: XX.Xs)`
- Eventually, it might time out, or you might notice the VM window never appears or closes quickly.
**Cause:**
This is typically due to known instability issues with the `lume serve` background daemon process, as documented in the main `README.md`:
1. **`lume serve` Crash:** The `lume serve` process might terminate unexpectedly shortly after launch or when the script tries to interact with it. If it's not running, the script cannot get VM status updates or the IP address.
2. **Incorrect Status Reporting:** Even if `lume serve` is running, its API sometimes incorrectly reports the VM status as `stopped` immediately after startup is initiated. While the underlying `Computer` library tries to poll and wait for the correct `running` status, this initial incorrect report can cause delays or failures if the status doesn't update correctly within the timeout or if `lume serve` crashes during the polling.
**Troubleshooting Steps:**
1. **Check `lume serve`:** Is the `lume serve` process still running in its terminal? Did it print any errors or exit? If it's not running, stop your agent script (`Ctrl+C`) and proceed to step 2.
2. **Force Cleanup:** Before _every_ run, perform a rigorous cleanup to ensure no old `lume` processes or VM states interfere. Open a **new terminal** and run:
```bash
# Stop any running Lume VM gracefully first (replace <vm_name> if needed)
lume stop macos-sequoia-cua_latest
# Force kill lume serve and related processes
pkill -f "lume serve"
pkill -9 -f "lume"
pkill -9 -f "VzVirtualMachine" # Kills underlying VM process
# Optional: Verify they are gone
# ps aux | grep -E 'lume|VzVirtualMachine' | grep -v grep
```
3. **Restart Sequence:**
- **Terminal 1:** Start `lume serve` cleanly:
```bash
lume serve
```
_(Watch this terminal to ensure it stays running)._
- **Terminal 2:** Run your agent script (including the `export SSL_CERT_FILE...` commands if _first time_ using OCR):
```bash
# export SSL_CERT_FILE=$(python -m certifi) # Only if first run with OCR
# export REQUESTS_CA_BUNDLE=$(python -m certifi) # Only if first run with OCR
python your_agent_script.py
```
4. **Retry:** Due to the intermittent nature of the Lume issues, sometimes simply repeating steps 2 and 3 allows the run to succeed if the timing avoids the status reporting bug or the `lume serve` crash.
**Related Issue: "No route to host" Error (macOS Sequoia+)**
- **Symptom:** Even if the `Computer` library logs show the VM has obtained an IP address, you might encounter connection errors like `No route to host` when the agent tries to connect to the internal server, especially when running the agent script from within an IDE (like VS Code or Cursor).
- **Cause:** This is often due to macOS Sequoia's enhanced local network privacy controls. Applications need explicit permission to access the local network, which includes communicating with the VM.
- **Solution:** Grant "Local Network" access to the application you are running the script from (e.g., your IDE or terminal application). Go to **System Settings > Privacy & Security > Local Network**, find your application in the list, and toggle the switch ON. You might need to trigger a connection attempt from the application first for it to appear in the list. See [GitHub Issue #61](https://github.com/trycua/cua/issues/61) for more details and discussion.
**Note:** Improving the stability of `lume serve` is an ongoing development area.
### How do I troubleshoot Computer not connecting to lume daemon?
If you're experiencing connection issues between Computer and the lume daemon, it could be because the port 7777 (used by lume) is already in use by an orphaned process. You can diagnose this issue with:
```bash
sudo lsof -i :7777
```
This command will show all processes using port 7777. If you see a lume process already running, you can terminate it with:
```bash
kill <PID>
```
Where `<PID>` is the process ID shown in the output of the `lsof` command. After terminating the process, run `lume serve` again to start the lume daemon.
### What information does Cua track?
Cua tracks anonymized usage and error report statistics; we subscribe to Posthog's approach as detailed [here](https://posthog.com/blog/open-source-telemetry-ethical). If you would like to opt out of sending anonymized info, you can set `telemetry_enabled` to false in the Computer or Agent constructor. Check out our [telemetry](./telemetry) documentation for more details.

View File

@@ -1,51 +0,0 @@
---
title: Computer-Use Agent Quickstart
description: Launch a computer-use agent UI interface with Docker, Dev Container, or Python.
---
## Docker
_Best for a simple, fully managed installation for testing and experimentation._
**macOS/Linux/Windows (via WSL):**
Run the following command to setup the Docker containers and launch the Computer-Use Agent UI:
```bash
# Requires Docker
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground-docker.sh)"
```
## Dev Container
_Best for contributors and active development._
Visit the [Dev Container](./dev-container-setup) guide to use the configuration that simplifies development setup to a few steps.
## PyPI
_Direct Python package installation_
```bash
# conda create -yn cua python==3.12
pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui # Start the agent UI
```
Or check out the [Usage Guide](./cua-usage-guide) to learn how to use our Python SDK in your own code.
---
# Supported [Agent Loops](../libraries/agent#agent-loops)
- [UITARS-1.5](https://github.com/bytedance/UI-TARS) - Run locally on Apple Silicon with MLX, or use cloud providers
- [OpenAI CUA](https://openai.com/index/computer-using-agent/) - Use OpenAI's Computer-Use Preview model
- [Anthropic CUA](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/computer-use-tool) - Use Anthropic's Computer-Use capabilities
- [OmniParser-v2.0](https://github.com/microsoft/OmniParser) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model
---
# Compatibility
For detailed compatibility information including host OS support, VM emulation capabilities, and model provider compatibility, see the [Compatibility Guide](../compatibility).

View File

@@ -1,83 +0,0 @@
---
title: Cua Usage Guide
description: Follow these steps to use Cua in your own Python code.
---
import { Step, Steps } from 'fumadocs-ui/components/steps';
<Steps>
<Step>
### Install the Lume CLI
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon.
</Step>
<Step>
### Pull the macOS CUA Image
```bash
lume pull macos-sequoia-cua:latest
```
The macOS CUA image contains the default macOS apps and the Computer Server for easy automation.
</Step>
<Step>
### Install the Python SDK
```bash
pip install "cua-computer[all]" "cua-agent[all]"
```
</Step>
<Step>
### Integrate with Your Own Projects
```python
import asyncio

from computer import Computer
from agent import ComputerAgent, LLM

async def main():
    # Start a local macOS VM
    computer = Computer(os_type="macos")
    await computer.run()

    # Or with Cua Cloud Container
    computer = Computer(
        os_type="linux",
        api_key="your_cua_api_key_here",
        name="your_container_name_here"
    )

    # Example: Direct control of a macOS VM with Computer
    computer.interface.delay = 0.1  # Wait 0.1 seconds between keyboard/mouse actions
    await computer.interface.left_click(100, 200)
    await computer.interface.type_text("Hello, world!")
    screenshot_bytes = await computer.interface.screenshot()

    # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
    agent = ComputerAgent(
        computer=computer,
        loop="uitars",
        model=LLM(provider="mlxvlm", name="mlx-community/UI-TARS-1.5-7B-6bit")
    )
    async for result in agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide"):
        print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
For ready-to-use examples, check out our [Notebooks](https://github.com/trycua/cua/tree/main/notebooks) collection.
</Step>
</Steps>

View File

@@ -1,82 +0,0 @@
---
title: Dev Container Setup
description: Learn how to set up the Dev Container configuration that simplifies the development setup.
---
## Quick Start
![Guide-Animation](https://github.com/user-attachments/assets/447eaeeb-0eec-4354-9a82-44446e202e06)
1. **Install the Dev Containers extension ([VSCode](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or [WindSurf](https://docs.windsurf.com/windsurf/advanced#dev-containers-beta))**
2. **Open the repository in the Dev Container:**
- Press `Ctrl+Shift+P` (or `⌘+Shift+P` on macOS)
- **If you have _not_ cloned the repo:**
- Select `Dev Containers: Clone Repository in Container Volume...` and paste the repository URL:
```
https://github.com/trycua/cua.git
```
   - **If you have already cloned the repo:** Select `Dev Containers: Open Folder in Container...` and choose your local folder.
<Callout title="Windsurf Caveats">
The post install hook might not run automatically if you're using
Windsurf. If it didn't run, execute it manually:
<pre>
<code>/bin/bash .devcontainer/post-install.sh</code>
</pre>
</Callout>
3. **Open the VS Code workspace:** Once `post-install.sh` finishes running, open the Python workspace located at `.vscode/py.code-workspace`.
4. **Run the Agent UI example:** Click <img src="https://github.com/user-attachments/assets/7a61ef34-4b22-4dab-9864-f86bf83e290b" className='inline-block mt-1 mb-1 rounded-md mx-1'/>
to start the Gradio UI. If prompted to install **debugpy (Python Debugger)** for remote debugging, select 'Yes' to proceed.
5. **Access the Gradio UI:** The Gradio UI will now be accessible at http://localhost:7860.
## What's Included
The dev container automatically:
- ✅ Sets up Python 3.11 environment
- ✅ Installs all system dependencies (build tools, OpenGL, etc.)
- ✅ Configures Python paths for all packages
- ✅ Installs Python extensions (Black, Ruff, Pylance)
- ✅ Forwards port 7860 for the Gradio web UI
- ✅ Mounts your source code for live editing
- ✅ Creates the required `.env.local` file
## Running Examples
After the container is built, you can run examples directly:
```bash
# Run the agent UI (Gradio web interface)
python examples/agent_ui_examples.py
# Run computer examples
python examples/computer_examples.py
# Run computer UI examples
python examples/computer_ui_examples.py
```
The Gradio UI will be available at `http://localhost:7860` and will automatically forward to your host machine.
## Environment Variables
You'll need to add your API keys to `.env.local`:
```bash
# Required for Anthropic provider
ANTHROPIC_API_KEY=your_anthropic_key_here
# Required for OpenAI provider
OPENAI_API_KEY=your_openai_key_here
```
## Notes
- The container connects to `host.docker.internal:7777` for Lume server communication
- All Python packages are pre-installed and configured
- Source code changes are reflected immediately (no rebuild needed)
- The container uses the same Dockerfile as the regular Docker development environment

View File

@@ -1,303 +0,0 @@
---
title: Developer Guide
description: Set up development for the Cua open source repository.
---
import { GithubInfo } from 'fumadocs-ui/components/github-info';
## Project Structure
<GithubInfo owner="trycua" repo="cua" token={process.env.GITHUB_TOKEN} />
The project is organized as a monorepo with these main packages:
### Python
- `libs/python/core/` - Base package with telemetry support
- `libs/python/computer/` - Computer-use interface (CUI) library
- `libs/python/agent/` - AI agent library with multi-provider support
- `libs/python/som/` - Set-of-Mark parser
- `libs/python/computer-server/` - Server component for VM
- `libs/python/pylume/` - Python bindings for Lume
### TypeScript
- `libs/typescript/computer/` - Computer-use interface (CUI) library
- `libs/typescript/agent/` - AI agent library with multi-provider support
### Other
- `libs/lume/` - Lume CLI
Each package has its own virtual environment and dependencies, managed through PDM.
## Local Development Setup
1. Install Lume CLI:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
2. Clone the repository:
```bash
git clone https://github.com/trycua/cua.git
cd cua
```
3. Create a `.env.local` file in the root directory with your API keys:
```bash
# Required for Anthropic provider
ANTHROPIC_API_KEY=your_anthropic_key_here
# Required for OpenAI provider
OPENAI_API_KEY=your_openai_key_here
```
4. Open the workspace in VSCode or Cursor:
```bash
# For Cua Python development
code .vscode/py.code-workspace
# For Lume (Swift) development
code .vscode/lume.code-workspace
```
Using the workspace file is strongly recommended as it:
- Sets up correct Python environments for each package
- Configures proper import paths
- Enables debugging configurations
- Maintains consistent settings across packages
## Lume Development
Refer to the [Lume README](../libs/lume/docs/Development.md) for instructions on how to develop the Lume CLI.
## Python Development
There are two ways to set up the Python packages:
### Run the build script
Run the build script to set up all packages:
```bash
./scripts/build.sh
```
The build script creates a shared virtual environment for all packages. The workspace configuration automatically handles import paths with the correct Python path settings.
This will:
- Create a virtual environment for the project
- Install all packages in development mode
- Set up the correct Python path
- Install development tools
### Install with PDM
If PDM is not already installed, you can follow the installation instructions [here](https://pdm-project.org/en/latest/#installation).
To install with PDM, simply run:
```console
pdm install -G:all
```
This installs all the dependencies for development, testing, and building the docs. If you'd only like development dependencies, you can run:
```console
pdm install -d
```
## Running Examples
The Python workspace includes launch configurations for all packages:
- "Run Computer Examples" - Runs computer examples
- "Run Computer API Server" - Runs the computer-server
- "Run Agent Examples" - Runs agent examples
- "SOM" configurations - Various settings for running SOM
To run examples from VSCode / Cursor:
1. Press F5 or use the Run/Debug view
2. Select the desired configuration
The workspace also includes compound launch configurations:
- "Run Computer Examples + Server" - Runs both the Computer Examples and Server simultaneously
## Docker Development Environment
As an alternative to installing directly on your host machine, you can use Docker for development. This approach has several advantages:
### Prerequisites
- Docker installed on your machine
- Lume server running on your host (port 7777): `lume serve`
### Setup and Usage
1. Build the development Docker image:
```bash
./scripts/run-docker-dev.sh build
```
2. Run an example in the container:
```bash
./scripts/run-docker-dev.sh run computer_examples.py
```
3. Get an interactive shell in the container:
```bash
./scripts/run-docker-dev.sh run --interactive
```
4. Stop any running containers:
```bash
./scripts/run-docker-dev.sh stop
```
### How it Works
The Docker development environment:
- Installs all required Python dependencies in the container
- Mounts your source code from the host at runtime
- Automatically configures the connection to use host.docker.internal:7777 for accessing the Lume server on your host machine
- Preserves your code changes without requiring rebuilds (source code is mounted as a volume)
> **Note**: The Docker container doesn't include the macOS-specific Lume executable. Instead, it connects to the Lume server running on your host machine via host.docker.internal:7777. Make sure to start the Lume server on your host before running examples in the container.
## Cleanup and Reset
If you need to clean up the environment (non-docker) and start fresh:
```bash
./scripts/cleanup.sh
```
This will:
- Remove all virtual environments
- Clean Python cache files and directories
- Remove build artifacts
- Clean PDM-related files
- Reset environment configurations
## Code Formatting Standards
The cua project follows strict code formatting standards to ensure consistency across all packages.
### Python Code Formatting
#### Tools
The project uses the following tools for code formatting and linting:
- **[Black](https://black.readthedocs.io/)**: Code formatter
- **[Ruff](https://beta.ruff.rs/docs/)**: Fast linter and formatter
- **[MyPy](https://mypy.readthedocs.io/)**: Static type checker
These tools are automatically installed when you set up the development environment using the `./scripts/build.sh` script.
#### Configuration
The formatting configuration is defined in the root `pyproject.toml` file:
```toml
[tool.black]
line-length = 100
target-version = ["py311"]
[tool.ruff]
line-length = 100
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true
[tool.ruff.format]
docstring-code-format = true
[tool.mypy]
strict = true
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true
warn_return_any = true
show_error_codes = true
warn_unused_ignores = false
```
#### Key Formatting Rules
- **Line Length**: Maximum of 100 characters
- **Python Version**: Code should be compatible with Python 3.11+
- **Imports**: Automatically sorted (using Ruff's "I" rule)
- **Type Hints**: Required for all function definitions (strict mypy mode)
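A function that satisfies these rules might look like the following (hypothetical example, not from the codebase):

```python
def normalize_scores(scores: list[float]) -> list[float]:
    """Scale scores so they sum to 1.0; empty or all-zero input maps to zeros."""
    total = sum(scores)
    if total == 0:
        return [0.0 for _ in scores]
    return [score / total for score in scores]

print(normalize_scores([1.0, 1.0, 2.0]))  # → [0.25, 0.25, 0.5]
```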
#### IDE Integration
The repository includes VSCode workspace configurations that enable automatic formatting. When you open the workspace files (as recommended in the setup instructions), the correct formatting settings are automatically applied.
Python-specific settings in the workspace files:
```json
"[python]": {
"editor.formatOnSave": true,
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.codeActionsOnSave": {
"source.organizeImports": "explicit"
}
}
```
Recommended VS Code extensions:
- Black Formatter (ms-python.black-formatter)
- Ruff (charliermarsh.ruff)
- Pylance (ms-python.vscode-pylance)
#### Manual Formatting
To manually format code:
```bash
# Format all Python files using Black
pdm run black .
# Run Ruff linter with auto-fix
pdm run ruff check --fix .
# Run type checking with MyPy
pdm run mypy .
```
#### Pre-commit Validation
Before submitting a pull request, ensure your code passes all formatting checks:
```bash
# Run all checks
pdm run black --check .
pdm run ruff check .
pdm run mypy .
```
### Swift Code (Lume)
For Swift code in the `libs/lume` directory:
- Follow the [Swift API Design Guidelines](https://www.swift.org/documentation/api-design-guidelines/)
- Use SwiftFormat for consistent formatting
- Code will be automatically formatted on save when using the lume workspace

View File

@@ -1,5 +0,0 @@
{
"title": "Guides",
"description": "Guides",
"icon": "BookCopy"
}

View File

@@ -7,85 +7,59 @@ import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
## What is Cua?
# Welcome!
Cua is a collection of cross-platform libraries and tools for building Computer-Use AI agents.
c/ua is a framework for automating Windows, Mac, and Linux apps powered by computer-using agents (CUAs).
## Quick Start
c/ua makes every stage of computer-using agent development simple:
<Cards>
<Card
href="./home/guides/computer-use-agent-quickstart"
title="Computer-Use Agent UI">
Read our guide on getting started with a Computer-Use Agent.
</Card>
- **Development**: Use any LLM provider with liteLLM. The agent SDK makes multiple agent loop providers, trajectory tracing, caching, and budget management easy
- **Containerization**: c/ua offers Docker containers pre-installed with everything needed for AI-powered RPA
- **Deployment**: c/ua cloud gives you a production-ready cloud environment for your assistants
<Card href="./home/guides/cua-usage-guide" title="Cua Usage Guide">
Get started using Cua services on your machine.
</Card>
<Card href="./home/guides/dev-container-setup" title="Dev Container Setup">
Set up a development environment with the Dev Container.
</Card>
</Cards>
---
<Callout type="info">
**Need detailed API documentation?**
<span className="w-full">
Explore the complete API reference with detailed class documentation, and
method signatures.
</span>
<a
href="/api"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>
## Resources
- [How to use the MCP Server with Claude Desktop or other MCP clients](./libraries/mcp-server) - One of the easiest ways to get started with Cua
- [How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libraries/agent)
- [How to use Lume CLI for managing desktops](./libraries/lume)
- [Training Computer-Use Models: Collecting Human Trajectories with Cua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1)
- [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1)
## Modules
| Module | Description | Installation |
| ------------------------------------------------------ | -------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| [**Lume**](./libraries/lume.mdx) | VM management for macOS/Linux using Apple's Virtualization.Framework | `curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash` |
| [**Lumier**](./libraries/lumier.mdx) | Docker interface for macOS and Linux VMs | `docker pull trycua/lumier:latest` |
| [**Computer**](./libraries/computer.mdx) | Python Interface for controlling virtual machines | `pip install "cua-computer[all]"`<br/><br/>`npm install @trycua/computer` |
| [**Agent**](./libraries/agent.mdx) | AI agent framework for automating tasks | `pip install "cua-agent[all]"` |
| [**MCP Server**](./libraries/mcp-server.mdx) | MCP server for using CUA with Claude Desktop | `pip install cua-mcp-server` |
| [**SOM**](./libs/python/som/README.md) | Set-of-Mark library for Agent | `pip install cua-som` |
| [**Computer Server**](./libraries/computer-server.mdx) | Server component for Computer | `pip install cua-computer-server` |
| [**Core**](./libraries/core.mdx) | Python Core utilities | `pip install cua-core`<br/><br/>`npm install @trycua/core` |
## Community
Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas, get assistance, or share your demos!
## License
Cua is open-sourced under the MIT License - see the [LICENSE](https://github.com/trycua/cua/blob/main/LICENSE.md) file for details.
Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0) - see the [OmniParser LICENSE](https://github.com/microsoft/OmniParser/blob/master/LICENSE) file for details.
## Contributing
We welcome contributions to CUA! Please refer to our [Contributing Guidelines](https://github.com/trycua/cua/blob/main/CONTRIBUTING.md) for details.
## Trademarks
Apple, macOS, and Apple Silicon are trademarks of Apple Inc. Ubuntu and Canonical are registered trademarks of Canonical Ltd. Microsoft is a registered trademark of Microsoft Corporation. This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., or Microsoft Corporation.
<div className="grid grid-cols-1 md:grid-cols-2 gap-6 mt-8">
<div className="border rounded-lg p-6">
<h3 className="text-lg font-semibold mb-2">🖥️ Quickstart (UI)</h3>
<p className="text-muted-foreground mb-4">Try the c/ua Agent UI in your browser—no coding required.</p>
<a
href="/home/quickstart-ui"
className={cn(
buttonVariants({ variant: 'default' }),
'w-full'
)}
>
Get Started (UI)
<ChevronRight className="ml-2 h-4 w-4" />
</a>
</div>
<div className="border rounded-lg p-6">
<h3 className="text-lg font-semibold mb-2">💻 Quickstart (Developers)</h3>
<p className="text-muted-foreground mb-4">Build with Python—full SDK and agent code examples.</p>
<a
href="/home/quickstart-devs"
className={cn(
buttonVariants({ variant: 'secondary' }),
'w-full'
)}
>
Get Started (Python)
<ChevronRight className="ml-2 h-4 w-4" />
</a>
</div>
</div>
<div className="grid grid-cols-1 gap-6 mt-6">
<div className="border rounded-lg p-6">
<h3 className="text-lg font-semibold mb-2">📚 API Reference</h3>
<p className="text-muted-foreground mb-4">Explore the agent SDK and APIs</p>
<a
href="/home/libraries/agent"
className={cn(
buttonVariants({ variant: 'outline' }),
'w-full'
)}
>
View API Reference
<ChevronRight className="ml-2 h-4 w-4" />
</a>
</div>
</div>

View File

@@ -1,158 +0,0 @@
---
title: Gradio UI with the Python Agent
description: The agent module includes a Gradio-based user interface for easier interaction with Computer-Use Agent workflows.
---
The agent includes a Gradio-based user interface for easier interaction.
<div align="center">
<img src="/img/agent_gradio_ui.png" />
</div>
## Install
```bash
# Install with Gradio support
pip install "cua-agent[ui]"
```
## Create a simple launcher script
```python
# launch_ui.py
from agent.ui.gradio.app import create_gradio_ui
app = create_gradio_ui()
app.launch(share=False)
```
### Run the launcher
```bash
python launch_ui.py
```
This will start the Gradio interface on `http://localhost:7860`.
## Features
The Gradio UI provides:
- **Model Selection**: Choose between different AI models and providers
- **Task Input**: Enter tasks for the agent to execute
- **Real-time Output**: View the agent's actions and results as they happen
- **Screenshot Display**: See visual feedback from the computer screen
- **Settings Management**: Configure and save your preferred settings
## Supported Providers
1. **OpenAI**: GPT-4 and GPT-4 Vision models
2. **Anthropic**: Claude models
3. **Ollama**: Local models like Gemma3
4. **UI-TARS**: Specialized UI understanding models
### Using UI-TARS
UI-TARS is a specialized model for UI understanding tasks. You have two options:
1. **Local MLX UI-TARS**: For running the model locally on Apple Silicon
```bash
# Install MLX support
pip install "cua-agent[uitars-mlx]"
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id
```
Then select "UI-TARS (MLX)" in the Gradio interface.
2. **OpenAI-compatible UI-TARS**: For using the original ByteDance model
- If you want to use the original ByteDance UI-TARS model via an OpenAI-compatible API, follow the [deployment guide](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md)
- This will give you a provider URL like `https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1` which you can use in the code or Gradio UI:
```python
agent = ComputerAgent(
computer=macos_computer,
loop=AgentLoop.UITARS,
model=LLM(
provider=LLMProvider.OAICOMPAT,
name="ByteDance-Seed/UI-TARS-1.5-7B",
provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1"
)
)
```
Or in the Gradio UI, select "OpenAI Compatible" and enter:
- Model Name: `ByteDance-Seed/UI-TARS-1.5-7B`
- Base URL: Your deployment URL
- API Key: Your API key (if required)
## Advanced Configuration
### Custom Provider Settings
You can configure custom providers in the UI:
1. Select "OpenAI Compatible" from the provider dropdown
2. Enter your custom model name, base URL, and API key
3. The settings will be saved for future sessions
## Environment Variables
Set API keys as environment variables for security:
```bash
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GROQ_API_KEY="your-groq-key"
export DEEPSEEK_API_KEY="your-deepseek-key"
export QWEN_API_KEY="your-qwen-key"
```
Or use a `.env` file:
```bash
# .env
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
# ... other keys
```
## Settings Persistence
The Gradio UI automatically saves your settings to `.gradio_settings.json` in your working directory. This includes:
- Selected provider and model
- Custom provider configurations (URLs and model names)
- Other UI preferences
**Note**: API keys entered into the custom provider field are **not** saved in this file for security reasons. Manage API keys using environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) or a `.env` file.
It's recommended to add `.gradio_settings.json` to your `.gitignore` file.
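If you want to read the saved settings from your own scripts, a defensive loader is a good idea. The key names below are illustrative only, not the UI's actual schema; treat this as a sketch:

```python
import json
from pathlib import Path

# Hypothetical defaults -- the real file's keys depend on the UI version
DEFAULTS = {"provider": "openai", "model": "gpt-4o", "base_url": None}

def load_settings(path: str = ".gradio_settings.json") -> dict:
    """Merge saved settings over defaults; fall back to defaults if the file is missing or corrupt."""
    settings = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        try:
            saved = json.loads(p.read_text())
            # Only accept keys we know about, ignoring anything unexpected
            settings.update({k: v for k, v in saved.items() if k in DEFAULTS})
        except (json.JSONDecodeError, OSError):
            pass  # corrupt file: keep defaults
    return settings
```

Because unknown keys are dropped and parse errors fall back to defaults, a hand-edited or stale settings file can never crash your launcher.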
## Example Usage
Here's a complete example of using the Gradio UI with different providers:
```python
# launch_ui_with_env.py
from agent.ui.gradio.app import create_gradio_ui
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Create and launch the UI
app = create_gradio_ui()
app.launch(share=False, server_port=7860)
```
Once launched, you can:
1. Select your preferred AI provider and model
2. Enter a task like "Open a web browser and search for Python tutorials"
3. Click "Run" to execute the task
4. Watch the agent perform the actions in real-time
5. View screenshots and logs of the execution
The UI makes it easy to experiment with different models and tasks without writing code for each interaction.

View File

@@ -1,266 +1,123 @@
---
title: Agent
description: The Computer-Use framework for running multi-app agentic workflows targeting macOS, Linux, and Windows sandboxes.
pypi: cua-agent
macos: true
windows: true
linux: true
github:
  - https://github.com/trycua/cua/tree/main/libs/python/agent
---
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
The Agent library provides the ComputerAgent class and tools for building AI agents that automate workflows on C/ua Computers.
**Agent** is a powerful Computer-Use framework that enables AI agents to interact with desktop applications and perform complex multi-step workflows across macOS, Linux, and Windows environments. Built on the Cua platform, it supports both local models (via Ollama) and cloud providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).
## Installation
Install CUA Agent with pip. Choose the installation that matches your needs:
### All Providers (Recommended)
```bash
# Install everything you need
pip install "cua-agent[all]"
```
### Selective Installation
```bash
# OpenAI models (GPT-4, Computer Use Preview)
pip install "cua-agent[openai]"
# Anthropic models (Claude 3.5 Sonnet)
pip install "cua-agent[anthropic]"
# Local UI-TARS models
pip install "cua-agent[uitars]"
# OmniParser + Ollama for local models
pip install "cua-agent[omni]"
# Gradio web interface
pip install "cua-agent[ui]"
```
### Advanced: Local UI-TARS with MLX
```bash
pip install "cua-agent[uitars-mlx]"
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id
```
### Requirements
- Python 3.8+
- macOS, Linux, or Windows
- For cloud providers: API keys (OpenAI, Anthropic, etc.)
- For local models: Sufficient RAM and compute resources
## Getting Started
### Basic Usage
Here's a simple example to get you started with CUA Agent. It instructs the agent to open a text editor and write "Hello World."
```python
from agent import ComputerAgent, AgentLoop, LLM, LLMProvider
from computer import Computer
# Set your API key
import os
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
async with Computer() as computer:
# Create agent with OpenAI
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.OPENAI,
model=LLM(provider=LLMProvider.OPENAI)
)
# Run a simple task
async for result in agent.run("Open a text editor and write 'Hello, World!'"):
print(result.get("text"))
```
### Multi-Step Workflow
This example defines multiple tasks for the agent to complete:
```python
async with Computer() as computer:
# Create agent with your preferred provider
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.OPENAI, # or ANTHROPIC, OMNI, UITARS
model=LLM(provider=LLMProvider.OPENAI)
)
# Define complex workflow
tasks = [
"Look for a repository named trycua/cua on GitHub.",
"Check the open issues, open the most recent one and read it.",
"Clone the repository in users/lume/projects if it doesn't exist yet.",
"Open the repository with an app named Cursor.",
"From Cursor, open Composer and write a task to help resolve the GitHub issue.",
]
# Execute tasks sequentially
for i, task in enumerate(tasks):
print(f"\nExecuting task {i+1}/{len(tasks)}: {task}")
async for result in agent.run(task):
print(result.get("text"))
print(f"✅ Task {i+1} completed")
```
### Alternative Model Providers
You may use different models with the agent library -- below are a few alternatives that we already support.
```python
# Anthropic Claude
computer = Computer() # Connect to a c/ua container
agent = ComputerAgent(
    computer=computer,
    loop=AgentLoop.ANTHROPIC,
    model=LLM(provider=LLMProvider.ANTHROPIC)
)
# Local Ollama model
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.OMNI,
model=LLM(provider=LLMProvider.OLLAMA, name="gemma3")
)
prompt = "open github, navigate to trycua/cua"
# UI-TARS model
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.UITARS,
model=LLM(
provider=LLMProvider.OAICOMPAT,
name="ByteDance-Seed/UI-TARS-1.5-7B",
provider_base_url="https://your-endpoint.com/v1"
)
)
async for result in agent.run(prompt):
print("Agent:", result["output"][-1]["content"][0]["text"])
```
## Agent Loops
The `cua-agent` package provides four agent loop variations, based on different CUA model providers and techniques:
| Agent Loop | Supported Models | Description | Set-Of-Marks |
| :-------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------- | :----------- |
| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br/>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
| `AgentLoop.UITARS` | • `mlx-community/UI-TARS-1.5-7B-4bit` (default)<br/>• `mlx-community/UI-TARS-1.5-7B-6bit`<br/>• `ByteDance-Seed/UI-TARS-1.5-7B` (via openAI-compatible endpoint) | Uses UI-TARS models with MLXVLM (default) or OAICOMPAT providers | Not Required |
| `AgentLoop.OMNI` | • `claude-3-5-sonnet-20240620`<br/>• `claude-3-7-sonnet-20250219`<br/>• `gpt-4.5-preview`<br/>• `gpt-4o`<br/>• `gpt-4`<br/>• `phi4`<br/>• `phi4-mini`<br/>• `gemma3`<br/>• `...`<br/>• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |
## Agent Response
The `AgentResponse` class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new [OpenAI Agent SDK specification](https://platform.openai.com/docs/api-reference/responses) for better consistency across different agent loops.
```typescript
interface AgentResponse {
id: string;
text: string;
usage?: {
input_tokens: number;
input_tokens_details?: {
text_tokens: number;
image_tokens: number;
};
output_tokens: number;
output_tokens_details?: {
text_tokens: number;
reasoning_tokens: number;
};
total_tokens: number;
};
tools?: Array<{
name: string;
description: string;
}>;
output?: Array<{
type: 'reasoning' | 'computer_call';
content?: string; // for reasoning type
tool_name?: string; // for computer_call type
parameters?: Record<string, any>; // for computer_call type
result?: string; // for computer_call type
}>;
}
```
### Example Usage
```python
async for result in agent.run(task):
print("Response ID: ", result.get("id"))
# Print detailed usage information
usage = result.get("usage")
if usage:
print("\nUsage Details:")
print(f" Input Tokens: {usage.get('input_tokens')}")
if "input_tokens_details" in usage:
print(f" Input Tokens Details: {usage.get('input_tokens_details')}")
print(f" Output Tokens: {usage.get('output_tokens')}")
if "output_tokens_details" in usage:
print(f" Output Tokens Details: {usage.get('output_tokens_details')}")
print(f" Total Tokens: {usage.get('total_tokens')}")
print("Response Text: ", result.get("text"))
# Print tools information
tools = result.get("tools")
if tools:
print("\nTools:")
print(tools)
# Print reasoning and tool call outputs
outputs = result.get("output", [])
for output in outputs:
output_type = output.get("type")
if output_type == "reasoning":
print("\nReasoning Output:")
print(output)
elif output_type == "computer_call":
print("\nTool Call Output:")
print(output)
```
## Examples & Guides
<Cards>
<Card
href="https://github.com/trycua/cua/tree/main/notebooks/agent_nb.ipynb"
title="Agent Notebook">
Step-by-step instructions on using the Computer-Use Agent (CUA)
</Card>
<Card href="../libraries/agent/agent-gradio-ui" title="Agent Gradio Guide">
Use the Agent library with a Python Gradio UI
</Card>
</Cards>
---
<Callout type="info">
**Need detailed API documentation?**{' '}
<span className="w-full">
Explore the complete API reference with detailed class documentation, and
method signatures.
</span>
<a
href="/api/agent"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>
### ComputerAgent Constructor Options
The `ComputerAgent` constructor provides a wide range of options for customizing agent behavior, tool integration, callbacks, resource management, and more.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | `str` | **required** | Model name (e.g., "claude-3-5-sonnet-20241022", "computer-use-preview", "omni+vertex_ai/gemini-pro") |
| `tools` | `List[Any]` | `None` | List of tools (e.g., computer objects, decorated functions) |
| `custom_loop` | `Callable` | `None` | Custom agent loop function (overrides auto-selection) |
| `only_n_most_recent_images` | `int` | `None` | If set, only keep the N most recent images in message history (adds ImageRetentionCallback) |
| `callbacks` | `List[Any]` | `None` | List of AsyncCallbackHandler instances for preprocessing/postprocessing |
| `verbosity` | `int` | `None` | Logging level (`logging.DEBUG`, `logging.INFO`, etc.; adds LoggingCallback) |
| `trajectory_dir` | `str` | `None` | Directory to save trajectory data (adds TrajectorySaverCallback) |
| `max_retries` | `int` | `3` | Maximum number of retries for failed API calls |
| `screenshot_delay` | `float` \| `int` | `0.5` | Delay before screenshots (seconds) |
| `use_prompt_caching` | `bool` | `False` | Use prompt caching to avoid reprocessing the same prompt (mainly for Anthropic) |
| `max_trajectory_budget` | `float` \| `dict` | `None` | If set, adds BudgetManagerCallback to track usage costs and stop when budget is exceeded |
| `**kwargs` | _any_ | | Additional arguments passed to the agent loop |
#### Parameter Details
- **model**: The LLM or agent model to use. Determines which agent loop is selected unless `custom_loop` is provided.
- **tools**: List of tools the agent can use (e.g., `Computer`, sandboxed Python functions, etc.).
- **custom_loop**: Optional custom agent loop function. If provided, overrides automatic loop selection.
- **only_n_most_recent_images**: If set, only the N most recent images are kept in the message history. Useful for limiting memory usage. Automatically adds `ImageRetentionCallback`.
- **callbacks**: List of callback instances for advanced preprocessing, postprocessing, logging, or custom hooks. See [Callbacks & Extensibility](#callbacks--extensibility).
- **verbosity**: Logging level (e.g., `logging.INFO`). If set, adds a logging callback.
- **trajectory_dir**: Directory path to save full trajectory data, including screenshots and responses. Adds `TrajectorySaverCallback`.
- **max_retries**: Maximum number of retries for failed API calls (default: 3).
- **screenshot_delay**: Delay (in seconds) before taking screenshots (default: 0.5).
- **use_prompt_caching**: Enables prompt caching for repeated prompts (mainly for Anthropic models).
- **max_trajectory_budget**: If set (float or dict), adds a budget manager callback that tracks usage costs and stops execution if the budget is exceeded. Dict allows advanced options (e.g., `{ "max_budget": 5.0, "raise_error": True }`).
- **\*\*kwargs**: Any additional keyword arguments are passed through to the agent loop or model provider.
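The budget manager's behavior can be pictured as a running cost accumulator. The class and method names below are hypothetical; the real callback lives in the library:

```python
class BudgetExceeded(RuntimeError):
    """Raised when raise_error=True and the trajectory budget is spent."""
    pass

class BudgetTracker:
    """Illustrative sketch: accumulate per-turn costs and stop once max_budget is exceeded."""
    def __init__(self, max_budget: float, raise_error: bool = False):
        self.max_budget = max_budget
        self.raise_error = raise_error
        self.spent = 0.0

    def add(self, cost: float) -> bool:
        """Record a turn's cost; return False (or raise) once over budget."""
        self.spent += cost
        if self.spent > self.max_budget:
            if self.raise_error:
                raise BudgetExceeded(f"spent {self.spent:.2f} > budget {self.max_budget:.2f}")
            return False
        return True
```

This mirrors the two configurations shown above: a bare float stops the run quietly, while `{"max_budget": ..., "raise_error": True}` surfaces an exception instead.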
**Example with advanced options:**
```python
import logging

from agent2 import ComputerAgent
from computer import Computer
from agent2.callbacks import ImageRetentionCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[Computer(...)],
only_n_most_recent_images=3,
callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)],
verbosity=logging.INFO,
trajectory_dir="trajectories",
max_retries=5,
screenshot_delay=1.0,
use_prompt_caching=True,
max_trajectory_budget={"max_budget": 5.0, "raise_error": True}
)
```
---
### Message Array (Multi-turn)
```python
messages = [
{"role": "user", "content": "go to trycua on gh"},
# ... (reasoning, computer_call, computer_call_output, etc)
]
async for result in agent.run(messages):
# Handle output, tool invocations, screenshots, etc.
print("Agent:", result["output"][-1]["content"][0]["text"])
messages += result["output"] # Add agent output to message array
...
```
### Supported Agent Loops
- **Anthropic**: Claude 4, 3.7, 3.5 models
- **OpenAI**: computer-use-preview
- **UITARS**: UI-TARS 1.5 models (Hugging Face, TGI)
- **Omni**: Omniparser + any LLM
See [Agent Loops](../../agent-sdk/agent-loops) for supported models and details.
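Loop auto-selection can be pictured as simple prefix dispatch on the model string. This is a simplification for illustration, not the library's actual selection logic:

```python
def pick_loop(model: str) -> str:
    """Illustrative prefix dispatch from model name to agent loop."""
    if model.startswith("anthropic/") or model.startswith("claude-"):
        return "anthropic"
    if model == "computer-use-preview" or model.startswith("openai/"):
        return "openai"
    if "ui-tars" in model.lower():
        return "uitars"
    if model.startswith("omni+"):
        return "omni"
    raise ValueError(f"no loop for model: {model}")
```

For example, `"anthropic/claude-3-5-sonnet-20241022"` maps to the Anthropic loop and `"omni+vertex_ai/gemini-pro"` to Omni, matching the model strings used elsewhere on this page.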
### Callbacks & Extensibility
You can add preprocessing and postprocessing hooks using callbacks, or write your own by subclassing `AsyncCallbackHandler`:
```python
from agent2.callbacks import ImageRetentionCallback, PIIAnonymizationCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)]
)
```

View File

@@ -1,60 +0,0 @@
---
title: Computer Server
description: The server component for the Computer-Use Interface framework.
pypi: cua-computer-server
macos: true
linux: true
windows: true
github:
- https://github.com/trycua/cua/tree/main/libs/python/computer-server
---
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
**Computer Server** provides the websocket interface for the [Computer-Use Interface (CUI)](./computer/) to interact with.
## Features
- WebSocket API for computer-use
- Cross-platform support (macOS, Linux, Windows)
- Integration with the CUI library for screen control, keyboard/mouse automation, and accessibility
## Install
```bash
pip install cua-computer-server
```
## Examples & Guides
<Cards>
<Card
href="https://github.com/trycua/cua/tree/main/notebooks/computer_server_nb.ipynb"
title="Computer-Use Server Notebook">
Step-by-step guide using the Computer-Use Server on a host system or virtual
machine.
</Card>
</Cards>
---
<Callout type="info">
**Need detailed API documentation?**
<span className="w-full">
Explore the complete API reference with detailed class documentation, and
method signatures.
</span>
<a
href="/api/computer-server"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>

View File

@@ -0,0 +1,48 @@
---
title: Supported Commands
description: List of all commands supported by the Computer Server API (WebSocket and REST).
---
# Commands Reference
This page lists all supported commands for the Computer Server, available via both WebSocket and REST API endpoints.
| Command | Description |
|---------------------|--------------------------------------------|
| version | Get protocol and package version info |
| run_command | Run a shell command |
| screenshot | Capture a screenshot |
| get_screen_size | Get the screen size |
| get_cursor_position | Get the current mouse cursor position |
| mouse_down | Mouse button down |
| mouse_up | Mouse button up |
| left_click | Left mouse click |
| right_click | Right mouse click |
| double_click | Double mouse click |
| move_cursor | Move mouse cursor to coordinates |
| drag_to | Drag mouse to coordinates |
| drag | Drag mouse by offset |
| key_down | Keyboard key down |
| key_up | Keyboard key up |
| type_text | Type text |
| press_key | Press a single key |
| hotkey | Press a hotkey combination |
| scroll | Scroll the screen |
| scroll_down | Scroll down |
| scroll_up | Scroll up |
| copy_to_clipboard | Copy text to clipboard |
| set_clipboard | Set clipboard content |
| file_exists | Check if a file exists |
| directory_exists | Check if a directory exists |
| list_dir | List files/directories in a directory |
| read_text | Read text from a file |
| write_text | Write text to a file |
| read_bytes | Read bytes from a file |
| write_bytes | Write bytes to a file |
| get_file_size | Get file size |
| delete_file | Delete a file |
| create_dir | Create a directory |
| delete_dir | Delete a directory |
| get_accessibility_tree | Get accessibility tree (if supported) |
| find_element | Find element in accessibility tree |
| diorama_cmd | Run a diorama command (if supported) |
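Both the WebSocket and REST transports wrap these commands in the same `{"command", "params"}` JSON envelope. A tiny helper (illustrative only; `KNOWN_COMMANDS` is a subset of the table above) keeps payloads consistent:

```python
import json

# Subset of the commands table, for illustration
KNOWN_COMMANDS = {"version", "run_command", "screenshot", "left_click", "type_text"}

def make_command(command: str, **params) -> str:
    """Serialize a command into the JSON envelope both APIs accept."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return json.dumps({"command": command, "params": params})
```

For example, `make_command("left_click", x=100, y=200)` produces a string ready to POST to `/cmd` or send over the WebSocket.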

View File

@@ -0,0 +1,63 @@
---
title: REST API Reference
description: Reference for the /cmd REST endpoint of the Computer Server.
---
# REST API Reference
The Computer Server exposes a single REST endpoint for command execution:
- `http://localhost:8000/cmd`
- `https://your-container.containers.cloud.trycua.com:8443/cmd` (cloud)
## POST /cmd
- Accepts commands as JSON in the request body
- Returns results as a streaming response (text/event-stream)
### Request Format
```json
{
"command": "<command_name>",
"params": { ... }
}
```
### Required Headers (for cloud containers)
- `X-Container-Name`: Name of the container (cloud only)
- `X-API-Key`: API key for authentication (cloud only)
### Example Request (Python)
```python
import requests
url = "http://localhost:8000/cmd"
body = {"command": "screenshot", "params": {}}
resp = requests.post(url, json=body)
print(resp.text)
```
### Example Request (Cloud)
```python
import requests
url = "https://your-container.containers.cloud.trycua.com:8443/cmd"
headers = {
"X-Container-Name": "your-container",
"X-API-Key": "your-api-key"
}
body = {"command": "screenshot", "params": {}}
resp = requests.post(url, json=body, headers=headers)
print(resp.text)
```
### Response Format
Streaming text/event-stream with JSON objects, e.g.:
```
data: {"success": true, "content": "..."}
data: {"success": false, "error": "..."}
```
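A small helper for consuming this stream client-side; it assumes only the `data: ` prefix and one JSON object per event, as in the example above:

```python
import json

def parse_event_stream(raw: str):
    """Yield decoded JSON objects from a text/event-stream payload."""
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])

# With requests, pass stream=True and feed resp.iter_lines(decode_unicode=True)
# through the same "data: " handling instead of splitting a full string.
```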
### Supported Commands
See [Commands Reference](./Commands) for the full list of commands and parameters.

View File

@@ -0,0 +1,86 @@
---
title: WebSocket API Reference
description: Reference for the /ws WebSocket endpoint of the Computer Server.
---
# WebSocket API Reference
The Computer Server exposes a WebSocket endpoint for real-time command execution and streaming results.
- `ws://localhost:8000/ws`
- `wss://your-container.containers.cloud.trycua.com:8443/ws` (cloud)
### Authentication (Cloud Only)
For cloud containers, you must authenticate immediately after connecting:
```json
{
"command": "authenticate",
"params": {
"container_name": "your-container",
"api_key": "your-api-key"
}
}
```
If authentication fails, the connection is closed.
### Command Format
Send JSON messages:
```json
{
"command": "<command_name>",
"params": { ... }
}
```
### Example (Python)
```python
import websockets
import asyncio
import json
async def main():
uri = "ws://localhost:8000/ws"
async with websockets.connect(uri) as ws:
await ws.send(json.dumps({"command": "version", "params": {}}))
response = await ws.recv()
print(response)
asyncio.run(main())
```
### Example (Cloud)
```python
import websockets
import asyncio
import json
async def main():
uri = "wss://your-container.containers.cloud.trycua.com:8443/ws"
async with websockets.connect(uri) as ws:
await ws.send(json.dumps({
"command": "authenticate",
"params": {
"container_name": "your-container",
"api_key": "your-api-key"
}
}))
auth_response = await ws.recv()
print(auth_response)
await ws.send(json.dumps({"command": "version", "params": {}}))
response = await ws.recv()
print(response)
asyncio.run(main())
```
### Response Format
Each response is a JSON object:
```json
{
"success": true,
...
}
```
### Supported Commands
See [Commands Reference](./Commands) for the full list of commands and parameters.

View File

@@ -5,14 +5,8 @@ github:
- https://github.com/trycua/cua/tree/main/libs/python/computer-server
---
## ⚠️ 🚧 Under Construction 🚧 ⚠️
The Computer Server API reference documentation is currently under development.
## Overview
The Computer Server provides WebSocket and REST API endpoints for remote computer control and automation.

View File

@@ -1,90 +0,0 @@
---
title: Gradio UI with the Python Computer Interface
description: The computer module includes a Gradio UI for creating and sharing demonstration data. This guide makes it easy for people to build community datasets for better computer use models with an upload to Huggingface feature.
---
<Callout title="Note">
For precise control of the computer, we recommend using VNC or Screen Sharing
instead of Gradio UI.
</Callout>
```bash
# Install with UI support
pip install "cua-computer[ui]"
```
## Building and Sharing Demonstrations with Huggingface
Follow these steps to contribute your own demonstrations:
### 1. Set up Huggingface Access
Set your HF_TOKEN in a .env file or in your environment variables:
```bash
# In .env file
HF_TOKEN=your_huggingface_token
```
### 2. Launch the Computer UI
```python
# launch_ui.py
from computer.ui.gradio.app import create_gradio_ui
from dotenv import load_dotenv
load_dotenv('.env')
app = create_gradio_ui()
app.launch(share=False)
```
For examples, see [Computer UI Examples](https://github.com/trycua/cua/tree/main/examples/computer_ui_examples.py)
### 3. Record Your Tasks
<details open>
<summary>View demonstration video</summary>
<video
src="https://github.com/user-attachments/assets/de3c3477-62fe-413c-998d-4063e48de176"
controls
width="600"></video>
</details>
Record yourself performing various computer tasks using the UI.
### 4. Save Your Demonstrations
<details open>
<summary>View demonstration video</summary>
<video
src="https://github.com/user-attachments/assets/5ad1df37-026a-457f-8b49-922ae805faef"
controls
width="600"></video>
</details>
Save each task by picking a descriptive name and adding relevant tags (e.g., "office", "web-browsing", "coding").
### 5. Record Additional Demonstrations
Repeat steps 3 and 4 until you have a good amount of demonstrations covering different tasks and scenarios.
### 6. Upload to Huggingface
<details open>
<summary>View demonstration video</summary>
<video
src="https://github.com/user-attachments/assets/c586d460-3877-4b5f-a736-3248886d2134"
controls
width="600"></video>
</details>
Upload your dataset to Huggingface by:
- Naming it as `{your_username}/{dataset_name}`
- Choosing public or private visibility
- Optionally selecting specific tags to upload only tasks with certain tags
### Examples and Resources
- Example Dataset: [ddupont/test-dataset](https://huggingface.co/datasets/ddupont/test-dataset)
- Find Community Datasets: 🔍 [Browse CUA Datasets on Huggingface](https://huggingface.co/datasets?other=cua)

View File

@@ -1,185 +1,123 @@
---
title: Computer
description: The Computer-Use Interface (CUI) framework for interacting with local macOS, Linux, and Windows sandboxes.
macos: true
windows: true
linux: true
pypi: cua-computer
npm: '@trycua/computer'
github:
  - https://github.com/trycua/cua/tree/main/libs/python/computer
  - https://github.com/trycua/cua/tree/main/libs/typescript/computer
---
import { Tabs, Tab } from 'fumadocs-ui/components/tabs';
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
The Computer library provides a Computer class that can be used to control and automate a container running the Computer Server.
Computer, when paired with [Computer Server](../computer-server.mdx), enables programmatic interaction with cross-platform sandboxes. It powers Cua systems and is PyAutoGUI-compatible and pluggable with any AI agent system (Cua, Langchain, CrewAI, AutoGen). The Python version relies on [Lume](./lume.mdx) for creating and managing sandbox environments.
## Installation
<Tabs groupId='language' persist items={['Python', 'TypeScript']}>
<Tab value="Python">
```bash
pip install "cua-computer[all]"
```
The `cua-computer` PyPI package automatically pulls the latest executable version of Lume through [pylume](https://github.com/trycua/pylume).
</Tab>
<Tab value="TypeScript">
```bash
npm install @trycua/computer
```
</Tab>
</Tabs>
## Features
- Create and manage virtual machine sandboxes
- Take screenshots of the virtual machine
- Control mouse movements and clicks
- Simulate keyboard input
- Manage clipboard content
- Interact with the operating system interface
- Support for macOS and Linux environments
## Reference
### Basic Usage
Connect to a c/ua cloud container:
```python
from computer import Computer

computer = Computer(
    os_type="linux",
    provider_type="cloud",
    name="your-container-name",
    api_key="your-api-key"
)

computer = await computer.run()  # Connect to a c/ua cloud container
```
Connect to a c/ua local container:
```python
from computer import Computer

computer = Computer(
    os_type="macos"
)

computer = await computer.run()  # Connect to the container
```
## Simple Example
<Tabs groupId='language' persist items={['Python', 'TypeScript']}>
<Tab value="Python">
```python
from computer import Computer
computer = Computer(os_type="macos", display="1024x768", memory="8GB", cpu="4")

try:
    # Start a new local vm instance using Lume
    await computer.run()

    # Interface with the instance
    screenshot = await computer.interface.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(screenshot)

    await computer.interface.move_cursor(100, 100)
    await computer.interface.left_click()
    await computer.interface.right_click(300, 300)
    await computer.interface.double_click(400, 400)

    await computer.interface.type("Hello, World!")
    await computer.interface.press_key("enter")

    await computer.interface.set_clipboard("Test clipboard")
    content = await computer.interface.copy_to_clipboard()
    print(f"Clipboard content: {content}")
finally:
    # Stop the vm instance
    await computer.stop()
```
</Tab>
<Tab value="TypeScript">
```typescript
import { Computer, OSType } from '@trycua/computer';

// This creates and interfaces with a cloud-based cua container.
const main = async () => {
  // Create a cloud-based computer
  const computer = new Computer({
    name: 'cloud-vm',
    osType: OSType.Linux,
    apiKey: 'your-api-key',
  });

  // Access the interface
  const computerInterface = computer.interface;

  // Screenshot operations
  const screenshot = await computerInterface.screenshot();

  // Mouse operations
  await computerInterface.moveCursor(100, 100);
  await computerInterface.leftClick();
  await computerInterface.rightClick(300, 300);
  await computerInterface.doubleClick(400, 400);
  await computerInterface.dragTo(500, 500, 'left', 1000); // Drag with left button for 1 second

  // Keyboard operations
  await computerInterface.typeText('Hello from TypeScript!');
  await computerInterface.pressKey('enter');
  await computerInterface.hotkey('command', 'a'); // Select all

  // Clipboard operations
  await computerInterface.setClipboard('Clipboard content');
  const content = await computerInterface.copyToClipboard();

  // File operations
  await computerInterface.writeText('/tmp/test.txt', 'Hello world');
  const fileContent = await computerInterface.readText('/tmp/test.txt');

  // Run a command in the VM
  const [stdout, stderr] = await computerInterface.runCommand('ls -la');

  // Disconnect from the cloud VM
  await computer.disconnect();
};

main().catch(console.error);
```
</Tab>
</Tabs>
### Interface Actions
```python
# Shell Actions
result = await computer.interface.run_command(cmd)  # Run shell command
# result.stdout, result.stderr, result.returncode

# Mouse Actions
await computer.interface.left_click(x, y)  # Left click at coordinates
await computer.interface.right_click(x, y)  # Right click at coordinates
await computer.interface.double_click(x, y)  # Double click at coordinates
await computer.interface.move_cursor(x, y)  # Move cursor to coordinates
await computer.interface.drag_to(x, y, duration)  # Drag to coordinates
await computer.interface.get_cursor_position()  # Get current cursor position
await computer.interface.mouse_down(x, y, button="left")  # Press and hold a mouse button
await computer.interface.mouse_up(x, y, button="left")  # Release a mouse button

# Keyboard Actions
await computer.interface.type_text("Hello")  # Type text
await computer.interface.press_key("enter")  # Press a single key
await computer.interface.hotkey("command", "c")  # Press key combination
await computer.interface.key_down("command")  # Press and hold a key
await computer.interface.key_up("command")  # Release a key

# Scrolling Actions
await computer.interface.scroll(x, y)  # Scroll the mouse wheel
await computer.interface.scroll_down(clicks)  # Scroll down
await computer.interface.scroll_up(clicks)  # Scroll up

# Screen Actions
await computer.interface.screenshot()  # Take a screenshot
await computer.interface.get_screen_size()  # Get screen dimensions

# Clipboard Actions
await computer.interface.set_clipboard(text)  # Set clipboard content
await computer.interface.copy_to_clipboard()  # Get clipboard content

# File System Operations
await computer.interface.file_exists(path)  # Check if file exists
await computer.interface.directory_exists(path)  # Check if directory exists
await computer.interface.read_text(path, encoding="utf-8")  # Read file content
await computer.interface.write_text(path, content, encoding="utf-8")  # Write file content
await computer.interface.read_bytes(path)  # Read file content as bytes
await computer.interface.write_bytes(path, content)  # Write file content as bytes
await computer.interface.delete_file(path)  # Delete file
await computer.interface.create_dir(path)  # Create directory
await computer.interface.delete_dir(path)  # Delete directory
await computer.interface.list_dir(path)  # List directory contents

# Accessibility
await computer.interface.get_accessibility_tree()  # Get accessibility tree

# Delay Configuration
# Set default delay between all actions (in seconds)
computer.interface.delay = 0.5  # 500ms delay between actions

# Or specify delay for individual actions
await computer.interface.left_click(x, y, delay=1.0)  # 1 second delay after click
await computer.interface.type_text("Hello", delay=0.2)  # 200ms delay after typing
await computer.interface.press_key("enter", delay=0.5)  # 500ms delay after key press

# Python Virtual Environment Operations
await computer.venv_install("demo_venv", ["requests", "macos-pyxa"])  # Install packages in a virtual environment
await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'")  # Run a shell command in a virtual environment
await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs)  # Run a Python function in a virtual environment and return the result / raise an exception

# Example: Use sandboxed functions to execute code in a Cua Container
from computer.helpers import sandboxed

@sandboxed("demo_venv")
def greet_and_print(name):
    """Greet the user and return the HTML of the current Safari tab"""
    import PyXA
    safari = PyXA.Application("Safari")
    html = safari.current_document.source()
    print(f"Hello from inside the container, {name}!")
    return {"greeted": name, "safari_html": html}

# When a @sandboxed function is called, it will execute in the container
result = await greet_and_print("Cua")
# Result: {"greeted": "Cua", "safari_html": "<html>...</html>"}
# stdout and stderr are also captured and printed / raised
print("Result from sandboxed function:", result)
```
## Examples & Guides
<Tabs groupId="language" persist items={['Python', 'TypeScript']}>
<Tab value="Python">
<Cards>
<Card
href="https://github.com/trycua/cua/tree/main/notebooks/samples/computer_nb.ipynb"
title="Computer-Use Interface (CUI)">
Step-by-step guide on using the Computer-Use Interface (CUI)
</Card>
<Card
href="../libraries/computer/computer-use-gradio-ui"
title="Computer-Use Gradio UI">
Use the Computer library with a Python Gradio UI
</Card>
</Cards>
</Tab>
<Tab value="TypeScript">
<Cards>
<Card
href="https://github.com/trycua/cua/tree/main/examples/computer-example-ts"
title="Computer Cloud OpenAI">
Use Cua Cloud Containers with OpenAI's API to execute tasks in a sandbox
</Card>
</Cards>
</Tab>
</Tabs>
---
<Callout type="info">
**Need detailed API documentation?**{' '}
<span className="w-full">
Explore the complete API reference with detailed class documentation, and
method signatures.
</span>
<a
href="/api/computer"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>
```

View File

@@ -1,49 +0,0 @@
---
title: Core
description: Core infrastructure and shared utilities powering the Cua computer-use platform
pypi: cua-core
npm: '@trycua/core'
macos: true
windows: true
linux: true
github:
- https://github.com/trycua/cua/tree/main/libs/python/core
- https://github.com/trycua/cua/tree/main/libs/typescript/core
---
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
# Features
- Privacy-focused telemetry system for transparent usage analytics
- Common helper functions and utilities used by other Cua packages
- Core infrastructure components shared between modules
## Installation
```bash
pip install cua-core
```
---
<Callout type="info">
**Need detailed API documentation?**{' '}
<span className="w-full">
Explore the complete API reference with detailed class documentation and
method signatures.
</span>
<a
href="/api/core"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>

View File

@@ -1,19 +0,0 @@
---
title: Getting Started
description: Getting started with the Cua libraries
---
## Overview
The Cua project provides several libraries for building Computer-Use AI agents.
| Library | Description | Installation |
| -------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| [**Lume**](./lume.mdx) | VM management for macOS/Linux using Apple's Virtualization.Framework | `curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash` |
| [**Lumier**](./lumier.mdx) | Docker interface for macOS and Linux VMs | `docker pull trycua/lumier:latest` |
| [**Computer**](./computer.mdx) | Python and TypeScript interface for controlling virtual machines | `pip install "cua-computer[all]"`<br/><br/>`npm install @trycua/computer` |
| [**Agent**](./agent.mdx) | AI agent framework for automating tasks | `pip install "cua-agent[all]"` |
| [**MCP Server**](./mcp-server.mdx) | MCP server for using CUA with Claude Desktop | `pip install cua-mcp-server` |
| [**SOM**](./som.mdx) | Set-of-Mark library for Agent | `pip install cua-som` |
| [**Computer Server**](./computer-server.mdx) | Server component for Computer | `pip install cua-computer-server` |
| [**Core**](./core.mdx) | Core utilities for Python and TypeScript | `pip install cua-core`<br/><br/>`npm install @trycua/core` |

View File

@@ -0,0 +1,71 @@
---
title: Lume CLI Reference
description: Command Line Interface reference for Lume
---
Lume is a lightweight Command Line Interface and local API server for creating, running and managing **macOS and Linux virtual machines** with near-native performance on Apple Silicon, using Apple's [Virtualization.Framework](https://developer.apple.com/documentation/virtualization).
## Quick Start
Install and run a prebuilt macOS VM in two commands:
```bash
# Install Lume
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# Pull & start a macOS image
lume run macos-sequoia-vanilla:latest
```
> **Security Note**: All prebuilt images use the default password `lume`. Change this immediately after your first login using the `passwd` command.
**System Requirements**:
- Apple Silicon Mac (M1, M2, M3, etc.)
- macOS 13.0 or later
- At least 8GB of RAM (16GB recommended)
- At least 50GB of free disk space
## Install
Install with a single command:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
By default, Lume is installed as a background service that starts automatically on login. If you prefer to start the Lume API service manually when needed, you can use the `--no-background-service` option:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh) --no-background-service"
```
> **Note:** With this option, you'll need to manually start the Lume API service by running `lume serve` in your terminal whenever you need to use tools or libraries that rely on the Lume API (such as the Computer-Use Agent).
You can also download the `lume.pkg.tar.gz` archive from the [latest release](https://github.com/trycua/cua/releases?q=lume&expanded=true), extract it, and install the package manually.
## Using Lume
Once installed, you can start using Lume with these common workflows:
### Run a Prebuilt VM
```bash
# Run a macOS Sequoia VM
lume run macos-sequoia-vanilla:latest
# Run an Ubuntu VM
lume run ubuntu-noble-vanilla:latest
```
> We provide [prebuilt VM images](#prebuilt-images) in our [ghcr registry](https://github.com/orgs/trycua/packages).
### Create a Custom VM
```bash
# Create a new macOS VM
lume create my-macos-vm --cpu 4 --memory 8GB --disk-size 50GB
# Create a Linux VM
lume create my-linux-vm --os linux --cpu 2 --memory 4GB
```
> **Disk Space**: The actual disk space used by sparse images will be much lower than the logical size listed. You can resize VM disks after creation using `lume set <name> --disk-size <size>`.

View File

@@ -1,353 +1,18 @@
---
title: Lume
description: A lightweight Command Line Interface and local API server for creating, running and managing macOS and Linux virtual machines.
macos: true
linux: true
description: Reference for the current version of the Lume CLI.
github:
- https://github.com/trycua/cua/tree/main/libs/lume
---
import Link from 'next/link';
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
## ⚠️ 🚧 Under Construction 🚧 ⚠️
The Lume API reference documentation is currently under development.
## Overview
The Lume CLI provides command line tools for managing virtual machines with Lume.
## API Documentation
# Lume
Lume is a lightweight Command Line Interface and local API server for creating, running and managing **macOS and Linux virtual machines** with near-native performance on Apple Silicon, using Apple's [`Virtualization.Framework`](https://developer.apple.com/documentation/virtualization).
## Quick Start
Install and run a prebuilt macOS VM in two commands:
```bash
# Install Lume
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# Pull & start a macOS image
lume run macos-sequoia-vanilla:latest
```
<Callout type="warning">
**Security Note**: All prebuilt images use the default password `lume`. Change
this immediately after your first login using the `passwd` command.
</Callout>
**System Requirements**:
- Apple Silicon Mac (M1, M2, M3, etc.)
- macOS 13.0 or later
- At least 8GB of RAM (16GB recommended)
- At least 50GB of free disk space
## Install
Install with a single command:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
By default, Lume is installed as a background service that starts automatically on login. If you prefer to start the Lume API service manually when needed, you can use the `--no-background-service` option:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh) --no-background-service"
```
<Callout type="info">
**Note:** With this option, you'll need to manually start the Lume API service
by running `lume serve` in your terminal whenever you need to use tools or
libraries that rely on the Lume API (such as the Computer-Use Agent).
</Callout>
You can also download the `lume.pkg.tar.gz` archive from the [latest release](https://github.com/trycua/cua/releases?q=lume&expanded=true), extract it, and install the package manually.
## Using Lume
Once installed, you can start using Lume with these common workflows:
<Steps>
<Step>
### Run a Prebuilt VM
```bash
# Run a macOS Sequoia VM
lume run macos-sequoia-vanilla:latest
# Run an Ubuntu VM
lume run ubuntu-noble-vanilla:latest
```
<Callout type="info">
We provide [prebuilt VM images](#prebuilt-images) in our [ghcr
registry](https://github.com/orgs/trycua/packages).
</Callout>
</Step>
<Step>
### Create a Custom VM
```bash
# Create a new macOS VM
lume create my-macos-vm --cpu 4 --memory 8GB --disk-size 50GB
# Create a Linux VM
lume create my-linux-vm --os linux --cpu 2 --memory 4GB
```
<Callout type="info">
**Disk Space**: The actual disk space used by sparse images will be much lower than the logical size listed. You can resize VM disks after creation using `lume set <name> --disk-size <size>`.
</Callout>
</Step>
<Step>
### Manage Your VMs
```bash
# List all VMs
lume ls
# Get VM details
lume get my-vm
# Stop a running VM
lume stop my-vm
```
</Step>
</Steps>
## Prebuilt Images
Pre-built images are available in the registry [ghcr.io/trycua](https://github.com/orgs/trycua/packages).
| Image | Tag | Description | Logical Size |
| ----------------------- | ------------------- | ----------------------------------------------------------------------------------------------- | ------------ |
| `macos-sequoia-vanilla` | `latest`, `15.2` | macOS Sequoia 15.2 image | 20GB |
| `macos-sequoia-xcode` | `latest`, `15.2` | macOS Sequoia 15.2 image with Xcode command line tools | 22GB |
| `macos-sequoia-cua` | `latest`, `15.3` | macOS Sequoia 15.3 image compatible with the Computer interface | 24GB |
| `ubuntu-noble-vanilla` | `latest`, `24.04.1` | [Ubuntu Server for ARM 24.04.1 LTS](https://ubuntu.com/download/server/arm) with Ubuntu Desktop | 20GB |
## Lume CLI
```bash
lume <command>
Commands:
lume create <name> Create a new macOS or Linux VM
lume run <name> Run a VM
lume ls List all VMs
lume get <name> Get detailed information about a VM
lume set <name> Modify VM configuration
lume stop <name> Stop a running VM
lume delete <name> Delete a VM
lume pull <image> Pull a macOS image from container registry
lume push <name> <image:tag> Push a VM image to a container registry
lume clone <name> <new-name> Clone an existing VM
lume config Get or set lume configuration
lume images List available macOS images in local cache
lume ipsw Get the latest macOS restore image URL
lume prune Remove cached images
lume serve Start the API server
Options:
--help Show help [boolean]
--version Show version number [boolean]
Command Options:
create:
--os <os> Operating system to install (macOS or linux, default: macOS)
--cpu <cores> Number of CPU cores (default: 4)
--memory <size> Memory size, e.g., 8GB (default: 4GB)
--disk-size <size> Disk size, e.g., 50GB (default: 40GB)
--display <res> Display resolution (default: 1024x768)
--ipsw <path> Path to IPSW file or 'latest' for macOS VMs
--storage <name> VM storage location to use
run:
--no-display Do not start the VNC client app
--shared-dir <dir> Share directory with VM (format: path[:ro|rw])
--mount <path> For Linux VMs only, attach a read-only disk image
--registry <url> Container registry URL (default: ghcr.io)
--organization <org> Organization to pull from (default: trycua)
--vnc-port <port> Port to use for the VNC server (default: 0 for auto-assign)
--recovery-mode <boolean> For macOS VMs only, start VM in recovery mode (default: false)
--storage <name> VM storage location to use
set:
--cpu <cores> New number of CPU cores (e.g., 4)
--memory <size> New memory size (e.g., 8192MB or 8GB)
--disk-size <size> New disk size (e.g., 40960MB or 40GB)
--display <res> New display resolution in format WIDTHxHEIGHT (e.g., 1024x768)
--storage <name> VM storage location to use
delete:
--force Force deletion without confirmation
--storage <name> VM storage location to use
pull:
--registry <url> Container registry URL (default: ghcr.io)
--organization <org> Organization to pull from (default: trycua)
--storage <name> VM storage location to use
push:
--additional-tags <tags...> Additional tags to push the same image to
--registry <url> Container registry URL (default: ghcr.io)
--organization <org> Organization/user to push to (default: trycua)
--storage <name> VM storage location to use
--chunk-size-mb <size> Chunk size for disk image upload in MB (default: 512)
--verbose Enable verbose logging
--dry-run Prepare files and show plan without uploading
--reassemble Verify integrity by reassembling chunks (requires --dry-run)
get:
-f, --format <format> Output format (json|text)
--storage <name> VM storage location to use
stop:
--storage <name> VM storage location to use
clone:
--source-storage <name> Source VM storage location
--dest-storage <name> Destination VM storage location
config:
get Get current configuration
storage Manage VM storage locations
add <name> <path> Add a new VM storage location
remove <name> Remove a VM storage location
list List all VM storage locations
default <name> Set the default VM storage location
cache Manage cache settings
get Get current cache directory
set <path> Set cache directory
caching Manage image caching settings
get Show current caching status
set <boolean> Enable or disable image caching
serve:
--port <port> Port to listen on (default: 7777)
```
## Common Workflows
### Development Environment Setup
```bash
# Create a development VM with more resources
lume create dev-vm --cpu 6 --memory 12GB --disk-size 100GB
# Run with shared directory for code
lume run dev-vm --shared-dir ~/Projects:rw
```
### Testing Different macOS Versions
```bash
# Pull and run different macOS versions
lume pull macos-sequoia-vanilla:latest
lume run macos-sequoia-vanilla:latest
# Clone a VM for testing
lume clone my-vm my-vm-test
```
### File Sharing Examples
```bash
# Share a read-only directory
lume run my-vm --shared-dir ~/Documents:ro
# Share multiple directories
lume run my-vm --shared-dir ~/Projects:rw --shared-dir ~/Downloads:ro
# For Linux VMs, mount additional disk images
lume run ubuntu-vm --mount ~/disk-image.img
```
## Local API Server
Lume exposes a local HTTP API server for programmatic VM management, perfect for automation and integration with other tools.
```bash
# Start the API server
lume serve
```
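Once the server is running, any HTTP client can talk to it. Below is a minimal sketch in Python; the `/lume/vms` route and JSON response shape are assumptions for illustration, so check the Lume API server documentation linked below for the exact endpoints:

```python
import json
import urllib.request

BASE_URL = "http://localhost:7777"  # default `lume serve` port


def vms_url(base_url: str = BASE_URL) -> str:
    # Build the URL for the VM collection endpoint (path is an assumption).
    return f"{base_url}/lume/vms"


def list_vms(base_url: str = BASE_URL):
    # Query a running `lume serve` instance for all VMs.
    with urllib.request.urlopen(vms_url(base_url)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(vms_url())
```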
<Callout type="info">
<span className="w-full">
Read the documentation on the local API server.
</span>
<Link
href="/home/libraries/lume/http-api"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
Lume API Server Documentation
<ChevronRight size={18} />
</Link>
</Callout>
## Development
If you're working on Lume in the context of the Cua monorepo, we recommend using the dedicated VS Code workspace configuration:
```bash
# Open VS Code workspace from the root of the monorepo
code .vscode/lume.code-workspace
```
This workspace is preconfigured with Swift language support, build tasks, and debug configurations.
## FAQ
### Can I run multiple VMs simultaneously?
Yes, you can run multiple VMs at the same time as long as your system has sufficient resources (CPU, memory, and disk space).
### How do I share files between the host and VM?
Use the `--shared-dir` option when running a VM:
```bash
lume run my-vm --shared-dir ~/Projects:rw
```
The shared directory will be automatically mounted in the VM.
### Where are VM files stored?
By default, VMs are stored in `~/.lume/vms/`. You can configure additional storage locations using the `lume config storage` commands.
### How do I update Lume?
Run the install script again to update to the latest version:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
---
<Callout type="info">
**Need detailed API documentation?**{' '}
<span className="w-full">
Explore the complete API reference with detailed class documentation and
method signatures.
</span>
<a
href="/api/lume"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>
Coming soon.

View File

@@ -1,350 +0,0 @@
---
title: Lumier
description: Run macOS and Linux virtual machines effortlessly in Docker containers with browser-based VNC access.
macos: true
linux: true
github:
- https://github.com/trycua/cua/tree/main/libs/lumier
---
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
import Link from 'next/link';
import { Step, Steps } from 'fumadocs-ui/components/steps';
## What is Lumier?
Lumier is a streamlined interface for running macOS and Linux virtual machines with minimal setup. It packages a pre-configured environment in Docker that connects to the `lume` virtualization service on your host machine.
<div align="center">
<video
src="https://github.com/user-attachments/assets/2ecca01c-cb6f-4c35-a5a7-69bc58bd94e2"
width="800"
controls></video>
</div>
### Features
- **Quick Setup** - Get a VM running in minutes
- **Browser Access** - VNC interface accessible from any browser
- **Easy File Sharing** - Seamless file transfer between host and VM
- **Simple Configuration** - Environment variables for easy customization
- **Hardware Acceleration** - Native virtualization using Apple's framework
<Callout type="info">
Lumier uses Docker as a packaging system, not for isolation. It creates true
virtual machines using Apple's Virtualization Framework through the Lume CLI.
</Callout>
## Installation
### Prerequisites
<Steps>
<Step>
### Install Docker for Apple Silicon
Download and install [Docker Desktop](https://desktop.docker.com/mac/main/arm64/Docker.dmg) for Mac.
Make sure Docker is running before proceeding to the next step.
</Step>
<Step>
### Install Lume Virtualization Service
Install [Lume](./lume/) with a single command:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
<Callout type="info">
Lume runs as a background service on port 7777. If this port is already in
use, specify a different port with the `--port` option during installation.
</Callout>
</Step>
</Steps>
## Getting Started
### Quick Start
Run your first macOS VM with a single Docker command:
```bash
# Run a macOS VM with default settings
docker run -it \
-e LUME_SERVER_URL="host.docker.internal:7777" \
-p 5900:5900 \
ghcr.io/trycua/lumier:latest
```
### Basic Configuration
Customize your VM with environment variables:
```bash
docker run -it \
-e LUME_SERVER_URL="host.docker.internal:7777" \
-e VM_NAME="my-dev-vm" \
-e VM_CPUS="8" \
-e VM_MEMORY="16384" \
-e VM_STORAGE="100" \
-e VNC_PASSWORD="mysecretpassword" \
-p 5900:5900 \
ghcr.io/trycua/lumier:latest
```
### Access Your VM
Once running, access your VM through:
1. **VNC Client**: Connect to `vnc://localhost:5900`
2. **Web Browser**: Navigate to `http://localhost:8006` (noVNC, when port 8006 is published with `-p 8006:8006`)
## Examples
### Ephemeral VM (Temporary)
Run a VM that resets on restart - perfect for testing:
```bash
docker run -it --rm \
--name macos-vm \
-p 8006:8006 \
-e VM_NAME=macos-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
trycua/lumier:latest
```
<Callout type="info">
Access your VM at `http://localhost:8006` after startup. Changes will be lost
when the container stops.
</Callout>
### Persistent VM
Save your VM state between sessions with persistent storage:
```bash
# First, create a storage directory if it doesn't exist
mkdir -p storage
# Then run the container with persistent storage
docker run -it --rm \
--name lumier-vm \
-p 8006:8006 \
-v $(pwd)/storage:/storage \
-e VM_NAME=lumier-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
-e HOST_STORAGE_PATH=$(pwd)/storage \
trycua/lumier:latest
```
### File Sharing
Share files between your host and VM:
```bash
# Create both storage and shared folders
mkdir -p storage shared
# Run with both persistent storage and a shared folder
docker run -it --rm \
--name lumier-vm \
-p 8006:8006 \
-v $(pwd)/storage:/storage \
-v $(pwd)/shared:/shared \
-e VM_NAME=lumier-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
-e HOST_STORAGE_PATH=$(pwd)/storage \
-e HOST_SHARED_PATH=$(pwd)/shared \
trycua/lumier:latest
```
Files in the `shared` folder are accessible from both your Mac and the VM.
### Automation with Startup Scripts
Automate VM setup with startup scripts:
```bash
# Create the lifecycle directory in your shared folder
mkdir -p shared/lifecycle
# Create a sample on-logon.sh script
cat > shared/lifecycle/on-logon.sh << 'EOF'
#!/usr/bin/env bash
# Create a file on the desktop
echo "Hello from Lumier!" > /Users/lume/Desktop/hello_lume.txt
# You can add more commands to execute at VM startup
# For example:
# - Configure environment variables
# - Start applications
# - Mount network drives
# - Set up development environments
EOF
# Make the script executable
chmod +x shared/lifecycle/on-logon.sh
```
The script runs automatically on VM startup with access to:
- Home directory: `/Users/lume`
- Shared folder: `/Volumes/My Shared Files`
- All VM resources
### Docker Compose
For easier management, use Docker Compose:
```yaml
services:
lumier:
image: trycua/lumier:latest
container_name: lumier-vm
restart: unless-stopped
ports:
- '8006:8006' # Port for VNC access
volumes:
- ./storage:/storage # VM persistent storage
- ./shared:/shared # Shared folder accessible in the VM
environment:
- VM_NAME=lumier-vm
- VERSION=ghcr.io/trycua/macos-sequoia-cua:latest
- CPU_CORES=4
- RAM_SIZE=8192
- HOST_STORAGE_PATH=${PWD}/storage
- HOST_SHARED_PATH=${PWD}/shared
stop_signal: SIGINT
stop_grace_period: 2m
```
Run with Docker Compose:
```bash
# First create the required directories
mkdir -p storage shared
# Start the container
docker-compose up -d
# View the logs
docker-compose logs -f
# Stop the container when done
docker-compose down
```
## Advanced Topics
### Building from Source
Customize Lumier by building from source:
```bash
# Clone the repository
git clone https://github.com/trycua/cua.git
cd cua/libs/lumier
# Build the Docker image
docker build -t lumier-custom:latest .
# Run your custom build
docker run -it --rm \
--name lumier-vm \
-p 8006:8006 \
-e VM_NAME=lumier-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
lumier-custom:latest
```
### Customization Options
The Dockerfile provides several customization points:
1. **Base image**: The container uses Debian Bullseye Slim as the base. You can modify this if needed.
2. **Installed packages**: You can add or remove packages in the apt-get install list.
3. **Hooks**: Check the `/run/hooks/` directory for scripts that run at specific points during VM lifecycle.
4. **Configuration**: Review `/run/config/constants.sh` for default settings.
After making your modifications, you can build and push your custom image to your own Docker Hub repository:
```bash
# Build with a custom tag
docker build -t yourusername/lumier:custom .
# Push to Docker Hub (after docker login)
docker push yourusername/lumier:custom
```
### Configuration Reference
#### Environment Variables
| Variable | Description | Default | Example |
| ------------------- | --------------------------- | --------------------------- | ----------------------------------------- |
| `LUME_SERVER_URL` | Lume service URL | `host.docker.internal:7777` | `host.docker.internal:8080` |
| `VM_NAME` | Virtual machine name | `lumier-vm` | `my-dev-vm` |
| `VERSION` | VM image to use | - | `ghcr.io/trycua/macos-sequoia-cua:latest` |
| `VM_CPUS` | Number of CPU cores | `4` | `8` |
| `VM_MEMORY` | Memory in MB | `8192` | `16384` |
| `VM_STORAGE` | Storage size in GB | `50` | `100` |
| `VNC_PASSWORD` | VNC access password | - | `mysecretpassword` |
| `HOST_STORAGE_PATH` | Host path for VM storage | - | `$(pwd)/storage` |
| `HOST_SHARED_PATH` | Host path for shared folder | - | `$(pwd)/shared` |
#### Port Configuration
- **VNC Port**: `-p 5900:5900` for standard VNC access
- **Web Port**: `-p 8006:8006` for browser-based access
- Use different host ports if defaults are occupied: `-p 8007:8006`
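The flags used throughout the examples follow a regular pattern, so a launch command can be assembled programmatically. A small sketch (this helper and its defaults are ours, not part of Lumier):

```python
def lumier_run_args(image="trycua/lumier:latest", ports=None, volumes=None, env=None):
    # Assemble a `docker run` argument list from port, volume, and env
    # mappings, mirroring the flags used in the examples above.
    args = ["docker", "run", "-it", "--rm"]
    for host, container in (ports or {}).items():
        args += ["-p", f"{host}:{container}"]
    for host_path, container_path in (volumes or {}).items():
        args += ["-v", f"{host_path}:{container_path}"]
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    return args + [image]


cmd = lumier_run_args(
    ports={8006: 8006},
    env={"VM_NAME": "lumier-vm", "CPU_CORES": 4, "RAM_SIZE": 8192},
)
print(" ".join(cmd))
```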
## Resources
<Cards>
<Card
title="Lume Documentation"
description="Learn more about the virtualization service powering Lumier"
href="/home/libraries/lume"
/>
<Card
title="Computer Library"
description="Automate your VMs with the Computer library"
href="/home/libraries/computer"
/>
</Cards>
---
<Callout type="info">
**Need detailed API documentation?**{' '}
<span className="w-full">
Explore the complete API reference with detailed class documentation and
method signatures.
</span>
<a
href="/api/lumier"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>

View File

@@ -1,202 +0,0 @@
---
title: MCP Server
description: Model Context Protocol server for Computer-Use Agent integration
pypi: cua-mcp-server
macos: true
linux: true
windows: true
github:
- https://github.com/trycua/cua/tree/main/libs/python/mcp-server
---
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
import { Step, Steps } from 'fumadocs-ui/components/steps';
**MCP Server** enables Computer-Use Agent (CUA) integration with Claude Desktop and other Model Context Protocol (MCP) clients, providing seamless access to computer automation capabilities through a standardized interface.
## Features
- **MCP Integration** - Connect CUA to Claude Desktop and other MCP-compatible clients
- **Computer Control** - Full screen, keyboard, and mouse automation capabilities
- **Tool System** - Execute commands, take screenshots, and interact with applications
- **Easy Setup** - Simple configuration with Claude Desktop or any MCP client
## Installation
### Prerequisites
Before installing the MCP server, ensure you have:
1. **Lume CLI** installed and configured
2. **macOS CUA image** pulled and ready
3. **Python 3.10+** installed on your system
<Callout type="info">
Follow our [Cua Usage Guide](../guides/cua-usage-guide.mdx) for help setting
everything up.
</Callout>
### Install via pip
```bash
pip install cua-mcp-server
```
This will install:
- The MCP server
- CUA agent and computer dependencies
- An executable `cua-mcp-server` script in your PATH
### Install Script
For automated installation, use our setup script:
```bash
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/mcp-server/scripts/install_mcp_server.sh | bash
```
This script:
- Creates the `~/.cua` directory
- Generates a startup script at `~/.cua/start_mcp_server.sh`
- Manages Python virtual environments automatically
- Installs and updates the cua-mcp-server package
## Getting Started
You can then use the script in your MCP configuration like this:
```json
{
"mcpServers": {
"cua-agent": {
"command": "/bin/bash",
"args": ["~/.cua/start_mcp_server.sh"],
"env": {
"CUA_AGENT_LOOP": "OMNI",
"CUA_MODEL_PROVIDER": "ANTHROPIC",
"CUA_MODEL_NAME": "claude-3-7-sonnet-20250219",
"CUA_PROVIDER_API_KEY": "your-api-key"
}
}
}
}
```
### Development Config
If you want to develop with the cua-mcp-server directly without installation, you can use this configuration:
```json
{
"mcpServers": {
"cua-agent": {
"command": "/bin/bash",
"args": ["~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"],
"env": {
"CUA_AGENT_LOOP": "UITARS",
"CUA_MODEL_PROVIDER": "OAICOMPAT",
"CUA_MODEL_NAME": "ByteDance-Seed/UI-TARS-1.5-7B",
"CUA_PROVIDER_BASE_URL": "https://****************.us-east-1.aws.endpoints.huggingface.cloud/v1",
"CUA_PROVIDER_API_KEY": "your-api-key"
}
}
}
}
```
This configuration:
- Uses the start_mcp_server.sh script which automatically sets up the Python path and runs the server module
- Works with Claude Desktop, Cursor, or any other MCP client
- Automatically uses your development code without requiring installation
Just add this to your MCP client's configuration and it will use your local development version of the server.
### Environment Variables
The MCP server is configured using the following environment variables.
| Variable | Description | Default |
| ----------------------- | ----------------------------------------------------- | ----------------------- |
| `CUA_AGENT_LOOP` | Agent loop to use (OPENAI, ANTHROPIC, UITARS, OMNI) | OMNI |
| `CUA_MODEL_PROVIDER` | Model provider (ANTHROPIC, OPENAI, OLLAMA, OAICOMPAT) | ANTHROPIC |
| `CUA_MODEL_NAME` | Model name to use | None (provider default) |
| `CUA_PROVIDER_BASE_URL` | Base URL for provider API | None |
| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
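These variables resolve in a straightforward way. A hedged sketch of the defaulting logic from the table above (the helper name is ours, not the server's actual code):

```python
import os


def load_mcp_settings(env=None):
    # Resolve the documented environment variables with their defaults.
    env = dict(os.environ if env is None else env)
    return {
        "agent_loop": env.get("CUA_AGENT_LOOP", "OMNI"),
        "model_provider": env.get("CUA_MODEL_PROVIDER", "ANTHROPIC"),
        "model_name": env.get("CUA_MODEL_NAME"),  # None -> provider default
        "provider_base_url": env.get("CUA_PROVIDER_BASE_URL"),
        "max_images": int(env.get("CUA_MAX_IMAGES", "3")),
    }


print(load_mcp_settings({"CUA_AGENT_LOOP": "UITARS"}))
```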
### Usage
Once configured, you can simply ask the model to perform computer tasks:
- "Open Chrome and go to github.com"
- "Create a folder called 'Projects' on my desktop"
- "Find all PDFs in my Downloads folder"
- "Take a screenshot and highlight the error message"
The model will automatically use your CUA agent to perform these tasks.
## Available Tools
The MCP server exposes the following tools to Claude:
1. `run_cua_task` - Run a single Computer-Use Agent task with the given instruction
2. `run_multi_cua_tasks` - Run multiple tasks in sequence
## Integrations
### Claude Desktop
To use with Claude Desktop, add an entry to your Claude Desktop configuration (`claude_desktop_config.json`, typically found in `~/.config/claude-desktop/`):
For more information on MCP with Claude Desktop, see the [official MCP User Guide](https://modelcontextprotocol.io/quickstart/user).
### Cursor
To use with Cursor, add an MCP configuration file in one of these locations:
- **Project-specific**: Create `.cursor/mcp.json` in your project directory
- **Global**: Create `~/.cursor/mcp.json` in your home directory
After configuration, you can simply tell Cursor's Agent to perform computer tasks by explicitly mentioning the CUA agent, such as "Use the computer control tools to open Safari."
For more information on MCP with Cursor, see the [official Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol).
## Troubleshooting
Ensure you have valid API keys:
- Add your Anthropic API key, or other model provider API key in the Claude Desktop config (as shown above)
- Or set it as an environment variable in your shell profile
If you get a `/bin/bash: ~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, try changing the path to the script to be absolute instead of relative.
View MCP server logs:
```bash
tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
```
---
<Callout type="info">
**Need detailed API documentation?**
<span className="w-full">
Explore the complete API reference with detailed class documentation and
method signatures.
</span>
<a
href="/api/mcp-server"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>

View File

@@ -1,16 +0,0 @@
{
"title": "Libraries",
"description": "Libraries",
"icon": "Library",
"pages": [
"agent",
"computer",
"computer-server",
"cloud",
"core",
"lume",
"lumier",
"mcp-server",
"som"
]
}

View File

@@ -1,209 +0,0 @@
---
title: Set-of-Mark
description: A high-performance visual grounding library for detecting and analyzing UI elements in screenshots.
macos: true
windows: true
linux: true
pypi: cua-computer
github:
- https://github.com/trycua/cua/tree/main/libs/python/som
---
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
## Overview
**Set-of-Mark (Som)** is a high-performance visual grounding library for detecting and analyzing UI elements in screenshots. Built for the Computer-Use Agent (CUA) framework, it combines state-of-the-art computer vision models to identify icons, buttons, and text in user interfaces.
<Callout type="info">
Som is optimized for **Apple Silicon** with Metal Performance Shaders (MPS)
acceleration, achieving sub-second detection times while maintaining high
accuracy.
</Callout>
### Key Features
- **Hardware Acceleration** - Automatic detection of MPS, CUDA, or CPU
- **Multi-Model Architecture** - YOLO for icons + EasyOCR for text
- **Optimized Performance** - Sub-second detection on Apple Silicon
- **Flexible Configuration** - Tunable thresholds for different use cases
- **Rich Output Format** - Structured data with confidence scores
- **Visual Debugging** - Annotated screenshots with numbered elements
## Installation
### Install from PyPI
```bash
pip install cua-som
```
<Callout type="warning">
Som requires Python 3.11 or higher. For best performance, use macOS with Apple
Silicon.
</Callout>
### Install from Source
```bash
# Clone the repository
git clone https://github.com/trycua/cua.git
cd cua/libs/python/som
# Using PDM (recommended)
pdm install
# Or using pip
pip install -e .
```
### System Requirements
| Platform | Hardware | Detection Time |
| -------- | ------------------------ | -------------- |
| macOS | Apple Silicon (M1/M2/M3) | ~0.4s |
| Any | CPU only | ~1.3s |
## Getting Started
### Basic Usage
Here's a simple example to detect UI elements in a screenshot:
```python
from som import OmniParser
from PIL import Image
# Initialize the parser
parser = OmniParser()
# Load and process an image
image = Image.open("screenshot.png")
result = parser.parse(
image,
box_threshold=0.3, # Confidence threshold
iou_threshold=0.1, # Overlap threshold
use_ocr=True # Enable text detection
)
# Print detected elements
for elem in result.elements:
if elem.type == "icon":
print(f"Icon: confidence={elem.confidence:.3f}, bbox={elem.bbox.coordinates}")
else: # text
print(f"Text: '{elem.content}', confidence={elem.confidence:.3f}")
```
### Advanced Configuration
Customize detection parameters for your specific use case:
```python
result = parser.parse(
image,
box_threshold=0.3, # Confidence threshold (0.0-1.0)
iou_threshold=0.1, # Overlap threshold (0.0-1.0)
use_ocr=True, # Enable text detection
)
```
## Configuration Guide
### Box Thresholds
Sets the minimum confidence a detection needs to be kept (default: 0.3)
- **Higher values (0.4-0.5)**: More precise, fewer false positives
- **Lower values (0.1-0.2)**: More detections, may include noise
- **Recommended**: 0.3 for balanced performance
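As a minimal illustration of this trade-off, consider filtering a list of detections by confidence. The list and `keep` helper below are made up for illustration and are not Som's actual output format:

```python
# Hypothetical detections, each with a confidence score
detections = [
    {"type": "icon", "confidence": 0.55},
    {"type": "icon", "confidence": 0.33},
    {"type": "text", "confidence": 0.18},
]

def keep(dets, box_threshold):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in dets if d["confidence"] >= box_threshold]

print(len(keep(detections, 0.3)))  # 2 — balanced default
print(len(keep(detections, 0.5)))  # 1 — more precise, fewer false positives
print(len(keep(detections, 0.1)))  # 3 — broader, may include noise
```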
### Intersection Over Union (IOU) Thresholds
Set the `iou_threshold` parameter to control when overlapping element boxes should be merged into a single detection. A value of 0.1-0.2 is recommended for most use cases. Higher values will require more overlap before merging occurs.
<div class="flex gap-x-6">
<IOU
title="Low Overlap (Keep Both)"
description="When boxes have minimal overlap (IOU ~ 0.05), both detections are kept as separate elements."
rect1={{
left: 30,
top: 30,
width: 60,
height: 50,
fill: 'rgba(0, 0, 255, 0.6)',
name: 'box1',
}}
rect2={{
left: 80,
top: 70,
width: 60,
height: 50,
fill: 'rgba(255, 165, 0, 0.6)',
name: 'box2',
}}
/>
<IOU
title="High Overlap (Merge)"
description="When boxes significant overlap (IOU ~ 0.4), they are merged into a single detection to avoid duplicates."
rect1={{
left: 30,
top: 30,
width: 80,
height: 60,
fill: 'rgba(0, 0, 255, 0.6)',
name: 'box1',
}}
rect2={{
left: 50,
top: 40,
width: 80,
height: 60,
fill: 'rgba(255, 165, 0, 0.6)',
name: 'box2',
}}
/>
</div>
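The merge decision illustrated above can be sketched in plain Python. This is a generic IOU computation over `(left, top, width, height)` boxes, not Som's internal implementation; the example boxes are taken from the diagrams:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (left, top, width, height) boxes."""
    ax1, ay1 = box_a[0], box_a[1]
    ax2, ay2 = ax1 + box_a[2], ay1 + box_a[3]
    bx1, by1 = box_b[0], box_b[1]
    bx2, by2 = bx1 + box_b[2], by1 + box_b[3]
    # Overlap rectangle (zero width/height if the boxes do not intersect)
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union else 0.0

# Boxes from the diagrams above
low = iou((30, 30, 60, 50), (80, 70, 60, 50))   # ≈ 0.017 — keep both
high = iou((30, 30, 80, 60), (50, 40, 80, 60))  # ≈ 0.45 — merge
```

With `iou_threshold=0.1`, the first pair stays as two separate elements while the second pair is merged into one.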
## Performance
<Cards>
<Card title="Metal Performance Shaders (Apple Silicon)" description="Best performance on macOS">
- Multi-scale detection (640px, 1280px, 1920px)
- Test-time augmentation enabled
- Half-precision (FP16)
- ~0.4s average detection time
- Best for production use
</Card>
<Card title="CPU Fallback" description="Universal compatibility">
- Single-scale detection (1280px)
- Full precision (FP32)
- ~1.3s average time
- Reliable fallback option
</Card>
</Cards>
---
<Callout type="info">
**Need the full API documentation?**
<span className="w-full">
Explore the complete API reference with detailed class documentation and
method signatures.
</span>
<a
href="/api/som"
className={cn(
buttonVariants({
color: 'secondary',
}),
'no-underline h-10'
)}>
View API Reference
<ChevronRight size={18} />
</a>
</Callout>

View File

@@ -5,17 +5,14 @@
"defaultOpen": true,
"pages": [
"index",
"compatibility",
"faq",
"quickstart-ui",
"quickstart-devs",
"telemetry",
"---[BookCopy]Guides---",
"guides/cua-usage-guide",
"guides/developer-guide",
"guides/dev-container-setup",
"guides/computer-use-agent-quickstart",
"guides/agent-gradio-ui",
"guides/computer-use-gradio-ui",
"---[Library]Libraries---",
"---[BookCopy]Computer Playbook---",
"...computer-sdk",
"---[BookCopy]Agent Playbook---",
"...agent-sdk",
"---[CodeXml]API Reference---",
"...libraries"
]
}

View File

@@ -0,0 +1,68 @@
---
title: Quickstart (for Developers)
description: Get started with c/ua in 5 steps
icon: Rocket
---
Get up and running with c/ua in 5 simple steps.
## 1. Introduction
c/ua combines Computer (interface) + Agent (AI) for automating desktop apps. Computer handles clicks/typing, Agent provides the intelligence.
## 2. Create Your First c/ua Container
1. Go to [trycua.com/signin](https://www.trycua.com/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Medium, Ubuntu 22** container
4. Note your container name and API key
## 3. Install c/ua
```bash
pip install "cua-agent2[all]" cua-computer
```
## 4. Using Computer
```python
from computer import Computer
async with Computer(
os_type="linux",
provider_type="cloud",
name="your-container-name",
api_key="your-api-key"
) as computer:
# Take screenshot
screenshot = await computer.interface.screenshot()
# Click and type
await computer.interface.left_click(100, 100)
await computer.interface.type("Hello!")
```
## 5. Using Agent
```python
from agent2 import ComputerAgent
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
max_trajectory_budget=5.0
)
messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```
## Next Steps
- Explore the [SDK documentation](/docs/sdk) for advanced features
- Learn about [trajectory tracking and callbacks](/docs/concepts)
- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for support

View File

@@ -0,0 +1,43 @@
---
title: Quickstart (GUI)
description: Get started with the c/ua Agent UI in 5 steps
icon: Rocket
---
Get up and running with the c/ua Agent UI in 5 simple steps.
## 1. Introduction
c/ua combines Computer (interface) + Agent (AI) for automating desktop apps. The Agent UI provides a simple chat interface to control your remote computer using natural language.
## 2. Create Your First c/ua Container
1. Go to [trycua.com/signin](https://www.trycua.com/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Medium, Ubuntu 22** container
4. Note your container name and API key
## 3. Install c/ua
```bash
pip install "cua-agent2[all]" cua-computer
```
## 4. Run the Agent UI
```bash
python -m agent.ui
```
## 5. Start Chatting
Open your browser to the displayed URL and start chatting with your computer-using agent.
You can ask your agent to perform actions like:
- "Open Firefox and go to github.com"
- "Take a screenshot and tell me what's on the screen"
- "Type 'Hello world' into the terminal"
---
For advanced Python usage, see the [Quickstart for Developers](/docs/quickstart-devs).