mirror of https://github.com/trycua/computer.git, synced 2026-02-24 15:29:01 -06:00
Reworked docs
@@ -1,41 +0,0 @@
---
title: Agent
description: Reference for the current version of the Agent library.
github:
  - https://github.com/trycua/cua/tree/main/libs/python/agent
---

## ⚠️ 🚧 Under Construction 🚧 ⚠️

The Agent API reference documentation is currently under development.

## Overview

The Agent library provides programmatic interfaces for AI agent interactions.

## API Documentation

```python
# Import the necessary components
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider

# UI-TARS-1.5 agent for local execution with MLX
ComputerAgent(loop=AgentLoop.UITARS, model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-6bit"))

# OpenAI computer-use agent using OPENAI_API_KEY
ComputerAgent(loop=AgentLoop.OPENAI, model=LLM(provider=LLMProvider.OPENAI, name="computer-use-preview"))

# Anthropic Claude agent using ANTHROPIC_API_KEY
ComputerAgent(loop=AgentLoop.ANTHROPIC, model=LLM(provider=LLMProvider.ANTHROPIC))

# OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM
ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M"))

# OpenRouter example using the OAICOMPAT provider
ComputerAgent(
    loop=AgentLoop.OMNI,
    model=LLM(
        provider=LLMProvider.OAICOMPAT,
        name="openai/gpt-4o-mini",
        provider_base_url="https://openrouter.ai/api/v1"
    ),
    api_key="your-openrouter-api-key"
)
```
@@ -1,24 +0,0 @@
---
title: API Reference
description: Explore API reference for Cua services and libraries.
icon: CodeXml
---

## ⚠️ 🚧 Under Construction 🚧 ⚠️

Please note that the API Reference documentation is currently under construction. Some libraries have limited documentation, while others have none yet.

We're currently working on generating comprehensive class references and definitions for all libraries.

If you need to find anything specific and it's not documented here, you can browse the implementations in the repository below.

<Card
  title="Cua - GitHub"
  icon={
    <svg role="img" viewBox="0 0 24 24" fill="currentColor">
      <path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12"></path>
    </svg>
  }
  href="https://github.com/trycua/cua/tree/main/libs">
  Visit the repository that contains all libraries.
</Card>
@@ -1,18 +0,0 @@
---
title: Lume
description: Reference for the current version of the Lume CLI.
github:
  - https://github.com/trycua/cua/tree/main/libs/lume
---

## ⚠️ 🚧 Under Construction 🚧 ⚠️

The Lume API reference documentation is currently under development.

## Overview

The Lume CLI provides command-line tools for managing virtual machines with Lume.

## API Documentation

Coming soon.
@@ -1,6 +0,0 @@
{
  "title": "API Reference",
  "description": "API Reference",
  "root": true,
  "pages": ["index", "---", "..."]
}
docs/content/docs/home/agent-sdk/agent-loops.mdx (new file, 38 lines)
@@ -0,0 +1,38 @@
---
title: Agent Loops
description: Supported computer-using agent loops and models
---

An agent can be thought of as a loop: it generates actions, executes them, and repeats until done.

1. **Generate**: Your `model` generates `output_text`, `computer_call`, and `function_call` items
2. **Execute**: The `computer` safely executes those items
3. **Complete**: If the model has no more calls, it's done!

To run an agent loop, simply do:

```python
from agent2 import ComputerAgent
from computer import Computer

computer = Computer()  # Connect to a c/ua container

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer]
)

prompt = "open github, navigate to trycua/cua"

async for result in agent.run(prompt):
    print("Agent:", result["output"][-1]["content"][0]["text"])
```

We currently support 4 computer-using agent loops:

- Anthropic CUAs
- OpenAI CUA Preview
- UI-TARS 1.5
- Omniparser + LLMs

For a full list of supported models and configurations, see the [Supported Agents](./supported-agents) page.
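The generate/execute/complete cycle described above can be sketched as a provider-agnostic loop. The `model_step` and `execute` callables below are hypothetical stand-ins for the model and computer, not part of the SDK:

```python
def run_agent_loop(model_step, execute, messages, max_iters=20):
    """Minimal agent loop: generate, execute, repeat until no more calls."""
    for _ in range(max_iters):
        output = model_step(messages)  # 1. Generate output items
        messages = messages + output
        calls = [o for o in output
                 if o.get("type") in ("computer_call", "function_call")]
        if not calls:  # 3. Complete: the model produced no more calls
            break
        for call in calls:  # 2. Execute each call and record its result
            messages = messages + [execute(call)]
    return messages
```

The real SDK adds streaming, callbacks, and safety checks around this skeleton, but the control flow is the same.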
@@ -0,0 +1,52 @@
---
title: Agent Lifecycle
description: Agent callback lifecycle and hooks
---

# Callbacks

Callbacks provide hooks into the agent lifecycle for extensibility. They're called in a specific order during agent execution.

## Callback Lifecycle

### 1. `on_run_start(kwargs, old_items)`

Called once when the agent run begins. Initialize tracking, logging, or state.

### 2. `on_run_continue(kwargs, old_items, new_items)` → bool

Called before each iteration. Return `False` to stop execution (e.g., budget limits).

### 3. `on_llm_start(messages)` → messages

Preprocess messages before the LLM call. Use for PII anonymization or image retention.

### 4. `on_api_start(kwargs)`

Called before each LLM API call.

### 5. `on_api_end(kwargs, result)`

Called after each LLM API call completes.

### 6. `on_usage(usage)`

Called when usage information is received from the LLM.

### 7. `on_llm_end(messages)` → messages

Postprocess messages after the LLM call. Use for PII deanonymization.

### 8. `on_responses(kwargs, responses)`

Called when responses are received from the agent loop.

### 9. Response-specific hooks

- `on_text(item)` - Text messages
- `on_computer_call_start(item)` - Before computer actions
- `on_computer_call_end(item, result)` - After computer actions
- `on_function_call_start(item)` - Before function calls
- `on_function_call_end(item, result)` - After function calls
- `on_screenshot(screenshot, name)` - When screenshots are taken

### 10. `on_run_end(kwargs, old_items, new_items)`

Called when the agent run completes. Finalize tracking, save trajectories.

## Built-in Callbacks

- **ImageRetentionCallback**: Limits recent images in context
- **BudgetManagerCallback**: Stops execution when the budget is exceeded
- **TrajectorySaverCallback**: Saves conversation trajectories
- **LoggingCallback**: Logs agent activities
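As an illustrative sketch, a custom hook can use `on_run_start` and `on_run_continue` to cap the number of loop iterations. The class below is hypothetical; in practice you would subclass `agent2.callbacks.base.AsyncCallbackHandler` and pass an instance via `callbacks=[...]`:

```python
class StepLimitCallback:
    """Illustrative callback: stop the agent loop after max_steps iterations."""

    def __init__(self, max_steps: int = 10):
        self.max_steps = max_steps
        self.steps = 0

    async def on_run_start(self, kwargs, old_items):
        self.steps = 0  # reset the counter for each new run

    async def on_run_continue(self, kwargs, old_items, new_items) -> bool:
        self.steps += 1
        # Returning False tells the agent to stop before the next iteration
        return self.steps <= self.max_steps
```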
docs/content/docs/home/agent-sdk/callbacks/cost-saving.mdx (new file, 87 lines)
@@ -0,0 +1,87 @@
---
title: Cost Optimization
description: Budget management and image retention for cost optimization
---

# Cost Optimization Callbacks

Optimize agent costs with budget management and image retention callbacks.

## Budget Manager Callback Example

```python
from agent2.callbacks import BudgetManagerCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[
        BudgetManagerCallback(
            max_budget=5.0,  # $5 limit
            reset_after_each_run=False,
            raise_error=True
        )
    ]
)
```

## Budget Manager Shorthand

```python
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    max_trajectory_budget=5.0  # Auto-adds BudgetManagerCallback
)
```

**Or with options:**

```python
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    max_trajectory_budget={"max_budget": 5.0, "raise_error": True}
)
```

## Image Retention Callback Example

```python
from agent2.callbacks import ImageRetentionCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[
        ImageRetentionCallback(only_n_most_recent_images=3)
    ]
)
```

## Image Retention Shorthand

```python
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    only_n_most_recent_images=3  # Auto-adds ImageRetentionCallback
)
```

## Combined Cost Optimization

```python
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    max_trajectory_budget=5.0,     # Budget limit
    only_n_most_recent_images=3,   # Image retention
    trajectory_dir="trajectories"  # Track spending
)
```

## Budget Manager Options

- `max_budget`: Dollar limit for the trajectory
- `reset_after_each_run`: Reset the budget per run (default: `True`)
- `raise_error`: Raise an exception vs. stopping gracefully (default: `False`)
docs/content/docs/home/agent-sdk/callbacks/logging.mdx (new file, 88 lines)
@@ -0,0 +1,88 @@
---
title: Logging
description: Agent logging and custom logger implementation
---

# Logging Callback

Built-in logging callback and custom logger creation for agent monitoring.

## Callback Example

```python
from agent2.callbacks import LoggingCallback
import logging

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[
        LoggingCallback(
            logger=logging.getLogger("cua"),
            level=logging.INFO
        )
    ]
)
```

## Shorthand

```python
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    verbosity=logging.INFO  # Auto-adds LoggingCallback
)
```

## Custom Logger

Create custom loggers by extending `AsyncCallbackHandler`:

```python
from agent2.callbacks.base import AsyncCallbackHandler
import logging

class CustomLogger(AsyncCallbackHandler):
    def __init__(self, logger_name="agent"):
        self.logger = logging.getLogger(logger_name)
        self.logger.setLevel(logging.INFO)

        # Add a console handler
        handler = logging.StreamHandler()
        formatter = logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        )
        handler.setFormatter(formatter)
        self.logger.addHandler(handler)

    async def on_run_start(self, kwargs, old_items):
        self.logger.info(f"Agent run started with model: {kwargs.get('model')}")

    async def on_computer_call_start(self, item):
        action = item.get('action', {})
        self.logger.info(f"Computer action: {action.get('type')}")

    async def on_usage(self, usage):
        cost = usage.get('response_cost', 0)
        self.logger.info(f"API call cost: ${cost:.4f}")

    async def on_run_end(self, kwargs, old_items, new_items):
        self.logger.info("Agent run completed")

# Use the custom logger
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[CustomLogger("my_agent")]
)
```

## Available Hooks

Log any agent event using these callback methods:

- `on_run_start/end` - Run lifecycle
- `on_computer_call_start/end` - Computer actions
- `on_api_start/end` - LLM API calls
- `on_usage` - Cost tracking
- `on_screenshot` - Screenshot events
docs/content/docs/home/agent-sdk/callbacks/meta.json (new file, 11 lines)
@@ -0,0 +1,11 @@
{
  "title": "Callbacks",
  "description": "Extending agents with callback hooks and built-in handlers",
  "pages": [
    "agent-lifecycle",
    "trajectories",
    "logging",
    "cost-saving",
    "pii-anonymization"
  ]
}
@@ -0,0 +1,12 @@
---
title: PII Anonymization
description: PII anonymization and data protection callbacks
---

# PII Anonymization Callback

🚧 Coming Soon 🚧

🔒 🕵️ 🛡️ 📝 ✨

🚀 Stay tuned for PII anonymization features! 🚀
docs/content/docs/home/agent-sdk/callbacks/trajectories.mdx (new file, 51 lines)
@@ -0,0 +1,51 @@
---
title: Trajectories
description: Recording and viewing agent conversation trajectories
---

# Trajectory Saving Callback

The TrajectorySaverCallback records complete agent conversations, including messages, actions, and screenshots, for debugging and analysis.

## Callback Example

```python
from agent2.callbacks import TrajectorySaverCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[
        TrajectorySaverCallback(
            trajectory_dir="my_trajectories",
            save_screenshots=True
        )
    ]
)
```

## Shorthand

```python
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    trajectory_dir="trajectories"  # Auto-adds TrajectorySaverCallback
)
```

## View Trajectories Online

View trajectories in the browser at **[trycua.com/trajectory-viewer](http://trycua.com/trajectory-viewer)**.

The viewer provides:

- Interactive conversation replay
- Screenshot galleries
- No data collection

## Trajectory Structure

Each trajectory contains:

- **metadata.json**: Run info, timestamps, and usage stats (`total_tokens`, `response_cost`)
- **turn_000/**: Turn-by-turn conversation history (API calls, responses, computer calls, screenshots)
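Given the on-disk layout described above, saved trajectories can also be inspected programmatically. This is a minimal sketch that assumes each trajectory folder holds a `metadata.json` with a `usage` object containing `total_tokens` and `response_cost`; check the actual files on disk before relying on these keys:

```python
import json
from pathlib import Path

def summarize_trajectories(trajectory_dir: str) -> list[dict]:
    """Collect usage stats from each saved trajectory's metadata.json."""
    summaries = []
    for meta_path in sorted(Path(trajectory_dir).glob("*/metadata.json")):
        meta = json.loads(meta_path.read_text())
        usage = meta.get("usage", {})  # assumed key layout, see note above
        summaries.append({
            "run": meta_path.parent.name,
            "total_tokens": usage.get("total_tokens"),
            "response_cost": usage.get("response_cost"),
        })
    return summaries
```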
docs/content/docs/home/agent-sdk/chat-history.mdx (new file, 84 lines)
@@ -0,0 +1,84 @@
---
title: Chat History
description: Managing conversation history and message arrays
---

Managing conversation history is essential for multi-turn agent interactions. The agent maintains a messages array that tracks the entire conversation flow.

## Managing History

### Continuous Conversation

```python
history = []

while True:
    user_input = input("> ")
    history.append({"role": "user", "content": user_input})

    async for result in agent.run(history, stream=False):
        history += result["output"]
```

## Message Array Structure

The messages array contains different types of messages that represent the conversation state:

```python
messages = [
    {
        "role": "user",
        "content": "go to trycua on gh"
    },
    {
        "summary": [
            {
                "text": "Searching Firefox for Trycua GitHub",
                "type": "summary_text"
            }
        ],
        "type": "reasoning"
    },
    {
        "action": {
            "text": "Trycua GitHub",
            "type": "type"
        },
        "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
        "status": "completed",
        "type": "computer_call"
    },
    {
        "type": "computer_call_output",
        "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
        "output": {
            "type": "input_image",
            "image_url": "[omitted]"
        }
    }
]
```

## Message Types

- **user**: User input messages
- **computer_call**: Computer actions (click, type, keypress, etc.)
- **computer_call_output**: Results from computer actions (usually screenshots)
- **function_call**: Function calls (e.g., `computer.call`)
- **function_call_output**: Results from function calls
- **reasoning**: The agent's internal reasoning and planning
- **message**: Agent text responses

### Memory Management

For long conversations, consider using the `only_n_most_recent_images` parameter to manage memory:

```python
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    only_n_most_recent_images=3
)
```

This automatically removes old images from the conversation history to prevent context window overflow.
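As a simplified sketch of what image retention does under the hood, the function below drops all but the `n` most recent screenshot outputs from a messages array of the shape shown above. The real `ImageRetentionCallback` may behave differently (for example, it may keep placeholders for paired calls):

```python
def keep_recent_images(messages, n):
    """Drop all but the n most recent screenshot outputs from the history."""
    image_idxs = [
        i for i, m in enumerate(messages)
        if m.get("type") == "computer_call_output"
        and m.get("output", {}).get("type") == "input_image"
    ]
    drop = set(image_idxs[:-n]) if n else set(image_idxs)
    return [m for i, m in enumerate(messages) if i not in drop]
```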
docs/content/docs/home/agent-sdk/meta.json (new file, 12 lines)
@@ -0,0 +1,12 @@
{
  "title": "Agent SDK",
  "description": "Build computer-using agents with the Agent SDK",
  "pages": [
    "agent-loops",
    "supported-agents",
    "chat-history",
    "callbacks",
    "sandboxed-tools",
    "migration-guide"
  ]
}
docs/content/docs/home/agent-sdk/migration-guide.mdx (new file, 124 lines)
@@ -0,0 +1,124 @@
---
title: Migration Guide
---

This guide lists **breaking changes** when migrating from the original `ComputerAgent` (v0.3.x) to the rewritten `ComputerAgent` (v0.4.x) and shows old vs. new usage for all four agent loops.

## Breaking Changes

- **Initialization:**
  - `ComputerAgent` (v0.4.x) takes `model` as a string (e.g., `"anthropic/claude-3-5-sonnet-20241022"`) instead of `LLM` and `AgentLoop` objects.
  - `tools` is a list (it can include multiple computers and decorated functions).
  - `callbacks` are now first-class for extensibility (image retention, budget, trajectory, logging, etc.).
- **No explicit `loop` parameter:**
  - The loop is inferred from the `model` string (e.g., `anthropic/`, `openai/`, `omniparser+`, `ui-tars`).
- **No explicit `computer` parameter:**
  - Computers are added to the `tools` list.

---

## Usage Examples: Old vs. New

### 1. Anthropic Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        computer=computer,
        loop=AgentLoop.ANTHROPIC,
        model=LLM(provider=LLMProvider.ANTHROPIC)
    )
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        model="anthropic/claude-3-5-sonnet-20241022",
        tools=[computer]
    )
    messages = [{"role": "user", "content": "Take a screenshot"}]
    async for result in agent.run(messages):
        for item in result["output"]:
            if item["type"] == "message":
                print(item["content"][0]["text"])
```

### 2. OpenAI Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        computer=computer,
        loop=AgentLoop.OPENAI,
        model=LLM(provider=LLMProvider.OPENAI)
    )
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        model="openai/computer-use-preview",
        tools=[computer]
    )
    messages = [{"role": "user", "content": "Take a screenshot"}]
    async for result in agent.run(messages):
        for item in result["output"]:
            if item["type"] == "message":
                print(item["content"][0]["text"])
```

### 3. UI-TARS Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        computer=computer,
        loop=AgentLoop.UITARS,
        model=LLM(provider=LLMProvider.OAICOMPAT, name="ByteDance-Seed/UI-TARS-1.5-7B", provider_base_url="https://.../v1")
    )
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B",
        tools=[computer]
    )
    messages = [{"role": "user", "content": "Take a screenshot"}]
    async for result in agent.run(messages):
        for item in result["output"]:
            if item["type"] == "message":
                print(item["content"][0]["text"])
```

### 4. Omni Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        computer=computer,
        loop=AgentLoop.OMNI,
        model=LLM(provider=LLMProvider.OLLAMA, name="gemma3")
    )
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
        model="omniparser+ollama_chat/gemma3",
        tools=[computer]
    )
    messages = [{"role": "user", "content": "Take a screenshot"}]
    async for result in agent.run(messages):
        for item in result["output"]:
            if item["type"] == "message":
                print(item["content"][0]["text"])
```
docs/content/docs/home/agent-sdk/sandboxed-tools.mdx (new file, 31 lines)
@@ -0,0 +1,31 @@
---
title: Sandboxed Tools
slug: sandboxed-tools
---

The Agent SDK supports defining custom Python tools that run securely in sandboxed environments on remote C/ua Computers. This enables safe execution of user-defined functions, isolation of dependencies, and robust automation workflows.

## Example: Defining a Sandboxed Tool

```python
from computer.helpers import sandboxed

@sandboxed()
def read_file(location: str) -> str:
    """Read the contents of a file"""
    with open(location, 'r') as f:
        return f.read()
```

You can then register this as a tool for your agent:

```python
from agent2 import ComputerAgent
from computer import Computer

computer = Computer(...)
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20240620",
    tools=[computer, read_file],
)
```
docs/content/docs/home/agent-sdk/supported-agents.mdx (new file, 32 lines)
@@ -0,0 +1,32 @@
---
title: Supported Agents
---

This page lists all supported agent loops and their compatible models/configurations in c/ua.

All agent loops are compatible with any LLM provider supported by LiteLLM.

## Anthropic CUAs

- Claude 4: `claude-opus-4-20250514`, `claude-sonnet-4-20250514`
- Claude 3.7: `claude-3-7-sonnet-20250219`
- Claude 3.5: `claude-3-5-sonnet-20240620`

## OpenAI CUA Preview

- Computer-use-preview: `computer-use-preview`

## UI-TARS 1.5

- `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
- `huggingface/ByteDance-Seed/UI-TARS-1.5-7B` (requires a TGI endpoint)

## Omniparser + LLMs

- `omniparser+vertex_ai/gemini-pro`
- `omniparser+openai/gpt-4o`
- Any LiteLLM-compatible model combined with Omniparser

---

For details on agent loop behavior and usage, see [Agent Loops](./agent-loops).
@@ -1,77 +0,0 @@
---
title: Compatibility
description: Compatibility information for running cua services.
icon: MonitorCheck
---

# Host OS Compatibility

_This section shows compatibility based on your **host operating system** (the OS you're running Cua on)._

## macOS Host

| Installation Method      | Requirements              | Lume    | Cloud   | Notes                       |
| ------------------------ | ------------------------- | ------- | ------- | --------------------------- |
| **playground-docker.sh** | Docker Desktop            | ✅ Full | ✅ Full | Recommended for quick setup |
| **Dev Container**        | VS Code/WindSurf + Docker | ✅ Full | ✅ Full | Best for development        |
| **PyPI packages**        | Python 3.12+              | ✅ Full | ✅ Full | Most flexible               |

### macOS Host Requirements

- macOS 15+ (Sequoia) for local VM support
- Apple Silicon (M1/M2/M3/M4) recommended for best performance
- Docker Desktop for containerized installations

## Ubuntu/Linux Host

| Installation Method      | Requirements              | Lume    | Cloud   | Notes                       |
| ------------------------ | ------------------------- | ------- | ------- | --------------------------- |
| **playground-docker.sh** | Docker Engine             | ✅ Full | ✅ Full | Recommended for quick setup |
| **Dev Container**        | VS Code/WindSurf + Docker | ✅ Full | ✅ Full | Best for development        |
| **PyPI packages**        | Python 3.12+              | ✅ Full | ✅ Full | Most flexible               |

### Ubuntu/Linux Host Requirements

- Ubuntu 20.04+ or an equivalent Linux distribution
- Docker Engine or Docker Desktop
- Python 3.12+ for PyPI installation

## Windows Host

| Installation Method      | Requirements                     | Lume             | Winsandbox       | Cloud   | Notes         |
| ------------------------ | -------------------------------- | ---------------- | ---------------- | ------- | ------------- |
| **playground-docker.sh** | Docker Desktop + WSL2            | ❌ Not supported | ❌ Not supported | ✅ Full | Requires WSL2 |
| **Dev Container**        | VS Code/WindSurf + Docker + WSL2 | ❌ Not supported | ❌ Not supported | ✅ Full | Requires WSL2 |
| **PyPI packages**        | Python 3.12+                     | ❌ Not supported | ✅ Full          | ✅ Full |               |

### Windows Host Requirements

- Windows 10/11 with WSL2 enabled for shell script execution
- Docker Desktop with WSL2 backend
- Windows Sandbox feature enabled (for Winsandbox support)
- Python 3.12+ installed in WSL2 or Windows
- **Note**: The Lume CLI is not available on Windows - use the Cloud or Winsandbox providers

---

# VM Emulation Support

_This section shows which **virtual machine operating systems** each provider can emulate._

| Provider       | macOS VM         | Ubuntu/Linux VM    | Windows VM         | Notes                                                  |
| -------------- | ---------------- | ------------------ | ------------------ | ------------------------------------------------------ |
| **Lume**       | ✅ Full support  | ⚠️ Limited support | ⚠️ Limited support | macOS: native; Ubuntu/Linux/Windows: need custom image |
| **Cloud**      | 🚧 Coming soon   | ✅ Full support    | 🚧 Coming soon     | Currently Ubuntu only, macOS/Windows in development    |
| **Winsandbox** | ❌ Not supported | ❌ Not supported   | ✅ Windows only    | Windows 10/11 environments only                        |

# Model Provider Compatibility

_This section shows which **AI model providers** are supported on each host operating system._

| Provider              | macOS Host      | Ubuntu/Linux Host | Windows Host     | Notes                                           |
| --------------------- | --------------- | ----------------- | ---------------- | ----------------------------------------------- |
| **Anthropic**         | ✅ Full support | ✅ Full support   | ✅ Full support  | Cloud-based API                                 |
| **OpenAI**            | ✅ Full support | ✅ Full support   | ✅ Full support  | Cloud-based API                                 |
| **Ollama**            | ✅ Full support | ✅ Full support   | ✅ Full support  | Local model serving                             |
| **OpenAI Compatible** | ✅ Full support | ✅ Full support   | ✅ Full support  | Any OpenAI-compatible API endpoint              |
| **MLX VLM**           | ✅ macOS only   | ❌ Not supported  | ❌ Not supported | Apple Silicon required. PyPI installation only. |
@@ -1,100 +1,115 @@
|
||||
---
|
||||
title: Computer
|
||||
description: Reference for the current version of the Computer library.
|
||||
github:
|
||||
- https://github.com/trycua/cua/tree/main/libs/python/computer
|
||||
- https://github.com/trycua/cua/tree/main/libs/typescript/computer
|
||||
title: Commands
|
||||
description: Computer commands and interface methods
|
||||
---
|
||||
|
||||
## ⚠️ 🚧 Under Construction 🚧 ⚠️
|
||||
This page describes the set of supported **commands** you can use to control a C/ua Computer directly via the Python SDK.
|
||||
|
||||
The Computer API reference documentation is currently under development.
|
||||
These commands map to the same actions available in the [Computer Server API Commands Reference](../libraries/computer-server/Commands), and provide low-level, async access to system operations from your agent or automation code.
|
||||
|
||||
## Overview
|
||||
## Shell Actions
|
||||
|
||||
The Computer library provides programmatic interfaces for computer automation and control.
|
||||
|
||||
## Reference
|
||||
Execute shell commands and get detailed results:
|
||||
|
||||
```python
|
||||
# Shell Actions
|
||||
result = await computer.interface.run_command(cmd) # Run shell command
|
||||
# Run shell command
|
||||
result = await computer.interface.run_command(cmd)
|
||||
# result.stdout, result.stderr, result.returncode
|
||||
```

## Mouse Actions

Precise mouse control and interaction:

```python
# Basic clicks
await computer.interface.left_click(x, y)    # Left click at coordinates
await computer.interface.right_click(x, y)   # Right click at coordinates
await computer.interface.double_click(x, y)  # Double click at coordinates

# Cursor movement and dragging
await computer.interface.move_cursor(x, y)        # Move cursor to coordinates
await computer.interface.drag_to(x, y, duration)  # Drag to coordinates
await computer.interface.get_cursor_position()    # Get current cursor position

# Advanced mouse control
await computer.interface.mouse_down(x, y, button="left")  # Press and hold a mouse button
await computer.interface.mouse_up(x, y, button="left")    # Release a mouse button
```

## Keyboard Actions

Text input and key combinations:

```python
# Text input
await computer.interface.type_text("Hello")  # Type text
await computer.interface.press_key("enter")  # Press a single key

# Key combinations and advanced control
await computer.interface.hotkey("command", "c")  # Press key combination
await computer.interface.key_down("command")     # Press and hold a key
await computer.interface.key_up("command")       # Release a key
```

## Scrolling Actions

Mouse wheel and scrolling control:

```python
await computer.interface.scroll(x, y)         # Scroll the mouse wheel
await computer.interface.scroll_down(clicks)  # Scroll down
await computer.interface.scroll_up(clicks)    # Scroll up
```

## Screen Actions

Screen capture and display information:

```python
await computer.interface.screenshot()       # Take a screenshot
await computer.interface.get_screen_size()  # Get screen dimensions
```

## Clipboard Actions

System clipboard management:

```python
await computer.interface.set_clipboard(text)   # Set clipboard content
await computer.interface.copy_to_clipboard()   # Get clipboard content
```

## File System Operations

Direct file and directory manipulation:

```python
# File existence checks
await computer.interface.file_exists(path)       # Check if file exists
await computer.interface.directory_exists(path)  # Check if directory exists

# File content operations
await computer.interface.read_text(path, encoding="utf-8")            # Read file content
await computer.interface.write_text(path, content, encoding="utf-8")  # Write file content
await computer.interface.read_bytes(path)           # Read file content as bytes
await computer.interface.write_bytes(path, content) # Write file content as bytes

# File and directory management
await computer.interface.delete_file(path)  # Delete file
await computer.interface.create_dir(path)   # Create directory
await computer.interface.delete_dir(path)   # Delete directory
await computer.interface.list_dir(path)     # List directory contents
```

## Accessibility

Access system accessibility information:

```python
await computer.interface.get_accessibility_tree()  # Get the accessibility tree
```

## Delay Configuration

Control the pacing of interface actions:

```python
# Set a default delay between all actions (in seconds)
computer.interface.delay = 0.5  # 500ms delay between actions

# Or specify a delay for individual actions
await computer.interface.left_click(x, y, delay=1.0)    # 1 second delay after click
await computer.interface.type_text("Hello", delay=0.2)  # 200ms delay after typing
await computer.interface.press_key("enter", delay=0.5)  # 500ms delay after key press
```

## Python Virtual Environment Operations

Manage and execute code in virtual environments on the remote computer:

```python
# Install packages in a virtual environment
await computer.venv_install("demo_venv", ["requests", "macos-pyxa"])

# Run a shell command in a virtual environment
await computer.venv_cmd("demo_venv", "python -c \"import requests; print(requests.get('https://httpbin.org/ip').json())\"")

# Run a Python function in a virtual environment and return the result (or raise an exception)
await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs)
```

Sandboxed functions let you execute code in a Cua Container:

```python
from computer.helpers import sandboxed

@sandboxed("demo_venv")
def greet_and_print(name):
    """Greet the caller and return the HTML of the current Safari tab"""
    import PyXA

    safari = PyXA.Application("Safari")
    html = safari.current_document.source()
    print(f"Hello from inside the container, {name}!")
    return {"greeted": name, "safari_html": html}

# When a @sandboxed function is called, it executes in the container
result = await greet_and_print("Cua")
# Result: {"greeted": "Cua", "safari_html": "<html>...</html>"}
# stdout and stderr are also captured and printed / raised
print("Result from sandboxed function:", result)
```

**New file:** `docs/content/docs/home/computer-sdk/computers.mdx`

---
title: C/ua Computers
description: Understanding c/ua computer types and connection methods
---

Before we can automate apps using AI, we first need to connect to a Computer Server, which gives the AI a safe environment in which to execute workflows.

C/ua Computers are preconfigured virtual machines running the Computer Server. They can run macOS, Linux, or Windows, in either a cloud-native container or on your host desktop.

# c/ua cloud containers

A cloud container runs the Computer Server for you. This is the easiest and safest way to get a c/ua computer; you can create one on the trycua.com website.

```python
from computer import Computer

computer = Computer(
    os_type="linux",
    provider_type="cloud",
    name="your-container-name",
    api_key="your-api-key"
)

await computer.run()  # Connect to the container
```

# c/ua local containers

You can also run c/ua containers locally, using either the Lume CLI (macOS) or the Docker CLI (Linux, Windows).

### Lume (macOS only)

1. Install the Lume CLI:

   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
   ```

2. Start a local c/ua container:

   ```bash
   lume run macos-sequoia-cua:latest
   ```

3. Connect with Computer:

   ```python
   computer = Computer(
       os_type="macos",
       provider_type="lume",
       name="macos-sequoia-cua:latest"
   )

   await computer.run()  # Connect to the container
   ```

# Your host desktop

You can also have agents control your desktop directly by running the Computer Server without any containerization layer. Beware that AI models may perform risky actions.

```bash
pip install cua-computer-server
python -m computer-server
```

Connect with:

```python
computer = Computer(use_host_computer_server=True)
await computer.run()  # Connect to the host desktop
```

**New file:** `docs/content/docs/home/computer-sdk/meta.json`

{
  "title": "Computer SDK",
  "description": "Build computer-using agents with the Computer SDK",
  "pages": [
    "computers",
    "commands",
    "sandboxed-python"
  ]
}

**New file:** `docs/content/docs/home/computer-sdk/sandboxed-python.mdx`

---
title: Sandboxed Python
slug: sandboxed-python
---

You can run Python functions securely inside a sandboxed virtual environment on a remote C/ua Computer. This is useful for executing untrusted user code, isolating dependencies, or providing a safe environment for automation tasks.

## How It Works

The `sandboxed` decorator from the Computer SDK wraps a Python function so that it executes remotely in a specified virtual environment on the target Computer. The function and its arguments are serialized, sent to the remote machine, and executed in isolation. Results or errors are returned to the caller.

## Example Usage

```python
from computer import Computer
from computer.helpers import sandboxed

@sandboxed()
def read_file(location: str) -> str:
    """Read contents of a file"""
    with open(location, 'r') as f:
        return f.read()

async def main():
    async with Computer(os_type="linux", provider_type="cloud", name="my-container", api_key="...") as computer:
        # Call the sandboxed function (runs remotely)
        result = await read_file("/etc/hostname")
        print(result)
```

## Configuration and Packages

You can specify the virtual environment name, the target computer, and the number of retries:

```python
@sandboxed(venv_name="myenv", computer=my_computer, max_retries=5)
def my_function(...):
    ...
```

You can also install packages into the virtual environment using the `venv_install` method:

```python
await my_computer.venv_install("myenv", ["requests"])
```

## Error Handling

If the remote execution fails, the decorator retries up to `max_retries` times. If all attempts fail, the last exception is raised locally.
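Conceptually, that retry behavior looks like the simplified sketch below. This is an illustration only, not the SDK's actual implementation:

```python
import asyncio

def retry_async(max_retries: int = 3):
    """Re-invoke an async function until it succeeds or retries are exhausted."""
    def decorator(fn):
        async def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(max_retries):
                try:
                    return await fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc  # remember the failure and try again
            raise last_exc  # all attempts failed: surface the last error
        return wrapper
    return decorator
```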
---
title: FAQ
description: Find answers to the most common issues or questions when using Cua tools.
icon: CircleQuestionMark
---

### Why a local sandbox?

A local sandbox is a dedicated environment that is isolated from the rest of the system. As AI agents rapidly evolve towards 70-80% success rates on average tasks, having a controlled and secure environment becomes crucial. Cua's Computer-Use AI agents run in a local sandbox to ensure reliability, safety, and controlled execution.

Benefits of using a local sandbox rather than running the Computer-Use AI agent on the host system:

- **Reliability**: The sandbox provides a reproducible environment - critical for benchmarking and debugging agent behavior. Frameworks like [OSWorld](https://github.com/xlang-ai/OSWorld), [Simular AI](https://github.com/simular-ai/Agent-S), Microsoft's [OmniTool](https://github.com/microsoft/OmniParser/tree/master/omnitool), [WindowsAgentArena](https://github.com/microsoft/WindowsAgentArena) and more use Computer-Use AI agents running in local sandboxes.
- **Safety & Isolation**: The sandbox is isolated from the rest of the system, protecting sensitive data and system resources. As CUA agent capabilities grow, this isolation becomes increasingly important for preventing potential safety breaches.
- **Control**: The sandbox can be easily monitored and terminated if needed, providing oversight for autonomous agent operation.

### Where are the sandbox images stored?

Sandboxes are stored in `~/.lume`, and cached images are stored in `~/.lume/cache`.

### Which image is Computer using?

Computer uses an optimized macOS image for Computer-Use interactions, with pre-installed apps and settings for optimal performance.
The image is available on our [ghcr registry](https://github.com/orgs/trycua/packages/container/package/macos-sequoia-cua).

### Are sandbox disks taking up all the disk space?

No, macOS uses sparse files, which only allocate space as needed. For example, VM disks totaling 50 GB may only use 20 GB on disk.

### How do I delete a VM?

```bash
lume delete <name>
```

### How do I fix EasyOCR `[SSL: CERTIFICATE_VERIFY_FAILED]` errors?

**Symptom:**
When running an agent that uses OCR (e.g., with `AgentLoop.OMNI`), you might encounter an error during the first run or initialization phase that includes:

```
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)
```

**Cause:**
This usually happens when EasyOCR attempts to download its language models over HTTPS for the first time. Python's SSL module cannot verify the server's certificate because it can't locate the necessary root Certificate Authority (CA) certificates in your environment's trust store.

**Solution:**
You need to explicitly tell Python where to find a trusted CA bundle. The `certifi` package provides one. Before running your Python agent script **the first time it needs to download models**, set the following environment variables in the _same terminal session_:

```bash
# Ensure certifi is installed: pip show certifi
export SSL_CERT_FILE=$(python -m certifi)
export REQUESTS_CA_BUNDLE=$(python -m certifi)

# Now run your Python script that uses the agent...
# python your_agent_script.py
```

This directs Python to use the CA bundle provided by `certifi` for SSL verification. **Note:** Once EasyOCR has successfully downloaded its models, you typically do not need to set these environment variables before every subsequent run.

### How do I troubleshoot the agent failing to get the VM IP address or getting stuck on "VM status changed to: stopped"?

**Symptom:**
When running your agent script (e.g., using `Computer().run(...)`), the script might hang during the VM startup phase, logging messages like:

- `Waiting for VM to be ready...`
- `VM status changed to: stopped (after 0.0s)`
- `Still waiting for VM IP address... (elapsed: XX.Xs)`
- Eventually, it might time out, or you might notice the VM window never appears or closes quickly.

**Cause:**
This is typically due to known instability issues with the `lume serve` background daemon process, as documented in the main `README.md`:

1. **`lume serve` crash:** The `lume serve` process might terminate unexpectedly shortly after launch or when the script tries to interact with it. If it's not running, the script cannot get VM status updates or the IP address.
2. **Incorrect status reporting:** Even if `lume serve` is running, its API sometimes incorrectly reports the VM status as `stopped` immediately after startup is initiated. While the underlying `Computer` library tries to poll and wait for the correct `running` status, this initial incorrect report can cause delays or failures if the status doesn't update correctly within the timeout or if `lume serve` crashes during the polling.

**Troubleshooting Steps:**

1. **Check `lume serve`:** Is the `lume serve` process still running in its terminal? Did it print any errors or exit? If it's not running, stop your agent script (`Ctrl+C`) and proceed to step 2.
2. **Force cleanup:** Before _every_ run, perform a rigorous cleanup to ensure no old `lume` processes or VM states interfere. Open a **new terminal** and run:

   ```bash
   # Stop any running Lume VM gracefully first (replace <vm_name> if needed)
   lume stop macos-sequoia-cua_latest

   # Force kill lume serve and related processes
   pkill -f "lume serve"
   pkill -9 -f "lume"
   pkill -9 -f "VzVirtualMachine" # Kills underlying VM process

   # Optional: Verify they are gone
   # ps aux | grep -E 'lume|VzVirtualMachine' | grep -v grep
   ```

3. **Restart sequence:**
   - **Terminal 1:** Start `lume serve` cleanly:
     ```bash
     lume serve
     ```
     _(Watch this terminal to ensure it stays running.)_
   - **Terminal 2:** Run your agent script (including the `export SSL_CERT_FILE...` commands if it is the _first time_ using OCR):
     ```bash
     # export SSL_CERT_FILE=$(python -m certifi) # Only if first run with OCR
     # export REQUESTS_CA_BUNDLE=$(python -m certifi) # Only if first run with OCR
     python your_agent_script.py
     ```
4. **Retry:** Due to the intermittent nature of the Lume issues, sometimes simply repeating steps 2 and 3 allows the run to succeed if the timing avoids the status reporting bug or the `lume serve` crash.

**Related issue: "No route to host" error (macOS Sequoia+)**

- **Symptom:** Even if the `Computer` library logs show the VM has obtained an IP address, you might encounter connection errors like `No route to host` when the agent tries to connect to the internal server, especially when running the agent script from within an IDE (like VS Code or Cursor).
- **Cause:** This is often due to macOS Sequoia's enhanced local network privacy controls. Applications need explicit permission to access the local network, which includes communicating with the VM.
- **Solution:** Grant "Local Network" access to the application you are running the script from (e.g., your IDE or terminal application). Go to **System Settings > Privacy & Security > Local Network**, find your application in the list, and toggle the switch ON. You might need to trigger a connection attempt from the application first for it to appear in the list. See [GitHub Issue #61](https://github.com/trycua/cua/issues/61) for more details and discussion.

**Note:** Improving the stability of `lume serve` is an ongoing development area.

### How do I troubleshoot Computer not connecting to the lume daemon?

If you're experiencing connection issues between Computer and the lume daemon, it could be because port 7777 (used by lume) is already in use by an orphaned process. You can diagnose this issue with:

```bash
sudo lsof -i :7777
```

This command will show all processes using port 7777. If you see a lume process already running, you can terminate it with:

```bash
kill <PID>
```

Where `<PID>` is the process ID shown in the output of the `lsof` command. After terminating the process, run `lume serve` again to start the lume daemon.

### What information does Cua track?

Cua tracks anonymized usage and error report statistics; we subscribe to Posthog's approach as detailed [here](https://posthog.com/blog/open-source-telemetry-ethical). If you would like to opt out of sending anonymized info, you can set `telemetry_enabled` to false in the Computer or Agent constructor. Check out our [telemetry](./telemetry) documentation for more details.
---
title: Computer-Use Agent Quickstart
description: Launch a computer-use agent UI interface with Docker, Dev Container, or Python.
---

## Docker

_Best for a simple, fully managed installation for testing and experimentation._

**macOS/Linux/Windows (via WSL):**

Run the following command to set up the Docker containers and launch the Computer-Use Agent UI:

```bash
# Requires Docker
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground-docker.sh)"
```

## Dev Container

_Best for contributors and active development._

Visit the [Dev Container](./dev-container-setup) guide to use the configuration that simplifies development setup to a few steps.

## PyPI

_Direct Python package installation._

```bash
# conda create -yn cua python==3.12

pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui  # Start the agent UI
```

Or check out the [Usage Guide](./cua-usage-guide) to learn how to use our Python SDK in your own code.

---

# Supported [Agent Loops](../libraries/agent#agent-loops)

- [UITARS-1.5](https://github.com/bytedance/UI-TARS) - Run locally on Apple Silicon with MLX, or use cloud providers
- [OpenAI CUA](https://openai.com/index/computer-using-agent/) - Use OpenAI's Computer-Use Preview model
- [Anthropic CUA](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/computer-use-tool) - Use Anthropic's Computer-Use capabilities
- [OmniParser-v2.0](https://github.com/microsoft/OmniParser) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model

---

# Compatibility

For detailed compatibility information including host OS support, VM emulation capabilities, and model provider compatibility, see the [Compatibility Guide](../compatibility).
---
title: Cua Usage Guide
description: Follow these steps to use Cua in your own Python code.
---

import { Step, Steps } from 'fumadocs-ui/components/steps';

<Steps>
<Step>

### Install the Lume CLI

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```

The Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon.

</Step>

<Step>

### Pull the macOS CUA Image

```bash
lume pull macos-sequoia-cua:latest
```

The macOS CUA image contains the default macOS apps and the Computer Server for easy automation.

</Step>
<Step>

### Install the Python SDK

```bash
pip install "cua-computer[all]" "cua-agent[all]"
```

</Step>
<Step>

### Integrate with Your Own Projects

```python
import asyncio

from computer import Computer
from agent import ComputerAgent, LLM

async def main():
    # Start a local macOS VM
    computer = Computer(os_type="macos")
    await computer.run()

    # Or connect to a Cua Cloud Container instead:
    # computer = Computer(
    #     os_type="linux",
    #     api_key="your_cua_api_key_here",
    #     name="your_container_name_here"
    # )

    # Example: Direct control of a macOS VM with Computer
    computer.interface.delay = 0.1  # Wait 0.1 seconds between kb/m actions
    await computer.interface.left_click(100, 200)
    await computer.interface.type_text("Hello, world!")
    screenshot_bytes = await computer.interface.screenshot()

    # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
    agent = ComputerAgent(
        computer=computer,
        loop="uitars",
        model=LLM(provider="mlxvlm", name="mlx-community/UI-TARS-1.5-7B-6bit")
    )
    async for result in agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide"):
        print(result)

if __name__ == "__main__":
    asyncio.run(main())
```

For ready-to-use examples, check out our [Notebooks](https://github.com/trycua/cua/tree/main/notebooks) collection.

</Step>
</Steps>
---
title: Dev Container Setup
description: Learn how to set up the Dev Container configuration that simplifies the development setup.
---

## Quick Start



1. **Install the Dev Containers extension ([VSCode](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or [WindSurf](https://docs.windsurf.com/windsurf/advanced#dev-containers-beta))**
2. **Open the repository in the Dev Container:**

   - Press `Ctrl+Shift+P` (or `⌘+Shift+P` on macOS)
   - **If you have _not_ cloned the repo:** select `Dev Containers: Clone Repository in Container Volume...` and paste the repository URL:

     ```
     https://github.com/trycua/cua.git
     ```

   - **If you have already cloned the repo:** select `Dev Containers: Open Folder in Container...` and choose your local folder.

   <Callout title="Windsurf Caveats">
     The post-install hook might not run automatically if you're using
     Windsurf. If it didn't run, execute it manually:
     <pre>
       <code>/bin/bash .devcontainer/post-install.sh</code>
     </pre>
   </Callout>

3. **Open the VS Code workspace:** Once post-install.sh has finished running, open the Python workspace located at `.vscode/py.code-workspace`.
4. **Run the Agent UI example:** Click <img src="https://github.com/user-attachments/assets/7a61ef34-4b22-4dab-9864-f86bf83e290b" className='inline-block mt-1 mb-1 rounded-md mx-1'/> to start the Gradio UI. If prompted to install **debugpy (Python Debugger)** for remote debugging, select 'Yes' to proceed.
5. **Access the Gradio UI:** The Gradio UI will now be accessible at http://localhost:7860.

## What's Included

The dev container automatically:

- ✅ Sets up a Python 3.11 environment
- ✅ Installs all system dependencies (build tools, OpenGL, etc.)
- ✅ Configures Python paths for all packages
- ✅ Installs Python extensions (Black, Ruff, Pylance)
- ✅ Forwards port 7860 for the Gradio web UI
- ✅ Mounts your source code for live editing
- ✅ Creates the required `.env.local` file

## Running Examples

After the container is built, you can run examples directly:

```bash
# Run the agent UI (Gradio web interface)
python examples/agent_ui_examples.py

# Run computer examples
python examples/computer_examples.py

# Run computer UI examples
python examples/computer_ui_examples.py
```

The Gradio UI will be available at `http://localhost:7860` and will automatically forward to your host machine.

## Environment Variables

You'll need to add your API keys to `.env.local`:

```bash
# Required for Anthropic provider
ANTHROPIC_API_KEY=your_anthropic_key_here

# Required for OpenAI provider
OPENAI_API_KEY=your_openai_key_here
```

## Notes

- The container connects to `host.docker.internal:7777` for Lume server communication
- All Python packages are pre-installed and configured
- Source code changes are reflected immediately (no rebuild needed)
- The container uses the same Dockerfile as the regular Docker development environment
---
title: Developer Guide
description: Set up development for the Cua open source repository.
---

import { GithubInfo } from 'fumadocs-ui/components/github-info';

## Project Structure

<GithubInfo owner="trycua" repo="cua" token={process.env.GITHUB_TOKEN} />

The project is organized as a monorepo with these main packages:

### Python

- `libs/python/core/` - Base package with telemetry support
- `libs/python/computer/` - Computer-use interface (CUI) library
- `libs/python/agent/` - AI agent library with multi-provider support
- `libs/python/som/` - Set-of-Mark parser
- `libs/python/computer-server/` - Server component for VMs
- `libs/python/pylume/` - Python bindings for Lume

### TypeScript

- `libs/typescript/computer/` - Computer-use interface (CUI) library
- `libs/typescript/agent/` - AI agent library with multi-provider support

### Other

- `libs/lume/` - Lume CLI

Each package has its own virtual environment and dependencies, managed through PDM.

## Local Development Setup

1. Install the Lume CLI:

   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
   ```

2. Clone the repository:

   ```bash
   git clone https://github.com/trycua/cua.git
   cd cua
   ```

3. Create a `.env.local` file in the root directory with your API keys:

   ```bash
   # Required for Anthropic provider
   ANTHROPIC_API_KEY=your_anthropic_key_here

   # Required for OpenAI provider
   OPENAI_API_KEY=your_openai_key_here
   ```

4. Open the workspace in VSCode or Cursor:

   ```bash
   # For Cua Python development
   code .vscode/py.code-workspace

   # For Lume (Swift) development
   code .vscode/lume.code-workspace
   ```

Using the workspace file is strongly recommended as it:

- Sets up correct Python environments for each package
- Configures proper import paths
- Enables debugging configurations
- Maintains consistent settings across packages

## Lume Development

Refer to the [Lume README](../libs/lume/docs/Development.md) for instructions on how to develop the Lume CLI.

## Python Development

There are two ways to set up the Python development environment:

### Run the build script

Run the build script to set up all packages:

```bash
./scripts/build.sh
```

The build script creates a shared virtual environment for all packages. The workspace configuration automatically handles import paths with the correct Python path settings.

This will:

- Create a virtual environment for the project
- Install all packages in development mode
- Set up the correct Python path
- Install development tools

### Install with PDM

If PDM is not already installed, you can follow the installation instructions [here](https://pdm-project.org/en/latest/#installation).

To install with PDM, simply run:

```console
pdm install -G:all
```

This installs all the dependencies for development, testing, and building the docs. If you'd only like development dependencies, you can run:

```console
pdm install -d
```
|
||||
|
||||
## Running Examples
|
||||
|
||||
The Python workspace includes launch configurations for all packages:
|
||||
|
||||
- "Run Computer Examples" - Runs computer examples
|
||||
- "Run Computer API Server" - Runs the computer-server
|
||||
- "Run Agent Examples" - Runs agent examples
|
||||
- "SOM" configurations - Various settings for running SOM
|
||||
|
||||
To run examples from VSCode / Cursor:
|
||||
|
||||
1. Press F5 or use the Run/Debug view
|
||||
2. Select the desired configuration
|
||||
|
||||
The workspace also includes compound launch configurations:
|
||||
|
||||
- "Run Computer Examples + Server" - Runs both the Computer Examples and Server simultaneously
|
||||
|
||||
## Docker Development Environment
|
||||
|
||||
As an alternative to installing directly on your host machine, you can use Docker for development. This approach has several advantages:
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Docker installed on your machine
|
||||
- Lume server running on your host (port 7777): `lume serve`
|
||||
|
||||
### Setup and Usage
|
||||
|
||||
1. Build the development Docker image:
|
||||
|
||||
```bash
|
||||
./scripts/run-docker-dev.sh build
|
||||
```
|
||||
|
||||
2. Run an example in the container:
|
||||
|
||||
```bash
|
||||
./scripts/run-docker-dev.sh run computer_examples.py
|
||||
```
|
||||
|
||||
3. Get an interactive shell in the container:
|
||||
|
||||
```bash
|
||||
./scripts/run-docker-dev.sh run --interactive
|
||||
```
|
||||
|
||||
4. Stop any running containers:
|
||||
|
||||
```bash
|
||||
./scripts/run-docker-dev.sh stop
|
||||
```
|
||||
|
||||
### How it Works
|
||||
|
||||
The Docker development environment:
|
||||
|
||||
- Installs all required Python dependencies in the container
|
||||
- Mounts your source code from the host at runtime
|
||||
- Automatically configures the connection to use `host.docker.internal:7777` for accessing the Lume server on your host machine
|
||||
- Preserves your code changes without requiring rebuilds (source code is mounted as a volume)
|
||||
|
||||
> **Note**: The Docker container doesn't include the macOS-specific Lume executable. Instead, it connects to the Lume server running on your host machine via `host.docker.internal:7777`. Make sure to start the Lume server on your host before running examples in the container.
|
||||
|
||||
## Cleanup and Reset
|
||||
|
||||
If you need to clean up the environment (non-docker) and start fresh:
|
||||
|
||||
```bash
|
||||
./scripts/cleanup.sh
|
||||
```
|
||||
|
||||
This will:
|
||||
|
||||
- Remove all virtual environments
|
||||
- Clean Python cache files and directories
|
||||
- Remove build artifacts
|
||||
- Clean PDM-related files
|
||||
- Reset environment configurations
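The cache-cleanup step boils down to standard `find` invocations. A minimal sketch of the mechanism, run against a temporary directory for illustration (the actual `cleanup.sh` may differ):

```shell
# Build a throwaway tree containing a bytecode cache, then purge it
# the way a cleanup script would.
workdir="$(mktemp -d)"
mkdir -p "$workdir/pkg/__pycache__"
touch "$workdir/pkg/__pycache__/mod.cpython-311.pyc"

# Remove every __pycache__ directory under the tree.
find "$workdir" -type d -name "__pycache__" -prune -exec rm -rf {} +
```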
|
||||
|
||||
## Code Formatting Standards
|
||||
|
||||
The cua project follows strict code formatting standards to ensure consistency across all packages.
|
||||
|
||||
### Python Code Formatting
|
||||
|
||||
#### Tools
|
||||
|
||||
The project uses the following tools for code formatting and linting:
|
||||
|
||||
- **[Black](https://black.readthedocs.io/)**: Code formatter
|
||||
- **[Ruff](https://beta.ruff.rs/docs/)**: Fast linter and formatter
|
||||
- **[MyPy](https://mypy.readthedocs.io/)**: Static type checker
|
||||
|
||||
These tools are automatically installed when you set up the development environment using the `./scripts/build.sh` script.
|
||||
|
||||
#### Configuration
|
||||
|
||||
The formatting configuration is defined in the root `pyproject.toml` file:
|
||||
|
||||
```toml
|
||||
[tool.black]
|
||||
line-length = 100
|
||||
target-version = ["py311"]
|
||||
|
||||
[tool.ruff]
|
||||
line-length = 100
|
||||
target-version = "py311"
|
||||
select = ["E", "F", "B", "I"]
|
||||
fix = true
|
||||
|
||||
[tool.ruff.format]
|
||||
docstring-code-format = true
|
||||
|
||||
[tool.mypy]
|
||||
strict = true
|
||||
python_version = "3.11"
|
||||
ignore_missing_imports = true
|
||||
disallow_untyped_defs = true
|
||||
check_untyped_defs = true
|
||||
warn_return_any = true
|
||||
show_error_codes = true
|
||||
warn_unused_ignores = false
|
||||
```
|
||||
|
||||
#### Key Formatting Rules
|
||||
|
||||
- **Line Length**: Maximum of 100 characters
|
||||
- **Python Version**: Code should be compatible with Python 3.11+
|
||||
- **Imports**: Automatically sorted (using Ruff's "I" rule)
|
||||
- **Type Hints**: Required for all function definitions (strict mypy mode)
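Put together, a snippet that satisfies these rules (lines under 100 columns, full type hints on every definition) looks like this; the function itself is an invented example, not project code:

```python
def normalize_scores(scores: list[float], *, precision: int = 2) -> list[float]:
    """Scale scores into [0, 1] relative to the peak and round to the given precision."""
    if not scores:
        return []
    peak = max(scores)
    if peak == 0:
        return [0.0 for _ in scores]
    return [round(score / peak, precision) for score in scores]
```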
|
||||
|
||||
#### IDE Integration
|
||||
|
||||
The repository includes VSCode workspace configurations that enable automatic formatting. When you open the workspace files (as recommended in the setup instructions), the correct formatting settings are automatically applied.
|
||||
|
||||
Python-specific settings in the workspace files:
|
||||
|
||||
```json
|
||||
"[python]": {
|
||||
"editor.formatOnSave": true,
|
||||
"editor.defaultFormatter": "ms-python.black-formatter",
|
||||
"editor.codeActionsOnSave": {
|
||||
"source.organizeImports": "explicit"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Recommended VS Code extensions:
|
||||
|
||||
- Black Formatter (ms-python.black-formatter)
|
||||
- Ruff (charliermarsh.ruff)
|
||||
- Pylance (ms-python.vscode-pylance)
|
||||
|
||||
#### Manual Formatting
|
||||
|
||||
To manually format code:
|
||||
|
||||
```bash
|
||||
# Format all Python files using Black
|
||||
pdm run black .
|
||||
|
||||
# Run Ruff linter with auto-fix
|
||||
pdm run ruff check --fix .
|
||||
|
||||
# Run type checking with MyPy
|
||||
pdm run mypy .
|
||||
```
|
||||
|
||||
#### Pre-commit Validation
|
||||
|
||||
Before submitting a pull request, ensure your code passes all formatting checks:
|
||||
|
||||
```bash
|
||||
# Run all checks
|
||||
pdm run black --check .
|
||||
pdm run ruff check .
|
||||
pdm run mypy .
|
||||
```
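These checks can also be wired into a git `pre-commit` hook so they run before every commit. A minimal sketch (in practice you would save the hook body as `.git/hooks/pre-commit` and mark it executable; a temp file is used here for illustration):

```shell
# Write the hook body to a file and syntax-check it before installing.
hook="$(mktemp)"
cat > "$hook" <<'EOF'
#!/usr/bin/env bash
set -e
pdm run black --check .
pdm run ruff check .
pdm run mypy .
EOF
chmod +x "$hook"
bash -n "$hook"  # verify the hook parses without running the tools
```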
|
||||
|
||||
### Swift Code (Lume)
|
||||
|
||||
For Swift code in the `libs/lume` directory:
|
||||
|
||||
- Follow the [Swift API Design Guidelines](https://www.swift.org/documentation/api-design-guidelines/)
|
||||
- Use SwiftFormat for consistent formatting
|
||||
- Code will be automatically formatted on save when using the lume workspace
|
||||
@@ -1,5 +0,0 @@
|
||||
{
|
||||
"title": "Guides",
|
||||
"description": "Guides",
|
||||
"icon": "BookCopy"
|
||||
}
|
||||
@@ -7,85 +7,59 @@ import { buttonVariants } from 'fumadocs-ui/components/ui/button';
|
||||
import { cn } from 'fumadocs-ui/utils/cn';
|
||||
import { ChevronRight } from 'lucide-react';
|
||||
|
||||
## What is Cua?
|
||||
# Welcome!
|
||||
|
||||
Cua is a collection of cross-platform libraries and tools for building Computer-Use AI agents.
|
||||
c/ua is a framework for automating Windows, Mac, and Linux apps powered by computer-using agents (CUAs).
|
||||
|
||||
## Quick Start
|
||||
c/ua makes every stage of computer-using agent development simple:
|
||||
|
||||
<Cards>
|
||||
<Card
|
||||
href="./home/guides/computer-use-agent-quickstart"
|
||||
title="Computer-Use Agent UI">
|
||||
Read our guide on getting started with a Computer-Use Agent.
|
||||
</Card>
|
||||
- **Development**: Use any LLM provider with liteLLM. The agent SDK makes multiple agent loop providers, trajectory tracing, caching, and budget management easy
|
||||
- **Containerization**: c/ua offers Docker containers pre-installed with everything needed for AI-powered RPA
|
||||
- **Deployment**: c/ua cloud gives you a production-ready cloud environment for your assistants
|
||||
|
||||
<Card href="./home/guides/cua-usage-guide" title="Cua Usage Guide">
|
||||
Get started using Cua services on your machine.
|
||||
</Card>
|
||||
|
||||
<Card href="./home/guides/dev-container-setup" title="Dev Container Setup">
|
||||
Set up a development environment with the Dev Container.
|
||||
</Card>
|
||||
|
||||
</Cards>
|
||||
|
||||
---
|
||||
|
||||
<Callout type="info">
|
||||
**Need detailed API documentation?**
|
||||
<span className="w-full">
|
||||
Explore the complete API reference with detailed class documentation, and
|
||||
method signatures.
|
||||
</span>
|
||||
<a
|
||||
href="/api"
|
||||
className={cn(
|
||||
buttonVariants({
|
||||
color: 'secondary',
|
||||
}),
|
||||
'no-underline h-10'
|
||||
)}>
|
||||
View API Reference
|
||||
<ChevronRight size={18} />
|
||||
</a>
|
||||
</Callout>
|
||||
|
||||
## Resources
|
||||
|
||||
- [How to use the MCP Server with Claude Desktop or other MCP clients](./libraries/mcp-server) - One of the easiest ways to get started with Cua
|
||||
- [How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libraries/agent)
|
||||
- [How to use Lume CLI for managing desktops](./libraries/lume)
|
||||
- [Training Computer-Use Models: Collecting Human Trajectories with Cua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1)
|
||||
- [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1)
|
||||
|
||||
## Modules
|
||||
|
||||
| Module | Description | Installation |
|
||||
| ------------------------------------------------------ | -------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
|
||||
| [**Lume**](./libraries/lume.mdx) | VM management for macOS/Linux using Apple's Virtualization.Framework | `curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash` |
|
||||
| [**Lumier**](./libraries/lumier.mdx) | Docker interface for macOS and Linux VMs | `docker pull trycua/lumier:latest` |
|
||||
| [**Computer**](./libraries/computer.mdx) | Python Interface for controlling virtual machines | `pip install "cua-computer[all]"`<br/><br/>`npm install @trycua/computer` |
|
||||
| [**Agent**](./libraries/agent.mdx) | AI agent framework for automating tasks | `pip install "cua-agent[all]"` |
|
||||
| [**MCP Server**](./libraries/mcp-server.mdx) | MCP server for using CUA with Claude Desktop | `pip install cua-mcp-server` |
|
||||
| [**SOM**](./libs/python/som/README.md) | Set-of-Mark (SOM) library for Agent | `pip install cua-som` |
|
||||
| [**Computer Server**](./libraries/computer-server.mdx) | Server component for Computer | `pip install cua-computer-server` |
|
||||
| [**Core**](./libraries/core.mdx) | Python Core utilities | `pip install cua-core`<br/><br/>`npm install @trycua/core` |
|
||||
|
||||
## Community
|
||||
|
||||
Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas, get assistance, or share your demos!
|
||||
|
||||
## License
|
||||
|
||||
Cua is open-sourced under the MIT License - see the [LICENSE](https://github.com/trycua/cua/blob/main/LICENSE.md) file for details.
|
||||
|
||||
Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0) - see the [OmniParser LICENSE](https://github.com/microsoft/OmniParser/blob/master/LICENSE) file for details.
|
||||
|
||||
## Contributing
|
||||
|
||||
We welcome contributions to CUA! Please refer to our [Contributing Guidelines](https://github.com/trycua/cua/blob/main/CONTRIBUTING.md) for details.
|
||||
|
||||
## Trademarks
|
||||
|
||||
Apple, macOS, and Apple Silicon are trademarks of Apple Inc. Ubuntu and Canonical are registered trademarks of Canonical Ltd. Microsoft is a registered trademark of Microsoft Corporation. This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., or Microsoft Corporation.
|
||||
<div className="grid grid-cols-1 md:grid-cols-2 gap-6 mt-8">
|
||||
<div className="border rounded-lg p-6">
|
||||
<h3 className="text-lg font-semibold mb-2">🖥️ Quickstart (UI)</h3>
|
||||
<p className="text-muted-foreground mb-4">Try the c/ua Agent UI in your browser—no coding required.</p>
|
||||
<a
|
||||
href="/home/quickstart-ui"
|
||||
className={cn(
|
||||
buttonVariants({ variant: 'default' }),
|
||||
'w-full'
|
||||
)}
|
||||
>
|
||||
Get Started (UI)
|
||||
<ChevronRight className="ml-2 h-4 w-4" />
|
||||
</a>
|
||||
</div>
|
||||
<div className="border rounded-lg p-6">
|
||||
<h3 className="text-lg font-semibold mb-2">💻 Quickstart (Developers)</h3>
|
||||
<p className="text-muted-foreground mb-4">Build with Python—full SDK and agent code examples.</p>
|
||||
<a
|
||||
href="/home/quickstart-devs"
|
||||
className={cn(
|
||||
buttonVariants({ variant: 'secondary' }),
|
||||
'w-full'
|
||||
)}
|
||||
>
|
||||
Get Started (Python)
|
||||
<ChevronRight className="ml-2 h-4 w-4" />
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
<div className="grid grid-cols-1 gap-6 mt-6">
|
||||
<div className="border rounded-lg p-6">
|
||||
<h3 className="text-lg font-semibold mb-2">📚 API Reference</h3>
|
||||
<p className="text-muted-foreground mb-4">Explore the agent SDK and APIs</p>
|
||||
<a
|
||||
href="/home/libraries/agent"
|
||||
className={cn(
|
||||
buttonVariants({ variant: 'outline' }),
|
||||
'w-full'
|
||||
)}
|
||||
>
|
||||
View API Reference
|
||||
<ChevronRight className="ml-2 h-4 w-4" />
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
@@ -1,158 +0,0 @@
|
||||
---
|
||||
title: Gradio UI with the Python Agent
|
||||
description: The agent module includes a Gradio-based user interface for easier interaction with Computer-Use Agent workflows.
|
||||
---
|
||||
|
||||
The agent includes a Gradio-based user interface for easier interaction.
|
||||
|
||||
<div align="center">
|
||||
<img src="/img/agent_gradio_ui.png" />
|
||||
</div>
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
# Install with Gradio support
|
||||
pip install "cua-agent[ui]"
|
||||
```
|
||||
|
||||
## Create a simple launcher script
|
||||
|
||||
```python
|
||||
# launch_ui.py
|
||||
from agent.ui.gradio.app import create_gradio_ui
|
||||
|
||||
app = create_gradio_ui()
|
||||
app.launch(share=False)
|
||||
```
|
||||
|
||||
### Run the launcher
|
||||
|
||||
```bash
|
||||
python launch_ui.py
|
||||
```
|
||||
|
||||
This will start the Gradio interface on `http://localhost:7860`.
|
||||
|
||||
## Features
|
||||
|
||||
The Gradio UI provides:
|
||||
|
||||
- **Model Selection**: Choose between different AI models and providers
|
||||
- **Task Input**: Enter tasks for the agent to execute
|
||||
- **Real-time Output**: View the agent's actions and results as they happen
|
||||
- **Screenshot Display**: See visual feedback from the computer screen
|
||||
- **Settings Management**: Configure and save your preferred settings
|
||||
|
||||
## Supported Providers
|
||||
|
||||
1. **OpenAI**: GPT-4 and GPT-4 Vision models
|
||||
2. **Anthropic**: Claude models
|
||||
3. **Ollama**: Local models like Gemma3
|
||||
4. **UI-TARS**: Specialized UI understanding models
|
||||
|
||||
### Using UI-TARS
|
||||
|
||||
UI-TARS is a specialized model for UI understanding tasks. You have two options:
|
||||
|
||||
1. **Local MLX UI-TARS**: For running the model locally on Apple Silicon
|
||||
|
||||
```bash
|
||||
# Install MLX support
|
||||
pip install "cua-agent[uitars-mlx]"
|
||||
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id
|
||||
```
|
||||
|
||||
Then select "UI-TARS (MLX)" in the Gradio interface.
|
||||
|
||||
2. **OpenAI-compatible UI-TARS**: For using the original ByteDance model
|
||||
|
||||
- If you want to use the original ByteDance UI-TARS model via an OpenAI-compatible API, follow the [deployment guide](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md)
|
||||
- This will give you a provider URL like `https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1` which you can use in the code or Gradio UI:
|
||||
|
||||
```python
|
||||
agent = ComputerAgent(
|
||||
computer=macos_computer,
|
||||
loop=AgentLoop.UITARS,
|
||||
model=LLM(
|
||||
provider=LLMProvider.OAICOMPAT,
|
||||
name="ByteDance-Seed/UI-TARS-1.5-7B",
|
||||
provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1"
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
Or in the Gradio UI, select "OpenAI Compatible" and enter:
|
||||
- Model Name: `ByteDance-Seed/UI-TARS-1.5-7B`
|
||||
- Base URL: Your deployment URL
|
||||
- API Key: Your API key (if required)
|
||||
|
||||
## Advanced Configuration
|
||||
|
||||
### Custom Provider Settings
|
||||
|
||||
You can configure custom providers in the UI:
|
||||
|
||||
1. Select "OpenAI Compatible" from the provider dropdown
|
||||
2. Enter your custom model name, base URL, and API key
|
||||
3. The settings will be saved for future sessions
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Set API keys as environment variables for security:
|
||||
|
||||
```bash
|
||||
export OPENAI_API_KEY="your-openai-key"
|
||||
export ANTHROPIC_API_KEY="your-anthropic-key"
|
||||
export GROQ_API_KEY="your-groq-key"
|
||||
export DEEPSEEK_API_KEY="your-deepseek-key"
|
||||
export QWEN_API_KEY="your-qwen-key"
|
||||
```
|
||||
|
||||
Or use a `.env` file:
|
||||
|
||||
```bash
|
||||
# .env
|
||||
OPENAI_API_KEY=your-openai-key
|
||||
ANTHROPIC_API_KEY=your-anthropic-key
|
||||
# ... other keys
|
||||
```
|
||||
|
||||
## Settings Persistence
|
||||
|
||||
The Gradio UI automatically saves your settings to `.gradio_settings.json` in your working directory. This includes:
|
||||
|
||||
- Selected provider and model
|
||||
- Custom provider configurations (URLs and model names)
|
||||
- Other UI preferences
|
||||
|
||||
**Note**: API keys entered into the custom provider field are **not** saved in this file for security reasons. Manage API keys using environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) or a `.env` file.
|
||||
|
||||
It's recommended to add `.gradio_settings.json` to your `.gitignore` file.
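A saved `.gradio_settings.json` might look roughly like this (the field names are illustrative; the real schema may differ, and no API key is ever written to it):

```json
{
  "provider": "oaicompat",
  "model_name": "ByteDance-Seed/UI-TARS-1.5-7B",
  "provider_base_url": "https://your-endpoint.example.com/v1"
}
```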
|
||||
|
||||
## Example Usage
|
||||
|
||||
Here's a complete example of using the Gradio UI with different providers:
|
||||
|
||||
```python
|
||||
# launch_ui_with_env.py
|
||||
from agent.ui.gradio.app import create_gradio_ui
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# Load environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Create and launch the UI
|
||||
app = create_gradio_ui()
|
||||
app.launch(share=False, server_port=7860)
|
||||
```
|
||||
|
||||
Once launched, you can:
|
||||
|
||||
1. Select your preferred AI provider and model
|
||||
2. Enter a task like "Open a web browser and search for Python tutorials"
|
||||
3. Click "Run" to execute the task
|
||||
4. Watch the agent perform the actions in real-time
|
||||
5. View screenshots and logs of the execution
|
||||
|
||||
The UI makes it easy to experiment with different models and tasks without writing code for each interaction.
|
||||
@@ -1,266 +1,123 @@
|
||||
---
|
||||
title: Agent
|
||||
description: The Computer-Use framework for running multi-app agentic workflows targeting macOS, Linux, and Windows sandboxes.
|
||||
pypi: cua-computer
|
||||
macos: true
|
||||
windows: true
|
||||
linux: true
|
||||
description: Reference for the current version of the Agent library.
|
||||
github:
|
||||
- https://github.com/trycua/cua/tree/main/libs/python/agent
|
||||
---
|
||||
|
||||
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
|
||||
import { cn } from 'fumadocs-ui/utils/cn';
|
||||
import { ChevronRight } from 'lucide-react';
|
||||
The Agent library provides the ComputerAgent class and tools for building AI agents that automate workflows on C/ua Computers.
|
||||
|
||||
**Agent** is a powerful Computer-Use framework that enables AI agents to interact with desktop applications and perform complex multi-step workflows across macOS, Linux, and Windows environments. Built on the Cua platform, it supports both local models (via Ollama) and cloud providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).
|
||||
|
||||
## Installation
|
||||
|
||||
Install CUA Agent with pip. Choose the installation that matches your needs:
|
||||
|
||||
### All Providers (Recommended)
|
||||
|
||||
```bash
|
||||
# Install everything you need
|
||||
pip install "cua-agent[all]"
|
||||
```
|
||||
|
||||
### Selective Installation
|
||||
|
||||
```bash
|
||||
# OpenAI models (GPT-4, Computer Use Preview)
|
||||
pip install "cua-agent[openai]"
|
||||
|
||||
# Anthropic models (Claude 3.5 Sonnet)
|
||||
pip install "cua-agent[anthropic]"
|
||||
|
||||
# Local UI-TARS models
|
||||
pip install "cua-agent[uitars]"
|
||||
|
||||
# OmniParser + Ollama for local models
|
||||
pip install "cua-agent[omni]"
|
||||
|
||||
# Gradio web interface
|
||||
pip install "cua-agent[ui]"
|
||||
```
|
||||
|
||||
### Advanced: Local UI-TARS with MLX
|
||||
|
||||
```bash
|
||||
pip install "cua-agent[uitars-mlx]"
|
||||
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id
|
||||
```
|
||||
|
||||
### Requirements
|
||||
|
||||
- Python 3.8+
|
||||
- macOS, Linux, or Windows
|
||||
- For cloud providers: API keys (OpenAI, Anthropic, etc.)
|
||||
- For local models: Sufficient RAM and compute resources
|
||||
|
||||
## Getting Started
|
||||
## Reference
|
||||
|
||||
### Basic Usage
|
||||
|
||||
Here's a simple example to get you started with CUA Agent. It instructs the agent to open a text editor and write "Hello World."
|
||||
|
||||
```python
|
||||
from agent import ComputerAgent, AgentLoop, LLM, LLMProvider
from computer import Computer
|
||||
|
||||
# Set your API key
|
||||
import os
|
||||
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
|
||||
|
||||
async with Computer() as computer:
|
||||
# Create agent with OpenAI
|
||||
agent = ComputerAgent(
|
||||
computer=computer,
|
||||
loop=AgentLoop.OPENAI,
|
||||
model=LLM(provider=LLMProvider.OPENAI)
|
||||
)
|
||||
|
||||
# Run a simple task
|
||||
async for result in agent.run("Open a text editor and write 'Hello, World!'"):
|
||||
print(result.get("text"))
|
||||
```
|
||||
|
||||
### Multi-Step Workflow
|
||||
|
||||
This example defines multiple tasks for the agent to complete:
|
||||
|
||||
```python
|
||||
async with Computer() as computer:
|
||||
# Create agent with your preferred provider
|
||||
agent = ComputerAgent(
|
||||
computer=computer,
|
||||
loop=AgentLoop.OPENAI, # or ANTHROPIC, OMNI, UITARS
|
||||
model=LLM(provider=LLMProvider.OPENAI)
|
||||
)
|
||||
|
||||
# Define complex workflow
|
||||
tasks = [
|
||||
"Look for a repository named trycua/cua on GitHub.",
|
||||
"Check the open issues, open the most recent one and read it.",
|
||||
"Clone the repository in users/lume/projects if it doesn't exist yet.",
|
||||
"Open the repository with an app named Cursor.",
|
||||
"From Cursor, open Composer and write a task to help resolve the GitHub issue.",
|
||||
]
|
||||
|
||||
# Execute tasks sequentially
|
||||
for i, task in enumerate(tasks):
|
||||
print(f"\nExecuting task {i+1}/{len(tasks)}: {task}")
|
||||
async for result in agent.run(task):
|
||||
print(result.get("text"))
|
||||
print(f"✅ Task {i+1} completed")
|
||||
```
|
||||
|
||||
### Alternative Model Providers
|
||||
|
||||
You can use different models with the agent library; below are a few alternatives that we already support.
|
||||
|
||||
```python
|
||||
# Anthropic Claude
|
||||
computer = Computer() # Connect to a c/ua container
|
||||
agent = ComputerAgent(
|
||||
computer=computer,
|
||||
loop=AgentLoop.ANTHROPIC,
|
||||
    model=LLM(provider=LLMProvider.ANTHROPIC)
|
||||
)
|
||||
|
||||
# Local Ollama model
|
||||
agent = ComputerAgent(
|
||||
computer=computer,
|
||||
loop=AgentLoop.OMNI,
|
||||
model=LLM(provider=LLMProvider.OLLAMA, name="gemma3")
|
||||
)
|
||||
prompt = "open github, navigate to trycua/cua"
|
||||
|
||||
# UI-TARS model
|
||||
agent = ComputerAgent(
|
||||
computer=computer,
|
||||
loop=AgentLoop.UITARS,
|
||||
model=LLM(
|
||||
provider=LLMProvider.OAICOMPAT,
|
||||
name="ByteDance-Seed/UI-TARS-1.5-7B",
|
||||
provider_base_url="https://your-endpoint.com/v1"
|
||||
)
|
||||
)
|
||||
async for result in agent.run(prompt):
|
||||
print("Agent:", result["output"][-1]["content"][0]["text"])
|
||||
```
|
||||
|
||||
## Agent Loops
|
||||
|
||||
The `cua-agent` package provides four agent loop variations, based on different CUA model providers and techniques:
|
||||
|
||||
| Agent Loop | Supported Models | Description | Set-Of-Marks |
|
||||
| :-------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------- | :----------- |
|
||||
| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
|
||||
| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br/>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
|
||||
| `AgentLoop.UITARS` | • `mlx-community/UI-TARS-1.5-7B-4bit` (default)<br/>• `mlx-community/UI-TARS-1.5-7B-6bit`<br/>• `ByteDance-Seed/UI-TARS-1.5-7B` (via OpenAI-compatible endpoint) | Uses UI-TARS models with MLXVLM (default) or OAICOMPAT providers | Not Required |
|
||||
| `AgentLoop.OMNI` | • `claude-3-5-sonnet-20240620`<br/>• `claude-3-7-sonnet-20250219`<br/>• `gpt-4.5-preview`<br/>• `gpt-4o`<br/>• `gpt-4`<br/>• `phi4`<br/>• `phi4-mini`<br/>• `gemma3`<br/>• `...`<br/>• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |
|
||||
|
||||
## Agent Response
|
||||
|
||||
The `AgentResponse` class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new [OpenAI Agent SDK specification](https://platform.openai.com/docs/api-reference/responses) for better consistency across different agent loops.
|
||||
|
||||
```typescript
|
||||
interface AgentResponse {
|
||||
id: string;
|
||||
text: string;
|
||||
usage?: {
|
||||
input_tokens: number;
|
||||
input_tokens_details?: {
|
||||
text_tokens: number;
|
||||
image_tokens: number;
|
||||
};
|
||||
output_tokens: number;
|
||||
output_tokens_details?: {
|
||||
text_tokens: number;
|
||||
reasoning_tokens: number;
|
||||
};
|
||||
total_tokens: number;
|
||||
};
|
||||
tools?: Array<{
|
||||
name: string;
|
||||
description: string;
|
||||
}>;
|
||||
output?: Array<{
|
||||
type: 'reasoning' | 'computer_call';
|
||||
content?: string; // for reasoning type
|
||||
tool_name?: string; // for computer_call type
|
||||
parameters?: Record<string, any>; // for computer_call type
|
||||
result?: string; // for computer_call type
|
||||
}>;
|
||||
}
|
||||
```
|
||||
|
||||
### Example Usage
|
||||
|
||||
```python
|
||||
async for result in agent.run(task):
|
||||
print("Response ID: ", result.get("id"))
|
||||
|
||||
# Print detailed usage information
|
||||
usage = result.get("usage")
|
||||
if usage:
|
||||
print("\nUsage Details:")
|
||||
print(f" Input Tokens: {usage.get('input_tokens')}")
|
||||
if "input_tokens_details" in usage:
|
||||
print(f" Input Tokens Details: {usage.get('input_tokens_details')}")
|
||||
print(f" Output Tokens: {usage.get('output_tokens')}")
|
||||
if "output_tokens_details" in usage:
|
||||
print(f" Output Tokens Details: {usage.get('output_tokens_details')}")
|
||||
print(f" Total Tokens: {usage.get('total_tokens')}")
|
||||
|
||||
print("Response Text: ", result.get("text"))
|
||||
|
||||
# Print tools information
|
||||
tools = result.get("tools")
|
||||
if tools:
|
||||
print("\nTools:")
|
||||
print(tools)
|
||||
|
||||
# Print reasoning and tool call outputs
|
||||
outputs = result.get("output", [])
|
||||
for output in outputs:
|
||||
output_type = output.get("type")
|
||||
if output_type == "reasoning":
|
||||
print("\nReasoning Output:")
|
||||
print(output)
|
||||
elif output_type == "computer_call":
|
||||
print("\nTool Call Output:")
|
||||
print(output)
|
||||
```
|
||||
|
||||
## Examples & Guides
|
||||
|
||||
<Cards>
|
||||
<Card
|
||||
href="https://github.com/trycua/cua/tree/main/notebooks/agent_nb.ipynb"
|
||||
title="Agent Notebook">
|
||||
Step-by-step instructions on using the Computer-Use Agent (CUA)
|
||||
</Card>
|
||||
<Card href="../libraries/agent/agent-gradio-ui" title="Agent Gradio Guide">
|
||||
Use the Agent library with a Python Gradio UI
|
||||
</Card>
|
||||
</Cards>
|
||||
|
||||
---
|
||||
|
||||
<Callout type="info">
|
||||
**Need detailed API documentation?**{' '}
|
||||
<span className="w-full">
|
||||
Explore the complete API reference with detailed class documentation, and
|
||||
method signatures.
|
||||
</span>
|
||||
<a
|
||||
href="/api/agent"
|
||||
className={cn(
|
||||
buttonVariants({
|
||||
color: 'secondary',
|
||||
}),
|
||||
'no-underline h-10'
|
||||
)}>
|
||||
View API Reference
|
||||
<ChevronRight size={18} />
|
||||
</a>
|
||||
</Callout>
|
||||
### ComputerAgent Constructor Options
|
||||
|
||||
The `ComputerAgent` constructor provides a wide range of options for customizing agent behavior, tool integration, callbacks, resource management, and more.
|
||||
|
||||
| Parameter | Type | Default | Description |
|
||||
|-----------|------|---------|-------------|
|
||||
| `model` | `str` | **required** | Model name (e.g., "claude-3-5-sonnet-20241022", "computer-use-preview", "omni+vertex_ai/gemini-pro") |
|
||||
| `tools` | `List[Any]` | `None` | List of tools (e.g., computer objects, decorated functions) |
|
||||
| `custom_loop` | `Callable` | `None` | Custom agent loop function (overrides auto-selection) |
|
||||
| `only_n_most_recent_images` | `int` | `None` | If set, only keep the N most recent images in message history (adds ImageRetentionCallback) |
|
||||
| `callbacks` | `List[Any]` | `None` | List of AsyncCallbackHandler instances for preprocessing/postprocessing |
|
||||
| `verbosity` | `int` | `None` | Logging level (`logging.DEBUG`, `logging.INFO`, etc.; adds LoggingCallback) |
|
||||
| `trajectory_dir` | `str` | `None` | Directory to save trajectory data (adds TrajectorySaverCallback) |
|
||||
| `max_retries` | `int` | `3` | Maximum number of retries for failed API calls |
|
||||
| `screenshot_delay` | `float` \| `int` | `0.5` | Delay before screenshots (seconds) |
|
||||
| `use_prompt_caching` | `bool` | `False` | Use prompt caching to avoid reprocessing the same prompt (mainly for Anthropic) |
|
||||
| `max_trajectory_budget` | `float` \| `dict` | `None` | If set, adds BudgetManagerCallback to track usage costs and stop when budget is exceeded |
|
||||
| `**kwargs` | _any_ | | Additional arguments passed to the agent loop |
|
||||
|
||||
#### Parameter Details
|
||||
|
||||
- **model**: The LLM or agent model to use. Determines which agent loop is selected unless `custom_loop` is provided.
|
||||
- **tools**: List of tools the agent can use (e.g., `Computer`, sandboxed Python functions, etc.).
|
||||
- **custom_loop**: Optional custom agent loop function. If provided, overrides automatic loop selection.
|
||||
- **only_n_most_recent_images**: If set, only the N most recent images are kept in the message history. Useful for limiting memory usage. Automatically adds `ImageRetentionCallback`.
|
||||
- **callbacks**: List of callback instances for advanced preprocessing, postprocessing, logging, or custom hooks. See [Callbacks & Extensibility](#callbacks--extensibility).
|
||||
- **verbosity**: Logging level (e.g., `logging.INFO`). If set, adds a logging callback.
|
||||
- **trajectory_dir**: Directory path to save full trajectory data, including screenshots and responses. Adds `TrajectorySaverCallback`.
|
||||
- **max_retries**: Maximum number of retries for failed API calls (default: 3).
|
||||
- **screenshot_delay**: Delay (in seconds) before taking screenshots (default: 0.5).
|
||||
- **use_prompt_caching**: Enables prompt caching for repeated prompts (mainly for Anthropic models).
|
||||
- **max_trajectory_budget**: If set (float or dict), adds a budget manager callback that tracks usage costs and stops execution if the budget is exceeded. Dict allows advanced options (e.g., `{ "max_budget": 5.0, "raise_error": True }`).
|
||||
- **\*\*kwargs**: Any additional keyword arguments are passed through to the agent loop or model provider.
|
||||
|
||||
**Example with advanced options:**
|
||||
|
||||
```python
|
||||
import logging

from agent2 import ComputerAgent
|
||||
from computer import Computer
|
||||
from agent2.callbacks import ImageRetentionCallback
|
||||
|
||||
agent = ComputerAgent(
|
||||
model="anthropic/claude-3-5-sonnet-20241022",
|
||||
tools=[Computer(...)],
|
||||
only_n_most_recent_images=3,
|
||||
callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)],
|
||||
verbosity=logging.INFO,
|
||||
trajectory_dir="trajectories",
|
||||
max_retries=5,
|
||||
screenshot_delay=1.0,
|
||||
use_prompt_caching=True,
|
||||
max_trajectory_budget={"max_budget": 5.0, "raise_error": True}
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Message Array (Multi-turn)

```python
messages = [
    {"role": "user", "content": "go to trycua on gh"},
    # ... (reasoning, computer_call, computer_call_output, etc.)
]
async for result in agent.run(messages):
    # Handle output, tool invocations, screenshots, etc.
    print("Agent:", result["output"][-1]["content"][0]["text"])
    messages += result["output"]  # Add agent output to the message array
    ...
```
### Supported Agent Loops

- **Anthropic**: Claude 4, 3.7, and 3.5 models
- **OpenAI**: computer-use-preview
- **UITARS**: UI-TARS 1.5 models (Hugging Face, TGI)
- **Omni**: OmniParser + any LLM

See [Agent Loops](../../agent-sdk/agent-loops) for supported models and details.

### Callbacks & Extensibility

You can add preprocessing and postprocessing hooks using callbacks, or write your own by subclassing `AsyncCallbackHandler`:
```python
from agent2.callbacks import ImageRetentionCallback, PIIAnonymizationCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)]
)
```
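The trimming that an image-retention callback performs can be approximated in plain Python. This is a simplified, hypothetical sketch — the real callback operates on the agent's internal message history, and `keep_last_n_images` is not a library function:

```python
def keep_last_n_images(messages, n):
    """Drop image items from all but the last n image-bearing entries (sketch)."""
    image_indices = [
        i for i, m in enumerate(messages)
        if any(c.get("type") == "image" for c in m.get("content", []))
    ]
    drop = set(image_indices[:-n]) if n else set(image_indices)
    return [
        m if i not in drop else
        {**m, "content": [c for c in m["content"] if c.get("type") != "image"]}
        for i, m in enumerate(messages)
    ]

history = [
    {"role": "user", "content": [{"type": "image", "data": "s1"}]},
    {"role": "user", "content": [{"type": "image", "data": "s2"}]},
    {"role": "user", "content": [{"type": "image", "data": "s3"}]},
]
trimmed = keep_last_n_images(history, 2)  # oldest screenshot is dropped
```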
@@ -1,60 +0,0 @@
---
title: Computer Server
description: The server component for the Computer-Use Interface framework.
pypi: cua-computer-server
macos: true
linux: true
windows: true
github:
  - https://github.com/trycua/cua/tree/main/libs/python/computer-server
---
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
**Computer Server** provides the WebSocket interface that the [Computer-Use Interface (CUI)](./computer/) connects to.

## Features

- WebSocket API for computer-use commands
- Cross-platform support (macOS, Linux, Windows)
- Integration with the CUI library for screen control, keyboard/mouse automation, and accessibility

## Install

```bash
pip install cua-computer-server
```

## Examples & Guides
<Cards>
  <Card
    href="https://github.com/trycua/cua/tree/main/notebooks/computer_server_nb.ipynb"
    title="Computer-Use Server Notebook">
    Step-by-step guide using the Computer-Use Server on a host system or virtual
    machine.
  </Card>
</Cards>

---

<Callout type="info">
  **Need detailed API documentation?**
  <span className="w-full">
    Explore the complete API reference with detailed class documentation and
    method signatures.
  </span>
  <a
    href="/api/computer-server"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    View API Reference
    <ChevronRight size={18} />
  </a>
</Callout>
@@ -0,0 +1,48 @@
---
title: Supported Commands
description: List of all commands supported by the Computer Server API (WebSocket and REST).
---

# Commands Reference

This page lists all supported commands for the Computer Server, available via both WebSocket and REST API endpoints.
| Command | Description |
|---------------------|--------------------------------------------|
| version | Get protocol and package version info |
| run_command | Run a shell command |
| screenshot | Capture a screenshot |
| get_screen_size | Get the screen size |
| get_cursor_position | Get the current mouse cursor position |
| mouse_down | Press a mouse button |
| mouse_up | Release a mouse button |
| left_click | Left mouse click |
| right_click | Right mouse click |
| double_click | Double mouse click |
| move_cursor | Move mouse cursor to coordinates |
| drag_to | Drag mouse to coordinates |
| drag | Drag mouse by offset |
| key_down | Press a keyboard key |
| key_up | Release a keyboard key |
| type_text | Type text |
| press_key | Press a single key |
| hotkey | Press a hotkey combination |
| scroll | Scroll the screen |
| scroll_down | Scroll down |
| scroll_up | Scroll up |
| copy_to_clipboard | Get clipboard content |
| set_clipboard | Set clipboard content |
| file_exists | Check if a file exists |
| directory_exists | Check if a directory exists |
| list_dir | List files/directories in a directory |
| read_text | Read text from a file |
| write_text | Write text to a file |
| read_bytes | Read bytes from a file |
| write_bytes | Write bytes to a file |
| get_file_size | Get file size |
| delete_file | Delete a file |
| create_dir | Create a directory |
| delete_dir | Delete a directory |
| get_accessibility_tree | Get accessibility tree (if supported) |
| find_element | Find an element in the accessibility tree |
| diorama_cmd | Run a diorama command (if supported) |
@@ -0,0 +1,63 @@
---
title: REST API Reference
description: Reference for the /cmd REST endpoint of the Computer Server.
---

# REST API Reference

The Computer Server exposes a single REST endpoint for command execution:

- `http://localhost:8000/cmd`
- `https://your-container.containers.cloud.trycua.com:8443/cmd` (cloud)
## POST /cmd

- Accepts commands as JSON in the request body
- Returns results as a streaming response (`text/event-stream`)

### Request Format

```json
{
  "command": "<command_name>",
  "params": { ... }
}
```

### Required Headers (for cloud containers)

- `X-Container-Name`: Name of the container (cloud only)
- `X-API-Key`: API key for authentication (cloud only)
### Example Request (Python)

```python
import requests

url = "http://localhost:8000/cmd"
body = {"command": "screenshot", "params": {}}
resp = requests.post(url, json=body)
print(resp.text)
```

### Example Request (Cloud)

```python
import requests

url = "https://your-container.containers.cloud.trycua.com:8443/cmd"
headers = {
    "X-Container-Name": "your-container",
    "X-API-Key": "your-api-key"
}
body = {"command": "screenshot", "params": {}}
resp = requests.post(url, json=body, headers=headers)
print(resp.text)
```

### Response Format

Streaming `text/event-stream` with JSON objects, e.g.:

```
data: {"success": true, "content": "..."}

data: {"success": false, "error": "..."}
```
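Because the body arrives as `data:`-prefixed JSON lines, a client needs to split and decode them. A minimal parser for this event format — an illustrative sketch, not part of any Cua package:

```python
import json

def parse_sse_events(raw: str) -> list[dict]:
    """Extract JSON payloads from 'data: ...' lines of a text/event-stream body."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

stream = 'data: {"success": true, "content": "..."}\n\ndata: {"success": false, "error": "..."}\n'
events = parse_sse_events(stream)
```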
### Supported Commands

See [Commands Reference](./Commands) for the full list of commands and parameters.
@@ -0,0 +1,86 @@
---
title: WebSocket API Reference
description: Reference for the /ws WebSocket endpoint of the Computer Server.
---

# WebSocket API Reference

The Computer Server exposes a WebSocket endpoint for real-time command execution and streaming results.

- `ws://localhost:8000/ws`
- `wss://your-container.containers.cloud.trycua.com:8443/ws` (cloud)

### Authentication (Cloud Only)

For cloud containers, you must authenticate immediately after connecting:
```json
{
  "command": "authenticate",
  "params": {
    "container_name": "your-container",
    "api_key": "your-api-key"
  }
}
```

If authentication fails, the connection is closed.

### Command Format

Send JSON messages:

```json
{
  "command": "<command_name>",
  "params": { ... }
}
```
### Example (Python)

```python
import asyncio
import json

import websockets

async def main():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"command": "version", "params": {}}))
        response = await ws.recv()
        print(response)

asyncio.run(main())
```
### Example (Cloud)

```python
import asyncio
import json

import websockets

async def main():
    uri = "wss://your-container.containers.cloud.trycua.com:8443/ws"
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({
            "command": "authenticate",
            "params": {
                "container_name": "your-container",
                "api_key": "your-api-key"
            }
        }))
        auth_response = await ws.recv()
        print(auth_response)
        await ws.send(json.dumps({"command": "version", "params": {}}))
        response = await ws.recv()
        print(response)

asyncio.run(main())
```
### Response Format

Each response is a JSON object:

```json
{
  "success": true,
  ...
}
```

### Supported Commands

See [Commands Reference](./Commands) for the full list of commands and parameters.
@@ -5,14 +5,8 @@ github:
  - https://github.com/trycua/cua/tree/main/libs/python/computer-server
---

## ⚠️ 🚧 Under Construction 🚧 ⚠️

The Computer Server API reference documentation is currently under development.

## Overview

The Computer Server provides HTTP API endpoints for remote computer control and automation.

## API Documentation

Coming soon.
The Computer Server provides WebSocket and REST API endpoints for remote computer control and automation.
@@ -1,90 +0,0 @@
---
title: Gradio UI with the Python Computer Interface
description: The computer module includes a Gradio UI for creating and sharing demonstration data. This guide makes it easy to build community datasets for better computer-use models, with an upload-to-Hugging Face feature.
---

<Callout title="Note">
  For precise control of the computer, we recommend using VNC or Screen Sharing
  instead of the Gradio UI.
</Callout>

```bash
# Install with UI support
pip install "cua-computer[ui]"
```
## Building and Sharing Demonstrations with Hugging Face

Follow these steps to contribute your own demonstrations:

### 1. Set up Hugging Face Access

Set your `HF_TOKEN` in a `.env` file or in your environment variables:

```bash
# In .env file
HF_TOKEN=your_huggingface_token
```

### 2. Launch the Computer UI

```python
# launch_ui.py
from computer.ui.gradio.app import create_gradio_ui
from dotenv import load_dotenv

load_dotenv('.env')

app = create_gradio_ui()
app.launch(share=False)
```

For examples, see [Computer UI Examples](https://github.com/trycua/cua/tree/main/examples/computer_ui_examples.py).
### 3. Record Your Tasks

<details open>
  <summary>View demonstration video</summary>
  <video
    src="https://github.com/user-attachments/assets/de3c3477-62fe-413c-998d-4063e48de176"
    controls
    width="600"></video>
</details>

Record yourself performing various computer tasks using the UI.

### 4. Save Your Demonstrations

<details open>
  <summary>View demonstration video</summary>
  <video
    src="https://github.com/user-attachments/assets/5ad1df37-026a-457f-8b49-922ae805faef"
    controls
    width="600"></video>
</details>

Save each task by picking a descriptive name and adding relevant tags (e.g., "office", "web-browsing", "coding").

### 5. Record Additional Demonstrations

Repeat steps 3 and 4 until you have a good number of demonstrations covering different tasks and scenarios.

### 6. Upload to Hugging Face

<details open>
  <summary>View demonstration video</summary>
  <video
    src="https://github.com/user-attachments/assets/c586d460-3877-4b5f-a736-3248886d2134"
    controls
    width="600"></video>
</details>

Upload your dataset to Hugging Face by:

- Naming it `{your_username}/{dataset_name}`
- Choosing public or private visibility
- Optionally selecting specific tags to upload only tasks with those tags

### Examples and Resources

- Example Dataset: [ddupont/test-dataset](https://huggingface.co/datasets/ddupont/test-dataset)
- Find Community Datasets: 🔍 [Browse CUA Datasets on Hugging Face](https://huggingface.co/datasets?other=cua)
@@ -1,185 +1,123 @@
---
title: Computer
description: The Computer-Use Interface (CUI) framework for interacting with local macOS, Linux, and Windows sandboxes.
macos: true
windows: true
linux: true
pypi: cua-computer
npm: '@trycua/computer'
description: Reference for the current version of the Computer library.
github:
  - https://github.com/trycua/cua/tree/main/libs/python/computer
  - https://github.com/trycua/cua/tree/main/libs/typescript/computer
---

import { Tabs, Tab } from 'fumadocs-ui/components/tabs';
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';

The Computer library provides a Computer class that can be used to control and automate a container running the Computer Server.

Computer, when paired with [Computer Server](../computer-server.mdx), enables programmatic interaction with cross-platform sandboxes. It powers Cua systems, is PyAutoGUI-compatible, and is pluggable with any AI agent system (Cua, LangChain, CrewAI, AutoGen).
## Reference

The Python version relies on [Lume](./lume.mdx) for creating and managing sandbox environments.

### Basic Usage

## Installation

Connect to a c/ua cloud container:

```python
from computer import Computer

<Tabs groupId='language' persist items={['Python', 'TypeScript']}>
<Tab value="Python">
```bash
pip install "cua-computer[all]"
```
The `cua-computer` PyPI package automatically pulls the latest executable version of Lume through [pylume](https://github.com/trycua/pylume).
computer = Computer(
    os_type="linux",
    provider_type="cloud",
    name="your-container-name",
    api_key="your-api-key"
)

</Tab>
<Tab value="TypeScript">
```bash
npm install @trycua/computer
```
</Tab>
</Tabs>
computer = await computer.run() # Connect to a c/ua cloud container
```
## Features

Connect to a c/ua local container:

```python
from computer import Computer

- Create and manage virtual machine sandboxes
- Take screenshots of the virtual machine
- Control mouse movements and clicks
- Simulate keyboard input
- Manage clipboard content
- Interact with the operating system interface
- Support for macOS and Linux environments

computer = Computer(
    os_type="macos"
)

## Simple Example

computer = await computer.run() # Connect to the container
```
<Tabs groupId='language' persist items={['Python', 'TypeScript']}>
<Tab value="Python">
```python
from computer import Computer
### Interface Actions

computer = Computer(os_type="macos", display="1024x768", memory="8GB", cpu="4")
try:
    # Start a new local vm instance using Lume
    await computer.run()
```python
# Shell Actions
result = await computer.interface.run_command(cmd) # Run shell command
# result.stdout, result.stderr, result.returncode

    # Interface with the instance
    screenshot = await computer.interface.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(screenshot)
# Mouse Actions
await computer.interface.left_click(x, y) # Left click at coordinates
await computer.interface.right_click(x, y) # Right click at coordinates
await computer.interface.double_click(x, y) # Double click at coordinates
await computer.interface.move_cursor(x, y) # Move cursor to coordinates
await computer.interface.drag_to(x, y, duration) # Drag to coordinates
await computer.interface.get_cursor_position() # Get current cursor position
await computer.interface.mouse_down(x, y, button="left") # Press and hold a mouse button
await computer.interface.mouse_up(x, y, button="left") # Release a mouse button

    await computer.interface.move_cursor(100, 100)
    await computer.interface.left_click()
    await computer.interface.right_click(300, 300)
    await computer.interface.double_click(400, 400)
# Keyboard Actions
await computer.interface.type_text("Hello") # Type text
await computer.interface.press_key("enter") # Press a single key
await computer.interface.hotkey("command", "c") # Press key combination
await computer.interface.key_down("command") # Press and hold a key
await computer.interface.key_up("command") # Release a key

    await computer.interface.type("Hello, World!")
    await computer.interface.press_key("enter")
# Scrolling Actions
await computer.interface.scroll(x, y) # Scroll the mouse wheel
await computer.interface.scroll_down(clicks) # Scroll down
await computer.interface.scroll_up(clicks) # Scroll up

    await computer.interface.set_clipboard("Test clipboard")
    content = await computer.interface.copy_to_clipboard()
    print(f"Clipboard content: {content}")
finally:
    # Stop the vm instance
    await computer.stop()
```
# Screen Actions
await computer.interface.screenshot() # Take a screenshot
await computer.interface.get_screen_size() # Get screen dimensions

</Tab>
<Tab value="TypeScript">
```typescript
import { Computer, OSType } from '@trycua/computer';
# Clipboard Actions
await computer.interface.set_clipboard(text) # Set clipboard content
await computer.interface.copy_to_clipboard() # Get clipboard content
// This creates and interfaces with a cloud-based cua container.
const main = async () => {
  // Create a cloud-based computer
  const computer = new Computer({
    name: 'cloud-vm',
    osType: OSType.Linux,
    apiKey: 'your-api-key',
  });
# File System Operations
await computer.interface.file_exists(path) # Check if file exists
await computer.interface.directory_exists(path) # Check if directory exists
await computer.interface.read_text(path, encoding="utf-8") # Read file content
await computer.interface.write_text(path, content, encoding="utf-8") # Write file content
await computer.interface.read_bytes(path) # Read file content as bytes
await computer.interface.write_bytes(path, content) # Write file content as bytes
await computer.interface.delete_file(path) # Delete file
await computer.interface.create_dir(path) # Create directory
await computer.interface.delete_dir(path) # Delete directory
await computer.interface.list_dir(path) # List directory contents

  // Access the interface
  const interface = computer.interface;
# Accessibility
await computer.interface.get_accessibility_tree() # Get accessibility tree
// Screenshot operations
const screenshot = await interface.screenshot();
# Delay Configuration
# Set default delay between all actions (in seconds)
computer.interface.delay = 0.5 # 500ms delay between actions

  // Mouse operations
  await interface.moveCursor(100, 100);
  await interface.leftClick();
  await interface.rightClick(300, 300);
  await interface.doubleClick(400, 400);
  await interface.dragTo(500, 500, 'left', 1000); // Drag with left button for 1 second
# Or specify delay for individual actions
await computer.interface.left_click(x, y, delay=1.0) # 1 second delay after click
await computer.interface.type_text("Hello", delay=0.2) # 200ms delay after typing
await computer.interface.press_key("enter", delay=0.5) # 500ms delay after key press

  // Keyboard operations
  await interface.typeText('Hello from TypeScript!');
  await interface.pressKey('enter');
  await interface.hotkey('command', 'a'); // Select all
# Python Virtual Environment Operations
await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment
await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'") # Run a shell command in a virtual environment
await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception
// Clipboard operations
await interface.setClipboard('Clipboard content');
const content = await interface.copyToClipboard();
# Example: Use sandboxed functions to execute code in a Cua Container
from computer.helpers import sandboxed

  // File operations
  await interface.writeText('/tmp/test.txt', 'Hello world');
  const fileContent = await interface.readText('/tmp/test.txt');
@sandboxed("demo_venv")
def greet_and_print(name):
    """Greet the user and return the HTML of the current Safari tab"""
    import PyXA
    safari = PyXA.Application("Safari")
    html = safari.current_document.source()
    print(f"Hello from inside the container, {name}!")
    return {"greeted": name, "safari_html": html}
  // Run a command in the VM
  const [stdout, stderr] = await interface.runCommand('ls -la');

  // Disconnect from the cloud VM
  await computer.disconnect();
};

main().catch(console.error);
```

</Tab>
</Tabs>
## Examples & Guides

<Tabs groupId="language" persist items={['Python', 'TypeScript']}>
  <Tab value="Python">
    <Cards>
      <Card
        href="https://github.com/trycua/cua/tree/main/notebooks/samples/computer_nb.ipynb"
        title="Computer-Use Interface (CUI)">
        Step-by-step guide on using the Computer-Use Interface (CUI)
      </Card>
      <Card
        href="../libraries/computer/computer-use-gradio-ui"
        title="Computer-Use Gradio UI">
        Use the Computer library with a Python Gradio UI
      </Card>
    </Cards>
  </Tab>
  <Tab value="TypeScript">
    <Cards>
      <Card
        href="https://github.com/trycua/cua/tree/main/examples/computer-example-ts"
        title="Computer Cloud OpenAI">
        Use Cua Cloud Containers with OpenAI's API to execute tasks in a sandbox
      </Card>
    </Cards>
  </Tab>
</Tabs>
---

<Callout type="info">
  **Need detailed API documentation?**{' '}
  <span className="w-full">
    Explore the complete API reference with detailed class documentation and
    method signatures.
  </span>
  <a
    href="/api/computer"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    View API Reference
    <ChevronRight size={18} />
  </a>
</Callout>
# When a @sandboxed function is called, it will execute in the container
result = await greet_and_print("Cua")
# Result: {"greeted": "Cua", "safari_html": "<html>...</html>"}
# stdout and stderr are also captured and printed / raised
print("Result from sandboxed function:", result)
```
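The `@sandboxed` pattern above can be understood as a decorator that reroutes a local function call into a remote environment. A local-only conceptual sketch, under the assumption that the real decorator ships the function's source to the container — here we simply wrap the call and record where it would have run (`sandboxed_sketch` is hypothetical, not a library API):

```python
import functools

def sandboxed_sketch(env_name: str):
    """Conceptual stand-in for @sandboxed: tag a function with its target environment."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The real decorator would serialize fn and execute it inside env_name;
            # this sketch runs it locally and records the intended environment.
            result = fn(*args, **kwargs)
            return {"env": env_name, "result": result}
        return wrapper
    return decorator

@sandboxed_sketch("demo_venv")
def greet(name):
    return f"Hello, {name}!"

out = greet("Cua")
```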
@@ -1,49 +0,0 @@
---
title: Core
description: Core infrastructure and shared utilities powering the Cua computer-use platform
pypi: cua-core
npm: '@trycua/core'
macos: true
windows: true
linux: true
github:
  - https://github.com/trycua/cua/tree/main/libs/python/core
  - https://github.com/trycua/cua/tree/main/libs/typescript/core
---

import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
# Features

- Privacy-focused telemetry system for transparent usage analytics
- Common helper functions and utilities used by other Cua packages
- Core infrastructure components shared between modules

## Installation

```bash
pip install cua-core
```

---

<Callout type="info">
  **Need detailed API documentation?**{' '}
  <span className="w-full">
    Explore the complete API reference with detailed class documentation and
    method signatures.
  </span>
  <a
    href="/api/core"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    View API Reference
    <ChevronRight size={18} />
  </a>
</Callout>
@@ -1,19 +0,0 @@
---
title: Getting Started
description: Getting started with the Cua libraries
---

## Overview

The Cua project provides several libraries for building Computer-Use AI agents.

| Library | Description | Installation |
| ------- | ----------- | ------------ |
| [**Lume**](./lume.mdx) | VM management for macOS/Linux using Apple's Virtualization.Framework | `curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash` |
| [**Lumier**](./lumier.mdx) | Docker interface for macOS and Linux VMs | `docker pull trycua/lumier:latest` |
| [**Computer**](./computer.mdx) | Python/TypeScript interface for controlling virtual machines | `pip install "cua-computer[all]"`<br/><br/>`npm install @trycua/computer` |
| [**Agent**](./agent.mdx) | AI agent framework for automating tasks | `pip install "cua-agent[all]"` |
| [**MCP Server**](./mcp-server.mdx) | MCP server for using CUA with Claude Desktop | `pip install cua-mcp-server` |
| [**SOM**](./som.mdx) | Set-of-Mark library for the Agent | `pip install cua-som` |
| [**Computer Server**](./computer-server.mdx) | Server component for Computer | `pip install cua-computer-server` |
| [**Core**](./core.mdx) | Python core utilities | `pip install cua-core`<br/><br/>`npm install @trycua/core` |
docs/content/docs/home/libraries/lume/cli-reference.mdx (new file)
@@ -0,0 +1,71 @@
---
title: Lume CLI Reference
description: Command Line Interface reference for Lume
---

Lume is a lightweight Command Line Interface and local API server for creating, running, and managing **macOS and Linux virtual machines** with near-native performance on Apple Silicon, using Apple's [Virtualization.Framework](https://developer.apple.com/documentation/virtualization).

## Quick Start

Install and run a prebuilt macOS VM in two commands:

```bash
# Install Lume
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# Pull & start a macOS image
lume run macos-sequoia-vanilla:latest
```

> **Security Note**: All prebuilt images use the default password `lume`. Change this immediately after your first login using the `passwd` command.

**System Requirements**:

- Apple Silicon Mac (M1, M2, M3, etc.)
- macOS 13.0 or later
- At least 8GB of RAM (16GB recommended)
- At least 50GB of free disk space
## Install

Install with a single command:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```

By default, Lume is installed as a background service that starts automatically on login. If you prefer to start the Lume API service manually when needed, use the `--no-background-service` option:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh) --no-background-service"
```

> **Note:** With this option, you'll need to start the Lume API service manually by running `lume serve` in your terminal whenever you use tools or libraries that rely on the Lume API (such as the Computer-Use Agent).

You can also download the `lume.pkg.tar.gz` archive from the [latest release](https://github.com/trycua/cua/releases?q=lume&expanded=true), extract it, and install the package manually.

## Using Lume

Once installed, you can start using Lume with these common workflows:

### Run a Prebuilt VM

```bash
# Run a macOS Sequoia VM
lume run macos-sequoia-vanilla:latest

# Run an Ubuntu VM
lume run ubuntu-noble-vanilla:latest
```

> We provide [prebuilt VM images](#prebuilt-images) in our [ghcr registry](https://github.com/orgs/trycua/packages).

### Create a Custom VM

```bash
# Create a new macOS VM
lume create my-macos-vm --cpu 4 --memory 8GB --disk-size 50GB

# Create a Linux VM
lume create my-linux-vm --os linux --cpu 2 --memory 4GB
```

> **Disk Space**: The actual disk space used by sparse images will be much lower than the logical size listed. You can resize VM disks after creation using `lume set <name> --disk-size <size>`.
---
title: Lume
description: A lightweight Command Line Interface and local API server for creating, running and managing macOS and Linux virtual machines.
macos: true
linux: true
github:
  - https://github.com/trycua/cua/tree/main/libs/lume
---

import Link from 'next/link';
import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';

Lume is a lightweight Command Line Interface and local API server for creating, running and managing **macOS and Linux virtual machines** with near-native performance on Apple Silicon, using Apple's [`Virtualization.Framework`](https://developer.apple.com/documentation/virtualization).

## Quick Start

Install and run a prebuilt macOS VM in two commands:

```bash
# Install Lume
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"

# Pull & start a macOS image
lume run macos-sequoia-vanilla:latest
```
<Callout type="warning">
  **Security Note**: All prebuilt images use the default password `lume`. Change this immediately after your first login using the `passwd` command.
</Callout>

**System Requirements**:

- Apple Silicon Mac (M1, M2, M3, etc.)
- macOS 13.0 or later
- At least 8GB of RAM (16GB recommended)
- At least 50GB of free disk space

## Install

Install with a single command:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```

By default, Lume is installed as a background service that starts automatically on login. If you prefer to start the Lume API service manually when needed, you can use the `--no-background-service` option:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh) --no-background-service"
```

<Callout type="info">
  **Note:** With this option, you'll need to manually start the Lume API service by running `lume serve` in your terminal whenever you need to use tools or libraries that rely on the Lume API (such as the Computer-Use Agent).
</Callout>

You can also download the `lume.pkg.tar.gz` archive from the [latest release](https://github.com/trycua/cua/releases?q=lume&expanded=true), extract it, and install the package manually.

## Using Lume

Once installed, you can start using Lume with these common workflows:
<Steps>
<Step>

### Run a Prebuilt VM

```bash
# Run a macOS Sequoia VM
lume run macos-sequoia-vanilla:latest

# Run an Ubuntu VM
lume run ubuntu-noble-vanilla:latest
```

<Callout type="info">
  We provide [prebuilt VM images](#prebuilt-images) in our [ghcr registry](https://github.com/orgs/trycua/packages).
</Callout>

</Step>

<Step>

### Create a Custom VM

```bash
# Create a new macOS VM
lume create my-macos-vm --cpu 4 --memory 8GB --disk-size 50GB

# Create a Linux VM
lume create my-linux-vm --os linux --cpu 2 --memory 4GB
```

<Callout type="info">
  **Disk Space**: The actual disk space used by sparse images will be much lower than the logical size listed. You can resize VM disks after creation using `lume set <name> --disk-size <size>`.
</Callout>

</Step>

<Step>

### Manage Your VMs

```bash
# List all VMs
lume ls

# Get VM details
lume get my-vm

# Stop a running VM
lume stop my-vm
```

</Step>
</Steps>
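Beyond resizing the disk, `lume set` can also adjust CPU and memory after a VM has been created. A quick sketch using the options from the CLI reference below (the VM name is illustrative):

```shell
# Grow the VM's disk; images are sparse, so physical usage stays low
lume set my-macos-vm --disk-size 100GB

# Bump CPU and memory before a heavy workload
lume set my-macos-vm --cpu 8 --memory 16GB
```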
## Prebuilt Images

Pre-built images are available in the registry [ghcr.io/trycua](https://github.com/orgs/trycua/packages).

| Image                   | Tag                 | Description                                                                                     | Logical Size |
| ----------------------- | ------------------- | ----------------------------------------------------------------------------------------------- | ------------ |
| `macos-sequoia-vanilla` | `latest`, `15.2`    | macOS Sequoia 15.2 image                                                                         | 20GB         |
| `macos-sequoia-xcode`   | `latest`, `15.2`    | macOS Sequoia 15.2 image with Xcode command line tools                                           | 22GB         |
| `macos-sequoia-cua`     | `latest`, `15.3`    | macOS Sequoia 15.3 image compatible with the Computer interface                                  | 24GB         |
| `ubuntu-noble-vanilla`  | `latest`, `24.04.1` | [Ubuntu Server for ARM 24.04.1 LTS](https://ubuntu.com/download/server/arm) with Ubuntu Desktop  | 20GB         |
## Lume CLI

```bash
lume <command>

Commands:
  lume create <name>            Create a new macOS or Linux VM
  lume run <name>               Run a VM
  lume ls                       List all VMs
  lume get <name>               Get detailed information about a VM
  lume set <name>               Modify VM configuration
  lume stop <name>              Stop a running VM
  lume delete <name>            Delete a VM
  lume pull <image>             Pull a macOS image from container registry
  lume push <name> <image:tag>  Push a VM image to a container registry
  lume clone <name> <new-name>  Clone an existing VM
  lume config                   Get or set lume configuration
  lume images                   List available macOS images in local cache
  lume ipsw                     Get the latest macOS restore image URL
  lume prune                    Remove cached images
  lume serve                    Start the API server

Options:
  --help     Show help [boolean]
  --version  Show version number [boolean]

Command Options:
  create:
    --os <os>            Operating system to install (macOS or linux, default: macOS)
    --cpu <cores>        Number of CPU cores (default: 4)
    --memory <size>      Memory size, e.g., 8GB (default: 4GB)
    --disk-size <size>   Disk size, e.g., 50GB (default: 40GB)
    --display <res>      Display resolution (default: 1024x768)
    --ipsw <path>        Path to IPSW file or 'latest' for macOS VMs
    --storage <name>     VM storage location to use

  run:
    --no-display               Do not start the VNC client app
    --shared-dir <dir>         Share directory with VM (format: path[:ro|rw])
    --mount <path>             For Linux VMs only, attach a read-only disk image
    --registry <url>           Container registry URL (default: ghcr.io)
    --organization <org>       Organization to pull from (default: trycua)
    --vnc-port <port>          Port to use for the VNC server (default: 0 for auto-assign)
    --recovery-mode <boolean>  For macOS VMs only, start VM in recovery mode (default: false)
    --storage <name>           VM storage location to use

  set:
    --cpu <cores>        New number of CPU cores (e.g., 4)
    --memory <size>      New memory size (e.g., 8192MB or 8GB)
    --disk-size <size>   New disk size (e.g., 40960MB or 40GB)
    --display <res>      New display resolution in format WIDTHxHEIGHT (e.g., 1024x768)
    --storage <name>     VM storage location to use

  delete:
    --force              Force deletion without confirmation
    --storage <name>     VM storage location to use

  pull:
    --registry <url>      Container registry URL (default: ghcr.io)
    --organization <org>  Organization to pull from (default: trycua)
    --storage <name>      VM storage location to use

  push:
    --additional-tags <tags...>  Additional tags to push the same image to
    --registry <url>             Container registry URL (default: ghcr.io)
    --organization <org>         Organization/user to push to (default: trycua)
    --storage <name>             VM storage location to use
    --chunk-size-mb <size>       Chunk size for disk image upload in MB (default: 512)
    --verbose                    Enable verbose logging
    --dry-run                    Prepare files and show plan without uploading
    --reassemble                 Verify integrity by reassembling chunks (requires --dry-run)

  get:
    -f, --format <format>  Output format (json|text)
    --storage <name>       VM storage location to use

  stop:
    --storage <name>     VM storage location to use

  clone:
    --source-storage <name>  Source VM storage location
    --dest-storage <name>    Destination VM storage location

  config:
    get                  Get current configuration
    storage              Manage VM storage locations
      add <name> <path>  Add a new VM storage location
      remove <name>      Remove a VM storage location
      list               List all VM storage locations
      default <name>     Set the default VM storage location
    cache                Manage cache settings
      get                Get current cache directory
      set <path>         Set cache directory
    caching              Manage image caching settings
      get                Show current caching status
      set <boolean>      Enable or disable image caching

  serve:
    --port <port>        Port to listen on (default: 7777)
```
## Common Workflows

### Development Environment Setup

```bash
# Create a development VM with more resources
lume create dev-vm --cpu 6 --memory 12GB --disk-size 100GB

# Run with shared directory for code
lume run dev-vm --shared-dir ~/Projects:rw
```

### Testing Different macOS Versions

```bash
# Pull and run different macOS versions
lume pull macos-sequoia-vanilla:latest
lume run macos-sequoia-vanilla:latest

# Clone a VM for testing
lume clone my-vm my-vm-test
```

### File Sharing Examples

```bash
# Share a read-only directory
lume run my-vm --shared-dir ~/Documents:ro

# Share multiple directories
lume run my-vm --shared-dir ~/Projects:rw --shared-dir ~/Downloads:ro

# For Linux VMs, mount additional disk images
lume run ubuntu-vm --mount ~/disk-image.img
```
## Local API Server

Lume exposes a local HTTP API server for programmatic VM management, perfect for automation and integration with other tools.

```bash
# Start the API server
lume serve
```

<Callout type="info">
  <span className="w-full">
    Read the documentation on the local API server.
  </span>
  <Link
    href="/home/libraries/lume/http-api"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    Lume API Server Documentation
    <ChevronRight size={18} />
  </Link>
</Callout>
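As a minimal sketch, you can query the server with `curl` once `lume serve` is running on the default port 7777. The `/lume/vms` route shown here is an assumption; verify the exact paths against the Lume API Server documentation linked above:

```shell
# List VMs via the local HTTP API (assumes `lume serve` is running on port 7777;
# the /lume/vms route is an assumption -- check the API server docs for exact paths)
curl http://localhost:7777/lume/vms
```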
## Development

If you're working on Lume in the context of the Cua monorepo, we recommend using the dedicated VS Code workspace configuration:

```bash
# Open VS Code workspace from the root of the monorepo
code .vscode/lume.code-workspace
```

This workspace is preconfigured with Swift language support, build tasks, and debug configurations.
## FAQ

### Can I run multiple VMs simultaneously?

Yes, you can run multiple VMs at the same time as long as your system has sufficient resources (CPU, memory, and disk space).
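When running several VMs headless, the `--vnc-port` and `--no-display` flags from the CLI reference keep the VNC servers on predictable, distinct ports. A sketch (VM names are illustrative):

```shell
# Launch two VMs side by side with fixed, distinct VNC ports
lume run my-macos-vm --vnc-port 5901 --no-display &
lume run my-linux-vm --vnc-port 5902 --no-display &
```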
### How do I share files between the host and VM?

Use the `--shared-dir` option when running a VM:

```bash
lume run my-vm --shared-dir ~/Projects:rw
```

The shared directory will be automatically mounted in the VM.

### Where are VM files stored?

By default, VMs are stored in `~/.lume/vms/`. You can configure additional storage locations using the `lume config storage` commands.
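The storage subcommands from the CLI reference can be combined like this (the location name and path are illustrative):

```shell
# List configured VM storage locations
lume config storage list

# Add an external drive as a storage location and make it the default
lume config storage add external /Volumes/External/lume-vms
lume config storage default external
```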
### How do I update Lume?

Run the install script again to update to the latest version:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```

---

<Callout type="info">
  **Need detailed API documentation?**{' '}
  <span className="w-full">
    Explore the complete API reference with detailed class documentation, and
    method signatures.
  </span>
  <a
    href="/api/lume"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    View API Reference
    <ChevronRight size={18} />
  </a>
</Callout>
---
title: Lumier
description: Run macOS and Linux virtual machines effortlessly in Docker containers with browser-based VNC access.
macos: true
linux: true
github:
  - https://github.com/trycua/cua/tree/main/libs/lumier
---

import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
import Link from 'next/link';
import { Step, Steps } from 'fumadocs-ui/components/steps';
## What is Lumier?

Lumier is a streamlined interface for running macOS and Linux virtual machines with minimal setup. It packages a pre-configured environment in Docker that connects to the `lume` virtualization service on your host machine.

<div align="center">
  <video
    src="https://github.com/user-attachments/assets/2ecca01c-cb6f-4c35-a5a7-69bc58bd94e2"
    width="800"
    controls></video>
</div>

### Features

- **Quick Setup** - Get a VM running in minutes
- **Browser Access** - VNC interface accessible from any browser
- **Easy File Sharing** - Seamless file transfer between host and VM
- **Simple Configuration** - Environment variables for easy customization
- **Hardware Acceleration** - Native virtualization using Apple's framework

<Callout type="info">
  Lumier uses Docker as a packaging system, not for isolation. It creates true virtual machines using Apple's Virtualization Framework through the Lume CLI.
</Callout>
## Installation

### Prerequisites

<Steps>
<Step>

### Install Docker for Apple Silicon

Download and install [Docker Desktop](https://desktop.docker.com/mac/main/arm64/Docker.dmg) for Mac.

Make sure Docker is running before proceeding to the next step.

</Step>

<Step>

### Install Lume Virtualization Service

Install [Lume](./lume/) with a single command:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```

<Callout type="info">
  Lume runs as a background service on port 7777. If this port is already in use, specify a different port with the `--port` option during installation.
</Callout>

</Step>
</Steps>
## Getting Started

### Quick Start

Run your first macOS VM with a single Docker command:

```bash
# Run a macOS VM with default settings
docker run -it \
  -e LUME_SERVER_URL="host.docker.internal:7777" \
  -p 5900:5900 \
  ghcr.io/trycua/lumier:latest
```

### Basic Configuration

Customize your VM with environment variables:

```bash
docker run -it \
  -e LUME_SERVER_URL="host.docker.internal:7777" \
  -e VM_NAME="my-dev-vm" \
  -e VM_CPUS="8" \
  -e VM_MEMORY="16384" \
  -e VM_STORAGE="100" \
  -e VNC_PASSWORD="mysecretpassword" \
  -p 5900:5900 \
  ghcr.io/trycua/lumier:latest
```

### Access Your VM

Once running, access your VM through:

1. **VNC Client**: Connect to `vnc://localhost:5900`
2. **Web Browser**: Navigate to `http://localhost:5900` (if using noVNC)
## Examples

### Ephemeral VM (Temporary)

Run a VM that resets on restart - perfect for testing:

```bash
docker run -it --rm \
  --name macos-vm \
  -p 8006:8006 \
  -e VM_NAME=macos-vm \
  -e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
  -e CPU_CORES=4 \
  -e RAM_SIZE=8192 \
  trycua/lumier:latest
```

<Callout type="info">
  Access your VM at `http://localhost:8006` after startup. Changes will be lost when the container stops.
</Callout>

### Persistent VM

Save your VM state between sessions with persistent storage:

```bash
# First, create a storage directory if it doesn't exist
mkdir -p storage

# Then run the container with persistent storage
docker run -it --rm \
  --name lumier-vm \
  -p 8006:8006 \
  -v $(pwd)/storage:/storage \
  -e VM_NAME=lumier-vm \
  -e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
  -e CPU_CORES=4 \
  -e RAM_SIZE=8192 \
  -e HOST_STORAGE_PATH=$(pwd)/storage \
  trycua/lumier:latest
```

### File Sharing

Share files between your host and VM:

```bash
# Create both storage and shared folders
mkdir -p storage shared

# Run with both persistent storage and a shared folder
docker run -it --rm \
  --name lumier-vm \
  -p 8006:8006 \
  -v $(pwd)/storage:/storage \
  -v $(pwd)/shared:/shared \
  -e VM_NAME=lumier-vm \
  -e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
  -e CPU_CORES=4 \
  -e RAM_SIZE=8192 \
  -e HOST_STORAGE_PATH=$(pwd)/storage \
  -e HOST_SHARED_PATH=$(pwd)/shared \
  trycua/lumier:latest
```

Files in the `shared` folder are accessible from both your Mac and the VM.
### Automation with Startup Scripts

Automate VM setup with startup scripts:

```bash
# Create the lifecycle directory in your shared folder
mkdir -p shared/lifecycle

# Create a sample on-logon.sh script
cat > shared/lifecycle/on-logon.sh << 'EOF'
#!/usr/bin/env bash

# Create a file on the desktop
echo "Hello from Lumier!" > /Users/lume/Desktop/hello_lume.txt

# You can add more commands to execute at VM startup
# For example:
# - Configure environment variables
# - Start applications
# - Mount network drives
# - Set up development environments
EOF

# Make the script executable
chmod +x shared/lifecycle/on-logon.sh
```

The script runs automatically on VM startup with access to:

- Home directory: `/Users/lume`
- Shared folder: `/Volumes/My Shared Files`
- All VM resources
### Docker Compose

For easier management, use Docker Compose:

```yaml
services:
  lumier:
    image: trycua/lumier:latest
    container_name: lumier-vm
    restart: unless-stopped
    ports:
      - '8006:8006' # Port for VNC access
    volumes:
      - ./storage:/storage # VM persistent storage
      - ./shared:/shared # Shared folder accessible in the VM
    environment:
      - VM_NAME=lumier-vm
      - VERSION=ghcr.io/trycua/macos-sequoia-cua:latest
      - CPU_CORES=4
      - RAM_SIZE=8192
      - HOST_STORAGE_PATH=${PWD}/storage
      - HOST_SHARED_PATH=${PWD}/shared
    stop_signal: SIGINT
    stop_grace_period: 2m
```

Run with Docker Compose:

```bash
# First create the required directories
mkdir -p storage shared

# Start the container
docker-compose up -d

# View the logs
docker-compose logs -f

# Stop the container when done
docker-compose down
```
## Advanced Topics

### Building from Source

Customize Lumier by building from source:

```bash
# Clone the repository
git clone https://github.com/trycua/cua.git
cd cua/libs/lumier

# Build the Docker image
docker build -t lumier-custom:latest .

# Run your custom build
docker run -it --rm \
  --name lumier-vm \
  -p 8006:8006 \
  -e VM_NAME=lumier-vm \
  -e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
  -e CPU_CORES=4 \
  -e RAM_SIZE=8192 \
  lumier-custom:latest
```

### Customization Options

The Dockerfile provides several customization points:

1. **Base image**: The container uses Debian Bullseye Slim as the base. You can modify this if needed.
2. **Installed packages**: You can add or remove packages in the apt-get install list.
3. **Hooks**: Check the `/run/hooks/` directory for scripts that run at specific points during the VM lifecycle.
4. **Configuration**: Review `/run/config/constants.sh` for default settings.

After making your modifications, you can build and push your custom image to your own Docker Hub repository:

```bash
# Build with a custom tag
docker build -t yourusername/lumier:custom .

# Push to Docker Hub (after docker login)
docker push yourusername/lumier:custom
```
### Configuration Reference

#### Environment Variables

| Variable            | Description                 | Default                     | Example                                   |
| ------------------- | --------------------------- | --------------------------- | ----------------------------------------- |
| `LUME_SERVER_URL`   | Lume service URL            | `host.docker.internal:7777` | `host.docker.internal:8080`               |
| `VM_NAME`           | Virtual machine name        | `lumier-vm`                 | `my-dev-vm`                               |
| `VERSION`           | VM image to use             | -                           | `ghcr.io/trycua/macos-sequoia-cua:latest` |
| `VM_CPUS`           | Number of CPU cores         | `4`                         | `8`                                       |
| `VM_MEMORY`         | Memory in MB                | `8192`                      | `16384`                                   |
| `VM_STORAGE`        | Storage size in GB          | `50`                        | `100`                                     |
| `VNC_PASSWORD`      | VNC access password         | -                           | `mysecretpassword`                        |
| `HOST_STORAGE_PATH` | Host path for VM storage    | -                           | `$(pwd)/storage`                          |
| `HOST_SHARED_PATH`  | Host path for shared folder | -                           | `$(pwd)/shared`                           |

#### Port Configuration

- **VNC Port**: `-p 5900:5900` for standard VNC access
- **Web Port**: `-p 8006:8006` for browser-based access
- Use different host ports if defaults are occupied: `-p 8007:8006`
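For instance, if port 8006 is already taken on the host, remap only the host side of the mapping; this sketch reuses the ephemeral-VM example from above:

```shell
# Host port 8007 -> container port 8006; access the VM at http://localhost:8007
docker run -it --rm \
  --name lumier-vm \
  -p 8007:8006 \
  -e VM_NAME=lumier-vm \
  -e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
  -e CPU_CORES=4 \
  -e RAM_SIZE=8192 \
  trycua/lumier:latest
```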
## Resources

<Cards>
  <Card
    title="Lume Documentation"
    description="Learn more about the virtualization service powering Lumier"
    href="/home/libraries/lume"
  />
  <Card
    title="Computer Library"
    description="Automate your VMs with the Computer library"
    href="/home/libraries/computer"
  />
</Cards>
---

<Callout type="info">
  **Need detailed API documentation?**{' '}
  <span className="w-full">
    Explore the complete API reference with detailed class documentation, and
    method signatures.
  </span>
  <a
    href="/api/lumier"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    View API Reference
    <ChevronRight size={18} />
  </a>
</Callout>
---
title: MCP Server
description: Model Context Protocol server for Computer-Use Agent integration
pypi: cua-mcp-server
macos: true
linux: true
windows: true
github:
  - https://github.com/trycua/cua/tree/main/libs/python/mcp-server
---

import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
import { Step, Steps } from 'fumadocs-ui/components/steps';

**MCP Server** enables Computer-Use Agent (CUA) integration with Claude Desktop and other Model Context Protocol (MCP) clients, providing seamless access to computer automation capabilities through a standardized interface.
## Features

- **MCP Integration** - Connect CUA to Claude Desktop and other MCP-compatible clients
- **Computer Control** - Full screen, keyboard, and mouse automation capabilities
- **Tool System** - Execute commands, take screenshots, and interact with applications
- **Easy Setup** - Simple configuration with Claude Desktop or any MCP client

## Installation

### Prerequisites

Before installing the MCP server, ensure you have:

1. **Lume CLI** installed and configured
2. **macOS CUA image** pulled and ready
3. **Python 3.10+** installed on your system

<Callout type="info">
  Follow our [Cua Usage Guide](../guides/cua-usage-guide.mdx) for help setting everything up.
</Callout>
### Install via pip

```bash
pip install cua-mcp-server
```

This will install:

- The MCP server
- CUA agent and computer dependencies
- An executable `cua-mcp-server` script in your PATH

### Install Script

For automated installation, use our setup script:

```bash
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/mcp-server/scripts/install_mcp_server.sh | bash
```

This script:

- Creates the `~/.cua` directory
- Generates a startup script at `~/.cua/start_mcp_server.sh`
- Manages Python virtual environments automatically
- Installs and updates the cua-mcp-server package
## Getting Started

After running the install script, reference the generated startup script in your MCP configuration like this:

```json
{
  "mcpServers": {
    "cua-agent": {
      "command": "/bin/bash",
      "args": ["~/.cua/start_mcp_server.sh"],
      "env": {
        "CUA_AGENT_LOOP": "OMNI",
        "CUA_MODEL_PROVIDER": "ANTHROPIC",
        "CUA_MODEL_NAME": "claude-3-7-sonnet-20250219",
        "CUA_PROVIDER_API_KEY": "your-api-key"
      }
    }
  }
}
```

### Development Config

If you want to develop with the cua-mcp-server directly without installation, you can use this configuration:

```json
{
  "mcpServers": {
    "cua-agent": {
      "command": "/bin/bash",
      "args": ["~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"],
      "env": {
        "CUA_AGENT_LOOP": "UITARS",
        "CUA_MODEL_PROVIDER": "OAICOMPAT",
        "CUA_MODEL_NAME": "ByteDance-Seed/UI-TARS-1.5-7B",
        "CUA_PROVIDER_BASE_URL": "https://****************.us-east-1.aws.endpoints.huggingface.cloud/v1",
        "CUA_PROVIDER_API_KEY": "your-api-key"
      }
    }
  }
}
```

This configuration:

- Uses the start_mcp_server.sh script, which automatically sets up the Python path and runs the server module
- Works with Claude Desktop, Cursor, or any other MCP client
- Automatically uses your development code without requiring installation

Just add this to your MCP client's configuration and it will use your local development version of the server.
### Environment Variables

The MCP server is configured using the following environment variables.

| Variable                | Description                                           | Default                 |
| ----------------------- | ----------------------------------------------------- | ----------------------- |
| `CUA_AGENT_LOOP`        | Agent loop to use (OPENAI, ANTHROPIC, UITARS, OMNI)   | OMNI                    |
| `CUA_MODEL_PROVIDER`    | Model provider (ANTHROPIC, OPENAI, OLLAMA, OAICOMPAT) | ANTHROPIC               |
| `CUA_MODEL_NAME`        | Model name to use                                     | None (provider default) |
| `CUA_PROVIDER_BASE_URL` | Base URL for provider API                             | None                    |
| `CUA_MAX_IMAGES`        | Maximum number of images to keep in context           | 3                       |
### Usage

Once configured, you can simply ask the model to perform computer tasks:

- "Open Chrome and go to github.com"
- "Create a folder called 'Projects' on my desktop"
- "Find all PDFs in my Downloads folder"
- "Take a screenshot and highlight the error message"

The model will automatically use your CUA agent to perform these tasks.
## Available Tools

The MCP server exposes the following tools to Claude:

1. `run_cua_task` - Run a single Computer-Use Agent task with the given instruction
2. `run_multi_cua_tasks` - Run multiple tasks in sequence
## Integrations

### Claude Desktop

To use with Claude Desktop, add an entry to your Claude Desktop configuration (`claude_desktop_config.json`, typically found in `~/.config/claude-desktop/`):
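A minimal entry might look like the following, mirroring the Getting Started configuration above (the loop, provider, and API key values are placeholders to adapt):

```json
{
  "mcpServers": {
    "cua-agent": {
      "command": "/bin/bash",
      "args": ["~/.cua/start_mcp_server.sh"],
      "env": {
        "CUA_AGENT_LOOP": "OMNI",
        "CUA_MODEL_PROVIDER": "ANTHROPIC",
        "CUA_PROVIDER_API_KEY": "your-api-key"
      }
    }
  }
}
```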
For more information on MCP with Claude Desktop, see the [official MCP User Guide](https://modelcontextprotocol.io/quickstart/user).
### Cursor

To use with Cursor, add an MCP configuration file in one of these locations:

- **Project-specific**: Create `.cursor/mcp.json` in your project directory
- **Global**: Create `~/.cursor/mcp.json` in your home directory

After configuration, you can simply tell Cursor's Agent to perform computer tasks by explicitly mentioning the CUA agent, such as "Use the computer control tools to open Safari."

For more information on MCP with Cursor, see the [official Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol).
## Troubleshooting

Ensure you have valid API keys:

- Add your Anthropic API key, or another model provider's API key, in the Claude Desktop config (as shown above)
- Or set it as an environment variable in your shell profile

If you get a `/bin/bash: ~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, change the path to the script to be absolute instead of relative.

View MCP server logs:

```bash
tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
```
---

<Callout type="info">
  **Need detailed API documentation?**
  <span className="w-full">
    Explore the complete API reference with detailed class documentation, and
    method signatures.
  </span>
  <a
    href="/api/mcp-server"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    View API Reference
    <ChevronRight size={18} />
  </a>
</Callout>
{
  "title": "Libraries",
  "description": "Libraries",
  "icon": "Library",
  "pages": [
    "agent",
    "computer",
    "computer-server",
    "cloud",
    "core",
    "lume",
    "lumier",
    "mcp-server",
    "som"
  ]
}
---
title: Set-of-Mark
description: A high-performance visual grounding library for detecting and analyzing UI elements in screenshots.
macos: true
windows: true
linux: true
pypi: cua-som
github:
  - https://github.com/trycua/cua/tree/main/libs/python/som
---

import { buttonVariants } from 'fumadocs-ui/components/ui/button';
import { cn } from 'fumadocs-ui/utils/cn';
import { ChevronRight } from 'lucide-react';
## Overview
|
||||
|
||||
**Set-of-Mark (Som)** is a high-performance visual grounding library for detecting and analyzing UI elements in screenshots. Built for the Computer-Use Agent (CUA) framework, it combines state-of-the-art computer vision models to identify icons, buttons, and text in user interfaces.
|
||||
|
||||
<Callout type="info">
|
||||
Som is optimized for **Apple Silicon** with Metal Performance Shaders (MPS)
|
||||
acceleration, achieving sub-second detection times while maintaining high
|
||||
accuracy.
|
||||
</Callout>
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Hardware Acceleration** - Automatic detection of MPS, CUDA, or CPU
|
||||
- **Multi-Model Architecture** - YOLO for icons + EasyOCR for text
|
||||
- **Optimized Performance** - Sub-second detection on Apple Silicon
|
||||
- **Flexible Configuration** - Tunable thresholds for different use cases
|
||||
- **Rich Output Format** - Structured data with confidence scores
|
||||
- **Visual Debugging** - Annotated screenshots with numbered elements
|
||||
|
||||
## Installation
|
||||
|
||||
### Install from PyPI
|
||||
|
||||
```bash
|
||||
pip install cua-som
|
||||
```
|
||||
|
||||
<Callout type="warning">
|
||||
Som requires Python 3.11 or higher. For best performance, use macOS with Apple
|
||||
Silicon.
|
||||
</Callout>
|
||||
|
||||
### Install from Source
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone https://github.com/cua/som.git
|
||||
cd som
|
||||
|
||||
# Using PDM (recommended)
|
||||
pdm install
|
||||
|
||||
# Or using pip
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
### System Requirements
|
||||
|
||||
| Platform | Hardware | Detection Time |
|
||||
| -------- | ------------------------ | -------------- |
|
||||
| macOS | Apple Silicon (M1/M2/M3) | ~0.4s |
|
||||
| Any | CPU only | ~1.3s |
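The hardware tier above is chosen automatically at runtime. A minimal sketch of how such a device-selection step can look (an illustration only, not Som's actual internals; the `pick_device` helper is hypothetical):

```python
def pick_device() -> str:
    """Return the best available accelerator name: 'mps', 'cuda', or 'cpu'."""
    try:
        import torch  # optional dependency; fall back to CPU if missing
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # guard older torch builds
    if mps is not None and mps.is_available():  # Apple Silicon Metal backend
        return "mps"
    if torch.cuda.is_available():               # NVIDIA GPU
        return "cuda"
    return "cpu"

print(pick_device())
```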

## Getting Started

### Basic Usage

Here's a simple example to detect UI elements in a screenshot:

```python
from som import OmniParser
from PIL import Image

# Initialize the parser
parser = OmniParser()

# Load and process an image
image = Image.open("screenshot.png")
result = parser.parse(
    image,
    box_threshold=0.3,  # Confidence threshold
    iou_threshold=0.1,  # Overlap threshold
    use_ocr=True        # Enable text detection
)

# Print detected elements
for elem in result.elements:
    if elem.type == "icon":
        print(f"Icon: confidence={elem.confidence:.3f}, bbox={elem.bbox.coordinates}")
    else:  # text
        print(f"Text: '{elem.content}', confidence={elem.confidence:.3f}")
```

### Advanced Configuration

Customize detection parameters for your specific use case:

```python
result = parser.parse(
    image,
    box_threshold=0.3,  # Confidence threshold (0.0-1.0)
    iou_threshold=0.1,  # Overlap threshold (0.0-1.0)
    use_ocr=True,       # Enable text detection
)
```

## Configuration Guide

### Box Thresholds

Controls detection confidence (default: 0.3).

- **Higher values (0.4-0.5)**: More precise, fewer false positives
- **Lower values (0.1-0.2)**: More detections, may include noise
- **Recommended**: 0.3 for balanced performance
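The effect of `box_threshold` is simply to drop low-confidence detections. A pure-Python sketch of that filtering step (illustrative only; the detection tuples are mock data, not Som's real output type):

```python
# Mock detections: (element type, confidence score)
detections = [
    ("icon", 0.92),
    ("icon", 0.41),
    ("text", 0.27),
    ("icon", 0.12),
]

def filter_by_confidence(dets, box_threshold=0.3):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in dets if d[1] >= box_threshold]

print(len(filter_by_confidence(detections, 0.3)))   # → 2
print(len(filter_by_confidence(detections, 0.05)))  # → 4
```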

### Intersection Over Union (IOU) Thresholds

Set the `iou_threshold` parameter to control when overlapping element boxes should be merged into a single detection. A value of 0.1-0.2 is recommended for most use cases; higher values require more overlap before merging occurs.
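IOU is the area of the boxes' intersection divided by the area of their union. A small sketch of the computation and the resulting merge decision (a generic illustration assuming `(x1, y1, x2, y2)` boxes, not Som's internal code):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

box1, box2 = (30, 30, 110, 90), (50, 40, 130, 100)
overlap = iou(box1, box2)
print(f"IOU = {overlap:.2f}, merge = {overlap >= 0.1}")  # → IOU = 0.45, merge = True
```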

<div class="flex gap-x-6">

<IOU
  title="Low Overlap (Keep Both)"
  description="When boxes have minimal overlap (IOU ~ 0.05), both detections are kept as separate elements."
  rect1={{
    left: 30,
    top: 30,
    width: 60,
    height: 50,
    fill: 'rgba(0, 0, 255, 0.6)',
    name: 'box1',
  }}
  rect2={{
    left: 80,
    top: 70,
    width: 60,
    height: 50,
    fill: 'rgba(255, 165, 0, 0.6)',
    name: 'box2',
  }}
/>

<IOU
  title="High Overlap (Merge)"
  description="When boxes overlap significantly (IOU ~ 0.4), they are merged into a single detection to avoid duplicates."
  rect1={{
    left: 30,
    top: 30,
    width: 80,
    height: 60,
    fill: 'rgba(0, 0, 255, 0.6)',
    name: 'box1',
  }}
  rect2={{
    left: 50,
    top: 40,
    width: 80,
    height: 60,
    fill: 'rgba(255, 165, 0, 0.6)',
    name: 'box2',
  }}
/>

</div>

## Performance

<Cards>
  <Card title="Metal Performance Shaders (Apple Silicon)" description="Best performance on macOS">
    - Multi-scale detection (640px, 1280px, 1920px)
    - Test-time augmentation enabled
    - Half-precision (FP16)
    - ~0.4s average detection time
    - Best for production use
  </Card>

  <Card title="CPU Fallback" description="Universal compatibility">
    - Single-scale detection (1280px)
    - Full precision (FP32)
    - ~1.3s average detection time
    - Reliable fallback option
  </Card>
</Cards>

---

<Callout type="info">
  **Need the full API documentation?**
  <span className="w-full">
    Explore the complete API reference, with detailed class documentation and
    method signatures.
  </span>
  <a
    href="/api/som"
    className={cn(
      buttonVariants({
        color: 'secondary',
      }),
      'no-underline h-10'
    )}>
    View API Reference
    <ChevronRight size={18} />
  </a>
</Callout>

@@ -5,17 +5,14 @@
  "defaultOpen": true,
  "pages": [
    "index",
    "compatibility",
    "faq",
    "quickstart-ui",
    "quickstart-devs",
    "telemetry",
    "---[BookCopy]Guides---",
    "guides/cua-usage-guide",
    "guides/developer-guide",
    "guides/dev-container-setup",
    "guides/computer-use-agent-quickstart",
    "guides/agent-gradio-ui",
    "guides/computer-use-gradio-ui",
    "---[Library]Libraries---",
    "---[BookCopy]Computer Playbook---",
    "...computer-sdk",
    "---[BookCopy]Agent Playbook---",
    "...agent-sdk",
    "---[CodeXml]API Reference---",
    "...libraries"
  ]
}
68
docs/content/docs/home/quickstart-devs.mdx
Normal file
@@ -0,0 +1,68 @@

---
title: Quickstart (for Developers)
description: Get started with c/ua in 5 steps
icon: Rocket
---

Get up and running with c/ua in 5 simple steps.

## 1. Introduction

c/ua combines Computer (the interface) with Agent (the AI) for automating desktop apps. Computer handles clicks and typing; Agent provides the intelligence.

## 2. Create Your First c/ua Container

1. Go to [trycua.com/signin](https://www.trycua.com/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Medium, Ubuntu 22** container
4. Note your container name and API key

## 3. Install c/ua

```bash
pip install "cua-agent2[all]" cua-computer
```

## 4. Using Computer

```python
from computer import Computer

async with Computer(
    os_type="linux",
    provider_type="cloud",
    name="your-container-name",
    api_key="your-api-key"
) as computer:
    # Take a screenshot
    screenshot = await computer.interface.screenshot()

    # Click and type
    await computer.interface.left_click(100, 100)
    await computer.interface.type("Hello!")
```

## 5. Using Agent

```python
from agent2 import ComputerAgent

# `computer` is the Computer instance from step 4
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    max_trajectory_budget=5.0
)

messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])
```
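Each `result` yielded by the agent carries a list of output items, and the loop above keeps only the message items. A pure-Python sketch of that extraction step, using a mock result dict shaped like the snippet's (the exact item fields here are assumptions based on the example, not a documented schema):

```python
# Mock result shaped like the snippet above: output items tagged by type
result = {
    "output": [
        {"type": "computer_call", "action": {"type": "screenshot"}},
        {"type": "message", "content": [{"type": "output_text", "text": "I see a desktop."}]},
    ]
}

def extract_messages(result):
    """Collect the text of every message item in a result's output list."""
    return [
        part["text"]
        for item in result["output"]
        if item["type"] == "message"
        for part in item["content"]
        if "text" in part
    ]

print(extract_messages(result))  # → ['I see a desktop.']
```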

## Next Steps

- Explore the [SDK documentation](/docs/sdk) for advanced features
- Learn about [trajectory tracking and callbacks](/docs/concepts)
- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for support
43
docs/content/docs/home/quickstart-ui.mdx
Normal file
@@ -0,0 +1,43 @@

---
title: Quickstart (GUI)
description: Get started with the c/ua Agent UI in 5 steps
icon: Rocket
---

Get up and running with the c/ua Agent UI in 5 simple steps.

## 1. Introduction

c/ua combines Computer (the interface) with Agent (the AI) for automating desktop apps. The Agent UI provides a simple chat interface to control your remote computer using natural language.

## 2. Create Your First c/ua Container

1. Go to [trycua.com/signin](https://www.trycua.com/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Medium, Ubuntu 22** container
4. Note your container name and API key

## 3. Install c/ua

```bash
pip install "cua-agent2[all]" cua-computer
```

## 4. Run the Agent UI

```bash
python -m agent.ui
```

## 5. Start Chatting

Open your browser to the displayed URL and start chatting with your computer-using agent.

You can ask your agent to perform actions like:

- "Open Firefox and go to github.com"
- "Take a screenshot and tell me what's on the screen"
- "Type 'Hello world' into the terminal"

---

For advanced Python usage, see the [Quickstart for Developers](/docs/quickstart-devs).