mirror of
https://github.com/trycua/computer.git
synced 2026-01-04 04:19:57 -06:00
Merge branch 'main' into feat/add-desktop-commands
@@ -3,7 +3,13 @@ title: Agent Loops
description: Supported computer-using agent loops and models
---

<Callout>
  A corresponding{' '}
  <a href="https://github.com/trycua/cua/blob/main/notebooks/agent_nb.ipynb" target="_blank">
    Jupyter Notebook
  </a>{' '}
  is available for this documentation.
</Callout>

An agent can be thought of as a loop that generates actions, executes them, and repeats until the task is done:

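The generate/execute/repeat cycle can be sketched in plain Python. This is a generic illustration, not the SDK's implementation: `generate_action` and `execute` are hypothetical stand-ins for the model call and the computer tool, and returning `None` from `generate_action` is an assumed "task complete" signal.

```python
from typing import Callable, List, Optional


def run_agent_loop(
    generate_action: Callable[[List[str]], Optional[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> List[str]:
    """Generate an action, execute it, record the result, and repeat.

    Stops when generate_action returns None (the model reports it is done)
    or after max_steps iterations, whichever comes first.
    """
    history: List[str] = []
    for _ in range(max_steps):
        action = generate_action(history)  # hypothetical model call
        if action is None:
            break
        history.append(execute(action))  # hypothetical computer tool
    return history


# Scripted "model" that takes two actions and then reports completion.
script = iter(["screenshot", "click(100, 200)", None])
results = run_agent_loop(lambda history: next(script), lambda action: f"done: {action}")
print(results)  # ['done: screenshot', 'done: click(100, 200)']
```

The `max_steps` cap plays the same role as a step budget: it guarantees the loop ends even if the model never signals completion.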
@@ -102,7 +108,7 @@ messages = [
        "content": "Take a screenshot and describe what you see"
    },
    {
        "role": "assistant",
        "content": "I'll take a screenshot for you."
    }
]

@@ -4,13 +4,14 @@ description: Computer Agent SDK benchmarks for agentic GUI tasks
---

The benchmark system evaluates models on GUI grounding tasks, specifically agent loop success rate and click prediction accuracy. It supports both:

- **Computer Agent SDK providers** (using model strings like `"huggingface-local/HelloKKMe/GTA1-7B"`)
- **Reference agent implementations** (custom model classes implementing the `ModelProtocol`)

## Available Benchmarks

- **[ScreenSpot-v2](./benchmarks/screenspot-v2)** - Standard-resolution GUI grounding
- **[ScreenSpot-Pro](./benchmarks/screenspot-pro)** - High-resolution GUI grounding
- **[Interactive Testing](./benchmarks/interactive)** - Real-time testing and visualization

## Quick Start

@@ -8,6 +8,7 @@ The Cua agent framework uses benchmarks to test the performance of supported mod
## Benchmark Types

Computer-Agent benchmarks evaluate two key capabilities:

- **Plan Generation**: Breaking down complex tasks into a sequence of actions
- **Coordinate Generation**: Predicting precise click locations on GUI elements

@@ -31,7 +32,7 @@ agent.run("Open Firefox and go to github.com")

### Coordinate Generation Only

**[GUI Agent Grounding Leaderboard](https://gui-agent.github.io/grounding-leaderboard/)** - Benchmark for click prediction accuracy

This leaderboard tests models that specialize in finding exactly where to click on screen elements but need to be told what specific action to take.

@@ -41,7 +42,7 @@ This leaderboard tests models that specialize in finding exactly where to click
agent = ComputerAgent("huggingface-local/HelloKKMe/GTA1-7B", tools=[computer])
agent.predict_click("find the button to open the settings")  # (27, 450)
# This will raise an error:
# agent.run("Open Firefox and go to github.com")
```

### Composed Agent

@@ -5,4 +5,4 @@ description: Benchmark ComputerAgent on OSWorld tasks using HUD

OSWorld-Verified is a curated subset of OSWorld tasks that can be run using the HUD framework.

Use [ComputerAgent with HUD](../integrations/hud) to benchmark on these tasks.

@@ -18,8 +18,8 @@ python ss-pro.py --samples 50

## Results

| Model       | Accuracy | Failure Rate | Samples |
| ----------- | -------- | ------------ | ------- |
| Coming Soon | -        | -            | -       |

Results will be populated after running benchmarks with various models.

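One way to read the `Accuracy` and `Failure Rate` columns is sketched below. It assumes, as ScreenSpot-style grounding benchmarks commonly do, that a click counts as correct when it lands inside the target element's bounding box, and that a "failure" is a sample where the model produced no usable coordinate. `score_clicks` is a hypothetical helper; the benchmark scripts may define these metrics differently.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in screen pixels


def score_clicks(
    predictions: Sequence[Optional[Point]],
    targets: Sequence[BBox],
) -> Tuple[float, float]:
    """Return (accuracy, failure_rate) for paired predictions and targets.

    A prediction is a hit if it falls inside the target box (edges included).
    None stands for a sample where the model produced no usable coordinate.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0, 0.0
    hits = failures = 0
    for point, (x1, y1, x2, y2) in zip(predictions, targets):
        if point is None:
            failures += 1
        elif x1 <= point[0] <= x2 and y1 <= point[1] <= y2:
            hits += 1
    total = len(targets)
    return hits / total, failures / total


accuracy, failure_rate = score_clicks(
    [(27, 450), (500, 10), None, (105, 205)],
    [(0, 440, 60, 470), (0, 0, 100, 100), (0, 0, 10, 10), (100, 200, 110, 210)],
)
print(accuracy, failure_rate)  # 0.5 0.25
```

Under this reading, a miss (a coordinate outside the box) lowers accuracy without counting as a failure.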
@@ -18,8 +18,8 @@ python ss-v2.py --samples 100

## Results

| Model       | Accuracy | Failure Rate | Samples |
| ----------- | -------- | ------------ | ------- |
| Coming Soon | -        | -            | -       |

Results will be populated after running benchmarks with various models.

@@ -10,30 +10,39 @@ Callbacks provide hooks into the agent lifecycle for extensibility. They're call
## Callback Lifecycle

### 1. `on_run_start(kwargs, old_items)`

Called once when an agent run begins. Initialize tracking, logging, or state.

### 2. `on_run_continue(kwargs, old_items, new_items)` → bool

Called before each iteration. Return `False` to stop execution (e.g., when a budget limit is reached).

### 3. `on_llm_start(messages)` → messages

Preprocess messages before the LLM call, e.g., for PII anonymization or image retention.

### 4. `on_api_start(kwargs)`

Called before each LLM API call.

### 5. `on_api_end(kwargs, result)`

Called after each LLM API call completes.

### 6. `on_usage(usage)`

Called when usage information is received from the LLM.

### 7. `on_llm_end(messages)` → messages

Postprocess messages after the LLM call, e.g., for PII deanonymization.

### 8. `on_responses(kwargs, responses)`

Called when responses are received from the agent loop.

### 9. Response-specific hooks

- `on_text(item)` - Text messages
- `on_computer_call_start(item)` - Before computer actions
- `on_computer_call_end(item, result)` - After computer actions
@@ -42,4 +51,5 @@ Called when responses are received from agent loop.
- `on_screenshot(screenshot, name)` - When screenshots are taken

### 10. `on_run_end(kwargs, old_items, new_items)`

Called when the agent run completes. Finalize tracking or save trajectories.

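The lifecycle order above can be checked with a toy driver. This sketch is not the SDK's loop: `RecordingCallback` and `toy_run` are hypothetical, the callback is a standalone class with the same hook names (in the SDK you would subclass `AsyncCallbackHandler`), and only a subset of hooks is fired; the real loop also calls `on_api_start`/`on_api_end`, `on_usage`, `on_responses`, and the response-specific hooks.

```python
import asyncio
from typing import Any, Dict, List


class RecordingCallback:
    """Records which hooks fire; stops the run after max_iterations LLM turns."""

    def __init__(self, max_iterations: int = 2) -> None:
        self.events: List[str] = []
        self.max_iterations = max_iterations
        self.iterations = 0

    async def on_run_start(self, kwargs, old_items) -> None:
        self.events.append("on_run_start")

    async def on_run_continue(self, kwargs, old_items, new_items) -> bool:
        self.events.append("on_run_continue")
        # Returning False ends the run, e.g. once a step or cost budget is spent.
        return self.iterations < self.max_iterations

    async def on_llm_start(self, messages):
        self.events.append("on_llm_start")
        return messages

    async def on_llm_end(self, messages):
        self.events.append("on_llm_end")
        self.iterations += 1
        return messages

    async def on_run_end(self, kwargs, old_items, new_items) -> None:
        self.events.append("on_run_end")


async def toy_run(cb: RecordingCallback) -> None:
    """Toy driver calling a subset of the hooks in the documented order."""
    kwargs: Dict[str, Any] = {"model": "example-model"}
    old_items: List[Dict[str, Any]] = []
    new_items: List[Dict[str, Any]] = []
    await cb.on_run_start(kwargs, old_items)
    while await cb.on_run_continue(kwargs, old_items, new_items):
        messages = await cb.on_llm_start(old_items + new_items)
        # A real loop would call the model here (on_api_start/on_api_end, on_usage).
        await cb.on_llm_end(messages)
        new_items.append({"role": "assistant", "content": "placeholder reply"})
    await cb.on_run_end(kwargs, old_items, new_items)


cb = RecordingCallback(max_iterations=2)
asyncio.run(toy_run(cb))
print(cb.events)
```

Because `on_run_continue` is checked before every iteration, it fires once more than the number of completed turns: the final call is the one that returns `False`.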
@@ -36,6 +36,7 @@ agent = ComputerAgent(
```

**Or with options:**

```python
# Advanced budget configuration
agent = ComputerAgent(
@@ -15,7 +15,7 @@ Built-in callbacks can be used as follows:
```python
from agent.callbacks import (
    ImageRetentionCallback,
    TrajectorySaverCallback,
    BudgetManagerCallback,
    LoggingCallback
)
@@ -52,12 +52,12 @@ class CustomCallback(AsyncCallbackHandler):
        """Preprocess messages before LLM call"""
        # Add custom preprocessing logic
        return messages

    async def on_llm_end(self, messages):
        """Postprocess messages after LLM call"""
        # Add custom postprocessing logic
        return messages

    async def on_usage(self, usage):
        """Track usage information"""
        print(f"Tokens used: {usage.get('total_tokens')}")

@@ -18,7 +18,7 @@ agent = ComputerAgent(
    tools=[computer],
    callbacks=[
        LoggingCallback(
            logger=logging.getLogger("cua"),
            level=logging.INFO
        )
    ]
@@ -47,7 +47,7 @@ class CustomLogger(AsyncCallbackHandler):
    def __init__(self, logger_name="agent"):
        self.logger = logging.getLogger(logger_name)
        self.logger.setLevel(logging.INFO)

        # Add console handler
        handler = logging.StreamHandler()
        formatter = logging.Formatter(
@@ -55,18 +55,18 @@ class CustomLogger(AsyncCallbackHandler):
        )
        handler.setFormatter(formatter)
        self.logger.addHandler(handler)

    async def on_run_start(self, kwargs, old_items):
        self.logger.info(f"Agent run started with model: {kwargs.get('model')}")

    async def on_computer_call_start(self, item):
        action = item.get('action', {})
        self.logger.info(f"Computer action: {action.get('type')}")

    async def on_usage(self, usage):
        cost = usage.get('response_cost', 0)
        self.logger.info(f"API call cost: ${cost:.4f}")

    async def on_run_end(self, kwargs, old_items, new_items):
        self.logger.info("Agent run completed")

@@ -81,6 +81,7 @@ agent = ComputerAgent(
## Available Hooks

Log any agent event using these callback methods:

- `on_run_start/end` - Run lifecycle
- `on_computer_call_start/end` - Computer actions
- `on_api_start/end` - LLM API calls

@@ -40,6 +40,7 @@ View trajectories in the browser at:
**[trycua.com/trajectory-viewer](http://trycua.com/trajectory-viewer)**

The viewer provides:

- Interactive conversation replay
- Screenshot galleries
- No data collection
@@ -47,11 +48,13 @@ The viewer provides:
## Trajectory Structure

Trajectories are saved with:

- Complete conversation history
- Usage statistics and costs
- Timestamps and metadata
- Screenshots and computer actions

Each trajectory contains:

- **metadata.json**: Run info, timestamps, usage stats (`total_tokens`, `response_cost`)
- **turn_000/**: Turn-by-turn conversation history (API calls, responses, computer calls, screenshots)

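A saved trajectory can be inspected with the standard library alone. This sketch assumes only the layout listed above (a `metadata.json` file plus `turn_NNN/` subdirectories); `load_trajectory` is a hypothetical helper, and the `{"total_tokens": 1500}` metadata in the demo is a placeholder, since the real file's fields may be nested differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_trajectory(traj_dir: str) -> Tuple[Dict[str, Any], List[str]]:
    """Return (metadata, turn directory names) for one saved trajectory.

    Assumes a metadata.json file plus one turn_NNN/ subdirectory per turn;
    the zero-padded names make a plain sort chronological.
    """
    root = Path(traj_dir)
    metadata = json.loads((root / "metadata.json").read_text())
    turns = sorted(p.name for p in root.glob("turn_*") if p.is_dir())
    return metadata, turns


def demo() -> Tuple[Dict[str, Any], List[str]]:
    # Build a throwaway directory with the assumed layout and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "metadata.json").write_text(json.dumps({"total_tokens": 1500}))
        for name in ("turn_001", "turn_000"):
            (root / name).mkdir()
        return load_trajectory(tmp)


print(demo())  # ({'total_tokens': 1500}, ['turn_000', 'turn_001'])
```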
@@ -53,67 +53,67 @@ from typing import Literal, List, Dict, Union, Optional

class MyCustomComputer(AsyncComputerHandler):
    """Custom computer handler implementation."""

    def __init__(self):
        # Initialize your custom computer interface here
        pass

    # ==== Computer-Use-Preview Action Space ====

    async def get_environment(self) -> Literal["windows", "mac", "linux", "browser"]:
        """Get the current environment type."""
        ...

    async def get_dimensions(self) -> tuple[int, int]:
        """Get screen dimensions as (width, height)."""
        ...

    async def screenshot(self) -> str:
        """Take a screenshot and return as base64 string."""
        ...

    async def click(self, x: int, y: int, button: str = "left") -> None:
        """Click at coordinates with specified button."""
        ...

    async def double_click(self, x: int, y: int) -> None:
        """Double click at coordinates."""
        ...

    async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
        """Scroll at coordinates with specified scroll amounts."""
        ...

    async def type(self, text: str) -> None:
        """Type text."""
        ...

    async def wait(self, ms: int = 1000) -> None:
        """Wait for specified milliseconds."""
        ...

    async def move(self, x: int, y: int) -> None:
        """Move cursor to coordinates."""
        ...

    async def keypress(self, keys: Union[List[str], str]) -> None:
        """Press key combination."""
        ...

    async def drag(self, path: List[Dict[str, int]]) -> None:
        """Drag along specified path."""
        ...

    async def get_current_url(self) -> str:
        """Get current URL (for browser environments)."""
        ...

    # ==== Anthropic Action Space ====

    async def left_mouse_down(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
        """Left mouse down at coordinates."""
        ...

    async def left_mouse_up(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
        """Left mouse up at coordinates."""
        ...
@@ -127,4 +127,4 @@ agent = ComputerAgent(
)

async for _ in agent.run("Take a screenshot and click at coordinates 100, 200"):
    pass
```

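The computer-use-preview method names above (`click`, `type`, `keypress`, and so on) line up with action `type` values, so one way to drive a handler is to route each action dict to the method of the same name. The sketch assumes actions shaped like `{"type": "click", "x": 100, "y": 200}` whose keys match the method parameters; `dispatch` and `RecordingComputer` are hypothetical illustrations, not SDK APIs, and the SDK may route actions differently internally.

```python
import asyncio
from typing import Any, Dict, List, Union


class RecordingComputer:
    """Stand-in handler that records calls instead of driving a real screen."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    async def click(self, x: int, y: int, button: str = "left") -> None:
        self.calls.append(("click", x, y, button))

    async def type(self, text: str) -> None:
        self.calls.append(("type", text))

    async def keypress(self, keys: Union[List[str], str]) -> None:
        self.calls.append(("keypress", keys))


async def dispatch(handler: Any, action: Dict[str, Any]) -> Any:
    """Call the handler method named by action["type"] with the remaining keys."""
    params = dict(action)  # copy so the caller's dict is left untouched
    name = params.pop("type")
    method = getattr(handler, name, None)
    if not callable(method):
        raise NotImplementedError(f"handler does not support action {name!r}")
    return await method(**params)


async def demo() -> List[tuple]:
    computer = RecordingComputer()
    await dispatch(computer, {"type": "click", "x": 100, "y": 200})
    await dispatch(computer, {"type": "type", "text": "hello"})
    await dispatch(computer, {"type": "keypress", "keys": ["ctrl", "s"]})
    return computer.calls


calls = asyncio.run(demo())
print(calls)  # [('click', 100, 200, 'left'), ('type', 'hello'), ('keypress', ['ctrl', 's'])]
```

Raising `NotImplementedError` for unknown action types makes a missing method in a custom handler fail loudly instead of being skipped.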
@@ -2,7 +2,16 @@
title: Customizing Your ComputerAgent
---

<Callout>
  A corresponding{' '}
  <a
    href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb"
    target="_blank"
  >
    Jupyter Notebook
  </a>{' '}
  is available for this documentation.
</Callout>

The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems.

@@ -118,4 +127,4 @@ await run_single_task(
    # tools=[your_custom_function],
    # callbacks=[YourCustomCallback()],
)
```

@@ -3,7 +3,13 @@ title: HUD Evals
description: Use ComputerAgent with HUD for benchmarking and evaluation
---

<Callout>
  A corresponding{' '}
  <a href="https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb" target="_blank">
    Jupyter Notebook
  </a>{' '}
  is available for this documentation.
</Callout>

The HUD integration lets you benchmark an agent using the [HUD framework](https://www.hud.so/). The agent controls a computer inside HUD, where tests are run to evaluate whether each task succeeded.

@@ -120,8 +126,8 @@ Both single-task and full-dataset runs share a common set of configuration optio
HUD provides multiple benchmark datasets for realistic evaluation.

1. **[OSWorld-Verified](/agent-sdk/benchmarks/osworld-verified)** – Benchmark on 369+ real-world desktop tasks across Chrome, LibreOffice, GIMP, VS Code, etc.
   _Best for_: evaluating full computer-use agents in realistic environments.
   _Verified variant_: fixes 300+ issues from earlier versions for reliability.

**Coming soon:** SheetBench (spreadsheet automation) and other specialized HUD datasets.

@@ -129,7 +135,7 @@ See the [HUD docs](https://docs.hud.so/environment-creation) for more eval envir

## Tips

- **Debugging:** set `verbosity=2` to see every model call and tool action.
- **Performance:** lower `screenshot_delay` for faster runs; raise it if you see race conditions.
- **Safety:** always set `max_steps` (defaults to 50) to prevent runaway loops.
- **Custom tools:** pass extra `tools=[...]` into the agent config if you need tools beyond `openai_computer`.

@@ -20,7 +20,9 @@ This guide lists **breaking changes** when migrating from the original `Computer
## Usage Examples: Old vs New

### 1. Anthropic Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -31,7 +33,9 @@ async with Computer() as computer:
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -46,7 +50,9 @@ async with Computer() as computer:
```

### 2. OpenAI Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -57,7 +63,9 @@ async with Computer() as computer:
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -72,7 +80,9 @@ async with Computer() as computer:
```

### 3. UI-TARS Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -83,7 +93,9 @@ async with Computer() as computer:
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -98,7 +110,9 @@ async with Computer() as computer:
```

### 4. Omni Loop

**Old:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -109,7 +123,9 @@ async with Computer() as computer:
    async for result in agent.run("Take a screenshot"):
        print(result)
```

**New:**

```python
async with Computer() as computer:
    agent = ComputerAgent(
@@ -26,7 +26,7 @@ agent = ComputerAgent(
When using Anthropic-based CUAs (Claude models), setting `use_prompt_caching=True` will automatically add `{ "cache_control": "ephemeral" }` to your messages. This enables prompt caching for the session and can speed up repeated runs with the same prompt.

<Callout title="Note">
  This argument only takes effect for Anthropic CUAs; other providers ignore it.
</Callout>

## OpenAI Provider
@@ -44,13 +44,16 @@ agent = ComputerAgent(
```

## Implementation Details

- For Anthropic: Adds `{ "cache_control": "ephemeral" }` to messages when enabled.
- For OpenAI: Caching is automatic for long prompts; the argument is ignored.

## When to Use

- Enable for Anthropic CUAs if you want to avoid reprocessing the same prompt in repeated or iterative tasks.
- Not needed for OpenAI models: caching is already automatic for long prompts, and the argument is ignored.

## See Also

- [Agent Loops](./agent-loops)
- [Migration Guide](./migration-guide)

@@ -59,7 +59,7 @@ Combine state-of-the-art grounding with powerful reasoning:

```python
agent = ComputerAgent(
    "huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-3-5-sonnet-20241022",
    tools=[computer]
)

@@ -65,6 +65,7 @@ async for _ in agent.run("Click on the search bar and type 'hello world'"):
## InternVL 3.5

InternVL 3.5 family:

- `huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}`

```python
@@ -76,6 +77,7 @@ async for _ in agent.run("Open Firefox and navigate to github.com"):
## Qwen3 VL

Qwen3 VL family:

- `openrouter/qwen/qwen3-vl-235b-a22b-instruct`

```python

@@ -17,9 +17,11 @@ All models that support `ComputerAgent.run()` also support `ComputerAgent.predic
- Claude 3.5: `claude-3-5-sonnet-20241022`

### OpenAI CUA Preview

- Computer-use-preview: `computer-use-preview`

### UI-TARS 1.5 (Unified VLM with grounding support)

- `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
- `huggingface/ByteDance-Seed/UI-TARS-1.5-7B` (requires TGI endpoint)

@@ -28,15 +30,19 @@ All models that support `ComputerAgent.run()` also support `ComputerAgent.predic
These models are optimized specifically for click prediction and UI element grounding:

### OpenCUA

- `huggingface-local/xlangai/OpenCUA-{7B,32B}`

### GTA1 Family

- `huggingface-local/HelloKKMe/GTA1-{7B,32B,72B}`

### Holo 1.5 Family

- `huggingface-local/Hcompany/Holo1.5-{3B,7B,72B}`

### InternVL 3.5 Family

- `huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}`

### OmniParser (OCR)

@@ -5,6 +5,7 @@ title: Supported Model Providers
## Supported Models

### Anthropic Claude (Computer Use API)

```python
model="anthropic/claude-3-5-sonnet-20241022"
model="anthropic/claude-3-7-sonnet-20250219"
@@ -13,20 +14,23 @@ model="anthropic/claude-sonnet-4-20250514"
```

### OpenAI Computer Use Preview

```python
model="openai/computer-use-preview"
```

### UI-TARS (Local or Hugging Face Inference)

```python
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
model="ollama_chat/0000/ui-tars-1.5-7b"
```

### Omniparser + Any LLM

```python
model="omniparser+ollama_chat/mistral-small3.2"
model="omniparser+vertex_ai/gemini-pro"
model="omniparser+anthropic/claude-3-5-sonnet-20241022"
model="omniparser+openai/gpt-4o"
```

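Several model strings on these pages compose two models with `+`, for example `omniparser+openai/gpt-4o`, or a grounding model paired with a planner such as `huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-3-5-sonnet-20241022`. The sketch below only illustrates how such strings break down into provider and model parts: `split_model_string` is a hypothetical helper, and splitting on `+` and then on the first `/` mirrors how the strings are written here, not necessarily how the SDK parses them.

```python
from typing import List, Optional, Tuple


def split_model_string(model: str) -> List[Tuple[Optional[str], str]]:
    """Split a (possibly composed) model string into (provider, model) pairs.

    Components are separated by "+"; within each, the text before the first
    "/" is treated as the provider prefix (e.g. "huggingface-local").
    """
    parts: List[Tuple[Optional[str], str]] = []
    for component in model.split("+"):
        provider, sep, name = component.partition("/")
        if sep:
            parts.append((provider, name))
        else:
            # Bare components such as "omniparser" carry no provider prefix.
            parts.append((None, component))
    return parts


print(split_model_string("omniparser+openai/gpt-4o"))
# [(None, 'omniparser'), ('openai', 'gpt-4o')]
```

Partitioning on the first `/` only keeps Hugging Face repo ids such as `HelloKKMe/GTA1-7B` intact as the model name.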
@@ -51,7 +51,7 @@ class UsageTrackerCallback(AsyncCallbackHandler):
        print("Usage update:", usage)

agent = ComputerAgent(
    ...,
    callbacks=[UsageTrackerCallback()]
)
```
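The `UsageTrackerCallback` above prints each usage update. A minimal sketch of accumulating those updates into running totals follows. It assumes `usage` is a dict carrying `total_tokens` and `response_cost` (the fields listed for trajectory metadata and read by the logging example), and it is written as a standalone class so it runs without the SDK; in practice the same `on_usage` body would live in an `AsyncCallbackHandler` subclass.

```python
import asyncio
from typing import Any, Dict


class UsageTotals:
    """Accumulates per-call usage dicts into running totals."""

    def __init__(self) -> None:
        self.total_tokens = 0
        self.total_cost = 0.0
        self.calls = 0

    async def on_usage(self, usage: Dict[str, Any]) -> None:
        # Missing keys count as zero so partial usage reports don't crash.
        self.total_tokens += usage.get("total_tokens", 0)
        self.total_cost += usage.get("response_cost", 0.0)
        self.calls += 1


async def demo() -> UsageTotals:
    tracker = UsageTotals()
    await tracker.on_usage({"total_tokens": 1200, "response_cost": 0.004})
    await tracker.on_usage({"total_tokens": 800, "response_cost": 0.002})
    return tracker


tracker = asyncio.run(demo())
print(tracker.calls, tracker.total_tokens, round(tracker.total_cost, 6))
```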
@@ -59,5 +59,6 @@ agent = ComputerAgent(
See also: [Budget Manager Callbacks](./callbacks/cost-saving)

## See Also

- [Prompt Caching](./prompt-caching)
- [Callbacks](./callbacks)

@@ -5,7 +5,6 @@ description: Manage your Cua Cloud sandboxes (VMs) via Python SDK or HTTP API

import { Tab, Tabs } from 'fumadocs-ui/components/tabs';

Using the Cua Cloud API, you can manage your Cua Cloud sandboxes (VMs) with Python or HTTP (curl).

All examples require a CUA API key. You can obtain one from the [Dashboard](https://www.cua.ai/dashboard/keys).
@@ -17,107 +16,111 @@ All examples require a CUA API key. You can obtain one from the [Dashboard](http
<Tabs items={["Python", "curl"]}>
<Tab value="Python">

```python
import os
import asyncio
from computer.providers.cloud.provider import CloudProvider

async def main():
    api_key = os.getenv("CUA_API_KEY") or "your-api-key"
    # Optional: point to a different API base
    # os.environ["CUA_API_BASE"] = "https://api.cua.ai"

    provider = CloudProvider(api_key=api_key, verbose=False)
    async with provider:
        vms = await provider.list_vms()
        for vm in vms:
            print({
                "name": vm["name"],
                "status": vm["status"],
                "api_url": vm.get("api_url"),
                "vnc_url": vm.get("vnc_url"),
            })

if __name__ == "__main__":
    asyncio.run(main())
```

</Tab>
<Tab value="curl">

```bash
curl -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms"
```

Responses:

- 200: Array of minimal VM objects with fields `{ name, password, status }`
- 401: Unauthorized (missing/invalid API key)

```json
[
  {
    "name": "s-windows-x4snp46ebf",
    "password": "49b8daa3",
    "status": "running"
  }
]
```

Status values:

- `pending`: VM deployment in progress
- `running`: VM is active and accessible
- `stopped`: VM is stopped but not terminated
- `terminated`: VM has been permanently destroyed
- `failed`: VM deployment or operation failed

</Tab>
</Tabs>

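If you would rather call the HTTP API without the `computer` package, the curl request can be reproduced with Python's standard library. This is a sketch: it reuses the endpoint and bearer header from the curl example, reads the optional `CUA_API_BASE` override mentioned in the Python example (an assumption about how you might wire it up), and `list_vms_request`, `list_vms`, and `running_vms` are hypothetical helper names.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List

# Same default base URL as the curl example; CUA_API_BASE overrides it.
API_BASE = os.getenv("CUA_API_BASE", "https://api.cua.ai")


def list_vms_request(api_key: str) -> urllib.request.Request:
    """Build the GET /v1/vms request that the curl example sends."""
    return urllib.request.Request(
        f"{API_BASE}/v1/vms",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_vms(api_key: str) -> List[Dict[str, Any]]:
    """Send the request and decode the JSON array of VM objects.

    A 401 (missing or invalid key) surfaces as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(list_vms_request(api_key)) as resp:
        return json.loads(resp.read())


def running_vms(vms: List[Dict[str, Any]]) -> List[str]:
    """Names of VMs whose status is "running" (see the status values above)."""
    return [vm["name"] for vm in vms if vm.get("status") == "running"]


# Example (needs network access and a valid key):
# print(running_vms(list_vms(os.environ["CUA_API_KEY"])))
```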
---

## Start a VM

Provide the VM name you want to start.

<Tabs items={["Python", "curl"]}>
<Tab value="Python">

```python
import os
import asyncio
from computer.providers.cloud.provider import CloudProvider

async def main():
    api_key = os.getenv("CUA_API_KEY") or "your-api-key"
    name = "my-vm-name"  # e.g., "m-linux-96lcxd2c2k"

    provider = CloudProvider(api_key=api_key)
    async with provider:
        resp = await provider.run_vm(name)
        print(resp)  # { "name": name, "status": "starting" }

if __name__ == "__main__":
    asyncio.run(main())
```

</Tab>
<Tab value="curl">

```bash
curl -X POST \
  -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms/my-vm-name/start" -i
```

Responses:

- 204: No Content (start accepted)
- 401: Unauthorized (missing/invalid API key)
- 404: VM not found or not owned by the user

```text
HTTP/1.1 204 No Content
```

</Tab>
</Tabs>
@@ -125,46 +128,48 @@ Provide the VM name you want to start.
---

## Stop a VM

Stops the VM asynchronously.

<Tabs items={["Python", "curl"]}>
<Tab value="Python">

```python
import os
import asyncio
from computer.providers.cloud.provider import CloudProvider

async def main():
    api_key = os.getenv("CUA_API_KEY") or "your-api-key"
    name = "my-vm-name"

    provider = CloudProvider(api_key=api_key)
    async with provider:
        resp = await provider.stop_vm(name)
        print(resp)  # { "name": name, "status": "stopping" }

if __name__ == "__main__":
    asyncio.run(main())
```

</Tab>
<Tab value="curl">

```bash
curl -X POST \
  -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms/my-vm-name/stop"
```

Responses:

- 202: Accepted with `{ "status": "stopping" }`
- 401: Unauthorized (missing/invalid API key)
- 404: VM not found or not owned by the user

```json
{ "status": "stopping" }
```

</Tab>
</Tabs>
@@ -172,46 +177,48 @@ Stops the VM asynchronously.
---

## Restart a VM

Restarts the VM asynchronously.

<Tabs items={["Python", "curl"]}>
<Tab value="Python">

```python
import os
import asyncio
from computer.providers.cloud.provider import CloudProvider

async def main():
    api_key = os.getenv("CUA_API_KEY") or "your-api-key"
    name = "my-vm-name"

    provider = CloudProvider(api_key=api_key)
    async with provider:
        resp = await provider.restart_vm(name)
        print(resp)  # { "name": name, "status": "restarting" }

if __name__ == "__main__":
    asyncio.run(main())
```

</Tab>
<Tab value="curl">

```bash
curl -X POST \
  -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms/my-vm-name/restart"
```

Responses:

- 202: Accepted with `{ "status": "restarting" }`
- 401: Unauthorized (missing/invalid API key)
- 404: VM not found or not owned by the user

```json
{ "status": "restarting" }
```

</Tab>
</Tabs>
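Both the stop and restart endpoints only accept the request with a 202, so a client still has to separate that outcome from the two documented error codes. The sketch below shows one way to map them. `describe_vm_action` is a hypothetical helper, and it treats any status not listed above as unexpected.

```python
import json

# Error statuses documented for the stop and restart endpoints.
ERRORS = {
    401: "unauthorized (missing/invalid API key)",
    404: "VM not found or not owned by the user",
}


def describe_vm_action(status_code: int, body: str = "") -> str:
    """Summarize a stop/restart response from its HTTP status and JSON body."""
    if status_code == 202:
        # 202 bodies look like {"status": "restarting"} or {"status": "stopping"}.
        state = json.loads(body).get("status", "unknown") if body else "unknown"
        return f"accepted, VM is {state}"
    return ERRORS.get(status_code, f"unexpected HTTP status {status_code}")


if __name__ == "__main__":
    print(describe_vm_action(202, '{ "status": "restarting" }'))
    print(describe_vm_action(404))
```

A 202 means only that the action has started. If you need to know when the VM is back, you may want to query it again afterwards.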
@@ -219,42 +226,44 @@ Restarts the VM asynchronously.
---

## Query a VM by name

Query the computer-server running on the VM. This is useful for checking details such as its status or OS type.

<Tabs items={["Python", "curl"]}>
<Tab value="Python">

```python
import os
import asyncio
from computer.providers.cloud.provider import CloudProvider

async def main():
    api_key = os.getenv("CUA_API_KEY") or "your-api-key"
    name = "my-vm-name"

    provider = CloudProvider(api_key=api_key)
    async with provider:
        info = await provider.get_vm(name)
        print(info)

if __name__ == "__main__":
    asyncio.run(main())
```

</Tab>
<Tab value="curl">

```bash
curl "https://my-vm-name.containers.cloud.cua.ai:8443/status"
```

Responses:

- 200: Server available

```json
{ "status": "ok", "os_type": "linux", "features": ["agent"] }
```

</Tab>
</Tabs>
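The `/status` payload is small enough to check by hand. This minimal sketch turns the example body into a readiness summary. The `parse_server_status` helper and the keys it returns are illustrative and based only on the fields in the example response above.

```python
import json


def parse_server_status(raw: str) -> dict:
    """Summarize a computer-server /status response body."""
    info = json.loads(raw)
    return {
        "ready": info.get("status") == "ok",
        "os_type": info.get("os_type"),
        "has_agent": "agent" in info.get("features", []),
    }


if __name__ == "__main__":
    sample = '{ "status": "ok", "os_type": "linux", "features": ["agent"] }'
    print(parse_server_status(sample))
```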
@@ -13,12 +13,20 @@ Execute shell commands and get detailed results:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Run shell command
result = await computer.interface.run_command(cmd) # result.stdout, result.stderr, result.returncode
```

</Tab>
<Tab value="TypeScript">

```typescript
// Run shell command
const result = await computer.interface.runCommand(cmd); // result.stdout, result.stderr, result.returncode
```

</Tab>
</Tabs>
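The sketch below assumes `run_command` reports failures through `returncode` instead of raising. Under that assumption, callers usually check the exit code themselves. The sketch runs without a VM: `FakeInterface` and `CommandResult` are hypothetical stand-ins that mimic only the `stdout`, `stderr`, and `returncode` fields documented above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Hypothetical stand-in exposing the documented result fields."""

    stdout: str
    stderr: str
    returncode: int


class FakeInterface:
    """Hypothetical stub so this sketch runs without a VM."""

    async def run_command(self, cmd: str) -> CommandResult:
        if cmd.startswith("false"):
            return CommandResult(stdout="", stderr="command failed\n", returncode=1)
        return CommandResult(stdout=f"ran: {cmd}\n", stderr="", returncode=0)


async def run_checked(interface, cmd: str) -> str:
    """Run cmd via interface.run_command and return stdout; raise on a non-zero exit code."""
    result = await interface.run_command(cmd)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd!r} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    print(asyncio.run(run_checked(FakeInterface(), "echo hi")), end="")
```

With a real VM, pass `computer.interface` in place of the stub, for example `await run_checked(computer.interface, "ls /")`.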
@@ -28,6 +36,7 @@ Control application launching and windows:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Launch applications
await computer.interface.launch("xfce4-terminal")
@@ -52,6 +61,7 @@ Control application launching and windows:

</Tab>
<Tab value="TypeScript">

```typescript
// Launch applications
await computer.interface.launch("xfce4-terminal");
@@ -83,6 +93,7 @@ Precise mouse control and interaction:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Basic clicks
await computer.interface.left_click(x, y) # Left click at coordinates
@@ -101,6 +112,7 @@ Precise mouse control and interaction:

</Tab>
<Tab value="TypeScript">

```typescript
// Basic clicks
await computer.interface.leftClick(x, y); // Left click at coordinates
@@ -126,6 +138,7 @@ Text input and key combinations:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Text input
await computer.interface.type_text("Hello") # Type text
@@ -139,6 +152,7 @@ Text input and key combinations:

</Tab>
<Tab value="TypeScript">

```typescript
// Text input
await computer.interface.typeText("Hello"); // Type text
@@ -159,14 +173,24 @@ Mouse wheel and scrolling control:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Scrolling
await computer.interface.scroll(x, y) # Scroll the mouse wheel
await computer.interface.scroll_down(clicks) # Scroll down
await computer.interface.scroll_up(clicks) # Scroll up
```

</Tab>
<Tab value="TypeScript">

```typescript
// Scrolling
await computer.interface.scroll(x, y); // Scroll the mouse wheel
await computer.interface.scrollDown(clicks); // Scroll down
await computer.interface.scrollUp(clicks); // Scroll up
```

</Tab>
</Tabs>

@@ -176,21 +200,22 @@ Screen capture and display information:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Screen operations
await computer.interface.screenshot() # Take a screenshot
await computer.interface.get_screen_size() # Get screen dimensions
```

</Tab>
<Tab value="TypeScript">

```typescript
// Screen operations
await computer.interface.screenshot(); // Take a screenshot
await computer.interface.getScreenSize(); // Get screen dimensions
```

</Tab>
</Tabs>

@@ -229,20 +254,20 @@ System clipboard management:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Clipboard operations
await computer.interface.set_clipboard(text) # Set clipboard content
await computer.interface.copy_to_clipboard() # Get clipboard content
```

</Tab>
<Tab value="TypeScript">

```typescript
// Clipboard operations
await computer.interface.setClipboard(text); // Set clipboard content
await computer.interface.copyToClipboard(); // Get clipboard content
```

</Tab>
@@ -275,18 +300,19 @@ Direct file and directory manipulation:

</Tab>
<Tab value="TypeScript">

```typescript
// File existence checks
await computer.interface.fileExists(path); // Check if file exists
await computer.interface.directoryExists(path); // Check if directory exists

// File content operations
await computer.interface.readText(path, "utf-8"); // Read file content
await computer.interface.writeText(path, content, "utf-8"); // Write file content
await computer.interface.readBytes(path); // Read file content as bytes
await computer.interface.writeBytes(path, content); // Write file content as bytes

// File and directory management
await computer.interface.deleteFile(path); // Delete file
await computer.interface.createDir(path); // Create directory
await computer.interface.deleteDir(path); // Delete directory
@@ -302,20 +328,21 @@ Access system accessibility information:

<Tabs items={['Python', 'TypeScript']}>
<Tab value="Python">

```python
# Get accessibility tree
await computer.interface.get_accessibility_tree()
```

</Tab>
<Tab value="TypeScript">

```typescript
// Get accessibility tree
await computer.interface.getAccessibilityTree();
```

</Tab>
</Tabs>

## Delay Configuration

@@ -324,6 +351,7 @@ Control timing between actions:

<Tabs items={['Python']}>
<Tab value="Python">

```python
# Set default delay between all actions (in seconds)
computer.interface.delay = 0.5 # 500ms delay between actions
@@ -343,6 +371,7 @@ Manage Python environments:

<Tabs items={['Python']}>
<Tab value="Python">

```python
# Virtual environment management
await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment
@@ -352,4 +381,3 @@ Manage Python environments:

</Tab>
</Tabs>
@@ -10,7 +10,8 @@ pip install "cua-computer[ui]"
```

<Callout title="Note">
  For precise control of the computer, we recommend using VNC or Screen Sharing instead of the
  Computer Gradio UI.
</Callout>

### Building and Sharing Demonstrations with Huggingface

@@ -43,8 +44,12 @@ For examples, see [Computer UI Examples](https://github.com/trycua/cua/tree/main
#### 3. Record Your Tasks

<details open>
  <summary>View demonstration video</summary>
  <video
    src="https://github.com/user-attachments/assets/de3c3477-62fe-413c-998d-4063e48de176"
    controls
    width="600"
  ></video>
</details>

Record yourself performing various computer tasks using the UI.

@@ -52,8 +57,12 @@ Record yourself performing various computer tasks using the UI.
#### 4. Save Your Demonstrations

<details open>
  <summary>View demonstration video</summary>
  <video
    src="https://github.com/user-attachments/assets/5ad1df37-026a-457f-8b49-922ae805faef"
    controls
    width="600"
  ></video>
</details>

Save each task by picking a descriptive name and adding relevant tags (e.g., "office", "web-browsing", "coding").

@@ -65,11 +74,16 @@ Repeat steps 3 and 4 until you have a good amount of demonstrations covering dif
#### 6. Upload to Huggingface

<details open>
  <summary>View demonstration video</summary>
  <video
    src="https://github.com/user-attachments/assets/c586d460-3877-4b5f-a736-3248886d2134"
    controls
    width="600"
  ></video>
</details>

Upload your dataset to Huggingface by:

- Naming it as `{your_username}/{dataset_name}`
- Choosing public or private visibility
- Optionally selecting specific tags to upload only tasks with certain tags

@@ -77,4 +91,4 @@ Upload your dataset to Huggingface by:
#### Examples and Resources

- Example Dataset: [ddupont/test-dataset](https://huggingface.co/datasets/ddupont/test-dataset)
- Find Community Datasets: 🔍 [Browse CUA Datasets on Huggingface](https://huggingface.co/datasets?other=cua)

@@ -3,7 +3,17 @@ title: Cua Computers
description: Understanding Cua computer types and connection methods
---

<Callout>
  A corresponding{' '}
  <a href="https://github.com/trycua/cua/blob/main/notebooks/computer_nb.ipynb" target="_blank">
    Jupyter Notebook
  </a>{' '}
  and{' '}
  <a href="https://github.com/trycua/cua/tree/main/examples/computer-example-ts" target="_blank">
    NodeJS project
  </a>{' '}
  are available for this documentation.
</Callout>

Before we can automate apps with AI, we first need to connect to a Computer Server, which gives the AI a safe environment in which to execute workflows.

@@ -3,7 +3,16 @@ title: Sandboxed Python
slug: sandboxed-python
---

<Callout>
  A corresponding{' '}
  <a
    href="https://github.com/trycua/cua/blob/main/examples/sandboxed_functions_examples.py"
    target="_blank"
  >
    Python example
  </a>{' '}
  is available for this documentation.
</Callout>

You can run Python functions securely inside a sandboxed virtual environment on a remote Cua Computer. This is useful for executing untrusted user code, isolating dependencies, or providing a safe environment for automation tasks.

@@ -15,6 +15,7 @@ This preset usecase uses [Cua Computer](/computer-sdk/computers) to interact wit
## Quickstart

Create a `requirements.txt` file with the following dependencies:

```text
cua-agent
cua-computer
@@ -34,7 +35,7 @@ ANTHROPIC_API_KEY=your-api-key
CUA_API_KEY=sk_cua-api01...
```
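The example scripts below check for `ANTHROPIC_API_KEY` before they start. The sketch that follows is a slightly more general check that reports every missing key at once. `missing_env_keys` is a hypothetical helper. It assumes local providers (Docker, Lume, Windows Sandbox) do not need `CUA_API_KEY`, so narrow `required` to what your chosen environment uses.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Keys defined in the .env file above. Assumption: only the Cloud
# environment needs CUA_API_KEY; narrow `required` for local providers.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "CUA_API_KEY")


def missing_env_keys(
    env: Optional[Mapping[str, str]] = None,
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Call python-dotenv's load_dotenv() first so .env values land in os.environ.
    missing = missing_env_keys()
    print("Missing:", ", ".join(missing) if missing else "none")
```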

Select the environment you want to run the code in (_click on the underlined values in the code to edit them directly!_):

<Tabs items={['☁️ Cloud', '🐳 Docker', '🍎 Lume', '🪟 Windows Sandbox']}>
<Tab value="☁️ Cloud">
@@ -58,23 +59,21 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def handle_sigint(sig, frame):
    print("\\n\\nExecution interrupted by user. Exiting gracefully...")
    exit(0)


async def fill_application():
    try:
        async with Computer(
            os_type="linux",
            provider_type=VMProviderType.CLOUD,
            name="`}<EditableValue placeholder="container-name" />{`",
            api_key="`}<EditableValue placeholder="api_key" />{`",
            verbosity=logging.INFO,
        ) as computer:

            agent = ComputerAgent(
                model="anthropic/claude-3-5-sonnet-20241022",
@@ -124,10 +123,9 @@ async def fill_application():
        traceback.print_exc()
        raise


def main():
    try:
        load_dotenv()

        if "ANTHROPIC_API_KEY" not in os.environ:
            raise RuntimeError(
@@ -149,9 +147,9 @@ def main():
        logger.error(f"Error running automation: {e}")
        traceback.print_exc()


if __name__ == "__main__":
    main()`}
</EditableCodeBlock>

</Tab>
@@ -175,22 +173,20 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def handle_sigint(sig, frame):
    print("\\n\\nExecution interrupted by user. Exiting gracefully...")
    exit(0)


async def fill_application():
    try:
        async with Computer(
            os_type="macos",
            provider_type=VMProviderType.LUME,
            name="`}<EditableValue placeholder="container-name" />{`",
            verbosity=logging.INFO,
        ) as computer:

            agent = ComputerAgent(
                model="anthropic/claude-3-5-sonnet-20241022",
@@ -240,10 +236,9 @@ async def fill_application():
        traceback.print_exc()
        raise


def main():
    try:
        load_dotenv()

        if "ANTHROPIC_API_KEY" not in os.environ:
            raise RuntimeError(
@@ -259,9 +254,9 @@ def main():
        logger.error(f"Error running automation: {e}")
        traceback.print_exc()


if __name__ == "__main__":
    main()`}
</EditableCodeBlock>

</Tab>
@@ -283,21 +278,19 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def handle_sigint(sig, frame):
    print("\\n\\nExecution interrupted by user. Exiting gracefully...")
    exit(0)


async def fill_application():
    try:
        async with Computer(
            os_type="windows",
            provider_type=VMProviderType.WINDOWS_SANDBOX,
            verbosity=logging.INFO,
        ) as computer:

            agent = ComputerAgent(
                model="anthropic/claude-3-5-sonnet-20241022",
@@ -347,10 +340,9 @@ async def fill_application():
        traceback.print_exc()
        raise


def main():
    try:
        load_dotenv()

        if "ANTHROPIC_API_KEY" not in os.environ:
            raise RuntimeError(
@@ -366,9 +358,9 @@ def main():
        logger.error(f"Error running automation: {e}")
        traceback.print_exc()


if __name__ == "__main__":
    main()`}
</EditableCodeBlock>

</Tab>
@@ -392,22 +384,20 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def handle_sigint(sig, frame):
    print("\\n\\nExecution interrupted by user. Exiting gracefully...")
    exit(0)


async def fill_application():
    try:
        async with Computer(
            os_type="linux",
            provider_type=VMProviderType.DOCKER,
            name="`}<EditableValue placeholder="container-name" />{`",
            verbosity=logging.INFO,
        ) as computer:

            agent = ComputerAgent(
                model="anthropic/claude-3-5-sonnet-20241022",
@@ -457,10 +447,9 @@ async def fill_application():
        traceback.print_exc()
        raise


def main():
    try:
        load_dotenv()

        if "ANTHROPIC_API_KEY" not in os.environ:
            raise RuntimeError(
@@ -476,9 +465,9 @@ def main():
        logger.error(f"Error running automation: {e}")
        traceback.print_exc()


if __name__ == "__main__":
    main()`}
</EditableCodeBlock>

</Tab>
@@ -488,4 +477,4 @@ if __name__ == "__main__":

- Learn more about [Cua computers](/computer-sdk/computers) and [computer commands](/computer-sdk/commands)
- Read about [Agent loops](/agent-sdk/agent-loops), [tools](/agent-sdk/custom-tools), and [supported model providers](/agent-sdk/supported-model-providers/)
- Experiment with different [Models and Providers](/agent-sdk/supported-model-providers/)

@@ -7,42 +7,42 @@ description: List of all commands supported by the Computer Server API (WebSocke

This page lists all supported commands for the Computer Server, available via both WebSocket and REST API endpoints.

| Command | Description |
| --- | --- |
| version | Get protocol and package version info |
| run_command | Run a shell command |
| screenshot | Capture a screenshot |
| get_screen_size | Get the screen size |
| get_cursor_position | Get the current mouse cursor position |
| mouse_down | Mouse button down |
| mouse_up | Mouse button up |
| left_click | Left mouse click |
| right_click | Right mouse click |
| double_click | Double mouse click |
| move_cursor | Move mouse cursor to coordinates |
| drag_to | Drag mouse to coordinates |
| drag | Drag mouse by offset |
| key_down | Keyboard key down |
| key_up | Keyboard key up |
| type_text | Type text |
| press_key | Press a single key |
| hotkey | Press a hotkey combination |
| scroll | Scroll the screen |
| scroll_down | Scroll down |
| scroll_up | Scroll up |
| copy_to_clipboard | Get clipboard content |
| set_clipboard | Set clipboard content |
| file_exists | Check if a file exists |
| directory_exists | Check if a directory exists |
| list_dir | List files/directories in a directory |
| read_text | Read text from a file |
| write_text | Write text to a file |
| read_bytes | Read bytes from a file |
| write_bytes | Write bytes to a file |
| get_file_size | Get file size |
| delete_file | Delete a file |
| create_dir | Create a directory |
| delete_dir | Delete a directory |
| get_accessibility_tree | Get accessibility tree (if supported) |
| find_element | Find element in accessibility tree |
| diorama_cmd | Run a diorama command (if supported) |

@@ -16,6 +16,7 @@ The Computer Server exposes a single REST endpoint for command execution:
- Returns results as a streaming response (text/event-stream)

### Request Format

```json
{
  "command": "<command_name>",
@@ -24,10 +25,12 @@ The Computer Server exposes a single REST endpoint for command execution:
```

### Required Headers (for cloud containers)

- `X-Container-Name`: Name of the container (cloud only)
- `X-API-Key`: API key for authentication (cloud only)

### Example Request (Python)

```python
import requests

@@ -38,6 +41,7 @@ print(resp.text)
```

### Example Request (Cloud)

```python
import requests

@@ -52,7 +56,9 @@ print(resp.text)
```

### Response Format

The response is a `text/event-stream` stream of JSON objects, e.g.:

```
data: {"success": true, "content": "..."}

@@ -60,4 +66,5 @@ data: {"success": false, "error": "..."}
```
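Each event arrives as a `data:` line that holds one JSON object. This minimal sketch decodes such a body after it has been read into a string. `parse_event_stream` is a hypothetical helper that ignores every event-stream field other than `data`.

```python
import json
from typing import Any, Dict, List


def parse_event_stream(text: str) -> List[Dict[str, Any]]:
    """Decode the JSON payload of every `data:` line in an event-stream body."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload:
            events.append(json.loads(payload))
    return events


if __name__ == "__main__":
    body = 'data: {"success": true, "content": "..."}\n\ndata: {"success": false, "error": "..."}\n'
    for event in parse_event_stream(body):
        print(event)
```

With `requests`, you could pass it `resp.text`. You could also decode events incrementally from `resp.iter_lines(decode_unicode=True)` when the request is made with `stream=True`.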

### Supported Commands

See [Commands Reference](./Commands) for the full list of commands and parameters.

@@ -11,7 +11,9 @@ The Computer Server exposes a WebSocket endpoint for real-time command execution
- `wss://your-container.containers.cloud.trycua.com:8443/ws` (cloud)

### Authentication (Cloud Only)

For cloud containers, you must authenticate immediately after connecting:

```json
{
  "command": "authenticate",
@@ -21,10 +23,13 @@ For cloud containers, you must authenticate immediately after connecting:
  }
}
```

If authentication fails, the connection is closed.

### Command Format

Send JSON messages:

```json
{
  "command": "<command_name>",
@@ -33,6 +38,7 @@ Send JSON messages:
```

### Example (Python)

```python
import websockets
import asyncio
@@ -49,6 +55,7 @@ asyncio.run(main())
```

### Example (Cloud)

```python
import websockets
import asyncio
@@ -74,7 +81,9 @@ asyncio.run(main())
```

### Response Format

Each response is a JSON object:

```json
{
  "success": true,
@@ -83,4 +92,5 @@ Each response is a JSON object:
```
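Every reply carries a `success` flag, so it helps to unwrap replies in one place. In this sketch, `unwrap_response` is hypothetical. It assumes that failed replies include an `error` string, as the REST event stream does.

```python
import json
from typing import Any, Dict


def unwrap_response(raw: str) -> Dict[str, Any]:
    """Decode one WebSocket reply, raising if the server reported failure."""
    message = json.loads(raw)
    if not message.get("success", False):
        # Assumption: failures carry an "error" field, as in the REST stream.
        raise RuntimeError(message.get("error", "unknown error"))
    return message


if __name__ == "__main__":
    print(unwrap_response('{"success": true, "content": "ok"}'))
```

In the examples above, you would apply it to each `await websocket.recv()` result.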

### Supported Commands

See [Commands Reference](./Commands) for the full list of commands and parameters.

@@ -6,7 +6,16 @@ github:
  - https://github.com/trycua/cua/tree/main/libs/python/computer-server
---

<Callout>
  A corresponding{' '}
  <a
    href="https://github.com/trycua/cua/blob/main/notebooks/computer_server_nb.ipynb"
    target="_blank"
  >
    Jupyter Notebook
  </a>{' '}
  is available for this documentation.
</Callout>

The Computer Server API reference documentation is currently under development.

@@ -20,4 +20,4 @@ See the [Commands](../computer-sdk/commands) documentation for all supported com

## Sandboxed Python Functions

See the [Sandboxed Python](../computer-sdk/sandboxed-python) documentation for running Python functions securely in isolated environments on a remote Cua Computer.

@@ -18,7 +18,8 @@ lume run ubuntu-noble-vanilla:latest
```

<Callout>
  We provide [prebuilt VM images](../lume/prebuilt-images) in our [ghcr
  registry](https://github.com/orgs/trycua/packages).
</Callout>

### Create a Custom VM

@@ -37,10 +38,11 @@ The actual disk space used by sparse images will be much lower than the logical

## VM Management

### lume create <name>

Create a new macOS or Linux virtual machine.

**Options:**

- `--os <os>` - Operating system to install (macOS or linux, default: macOS)
- `--cpu <cores>` - Number of CPU cores (default: 4)
- `--memory <size>` - Memory size, e.g., 8GB (default: 4GB)
@@ -50,6 +52,7 @@ Create a new macOS or Linux virtual machine.
 - `--storage <name>` - VM storage location to use

 **Examples:**
+
 ```bash
 # Create macOS VM with custom specs
 lume create my-mac --cpu 6 --memory 16GB --disk-size 100GB
@@ -61,10 +64,11 @@ lume create my-ubuntu --os linux --cpu 2 --memory 8GB
 lume create my-sequoia --ipsw latest
 ```

-lume run <name>
+lume run <name>
 Start and run a virtual machine.

 **Options:**
+
 - `--no-display` - Do not start the VNC client app
 - `--shared-dir <dir>` - Share directory with VM (format: path[:ro|rw])
 - `--mount <path>` - For Linux VMs only, attach a read-only disk image
@@ -75,6 +79,7 @@ Start and run a virtual machine.
 - `--storage <name>` - VM storage location to use

 **Examples:**
+
 ```bash
 # Run VM with shared directory
 lume run my-vm --shared-dir /path/to/share:rw
@@ -86,42 +91,52 @@ lume run my-vm --no-display
 lume run my-mac --recovery-mode true
 ```

-lume stop <name>
+lume stop <name>
 Stop a running virtual machine.

 **Options:**
+
 - `--storage <name>` - VM storage location to use

 ### lume delete <name>
+
 Delete a virtual machine and its associated files.

 **Options:**
+
 - `--force` - Force deletion without confirmation
 - `--storage <name>` - VM storage location to use

 ### lume clone <name> <new-name>
+
 Create a copy of an existing virtual machine.

 **Options:**
+
 - `--source-storage <name>` - Source VM storage location
 - `--dest-storage <name>` - Destination VM storage location

 ## VM Information and Configuration

 ### lume ls
+
 List all virtual machines and their status.

 ### lume get <name>
+
 Get detailed information about a specific virtual machine.

 **Options:**
+
 - `-f, --format <format>` - Output format (json|text)
 - `--storage <name>` - VM storage location to use

 ### lume set <name>
+
 Modify virtual machine configuration.

 **Options:**
+
 - `--cpu <cores>` - New number of CPU cores (e.g., 4)
 - `--memory <size>` - New memory size (e.g., 8192MB or 8GB)
 - `--disk-size <size>` - New disk size (e.g., 40960MB or 40GB)
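The `lume set` options above accept sizes written either in MB or in GB (`8192MB` or `8GB`, `40960MB` or `40GB`), and those example pairs imply 1GB = 1024MB. Below is a minimal sketch of a hypothetical `size_to_mb` helper that normalizes such strings before you compare or log them. It is not part of lume. It only understands the two units that appear in the examples and rejects anything else rather than guessing.

```python
import re

# Hypothetical helper (not part of lume): normalizes the size strings used by
# `lume set`, e.g. "8192MB" or "8GB", to a number of megabytes.
# The documented pairs (8192MB == 8GB, 40960MB == 40GB) imply 1GB = 1024MB.
_SIZE_RE = re.compile(r"^\s*(\d+)\s*(MB|GB)\s*$", re.IGNORECASE)


def size_to_mb(size: str) -> int:
    match = _SIZE_RE.match(size)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    value, unit = int(match.group(1)), match.group(2).upper()
    return value * 1024 if unit == "GB" else value


# Both spellings from the option list describe the same amount
assert size_to_mb("8GB") == size_to_mb("8192MB") == 8192
assert size_to_mb("40GB") == size_to_mb("40960MB") == 40960
```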
@@ -129,6 +144,7 @@ Modify virtual machine configuration.
 - `--storage <name>` - VM storage location to use

 **Examples:**
+
 ```bash
 # Increase VM memory
 lume set my-vm --memory 16GB
@@ -143,20 +159,25 @@ lume set my-vm --cpu 8
 ## Image Management

 ### lume images
+
 List available macOS images in local cache.

 ### lume pull <image>
+
 Download a VM image from a container registry.

 **Options:**
+
 - `--registry <url>` - Container registry URL (default: ghcr.io)
 - `--organization <org>` - Organization to pull from (default: trycua)
 - `--storage <name>` - VM storage location to use

 ### lume push <name> <image:tag>
+
 Upload a VM image to a container registry.

 **Options:**
+
 - `--additional-tags <tags...>` - Additional tags to push the same image to
 - `--registry <url>` - Container registry URL (default: ghcr.io)
 - `--organization <org>` - Organization/user to push to (default: trycua)
@@ -167,38 +188,46 @@ Upload a VM image to a container registry.
 - `--reassemble` - Verify integrity by reassembling chunks (requires --dry-run)

 ### lume ipsw
+
 Get the latest macOS restore image URL.

 ### lume prune
+
 Remove cached images to free up disk space.

 ## Configuration

 ### lume config
+
 Manage Lume configuration settings.

 **Subcommands:**

 ##### Storage Management
+
 - `lume config storage add <name> <path>` - Add a new VM storage location
 - `lume config storage remove <name>` - Remove a VM storage location
 - `lume config storage list` - List all VM storage locations
 - `lume config storage default <name>` - Set the default VM storage location

 ##### Cache Management
+
 - `lume config cache get` - Get current cache directory
 - `lume config cache set <path>` - Set cache directory

 ##### Image Caching
+
 - `lume config caching get` - Show current caching status
 - `lume config caching set <boolean>` - Enable or disable image caching

 ## API Server

 ### lume serve
+
 Start the Lume API server for programmatic access.

 **Options:**
+
 - `--port <port>` - Port to listen on (default: 7777)

 ## Global Options
@@ -206,4 +235,4 @@ Start the Lume API server for programmatic access.
 These options are available for all commands:

 - `--help` - Show help information
-- `--version` - Show version number
+- `--version` - Show version number
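If you need to drive the CLI reference above from Python, one option is to build the argument lists yourself and hand them to `subprocess`. This is a minimal sketch under a few assumptions: `lume` is on `PATH`, and only the flags listed in the reference (`--os`, `--cpu`, `--memory`, `--disk-size`, `--shared-dir`, `--no-display`) are used. The helper names are hypothetical and are not part of lume.

```python
import shutil
import subprocess
from typing import List, Optional


def lume_create_args(name: str, *, os_name: Optional[str] = None, cpu: Optional[int] = None,
                     memory: Optional[str] = None, disk_size: Optional[str] = None) -> List[str]:
    """Build argv for `lume create` using only flags listed in the CLI reference."""
    args = ["lume", "create", name]
    if os_name is not None:
        args += ["--os", os_name]  # macOS or linux
    if cpu is not None:
        args += ["--cpu", str(cpu)]
    if memory is not None:
        args += ["--memory", memory]
    if disk_size is not None:
        args += ["--disk-size", disk_size]
    return args


def lume_run_args(name: str, *, shared_dir: Optional[str] = None,
                  no_display: bool = False) -> List[str]:
    """Build argv for `lume run`; shared_dir uses the documented path[:ro|rw] format."""
    args = ["lume", "run", name]
    if shared_dir is not None:
        args += ["--shared-dir", shared_dir]
    if no_display:
        args.append("--no-display")
    return args


def run_lume(args: List[str]) -> subprocess.CompletedProcess:
    """Execute a lume command, failing early if the executable is not on PATH."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} not found on PATH")
    return subprocess.run(args, check=True, capture_output=True, text=True)


# Mirrors the documented example; prints:
# lume create my-mac --cpu 6 --memory 16GB --disk-size 100GB
print(" ".join(lume_create_args("my-mac", cpu=6, memory="16GB", disk_size="100GB")))
```

Keeping argv construction separate from execution lets you inspect or log the exact command before anything touches a VM.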
@@ -13,9 +13,8 @@ http://localhost:7777
 ```

 <Callout type="info">
-The HTTP API service runs on port `7777` by default. If you'd like to use a
-different port, pass the `--port` option during installation or when running
-`lume serve`.
+The HTTP API service runs on port `7777` by default. If you'd like to use a different port, pass
+the `--port` option during installation or when running `lume serve`.
 </Callout>

 ## Endpoints
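The callout above says the HTTP API listens on port `7777` unless you passed `--port`. Before calling any endpoint, a client can check that something is accepting connections on that port. This is a minimal sketch using only the standard library. It proves that *a* listener exists, not that the listener is `lume serve`, and `is_port_open` is a hypothetical helper name.

```python
import socket

DEFAULT_LUME_PORT = 7777  # documented default; `--port` can change it


def is_port_open(host: str = "localhost", port: int = DEFAULT_LUME_PORT,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def lume_base_url(host: str = "localhost", port: int = DEFAULT_LUME_PORT) -> str:
    # Same shape as the http://localhost:7777 base URL used in the examples
    return f"http://{host}:{port}"


if __name__ == "__main__":
    if not is_port_open():
        print("Nothing is listening on port 7777; start the API with `lume serve`.")
```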
@@ -726,15 +725,15 @@ Push a VM to a registry as an image (asynchronous operation).

 #### Parameters

-| Name | Type | Required | Description |
-| ------------ | ------------ | -------- | ----------------------------------------------- |
-| name | string | Yes | Local VM name to push |
-| imageName | string | Yes | Image name in registry |
-| tags | array | Yes | Image tags (e.g. `["latest", "v1"]`) |
-| organization | string | Yes | Organization name |
-| registry | string | No | Registry host (e.g. `ghcr.io`) |
-| chunkSizeMb | integer | No | Chunk size in MB for upload |
-| storage | string/null | No | Storage type (`ssd`, etc.) |
+| Name | Type | Required | Description |
+| ------------ | ----------- | -------- | ------------------------------------ |
+| name | string | Yes | Local VM name to push |
+| imageName | string | Yes | Image name in registry |
+| tags | array | Yes | Image tags (e.g. `["latest", "v1"]`) |
+| organization | string | Yes | Organization name |
+| registry | string | No | Registry host (e.g. `ghcr.io`) |
+| chunkSizeMb | integer | No | Chunk size in MB for upload |
+| storage | string/null | No | Storage type (`ssd`, etc.) |

 #### Example Request

@@ -747,13 +746,13 @@ curl --connect-timeout 6000 \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{
-"name": "my-local-vm",
+"name": "my-local-vm",
 "imageName": "my-image",
 "tags": ["latest", "v1"],
-"organization": "my-org",
+"organization": "my-org",
 "registry": "ghcr.io",
 "chunkSizeMb": 512,
-"storage": null
+"storage": null
 }' \
 http://localhost:7777/lume/vms/push
 ```
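The curl request above can also be sent from Python with only the standard library. This is a minimal sketch: the payload fields follow the Parameters table, and the endpoint path and base URL are taken from the curl example. Requiring at least one tag is an extra check added here, not a documented server rule. The timeout value is an arbitrary choice. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

LUME_API = "http://localhost:7777"  # default base URL used in the examples


def build_push_payload(name: str, image_name: str, tags: List[str], organization: str, *,
                       registry: Optional[str] = None, chunk_size_mb: Optional[int] = None,
                       storage: Optional[str] = None) -> Dict[str, Any]:
    """Assemble the JSON body for POST /lume/vms/push from the Parameters table."""
    if not tags:
        raise ValueError("tags is required; pass at least one tag")  # our own check
    payload: Dict[str, Any] = {
        "name": name,
        "imageName": image_name,
        "tags": list(tags),
        "organization": organization,
        "storage": storage,  # the example request sends null explicitly
    }
    if registry is not None:
        payload["registry"] = registry
    if chunk_size_mb is not None:
        payload["chunkSizeMb"] = chunk_size_mb
    return payload


def push_vm(payload: Dict[str, Any], base_url: str = LUME_API,
            timeout: float = 60.0) -> Dict[str, Any]:
    """POST the payload; the push itself continues in the background on the server."""
    request = urllib.request.Request(
        f"{base_url}/lume/vms/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `push_vm(build_push_payload("my-local-vm", "my-image", ["latest", "v1"], "my-org", registry="ghcr.io", chunk_size_mb=512))` sends the same body as the curl example, provided `lume serve` is running.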
@@ -808,10 +807,7 @@ console.log(await res.json());
 "message": "Push initiated in background",
 "name": "my-local-vm",
 "imageName": "my-image",
-"tags": [
-"latest",
-"v1"
-]
+"tags": ["latest", "v1"]
 }
 ```

@@ -857,10 +853,7 @@ console.log(await res.json());

 ```json
 {
-"local": [
-"macos-sequoia-xcode:latest",
-"macos-sequoia-vanilla:latest"
-]
+"local": ["macos-sequoia-xcode:latest", "macos-sequoia-vanilla:latest"]
 }
 ```

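The response above lists local images as `name:tag` strings. Because the request path for this endpoint is not part of this excerpt, the minimal sketch below only parses a response body of that shape. It splits each reference on its last colon. That assumes bare references like the ones shown, not references that embed a registry host with a port. The helper names are hypothetical.

```python
import json
from typing import List, Optional, Tuple


def parse_image_ref(ref: str) -> Tuple[str, Optional[str]]:
    """Split "macos-sequoia-xcode:latest" into ("macos-sequoia-xcode", "latest")."""
    name, sep, tag = ref.rpartition(":")
    if not sep:
        return ref, None  # no tag present
    return name, tag


def local_images(body: str) -> List[Tuple[str, Optional[str]]]:
    """Parse a {"local": [...]} response body like the one shown above."""
    data = json.loads(body)
    return [parse_image_ref(ref) for ref in data.get("local", [])]


example = '{"local": ["macos-sequoia-xcode:latest", "macos-sequoia-vanilla:latest"]}'
print(local_images(example))
# [('macos-sequoia-xcode', 'latest'), ('macos-sequoia-vanilla', 'latest')]
```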
@@ -1005,11 +998,11 @@ Update Lume configuration settings.

 #### Parameters

-| Name | Type | Required | Description |
-| --------------- | ------- | -------- | -------------------------------- |
-| homeDirectory | string | No | Lume home directory path |
-| cacheDirectory | string | No | Cache directory path |
-| cachingEnabled | boolean | No | Enable or disable caching |
+| Name | Type | Required | Description |
+| -------------- | ------- | -------- | ------------------------- |
+| homeDirectory | string | No | Lume home directory path |
+| cacheDirectory | string | No | Cache directory path |
+| cachingEnabled | boolean | No | Enable or disable caching |

 #### Example Request

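All three configuration fields in the table above are optional. A client can therefore send only the settings it wants to change. This is a minimal sketch of building such a body. It assumes the server leaves omitted fields untouched, which this excerpt does not state. The endpoint path is not shown here, so the helper only produces the JSON body.

```python
import json
from typing import Any, Dict, Optional


def build_config_update(home_directory: Optional[str] = None,
                        cache_directory: Optional[str] = None,
                        caching_enabled: Optional[bool] = None) -> Dict[str, Any]:
    """Build a body for the configuration update, including only fields the caller set."""
    fields = {
        "homeDirectory": home_directory,
        "cacheDirectory": cache_directory,
        "cachingEnabled": caching_enabled,
    }
    # `is not None` keeps an explicit False for cachingEnabled
    return {key: value for key, value in fields.items() if value is not None}


# Disable image caching without touching the directory settings
print(json.dumps(build_config_update(caching_enabled=False)))
# {"cachingEnabled": false}
```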
@@ -5,4 +5,4 @@ github:
 - https://github.com/trycua/cua/tree/main/libs/lume
 ---

-Lume is a lightweight Command Line Interface and local API server for creating, running and managing **macOS and Linux virtual machines** with near-native performance on Apple Silicon, using Apple's [Virtualization.Framework](https://developer.apple.com/documentation/virtualization).
+Lume is a lightweight Command Line Interface and local API server for creating, running and managing **macOS and Linux virtual machines** with near-native performance on Apple Silicon, using Apple's [Virtualization.Framework](https://developer.apple.com/documentation/virtualization).
@@ -15,10 +15,12 @@ lume run macos-sequoia-vanilla:latest
 ```

 <Callout title="Security Note">
-All prebuilt images use the default password `lume`. Change this immediately after your first login using the `passwd` command.
+All prebuilt images use the default password `lume`. Change this immediately after your first
+login using the `passwd` command.
 </Callout>

 **System Requirements**:
+
 - Apple Silicon Mac (M1, M2, M3, etc.)
 - macOS 13.0 or later
 - At least 8GB of RAM (16GB recommended)
@@ -33,6 +35,7 @@ Install with a single command:
 ```

 ### Manual Start (No Background Service)
+
 By default, Lume is installed as a background service that starts automatically on login. If you prefer to start the Lume API service manually when needed, you can use the `--no-background-service` option:

 ```bash
@@ -40,8 +43,11 @@ By default, Lume is installed as a background service that starts automatically
 ```

 <Callout title="Note">
-With this option, you'll need to manually start the Lume API service by running `lume serve` in your terminal whenever you need to use tools or libraries that rely on the Lume API (such as the Computer-Use Agent).
+With this option, you'll need to manually start the Lume API service by running `lume serve` in
+your terminal whenever you need to use tools or libraries that rely on the Lume API (such as the
+Computer-Use Agent).
 </Callout>

 ## Manual Download and Installation
-You can also download the `lume.pkg.tar.gz` archive from the [latest release](https://github.com/trycua/cua/releases?q=lume&expanded=true), extract it, and install the package manually.
+
+You can also download the `lume.pkg.tar.gz` archive from the [latest release](https://github.com/trycua/cua/releases?q=lume&expanded=true), extract it, and install the package manually.
@@ -5,24 +5,29 @@ title: Prebuilt Images
 Pre-built images are available in the registry [ghcr.io/trycua](https://github.com/orgs/trycua/packages). These images come with an SSH server pre-configured and auto-login enabled.

 <Callout>
-The default password on pre-built images is `lume`. For the security of your VM, change this password after your first login.
+The default password on pre-built images is `lume`. For the security of your VM, change this
+password after your first login.
 </Callout>

 ## Available Images

 The following pre-built images are available to download via `lume pull`:

-| Image | Tag | Description | Logical Size |
-|-------|------------|-------------|------|
-| `macos-sequoia-vanilla` | `latest`, `15.2` | macOS Sequoia 15.2 image | 20GB |
-| `macos-sequoia-xcode` | `latest`, `15.2` | macOS Sequoia 15.2 image with Xcode command line tools | 22GB |
-| `macos-sequoia-cua` | `latest`, `15.3` | macOS Sequoia 15.3 image compatible with the Computer interface | 24GB |
-| `ubuntu-noble-vanilla` | `latest`, `24.04.1` | [Ubuntu Server for ARM 24.04.1 LTS](https://ubuntu.com/download/server/arm) with Ubuntu Desktop | 20GB |
+| Image | Tag | Description | Logical Size |
+| ----------------------- | ------------------- | ----------------------------------------------------------------------------------------------- | ------------ |
+| `macos-sequoia-vanilla` | `latest`, `15.2` | macOS Sequoia 15.2 image | 20GB |
+| `macos-sequoia-xcode` | `latest`, `15.2` | macOS Sequoia 15.2 image with Xcode command line tools | 22GB |
+| `macos-sequoia-cua` | `latest`, `15.3` | macOS Sequoia 15.3 image compatible with the Computer interface | 24GB |
+| `ubuntu-noble-vanilla` | `latest`, `24.04.1` | [Ubuntu Server for ARM 24.04.1 LTS](https://ubuntu.com/download/server/arm) with Ubuntu Desktop | 20GB |

 ## Disk Space

 For additional disk space, resize the VM disk after pulling the image using the `lume set <name> --disk-size <size>` command. Note that the actual disk space used by sparse images will be much lower than the logical size listed.

 <Callout>
-**Important Note (v0.2.0+):** Images are being re-uploaded with sparse file system optimizations enabled, resulting in significantly lower actual disk usage. Older images (without the `-sparse` suffix) are now **deprecated**. The last version of `lume` fully supporting the non-sparse images was `v0.1.x`. Starting from `v0.2.0`, lume will automatically pull images optimized with sparse file system support.
-</Callout>
+**Important Note (v0.2.0+):** Images are being re-uploaded with sparse file system optimizations
+enabled, resulting in significantly lower actual disk usage. Older images (without the `-sparse`
+suffix) are now **deprecated**. The last version of `lume` fully supporting the non-sparse images
+was `v0.1.x`. Starting from `v0.2.0`, lume will automatically pull images optimized with sparse
+file system support.
+</Callout>
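The Disk Space section notes that sparse images occupy much less disk than the logical sizes in the table. You can confirm this for any file by comparing its logical size (`st_size`) with the blocks actually allocated (`st_blocks`, counted in 512-byte units on macOS and Linux). This is a minimal POSIX-only sketch. The file you inspect, for example a VM disk image, is whatever path you pass in, because the image storage location is not given in this excerpt.

```python
import os
import sys
from typing import Tuple


def disk_usage(path: str) -> Tuple[int, int]:
    """Return (logical_bytes, allocated_bytes) for a file on macOS or Linux."""
    info = os.stat(path)
    # st_blocks is in 512-byte units on macOS and Linux; holes in a sparse
    # file are not allocated, so allocated_bytes can be far below st_size
    return info.st_size, info.st_blocks * 512


def human(num_bytes: float) -> str:
    """Format a byte count with binary (1024-based) units."""
    value = float(num_bytes)
    units = ["B", "KB", "MB", "GB"]
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.1f}{unit}"
        value /= 1024
    return f"{value:.1f}{units[-1]}"


if __name__ == "__main__" and len(sys.argv) > 1:
    logical, allocated = disk_usage(sys.argv[1])
    print(f"logical {human(logical)}, allocated {human(allocated)}")
```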
@@ -39,4 +39,4 @@ docker build -t yourusername/lumier:custom .

 # Push to Docker Hub (after docker login)
 docker push yourusername/lumier:custom
-```
+```
@@ -13,10 +13,10 @@ services:
 container_name: lumier-vm
 restart: unless-stopped
 ports:
-- "8006:8006" # Port for VNC access
+- '8006:8006' # Port for VNC access
 volumes:
-- ./storage:/storage # VM persistent storage
-- ./shared:/shared # Shared folder accessible in the VM
+- ./storage:/storage # VM persistent storage
+- ./shared:/shared # Shared folder accessible in the VM
 environment:
 - VM_NAME=lumier-vm
 - VERSION=ghcr.io/trycua/macos-sequoia-cua:latest
@@ -5,6 +5,7 @@ title: Docker
 You can use Lumier through Docker:

 ### Run a macOS VM (ephemeral)
+
 ```bash
 # Run the container with temporary storage (using pre-built image from Docker Hub)
 docker run -it --rm \
@@ -16,12 +17,15 @@ docker run -it --rm \
 -e RAM_SIZE=8192 \
 trycua/lumier:latest
 ```
+
 Access the VM in your browser at **http://localhost:8006**.

 After running the command above, you can access your macOS VM through a web browser (e.g., http://localhost:8006).

 <Callout title="Note">
-With the basic setup above, your VM will be reset when you stop the container (ephemeral mode). This means any changes you make inside the macOS VM will be lost. See the section below for how to save your VM state.
+With the basic setup above, your VM will be reset when you stop the container (ephemeral mode).
+This means any changes you make inside the macOS VM will be lost. See the section below for how to
+save your VM state.
 </Callout>

 ## Saving Your VM State
@@ -121,4 +125,4 @@ When running Lumier, you'll need to configure a few things:
 - `HOST_STORAGE_PATH`: Path to save VM state (when using persistent storage)
 - `HOST_SHARED_PATH`: Path to the shared folder (optional)

-- **Background service**: The `lume serve` service should be running on your host (starts automatically when you install Lume using the `install.sh` script above).
+- **Background service**: The `lume serve` service should be running on your host (starts automatically when you install Lume using the `install.sh` script above).
@@ -15,7 +15,9 @@ github:
 ## How It Works

 <Callout title="Note">
-We're using Docker primarily as a convenient delivery mechanism, not as an isolation layer. Unlike traditional Docker containers, Lumier leverages the Apple Virtualization Framework (Apple Vz) through the `lume` CLI to create true virtual machines.
+We're using Docker primarily as a convenient delivery mechanism, not as an isolation layer. Unlike
+traditional Docker containers, Lumier leverages the Apple Virtualization Framework (Apple Vz)
+through the `lume` CLI to create true virtual machines.
 </Callout>

 Here's what's happening behind the scenes:
@@ -23,4 +25,4 @@ Here's what's happening behind the scenes:
 1. The Docker container provides a consistent environment to run the Lumier interface
 2. Lumier connects to the Lume service running on your host Mac
 3. Lume uses Apple's Virtualization Framework to create a true macOS virtual machine
-4. The VM runs with hardware acceleration using your Mac's native virtualization capabilities
+4. The VM runs with hardware acceleration using your Mac's native virtualization capabilities
@@ -7,8 +7,9 @@ Before using Lumier, make sure you have:
 1. **Docker for Apple Silicon** - download it [here](https://desktop.docker.com/mac/main/arm64/Docker.dmg) and follow the installation instructions.

 2. **Lume** - This is the virtualization CLI that powers Lumier. Install it with this command:
+
 ```bash
 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
 ```

-After installation, Lume runs as a background service and listens on port 7777. This service allows Lumier to create and manage virtual machines. If port 7777 is already in use on your system, you can specify a different port with the `--port` option when running the `install.sh` script.
+After installation, Lume runs as a background service and listens on port 7777. This service allows Lumier to create and manage virtual machines. If port 7777 is already in use on your system, you can specify a different port with the `--port` option when running the `install.sh` script.
@@ -17,4 +17,4 @@ To use with Cursor, add an MCP configuration file in one of these locations:

 After configuration, you can simply tell Cursor's Agent to perform computer tasks by explicitly mentioning the CUA agent, such as "Use the computer control tools to open Safari."

-For more information on MCP with Cursor, see the [official Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol).
+For more information on MCP with Cursor, see the [official Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol).
@@ -4,7 +4,7 @@ title: Configuration

 The server is configured using environment variables (can be set in the Claude Desktop config):

-| Variable | Description | Default |
-|----------|-------------|---------|
+| Variable | Description | Default |
+| ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------ |
 | `CUA_MODEL_NAME` | Model string (e.g., "anthropic/claude-3-5-sonnet-20241022", "openai/computer-use-preview", "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", "omniparser+litellm/gpt-4o", "omniparser+ollama_chat/gemma3") | anthropic/claude-3-5-sonnet-20241022 |
-| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
+| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
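The configuration table above lists two environment variables and their defaults. This is a minimal sketch of resolving them the same way on the client side, for example to log which model an MCP setup will use. It illustrates the documented defaults only and is not the server's own configuration code. The `McpServerSettings` and `load_settings` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the configuration table
DEFAULT_MODEL_NAME = "anthropic/claude-3-5-sonnet-20241022"
DEFAULT_MAX_IMAGES = 3


@dataclass(frozen=True)
class McpServerSettings:
    model_name: str
    max_images: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> McpServerSettings:
    """Resolve CUA_MODEL_NAME and CUA_MAX_IMAGES, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_max = env.get("CUA_MAX_IMAGES", "")
    return McpServerSettings(
        model_name=env.get("CUA_MODEL_NAME") or DEFAULT_MODEL_NAME,
        # int() raises ValueError on a malformed value instead of silently ignoring it
        max_images=int(raw_max) if raw_max.strip() else DEFAULT_MAX_IMAGES,
    )
```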
@@ -6,4 +6,4 @@ github:
 - https://github.com/trycua/cua/tree/main/libs/python/mcp-server
 ---

-**cua-mcp-server** is a MCP server for the Computer-Use Agent (CUA), allowing you to run CUA through Claude Desktop or other MCP clients.
+**cua-mcp-server** is a MCP server for the Computer-Use Agent (CUA), allowing you to run CUA through Claude Desktop or other MCP clients.
@@ -9,8 +9,9 @@ pip install cua-mcp-server
 ```

 This will install:
+
 - The MCP server
-- CUA agent and computer dependencies
+- CUA agent and computer dependencies
 - An executable `cua-mcp-server` script in your PATH

 ## Easy Setup Script
@@ -22,6 +23,7 @@ curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/python/mcp-ser
 ```

 This script will:
+
 - Create the ~/.cua directory if it doesn't exist
 - Generate a startup script at ~/.cua/start_mcp_server.sh
 - Make the script executable
@@ -30,7 +32,7 @@ This script will:
 You can then use the script in your MCP configuration like this:

 ```json
-{
+{
 "mcpServers": {
 "cua-agent": {
 "command": "/bin/bash",
@@ -48,6 +50,7 @@ You can then use the script in your MCP configuration like this:
 If you get a `/bin/bash: ~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, try changing the path to the script to be absolute instead of relative.

 To see the logs:
+
 ```
 tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
-```
+```
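The troubleshooting tip above recommends an absolute script path. One way to guarantee that is to generate the `cua-agent` entry programmatically and expand `~` first. This is a minimal sketch under stated assumptions. The script location `~/.cua/start_mcp_server.sh` comes from the setup script description. The `args` and `env` keys follow the common MCP client config shape, because only `"command"` is visible in the excerpt above. The helper names are hypothetical.

```python
import json
import os
from typing import Any, Dict, Optional

# Location produced by the setup script described above
STARTUP_SCRIPT = "~/.cua/start_mcp_server.sh"


def cua_agent_entry(script: str = STARTUP_SCRIPT,
                    env: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build the "cua-agent" server entry with an absolute script path.

    Assumption: the entry uses "args"/"env" keys next to "command".
    """
    entry: Dict[str, Any] = {
        "command": "/bin/bash",
        # Expand "~" so /bin/bash never receives an unexpanded or relative path
        "args": [os.path.abspath(os.path.expanduser(script))],
    }
    if env:
        entry["env"] = dict(env)  # e.g. CUA_MODEL_NAME, CUA_MAX_IMAGES
    return entry


def add_to_config(config: Dict[str, Any], entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an MCP client config with the cua-agent entry added or replaced."""
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    updated.setdefault("mcpServers", {})["cua-agent"] = entry
    return updated


if __name__ == "__main__":
    entry = cua_agent_entry(env={"CUA_MODEL_NAME": "anthropic/claude-3-5-sonnet-20241022"})
    print(json.dumps(add_to_config({}, entry), indent=2))
```

Returning a copy instead of editing in place makes it easy to diff the new config against the old one before writing it back.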
@@ -1,6 +1,7 @@
 ---
 title: LLM Integrations
 ---

 ## LiteLLM Integration
+
 This MCP server features comprehensive liteLLM integration, allowing you to use any supported LLM provider with a simple model string configuration.
@@ -10,7 +11,8 @@ This MCP server features comprehensive liteLLM integration, allowing you to use
 - **Extensive Provider Support**: Works with Anthropic, OpenAI, local models, and any liteLLM-compatible provider

 ### Model String Examples:
+
 - **Anthropic**: `"anthropic/claude-3-5-sonnet-20241022"`
 - **OpenAI**: `"openai/computer-use-preview"`
 - **UI-TARS**: `"huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"`
-- **Omni + Any LiteLLM**: `"omniparser+litellm/gpt-4o"`, `"omniparser+litellm/claude-3-haiku"`, `"omniparser+ollama_chat/gemma3"`
+- **Omni + Any LiteLLM**: `"omniparser+litellm/gpt-4o"`, `"omniparser+litellm/claude-3-haiku"`, `"omniparser+ollama_chat/gemma3"`
@@ -7,4 +7,4 @@ title: Tools
 The MCP server exposes the following tools to Claude:

 1. `run_cua_task` - Run a single Computer-Use Agent task with the given instruction
-2. `run_multi_cua_tasks` - Run multiple tasks in sequence
+2. `run_multi_cua_tasks` - Run multiple tasks in sequence
@@ -16,5 +16,6 @@ Claude will automatically use your CUA agent to perform these tasks.
 ### First-time Usage Notes

 **API Keys**: Ensure you have valid API keys:
-- Add your Anthropic API key, or other model provider API key in the Claude Desktop config (as shown above)
-- Or set it as an environment variable in your shell profile
+
+- Add your Anthropic API key, or other model provider API key in the Claude Desktop config (as shown above)
+- Or set it as an environment variable in your shell profile
@@ -5,18 +5,28 @@ title: Configuration
 ### Detection Parameters

 #### Box Threshold (0.3)

 Controls the confidence threshold for accepting detections:
-<img src="/docs/img/som_box_threshold.png" alt="Illustration of confidence thresholds in object detection, with a high-confidence detection accepted and a low-confidence detection rejected." width="500px" />
-- Higher values (0.3) yield more precise but fewer detections
-- Lower values (0.01) catch more potential icons but increase false positives
-- Default is 0.3 for optimal precision/recall balance
+
+<img
+src="/docs/img/som_box_threshold.png"
+alt="Illustration of confidence thresholds in object detection, with a high-confidence detection accepted and a low-confidence detection rejected."
+width="500px"
+/>
+- Higher values (0.3) yield more precise but fewer detections - Lower values (0.01) catch more
+potential icons but increase false positives - Default is 0.3 for optimal precision/recall balance
+
 #### IOU Threshold (0.1)

 Controls how overlapping detections are merged:
-<img src="/docs/img/som_iou_threshold.png" alt="Diagram showing Intersection over Union (IOU) with low overlap between two boxes kept separate and high overlap leading to merging." width="500px" />
-- Lower values (0.1) more aggressively remove overlapping boxes
-- Higher values (0.5) allow more overlapping detections
-- Default is 0.1 to handle densely packed UI elements
+
+<img
+src="/docs/img/som_iou_threshold.png"
+alt="Diagram showing Intersection over Union (IOU) with low overlap between two boxes kept separate and high overlap leading to merging."
+width="500px"
+/>
+- Lower values (0.1) more aggressively remove overlapping boxes - Higher values (0.5) allow more
+overlapping detections - Default is 0.1 to handle densely packed UI elements
+
 ### OCR Configuration

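The two detection parameters above interact as follows. The box threshold drops low-confidence detections. The IOU threshold then decides how much two surviving boxes may overlap before one is discarded. This is a minimal sketch of that pipeline using classic greedy non-maximum suppression with the documented defaults (0.3 and 0.1). It illustrates the general technique and is not the SOM library's actual code, which may merge boxes rather than simply dropping them.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections: List[Tuple[Box, float]], box_threshold: float = 0.3,
                      iou_threshold: float = 0.1) -> List[Tuple[Box, float]]:
    """Keep confident boxes, then greedily suppress overlapping ones (classic NMS)."""
    candidates = [d for d in detections if d[1] >= box_threshold]
    candidates.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
    kept: List[Tuple[Box, float]] = []
    for box, score in candidates:
        # A lower iou_threshold rejects boxes at smaller overlaps (more aggressive)
        if all(iou(box, other) <= iou_threshold for other, _ in kept):
            kept.append((box, score))
    return kept


dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((20, 20, 30, 30), 0.5), ((40, 40, 50, 50), 0.2)]
print(filter_detections(dets))
# [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.5)]
```

In this example the 0.2 box falls below the box threshold. The 0.8 box overlaps the 0.9 box with an IOU of about 0.68, so it is suppressed at the default 0.1 and would only survive with an IOU threshold above 0.68.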
@@ -37,6 +47,7 @@ Controls how overlapping detections are merged:
 ### Hardware Acceleration

 #### MPS (Metal Performance Shaders)
+
 - Multi-scale detection (640px, 1280px, 1920px)
 - Test-time augmentation enabled
 - Half-precision (FP16)
@@ -44,6 +55,7 @@ Controls how overlapping detections are merged:
 - Best for production use when available

 #### CPU
+
 - Single-scale detection (1280px)
 - Full-precision (FP32)
 - Average detection time: ~1.3s
@@ -63,4 +75,4 @@ examples/output/
 │ └── screenshot_analyzed.png
 ├── screen_details.txt
 └── summary.json
-```
+```
@@ -6,7 +6,13 @@ github:
 - https://github.com/trycua/cua/tree/main/libs/python/som
 ---

-<Callout>A corresponding <a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">Python example</a> is available for this documentation.</Callout>
+<Callout>
+A corresponding{' '}
+<a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">
+Python example
+</a>{' '}
+is available for this documentation.
+</Callout>

 ## Overview

@@ -35,7 +35,7 @@ You can run your Cua computer in the cloud (recommended for easiest setup), loca
 <Tab value="🍎 Lume">

 Lume containers are macOS virtual machines that run on a macOS host machine.
-
+
 1. Install the Lume CLI:

 ```bash
@@ -51,8 +51,8 @@ You can run your Cua computer in the cloud (recommended for easiest setup), loca
 </Tab>
 <Tab value="🪟 Windows Sandbox">

-Windows Sandbox provides Windows virtual environments that run on a Windows host machine.
-
+Windows Sandbox provides Windows virtual environments that run on a Windows host machine.
+
 1. Enable [Windows Sandbox](https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-install) (requires Windows 10 Pro/Enterprise or Windows 11)
 2. Install the `pywinsandbox` dependency:

@@ -65,8 +65,8 @@ You can run your Cua computer in the cloud (recommended for easiest setup), loca
 </Tab>
 <Tab value="🐳 Docker">

-Docker provides a way to run Ubuntu containers on any host machine.
-
+Docker provides a way to run Ubuntu containers on any host machine.
+
 1. Install Docker Desktop or Docker Engine:

 2. Pull the CUA Ubuntu sandbox:
@@ -173,6 +173,7 @@ Connect to your Cua computer and perform basic interactions, such as taking scre
 finally:
 await computer.close()
 ```
+
 </Tab>
 <Tab value="TypeScript">
 Install the Cua computer TypeScript SDK:
@@ -260,6 +261,7 @@ Connect to your Cua computer and perform basic interactions, such as taking scre
 await computer.close();
 }
 ```
+
 </Tab>
 </Tabs>

@@ -274,11 +276,13 @@ Learn more about computers in the [Cua computers documentation](/computer-sdk/co
 Utilize an Agent to automate complex tasks by providing it with a goal and allowing it to interact with the computer environment.

 Install the Cua agent Python SDK:
+
 ```bash
 pip install "cua-agent[all]"
 ```

 Then, use the `ComputerAgent` object:
+
 ```python
 from agent import ComputerAgent

@@ -24,6 +24,7 @@ Basic performance metrics and system information that help us understand usage p
 ### Opt-In Telemetry (Disabled by Default)

 **Conversation Trajectory Logging**: Full conversation history including:
+
 - User messages and agent responses
 - Computer actions and their outputs
 - Reasoning traces from the agent
@@ -123,21 +124,21 @@ Note that telemetry settings must be configured during initialization and cannot

 ### Computer SDK Events

-| Event Name | Data Collected | Trigger Notes |
-|------------|----------------|---------------|
-| **computer_initialized** | • `os`: Operating system (e.g., 'windows', 'darwin', 'linux')<br />• `os_version`: OS version<br />• `python_version`: Python version | Triggered when a Computer instance is created |
-| **module_init** | • `module`: "computer"<br />• `version`: Package version<br />• `python_version`: Full Python version string | Triggered once when the computer package is imported for the first time |
+| Event Name | Data Collected | Trigger Notes |
+| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
+| **computer_initialized** | • `os`: Operating system (e.g., 'windows', 'darwin', 'linux')<br />• `os_version`: OS version<br />• `python_version`: Python version | Triggered when a Computer instance is created |
+| **module_init** | • `module`: "computer"<br />• `version`: Package version<br />• `python_version`: Full Python version string | Triggered once when the computer package is imported for the first time |

 ### Agent SDK Events

-| Event Name | Data Collected | Trigger Notes |
-|------------|----------------|---------------|
-| **module_init** | • `module`: "agent"<br />• `version`: Package version<br />• `python_version`: Full Python version string | Triggered once when the agent package is imported for the first time |
-| **agent_session_start** | • `session_id`: Unique UUID for this agent instance<br />• `agent_type`: Class name (e.g., "ComputerAgent")<br />• `model`: Model name (e.g., "claude-3-5-sonnet")<br />• `os`: Operating system<br />• `os_version`: OS version<br />• `python_version`: Python version | Triggered when TelemetryCallback is initialized (agent instantiation) |
-| **agent_run_start** | • `session_id`: Agent session UUID<br />• `run_id`: Unique UUID for this run<br />• `start_time`: Unix timestamp<br />• `input_context_size`: Character count of input messages<br />• `num_existing_messages`: Count of existing messages<br />• `uploaded_trajectory`: Full conversation items (opt-in) | Triggered at the start of each agent.run() call |
-| **agent_run_end** | • `session_id`: Agent session UUID<br />• `run_id`: Run UUID<br />• `end_time`: Unix timestamp<br />• `duration_seconds`: Total run duration<br />• `num_steps`: Total steps taken in this run<br />• `total_usage`: Accumulated token usage and costs<br />• `uploaded_trajectory`: Full conversation items (opt-in) | Triggered at the end of each agent.run() call |
|
||||
| **agent_step** | • `session_id`: Agent session UUID<br />• `run_id`: Run UUID<br />• `step`: Step number (incremental)<br />• `timestamp`: Unix timestamp<br />• `duration_seconds`: Duration of previous step | Triggered on each agent response/step during a run |
|
||||
| **agent_usage** | • `session_id`: Agent session UUID<br />• `run_id`: Run UUID<br />• `step`: Current step number<br />• `prompt_tokens`: Tokens in prompt<br />• `completion_tokens`: Tokens in response<br />• `total_tokens`: Total tokens used<br />• `response_cost`: Cost of this API call | Triggered whenever usage information is received from LLM API |
|
||||
| Event Name | Data Collected | Trigger Notes |
|
||||
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
|
||||
| **module_init** | • `module`: "agent"<br />• `version`: Package version<br />• `python_version`: Full Python version string | Triggered once when the agent package is imported for the first time |
|
||||
| **agent_session_start** | • `session_id`: Unique UUID for this agent instance<br />• `agent_type`: Class name (e.g., "ComputerAgent")<br />• `model`: Model name (e.g., "claude-3-5-sonnet")<br />• `os`: Operating system<br />• `os_version`: OS version<br />• `python_version`: Python version | Triggered when TelemetryCallback is initialized (agent instantiation) |
|
||||
| **agent_run_start** | • `session_id`: Agent session UUID<br />• `run_id`: Unique UUID for this run<br />• `start_time`: Unix timestamp<br />• `input_context_size`: Character count of input messages<br />• `num_existing_messages`: Count of existing messages<br />• `uploaded_trajectory`: Full conversation items (opt-in) | Triggered at the start of each agent.run() call |
|
||||
| **agent_run_end** | • `session_id`: Agent session UUID<br />• `run_id`: Run UUID<br />• `end_time`: Unix timestamp<br />• `duration_seconds`: Total run duration<br />• `num_steps`: Total steps taken in this run<br />• `total_usage`: Accumulated token usage and costs<br />• `uploaded_trajectory`: Full conversation items (opt-in) | Triggered at the end of each agent.run() call |
|
||||
| **agent_step** | • `session_id`: Agent session UUID<br />• `run_id`: Run UUID<br />• `step`: Step number (incremental)<br />• `timestamp`: Unix timestamp<br />• `duration_seconds`: Duration of previous step | Triggered on each agent response/step during a run |
|
||||
| **agent_usage** | • `session_id`: Agent session UUID<br />• `run_id`: Run UUID<br />• `step`: Current step number<br />• `prompt_tokens`: Tokens in prompt<br />• `completion_tokens`: Tokens in response<br />• `total_tokens`: Total tokens used<br />• `response_cost`: Cost of this API call | Triggered whenever usage information is received from LLM API |
|
||||
|
||||
## Transparency
|
||||
|
||||
|
||||
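The Agent SDK events table pairs a per-call `agent_usage` event (`prompt_tokens`, `completion_tokens`, `total_tokens`, `response_cost`) with a run-level `total_usage` field on `agent_run_end`, described only as "accumulated token usage and costs". The following is a minimal sketch of one plausible way such per-call reports could be summed into a run total. It assumes that simple summation is what "accumulated" means. The `UsageEvent`, `RunUsage`, and `summarize_run` names are hypothetical and are not part of the Cua SDK or its `TelemetryCallback`.

```python
from dataclasses import dataclass


@dataclass
class UsageEvent:
    """Hypothetical record mirroring the fields listed for `agent_usage`."""
    step: int
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    response_cost: float


@dataclass
class RunUsage:
    """Hypothetical run-level total, analogous to `total_usage` on `agent_run_end`."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    response_cost: float = 0.0
    num_events: int = 0

    def add(self, event: UsageEvent) -> None:
        # Assumption: accumulation is a plain sum of each per-call report.
        self.prompt_tokens += event.prompt_tokens
        self.completion_tokens += event.completion_tokens
        self.total_tokens += event.total_tokens
        self.response_cost += event.response_cost
        self.num_events += 1


def summarize_run(events: list[UsageEvent]) -> RunUsage:
    """Fold every usage event from one run into a single RunUsage total."""
    total = RunUsage()
    for event in events:
        total.add(event)
    return total


if __name__ == "__main__":
    # Illustrative numbers only; they are not real telemetry data.
    events = [
        UsageEvent(step=1, prompt_tokens=1200, completion_tokens=80, total_tokens=1280, response_cost=0.004),
        UsageEvent(step=2, prompt_tokens=1500, completion_tokens=60, total_tokens=1560, response_cost=0.005),
    ]
    print(summarize_run(events))
```

Keeping the accumulator as a separate object means each `agent_usage` report can be added as soon as it arrives, so a run total would be ready when the run ends without replaying earlier events.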