mirror of
https://github.com/trycua/computer.git
synced 2026-02-17 20:10:07 -06:00
restore readme
This commit is contained in:
262
README.md
@@ -13,7 +13,7 @@
<a href="https://trendshift.io/repositories/13685" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13685" alt="trycua%2Fcua | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>

**c/ua** ("koo-ah") is Docker for [Computer-Use Agents](https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse) - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.

<div align="center">
<video src="https://github.com/user-attachments/assets/c619b4ea-bb8e-4382-860e-f3757e36af20" width="800" controls></video>
@@ -47,25 +47,146 @@
</details>
</details><br/>

# 🚀 Quick Start with a Computer-Use Agent UI

**Need to automate desktop tasks? Launch the Computer-Use Agent UI with a single command.**

### Option 1: Fully-managed install with Docker (recommended)

*Docker-based guided install for quick use*

**macOS/Linux/Windows (via WSL):**

```bash
# Requires Docker
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground-docker.sh)"
```

This script will guide you through setup using Docker containers and launch the Computer-Use Agent UI.

---

### Option 2: [Dev Container](./.devcontainer/README.md)

*Best for contributors and development*

This repository includes a [Dev Container](./.devcontainer/README.md) configuration that simplifies setup to a few steps:

1. **Install the Dev Containers extension ([VS Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or [WindSurf](https://docs.windsurf.com/windsurf/advanced#dev-containers-beta))**
2. **Open the repository in the Dev Container:**
   - Press `Ctrl+Shift+P` (or `⌘+Shift+P` on macOS)
   - Select `Dev Containers: Clone Repository in Container Volume...` and paste the repository URL `https://github.com/trycua/cua.git` (if you have not cloned it yet), or select `Dev Containers: Open Folder in Container...` (if you already cloned it).
   > **Note**: On WindSurf, the post-install hook might not run automatically. If so, run `/bin/bash .devcontainer/post-install.sh` manually.
3. **Open the VS Code workspace:** Once `post-install.sh` finishes, open the `.vscode/py.code-workspace` workspace and press **Open Workspace**.
4. **Run the Agent UI example:** Click to start the Gradio UI. If prompted to install **debugpy (Python Debugger)** to enable remote debugging, select 'Yes' to proceed.
5. **Access the Gradio UI:** The Gradio UI will be available at `http://localhost:7860` and is automatically forwarded to your host machine.
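
Before opening the browser, you may want to confirm that something is listening on port 7860. The sketch below uses only the Python standard library and is not part of cua: `wait_for_port` is a hypothetical helper that retries a TCP connection until it succeeds or a timeout expires. A successful connection only shows that a server is accepting connections, not that the Agent UI has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 7860, timeout: float = 60.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass (hypothetical helper).

    Returns True as soon as something accepts the connection, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing is listening
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)  # back off briefly before the next attempt


if __name__ == "__main__":
    if wait_for_port():
        print("Agent UI port is open: http://localhost:7860")
    else:
        print("Nothing is listening on port 7860 yet")
```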

---

### Option 3: PyPI

*Direct Python package installation*

```bash
# conda create -yn cua python=3.12 && conda activate cua

pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui # Start the agent UI
```

Or check out the [Usage Guide](#-usage-guide) to learn how to use our Python SDK in your own code.
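
To check that both packages were installed into the environment you are actually using (for example, the `cua` conda environment), you can ask the standard library's `importlib.metadata` for their installed versions. This is a generic sketch, not a cua command. `installed_versions` is a hypothetical helper, and the distribution names come from the `pip install` line above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists=("cua-computer", "cua-agent")):
    """Map each distribution name to its installed version, or None if it is missing (hypothetical helper)."""
    found = {}
    for dist in dists:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None  # not installed in the active interpreter's environment
    return found


if __name__ == "__main__":
    for dist, ver in installed_versions().items():
        print(f"{dist}: {ver or 'NOT INSTALLED'}")
```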

---

## Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops)

- [UI-TARS-1.5](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Run locally on Apple Silicon with MLX, or use cloud providers
- [OpenAI CUA](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Use OpenAI's Computer-Use Preview model
- [Anthropic CUA](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Use Anthropic's Computer-Use capabilities
- [OmniParser-v2.0](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model

## 🖥️ Compatibility

For detailed compatibility information, including host OS support, VM emulation capabilities, and model provider compatibility, see the [Compatibility Matrix](./COMPATIBILITY.md).

<br/>
<br/>

# 🐍 Usage Guide

Follow these steps to use C/ua in your own Python code. See the [Developer Guide](./docs/Developer-Guide.md) for building from source.

### Step 1: Install Lume CLI

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```

Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon.

### Step 2: Pull the macOS CUA Image

```bash
lume pull macos-sequoia-cua:latest
```

The macOS CUA image contains the default Mac apps and the Computer Server for easy automation.

### Step 3: Install Python SDK

```bash
pip install "cua-computer[all]" "cua-agent[all]"
```

### Step 4: Use in Your Code

```python
import asyncio

from computer import Computer
from agent import ComputerAgent


async def main():
    # Start a local macOS VM
    computer = Computer(os_type="macos")

    # Or, instead of the local VM, use a C/ua Cloud Container
    # computer = Computer(
    #     os_type="linux",
    #     api_key="your_cua_api_key_here",
    #     name="your_container_name_here",
    # )

    await computer.run()

    # Example: Direct control of the VM with Computer
    computer.interface.delay = 0.1  # Wait 0.1 seconds between keyboard/mouse actions
    await computer.interface.left_click(100, 200)
    await computer.interface.type_text("Hello, world!")
    screenshot_bytes = await computer.interface.screenshot()

    # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
    agent = ComputerAgent(
        model="mlx/mlx-community/UI-TARS-1.5-7B-6bit",
        tools=[computer],
    )
    async for result in agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide"):
        print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

For ready-to-use examples, check out our [Notebooks](./notebooks/) collection.

### Lume CLI Reference

```bash
# Install Lume CLI and background service
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash

# List all VMs
lume ls

# Pull a VM image
lume pull macos-sequoia-cua:latest

@@ -77,9 +198,12 @@ lume run macos-sequoia-cua:latest

# Stop a VM
lume stop macos-sequoia-cua_latest

# Delete a VM
lume delete macos-sequoia-cua_latest
```

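In the reference above, the VM created from the image `macos-sequoia-cua:latest` is addressed as `macos-sequoia-cua_latest` by `lume stop` and `lume delete`. If you drive Lume from Python, a thin `subprocess` wrapper like the sketch below keeps that mapping in one place. Both helper names are hypothetical, and the `:` to `_` rule is inferred from the commands shown here rather than taken from Lume's documentation.

```python
import shutil
import subprocess


def vm_name_from_image(image: str) -> str:
    """Map an image reference such as 'macos-sequoia-cua:latest' to the VM name
    used by `lume stop` / `lume delete` ('macos-sequoia-cua_latest').

    Assumption: the ':' -> '_' rule is inferred from the commands in this README.
    """
    return image.replace(":", "_")


def lume(*args: str) -> str:
    """Run a lume subcommand and return its stdout (hypothetical wrapper).

    Raises FileNotFoundError if lume is not on PATH and CalledProcessError if the command fails.
    """
    exe = shutil.which("lume")
    if exe is None:
        raise FileNotFoundError("lume is not on PATH; install it with the script above")
    completed = subprocess.run([exe, *args], capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    image = "macos-sequoia-cua:latest"
    print(lume("ls"))                 # same as running `lume ls`
    print(vm_name_from_image(image))  # name to pass to `lume stop` / `lume delete`
```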

### Lumier CLI Reference

For advanced container-like virtualization, check out [Lumier](./libs/lumier/README.md) - a Docker interface for macOS and Linux VMs.

@@ -102,15 +226,15 @@ docker run -it --rm \
trycua/lumier:latest
```

## Resources

- [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/python/mcp-server/README.md) - One of the easiest ways to get started with C/ua
- [How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/python/agent/README.md)
- [How to use Lume CLI for managing desktops](./libs/lume/README.md)
- [Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1)
- [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1)

## Modules

| Module | Description | Installation |
|--------|-------------|---------------|
@@ -125,6 +249,112 @@ docker run -it --rm \
| [**Core (Python)**](./libs/python/core/README.md) | Python Core utilities | `pip install cua-core` |
| [**Core (TypeScript)**](./libs/typescript/core/README.md) | TypeScript Core utilities | `npm install @trycua/core` |

## Computer Interface Reference

For complete examples, see [computer_examples.py](./examples/computer_examples.py) or [computer_nb.ipynb](./notebooks/computer_nb.ipynb).

```python
# Assumes `computer` is a started Computer instance and this runs in an async context (e.g. a notebook)

# Shell Actions
result = await computer.interface.run_command(cmd) # Run shell command
# result.stdout, result.stderr, result.returncode

# Mouse Actions
await computer.interface.left_click(x, y) # Left click at coordinates
await computer.interface.right_click(x, y) # Right click at coordinates
await computer.interface.double_click(x, y) # Double click at coordinates
await computer.interface.move_cursor(x, y) # Move cursor to coordinates
await computer.interface.drag_to(x, y, duration) # Drag to coordinates
await computer.interface.get_cursor_position() # Get current cursor position
await computer.interface.mouse_down(x, y, button="left") # Press and hold a mouse button
await computer.interface.mouse_up(x, y, button="left") # Release a mouse button

# Keyboard Actions
await computer.interface.type_text("Hello") # Type text
await computer.interface.press_key("enter") # Press a single key
await computer.interface.hotkey("command", "c") # Press key combination
await computer.interface.key_down("command") # Press and hold a key
await computer.interface.key_up("command") # Release a key

# Scrolling Actions
await computer.interface.scroll(x, y) # Scroll the mouse wheel
await computer.interface.scroll_down(clicks) # Scroll down
await computer.interface.scroll_up(clicks) # Scroll up

# Screen Actions
await computer.interface.screenshot() # Take a screenshot
await computer.interface.get_screen_size() # Get screen dimensions

# Clipboard Actions
await computer.interface.set_clipboard(text) # Set clipboard content
await computer.interface.copy_to_clipboard() # Get clipboard content

# File System Operations
await computer.interface.file_exists(path) # Check if file exists
await computer.interface.directory_exists(path) # Check if directory exists
await computer.interface.read_text(path, encoding="utf-8") # Read file content
await computer.interface.write_text(path, content, encoding="utf-8") # Write file content
await computer.interface.read_bytes(path) # Read file content as bytes
await computer.interface.write_bytes(path, content) # Write file content as bytes
await computer.interface.delete_file(path) # Delete file
await computer.interface.create_dir(path) # Create directory
await computer.interface.delete_dir(path) # Delete directory
await computer.interface.list_dir(path) # List directory contents

# Accessibility
await computer.interface.get_accessibility_tree() # Get accessibility tree

# Delay Configuration
# Set default delay between all actions (in seconds)
computer.interface.delay = 0.5 # 500ms delay between actions

# Or specify delay for individual actions
await computer.interface.left_click(x, y, delay=1.0) # 1 second delay after click
await computer.interface.type_text("Hello", delay=0.2) # 200ms delay after typing
await computer.interface.press_key("enter", delay=0.5) # 500ms delay after key press

# Python Virtual Environment Operations
await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment
await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'") # Run a shell command in a virtual environment
await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception

# Example: Use sandboxed functions to execute code in a C/ua Container
from computer.helpers import sandboxed

@sandboxed("demo_venv")
def greet_and_print(name):
    """Greet `name` and return the HTML of the current Safari tab."""
    import PyXA
    safari = PyXA.Application("Safari")
    html = safari.current_document.source()
    print(f"Hello from inside the container, {name}!")
    return {"greeted": name, "safari_html": html}

# When a @sandboxed function is called, it will execute in the container
result = await greet_and_print("C/ua")
# Result: {"greeted": "C/ua", "safari_html": "<html>...</html>"}
# stdout and stderr are also captured and printed / raised
print("Result from sandboxed function:", result)
```
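
Several of the primitives above can be combined into higher-level gestures. The sketch below builds a multi-point drag from `mouse_down`, `move_cursor`, and `mouse_up`, and a modifier-held click from `key_down`, `left_click`, and `key_up`. In both cases the button or key is released in a `finally` block, so a failed step does not leave it stuck. For a simple straight-line drag, `drag_to` from the reference is enough. The helper names `drag_path` and `click_with_modifier` are hypothetical. `RecordingInterface` is an illustrative stand-in that records calls, so the sketch runs without a VM.

```python
import asyncio


async def drag_path(interface, points, button="left"):
    """Press at the first point, move through the rest, release at the last (hypothetical helper)."""
    (x0, y0), *rest = points
    await interface.mouse_down(x0, y0, button=button)
    last = (x0, y0)
    try:
        for x, y in rest:
            await interface.move_cursor(x, y)
            last = (x, y)
    finally:
        # Release even if a move fails, so the button is not left held down
        await interface.mouse_up(*last, button=button)


async def click_with_modifier(interface, key, x, y):
    """Hold `key` (e.g. "shift") while left-clicking at (x, y) (hypothetical helper)."""
    await interface.key_down(key)
    try:
        await interface.left_click(x, y)
    finally:
        # Release the modifier even if the click fails
        await interface.key_up(key)


class RecordingInterface:
    """Offline stand-in for computer.interface: every awaited method call is recorded."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        async def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
        return record


if __name__ == "__main__":
    fake = RecordingInterface()
    asyncio.run(drag_path(fake, [(100, 100), (200, 150), (300, 200)]))
    asyncio.run(click_with_modifier(fake, "shift", 50, 60))
    for call in fake.calls:
        print(call)
    # With a started Computer, pass computer.interface instead of the fake.
```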
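
`venv_exec` and the `@sandboxed` decorator hide how a local function ends up running inside the container's virtual environment. One general way to do this is to send the function's source and JSON-encoded arguments to another Python interpreter, then read a JSON result back. The sketch below illustrates that idea only. It is not cua's implementation. It runs the function in a local subprocess using `sys.executable`; in a real setup, the interpreter would live inside the container, and the transport would be cua's own, which this sketch does not model. `run_isolated` is a hypothetical name. It handles only undecorated top-level functions (or a source string) whose arguments and return value are JSON-serializable.

```python
import inspect
import json
import subprocess
import sys
import textwrap

# Program run by the child interpreter: read a JSON payload from stdin, define the
# function from its source, call it, and print a JSON result envelope after a marker.
_RUNNER = textwrap.dedent(
    """
    import json, sys
    payload = json.loads(sys.stdin.read())
    namespace = {}
    exec(payload["source"], namespace)
    func = namespace[payload["name"]]
    try:
        value = func(*payload["args"], **payload["kwargs"])
        envelope = {"ok": True, "value": value}
    except Exception as exc:
        envelope = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    sys.stdout.write("\\n__RESULT__" + json.dumps(envelope))
    """
)


def run_isolated(func_or_source, *args, name=None, python=sys.executable, **kwargs):
    """Run a function in a separate Python interpreter and return its result (hypothetical helper).

    `func_or_source` is either an undecorated top-level function (its source is read with
    inspect.getsource) or a string of source code that defines the function `name`.
    """
    if callable(func_or_source):
        source = textwrap.dedent(inspect.getsource(func_or_source))
        name = func_or_source.__name__
    elif name is None:
        raise ValueError("pass name= when giving source code as a string")
    else:
        source = func_or_source
    payload = json.dumps({"source": source, "name": name, "args": list(args), "kwargs": kwargs})
    proc = subprocess.run([python, "-c", _RUNNER], input=payload,
                          capture_output=True, text=True, check=True)
    printed, _, raw = proc.stdout.rpartition("\n__RESULT__")
    if printed:
        print(printed, end="")  # forward whatever the function printed
    envelope = json.loads(raw)
    if not envelope["ok"]:
        raise RuntimeError(envelope["error"])  # surface the child's exception in the caller
    return envelope["value"]


if __name__ == "__main__":
    source = "def greet(name):\n    print(f'Hello from a child interpreter, {name}!')\n    return {'greeted': name}\n"
    print(run_isolated(source, "C/ua", name="greet"))
```

This design keeps the result contract simple: a return value comes back as data, and an exception in the child is re-raised in the caller. The reference above describes the same contract for `venv_exec` ("return the result / raise an exception").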

## ComputerAgent Reference

For complete examples, see [agent_examples.py](./examples/agent_examples.py) or [agent_nb.ipynb](./notebooks/agent_nb.ipynb)

```python
# Import necessary components
from agent import ComputerAgent

# UI-TARS-1.5 agent for local execution with MLX
ComputerAgent(model="mlx/mlx-community/UI-TARS-1.5-7B-6bit")
# OpenAI Computer-Use agent using OPENAI_API_KEY
ComputerAgent(model="computer-use-preview")
# Anthropic Claude agent using ANTHROPIC_API_KEY
ComputerAgent(model="anthropic/claude-3-5-sonnet-20240620")

# OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM
ComputerAgent(model="omniparser+ollama_chat/gemma3:12b-it-q4_K_M")
```
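
The `model` strings above follow a visible pattern. Some have a prefix before the first `/` (`mlx/`, `anthropic/`). In the OmniParser case, a `+` joins the grounding component with the vision model that drives it. The sketch below splits strings along that pattern, which can help when logging or validating configuration. It only illustrates the naming used in this README. `parse_model_string` and `ModelSpec` are hypothetical, and cua's own routing rules may differ, for example in how a bare name such as `computer-use-preview` is resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    grounding: Optional[str]  # part before "+", e.g. "omniparser"
    provider: Optional[str]   # text before the first "/", if any
    model: str                # everything after the provider prefix


def parse_model_string(spec: str) -> ModelSpec:
    """Split a string such as 'omniparser+ollama_chat/gemma3:12b-it-q4_K_M' (hypothetical helper)."""
    grounding, sep, rest = spec.partition("+")
    if not sep:  # no "+": the whole string is the model part
        grounding, rest = None, spec
    provider, sep, model = rest.partition("/")
    if not sep:  # no "/": no provider prefix
        provider, model = None, rest
    return ModelSpec(grounding, provider, model)


if __name__ == "__main__":
    for s in [
        "mlx/mlx-community/UI-TARS-1.5-7B-6bit",
        "computer-use-preview",
        "anthropic/claude-3-5-sonnet-20240620",
        "omniparser+ollama_chat/gemma3:12b-it-q4_K_M",
    ]:
        print(parse_model_string(s))
```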

## Community

Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas, get assistance, or share your demos!

@@ -179,4 +409,4 @@ Thank you to all our supporters!
<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->