From 5c1d7be321a06e3c7aaa8cc16a61760d50d9cd8b Mon Sep 17 00:00:00 2001
From: Morgan Dean
Date: Thu, 31 Jul 2025 18:11:03 +0100
Subject: [PATCH] restore readme

---
 README.md | 262 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 246 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 92dd888f..834931ae 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ trycua%2Fcua | Trendshift
-**cua** ("koo-ah") is Docker for [Computer-Use Agents](https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse) - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.
+**c/ua** ("koo-ah") is Docker for [Computer-Use Agents](https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse) - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.
@@ -47,25 +47,146 @@
-# 🚀 Quick Start
+# 🚀 Quick Start with a Computer-Use Agent UI
 
-Read our guide on getting started with a Computer-Use Agent:
-[Computer-Use Agent Quickstart](https://trycua.com/docs/guides/usage-guide)
+**Need to automate desktop tasks? Launch the Computer-Use Agent UI with a single command.**
 
-Get started using Cua services on your machine:
-[Cua Usage Guide](https://docs.trycua.com/home/guides/cua-usage-guide)
+### Option 1: Fully-managed install with Docker (recommended)
 
-Set up a development environment with the Dev Container:
-[Dev Container Setup](https://docs.trycua.com/home/guides/dev-container-setup)
+*Docker-based guided install for quick use*
 
-## Lume
+**macOS/Linux/Windows (via WSL):**
 
-For managing and creating virtual machines on macOS, check out [Lume](./libs/lume/README.md).
+```bash
+# Requires Docker
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground-docker.sh)"
+```
+
+This script will guide you through setup using Docker containers and launch the Computer-Use Agent UI.
+
+---
+
+### Option 2: [Dev Container](./.devcontainer/README.md)
+
+*Best for contributors and development*
+
+This repository includes a [Dev Container](./.devcontainer/README.md) configuration that simplifies setup to a few steps:
+
+1. **Install the Dev Containers extension ([VS Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or [WindSurf](https://docs.windsurf.com/windsurf/advanced#dev-containers-beta))**
+2. **Open the repository in the Dev Container:**
+    - Press `Ctrl+Shift+P` (or `⌘+Shift+P` on macOS)
+    - Select `Dev Containers: Clone Repository in Container Volume...` and paste the repository URL `https://github.com/trycua/cua.git` (if you have not cloned the repository yet), or select `Dev Containers: Open Folder in Container...` (if you have already cloned it).
+    > **Note**: On WindSurf, the post-install hook might not run automatically. If so, run `/bin/bash .devcontainer/post-install.sh` manually.
+3. **Open the VS Code workspace:** Once `post-install.sh` has finished running, open the `.vscode/py.code-workspace` workspace and press ![Open Workspace](https://github.com/user-attachments/assets/923bdd43-8c8f-4060-8d78-75bfa302b48c).
+4. **Run the Agent UI example:** Click ![Run Agent UI](https://github.com/user-attachments/assets/7a61ef34-4b22-4dab-9864-f86bf83e290b) to start the Gradio UI. If prompted to install **debugpy (Python Debugger)** to enable remote debugging, select 'Yes' to proceed.
+5. **Access the Gradio UI:** The Gradio UI will be available at `http://localhost:7860`, and the port is forwarded to your host machine automatically.
+
+---
+
+### Option 3: PyPI
+
+*Direct Python package installation*
+
+```bash
+# conda create -yn cua python=3.12
+
+pip install -U "cua-computer[all]" "cua-agent[all]"
+python -m agent.ui # Start the agent UI
+```
+
+Or check out the [Usage Guide](#-usage-guide) to learn how to use our Python SDK in your own code.
+
+---
+
+## Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops)
+
+- [UITARS-1.5](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Run locally on Apple Silicon with MLX, or use cloud providers
+- [OpenAI CUA](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Use OpenAI's Computer-Use Preview model
+- [Anthropic CUA](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Use Anthropic's Computer-Use capabilities
+- [OmniParser-v2.0](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model
+
+## 🖥️ Compatibility
+
+For detailed compatibility information including host OS support, VM emulation capabilities, and model provider compatibility, see the [Compatibility Matrix](./COMPATIBILITY.md).
+
+
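Both the Dev Container and the PyPI routes above end with the Gradio UI on `http://localhost:7860`. After `python -m agent.ui` starts, that port can take a few seconds to answer. If you script the launch, a small standard-library poller like the sketch below can wait for it first. `wait_for_ui` and its default timeouts are illustrative assumptions, not part of the cua packages.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str = "http://localhost:7860", timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Hypothetical helper: poll `url` until a server answers or `timeout` seconds pass."""
    # Bypass any configured HTTP proxy: this check only targets the local UI
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval):
                return True  # Got a successful HTTP response
        except urllib.error.HTTPError as err:
            err.close()
            return True  # The server answered, just with an error status
        except OSError:
            # URLError (e.g. connection refused) and socket timeouts are OSError subclasses
            time.sleep(interval)
    return False
```

Any HTTP response counts as "up", including an error status, because it proves something is listening on the port. The proxy handler is disabled on purpose so that an `HTTP_PROXY` environment variable cannot reroute the localhost check.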
+
+# 🐍 Usage Guide
+
+Follow these steps to use C/ua in your own Python code. See [Developer Guide](./docs/Developer-Guide.md) for building from source.
+
+### Step 1: Install Lume CLI
+
+```bash
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
+```
+
+Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon.
+
+### Step 2: Pull the macOS CUA Image
+
+```bash
+lume pull macos-sequoia-cua:latest
+```
+
+The macOS CUA image contains the default Mac apps and the Computer Server for easy automation.
+
+### Step 3: Install Python SDK
+
+```bash
+pip install "cua-computer[all]" "cua-agent[all]"
+```
+
+### Step 4: Use in Your Code
+
+```python
+import asyncio
+
+from computer import Computer
+from agent import ComputerAgent
+
+async def main():
+    # Start a local macOS VM
+    computer = Computer(os_type="macos")
+    await computer.run()
+
+    # Alternatively, use a C/ua Cloud Container instead of the local VM (uncomment, then await computer.run()):
+    # computer = Computer(
+    #     os_type="linux",
+    #     api_key="your_cua_api_key_here",
+    #     name="your_container_name_here"
+    # )
+
+    # Example: Direct control of the VM with Computer
+    computer.interface.delay = 0.1  # Wait 0.1 seconds between keyboard/mouse actions
+    await computer.interface.left_click(100, 200)
+    await computer.interface.type_text("Hello, world!")
+    screenshot_bytes = await computer.interface.screenshot()
+
+    # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
+    agent = ComputerAgent(
+        model="mlx/mlx-community/UI-TARS-1.5-7B-6bit",
+        tools=[computer],
+    )
+    async for result in agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide"):
+        print(result)
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+For ready-to-use examples, check out our [Notebooks](./notebooks/) collection.
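In Step 4, `agent.run(...)` is consumed with `async for`, so results arrive one at a time for as long as the agent keeps working. To put a hard upper bound on a run, one generic asyncio pattern is to drain the iterator inside `asyncio.wait_for` and keep whatever arrived in time. This is a sketch, not a cua API. `collect_with_timeout` is a hypothetical helper, and `fake_run` is a stand-in generator so the snippet runs without a VM or model.

```python
import asyncio
from typing import Any, AsyncIterator, List


async def collect_with_timeout(results: AsyncIterator[Any], seconds: float) -> List[Any]:
    """Drain an async iterator, keeping whatever arrived before `seconds` elapsed."""
    collected: List[Any] = []

    async def drain() -> None:
        async for item in results:
            collected.append(item)

    try:
        await asyncio.wait_for(drain(), timeout=seconds)
    except asyncio.TimeoutError:
        pass  # Out of time: keep the partial results gathered so far
    finally:
        aclose = getattr(results, "aclose", None)
        if aclose is not None:
            await aclose()  # Let an async generator run its cleanup code
    return collected


async def fake_run(task: str, steps: int, delay: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for agent.run(task): yields one result per step."""
    for i in range(steps):
        await asyncio.sleep(delay)
        yield f"{task}: step {i + 1}"


async def main() -> None:
    # With a real agent this would be collect_with_timeout(agent.run("..."), seconds=300)
    for result in await collect_with_timeout(fake_run("demo", steps=10, delay=0.05), seconds=0.2):
        print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

Stopping the iteration early does not necessarily halt an action that is already in progress. That depends on how `ComputerAgent.run()` handles cancellation, which this sketch does not cover.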
+
+### Lume CLI Reference
 
 ```bash
 # Install Lume CLI and background service
 curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
 
+# List all VMs
+lume ls
+
 # Pull a VM image
 lume pull macos-sequoia-cua:latest
 
@@ -77,9 +198,12 @@ lume run macos-sequoia-cua:latest
 
 # Stop a VM
 lume stop macos-sequoia-cua_latest
+
+# Delete a VM
+lume delete macos-sequoia-cua_latest
 ```
 
-## Lumier
+### Lumier CLI Reference
 
 For advanced container-like virtualization, check out [Lumier](./libs/lumier/README.md) - a Docker interface for macOS and Linux VMs.
 
@@ -102,15 +226,15 @@ docker run -it --rm \
 trycua/lumier:latest
 ```
 
-# Resources
+## Resources
 
-- [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/python/mcp-server/README.md) - One of the easiest ways to get started with Cua
+- [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/python/mcp-server/README.md) - One of the easiest ways to get started with C/ua
 - [How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/python/agent/README.md)
 - [How to use Lume CLI for managing desktops](./libs/lume/README.md)
-- [Training Computer-Use Models: Collecting Human Trajectories with Cua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1)
+- [Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1)
 - [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1)
 
-# Modules
+## Modules
 
 | Module | Description | Installation |
 |--------|-------------|---------------|
@@ -125,6 +249,112 @@ docker run -it --rm \
 | [**Core (Python)**](./libs/python/core/README.md) | Python Core utilities | `pip install cua-core` |
 | [**Core (Typescript)**](./libs/typescript/core/README.md) | Typescript Core utilities | `npm install @trycua/core` |
 
+## Computer Interface Reference
+
+For complete examples, see [computer_examples.py](./examples/computer_examples.py) or [computer_nb.ipynb](./notebooks/computer_nb.ipynb).
+
+```python
+# Shell Actions
+result = await computer.interface.run_command(cmd) # Run shell command
+# result.stdout, result.stderr, result.returncode
+
+# Mouse Actions
+await computer.interface.left_click(x, y) # Left click at coordinates
+await computer.interface.right_click(x, y) # Right click at coordinates
+await computer.interface.double_click(x, y) # Double click at coordinates
+await computer.interface.move_cursor(x, y) # Move cursor to coordinates
+await computer.interface.drag_to(x, y, duration) # Drag to coordinates
+await computer.interface.get_cursor_position() # Get current cursor position
+await computer.interface.mouse_down(x, y, button="left") # Press and hold a mouse button
+await computer.interface.mouse_up(x, y, button="left") # Release a mouse button
+
+# Keyboard Actions
+await computer.interface.type_text("Hello") # Type text
+await computer.interface.press_key("enter") # Press a single key
+await computer.interface.hotkey("command", "c") # Press key combination
+await computer.interface.key_down("command") # Press and hold a key
+await computer.interface.key_up("command") # Release a key
+
+# Scrolling Actions
+await computer.interface.scroll(x, y) # Scroll the mouse wheel
+await computer.interface.scroll_down(clicks) # Scroll down
+await computer.interface.scroll_up(clicks) # Scroll up
+
+# Screen Actions
+await computer.interface.screenshot() # Take a screenshot
+await computer.interface.get_screen_size() # Get screen dimensions
+
+# Clipboard Actions
+await computer.interface.set_clipboard(text) # Set clipboard content
+await computer.interface.copy_to_clipboard() # Get clipboard content
+
+# File System Operations
+await computer.interface.file_exists(path) # Check if file exists
+await computer.interface.directory_exists(path) # Check if directory exists
+await computer.interface.read_text(path, encoding="utf-8") # Read file content
+await computer.interface.write_text(path, content, encoding="utf-8") # Write file content
+await computer.interface.read_bytes(path) # Read file content as bytes
+await computer.interface.write_bytes(path, content) # Write file content as bytes
+await computer.interface.delete_file(path) # Delete file
+await computer.interface.create_dir(path) # Create directory
+await computer.interface.delete_dir(path) # Delete directory
+await computer.interface.list_dir(path) # List directory contents
+
+# Accessibility
+await computer.interface.get_accessibility_tree() # Get accessibility tree
+
+# Delay Configuration
+# Set default delay between all actions (in seconds)
+computer.interface.delay = 0.5 # 500ms delay between actions
+
+# Or specify delay for individual actions
+await computer.interface.left_click(x, y, delay=1.0) # 1 second delay after click
+await computer.interface.type_text("Hello", delay=0.2) # 200ms delay after typing
+await computer.interface.press_key("enter", delay=0.5) # 500ms delay after key press
+
+# Python Virtual Environment Operations
+await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment
+await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'") # Run a shell command in a virtual environment
+await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception
+
+# Example: Use sandboxed functions to execute code in a C/ua Container
+from computer.helpers import sandboxed
+
+@sandboxed("demo_venv")
+def greet_and_print(name):
+    """Greet by name and return the HTML of the current Safari tab."""
+    import PyXA
+    safari = PyXA.Application("Safari")
+    html = safari.current_document.source()
+    print(f"Hello from inside the container, {name}!")
+    return {"greeted": name, "safari_html": html}
+
+# When a @sandboxed function is called, it will execute in the container
+result = await greet_and_print("C/ua")
+# Result: {"greeted": "C/ua", "safari_html": "..."}
+# stdout and stderr are also captured and printed / raised
+print("Result from sandboxed function:", result)
+```
+
+## ComputerAgent Reference
+
+For complete examples, see [agent_examples.py](./examples/agent_examples.py) or [agent_nb.ipynb](./notebooks/agent_nb.ipynb).
+
+```python
+# Import necessary components
+from agent import ComputerAgent
+
+# UI-TARS-1.5 agent for local execution with MLX
+ComputerAgent(model="mlx/mlx-community/UI-TARS-1.5-7B-6bit")
+# OpenAI Computer-Use agent using OPENAI_API_KEY
+ComputerAgent(model="computer-use-preview")
+# Anthropic Claude agent using ANTHROPIC_API_KEY
+ComputerAgent(model="anthropic/claude-3-5-sonnet-20240620")
+
+# OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM
+ComputerAgent(model="omniparser+ollama_chat/gemma3:12b-it-q4_K_M")
+```
+
 ## Community
 
 Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas, get assistance, or share your demos!
@@ -179,4 +409,4 @@ Thank you to all our supporters!
-
+
\ No newline at end of file
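The `venv_cmd` line in the Computer Interface Reference passes a whole command line as one string. The README describes that string as a shell command, so the Python payload has to survive two rounds of parsing: first the shell's, then Python's. Nesting quotes by hand is error-prone. For example, backticks are not valid string delimiters in Python 3. `shlex.quote` handles the shell layer for you.

The sketch below assumes `venv_cmd` hands its command to a POSIX shell. `python_c_command` is a hypothetical helper. The local check runs the current interpreter (`sys.executable`) in place of the virtual environment's `python`, and it makes no network request.

```python
import shlex
import subprocess
import sys


def python_c_command(code: str, python: str = "python") -> str:
    """Hypothetical helper: build `python -c <code>` with both parts shell-quoted."""
    return f"{shlex.quote(python)} -c {shlex.quote(code)}"


# Payload with double quotes and a backtick; shlex.quote keeps both literal for the shell
code = 'import json; print(json.dumps({"url": "https://httpbin.org/ip", "note": "a`b"}))'
cmd = python_c_command(code)
print(cmd)  # The kind of string you could pass as computer.venv_cmd("demo_venv", cmd)

# Local check with a POSIX shell, using this interpreter in place of the venv's "python"
local = subprocess.run(
    python_c_command(code, sys.executable),
    shell=True,
    capture_output=True,
    text=True,
    check=True,
)
print(local.stdout.strip())
```

Single-quote quoting with `shell=True` assumes a POSIX shell such as `/bin/sh`. It will not behave the same way under Windows `cmd.exe`.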
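The Delay Configuration lines describe two levels of delay. A default `computer.interface.delay` applies between all actions, and a per-call `delay=` sets the pause after a single action. The sketch below models that precedence with a hypothetical `RecordingInterface`, which logs each action and the pause it takes. It mirrors the documented behavior only and is not cua's implementation. In particular, it assumes that a per-call value replaces the default rather than adding to it.

```python
import asyncio
from typing import List, Optional, Tuple


class RecordingInterface:
    """Hypothetical stand-in for computer.interface: records actions instead of performing them."""

    def __init__(self, delay: float = 0.0) -> None:
        self.delay = delay  # Default pause in seconds after every action
        self.log: List[Tuple[str, float]] = []

    async def _pause_after(self, action: str, delay: Optional[float]) -> None:
        # Assumption: a per-call delay replaces (rather than adds to) the default
        pause = self.delay if delay is None else delay
        self.log.append((action, pause))
        await asyncio.sleep(pause)

    async def left_click(self, x: int, y: int, delay: Optional[float] = None) -> None:
        await self._pause_after(f"left_click({x}, {y})", delay)

    async def type_text(self, text: str, delay: Optional[float] = None) -> None:
        await self._pause_after(f"type_text({text!r})", delay)


async def demo() -> List[Tuple[str, float]]:
    iface = RecordingInterface()
    iface.delay = 0.05  # Like computer.interface.delay = ...
    await iface.left_click(100, 200)  # Falls back to the default 0.05 s
    await iface.type_text("Hello", delay=0.01)  # Per-call value wins
    return iface.log


if __name__ == "__main__":
    for action, pause in asyncio.run(demo()):
        print(f"{action}: paused {pause}s")
```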