From 2d438f8b176346daae31270060d5ba8478ca65c4 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 10:34:51 -0400 Subject: [PATCH 01/26] cleanup readme --- README.md | 214 +++++++++++++++++++++++------------------------------- 1 file changed, 89 insertions(+), 125 deletions(-) diff --git a/README.md b/README.md index 640902dc..d09b385f 100644 --- a/README.md +++ b/README.md @@ -5,200 +5,164 @@ Cua logo - - [![Python](https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333)](#) [![Swift](https://img.shields.io/badge/Swift-F05138?logo=swift&logoColor=white)](#) [![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#) [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85) -**TL;DR**: **c/ua** (pronounced "koo-ah", short for Computer-Use Agent) is a framework that enables AI agents to control full operating systems within high-performance, lightweight virtual containers. It delivers up to 97% native speed on Apple Silicon and works with any vision language models. +**c/ua** (pronounced "koo-ah") enables AI agents to control full operating systems in high-performance virtual containers with near-native speed on Apple Silicon. -## What is c/ua? +
+ +
+ -**c/ua** offers two primary capabilities in a single integrated framework: +# 🚀 Quick Start -1. **High-Performance Virtualization** - Create and run macOS/Linux virtual machines on Apple Silicon with near-native performance (up to 97% of native speed) using the **Lume CLI** with `Apple's Virtualization.Framework`. +Get started with a Computer-Use Agent UI and a VM with a single command: -2. **Computer-Use Interface & Agent** - A framework that allows AI systems to observe and control these virtual environments - interacting with applications, browsing the web, writing code, and performing complex workflows. -## Why Use c/ua? +```bash +/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground.sh)" +``` +This script will: +- Install Lume CLI for VM management +- Pull the latest macOS CUA image +- Set up Python environment and install required packages +- Create a desktop shortcut for easy access +- Launch the Computer-Use Agent UI -- **Security & Isolation**: Run AI agents in fully isolated virtual environments instead of giving them access to your main system -- **Performance**: [Near-native performance](https://browser.geekbench.com/v6/cpu/compare/11283746?baseline=11102709) on Apple Silicon -- **Flexibility**: Run macOS or Linux environments with the same framework -- **Reproducibility**: Create consistent, deterministic environments for AI agent workflows -- **LLM Integration**: Built-in support for connecting to various LLM providers - -## System Requirements +### System Requirements - Mac with Apple Silicon (M1/M2/M3/M4 series) - macOS 15 (Sequoia) or newer -- Python 3.10+ (required for the Computer, Agent, and MCP libraries). We recommend using Conda (or Anaconda) to create an ad hoc Python environment.
- Disk space for VM images (30GB+ recommended) -## Quick Start -### Option 1: Lume CLI Only (VM Management) -If you only need the virtualization capabilities: +# 💻 For Developers + +### Step 1: Install Lume CLI ```bash /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)" ``` -Optionally, if you don't want Lume to run as a background service: +Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon. + +### Step 2: Install Python SDK + ```bash -/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh) --no-background-service" +pip install cua-computer "cua-agent[all]" ``` -**Note:** If you choose this option, you'll need to manually start the Lume API service whenever needed by running `lume serve` in your terminal. This applies to Option 2 after completing step 1. +Alternatively, see the [Developer Guide](./docs/Developer-Guide.md) for building from source. -For Lume usage instructions, refer to the [Lume documentation](./libs/lume/README.md). +### Step 3: Use in Your Code -### Option 2: Full Computer-Use Agent Capabilities -If you want to use AI agents with virtualized environments: +```python +# Example: Using the Computer-Use Agent +from agent import ComputerAgent -1. Install the Lume CLI: - ```bash - /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)" - ``` +# Create and run an agent locally using UI-TARS and MLX +agent = ComputerAgent(computer=my_computer, loop="uitars") +agent.run("Search for information about CUA on GitHub") -2. Pull the latest macOS CUA image: - ```bash - lume pull macos-sequoia-cua:latest - ``` -3. 
Install the Python libraries: - ```bash - pip install cua-computer cua-agent[all] - ``` +async with Computer(os_type="macos") as computer: + # Take a screenshot + screenshot = await computer.screenshot() + # Click on an element + await computer.mouse.click(x=100, y=200) + # Type text + await computer.keyboard.type("Hello, world!") +``` -4. Use the libraries in your Python code: - ```python - from computer import Computer - from agent import ComputerAgent, LLM, AgentLoop, LLMProvider +For ready-to-use examples, check out our [Notebooks](./notebooks/) collection. - async with Computer(os_type="macos", display="1024x768") as macos_computer: - agent = ComputerAgent( - computer=macos_computer, - loop=AgentLoop.OPENAI, # or AgentLoop.ANTHROPIC, or AgentLoop.UITARS, or AgentLoop.OMNI - model=LLM(provider=LLMProvider.OPENAI) # or LLM(provider=LLMProvider.ANTHROPIC) - ) +### Lume CLI Reference - tasks = [ - "Look for a repository named trycua/cua on GitHub.", - ] +```bash +# Install Lume CLI +curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash - for task in tasks: - async for result in agent.run(task): - print(result) - ``` - - Explore the [Agent Notebook](./notebooks/) for a ready-to-run example. +# List available VM images +lume list -5. 
Optionally, you can use the Agent with a Gradio UI: +# Pull a VM image +lume pull macos-sequoia-cua:latest - ```python - from utils import load_dotenv_files - load_dotenv_files() - - from agent.ui.gradio.app import create_gradio_ui - - app = create_gradio_ui() - app.launch(share=False) - ``` +# Create a new VM +lume create my-vm --image macos-sequoia-cua:latest -### Option 3: Build from Source (Nightly) -If you want to contribute to the project or need the latest nightly features: +# Start a VM +lume start my-vm - ```bash - # Clone the repository - git clone https://github.com/trycua/cua.git - cd cua - - # Open the project in VSCode - code ./.vscode/py.code-workspace +# Stop a VM +lume stop my-vm - # Build the project - ./scripts/build.sh - ``` - - See our [Developer-Guide](./docs/Developer-Guide.md) for more information. +# Delete a VM +lume delete my-vm +``` -## Monorepo Libraries +## Resources -| Library | Description | Installation | Version | -|---------|-------------|--------------|---------| -| [**Lume**](./libs/lume/README.md) | CLI for running macOS/Linux VMs with near-native performance using Apple's `Virtualization.Framework`. 
| [![Download](https://img.shields.io/badge/Download-333333?style=for-the-badge&logo=github&logoColor=white)](https://github.com/trycua/cua/releases/latest/download/lume.pkg.tar.gz) | [![GitHub release](https://img.shields.io/github/v/release/trycua/cua?color=333333)](https://github.com/trycua/cua/releases) | -| [**Computer**](./libs/computer/README.md) | Computer-Use Interface (CUI) framework for interacting with macOS/Linux sandboxes | `pip install cua-computer` | [![PyPI](https://img.shields.io/pypi/v/cua-computer?color=333333)](https://pypi.org/project/cua-computer/) | -| [**Agent**](./libs/agent/README.md) | Computer-Use Agent (CUA) framework for running agentic workflows in macOS/Linux dedicated sandboxes | `pip install cua-agent` | [![PyPI](https://img.shields.io/pypi/v/cua-agent?color=333333)](https://pypi.org/project/cua-agent/) | +- [How to use Lume CLI for managing desktops](./libs/lume/README.md) +- [Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1) +- [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1) -## Docs +## Modules -For the best onboarding experience with the packages in this monorepo, we recommend starting with the [Computer](./libs/computer/README.md) documentation to cover the core functionality of the Computer sandbox, then exploring the [Agent](./libs/agent/README.md) documentation to understand Cua's AI agent capabilities, and finally working through the Notebook examples. 
- -- [Lume](./libs/lume/README.md) -- [Computer](./libs/computer/README.md) -- [Agent](./libs/agent/README.md) -- [Notebooks](./notebooks/) +| Module | Description | Installation | +|--------|-------------|---------------| +| [**Lume**](./libs/lume/README.md) | VM management for macOS/Linux using Apple's Virtualization.Framework | `curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash` | +| [**Computer**](./libs/computer/README.md) | Interface for controlling virtual machines | `pip install cua-computer` | +| [**Agent**](./libs/agent/README.md) | AI agent framework for automating tasks | `pip install cua-agent` | +| [**SOM**](./libs/som/README.md) | Self-of-Mark library for Agent | `pip install cua-som` | +| [**PyLume**](./libs/pylume/README.md) | Python bindings for Lume | `pip install pylume` | +| [**Computer Server**](./libs/computer-server/README.md) | Server component for Computer | `pip install cua-computer-server` | +| [**Core**](./libs/core/README.md) | Core utilities | `pip install cua-core` | ## Demos -Demos of the Computer-Use Agent in action. Share your most impressive demos in Cua's [Discord community](https://discord.com/invite/mVnXXpdE85)! +Check out these demos of the Computer-Use Agent in action:
-MCP Server: Work with Claude Desktop and Tableau +MCP Server: Work with Claude Desktop and Tableau
- +
+
-
-AI-Gradio: multi-app workflow requiring browser, VS Code and terminal access +
+AI-Gradio: Multi-app workflow with browser, VS Code and terminal
-
-
-Notebook: Fix GitHub issue in Cursor -
-
- -
+## Community -
- -## Accessory Libraries - -| Library | Description | Installation | Version | -|---------|-------------|--------------|---------| -| [**Core**](./libs/core/README.md) | Core functionality and utilities used by other Cua packages | `pip install cua-core` | [![PyPI](https://img.shields.io/pypi/v/cua-core?color=333333)](https://pypi.org/project/cua-core/) | -| [**PyLume**](./libs/pylume/README.md) | Python bindings for Lume | `pip install pylume` | [![PyPI](https://img.shields.io/pypi/v/pylume?color=333333)](https://pypi.org/project/pylume/) | -| [**Computer Server**](./libs/computer-server/README.md) | Server component for the Computer-Use Interface (CUI) framework | `pip install cua-computer-server` | [![PyPI](https://img.shields.io/pypi/v/cua-computer-server?color=333333)](https://pypi.org/project/cua-computer-server/) | -| [**SOM**](./libs/som/README.md) | Self-of-Mark library for Agent | `pip install cua-som` | [![PyPI](https://img.shields.io/pypi/v/cua-som?color=333333)](https://pypi.org/project/cua-som/) | - -## Contributing - -We welcome and greatly appreciate contributions to Cua! Whether you're improving documentation, adding new features, fixing bugs, or adding new VM images, your efforts help make lume better for everyone. For detailed instructions on how to contribute, please refer to our [Contributing Guidelines](CONTRIBUTING.md). - -Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas or get assistance. +Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas, get assistance, or share your demos! ## License Cua is open-sourced under the MIT License - see the [LICENSE](LICENSE) file for details. -Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0) - see the [OmniParser LICENSE](https://github.com/microsoft/OmniParser/blob/master/LICENSE) file for details. 
+## Contributing + +We welcome contributions to CUA! Please refer to our [Contributing Guidelines](CONTRIBUTING.md) for details. ## Trademarks -Apple, macOS, and Apple Silicon are trademarks of Apple Inc. Ubuntu and Canonical are registered trademarks of Canonical Ltd. Microsoft is a registered trademark of Microsoft Corporation. This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., or Microsoft Corporation. +Apple, macOS, and Apple Silicon are trademarks of Apple Inc. This project is not affiliated with, endorsed by, or sponsored by Apple Inc. -## Stargazers over time +## Stargazers + +Thank you to all our supporters! [![Stargazers over time](https://starchart.cc/trycua/cua.svg?variant=adaptive)](https://starchart.cc/trycua/cua) From a9edc1aaf0c7e76ed94d855a4df49e9903462740 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 10:39:29 -0400 Subject: [PATCH 02/26] revised readme --- README.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index d09b385f..b936cd0a 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,15 @@ This script will: Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon. -### Step 2: Install Python SDK +### Step 2: Pull the macOS CUA Image + +```bash +lume pull macos-sequoia-cua:latest +``` + +The macOS image contains the default Mac apps and the Computer Server for seamless interaction. + +### Step 3: Install Python SDK ```bash pip install cua-computer "cua-agent[all]" @@ -57,7 +65,7 @@ pip install cua-computer "cua-agent[all]" Alternatively, see the [Developer Guide](./docs/Developer-Guide.md) for building from source. 
-### Step 3: Use in Your Code +### Step 4: Use in Your Code ```python # Example: Using the Computer-Use Agent From be69a98fe477a85414e1a40258cb9f0c6d9b9332 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 10:39:36 -0400 Subject: [PATCH 03/26] revised readme --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b936cd0a..a231d785 100644 --- a/README.md +++ b/README.md @@ -55,7 +55,7 @@ Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Appl lume pull macos-sequoia-cua:latest ``` -The macOS image contains the default Mac apps and the Computer Server for seamless interaction. +The macOS CUA image contains the default Mac apps and the Computer Server for easy automation. ### Step 3: Install Python SDK From d86f62917930cb9795e0a18b13fb3f60185e290c Mon Sep 17 00:00:00 2001 From: ddupont <3820588+ddupont808@users.noreply.github.com> Date: Fri, 9 May 2025 10:48:36 -0400 Subject: [PATCH 04/26] Update README.md added video --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index a231d785..df39ef90 100644 --- a/README.md +++ b/README.md @@ -14,8 +14,7 @@ **c/ua** (pronounced "koo-ah") enables AI agents to control full operating systems in high-performance virtual containers with near-native speed on Apple Silicon.
- -
+ # 🚀 Quick Start @@ -25,6 +24,8 @@ Get started with a Computer-Use Agent UI and a VM with a single command: ```bash /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground.sh)" ``` + + This script will: - Install Lume CLI for VM management - Pull the latest macOS CUA image From 0bb11412438cb9d2697918fd4ab440f2740d38a3 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 11:21:02 -0400 Subject: [PATCH 05/26] different providers --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index a231d785..129dfb93 100644 --- a/README.md +++ b/README.md @@ -116,6 +116,7 @@ lume delete my-vm ## Resources +- [When and how to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/agent/README.md) - [How to use Lume CLI for managing desktops](./libs/lume/README.md) - [Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1) - [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1) From c47a7d41b3cab3d7ce2bb921fe64935472660281 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 11:22:32 -0400 Subject: [PATCH 06/26] proper async --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 977d77a2..228b29e7 100644 --- a/README.md +++ b/README.md @@ -73,8 +73,8 @@ Alternatively, see the [Developer Guide](./docs/Developer-Guide.md) for building from agent import ComputerAgent # Create and run an agent locally using UI-TARS and MLX -agent = ComputerAgent(computer=my_computer, loop="uitars") -agent.run("Search for information about CUA on GitHub") +agent = ComputerAgent(computer=my_computer, loop="UITARS") +await agent.run("Search for information about CUA on GitHub") # Example: Direct control of a macOS VM with Computer from computer import 
Computer From af296a818bd94b20190a3eca5787bfa04fb4e2d8 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 11:54:58 -0400 Subject: [PATCH 07/26] added computer and agent reference --- README.md | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/README.md b/README.md index 228b29e7..3a0fdb7e 100644 --- a/README.md +++ b/README.md @@ -134,6 +134,62 @@ lume delete my-vm | [**Computer Server**](./libs/computer-server/README.md) | Server component for Computer | `pip install cua-computer-server` | | [**Core**](./libs/core/README.md) | Core utilities | `pip install cua-core` | +## Computer Interface Reference + +```python +# Mouse Actions +await computer.interface.left_click(x, y) # Left click at coordinates +await computer.interface.right_click(x, y) # Right click at coordinates +await computer.interface.double_click(x, y) # Double click at coordinates +await computer.interface.move_cursor(x, y) # Move cursor to coordinates +await computer.interface.drag_to(x, y, duration) # Drag to coordinates +await computer.interface.get_cursor_position() # Get current cursor position + +# Keyboard Actions +await computer.interface.type_text("Hello") # Type text +await computer.interface.press_key("enter") # Press a single key +await computer.interface.hotkey("command", "c") # Press key combination + +# Screen Actions +await computer.interface.screenshot() # Take a screenshot +await computer.interface.get_screen_size() # Get screen dimensions + +# Clipboard Actions +await computer.interface.set_clipboard(text) # Set clipboard content +await computer.interface.copy_to_clipboard() # Get clipboard content + +# File System Operations +await computer.interface.file_exists(path) # Check if file exists +await computer.interface.directory_exists(path) # Check if directory exists +await computer.interface.run_command(cmd) # Run shell command + +# Accessibility +await computer.interface.get_accessibility_tree() # Get 
accessibility tree +``` + +## ComputerAgent Reference + +```python +# Import necessary components +from agent import ComputerAgent, LLM, AgentLoop, LLMProvider + +# Agent Loops +ComputerAgent(loop=AgentLoop.UITARS) # UI-TARS loop for local execution with MLX +ComputerAgent(loop=AgentLoop.OPENAI) # OpenAI Computer-Use model using OpenAI provider +ComputerAgent(loop=AgentLoop.ANTHROPIC) # Anthropic Claude model using Anthropic provider +ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M")) # OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision model + +# OpenRouter example using OAICOMPAT provider +ComputerAgent( + loop=AgentLoop.OMNI, + model=LLM( + provider=LLMProvider.OAICOMPAT, + name="openai/gpt-4.1", + provider_base_url="https://openrouter.ai/api/v1" + ) +) +``` + ## Demos Check out these demos of the Computer-Use Agent in action: From c42fea6bb2b20e91b0640c48c2b2a6ed4e41613b Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 11:56:30 -0400 Subject: [PATCH 08/26] moved omniparser comment --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 3a0fdb7e..0c795eff 100644 --- a/README.md +++ b/README.md @@ -177,7 +177,9 @@ from agent import ComputerAgent, LLM, AgentLoop, LLMProvider ComputerAgent(loop=AgentLoop.UITARS) # UI-TARS loop for local execution with MLX ComputerAgent(loop=AgentLoop.OPENAI) # OpenAI Computer-Use model using OpenAI provider ComputerAgent(loop=AgentLoop.ANTHROPIC) # Anthropic Claude model using Anthropic provider -ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M")) # OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision model + +# OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision model +ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M")) 
# OpenRouter example using OAICOMPAT provider ComputerAgent( From 20b2227ac398df074dcba70b0218e51288f6cb31 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 12:01:33 -0400 Subject: [PATCH 09/26] more mcp --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 0c795eff..85eaea72 100644 --- a/README.md +++ b/README.md @@ -117,6 +117,7 @@ lume delete my-vm ## Resources +- [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/mcp-server/README.md) - One of the easiest ways to get started with C/ua - [When and how to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/agent/README.md) - [How to use Lume CLI for managing desktops](./libs/lume/README.md) - [Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1) @@ -129,6 +130,7 @@ lume delete my-vm | [**Lume**](./libs/lume/README.md) | VM management for macOS/Linux using Apple's Virtualization.Framework | `curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash` | | [**Computer**](./libs/computer/README.md) | Interface for controlling virtual machines | `pip install cua-computer` | | [**Agent**](./libs/agent/README.md) | AI agent framework for automating tasks | `pip install cua-agent` | +| [**MCP Server**](./libs/mcp-server/README.md) | MCP server for using CUA with Claude Desktop | `pip install cua-mcp-server` | | [**SOM**](./libs/som/README.md) | Self-of-Mark library for Agent | `pip install cua-som` | | [**PyLume**](./libs/pylume/README.md) | Python bindings for Lume | `pip install pylume` | | [**Computer Server**](./libs/computer-server/README.md) | Server component for Computer | `pip install cua-computer-server` | From 33c898ff2ed79a2438b096810dcdfdf17cab25d2 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 13:19:23 -0400 Subject: [PATCH 
10/26] updated reference --- README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 85eaea72..e4962eb8 100644 --- a/README.md +++ b/README.md @@ -176,11 +176,11 @@ await computer.interface.get_accessibility_tree() # Get accessibility tree from agent import ComputerAgent, LLM, AgentLoop, LLMProvider # Agent Loops -ComputerAgent(loop=AgentLoop.UITARS) # UI-TARS loop for local execution with MLX -ComputerAgent(loop=AgentLoop.OPENAI) # OpenAI Computer-Use model using OpenAI provider -ComputerAgent(loop=AgentLoop.ANTHROPIC) # Anthropic Claude model using Anthropic provider +ComputerAgent(loop=AgentLoop.UITARS) # UI-TARS-1.5 agent for local execution with MLX +ComputerAgent(loop=AgentLoop.OPENAI) # OpenAI Computer-Use agent using OPENAI_API_KEY +ComputerAgent(loop=AgentLoop.ANTHROPIC) # Anthropic Claude agent using ANTHROPIC_API_KEY -# OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision model +# OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M")) # OpenRouter example using OAICOMPAT provider @@ -190,7 +190,8 @@ ComputerAgent( provider=LLMProvider.OAICOMPAT, name="openai/gpt-4.1", provider_base_url="https://openrouter.ai/api/v1" - ) + ), + api_key="your-openrouter-api-key" ) ``` From 0b2264b93a163c4717c76f9e27f808b6de28ebba Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 13:24:09 -0400 Subject: [PATCH 11/26] added links to examples --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index e4962eb8..893d2750 100644 --- a/README.md +++ b/README.md @@ -138,6 +138,8 @@ lume delete my-vm ## Computer Interface Reference +For complete examples, see [computer_examples.py](./examples/computer_examples.py) or [computer_nb.ipynb](./notebooks/computer_nb.ipynb) + ```python # Mouse Actions await 
computer.interface.left_click(x, y) # Left click at coordinates @@ -171,6 +173,8 @@ await computer.interface.get_accessibility_tree() # Get accessibility tree ## ComputerAgent Reference +For complete examples, see [agent_examples.py](./examples/agent_examples.py) or [agent_nb.ipynb](./notebooks/agent_nb.ipynb) + ```python # Import necessary components from agent import ComputerAgent, LLM, AgentLoop, LLMProvider From fe695b1aa57f6d9afde540e0448761a054f7cb50 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 13:55:11 -0400 Subject: [PATCH 12/26] lume and lumier updates in readme --- README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 893d2750..b8eb0f86 100644 --- a/README.md +++ b/README.md @@ -96,17 +96,17 @@ For ready-to-use examples, check out our [Notebooks](./notebooks/) collection. # Install Lume CLI curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash -# List available VM images -lume list +# List all VMs +lume ls # Pull a VM image lume pull macos-sequoia-cua:latest # Create a new VM -lume create my-vm --image macos-sequoia-cua:latest +lume create my-vm --os macos --cpu 4 --memory 8GB --disk-size 50GB -# Start a VM -lume start my-vm +# Run a VM (creates and starts if it doesn't exist) +lume run macos-sequoia-cua:latest # Stop a VM lume stop my-vm @@ -115,6 +115,8 @@ lume stop my-vm lume delete my-vm ``` +For advanced container-based virtualization, check out [Lumier](./libs/lumier/README.md) - a Docker interface for macOS and Linux VMs. 
+ ## Resources - [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/mcp-server/README.md) - One of the easiest ways to get started with C/ua From 7e00a77adf5f564773f3808c443c298052284ba0 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 13:58:23 -0400 Subject: [PATCH 13/26] wording --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index b8eb0f86..60970a6d 100644 --- a/README.md +++ b/README.md @@ -109,13 +109,13 @@ lume create my-vm --os macos --cpu 4 --memory 8GB --disk-size 50GB lume run macos-sequoia-cua:latest # Stop a VM -lume stop my-vm +lume stop macos-sequoia-cua_latest # Delete a VM -lume delete my-vm +lume delete macos-sequoia-cua_latest ``` -For advanced container-based virtualization, check out [Lumier](./libs/lumier/README.md) - a Docker interface for macOS and Linux VMs. +For advanced container-like virtualization, check out [Lumier](./libs/lumier/README.md) - a Docker interface for macOS and Linux VMs. ## Resources From 116b4dc9a90761455a4300f4fb831caa50985ab6 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 14:02:18 -0400 Subject: [PATCH 14/26] interface fix --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 60970a6d..f57994b3 100644 --- a/README.md +++ b/README.md @@ -81,11 +81,11 @@ from computer import Computer async with Computer(os_type="macos") as computer: # Take a screenshot - screenshot = await computer.screenshot() + screenshot = await computer.interface.screenshot() # Click on an element - await computer.mouse.click(x=100, y=200) + await computer.interface.left_click(100, 200) # Type text - await computer.keyboard.type("Hello, world!") + await computer.interface.type_text("Hello, world!") ``` For ready-to-use examples, check out our [Notebooks](./notebooks/) collection. 
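Note on the README snippets revised in patches 06 and 14: they use `await` and `async with` at module level, which runs in a notebook but is a syntax error in a plain script. Below is a minimal sketch of the same call sequence wrapped in `asyncio.run`. It assumes hypothetical stand-in classes (`FakeComputer` and `FakeInterface`, which are not part of the cua packages) so that it runs without a VM; with the SDK installed, `Computer` from `computer` would presumably take their place.

```python
import asyncio


class FakeInterface:
    """Hypothetical stand-in: records calls instead of driving a VM."""

    def __init__(self):
        self.calls = []

    async def screenshot(self):
        self.calls.append(("screenshot",))
        return b""  # placeholder value, not real image data

    async def left_click(self, x, y):
        self.calls.append(("left_click", x, y))

    async def type_text(self, text):
        self.calls.append(("type_text", text))


class FakeComputer:
    """Hypothetical stand-in mirroring the `async with Computer(...)` shape."""

    def __init__(self, os_type="macos"):
        self.os_type = os_type
        self.interface = FakeInterface()

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions


async def main():
    # Same call sequence as the README snippet after patch 14.
    async with FakeComputer(os_type="macos") as computer:
        await computer.interface.screenshot()
        await computer.interface.left_click(100, 200)
        await computer.interface.type_text("Hello, world!")
        return computer.interface.calls


if __name__ == "__main__":
    # A script needs an explicit event loop entry point.
    print(asyncio.run(main()))
```

The script that patch 15 generates follows the same `async def main()` plus `asyncio.run(main())` shape.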
From ce0fd05d6d9411b01d9f7f2cb2e65a55d19446a4 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 14:54:33 -0400 Subject: [PATCH 15/26] optimize onboarding --- libs/agent/agent/ui/gradio/app.py | 190 +++++++++++++++++++++++------- scripts/playground.sh | 157 ++++++++++++++++++++++++ 2 files changed, 304 insertions(+), 43 deletions(-) create mode 100755 scripts/playground.sh diff --git a/libs/agent/agent/ui/gradio/app.py b/libs/agent/agent/ui/gradio/app.py index a4541019..2ab2a3ca 100644 --- a/libs/agent/agent/ui/gradio/app.py +++ b/libs/agent/agent/ui/gradio/app.py @@ -480,6 +480,83 @@ def create_gradio_ui( "Open Safari, search for 'macOS automation tools', and save the first three results as bookmarks", "Configure SSH keys and set up a connection to a remote server", ] + + # Function to generate Python code based on configuration and tasks + def generate_python_code(agent_loop_choice, provider, model_name, tasks, provider_url, recent_images=3, save_trajectory=True): + """Generate Python code for the current configuration and tasks. 
+ + Args: + agent_loop_choice: The agent loop type (e.g., UITARS, OPENAI, ANTHROPIC, OMNI) + provider: The provider type (e.g., OPENAI, ANTHROPIC, OLLAMA, OAICOMPAT) + model_name: The model name + tasks: List of tasks to execute + provider_url: The provider base URL for OAICOMPAT providers + recent_images: Number of recent images to keep in context + save_trajectory: Whether to save the agent trajectory + + Returns: + Formatted Python code as a string + """ + # Format the tasks as a Python list + tasks_str = "" + for task in tasks: + if task and task.strip(): + tasks_str += f' "{task}",\n' + + # Create the Python code template + code = f'''import asyncio +from computer import Computer +from agent import ComputerAgent, LLM, AgentLoop, LLMProvider + +async def main(): + async with Computer() as macos_computer: + agent = ComputerAgent( + computer=macos_computer, + loop=AgentLoop.{agent_loop_choice}, + only_n_most_recent_images={recent_images}, + save_trajectory={save_trajectory},''' + + # Add the model configuration based on provider + if provider == LLMProvider.OAICOMPAT: + code += f''' + model=LLM( + provider=LLMProvider.OAICOMPAT, + name="{model_name}", + provider_base_url="{provider_url}" + )''' + + code += """ + ) + """ + + # Add tasks section if there are tasks + if tasks_str: + code += f''' + # Prompts for the computer-use agent + tasks = [ +{tasks_str.rstrip()} + ] + + for task in tasks: + print(f"Executing task: {{task}}") + async for result in agent.run(task): + print(result)''' + else: + # If no tasks, just add a placeholder for a single task + code += f''' + # Execute a single task + task = "Search for information about CUA on GitHub" + print(f"Executing task: {{task}}") + async for result in agent.run(task): + print(result)''' + + # Add the main block + code += ''' + +if __name__ == "__main__": + asyncio.run(main())''' + + return code # Function to update model choices based on agent loop selection def update_model_choices(loop): @@ -537,50 +614,20 @@ def 
create_gradio_ui( """ ) - # Add installation prerequisites as a collapsible section - with gr.Accordion("Prerequisites & Installation", open=False): - gr.Markdown( - """ - ## Prerequisites - - Before using the Computer-Use Agent, you need to set up the Lume daemon and pull the macOS VM image. - - ### 1. Install Lume daemon - - While a lume binary is included with Computer, we recommend installing the standalone version with brew, and starting the lume daemon service: - - ```bash - sudo /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)" - ``` - - ### 2. Start the Lume daemon service - - In a separate terminal: - - ```bash - lume serve - ``` - - ### 3. Pull the pre-built macOS image - - ```bash - lume pull macos-sequoia-cua:latest - ``` - - Initial download requires 80GB storage, but reduces to ~30GB after first run due to macOS's sparse file system. - - VMs are stored in `~/.lume`, and locally cached images are stored in `~/.lume/cache`. - - ### 4. Test the sandbox - - ```bash - lume run macos-sequoia-cua:latest - ``` - - For more detailed instructions, visit the [CUA GitHub repository](https://github.com/trycua/cua). 
- """ + # Add accordion for Python code + with gr.Accordion("Python Code", open=False): + code_display = gr.Code( + language="python", + value=generate_python_code( + initial_loop, + LLMProvider.OPENAI, + "gpt-4o", + [], + "https://openrouter.ai/api/v1" + ), + interactive=False, ) - + with gr.Accordion("Configuration", open=True): # Configuration options agent_loop = gr.Dropdown( @@ -643,6 +690,7 @@ def create_gradio_ui( info="Number of recent images to keep in context", interactive=True, ) + # Right column for chat interface with gr.Column(scale=2): @@ -900,6 +948,62 @@ def create_gradio_ui( queue=False, # Process immediately without queueing ) + # Function to update the code display based on configuration and chat history + def update_code_display(agent_loop, model_choice_val, custom_model_val, chat_history, provider_base_url, recent_images_val, save_trajectory_val): + # Extract messages from chat history + messages = [] + if chat_history: + for msg in chat_history: + if msg.get("role") == "user": + messages.append(msg.get("content", "")) + + # Determine provider and model name based on selection + model_string = custom_model_val if model_choice_val == "Custom model..." 
else model_choice_val + provider, model_name, _ = get_provider_and_model(model_string, agent_loop) + + # Generate and return the code + return generate_python_code( + agent_loop, + provider, + model_name, + messages, + provider_base_url, + recent_images_val, + save_trajectory_val + ) + + # Update code display when configuration changes + agent_loop.change( + update_code_display, + inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory], + outputs=[code_display] + ) + model_choice.change( + update_code_display, + inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory], + outputs=[code_display] + ) + custom_model.change( + update_code_display, + inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory], + outputs=[code_display] + ) + chatbot_history.change( + update_code_display, + inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory], + outputs=[code_display] + ) + recent_images.change( + update_code_display, + inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory], + outputs=[code_display] + ) + save_trajectory.change( + update_code_display, + inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory], + outputs=[code_display] + ) + return demo diff --git a/scripts/playground.sh b/scripts/playground.sh new file mode 100755 index 00000000..bad1df3b --- /dev/null +++ b/scripts/playground.sh @@ -0,0 +1,157 @@ +#!/bin/bash + +set -e + +echo "🚀 Setting up CUA playground environment..." + +# Check for Apple Silicon Mac +if [[ $(uname -s) != "Darwin" || $(uname -m) != "arm64" ]]; then + echo "❌ This script requires an Apple Silicon Mac (M1/M2/M3/M4)." 
+ exit 1 +fi + +# Check for macOS 15 (Sequoia) or newer +OSVERSION=$(sw_vers -productVersion) +if [[ $(echo "$OSVERSION 15.0" | tr " " "\n" | sort -V | head -n 1) != "15.0" ]]; then + echo "❌ This script requires macOS 15 (Sequoia) or newer. You have $OSVERSION." + exit 1 +fi + +# Create a temporary directory for our work +TMP_DIR=$(mktemp -d) +cd "$TMP_DIR" + +# Function to clean up on exit +cleanup() { + cd ~ + rm -rf "$TMP_DIR" +} +trap cleanup EXIT + +# Install Lume if not already installed +if ! command -v lume &> /dev/null; then + echo "📦 Installing Lume CLI..." + curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash + + # Add lume to PATH for this session if it's not already there + if ! command -v lume &> /dev/null; then + export PATH="$PATH:$HOME/.lume/bin" + fi +fi + +# Pull the macOS CUA image if not already present +if ! lume ls | grep -q "macos-sequoia-cua"; then + # Check available disk space + IMAGE_SIZE_GB=30 + AVAILABLE_SPACE_KB=$(df -k $HOME | tail -1 | awk '{print $4}') + AVAILABLE_SPACE_GB=$(($AVAILABLE_SPACE_KB / 1024 / 1024)) + + echo "📊 The macOS CUA image will use approximately ${IMAGE_SIZE_GB}GB of disk space." + echo " You currently have ${AVAILABLE_SPACE_GB}GB available on your system." + + # Prompt for confirmation + read -p " Continue? [y]/n: " CONTINUE + CONTINUE=${CONTINUE:-y} + + if [[ $CONTINUE =~ ^[Yy]$ ]]; then + echo "📥 Pulling macOS CUA image (this may take a while)..." + lume pull macos-sequoia-cua:latest + else + echo "❌ Installation cancelled." + exit 1 + fi +fi + +# Create a Python virtual environment +echo "🐍 Setting up Python environment..." +PYTHON_CMD="python3" + +# Check if Python 3.11+ is available +PYTHON_VERSION=$($PYTHON_CMD --version 2>&1 | cut -d" " -f2) +PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1) +PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. 
-f2) + +if [ "$PYTHON_MAJOR" -lt 3 ] || ([ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -lt 11 ]); then + echo "❌ Python 3.11+ is required. You have $PYTHON_VERSION." + echo "Please install Python 3.11+ and try again." + exit 1 +fi + +# Create a virtual environment +VENV_DIR="$HOME/.cua-venv" +if [ ! -d "$VENV_DIR" ]; then + $PYTHON_CMD -m venv "$VENV_DIR" +fi + +# Activate the virtual environment +source "$VENV_DIR/bin/activate" + +# Install required packages +echo "📦 Installing CUA packages..." +pip install -U pip +pip install cua-computer cua-agent[all] + +# Setup environment for MCP server +echo "🔧 Setting up MCP server..." + +# Create a simple demo script +DEMO_DIR="$HOME/.cua-demo" +mkdir -p "$DEMO_DIR" + +cat > "$DEMO_DIR/run_demo.py" << 'EOF' +import asyncio +import os +from computer import Computer +from agent import ComputerAgent, LLM, AgentLoop, LLMProvider +from agent.ui.gradio.app import create_gradio_ui + +# Try to load API keys from environment +api_key = os.environ.get("OPENAI_API_KEY", "") +if not api_key: + print("\n⚠️ No OpenAI API key found. 
You'll need to provide one in the UI.") + +# Launch the Gradio UI +app = create_gradio_ui() +app.launch(share=False) +EOF + +# Create a convenience script to run the demo +cat > "$DEMO_DIR/start_demo.sh" << EOF +#!/bin/bash +source "$VENV_DIR/bin/activate" +cd "$DEMO_DIR" +python run_demo.py +EOF +chmod +x "$DEMO_DIR/start_demo.sh" + +# Create a script to run the MCP server with the correct PYTHONPATH +cat > "$DEMO_DIR/start_mcp_server.sh" << EOF +#!/bin/bash +source "$VENV_DIR/bin/activate" + +# Set PYTHONPATH to include all necessary libraries +export PYTHONPATH="$PYTHONPATH:$(pip show cua-computer-server | grep Location | cut -d' ' -f2)" + +# Run the MCP server using the Python module approach +python -m computer_server.mcp_server +EOF +chmod +x "$DEMO_DIR/start_mcp_server.sh" + +# Create a desktop shortcut for the demo +cat > "$HOME/Desktop/CUA Playground.command" << EOF +#!/bin/bash +"$DEMO_DIR/start_demo.sh" +EOF +chmod +x "$HOME/Desktop/CUA Playground.command" + +echo "✅ Setup complete!" +echo "🖥️ You can start the CUA playground by running: $DEMO_DIR/start_demo.sh" +echo "🖱️ Or double-click the 'CUA Playground' shortcut on your desktop" +echo "🤖 To run the MCP server: $DEMO_DIR/start_mcp_server.sh" + +# Ask if the user wants to start the demo now +read -p "Would you like to start the CUA playground now? (y/n) " -n 1 -r +echo +if [[ $REPLY =~ ^[Yy]$ ]]; then + "$DEMO_DIR/start_demo.sh" +fi From 22c3a9062edb973d3903c49ad00418aab4399d83 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 15:57:33 -0400 Subject: [PATCH 16/26] removed mcp from playground.sh for now --- scripts/playground.sh | 43 ++++++++++++++++--------------------------- 1 file changed, 16 insertions(+), 27 deletions(-) diff --git a/scripts/playground.sh b/scripts/playground.sh index bad1df3b..6614bbb0 100755 --- a/scripts/playground.sh +++ b/scripts/playground.sh @@ -91,9 +91,6 @@ echo "📦 Installing CUA packages..." 
pip install -U pip pip install cua-computer cua-agent[all] -# Setup environment for MCP server -echo "🔧 Setting up MCP server..." - # Create a simple demo script DEMO_DIR="$HOME/.cua-demo" mkdir -p "$DEMO_DIR" @@ -110,9 +107,9 @@ api_key = os.environ.get("OPENAI_API_KEY", "") if not api_key: print("\n⚠️ No OpenAI API key found. You'll need to provide one in the UI.") -# Launch the Gradio UI +# Launch the Gradio UI and open it in the browser app = create_gradio_ui() -app.launch(share=False) +app.launch(share=False, inbrowser=True) EOF # Create a convenience script to run the demo @@ -124,34 +121,26 @@ python run_demo.py EOF chmod +x "$DEMO_DIR/start_demo.sh" -# Create a script to run the MCP server with the correct PYTHONPATH -cat > "$DEMO_DIR/start_mcp_server.sh" << EOF -#!/bin/bash -source "$VENV_DIR/bin/activate" - -# Set PYTHONPATH to include all necessary libraries -export PYTHONPATH="$PYTHONPATH:$(pip show cua-computer-server | grep Location | cut -d' ' -f2)" - -# Run the MCP server using the Python module approach -python -m computer_server.mcp_server -EOF -chmod +x "$DEMO_DIR/start_mcp_server.sh" - -# Create a desktop shortcut for the demo -cat > "$HOME/Desktop/CUA Playground.command" << EOF -#!/bin/bash -"$DEMO_DIR/start_demo.sh" -EOF -chmod +x "$HOME/Desktop/CUA Playground.command" - echo "✅ Setup complete!" echo "🖥️ You can start the CUA playground by running: $DEMO_DIR/start_demo.sh" -echo "🖱️ Or double-click the 'CUA Playground' shortcut on your desktop" -echo "🤖 To run the MCP server: $DEMO_DIR/start_mcp_server.sh" + +# Check if the VM is running +echo "🔍 Checking if the macOS CUA VM is running..." +VM_RUNNING=$(lume ls | grep "macos-sequoia-cua" | grep "running" || echo "") + +if [ -z "$VM_RUNNING" ]; then + echo "🚀 Starting the macOS CUA VM..." + lume start macos-sequoia-cua:latest + echo "✅ VM started successfully." +else + echo "✅ macOS CUA VM is already running." 
+fi # Ask if the user wants to start the demo now read -p "Would you like to start the CUA playground now? (y/n) " -n 1 -r echo if [[ $REPLY =~ ^[Yy]$ ]]; then + echo "🚀 Starting the CUA playground..." + echo "" "$DEMO_DIR/start_demo.sh" fi From 82278b8670e789fb271e64071ed21e0c135eea3f Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 16:00:14 -0400 Subject: [PATCH 17/26] update playground script docs --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index f57994b3..2bfc74af 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,6 @@ This script will: - Install Lume CLI for VM management - Pull the latest macOS CUA image - Set up Python environment and install required packages -- Create a desktop shortcut for easy access - Launch the Computer-Use Agent UI ### System Requirements From c620ec613e090fbf382449c4cbb358455685c029 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 16:01:06 -0400 Subject: [PATCH 18/26] update playground docs --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 2bfc74af..726c7892 100644 --- a/README.md +++ b/README.md @@ -27,9 +27,9 @@ Get started with a Computer-Use Agent UI and a VM with a single command: This script will: -- Install Lume CLI for VM management -- Pull the latest macOS CUA image -- Set up Python environment and install required packages +- Install Lume CLI for VM management (if needed) +- Pull the latest macOS CUA image (if needed) +- Set up Python environment and install/update required packages - Launch the Computer-Use Agent UI ### System Requirements From a45279bb603d74882ccc7169275d33d106e37c2c Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 16:11:53 -0400 Subject: [PATCH 19/26] specify default quant --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 726c7892..e673ebee 100644 --- a/README.md +++ b/README.md @@ -71,7 
+71,7 @@ Alternatively, see the [Developer Guide](./docs/Developer-Guide.md) for building # Example: Using the Computer-Use Agent from agent import ComputerAgent -# Create and run an agent locally using UI-TARS and MLX +# Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit (default) agent = ComputerAgent(computer=my_computer, loop="UITARS") await agent.run("Search for information about CUA on GitHub") From 689bac641d966b6aa13e00b2bb33fb97cf2862e0 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 16:24:43 -0400 Subject: [PATCH 20/26] wording tweak --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e673ebee..f03a044c 100644 --- a/README.md +++ b/README.md @@ -119,7 +119,7 @@ For advanced container-like virtualization, check out [Lumier](./libs/lumier/REA ## Resources - [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/mcp-server/README.md) - One of the easiest ways to get started with C/ua -- [When and how to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/agent/README.md) +- [How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/agent/README.md) - [How to use Lume CLI for managing desktops](./libs/lume/README.md) - [Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1) - [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1) From d11315306197f21844b0ca0e1f1eaef80443a3ad Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Fri, 9 May 2025 16:50:22 -0400 Subject: [PATCH 21/26] run vm in bg --- scripts/playground.sh | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/scripts/playground.sh b/scripts/playground.sh index 6614bbb0..b7de51bb 100755 --- a/scripts/playground.sh +++ 
b/scripts/playground.sh @@ -129,14 +129,17 @@ echo "🔍 Checking if the macOS CUA VM is running..." VM_RUNNING=$(lume ls | grep "macos-sequoia-cua" | grep "running" || echo "") if [ -z "$VM_RUNNING" ]; then - echo "🚀 Starting the macOS CUA VM..." - lume start macos-sequoia-cua:latest + echo "🚀 Starting the macOS CUA VM in the background..." + lume run macos-sequoia-cua:latest & + # Wait a moment for the VM to initialize + sleep 5 echo "✅ VM started successfully." else echo "✅ macOS CUA VM is already running." fi # Ask if the user wants to start the demo now +echo read -p "Would you like to start the CUA playground now? (y/n) " -n 1 -r echo if [[ $REPLY =~ ^[Yy]$ ]]; then From 58cf513cca3838d9eaef3aeed07c56d3c23c5e68 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Sat, 10 May 2025 16:53:54 -0400 Subject: [PATCH 22/26] updated example --- README.md | 49 +++++++++++++++++++++++++++---------------------- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/README.md b/README.md index f03a044c..ab31e4c9 100644 --- a/README.md +++ b/README.md @@ -68,23 +68,27 @@ Alternatively, see the [Developer Guide](./docs/Developer-Guide.md) for building ### Step 4: Use in Your Code ```python -# Example: Using the Computer-Use Agent -from agent import ComputerAgent - -# Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit (default) -agent = ComputerAgent(computer=my_computer, loop="UITARS") -await agent.run("Search for information about CUA on GitHub") - -# Example: Direct control of a macOS VM with Computer from computer import Computer +from agent import ComputerAgent, LLM -async with Computer(os_type="macos") as computer: - # Take a screenshot - screenshot = await computer.interface.screenshot() - # Click on an element - await computer.interface.left_click(100, 200) - # Type text - await computer.interface.type_text("Hello, world!") +async def main(): + # Start a local macOS VM with a 1024x768 display + async with 
Computer(os_type="macos", display="1024x768") as computer: + + # Example: Direct control of a macOS VM with Computer + await computer.interface.left_click(100, 200) + await computer.interface.type_text("Hello, world!") + screenshot_bytes = await computer.interface.screenshot() + + # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit + agent = ComputerAgent( + computer=computer, + loop="UITARS", + model=LLM(provider="MLX", name="mlx-community/UI-TARS-1.5-7B-6bit") + ) + await agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide") + +main() ``` For ready-to-use examples, check out our [Notebooks](./notebooks/) collection. @@ -180,20 +184,21 @@ For complete examples, see [agent_examples.py](./examples/agent_examples.py) or # Import necessary components from agent import ComputerAgent, LLM, AgentLoop, LLMProvider -# Agent Loops -ComputerAgent(loop=AgentLoop.UITARS) # UI-TARS-1.5 agent for local execution with MLX -ComputerAgent(loop=AgentLoop.OPENAI) # OpenAI Computer-Use agent using OPENAI_API_KEY -ComputerAgent(loop=AgentLoop.ANTHROPIC) # Anthropic Claude agent using ANTHROPIC_API_KEY +# UI-TARS-1.5 agent for local execution with MLX +ComputerAgent(loop=AgentLoop.UITARS, model=LLM(provider=LLMProvider.MLX, name="mlx-community/UI-TARS-1.5-7B-6bit")) +# OpenAI Computer-Use agent using OPENAI_API_KEY +ComputerAgent(loop=AgentLoop.OPENAI, model=LLM(provider=LLMProvider.OPENAI, name="computer-use-preview")) +# Anthropic Claude agent using ANTHROPIC_API_KEY +ComputerAgent(loop=AgentLoop.ANTHROPIC, model=LLM(provider=LLMProvider.ANTHROPIC)) # OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM -ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M")) - +ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M")) # OpenRouter example using OAICOMPAT provider ComputerAgent( 
loop=AgentLoop.OMNI, model=LLM( provider=LLMProvider.OAICOMPAT, - name="openai/gpt-4.1", + name="openai/gpt-4o-mini", provider_base_url="https://openrouter.ai/api/v1" ), api_key="your-openrouter-api-key" From 9c31c4d4abd6fbe920f1549e24a6fa18b524a7ca Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Sat, 10 May 2025 16:57:38 -0400 Subject: [PATCH 23/26] added missing demo --- README.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/README.md b/README.md index ab31e4c9..4bd4218c 100644 --- a/README.md +++ b/README.md @@ -225,6 +225,14 @@ Check out these demos of the Computer-Use Agent in action:
+
+Notebook: Fix GitHub issue in Cursor +
+
+ +
+
+ ## Community Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas, get assistance, or share your demos! From 09bff781ded16f29fffb3edb8fe66a25f6045482 Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Sat, 10 May 2025 16:58:32 -0400 Subject: [PATCH 24/26] added missing license/trademark --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 4bd4218c..8cca7fbd 100644 --- a/README.md +++ b/README.md @@ -241,13 +241,15 @@ Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss i Cua is open-sourced under the MIT License - see the [LICENSE](LICENSE) file for details. +Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0) - see the [OmniParser LICENSE](https://github.com/microsoft/OmniParser/blob/master/LICENSE) file for details. + ## Contributing We welcome contributions to CUA! Please refer to our [Contributing Guidelines](CONTRIBUTING.md) for details. ## Trademarks -Apple, macOS, and Apple Silicon are trademarks of Apple Inc. This project is not affiliated with, endorsed by, or sponsored by Apple Inc. +Apple, macOS, and Apple Silicon are trademarks of Apple Inc. Ubuntu and Canonical are registered trademarks of Canonical Ltd. Microsoft is a registered trademark of Microsoft Corporation. This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., or Microsoft Corporation. 
## Stargazers From ffcbacc86ff1df5c103865e833fdccde94adf4da Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Sat, 10 May 2025 17:05:21 -0400 Subject: [PATCH 25/26] supported loops --- README.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/README.md b/README.md index 8cca7fbd..ccf5ef7b 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,8 @@ **c/ua** (pronounced "koo-ah") enables AI agents to control full operating systems in high-performance virtual containers with near-native speed on Apple Silicon. + +
@@ -32,6 +34,12 @@ This script will: - Set up Python environment and install/update required packages - Launch the Computer-Use Agent UI +#### Supported Agent Loops +- [UITARS-1.5](https://github.com/mlx-community/UI-TARS-1.5) - Run locally on Apple Silicon with MLX, or use cloud providers +- [OpenAI CUA](https://platform.openai.com/docs/models/computer-use-preview) - Use OpenAI's Computer-Use Preview model +- [Anthropic CUA](https://docs.anthropic.com/claude/docs/computer-use) - Use Anthropic's Computer-Use capabilities +- Any vision model through [OmniParser](https://github.com/microsoft/OmniParser) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) + ### System Requirements - Mac with Apple Silicon (M1/M2/M3/M4 series) From 132bfb54fd8b2f44d7e45ea4c6052c2680ef0aba Mon Sep 17 00:00:00 2001 From: Dillon DuPont Date: Sat, 10 May 2025 17:08:53 -0400 Subject: [PATCH 26/26] fixed links --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index ccf5ef7b..673590f4 100644 --- a/README.md +++ b/README.md @@ -34,11 +34,11 @@ This script will: - Set up Python environment and install/update required packages - Launch the Computer-Use Agent UI -#### Supported Agent Loops -- [UITARS-1.5](https://github.com/mlx-community/UI-TARS-1.5) - Run locally on Apple Silicon with MLX, or use cloud providers -- [OpenAI CUA](https://platform.openai.com/docs/models/computer-use-preview) - Use OpenAI's Computer-Use Preview model -- [Anthropic CUA](https://docs.anthropic.com/claude/docs/computer-use) - Use Anthropic's Computer-Use capabilities -- Any vision model through [OmniParser](https://github.com/microsoft/OmniParser) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) +#### Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) +- [UITARS-1.5](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Run locally on Apple 
Silicon with MLX, or use cloud providers +- [OpenAI CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use OpenAI's Computer-Use Preview model +- [Anthropic CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use Anthropic's Computer-Use capabilities +- [OmniParser](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model ### System Requirements
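
The version gates in `scripts/playground.sh` (macOS 15 or newer via `sort -V`, Python 3.11 or newer via separate major and minor fields) both reduce to a numeric, component-wise comparison of dotted version strings. A minimal Python sketch of that comparison follows. `parse_version` and `meets_minimum` are hypothetical helper names that do not appear in the patch series, and the sketch assumes purely numeric components (no suffixes such as `rc1`).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '15.4.1' into (15, 4, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(current: str, minimum: str) -> bool:
    """Return True when `current` is at least `minimum`, compared numerically."""
    cur, req = parse_version(current), parse_version(minimum)
    width = max(len(cur), len(req))
    # Pad the shorter tuple with zeros so that '15' and '15.0' compare equal.
    cur += (0,) * (width - len(cur))
    req += (0,) * (width - len(req))
    return cur >= req


if __name__ == "__main__":
    # Illustrative inputs only; these are not values reported by the script.
    for current, minimum in [("15.0", "15.0"), ("14.7.1", "15.0"), ("3.9.6", "3.11")]:
        print(current, ">=", minimum, "is", meets_minimum(current, minimum))
```

Comparing tuples of integers avoids the trap of plain string comparison, under which `"3.9.6"` sorts after `"3.11"` and would wrongly pass a 3.11 minimum.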