Merge branch 'main' into feature/computer/extensions

This commit is contained in:
Dillon DuPont
2025-05-31 09:14:11 -04:00
38 changed files with 1478 additions and 518 deletions


@@ -169,6 +169,15 @@
"contributions": [
"code"
]
},
{
"login": "evnsnclr",
"name": "Evan smith",
"avatar_url": "https://avatars.githubusercontent.com/u/139897548?v=4",
"profile": "https://github.com/evnsnclr",
"contributions": [
"code"
]
}
]
}


@@ -56,7 +56,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: '3.11'
- name: Update dependencies to latest versions
id: update-deps


@@ -54,7 +54,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: '3.11'
- name: Update dependencies to latest versions
id: update-deps


@@ -59,7 +59,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: '3.11'
- name: Update dependencies to latest versions
id: update-deps


@@ -52,7 +52,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: '3.11'
- name: Create root pdm.lock file
run: |
@@ -62,7 +62,7 @@ jobs:
- name: Install PDM
uses: pdm-project/setup-pdm@v3
with:
python-version: '3.10'
python-version: '3.11'
cache: true
- name: Set version

README.md

@@ -9,44 +9,108 @@
[![Swift](https://img.shields.io/badge/Swift-F05138?logo=swift&logoColor=white)](#)
[![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#)
[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85)
<br>
<a href="https://trendshift.io/repositories/13685" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13685" alt="trycua%2Fcua | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
**c/ua** (pronounced "koo-ah") enables AI agents to control full operating systems in high-performance virtual containers with near-native speed on Apple Silicon.
**c/ua** ("koo-ah") is Docker for [Computer-Use Agents](https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse) - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.
<div align="center">
<video src="https://github.com/user-attachments/assets/c619b4ea-bb8e-4382-860e-f3757e36af20" width="800" controls></video>
</div>
<details>
<summary><b>Check out more demos of the Computer-Use Agent in action</b></summary>
# 🚀 Quick Start
<details open>
<summary><b>MCP Server: Work with Claude Desktop and Tableau</b></summary>
<br>
<div align="center">
<video src="https://github.com/user-attachments/assets/9f573547-5149-493e-9a72-396f3cff29df" width="800" controls></video>
</div>
</details>
Get started with a Computer-Use Agent UI and a VM with a single command:
<details>
<summary><b>AI-Gradio: Multi-app workflow with browser, VS Code and terminal</b></summary>
<br>
<div align="center">
<video src="https://github.com/user-attachments/assets/723a115d-1a07-4c8e-b517-88fbdf53ed0f" width="800" controls></video>
</div>
</details>
<details>
<summary><b>Notebook: Fix GitHub issue in Cursor</b></summary>
<br>
<div align="center">
<video src="https://github.com/user-attachments/assets/f67f0107-a1e1-46dc-aa9f-0146eb077077" width="800" controls></video>
</div>
</details>
</details><br/>
# 🚀 Quick Start with a Computer-Use Agent UI
**Need to automate desktop tasks? Launch the Computer-Use Agent UI with a single command.**
### Option 1: Fully-managed install (recommended)
*I want to be fully guided through the process*
**macOS/Linux/Windows (via WSL):**
```bash
# Requires Python 3.11+
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground.sh)"
```
This script will:
- Install Lume CLI for VM management (if needed)
- Pull the latest macOS CUA image (if needed)
- Set up Python environment and install/update required packages
- Ask if you want to use local VMs or C/ua Cloud Containers
- Install necessary dependencies (Lume CLI for local VMs)
- Download VM images if needed
- Install Python packages
- Launch the Computer-Use Agent UI
#### Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops)
### Option 2: Key manual steps
<details>
<summary>If you are skeptical of running one-line install scripts</summary>
**For C/ua Agent UI (any system, cloud VMs only):**
```bash
# Requires Python 3.11+ and C/ua API key
pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui.gradio.app
```
**For Local macOS/Linux VMs (Apple Silicon only):**
```bash
# 1. Install Lume CLI
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# 2. Pull macOS image
lume pull macos-sequoia-cua:latest
# 3. Start VM
lume run macos-sequoia-cua:latest
# 4. Install packages and launch UI
pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui.gradio.app
```
</details>
---
*How it works: Computer module provides secure desktops (Lume CLI locally, [C/ua Cloud Containers](https://trycua.com) remotely), Agent module provides local/API agents with OpenAI AgentResponse format and [trajectory tracing](https://trycua.com/trajectory-viewer).*
### Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops)
- [UITARS-1.5](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Run locally on Apple Silicon with MLX, or use cloud providers
- [OpenAI CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use OpenAI's Computer-Use Preview model
- [Anthropic CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use Anthropic's Computer-Use capabilities
- [OmniParser-v2.0](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model
### System Requirements
- Mac with Apple Silicon (M1/M2/M3/M4 series)
- macOS 15 (Sequoia) or newer
- Disk space for VM images (30GB+ recommended)
# 💻 For Developers
# 💻 Developer Guide
Follow these steps to use C/ua in your own code. See [Developer Guide](./docs/Developer-Guide.md) for building from source.
### Step 1: Install Lume CLI
@@ -70,8 +134,6 @@ The macOS CUA image contains the default Mac apps and the Computer Server for ea
pip install "cua-computer[all]" "cua-agent[all]"
```
Alternatively, see the [Developer Guide](./docs/Developer-Guide.md) for building from source.
### Step 4: Use in Your Code
```python
@@ -79,21 +141,29 @@ from computer import Computer
from agent import ComputerAgent, LLM
async def main():
# Start a local macOS VM with a 1024x768 display
async with Computer(os_type="macos", display="1024x768") as computer:
# Start a local macOS VM
computer = Computer(os_type="macos")
await computer.run()
# Example: Direct control of a macOS VM with Computer
await computer.interface.left_click(100, 200)
await computer.interface.type_text("Hello, world!")
screenshot_bytes = await computer.interface.screenshot()
# Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
agent = ComputerAgent(
computer=computer,
loop="UITARS",
model=LLM(provider="MLXVLM", name="mlx-community/UI-TARS-1.5-7B-6bit")
)
await agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide")
# Or with C/ua Cloud Container
computer = Computer(
os_type="linux",
api_key="your_cua_api_key_here",
name="your_container_name_here"
)
# Example: Direct control of a macOS VM with Computer
await computer.interface.left_click(100, 200)
await computer.interface.type_text("Hello, world!")
screenshot_bytes = await computer.interface.screenshot()
# Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
agent = ComputerAgent(
computer=computer,
loop="UITARS",
model=LLM(provider="MLXVLM", name="mlx-community/UI-TARS-1.5-7B-6bit")
)
await agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide")
asyncio.run(main())
```
@@ -234,33 +304,6 @@ ComputerAgent(
)
```
## Demos
Check out these demos of the Computer-Use Agent in action:
<details open>
<summary><b>MCP Server: Work with Claude Desktop and Tableau</b></summary>
<br>
<div align="center">
<video src="https://github.com/user-attachments/assets/9f573547-5149-493e-9a72-396f3cff29df" width="800" controls></video>
</div>
</details>
<details>
<summary><b>AI-Gradio: Multi-app workflow with browser, VS Code and terminal</b></summary>
<br>
<div align="center">
<video src="https://github.com/user-attachments/assets/723a115d-1a07-4c8e-b517-88fbdf53ed0f" width="800" controls></video>
</div>
</details>
<details>
<summary><b>Notebook: Fix GitHub issue in Cursor</b></summary>
<br>
<div align="center">
<video src="https://github.com/user-attachments/assets/f67f0107-a1e1-46dc-aa9f-0146eb077077" width="800" controls></video>
</div>
</details>
## Community
@@ -316,6 +359,7 @@ Thank you to all our supporters!
<td align="center" valign="top" width="14.28%"><a href="https://mjspeck.github.io/"><img src="https://avatars.githubusercontent.com/u/20689127?v=4?s=100" width="100px;" alt="Matt Speck"/><br /><sub><b>Matt Speck</b></sub></a><br /><a href="#code-mjspeck" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/FinnBorge"><img src="https://avatars.githubusercontent.com/u/9272726?v=4?s=100" width="100px;" alt="FinnBorge"/><br /><sub><b>FinnBorge</b></sub></a><br /><a href="#code-FinnBorge" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/jklapacz"><img src="https://avatars.githubusercontent.com/u/5343758?v=4?s=100" width="100px;" alt="Jakub Klapacz"/><br /><sub><b>Jakub Klapacz</b></sub></a><br /><a href="#code-jklapacz" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/evnsnclr"><img src="https://avatars.githubusercontent.com/u/139897548?v=4?s=100" width="100px;" alt="Evan smith"/><br /><sub><b>Evan smith</b></sub></a><br /><a href="#code-evnsnclr" title="Code">💻</a></td>
</tr>
</tbody>
</table>


@@ -62,7 +62,7 @@ Refer to the [Lume README](../libs/lume/docs/Development.md) for instructions on
## Python Development
There are two ways to instal Lume:
There are two ways to install Lume:
### Run the build script
@@ -91,7 +91,7 @@ To install with PDM, simply run:
pdm install -G:all
```
This installs all the dependencies for development, testing, and building the docs. If you'd oly like development dependencies, you can run:
This installs all the dependencies for development, testing, and building the docs. If you'd only like development dependencies, you can run:
```console
pdm install -d
@@ -200,11 +200,11 @@ The formatting configuration is defined in the root `pyproject.toml` file:
```toml
[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]
[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true
@@ -213,7 +213,7 @@ docstring-code-format = true
[tool.mypy]
strict = true
python_version = "3.10"
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true
@@ -225,7 +225,7 @@ warn_unused_ignores = false
#### Key Formatting Rules
- **Line Length**: Maximum of 100 characters
- **Python Version**: Code should be compatible with Python 3.10+
- **Python Version**: Code should be compatible with Python 3.11+
- **Imports**: Automatically sorted (using Ruff's "I" rule)
- **Type Hints**: Required for all function definitions (strict mypy mode)


@@ -10,7 +10,7 @@ CUA libraries collect minimal anonymous usage data to help improve our software.
- Basic system information:
- Operating system (e.g., 'darwin', 'win32', 'linux')
- Python version (e.g., '3.10.0')
- Python version (e.g., '3.11.0')
- Module initialization events:
- When a module (like 'computer' or 'agent') is imported
- Version of the module being used


@@ -5,7 +5,7 @@ import logging
import traceback
import signal
from computer import Computer
from computer import Computer, VMProviderType
# Import the unified agent class and types
from agent import ComputerAgent, LLMProvider, LLM, AgentLoop
@@ -23,76 +23,88 @@ async def run_agent_example():
print("\n=== Example: ComputerAgent with OpenAI and Omni provider ===")
try:
# Create a local macOS computer
computer = Computer(
os_type="macos",
verbosity=logging.DEBUG,
)
# Create a remote Linux computer with C/ua
# computer = Computer(
# os_type="linux",
# api_key=os.getenv("CUA_API_KEY"),
# name=os.getenv("CUA_CONTAINER_NAME"),
# provider_type=VMProviderType.CLOUD,
# )
# Create Computer instance with async context manager
async with Computer(verbosity=logging.DEBUG) as macos_computer:
# Create agent with loop and provider
agent = ComputerAgent(
computer=macos_computer,
# loop=AgentLoop.OPENAI,
# loop=AgentLoop.ANTHROPIC,
# loop=AgentLoop.UITARS,
loop=AgentLoop.OMNI,
# model=LLM(provider=LLMProvider.OPENAI), # No model name for Operator CUA
# model=LLM(provider=LLMProvider.OPENAI, name="gpt-4o"),
# model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-3-7-sonnet-20250219"),
# model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:4b-it-q4_K_M"),
# model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-4bit"),
model=LLM(
provider=LLMProvider.OAICOMPAT,
name="gemma-3-12b-it",
provider_base_url="http://localhost:1234/v1", # LM Studio local endpoint
),
save_trajectory=True,
only_n_most_recent_images=3,
verbosity=logging.DEBUG,
)
agent = ComputerAgent(
computer=computer,
loop=AgentLoop.OPENAI,
# loop=AgentLoop.ANTHROPIC,
# loop=AgentLoop.UITARS,
# loop=AgentLoop.OMNI,
model=LLM(provider=LLMProvider.OPENAI), # No model name for Operator CUA
# model=LLM(provider=LLMProvider.OPENAI, name="gpt-4o"),
# model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-3-7-sonnet-20250219"),
# model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:4b-it-q4_K_M"),
# model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-4bit"),
# model=LLM(
# provider=LLMProvider.OAICOMPAT,
# name="gemma-3-12b-it",
# provider_base_url="http://localhost:1234/v1", # LM Studio local endpoint
# ),
save_trajectory=True,
only_n_most_recent_images=3,
verbosity=logging.DEBUG,
)
tasks = [
"Look for a repository named trycua/cua on GitHub.",
"Check the open issues, open the most recent one and read it.",
"Clone the repository in users/lume/projects if it doesn't exist yet.",
"Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
"From Cursor, open Composer if not already open.",
"Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
]
tasks = [
"Look for a repository named trycua/cua on GitHub.",
"Check the open issues, open the most recent one and read it.",
"Clone the repository in users/lume/projects if it doesn't exist yet.",
"Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
"From Cursor, open Composer if not already open.",
"Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
]
for i, task in enumerate(tasks):
print(f"\nExecuting task {i}/{len(tasks)}: {task}")
async for result in agent.run(task):
print("Response ID: ", result.get("id"))
for i, task in enumerate(tasks):
print(f"\nExecuting task {i}/{len(tasks)}: {task}")
async for result in agent.run(task):
print("Response ID: ", result.get("id"))
# Print detailed usage information
usage = result.get("usage")
if usage:
print("\nUsage Details:")
print(f" Input Tokens: {usage.get('input_tokens')}")
if "input_tokens_details" in usage:
print(f" Input Tokens Details: {usage.get('input_tokens_details')}")
print(f" Output Tokens: {usage.get('output_tokens')}")
if "output_tokens_details" in usage:
print(f" Output Tokens Details: {usage.get('output_tokens_details')}")
print(f" Total Tokens: {usage.get('total_tokens')}")
# Print detailed usage information
usage = result.get("usage")
if usage:
print("\nUsage Details:")
print(f" Input Tokens: {usage.get('input_tokens')}")
if "input_tokens_details" in usage:
print(f" Input Tokens Details: {usage.get('input_tokens_details')}")
print(f" Output Tokens: {usage.get('output_tokens')}")
if "output_tokens_details" in usage:
print(f" Output Tokens Details: {usage.get('output_tokens_details')}")
print(f" Total Tokens: {usage.get('total_tokens')}")
print("Response Text: ", result.get("text"))
print("Response Text: ", result.get("text"))
# Print tools information
tools = result.get("tools")
if tools:
print("\nTools:")
print(tools)
# Print tools information
tools = result.get("tools")
if tools:
print("\nTools:")
print(tools)
# Print reasoning and tool call outputs
outputs = result.get("output", [])
for output in outputs:
output_type = output.get("type")
if output_type == "reasoning":
print("\nReasoning Output:")
print(output)
elif output_type == "computer_call":
print("\nTool Call Output:")
print(output)
# Print reasoning and tool call outputs
outputs = result.get("output", [])
for output in outputs:
output_type = output.get("type")
if output_type == "reasoning":
print("\nReasoning Output:")
print(output)
elif output_type == "computer_call":
print("\nTool Call Output:")
print(output)
print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")
print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")
except Exception as e:
logger.error(f"Error in run_agent_example: {e}")


@@ -16,17 +16,18 @@ load_dotenv(env_file)
pythonpath = os.environ.get("PYTHONPATH", "")
for path in pythonpath.split(":"):
if path and path not in sys.path:
sys.path.append(path)
sys.path.insert(0, path) # Insert at beginning to prioritize
print(f"Added to sys.path: {path}")
from computer import Computer, VMProviderType
from computer.computer import Computer
from computer.providers.base import VMProviderType
from computer.logger import LogLevel
async def main():
try:
print("\n=== Using direct initialization ===")
# Create computer with configured host
# Create a local macOS computer
computer = Computer(
display="1024x768",
memory="8GB",
@@ -41,12 +42,31 @@ async def main():
],
ephemeral=False,
)
# Create a remote Linux computer with C/ua
# computer = Computer(
# os_type="linux",
# api_key=os.getenv("CUA_API_KEY"),
# name=os.getenv("CONTAINER_NAME"),
# provider_type=VMProviderType.CLOUD,
# )
try:
# Run the computer with default parameters
await computer.run()
await computer.interface.hotkey("command", "space")
screenshot = await computer.interface.screenshot()
# Create output directory if it doesn't exist
output_dir = Path("./output")
output_dir.mkdir(exist_ok=True)
screenshot_path = output_dir / "screenshot.png"
with open(screenshot_path, "wb") as f:
f.write(screenshot)
print(f"Screenshot saved to: {screenshot_path.absolute()}")
# await computer.interface.hotkey("command", "space")
# res = await computer.interface.run_command("touch ./Downloads/empty_file")
# print(f"Run command result: {res}")


@@ -1,10 +1,10 @@
"""Tool-related type definitions."""
from enum import Enum
from enum import StrEnum
from typing import Dict, Any, Optional
from pydantic import BaseModel, ConfigDict
class ToolInvocationState(str, Enum):
class ToolInvocationState(StrEnum):
"""States for tool invocation."""
CALL = 'call'
PARTIAL_CALL = 'partial-call'


@@ -1,18 +1,18 @@
"""Core type definitions."""
from typing import Any, Dict, List, Optional, TypedDict, Union
from enum import Enum, StrEnum, auto
from enum import StrEnum
from dataclasses import dataclass
class AgentLoop(Enum):
class AgentLoop(StrEnum):
"""Enumeration of available loop types."""
ANTHROPIC = auto() # Anthropic implementation
OMNI = auto() # OmniLoop implementation
OPENAI = auto() # OpenAI implementation
OLLAMA = auto() # OLLAMA implementation
UITARS = auto() # UI-TARS implementation
ANTHROPIC = "anthropic" # Anthropic implementation
OMNI = "omni" # OmniLoop implementation
OPENAI = "openai" # OpenAI implementation
OLLAMA = "ollama" # OLLAMA implementation
UITARS = "uitars" # UI-TARS implementation
# Add more loop types as needed


@@ -3,6 +3,9 @@
from datetime import datetime
import platform
today = datetime.today()
today = f"{today.strftime('%A, %B')} {today.day}, {today.year}"
SYSTEM_PROMPT = f"""<SYSTEM_CAPABILITY>
* You are utilising a macOS virtual machine using ARM architecture with internet access and Safari as default browser.
* You can feel free to install macOS applications with your bash tool. Use curl instead of wget.
@@ -10,7 +13,7 @@ SYSTEM_PROMPT = f"""<SYSTEM_CAPABILITY>
* When using your bash tool with commands that are expected to output very large quantities of text, redirect into a tmp file and use str_replace_editor or `grep -n -B <lines before> -A <lines after> <query> <filename>` to confirm output.
* When viewing a page it can be helpful to zoom out so that you can see everything on the page. Either that, or make sure you scroll down to see everything before deciding something isn't available.
* When using your computer function calls, they take a while to run and send back to you. Where possible/feasible, try to chain multiple of these calls all into one function calls request.
* The current date is {datetime.today().strftime('%A, %B %-d, %Y')}.
* The current date is {today}.
</SYSTEM_CAPABILITY>
<IMPORTANT>
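The date change above replaces `strftime('%-d')`, a glibc/BSD extension that raises `ValueError` on Windows, with an f-string that formats the day number portably. A small sketch with a fixed example date:

```python
from datetime import datetime

# Portable replacement for strftime('%-d'): take the day number from
# the datetime object directly instead of a platform-specific format code.
today = datetime(2025, 5, 31)  # fixed example date
formatted = f"{today.strftime('%A, %B')} {today.day}, {today.year}"
assert formatted == "Saturday, May 31, 2025"
```

Precomputing the string once at module import (as the diff does) also avoids re-formatting the date on every prompt construction.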


@@ -22,7 +22,7 @@ Supported Agent Loops and Models:
Requirements:
- Mac with Apple Silicon (M1/M2/M3/M4)
- macOS 14 (Sonoma) or newer
- Python 3.10+
- Python 3.11+
- Lume CLI installed (https://github.com/trycua/cua)
- OpenAI or Anthropic API key
"""
@@ -31,6 +31,7 @@ import os
import asyncio
import logging
import json
import platform
from pathlib import Path
from typing import Dict, List, Optional, AsyncGenerator, Any, Tuple, Union
import gradio as gr
@@ -129,6 +130,9 @@ class GradioChatScreenshotHandler(DefaultCallbackHandler):
)
# Detect if current device is MacOS
is_mac = platform.system().lower() == "darwin"
# Map model names to specific provider model names
MODEL_MAPPINGS = {
"openai": {
@@ -165,7 +169,7 @@ MODEL_MAPPINGS = {
},
"uitars": {
# UI-TARS models using MLXVLM provider
"default": "mlx-community/UI-TARS-1.5-7B-4bit",
"default": "mlx-community/UI-TARS-1.5-7B-4bit" if is_mac else "tgi",
"mlx-community/UI-TARS-1.5-7B-4bit": "mlx-community/UI-TARS-1.5-7B-4bit",
"mlx-community/UI-TARS-1.5-7B-6bit": "mlx-community/UI-TARS-1.5-7B-6bit"
},
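The `"default"` entry above now depends on the host platform, since the MLX builds of UI-TARS only run on Apple Silicon. A hedged sketch of the lookup-with-default pattern (the `resolve_model` helper is hypothetical, not part of the diff):

```python
import platform

# MLX models are macOS-only, so non-Mac hosts fall back to a generic
# "tgi" endpoint name, mirroring the MODEL_MAPPINGS change in the diff.
is_mac = platform.system().lower() == "darwin"
UITARS_MAPPING = {
    "default": "mlx-community/UI-TARS-1.5-7B-4bit" if is_mac else "tgi",
    "mlx-community/UI-TARS-1.5-7B-4bit": "mlx-community/UI-TARS-1.5-7B-4bit",
}

def resolve_model(name: str) -> str:
    """Hypothetical helper: resolve a UI choice, falling back to the default."""
    return UITARS_MAPPING.get(name, UITARS_MAPPING["default"])
```

On a Mac, `resolve_model("anything-unknown")` yields the 4-bit MLX model; elsewhere it yields `"tgi"`.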
@@ -290,7 +294,7 @@ def get_provider_and_model(model_name: str, loop_provider: str) -> tuple:
model_name_to_use = cleaned_model_name
# agent_loop remains AgentLoop.OMNI
elif agent_loop == AgentLoop.UITARS:
# For UITARS, use MLXVLM provider for the MLX models, OAICOMPAT for custom
# For UITARS, use MLXVLM for mlx-community models, OAICOMPAT for custom
if model_name == "Custom model (OpenAI compatible API)":
provider = LLMProvider.OAICOMPAT
model_name_to_use = "tgi"
@@ -333,12 +337,25 @@ def get_ollama_models() -> List[str]:
logging.error(f"Error getting Ollama models: {e}")
return []
def create_computer_instance(verbosity: int = logging.INFO) -> Computer:
def create_computer_instance(
verbosity: int = logging.INFO,
os_type: str = "macos",
provider_type: str = "lume",
name: Optional[str] = None,
api_key: Optional[str] = None
) -> Computer:
"""Create or get the global Computer instance."""
global global_computer
if global_computer is None:
global_computer = Computer(verbosity=verbosity)
global_computer = Computer(
verbosity=verbosity,
os_type=os_type,
provider_type=provider_type,
name=name if name else "",
api_key=api_key
)
return global_computer
@@ -353,12 +370,22 @@ def create_agent(
verbosity: int = logging.INFO,
use_oaicompat: bool = False,
provider_base_url: Optional[str] = None,
computer_os: str = "macos",
computer_provider: str = "lume",
computer_name: Optional[str] = None,
computer_api_key: Optional[str] = None,
) -> ComputerAgent:
"""Create or update the global agent with the specified parameters."""
global global_agent
# Create the computer if not already done
computer = create_computer_instance(verbosity=verbosity)
computer = create_computer_instance(
verbosity=verbosity,
os_type=computer_os,
provider_type=computer_provider,
name=computer_name,
api_key=computer_api_key
)
# Get API key from environment if not provided
if api_key is None:
@@ -401,6 +428,7 @@ def create_agent(
return global_agent
def create_gradio_ui(
provider_name: str = "openai",
model_name: str = "gpt-4o",
@@ -421,7 +449,8 @@ def create_gradio_ui(
# Check for API keys
openai_api_key = os.environ.get("OPENAI_API_KEY", "")
anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY", "")
cua_api_key = os.environ.get("CUA_API_KEY", "")
# Always show models regardless of API key availability
openai_models = ["OpenAI: Computer-Use Preview"]
anthropic_models = [
@@ -439,22 +468,29 @@ def create_gradio_ui(
# Check if API keys are available
has_openai_key = bool(openai_api_key)
has_anthropic_key = bool(anthropic_api_key)
has_cua_key = bool(cua_api_key)
print("has_openai_key", has_openai_key)
print("has_anthropic_key", has_anthropic_key)
print("has_cua_key", has_cua_key)
# Get Ollama models for OMNI
ollama_models = get_ollama_models()
if ollama_models:
omni_models += ollama_models
# Detect if current device is MacOS
is_mac = platform.system().lower() == "darwin"
# Format model choices
provider_to_models = {
"OPENAI": openai_models,
"ANTHROPIC": anthropic_models,
"OMNI": omni_models + ["Custom model (OpenAI compatible API)", "Custom model (ollama)"], # Add custom model options
"UITARS": [
"UITARS": ([
"mlx-community/UI-TARS-1.5-7B-4bit",
"mlx-community/UI-TARS-1.5-7B-6bit",
"Custom model (OpenAI compatible API)"
], # UI-TARS options with MLX models
] if is_mac else []) + ["Custom model (OpenAI compatible API)"], # UI-TARS options with MLX models
}
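The UITARS entry above uses conditional list concatenation so that the MLX model choices only appear on macOS while the OpenAI-compatible custom option is always offered. A self-contained sketch of that pattern:

```python
import platform

is_mac = platform.system().lower() == "darwin"

# MLX choices are gated on macOS; the custom-API option is universal.
mlx_models = [
    "mlx-community/UI-TARS-1.5-7B-4bit",
    "mlx-community/UI-TARS-1.5-7B-6bit",
]
uitars_choices = (mlx_models if is_mac else []) + ["Custom model (OpenAI compatible API)"]
```

On a Mac this yields three choices; on Linux or Windows the dropdown collapses to just the custom-API option.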
# --- Apply Saved Settings (override defaults if available) ---
@@ -473,7 +509,7 @@ def create_gradio_ui(
elif initial_loop == "ANTHROPIC":
initial_model = anthropic_models[0] if anthropic_models else "No models available"
else: # OMNI
initial_model = omni_models[0] if omni_models else "No models available"
initial_model = omni_models[0] if omni_models else "Custom model (OpenAI compatible API)"
if "Custom model (OpenAI compatible API)" in available_models_for_loop:
initial_model = (
"Custom model (OpenAI compatible API)" # Default to custom if available and no other default fits
@@ -494,7 +530,7 @@ def create_gradio_ui(
]
# Function to generate Python code based on configuration and tasks
def generate_python_code(agent_loop_choice, provider, model_name, tasks, provider_url, recent_images=3, save_trajectory=True):
def generate_python_code(agent_loop_choice, provider, model_name, tasks, provider_url, recent_images=3, save_trajectory=True, computer_os="macos", computer_provider="lume", container_name="", cua_cloud_api_key=""):
"""Generate Python code for the current configuration and tasks.
Args:
@@ -505,6 +541,10 @@ def create_gradio_ui(
provider_url: The provider base URL for OAICOMPAT providers
recent_images: Number of recent images to keep in context
save_trajectory: Whether to save the agent trajectory
computer_os: Operating system type for the computer
computer_provider: Provider type for the computer
container_name: Optional VM name
cua_cloud_api_key: Optional CUA Cloud API key
Returns:
Formatted Python code as a string
@@ -515,13 +555,29 @@ def create_gradio_ui(
if task and task.strip():
tasks_str += f' "{task}",\n'
# Create the Python code template
# Create the Python code template with computer configuration
computer_args = []
if computer_os != "macos":
computer_args.append(f'os_type="{computer_os}"')
if computer_provider != "lume":
computer_args.append(f'provider_type="{computer_provider}"')
if container_name:
computer_args.append(f'name="{container_name}"')
if cua_cloud_api_key:
computer_args.append(f'api_key="{cua_cloud_api_key}"')
computer_args_str = ", ".join(computer_args)
if computer_args_str:
computer_args_str = f"({computer_args_str})"
else:
computer_args_str = "()"
code = f'''import asyncio
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
async def main():
async with Computer() as macos_computer:
async with Computer{computer_args_str} as macos_computer:
agent = ComputerAgent(
computer=macos_computer,
loop=AgentLoop.{agent_loop_choice},
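The `computer_args` logic above renders only non-default `Computer()` keyword arguments into the generated snippet, so the common local-macOS case stays a clean `Computer()`. A standalone sketch of that builder (the function name is hypothetical):

```python
def build_computer_args(os_type: str = "macos", provider_type: str = "lume",
                        name: str = "", api_key: str = "") -> str:
    """Render only non-default Computer() kwargs, as the code generator does."""
    args = []
    if os_type != "macos":
        args.append(f'os_type="{os_type}"')
    if provider_type != "lume":
        args.append(f'provider_type="{provider_type}"')
    if name:
        args.append(f'name="{name}"')
    if api_key:
        args.append(f'api_key="{api_key}"')
    joined = ", ".join(args)
    return f"({joined})" if joined else "()"
```

For example, `build_computer_args()` returns `"()"`, while a cloud Linux container with an API key renders its two overrides and nothing else.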
@@ -660,12 +716,54 @@ if __name__ == "__main__":
LLMProvider.OPENAI,
"gpt-4o",
[],
"https://openrouter.ai/api/v1"
"https://openrouter.ai/api/v1",
3, # recent_images default
True, # save_trajectory default
"macos",
"lume",
"",
""
),
interactive=False,
)
with gr.Accordion("Configuration", open=True):
with gr.Accordion("Computer Configuration", open=True):
# Computer configuration options
computer_os = gr.Radio(
choices=["macos", "linux"],
label="Operating System",
value="macos",
info="Select the operating system for the computer",
)
# Detect if current device is MacOS
is_mac = platform.system().lower() == "darwin"
computer_provider = gr.Radio(
choices=["cloud", "lume"],
label="Provider",
value="lume" if is_mac else "cloud",
visible=is_mac,
info="Select the computer provider",
)
container_name = gr.Textbox(
label="Container Name",
placeholder="Enter container name (optional)",
value="",
info="Optional name for the container",
)
cua_cloud_api_key = gr.Textbox(
label="CUA Cloud API Key",
placeholder="Enter your CUA Cloud API key",
value="",
type="password",
info="Required for cloud provider",
visible=(not has_cua_key)
)
with gr.Accordion("Agent Configuration", open=True):
# Configuration options
agent_loop = gr.Dropdown(
choices=["OPENAI", "ANTHROPIC", "OMNI", "UITARS"],
@@ -986,6 +1084,10 @@ if __name__ == "__main__":
custom_api_key=None,
openai_key_input=None,
anthropic_key_input=None,
computer_os="macos",
computer_provider="lume",
container_name="",
cua_cloud_api_key="",
):
if not history:
yield history
@@ -1083,6 +1185,8 @@ if __name__ == "__main__":
else:
# For Ollama or default OAICOMPAT (without custom key), no key needed/expected
api_key = ""
cua_cloud_api_key = cua_cloud_api_key or os.environ.get("CUA_API_KEY", "")
# --- Save Settings Before Running Agent ---
current_settings = {
@@ -1092,6 +1196,10 @@ if __name__ == "__main__":
"provider_base_url": custom_url_value,
"save_trajectory": save_traj,
"recent_images": recent_imgs,
"computer_os": computer_os,
"computer_provider": computer_provider,
"container_name": container_name,
"cua_cloud_api_key": cua_cloud_api_key,
}
save_settings(current_settings)
# --- End Save Settings ---
@@ -1109,6 +1217,10 @@ if __name__ == "__main__":
use_oaicompat=is_oaicompat, # Set flag if custom model was selected
# Pass custom URL only if custom model was selected
provider_base_url=custom_url_value if is_oaicompat else None,
computer_os=computer_os,
computer_provider=computer_provider,
computer_name=container_name,
computer_api_key=cua_cloud_api_key,
verbosity=logging.DEBUG, # Added verbosity here
)
@@ -1235,6 +1347,10 @@ if __name__ == "__main__":
provider_api_key,
openai_api_key_input,
anthropic_api_key_input,
computer_os,
computer_provider,
container_name,
cua_cloud_api_key,
],
outputs=[chatbot_history],
queue=True,
@@ -1253,82 +1369,20 @@ if __name__ == "__main__":
# Function to update the code display based on configuration and chat history
def update_code_display(agent_loop, model_choice_val, custom_model_val, chat_history, provider_base_url, recent_images_val, save_trajectory_val):
def update_code_display(agent_loop, model_choice_val, custom_model_val, chat_history, provider_base_url, recent_images_val, save_trajectory_val, computer_os, computer_provider, container_name, cua_cloud_api_key):
# Extract messages from chat history
messages = []
if chat_history:
for msg in chat_history:
if msg.get("role") == "user":
if isinstance(msg, dict) and msg.get("role") == "user":
messages.append(msg.get("content", ""))
# Determine if this is a custom model selection and which type
is_custom_openai_api = model_choice_val == "Custom model (OpenAI compatible API)"
is_custom_ollama = model_choice_val == "Custom model (ollama)"
is_custom_model_selected = is_custom_openai_api or is_custom_ollama
# Determine provider and model based on current selection
provider, model_name, _ = get_provider_and_model(
model_choice_val or custom_model_val or "gpt-4o",
agent_loop
)
# Determine provider and model name based on agent loop
if agent_loop == "OPENAI":
# For OPENAI loop, always use OPENAI provider with computer-use-preview
provider = LLMProvider.OPENAI
model_name = "computer-use-preview"
elif agent_loop == "ANTHROPIC":
# For ANTHROPIC loop, always use ANTHROPIC provider
provider = LLMProvider.ANTHROPIC
# Extract model name from the UI string
if model_choice_val.startswith("Anthropic: Claude "):
# Extract the model name based on the UI string
model_parts = model_choice_val.replace("Anthropic: Claude ", "").split(" (")
version = model_parts[0] # e.g., "3.7 Sonnet"
date = model_parts[1].replace(")", "") if len(model_parts) > 1 else "" # e.g., "20250219"
# Format as claude-3-7-sonnet-20250219 or claude-3-5-sonnet-20240620
version = version.replace(".", "-").replace(" ", "-").lower()
model_name = f"claude-{version}-{date}"
else:
# Use the model_choice_val directly if it doesn't match the expected format
model_name = model_choice_val
elif agent_loop == "UITARS":
# For UITARS, use MLXVLM for mlx-community models, OAICOMPAT for custom
if model_choice_val == "Custom model (OpenAI compatible API)":
provider = LLMProvider.OAICOMPAT
model_name = custom_model_val
else:
provider = LLMProvider.MLXVLM
model_name = model_choice_val
elif agent_loop == "OMNI":
# For OMNI, provider can be OPENAI, ANTHROPIC, OLLAMA, or OAICOMPAT
if is_custom_openai_api:
provider = LLMProvider.OAICOMPAT
model_name = custom_model_val
elif is_custom_ollama:
provider = LLMProvider.OLLAMA
model_name = custom_model_val
elif model_choice_val.startswith("OMNI: OpenAI "):
provider = LLMProvider.OPENAI
# Extract model name from UI string (e.g., "OMNI: OpenAI GPT-4o" -> "gpt-4o")
model_name = model_choice_val.replace("OMNI: OpenAI ", "").lower().replace(" ", "-")
elif model_choice_val.startswith("OMNI: Claude "):
provider = LLMProvider.ANTHROPIC
# Extract model name from UI string (similar to ANTHROPIC loop case)
model_parts = model_choice_val.replace("OMNI: Claude ", "").split(" (")
version = model_parts[0] # e.g., "3.7 Sonnet"
date = model_parts[1].replace(")", "") if len(model_parts) > 1 else "" # e.g., "20250219"
# Format as claude-3-7-sonnet-20250219 or claude-3-5-sonnet-20240620
version = version.replace(".", "-").replace(" ", "-").lower()
model_name = f"claude-{version}-{date}"
elif model_choice_val.startswith("OMNI: Ollama "):
provider = LLMProvider.OLLAMA
# Extract model name from UI string (e.g., "OMNI: Ollama llama3" -> "llama3")
model_name = model_choice_val.replace("OMNI: Ollama ", "")
else:
# Fallback to get_provider_and_model for any other cases
provider, model_name, _ = get_provider_and_model(model_choice_val, agent_loop)
else:
# Fallback for any other agent loop
provider, model_name, _ = get_provider_and_model(model_choice_val, agent_loop)
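The Claude label parsing repeated in the ANTHROPIC and OMNI branches above can be sketched as one helper (a sketch only; the function name and `prefix` parameter are illustrative, not part of the codebase):

```python
def claude_model_id(ui_label: str, prefix: str = "Anthropic: Claude ") -> str:
    """Convert a UI label like 'Anthropic: Claude 3.7 Sonnet (20250219)'
    into an API model id like 'claude-3-7-sonnet-20250219'."""
    parts = ui_label.replace(prefix, "").split(" (")
    # '3.7 Sonnet' -> '3-7-sonnet'
    version = parts[0].replace(".", "-").replace(" ", "-").lower()
    # '20250219)' -> '20250219'; empty when no date suffix is present
    date = parts[1].replace(")", "") if len(parts) > 1 else ""
    return f"claude-{version}-{date}"
```

The same transform serves both loops by swapping the prefix (e.g. `"OMNI: Claude "`).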
# Generate and return the code
return generate_python_code(
agent_loop,
provider,
@@ -1336,38 +1390,62 @@ if __name__ == "__main__":
messages,
provider_base_url,
recent_images_val,
save_trajectory_val
save_trajectory_val,
computer_os,
computer_provider,
container_name,
cua_cloud_api_key
)
# Update code display when configuration changes
agent_loop.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
model_choice.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
custom_model.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
chatbot_history.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
recent_images.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
save_trajectory.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
computer_os.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
computer_provider.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
container_name.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
cua_cloud_api_key.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
@@ -1377,7 +1455,7 @@ if __name__ == "__main__":
def test_cua():
"""Standalone function to launch the Gradio app."""
demo = create_gradio_ui()
demo.launch(share=False) # Don't create a public link
demo.launch(share=False, inbrowser=True) # Don't create a public link
if __name__ == "__main__":

View File

@@ -19,11 +19,11 @@ dependencies = [
"pydantic>=2.6.4,<3.0.0",
"rich>=13.7.1,<14.0.0",
"python-dotenv>=1.0.1,<2.0.0",
"cua-computer>=0.1.0,<0.2.0",
"cua-computer>=0.2.0,<0.3.0",
"cua-core>=0.1.0,<0.2.0",
"certifi>=2024.2.2"
]
requires-python = ">=3.10"
requires-python = ">=3.11"
[project.optional-dependencies]
anthropic = [
@@ -102,11 +102,11 @@ source-includes = ["tests/", "README.md", "LICENSE"]
[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]
[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true
@@ -115,7 +115,7 @@ docstring-code-format = true
[tool.mypy]
strict = true
python_version = "3.10"
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true

View File

@@ -27,6 +27,16 @@ def parse_args(args: Optional[List[str]] = None) -> argparse.Namespace:
default="info",
help="Logging level (default: info)",
)
parser.add_argument(
"--ssl-keyfile",
type=str,
help="Path to SSL private key file (enables HTTPS)",
)
parser.add_argument(
"--ssl-certfile",
type=str,
help="Path to SSL certificate file (enables HTTPS)",
)
return parser.parse_args(args)
@@ -43,7 +53,21 @@ def main() -> None:
# Create and start the server
logger.info(f"Starting CUA Computer API server on {args.host}:{args.port}...")
server = Server(host=args.host, port=args.port, log_level=args.log_level)
# Handle SSL configuration
ssl_args = {}
if args.ssl_keyfile and args.ssl_certfile:
ssl_args = {
"ssl_keyfile": args.ssl_keyfile,
"ssl_certfile": args.ssl_certfile,
}
logger.info("HTTPS mode enabled with SSL certificates")
elif args.ssl_keyfile or args.ssl_certfile:
logger.warning("Both --ssl-keyfile and --ssl-certfile are required for HTTPS. Running in HTTP mode.")
else:
logger.info("HTTP mode (no SSL certificates provided)")
server = Server(host=args.host, port=args.port, log_level=args.log_level, **ssl_args)
try:
server.start()
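The conditional SSL wiring above reduces to a small pure function; a sketch (the helper name is illustrative):

```python
def build_ssl_args(ssl_keyfile, ssl_certfile):
    """Mirror the CLI logic: HTTPS is enabled only when BOTH files are given;
    a single flag is ignored and the server stays in plain HTTP mode."""
    if ssl_keyfile and ssl_certfile:
        return {"ssl_keyfile": ssl_keyfile, "ssl_certfile": ssl_certfile}
    return {}
```

Requiring both flags avoids starting uvicorn with a half-configured TLS context.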

View File

@@ -8,11 +8,11 @@ import traceback
from contextlib import redirect_stdout, redirect_stderr
from io import StringIO
from .handlers.factory import HandlerFactory
import os
import aiohttp
# Set up logging with more detail
logging.basicConfig(
level=logging.DEBUG, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Configure WebSocket with larger message size
@@ -48,6 +48,112 @@ manager = ConnectionManager()
async def websocket_endpoint(websocket: WebSocket):
# WebSocket message size is configured at the app or endpoint level, not on the instance
await manager.connect(websocket)
# Check if CONTAINER_NAME is set (indicating cloud provider)
container_name = os.environ.get("CONTAINER_NAME")
# If cloud provider, perform authentication handshake
if container_name:
try:
logger.info(f"Cloud provider detected. CONTAINER_NAME: {container_name}. Waiting for authentication...")
# Wait for authentication message
auth_data = await websocket.receive_json()
# Validate auth message format
if auth_data.get("command") != "authenticate":
await websocket.send_json({
"success": False,
"error": "First message must be authentication"
})
await websocket.close()
manager.disconnect(websocket)
return
# Extract credentials
client_api_key = auth_data.get("params", {}).get("api_key")
client_container_name = auth_data.get("params", {}).get("container_name")
# Layer 1: VM Identity Verification
if client_container_name != container_name:
logger.warning(f"VM name mismatch. Expected: {container_name}, Got: {client_container_name}")
await websocket.send_json({
"success": False,
"error": "VM name mismatch"
})
await websocket.close()
manager.disconnect(websocket)
return
# Layer 2: API Key Validation with TryCUA API
if not client_api_key:
await websocket.send_json({
"success": False,
"error": "API key required"
})
await websocket.close()
manager.disconnect(websocket)
return
# Validate with TryCUA API
try:
async with aiohttp.ClientSession() as session:
headers = {
"Authorization": f"Bearer {client_api_key}"
}
async with session.get(
f"https://www.trycua.com/api/vm/auth?container_name={container_name}",
headers=headers,
) as resp:
if resp.status != 200:
error_msg = await resp.text()
logger.warning(f"API validation failed: {error_msg}")
await websocket.send_json({
"success": False,
"error": "Authentication failed"
})
await websocket.close()
manager.disconnect(websocket)
return
# If we get a 200 response with VNC URL, the VM exists and user has access
vnc_url = (await resp.text()).strip()
if not vnc_url:
logger.warning(f"No VNC URL returned for VM: {container_name}")
await websocket.send_json({
"success": False,
"error": "VM not found"
})
await websocket.close()
manager.disconnect(websocket)
return
logger.info(f"Authentication successful for VM: {container_name}")
await websocket.send_json({
"success": True,
"message": "Authenticated"
})
except Exception as e:
logger.error(f"Error validating with TryCUA API: {e}")
await websocket.send_json({
"success": False,
"error": "Authentication service unavailable"
})
await websocket.close()
manager.disconnect(websocket)
return
except Exception as e:
logger.error(f"Authentication error: {e}")
await websocket.send_json({
"success": False,
"error": "Authentication failed"
})
await websocket.close()
manager.disconnect(websocket)
return
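The two validation layers above (VM identity, then API key presence) act on a first JSON message of a fixed shape; a minimal sketch of that check, decoupled from the WebSocket plumbing (function name is illustrative):

```python
def check_auth_message(msg: dict, expected_container: str):
    """Replicate the server-side first-message checks: returns (ok, error)."""
    if msg.get("command") != "authenticate":
        return False, "First message must be authentication"
    params = msg.get("params", {})
    # Layer 1: VM identity verification
    if params.get("container_name") != expected_container:
        return False, "VM name mismatch"
    # Layer 2: an API key must be present (validated remotely afterwards)
    if not params.get("api_key"):
        return False, "API key required"
    return True, None
```

Only after both local layers pass does the handler call out to the TryCUA API for key validation.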
# Map commands to appropriate handler methods
handlers = {

View File

@@ -32,7 +32,8 @@ class Server:
await server.stop() # Stop the server
"""
def __init__(self, host: str = "0.0.0.0", port: int = 8000, log_level: str = "info"):
def __init__(self, host: str = "0.0.0.0", port: int = 8000, log_level: str = "info",
ssl_keyfile: Optional[str] = None, ssl_certfile: Optional[str] = None):
"""
Initialize the server.
@@ -40,10 +41,14 @@ class Server:
host: Host to bind the server to
port: Port to bind the server to
log_level: Logging level (debug, info, warning, error, critical)
ssl_keyfile: Path to SSL private key file (for HTTPS)
ssl_certfile: Path to SSL certificate file (for HTTPS)
"""
self.host = host
self.port = port
self.log_level = log_level
self.ssl_keyfile = ssl_keyfile
self.ssl_certfile = ssl_certfile
self.app = fastapi_app
self._server_task: Optional[asyncio.Task] = None
self._should_exit = asyncio.Event()
@@ -52,7 +57,14 @@ class Server:
"""
Start the server synchronously. This will block until the server is stopped.
"""
uvicorn.run(self.app, host=self.host, port=self.port, log_level=self.log_level)
uvicorn.run(
self.app,
host=self.host,
port=self.port,
log_level=self.log_level,
ssl_keyfile=self.ssl_keyfile,
ssl_certfile=self.ssl_certfile
)
async def start_async(self) -> None:
"""
@@ -60,7 +72,12 @@ class Server:
will run in the background.
"""
server_config = uvicorn.Config(
self.app, host=self.host, port=self.port, log_level=self.log_level
self.app,
host=self.host,
port=self.port,
log_level=self.log_level,
ssl_keyfile=self.ssl_keyfile,
ssl_certfile=self.ssl_certfile
)
self._should_exit.clear()
@@ -72,7 +89,8 @@ class Server:
# Wait a short time to ensure the server starts
await asyncio.sleep(0.5)
logger.info(f"Server started at http://{self.host}:{self.port}")
protocol = "https" if self.ssl_certfile else "http"
logger.info(f"Server started at {protocol}://{self.host}:{self.port}")
async def stop(self) -> None:
"""

View File

@@ -17,7 +17,8 @@ dependencies = [
"uvicorn[standard]>=0.27.0",
"pydantic>=2.0.0",
"pyautogui>=0.9.54",
"pillow>=10.2.0"
"pillow>=10.2.0",
"aiohttp>=3.9.1"
]
[project.optional-dependencies]

View File

@@ -51,7 +51,8 @@ class Computer:
noVNC_port: Optional[int] = 8006,
host: str = os.environ.get("PYLUME_HOST", "localhost"),
storage: Optional[str] = None,
ephemeral: bool = False
ephemeral: bool = False,
api_key: Optional[str] = None
):
"""Initialize a new Computer instance.
@@ -90,6 +91,8 @@ class Computer:
self.os_type = os_type
self.provider_type = provider_type
self.ephemeral = ephemeral
self.api_key = api_key
# The default is currently to use non-ephemeral storage
if storage and ephemeral and storage != "ephemeral":
@@ -269,9 +272,7 @@ class Computer:
elif self.provider_type == VMProviderType.CLOUD:
self.config.vm_provider = VMProviderFactory.create_provider(
self.provider_type,
port=port,
host=host,
storage=storage,
api_key=self.api_key,
verbose=verbose,
)
else:
@@ -405,12 +406,25 @@ class Computer:
self.logger.info(f"Initializing interface for {self.os_type} at {ip_address}")
from .interface.base import BaseComputerInterface
self._interface = cast(
BaseComputerInterface,
InterfaceFactory.create_interface_for_os(
os=self.os_type, ip_address=ip_address # type: ignore[arg-type]
),
)
# Pass authentication credentials if using cloud provider
if self.provider_type == VMProviderType.CLOUD and self.api_key and self.config.name:
self._interface = cast(
BaseComputerInterface,
InterfaceFactory.create_interface_for_os(
os=self.os_type,
ip_address=ip_address,
api_key=self.api_key,
vm_name=self.config.name
),
)
else:
self._interface = cast(
BaseComputerInterface,
InterfaceFactory.create_interface_for_os(
os=self.os_type,
ip_address=ip_address
),
)
# Wait for the WebSocket interface to be ready
self.logger.info("Connecting to WebSocket interface...")
@@ -505,6 +519,11 @@ class Computer:
# Call the provider's get_ip method which will wait indefinitely
storage_param = "ephemeral" if self.ephemeral else self.storage
# Log the image being used
self.logger.info(f"Running VM using image: {self.image}")
# Call provider.get_ip with explicit image parameter
ip = await self.config.vm_provider.get_ip(
name=self.config.name,
storage=storage_param,

View File

@@ -8,17 +8,21 @@ from ..logger import Logger, LogLevel
class BaseComputerInterface(ABC):
"""Base class for computer control interfaces."""
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume"):
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume", api_key: Optional[str] = None, vm_name: Optional[str] = None):
"""Initialize interface.
Args:
ip_address: IP address of the computer to control
username: Username for authentication
password: Password for authentication
api_key: Optional API key for cloud authentication
vm_name: Optional VM name for cloud authentication
"""
self.ip_address = ip_address
self.username = username
self.password = password
self.api_key = api_key
self.vm_name = vm_name
self.logger = Logger("cua.interface", LogLevel.NORMAL)
@abstractmethod

View File

@@ -1,6 +1,6 @@
"""Factory for creating computer interfaces."""
from typing import Literal
from typing import Literal, Optional
from .base import BaseComputerInterface
class InterfaceFactory:
@@ -9,13 +9,17 @@ class InterfaceFactory:
@staticmethod
def create_interface_for_os(
os: Literal['macos', 'linux'],
ip_address: str
ip_address: str,
api_key: Optional[str] = None,
vm_name: Optional[str] = None
) -> BaseComputerInterface:
"""Create an interface for the specified OS.
Args:
os: Operating system type ('macos' or 'linux')
ip_address: IP address of the computer to control
api_key: Optional API key for cloud authentication
vm_name: Optional VM name for cloud authentication
Returns:
BaseComputerInterface: The appropriate interface for the OS
@@ -28,8 +32,8 @@ class InterfaceFactory:
from .linux import LinuxComputerInterface
if os == 'macos':
return MacOSComputerInterface(ip_address)
return MacOSComputerInterface(ip_address, api_key=api_key, vm_name=vm_name)
elif os == 'linux':
return LinuxComputerInterface(ip_address)
return LinuxComputerInterface(ip_address, api_key=api_key, vm_name=vm_name)
else:
raise ValueError(f"Unsupported OS type: {os}")

View File

@@ -15,8 +15,8 @@ from .models import Key, KeyType
class LinuxComputerInterface(BaseComputerInterface):
"""Interface for Linux."""
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume"):
super().__init__(ip_address, username, password)
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume", api_key: Optional[str] = None, vm_name: Optional[str] = None):
super().__init__(ip_address, username, password, api_key, vm_name)
self._ws = None
self._reconnect_task = None
self._closed = False
@@ -26,6 +26,7 @@ class LinuxComputerInterface(BaseComputerInterface):
self._reconnect_delay = 1 # Start with 1 second delay
self._max_reconnect_delay = 30 # Maximum delay between reconnection attempts
self._log_connection_attempts = True # Flag to control connection attempt logging
self._authenticated = False # Track authentication status
# Set logger name for Linux interface
self.logger = Logger("cua.interface.linux", LogLevel.NORMAL)
@@ -37,7 +38,9 @@ class LinuxComputerInterface(BaseComputerInterface):
Returns:
WebSocket URI for the Computer API Server
"""
return f"ws://{self.ip_address}:8000/ws"
protocol = "wss" if self.api_key else "ws"
port = "8443" if self.api_key else "8000"
return f"{protocol}://{self.ip_address}:{port}/ws"
async def _keep_alive(self):
"""Keep the WebSocket connection alive with automatic reconnection."""
@@ -86,9 +89,15 @@ class LinuxComputerInterface(BaseComputerInterface):
timeout=30,
)
self.logger.info("WebSocket connection established")
# Authentication will be handled by the first command that needs it
# Don't do authentication here to avoid recv conflicts
self._reconnect_delay = 1 # Reset reconnect delay on successful connection
self._last_ping = time.time()
retry_count = 0 # Reset retry count on successful connection
self._authenticated = False # Reset auth status on new connection
except (asyncio.TimeoutError, websockets.exceptions.WebSocketException) as e:
next_retry = self._reconnect_delay
@@ -112,13 +121,6 @@ class LinuxComputerInterface(BaseComputerInterface):
pass
self._ws = None
# Use exponential backoff for connection retries
await asyncio.sleep(self._reconnect_delay)
self._reconnect_delay = min(
self._reconnect_delay * 2, self._max_reconnect_delay
)
continue
# Regular ping to check connection
if self._ws and self._ws.state == websockets.protocol.State.OPEN:
try:
@@ -197,6 +199,31 @@ class LinuxComputerInterface(BaseComputerInterface):
if not self._ws:
raise ConnectionError("WebSocket connection is not established")
# Handle authentication if needed
if self.api_key and self.vm_name and not self._authenticated:
self.logger.info("Performing authentication handshake...")
auth_message = {
"command": "authenticate",
"params": {
"api_key": self.api_key,
"container_name": self.vm_name
}
}
await self._ws.send(json.dumps(auth_message))
# Wait for authentication response
auth_response = await asyncio.wait_for(self._ws.recv(), timeout=10)
auth_result = json.loads(auth_response)
if not auth_result.get("success"):
error_msg = auth_result.get("error", "Authentication failed")
self.logger.error(f"Authentication failed: {error_msg}")
self._authenticated = False
raise ConnectionError(f"Authentication failed: {error_msg}")
self.logger.info("Authentication successful")
self._authenticated = True
message = {"command": command, "params": params or {}}
await self._ws.send(json.dumps(message))
response = await asyncio.wait_for(self._ws.recv(), timeout=30)
@@ -217,9 +244,7 @@ class LinuxComputerInterface(BaseComputerInterface):
f"Failed to send command '{command}' after {max_retries} retries"
)
self.logger.debug(f"Command failure details: {e}")
raise
raise last_error if last_error else RuntimeError("Failed to send command")
async def wait_for_ready(self, timeout: int = 60, interval: float = 1.0):
"""Wait for WebSocket connection to become available."""

View File

@@ -13,10 +13,10 @@ from .models import Key, KeyType
class MacOSComputerInterface(BaseComputerInterface):
"""Interface for MacOS."""
"""Interface for macOS."""
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume"):
super().__init__(ip_address, username, password)
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume", api_key: Optional[str] = None, vm_name: Optional[str] = None):
super().__init__(ip_address, username, password, api_key, vm_name)
self._ws = None
self._reconnect_task = None
self._closed = False
@@ -27,7 +27,7 @@ class MacOSComputerInterface(BaseComputerInterface):
self._max_reconnect_delay = 30 # Maximum delay between reconnection attempts
self._log_connection_attempts = True # Flag to control connection attempt logging
# Set logger name for MacOS interface
# Set logger name for macOS interface
self.logger = Logger("cua.interface.macos", LogLevel.NORMAL)
@property
@@ -37,7 +37,9 @@ class MacOSComputerInterface(BaseComputerInterface):
Returns:
WebSocket URI for the Computer API Server
"""
return f"ws://{self.ip_address}:8000/ws"
protocol = "wss" if self.api_key else "ws"
port = "8443" if self.api_key else "8000"
return f"{protocol}://{self.ip_address}:{port}/ws"
async def _keep_alive(self):
"""Keep the WebSocket connection alive with automatic reconnection."""
@@ -86,6 +88,32 @@ class MacOSComputerInterface(BaseComputerInterface):
timeout=30,
)
self.logger.info("WebSocket connection established")
# If api_key and vm_name are provided, perform authentication handshake
if self.api_key and self.vm_name:
self.logger.info("Performing authentication handshake...")
auth_message = {
"command": "authenticate",
"params": {
"api_key": self.api_key,
"container_name": self.vm_name
}
}
await self._ws.send(json.dumps(auth_message))
# Wait for authentication response
auth_response = await asyncio.wait_for(self._ws.recv(), timeout=10)
auth_result = json.loads(auth_response)
if not auth_result.get("success"):
error_msg = auth_result.get("error", "Authentication failed")
self.logger.error(f"Authentication failed: {error_msg}")
await self._ws.close()
self._ws = None
raise ConnectionError(f"Authentication failed: {error_msg}")
self.logger.info("Authentication successful")
self._reconnect_delay = 1 # Reset reconnect delay on successful connection
self._last_ping = time.time()
retry_count = 0 # Reset retry count on successful connection

View File

@@ -1,11 +1,11 @@
"""Base provider interface for VM backends."""
import abc
from enum import Enum
from enum import StrEnum
from typing import Dict, List, Optional, Any, AsyncContextManager
class VMProviderType(str, Enum):
class VMProviderType(StrEnum):
"""Enum of supported VM provider types."""
LUME = "lume"
LUMIER = "lumier"

View File

@@ -11,90 +11,65 @@ from ..base import BaseVMProvider, VMProviderType
# Setup logging
logger = logging.getLogger(__name__)
import asyncio
import aiohttp
from urllib.parse import urlparse
class CloudProvider(BaseVMProvider):
"""Cloud VM Provider stub implementation.
This is a placeholder for a future cloud VM provider implementation.
"""
"""Cloud VM Provider implementation."""
def __init__(
self,
host: str = "localhost",
port: int = 7777,
storage: Optional[str] = None,
self,
api_key: str,
verbose: bool = False,
**kwargs,
):
"""Initialize the Cloud provider.
"""
Args:
host: Host to use for API connections (default: localhost)
port: Port for the API server (default: 7777)
storage: Path to store VM data
api_key: API key for authentication
verbose: Enable verbose logging
"""
self.host = host
self.port = port
self.storage = storage
assert api_key, "api_key required for CloudProvider"
self.api_key = api_key
self.verbose = verbose
logger.warning("CloudProvider is not yet implemented")
@property
def provider_type(self) -> VMProviderType:
"""Get the provider type."""
return VMProviderType.CLOUD
async def __aenter__(self):
"""Enter async context manager."""
logger.debug("Entering CloudProvider context")
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
"""Exit async context manager."""
logger.debug("Exiting CloudProvider context")
pass
async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
"""Get VM information by name."""
logger.warning("CloudProvider.get_vm is not implemented")
return {
"name": name,
"status": "unavailable",
"message": "CloudProvider is not implemented"
}
"""Get VM VNC URL by name using the cloud API."""
return {"name": name, "hostname": f"{name}.containers.cloud.trycua.com"}
async def list_vms(self) -> List[Dict[str, Any]]:
"""List all available VMs."""
logger.warning("CloudProvider.list_vms is not implemented")
return []
async def run_vm(self, image: str, name: str, run_opts: Dict[str, Any], storage: Optional[str] = None) -> Dict[str, Any]:
"""Run a VM with the given options."""
logger.warning("CloudProvider.run_vm is not implemented")
return {
"name": name,
"status": "unavailable",
"message": "CloudProvider is not implemented"
}
return {"name": name, "status": "unavailable", "message": "CloudProvider is not implemented"}
async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
"""Stop a running VM."""
logger.warning("CloudProvider.stop_vm is not implemented")
return {
"name": name,
"status": "stopped",
"message": "CloudProvider is not implemented"
}
return {"name": name, "status": "stopped", "message": "CloudProvider is not implemented"}
async def update_vm(self, name: str, update_opts: Dict[str, Any], storage: Optional[str] = None) -> Dict[str, Any]:
"""Update VM configuration."""
logger.warning("CloudProvider.update_vm is not implemented")
return {
"name": name,
"status": "unchanged",
"message": "CloudProvider is not implemented"
}
async def get_ip(self, name: str, storage: Optional[str] = None, retry_delay: int = 2) -> str:
"""Get the IP address of a VM."""
logger.warning("CloudProvider.get_ip is not implemented")
raise NotImplementedError("CloudProvider.get_ip is not implemented")
return {"name": name, "status": "unchanged", "message": "CloudProvider is not implemented"}
async def get_ip(self, name: Optional[str] = None, storage: Optional[str] = None, retry_delay: int = 2) -> str:
"""
Return the VM's hostname as '{name}.containers.cloud.trycua.com'.
Uses the 'name' argument supplied by the caller and raises ValueError
if no name is provided; the 'retry_delay' parameter is currently unused.
"""
if name is None:
raise ValueError("VM name is required for CloudProvider.get_ip")
return f"{name}.containers.cloud.trycua.com"

View File

@@ -22,7 +22,8 @@ class VMProviderFactory:
image: Optional[str] = None,
verbose: bool = False,
ephemeral: bool = False,
noVNC_port: Optional[int] = None
noVNC_port: Optional[int] = None,
**kwargs,
) -> BaseVMProvider:
"""Create a VM provider of the specified type.
@@ -101,12 +102,9 @@ class VMProviderFactory:
elif provider_type == VMProviderType.CLOUD:
try:
from .cloud import CloudProvider
# Return the stub implementation of CloudProvider
return CloudProvider(
host=host,
port=port,
storage=storage,
verbose=verbose
verbose=verbose,
**kwargs,
)
except ImportError as e:
logger.error(f"Failed to import CloudProvider: {e}")

View File

@@ -344,9 +344,15 @@ class LumierProvider(BaseVMProvider):
# Use the VM image passed from the Computer class
print(f"Using VM image: {self.image}")
# If ghcr.io is in the image, use the full image name
if "ghcr.io" in self.image:
vm_image = self.image
else:
vm_image = f"ghcr.io/trycua/{self.image}"
cmd.extend([
"-e", f"VM_NAME={self.container_name}",
"-e", f"VERSION=ghcr.io/trycua/{self.image}",
"-e", f"VERSION={vm_image}",
"-e", f"CPU_CORES={run_opts.get('cpu', '4')}",
"-e", f"RAM_SIZE={memory_mb}",
])
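The image-resolution fix above lets callers pass either a fully-qualified `ghcr.io` reference or a bare image name; a sketch of that rule (helper name is illustrative):

```python
def resolve_vm_image(image: str) -> str:
    """Fully-qualified ghcr.io references pass through unchanged;
    bare names get the trycua registry prefix."""
    if "ghcr.io" in image:
        return image
    return f"ghcr.io/trycua/{image}"
```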

View File

@@ -18,7 +18,7 @@ dependencies = [
"cua-core>=0.1.0,<0.2.0",
"pydantic>=2.11.1"
]
requires-python = ">=3.10"
requires-python = ">=3.11"
[project.optional-dependencies]
lume = [
@@ -46,11 +46,11 @@ source-includes = ["tests/", "README.md", "LICENSE"]
[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]
[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true
@@ -59,7 +59,7 @@ docstring-code-format = true
[tool.mypy]
strict = true
python_version = "3.10"
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true

View File

@@ -15,7 +15,7 @@ dependencies = [
"httpx>=0.24.0",
"posthog>=3.20.0"
]
requires-python = ">=3.10"
requires-python = ">=3.11"
[tool.pdm]
distribution = true
@@ -26,11 +26,11 @@ source-includes = ["tests/", "README.md", "LICENSE"]
[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]
[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true
@@ -39,7 +39,7 @@ docstring-code-format = true
[tool.mypy]
strict = true
python_version = "3.10"
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true

View File

@@ -10,7 +10,6 @@
[![Swift 6](https://img.shields.io/badge/Swift_6-F54A2A?logo=swift&logoColor=white&labelColor=F54A2A)](#)
[![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#)
[![Homebrew](https://img.shields.io/badge/Homebrew-FBB040?logo=homebrew&logoColor=fff)](#install)
[![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85)
</h1>
</div>

View File

@@ -6,15 +6,15 @@ build-backend = "pdm.backend"
name = "cua-mcp-server"
description = "MCP Server for Computer-Use Agent (CUA)"
readme = "README.md"
requires-python = ">=3.10"
requires-python = ">=3.11"
version = "0.1.0"
authors = [
{name = "TryCua", email = "gh@trycua.com"}
]
dependencies = [
"mcp>=1.6.0,<2.0.0",
"cua-agent[all]>=0.1.0,<0.2.0",
"cua-computer>=0.1.0,<0.2.0",
"cua-agent[all]>=0.2.0,<0.3.0",
"cua-computer>=0.2.0,<0.3.0",
]
[project.scripts]
@@ -31,10 +31,10 @@ dev = [
[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]
[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true

View File

@@ -43,13 +43,13 @@ dev = [
[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]
[tool.ruff]
fix = true
line-length = 100
select = ["B", "E", "F", "I"]
target-version = "py310"
target-version = "py311"
[tool.ruff.format]
docstring-code-format = true
@@ -58,7 +58,7 @@ docstring-code-format = true
check_untyped_defs = true
disallow_untyped_defs = true
ignore_missing_imports = true
python_version = "3.10"
python_version = "3.11"
show_error_codes = true
strict = true
warn_return_any = true

View File

@@ -24,7 +24,7 @@ dependencies = [
"typing-extensions>=4.9.0",
"pydantic>=2.6.3"
]
requires-python = ">=3.10"
requires-python = ">=3.11"
readme = "README.md"
license = {text = "MIT"}
keywords = ["computer-vision", "ocr", "ui-analysis", "icon-detection"]

View File

@@ -6,7 +6,7 @@
"source": [
"## Agent\n",
"\n",
"This notebook demonstrates how to use Cua's Agent to run a workflow in a virtual sandbox on Apple Silicon Macs."
"This notebook demonstrates how to use Cua's Agent to run workflows in virtual sandboxes, either using C/ua Cloud Containers or local VMs on Apple Silicon Macs."
]
},
{
@@ -68,7 +68,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Agent allows you to run an agentic workflow in a virtual sandbox instances on Apple Silicon. Here's a basic example:"
"Agent allows you to run an agentic workflow in virtual sandbox instances. You can choose between cloud containers or local VMs."
]
},
{
@@ -83,15 +83,17 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Get API keys from environment or prompt user\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or input(\"Enter your OpenAI API key: \")\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or \\\n",
" input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or \\\n",
" input(\"Enter your OpenAI API key: \")\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_key"
@@ -101,7 +103,165 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to Computer, you can either use the async context manager pattern or initialize the ComputerAgent instance directly."
"## Option 1: Agent with C/ua Cloud Containers\n",
"\n",
"Use cloud containers for running agents from any system without local setup."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites for Cloud Containers\n",
"\n",
"To use C/ua Cloud Containers, you need to:\n",
"1. Sign up at https://trycua.com\n",
"2. Create a Cloud Container\n",
"3. Generate an API Key\n",
"\n",
"Once you have these, you can connect to your cloud container and run agents on it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get C/ua API credentials and container details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cua_api_key = os.getenv(\"CUA_API_KEY\") or \\\n",
" input(\"Enter your C/ua API Key: \")\n",
"container_name = os.getenv(\"CONTAINER_NAME\") or \\\n",
" input(\"Enter your Cloud Container name: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Choose the OS type for your container (linux or macos)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os_type = input(\"Enter the OS type of your container (linux/macos) [default: linux]: \").lower() or \"linux\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an agent with cloud container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"# Connect to your existing cloud container\n",
"computer = Computer(\n",
" os_type=os_type,\n",
" api_key=cua_api_key,\n",
" name=container_name,\n",
" provider_type=VMProviderType.CLOUD,\n",
" verbosity=logging.INFO\n",
")\n",
"\n",
"# Create agent\n",
"agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.OPENAI,\n",
" model=LLM(provider=LLMProvider.OPENAI),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" only_n_most_recent_images=3,\n",
" verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run tasks on cloud container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tasks = [\n",
" \"Open a web browser and navigate to GitHub\",\n",
" \"Search for the trycua/cua repository\",\n",
" \"Take a screenshot of the repository page\"\n",
"]\n",
"\n",
"for i, task in enumerate(tasks):\n",
" print(f\"\\nExecuting task {i+1}/{len(tasks)}: {task}\")\n",
" async for result in cloud_agent.run(task):\n",
" # print(result)\n",
" pass\n",
" print(f\"✅ Task {i+1}/{len(tasks)} completed: {task}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Option 2: Agent with Local VMs (Lume daemon)\n",
"\n",
"For Apple Silicon Macs, run agents on local VMs with near-native performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can create an agent, we need to initialize a local computer with Lume."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"\n",
"computer = Computer(\n",
" verbosity=logging.INFO, \n",
" provider_type=VMProviderType.LUME,\n",
" display=\"1024x768\",\n",
" memory=\"8GB\",\n",
" cpu=\"4\",\n",
" os_type=\"macos\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an agent with local VM"
]
},
{
@@ -117,22 +277,31 @@
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"computer = Computer(verbosity=logging.INFO, provider_type=VMProviderType.LUME)\n",
"\n",
"# Create agent with Anthropic loop and provider\n",
"agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.OPENAI,\n",
" model=LLM(provider=LLMProvider.OPENAI),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" only_n_most_recent_images=3,\n",
" verbosity=logging.INFO\n",
" )\n",
"\n",
" computer=computer,\n",
" loop=AgentLoop.OPENAI,\n",
" model=LLM(provider=LLMProvider.OPENAI),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" only_n_most_recent_images=3,\n",
" verbosity=logging.INFO\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run tasks on a local Lume VM"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tasks = [\n",
" \"Look for a repository named trycua/cua on GitHub.\",\n",
" \"Check the open issues, open the most recent one and read it.\",\n",
@@ -210,22 +379,6 @@
"The agent includes a Gradio-based user interface for easy interaction. To use it:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Get API keys from environment or prompt user\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or input(\"Enter your OpenAI API key: \")\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_key"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -237,6 +390,146 @@
"app = create_gradio_ui()\n",
"app.launch(share=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Agent Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using different agent loops"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use different agent loops depending on your needs:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. OpenAI Agent Loop"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"openai_agent = ComputerAgent(\n",
" computer=computer, # Can be cloud or local\n",
" loop=AgentLoop.OPENAI,\n",
" model=LLM(provider=LLMProvider.OPENAI),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Anthropic Agent Loop"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"anthropic_agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.ANTHROPIC,\n",
" model=LLM(provider=LLMProvider.ANTHROPIC),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Omni Agent Loop (supports multiple providers)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"omni_agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.OMNI,\n",
" model=LLM(provider=LLMProvider.ANTHROPIC, name=\"claude-3-7-sonnet-20250219\"),\n",
" # model=LLM(provider=LLMProvider.OPENAI, name=\"gpt-4.5-preview\"),\n",
" # model=LLM(provider=LLMProvider.OLLAMA, name=\"gemma3:12b-it-q4_K_M\"),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" only_n_most_recent_images=3,\n",
" verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. UITARS Agent Loop (for local inference on Apple Silicon)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"uitars_agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.UITARS,\n",
" model=LLM(provider=LLMProvider.UITARS),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trajectory viewing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All agent runs save trajectories that can be viewed at https://trycua.com/trajectory-viewer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"Trajectories saved to: {Path('trajectories').absolute()}\")\n",
"print(\"Upload trajectory files to https://trycua.com/trajectory-viewer to visualize agent actions\")\n"
]
}
],
"metadata": {

View File

@@ -6,7 +6,7 @@
"source": [
"## Computer\n",
"\n",
"This notebook demonstrates how to use Computer to operate a Lume sandbox VMs programmatically on Apple Silicon macOS systems."
"This notebook demonstrates how to use Computer to operate sandbox VMs programmatically, either using C/ua Cloud Containers or local Lume VMs on Apple Silicon macOS systems."
]
},
{
@@ -22,25 +22,23 @@
"metadata": {},
"outputs": [],
"source": [
"!pip uninstall -y cua-computer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip uninstall -y cua-computer\n",
"!pip install \"cua-computer[all]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If locally installed, use this instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If locally installed, use this instead:\n",
"import os\n",
"\n",
"os.chdir('../libs/computer')\n",
@@ -55,7 +53,126 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lume daemon\n",
"## Option 1: C/ua Cloud Containers\n",
"\n",
"C/ua Cloud Containers provide remote VMs that can be accessed from any system without local setup."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites for Cloud Containers\n",
"\n",
"To use C/ua Cloud Containers, you need to:\n",
"1. Sign up at https://trycua.com\n",
"2. Create a Cloud Container\n",
"3. Generate an API Key\n",
"\n",
"Once you have these, you can connect to your cloud container using its name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get API key and container name from environment or prompt user\n",
"import os\n",
"\n",
"cua_api_key = os.getenv(\"CUA_API_KEY\") or \\\n",
" input(\"Enter your C/ua API Key: \")\n",
"container_name = os.getenv(\"CONTAINER_NAME\") or \\\n",
" input(\"Enter your Cloud Container name: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Choose the OS type for your container (linux or macos)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os_type = input(\"Enter the OS type of your container (linux/macos) [default: linux]: \").lower() or \"linux\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to your Cloud Container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer, VMProviderType"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Connect to your existing C/ua Cloud Container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"computer = Computer(\n",
" os_type=os_type, # Must match the OS type of your cloud container\n",
" api_key=cua_api_key,\n",
" name=container_name,\n",
" provider_type=VMProviderType.CLOUD,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Take a screenshot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"screenshot = await computer.interface.screenshot()\n",
"\n",
"with open(\"screenshot.png\", \"wb\") as f:\n",
" f.write(screenshot)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Option 2: Local VMs (Lume daemon)\n",
"\n",
"For Apple Silicon Macs, you can run VMs locally using the Lume daemon."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Lume daemon setup\n",
"\n",
"Refer to [../libs/lume/README.md](../libs/lume/README.md) for more details on the lume cli."
]
@@ -143,7 +260,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize a Computer instance"
"### Initialize a Local Computer instance"
]
},
{
@@ -190,7 +307,7 @@
" os_type=\"macos\",\n",
" provider_type=VMProviderType.LUME,\n",
") as computer:\n",
" await computer.run()\n",
" pass\n",
" # ... do something with the computer interface"
]
},
@@ -217,6 +334,15 @@
"await computer.run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computer Interface\n",
"\n",
"Both cloud and local computers provide the same interface for interaction."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -461,7 +587,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "cua312",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -475,7 +601,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
"version": "3.12.2"
}
},
"nbformat": 4,

View File

@@ -9,7 +9,7 @@ description = "CUA (Computer Use Agent) mono-repo"
license = { text = "MIT" }
name = "cua-workspace"
readme = "README.md"
requires-python = ">=3.10"
requires-python = ">=3.11"
version = "0.1.0"
[project.urls]
@@ -53,13 +53,13 @@ respect-source-order = true
[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]
[tool.ruff]
fix = true
line-length = 100
select = ["B", "E", "F", "I"]
target-version = "py310"
target-version = "py311"
[tool.ruff.format]
docstring-code-format = true
@@ -68,7 +68,7 @@ docstring-code-format = true
check_untyped_defs = true
disallow_untyped_defs = true
ignore_missing_imports = true
python_version = "3.10"
python_version = "3.11"
show_error_codes = true
strict = true
warn_return_any = true

View File

@@ -2,83 +2,173 @@
set -e
echo "🚀 Setting up CUA playground environment..."
echo "🚀 Launching C/ua Computer-Use Agent UI..."
# Check for Apple Silicon Mac
if [[ $(uname -s) != "Darwin" || $(uname -m) != "arm64" ]]; then
echo "❌ This script requires an Apple Silicon Mac (M1/M2/M3/M4)."
exit 1
fi
# Save the original working directory
ORIGINAL_DIR="$(pwd)"
# Check for macOS 15 (Sequoia) or newer
OSVERSION=$(sw_vers -productVersion)
if [[ $(echo "$OSVERSION 15.0" | tr " " "\n" | sort -V | head -n 1) != "15.0" ]]; then
echo "❌ This script requires macOS 15 (Sequoia) or newer. You have $OSVERSION."
exit 1
fi
# Create a temporary directory for our work
TMP_DIR=$(mktemp -d)
cd "$TMP_DIR"
# Directories used by the script
DEMO_DIR="$HOME/.cua-demo"
VENV_DIR="$DEMO_DIR/venv"
# Function to clean up on exit
cleanup() {
cd ~
rm -rf "$TMP_DIR"
rm -rf "$TMP_DIR" 2>/dev/null || true
}
# Create a temporary directory for our work
TMP_DIR=$(mktemp -d)
cd "$TMP_DIR"
trap cleanup EXIT
# Install Lume if not already installed
if ! command -v lume &> /dev/null; then
echo "📦 Installing Lume CLI..."
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
# Ask user to choose between local macOS VMs or C/ua Cloud Containers
echo ""
echo "Choose your C/ua setup:"
echo "1) ☁️ C/ua Cloud Containers (works on any system)"
echo "2) 🖥️ Local macOS VMs (requires Apple Silicon Mac + macOS 15+)"
echo ""
read -p "Enter your choice (1 or 2): " CHOICE
if [[ "$CHOICE" == "1" ]]; then
# C/ua Cloud Container setup
echo ""
echo "☁️ Setting up C/ua Cloud Containers..."
echo ""
# Add lume to PATH for this session if it's not already there
if ! command -v lume &> /dev/null; then
export PATH="$PATH:$HOME/.local/bin"
# Check if existing .env.local already has CUA_API_KEY (check current dir and demo dir)
# Look for .env.local in the original working directory (before cd to temp dir)
CURRENT_ENV_FILE="$ORIGINAL_DIR/.env.local"
DEMO_ENV_FILE="$DEMO_DIR/.env.local"
CUA_API_KEY=""
# First check current directory
if [[ -f "$CURRENT_ENV_FILE" ]] && grep -q "CUA_API_KEY=" "$CURRENT_ENV_FILE"; then
EXISTING_CUA_KEY=$(grep "CUA_API_KEY=" "$CURRENT_ENV_FILE" | cut -d'=' -f2- | tr -d '"' | tr -d "'" | xargs)
if [[ -n "$EXISTING_CUA_KEY" && "$EXISTING_CUA_KEY" != "your_cua_api_key_here" && "$EXISTING_CUA_KEY" != "" ]]; then
CUA_API_KEY="$EXISTING_CUA_KEY"
fi
fi
# Then check demo directory if not found in current dir
if [[ -z "$CUA_API_KEY" ]] && [[ -f "$DEMO_ENV_FILE" ]] && grep -q "CUA_API_KEY=" "$DEMO_ENV_FILE"; then
EXISTING_CUA_KEY=$(grep "CUA_API_KEY=" "$DEMO_ENV_FILE" | cut -d'=' -f2- | tr -d '"' | tr -d "'" | xargs)
if [[ -n "$EXISTING_CUA_KEY" && "$EXISTING_CUA_KEY" != "your_cua_api_key_here" && "$EXISTING_CUA_KEY" != "" ]]; then
CUA_API_KEY="$EXISTING_CUA_KEY"
fi
fi
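The grep/cut/tr/xargs chains above extract a key from a dotenv-style file while rejecting the placeholder value. A rough Python equivalent of that lookup (function name and placeholder default are assumptions mirroring the script, not part of any C/ua API):

```python
def read_env_key(path, key="CUA_API_KEY", placeholder="your_cua_api_key_here"):
    """Return the value of `key` from a dotenv-style file.

    Yields "" when the file is missing, the key is absent, or the
    value is still the unedited placeholder.
    """
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except OSError:
        return ""
    for line in lines:
        if line.startswith(f"{key}="):
            # Strip surrounding quotes and whitespace, like the tr/xargs pipeline.
            value = line.split("=", 1)[1].strip().strip("\"'")
            if value != placeholder:
                return value
    return ""
```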
# If no valid API key found, prompt for one
if [[ -z "$CUA_API_KEY" ]]; then
echo "To use C/ua Cloud Containers, you need to:"
echo "1. Sign up at https://trycua.com"
echo "2. Create a Cloud Container"
echo "3. Generate an Api Key"
echo ""
read -p "Enter your C/ua Api Key: " CUA_API_KEY
if [[ -z "$CUA_API_KEY" ]]; then
echo "❌ C/ua Api Key is required for Cloud Containers."
exit 1
fi
fi
USE_CLOUD=true
elif [[ "$CHOICE" == "2" ]]; then
# Local macOS VM setup
echo ""
echo "🖥️ Setting up local macOS VMs..."
# Check for Apple Silicon Mac
if [[ $(uname -s) != "Darwin" || $(uname -m) != "arm64" ]]; then
echo "❌ Local macOS VMs require an Apple Silicon Mac (M1/M2/M3/M4)."
echo "💡 Consider using C/ua Cloud Containers instead (option 1)."
exit 1
fi
# Check for macOS 15 (Sequoia) or newer
OSVERSION=$(sw_vers -productVersion)
if [[ $(echo "$OSVERSION 15.0" | tr " " "\n" | sort -V | head -n 1) != "15.0" ]]; then
echo "❌ Local macOS VMs require macOS 15 (Sequoia) or newer. You have $OSVERSION."
echo "💡 Consider using C/ua Cloud Containers instead (option 1)."
exit 1
fi
USE_CLOUD=false
else
echo "❌ Invalid choice. Please run the script again and choose 1 or 2."
exit 1
fi
# Pull the macOS CUA image if not already present
if ! lume ls | grep -q "macos-sequoia-cua"; then
# Check available disk space
IMAGE_SIZE_GB=30
AVAILABLE_SPACE_KB=$(df -k $HOME | tail -1 | awk '{print $4}')
AVAILABLE_SPACE_GB=$(($AVAILABLE_SPACE_KB / 1024 / 1024))
echo "📊 The macOS CUA image will use approximately ${IMAGE_SIZE_GB}GB of disk space."
echo " You currently have ${AVAILABLE_SPACE_GB}GB available on your system."
# Prompt for confirmation
read -p " Continue? [y]/n: " CONTINUE
CONTINUE=${CONTINUE:-y}
if [[ $CONTINUE =~ ^[Yy]$ ]]; then
echo "📥 Pulling macOS CUA image (this may take a while)..."
lume pull macos-sequoia-cua:latest
else
echo "❌ Installation cancelled."
exit 1
# Install Lume if not already installed (only for local VMs)
if [[ "$USE_CLOUD" == "false" ]]; then
if ! command -v lume &> /dev/null; then
echo "📦 Installing Lume CLI..."
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
# Add lume to PATH for this session if it's not already there
if ! command -v lume &> /dev/null; then
export PATH="$PATH:$HOME/.local/bin"
fi
fi
# Pull the macOS CUA image if not already present
if ! lume ls | grep -q "macos-sequoia-cua"; then
# Check available disk space
IMAGE_SIZE_GB=30
AVAILABLE_SPACE_KB=$(df -k $HOME | tail -1 | awk '{print $4}')
AVAILABLE_SPACE_GB=$(($AVAILABLE_SPACE_KB / 1024 / 1024))
echo "📊 The macOS CUA image will use approximately ${IMAGE_SIZE_GB}GB of disk space."
echo " You currently have ${AVAILABLE_SPACE_GB}GB available on your system."
# Prompt for confirmation
read -p " Continue? [y]/n: " CONTINUE
CONTINUE=${CONTINUE:-y}
if [[ $CONTINUE =~ ^[Yy]$ ]]; then
echo "📥 Pulling macOS CUA image (this may take a while)..."
lume pull macos-sequoia-cua:latest
else
echo "❌ Installation cancelled."
exit 1
fi
fi
fi
# Create a Python virtual environment
echo "🐍 Setting up Python environment..."
PYTHON_CMD="python3"
# Check if Python 3.11+ is available
PYTHON_VERSION=$($PYTHON_CMD --version 2>&1 | cut -d" " -f2)
PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)
# Try different Python commands in order of preference
PYTHON_CMD=""
for cmd in python3.11 python3 python; do
if command -v $cmd &> /dev/null; then
# Check if this Python version is 3.11+
PYTHON_VERSION=$($cmd --version 2>&1 | cut -d" " -f2)
PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)
if [ "$PYTHON_MAJOR" -gt 3 ] || ([ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -ge 11 ]); then
PYTHON_CMD=$cmd
echo "✅ Found suitable Python: $cmd (version $PYTHON_VERSION)"
break
else
echo "⚠️ Found $cmd (version $PYTHON_VERSION) but it's too old, trying next..."
fi
fi
done
if [ "$PYTHON_MAJOR" -lt 3 ] || ([ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -lt 11 ]); then
echo "❌ Python 3.11+ is required. You have $PYTHON_VERSION."
# If no suitable Python was found, error out
if [ -z "$PYTHON_CMD" ]; then
echo "❌ Python 3.11+ is required but not found."
echo "Please install Python 3.11+ and try again."
exit 1
fi
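The interpreter-selection loop above reduces to a major/minor tuple comparison. A minimal Python sketch of the same version test (helper name hypothetical):

```python
def is_supported(version: str, minimum=(3, 11)) -> bool:
    """True when a dotted version string (e.g. "3.12.2") is at least
    the required major.minor, matching the shell comparison above."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum
```

Tuple comparison sidesteps the shell's two-clause `-gt`/`-eq`/`-ge` test, which is easy to get wrong (e.g. comparing minor versions across different majors).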
# Create a virtual environment
VENV_DIR="$HOME/.cua-venv"
if [ ! -d "$VENV_DIR" ]; then
$PYTHON_CMD -m venv "$VENV_DIR"
fi
@@ -87,66 +177,144 @@ fi
source "$VENV_DIR/bin/activate"
# Install required packages
echo "📦 Updating CUA packages..."
pip install -U pip
echo "📦 Updating C/ua packages..."
pip install -U pip setuptools wheel Cmake
pip install -U cua-computer "cua-agent[all]"
# Temporary fix for mlx-vlm, see https://github.com/Blaizzy/mlx-vlm/pull/349
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id
# Create a simple demo script
DEMO_DIR="$HOME/.cua-demo"
mkdir -p "$DEMO_DIR"
cat > "$DEMO_DIR/run_demo.py" << 'EOF'
import asyncio
import os
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
from agent.ui.gradio.app import create_gradio_ui
# Try to load API keys from environment
api_key = os.environ.get("OPENAI_API_KEY", "")
if not api_key:
print("\n⚠ No OpenAI API key found. You'll need to provide one in the UI.")
# Launch the Gradio UI and open it in the browser
app = create_gradio_ui()
app.launch(share=False, inbrowser=True)
# Create .env.local file with API keys (only if it doesn't exist)
if [[ ! -f "$DEMO_DIR/.env.local" ]]; then
cat > "$DEMO_DIR/.env.local" << EOF
# Uncomment and add your API keys here
# OPENAI_API_KEY=your_openai_api_key_here
# ANTHROPIC_API_KEY=your_anthropic_api_key_here
CUA_API_KEY=your_cua_api_key_here
EOF
echo "📝 Created .env.local file with API key placeholders"
else
echo "📝 Found existing .env.local file - keeping your current settings"
fi
if [[ "$USE_CLOUD" == "true" ]]; then
# Add CUA API key to .env.local if not already present
if ! grep -q "CUA_API_KEY" "$DEMO_DIR/.env.local"; then
echo "CUA_API_KEY=$CUA_API_KEY" >> "$DEMO_DIR/.env.local"
echo "🔑 Added CUA_API_KEY to .env.local"
elif grep -q "CUA_API_KEY=your_cua_api_key_here" "$DEMO_DIR/.env.local"; then
# Update placeholder with actual key
sed -i.bak "s/CUA_API_KEY=your_cua_api_key_here/CUA_API_KEY=$CUA_API_KEY/" "$DEMO_DIR/.env.local"
echo "🔑 Updated CUA_API_KEY in .env.local"
fi
fi
# Create a convenience script to run the demo
cat > "$DEMO_DIR/start_demo.sh" << EOF
cat > "$DEMO_DIR/start_ui.sh" << EOF
#!/bin/bash
source "$VENV_DIR/bin/activate"
cd "$DEMO_DIR"
python run_demo.py
EOF
chmod +x "$DEMO_DIR/start_demo.sh"
chmod +x "$DEMO_DIR/start_ui.sh"
echo "✅ Setup complete!"
echo "🖥️ You can start the CUA playground by running: $DEMO_DIR/start_demo.sh"
# Check if the VM is running
echo "🔍 Checking if the macOS CUA VM is running..."
VM_RUNNING=$(lume ls | grep "macos-sequoia-cua" | grep "running" || echo "")
if [[ "$USE_CLOUD" == "true" ]]; then
# Create run_demo.py for cloud containers
cat > "$DEMO_DIR/run_demo.py" << 'EOF'
import asyncio
import os
from pathlib import Path
from dotenv import load_dotenv
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
from agent.ui.gradio.app import create_gradio_ui
if [ -z "$VM_RUNNING" ]; then
echo "🚀 Starting the macOS CUA VM in the background..."
lume run macos-sequoia-cua:latest &
# Wait a moment for the VM to initialize
sleep 5
echo "✅ VM started successfully."
# Load environment variables from .env.local
load_dotenv(Path(__file__).parent / ".env.local")
# Check for required API keys
cua_api_key = os.environ.get("CUA_API_KEY", "")
if not cua_api_key:
print("\n❌ CUA_API_KEY not found in .env.local file.")
print("Please add your CUA API key to the .env.local file.")
exit(1)
openai_key = os.environ.get("OPENAI_API_KEY", "")
anthropic_key = os.environ.get("ANTHROPIC_API_KEY", "")
if not openai_key and not anthropic_key:
print("\n⚠ No OpenAI or Anthropic API keys found in .env.local.")
print("Please add at least one API key to use AI agents.")
print("🚀 Starting CUA playground with Cloud Containers...")
print("📝 Edit .env.local to update your API keys")
# Launch the Gradio UI and open it in the browser
app = create_gradio_ui()
app.launch(share=False, inbrowser=True)
EOF
else
echo "✅ macOS CUA VM is already running."
# Create run_demo.py for local macOS VMs
cat > "$DEMO_DIR/run_demo.py" << 'EOF'
import asyncio
import os
from pathlib import Path
from dotenv import load_dotenv
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
from agent.ui.gradio.app import create_gradio_ui
# Load environment variables from .env.local
load_dotenv(Path(__file__).parent / ".env.local")
# Try to load API keys from environment
openai_key = os.environ.get("OPENAI_API_KEY", "")
anthropic_key = os.environ.get("ANTHROPIC_API_KEY", "")
if not openai_key and not anthropic_key:
print("\n⚠ No OpenAI or Anthropic API keys found in .env.local.")
print("Please add at least one API key to use AI agents.")
print("🚀 Starting CUA playground with local macOS VMs...")
print("📝 Edit .env.local to update your API keys")
# Launch the Gradio UI and open it in the browser
app = create_gradio_ui()
app.launch(share=False, inbrowser=True)
EOF
fi
echo "☁️ CUA Cloud Container setup complete!"
echo "📝 Edit $DEMO_DIR/.env.local to update your API keys"
echo "🖥️ Start the playground by running: $DEMO_DIR/start_ui.sh"
# Check if the VM is running (only for local setup)
if [[ "$USE_CLOUD" == "false" ]]; then
echo "🔍 Checking if the macOS CUA VM is running..."
VM_RUNNING=$(lume ls | grep "macos-sequoia-cua" | grep "running" || echo "")
if [ -z "$VM_RUNNING" ]; then
echo "🚀 Starting the macOS CUA VM in the background..."
lume run macos-sequoia-cua:latest &
# Wait a moment for the VM to initialize
sleep 5
echo "✅ VM started successfully."
else
echo "✅ macOS CUA VM is already running."
fi
fi
# Ask if the user wants to start the demo now
echo
read -p "Would you like to start the CUA playground now? (y/n) " -n 1 -r
read -p "Would you like to start the C/ua Computer-Use Agent UI now? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "🚀 Starting the CUA playground..."
echo "🚀 Starting the C/ua Computer-Use Agent UI..."
echo ""
"$DEMO_DIR/start_demo.sh"
"$DEMO_DIR/start_ui.sh"
fi