Add Cua Preview

2026-01-05 21:09:58 -06:00 · 2025-03-16 16:06:32 +01:00
parent 65a851c43b
commit 8baef75e6e
240 changed files with 21368 additions and 424 deletions
--- a/docs/API-Reference.md
+++ b/docs/API-Reference.md
@@ -1,227 +0,0 @@
-## API Reference
-
-<details open>
-<summary><strong>Create VM</strong> - POST /vms</summary>
-
-```bash
-curl --connect-timeout 6000 \
-    --max-time 5000 \
-    -X POST \
-    -H "Content-Type: application/json" \
-    -d '{
-      "name": "lume_vm",
-      "os": "macOS",
-      "cpu": 2,
-      "memory": "4GB",
-      "diskSize": "64GB",
-      "display": "1024x768",
-      "ipsw": "latest"
-    }' \
-    http://localhost:3000/lume/vms
-```
-</details>
-
-<details open>
-<summary><strong>Run VM</strong> - POST /vms/:name/run</summary>
-
-```bash
-# Basic run
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X POST \
-  http://localhost:3000/lume/vms/my-vm-name/run
-
-# Run with VNC client started and shared directory
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X POST \
-  -H "Content-Type: application/json" \
-  -d '{
-    "noDisplay": false,
-    "sharedDirectories": [
-      {
-        "hostPath": "~/Projects",
-        "readOnly": false
-      }
-    ],
-    "recoveryMode": false
-  }' \
-  http://localhost:3000/lume/vms/lume_vm/run
-```
-</details>
-
-<details open>
-<summary><strong>List VMs</strong> - GET /vms</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  http://localhost:3000/lume/vms
-```
-```
-[
-  {
-    "name": "my-vm",
-    "state": "stopped",
-    "os": "macOS",
-    "cpu": 2,
-    "memory": "4GB",
-    "diskSize": "64GB"
-  },
-  {
-    "name": "my-vm-2",
-    "state": "stopped",
-    "os": "linux",
-    "cpu": 2,
-    "memory": "4GB",
-    "diskSize": "64GB"
-  }
-]
-```
-</details>
-
-<details open>
-<summary><strong>Get VM Details</strong> - GET /vms/:name</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  http://localhost:3000/lume/vms/lume_vm\
-```
-```
-{
-  "name": "lume_vm",
-  "state": "running",
-  "os": "macOS",
-  "cpu": 2,
-  "memory": "4GB",
-  "diskSize": "64GB"
-}
-```
-</details>
-
-<details open>
-<summary><strong>Update VM Settings</strong> - PATCH /vms/:name</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X PATCH \
-  -H "Content-Type: application/json" \
-  -d '{
-    "cpu": 4,
-    "memory": "8GB",
-    "diskSize": "128GB"
-  }' \
-  http://localhost:3000/lume/vms/my-vm-name
-```
-</details>
-
-<details open>
-<summary><strong>Stop VM</strong> - POST /vms/:name/stop</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X POST \
-  http://localhost:3000/lume/vms/my-vm-name/stop
-```
-</details>
-
-<details open>
-<summary><strong>Delete VM</strong> - DELETE /vms/:name</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X DELETE \
-  http://localhost:3000/lume/vms/my-vm-name
-```
-</details>
-
-<details open>
-<summary><strong>Pull Image</strong> - POST /pull</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X POST \
-  -H "Content-Type: application/json" \
-  -d '{
-    "image": "macos-sequoia-vanilla:latest",
-    "name": "my-vm-name",
-    "registry": "ghcr.io",
-    "organization": "trycua"
-  }' \
-  http://localhost:3000/lume/pull
-```
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X POST \
-  -H "Content-Type: application/json" \
-  -d '{
-    "image": "macos-sequoia-vanilla:15.2",
-    "name": "macos-sequoia-vanilla"
-  }' \
-  http://localhost:3000/lume/pull
-```
-</details>
-
-<details open>
-<summary><strong>Clone VM</strong> - POST /vms/:name/clone</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X POST \
-  -H "Content-Type: application/json" \
-  -d '{
-    "name": "source-vm",
-    "newName": "cloned-vm"
-  }' \
-  http://localhost:3000/lume/vms/source-vm/clone
-```
-</details>
-
-<details open>
-<summary><strong>Get Latest IPSW URL</strong> - GET /ipsw</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  http://localhost:3000/lume/ipsw
-```
-</details>
-
-<details open>
-<summary><strong>List Images</strong> - GET /images</summary>
-
-```bash
-# List images with default organization (trycua)
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  http://localhost:3000/lume/images
-```
-
-```json
-{
-  "local": [
-    "macos-sequoia-xcode:latest",
-    "macos-sequoia-vanilla:latest"
-  ]
-}
-```
-</details>
-
-<details open>
-<summary><strong>Prune Images</strong> - POST /lume/prune</summary>
-
-```bash
-curl --connect-timeout 6000 \
-  --max-time 5000 \
-  -X POST \
-  http://localhost:3000/lume/prune
-```
-</details>
--- a/docs/Developer-Guide.md
+++ b/docs/Developer-Guide.md
@@ -0,0 +1,147 @@
+## Developer Guide
+
+### Project Structure
+
+The project is organized as a monorepo with these main packages:
+- `libs/core/` - Base package with telemetry support
+- `libs/pylume/` - Python bindings for Lume
+- `libs/computer/` - Core computer interaction library
+- `libs/agent/` - AI agent library with multi-provider support
+- `libs/som/` - Computer vision and NLP processing library (formerly omniparser)
+- `libs/computer-server/` - Server implementation for computer control
+- `libs/lume/` - Swift implementation for enhanced macOS integration
+
+Each package manages its own dependencies through PDM.
+
+### Local Development Setup
+
+1. Clone the repository:
+```bash
+git clone https://github.com/trycua/cua.git
+cd cua
+```
+
+2. Create a `.env.local` file in the root directory with your API keys:
+```bash
+# Required for Anthropic provider
+ANTHROPIC_API_KEY=your_anthropic_key_here
+
+# Required for OpenAI provider
+OPENAI_API_KEY=your_openai_key_here
+```
+
+3. Run the build script to set up all packages:
+```bash
+./scripts/build.sh
+```
+
+This will:
+- Create a virtual environment for the project
+- Install all packages in development mode
+- Set up the correct Python path
+- Install development tools
+
+4. Open the workspace in VSCode or Cursor:
+```bash
+# For Python development
+code .vscode/py.code-workspace
+
+# For Lume (Swift) development
+code .vscode/lume.code-workspace
+```
+
+Using the workspace file is strongly recommended as it:
+- Sets up correct Python environments for each package
+- Configures proper import paths
+- Enables debugging configurations
+- Maintains consistent settings across packages
+
+### Cleanup and Reset
+
+If you need to clean up the environment and start fresh:
+
+```bash
+./scripts/cleanup.sh
+```
+
+This will:
+- Remove all virtual environments
+- Clean Python cache files and directories
+- Remove build artifacts
+- Clean PDM-related files
+- Reset environment configurations
+
+### Package Virtual Environments
+
+The build script creates a shared virtual environment for all packages. The workspace configuration automatically handles import paths with the correct Python path settings.
+
+### Running Examples
+
+The Python workspace includes launch configurations for all packages:
+
+- "Run Computer Examples" - Runs computer examples
+- "Run Computer API Server" - Runs the computer-server
+- "Run Omni Agent Examples" - Runs agent examples
+- "SOM" configurations - Various settings for running SOM
+
+To run examples:
+1. Open the workspace file (`.vscode/py.code-workspace`)
+2. Press F5 or use the Run/Debug view
+3. Select the desired configuration
+
+The workspace also includes compound launch configurations:
+- "Run Computer Examples + Server" - Runs both the Computer Examples and Server simultaneously
+
+## Release and Publishing Process
+
+This monorepo contains multiple Python packages that can be published to PyPI. The packages depend on each other in the following order:
+
+1. `pylume` - Base package for VM management
+2. `cua-computer` - Computer control interface (depends on pylume)
+3. `cua-som` - Parser for UI elements (independent, formerly omniparser)
+4. `cua-agent` - AI agent (depends on cua-computer and optionally cua-som)
+5. `computer-server` - Server component installed on the sandbox
+
+### Workflow Structure
+
+The publishing process is managed by these GitHub workflow files:
+
+- **Package-specific workflows**: 
+  - `.github/workflows/publish-pylume.yml`
+  - `.github/workflows/publish-computer.yml`
+  - `.github/workflows/publish-som.yml`
+  - `.github/workflows/publish-agent.yml`
+  - `.github/workflows/publish-computer-server.yml`
+
+- **Coordinator workflow**:
+  - `.github/workflows/publish-all.yml` - Manages global releases and manual selections
+
+### Version Management
+
+#### Special Considerations for Pylume
+
+The `pylume` package requires special handling as it incorporates the binary executable from the [lume repository](https://github.com/trycua/lume):
+
+- When releasing `pylume`, ensure the version matches a corresponding release in the lume repository
+- The workflow automatically downloads the matching lume binary and includes it in the pylume package
+- If you need to release a new version of pylume, make sure to coordinate with a matching lume release
+
+## Development Workspaces
+
+This monorepo includes multiple VS Code workspace configurations to optimize the development experience based on which components you're working with:
+
+### Available Workspace Files
+
+- **[py.code-workspace](.vscode/py.code-workspace)**: For Python package development (Computer, Agent, SOM, etc.)
+- **[lume.code-workspace](.vscode/lume.code-workspace)**: For Swift-based Lume development
+
+To open a specific workspace:
+
+```bash
+# For Python development
+code .vscode/py.code-workspace
+
+# For Lume (Swift) development
+code .vscode/lume.code-workspace
+```
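The release order listed under "Release and Publishing Process" in the Developer Guide follows from the dependencies between the packages. As a rough sketch only (this is not how `publish-all.yml` is implemented, and the `DEPENDENCIES` mapping and `publish_order` helper are hypothetical, transcribed by hand from the list in the guide), a valid publish order can be derived with a topological sort from Python's standard library:

```python
from graphlib import TopologicalSorter

# Hypothetical mapping transcribed from the guide's dependency list.
# Keys are packages; values are the packages that must be published first.
# The optional cua-agent -> cua-som dependency is treated as required here,
# and computer-server is listed without dependencies because the guide
# does not state any.
DEPENDENCIES = {
    "pylume": set(),
    "cua-computer": {"pylume"},
    "cua-som": set(),
    "cua-agent": {"cua-computer", "cua-som"},
    "computer-server": set(),
}


def publish_order(dependencies):
    """Return package names so that each appears after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = publish_order(DEPENDENCIES)
print(" -> ".join(order))
```

Several orders can satisfy the same constraints, so the exact output may differ from the numbered list in the guide while still being valid.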
--- a/docs/Development.md
+++ b/docs/Development.md
@@ -1,45 +0,0 @@
-# Development Guide
-
-This guide will help you set up your development environment and understand the process for contributing code to lume.
-
-## Environment Setup
-
-Lume development requires:
- Swift 6 or higher
- Xcode 15 or higher
- macOS Sequoia 15.2 or higher
- (Optional) VS Code with Swift extension
-
-## Setting Up the Repository Locally
-
-1. **Fork the Repository**: Create your own fork of lume
-2. **Clone the Repository**: 
-   ```bash
-   git clone https://github.com/trycua/lume.git
-   cd lume
-   ```
-3. **Install Dependencies**:
-   ```bash
-   swift package resolve
-   ```
-4. **Build the Project**:
-   ```bash
-   swift build
-   ```
-
-## Development Workflow
-
-1. Create a new branch for your changes
-2. Make your changes
-3. Run the tests: `swift test`
-4. Build and test your changes locally
-5. Commit your changes with clear commit messages
-
-## Submitting Pull Requests
-
-1. Push your changes to your fork
-2. Open a Pull Request with:
-   - A clear title and description
-   - Reference to any related issues
-   - Screenshots or logs if relevant
-3. Respond to any feedback from maintainers
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -1,55 +1,50 @@
 # FAQs

-### Where are the VMs stored?
+### Why a local sandbox?

-VMs are stored in `~/.lume`.
+A local sandbox is a dedicated environment that is isolated from the rest of the system. As AI agents rapidly evolve towards 70-80% success rates on average tasks, having a controlled and secure environment becomes crucial. Cua's Computer-Use AI agents run in a local sandbox to ensure reliability, safety, and controlled execution.

-### How are images cached?
+Benefits of using a local sandbox rather than running the Computer-Use AI agent in the host system:

-Images are cached in `~/.lume/cache`. When doing `lume pull <image>`, it will check if the image is already cached. If not, it will download the image and cache it, removing any older versions.
+- **Reliability**: The sandbox provides a reproducible environment - critical for benchmarking and debugging agent behavior. Frameworks like [OSWorld](https://github.com/xlang-ai/OSWorld), [Simular AI](https://github.com/simular-ai/Agent-S), Microsoft's [OmniTool](https://github.com/microsoft/OmniParser/tree/master/omnitool), [WindowsAgentArena](https://github.com/microsoft/WindowsAgentArena) and more are using Computer-Use AI agents running in local sandboxes.
+- **Safety & Isolation**: The sandbox is isolated from the rest of the system, protecting sensitive data and system resources. As CUA agent capabilities grow, this isolation becomes increasingly important for preventing potential safety breaches.
+- **Control**: The sandbox can be easily monitored and terminated if needed, providing oversight for autonomous agent operation.

-### Are VM disks taking up all the disk space?
+### Where are the sandbox images stored?
+
+Sandboxes are stored in `~/.lume`, and cached images are stored in `~/.lume/cache`.
+
+### Which image is Computer using?
+
+Computer uses an optimized macOS image for Computer-Use interactions, with pre-installed apps and settings for optimal performance.
+The image is available on our [ghcr registry](https://github.com/orgs/trycua/packages/container/package/macos-sequoia-cua).
+
+### Are sandbox disks taking up all the disk space?

 No, macOS uses sparse files, which only allocate space as needed. For example, VM disks totaling 50 GB may only use 20 GB on disk.

-### How do I get the latest macOS restore image URL?
-
-```bash
-lume ipsw
-```
-
 ### How do I delete a VM?

 ```bash
 lume delete <name>
 ```

-### How to Install macOS from an IPSW Image
+### How do I troubleshoot Computer not connecting to lume daemon?

-#### Create a new macOS VM using the latest supported IPSW image:
-Run the following command to create a new macOS virtual machine using the latest available IPSW image:
+If you're experiencing connection issues between Computer and the lume daemon, port 3000 (used by lume) may already be in use by an orphaned process. You can diagnose this with:

 ```bash
-lume create <name> --os macos --ipsw latest
+sudo lsof -i :3000
 ```

-#### Create a new macOS VM using a specific IPSW image:
-To create a macOS virtual machine from an older or specific IPSW file, first download the desired IPSW (UniversalMac) from a trusted source.
-
-Then, use the downloaded IPSW path:
+This command will show all processes using port 3000. If you see a lume process already running, you can terminate it with:

 ```bash
-lume create <name> --os macos --ipsw <downloaded_ipsw_path>
+kill <PID>
 ```

-### How do I install a custom Linux image?
+Where `<PID>` is the process ID shown in the output of the `lsof` command. After terminating the process, run `lume serve` again to start the lume daemon.

-The process for creating a custom Linux image differs than macOS, with IPSW restore files not being used. You need to create a linux VM first, then mount a setup image file to the VM for the first boot.
+### What information does Cua track?

-```bash
-lume create <name> --os linux
-
-lume run <name> --mount <path-to-setup-image>
-
-lume run <name>
-```
+Cua tracks anonymized usage and error report statistics; we subscribe to PostHog's approach as detailed [here](https://posthog.com/blog/open-source-telemetry-ethical). If you would like to opt out of sending anonymized info, you can set `telemetry_enabled` to `False` in the Computer or Agent constructor. Check out our [Telemetry](Telemetry.md) documentation for more details.
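The FAQ entry on troubleshooting the lume daemon connection relies on `lsof` to find out whether port 3000 is occupied. A minimal sketch of the same check from Python is given here, assuming the daemon listens on `127.0.0.1`. The `port_in_use` helper is hypothetical and not part of any Cua library. It only reports whether something accepts TCP connections on the port, not which process owns it, so `lsof` is still needed to find the PID.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(3000):
        print("Port 3000 is in use; run `sudo lsof -i :3000` to find the process.")
    else:
        print("Nothing is listening on port 3000.")
```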
--- a/docs/Telemetry.md
+++ b/docs/Telemetry.md
@@ -0,0 +1,74 @@
+# Telemetry in CUA
+
+This document explains how telemetry works in CUA libraries and how you can control it.
+
+CUA tracks anonymized usage and error report statistics; we subscribe to PostHog's approach as detailed [here](https://posthog.com/blog/open-source-telemetry-ethical). If you would like to opt out of sending anonymized info, you can set `telemetry_enabled` to `False`.
+
+## What telemetry data we collect
+
+CUA libraries collect minimal anonymous usage data to help improve our software. The telemetry data we collect is specifically limited to:
+
+- Basic system information:
+  - Operating system (e.g., 'darwin', 'win32', 'linux')
+  - Python version (e.g., '3.10.0')
+- Module initialization events:
+  - When a module (like 'computer' or 'agent') is imported
+  - Version of the module being used
+
+We do NOT collect:
+- Personal information
+- Contents of files
+- Specific text being typed
+- Actual screenshots or screen contents
+- User-specific identifiers
+- API keys
+- Application data or content
+- User interactions with the computer
+- Information about files being accessed
+
+## Controlling Telemetry
+
+We are committed to transparency and user control over telemetry. There are two ways to control telemetry:
+
+### 1. Environment Variable (Global Control)
+
+Telemetry is enabled by default. To disable telemetry, set the `CUA_TELEMETRY_ENABLED` environment variable to a falsy value (`0`, `false`, `no`, or `off`):
+
+```bash
+# Disable telemetry before running your script
+export CUA_TELEMETRY_ENABLED=false
+
+# Or for a single command
+CUA_TELEMETRY_ENABLED=false python your_script.py
+```
+Or from Python:
+```python
+import os
+# Set this before importing any CUA library, since telemetry events are recorded at import time
+os.environ["CUA_TELEMETRY_ENABLED"] = "false"
+```
+
+### 2. Instance-Level Control
+
+You can control telemetry for specific CUA instances by setting `telemetry_enabled` when creating them:
+
+```python
+from computer import Computer
+from agent import ComputerAgent
+
+# Disable telemetry for a specific Computer instance
+computer = Computer(telemetry_enabled=False)
+
+# Enable telemetry for a specific Agent instance
+agent = ComputerAgent(telemetry_enabled=True)
+```
+
+You can check if telemetry is enabled for an instance:
+
+```python
+print(computer.telemetry_enabled)  # Will print True or False
+```
+
+Note that telemetry settings must be configured during initialization and cannot be changed after the object is created.
+
+## Transparency
+
+We believe in being transparent about the data we collect. If you have any questions about our telemetry practices, please open an issue on our GitHub repository.
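The "Environment Variable (Global Control)" section of Telemetry.md lists `0`, `false`, `no`, and `off` as the values that disable telemetry. One way such a setting could be interpreted is sketched here. The `telemetry_enabled_from_env` helper is hypothetical and is not the library's actual implementation. Case-insensitive matching and whitespace stripping are assumptions that the documentation does not confirm.

```python
import os

# Values the documentation lists as disabling telemetry.
FALSY_VALUES = {"0", "false", "no", "off"}


def telemetry_enabled_from_env(environ=None):
    """Hypothetical helper: interpret CUA_TELEMETRY_ENABLED.

    Telemetry counts as enabled unless the variable is set to one of the
    documented falsy values. Lowercasing and stripping are assumptions.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("CUA_TELEMETRY_ENABLED")
    if raw is None:
        return True  # documented default: telemetry is enabled
    return raw.strip().lower() not in FALSY_VALUES


if __name__ == "__main__":
    print("Telemetry enabled:", telemetry_enabled_from_env())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.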