Files
computer/docs/Developer-Guide.md
2025-03-19 23:28:38 +01:00

6.3 KiB

Developer Guide

Project Structure

The project is organized as a monorepo with these main packages:

  • libs/core/ - Base package with telemetry support
  • libs/computer/ - Computer-use interface (CUI) library
  • libs/agent/ - AI agent library with multi-provider support
  • libs/som/ - Set-of-Mark parser
  • libs/computer-server/ - Server component for VM
  • libs/lume/ - Lume CLI
  • libs/pylume/ - Python bindings for Lume

Each package has its own virtual environment and dependencies, managed through PDM.

Local Development Setup

  1. Install Lume CLI:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
  1. Clone the repository:
git clone https://github.com/trycua/cua.git
cd cua
  1. Create a .env.local file in the root directory with your API keys:
# Required for Anthropic provider
ANTHROPIC_API_KEY=your_anthropic_key_here

# Required for OpenAI provider
OPENAI_API_KEY=your_openai_key_here
  1. Run the build script to set up all packages:
./scripts/build.sh

This will:

  • Create a virtual environment for the project
  • Install all packages in development mode
  • Set up the correct Python path
  • Install development tools
  1. Open the workspace in VSCode or Cursor:
# For Cua Python development
code .vscode/py.code-workspace

# For Lume (Swift) development
code .vscode/lume.code-workspace

Using the workspace file is strongly recommended as it:

  • Sets up correct Python environments for each package
  • Configures proper import paths
  • Enables debugging configurations
  • Maintains consistent settings across packages

Docker Development Environment

As an alternative to running directly on your host machine, you can use Docker for development. This approach has several advantages:

  • Ensures consistent development environment across different machines
  • Isolates dependencies from your host system
  • Works well for cross-platform development
  • Avoids conflicts with existing Python installations

Prerequisites

  • Docker installed on your machine
  • Lume server running on your host (port 3000): lume serve

Setup and Usage

  1. Build the development Docker image:
./scripts/run-docker-dev.sh build
  1. Run an example in the container:
./scripts/run-docker-dev.sh run computer_examples.py
  1. Get an interactive shell in the container:
./scripts/run-docker-dev.sh run --interactive
  1. Stop any running containers:
./scripts/run-docker-dev.sh stop

How it Works

The Docker development environment:

  • Installs all required Python dependencies in the container
  • Mounts your source code from the host at runtime
  • Automatically configures the connection to use host.docker.internal:3000 for accessing the Lume server on your host machine
  • Preserves your code changes without requiring rebuilds (source code is mounted as a volume)

Note

: The Docker container doesn't include the macOS-specific Lume executable. Instead, it connects to the Lume server running on your host machine via host.docker.internal:3000. Make sure to start the Lume server on your host before running examples in the container.

Cleanup and Reset

If you need to clean up the environment (non-docker) and start fresh:

./scripts/cleanup.sh

This will:

  • Remove all virtual environments
  • Clean Python cache files and directories
  • Remove build artifacts
  • Clean PDM-related files
  • Reset environment configurations

Package Virtual Environments

The build script creates a shared virtual environment for all packages. The workspace configuration automatically handles import paths with the correct Python path settings.

Running Examples

The Python workspace includes launch configurations for all packages:

  • "Run Computer Examples" - Runs computer examples
  • "Run Computer API Server" - Runs the computer-server
  • "Run Omni Agent Examples" - Runs agent examples
  • "SOM" configurations - Various settings for running SOM

To run examples:

  1. Open the workspace file (.vscode/py.code-workspace)
  2. Press F5 or use the Run/Debug view
  3. Select the desired configuration

The workspace also includes compound launch configurations:

  • "Run Computer Examples + Server" - Runs both the Computer Examples and Server simultaneously

Release and Publishing Process

This monorepo contains multiple Python packages that can be published to PyPI. The packages have dependencies on each other in the following order:

  1. pylume - Base package for VM management
  2. cua-computer - Computer control interface (depends on pylume)
  3. cua-som - Parser for UI elements (independent, formerly omniparser)
  4. cua-agent - AI agent (depends on cua-computer and optionally cua-som)
  5. computer-server - Server component installed on the sandbox

Workflow Structure

The publishing process is managed by these GitHub workflow files:

  • Package-specific workflows:

    • .github/workflows/publish-pylume.yml
    • .github/workflows/publish-computer.yml
    • .github/workflows/publish-som.yml
    • .github/workflows/publish-agent.yml
    • .github/workflows/publish-computer-server.yml
  • Coordinator workflow:

    • .github/workflows/publish-all.yml - Manages global releases and manual selections

Version Management

Special Considerations for Pylume

The pylume package requires special handling as it incorporates the binary executable from the lume repository:

  • When releasing pylume, ensure the version matches a corresponding release in the lume repository
  • The workflow automatically downloads the matching lume binary and includes it in the pylume package
  • If you need to release a new version of pylume, make sure to coordinate with a matching lume release

Development Workspaces

This monorepo includes multiple VS Code workspace configurations to optimize the development experience based on which components you're working with:

Available Workspace Files

To open a specific workspace:

# For Python development
code .vscode/py.code-workspace

# For Lume (Swift) development
code .vscode/lume.code-workspace