Developer Guide
Project Structure
The project is organized as a monorepo with these main packages:
- libs/core/ - Base package with telemetry support
- libs/pylume/ - Python bindings for Lume
- libs/computer/ - Core computer interaction library
- libs/agent/ - AI agent library with multi-provider support
- libs/som/ - Computer vision and NLP processing library (formerly omniparser)
- libs/computer-server/ - Server implementation for computer control
- libs/lume/ - Swift implementation for enhanced macOS integration
Each package has its own virtual environment and dependencies, managed through PDM.
Local Development Setup
- Clone the repository:
git clone https://github.com/trycua/cua.git
cd cua
- Create a .env.local file in the root directory with your API keys:
# Required for Anthropic provider
ANTHROPIC_API_KEY=your_anthropic_key_here
# Required for OpenAI provider
OPENAI_API_KEY=your_openai_key_here
- Run the build script to set up all packages:
./scripts/build.sh
This will:
- Create a virtual environment for the project
- Install all packages in development mode
- Set up the correct Python path
- Install development tools
- Open the workspace in VS Code or Cursor:
# For Python development
code .vscode/py.code-workspace
# For Lume (Swift) development
code .vscode/lume.code-workspace
Using the workspace file is strongly recommended as it:
- Sets up correct Python environments for each package
- Configures proper import paths
- Enables debugging configurations
- Maintains consistent settings across packages
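Before running the agent examples, it can help to confirm that the .env.local file from the setup steps actually defines the keys you expect. The following is a minimal sketch, not part of the repository: the helper names parse_env_text and missing_keys are hypothetical, and it assumes the file uses plain KEY=value lines with # comments, as in the example above. Only the keys for the providers you use are needed.

```python
from __future__ import annotations

from pathlib import Path

# Keys named in the setup step; each is needed only for its own provider.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required=PROVIDER_KEYS) -> list[str]:
    """Return the keys that are absent or set to an empty value."""
    values = parse_env_text(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env.local")  # path taken from the setup step
    text = env_file.read_text() if env_file.exists() else ""
    missing = missing_keys(text)
    print("missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the repository root; a key reported as missing only matters if you plan to use that provider.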
Cleanup and Reset
If you need to clean up the environment and start fresh:
./scripts/cleanup.sh
This will:
- Remove all virtual environments
- Clean Python cache files and directories
- Remove build artifacts
- Clean PDM-related files
- Reset environment configurations
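If you want to preview what a cleanup would touch before running the script, a read-only listing is one option. The sketch below is an illustration only and does not reproduce what scripts/cleanup.sh does; the function name find_python_caches is hypothetical, and it covers only the Python cache item from the list above. It deletes nothing.

```python
from __future__ import annotations

from pathlib import Path


def find_python_caches(root: Path) -> list[Path]:
    """List __pycache__ directories and stray .pyc files under root.

    Read-only: this only reports paths and never removes anything.
    """
    hits = [p for p in root.rglob("__pycache__") if p.is_dir()]
    # .pyc files inside __pycache__ are already covered by their directory.
    hits += [
        p for p in root.rglob("*.pyc")
        if "__pycache__" not in p.relative_to(root).parts
    ]
    return sorted(hits)
```

Calling find_python_caches(Path(".")) from the repository root returns the candidate paths, which you can print or count before deciding to reset.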
Package Virtual Environments
The build script creates a shared virtual environment for all packages. The workspace configuration automatically handles import paths with the correct Python path settings.
Running Examples
The Python workspace includes launch configurations for all packages:
- "Run Computer Examples" - Runs computer examples
- "Run Computer API Server" - Runs the computer-server
- "Run Omni Agent Examples" - Runs agent examples
- "SOM" configurations - Various settings for running SOM
To run examples:
- Open the workspace file (.vscode/py.code-workspace)
- Press F5 or use the Run/Debug view
- Select the desired configuration
The workspace also includes compound launch configurations:
- "Run Computer Examples + Server" - Runs both the Computer Examples and Server simultaneously
Release and Publishing Process
This monorepo contains multiple Python packages that can be published to PyPI. The packages have dependencies on each other in the following order:
- pylume - Base package for VM management
- cua-computer - Computer control interface (depends on pylume)
- cua-som - Parser for UI elements (independent, formerly omniparser)
- cua-agent - AI agent (depends on cua-computer and optionally cua-som)
- computer-server - Server component installed on the sandbox
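The dependency order above can be checked mechanically with a topological sort. This is a minimal sketch using the standard-library graphlib module, not the logic of the actual publish workflows. The DEPENDS_ON table is transcribed from the list above, with two assumptions: the optional cua-som dependency of cua-agent is treated as a real edge, and computer-server is treated as independent because the list states no dependencies for it.

```python
from graphlib import TopologicalSorter

# Each package maps to the packages that must be published before it.
DEPENDS_ON = {
    "pylume": set(),
    "cua-computer": {"pylume"},
    "cua-som": set(),
    "cua-agent": {"cua-computer", "cua-som"},  # cua-som is optional; assumed here
    "computer-server": set(),  # assumption: no dependencies are listed
}


def publish_order(graph):
    """Return one valid order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = publish_order(DEPENDS_ON)
print(order)
```

Several orders are valid because independent packages can be published at any point; the only guarantee is that pylume precedes cua-computer, and that cua-computer and cua-som precede cua-agent.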
Workflow Structure
The publishing process is managed by these GitHub workflow files:
- Package-specific workflows:
  - .github/workflows/publish-pylume.yml
  - .github/workflows/publish-computer.yml
  - .github/workflows/publish-som.yml
  - .github/workflows/publish-agent.yml
  - .github/workflows/publish-computer-server.yml
- Coordinator workflow:
  - .github/workflows/publish-all.yml - Manages global releases and manual selections
Version Management
Special Considerations for Pylume
The pylume package requires special handling as it incorporates the binary executable from the lume repository:
- When releasing pylume, ensure the version matches a corresponding release in the lume repository
- The workflow automatically downloads the matching lume binary and includes it in the pylume package
- If you need to release a new version of pylume, make sure to coordinate with a matching lume release
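A quick pre-release check can catch a mismatch between the two versions before the workflow runs. The sketch below is hypothetical and is not taken from the publish workflow. It assumes that "matches" means the pylume version and the lume release tag are the same string, apart from an optional leading "v" on the tag; if the project uses a different matching rule, adjust the comparison accordingly.

```python
def normalize(version: str) -> str:
    """Strip surrounding whitespace and one optional leading 'v' or 'V'."""
    version = version.strip()
    return version[1:] if version[:1] in ("v", "V") else version


def versions_match(pylume_version: str, lume_tag: str) -> bool:
    """Assumed rule: versions match when equal after normalization."""
    return normalize(pylume_version) == normalize(lume_tag)


if __name__ == "__main__":
    # Example values only; substitute the versions you intend to release.
    print(versions_match("0.1.8", "v0.1.8"))
    print(versions_match("0.1.8", "v0.1.9"))
```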
Development Workspaces
This monorepo includes multiple VS Code workspace configurations to optimize the development experience based on which components you're working with:
Available Workspace Files
- py.code-workspace: For Python package development (Computer, Agent, SOM, etc.)
- lume.code-workspace: For Swift-based Lume development
To open a specific workspace:
# For Python development
code .vscode/py.code-workspace
# For Lume (Swift) development
code .vscode/lume.code-workspace