Merge branch 'main' into feat/docs/init

Morgan Dean
2025-07-02 14:10:42 -07:00
295 changed files with 11380 additions and 2864 deletions


@@ -154,7 +154,7 @@ weights/icon_detect/model.pt
weights/icon_detect/model.pt.zip
weights/icon_detect/model.pt.zip.part*
libs/omniparser/weights/icon_detect/model.pt
libs/python/omniparser/weights/icon_detect/model.pt
# Example test data and output
examples/test_data/

66
.devcontainer/README.md Normal file

@@ -0,0 +1,66 @@
# Dev Container Setup
This repository includes a Dev Container configuration that reduces the development setup to a few steps:
## Quick Start
![Clipboard-20250611-180809-459](https://github.com/user-attachments/assets/447eaeeb-0eec-4354-9a82-44446e202e06)
1. **Install the Dev Containers extension ([VS Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or [WindSurf](https://docs.windsurf.com/windsurf/advanced#dev-containers-beta))**
2. **Open the repository in the Dev Container:**
- Press `Ctrl+Shift+P` (or `⌘+Shift+P` on macOS)
- Select `Dev Containers: Clone Repository in Container Volume...` and paste the repository URL `https://github.com/trycua/cua.git` if you have not cloned the repository yet, or select `Dev Containers: Open Folder in Container...` if you already have a local clone.
> **Note**: On WindSurf, the post-install hook might not run automatically. If it does not, run `/bin/bash .devcontainer/post-install.sh` manually.
3. **Open the VS Code workspace:** Once `post-install.sh` has finished running, open the `.vscode/py.code-workspace` workspace and press ![Open Workspace](https://github.com/user-attachments/assets/923bdd43-8c8f-4060-8d78-75bfa302b48c).
4. **Run the Agent UI example:** Click ![Run Agent UI](https://github.com/user-attachments/assets/7a61ef34-4b22-4dab-9864-f86bf83e290b) to start the Gradio UI. If prompted to install **debugpy (Python Debugger)** to enable remote debugging, select 'Yes' to proceed.
5. **Access the Gradio UI:** The Gradio UI is served at `http://localhost:7860`, and the port is forwarded to your host machine automatically.
## What's Included
The dev container automatically:
- ✅ Sets up a Python 3.11 environment
- ✅ Installs all system dependencies (build tools, OpenGL, etc.)
- ✅ Configures Python paths for all packages
- ✅ Installs Python extensions (Black, Ruff, Pylance)
- ✅ Forwards port 7860 for the Gradio web UI
- ✅ Mounts your source code for live editing
- ✅ Creates the required `.env.local` file
## Running Examples
After the container is built, you can run examples directly:
```bash
# Run the agent UI (Gradio web interface)
python examples/agent_ui_examples.py
# Run computer examples
python examples/computer_examples.py
# Run computer UI examples
python examples/computer_ui_examples.py
```
The Gradio UI is served at `http://localhost:7860`, and the port is forwarded to your host machine automatically.
## Environment Variables
You'll need to add your API keys to `.env.local`:
```bash
# Required for Anthropic provider
ANTHROPIC_API_KEY=your_anthropic_key_here
# Required for OpenAI provider
OPENAI_API_KEY=your_openai_key_here
```
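As a rough illustration of how these values might be checked before running an example, the sketch below parses a `.env.local`-style file and reports which provider keys are still unset. The helper names (`parse_env_text`, `missing_keys`, `load_env_file`) are hypothetical and not part of this repository; the examples themselves may load the file differently.

```python
from pathlib import Path

# Keys named in the section above; each is only needed for its own provider.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, keys=PROVIDER_KEYS):
    """Return the keys that are absent or empty."""
    return [key for key in keys if not values.get(key)]


def load_env_file(path=".env.local"):
    """Read the file if it exists; an absent file yields an empty mapping."""
    env_path = Path(path)
    return parse_env_text(env_path.read_text()) if env_path.exists() else {}


if __name__ == "__main__":
    unset = missing_keys(load_env_file())
    if unset:
        print("Not set in .env.local:", ", ".join(unset))
```

Running the script from the repository root only prints a reminder; it does not modify `.env.local`.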
## Notes
- The container connects to `host.docker.internal:7777` for Lume server communication
- All Python packages are pre-installed and configured
- Source code changes are reflected immediately (no rebuild needed)
- The container uses the same Dockerfile as the regular Docker development environment


@@ -0,0 +1,18 @@
{
"name": "C/ua - OSS",
"build": {
"dockerfile": "../Dockerfile"
},
"containerEnv": {
"DISPLAY": "",
"PYLUME_HOST": "host.docker.internal"
},
"forwardPorts": [7860],
"portsAttributes": {
"7860": {
"label": "C/ua web client (Gradio)",
"onAutoForward": "silent"
}
},
"postCreateCommand": "/bin/bash .devcontainer/post-install.sh"
}

28
.devcontainer/post-install.sh Executable file

@@ -0,0 +1,28 @@
#!/usr/bin/env bash
WORKSPACE="/workspaces/cua"
# Setup .env.local
echo "PYTHON_BIN=python" > /workspaces/cua/.env.local
# Run /scripts/build.sh
./scripts/build.sh
# ---
# Build is complete. Show user a clear message to open the workspace manually.
# ---
cat << 'EOM'
============================================
🚀 Build complete!
👉 Next steps:
1. Open '.vscode/py.code-workspace'
2. Press 'Open Workspace'
Happy coding!
============================================
EOM

2
.gitattributes vendored Normal file

@@ -0,0 +1,2 @@
* text=auto
*.sh text eol=lf


@@ -0,0 +1,50 @@
name: Publish @trycua/computer to npm
on:
push:
branches: main
jobs:
publish:
permissions:
id-token: write
contents: read
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Use Node.js 24.x
uses: actions/setup-node@v4
with:
node-version: "24.x"
registry-url: "https://registry.npmjs.org"
- name: Setup pnpm 10
uses: pnpm/action-setup@v4
with:
version: 10
- name: Check if version changed
id: check-version
uses: EndBug/version-check@v2
with:
file-name: libs/typescript/computer/package.json
diff-search: true
- name: Install dependencies
if: steps.check-version.outputs.changed == 'true'
working-directory: ./libs/typescript/computer
run: pnpm install --frozen-lockfile
- name: Build package
if: steps.check-version.outputs.changed == 'true'
working-directory: ./libs/typescript/computer
run: pnpm run build --if-present
- name: Publish to npm
if: steps.check-version.outputs.changed == 'true'
working-directory: ./libs/typescript/computer
run: pnpm publish --access public --no-git-checks
env:
NPM_CONFIG_PROVENANCE: true
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

50
.github/workflows/npm-publish-core.yml vendored Normal file

@@ -0,0 +1,50 @@
name: Publish @trycua/core to npm
on:
push:
branches: main
jobs:
publish:
permissions:
id-token: write
contents: read
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Use Node.js 24.x
uses: actions/setup-node@v4
with:
node-version: "24.x"
registry-url: "https://registry.npmjs.org"
- name: Setup pnpm 10
uses: pnpm/action-setup@v4
with:
version: 10
- name: Check if version changed
id: check-version
uses: EndBug/version-check@v2
with:
file-name: libs/typescript/core/package.json
diff-search: true
- name: Install dependencies
if: steps.check-version.outputs.changed == 'true'
working-directory: ./libs/typescript/core
run: pnpm install --frozen-lockfile
- name: Build package
if: steps.check-version.outputs.changed == 'true'
working-directory: ./libs/typescript/core
run: pnpm run build --if-present
- name: Publish to npm
if: steps.check-version.outputs.changed == 'true'
working-directory: ./libs/typescript/core
run: pnpm publish --access public --no-git-checks
env:
NPM_CONFIG_PROVENANCE: true
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}


@@ -1,162 +0,0 @@
name: Publish Agent Package
on:
push:
tags:
- 'agent-v*'
workflow_dispatch:
inputs:
version:
description: 'Version to publish (without v prefix)'
required: true
default: '0.1.0'
workflow_call:
inputs:
version:
description: 'Version to publish'
required: true
type: string
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
computer_version: ${{ steps.update-deps.outputs.computer_version }}
som_version: ${{ steps.update-deps.outputs.som_version }}
core_version: ${{ steps.update-deps.outputs.core_version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/agent-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for agent"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Update dependencies to latest versions
id: update-deps
run: |
cd libs/agent
# Install required package for PyPI API access
pip install requests
# Create a more robust Python script for PyPI version checking
cat > get_latest_versions.py << 'EOF'
import requests
import json
import sys
def get_package_version(package_name, fallback="0.1.0"):
try:
response = requests.get(f'https://pypi.org/pypi/{package_name}/json')
print(f"API Response Status for {package_name}: {response.status_code}", file=sys.stderr)
if response.status_code != 200:
print(f"API request failed for {package_name}, using fallback version", file=sys.stderr)
return fallback
data = json.loads(response.text)
if 'info' not in data:
print(f"Missing 'info' key in API response for {package_name}, using fallback version", file=sys.stderr)
return fallback
return data['info']['version']
except Exception as e:
print(f"Error fetching version for {package_name}: {str(e)}", file=sys.stderr)
return fallback
# Get latest versions
print(get_package_version('cua-computer'))
print(get_package_version('cua-som'))
print(get_package_version('cua-core'))
EOF
# Execute the script to get the versions
VERSIONS=($(python get_latest_versions.py))
LATEST_COMPUTER=${VERSIONS[0]}
LATEST_SOM=${VERSIONS[1]}
LATEST_CORE=${VERSIONS[2]}
echo "Latest cua-computer version: $LATEST_COMPUTER"
echo "Latest cua-som version: $LATEST_SOM"
echo "Latest cua-core version: $LATEST_CORE"
# Output the versions for the next job
echo "computer_version=$LATEST_COMPUTER" >> $GITHUB_OUTPUT
echo "som_version=$LATEST_SOM" >> $GITHUB_OUTPUT
echo "core_version=$LATEST_CORE" >> $GITHUB_OUTPUT
# Determine major version for version constraint
COMPUTER_MAJOR=$(echo $LATEST_COMPUTER | cut -d. -f1)
SOM_MAJOR=$(echo $LATEST_SOM | cut -d. -f1)
CORE_MAJOR=$(echo $LATEST_CORE | cut -d. -f1)
NEXT_COMPUTER_MAJOR=$((COMPUTER_MAJOR + 1))
NEXT_SOM_MAJOR=$((SOM_MAJOR + 1))
NEXT_CORE_MAJOR=$((CORE_MAJOR + 1))
# Update dependencies in pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
sed -i '' "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
sed -i '' "s/\"cua-som>=.*,<.*\"/\"cua-som>=$LATEST_SOM,<$NEXT_SOM_MAJOR.0.0\"/" pyproject.toml
sed -i '' "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
else
# Linux version
sed -i "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
sed -i "s/\"cua-som>=.*,<.*\"/\"cua-som>=$LATEST_SOM,<$NEXT_SOM_MAJOR.0.0\"/" pyproject.toml
sed -i "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
fi
# Display the updated dependencies
echo "Updated dependencies in pyproject.toml:"
grep -E "cua-computer|cua-som|cua-core" pyproject.toml
publish:
needs: prepare
uses: ./.github/workflows/reusable-publish.yml
with:
package_name: "agent"
package_dir: "libs/agent"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-agent"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "COMPUTER_VERSION=${{ needs.prepare.outputs.computer_version }}" >> $GITHUB_ENV
echo "SOM_VERSION=${{ needs.prepare.outputs.som_version }}" >> $GITHUB_ENV
echo "CORE_VERSION=${{ needs.prepare.outputs.core_version }}" >> $GITHUB_ENV
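
The `Determine version` step above can be read as a three-way branch on the triggering event. The following Python sketch mirrors that branch under stated assumptions: the function name `resolve_version` and its parameters are hypothetical stand-ins for the `github.event_name`, `github.ref`, and `inputs.version` expressions, and the regular expression is copied from the shell step. Like the original, the pattern is not anchored at the end, so a tag with a trailing suffix still yields its leading `X.Y.Z`.

```python
import re

# Same pattern as the bash [[ =~ ]] test in the workflow step.
TAG_PATTERN = re.compile(r"^refs/tags/agent-v([0-9]+\.[0-9]+\.[0-9]+)")


def resolve_version(event_name, ref="", dispatch_input="", call_input=""):
    """Pick the version to publish based on how the workflow was triggered."""
    if event_name == "push":
        match = TAG_PATTERN.match(ref)
        if match is None:
            # The shell step prints this message and exits with status 1.
            raise ValueError("Invalid tag format for agent")
        return match.group(1)
    if event_name == "workflow_dispatch":
        return dispatch_input
    # Any other event is treated as workflow_call, as in the shell step.
    return call_input


if __name__ == "__main__":
    print(resolve_version("push", ref="refs/tags/agent-v1.2.3"))
```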


@@ -1,80 +0,0 @@
name: Publish Computer Server Package
on:
push:
tags:
- 'computer-server-v*'
workflow_dispatch:
inputs:
version:
description: 'Version to publish (without v prefix)'
required: true
default: '0.1.0'
workflow_call:
inputs:
version:
description: 'Version to publish'
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.prepare.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/computer-server-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for computer-server"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
publish:
needs: prepare
uses: ./.github/workflows/reusable-publish.yml
with:
package_name: "computer-server"
package_dir: "libs/computer-server"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-computer-server"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "COMPUTER_VERSION=${{ needs.prepare.outputs.version }}" >> $GITHUB_ENV


@@ -1,140 +0,0 @@
name: Publish Computer Package
on:
push:
tags:
- 'computer-v*'
workflow_dispatch:
inputs:
version:
description: 'Version to publish (without v prefix)'
required: true
default: '0.1.0'
workflow_call:
inputs:
version:
description: 'Version to publish'
required: true
type: string
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
core_version: ${{ steps.update-deps.outputs.core_version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/computer-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for computer"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Update dependencies to latest versions
id: update-deps
run: |
cd libs/computer
# Install required package for PyPI API access
pip install requests
# Create a more robust Python script for PyPI version checking
cat > get_latest_versions.py << 'EOF'
import requests
import json
import sys
def get_package_version(package_name, fallback="0.1.0"):
try:
response = requests.get(f'https://pypi.org/pypi/{package_name}/json')
print(f"API Response Status for {package_name}: {response.status_code}", file=sys.stderr)
if response.status_code != 200:
print(f"API request failed for {package_name}, using fallback version", file=sys.stderr)
return fallback
data = json.loads(response.text)
if 'info' not in data:
print(f"Missing 'info' key in API response for {package_name}, using fallback version", file=sys.stderr)
return fallback
return data['info']['version']
except Exception as e:
print(f"Error fetching version for {package_name}: {str(e)}", file=sys.stderr)
return fallback
# Get latest versions
print(get_package_version('cua-core'))
EOF
# Execute the script to get the versions
VERSIONS=($(python get_latest_versions.py))
LATEST_CORE=${VERSIONS[0]}
echo "Latest cua-core version: $LATEST_CORE"
# Output the versions for the next job
echo "core_version=$LATEST_CORE" >> $GITHUB_OUTPUT
# Determine major version for version constraint
CORE_MAJOR=$(echo $LATEST_CORE | cut -d. -f1)
NEXT_CORE_MAJOR=$((CORE_MAJOR + 1))
# Update dependencies in pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
sed -i '' "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
else
# Linux version
sed -i "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
fi
# Display the updated dependencies
echo "Updated dependencies in pyproject.toml:"
grep -E "cua-core" pyproject.toml
publish:
needs: prepare
uses: ./.github/workflows/reusable-publish.yml
with:
package_name: "computer"
package_dir: "libs/computer"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-computer"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "CORE_VERSION=${{ needs.prepare.outputs.core_version }}" >> $GITHUB_ENV
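
The `sed` rewrite in the `Update dependencies to latest versions` step above pins a dependency to the range from the latest published version up to, but excluding, the next major version. A minimal Python sketch of the same substitution follows; `bump_constraint` is a hypothetical helper, not something defined in this repository, and it assumes the dependency is written as a double-quoted `"name>=A,<B"` string on a single line, as the `sed` pattern does.

```python
import re


def bump_constraint(pyproject_text, package, latest):
    """Rewrite "package>=A,<B" to "package>=latest,<(major+1).0.0"."""
    next_major = int(latest.split(".")[0]) + 1
    # Equivalent of the sed pattern: "<package>>=.*,<.*"
    pattern = re.compile('"' + re.escape(package) + '>=.*,<.*"')
    replacement = f'"{package}>={latest},<{next_major}.0.0"'
    # A callable replacement avoids any backslash interpretation in the text.
    return pattern.sub(lambda _match: replacement, pyproject_text)


if __name__ == "__main__":
    sample = 'dependencies = [\n    "cua-core>=0.1.0,<0.2.0",\n]\n'
    print(bump_constraint(sample, "cua-core", "0.1.8"))
```

Because the pattern requires `>=` directly after the package name, a call for `cua-computer` leaves a `cua-computer-server` entry untouched.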


@@ -1,63 +0,0 @@
name: Publish Core Package
on:
push:
tags:
- 'core-v*'
workflow_dispatch:
inputs:
version:
description: 'Version to publish (without v prefix)'
required: true
default: '0.1.0'
workflow_call:
inputs:
version:
description: 'Version to publish'
required: true
type: string
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/core-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for core"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
publish:
needs: prepare
uses: ./.github/workflows/reusable-publish.yml
with:
package_name: "core"
package_dir: "libs/core"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-core"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}


@@ -3,17 +3,17 @@ name: Publish Notarized Lume
on:
push:
tags:
- 'lume-v*'
- "lume-v*"
workflow_dispatch:
inputs:
version:
description: 'Version to notarize (without v prefix)'
description: "Version to notarize (without v prefix)"
required: true
default: '0.1.0'
default: "0.1.0"
workflow_call:
inputs:
version:
description: 'Version to notarize'
description: "Version to notarize"
required: true
type: string
secrets:
@@ -64,7 +64,7 @@ jobs:
- name: Create .release directory
run: mkdir -p .release
- name: Set version
id: set_version
run: |
@@ -82,11 +82,11 @@ jobs:
echo "Error: No version found in tag or input"
exit 1
fi
# Update version in Main.swift
echo "Updating version in Main.swift to $VERSION"
sed -i '' "s/static let current: String = \".*\"/static let current: String = \"$VERSION\"/" libs/lume/src/Main.swift
# Set output for later steps
echo "version=$VERSION" >> $GITHUB_OUTPUT
@@ -106,18 +106,34 @@ jobs:
# Import certificates
echo $APPLICATION_CERT_BASE64 | base64 --decode > application.p12
echo $INSTALLER_CERT_BASE64 | base64 --decode > installer.p12
# Import certificates silently (minimize output)
security import application.p12 -k build.keychain -P "$CERT_PASSWORD" -T /usr/bin/codesign -T /usr/bin/pkgbuild > /dev/null 2>&1
security import installer.p12 -k build.keychain -P "$CERT_PASSWORD" -T /usr/bin/codesign -T /usr/bin/pkgbuild > /dev/null 2>&1
# Allow codesign to access the certificates (minimal output)
security set-key-partition-list -S apple-tool:,apple:,codesign: -s -k "$KEYCHAIN_PASSWORD" build.keychain > /dev/null 2>&1
# Verify certificates were imported but only show count, not details
echo "Verifying signing identity (showing count only)..."
security find-identity -v -p codesigning | grep -c "valid identities found" || true
# Verify certificates were imported
echo "Verifying signing identities..."
CERT_COUNT=$(security find-identity -v -p codesigning build.keychain | grep -c "Developer ID Application" || echo "0")
INSTALLER_COUNT=$(security find-identity -v build.keychain | grep -c "Developer ID Installer" || echo "0")
if [ "$CERT_COUNT" -eq 0 ]; then
echo "Error: No Developer ID Application certificate found"
security find-identity -v -p codesigning build.keychain
exit 1
fi
if [ "$INSTALLER_COUNT" -eq 0 ]; then
echo "Error: No Developer ID Installer certificate found"
security find-identity -v build.keychain
exit 1
fi
echo "Found $CERT_COUNT Developer ID Application certificate(s) and $INSTALLER_COUNT Developer ID Installer certificate(s)"
echo "All required certificates verified successfully"
# Clean up certificate files
rm application.p12 installer.p12
@@ -137,32 +153,32 @@ jobs:
echo "Starting build process..."
echo "Swift version: $(swift --version | head -n 1)"
echo "Building version: $VERSION"
# Ensure .release directory exists
mkdir -p .release
chmod 755 .release
# Build the project first (redirect verbose output)
echo "Building project..."
swift build --configuration release > build.log 2>&1
echo "Build completed."
# Run the notarization script with LOG_LEVEL env var
chmod +x scripts/build/build-release-notarized.sh
cd scripts/build
LOG_LEVEL=minimal ./build-release-notarized.sh
# Return to the lume directory
cd ../..
# Debug: List what files were actually created
echo "Files in .release directory:"
find .release -type f -name "*.tar.gz" -o -name "*.pkg.tar.gz"
# Get architecture for output filename
ARCH=$(uname -m)
OS_IDENTIFIER="darwin-${ARCH}"
# Output paths for later use
echo "tarball_path=.release/lume-${VERSION}-${OS_IDENTIFIER}.tar.gz" >> $GITHUB_OUTPUT
echo "pkg_path=.release/lume-${VERSION}-${OS_IDENTIFIER}.pkg.tar.gz" >> $GITHUB_OUTPUT
@@ -181,12 +197,12 @@ jobs:
shasum -a 256 lume-*.tar.gz >> checksums.txt
echo '```' >> checksums.txt
fi
checksums=$(cat checksums.txt)
echo "checksums<<EOF" >> $GITHUB_OUTPUT
echo "$checksums" >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
# Debug: Show all files in the release directory
echo "All files in release directory:"
ls -la
@@ -197,15 +213,15 @@ jobs:
VERSION=${{ steps.set_version.outputs.version }}
ARCH=$(uname -m)
OS_IDENTIFIER="darwin-${ARCH}"
# Create OS-tagged symlinks
ln -sf "lume-${VERSION}-${OS_IDENTIFIER}.tar.gz" "lume-darwin.tar.gz"
ln -sf "lume-${VERSION}-${OS_IDENTIFIER}.pkg.tar.gz" "lume-darwin.pkg.tar.gz"
# Create simple symlinks
ln -sf "lume-${VERSION}-${OS_IDENTIFIER}.tar.gz" "lume.tar.gz"
ln -sf "lume-${VERSION}-${OS_IDENTIFIER}.pkg.tar.gz" "lume.pkg.tar.gz"
# List all files (including symlinks)
echo "Files with symlinks in release directory:"
ls -la
@@ -237,10 +253,10 @@ jobs:
./libs/lume/.release/lume.pkg.tar.gz
body: |
${{ steps.generate_checksums.outputs.checksums }}
### Installation with script
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
generate_release_notes: true
make_latest: true
make_latest: true


@@ -1,157 +0,0 @@
name: Publish MCP Server Package
on:
push:
tags:
- 'mcp-server-v*'
workflow_dispatch:
inputs:
version:
description: 'Version to publish (without v prefix)'
required: true
default: '0.1.0'
workflow_call:
inputs:
version:
description: 'Version to publish'
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.prepare.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
agent_version: ${{ steps.update-deps.outputs.agent_version }}
computer_version: ${{ steps.update-deps.outputs.computer_version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/mcp-server-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for mcp-server"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Update dependencies to latest versions
id: update-deps
run: |
cd libs/mcp-server
# Install required package for PyPI API access
pip install requests
# Create a Python script for PyPI version checking
cat > get_latest_versions.py << 'EOF'
import requests
import json
import sys
def get_package_version(package_name, fallback="0.1.0"):
try:
response = requests.get(f'https://pypi.org/pypi/{package_name}/json')
print(f"API Response Status for {package_name}: {response.status_code}", file=sys.stderr)
if response.status_code != 200:
print(f"API request failed for {package_name}, using fallback version", file=sys.stderr)
return fallback
data = json.loads(response.text)
if 'info' not in data:
print(f"Missing 'info' key in API response for {package_name}, using fallback version", file=sys.stderr)
return fallback
return data['info']['version']
except Exception as e:
print(f"Error fetching version for {package_name}: {str(e)}", file=sys.stderr)
return fallback
# Get latest versions
print(get_package_version('cua-agent'))
print(get_package_version('cua-computer'))
EOF
# Execute the script to get the versions
VERSIONS=($(python get_latest_versions.py))
LATEST_AGENT=${VERSIONS[0]}
LATEST_COMPUTER=${VERSIONS[1]}
echo "Latest cua-agent version: $LATEST_AGENT"
echo "Latest cua-computer version: $LATEST_COMPUTER"
# Output the versions for the next job
echo "agent_version=$LATEST_AGENT" >> $GITHUB_OUTPUT
echo "computer_version=$LATEST_COMPUTER" >> $GITHUB_OUTPUT
# Determine major version for version constraint
AGENT_MAJOR=$(echo $LATEST_AGENT | cut -d. -f1)
COMPUTER_MAJOR=$(echo $LATEST_COMPUTER | cut -d. -f1)
NEXT_AGENT_MAJOR=$((AGENT_MAJOR + 1))
NEXT_COMPUTER_MAJOR=$((COMPUTER_MAJOR + 1))
# Update dependencies in pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
# Update cua-agent with all extras
sed -i '' "s/\"cua-agent\[all\]>=.*,<.*\"/\"cua-agent[all]>=$LATEST_AGENT,<$NEXT_AGENT_MAJOR.0.0\"/" pyproject.toml
sed -i '' "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
else
# Linux version
sed -i "s/\"cua-agent\[all\]>=.*,<.*\"/\"cua-agent[all]>=$LATEST_AGENT,<$NEXT_AGENT_MAJOR.0.0\"/" pyproject.toml
sed -i "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
fi
# Display the updated dependencies
echo "Updated dependencies in pyproject.toml:"
grep -E "cua-agent|cua-computer" pyproject.toml
publish:
needs: prepare
uses: ./.github/workflows/reusable-publish.yml
with:
package_name: "mcp-server"
package_dir: "libs/mcp-server"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-mcp-server"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "AGENT_VERSION=${{ needs.prepare.outputs.agent_version }}" >> $GITHUB_ENV
echo "COMPUTER_VERSION=${{ needs.prepare.outputs.computer_version }}" >> $GITHUB_ENV


@@ -1,82 +0,0 @@
name: Publish Pylume Package
on:
push:
tags:
- 'pylume-v*'
workflow_dispatch:
inputs:
version:
description: 'Version to publish (without v prefix)'
required: true
default: '0.1.0'
workflow_call:
inputs:
version:
description: 'Version to publish'
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.determine-version.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
determine-version:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/pylume-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for pylume"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
validate-version:
runs-on: macos-latest
needs: determine-version
steps:
- uses: actions/checkout@v4
- name: Validate version
id: validate-version
run: |
CODE_VERSION=$(grep '__version__' libs/pylume/pylume/__init__.py | cut -d'"' -f2)
if [ "${{ needs.determine-version.outputs.version }}" != "$CODE_VERSION" ]; then
echo "Version mismatch: expected $CODE_VERSION, got ${{ needs.determine-version.outputs.version }}"
exit 1
fi
echo "Version validated: $CODE_VERSION"
publish:
needs: determine-version
uses: ./.github/workflows/reusable-publish.yml
with:
package_name: "pylume"
package_dir: "libs/pylume"
version: ${{ needs.determine-version.outputs.version }}
is_lume_package: true
base_package_name: "pylume"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
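
The `validate-version` job above compares the requested version with the `__version__` string in `libs/pylume/pylume/__init__.py`. The sketch below shows the same check in Python; `read_code_version` and `validate_version` are hypothetical names, and the regular expression assumes a double-quoted assignment, which is what the `cut -d'"' -f2` call in the shell step also relies on.

```python
import re


def read_code_version(init_source):
    """Return the value of __version__ = "..." from module source, or None."""
    match = re.search(r'__version__\s*=\s*"([^"]+)"', init_source)
    return match.group(1) if match else None


def validate_version(requested, init_source):
    """Raise if the requested version differs from the one in the code."""
    code_version = read_code_version(init_source)
    if code_version != requested:
        # Same wording as the shell step's error message.
        raise ValueError(
            f"Version mismatch: expected {code_version}, got {requested}"
        )
    return code_version


if __name__ == "__main__":
    print(validate_version("0.2.1", '__version__ = "0.2.1"\n'))
```

Note that in the workflow above the `publish` job lists only `determine-version` under `needs`, so the validation job appears to run alongside publishing rather than gating it.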


@@ -1,67 +0,0 @@
name: Publish SOM Package
on:
push:
tags:
- 'som-v*'
workflow_dispatch:
inputs:
version:
description: 'Version to publish (without v prefix)'
required: true
default: '0.1.0'
workflow_call:
inputs:
version:
description: 'Version to publish'
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.determine-version.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
determine-version:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/som-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for som"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
publish:
needs: determine-version
uses: ./.github/workflows/reusable-publish.yml
with:
package_name: "som"
package_dir: "libs/som"
version: ${{ needs.determine-version.outputs.version }}
is_lume_package: false
base_package_name: "cua-som"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}

162
.github/workflows/pypi-publish-agent.yml vendored Normal file

@@ -0,0 +1,162 @@
name: Publish Agent Package
on:
push:
tags:
- "agent-v*"
workflow_dispatch:
inputs:
version:
description: "Version to publish (without v prefix)"
required: true
default: "0.1.0"
workflow_call:
inputs:
version:
description: "Version to publish"
required: true
type: string
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
computer_version: ${{ steps.update-deps.outputs.computer_version }}
som_version: ${{ steps.update-deps.outputs.som_version }}
core_version: ${{ steps.update-deps.outputs.core_version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/agent-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for agent"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Update dependencies to latest versions
id: update-deps
run: |
cd libs/python/agent
# Install required package for PyPI API access
pip install requests
# Create a more robust Python script for PyPI version checking
cat > get_latest_versions.py << 'EOF'
import requests
import json
import sys
def get_package_version(package_name, fallback="0.1.0"):
try:
response = requests.get(f'https://pypi.org/pypi/{package_name}/json')
print(f"API Response Status for {package_name}: {response.status_code}", file=sys.stderr)
if response.status_code != 200:
print(f"API request failed for {package_name}, using fallback version", file=sys.stderr)
return fallback
data = json.loads(response.text)
if 'info' not in data:
print(f"Missing 'info' key in API response for {package_name}, using fallback version", file=sys.stderr)
return fallback
return data['info']['version']
except Exception as e:
print(f"Error fetching version for {package_name}: {str(e)}", file=sys.stderr)
return fallback
# Get latest versions
print(get_package_version('cua-computer'))
print(get_package_version('cua-som'))
print(get_package_version('cua-core'))
EOF
# Execute the script to get the versions
VERSIONS=($(python get_latest_versions.py))
LATEST_COMPUTER=${VERSIONS[0]}
LATEST_SOM=${VERSIONS[1]}
LATEST_CORE=${VERSIONS[2]}
echo "Latest cua-computer version: $LATEST_COMPUTER"
echo "Latest cua-som version: $LATEST_SOM"
echo "Latest cua-core version: $LATEST_CORE"
# Output the versions for the next job
echo "computer_version=$LATEST_COMPUTER" >> $GITHUB_OUTPUT
echo "som_version=$LATEST_SOM" >> $GITHUB_OUTPUT
echo "core_version=$LATEST_CORE" >> $GITHUB_OUTPUT
# Determine major version for version constraint
COMPUTER_MAJOR=$(echo $LATEST_COMPUTER | cut -d. -f1)
SOM_MAJOR=$(echo $LATEST_SOM | cut -d. -f1)
CORE_MAJOR=$(echo $LATEST_CORE | cut -d. -f1)
NEXT_COMPUTER_MAJOR=$((COMPUTER_MAJOR + 1))
NEXT_SOM_MAJOR=$((SOM_MAJOR + 1))
NEXT_CORE_MAJOR=$((CORE_MAJOR + 1))
# Update dependencies in pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
sed -i '' "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
sed -i '' "s/\"cua-som>=.*,<.*\"/\"cua-som>=$LATEST_SOM,<$NEXT_SOM_MAJOR.0.0\"/" pyproject.toml
sed -i '' "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
else
# Linux version
sed -i "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
sed -i "s/\"cua-som>=.*,<.*\"/\"cua-som>=$LATEST_SOM,<$NEXT_SOM_MAJOR.0.0\"/" pyproject.toml
sed -i "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
fi
# Display the updated dependencies
echo "Updated dependencies in pyproject.toml:"
grep -E "cua-computer|cua-som|cua-core" pyproject.toml
publish:
needs: prepare
uses: ./.github/workflows/pypi-reusable-publish.yml
with:
package_name: "agent"
package_dir: "libs/python/agent"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-agent"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "COMPUTER_VERSION=${{ needs.prepare.outputs.computer_version }}" >> $GITHUB_ENV
echo "SOM_VERSION=${{ needs.prepare.outputs.som_version }}" >> $GITHUB_ENV
echo "CORE_VERSION=${{ needs.prepare.outputs.core_version }}" >> $GITHUB_ENV
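
Note on the version logic above: each publish workflow repeats the same two pieces of shell logic, parsing `X.Y.Z` out of a `<package>-vX.Y.Z` tag ref and writing a `>=latest,<next-major.0.0` constraint into `pyproject.toml`. A minimal Python sketch of both follows, for reasoning about edge cases outside CI. The function names `version_from_ref` and `dependency_constraint` are hypothetical and are not part of the workflows.

```python
import re

# Same shape as the bash regex in the "Determine version" steps; the package
# name is substituted in. Like the bash version, it is not anchored at the end.
SEMVER_TAG = r"^refs/tags/{package}-v([0-9]+\.[0-9]+\.[0-9]+)"


def version_from_ref(ref: str, package: str) -> str:
    """Return the X.Y.Z part of a ref such as refs/tags/agent-v0.2.1."""
    match = re.match(SEMVER_TAG.format(package=re.escape(package)), ref)
    if match is None:
        raise ValueError(f"Invalid tag format for {package}")
    return match.group(1)


def dependency_constraint(name: str, latest: str) -> str:
    """Build the '>=latest,<next-major.0.0' specifier that the sed commands write."""
    next_major = int(latest.split(".")[0]) + 1
    return f"{name}>={latest},<{next_major}.0.0"
```

Because the pattern has no end anchor, a ref with a suffix after the patch number still matches and only the `X.Y.Z` prefix is kept, which mirrors the behavior of the bash `=~` test.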

@@ -0,0 +1,80 @@
name: Publish Computer Server Package
on:
push:
tags:
- "computer-server-v*"
workflow_dispatch:
inputs:
version:
description: "Version to publish (without v prefix)"
required: true
default: "0.1.0"
workflow_call:
inputs:
version:
description: "Version to publish"
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.prepare.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/computer-server-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for computer-server"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
publish:
needs: prepare
uses: ./.github/workflows/pypi-reusable-publish.yml
with:
package_name: "computer-server"
package_dir: "libs/python/computer-server"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-computer-server"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "COMPUTER_VERSION=${{ needs.prepare.outputs.version }}" >> $GITHUB_ENV

@@ -0,0 +1,140 @@
name: Publish Computer Package
on:
push:
tags:
- "computer-v*"
workflow_dispatch:
inputs:
version:
description: "Version to publish (without v prefix)"
required: true
default: "0.1.0"
workflow_call:
inputs:
version:
description: "Version to publish"
required: true
type: string
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
core_version: ${{ steps.update-deps.outputs.core_version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/computer-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for computer"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Update dependencies to latest versions
id: update-deps
run: |
cd libs/python/computer
# Install required package for PyPI API access
pip install requests
# Create a more robust Python script for PyPI version checking
cat > get_latest_versions.py << 'EOF'
import requests
import json
import sys
def get_package_version(package_name, fallback="0.1.0"):
try:
response = requests.get(f'https://pypi.org/pypi/{package_name}/json')
print(f"API Response Status for {package_name}: {response.status_code}", file=sys.stderr)
if response.status_code != 200:
print(f"API request failed for {package_name}, using fallback version", file=sys.stderr)
return fallback
data = json.loads(response.text)
if 'info' not in data:
print(f"Missing 'info' key in API response for {package_name}, using fallback version", file=sys.stderr)
return fallback
return data['info']['version']
except Exception as e:
print(f"Error fetching version for {package_name}: {str(e)}", file=sys.stderr)
return fallback
# Get latest versions
print(get_package_version('cua-core'))
EOF
# Execute the script to get the versions
VERSIONS=($(python get_latest_versions.py))
LATEST_CORE=${VERSIONS[0]}
echo "Latest cua-core version: $LATEST_CORE"
# Output the versions for the next job
echo "core_version=$LATEST_CORE" >> $GITHUB_OUTPUT
# Determine major version for version constraint
CORE_MAJOR=$(echo $LATEST_CORE | cut -d. -f1)
NEXT_CORE_MAJOR=$((CORE_MAJOR + 1))
# Update dependencies in pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
sed -i '' "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
else
# Linux version
sed -i "s/\"cua-core>=.*,<.*\"/\"cua-core>=$LATEST_CORE,<$NEXT_CORE_MAJOR.0.0\"/" pyproject.toml
fi
# Display the updated dependencies
echo "Updated dependencies in pyproject.toml:"
grep -E "cua-core" pyproject.toml
publish:
needs: prepare
uses: ./.github/workflows/pypi-reusable-publish.yml
with:
package_name: "computer"
package_dir: "libs/python/computer"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-computer"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "CORE_VERSION=${{ needs.prepare.outputs.core_version }}" >> $GITHUB_ENV

.github/workflows/pypi-publish-core.yml vendored Normal file
@@ -0,0 +1,63 @@
name: Publish Core Package
on:
push:
tags:
- "core-v*"
workflow_dispatch:
inputs:
version:
description: "Version to publish (without v prefix)"
required: true
default: "0.1.0"
workflow_call:
inputs:
version:
description: "Version to publish"
required: true
type: string
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/core-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for core"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
publish:
needs: prepare
uses: ./.github/workflows/pypi-reusable-publish.yml
with:
package_name: "core"
package_dir: "libs/python/core"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-core"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}

@@ -0,0 +1,157 @@
name: Publish MCP Server Package
on:
push:
tags:
- "mcp-server-v*"
workflow_dispatch:
inputs:
version:
description: "Version to publish (without v prefix)"
required: true
default: "0.1.0"
workflow_call:
inputs:
version:
description: "Version to publish"
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.prepare.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
prepare:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
agent_version: ${{ steps.update-deps.outputs.agent_version }}
computer_version: ${{ steps.update-deps.outputs.computer_version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/mcp-server-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for mcp-server"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Update dependencies to latest versions
id: update-deps
run: |
cd libs/python/mcp-server
# Install required package for PyPI API access
pip install requests
# Create a Python script for PyPI version checking
cat > get_latest_versions.py << 'EOF'
import requests
import json
import sys
def get_package_version(package_name, fallback="0.1.0"):
try:
response = requests.get(f'https://pypi.org/pypi/{package_name}/json')
print(f"API Response Status for {package_name}: {response.status_code}", file=sys.stderr)
if response.status_code != 200:
print(f"API request failed for {package_name}, using fallback version", file=sys.stderr)
return fallback
data = json.loads(response.text)
if 'info' not in data:
print(f"Missing 'info' key in API response for {package_name}, using fallback version", file=sys.stderr)
return fallback
return data['info']['version']
except Exception as e:
print(f"Error fetching version for {package_name}: {str(e)}", file=sys.stderr)
return fallback
# Get latest versions
print(get_package_version('cua-agent'))
print(get_package_version('cua-computer'))
EOF
# Execute the script to get the versions
VERSIONS=($(python get_latest_versions.py))
LATEST_AGENT=${VERSIONS[0]}
LATEST_COMPUTER=${VERSIONS[1]}
echo "Latest cua-agent version: $LATEST_AGENT"
echo "Latest cua-computer version: $LATEST_COMPUTER"
# Output the versions for the next job
echo "agent_version=$LATEST_AGENT" >> $GITHUB_OUTPUT
echo "computer_version=$LATEST_COMPUTER" >> $GITHUB_OUTPUT
# Determine major version for version constraint
AGENT_MAJOR=$(echo $LATEST_AGENT | cut -d. -f1)
COMPUTER_MAJOR=$(echo $LATEST_COMPUTER | cut -d. -f1)
NEXT_AGENT_MAJOR=$((AGENT_MAJOR + 1))
NEXT_COMPUTER_MAJOR=$((COMPUTER_MAJOR + 1))
# Update dependencies in pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
# Update cua-agent with all extras
sed -i '' "s/\"cua-agent\[all\]>=.*,<.*\"/\"cua-agent[all]>=$LATEST_AGENT,<$NEXT_AGENT_MAJOR.0.0\"/" pyproject.toml
sed -i '' "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
else
# Linux version
sed -i "s/\"cua-agent\[all\]>=.*,<.*\"/\"cua-agent[all]>=$LATEST_AGENT,<$NEXT_AGENT_MAJOR.0.0\"/" pyproject.toml
sed -i "s/\"cua-computer>=.*,<.*\"/\"cua-computer>=$LATEST_COMPUTER,<$NEXT_COMPUTER_MAJOR.0.0\"/" pyproject.toml
fi
# Display the updated dependencies
echo "Updated dependencies in pyproject.toml:"
grep -E "cua-agent|cua-computer" pyproject.toml
publish:
needs: prepare
uses: ./.github/workflows/pypi-reusable-publish.yml
with:
package_name: "mcp-server"
package_dir: "libs/python/mcp-server"
version: ${{ needs.prepare.outputs.version }}
is_lume_package: false
base_package_name: "cua-mcp-server"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
set-env-variables:
needs: [prepare, publish]
runs-on: macos-latest
steps:
- name: Set environment variables for use in other jobs
run: |
echo "AGENT_VERSION=${{ needs.prepare.outputs.agent_version }}" >> $GITHUB_ENV
echo "COMPUTER_VERSION=${{ needs.prepare.outputs.computer_version }}" >> $GITHUB_ENV

@@ -0,0 +1,82 @@
name: Publish Pylume Package
on:
push:
tags:
- "pylume-v*"
workflow_dispatch:
inputs:
version:
description: "Version to publish (without v prefix)"
required: true
default: "0.1.0"
workflow_call:
inputs:
version:
description: "Version to publish"
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.determine-version.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
determine-version:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/pylume-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for pylume"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
validate-version:
runs-on: macos-latest
needs: determine-version
steps:
- uses: actions/checkout@v4
- name: Validate version
id: validate-version
run: |
CODE_VERSION=$(grep '__version__' libs/python/pylume/pylume/__init__.py | cut -d'"' -f2)
if [ "${{ needs.determine-version.outputs.version }}" != "$CODE_VERSION" ]; then
echo "Version mismatch: expected $CODE_VERSION, got ${{ needs.determine-version.outputs.version }}"
exit 1
fi
echo "Version validated: $CODE_VERSION"
publish:
needs: determine-version
uses: ./.github/workflows/pypi-reusable-publish.yml
with:
package_name: "pylume"
package_dir: "libs/python/pylume"
version: ${{ needs.determine-version.outputs.version }}
is_lume_package: true
base_package_name: "pylume"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
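
Note on the `validate-version` job above: it reads `__version__` with `grep` and `cut -d'"' -f2`, which relies on the assignment using double quotes. The sketch below shows the same comparison in Python and also accepts single quotes. The helper names `read_dunder_version` and `validate` are hypothetical, introduced only for illustration.

```python
import re


def read_dunder_version(init_source: str) -> str:
    """Extract the value of a module-level __version__ assignment."""
    match = re.search(
        r"""^__version__\s*=\s*["']([^"']+)["']""", init_source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def validate(tag_version: str, init_source: str) -> None:
    """Raise if the version being published differs from the one in the code."""
    code_version = read_dunder_version(init_source)
    if tag_version != code_version:
        raise ValueError(
            f"Version mismatch: expected {code_version}, got {tag_version}"
        )
```

In the workflow as written, `publish` declares `needs: determine-version` only, so this check runs alongside publishing rather than before it.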

.github/workflows/pypi-publish-som.yml vendored Normal file
@@ -0,0 +1,67 @@
name: Publish SOM Package
on:
push:
tags:
- "som-v*"
workflow_dispatch:
inputs:
version:
description: "Version to publish (without v prefix)"
required: true
default: "0.1.0"
workflow_call:
inputs:
version:
description: "Version to publish"
required: true
type: string
outputs:
version:
description: "The version that was published"
value: ${{ jobs.determine-version.outputs.version }}
# Adding permissions at workflow level
permissions:
contents: write
jobs:
determine-version:
runs-on: macos-latest
outputs:
version: ${{ steps.get-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: get-version
run: |
if [ "${{ github.event_name }}" == "push" ]; then
# Extract version from tag (for package-specific tags)
if [[ "${{ github.ref }}" =~ ^refs/tags/som-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
VERSION=${BASH_REMATCH[1]}
else
echo "Invalid tag format for som"
exit 1
fi
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
# Use version from workflow dispatch
VERSION=${{ github.event.inputs.version }}
else
# Use version from workflow_call
VERSION=${{ inputs.version }}
fi
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT
publish:
needs: determine-version
uses: ./.github/workflows/pypi-reusable-publish.yml
with:
package_name: "som"
package_dir: "libs/python/som"
version: ${{ needs.determine-version.outputs.version }}
is_lume_package: false
base_package_name: "cua-som"
secrets:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}

@@ -0,0 +1,280 @@
name: Reusable Package Publish Workflow
on:
workflow_call:
inputs:
package_name:
description: "Name of the package (e.g. pylume, computer, agent)"
required: true
type: string
package_dir:
description: "Directory containing the package relative to workspace root (e.g. libs/python/pylume)"
required: true
type: string
version:
description: "Version to publish"
required: true
type: string
is_lume_package:
description: "Whether this package includes the lume binary"
required: false
type: boolean
default: false
base_package_name:
description: "PyPI package name (e.g. pylume, cua-agent)"
required: true
type: string
make_latest:
description: "Whether to mark this release as latest (should only be true for lume)"
required: false
type: boolean
default: false
secrets:
PYPI_TOKEN:
required: true
outputs:
version:
description: "The version that was published"
value: ${{ jobs.build-and-publish.outputs.version }}
jobs:
build-and-publish:
runs-on: macos-latest
permissions:
contents: write # This permission is needed for creating releases
outputs:
version: ${{ steps.set-version.outputs.version }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for release creation
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Create root pdm.lock file
run: |
# Create an empty pdm.lock file in the root
touch pdm.lock
- name: Install PDM
uses: pdm-project/setup-pdm@v3
with:
python-version: "3.11"
cache: true
- name: Set version
id: set-version
run: |
echo "VERSION=${{ inputs.version }}" >> $GITHUB_ENV
echo "version=${{ inputs.version }}" >> $GITHUB_OUTPUT
- name: Initialize PDM in package directory
run: |
# Make sure we're working with a properly initialized PDM project
cd ${{ inputs.package_dir }}
# Create pdm.lock if it doesn't exist
if [ ! -f "pdm.lock" ]; then
echo "No pdm.lock found, initializing PDM project..."
pdm lock
fi
- name: Set version in package
run: |
cd ${{ inputs.package_dir }}
# Replace pdm bump with direct edit of pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
sed -i '' "s/version = \".*\"/version = \"$VERSION\"/" pyproject.toml
else
# Linux version
sed -i "s/version = \".*\"/version = \"$VERSION\"/" pyproject.toml
fi
# Verify version was updated
echo "Updated version in pyproject.toml:"
grep "version =" pyproject.toml
# Conditional step for lume binary download (only for pylume package)
- name: Download and setup lume binary
if: inputs.is_lume_package
run: |
# Create a temporary directory for extraction
mkdir -p temp_lume
# Download the latest lume release directly
echo "Downloading latest lume version..."
curl -sL "https://github.com/trycua/lume/releases/latest/download/lume.tar.gz" -o temp_lume/lume.tar.gz
# Extract the tar file (ignore ownership and suppress warnings)
cd temp_lume && tar --no-same-owner -xzf lume.tar.gz
# Make the binary executable
chmod +x lume
# Copy the lume binary to the correct location in the pylume package
mkdir -p "${GITHUB_WORKSPACE}/${{ inputs.package_dir }}/pylume"
cp lume "${GITHUB_WORKSPACE}/${{ inputs.package_dir }}/pylume/lume"
# Verify the binary exists and is executable
test -x "${GITHUB_WORKSPACE}/${{ inputs.package_dir }}/pylume/lume" || { echo "lume binary not found or not executable"; exit 1; }
# Get the version from the downloaded binary for reference
LUME_VERSION=$(./lume --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' || echo "unknown")
echo "Using lume version: $LUME_VERSION"
# Cleanup
cd "${GITHUB_WORKSPACE}" && rm -rf temp_lume
# Save the lume version for reference
echo "LUME_VERSION=${LUME_VERSION}" >> $GITHUB_ENV
- name: Build and publish
env:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: |
cd ${{ inputs.package_dir }}
# Build with PDM
pdm build
# For pylume package, verify the binary is in the wheel
if [ "${{ inputs.is_lume_package }}" = "true" ]; then
python -m pip install wheel
wheel unpack dist/*.whl --dest temp_wheel
echo "Listing contents of wheel directory:"
find temp_wheel -type f
test -f temp_wheel/pylume-*/pylume/lume || { echo "lume binary not found in wheel"; exit 1; }
rm -rf temp_wheel
echo "Publishing ${{ inputs.base_package_name }} ${VERSION} with lume ${LUME_VERSION}"
else
echo "Publishing ${{ inputs.base_package_name }} ${VERSION}"
fi
# Install and use twine directly instead of PDM publish
echo "Installing twine for direct publishing..."
pip install twine
echo "Publishing to PyPI using twine..."
TWINE_USERNAME="__token__" TWINE_PASSWORD="$PYPI_TOKEN" python -m twine upload dist/*
# Save the wheel file path for the release
WHEEL_FILE=$(ls dist/*.whl | head -1)
echo "WHEEL_FILE=${WHEEL_FILE}" >> $GITHUB_ENV
- name: Prepare Simple Release Notes
if: startsWith(github.ref, 'refs/tags/')
run: |
# Create release notes based on package type
echo "# ${{ inputs.base_package_name }} v${VERSION}" > release_notes.md
echo "" >> release_notes.md
if [ "${{ inputs.package_name }}" = "pylume" ]; then
echo "## Python SDK for lume - run macOS and Linux VMs on Apple Silicon" >> release_notes.md
echo "" >> release_notes.md
echo "This package provides Python bindings for the lume virtualization tool." >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* lume binary: v${LUME_VERSION}" >> release_notes.md
elif [ "${{ inputs.package_name }}" = "computer" ]; then
echo "## Computer control library for the Computer-Use Agent (CUA) project" >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* pylume: ${PYLUME_VERSION:-latest}" >> release_notes.md
elif [ "${{ inputs.package_name }}" = "agent" ]; then
echo "## Dependencies" >> release_notes.md
echo "* cua-computer: ${COMPUTER_VERSION:-latest}" >> release_notes.md
echo "* cua-som: ${SOM_VERSION:-latest}" >> release_notes.md
echo "" >> release_notes.md
echo "## Installation Options" >> release_notes.md
echo "" >> release_notes.md
echo "### Basic installation with Anthropic" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install cua-agent[anthropic]==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
echo "" >> release_notes.md
echo "### With SOM (recommended)" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install cua-agent[som]==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
echo "" >> release_notes.md
echo "### All features" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install cua-agent[all]==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
elif [ "${{ inputs.package_name }}" = "som" ]; then
echo "## Computer Vision and OCR library for detecting and analyzing UI elements" >> release_notes.md
echo "" >> release_notes.md
echo "This package provides enhanced UI understanding capabilities through computer vision and OCR." >> release_notes.md
elif [ "${{ inputs.package_name }}" = "computer-server" ]; then
echo "## Computer Server for the Computer-Use Agent (CUA) project" >> release_notes.md
echo "" >> release_notes.md
echo "A FastAPI-based server implementation for computer control." >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* cua-computer: ${COMPUTER_VERSION:-latest}" >> release_notes.md
echo "" >> release_notes.md
echo "## Usage" >> release_notes.md
echo '```bash' >> release_notes.md
echo "# Run the server" >> release_notes.md
echo "cua-computer-server" >> release_notes.md
echo '```' >> release_notes.md
elif [ "${{ inputs.package_name }}" = "mcp-server" ]; then
echo "## MCP Server for the Computer-Use Agent (CUA)" >> release_notes.md
echo "" >> release_notes.md
echo "This package provides MCP (Model Context Protocol) integration for CUA agents, allowing them to be used with Claude Desktop, Cursor, and other MCP clients." >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* cua-computer: ${COMPUTER_VERSION:-latest}" >> release_notes.md
echo "* cua-agent: ${AGENT_VERSION:-latest}" >> release_notes.md
echo "" >> release_notes.md
echo "## Usage" >> release_notes.md
echo '```bash' >> release_notes.md
echo "# Run the MCP server directly" >> release_notes.md
echo "cua-mcp-server" >> release_notes.md
echo '```' >> release_notes.md
echo "" >> release_notes.md
echo "## Claude Desktop Integration" >> release_notes.md
echo "Add to your Claude Desktop configuration (~/.config/claude-desktop/claude_desktop_config.json or OS-specific location):" >> release_notes.md
echo '```json' >> release_notes.md
echo '"mcpServers": {' >> release_notes.md
echo ' "cua-agent": {' >> release_notes.md
echo ' "command": "cua-mcp-server",' >> release_notes.md
echo ' "args": [],' >> release_notes.md
echo ' "env": {' >> release_notes.md
echo ' "CUA_AGENT_LOOP": "OMNI",' >> release_notes.md
echo ' "CUA_MODEL_PROVIDER": "ANTHROPIC",' >> release_notes.md
echo ' "CUA_MODEL_NAME": "claude-3-opus-20240229",' >> release_notes.md
echo ' "ANTHROPIC_API_KEY": "your-api-key",' >> release_notes.md
echo ' "PYTHONIOENCODING": "utf-8"' >> release_notes.md
echo ' }' >> release_notes.md
echo ' }' >> release_notes.md
echo '}' >> release_notes.md
echo '```' >> release_notes.md
fi
# Add installation section if not agent (which has its own installation section)
if [ "${{ inputs.package_name }}" != "agent" ]; then
echo "" >> release_notes.md
echo "## Installation" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install ${{ inputs.base_package_name }}==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
fi
echo "Release notes created:"
cat release_notes.md
- name: Create GitHub Release
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
with:
name: "${{ inputs.base_package_name }} v${{ env.VERSION }}"
body_path: release_notes.md
files: ${{ inputs.package_dir }}/${{ env.WHEEL_FILE }}
draft: false
prerelease: false
make_latest: ${{ inputs.package_name == 'lume' }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
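
Note on the "Set version in package" step above: the `sed` expression `s/version = ".*"/.../` is not anchored, so it rewrites every line that contains `version = "..."`, including keys such as `python_version = "3.11"`. Whether any package's `pyproject.toml` contains such a key is not shown in this diff, so the following is only a sketch of an anchored alternative. The function name `set_project_version` is hypothetical.

```python
import re


def set_project_version(pyproject_text: str, version: str) -> str:
    """Replace only the first line that starts with `version = "..."`."""
    pattern = re.compile(r'^version\s*=\s*".*"$', re.MULTILINE)
    # A callable replacement avoids backslash and group-reference surprises.
    updated, count = pattern.subn(
        lambda _m: f'version = "{version}"', pyproject_text, count=1
    )
    if count == 0:
        raise ValueError("no line starting with 'version = ' found")
    return updated
```

Anchoring at the start of the line and stopping after one substitution leaves other `*version` keys untouched. It still assumes the project version is the first such line in the file.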

@@ -1,280 +0,0 @@
name: Reusable Package Publish Workflow
on:
workflow_call:
inputs:
package_name:
description: 'Name of the package (e.g. pylume, computer, agent)'
required: true
type: string
package_dir:
description: 'Directory containing the package relative to workspace root (e.g. libs/pylume)'
required: true
type: string
version:
description: 'Version to publish'
required: true
type: string
is_lume_package:
description: 'Whether this package includes the lume binary'
required: false
type: boolean
default: false
base_package_name:
description: 'PyPI package name (e.g. pylume, cua-agent)'
required: true
type: string
make_latest:
description: 'Whether to mark this release as latest (should only be true for lume)'
required: false
type: boolean
default: false
secrets:
PYPI_TOKEN:
required: true
outputs:
version:
description: "The version that was published"
value: ${{ jobs.build-and-publish.outputs.version }}
jobs:
build-and-publish:
runs-on: macos-latest
permissions:
contents: write # This permission is needed for creating releases
outputs:
version: ${{ steps.set-version.outputs.version }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for release creation
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Create root pdm.lock file
run: |
# Create an empty pdm.lock file in the root
touch pdm.lock
- name: Install PDM
uses: pdm-project/setup-pdm@v3
with:
python-version: '3.11'
cache: true
- name: Set version
id: set-version
run: |
echo "VERSION=${{ inputs.version }}" >> $GITHUB_ENV
echo "version=${{ inputs.version }}" >> $GITHUB_OUTPUT
- name: Initialize PDM in package directory
run: |
# Make sure we're working with a properly initialized PDM project
cd ${{ inputs.package_dir }}
# Create pdm.lock if it doesn't exist
if [ ! -f "pdm.lock" ]; then
echo "No pdm.lock found, initializing PDM project..."
pdm lock
fi
- name: Set version in package
run: |
cd ${{ inputs.package_dir }}
# Replace pdm bump with direct edit of pyproject.toml
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS version of sed needs an empty string for -i
sed -i '' "s/version = \".*\"/version = \"$VERSION\"/" pyproject.toml
else
# Linux version
sed -i "s/version = \".*\"/version = \"$VERSION\"/" pyproject.toml
fi
# Verify version was updated
echo "Updated version in pyproject.toml:"
grep "version =" pyproject.toml
# Conditional step for lume binary download (only for pylume package)
- name: Download and setup lume binary
if: inputs.is_lume_package
run: |
# Create a temporary directory for extraction
mkdir -p temp_lume
# Download the latest lume release directly
echo "Downloading latest lume version..."
curl -sL "https://github.com/trycua/lume/releases/latest/download/lume.tar.gz" -o temp_lume/lume.tar.gz
# Extract the tar file (ignore ownership and suppress warnings)
cd temp_lume && tar --no-same-owner -xzf lume.tar.gz
# Make the binary executable
chmod +x lume
# Copy the lume binary to the correct location in the pylume package
mkdir -p "${GITHUB_WORKSPACE}/${{ inputs.package_dir }}/pylume"
cp lume "${GITHUB_WORKSPACE}/${{ inputs.package_dir }}/pylume/lume"
# Verify the binary exists and is executable
test -x "${GITHUB_WORKSPACE}/${{ inputs.package_dir }}/pylume/lume" || { echo "lume binary not found or not executable"; exit 1; }
# Get the version from the downloaded binary for reference
LUME_VERSION=$(./lume --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' || echo "unknown")
echo "Using lume version: $LUME_VERSION"
# Cleanup
cd "${GITHUB_WORKSPACE}" && rm -rf temp_lume
# Save the lume version for reference
echo "LUME_VERSION=${LUME_VERSION}" >> $GITHUB_ENV
- name: Build and publish
env:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: |
cd ${{ inputs.package_dir }}
# Build with PDM
pdm build
# For pylume package, verify the binary is in the wheel
if [ "${{ inputs.is_lume_package }}" = "true" ]; then
python -m pip install wheel
wheel unpack dist/*.whl --dest temp_wheel
echo "Listing contents of wheel directory:"
find temp_wheel -type f
test -f temp_wheel/pylume-*/pylume/lume || { echo "lume binary not found in wheel"; exit 1; }
rm -rf temp_wheel
echo "Publishing ${{ inputs.base_package_name }} ${VERSION} with lume ${LUME_VERSION}"
else
echo "Publishing ${{ inputs.base_package_name }} ${VERSION}"
fi
# Install and use twine directly instead of PDM publish
echo "Installing twine for direct publishing..."
pip install twine
echo "Publishing to PyPI using twine..."
TWINE_USERNAME="__token__" TWINE_PASSWORD="$PYPI_TOKEN" python -m twine upload dist/*
# Save the wheel file path for the release
WHEEL_FILE=$(ls dist/*.whl | head -1)
echo "WHEEL_FILE=${WHEEL_FILE}" >> $GITHUB_ENV
- name: Prepare Simple Release Notes
if: startsWith(github.ref, 'refs/tags/')
run: |
# Create release notes based on package type
echo "# ${{ inputs.base_package_name }} v${VERSION}" > release_notes.md
echo "" >> release_notes.md
if [ "${{ inputs.package_name }}" = "pylume" ]; then
echo "## Python SDK for lume - run macOS and Linux VMs on Apple Silicon" >> release_notes.md
echo "" >> release_notes.md
echo "This package provides Python bindings for the lume virtualization tool." >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* lume binary: v${LUME_VERSION}" >> release_notes.md
elif [ "${{ inputs.package_name }}" = "computer" ]; then
echo "## Computer control library for the Computer Universal Automation (CUA) project" >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* pylume: ${PYLUME_VERSION:-latest}" >> release_notes.md
elif [ "${{ inputs.package_name }}" = "agent" ]; then
echo "## Dependencies" >> release_notes.md
echo "* cua-computer: ${COMPUTER_VERSION:-latest}" >> release_notes.md
echo "* cua-som: ${SOM_VERSION:-latest}" >> release_notes.md
echo "" >> release_notes.md
echo "## Installation Options" >> release_notes.md
echo "" >> release_notes.md
echo "### Basic installation with Anthropic" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install cua-agent[anthropic]==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
echo "" >> release_notes.md
echo "### With SOM (recommended)" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install cua-agent[som]==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
echo "" >> release_notes.md
echo "### All features" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install cua-agent[all]==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
elif [ "${{ inputs.package_name }}" = "som" ]; then
echo "## Computer Vision and OCR library for detecting and analyzing UI elements" >> release_notes.md
echo "" >> release_notes.md
echo "This package provides enhanced UI understanding capabilities through computer vision and OCR." >> release_notes.md
elif [ "${{ inputs.package_name }}" = "computer-server" ]; then
echo "## Computer Server for the Computer Universal Automation (CUA) project" >> release_notes.md
echo "" >> release_notes.md
echo "A FastAPI-based server implementation for computer control." >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* cua-computer: ${COMPUTER_VERSION:-latest}" >> release_notes.md
echo "" >> release_notes.md
echo "## Usage" >> release_notes.md
echo '```bash' >> release_notes.md
echo "# Run the server" >> release_notes.md
echo "cua-computer-server" >> release_notes.md
echo '```' >> release_notes.md
elif [ "${{ inputs.package_name }}" = "mcp-server" ]; then
echo "## MCP Server for the Computer-Use Agent (CUA)" >> release_notes.md
echo "" >> release_notes.md
echo "This package provides MCP (Model Context Protocol) integration for CUA agents, allowing them to be used with Claude Desktop, Cursor, and other MCP clients." >> release_notes.md
echo "" >> release_notes.md
echo "## Dependencies" >> release_notes.md
echo "* cua-computer: ${COMPUTER_VERSION:-latest}" >> release_notes.md
echo "* cua-agent: ${AGENT_VERSION:-latest}" >> release_notes.md
echo "" >> release_notes.md
echo "## Usage" >> release_notes.md
echo '```bash' >> release_notes.md
echo "# Run the MCP server directly" >> release_notes.md
echo "cua-mcp-server" >> release_notes.md
echo '```' >> release_notes.md
echo "" >> release_notes.md
echo "## Claude Desktop Integration" >> release_notes.md
echo "Add to your Claude Desktop configuration (~/.config/claude-desktop/claude_desktop_config.json or OS-specific location):" >> release_notes.md
echo '```json' >> release_notes.md
echo '"mcpServers": {' >> release_notes.md
echo ' "cua-agent": {' >> release_notes.md
echo ' "command": "cua-mcp-server",' >> release_notes.md
echo ' "args": [],' >> release_notes.md
echo ' "env": {' >> release_notes.md
echo ' "CUA_AGENT_LOOP": "OMNI",' >> release_notes.md
echo ' "CUA_MODEL_PROVIDER": "ANTHROPIC",' >> release_notes.md
echo ' "CUA_MODEL_NAME": "claude-3-opus-20240229",' >> release_notes.md
echo ' "ANTHROPIC_API_KEY": "your-api-key",' >> release_notes.md
echo ' "PYTHONIOENCODING": "utf-8"' >> release_notes.md
echo ' }' >> release_notes.md
echo ' }' >> release_notes.md
echo '}' >> release_notes.md
echo '```' >> release_notes.md
fi
# Add installation section if not agent (which has its own installation section)
if [ "${{ inputs.package_name }}" != "agent" ]; then
echo "" >> release_notes.md
echo "## Installation" >> release_notes.md
echo '```bash' >> release_notes.md
echo "pip install ${{ inputs.base_package_name }}==${VERSION}" >> release_notes.md
echo '```' >> release_notes.md
fi
echo "Release notes created:"
cat release_notes.md
- name: Create GitHub Release
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
with:
name: "${{ inputs.base_package_name }} v${{ env.VERSION }}"
body_path: release_notes.md
files: ${{ inputs.package_dir }}/${{ env.WHEEL_FILE }}
draft: false
prerelease: false
make_latest: ${{ inputs.package_name == 'lume' }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
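
Review note on the "Set version in package" step: the `sed` expression `s/version = ".*"/.../` is not anchored, so it rewrites every line containing `version = "..."`, which would include keys such as `target-version = "py311"` if a package's `pyproject.toml` has one. Whether any package here has such a key is not visible in this diff. A minimal sketch of an anchored, first-match-only replacement follows; the function name `set_project_version` and the sample file are hypothetical, not part of the workflow.

```python
import re


def set_project_version(pyproject_text: str, version: str) -> str:
    """Replace only the first line that starts with `version = "..."`."""
    # ^ with MULTILINE anchors the match to the start of a line, so keys
    # like `target-version` or `python_version` are left untouched.
    pattern = re.compile(r'^version\s*=\s*"[^"]*"', flags=re.MULTILINE)
    # A callable replacement avoids backslash-escape surprises in `version`.
    new_text, count = pattern.subn(
        lambda _match: f'version = "{version}"', pyproject_text, count=1
    )
    if count == 0:
        raise ValueError("no top-level version assignment found")
    return new_text


# Hypothetical sample input, for illustration only.
sample = """[project]
name = "demo"
version = "0.0.0"

[tool.ruff]
target-version = "py311"
"""

updated = set_project_version(sample, "1.2.3")
print(updated)
```

The same effect could be had in the workflow itself by anchoring the `sed` pattern with `^`; the Python form is shown only because it is easier to test in isolation.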

6
.gitignore vendored

@@ -6,6 +6,10 @@ __pycache__/
# C extensions
*.so
node_modules/*
*/node_modules
**/node_modules
# Distribution / packaging
.Python
build/
@@ -155,7 +159,7 @@ weights/icon_detect/model.pt
weights/icon_detect/model.pt.zip
weights/icon_detect/model.pt.zip.part*
libs/omniparser/weights/icon_detect/model.pt
libs/python/omniparser/weights/icon_detect/model.pt
# Example test data and output
examples/test_data/

57
.vscode/launch.json vendored

@@ -1,5 +1,31 @@
{
"configurations": [
{
"name": "Agent UI",
"type": "debugpy",
"request": "launch",
"program": "examples/agent_ui_examples.py",
"console": "integratedTerminal",
"justMyCode": false,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
"name": "Computer UI",
"type": "debugpy",
"request": "launch",
"program": "examples/computer_ui_examples.py",
"console": "integratedTerminal",
"justMyCode": false,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
"name": "Run Computer Examples",
"type": "debugpy",
@@ -10,7 +36,7 @@
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
@@ -23,20 +49,7 @@
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
}
},
{
"name": "Run Agent UI Examples",
"type": "debugpy",
"request": "launch",
"program": "examples/agent_ui_examples.py",
"console": "integratedTerminal",
"justMyCode": false,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
@@ -49,7 +62,7 @@
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
@@ -71,7 +84,7 @@
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
@@ -93,27 +106,27 @@
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
"name": "Run Computer Server",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/libs/computer-server/run_server.py",
"program": "${workspaceFolder}/libs/python/computer-server/run_server.py",
"console": "integratedTerminal",
"justMyCode": true,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer:${workspaceFolder:cua-root}/libs/python/agent:${workspaceFolder:cua-root}/libs/python/som:${workspaceFolder:cua-root}/libs/python/pylume"
}
},
{
"name": "Run Computer Server with Args",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/libs/computer-server/run_server.py",
"program": "${workspaceFolder}/libs/python/computer-server/run_server.py",
"args": [
"--host",
"0.0.0.0",
@@ -127,7 +140,7 @@
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer-server"
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/python/core:${workspaceFolder:cua-root}/libs/python/computer-server"
}
},
{
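
Review note on the repeated `PYTHONPATH` strings: every launch configuration in this file repeats the same five-entry path, and this commit has to touch each copy to move `libs/<pkg>` to `libs/python/<pkg>`. A small sketch of how the value is composed, which could be used to check the copies stay in sync; the helper `build_pythonpath` and the `PACKAGES` list are illustrative assumptions, not something the repository provides.

```python
from pathlib import PurePosixPath

# Package directories that appear in the launch.json PYTHONPATH entries.
PACKAGES = ["core", "computer", "agent", "som", "pylume"]


def build_pythonpath(root: str, packages=PACKAGES, libs_dir: str = "libs/python") -> str:
    """Join <root>/<libs_dir>/<package> entries with ':' as launch.json does."""
    return ":".join(str(PurePosixPath(root) / libs_dir / name) for name in packages)


# The VS Code variable is passed through as a plain string.
print(build_pythonpath("${workspaceFolder:cua-root}"))
```

Passing `libs_dir="libs"` reproduces the pre-merge layout, which makes the rename in this diff a one-argument change in the sketch.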

13
.vscode/libs-ts.code-workspace vendored Normal file

@@ -0,0 +1,13 @@
{
"folders": [
{
"name": "libs-ts",
"path": "../libs/typescript"
}
],
"extensions": {
"recommendations": [
"biomejs.biome"
]
}
}

View File

@@ -6,27 +6,27 @@
},
{
"name": "computer",
"path": "../libs/computer"
"path": "../libs/python/computer"
},
{
"name": "agent",
"path": "../libs/agent"
"path": "../libs/python/agent"
},
{
"name": "som",
"path": "../libs/som"
"path": "../libs/python/som"
},
{
"name": "computer-server",
"path": "../libs/computer-server"
"path": "../libs/python/computer-server"
},
{
"name": "pylume",
"path": "../libs/pylume"
"path": "../libs/python/pylume"
},
{
"name": "core",
"path": "../libs/core"
"path": "../libs/python/core"
}
],
"settings": {
@@ -47,11 +47,11 @@
"libs"
],
"python.analysis.extraPaths": [
"${workspaceFolder:cua-root}/libs/core",
"${workspaceFolder:cua-root}/libs/computer",
"${workspaceFolder:cua-root}/libs/agent",
"${workspaceFolder:cua-root}/libs/som",
"${workspaceFolder:cua-root}/libs/pylume",
"${workspaceFolder:cua-root}/libs/python/core",
"${workspaceFolder:cua-root}/libs/python/computer",
"${workspaceFolder:cua-root}/libs/python/agent",
"${workspaceFolder:cua-root}/libs/python/som",
"${workspaceFolder:cua-root}/libs/python/pylume",
"${workspaceFolder:cua-root}/.vscode/typings"
],
"python.envFile": "${workspaceFolder:cua-root}/.env",
@@ -99,11 +99,11 @@
}
],
"python.autoComplete.extraPaths": [
"${workspaceFolder:cua-root}/libs/core",
"${workspaceFolder:cua-root}/libs/computer",
"${workspaceFolder:cua-root}/libs/agent",
"${workspaceFolder:cua-root}/libs/som",
"${workspaceFolder:cua-root}/libs/pylume"
"${workspaceFolder:cua-root}/libs/python/core",
"${workspaceFolder:cua-root}/libs/python/computer",
"${workspaceFolder:cua-root}/libs/python/agent",
"${workspaceFolder:cua-root}/libs/python/som",
"${workspaceFolder:cua-root}/libs/python/pylume"
],
"python.languageServer": "None",
"[python]": {
@@ -118,8 +118,8 @@
"examples/agent_examples.py": "python"
},
"python.interpreterPaths": {
"examples/computer_examples.py": "${workspaceFolder}/libs/computer/.venv/bin/python",
"examples/agent_examples.py": "${workspaceFolder}/libs/agent/.venv/bin/python"
"examples/computer_examples.py": "${workspaceFolder}/libs/python/computer/.venv/bin/python",
"examples/agent_examples.py": "${workspaceFolder}/libs/python/agent/.venv/bin/python"
}
},
"tasks": {
@@ -148,119 +148,6 @@
}
]
},
"launch": {
"version": "0.2.0",
"configurations": [
{
"name": "Run Computer Examples",
"type": "debugpy",
"request": "launch",
"program": "examples/computer_examples.py",
"console": "integratedTerminal",
"justMyCode": true,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
}
},
{
"name": "Run Agent Examples",
"type": "debugpy",
"request": "launch",
"program": "examples/agent_examples.py",
"console": "integratedTerminal",
"justMyCode": false,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
}
},
{
"name": "Run PyLume Examples",
"type": "debugpy",
"request": "launch",
"program": "examples/pylume_examples.py",
"console": "integratedTerminal",
"justMyCode": true,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
}
},
{
"name": "SOM: Run Experiments (No OCR)",
"type": "debugpy",
"request": "launch",
"program": "examples/som_examples.py",
"args": [
"examples/test_data",
"--output-dir", "examples/output",
"--ocr", "none",
"--mode", "experiment"
],
"console": "integratedTerminal",
"justMyCode": false,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
}
},
{
"name": "SOM: Run Experiments (EasyOCR)",
"type": "debugpy",
"request": "launch",
"program": "examples/som_examples.py",
"args": [
"examples/test_data",
"--output-dir", "examples/output",
"--ocr", "easyocr",
"--mode", "experiment"
],
"console": "integratedTerminal",
"justMyCode": false,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
}
},
{
"name": "Run Computer Server",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/libs/computer-server/run_server.py",
"console": "integratedTerminal",
"justMyCode": true,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer:${workspaceFolder:cua-root}/libs/agent:${workspaceFolder:cua-root}/libs/som:${workspaceFolder:cua-root}/libs/pylume"
}
},
{
"name": "Run Computer Server with Args",
"type": "debugpy",
"request": "launch",
"program": "${workspaceFolder}/libs/computer-server/run_server.py",
"args": [
"--host", "0.0.0.0",
"--port", "8000",
"--log-level", "debug"
],
"console": "integratedTerminal",
"justMyCode": false,
"python": "${workspaceFolder:cua-root}/.venv/bin/python",
"cwd": "${workspaceFolder:cua-root}",
"env": {
"PYTHONPATH": "${workspaceFolder:cua-root}/libs/core:${workspaceFolder:cua-root}/libs/computer-server"
}
}
]
},
"compounds": [
{
"name": "Run Computer Examples + Server",

86
COMPATIBILITY.md Normal file

@@ -0,0 +1,86 @@
# C/ua Compatibility Matrix
## Table of Contents
- [Host OS Compatibility](#host-os-compatibility)
- [macOS Host](#macos-host)
- [Ubuntu/Linux Host](#ubuntulinux-host)
- [Windows Host](#windows-host)
- [VM Emulation Support](#vm-emulation-support)
- [Model Provider Compatibility](#model-provider-compatibility)
---
## Host OS Compatibility
*This section shows compatibility based on your **host operating system** (the OS you're running C/ua on).*
### macOS Host
| Installation Method | Requirements | Lume | Cloud | Notes |
|-------------------|-------------|------|-------|-------|
| **playground-docker.sh** | Docker Desktop | ✅ Full | ✅ Full | Recommended for quick setup |
| **Dev Container** | VS Code/WindSurf + Docker | ✅ Full | ✅ Full | Best for development |
| **PyPI packages** | Python 3.12+ | ✅ Full | ✅ Full | Most flexible |
**macOS Host Requirements:**
- macOS 15+ (Sequoia) for local VM support
- Apple Silicon (M1/M2/M3/M4) recommended for best performance
- Docker Desktop for containerized installations
---
### Ubuntu/Linux Host
| Installation Method | Requirements | Lume | Cloud | Notes |
|-------------------|-------------|------|-------|-------|
| **playground-docker.sh** | Docker Engine | ✅ Full | ✅ Full | Recommended for quick setup |
| **Dev Container** | VS Code/WindSurf + Docker | ✅ Full | ✅ Full | Best for development |
| **PyPI packages** | Python 3.12+ | ✅ Full | ✅ Full | Most flexible |
**Ubuntu/Linux Host Requirements:**
- Ubuntu 20.04+ or equivalent Linux distribution
- Docker Engine or Docker Desktop
- Python 3.12+ for PyPI installation
---
### Windows Host
| Installation Method | Requirements | Lume | Winsandbox | Cloud | Notes |
|-------------------|-------------|------|------------|-------|-------|
| **playground-docker.sh** | Docker Desktop + WSL2 | ❌ Not supported | ❌ Not supported | ✅ Full | Requires WSL2 |
| **Dev Container** | VS Code/WindSurf + Docker + WSL2 | ❌ Not supported | ❌ Not supported | ✅ Full | Requires WSL2 |
| **PyPI packages** | Python 3.12+ | ❌ Not supported | ✅ Full | ✅ Full | |
**Windows Host Requirements:**
- Windows 10/11 with WSL2 enabled for shell script execution
- Docker Desktop with WSL2 backend
- Windows Sandbox feature enabled (for Winsandbox support)
- Python 3.12+ installed in WSL2 or Windows
- **Note**: Lume CLI is not available on Windows - use Cloud or Winsandbox providers
---
## VM Emulation Support
*This section shows which **virtual machine operating systems** each provider can emulate.*
| Provider | macOS VM | Ubuntu/Linux VM | Windows VM | Notes |
|----------|----------|-----------------|------------|-------|
| **Lume** | ✅ Full support | ⚠️ Limited support | ⚠️ Limited support | macOS: native; Ubuntu/Linux/Windows: need custom image |
| **Cloud** | 🚧 Coming soon | ✅ Full support | 🚧 Coming soon | Currently Ubuntu only, macOS/Windows in development |
| **Winsandbox** | ❌ Not supported | ❌ Not supported | ✅ Windows only | Windows 10/11 environments only |
---
## Model Provider Compatibility
*This section shows which **AI model providers** are supported on each host operating system.*
| Provider | macOS Host | Ubuntu/Linux Host | Windows Host | Notes |
|----------|------------|-------------------|--------------|-------|
| **Anthropic** | ✅ Full support | ✅ Full support | ✅ Full support | Cloud-based API |
| **OpenAI** | ✅ Full support | ✅ Full support | ✅ Full support | Cloud-based API |
| **Ollama** | ✅ Full support | ✅ Full support | ✅ Full support | Local model serving |
| **OpenAI Compatible** | ✅ Full support | ✅ Full support | ✅ Full support | Any OpenAI-compatible API endpoint |
| **MLX VLM** | ✅ macOS only | ❌ Not supported | ❌ Not supported | Apple Silicon required. PyPI installation only. |
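
As a rough illustration of how the host tables above could be applied in code, the sketch below maps a host OS to the VM providers marked as supported for PyPI installs. The provider labels (`"lume"`, `"cloud"`, `"winsandbox"`), the function name, and the fallback to cloud for unrecognized hosts are assumptions made for this example; they are not identifiers taken from the C/ua packages.

```python
import platform
from typing import List, Optional

# Providers marked "Full" for PyPI installs in the host tables above.
# Keys are the values returned by platform.system().
SUPPORTED_PROVIDERS = {
    "Darwin": ["lume", "cloud"],
    "Linux": ["lume", "cloud"],
    "Windows": ["winsandbox", "cloud"],
}


def providers_for_host(system: Optional[str] = None) -> List[str]:
    """Return the provider labels listed as supported for the given host OS."""
    system = system or platform.system()
    # Assumption: fall back to cloud only when the host is not in the table.
    return SUPPORTED_PROVIDERS.get(system, ["cloud"])


print(providers_for_host())           # providers for the current machine
print(providers_for_host("Windows"))  # providers listed for a Windows host
```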

View File

@@ -1,11 +1,11 @@
FROM python:3.11-slim
FROM python:3.12-slim
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PYTHONPATH="/app/libs/core:/app/libs/computer:/app/libs/agent:/app/libs/som:/app/libs/pylume:/app/libs/computer-server"
PYTHONPATH="/app/libs/python/core:/app/libs/python/computer:/app/libs/python/agent:/app/libs/python/som:/app/libs/python/pylume:/app/libs/python/computer-server:/app/libs/python/mcp-server"
# Install system dependencies for ARM architecture
RUN apt-get update && apt-get install -y --no-install-recommends \
@@ -21,6 +21,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
iputils-ping \
net-tools \
sed \
xxd \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

137
README.md

@@ -51,71 +51,72 @@
**Need to automate desktop tasks? Launch the Computer-Use Agent UI with a single command.**
### Option 1: Fully-managed install (recommended)
### Option 1: Fully-managed install with Docker (recommended)
*I want to be totally guided in the process*
*Docker-based guided install for quick use*
**macOS/Linux/Windows (via WSL):**
```bash
# Requires Python 3.11+
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground.sh)"
# Requires Docker
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground-docker.sh)"
```
This script will:
- Ask if you want to use local VMs or C/ua Cloud Containers
- Install necessary dependencies (Lume CLI for local VMs)
- Download VM images if needed
- Install Python packages
- Launch the Computer-Use Agent UI
### Option 2: Key manual steps
<details>
<summary>If you are skeptical running one-install scripts</summary>
**For C/ua Agent UI (any system, cloud VMs only):**
```bash
# Requires Python 3.11+ and C/ua API key
pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui.gradio.app
```
**For Local macOS/Linux VMs (Apple Silicon only):**
```bash
# 1. Install Lume CLI
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# 2. Pull macOS image
lume pull macos-sequoia-cua:latest
# 3. Start VM
lume run macos-sequoia-cua:latest
# 4. Install packages and launch UI
pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui.gradio.app
```
</details>
This script will guide you through setup using Docker containers and launch the Computer-Use Agent UI.
---
*How it works: Computer module provides secure desktops (Lume CLI locally, [C/ua Cloud Containers](https://trycua.com) remotely), Agent module provides local/API agents with OpenAI AgentResponse format and [trajectory tracing](https://trycua.com/trajectory-viewer).*
### Option 2: [Dev Container](./.devcontainer/README.md)
### Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops)
*Best for contributors and development*
- [UITARS-1.5](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Run locally on Apple Silicon with MLX, or use cloud providers
- [OpenAI CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use OpenAI's Computer-Use Preview model
- [Anthropic CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use Anthropic's Computer-Use capabilities
- [OmniParser-v2.0](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model
This repository includes a [Dev Container](./.devcontainer/README.md) configuration that simplifies setup to a few steps:
# 💻 Developer Guide
1. **Install the Dev Containers extension ([VS Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or [WindSurf](https://docs.windsurf.com/windsurf/advanced#dev-containers-beta))**
2. **Open the repository in the Dev Container:**
- Press `Ctrl+Shift+P` (or `⌘+Shift+P` on macOS)
- Select `Dev Containers: Clone Repository in Container Volume...` and paste the repository URL: `https://github.com/trycua/cua.git` (if not cloned) or `Dev Containers: Open Folder in Container...` (if git cloned).
> **Note**: On WindSurf, the post-install hook might not run automatically. If so, run `/bin/bash .devcontainer/post-install.sh` manually.
3. **Open the VS Code workspace:** Once `post-install.sh` has finished running, open the `.vscode/py.code-workspace` workspace and press ![Open Workspace](https://github.com/user-attachments/assets/923bdd43-8c8f-4060-8d78-75bfa302b48c).
4. **Run the Agent UI example:** Click ![Run Agent UI](https://github.com/user-attachments/assets/7a61ef34-4b22-4dab-9864-f86bf83e290b)
to start the Gradio UI. If prompted to install **debugpy (Python Debugger)** to enable remote debugging, select 'Yes' to proceed.
5. **Access the Gradio UI:** The Gradio UI will be available at `http://localhost:7860` and will automatically forward to your host machine.
Follow these steps to use C/ua in your own code. See [Developer Guide](https://docs.trycua.com/home/developer-guide) for building from source.
---
### Option 3: PyPI
*Direct Python package installation*
```bash
# conda create -yn cua python==3.12
pip install -U "cua-computer[all]" "cua-agent[all]"
python -m agent.ui # Start the agent UI
```
Or check out the [Usage Guide](#-usage-guide) to learn how to use our Python SDK in your own code.
---
## Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops)
- [UITARS-1.5](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Run locally on Apple Silicon with MLX, or use cloud providers
- [OpenAI CUA](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Use OpenAI's Computer-Use Preview model
- [Anthropic CUA](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Use Anthropic's Computer-Use capabilities
- [OmniParser-v2.0](https://github.com/trycua/cua/blob/main/libs/python/agent/README.md#agent-loops) - Control UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model
## 🖥️ Compatibility
For detailed compatibility information including host OS support, VM emulation capabilities, and model provider compatibility, see the [Compatibility Matrix](./COMPATIBILITY.md).
<br/>
<br/>
# 🐍 Usage Guide
Follow these steps to use C/ua in your own Python code. See [Developer Guide](./docs/Developer-Guide.md) for building from source.
### Step 1: Install Lume CLI
@@ -227,8 +228,8 @@ docker run -it --rm \
## Resources
- [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/mcp-server/README.md) - One of the easiest ways to get started with C/ua
- [How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/agent/README.md)
- [How to use the MCP Server with Claude Desktop or other MCP clients](./libs/python/mcp-server/README.md) - One of the easiest ways to get started with C/ua
- [How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent](./libs/python/agent/README.md)
- [How to use Lume CLI for managing desktops](./libs/lume/README.md)
- [Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)](https://www.trycua.com/blog/training-computer-use-models-trajectories-1)
- [Build Your Own Operator on macOS (Part 1)](https://www.trycua.com/blog/build-your-own-operator-on-macos-1)
@@ -239,13 +240,14 @@ docker run -it --rm \
|--------|-------------|---------------|
| [**Lume**](./libs/lume/README.md) | VM management for macOS/Linux using Apple's Virtualization.Framework | `curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash` |
| [**Lumier**](./libs/lumier/README.md) | Docker interface for macOS and Linux VMs | `docker pull trycua/lumier:latest` |
| [**Computer**](./libs/computer/README.md) | Interface for controlling virtual machines | `pip install "cua-computer[all]"` |
| [**Agent**](./libs/agent/README.md) | AI agent framework for automating tasks | `pip install "cua-agent[all]"` |
| [**MCP Server**](./libs/mcp-server/README.md) | MCP server for using CUA with Claude Desktop | `pip install cua-mcp-server` |
| [**SOM**](./libs/som/README.md) | Set-of-Mark library for Agent | `pip install cua-som` |
| [**PyLume**](./libs/pylume/README.md) | Python bindings for Lume | `pip install pylume` |
| [**Computer Server**](./libs/computer-server/README.md) | Server component for Computer | `pip install cua-computer-server` |
| [**Core**](./libs/core/README.md) | Core utilities | `pip install cua-core` |
| [**Computer (Python)**](./libs/python/computer/README.md) | Python interface for controlling virtual machines | `pip install "cua-computer[all]"` |
| [**Computer (TypeScript)**](./libs/typescript/computer/README.md) | TypeScript interface for controlling virtual machines | `npm install @trycua/computer` |
| [**Agent**](./libs/python/agent/README.md) | AI agent framework for automating tasks | `pip install "cua-agent[all]"` |
| [**MCP Server**](./libs/python/mcp-server/README.md) | MCP server for using CUA with Claude Desktop | `pip install cua-mcp-server` |
| [**SOM**](./libs/python/som/README.md) | Set-of-Mark library for Agent | `pip install cua-som` |
| [**Computer Server**](./libs/python/computer-server/README.md) | Server component for Computer | `pip install cua-computer-server` |
| [**Core (Python)**](./libs/python/core/README.md) | Python core utilities | `pip install cua-core` |
| [**Core (TypeScript)**](./libs/typescript/core/README.md) | TypeScript core utilities | `npm install @trycua/core` |
## Computer Interface Reference
@@ -253,7 +255,8 @@ For complete examples, see [computer_examples.py](./examples/computer_examples.p
```python
# Shell Actions
await computer.interface.run_command(cmd) # Run shell command
result = await computer.interface.run_command(cmd) # Run shell command
# result.stdout, result.stderr, result.returncode
# Mouse Actions
await computer.interface.left_click(x, y) # Left click at coordinates
@@ -288,8 +291,8 @@ await computer.interface.copy_to_clipboard() # Get clipboard content
# File System Operations
await computer.interface.file_exists(path) # Check if file exists
await computer.interface.directory_exists(path) # Check if directory exists
await computer.interface.read_text(path) # Read file content
await computer.interface.write_text(path, content) # Write file content
await computer.interface.read_text(path, encoding="utf-8") # Read file content
await computer.interface.write_text(path, content, encoding="utf-8") # Write file content
await computer.interface.read_bytes(path) # Read file content as bytes
await computer.interface.write_bytes(path, content) # Write file content as bytes
await computer.interface.delete_file(path) # Delete file
@@ -399,14 +402,6 @@ Thank you to all our supporters!
<td align="center" valign="top" width="14.28%"><a href="https://ricterz.me"><img src="https://avatars.githubusercontent.com/u/5282759?v=4?s=100" width="100px;" alt="Ricter Zheng"/><br /><sub><b>Ricter Zheng</b></sub></a><br /><a href="#code-RicterZ" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://www.trytruffle.ai/"><img src="https://avatars.githubusercontent.com/u/50844303?v=4?s=100" width="100px;" alt="Rahul Karajgikar"/><br /><sub><b>Rahul Karajgikar</b></sub></a><br /><a href="#code-rahulkarajgikar" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/trospix"><img src="https://avatars.githubusercontent.com/u/81363696?v=4?s=100" width="100px;" alt="trospix"/><br /><sub><b>trospix</b></sub></a><br /><a href="#code-trospix" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://wavee.world/invitation/b96d00e6-b802-4a1b-8a66-2e3854a01ffd"><img src="https://avatars.githubusercontent.com/u/22633385?v=4?s=100" width="100px;" alt="Ikko Eltociear Ashimine"/><br /><sub><b>Ikko Eltociear Ashimine</b></sub></a><br /><a href="#code-eltociear" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/dp221125"><img src="https://avatars.githubusercontent.com/u/10572119?v=4?s=100" width="100px;" alt="한석호(MilKyo)"/><br /><sub><b>한석호(MilKyo)</b></sub></a><br /><a href="#code-dp221125" title="Code">💻</a></td>
</tr>
<tr>
<td align="center" valign="top" width="14.28%"><a href="https://www.encona.com/"><img src="https://avatars.githubusercontent.com/u/891558?v=4?s=100" width="100px;" alt="Rahim Nathwani"/><br /><sub><b>Rahim Nathwani</b></sub></a><br /><a href="#code-rahimnathwani" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://mjspeck.github.io/"><img src="https://avatars.githubusercontent.com/u/20689127?v=4?s=100" width="100px;" alt="Matt Speck"/><br /><sub><b>Matt Speck</b></sub></a><br /><a href="#code-mjspeck" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/FinnBorge"><img src="https://avatars.githubusercontent.com/u/9272726?v=4?s=100" width="100px;" alt="FinnBorge"/><br /><sub><b>FinnBorge</b></sub></a><br /><a href="#code-FinnBorge" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/jklapacz"><img src="https://avatars.githubusercontent.com/u/5343758?v=4?s=100" width="100px;" alt="Jakub Klapacz"/><br /><sub><b>Jakub Klapacz</b></sub></a><br /><a href="#code-jklapacz" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/evnsnclr"><img src="https://avatars.githubusercontent.com/u/139897548?v=4?s=100" width="100px;" alt="Evan smith"/><br /><sub><b>Evan smith</b></sub></a><br /><a href="#code-evnsnclr" title="Code">💻</a></td>
</tr>
</tbody>

293
docs/Developer-Guide.md Normal file
View File

@@ -0,0 +1,293 @@
# Getting Started
## Project Structure
The project is organized as a monorepo with these main packages:
### Python
- `libs/python/core/` - Base package with telemetry support
- `libs/python/computer/` - Computer-use interface (CUI) library
- `libs/python/agent/` - AI agent library with multi-provider support
- `libs/python/som/` - Set-of-Mark parser
- `libs/python/computer-server/` - Server component for VM
- `libs/python/pylume/` - Python bindings for Lume
### TypeScript
- `libs/typescript/computer/` - Computer-use interface (CUI) library
- `libs/typescript/agent/` - AI agent library with multi-provider support
### Other
- `libs/lume/` - Lume CLI
Each package has its own virtual environment and dependencies, managed through PDM.
## Local Development Setup
1. Install Lume CLI:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
2. Clone the repository:
```bash
git clone https://github.com/trycua/cua.git
cd cua
```
3. Create a `.env.local` file in the root directory with your API keys:
```bash
# Required for Anthropic provider
ANTHROPIC_API_KEY=your_anthropic_key_here
# Required for OpenAI provider
OPENAI_API_KEY=your_openai_key_here
```
4. Open the workspace in VSCode or Cursor:
```bash
# For Cua Python development
code .vscode/py.code-workspace
# For Lume (Swift) development
code .vscode/lume.code-workspace
```
Using the workspace file is strongly recommended as it:
- Sets up correct Python environments for each package
- Configures proper import paths
- Enables debugging configurations
- Maintains consistent settings across packages
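
Before launching any examples, it can help to confirm that the `.env.local` file from step 3 actually defines the keys you need. The following is a minimal sketch and not part of the repository: the helper names `parse_env_file` and `missing_keys` are hypothetical, and it assumes the simple `KEY=value` format shown above.

```python
# Hypothetical helper, not shipped with cua: checks a .env.local-style file.

# Keys taken from the .env.local example above; adjust to the providers you use.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(
    values: dict[str, str],
    required: tuple[str, ...] = REQUIRED_KEYS,
) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, passing the contents of `.env.local` through `missing_keys(parse_env_file(text))` returns an empty list when both keys are set. If you only use one provider, pass a shorter `required` tuple.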
## Lume Development
Refer to the [Lume development guide](../libs/lume/docs/Development.md) for instructions on how to develop the Lume CLI.
## Python Development
There are two ways to set up the Python packages for development:
### Run the build script
Run the build script to set up all packages:
```bash
./scripts/build.sh
```
The build script creates a shared virtual environment for all packages. The workspace configuration automatically handles import paths with the correct Python path settings.
Running the script will:
- Create a virtual environment for the project
- Install all packages in development mode
- Set up the correct Python path
- Install development tools
### Install with PDM
If PDM is not already installed, you can follow the installation instructions [here](https://pdm-project.org/en/latest/#installation).
To install with PDM, simply run:
```console
pdm install -G:all
```
This installs all the dependencies for development, testing, and building the docs. If you'd only like development dependencies, you can run:
```console
pdm install -d
```
## Running Examples
The Python workspace includes launch configurations for all packages:
- "Run Computer Examples" - Runs computer examples
- "Run Computer API Server" - Runs the computer-server
- "Run Agent Examples" - Runs agent examples
- "SOM" configurations - Various settings for running SOM
To run examples from VSCode / Cursor:
1. Press F5 or use the Run/Debug view
2. Select the desired configuration
The workspace also includes compound launch configurations:
- "Run Computer Examples + Server" - Runs both the Computer Examples and Server simultaneously
## Docker Development Environment
As an alternative to installing directly on your host machine, you can use Docker for development.
### Prerequisites
- Docker installed on your machine
- Lume server running on your host (port 7777): `lume serve`
### Setup and Usage
1. Build the development Docker image:
```bash
./scripts/run-docker-dev.sh build
```
2. Run an example in the container:
```bash
./scripts/run-docker-dev.sh run computer_examples.py
```
3. Get an interactive shell in the container:
```bash
./scripts/run-docker-dev.sh run --interactive
```
4. Stop any running containers:
```bash
./scripts/run-docker-dev.sh stop
```
### How it Works
The Docker development environment:
- Installs all required Python dependencies in the container
- Mounts your source code from the host at runtime
- Automatically configures the connection to use host.docker.internal:7777 for accessing the Lume server on your host machine
- Preserves your code changes without requiring rebuilds (source code is mounted as a volume)
> **Note**: The Docker container doesn't include the macOS-specific Lume executable. Instead, it connects to the Lume server running on your host machine via host.docker.internal:7777. Make sure to start the Lume server on your host before running examples in the container.
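
If an example fails to connect from inside the container, a quick TCP check can tell you whether the Lume server is reachable at all. This is a minimal sketch using only the standard library; the function name `is_reachable` is hypothetical, and it tests only that the port accepts connections, not that Lume itself is healthy.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Inside the container you would call `is_reachable("host.docker.internal", 7777)`. On the host itself, `is_reachable("localhost", 7777)` checks the same server after `lume serve` has started.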
## Cleanup and Reset
If you need to clean up the (non-Docker) environment and start fresh:
```bash
./scripts/cleanup.sh
```
This will:
- Remove all virtual environments
- Clean Python cache files and directories
- Remove build artifacts
- Clean PDM-related files
- Reset environment configurations
## Code Formatting Standards
The cua project follows strict code formatting standards to ensure consistency across all packages.
### Python Code Formatting
#### Tools
The project uses the following tools for code formatting and linting:
- **[Black](https://black.readthedocs.io/)**: Code formatter
- **[Ruff](https://beta.ruff.rs/docs/)**: Fast linter and formatter
- **[MyPy](https://mypy.readthedocs.io/)**: Static type checker
These tools are automatically installed when you set up the development environment using the `./scripts/build.sh` script.
#### Configuration
The formatting configuration is defined in the root `pyproject.toml` file:
```toml
[tool.black]
line-length = 100
target-version = ["py311"]
[tool.ruff]
line-length = 100
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true
[tool.ruff.format]
docstring-code-format = true
[tool.mypy]
strict = true
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true
warn_return_any = true
show_error_codes = true
warn_unused_ignores = false
```
#### Key Formatting Rules
- **Line Length**: Maximum of 100 characters
- **Python Version**: Code should be compatible with Python 3.11+
- **Imports**: Automatically sorted (using Ruff's "I" rule)
- **Type Hints**: Required for all function definitions (strict mypy mode)
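
As an illustration of these rules, the hypothetical module below is written in the style the configuration above asks for: sorted imports, type hints on every function, and lines under 100 characters. The function names and behaviour are invented for this example and do not exist in the repository.

```python
from collections.abc import Iterable
from pathlib import Path


def find_long_lines(lines: Iterable[str], max_length: int = 100) -> list[int]:
    """Return the 1-based numbers of lines longer than max_length characters."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if len(line.rstrip("\n")) > max_length
    ]


def check_file(path: Path, max_length: int = 100) -> list[int]:
    """Apply find_long_lines to a file on disk, read as UTF-8."""
    with path.open(encoding="utf-8") as handle:
        return find_long_lines(handle, max_length)
```

Both functions declare parameter and return types, which is what `disallow_untyped_defs = true` requires.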
#### IDE Integration
The repository includes VSCode workspace configurations that enable automatic formatting. When you open the workspace files (as recommended in the setup instructions), the correct formatting settings are automatically applied.
Python-specific settings in the workspace files:
```json
"[python]": {
    "editor.formatOnSave": true,
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.codeActionsOnSave": {
        "source.organizeImports": "explicit"
    }
}
```
Recommended VS Code extensions:
- Black Formatter (ms-python.black-formatter)
- Ruff (charliermarsh.ruff)
- Pylance (ms-python.vscode-pylance)
#### Manual Formatting
To manually format code:
```bash
# Format all Python files using Black
pdm run black .
# Run Ruff linter with auto-fix
pdm run ruff check --fix .
# Run type checking with MyPy
pdm run mypy .
```
#### Pre-commit Validation
Before submitting a pull request, ensure your code passes all formatting checks:
```bash
# Run all checks
pdm run black --check .
pdm run ruff check .
pdm run mypy .
```
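
If you prefer a single entry point, the three commands above can be wrapped in a small script that runs all of them and reports every failure, not just the first. This is a sketch under assumptions: the names `CHECKS`, `run_command`, and `failed_checks` are hypothetical and the repository does not ship such a script.

```python
import subprocess
from collections.abc import Callable, Sequence

# The three commands listed above, in the same order.
CHECKS: tuple[tuple[str, ...], ...] = (
    ("pdm", "run", "black", "--check", "."),
    ("pdm", "run", "ruff", "check", "."),
    ("pdm", "run", "mypy", "."),
)


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if it is not installed)."""
    try:
        return subprocess.run(list(command), check=False).returncode
    except FileNotFoundError:
        return 127


def failed_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> list[str]:
    """Run every check without stopping early and return the ones that failed."""
    return [" ".join(command) for command in checks if runner(command) != 0]
```

Calling `failed_checks()` from the repository root returns an empty list when all three tools pass. The `runner` parameter exists only so the logic can be exercised without invoking the real tools.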
### Swift Code (Lume)
For Swift code in the `libs/lume` directory:
- Follow the [Swift API Design Guidelines](https://www.swift.org/documentation/api-design-guidelines/)
- Use SwiftFormat for consistent formatting
- Code will be automatically formatted on save when using the lume workspace

View File

@@ -18,4 +18,8 @@ from agent.ui.gradio.app import create_gradio_ui
if __name__ == "__main__":
    print("Launching Computer-Use Agent Gradio UI with advanced features...")
    app = create_gradio_ui()
    app.launch(share=False)
    app.launch(
        share=False,
        server_name="0.0.0.0",
        server_port=7860,
    )

View File

@@ -0,0 +1,3 @@
OPENAI_KEY=
CUA_KEY=
CUA_CONTAINER_NAME=

View File

@@ -0,0 +1,3 @@
node_modules
.DS_Store
.env

View File

@@ -0,0 +1,7 @@
{
  "useTabs": false,
  "semi": true,
  "singleQuote": true,
  "trailingComma": "es5",
  "bracketSpacing": true
}

View File

@@ -0,0 +1,47 @@
# cua-cloud-openai Example
This example demonstrates how to control a c/ua Cloud container using the OpenAI `computer-use-preview` model and the `@trycua/computer` TypeScript library.
## Overview
- Connects to a c/ua Cloud container via the `@trycua/computer` library
- Sends screenshots and instructions to OpenAI's computer-use model
- Executes AI-generated actions (clicks, typing, etc.) inside the container
- Designed for Linux containers, but can be adapted for other OS types
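
The control flow described above can be sketched independently of any SDK. The outline below is illustrative only and is written in Python rather than TypeScript: `take_screenshot`, `ask_model`, and `execute` are hypothetical callables standing in for the `@trycua/computer` and OpenAI calls made in `src/index.ts`, and `max_steps` is an added safety limit that the example itself does not have.

```python
from collections.abc import Callable
from typing import Optional

Action = dict[str, object]


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Alternate between screenshots and actions until the model stops.

    Returns the number of actions that were executed.
    """
    steps = 0
    screenshot = take_screenshot()  # initial view of the container
    while steps < max_steps:
        action = ask_model(screenshot)  # None means "no more actions"
        if action is None:
            break
        execute(action)
        steps += 1
        screenshot = take_screenshot()  # show the model the result
    return steps
```

Passing the three steps in as callables keeps the loop testable with fakes. In the real example, the roles are filled by `computer.interface.screenshot()`, `openai.responses.create(...)`, and `executeAction(...)`.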
## Getting Started
1. **Install dependencies:**
```bash
npm install
```
2. **Set up environment variables:**
Create a `.env` file with the following variables:
- `OPENAI_KEY` — your OpenAI API key
- `CUA_KEY` — your c/ua Cloud API key
- `CUA_CONTAINER_NAME` — the name of your provisioned container
3. **Run the example:**
```bash
npx tsx src/index.ts
```
## Files
- `src/index.ts` — Main example script
- `src/helpers.ts` — Helper for executing actions on the container
## Further Reading
For a step-by-step tutorial and more detailed explanation, see the accompanying blog post:
➡️ [Controlling a c/ua Cloud Container with JavaScript](https://placeholder-url-to-blog-post.com)
_(This link will be updated once the article is published.)_
---
If you have questions or issues, please open an issue or contact the maintainers.

View File

@@ -0,0 +1,25 @@
{
  "name": "computer-example-ts",
  "version": "1.0.0",
  "description": "",
  "type": "module",
  "main": "index.js",
  "scripts": {
    "dev": "tsx watch src/index.ts",
    "start": "tsx src/index.ts"
  },
  "keywords": [],
  "author": "",
  "license": "MIT",
  "packageManager": "pnpm@10.12.3",
  "dependencies": {
    "@trycua/computer": "^0.1.3",
    "dotenv": "^16.5.0",
    "openai": "^5.7.0"
  },
  "devDependencies": {
    "@types/node": "^22.15.33",
    "tsx": "^4.20.3",
    "typescript": "^5.8.3"
  }
}

View File

@@ -0,0 +1,507 @@
lockfileVersion: '9.0'
settings:
autoInstallPeers: true
excludeLinksFromLockfile: false
importers:
.:
dependencies:
'@trycua/computer':
specifier: ^0.1.3
version: 0.1.3
dotenv:
specifier: ^16.5.0
version: 16.6.1
openai:
specifier: ^5.7.0
version: 5.8.2(ws@8.18.3)
devDependencies:
'@types/node':
specifier: ^22.15.33
version: 22.16.0
tsx:
specifier: ^4.20.3
version: 4.20.3
typescript:
specifier: ^5.8.3
version: 5.8.3
packages:
'@esbuild/aix-ppc64@0.25.5':
resolution: {integrity: sha512-9o3TMmpmftaCMepOdA5k/yDw8SfInyzWWTjYTFCX3kPSDJMROQTb8jg+h9Cnwnmm1vOzvxN7gIfB5V2ewpjtGA==}
engines: {node: '>=18'}
cpu: [ppc64]
os: [aix]
'@esbuild/android-arm64@0.25.5':
resolution: {integrity: sha512-VGzGhj4lJO+TVGV1v8ntCZWJktV7SGCs3Pn1GRWI1SBFtRALoomm8k5E9Pmwg3HOAal2VDc2F9+PM/rEY6oIDg==}
engines: {node: '>=18'}
cpu: [arm64]
os: [android]
'@esbuild/android-arm@0.25.5':
resolution: {integrity: sha512-AdJKSPeEHgi7/ZhuIPtcQKr5RQdo6OO2IL87JkianiMYMPbCtot9fxPbrMiBADOWWm3T2si9stAiVsGbTQFkbA==}
engines: {node: '>=18'}
cpu: [arm]
os: [android]
'@esbuild/android-x64@0.25.5':
resolution: {integrity: sha512-D2GyJT1kjvO//drbRT3Hib9XPwQeWd9vZoBJn+bu/lVsOZ13cqNdDeqIF/xQ5/VmWvMduP6AmXvylO/PIc2isw==}
engines: {node: '>=18'}
cpu: [x64]
os: [android]
'@esbuild/darwin-arm64@0.25.5':
resolution: {integrity: sha512-GtaBgammVvdF7aPIgH2jxMDdivezgFu6iKpmT+48+F8Hhg5J/sfnDieg0aeG/jfSvkYQU2/pceFPDKlqZzwnfQ==}
engines: {node: '>=18'}
cpu: [arm64]
os: [darwin]
'@esbuild/darwin-x64@0.25.5':
resolution: {integrity: sha512-1iT4FVL0dJ76/q1wd7XDsXrSW+oLoquptvh4CLR4kITDtqi2e/xwXwdCVH8hVHU43wgJdsq7Gxuzcs6Iq/7bxQ==}
engines: {node: '>=18'}
cpu: [x64]
os: [darwin]
'@esbuild/freebsd-arm64@0.25.5':
resolution: {integrity: sha512-nk4tGP3JThz4La38Uy/gzyXtpkPW8zSAmoUhK9xKKXdBCzKODMc2adkB2+8om9BDYugz+uGV7sLmpTYzvmz6Sw==}
engines: {node: '>=18'}
cpu: [arm64]
os: [freebsd]
'@esbuild/freebsd-x64@0.25.5':
resolution: {integrity: sha512-PrikaNjiXdR2laW6OIjlbeuCPrPaAl0IwPIaRv+SMV8CiM8i2LqVUHFC1+8eORgWyY7yhQY+2U2fA55mBzReaw==}
engines: {node: '>=18'}
cpu: [x64]
os: [freebsd]
'@esbuild/linux-arm64@0.25.5':
resolution: {integrity: sha512-Z9kfb1v6ZlGbWj8EJk9T6czVEjjq2ntSYLY2cw6pAZl4oKtfgQuS4HOq41M/BcoLPzrUbNd+R4BXFyH//nHxVg==}
engines: {node: '>=18'}
cpu: [arm64]
os: [linux]
'@esbuild/linux-arm@0.25.5':
resolution: {integrity: sha512-cPzojwW2okgh7ZlRpcBEtsX7WBuqbLrNXqLU89GxWbNt6uIg78ET82qifUy3W6OVww6ZWobWub5oqZOVtwolfw==}
engines: {node: '>=18'}
cpu: [arm]
os: [linux]
'@esbuild/linux-ia32@0.25.5':
resolution: {integrity: sha512-sQ7l00M8bSv36GLV95BVAdhJ2QsIbCuCjh/uYrWiMQSUuV+LpXwIqhgJDcvMTj+VsQmqAHL2yYaasENvJ7CDKA==}
engines: {node: '>=18'}
cpu: [ia32]
os: [linux]
'@esbuild/linux-loong64@0.25.5':
resolution: {integrity: sha512-0ur7ae16hDUC4OL5iEnDb0tZHDxYmuQyhKhsPBV8f99f6Z9KQM02g33f93rNH5A30agMS46u2HP6qTdEt6Q1kg==}
engines: {node: '>=18'}
cpu: [loong64]
os: [linux]
'@esbuild/linux-mips64el@0.25.5':
resolution: {integrity: sha512-kB/66P1OsHO5zLz0i6X0RxlQ+3cu0mkxS3TKFvkb5lin6uwZ/ttOkP3Z8lfR9mJOBk14ZwZ9182SIIWFGNmqmg==}
engines: {node: '>=18'}
cpu: [mips64el]
os: [linux]
'@esbuild/linux-ppc64@0.25.5':
resolution: {integrity: sha512-UZCmJ7r9X2fe2D6jBmkLBMQetXPXIsZjQJCjgwpVDz+YMcS6oFR27alkgGv3Oqkv07bxdvw7fyB71/olceJhkQ==}
engines: {node: '>=18'}
cpu: [ppc64]
os: [linux]
'@esbuild/linux-riscv64@0.25.5':
resolution: {integrity: sha512-kTxwu4mLyeOlsVIFPfQo+fQJAV9mh24xL+y+Bm6ej067sYANjyEw1dNHmvoqxJUCMnkBdKpvOn0Ahql6+4VyeA==}
engines: {node: '>=18'}
cpu: [riscv64]
os: [linux]
'@esbuild/linux-s390x@0.25.5':
resolution: {integrity: sha512-K2dSKTKfmdh78uJ3NcWFiqyRrimfdinS5ErLSn3vluHNeHVnBAFWC8a4X5N+7FgVE1EjXS1QDZbpqZBjfrqMTQ==}
engines: {node: '>=18'}
cpu: [s390x]
os: [linux]
'@esbuild/linux-x64@0.25.5':
resolution: {integrity: sha512-uhj8N2obKTE6pSZ+aMUbqq+1nXxNjZIIjCjGLfsWvVpy7gKCOL6rsY1MhRh9zLtUtAI7vpgLMK6DxjO8Qm9lJw==}
engines: {node: '>=18'}
cpu: [x64]
os: [linux]
'@esbuild/netbsd-arm64@0.25.5':
resolution: {integrity: sha512-pwHtMP9viAy1oHPvgxtOv+OkduK5ugofNTVDilIzBLpoWAM16r7b/mxBvfpuQDpRQFMfuVr5aLcn4yveGvBZvw==}
engines: {node: '>=18'}
cpu: [arm64]
os: [netbsd]
'@esbuild/netbsd-x64@0.25.5':
resolution: {integrity: sha512-WOb5fKrvVTRMfWFNCroYWWklbnXH0Q5rZppjq0vQIdlsQKuw6mdSihwSo4RV/YdQ5UCKKvBy7/0ZZYLBZKIbwQ==}
engines: {node: '>=18'}
cpu: [x64]
os: [netbsd]
'@esbuild/openbsd-arm64@0.25.5':
resolution: {integrity: sha512-7A208+uQKgTxHd0G0uqZO8UjK2R0DDb4fDmERtARjSHWxqMTye4Erz4zZafx7Di9Cv+lNHYuncAkiGFySoD+Mw==}
engines: {node: '>=18'}
cpu: [arm64]
os: [openbsd]
'@esbuild/openbsd-x64@0.25.5':
resolution: {integrity: sha512-G4hE405ErTWraiZ8UiSoesH8DaCsMm0Cay4fsFWOOUcz8b8rC6uCvnagr+gnioEjWn0wC+o1/TAHt+It+MpIMg==}
engines: {node: '>=18'}
cpu: [x64]
os: [openbsd]
'@esbuild/sunos-x64@0.25.5':
resolution: {integrity: sha512-l+azKShMy7FxzY0Rj4RCt5VD/q8mG/e+mDivgspo+yL8zW7qEwctQ6YqKX34DTEleFAvCIUviCFX1SDZRSyMQA==}
engines: {node: '>=18'}
cpu: [x64]
os: [sunos]
'@esbuild/win32-arm64@0.25.5':
resolution: {integrity: sha512-O2S7SNZzdcFG7eFKgvwUEZ2VG9D/sn/eIiz8XRZ1Q/DO5a3s76Xv0mdBzVM5j5R639lXQmPmSo0iRpHqUUrsxw==}
engines: {node: '>=18'}
cpu: [arm64]
os: [win32]
'@esbuild/win32-ia32@0.25.5':
resolution: {integrity: sha512-onOJ02pqs9h1iMJ1PQphR+VZv8qBMQ77Klcsqv9CNW2w6yLqoURLcgERAIurY6QE63bbLuqgP9ATqajFLK5AMQ==}
engines: {node: '>=18'}
cpu: [ia32]
os: [win32]
'@esbuild/win32-x64@0.25.5':
resolution: {integrity: sha512-TXv6YnJ8ZMVdX+SXWVBo/0p8LTcrUYngpWjvm91TMjjBQii7Oz11Lw5lbDV5Y0TzuhSJHwiH4hEtC1I42mMS0g==}
engines: {node: '>=18'}
cpu: [x64]
os: [win32]
'@trycua/computer@0.1.3':
resolution: {integrity: sha512-RTDgULV6wQJuTsiwhei9aQO6YQSM1TBQqOCDUPHUbTIjtRqzMvMdwtcKAKxZZptzJcBX14bWtbucY65Wu6IEFg==}
'@trycua/core@0.1.3':
resolution: {integrity: sha512-sv7BEajJyZ+JNxrOdhao4qCOtRrh+S0XYf64ehAT4UAhLC73Kep06bGa/Uel0Ow5xGXXrg0aiVBL7zO9+/w4/Q==}
'@types/node@22.16.0':
resolution: {integrity: sha512-B2egV9wALML1JCpv3VQoQ+yesQKAmNMBIAY7OteVrikcOcAkWm+dGL6qpeCktPjAv6N1JLnhbNiqS35UpFyBsQ==}
'@types/uuid@10.0.0':
resolution: {integrity: sha512-7gqG38EyHgyP1S+7+xomFtL+ZNHcKv6DwNaCZmJmo1vgMugyF3TCnXVg4t1uk89mLNwnLtnY3TpOpCOyp1/xHQ==}
atomic-sleep@1.0.0:
resolution: {integrity: sha512-kNOjDqAh7px0XWNI+4QbzoiR/nTkHAWNud2uvnJquD1/x5a7EQZMJT0AczqK0Qn67oY/TTQ1LbUKajZpp3I9tQ==}
engines: {node: '>=8.0.0'}
dotenv@16.6.1:
resolution: {integrity: sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==}
engines: {node: '>=12'}
esbuild@0.25.5:
resolution: {integrity: sha512-P8OtKZRv/5J5hhz0cUAdu/cLuPIKXpQl1R9pZtvmHWQvrAUVd0UNIPT4IB4W3rNOqVO0rlqHmCIbSwxh/c9yUQ==}
engines: {node: '>=18'}
hasBin: true
fast-redact@3.5.0:
resolution: {integrity: sha512-dwsoQlS7h9hMeYUq1W++23NDcBLV4KqONnITDV9DjfS3q1SgDGVrBdvvTLUotWtPSD7asWDV9/CmsZPy8Hf70A==}
engines: {node: '>=6'}
fsevents@2.3.3:
resolution: {integrity: sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==}
engines: {node: ^8.16.0 || ^10.6.0 || >=11.0.0}
os: [darwin]
get-tsconfig@4.10.1:
resolution: {integrity: sha512-auHyJ4AgMz7vgS8Hp3N6HXSmlMdUyhSUrfBF16w153rxtLIEOE+HGqaBppczZvnHLqQJfiHotCYpNhl0lUROFQ==}
on-exit-leak-free@2.1.2:
resolution: {integrity: sha512-0eJJY6hXLGf1udHwfNftBqH+g73EU4B504nZeKpz1sYRKafAghwxEJunB2O7rDZkL4PGfsMVnTXZ2EjibbqcsA==}
engines: {node: '>=14.0.0'}
openai@5.8.2:
resolution: {integrity: sha512-8C+nzoHYgyYOXhHGN6r0fcb4SznuEn1R7YZMvlqDbnCuE0FM2mm3T1HiYW6WIcMS/F1Of2up/cSPjLPaWt0X9Q==}
hasBin: true
peerDependencies:
ws: ^8.18.0
zod: ^3.23.8
peerDependenciesMeta:
ws:
optional: true
zod:
optional: true
pino-abstract-transport@2.0.0:
resolution: {integrity: sha512-F63x5tizV6WCh4R6RHyi2Ml+M70DNRXt/+HANowMflpgGFMAym/VKm6G7ZOQRjqN7XbGxK1Lg9t6ZrtzOaivMw==}
pino-std-serializers@7.0.0:
resolution: {integrity: sha512-e906FRY0+tV27iq4juKzSYPbUj2do2X2JX4EzSca1631EB2QJQUqGbDuERal7LCtOpxl6x3+nvo9NPZcmjkiFA==}
pino@9.7.0:
resolution: {integrity: sha512-vnMCM6xZTb1WDmLvtG2lE/2p+t9hDEIvTWJsu6FejkE62vB7gDhvzrpFR4Cw2to+9JNQxVnkAKVPA1KPB98vWg==}
hasBin: true
posthog-node@5.1.1:
resolution: {integrity: sha512-6VISkNdxO24ehXiDA4dugyCSIV7lpGVaEu5kn/dlAj+SJ1lgcDru9PQ8p/+GSXsXVxohd1t7kHL2JKc9NoGb0w==}
engines: {node: '>=20'}
process-warning@5.0.0:
resolution: {integrity: sha512-a39t9ApHNx2L4+HBnQKqxxHNs1r7KF+Intd8Q/g1bUh6q0WIp9voPXJ/x0j+ZL45KF1pJd9+q2jLIRMfvEshkA==}
quick-format-unescaped@4.0.4:
resolution: {integrity: sha512-tYC1Q1hgyRuHgloV/YXs2w15unPVh8qfu/qCTfhTYamaw7fyhumKa2yGpdSo87vY32rIclj+4fWYQXUMs9EHvg==}
real-require@0.2.0:
resolution: {integrity: sha512-57frrGM/OCTLqLOAh0mhVA9VBMHd+9U7Zb2THMGdBUoZVOtGbJzjxsYGDJ3A9AYYCP4hn6y1TVbaOfzWtm5GFg==}
engines: {node: '>= 12.13.0'}
resolve-pkg-maps@1.0.0:
resolution: {integrity: sha512-seS2Tj26TBVOC2NIc2rOe2y2ZO7efxITtLZcGSOnHHNOQ7CkiUBfw0Iw2ck6xkIhPwLhKNLS8BO+hEpngQlqzw==}
safe-stable-stringify@2.5.0:
resolution: {integrity: sha512-b3rppTKm9T+PsVCBEOUR46GWI7fdOs00VKZ1+9c1EWDaDMvjQc6tUwuFyIprgGgTcWoVHSKrU8H31ZHA2e0RHA==}
engines: {node: '>=10'}
sonic-boom@4.2.0:
resolution: {integrity: sha512-INb7TM37/mAcsGmc9hyyI6+QR3rR1zVRu36B0NeGXKnOOLiZOfER5SA+N7X7k3yUYRzLWafduTDvJAfDswwEww==}
split2@4.2.0:
resolution: {integrity: sha512-UcjcJOWknrNkF6PLX83qcHM6KHgVKNkV62Y8a5uYDVv9ydGQVwAHMKqHdJje1VTWpljG0WYpCDhrCdAOYH4TWg==}
engines: {node: '>= 10.x'}
thread-stream@3.1.0:
resolution: {integrity: sha512-OqyPZ9u96VohAyMfJykzmivOrY2wfMSf3C5TtFJVgN+Hm6aj+voFhlK+kZEIv2FBh1X6Xp3DlnCOfEQ3B2J86A==}
tsx@4.20.3:
resolution: {integrity: sha512-qjbnuR9Tr+FJOMBqJCW5ehvIo/buZq7vH7qD7JziU98h6l3qGy0a/yPFjwO+y0/T7GFpNgNAvEcPPVfyT8rrPQ==}
engines: {node: '>=18.0.0'}
hasBin: true
typescript@5.8.3:
resolution: {integrity: sha512-p1diW6TqL9L07nNxvRMM7hMMw4c5XOo/1ibL4aAIGmSAt9slTE1Xgw5KWuof2uTOvCg9BY7ZRi+GaF+7sfgPeQ==}
engines: {node: '>=14.17'}
hasBin: true
undici-types@6.21.0:
resolution: {integrity: sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==}
uuid@11.1.0:
resolution: {integrity: sha512-0/A9rDy9P7cJ+8w1c9WD9V//9Wj15Ce2MPz8Ri6032usz+NfePxx5AcN3bN+r6ZL6jEo066/yNYB3tn4pQEx+A==}
hasBin: true
ws@8.18.3:
resolution: {integrity: sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==}
engines: {node: '>=10.0.0'}
peerDependencies:
bufferutil: ^4.0.1
utf-8-validate: '>=5.0.2'
peerDependenciesMeta:
bufferutil:
optional: true
utf-8-validate:
optional: true
snapshots:
'@esbuild/aix-ppc64@0.25.5':
optional: true
'@esbuild/android-arm64@0.25.5':
optional: true
'@esbuild/android-arm@0.25.5':
optional: true
'@esbuild/android-x64@0.25.5':
optional: true
'@esbuild/darwin-arm64@0.25.5':
optional: true
'@esbuild/darwin-x64@0.25.5':
optional: true
'@esbuild/freebsd-arm64@0.25.5':
optional: true
'@esbuild/freebsd-x64@0.25.5':
optional: true
'@esbuild/linux-arm64@0.25.5':
optional: true
'@esbuild/linux-arm@0.25.5':
optional: true
'@esbuild/linux-ia32@0.25.5':
optional: true
'@esbuild/linux-loong64@0.25.5':
optional: true
'@esbuild/linux-mips64el@0.25.5':
optional: true
'@esbuild/linux-ppc64@0.25.5':
optional: true
'@esbuild/linux-riscv64@0.25.5':
optional: true
'@esbuild/linux-s390x@0.25.5':
optional: true
'@esbuild/linux-x64@0.25.5':
optional: true
'@esbuild/netbsd-arm64@0.25.5':
optional: true
'@esbuild/netbsd-x64@0.25.5':
optional: true
'@esbuild/openbsd-arm64@0.25.5':
optional: true
'@esbuild/openbsd-x64@0.25.5':
optional: true
'@esbuild/sunos-x64@0.25.5':
optional: true
'@esbuild/win32-arm64@0.25.5':
optional: true
'@esbuild/win32-ia32@0.25.5':
optional: true
'@esbuild/win32-x64@0.25.5':
optional: true
'@trycua/computer@0.1.3':
dependencies:
'@trycua/core': 0.1.3
pino: 9.7.0
ws: 8.18.3
transitivePeerDependencies:
- bufferutil
- utf-8-validate
'@trycua/core@0.1.3':
dependencies:
'@types/uuid': 10.0.0
pino: 9.7.0
posthog-node: 5.1.1
uuid: 11.1.0
'@types/node@22.16.0':
dependencies:
undici-types: 6.21.0
'@types/uuid@10.0.0': {}
atomic-sleep@1.0.0: {}
dotenv@16.6.1: {}
esbuild@0.25.5:
optionalDependencies:
'@esbuild/aix-ppc64': 0.25.5
'@esbuild/android-arm': 0.25.5
'@esbuild/android-arm64': 0.25.5
'@esbuild/android-x64': 0.25.5
'@esbuild/darwin-arm64': 0.25.5
'@esbuild/darwin-x64': 0.25.5
'@esbuild/freebsd-arm64': 0.25.5
'@esbuild/freebsd-x64': 0.25.5
'@esbuild/linux-arm': 0.25.5
'@esbuild/linux-arm64': 0.25.5
'@esbuild/linux-ia32': 0.25.5
'@esbuild/linux-loong64': 0.25.5
'@esbuild/linux-mips64el': 0.25.5
'@esbuild/linux-ppc64': 0.25.5
'@esbuild/linux-riscv64': 0.25.5
'@esbuild/linux-s390x': 0.25.5
'@esbuild/linux-x64': 0.25.5
'@esbuild/netbsd-arm64': 0.25.5
'@esbuild/netbsd-x64': 0.25.5
'@esbuild/openbsd-arm64': 0.25.5
'@esbuild/openbsd-x64': 0.25.5
'@esbuild/sunos-x64': 0.25.5
'@esbuild/win32-arm64': 0.25.5
'@esbuild/win32-ia32': 0.25.5
'@esbuild/win32-x64': 0.25.5
fast-redact@3.5.0: {}
fsevents@2.3.3:
optional: true
get-tsconfig@4.10.1:
dependencies:
resolve-pkg-maps: 1.0.0
on-exit-leak-free@2.1.2: {}
openai@5.8.2(ws@8.18.3):
optionalDependencies:
ws: 8.18.3
pino-abstract-transport@2.0.0:
dependencies:
split2: 4.2.0
pino-std-serializers@7.0.0: {}
pino@9.7.0:
dependencies:
atomic-sleep: 1.0.0
fast-redact: 3.5.0
on-exit-leak-free: 2.1.2
pino-abstract-transport: 2.0.0
pino-std-serializers: 7.0.0
process-warning: 5.0.0
quick-format-unescaped: 4.0.4
real-require: 0.2.0
safe-stable-stringify: 2.5.0
sonic-boom: 4.2.0
thread-stream: 3.1.0
posthog-node@5.1.1: {}
process-warning@5.0.0: {}
quick-format-unescaped@4.0.4: {}
real-require@0.2.0: {}
resolve-pkg-maps@1.0.0: {}
safe-stable-stringify@2.5.0: {}
sonic-boom@4.2.0:
dependencies:
atomic-sleep: 1.0.0
split2@4.2.0: {}
thread-stream@3.1.0:
dependencies:
real-require: 0.2.0
tsx@4.20.3:
dependencies:
esbuild: 0.25.5
get-tsconfig: 4.10.1
optionalDependencies:
fsevents: 2.3.3
typescript@5.8.3: {}
undici-types@6.21.0: {}
uuid@11.1.0: {}
ws@8.18.3: {}

View File

@@ -0,0 +1,63 @@
import type { Computer } from "@trycua/computer";
import type OpenAI from "openai";

export async function executeAction(
  computer: Computer,
  action: OpenAI.Responses.ResponseComputerToolCall["action"],
) {
  switch (action.type) {
    case "click": {
      const { x, y, button } = action;
      console.log(`Executing click at (${x}, ${y}) with button '${button}'.`);
      await computer.interface.moveCursor(x, y);
      if (button === "right") await computer.interface.rightClick();
      else await computer.interface.leftClick();
      break;
    }
    case "type": {
      const { text } = action;
      console.log(`Typing text: ${text}`);
      await computer.interface.typeText(text);
      break;
    }
    case "scroll": {
      const { x: locX, y: locY, scroll_x, scroll_y } = action;
      console.log(
        `Scrolling at (${locX}, ${locY}) with offsets (scroll_x=${scroll_x}, scroll_y=${scroll_y}).`,
      );
      await computer.interface.moveCursor(locX, locY);
      await computer.interface.scroll(scroll_x, scroll_y);
      break;
    }
    case "keypress": {
      const { keys } = action;
      for (const key of keys) {
        console.log(`Pressing key: ${key}.`);
        // Map common key names to CUA equivalents
        if (key.toLowerCase() === "enter") {
          await computer.interface.pressKey("return");
        } else if (key.toLowerCase() === "space") {
          await computer.interface.pressKey("space");
        } else {
          await computer.interface.pressKey(key);
        }
      }
      break;
    }
    case "wait": {
      console.log(`Waiting for 3 seconds.`);
      await new Promise((resolve) => setTimeout(resolve, 3 * 1000));
      break;
    }
    case "screenshot": {
      console.log("Taking screenshot.");
      // This is handled automatically in the main loop, but we can take an extra one if requested
      const screenshot = await computer.interface.screenshot();
      return screenshot;
    }
    default:
      console.log(`Unrecognized action: ${action.type}`);
      break;
  }
}

View File

@@ -0,0 +1,104 @@
import { Computer, OSType } from "@trycua/computer";
import OpenAI from "openai";
import { executeAction } from "./helpers";
import "dotenv/config";
const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY });
const COMPUTER_USE_PROMPT = "Open firefox and go to trycua.com";
// Initialize the Computer Connection
const computer = new Computer({
apiKey: process.env.CUA_KEY!,
name: process.env.CUA_CONTAINER_NAME!,
osType: OSType.LINUX,
});
await computer.run();
// Take the initial screenshot
const screenshot = await computer.interface.screenshot();
const screenshotBase64 = screenshot.toString("base64");
// Setup openai config for computer use
const computerUseConfig: OpenAI.Responses.ResponseCreateParamsNonStreaming = {
model: "computer-use-preview",
tools: [
{
type: "computer_use_preview",
display_width: 1024,
display_height: 768,
environment: "linux", // we're using a linux vm
},
],
truncation: "auto",
};
// Send initial screenshot to the openai computer use model
let res = await openai.responses.create({
...computerUseConfig,
input: [
{
role: "user",
content: [
// what we want the ai to do
{ type: "input_text", text: COMPUTER_USE_PROMPT },
// current screenshot of the vm
{
type: "input_image",
image_url: `data:image/png;base64,${screenshotBase64}`,
detail: "auto",
},
],
},
],
});
// Loop until there are no more computer use actions.
while (true) {
const computerCalls = res.output.filter((o) => o.type === "computer_call");
if (computerCalls.length < 1) {
console.log("No more computer calls. Loop complete.");
break;
}
// Get the first call
const call = computerCalls[0];
const action = call.action;
console.log("Received action from OpenAI Responses API:", action);
let ackChecks: OpenAI.Responses.ResponseComputerToolCall.PendingSafetyCheck[] =
[];
if (call.pending_safety_checks.length > 0) {
console.log("Safety checks pending:", call.pending_safety_checks);
// In a real implementation, you would want to get user confirmation here
ackChecks = call.pending_safety_checks;
}
// Execute the action in the container
await executeAction(computer, action);
// Wait 1 second for the action to take effect inside the container
await new Promise((resolve) => setTimeout(resolve, 1000));
// Capture new screenshot
const newScreenshot = await computer.interface.screenshot();
const newScreenshotBase64 = newScreenshot.toString("base64");
// Send the new screenshot back as computer_call_output
res = await openai.responses.create({
...computerUseConfig,
previous_response_id: res.id,
input: [
{
type: "computer_call_output",
call_id: call.call_id,
acknowledged_safety_checks: ackChecks,
output: {
type: "computer_screenshot",
image_url: `data:image/png;base64,${newScreenshotBase64}`,
},
},
],
});
}
process.exit();

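The loop above sends each screenshot to the model as a base64 `data:` URL. A minimal Python sketch of that encoding step, assuming the screenshot is available as raw PNG bytes. The `to_data_url` helper name is hypothetical and only illustrates the string format used in the `image_url` fields.

```python
import base64


def to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URL suitable for an image_url field."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


if __name__ == "__main__":
    # The 8-byte PNG signature stands in for a real screenshot here.
    print(to_data_url(b"\x89PNG\r\n\x1a\n"))
```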
View File

@@ -0,0 +1,29 @@
{
"compilerOptions": {
"target": "esnext",
"lib": [
"es2023"
],
"moduleDetection": "force",
"module": "preserve",
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"resolveJsonModule": true,
"types": [
"node"
],
"allowSyntheticDefaultImports": true,
"strict": true,
"noUnusedLocals": true,
"declaration": true,
"emitDeclarationOnly": true,
"esModuleInterop": true,
"isolatedModules": true,
"verbatimModuleSyntax": true,
"skipLibCheck": true,
"outDir": "build",
},
"include": [
"src"
]
}

View File

@@ -18,7 +18,11 @@ from computer.ui.gradio.app import create_gradio_ui
if __name__ == "__main__":
print("Launching Computer Interface Gradio UI with advanced features...")
app = create_gradio_ui()
app.launch(share=False)
app.launch(
share=False,
server_name="0.0.0.0",
server_port=7860,
)
# Optional: Using the saved dataset
# import datasets

View File

@@ -0,0 +1,51 @@
"""Example of using the Windows Sandbox computer provider.
Learn more at: https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/
"""
import asyncio
from computer import Computer
async def main():
"""Test the Windows Sandbox provider."""
# Create a computer instance using Windows Sandbox
computer = Computer(
provider_type="winsandbox",
os_type="windows",
memory="4GB",
# ephemeral=True, # Always true for Windows Sandbox
)
try:
print("Starting Windows Sandbox...")
await computer.run()
print("Windows Sandbox is ready!")
print(f"IP Address: {await computer.get_ip()}")
# Test basic functionality
print("Testing basic functionality...")
screenshot = await computer.interface.screenshot()
print(f"Screenshot taken: {len(screenshot)} bytes")
# Test running a command
print("Testing command execution...")
stdout, stderr = await computer.interface.run_command("echo Hello from Windows Sandbox!")
print(f"Command output: {stdout}")
print("Press Enter to continue...")
input()
except Exception as e:
print(f"Error: {e}")
import traceback
traceback.print_exc()
finally:
print("Stopping Windows Sandbox...")
await computer.stop()
print("Windows Sandbox stopped.")
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,688 +0,0 @@
import asyncio
import json
import time
from typing import Any, Dict, List, Optional, Tuple
from PIL import Image
import websockets
from ..logger import Logger, LogLevel
from .base import BaseComputerInterface
from ..utils import decode_base64_image, encode_base64_image, bytes_to_image, draw_box, resize_image
from .models import Key, KeyType, MouseButton
class LinuxComputerInterface(BaseComputerInterface):
"""Interface for Linux."""
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume", api_key: Optional[str] = None, vm_name: Optional[str] = None):
super().__init__(ip_address, username, password, api_key, vm_name)
self._ws = None
self._reconnect_task = None
self._closed = False
self._last_ping = 0
self._ping_interval = 5 # Send ping every 5 seconds
self._ping_timeout = 120 # Wait 120 seconds for pong response
self._reconnect_delay = 1 # Start with 1 second delay
self._max_reconnect_delay = 30 # Maximum delay between reconnection attempts
self._log_connection_attempts = True # Flag to control connection attempt logging
self._authenticated = False # Track authentication status
self._command_lock = asyncio.Lock() # Lock to ensure only one command at a time
# Set logger name for Linux interface
self.logger = Logger("cua.interface.linux", LogLevel.NORMAL)
@property
def ws_uri(self) -> str:
"""Get the WebSocket URI using the current IP address.
Returns:
WebSocket URI for the Computer API Server
"""
protocol = "wss" if self.api_key else "ws"
port = "8443" if self.api_key else "8000"
return f"{protocol}://{self.ip_address}:{port}/ws"
async def _keep_alive(self):
"""Keep the WebSocket connection alive with automatic reconnection."""
retry_count = 0
max_log_attempts = 1 # Only log the first attempt at INFO level
log_interval = 500 # Then log every 500th attempt (significantly increased from 30)
last_warning_time = 0
min_warning_interval = 30 # Minimum seconds between connection lost warnings
min_retry_delay = 0.5 # Minimum delay between connection attempts (500ms)
while not self._closed:
try:
if self._ws is None or (
self._ws and self._ws.state == websockets.protocol.State.CLOSED
):
try:
retry_count += 1
# Add a minimum delay between connection attempts to avoid flooding
if retry_count > 1:
await asyncio.sleep(min_retry_delay)
# Only log the first attempt at INFO level, then every Nth attempt
if retry_count == 1:
self.logger.info(f"Attempting WebSocket connection to {self.ws_uri}")
elif retry_count % log_interval == 0:
self.logger.info(
f"Still attempting WebSocket connection (attempt {retry_count})..."
)
else:
# All other attempts are logged at DEBUG level
self.logger.debug(
f"Attempting WebSocket connection to {self.ws_uri} (attempt {retry_count})"
)
self._ws = await asyncio.wait_for(
websockets.connect(
self.ws_uri,
max_size=1024 * 1024 * 10, # 10MB limit
max_queue=32,
ping_interval=self._ping_interval,
ping_timeout=self._ping_timeout,
close_timeout=5,
compression=None, # Disable compression to reduce overhead
),
timeout=120,
)
self.logger.info("WebSocket connection established")
# Authentication will be handled by the first command that needs it
# Don't do authentication here to avoid recv conflicts
self._reconnect_delay = 1 # Reset reconnect delay on successful connection
self._last_ping = time.time()
retry_count = 0 # Reset retry count on successful connection
self._authenticated = False # Reset auth status on new connection
except (asyncio.TimeoutError, websockets.exceptions.WebSocketException) as e:
next_retry = self._reconnect_delay
# Only log the first error at WARNING level, then every Nth attempt
if retry_count == 1:
self.logger.warning(
"Computer API Server not ready yet. Will retry automatically."
)
elif retry_count % log_interval == 0:
self.logger.warning(
f"Still waiting for Computer API Server (attempt {retry_count})..."
)
else:
# All other errors are logged at DEBUG level
self.logger.debug(f"Connection attempt {retry_count} failed: {e}")
if self._ws:
try:
await self._ws.close()
except Exception:
pass
self._ws = None
# Regular ping to check connection
if self._ws and self._ws.state == websockets.protocol.State.OPEN:
try:
if time.time() - self._last_ping >= self._ping_interval:
pong_waiter = await self._ws.ping()
await asyncio.wait_for(pong_waiter, timeout=self._ping_timeout)
self._last_ping = time.time()
except Exception as e:
self.logger.debug(f"Ping failed: {e}")
if self._ws:
try:
await self._ws.close()
except Exception:
pass
self._ws = None
continue
await asyncio.sleep(1)
except Exception as e:
current_time = time.time()
# Only log connection lost warnings at most once every min_warning_interval seconds
if current_time - last_warning_time >= min_warning_interval:
self.logger.warning(
"Computer API Server connection lost. Will retry automatically."
)
last_warning_time = current_time
else:
# Log at debug level instead
self.logger.debug(f"Connection lost: {e}")
if self._ws:
try:
await self._ws.close()
except Exception:
pass
self._ws = None
async def _ensure_connection(self):
"""Ensure WebSocket connection is established."""
if self._reconnect_task is None or self._reconnect_task.done():
self._reconnect_task = asyncio.create_task(self._keep_alive())
retry_count = 0
max_retries = 5
while retry_count < max_retries:
try:
if self._ws and self._ws.state == websockets.protocol.State.OPEN:
return
retry_count += 1
await asyncio.sleep(1)
except Exception as e:
# Only log at ERROR level for the last retry attempt
if retry_count == max_retries - 1:
self.logger.error(
f"Persistent connection check error after {retry_count} attempts: {e}"
)
else:
self.logger.debug(f"Connection check error (attempt {retry_count}): {e}")
retry_count += 1
await asyncio.sleep(1)
continue
raise ConnectionError("Failed to establish WebSocket connection after multiple retries")
async def _send_command(self, command: str, params: Optional[Dict] = None) -> Dict[str, Any]:
"""Send command through WebSocket."""
max_retries = 3
retry_count = 0
last_error = None
# Acquire lock to ensure only one command is processed at a time
async with self._command_lock:
self.logger.debug(f"Acquired lock for command: {command}")
while retry_count < max_retries:
try:
await self._ensure_connection()
if not self._ws:
raise ConnectionError("WebSocket connection is not established")
# Handle authentication if needed
if self.api_key and self.vm_name and not self._authenticated:
self.logger.info("Performing authentication handshake...")
auth_message = {
"command": "authenticate",
"params": {
"api_key": self.api_key,
"container_name": self.vm_name
}
}
await self._ws.send(json.dumps(auth_message))
# Wait for authentication response
auth_response = await asyncio.wait_for(self._ws.recv(), timeout=10)
auth_result = json.loads(auth_response)
if not auth_result.get("success"):
error_msg = auth_result.get("error", "Authentication failed")
self.logger.error(f"Authentication failed: {error_msg}")
self._authenticated = False
raise ConnectionError(f"Authentication failed: {error_msg}")
self.logger.info("Authentication successful")
self._authenticated = True
message = {"command": command, "params": params or {}}
await self._ws.send(json.dumps(message))
response = await asyncio.wait_for(self._ws.recv(), timeout=30)
self.logger.debug(f"Completed command: {command}")
return json.loads(response)
except Exception as e:
last_error = e
retry_count += 1
if retry_count < max_retries:
# Only log at debug level for intermediate retries
self.logger.debug(
f"Command '{command}' failed (attempt {retry_count}/{max_retries}): {e}"
)
await asyncio.sleep(1)
continue
else:
# Only log at error level for the final failure
self.logger.error(
f"Failed to send command '{command}' after {max_retries} retries"
)
self.logger.debug(f"Command failure details: {e}")
raise last_error if last_error else RuntimeError("Failed to send command")
async def wait_for_ready(self, timeout: int = 60, interval: float = 1.0):
"""Wait for WebSocket connection to become available."""
start_time = time.time()
last_error = None
attempt_count = 0
progress_interval = 10 # Log progress every 10 seconds
last_progress_time = start_time
# Disable detailed logging for connection attempts
self._log_connection_attempts = False
try:
self.logger.info(
f"Waiting for Computer API Server to be ready (timeout: {timeout}s)..."
)
# Start the keep-alive task if it's not already running
if self._reconnect_task is None or self._reconnect_task.done():
self._reconnect_task = asyncio.create_task(self._keep_alive())
# Wait for the connection to be established
while time.time() - start_time < timeout:
try:
attempt_count += 1
current_time = time.time()
# Log progress periodically without flooding logs
if current_time - last_progress_time >= progress_interval:
elapsed = current_time - start_time
self.logger.info(
f"Still waiting for Computer API Server... (elapsed: {elapsed:.1f}s, attempts: {attempt_count})"
)
last_progress_time = current_time
# Check if we have a connection
if self._ws and self._ws.state == websockets.protocol.State.OPEN:
# Test the connection with a simple command
try:
await self._send_command("get_screen_size")
elapsed = time.time() - start_time
self.logger.info(
f"Computer API Server is ready (after {elapsed:.1f}s, {attempt_count} attempts)"
)
return # Connection is fully working
except Exception as e:
last_error = e
self.logger.debug(f"Connection test failed: {e}")
# Wait before trying again
await asyncio.sleep(interval)
except Exception as e:
last_error = e
self.logger.debug(f"Connection attempt {attempt_count} failed: {e}")
await asyncio.sleep(interval)
# If we get here, we've timed out
error_msg = f"Could not connect to {self.ip_address} after {timeout} seconds"
if last_error:
error_msg += f": {str(last_error)}"
self.logger.error(error_msg)
raise TimeoutError(error_msg)
finally:
# Reset to default logging behavior
self._log_connection_attempts = True
def close(self):
"""Close WebSocket connection.
Note: In host computer server mode, we leave the connection open
to allow other clients to connect to the same server. The server
will handle cleaning up idle connections.
"""
# Only cancel the reconnect task
if self._reconnect_task:
self._reconnect_task.cancel()
# Don't set closed flag or close websocket by default
# This allows the server to stay connected for other clients
# self._closed = True
# if self._ws:
# asyncio.create_task(self._ws.close())
# self._ws = None
def force_close(self):
"""Force close the WebSocket connection.
This method should be called when you want to completely
shut down the connection, not just for regular cleanup.
"""
self._closed = True
if self._reconnect_task:
self._reconnect_task.cancel()
if self._ws:
asyncio.create_task(self._ws.close())
self._ws = None
# Mouse Actions
async def mouse_down(self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left") -> None:
await self._send_command("mouse_down", {"x": x, "y": y, "button": button})
async def mouse_up(self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left") -> None:
await self._send_command("mouse_up", {"x": x, "y": y, "button": button})
async def left_click(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
await self._send_command("left_click", {"x": x, "y": y})
async def right_click(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
await self._send_command("right_click", {"x": x, "y": y})
async def double_click(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
await self._send_command("double_click", {"x": x, "y": y})
async def move_cursor(self, x: int, y: int) -> None:
await self._send_command("move_cursor", {"x": x, "y": y})
async def drag_to(self, x: int, y: int, button: "MouseButton" = "left", duration: float = 0.5) -> None:
await self._send_command(
"drag_to", {"x": x, "y": y, "button": button, "duration": duration}
)
async def drag(self, path: List[Tuple[int, int]], button: "MouseButton" = "left", duration: float = 0.5) -> None:
await self._send_command(
"drag", {"path": path, "button": button, "duration": duration}
)
# Keyboard Actions
async def key_down(self, key: "KeyType") -> None:
await self._send_command("key_down", {"key": key})
async def key_up(self, key: "KeyType") -> None:
await self._send_command("key_up", {"key": key})
async def type_text(self, text: str) -> None:
# Temporary fix for https://github.com/trycua/cua/issues/165
# Check if text contains Unicode characters
if any(ord(char) > 127 for char in text):
# For Unicode text, use clipboard and paste
await self.set_clipboard(text)
# NOTE: Key.COMMAND is the macOS modifier; on Linux, paste is normally Ctrl+V
await self.hotkey(Key.COMMAND, 'v')
else:
# For ASCII text, use the regular typing method
await self._send_command("type_text", {"text": text})
async def press(self, key: "KeyType") -> None:
"""Press a single key.
Args:
key: The key to press. Can be any of:
- A Key enum value (recommended), e.g. Key.PAGE_DOWN
- A direct key value string, e.g. 'pagedown'
- A single character string, e.g. 'a'
Examples:
```python
# Using enum (recommended)
await interface.press(Key.PAGE_DOWN)
await interface.press(Key.ENTER)
# Using direct values
await interface.press('pagedown')
await interface.press('enter')
# Using single characters
await interface.press('a')
```
Raises:
ValueError: If the key type is invalid or the key is not recognized
"""
if isinstance(key, Key):
actual_key = key.value
elif isinstance(key, str):
# Try to convert to enum if it matches a known key
key_or_enum = Key.from_string(key)
actual_key = key_or_enum.value if isinstance(key_or_enum, Key) else key_or_enum
else:
raise ValueError(f"Invalid key type: {type(key)}. Must be Key enum or string.")
await self._send_command("press_key", {"key": actual_key})
async def press_key(self, key: "KeyType") -> None:
"""DEPRECATED: Use press() instead.
This method is kept for backward compatibility but will be removed in a future version.
Please use the press() method instead.
"""
await self.press(key)
async def hotkey(self, *keys: "KeyType") -> None:
"""Press multiple keys simultaneously.
Args:
*keys: Multiple keys to press simultaneously. Each key can be any of:
- A Key enum value (recommended), e.g. Key.COMMAND
- A direct key value string, e.g. 'command'
- A single character string, e.g. 'a'
Examples:
```python
# Using enums (recommended)
await interface.hotkey(Key.COMMAND, Key.C) # Copy
await interface.hotkey(Key.COMMAND, Key.V) # Paste
# Using mixed formats
await interface.hotkey(Key.COMMAND, 'a') # Select all
```
Raises:
ValueError: If any key type is invalid or not recognized
"""
actual_keys = []
for key in keys:
if isinstance(key, Key):
actual_keys.append(key.value)
elif isinstance(key, str):
# Try to convert to enum if it matches a known key
key_or_enum = Key.from_string(key)
actual_keys.append(key_or_enum.value if isinstance(key_or_enum, Key) else key_or_enum)
else:
raise ValueError(f"Invalid key type: {type(key)}. Must be Key enum or string.")
await self._send_command("hotkey", {"keys": actual_keys})
# Scrolling Actions
async def scroll(self, x: int, y: int) -> None:
await self._send_command("scroll", {"x": x, "y": y})
async def scroll_down(self, clicks: int = 1) -> None:
await self._send_command("scroll_down", {"clicks": clicks})
async def scroll_up(self, clicks: int = 1) -> None:
await self._send_command("scroll_up", {"clicks": clicks})
# Screen Actions
async def screenshot(
self,
boxes: Optional[List[Tuple[int, int, int, int]]] = None,
box_color: str = "#FF0000",
box_thickness: int = 2,
scale_factor: float = 1.0,
) -> bytes:
"""Take a screenshot with optional box drawing and scaling.
Args:
boxes: Optional list of (x, y, width, height) tuples defining boxes to draw in screen coordinates
box_color: Color of the boxes in hex format (default: "#FF0000" red)
box_thickness: Thickness of the box borders in pixels (default: 2)
scale_factor: Factor to scale the final image by (default: 1.0)
Use > 1.0 to enlarge, < 1.0 to shrink (e.g., 0.5 for half size, 2.0 for double)
Returns:
bytes: The screenshot image data, optionally with boxes drawn on it and scaled
"""
result = await self._send_command("screenshot")
if not result.get("image_data"):
raise RuntimeError("Failed to take screenshot")
screenshot = decode_base64_image(result["image_data"])
if boxes:
# Get the natural scaling between screen and screenshot
screen_size = await self.get_screen_size()
screenshot_width, screenshot_height = bytes_to_image(screenshot).size
width_scale = screenshot_width / screen_size["width"]
height_scale = screenshot_height / screen_size["height"]
# Scale box coordinates from screen space to screenshot space
for box in boxes:
scaled_box = (
int(box[0] * width_scale), # x
int(box[1] * height_scale), # y
int(box[2] * width_scale), # width
int(box[3] * height_scale), # height
)
screenshot = draw_box(
screenshot,
x=scaled_box[0],
y=scaled_box[1],
width=scaled_box[2],
height=scaled_box[3],
color=box_color,
thickness=box_thickness,
)
if scale_factor != 1.0:
screenshot = resize_image(screenshot, scale_factor)
return screenshot
async def get_screen_size(self) -> Dict[str, int]:
result = await self._send_command("get_screen_size")
if result["success"] and result["size"]:
return result["size"]
raise RuntimeError("Failed to get screen size")
async def get_cursor_position(self) -> Dict[str, int]:
result = await self._send_command("get_cursor_position")
if result["success"] and result["position"]:
return result["position"]
raise RuntimeError("Failed to get cursor position")
# Clipboard Actions
async def copy_to_clipboard(self) -> str:
result = await self._send_command("copy_to_clipboard")
if result["success"] and result["content"]:
return result["content"]
raise RuntimeError("Failed to get clipboard content")
async def set_clipboard(self, text: str) -> None:
await self._send_command("set_clipboard", {"text": text})
# File System Actions
async def file_exists(self, path: str) -> bool:
result = await self._send_command("file_exists", {"path": path})
return result.get("exists", False)
async def directory_exists(self, path: str) -> bool:
result = await self._send_command("directory_exists", {"path": path})
return result.get("exists", False)
async def list_dir(self, path: str) -> list[str]:
result = await self._send_command("list_dir", {"path": path})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to list directory"))
return result.get("files", [])
async def read_text(self, path: str) -> str:
result = await self._send_command("read_text", {"path": path})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to read file"))
return result.get("content", "")
async def write_text(self, path: str, content: str) -> None:
result = await self._send_command("write_text", {"path": path, "content": content})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to write file"))
async def read_bytes(self, path: str) -> bytes:
result = await self._send_command("read_bytes", {"path": path})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to read file"))
content_b64 = result.get("content_b64", "")
return decode_base64_image(content_b64)
async def write_bytes(self, path: str, content: bytes) -> None:
result = await self._send_command("write_bytes", {"path": path, "content_b64": encode_base64_image(content)})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to write file"))
async def delete_file(self, path: str) -> None:
result = await self._send_command("delete_file", {"path": path})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to delete file"))
async def create_dir(self, path: str) -> None:
result = await self._send_command("create_dir", {"path": path})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to create directory"))
async def delete_dir(self, path: str) -> None:
result = await self._send_command("delete_dir", {"path": path})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to delete directory"))
async def run_command(self, command: str) -> Tuple[str, str]:
result = await self._send_command("run_command", {"command": command})
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to run command"))
return result.get("stdout", ""), result.get("stderr", "")
# Accessibility Actions
async def get_accessibility_tree(self) -> Dict[str, Any]:
"""Get the accessibility tree of the current screen."""
result = await self._send_command("get_accessibility_tree")
if not result.get("success", False):
raise RuntimeError(result.get("error", "Failed to get accessibility tree"))
return result
async def get_active_window_bounds(self) -> Dict[str, int]:
"""Get the bounds of the currently active window."""
result = await self._send_command("get_active_window_bounds")
if result["success"] and result["bounds"]:
return result["bounds"]
raise RuntimeError("Failed to get active window bounds")
async def to_screen_coordinates(self, x: float, y: float) -> tuple[float, float]:
"""Convert screenshot coordinates to screen coordinates.
Args:
x: X coordinate in screenshot space
y: Y coordinate in screenshot space
Returns:
tuple[float, float]: (x, y) coordinates in screen space
"""
screen_size = await self.get_screen_size()
screenshot = await self.screenshot()
screenshot_img = bytes_to_image(screenshot)
screenshot_width, screenshot_height = screenshot_img.size
# Calculate scaling factors
width_scale = screen_size["width"] / screenshot_width
height_scale = screen_size["height"] / screenshot_height
# Convert coordinates
screen_x = x * width_scale
screen_y = y * height_scale
return screen_x, screen_y
async def to_screenshot_coordinates(self, x: float, y: float) -> tuple[float, float]:
"""Convert screen coordinates to screenshot coordinates.
Args:
x: X coordinate in screen space
y: Y coordinate in screen space
Returns:
tuple[float, float]: (x, y) coordinates in screenshot space
"""
screen_size = await self.get_screen_size()
screenshot = await self.screenshot()
screenshot_img = bytes_to_image(screenshot)
screenshot_width, screenshot_height = screenshot_img.size
# Calculate scaling factors
width_scale = screenshot_width / screen_size["width"]
height_scale = screenshot_height / screen_size["height"]
# Convert coordinates
screenshot_x = x * width_scale
screenshot_y = y * height_scale
return screenshot_x, screenshot_y

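`to_screen_coordinates` and `to_screenshot_coordinates` above apply the same proportional scaling in opposite directions. A minimal sketch of that arithmetic as a pure function, with no WebSocket calls involved. The `scale_point` name and its `(width, height)` tuple arguments are assumptions made for illustration.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height)


def scale_point(x: float, y: float, src: Size, dst: Size) -> Tuple[float, float]:
    """Map a point from a space of size `src` into a space of size `dst`."""
    src_w, src_h = src
    dst_w, dst_h = dst
    return x * dst_w / src_w, y * dst_h / src_h


if __name__ == "__main__":
    screenshot_size = (2048, 1536)  # e.g. a screenshot captured at 2x
    screen_size = (1024, 768)
    # Screenshot space -> screen space
    print(scale_point(512, 384, screenshot_size, screen_size))
    # Screen space -> screenshot space
    print(scale_point(256, 192, screen_size, screenshot_size))
```

Swapping `src` and `dst` gives the inverse mapping, so one helper covers both directions.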
View File

@@ -72,12 +72,23 @@ cp -f .build/release/lume "$TEMP_ROOT/usr/local/bin/"
# Build the installer package
log "essential" "Building installer package..."
pkgbuild --root "$TEMP_ROOT" \
if ! pkgbuild --root "$TEMP_ROOT" \
--identifier "com.trycua.lume" \
--version "1.0" \
--install-location "/" \
--sign "$CERT_INSTALLER_NAME" \
./.release/lume.pkg 2> /dev/null
./.release/lume.pkg; then
log "error" "Failed to build installer package"
exit 1
fi
# Verify the package was created
if [ ! -f "./.release/lume.pkg" ]; then
log "error" "Package file ./.release/lume.pkg was not created"
exit 1
fi
log "essential" "Package created successfully"
# Submit for notarization using stored credentials
log "essential" "Submitting for notarization..."
@@ -89,24 +100,33 @@ if [ "$LOG_LEVEL" = "minimal" ] || [ "$LOG_LEVEL" = "none" ]; then
--password "${APP_SPECIFIC_PASSWORD}" \
--wait 2>&1)
# Just show success or failure
# Check if notarization was successful
if echo "$NOTARY_OUTPUT" | grep -q "status: Accepted"; then
log "essential" "Notarization successful!"
else
log "error" "Notarization failed. Please check logs."
log "error" "Notarization output:"
echo "$NOTARY_OUTPUT"
exit 1
fi
else
# Normal verbose output
xcrun notarytool submit ./.release/lume.pkg \
if ! xcrun notarytool submit ./.release/lume.pkg \
--apple-id "${APPLE_ID}" \
--team-id "${TEAM_ID}" \
--password "${APP_SPECIFIC_PASSWORD}" \
--wait
--wait; then
log "error" "Notarization failed"
exit 1
fi
fi
# Staple the notarization ticket
log "essential" "Stapling notarization ticket..."
xcrun stapler staple ./.release/lume.pkg > /dev/null 2>&1
if ! xcrun stapler staple ./.release/lume.pkg > /dev/null 2>&1; then
log "error" "Failed to staple notarization ticket"
exit 1
fi
# Create temporary directory for package extraction
EXTRACT_ROOT=$(mktemp -d)

View File

@@ -34,10 +34,7 @@ pip install "cua-agent[anthropic]" # Anthropic Cua Loop
pip install "cua-agent[uitars]" # UI-Tars support
pip install "cua-agent[omni]" # Cua Loop based on OmniParser (includes Ollama for local models)
pip install "cua-agent[ui]" # Gradio UI for the agent
# For local UI-TARS with MLX support, you need to manually install mlx-vlm:
pip install "cua-agent[uitars-mlx]"
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id # PR: https://github.com/Blaizzy/mlx-vlm/pull/349
pip install "cua-agent[uitars-mlx]" # MLX UI-Tars support
```
## Run

View File

@@ -6,7 +6,7 @@ import logging
__version__ = "0.1.0"
# Initialize logging
logger = logging.getLogger("cua.agent")
logger = logging.getLogger("agent")
# Initialize telemetry when the package is imported
try:

View File

@@ -11,10 +11,8 @@ from .types import AgentResponse
from .factory import LoopFactory
from .provider_config import DEFAULT_MODELS, ENV_VARS
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class ComputerAgent:
"""A computer agent that can perform automated tasks using natural language instructions."""

View File

@@ -81,16 +81,27 @@ class StandardMessageManager:
if not self.config.num_images_to_keep:
return messages
# Find user messages with images
# Find messages with images (both user messages and tool call outputs)
image_messages = []
for msg in messages:
has_image = False
# Check user messages with images
if msg["role"] == "user" and isinstance(msg["content"], list):
has_image = any(
item.get("type") == "image_url" or item.get("type") == "image"
for item in msg["content"]
)
if has_image:
image_messages.append(msg)
# Check assistant messages with tool calls that have images
elif msg["role"] == "assistant" and isinstance(msg["content"], list):
for item in msg["content"]:
if item.get("type") == "tool_result" and "base64_image" in item:
has_image = True
break
if has_image:
image_messages.append(msg)
# If we don't have more images than the limit, return all messages
if len(image_messages) <= self.config.num_images_to_keep:
@@ -100,13 +111,35 @@ class StandardMessageManager:
images_to_keep = image_messages[-self.config.num_images_to_keep :]
images_to_remove = image_messages[: -self.config.num_images_to_keep]
# Create a new message list without the older images
# Create a new message list, removing images from older messages
result = []
for msg in messages:
if msg in images_to_remove:
# Skip this message
continue
result.append(msg)
# Remove images from this message but keep the text content
if msg["role"] == "user" and isinstance(msg["content"], list):
# Keep only text content, remove images
new_content = [
item for item in msg["content"]
if item.get("type") not in ["image_url", "image"]
]
if new_content: # Only add if there's still content
result.append({"role": msg["role"], "content": new_content})
elif msg["role"] == "assistant" and isinstance(msg["content"], list):
# Remove base64_image from tool_result items
new_content = []
for item in msg["content"]:
if item.get("type") == "tool_result" and "base64_image" in item:
# Create a copy without the base64_image
new_item = {k: v for k, v in item.items() if k != "base64_image"}
new_content.append(new_item)
else:
new_content.append(item)
result.append({"role": msg["role"], "content": new_content})
else:
# For other message types, keep as is
result.append(msg)
else:
result.append(msg)
return result

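The hunk above changes image pruning so that older messages lose only their images and keep their text. A self-contained sketch of that behaviour for user-style content lists, under stated assumptions: `keep_recent_images` is a hypothetical helper, it selects messages by index rather than by equality, and it leaves out the `tool_result` / `base64_image` handling shown in the diff.

```python
from typing import Any, Dict, List

IMAGE_TYPES = {"image", "image_url"}
Message = Dict[str, Any]


def is_image(item: Any) -> bool:
    """True for content items whose type marks them as an image."""
    return isinstance(item, dict) and item.get("type") in IMAGE_TYPES


def has_image(msg: Message) -> bool:
    """True if the message content is a list containing at least one image item."""
    content = msg.get("content")
    return isinstance(content, list) and any(is_image(item) for item in content)


def keep_recent_images(messages: List[Message], keep: int) -> List[Message]:
    """Strip images from all but the `keep` most recent image-bearing messages."""
    image_idx = [i for i, m in enumerate(messages) if has_image(m)]
    if keep <= 0 or len(image_idx) <= keep:
        return messages  # nothing to prune
    to_strip = set(image_idx[:-keep])
    result: List[Message] = []
    for i, msg in enumerate(messages):
        if i not in to_strip:
            result.append(msg)
            continue
        remaining = [item for item in msg["content"] if not is_image(item)]
        if remaining:  # drop the message only if nothing but images was in it
            result.append({**msg, "content": remaining})
    return result
```

Tracking indices avoids a side effect of `msg in images_to_remove`: list membership compares by equality, so two identical messages would both match even if only the older one should be pruned.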
View File

@@ -34,7 +34,7 @@ flush = _default_flush
is_telemetry_enabled = _default_is_telemetry_enabled
is_telemetry_globally_disabled = _default_is_telemetry_globally_disabled
logger = logging.getLogger("cua.agent.telemetry")
logger = logging.getLogger("agent.telemetry")
try:
# Import from core telemetry

View File

@@ -50,8 +50,8 @@ class BashTool(BaseBashTool, BaseAnthropicTool):
try:
async with asyncio.timeout(self._timeout):
stdout, stderr = await self.computer.interface.run_command(command)
return CLIResult(output=stdout or "", error=stderr or "")
result = await self.computer.interface.run_command(command)
return CLIResult(output=result.stdout or "", error=result.stderr or "")
except asyncio.TimeoutError as e:
raise ToolError(f"Command timed out after {self._timeout} seconds") from e
except Exception as e:

@@ -205,26 +205,6 @@ class ComputerTool(BaseComputerTool, BaseAnthropicTool):
self.logger.info(f" Coordinates: ({x}, {y})")
try:
# Take pre-action screenshot to get current dimensions
pre_screenshot = await self.computer.interface.screenshot()
pre_img = Image.open(io.BytesIO(pre_screenshot))
# Scale image to match screen dimensions if needed
if pre_img.size != (self.width, self.height):
self.logger.info(
f"Scaling image from {pre_img.size} to {self.width}x{self.height} to match screen dimensions"
)
if not isinstance(self.width, int) or not isinstance(self.height, int):
raise ToolError("Screen dimensions must be integers")
size = (int(self.width), int(self.height))
pre_img = pre_img.resize(size, Image.Resampling.LANCZOS)
# Save the scaled image back to bytes
buffer = io.BytesIO()
pre_img.save(buffer, format="PNG")
pre_screenshot = buffer.getvalue()
self.logger.info(f" Current dimensions: {pre_img.width}x{pre_img.height}")
# Perform the click action
if action == "left_click":
self.logger.info(f"Clicking at ({x}, {y})")
@@ -242,45 +222,14 @@ class ComputerTool(BaseComputerTool, BaseAnthropicTool):
# Wait briefly for any UI changes
await asyncio.sleep(0.5)
# Take and save post-action screenshot
post_screenshot = await self.computer.interface.screenshot()
post_img = Image.open(io.BytesIO(post_screenshot))
# Scale post-action image if needed
if post_img.size != (self.width, self.height):
self.logger.info(
f"Scaling post-action image from {post_img.size} to {self.width}x{self.height}"
)
post_img = post_img.resize(
(self.width, self.height), Image.Resampling.LANCZOS
)
buffer = io.BytesIO()
post_img.save(buffer, format="PNG")
post_screenshot = buffer.getvalue()
return ToolResult(
output=f"Performed {action} at ({x}, {y})",
base64_image=base64.b64encode(post_screenshot).decode(),
)
except Exception as e:
self.logger.error(f"Error during {action} action: {str(e)}")
raise ToolError(f"Failed to perform {action}: {str(e)}")
else:
try:
# Take pre-action screenshot
pre_screenshot = await self.computer.interface.screenshot()
pre_img = Image.open(io.BytesIO(pre_screenshot))
# Scale image if needed
if pre_img.size != (self.width, self.height):
self.logger.info(
f"Scaling image from {pre_img.size} to {self.width}x{self.height}"
)
if not isinstance(self.width, int) or not isinstance(self.height, int):
raise ToolError("Screen dimensions must be integers")
size = (int(self.width), int(self.height))
pre_img = pre_img.resize(size, Image.Resampling.LANCZOS)
# Perform the click action
if action == "left_click":
self.logger.info("Performing left click at current position")
@@ -295,25 +244,8 @@ class ComputerTool(BaseComputerTool, BaseAnthropicTool):
# Wait briefly for any UI changes
await asyncio.sleep(0.5)
# Take post-action screenshot
post_screenshot = await self.computer.interface.screenshot()
post_img = Image.open(io.BytesIO(post_screenshot))
# Scale post-action image if needed
if post_img.size != (self.width, self.height):
self.logger.info(
f"Scaling post-action image from {post_img.size} to {self.width}x{self.height}"
)
post_img = post_img.resize(
(self.width, self.height), Image.Resampling.LANCZOS
)
buffer = io.BytesIO()
post_img.save(buffer, format="PNG")
post_screenshot = buffer.getvalue()
return ToolResult(
output=f"Performed {action} at current position",
base64_image=base64.b64encode(post_screenshot).decode(),
)
except Exception as e:
self.logger.error(f"Error during {action} action: {str(e)}")
@@ -328,20 +260,6 @@ class ComputerTool(BaseComputerTool, BaseAnthropicTool):
raise ToolError(f"{text} must be a string")
try:
# Take pre-action screenshot
pre_screenshot = await self.computer.interface.screenshot()
pre_img = Image.open(io.BytesIO(pre_screenshot))
# Scale image if needed
if pre_img.size != (self.width, self.height):
self.logger.info(
f"Scaling image from {pre_img.size} to {self.width}x{self.height}"
)
if not isinstance(self.width, int) or not isinstance(self.height, int):
raise ToolError("Screen dimensions must be integers")
size = (int(self.width), int(self.height))
pre_img = pre_img.resize(size, Image.Resampling.LANCZOS)
if action == "key":
# Special handling for page up/down on macOS
if text.lower() in ["pagedown", "page_down", "page down"]:
@@ -378,25 +296,8 @@ class ComputerTool(BaseComputerTool, BaseAnthropicTool):
# Wait briefly for UI changes
await asyncio.sleep(0.5)
# Take post-action screenshot
post_screenshot = await self.computer.interface.screenshot()
post_img = Image.open(io.BytesIO(post_screenshot))
# Scale post-action image if needed
if post_img.size != (self.width, self.height):
self.logger.info(
f"Scaling post-action image from {post_img.size} to {self.width}x{self.height}"
)
post_img = post_img.resize(
(self.width, self.height), Image.Resampling.LANCZOS
)
buffer = io.BytesIO()
post_img.save(buffer, format="PNG")
post_screenshot = buffer.getvalue()
return ToolResult(
output=f"Pressed key: {output_text}",
base64_image=base64.b64encode(post_screenshot).decode(),
)
elif action == "type":
@@ -406,66 +307,13 @@ class ComputerTool(BaseComputerTool, BaseAnthropicTool):
# Wait briefly for UI changes
await asyncio.sleep(0.5)
# Take post-action screenshot
post_screenshot = await self.computer.interface.screenshot()
post_img = Image.open(io.BytesIO(post_screenshot))
# Scale post-action image if needed
if post_img.size != (self.width, self.height):
self.logger.info(
f"Scaling post-action image from {post_img.size} to {self.width}x{self.height}"
)
post_img = post_img.resize(
(self.width, self.height), Image.Resampling.LANCZOS
)
buffer = io.BytesIO()
post_img.save(buffer, format="PNG")
post_screenshot = buffer.getvalue()
return ToolResult(
output=f"Typed text: {text}",
base64_image=base64.b64encode(post_screenshot).decode(),
)
except Exception as e:
self.logger.error(f"Error during {action} action: {str(e)}")
raise ToolError(f"Failed to perform {action}: {str(e)}")
elif action in ("screenshot", "cursor_position"):
if text is not None:
raise ToolError(f"text is not accepted for {action}")
if coordinate is not None:
raise ToolError(f"coordinate is not accepted for {action}")
try:
if action == "screenshot":
# Take screenshot
screenshot = await self.computer.interface.screenshot()
img = Image.open(io.BytesIO(screenshot))
# Scale image if needed
if img.size != (self.width, self.height):
self.logger.info(
f"Scaling image from {img.size} to {self.width}x{self.height}"
)
if not isinstance(self.width, int) or not isinstance(self.height, int):
raise ToolError("Screen dimensions must be integers")
size = (int(self.width), int(self.height))
img = img.resize(size, Image.Resampling.LANCZOS)
buffer = io.BytesIO()
img.save(buffer, format="PNG")
screenshot = buffer.getvalue()
return ToolResult(base64_image=base64.b64encode(screenshot).decode())
elif action == "cursor_position":
pos = await self.computer.interface.get_cursor_position()
x, y = pos # Unpack the tuple
return ToolResult(output=f"X={int(x)},Y={int(y)}")
except Exception as e:
self.logger.error(f"Error during {action} action: {str(e)}")
raise ToolError(f"Failed to perform {action}: {str(e)}")
elif action == "scroll":
# Implement scroll action
direction = kwargs.get("direction", "down")
@@ -487,28 +335,20 @@ class ComputerTool(BaseComputerTool, BaseAnthropicTool):
# Wait briefly for UI changes
await asyncio.sleep(0.5)
# Take post-action screenshot
post_screenshot = await self.computer.interface.screenshot()
post_img = Image.open(io.BytesIO(post_screenshot))
# Scale post-action image if needed
if post_img.size != (self.width, self.height):
self.logger.info(
f"Scaling post-action image from {post_img.size} to {self.width}x{self.height}"
)
post_img = post_img.resize((self.width, self.height), Image.Resampling.LANCZOS)
buffer = io.BytesIO()
post_img.save(buffer, format="PNG")
post_screenshot = buffer.getvalue()
return ToolResult(
output=f"Scrolled {direction} by {amount} steps",
base64_image=base64.b64encode(post_screenshot).decode(),
)
except Exception as e:
self.logger.error(f"Error during scroll action: {str(e)}")
raise ToolError(f"Failed to perform scroll: {str(e)}")
elif action == "screenshot":
# Take screenshot
return await self.screenshot()
elif action == "cursor_position":
pos = await self.computer.interface.get_cursor_position()
x, y = pos # Unpack the tuple
return ToolResult(output=f"X={int(x)},Y={int(y)}")
raise ToolError(f"Invalid action: {action}")
async def screenshot(self):

@@ -95,13 +95,13 @@ class EditTool(BaseEditTool, BaseAnthropicTool):
result = await self.computer.interface.run_command(
f'[ -e "{str(path)}" ] && echo "exists" || echo "not exists"'
)
exists = result[0].strip() == "exists"
exists = result.stdout.strip() == "exists"
if exists:
result = await self.computer.interface.run_command(
f'[ -d "{str(path)}" ] && echo "dir" || echo "file"'
)
is_dir = result[0].strip() == "dir"
is_dir = result.stdout.strip() == "dir"
else:
is_dir = False
@@ -126,7 +126,7 @@ class EditTool(BaseEditTool, BaseAnthropicTool):
result = await self.computer.interface.run_command(
f'[ -d "{str(path)}" ] && echo "dir" || echo "file"'
)
is_dir = result[0].strip() == "dir"
is_dir = result.stdout.strip() == "dir"
if is_dir:
if view_range:
@@ -136,7 +136,7 @@ class EditTool(BaseEditTool, BaseAnthropicTool):
# List directory contents using ls
result = await self.computer.interface.run_command(f'ls -la "{str(path)}"')
contents = result[0]
contents = result.stdout
if contents:
stdout = f"Here's the files and directories in {path}:\n{contents}\n"
else:
@@ -272,9 +272,9 @@ class EditTool(BaseEditTool, BaseAnthropicTool):
"""Read the content of a file using cat command."""
try:
result = await self.computer.interface.run_command(f'cat "{str(path)}"')
if result[1]: # If there's stderr output
raise ToolError(f"Error reading file: {result[1]}")
return result[0]
if result.stderr: # If there's stderr output
raise ToolError(f"Error reading file: {result.stderr}")
return result.stdout
except Exception as e:
raise ToolError(f"Failed to read {path}: {str(e)}")
@@ -291,8 +291,8 @@ class EditTool(BaseEditTool, BaseAnthropicTool):
{content}
EOFCUA"""
result = await self.computer.interface.run_command(cmd)
if result[1]: # If there's stderr output
raise ToolError(f"Error writing file: {result[1]}")
if result.stderr: # If there's stderr output
raise ToolError(f"Error writing file: {result.stderr}")
except Exception as e:
raise ToolError(f"Failed to write to {path}: {str(e)}")
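
The `BashTool` and `EditTool` hunks above switch from indexing the return value of `run_command` as a tuple (`result[0]`, `result[1]`) to reading named attributes (`result.stdout`, `result.stderr`). The sketch below illustrates that calling pattern in isolation. `CommandResult`, `FakeInterface`, and `read_file` are hypothetical stand-ins written for this example; the real interface class and its return type are not shown in this diff.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Hypothetical result object exposing named stdout/stderr fields."""
    stdout: str = ""
    stderr: str = ""


class FakeInterface:
    """Hypothetical in-memory stand-in for `computer.interface`."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    async def run_command(self, command: str) -> CommandResult:
        # Only understands `cat "<path>"`, which is enough for the example.
        path = command.removeprefix('cat "').removesuffix('"')
        if path in self.files:
            return CommandResult(stdout=self.files[path])
        return CommandResult(stderr=f"cat: {path}: No such file or directory")


async def read_file(interface: FakeInterface, path: str) -> str:
    """Read a file via `cat`, treating any stderr output as a failure."""
    result = await interface.run_command(f'cat "{path}"')
    if result.stderr:  # Named attribute instead of result[1]
        raise RuntimeError(f"Error reading file: {result.stderr}")
    return result.stdout  # Named attribute instead of result[0]


if __name__ == "__main__":
    iface = FakeInterface({"/tmp/a.txt": "hello\n"})
    print(asyncio.run(read_file(iface, "/tmp/a.txt")), end="")
```

Named fields make call sites self-describing and let the result type gain fields later without breaking positional unpacking.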

@@ -26,10 +26,8 @@ from .api_handler import OmniAPIHandler
from .tools.manager import ToolManager
from .tools import ToolResult
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def extract_data(input_string: str, data_type: str) -> str:
"""Extract content from code blocks."""
pattern = f"```{data_type}" + r"(.*?)(```|$)"

Some files were not shown because too many files have changed in this diff.