First migration of readme files and doc files to the libraries.

This commit is contained in:
Morgan Dean
2025-06-11 11:41:07 -04:00
parent b8cc1d3e3c
commit e68e716f90
16 changed files with 1799 additions and 1 deletion

Binary file added (image, 1.1 MiB).

Binary file added (image, 161 KiB).

View File

@@ -0,0 +1,219 @@
---
title: Agent
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333" alt="Python" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://pypi.org/project/cua-computer/"><img src="https://img.shields.io/pypi/v/cua-computer?color=333333" alt="PyPI" /></a>
</div>
**cua-agent** is a general Computer-Use framework for running multi-app agentic workflows targeting macOS and Linux sandboxes created with Cua, supporting local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).
### Get started with Agent
<div align="center">
<img src="./agent.png"/>
</div>
## Install
```bash
pip install "cua-agent[all]"
# or install specific loop providers
pip install "cua-agent[openai]" # OpenAI Cua Loop
pip install "cua-agent[anthropic]" # Anthropic Cua Loop
pip install "cua-agent[uitars]" # UI-Tars support
pip install "cua-agent[omni]" # Cua Loop based on OmniParser (includes Ollama for local models)
pip install "cua-agent[ui]" # Gradio UI for the agent
# For local UI-TARS with MLX support, you need to manually install mlx-vlm:
pip install "cua-agent[uitars-mlx]"
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id # PR: https://github.com/Blaizzy/mlx-vlm/pull/349
```
## Run
```python
import asyncio

from computer import Computer
from agent import ComputerAgent, AgentLoop, LLM, LLMProvider


async def main():
    async with Computer() as macos_computer:
        # Create agent with loop and provider
        agent = ComputerAgent(
            computer=macos_computer,
            loop=AgentLoop.OPENAI,
            model=LLM(provider=LLMProvider.OPENAI)
            # or
            # loop=AgentLoop.ANTHROPIC,
            # model=LLM(provider=LLMProvider.ANTHROPIC)
            # or
            # loop=AgentLoop.OMNI,
            # model=LLM(provider=LLMProvider.OLLAMA, name="gemma3")
            # or
            # loop=AgentLoop.UITARS,
            # model=LLM(provider=LLMProvider.OAICOMPAT, name="ByteDance-Seed/UI-TARS-1.5-7B", provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1")
        )

        tasks = [
            "Look for a repository named trycua/cua on GitHub.",
            "Check the open issues, open the most recent one and read it.",
            "Clone the repository in users/lume/projects if it doesn't exist yet.",
            "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
            "From Cursor, open Composer if not already open.",
            "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
        ]

        for i, task in enumerate(tasks):
            print(f"\nExecuting task {i + 1}/{len(tasks)}: {task}")
            async for result in agent.run(task):
                print(result)
            print(f"\n✅ Task {i + 1}/{len(tasks)} completed: {task}")


asyncio.run(main())
```
Refer to these notebooks for step-by-step guides on how to use the Computer-Use Agent (CUA):
- [Agent Notebook](https://github.com/trycua/cua/tree/main/notebooks/agent_nb.ipynb) - Complete examples and workflows
## Using the Gradio UI
The agent includes a Gradio-based user interface for easier interaction.
<div align="center">
<img src="./agent_gradio_ui.png"/>
</div>
To use it:
```bash
# Install with Gradio support
pip install "cua-agent[ui]"
```
### Create a simple launcher script
```python
# launch_ui.py
from agent.ui.gradio.app import create_gradio_ui
app = create_gradio_ui()
app.launch(share=False)
```
### Setting up API Keys
For the Gradio UI to show available models, you need to set API keys as environment variables:
```bash
# For OpenAI models
export OPENAI_API_KEY=your_openai_key_here
# For Anthropic models
export ANTHROPIC_API_KEY=your_anthropic_key_here
# Launch with both keys set
OPENAI_API_KEY=your_key ANTHROPIC_API_KEY=your_key python launch_ui.py
```
Without these environment variables, the UI will show "No models available" for the corresponding providers, but you can still use local models with the OMNI loop provider.
### Using Local Models
You can use local models with the OMNI loop provider by selecting "Custom model..." from the dropdown. The default provider URL is set to `http://localhost:1234/v1` which works with LM Studio.
If you're using a different local model server, substitute its base URL (a code sketch follows this list):
- vLLM: `http://localhost:8000/v1`
- LocalAI: `http://localhost:8080/v1`
- Ollama with OpenAI compat API: `http://localhost:11434/v1`
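In code (outside the Gradio UI), pointing the OMNI loop at a local OpenAI-compatible server is just a matter of setting the base URL. A minimal sketch, reusing the names from the Run example above; the model name is a placeholder for whatever your server is actually hosting:
```python
# Minimal sketch: OMNI loop against a local OpenAI-compatible server.
# "your-local-model" is a placeholder, not a real model name.
agent = ComputerAgent(
    computer=macos_computer,
    loop=AgentLoop.OMNI,
    model=LLM(
        provider=LLMProvider.OAICOMPAT,
        name="your-local-model",
        provider_base_url="http://localhost:1234/v1",  # LM Studio default; swap for vLLM/LocalAI/Ollama
    ),
)
```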
The Gradio UI provides:
- Selection of different agent loops (OpenAI, Anthropic, OMNI)
- Model selection for each provider
- Configuration of agent parameters
- Chat interface for interacting with the agent
### Using UI-TARS
The UI-TARS models are available in two forms:
1. **MLX UI-TARS models** (Default): These models run locally using MLXVLM provider
- `mlx-community/UI-TARS-1.5-7B-4bit` (default) - 4-bit quantized version
- `mlx-community/UI-TARS-1.5-7B-6bit` - 6-bit quantized version for higher quality
```python
agent = ComputerAgent(
computer=macos_computer,
loop=AgentLoop.UITARS,
model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-4bit")
)
```
2. **OpenAI-compatible UI-TARS**: For using the original ByteDance model
- If you want to use the original ByteDance UI-TARS model via an OpenAI-compatible API, follow the [deployment guide](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md)
- This will give you a provider URL like `https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1` which you can use in the code or Gradio UI:
```python
agent = ComputerAgent(
computer=macos_computer,
loop=AgentLoop.UITARS,
model=LLM(provider=LLMProvider.OAICOMPAT, name="tgi",
provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1")
)
```
## Agent Loops
The `cua-agent` package provides four agent loop variations, based on different CUA model providers and techniques:
| Agent Loop | Supported Models | Description | Set-Of-Marks |
|:-----------|:-----------------|:------------|:-------------|
| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br/>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
| `AgentLoop.UITARS` | • `mlx-community/UI-TARS-1.5-7B-4bit` (default)<br/>• `mlx-community/UI-TARS-1.5-7B-6bit`<br/>• `ByteDance-Seed/UI-TARS-1.5-7B` (via OpenAI-compatible endpoint) | Uses UI-TARS models with MLXVLM (default) or OAICOMPAT providers | Not Required |
| `AgentLoop.OMNI` | • `claude-3-5-sonnet-20240620`<br/>• `claude-3-7-sonnet-20250219`<br/>• `gpt-4.5-preview`<br/>• `gpt-4o`<br/>• `gpt-4`<br/>• `phi4`<br/>• `phi4-mini`<br/>• `gemma3`<br/>• `...`<br/>• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |
## AgentResponse
The `AgentResponse` class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new [OpenAI Agent SDK specification](https://platform.openai.com/docs/api-reference/responses) for better consistency across different agent loops.
```python
async for result in agent.run(task):
    print("Response ID: ", result.get("id"))

    # Print detailed usage information
    usage = result.get("usage")
    if usage:
        print("\nUsage Details:")
        print(f"  Input Tokens: {usage.get('input_tokens')}")
        if "input_tokens_details" in usage:
            print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
        print(f"  Output Tokens: {usage.get('output_tokens')}")
        if "output_tokens_details" in usage:
            print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
        print(f"  Total Tokens: {usage.get('total_tokens')}")

    print("Response Text: ", result.get("text"))

    # Print tools information
    tools = result.get("tools")
    if tools:
        print("\nTools:")
        print(tools)

    # Print reasoning and tool call outputs
    outputs = result.get("output", [])
    for output in outputs:
        output_type = output.get("type")
        if output_type == "reasoning":
            print("\nReasoning Output:")
            print(output)
        elif output_type == "computer_call":
            print("\nTool Call Output:")
            print(output)
```
**Note on Settings Persistence:**
* The Gradio UI automatically saves your configuration (Agent Loop, Model Choice, Custom Base URL, Save Trajectory state, Recent Images count) to a file named `.gradio_settings.json` in the project's root directory when you successfully run a task. An illustrative example appears after this list.
* This allows your preferences to persist between sessions.
* API keys entered into the custom provider field are **not** saved in this file for security reasons. Manage API keys using environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) or a `.env` file.
* It's recommended to add `.gradio_settings.json` to your `.gitignore` file.
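For illustration only, the saved file might look like the following; the field names below are hypothetical, only the kinds of values mirror the list above:
```json
{
  "agent_loop": "OMNI",
  "model_choice": "gemma3",
  "custom_base_url": "http://localhost:1234/v1",
  "save_trajectory": true,
  "recent_images": 3
}
```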

View File

@@ -0,0 +1,32 @@
---
title: Computer Server
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333" alt="Python" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://pypi.org/project/cua-computer-server/"><img src="https://img.shields.io/pypi/v/cua-computer-server?color=333333" alt="PyPI" /></a>
</div>
**Computer Server** is the server component of the Computer-Use Interface (CUI) framework powering Cua. It drives local macOS and Linux sandboxes, is PyAutoGUI-compatible, and is pluggable with any AI agent system (Cua, Langchain, CrewAI, AutoGen).
## Features
- WebSocket API for computer-use
- Cross-platform support (macOS, Linux)
- Integration with CUA computer library for screen control, keyboard/mouse automation, and accessibility
## Install
To install the Computer-Use Interface (CUI):
```bash
pip install cua-computer-server
```
## Run
Refer to this notebook for a step-by-step guide on how to use the Computer-Use Server on the host system or VM:
- [Computer-Use Server](https://github.com/trycua/cua/tree/main/notebooks/samples/computer_server_nb.ipynb)
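As a sketch, assuming the package exposes a Python module entry point (an assumption, not confirmed on this page), starting the server on the host or VM could look like:
```bash
# Assumption: the package is launchable as a Python module
python -m computer_server
```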

Binary file added (image, 1.6 MiB).

View File

@@ -0,0 +1,138 @@
---
title: Computer
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333" alt="Python" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://pypi.org/project/cua-computer/"><img src="https://img.shields.io/pypi/v/cua-computer?color=333333" alt="PyPI" /></a>
</div>
**cua-computer** is a Computer-Use Interface (CUI) framework powering Cua. It drives local macOS and Linux sandboxes, is PyAutoGUI-compatible, and is pluggable with any AI agent system (Cua, Langchain, CrewAI, AutoGen). Computer relies on [Lume](https://github.com/trycua/lume) for creating and managing sandbox environments.
### Get started with Computer
<div align="center">
<img src="./computer.png"/>
</div>
```python
from computer import Computer

computer = Computer(os_type="macos", display="1024x768", memory="8GB", cpu="4")

try:
    await computer.run()

    screenshot = await computer.interface.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(screenshot)

    await computer.interface.move_cursor(100, 100)
    await computer.interface.left_click()
    await computer.interface.right_click(300, 300)
    await computer.interface.double_click(400, 400)

    await computer.interface.type("Hello, World!")
    await computer.interface.press_key("enter")

    await computer.interface.set_clipboard("Test clipboard")
    content = await computer.interface.copy_to_clipboard()
    print(f"Clipboard content: {content}")
finally:
    await computer.stop()
```
## Install
To install the Computer-Use Interface (CUI):
```bash
pip install "cua-computer[all]"
```
The `cua-computer` PyPI package automatically pulls the latest executable version of Lume through [pylume](https://github.com/trycua/pylume).
## Run
Refer to this notebook for a step-by-step guide on how to use the Computer-Use Interface (CUI):
- [Computer-Use Interface (CUI)](https://github.com/trycua/cua/tree/main/notebooks/samples/computer_nb.ipynb)
## Using the Gradio Computer UI
The computer module includes a Gradio UI for creating and sharing demonstration data. A built-in Huggingface upload feature makes it easy to contribute community datasets for training better computer-use models.
```bash
# Install with UI support
pip install "cua-computer[ui]"
```
> **Note:** For precise control of the computer, we recommend using VNC or Screen Sharing instead of the Computer Gradio UI.
### Building and Sharing Demonstrations with Huggingface
Follow these steps to contribute your own demonstrations:
#### 1. Set up Huggingface Access
Set your HF_TOKEN in a .env file or in your environment variables:
```bash
# In .env file
HF_TOKEN=your_huggingface_token
```
#### 2. Launch the Computer UI
```python
# launch_ui.py
from computer.ui.gradio.app import create_gradio_ui
from dotenv import load_dotenv
load_dotenv('.env')
app = create_gradio_ui()
app.launch(share=False)
```
For examples, see [Computer UI Examples](https://github.com/trycua/cua/tree/main/examples/computer_ui_examples.py)
#### 3. Record Your Tasks
<details open>
<summary>View demonstration video</summary>
<video src="https://github.com/user-attachments/assets/de3c3477-62fe-413c-998d-4063e48de176" controls width="600"></video>
</details>
Record yourself performing various computer tasks using the UI.
#### 4. Save Your Demonstrations
<details open>
<summary>View demonstration video</summary>
<video src="https://github.com/user-attachments/assets/5ad1df37-026a-457f-8b49-922ae805faef" controls width="600"></video>
</details>
Save each task by picking a descriptive name and adding relevant tags (e.g., "office", "web-browsing", "coding").
#### 5. Record Additional Demonstrations
Repeat steps 3 and 4 until you have a good amount of demonstrations covering different tasks and scenarios.
#### 6. Upload to Huggingface
<details open>
<summary>View demonstration video</summary>
<video src="https://github.com/user-attachments/assets/c586d460-3877-4b5f-a736-3248886d2134" controls width="600"></video>
</details>
Upload your dataset to Huggingface by:
- Naming it as `{your_username}/{dataset_name}`
- Choosing public or private visibility
- Optionally selecting specific tags to upload only tasks with certain tags
#### Examples and Resources
- Example Dataset: [ddupont/test-dataset](https://huggingface.co/datasets/ddupont/test-dataset)
- Find Community Datasets: 🔍 [Browse CUA Datasets on Huggingface](https://huggingface.co/datasets?other=cua)

View File

@@ -0,0 +1,390 @@
---
title: API Reference
---
<details open>
<summary><strong>Create VM</strong> - POST /vms</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"name": "lume_vm",
"os": "macOS",
"cpu": 2,
"memory": "4GB",
"diskSize": "64GB",
"display": "1024x768",
"ipsw": "latest",
"storage": "ssd"
}' \
http://localhost:7777/lume/vms
```
</details>
<details open>
<summary><strong>Run VM</strong> - POST /vms/:name/run</summary>
```bash
# Basic run
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
http://localhost:7777/lume/vms/my-vm-name/run
# Run with VNC client started and shared directory
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"noDisplay": false,
"sharedDirectories": [
{
"hostPath": "~/Projects",
"readOnly": false
}
],
"recoveryMode": false,
"storage": "ssd"
}' \
http://localhost:7777/lume/vms/lume_vm/run
```
</details>
<details open>
<summary><strong>List VMs</strong> - GET /vms</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
http://localhost:7777/lume/vms
```
```json
[
{
"name": "my-vm",
"state": "stopped",
"os": "macOS",
"cpu": 2,
"memory": "4GB",
"diskSize": "64GB"
},
{
"name": "my-vm-2",
"state": "stopped",
"os": "linux",
"cpu": 2,
"memory": "4GB",
"diskSize": "64GB"
}
]
```
</details>
<details open>
<summary><strong>Get VM Details</strong> - GET /vms/:name</summary>
```bash
# Basic get
curl --connect-timeout 6000 \
--max-time 5000 \
http://localhost:7777/lume/vms/lume_vm
# Get with storage location specified
curl --connect-timeout 6000 \
--max-time 5000 \
http://localhost:7777/lume/vms/lume_vm?storage=ssd
```
```json
{
"name": "lume_vm",
"state": "running",
"os": "macOS",
"cpu": 2,
"memory": "4GB",
"diskSize": "64GB"
}
```
</details>
<details open>
<summary><strong>Update VM Settings</strong> - PATCH /vms/:name</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X PATCH \
-H "Content-Type: application/json" \
-d '{
"cpu": 4,
"memory": "8GB",
"diskSize": "128GB",
"storage": "ssd"
}' \
http://localhost:7777/lume/vms/my-vm-name
```
</details>
<details open>
<summary><strong>Stop VM</strong> - POST /vms/:name/stop</summary>
```bash
# Basic stop
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
http://localhost:7777/lume/vms/my-vm-name/stop
# Stop with storage location specified
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
http://localhost:7777/lume/vms/my-vm-name/stop?storage=ssd
```
</details>
<details open>
<summary><strong>Delete VM</strong> - DELETE /vms/:name</summary>
```bash
# Basic delete
curl --connect-timeout 6000 \
--max-time 5000 \
-X DELETE \
http://localhost:7777/lume/vms/my-vm-name
# Delete with storage location specified
curl --connect-timeout 6000 \
--max-time 5000 \
-X DELETE \
http://localhost:7777/lume/vms/my-vm-name?storage=ssd
```
</details>
<details open>
<summary><strong>Pull Image</strong> - POST /pull</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"image": "macos-sequoia-vanilla:latest",
"name": "my-vm-name",
"registry": "ghcr.io",
"organization": "trycua",
"storage": "ssd"
}' \
http://localhost:7777/lume/pull
```
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"image": "macos-sequoia-vanilla:15.2",
"name": "macos-sequoia-vanilla"
}' \
http://localhost:7777/lume/pull
```
</details>
<details open>
<summary><strong>Push Image (Async)</strong> - POST /vms/push</summary>
```bash
# Push VM 'my-local-vm' to 'my-org/my-image:latest' and 'my-org/my-image:v1'
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"name": "my-local-vm",
"imageName": "my-image",
"tags": ["latest", "v1"],
"organization": "my-org",
"registry": "ghcr.io",
"chunkSizeMb": 512,
"storage": null
}' \
http://localhost:7777/lume/vms/push
```
**Response (202 Accepted):**
```json
{
"message": "Push initiated in background",
"name": "my-local-vm",
"imageName": "my-image",
"tags": [
"latest",
"v1"
]
}
```
</details>
<details open>
<summary><strong>Clone VM</strong> - POST /vms/:name/clone</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"name": "source-vm",
"newName": "cloned-vm",
"sourceLocation": "default",
"destLocation": "ssd"
}' \
http://localhost:7777/lume/vms/clone
```
</details>
<details open>
<summary><strong>Get Latest IPSW URL</strong> - GET /ipsw</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
http://localhost:7777/lume/ipsw
```
</details>
<details open>
<summary><strong>List Images</strong> - GET /images</summary>
```bash
# List images with default organization (trycua)
curl --connect-timeout 6000 \
--max-time 5000 \
http://localhost:7777/lume/images
```
```json
{
"local": [
"macos-sequoia-xcode:latest",
"macos-sequoia-vanilla:latest"
]
}
```
</details>
<details open>
<summary><strong>Prune Images</strong> - POST /lume/prune</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
http://localhost:7777/lume/prune
```
</details>
<details open>
<summary><strong>Get Configuration</strong> - GET /lume/config</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
http://localhost:7777/lume/config
```
```json
{
"homeDirectory": "~/.lume",
"cacheDirectory": "~/.lume/cache",
"cachingEnabled": true
}
```
</details>
<details open>
<summary><strong>Update Configuration</strong> - POST /lume/config</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"homeDirectory": "~/custom/lume",
"cacheDirectory": "~/custom/lume/cache",
"cachingEnabled": true
}' \
http://localhost:7777/lume/config
```
</details>
<details open>
<summary><strong>Get VM Storage Locations</strong> - GET /lume/config/locations</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
http://localhost:7777/lume/config/locations
```
```json
[
{
"name": "default",
"path": "~/.lume/vms",
"isDefault": true
},
{
"name": "ssd",
"path": "/Volumes/SSD/lume/vms",
"isDefault": false
}
]
```
</details>
<details open>
<summary><strong>Add VM Storage Location</strong> - POST /lume/config/locations</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"name": "ssd",
"path": "/Volumes/SSD/lume/vms"
}' \
http://localhost:7777/lume/config/locations
```
</details>
<details open>
<summary><strong>Remove VM Storage Location</strong> - DELETE /lume/config/locations/:name</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X DELETE \
http://localhost:7777/lume/config/locations/ssd
```
</details>
<details open>
<summary><strong>Set Default VM Storage Location</strong> - POST /lume/config/locations/default/:name</summary>
```bash
curl --connect-timeout 6000 \
--max-time 5000 \
-X POST \
http://localhost:7777/lume/config/locations/default/ssd
```
</details>

View File

@@ -0,0 +1,48 @@
---
title: Development Guide
---
# Development Guide
This guide will help you set up your development environment and understand the process for contributing code to lume.
## Environment Setup
Lume development requires:
- Swift 6 or higher
- Xcode 15 or higher
- macOS Sequoia 15.2 or higher
- (Optional) VS Code with Swift extension
## Setting Up the Repository Locally
1. **Fork the Repository**: Create your own fork of lume
2. **Clone the Repository**:
```bash
git clone https://github.com/trycua/lume.git
cd lume
```
3. **Install Dependencies**:
```bash
swift package resolve
```
4. **Build the Project**:
```bash
swift build
```
## Development Workflow
1. Create a new branch for your changes
2. Make your changes
3. Run the tests: `swift test`
4. Build and test your changes locally
5. Commit your changes with clear commit messages
## Submitting Pull Requests
1. Push your changes to your fork
2. Open a Pull Request with:
- A clear title and description
- Reference to any related issues
- Screenshots or logs if relevant
3. Respond to any feedback from maintainers

View File

@@ -0,0 +1,117 @@
---
title: FAQs
---
# FAQs
### Where are the VMs stored?
VMs are stored in `~/.lume` by default. You can configure additional storage locations using the `lume config` command.
### How are images cached?
Images are cached in `~/.lume/cache`. Running `lume pull <image>` first checks whether the image is already cached; if not, it downloads and caches the image, removing any older versions.
### Where is the configuration file stored?
Lume follows the XDG Base Directory specification for the configuration file:
- Configuration is stored in `$XDG_CONFIG_HOME/lume/config.yaml` (defaults to `~/.config/lume/config.yaml`)
By default, other data is stored in:
- VM data: `~/.lume`
- Cache files: `~/.lume/cache`
The config file contains settings for:
- VM storage locations and the default location
- Cache directory location
- Whether caching is enabled
You can view and modify these settings using the `lume config` commands:
```bash
# View current configuration
lume config get
# Manage VM storage locations
lume config storage list # List all VM storage locations
lume config storage add <name> <path> # Add a new VM storage location
lume config storage remove <name> # Remove a VM storage location
lume config storage default <name> # Set the default VM storage location
# Manage cache settings
lume config cache get # Get current cache directory
lume config cache set <path> # Set cache directory
# Manage image caching settings
lume config caching get # Show current caching status
lume config caching set <boolean> # Enable or disable image caching
```
### How do I use multiple VM storage locations?
Lume supports storing VMs in different locations (e.g., internal drive, external SSD). After configuring storage locations, you can specify which location to use with the `--storage` parameter in various commands:
```bash
# Create a VM in a specific storage location
lume create my-vm --os macos --ipsw latest --storage ssd
# Run a VM from a specific storage location
lume run my-vm --storage ssd
# Delete a VM from a specific storage location
lume delete my-vm --storage ssd
# Pull an image to a specific storage location
lume pull macos-sequoia-vanilla:latest --name my-vm --storage ssd
# Clone a VM between storage locations
lume clone source-vm cloned-vm --source-storage default --dest-storage ssd
```
If you don't specify a storage location, Lume will use the default one or search across all configured locations.
### Are VM disks taking up all the disk space?
No, macOS uses sparse files, which only allocate space as needed. For example, VM disks totaling 50 GB may only use 20 GB on disk.
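You can verify this by comparing a disk image's logical size against its actual allocation; the path below is illustrative:
```bash
# Logical size vs. actual on-disk usage of a sparse VM disk (illustrative path)
ls -lh ~/.lume/my-vm/disk.img   # logical size, e.g. 50G
du -h  ~/.lume/my-vm/disk.img   # blocks actually allocated, e.g. 20G
```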
### How do I get the latest macOS restore image URL?
```bash
lume ipsw
```
### How do I delete a VM?
```bash
lume delete <name>
```
### How to Install macOS from an IPSW Image
#### Create a new macOS VM using the latest supported IPSW image:
Run the following command to create a new macOS virtual machine using the latest available IPSW image:
```bash
lume create <name> --os macos --ipsw latest
```
#### Create a new macOS VM using a specific IPSW image:
To create a macOS virtual machine from an older or specific IPSW file, first download the desired IPSW (UniversalMac) from a trusted source.
Then, use the downloaded IPSW path:
```bash
lume create <name> --os macos --ipsw <downloaded_ipsw_path>
```
### How do I install a custom Linux image?
The process for creating a custom Linux image differs from macOS, since IPSW restore files are not used. You need to create a Linux VM first, then mount a setup image file to the VM for the first boot.
```bash
lume create <name> --os linux
lume run <name> --mount <path-to-setup-image>
lume run <name>
```

Binary file added (image, 374 KiB).

View File

@@ -0,0 +1,201 @@
---
title: Lume
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Swift_6-F54A2A?logo=swift&logoColor=white&labelColor=F54A2A" alt="Swift 6" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
</div>
**lume** is a lightweight Command Line Interface and local API server to create, run and manage macOS and Linux virtual machines (VMs) with near-native performance on Apple Silicon, using Apple's `Virtualization.Framework`.
### Run prebuilt macOS images in just 1 step
<div align="center">
<img src="./cli.png" alt="lume cli"/>
</div>
```bash
lume run macos-sequoia-vanilla:latest
```
## Development Environment
If you're working on Lume in the context of the CUA monorepo, we recommend using the dedicated VS Code workspace configuration:
```bash
# Open VS Code workspace from the root of the monorepo
code .vscode/lume.code-workspace
```
This workspace is preconfigured with Swift language support, build tasks, and debug configurations.
## Usage
```bash
lume <command>
Commands:
lume create <name> Create a new macOS or Linux VM
lume run <name> Run a VM
lume ls List all VMs
lume get <name> Get detailed information about a VM
lume set <name> Modify VM configuration
lume stop <name> Stop a running VM
lume delete <name> Delete a VM
lume pull <image> Pull a macOS image from container registry
lume push <name> <image:tag> Push a VM image to a container registry
lume clone <name> <new-name> Clone an existing VM
lume config Get or set lume configuration
lume images List available macOS images in local cache
lume ipsw Get the latest macOS restore image URL
lume prune Remove cached images
lume serve Start the API server
Options:
--help Show help [boolean]
--version Show version number [boolean]
Command Options:
create:
--os <os> Operating system to install (macOS or linux, default: macOS)
--cpu <cores> Number of CPU cores (default: 4)
--memory <size> Memory size, e.g., 8GB (default: 4GB)
--disk-size <size> Disk size, e.g., 50GB (default: 40GB)
--display <res> Display resolution (default: 1024x768)
--ipsw <path> Path to IPSW file or 'latest' for macOS VMs
--storage <name> VM storage location to use
run:
--no-display Do not start the VNC client app
--shared-dir <dir> Share directory with VM (format: path[:ro|rw])
--mount <path> For Linux VMs only, attach a read-only disk image
--registry <url> Container registry URL (default: ghcr.io)
--organization <org> Organization to pull from (default: trycua)
--vnc-port <port> Port to use for the VNC server (default: 0 for auto-assign)
--recovery-mode <boolean> For MacOS VMs only, start VM in recovery mode (default: false)
--storage <name> VM storage location to use
set:
--cpu <cores> New number of CPU cores (e.g., 4)
--memory <size> New memory size (e.g., 8192MB or 8GB)
--disk-size <size> New disk size (e.g., 40960MB or 40GB)
--display <res> New display resolution in format WIDTHxHEIGHT (e.g., 1024x768)
--storage <name> VM storage location to use
delete:
--force Force deletion without confirmation
--storage <name> VM storage location to use
pull:
--registry <url> Container registry URL (default: ghcr.io)
--organization <org> Organization to pull from (default: trycua)
--storage <name> VM storage location to use
push:
--additional-tags <tags...> Additional tags to push the same image to
--registry <url> Container registry URL (default: ghcr.io)
--organization <org> Organization/user to push to (default: trycua)
--storage <name> VM storage location to use
--chunk-size-mb <size> Chunk size for disk image upload in MB (default: 512)
--verbose Enable verbose logging
--dry-run Prepare files and show plan without uploading
--reassemble Verify integrity by reassembling chunks (requires --dry-run)
get:
-f, --format <format> Output format (json|text)
--storage <name> VM storage location to use
stop:
--storage <name> VM storage location to use
clone:
--source-storage <name> Source VM storage location
--dest-storage <name> Destination VM storage location
config:
get Get current configuration
storage Manage VM storage locations
add <name> <path> Add a new VM storage location
remove <name> Remove a VM storage location
list List all VM storage locations
default <name> Set the default VM storage location
cache Manage cache settings
get Get current cache directory
set <path> Set cache directory
caching Manage image caching settings
get Show current caching status
set <boolean> Enable or disable image caching
serve:
--port <port> Port to listen on (default: 7777)
```
## Install
Install with a single command:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
By default, Lume is installed as a background service that starts automatically on login. If you prefer to start the Lume API service manually when needed, you can use the `--no-background-service` option:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh) --no-background-service"
```
**Note:** With this option, you'll need to manually start the Lume API service by running `lume serve` in your terminal whenever you need to use tools or libraries that rely on the Lume API (such as the Computer-Use Agent).
You can also download the `lume.pkg.tar.gz` archive from the [latest release](https://github.com/trycua/lume/releases), extract it, and install the package manually.
## Prebuilt Images
Pre-built images are available in the registry [ghcr.io/trycua](https://github.com/orgs/trycua/packages).
**Important Note (v0.2.0+):** Images are being re-uploaded with sparse file system optimizations enabled, resulting in significantly lower actual disk usage. Older images (without the `-sparse` suffix) are now **deprecated**. The last version of `lume` fully supporting the non-sparse images was `v0.1.x`. Starting from `v0.2.0`, lume will automatically pull images optimized with sparse file system support.
These images come with an SSH server pre-configured and auto-login enabled.
For the security of your VM, change the default password `lume` immediately after your first login.
| Image | Tag | Description | Logical Size |
|-------|------------|-------------|------|
| `macos-sequoia-vanilla` | `latest`, `15.2` | macOS Sequoia 15.2 image | 20GB |
| `macos-sequoia-xcode` | `latest`, `15.2` | macOS Sequoia 15.2 image with Xcode command line tools | 22GB |
| `macos-sequoia-cua` | `latest`, `15.3` | macOS Sequoia 15.3 image compatible with the Computer interface | 24GB |
| `ubuntu-noble-vanilla` | `latest`, `24.04.1` | [Ubuntu Server for ARM 24.04.1 LTS](https://ubuntu.com/download/server/arm) with Ubuntu Desktop | 20GB |
For additional disk space, resize the VM disk after pulling the image using the `lume set <name> --disk-size <size>` command. Note that the actual disk space used by sparse images will be much lower than the logical size listed.
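For example, to grow the disk of a pulled VM (using the `lume set` syntax from the Usage section; the VM name is whatever you chose at pull time):
```bash
# Resize the VM disk to 100GB; sparse files mean only used blocks consume space
lume set my-vm --disk-size 100GB
```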
## Local API Server
`lume` exposes a local HTTP API server that listens on `http://localhost:7777/lume`, enabling automated management of VMs.
```bash
lume serve
```
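As a quick smoke test once the server is up, list VMs over the local endpoint (this call is documented in the API Reference):
```bash
curl http://localhost:7777/lume/vms
```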
For detailed API documentation, please refer to [API Reference](./API-Reference).
## Docs
- [API Reference](./API-Reference)
- [Development](./Development)
- [FAQ](./FAQ)
## Contributing
We welcome and greatly appreciate contributions to lume! Whether you're improving documentation, adding new features, fixing bugs, or adding new VM images, your efforts help make lume better for everyone. For detailed instructions on how to contribute, please refer to our [Contributing Guidelines](CONTRIBUTING.md).
Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas or get assistance.
## License
lume is open-sourced under the MIT License - see the [LICENSE](https://github.com/trycua/cua/blob/main/LICENSE.md) file for details.
## Trademarks
Apple, macOS, and Apple Silicon are trademarks of Apple Inc. Ubuntu and Canonical are registered trademarks of Canonical Ltd. This project is not affiliated with, endorsed by, or sponsored by Apple Inc. or Canonical Ltd.

View File

@@ -0,0 +1,259 @@
---
title: Lumier
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Swift_6-F54A2A?logo=swift&logoColor=white&labelColor=F54A2A" alt="Swift 6" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
</div>
macOS and Linux virtual machines in a Docker container.
<div align="center">
<video src="https://github.com/user-attachments/assets/2ecca01c-cb6f-4c35-a5a7-69bc58bd94e2" width="800" controls></video>
</div>
## What is Lumier?
**Lumier** is an interface for running macOS virtual machines with minimal setup. It uses Docker as a packaging system to deliver a pre-configured environment that connects to the `lume` virtualization service running on your host machine. With Lumier, you get:
- A ready-to-use macOS or Linux virtual machine in minutes
- Browser-based VNC access to your VM
- Easy file sharing between your host and VM
- Simple configuration through environment variables
## Requirements
Before using Lumier, make sure you have:
1. **Docker for Apple Silicon** - download it [here](https://desktop.docker.com/mac/main/arm64/Docker.dmg) and follow the installation instructions.
2. **Lume** - This is the virtualization CLI that powers Lumier. Install it with this command:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```
After installation, Lume runs as a background service and listens on port 7777. This service allows Lumier to create and manage virtual machines. If port 7777 is already in use on your system, you can specify a different port with the `--port` option when running the `install.sh` script.
## How It Works
> **Note:** We're using Docker primarily as a convenient delivery mechanism, not as an isolation layer. Unlike traditional Docker containers, Lumier leverages the Apple Virtualization Framework (Apple Vz) through the `lume` CLI to create true virtual machines.
Here's what's happening behind the scenes:
1. The Docker container provides a consistent environment to run the Lumier interface
2. Lumier connects to the Lume service running on your host Mac
3. Lume uses Apple's Virtualization Framework to create a true macOS virtual machine
4. The VM runs with hardware acceleration using your Mac's native virtualization capabilities
## Getting Started
```bash
# Run the container with temporary storage (using pre-built image from Docker Hub)
docker run -it --rm \
--name macos-vm \
-p 8006:8006 \
-e VM_NAME=macos-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
trycua/lumier:latest
```
After running the command above, you can access your macOS VM through a web browser (e.g., http://localhost:8006).
> **Note:** With the basic setup above, your VM will be reset when you stop the container (ephemeral mode). This means any changes you make inside the macOS VM will be lost. See the section below for how to save your VM state.
## Saving Your VM State
To save your VM state between sessions (so your changes persist when you stop and restart the container), you'll need to set up a storage location:
```bash
# First, create a storage directory if it doesn't exist
mkdir -p storage
# Then run the container with persistent storage
docker run -it --rm \
--name lumier-vm \
-p 8006:8006 \
-v $(pwd)/storage:/storage \
-e VM_NAME=lumier-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
-e HOST_STORAGE_PATH=$(pwd)/storage \
trycua/lumier:latest
```
This command creates a connection between a folder on your Mac (`$(pwd)/storage`) and a folder inside the Docker container (`/storage`). The `-v` flag (volume mount) and the `HOST_STORAGE_PATH` variable work together to ensure your VM data is saved on your host Mac.
## Sharing Files with Your VM
To share files between your Mac and the virtual machine, you can set up a shared folder:
```bash
# Create both storage and shared folders
mkdir -p storage shared
# Run with both persistent storage and a shared folder
docker run -it --rm \
--name lumier-vm \
-p 8006:8006 \
-v $(pwd)/storage:/storage \
-v $(pwd)/shared:/shared \
-e VM_NAME=lumier-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
-e HOST_STORAGE_PATH=$(pwd)/storage \
-e HOST_SHARED_PATH=$(pwd)/shared \
trycua/lumier:latest
```
With this setup, any files you place in the `shared` folder on your Mac will be accessible from within the macOS VM, and vice versa.
## Automating VM Startup with on-logon.sh
You can automatically run scripts when the VM starts up by placing an `on-logon.sh` script in the shared folder's lifecycle directory. This is useful for setting up your VM environment each time it starts.
```bash
# Create the lifecycle directory in your shared folder
mkdir -p shared/lifecycle
# Create a sample on-logon.sh script
cat > shared/lifecycle/on-logon.sh << 'EOF'
#!/usr/bin/env bash
# Create a file on the desktop
echo "Hello from Lumier!" > /Users/lume/Desktop/hello_lume.txt
# You can add more commands to execute at VM startup
# For example:
# - Configure environment variables
# - Start applications
# - Mount network drives
# - Set up development environments
EOF
# Make the script executable
chmod +x shared/lifecycle/on-logon.sh
```
The script will be automatically executed when the VM starts up. It runs in the VM context and has access to:
- The `/Users/lume` user directory (home directory in the VM)
- The shared folder at `/Volumes/My Shared Files` inside the VM
- Any resources available to the VM
This feature enables automation of VM setup without modifying the base VM image.
## Using Docker Compose
You can also use Docker Compose to run Lumier with a simple configuration file. Create a `docker-compose.yml` file with the following content:
```yaml
version: '3'
services:
lumier:
image: trycua/lumier:latest
container_name: lumier-vm
restart: unless-stopped
ports:
- "8006:8006" # Port for VNC access
volumes:
- ./storage:/storage # VM persistent storage
- ./shared:/shared # Shared folder accessible in the VM
environment:
- VM_NAME=lumier-vm
- VERSION=ghcr.io/trycua/macos-sequoia-cua:latest
- CPU_CORES=4
- RAM_SIZE=8192
- HOST_STORAGE_PATH=${PWD}/storage
- HOST_SHARED_PATH=${PWD}/shared
stop_signal: SIGINT
stop_grace_period: 2m
```
Then run Lumier using:
```bash
# First create the required directories
mkdir -p storage shared
# Start the container
docker-compose up -d
# View the logs
docker-compose logs -f
# Stop the container when done
docker-compose down
```
## Building and Customizing Lumier
If you want to customize the Lumier container or build it from source, you can follow these steps:
```bash
# 1. Navigate to the Lumier directory
cd libs/lumier
# 2. Build the Docker image locally
docker build -t lumier-custom:latest .
# 3. Run your custom build
docker run -it --rm \
--name lumier-vm \
-p 8006:8006 \
-e VM_NAME=lumier-vm \
-e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
-e CPU_CORES=4 \
-e RAM_SIZE=8192 \
lumier-custom:latest
```
### Customization Options
The Dockerfile provides several customization points:
1. **Base image**: The container uses Debian Bullseye Slim as the base. You can modify this if needed.
2. **Installed packages**: You can add or remove packages in the apt-get install list.
3. **Hooks**: Check the `/run/hooks/` directory for scripts that run at specific points during VM lifecycle.
4. **Configuration**: Review `/run/config/constants.sh` for default settings.
After making your modifications, you can build and push your custom image to your own Docker Hub repository:
```bash
# Build with a custom tag
docker build -t yourusername/lumier:custom .
# Push to Docker Hub (after docker login)
docker push yourusername/lumier:custom
```
## Configuration Options
When running Lumier, you'll need to configure a few things:
- **Port forwarding** (`-p 8006:8006`): Makes the VM's VNC interface accessible in your browser. If port 8006 is already in use, you can use a different port like `-p 8007:8006`.
- **Environment variables** (`-e`): Configure your VM settings:
- `VM_NAME`: A name for your virtual machine
- `VERSION`: The macOS image to use
- `CPU_CORES`: Number of CPU cores to allocate
- `RAM_SIZE`: Memory in MB to allocate
- `HOST_STORAGE_PATH`: Path to save VM state (when using persistent storage)
- `HOST_SHARED_PATH`: Path to the shared folder (optional)
- **Background service**: The `lume serve` service should be running on your host (starts automatically when you install Lume using the `install.sh` script above).
## Credits
This project was inspired by [dockur/windows](https://github.com/dockur/windows) and [dockur/macos](https://github.com/dockur/macos), which pioneered the approach of running Windows and macOS VMs in Docker containers.
Main differences from dockur/macos:
- Lumier is specifically designed for macOS virtualization
- Lumier supports Apple Silicon (M1/M2/M3/M4) while dockur/macos only supports Intel
- Lumier uses the Apple Virtualization Framework (Vz) through the `lume` CLI to create true virtual machines, while dockur relies on KVM.
- Image specification is different, with Lumier and Lume relying on Apple Vz spec (disk.img and nvram.bin)

View File

@@ -0,0 +1,161 @@
---
title: MCP Server
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Swift_6-F54A2A?logo=swift&logoColor=white&labelColor=F54A2A" alt="Swift 6" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://pypi.org/project/cua-computer/"><img src="https://img.shields.io/pypi/v/cua-computer?color=333333" alt="Python" /></a>
</div>
**cua-mcp-server** is an MCP server for the Computer-Use Agent (CUA), allowing you to run CUA through Claude Desktop or other MCP clients.
## Prerequisites
Before installing the MCP server, you'll need to set up the full Computer-Use Agent capabilities as described in [Option 2 of the main README](../../README.md#option-2-full-computer-use-agent-capabilities). This includes:
1. Installing the Lume CLI
2. Pulling the latest macOS CUA image
3. Starting the Lume daemon service
4. Installing the required Python libraries (Optional: only needed if you want to verify the agent is working before installing MCP server)
Make sure these steps are completed and working before proceeding with the MCP server installation.
## Installation
Install the package from PyPI:
```bash
pip install cua-mcp-server
```
This will install:
- The MCP server
- CUA agent and computer dependencies
- An executable `cua-mcp-server` script in your PATH
## Easy Setup Script
If you want to simplify installation, you can use this one-liner to download and run the installation script:
```bash
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/mcp-server/scripts/install_mcp_server.sh | bash
```
This script will:
- Create the ~/.cua directory if it doesn't exist
- Generate a startup script at ~/.cua/start_mcp_server.sh
- Make the script executable
- The startup script automatically manages Python virtual environments and installs/updates the cua-mcp-server package
You can then use the script in your MCP configuration like this:
```json
{
"mcpServers": {
"cua-agent": {
"command": "/bin/bash",
"args": ["~/.cua/start_mcp_server.sh"],
"env": {
"CUA_AGENT_LOOP": "OMNI",
"CUA_MODEL_PROVIDER": "ANTHROPIC",
"CUA_MODEL_NAME": "claude-3-7-sonnet-20250219",
"CUA_PROVIDER_API_KEY": "your-api-key"
}
}
}
}
```
## Development Guide
If you want to develop with the cua-mcp-server directly without installation, you can use this configuration:
```json
{
"mcpServers": {
"cua-agent": {
"command": "/bin/bash",
"args": ["~/cua/libs/mcp-server/scripts/start_mcp_server.sh"],
"env": {
"CUA_AGENT_LOOP": "UITARS",
"CUA_MODEL_PROVIDER": "OAICOMPAT",
"CUA_MODEL_NAME": "ByteDance-Seed/UI-TARS-1.5-7B",
"CUA_PROVIDER_BASE_URL": "https://****************.us-east-1.aws.endpoints.huggingface.cloud/v1",
"CUA_PROVIDER_API_KEY": "your-api-key"
}
}
}
}
```
This configuration:
- Uses the start_mcp_server.sh script which automatically sets up the Python path and runs the server module
- Works with Claude Desktop, Cursor, or any other MCP client
- Automatically uses your development code without requiring installation
Just add this to your MCP client's configuration and it will use your local development version of the server.
### Troubleshooting
If you get a `/bin/bash: ~/cua/libs/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, try changing the path to the script to be absolute instead of relative.
To see the logs:
```
tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
```
## Claude Desktop Integration
To use with Claude Desktop, add an entry for `cua-agent` (like the one shown above) to your Claude Desktop configuration file, `claude_desktop_config.json` (on macOS, typically in `~/Library/Application Support/Claude/`).
For more information on MCP with Claude Desktop, see the [official MCP User Guide](https://modelcontextprotocol.io/quickstart/user).
## Cursor Integration
To use with Cursor, add an MCP configuration file in one of these locations:
- **Project-specific**: Create `.cursor/mcp.json` in your project directory
- **Global**: Create `~/.cursor/mcp.json` in your home directory
After configuration, you can simply tell Cursor's Agent to perform computer tasks by explicitly mentioning the CUA agent, such as "Use the computer control tools to open Safari."
For more information on MCP with Cursor, see the [official Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol).
### First-time Usage Notes
**API Keys**: Ensure you have valid API keys:
- Add your Anthropic API key, or other model provider API key in the Claude Desktop config (as shown above)
- Or set it as an environment variable in your shell profile
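For example, in your shell profile:
```bash
# e.g. in ~/.zshrc or ~/.bashrc
export ANTHROPIC_API_KEY=your-api-key
```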
## Configuration
The server is configured using environment variables (can be set in the Claude Desktop config):
| Variable | Description | Default |
|----------|-------------|---------|
| `CUA_AGENT_LOOP` | Agent loop to use (OPENAI, ANTHROPIC, UITARS, OMNI) | OMNI |
| `CUA_MODEL_PROVIDER` | Model provider (ANTHROPIC, OPENAI, OLLAMA, OAICOMPAT) | ANTHROPIC |
| `CUA_MODEL_NAME` | Model name to use | None (provider default) |
| `CUA_PROVIDER_BASE_URL` | Base URL for provider API | None |
| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
## Available Tools
The MCP server exposes the following tools to Claude:
1. `run_cua_task` - Run a single Computer-Use Agent task with the given instruction
2. `run_multi_cua_tasks` - Run multiple tasks in sequence
## Usage
Once configured, you can simply ask Claude to perform computer tasks:
- "Open Chrome and go to github.com"
- "Create a folder called 'Projects' on my desktop"
- "Find all PDFs in my Downloads folder"
- "Take a screenshot and highlight the error message"
Claude will automatically use your CUA agent to perform these tasks.

View File

@@ -0,0 +1,45 @@
---
title: PyLume
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333" alt="Python" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://pypi.org/project/pylume/"><img src="https://img.shields.io/pypi/v/pylume?color=333333" alt="PyPI" /></a>
</div>
**pylume** is a lightweight Python library based on [lume](https://github.com/trycua/lume) to create, run and manage macOS and Linux virtual machines (VMs) natively on Apple Silicon.
<div align="center">
<img src="img/py.png" alt="lume-py"/>
</div>
```bash
pip install pylume
```
## Usage
Please refer to this [Notebook](https://github.com/trycua/cua/blob/main/notebooks/pylume_nb.ipynb) for a quickstart. More details about the underlying API used by pylume are available [here](https://github.com/trycua/lume/blob/main/docs/API-Reference.md).
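As a minimal sketch of what usage looks like (method and field names follow the pylume API as shown in the notebook; treat them as assumptions if they have since changed):
```python
import asyncio

from pylume import PyLume


async def main():
    # The context manager manages the underlying lume daemon for its duration
    async with PyLume() as pylume:
        vms = await pylume.list_vms()  # assumption: returns VM descriptors
        for vm in vms:
            print(vm.name, vm.state)


asyncio.run(main())
```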
## Prebuilt Images
Pre-built images are available on [ghcr.io/trycua](https://github.com/orgs/trycua/packages).
These images come pre-configured with an SSH server and auto-login enabled.
## Contributing
We welcome and greatly appreciate contributions to pylume! Whether you're improving documentation, adding new features, fixing bugs, or adding new VM images, your efforts help make pylume better for everyone.
Join our [Discord community](https://discord.com/invite/mVnXXpdE85) to discuss ideas or get assistance.
## License
pylume is open-sourced under the MIT License - see the [LICENSE](LICENSE) file for details.
## Stargazers over time
[![Stargazers over time](https://starchart.cc/trycua/pylume.svg?variant=adaptive)](https://starchart.cc/trycua/pylume)

View File

@@ -0,0 +1,178 @@
---
title: Set-of-Mark
---
<div align="center" style={{display: 'flex', gap: '10px', margin: '0 auto', width: '100%', justifyContent: 'center'}}>
<a href="#"><img src="https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333" alt="Python" /></a>
<a href="#"><img src="https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0" alt="macOS" /></a>
<a href="https://discord.com/invite/mVnXXpdE85"><img src="https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white" alt="Discord" /></a>
<a href="https://pypi.org/project/cua-computer/"><img src="https://img.shields.io/pypi/v/cua-computer?color=333333" alt="PyPI" /></a>
</div>
**Som** (Set-of-Mark) is a visual grounding component for the Computer-Use Agent (CUA) framework powering Cua, used for detecting and analyzing UI elements in screenshots. Optimized for Apple Silicon Macs with Metal Performance Shaders (MPS), it combines YOLO-based icon detection with EasyOCR text recognition to provide comprehensive UI element analysis.
## Features
- Optimized for Apple Silicon with MPS acceleration
- Icon detection using YOLO with multi-scale processing
- Text recognition using EasyOCR (GPU-accelerated)
- Automatic hardware detection (MPS → CUDA → CPU)
- Smart detection parameters tuned for UI elements
- Detailed visualization with numbered annotations
- Performance benchmarking tools
## System Requirements
- **Recommended**: macOS with Apple Silicon
- Uses Metal Performance Shaders (MPS)
- Multi-scale detection enabled
- ~0.4s average detection time
- **Supported**: Any Python 3.11+ environment
- Falls back to CPU if no GPU available
- Single-scale detection on CPU
- ~1.3s average detection time
## Installation
```bash
# Using PDM (recommended)
pdm install
# Using pip
pip install -e .
```
## Quick Start
```python
from som import OmniParser
from PIL import Image

# Initialize parser
parser = OmniParser()

# Process an image
image = Image.open("screenshot.png")
result = parser.parse(
    image,
    box_threshold=0.3,  # Confidence threshold
    iou_threshold=0.1,  # Overlap threshold
    use_ocr=True,       # Enable text detection
)

# Access results
for elem in result.elements:
    if elem.type == "icon":
        print(f"Icon: confidence={elem.confidence:.3f}, bbox={elem.bbox.coordinates}")
    else:  # text
        print(f"Text: '{elem.content}', confidence={elem.confidence:.3f}")
```
## Configuration
### Detection Parameters
#### Box Threshold (0.3)
Controls the confidence threshold for accepting detections:
```
High Threshold (0.3):        Low Threshold (0.01):
+----------------+           +----------------+
|                |           |  +--------+    |
|   Confident    |           |  |Unsure? |    |
|   Detection    |           |  +--------+    |
|   (✓ Accept)   |           |   (? Reject)   |
|                |           |                |
+----------------+           +----------------+
    conf = 0.85                  conf = 0.02
```
- Higher values (0.3) yield more precise but fewer detections
- Lower values (0.01) catch more potential icons but increase false positives
- Default is 0.3 for optimal precision/recall balance
#### IOU Threshold (0.1)
Controls how overlapping detections are merged:
```
IOU = Intersection Area / Union Area

Low Overlap (Keep Both):       High Overlap (Merge):
+----------+                   +----------+
|  Box1    |                   |  Box1    |
|          |        vs.        | +-----+  |
+----------+                   | |Box2 |  |
   +----------+                | +-----+  |
   |  Box2    |                +----------+
   |          |
   +----------+
IOU ≈ 0.05 (Keep Both)         IOU ≈ 0.7 (Merge)
```
- Lower values (0.1) more aggressively remove overlapping boxes
- Higher values (0.5) allow more overlapping detections
- Default is 0.1 to handle densely packed UI elements
### OCR Configuration
- **Engine**: EasyOCR
- Primary choice for all platforms
- Fast initialization and processing
- Built-in English language support
- GPU acceleration when available
- **Settings**:
- Timeout: 5 seconds
- Confidence threshold: 0.5
- Paragraph mode: Disabled
- Language: English only
## Performance
### Hardware Acceleration
#### MPS (Metal Performance Shaders)
- Multi-scale detection (640px, 1280px, 1920px)
- Test-time augmentation enabled
- Half-precision (FP16)
- Average detection time: ~0.4s
- Best for production use when available
#### CPU
- Single-scale detection (1280px)
- Full-precision (FP32)
- Average detection time: ~1.3s
- Reliable fallback option
### Example Output Structure
```
examples/output/
├── {timestamp}_no_ocr/
│ ├── annotated_images/
│ │ └── screenshot_analyzed.png
│ ├── screen_details.txt
│ └── summary.json
└── {timestamp}_ocr/
├── annotated_images/
│ └── screenshot_analyzed.png
├── screen_details.txt
└── summary.json
```
## Development
### Test Data
- Place test screenshots in `examples/test_data/`
- Not tracked in git to keep repository size manageable
- Default test image: `test_screen.png` (1920x1080)
### Running Tests
```bash
# Run benchmark with no OCR
python examples/omniparser_examples.py examples/test_data/test_screen.png --runs 5 --ocr none
# Run benchmark with OCR
python examples/omniparser_examples.py examples/test_data/test_screen.png --runs 5 --ocr easyocr
```
## License
MIT License - See LICENSE file for details.

View File

@@ -1,10 +1,20 @@
import { createMDX } from "fumadocs-mdx/next";
const withMDX = createMDX();
/** @type {import('next').NextConfig} */
const config = {
reactStrictMode: true,
images: {
dangerouslyAllowSVG: true,
remotePatterns: [
{
protocol: "https",
hostname: "img.shields.io",
},
],
},
};
export default withMDX(config);