Mirror of https://github.com/trycua/computer.git (synced 2026-02-17 11:58:59 -06:00)

Merge branch 'main' into feature/computer/extensions
@@ -169,6 +169,15 @@
       "contributions": [
         "code"
       ]
     },
+    {
+      "login": "evnsnclr",
+      "name": "Evan smith",
+      "avatar_url": "https://avatars.githubusercontent.com/u/139897548?v=4",
+      "profile": "https://github.com/evnsnclr",
+      "contributions": [
+        "code"
+      ]
+    }
   ]
 }
.github/workflows/publish-agent.yml (2 changes, vendored)
@@ -56,7 +56,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.10'
+          python-version: '3.11'

       - name: Update dependencies to latest versions
         id: update-deps
.github/workflows/publish-computer.yml (2 changes, vendored)
@@ -54,7 +54,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.10'
+          python-version: '3.11'

       - name: Update dependencies to latest versions
         id: update-deps
.github/workflows/publish-mcp-server.yml (2 changes, vendored)
@@ -59,7 +59,7 @@ jobs:
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
-          python-version: '3.10'
+          python-version: '3.11'

      - name: Update dependencies to latest versions
        id: update-deps
.github/workflows/reusable-publish.yml (4 changes, vendored)
@@ -52,7 +52,7 @@ jobs:
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
-          python-version: '3.10'
+          python-version: '3.11'

      - name: Create root pdm.lock file
        run: |
@@ -62,7 +62,7 @@ jobs:
      - name: Install PDM
        uses: pdm-project/setup-pdm@v3
        with:
-          python-version: '3.10'
+          python-version: '3.11'
          cache: true

      - name: Set version
README.md (158 changes)
@@ -9,44 +9,108 @@
 [](#)
 [](#)
 [](https://discord.com/invite/mVnXXpdE85)
 <br>
 <a href="https://trendshift.io/repositories/13685" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13685" alt="trycua%2Fcua | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
 </div>

-**c/ua** (pronounced "koo-ah") enables AI agents to control full operating systems in high-performance virtual containers with near-native speed on Apple Silicon.
+**c/ua** ("koo-ah") is Docker for [Computer-Use Agents](https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse) - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.
 <div align="center">
 <video src="https://github.com/user-attachments/assets/c619b4ea-bb8e-4382-860e-f3757e36af20" width="800" controls></video>
 </div>

-# 🚀 Quick Start
-
-Get started with a Computer-Use Agent UI and a VM with a single command:
+<details>
+<summary><b>Check out more demos of the Computer-Use Agent in action</b></summary>
+
+<details open>
+<summary><b>MCP Server: Work with Claude Desktop and Tableau</b></summary>
+<br>
+<div align="center">
+<video src="https://github.com/user-attachments/assets/9f573547-5149-493e-9a72-396f3cff29df" width="800" controls></video>
+</div>
+</details>
+
+<details>
+<summary><b>AI-Gradio: Multi-app workflow with browser, VS Code and terminal</b></summary>
+<br>
+<div align="center">
+<video src="https://github.com/user-attachments/assets/723a115d-1a07-4c8e-b517-88fbdf53ed0f" width="800" controls></video>
+</div>
+</details>
+
+<details>
+<summary><b>Notebook: Fix GitHub issue in Cursor</b></summary>
+<br>
+<div align="center">
+<video src="https://github.com/user-attachments/assets/f67f0107-a1e1-46dc-aa9f-0146eb077077" width="800" controls></video>
+</div>
+</details>
+</details><br/>
+# 🚀 Quick Start with a Computer-Use Agent UI
+
+**Need to automate desktop tasks? Launch the Computer-Use Agent UI with a single command.**
+
+### Option 1: Fully-managed install (recommended)
+*I want to be fully guided through the process*
+
+**macOS/Linux/Windows (via WSL):**
+```bash
+# Requires Python 3.11+
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground.sh)"
+```

 This script will:
-- Install Lume CLI for VM management (if needed)
-- Pull the latest macOS CUA image (if needed)
-- Set up Python environment and install/update required packages
+- Ask if you want to use local VMs or C/ua Cloud Containers
+- Install necessary dependencies (Lume CLI for local VMs)
+- Download VM images if needed
+- Install Python packages
 - Launch the Computer-Use Agent UI

-#### Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops)
+### Option 2: Key manual steps
+<details>
+<summary>If you are skeptical of one-line install scripts</summary>
+
+**For the C/ua Agent UI (any system, cloud VMs only):**
+```bash
+# Requires Python 3.11+ and a C/ua API key
+pip install -U "cua-computer[all]" "cua-agent[all]"
+python -m agent.ui.gradio.app
+```
+
+**For local macOS/Linux VMs (Apple Silicon only):**
+```bash
+# 1. Install Lume CLI
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
+
+# 2. Pull the macOS image
+lume pull macos-sequoia-cua:latest
+
+# 3. Start the VM
+lume run macos-sequoia-cua:latest
+
+# 4. Install packages and launch the UI
+pip install -U "cua-computer[all]" "cua-agent[all]"
+python -m agent.ui.gradio.app
+```
+</details>

 ---

+*How it works: the Computer module provides secure desktops (Lume CLI locally, [C/ua Cloud Containers](https://trycua.com) remotely); the Agent module provides local/API agents with the OpenAI AgentResponse format and [trajectory tracing](https://trycua.com/trajectory-viewer).*
+
+### Supported [Agent Loops](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops)
 - [UITARS-1.5](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Run locally on Apple Silicon with MLX, or use cloud providers
 - [OpenAI CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use OpenAI's Computer-Use Preview model
 - [Anthropic CUA](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Use Anthropic's Computer-Use capabilities
 - [OmniParser-v2.0](https://github.com/trycua/cua/blob/main/libs/agent/README.md#agent-loops) - Control the UI with [Set-of-Marks prompting](https://som-gpt4v.github.io/) using any vision model

 ### System Requirements
 - Mac with Apple Silicon (M1/M2/M3/M4 series)
 - macOS 15 (Sequoia) or newer
 - Disk space for VM images (30GB+ recommended)
-# 💻 For Developers
+# 💻 Developer Guide

 Follow these steps to use C/ua in your own code. See the [Developer Guide](./docs/Developer-Guide.md) for building from source.

 ### Step 1: Install Lume CLI
@@ -70,8 +134,6 @@ The macOS CUA image contains the default Mac apps and the Computer Server for ea
 pip install "cua-computer[all]" "cua-agent[all]"
 ```

-Alternatively, see the [Developer Guide](./docs/Developer-Guide.md) for building from source.
-
 ### Step 4: Use in Your Code

 ```python
@@ -79,21 +141,29 @@ from computer import Computer
 from agent import ComputerAgent, LLM

 async def main():
-    # Start a local macOS VM
-    computer = Computer(os_type="macos")
-    await computer.run()
+    # Start a local macOS VM with a 1024x768 display
+    async with Computer(os_type="macos", display="1024x768") as computer:
+    # Or with C/ua Cloud Container
+    computer = Computer(
+        os_type="linux",
+        api_key="your_cua_api_key_here",
+        name="your_container_name_here"
+    )

     # Example: Direct control of a macOS VM with Computer
     await computer.interface.left_click(100, 200)
     await computer.interface.type_text("Hello, world!")
     screenshot_bytes = await computer.interface.screenshot()

     # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
     agent = ComputerAgent(
         computer=computer,
         loop="UITARS",
         model=LLM(provider="MLXVLM", name="mlx-community/UI-TARS-1.5-7B-6bit")
     )
     await agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide")

 main()
 ```
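The README's new example leans on `Computer` behaving as an async context manager: the VM is started on entry and torn down on exit. The sketch below illustrates only that lifecycle with a hypothetical stand-in class, not the real `cua-computer` implementation:

```python
import asyncio

class Computer:
    """Hypothetical stand-in for computer.Computer (lifecycle illustration only)."""

    def __init__(self, **config):
        self.config = config
        self.running = False

    async def run(self):
        self.running = True   # the real implementation boots the VM here

    async def stop(self):
        self.running = False  # the real implementation tears the VM down

    async def __aenter__(self):
        await self.run()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.stop()

async def main():
    async with Computer(os_type="macos") as computer:
        assert computer.running       # started on entry
    assert not computer.running       # stopped on exit, even if an exception escapes

asyncio.run(main())
```

The context-manager form is equivalent to calling `run()` yourself, with cleanup guaranteed by `__aexit__`.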

@@ -234,33 +304,6 @@ ComputerAgent(
 )
 ```

-## Demos
-
-Check out these demos of the Computer-Use Agent in action:
-
-<details open>
-<summary><b>MCP Server: Work with Claude Desktop and Tableau</b></summary>
-<br>
-<div align="center">
-<video src="https://github.com/user-attachments/assets/9f573547-5149-493e-9a72-396f3cff29df" width="800" controls></video>
-</div>
-</details>
-
-<details>
-<summary><b>AI-Gradio: Multi-app workflow with browser, VS Code and terminal</b></summary>
-<br>
-<div align="center">
-<video src="https://github.com/user-attachments/assets/723a115d-1a07-4c8e-b517-88fbdf53ed0f" width="800" controls></video>
-</div>
-</details>
-
-<details>
-<summary><b>Notebook: Fix GitHub issue in Cursor</b></summary>
-<br>
-<div align="center">
-<video src="https://github.com/user-attachments/assets/f67f0107-a1e1-46dc-aa9f-0146eb077077" width="800" controls></video>
-</div>
-</details>

 ## Community

@@ -316,6 +359,7 @@ Thank you to all our supporters!
 <td align="center" valign="top" width="14.28%"><a href="https://mjspeck.github.io/"><img src="https://avatars.githubusercontent.com/u/20689127?v=4?s=100" width="100px;" alt="Matt Speck"/><br /><sub><b>Matt Speck</b></sub></a><br /><a href="#code-mjspeck" title="Code">💻</a></td>
 <td align="center" valign="top" width="14.28%"><a href="https://github.com/FinnBorge"><img src="https://avatars.githubusercontent.com/u/9272726?v=4?s=100" width="100px;" alt="FinnBorge"/><br /><sub><b>FinnBorge</b></sub></a><br /><a href="#code-FinnBorge" title="Code">💻</a></td>
 <td align="center" valign="top" width="14.28%"><a href="https://github.com/jklapacz"><img src="https://avatars.githubusercontent.com/u/5343758?v=4?s=100" width="100px;" alt="Jakub Klapacz"/><br /><sub><b>Jakub Klapacz</b></sub></a><br /><a href="#code-jklapacz" title="Code">💻</a></td>
+<td align="center" valign="top" width="14.28%"><a href="https://github.com/evnsnclr"><img src="https://avatars.githubusercontent.com/u/139897548?v=4?s=100" width="100px;" alt="Evan smith"/><br /><sub><b>Evan smith</b></sub></a><br /><a href="#code-evnsnclr" title="Code">💻</a></td>
 </tr>
 </tbody>
 </table>
@@ -62,7 +62,7 @@ Refer to the [Lume README](../libs/lume/docs/Development.md) for instructions on

 ## Python Development

-There are two ways to instal Lume:
+There are two ways to install Lume:

 ### Run the build script

@@ -91,7 +91,7 @@ To install with PDM, simply run:
 pdm install -G:all
 ```

-This installs all the dependencies for development, testing, and building the docs. If you'd oly like development dependencies, you can run:
+This installs all the dependencies for development, testing, and building the docs. If you'd only like development dependencies, you can run:

 ```console
 pdm install -d
 ```
@@ -200,11 +200,11 @@ The formatting configuration is defined in the root `pyproject.toml` file:
 ```toml
 [tool.black]
 line-length = 100
-target-version = ["py310"]
+target-version = ["py311"]

 [tool.ruff]
 line-length = 100
-target-version = "py310"
+target-version = "py311"
 select = ["E", "F", "B", "I"]
 fix = true

@@ -213,7 +213,7 @@ docstring-code-format = true
 [tool.mypy]
 strict = true
-python_version = "3.10"
+python_version = "3.11"
 ignore_missing_imports = true
 disallow_untyped_defs = true
 check_untyped_defs = true

@@ -225,7 +225,7 @@ warn_unused_ignores = false
 #### Key Formatting Rules

 - **Line Length**: Maximum of 100 characters
-- **Python Version**: Code should be compatible with Python 3.10+
+- **Python Version**: Code should be compatible with Python 3.11+
 - **Imports**: Automatically sorted (using Ruff's "I" rule)
 - **Type Hints**: Required for all function definitions (strict mypy mode)
@@ -10,7 +10,7 @@ CUA libraries collect minimal anonymous usage data to help improve our software.
 - Basic system information:
   - Operating system (e.g., 'darwin', 'win32', 'linux')
-  - Python version (e.g., '3.10.0')
+  - Python version (e.g., '3.11.0')
 - Module initialization events:
   - When a module (like 'computer' or 'agent') is imported
   - Version of the module being used
@@ -5,7 +5,7 @@ import logging
 import traceback
 import signal

-from computer import Computer
+from computer import Computer, VMProviderType

 # Import the unified agent class and types
 from agent import ComputerAgent, LLMProvider, LLM, AgentLoop
@@ -23,76 +23,88 @@ async def run_agent_example():
     print("\n=== Example: ComputerAgent with OpenAI and Omni provider ===")

     try:
-        # Create Computer instance with async context manager
-        async with Computer(verbosity=logging.DEBUG) as macos_computer:
-            # Create agent with loop and provider
-            agent = ComputerAgent(
-                computer=macos_computer,
-                # loop=AgentLoop.OPENAI,
-                # loop=AgentLoop.ANTHROPIC,
-                # loop=AgentLoop.UITARS,
-                loop=AgentLoop.OMNI,
-                # model=LLM(provider=LLMProvider.OPENAI), # No model name for Operator CUA
-                # model=LLM(provider=LLMProvider.OPENAI, name="gpt-4o"),
-                # model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-3-7-sonnet-20250219"),
-                # model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:4b-it-q4_K_M"),
-                # model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-4bit"),
-                model=LLM(
-                    provider=LLMProvider.OAICOMPAT,
-                    name="gemma-3-12b-it",
-                    provider_base_url="http://localhost:1234/v1", # LM Studio local endpoint
-                ),
-                save_trajectory=True,
-                only_n_most_recent_images=3,
-                verbosity=logging.DEBUG,
-            )
+        # Create a local macOS computer
+        computer = Computer(
+            os_type="macos",
+            verbosity=logging.DEBUG,
+        )
+
+        # Create a remote Linux computer with C/ua
+        # computer = Computer(
+        #     os_type="linux",
+        #     api_key=os.getenv("CUA_API_KEY"),
+        #     name=os.getenv("CUA_CONTAINER_NAME"),
+        #     provider_type=VMProviderType.CLOUD,
+        # )
+
+        agent = ComputerAgent(
+            computer=computer,
+            loop=AgentLoop.OPENAI,
+            # loop=AgentLoop.ANTHROPIC,
+            # loop=AgentLoop.UITARS,
+            # loop=AgentLoop.OMNI,
+            model=LLM(provider=LLMProvider.OPENAI), # No model name for Operator CUA
+            # model=LLM(provider=LLMProvider.OPENAI, name="gpt-4o"),
+            # model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-3-7-sonnet-20250219"),
+            # model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:4b-it-q4_K_M"),
+            # model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-4bit"),
+            # model=LLM(
+            #     provider=LLMProvider.OAICOMPAT,
+            #     name="gemma-3-12b-it",
+            #     provider_base_url="http://localhost:1234/v1", # LM Studio local endpoint
+            # ),
+            save_trajectory=True,
+            only_n_most_recent_images=3,
+            verbosity=logging.DEBUG,
+        )

         tasks = [
             "Look for a repository named trycua/cua on GitHub.",
             "Check the open issues, open the most recent one and read it.",
             "Clone the repository in users/lume/projects if it doesn't exist yet.",
             "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
             "From Cursor, open Composer if not already open.",
             "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
         ]

         for i, task in enumerate(tasks):
             print(f"\nExecuting task {i}/{len(tasks)}: {task}")
             async for result in agent.run(task):
                 print("Response ID: ", result.get("id"))

                 # Print detailed usage information
                 usage = result.get("usage")
                 if usage:
                     print("\nUsage Details:")
                     print(f"  Input Tokens: {usage.get('input_tokens')}")
                     if "input_tokens_details" in usage:
                         print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
                     print(f"  Output Tokens: {usage.get('output_tokens')}")
                     if "output_tokens_details" in usage:
                         print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
                     print(f"  Total Tokens: {usage.get('total_tokens')}")

                 print("Response Text: ", result.get("text"))

                 # Print tools information
                 tools = result.get("tools")
                 if tools:
                     print("\nTools:")
                     print(tools)

                 # Print reasoning and tool call outputs
                 outputs = result.get("output", [])
                 for output in outputs:
                     output_type = output.get("type")
                     if output_type == "reasoning":
                         print("\nReasoning Output:")
                         print(output)
                     elif output_type == "computer_call":
                         print("\nTool Call Output:")
                         print(output)

             print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")

     except Exception as e:
         logger.error(f"Error in run_agent_example: {e}")
@@ -16,17 +16,18 @@ load_dotenv(env_file)
 pythonpath = os.environ.get("PYTHONPATH", "")
 for path in pythonpath.split(":"):
     if path and path not in sys.path:
-        sys.path.append(path)
+        sys.path.insert(0, path)  # Insert at beginning to prioritize
+        print(f"Added to sys.path: {path}")
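The switch from `append` to `insert(0, …)` matters because `sys.path` is searched in order: a directory at the front shadows same-named modules that appear later. A minimal illustration (the path itself is a placeholder, not from the repo):

```python
import sys

path = "/tmp/example_pkgs"  # placeholder directory for illustration
if path and path not in sys.path:
    sys.path.insert(0, path)  # front of the list: wins module resolution

# The inserted path now takes precedence over every other entry
assert sys.path[0] == path

# sys.path.append(path) would instead add it as a last-resort fallback,
# so an identically named module elsewhere on the path would still win.
```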

-from computer import Computer, VMProviderType
+from computer.computer import Computer
+from computer.providers.base import VMProviderType
 from computer.logger import LogLevel
 async def main():
     try:
         print("\n=== Using direct initialization ===")

-        # Create computer with configured host
+        # Create a local macOS computer
         computer = Computer(
             display="1024x768",
             memory="8GB",

@@ -41,12 +42,31 @@ async def main():
             ],
             ephemeral=False,
         )

+        # Create a remote Linux computer with C/ua
+        # computer = Computer(
+        #     os_type="linux",
+        #     api_key=os.getenv("CUA_API_KEY"),
+        #     name=os.getenv("CONTAINER_NAME"),
+        #     provider_type=VMProviderType.CLOUD,
+        # )

         try:
             # Run the computer with default parameters
             await computer.run()

-            await computer.interface.hotkey("command", "space")
             screenshot = await computer.interface.screenshot()

+            # Create output directory if it doesn't exist
+            output_dir = Path("./output")
+            output_dir.mkdir(exist_ok=True)
+
+            screenshot_path = output_dir / "screenshot.png"
+            with open(screenshot_path, "wb") as f:
+                f.write(screenshot)
+            print(f"Screenshot saved to: {screenshot_path.absolute()}")
+
+            # await computer.interface.hotkey("command", "space")
+
+            # res = await computer.interface.run_command("touch ./Downloads/empty_file")
+            # print(f"Run command result: {res}")
@@ -1,10 +1,10 @@
 """Tool-related type definitions."""

-from enum import Enum
+from enum import StrEnum
 from typing import Dict, Any, Optional
 from pydantic import BaseModel, ConfigDict

-class ToolInvocationState(str, Enum):
+class ToolInvocationState(StrEnum):
     """States for tool invocation."""
     CALL = 'call'
     PARTIAL_CALL = 'partial-call'

@@ -1,18 +1,18 @@
 """Core type definitions."""

 from typing import Any, Dict, List, Optional, TypedDict, Union
-from enum import Enum, StrEnum, auto
+from enum import StrEnum
 from dataclasses import dataclass


-class AgentLoop(Enum):
+class AgentLoop(StrEnum):
     """Enumeration of available loop types."""

-    ANTHROPIC = auto()  # Anthropic implementation
-    OMNI = auto()       # OmniLoop implementation
-    OPENAI = auto()     # OpenAI implementation
-    OLLAMA = auto()     # OLLAMA implementation
-    UITARS = auto()     # UI-TARS implementation
+    ANTHROPIC = "anthropic"  # Anthropic implementation
+    OMNI = "omni"            # OmniLoop implementation
+    OPENAI = "openai"        # OpenAI implementation
+    OLLAMA = "ollama"        # OLLAMA implementation
+    UITARS = "uitars"        # UI-TARS implementation
     # Add more loop types as needed

@@ -3,6 +3,9 @@
 from datetime import datetime
 import platform

+today = datetime.today()
+today = f"{today.strftime('%A, %B')} {today.day}, {today.year}"

 SYSTEM_PROMPT = f"""<SYSTEM_CAPABILITY>
 * You are utilising a macOS virtual machine using ARM architecture with internet access and Safari as the default browser.
 * Feel free to install macOS applications with your bash tool. Use curl instead of wget.

@@ -10,7 +13,7 @@ SYSTEM_PROMPT = f"""<SYSTEM_CAPABILITY>
 * When using your bash tool with commands that are expected to output very large quantities of text, redirect into a tmp file and use str_replace_editor or `grep -n -B <lines before> -A <lines after> <query> <filename>` to confirm output.
 * When viewing a page it can be helpful to zoom out so that you can see everything on the page. Either that, or make sure you scroll down to see everything before deciding something isn't available.
 * When using your computer function calls, they take a while to run and send back to you. Where possible/feasible, try to chain multiple of these calls all into one function calls request.
-* The current date is {datetime.today().strftime('%A, %B %-d, %Y')}.
+* The current date is {today}.
 </SYSTEM_CAPABILITY>
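The rewritten prompt builds the date with an f-string instead of the `%-d` strftime directive, which is a glibc extension and not portable (Windows' strftime does not support it). A check of the equivalent output, using a fixed date for reproducibility:

```python
from datetime import datetime

today = datetime(2025, 6, 3)  # a fixed Tuesday, chosen for a reproducible example
formatted = f"{today.strftime('%A, %B')} {today.day}, {today.year}"

# today.day interpolates without zero-padding, so no %-d is needed
assert formatted == "Tuesday, June 3, 2025"
```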

 <IMPORTANT>

@@ -22,7 +22,7 @@ Supported Agent Loops and Models:
 Requirements:
 - Mac with Apple Silicon (M1/M2/M3/M4)
 - macOS 14 (Sonoma) or newer
-- Python 3.10+
+- Python 3.11+
 - Lume CLI installed (https://github.com/trycua/cua)
 - OpenAI or Anthropic API key
 """
@@ -31,6 +31,7 @@ import os
 import asyncio
 import logging
 import json
+import platform
 from pathlib import Path
 from typing import Dict, List, Optional, AsyncGenerator, Any, Tuple, Union
 import gradio as gr

@@ -129,6 +130,9 @@ class GradioChatScreenshotHandler(DefaultCallbackHandler):
 )

+# Detect if the current device is macOS
+is_mac = platform.system().lower() == "darwin"

 # Map model names to specific provider model names
 MODEL_MAPPINGS = {
     "openai": {
@@ -165,7 +169,7 @@ MODEL_MAPPINGS = {
     },
     "uitars": {
         # UI-TARS models using MLXVLM provider
-        "default": "mlx-community/UI-TARS-1.5-7B-4bit",
+        "default": "mlx-community/UI-TARS-1.5-7B-4bit" if is_mac else "tgi",
         "mlx-community/UI-TARS-1.5-7B-4bit": "mlx-community/UI-TARS-1.5-7B-4bit",
         "mlx-community/UI-TARS-1.5-7B-6bit": "mlx-community/UI-TARS-1.5-7B-6bit"
     },
@@ -290,7 +294,7 @@ def get_provider_and_model(model_name: str, loop_provider: str) -> tuple:
         model_name_to_use = cleaned_model_name
         # agent_loop remains AgentLoop.OMNI
     elif agent_loop == AgentLoop.UITARS:
-        # For UITARS, use MLXVLM provider for the MLX models, OAICOMPAT for custom
+        # For UITARS, use MLXVLM for mlx-community models, OAICOMPAT for custom
         if model_name == "Custom model (OpenAI compatible API)":
             provider = LLMProvider.OAICOMPAT
             model_name_to_use = "tgi"

@@ -333,12 +337,25 @@ def get_ollama_models() -> List[str]:
         logging.error(f"Error getting Ollama models: {e}")
         return []
-def create_computer_instance(verbosity: int = logging.INFO) -> Computer:
+def create_computer_instance(
+    verbosity: int = logging.INFO,
+    os_type: str = "macos",
+    provider_type: str = "lume",
+    name: Optional[str] = None,
+    api_key: Optional[str] = None
+) -> Computer:
     """Create or get the global Computer instance."""
     global global_computer

     if global_computer is None:
-        global_computer = Computer(verbosity=verbosity)
+        global_computer = Computer(
+            verbosity=verbosity,
+            os_type=os_type,
+            provider_type=provider_type,
+            name=name if name else "",
+            api_key=api_key
+        )

     return global_computer
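The function above follows a module-level singleton pattern: the first call fixes the configuration, and later calls return the same instance regardless of their arguments. The sketch below illustrates that behavior (and its main caveat) with a hypothetical stand-in `Computer`, not the real `cua-computer` class:

```python
from typing import Optional

class Computer:
    """Hypothetical stand-in for computer.Computer (illustration only)."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

global_computer: Optional[Computer] = None

def create_computer_instance(**kwargs) -> Computer:
    """Create the global Computer on first call; later calls return it unchanged."""
    global global_computer
    if global_computer is None:
        global_computer = Computer(**kwargs)
    return global_computer

a = create_computer_instance(os_type="macos")
b = create_computer_instance(os_type="linux")  # config silently ignored: instance exists
assert a is b
assert a.kwargs["os_type"] == "macos"
```

The caveat is worth noting when reading the Gradio app: once a local VM computer exists, a later request for a cloud computer with different settings will still get the original instance.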

@@ -353,12 +370,22 @@ def create_agent(
     verbosity: int = logging.INFO,
     use_oaicompat: bool = False,
     provider_base_url: Optional[str] = None,
+    computer_os: str = "macos",
+    computer_provider: str = "lume",
+    computer_name: Optional[str] = None,
+    computer_api_key: Optional[str] = None,
 ) -> ComputerAgent:
     """Create or update the global agent with the specified parameters."""
     global global_agent

     # Create the computer if not already done
-    computer = create_computer_instance(verbosity=verbosity)
+    computer = create_computer_instance(
+        verbosity=verbosity,
+        os_type=computer_os,
+        provider_type=computer_provider,
+        name=computer_name,
+        api_key=computer_api_key
+    )

     # Get API key from environment if not provided
     if api_key is None:

@@ -401,6 +428,7 @@ def create_agent(
     return global_agent


 def create_gradio_ui(
     provider_name: str = "openai",
     model_name: str = "gpt-4o",
@@ -421,7 +449,8 @@ def create_gradio_ui(
     # Check for API keys
     openai_api_key = os.environ.get("OPENAI_API_KEY", "")
     anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY", "")
+    cua_api_key = os.environ.get("CUA_API_KEY", "")

     # Always show models regardless of API key availability
     openai_models = ["OpenAI: Computer-Use Preview"]
     anthropic_models = [

@@ -439,22 +468,29 @@
     # Check if API keys are available
     has_openai_key = bool(openai_api_key)
     has_anthropic_key = bool(anthropic_api_key)
+    has_cua_key = bool(cua_api_key)

     print("has_openai_key", has_openai_key)
     print("has_anthropic_key", has_anthropic_key)
+    print("has_cua_key", has_cua_key)

     # Get Ollama models for OMNI
     ollama_models = get_ollama_models()
     if ollama_models:
         omni_models += ollama_models

+    # Detect if the current device is macOS
+    is_mac = platform.system().lower() == "darwin"

     # Format model choices
     provider_to_models = {
         "OPENAI": openai_models,
         "ANTHROPIC": anthropic_models,
         "OMNI": omni_models + ["Custom model (OpenAI compatible API)", "Custom model (ollama)"],  # Add custom model options
-        "UITARS": [
+        "UITARS": ([
             "mlx-community/UI-TARS-1.5-7B-4bit",
             "mlx-community/UI-TARS-1.5-7B-6bit",
-            "Custom model (OpenAI compatible API)"
-        ],  # UI-TARS options with MLX models
+        ] if is_mac else []) + ["Custom model (OpenAI compatible API)"],  # UI-TARS options with MLX models
     }
 # --- Apply Saved Settings (override defaults if available) ---

@@ -473,7 +509,7 @@ def create_gradio_ui(
     elif initial_loop == "ANTHROPIC":
         initial_model = anthropic_models[0] if anthropic_models else "No models available"
     else:  # OMNI
-        initial_model = omni_models[0] if omni_models else "No models available"
+        initial_model = omni_models[0] if omni_models else "Custom model (OpenAI compatible API)"
         if "Custom model (OpenAI compatible API)" in available_models_for_loop:
             initial_model = (
                 "Custom model (OpenAI compatible API)"  # Default to custom if available and no other default fits

@@ -494,7 +530,7 @@ def create_gradio_ui(
     ]
# Function to generate Python code based on configuration and tasks
|
||||
def generate_python_code(agent_loop_choice, provider, model_name, tasks, provider_url, recent_images=3, save_trajectory=True):
|
||||
def generate_python_code(agent_loop_choice, provider, model_name, tasks, provider_url, recent_images=3, save_trajectory=True, computer_os="macos", computer_provider="lume", container_name="", cua_cloud_api_key=""):
|
||||
"""Generate Python code for the current configuration and tasks.
|
||||
|
||||
Args:
|
||||
@@ -505,6 +541,10 @@ def create_gradio_ui(
|
||||
provider_url: The provider base URL for OAICOMPAT providers
|
||||
recent_images: Number of recent images to keep in context
|
||||
save_trajectory: Whether to save the agent trajectory
|
||||
computer_os: Operating system type for the computer
|
||||
computer_provider: Provider type for the computer
|
||||
container_name: Optional VM name
|
||||
cua_cloud_api_key: Optional CUA Cloud API key
|
||||
|
||||
Returns:
|
||||
Formatted Python code as a string
|
||||
@@ -515,13 +555,29 @@ def create_gradio_ui(
|
||||
if task and task.strip():
|
||||
tasks_str += f' "{task}",\n'
|
||||
|
||||
# Create the Python code template
|
||||
# Create the Python code template with computer configuration
|
||||
computer_args = []
|
||||
if computer_os != "macos":
|
||||
computer_args.append(f'os_type="{computer_os}"')
|
||||
if computer_provider != "lume":
|
||||
computer_args.append(f'provider_type="{computer_provider}"')
|
||||
if container_name:
|
||||
computer_args.append(f'name="{container_name}"')
|
||||
if cua_cloud_api_key:
|
||||
computer_args.append(f'api_key="{cua_cloud_api_key}"')
|
||||
|
||||
computer_args_str = ", ".join(computer_args)
|
||||
if computer_args_str:
|
||||
computer_args_str = f"({computer_args_str})"
|
||||
else:
|
||||
computer_args_str = "()"
|
||||
|
||||
code = f'''import asyncio
|
||||
from computer import Computer
|
||||
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
|
||||
|
||||
async def main():
|
||||
async with Computer() as macos_computer:
|
||||
async with Computer{computer_args_str} as macos_computer:
|
||||
agent = ComputerAgent(
|
||||
computer=macos_computer,
|
||||
loop=AgentLoop.{agent_loop_choice},
|
||||
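The hunk above assembles a constructor-argument string for the generated `Computer(...)` call, emitting only non-default arguments. A standalone sketch of the same string-building logic (the helper name and its signature are illustrative, not part of the patch):

```python
# Illustrative sketch of the computer_args assembly from the hunk above:
# only non-default settings are rendered into the Computer(...) call.
def build_computer_args(computer_os="macos", computer_provider="lume",
                        container_name="", cua_cloud_api_key=""):
    args = []
    if computer_os != "macos":
        args.append(f'os_type="{computer_os}"')
    if computer_provider != "lume":
        args.append(f'provider_type="{computer_provider}"')
    if container_name:
        args.append(f'name="{container_name}"')
    if cua_cloud_api_key:
        args.append(f'api_key="{cua_cloud_api_key}"')
    return f"({', '.join(args)})"

print(build_computer_args())  # ()
print(build_computer_args(computer_os="linux", container_name="my-vm"))
# (os_type="linux", name="my-vm")
```

With all defaults the result is `()`, so the generated snippet stays as `Computer()` unless the user changes the configuration.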
@@ -660,12 +716,54 @@ if __name__ == "__main__":
LLMProvider.OPENAI,
"gpt-4o",
[],
"https://openrouter.ai/api/v1"
"https://openrouter.ai/api/v1",
3, # recent_images default
True, # save_trajectory default
"macos",
"lume",
"",
""
),
interactive=False,
)

with gr.Accordion("Configuration", open=True):
with gr.Accordion("Computer Configuration", open=True):
# Computer configuration options
computer_os = gr.Radio(
choices=["macos", "linux"],
label="Operating System",
value="macos",
info="Select the operating system for the computer",
)

# Detect if current device is MacOS
is_mac = platform.system().lower() == "darwin"

computer_provider = gr.Radio(
choices=["cloud", "lume"],
label="Provider",
value="lume" if is_mac else "cloud",
visible=is_mac,
info="Select the computer provider",
)

container_name = gr.Textbox(
label="Container Name",
placeholder="Enter container name (optional)",
value="",
info="Optional name for the container",
)

cua_cloud_api_key = gr.Textbox(
label="CUA Cloud API Key",
placeholder="Enter your CUA Cloud API key",
value="",
type="password",
info="Required for cloud provider",
visible=(not has_cua_key)
)

with gr.Accordion("Agent Configuration", open=True):
# Configuration options
agent_loop = gr.Dropdown(
choices=["OPENAI", "ANTHROPIC", "OMNI", "UITARS"],
@@ -986,6 +1084,10 @@ if __name__ == "__main__":
custom_api_key=None,
openai_key_input=None,
anthropic_key_input=None,
computer_os="macos",
computer_provider="lume",
container_name="",
cua_cloud_api_key="",
):
if not history:
yield history
@@ -1083,6 +1185,8 @@ if __name__ == "__main__":
else:
# For Ollama or default OAICOMPAT (without custom key), no key needed/expected
api_key = ""

cua_cloud_api_key = cua_cloud_api_key or os.environ.get("CUA_API_KEY", "")

# --- Save Settings Before Running Agent ---
current_settings = {
@@ -1092,6 +1196,10 @@ if __name__ == "__main__":
"provider_base_url": custom_url_value,
"save_trajectory": save_traj,
"recent_images": recent_imgs,
"computer_os": computer_os,
"computer_provider": computer_provider,
"container_name": container_name,
"cua_cloud_api_key": cua_cloud_api_key,
}
save_settings(current_settings)
# --- End Save Settings ---
@@ -1109,6 +1217,10 @@ if __name__ == "__main__":
use_oaicompat=is_oaicompat, # Set flag if custom model was selected
# Pass custom URL only if custom model was selected
provider_base_url=custom_url_value if is_oaicompat else None,
computer_os=computer_os,
computer_provider=computer_provider,
computer_name=container_name,
computer_api_key=cua_cloud_api_key,
verbosity=logging.DEBUG, # Added verbosity here
)

@@ -1235,6 +1347,10 @@ if __name__ == "__main__":
provider_api_key,
openai_api_key_input,
anthropic_api_key_input,
computer_os,
computer_provider,
container_name,
cua_cloud_api_key,
],
outputs=[chatbot_history],
queue=True,
@@ -1253,82 +1369,20 @@ if __name__ == "__main__":


# Function to update the code display based on configuration and chat history
def update_code_display(agent_loop, model_choice_val, custom_model_val, chat_history, provider_base_url, recent_images_val, save_trajectory_val):
def update_code_display(agent_loop, model_choice_val, custom_model_val, chat_history, provider_base_url, recent_images_val, save_trajectory_val, computer_os, computer_provider, container_name, cua_cloud_api_key):
# Extract messages from chat history
messages = []
if chat_history:
for msg in chat_history:
if msg.get("role") == "user":
if isinstance(msg, dict) and msg.get("role") == "user":
messages.append(msg.get("content", ""))

# Determine if this is a custom model selection and which type
is_custom_openai_api = model_choice_val == "Custom model (OpenAI compatible API)"
is_custom_ollama = model_choice_val == "Custom model (ollama)"
is_custom_model_selected = is_custom_openai_api or is_custom_ollama
# Determine provider and model based on current selection
provider, model_name, _ = get_provider_and_model(
model_choice_val or custom_model_val or "gpt-4o",
agent_loop
)

# Determine provider and model name based on agent loop
if agent_loop == "OPENAI":
# For OPENAI loop, always use OPENAI provider with computer-use-preview
provider = LLMProvider.OPENAI
model_name = "computer-use-preview"
elif agent_loop == "ANTHROPIC":
# For ANTHROPIC loop, always use ANTHROPIC provider
provider = LLMProvider.ANTHROPIC
# Extract model name from the UI string
if model_choice_val.startswith("Anthropic: Claude "):
# Extract the model name based on the UI string
model_parts = model_choice_val.replace("Anthropic: Claude ", "").split(" (")
version = model_parts[0] # e.g., "3.7 Sonnet"
date = model_parts[1].replace(")", "") if len(model_parts) > 1 else "" # e.g., "20250219"

# Format as claude-3-7-sonnet-20250219 or claude-3-5-sonnet-20240620
version = version.replace(".", "-").replace(" ", "-").lower()
model_name = f"claude-{version}-{date}"
else:
# Use the model_choice_val directly if it doesn't match the expected format
model_name = model_choice_val
elif agent_loop == "UITARS":
# For UITARS, use MLXVLM for mlx-community models, OAICOMPAT for custom
if model_choice_val == "Custom model (OpenAI compatible API)":
provider = LLMProvider.OAICOMPAT
model_name = custom_model_val
else:
provider = LLMProvider.MLXVLM
model_name = model_choice_val
elif agent_loop == "OMNI":
# For OMNI, provider can be OPENAI, ANTHROPIC, OLLAMA, or OAICOMPAT
if is_custom_openai_api:
provider = LLMProvider.OAICOMPAT
model_name = custom_model_val
elif is_custom_ollama:
provider = LLMProvider.OLLAMA
model_name = custom_model_val
elif model_choice_val.startswith("OMNI: OpenAI "):
provider = LLMProvider.OPENAI
# Extract model name from UI string (e.g., "OMNI: OpenAI GPT-4o" -> "gpt-4o")
model_name = model_choice_val.replace("OMNI: OpenAI ", "").lower().replace(" ", "-")
elif model_choice_val.startswith("OMNI: Claude "):
provider = LLMProvider.ANTHROPIC
# Extract model name from UI string (similar to ANTHROPIC loop case)
model_parts = model_choice_val.replace("OMNI: Claude ", "").split(" (")
version = model_parts[0] # e.g., "3.7 Sonnet"
date = model_parts[1].replace(")", "") if len(model_parts) > 1 else "" # e.g., "20250219"

# Format as claude-3-7-sonnet-20250219 or claude-3-5-sonnet-20240620
version = version.replace(".", "-").replace(" ", "-").lower()
model_name = f"claude-{version}-{date}"
elif model_choice_val.startswith("OMNI: Ollama "):
provider = LLMProvider.OLLAMA
# Extract model name from UI string (e.g., "OMNI: Ollama llama3" -> "llama3")
model_name = model_choice_val.replace("OMNI: Ollama ", "")
else:
# Fallback to get_provider_and_model for any other cases
provider, model_name, _ = get_provider_and_model(model_choice_val, agent_loop)
else:
# Fallback for any other agent loop
provider, model_name, _ = get_provider_and_model(model_choice_val, agent_loop)

# Generate and return the code
return generate_python_code(
agent_loop,
provider,
@@ -1336,38 +1390,62 @@ if __name__ == "__main__":
messages,
provider_base_url,
recent_images_val,
save_trajectory_val
save_trajectory_val,
computer_os,
computer_provider,
container_name,
cua_cloud_api_key
)

# Update code display when configuration changes
agent_loop.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
model_choice.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
custom_model.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
chatbot_history.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
recent_images.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
save_trajectory.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory],
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
computer_os.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
computer_provider.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
container_name.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
cua_cloud_api_key.change(
update_code_display,
inputs=[agent_loop, model_choice, custom_model, chatbot_history, provider_base_url, recent_images, save_trajectory, computer_os, computer_provider, container_name, cua_cloud_api_key],
outputs=[code_display]
)
@@ -1377,7 +1455,7 @@ if __name__ == "__main__":
def test_cua():
"""Standalone function to launch the Gradio app."""
demo = create_gradio_ui()
demo.launch(share=False) # Don't create a public link
demo.launch(share=False, inbrowser=True) # Don't create a public link


if __name__ == "__main__":

@@ -19,11 +19,11 @@ dependencies = [
"pydantic>=2.6.4,<3.0.0",
"rich>=13.7.1,<14.0.0",
"python-dotenv>=1.0.1,<2.0.0",
"cua-computer>=0.1.0,<0.2.0",
"cua-computer>=0.2.0,<0.3.0",
"cua-core>=0.1.0,<0.2.0",
"certifi>=2024.2.2"
]
requires-python = ">=3.10"
requires-python = ">=3.11"

[project.optional-dependencies]
anthropic = [
@@ -102,11 +102,11 @@ source-includes = ["tests/", "README.md", "LICENSE"]

[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]

[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true

@@ -115,7 +115,7 @@ docstring-code-format = true

[tool.mypy]
strict = true
python_version = "3.10"
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true

@@ -27,6 +27,16 @@ def parse_args(args: Optional[List[str]] = None) -> argparse.Namespace:
default="info",
help="Logging level (default: info)",
)
parser.add_argument(
"--ssl-keyfile",
type=str,
help="Path to SSL private key file (enables HTTPS)",
)
parser.add_argument(
"--ssl-certfile",
type=str,
help="Path to SSL certificate file (enables HTTPS)",
)

return parser.parse_args(args)

@@ -43,7 +53,21 @@ def main() -> None:

# Create and start the server
logger.info(f"Starting CUA Computer API server on {args.host}:{args.port}...")
server = Server(host=args.host, port=args.port, log_level=args.log_level)

# Handle SSL configuration
ssl_args = {}
if args.ssl_keyfile and args.ssl_certfile:
ssl_args = {
"ssl_keyfile": args.ssl_keyfile,
"ssl_certfile": args.ssl_certfile,
}
logger.info("HTTPS mode enabled with SSL certificates")
elif args.ssl_keyfile or args.ssl_certfile:
logger.warning("Both --ssl-keyfile and --ssl-certfile are required for HTTPS. Running in HTTP mode.")
else:
logger.info("HTTP mode (no SSL certificates provided)")

server = Server(host=args.host, port=args.port, log_level=args.log_level, **ssl_args)

try:
server.start()
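The SSL handling added to `main()` above enables HTTPS only when both files are supplied; a lone keyfile or certfile falls back to HTTP with a warning. The both-or-neither rule as a pure function (the helper name is illustrative, not part of the patch):

```python
# Illustrative sketch of the both-or-neither SSL rule from main() above.
def ssl_kwargs(ssl_keyfile=None, ssl_certfile=None):
    if ssl_keyfile and ssl_certfile:
        # Both present: forward to the server so uvicorn serves HTTPS.
        return {"ssl_keyfile": ssl_keyfile, "ssl_certfile": ssl_certfile}
    # Missing either file: empty kwargs, so the server stays in HTTP mode.
    return {}

print(ssl_kwargs("key.pem", "cert.pem"))
# {'ssl_keyfile': 'key.pem', 'ssl_certfile': 'cert.pem'}
print(ssl_kwargs("key.pem"))  # {}
```

Passing the result as `**ssl_args` keeps the `Server(...)` call identical in both modes.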
@@ -8,11 +8,11 @@ import traceback
from contextlib import redirect_stdout, redirect_stderr
from io import StringIO
from .handlers.factory import HandlerFactory
import os
import aiohttp

# Set up logging with more detail
logging.basicConfig(
level=logging.DEBUG, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Configure WebSocket with larger message size
@@ -48,6 +48,112 @@ manager = ConnectionManager()
async def websocket_endpoint(websocket: WebSocket):
# WebSocket message size is configured at the app or endpoint level, not on the instance
await manager.connect(websocket)

# Check if CONTAINER_NAME is set (indicating cloud provider)
container_name = os.environ.get("CONTAINER_NAME")

# If cloud provider, perform authentication handshake
if container_name:
try:
logger.info(f"Cloud provider detected. CONTAINER_NAME: {container_name}. Waiting for authentication...")

# Wait for authentication message
auth_data = await websocket.receive_json()

# Validate auth message format
if auth_data.get("command") != "authenticate":
await websocket.send_json({
"success": False,
"error": "First message must be authentication"
})
await websocket.close()
manager.disconnect(websocket)
return

# Extract credentials
client_api_key = auth_data.get("params", {}).get("api_key")
client_container_name = auth_data.get("params", {}).get("container_name")

# Layer 1: VM Identity Verification
if client_container_name != container_name:
logger.warning(f"VM name mismatch. Expected: {container_name}, Got: {client_container_name}")
await websocket.send_json({
"success": False,
"error": "VM name mismatch"
})
await websocket.close()
manager.disconnect(websocket)
return

# Layer 2: API Key Validation with TryCUA API
if not client_api_key:
await websocket.send_json({
"success": False,
"error": "API key required"
})
await websocket.close()
manager.disconnect(websocket)
return

# Validate with TryCUA API
try:
async with aiohttp.ClientSession() as session:
headers = {
"Authorization": f"Bearer {client_api_key}"
}

async with session.get(
f"https://www.trycua.com/api/vm/auth?container_name={container_name}",
headers=headers,
) as resp:
if resp.status != 200:
error_msg = await resp.text()
logger.warning(f"API validation failed: {error_msg}")
await websocket.send_json({
"success": False,
"error": "Authentication failed"
})
await websocket.close()
manager.disconnect(websocket)
return

# If we get a 200 response with VNC URL, the VM exists and user has access
vnc_url = (await resp.text()).strip()
if not vnc_url:
logger.warning(f"No VNC URL returned for VM: {container_name}")
await websocket.send_json({
"success": False,
"error": "VM not found"
})
await websocket.close()
manager.disconnect(websocket)
return

logger.info(f"Authentication successful for VM: {container_name}")
await websocket.send_json({
"success": True,
"message": "Authenticated"
})

except Exception as e:
logger.error(f"Error validating with TryCUA API: {e}")
await websocket.send_json({
"success": False,
"error": "Authentication service unavailable"
})
await websocket.close()
manager.disconnect(websocket)
return

except Exception as e:
logger.error(f"Authentication error: {e}")
await websocket.send_json({
"success": False,
"error": "Authentication failed"
})
await websocket.close()
manager.disconnect(websocket)
return

# Map commands to appropriate handler methods
handlers = {

@@ -32,7 +32,8 @@ class Server:
await server.stop() # Stop the server
"""

def __init__(self, host: str = "0.0.0.0", port: int = 8000, log_level: str = "info"):
def __init__(self, host: str = "0.0.0.0", port: int = 8000, log_level: str = "info",
ssl_keyfile: Optional[str] = None, ssl_certfile: Optional[str] = None):
"""
Initialize the server.

@@ -40,10 +41,14 @@ class Server:
host: Host to bind the server to
port: Port to bind the server to
log_level: Logging level (debug, info, warning, error, critical)
ssl_keyfile: Path to SSL private key file (for HTTPS)
ssl_certfile: Path to SSL certificate file (for HTTPS)
"""
self.host = host
self.port = port
self.log_level = log_level
self.ssl_keyfile = ssl_keyfile
self.ssl_certfile = ssl_certfile
self.app = fastapi_app
self._server_task: Optional[asyncio.Task] = None
self._should_exit = asyncio.Event()
@@ -52,7 +57,14 @@ class Server:
"""
Start the server synchronously. This will block until the server is stopped.
"""
uvicorn.run(self.app, host=self.host, port=self.port, log_level=self.log_level)
uvicorn.run(
self.app,
host=self.host,
port=self.port,
log_level=self.log_level,
ssl_keyfile=self.ssl_keyfile,
ssl_certfile=self.ssl_certfile
)

async def start_async(self) -> None:
"""
@@ -60,7 +72,12 @@ class Server:
will run in the background.
"""
server_config = uvicorn.Config(
self.app, host=self.host, port=self.port, log_level=self.log_level
self.app,
host=self.host,
port=self.port,
log_level=self.log_level,
ssl_keyfile=self.ssl_keyfile,
ssl_certfile=self.ssl_certfile
)

self._should_exit.clear()
@@ -72,7 +89,8 @@ class Server:
# Wait a short time to ensure the server starts
await asyncio.sleep(0.5)

logger.info(f"Server started at http://{self.host}:{self.port}")
protocol = "https" if self.ssl_certfile else "http"
logger.info(f"Server started at {protocol}://{self.host}:{self.port}")

async def stop(self) -> None:
"""

@@ -17,7 +17,8 @@ dependencies = [
"uvicorn[standard]>=0.27.0",
"pydantic>=2.0.0",
"pyautogui>=0.9.54",
"pillow>=10.2.0"
"pillow>=10.2.0",
"aiohttp>=3.9.1"
]

[project.optional-dependencies]

@@ -51,7 +51,8 @@ class Computer:
noVNC_port: Optional[int] = 8006,
host: str = os.environ.get("PYLUME_HOST", "localhost"),
storage: Optional[str] = None,
ephemeral: bool = False
ephemeral: bool = False,
api_key: Optional[str] = None
):
"""Initialize a new Computer instance.

@@ -90,6 +91,8 @@ class Computer:
self.os_type = os_type
self.provider_type = provider_type
self.ephemeral = ephemeral

self.api_key = api_key

# The default is currently to use non-ephemeral storage
if storage and ephemeral and storage != "ephemeral":
@@ -269,9 +272,7 @@ class Computer:
elif self.provider_type == VMProviderType.CLOUD:
self.config.vm_provider = VMProviderFactory.create_provider(
self.provider_type,
port=port,
host=host,
storage=storage,
api_key=self.api_key,
verbose=verbose,
)
else:
@@ -405,12 +406,25 @@ class Computer:
self.logger.info(f"Initializing interface for {self.os_type} at {ip_address}")
from .interface.base import BaseComputerInterface

self._interface = cast(
BaseComputerInterface,
InterfaceFactory.create_interface_for_os(
os=self.os_type, ip_address=ip_address # type: ignore[arg-type]
),
)
# Pass authentication credentials if using cloud provider
if self.provider_type == VMProviderType.CLOUD and self.api_key and self.config.name:
self._interface = cast(
BaseComputerInterface,
InterfaceFactory.create_interface_for_os(
os=self.os_type,
ip_address=ip_address,
api_key=self.api_key,
vm_name=self.config.name
),
)
else:
self._interface = cast(
BaseComputerInterface,
InterfaceFactory.create_interface_for_os(
os=self.os_type,
ip_address=ip_address
),
)

# Wait for the WebSocket interface to be ready
self.logger.info("Connecting to WebSocket interface...")
@@ -505,6 +519,11 @@ class Computer:

# Call the provider's get_ip method which will wait indefinitely
storage_param = "ephemeral" if self.ephemeral else self.storage

# Log the image being used
self.logger.info(f"Running VM using image: {self.image}")

# Call provider.get_ip with explicit image parameter
ip = await self.config.vm_provider.get_ip(
name=self.config.name,
storage=storage_param,

@@ -8,17 +8,21 @@ from ..logger import Logger, LogLevel
class BaseComputerInterface(ABC):
"""Base class for computer control interfaces."""

def __init__(self, ip_address: str, username: str = "lume", password: str = "lume"):
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume", api_key: Optional[str] = None, vm_name: Optional[str] = None):
"""Initialize interface.

Args:
ip_address: IP address of the computer to control
username: Username for authentication
password: Password for authentication
api_key: Optional API key for cloud authentication
vm_name: Optional VM name for cloud authentication
"""
self.ip_address = ip_address
self.username = username
self.password = password
self.api_key = api_key
self.vm_name = vm_name
self.logger = Logger("cua.interface", LogLevel.NORMAL)

@abstractmethod

@@ -1,6 +1,6 @@
"""Factory for creating computer interfaces."""

from typing import Literal
from typing import Literal, Optional
from .base import BaseComputerInterface

class InterfaceFactory:
@@ -9,13 +9,17 @@ class InterfaceFactory:
@staticmethod
def create_interface_for_os(
os: Literal['macos', 'linux'],
ip_address: str
ip_address: str,
api_key: Optional[str] = None,
vm_name: Optional[str] = None
) -> BaseComputerInterface:
"""Create an interface for the specified OS.

Args:
os: Operating system type ('macos' or 'linux')
ip_address: IP address of the computer to control
api_key: Optional API key for cloud authentication
vm_name: Optional VM name for cloud authentication

Returns:
BaseComputerInterface: The appropriate interface for the OS
@@ -28,8 +32,8 @@ class InterfaceFactory:
from .linux import LinuxComputerInterface

if os == 'macos':
return MacOSComputerInterface(ip_address)
return MacOSComputerInterface(ip_address, api_key=api_key, vm_name=vm_name)
elif os == 'linux':
return LinuxComputerInterface(ip_address)
return LinuxComputerInterface(ip_address, api_key=api_key, vm_name=vm_name)
else:
raise ValueError(f"Unsupported OS type: {os}")
@@ -15,8 +15,8 @@ from .models import Key, KeyType
|
||||
class LinuxComputerInterface(BaseComputerInterface):
|
||||
"""Interface for Linux."""
|
||||
|
||||
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume"):
|
||||
super().__init__(ip_address, username, password)
|
||||
def __init__(self, ip_address: str, username: str = "lume", password: str = "lume", api_key: Optional[str] = None, vm_name: Optional[str] = None):
|
||||
super().__init__(ip_address, username, password, api_key, vm_name)
|
||||
self._ws = None
|
||||
self._reconnect_task = None
|
||||
self._closed = False
|
||||
@@ -26,6 +26,7 @@ class LinuxComputerInterface(BaseComputerInterface):
|
||||
self._reconnect_delay = 1 # Start with 1 second delay
|
||||
self._max_reconnect_delay = 30 # Maximum delay between reconnection attempts
|
||||
self._log_connection_attempts = True # Flag to control connection attempt logging
|
||||
self._authenticated = False # Track authentication status
|
||||
|
||||
# Set logger name for Linux interface
|
||||
self.logger = Logger("cua.interface.linux", LogLevel.NORMAL)
|
||||
@@ -37,7 +38,9 @@ class LinuxComputerInterface(BaseComputerInterface):
|
||||
Returns:
|
||||
WebSocket URI for the Computer API Server
|
||||
"""
|
||||
return f"ws://{self.ip_address}:8000/ws"
|
||||
protocol = "wss" if self.api_key else "ws"
|
||||
port = "8443" if self.api_key else "8000"
|
||||
return f"{protocol}://{self.ip_address}:{port}/ws"
|
||||
|
||||
    async def _keep_alive(self):
        """Keep the WebSocket connection alive with automatic reconnection."""
@@ -86,9 +89,15 @@ class LinuxComputerInterface(BaseComputerInterface):
                        timeout=30,
                    )
                    self.logger.info("WebSocket connection established")

                    # Authentication will be handled by the first command that needs it
                    # Don't do authentication here to avoid recv conflicts

                    self._reconnect_delay = 1  # Reset reconnect delay on successful connection
                    self._last_ping = time.time()
                    retry_count = 0  # Reset retry count on successful connection
                    self._authenticated = False  # Reset auth status on new connection

                except (asyncio.TimeoutError, websockets.exceptions.WebSocketException) as e:
                    next_retry = self._reconnect_delay

@@ -112,13 +121,6 @@ class LinuxComputerInterface(BaseComputerInterface):
                            pass
                        self._ws = None

                    # Use exponential backoff for connection retries
                    await asyncio.sleep(self._reconnect_delay)
                    self._reconnect_delay = min(
                        self._reconnect_delay * 2, self._max_reconnect_delay
                    )
                    continue

                # Regular ping to check connection
                if self._ws and self._ws.state == websockets.protocol.State.OPEN:
                    try:
@@ -197,6 +199,31 @@ class LinuxComputerInterface(BaseComputerInterface):
                if not self._ws:
                    raise ConnectionError("WebSocket connection is not established")

                # Handle authentication if needed
                if self.api_key and self.vm_name and not self._authenticated:
                    self.logger.info("Performing authentication handshake...")
                    auth_message = {
                        "command": "authenticate",
                        "params": {
                            "api_key": self.api_key,
                            "container_name": self.vm_name
                        }
                    }
                    await self._ws.send(json.dumps(auth_message))

                    # Wait for authentication response
                    auth_response = await asyncio.wait_for(self._ws.recv(), timeout=10)
                    auth_result = json.loads(auth_response)

                    if not auth_result.get("success"):
                        error_msg = auth_result.get("error", "Authentication failed")
                        self.logger.error(f"Authentication failed: {error_msg}")
                        self._authenticated = False
                        raise ConnectionError(f"Authentication failed: {error_msg}")

                    self.logger.info("Authentication successful")
                    self._authenticated = True

                message = {"command": command, "params": params or {}}
                await self._ws.send(json.dumps(message))
                response = await asyncio.wait_for(self._ws.recv(), timeout=30)
@@ -217,9 +244,7 @@ class LinuxComputerInterface(BaseComputerInterface):
                    f"Failed to send command '{command}' after {max_retries} retries"
                )
                self.logger.debug(f"Command failure details: {e}")
                raise

        raise last_error if last_error else RuntimeError("Failed to send command")
        raise last_error if last_error else RuntimeError("Failed to send command")

    async def wait_for_ready(self, timeout: int = 60, interval: float = 1.0):
        """Wait for WebSocket connection to become available."""

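The lazy authentication handshake above sends a JSON `authenticate` command and checks `success` in the reply. The message shapes can be exercised without a live server; a sketch under the payload format shown in the diff (the helper names are mine):

```python
import json

def build_auth_message(api_key: str, container_name: str) -> str:
    """Serialize the same payload the interface sends as its first command."""
    return json.dumps({
        "command": "authenticate",
        "params": {"api_key": api_key, "container_name": container_name},
    })

def check_auth_response(raw: str) -> None:
    """Raise ConnectionError unless the server reply reports success."""
    result = json.loads(raw)
    if not result.get("success"):
        raise ConnectionError(result.get("error", "Authentication failed"))

check_auth_response('{"success": true}')  # no exception
msg = json.loads(build_auth_message("key-123", "my-vm"))
assert msg["params"]["container_name"] == "my-vm"
```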
@@ -13,10 +13,10 @@ from .models import Key, KeyType


class MacOSComputerInterface(BaseComputerInterface):
    """Interface for MacOS."""
    """Interface for macOS."""

    def __init__(self, ip_address: str, username: str = "lume", password: str = "lume"):
        super().__init__(ip_address, username, password)
    def __init__(self, ip_address: str, username: str = "lume", password: str = "lume", api_key: Optional[str] = None, vm_name: Optional[str] = None):
        super().__init__(ip_address, username, password, api_key, vm_name)
        self._ws = None
        self._reconnect_task = None
        self._closed = False
@@ -27,7 +27,7 @@ class MacOSComputerInterface(BaseComputerInterface):
        self._max_reconnect_delay = 30  # Maximum delay between reconnection attempts
        self._log_connection_attempts = True  # Flag to control connection attempt logging

        # Set logger name for MacOS interface
        # Set logger name for macOS interface
        self.logger = Logger("cua.interface.macos", LogLevel.NORMAL)

    @property
@@ -37,7 +37,9 @@ class MacOSComputerInterface(BaseComputerInterface):

        Returns:
            WebSocket URI for the Computer API Server
        """
        return f"ws://{self.ip_address}:8000/ws"
        protocol = "wss" if self.api_key else "ws"
        port = "8443" if self.api_key else "8000"
        return f"{protocol}://{self.ip_address}:{port}/ws"

    async def _keep_alive(self):
        """Keep the WebSocket connection alive with automatic reconnection."""
@@ -86,6 +88,32 @@ class MacOSComputerInterface(BaseComputerInterface):
                        timeout=30,
                    )
                    self.logger.info("WebSocket connection established")

                    # If api_key and vm_name are provided, perform authentication handshake
                    if self.api_key and self.vm_name:
                        self.logger.info("Performing authentication handshake...")
                        auth_message = {
                            "command": "authenticate",
                            "params": {
                                "api_key": self.api_key,
                                "container_name": self.vm_name
                            }
                        }
                        await self._ws.send(json.dumps(auth_message))

                        # Wait for authentication response
                        auth_response = await asyncio.wait_for(self._ws.recv(), timeout=10)
                        auth_result = json.loads(auth_response)

                        if not auth_result.get("success"):
                            error_msg = auth_result.get("error", "Authentication failed")
                            self.logger.error(f"Authentication failed: {error_msg}")
                            await self._ws.close()
                            self._ws = None
                            raise ConnectionError(f"Authentication failed: {error_msg}")

                        self.logger.info("Authentication successful")

                    self._reconnect_delay = 1  # Reset reconnect delay on successful connection
                    self._last_ping = time.time()
                    retry_count = 0  # Reset retry count on successful connection

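Both interfaces reconnect with the same capped exponential backoff (`delay = min(delay * 2, 30)`). The resulting wait schedule, starting from one second, can be computed directly:

```python
def backoff_schedule(initial: int = 1, cap: int = 30, attempts: int = 7) -> list[int]:
    """Delays the keep-alive loop would sleep between consecutive failures."""
    delays, delay = [], initial
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # same update rule as in the diff
    return delays

print(backoff_schedule())  # [1, 2, 4, 8, 16, 30, 30]
```

The cap keeps a long outage from growing the sleep unboundedly while still backing off quickly at first.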
@@ -1,11 +1,11 @@
"""Base provider interface for VM backends."""

import abc
from enum import Enum
from enum import StrEnum
from typing import Dict, List, Optional, Any, AsyncContextManager


class VMProviderType(str, Enum):
class VMProviderType(StrEnum):
    """Enum of supported VM provider types."""
    LUME = "lume"
    LUMIER = "lumier"

@@ -11,90 +11,65 @@ from ..base import BaseVMProvider, VMProviderType
# Setup logging
logger = logging.getLogger(__name__)

import asyncio
import aiohttp
from urllib.parse import urlparse

class CloudProvider(BaseVMProvider):
    """Cloud VM Provider stub implementation.

    This is a placeholder for a future cloud VM provider implementation.
    """

    """Cloud VM Provider implementation."""
    def __init__(
        self,
        host: str = "localhost",
        port: int = 7777,
        storage: Optional[str] = None,
        self,
        api_key: str,
        verbose: bool = False,
        **kwargs,
    ):
        """Initialize the Cloud provider.

        """
        Args:
            host: Host to use for API connections (default: localhost)
            port: Port for the API server (default: 7777)
            storage: Path to store VM data
            api_key: API key for authentication
            name: Name of the VM
            verbose: Enable verbose logging
        """
        self.host = host
        self.port = port
        self.storage = storage
        assert api_key, "api_key required for CloudProvider"
        self.api_key = api_key
        self.verbose = verbose

        logger.warning("CloudProvider is not yet implemented")


    @property
    def provider_type(self) -> VMProviderType:
        """Get the provider type."""
        return VMProviderType.CLOUD


    async def __aenter__(self):
        """Enter async context manager."""
        logger.debug("Entering CloudProvider context")
        return self


    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Exit async context manager."""
        logger.debug("Exiting CloudProvider context")

        pass

    async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
        """Get VM information by name."""
        logger.warning("CloudProvider.get_vm is not implemented")
        return {
            "name": name,
            "status": "unavailable",
            "message": "CloudProvider is not implemented"
        }

        """Get VM VNC URL by name using the cloud API."""
        return {"name": name, "hostname": f"{name}.containers.cloud.trycua.com"}

    async def list_vms(self) -> List[Dict[str, Any]]:
        """List all available VMs."""
        logger.warning("CloudProvider.list_vms is not implemented")
        return []


    async def run_vm(self, image: str, name: str, run_opts: Dict[str, Any], storage: Optional[str] = None) -> Dict[str, Any]:
        """Run a VM with the given options."""
        logger.warning("CloudProvider.run_vm is not implemented")
        return {
            "name": name,
            "status": "unavailable",
            "message": "CloudProvider is not implemented"
        }

        return {"name": name, "status": "unavailable", "message": "CloudProvider is not implemented"}

    async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
        """Stop a running VM."""
        logger.warning("CloudProvider.stop_vm is not implemented")
        return {
            "name": name,
            "status": "stopped",
            "message": "CloudProvider is not implemented"
        }

        return {"name": name, "status": "stopped", "message": "CloudProvider is not implemented"}

    async def update_vm(self, name: str, update_opts: Dict[str, Any], storage: Optional[str] = None) -> Dict[str, Any]:
        """Update VM configuration."""
        logger.warning("CloudProvider.update_vm is not implemented")
        return {
            "name": name,
            "status": "unchanged",
            "message": "CloudProvider is not implemented"
        }

    async def get_ip(self, name: str, storage: Optional[str] = None, retry_delay: int = 2) -> str:
        """Get the IP address of a VM."""
        logger.warning("CloudProvider.get_ip is not implemented")
        raise NotImplementedError("CloudProvider.get_ip is not implemented")
        return {"name": name, "status": "unchanged", "message": "CloudProvider is not implemented"}

    async def get_ip(self, name: Optional[str] = None, storage: Optional[str] = None, retry_delay: int = 2) -> str:
        """
        Return the VM's IP address as '{container_name}.containers.cloud.trycua.com'.
        Uses the provided 'name' argument (the VM name requested by the caller),
        falling back to self.name only if 'name' is None.
        Retries up to 3 times with retry_delay seconds if hostname is not available.
        """
        if name is None:
            raise ValueError("VM name is required for CloudProvider.get_ip")
        return f"{name}.containers.cloud.trycua.com"

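`CloudProvider.get_ip` above returns a DNS name rather than a numeric IP; the mapping is deterministic, so it can be restated as a pure function (standalone sketch of the diff's logic, function name mine):

```python
from typing import Optional

def cloud_hostname(name: Optional[str]) -> str:
    """Map a cloud container name to its public hostname, as get_ip does."""
    if name is None:
        raise ValueError("VM name is required for CloudProvider.get_ip")
    return f"{name}.containers.cloud.trycua.com"

print(cloud_hostname("my-vm"))  # my-vm.containers.cloud.trycua.com
```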
@@ -22,7 +22,8 @@ class VMProviderFactory:
        image: Optional[str] = None,
        verbose: bool = False,
        ephemeral: bool = False,
        noVNC_port: Optional[int] = None
        noVNC_port: Optional[int] = None,
        **kwargs,
    ) -> BaseVMProvider:
        """Create a VM provider of the specified type.

@@ -101,12 +102,9 @@ class VMProviderFactory:
        elif provider_type == VMProviderType.CLOUD:
            try:
                from .cloud import CloudProvider
                # Return the stub implementation of CloudProvider
                return CloudProvider(
                    host=host,
                    port=port,
                    storage=storage,
                    verbose=verbose
                    verbose=verbose,
                    **kwargs,
                )
            except ImportError as e:
                logger.error(f"Failed to import CloudProvider: {e}")

@@ -344,9 +344,15 @@ class LumierProvider(BaseVMProvider):
            # Use the VM image passed from the Computer class
            print(f"Using VM image: {self.image}")

            # If ghcr.io is in the image, use the full image name
            if "ghcr.io" in self.image:
                vm_image = self.image
            else:
                vm_image = f"ghcr.io/trycua/{self.image}"

            cmd.extend([
                "-e", f"VM_NAME={self.container_name}",
                "-e", f"VERSION=ghcr.io/trycua/{self.image}",
                "-e", f"VERSION={vm_image}",
                "-e", f"CPU_CORES={run_opts.get('cpu', '4')}",
                "-e", f"RAM_SIZE={memory_mb}",
            ])

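The Lumier hunk normalizes the image reference before passing it to the container as `VERSION`, so fully-qualified references are no longer double-prefixed. Extracted as a pure function (name and sample image tags are mine):

```python
def normalize_vm_image(image: str) -> str:
    """Leave fully-qualified ghcr.io references alone; prefix bare names."""
    if "ghcr.io" in image:
        return image
    return f"ghcr.io/trycua/{image}"

print(normalize_vm_image("macos-sequoia-cua:latest"))
print(normalize_vm_image("ghcr.io/trycua/macos-sequoia-cua:latest"))
```

Before this change, an already-qualified image would have produced `VERSION=ghcr.io/trycua/ghcr.io/trycua/...`.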
@@ -18,7 +18,7 @@ dependencies = [
    "cua-core>=0.1.0,<0.2.0",
    "pydantic>=2.11.1"
]
requires-python = ">=3.10"
requires-python = ">=3.11"

[project.optional-dependencies]
lume = [
@@ -46,11 +46,11 @@ source-includes = ["tests/", "README.md", "LICENSE"]

[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]

[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true

@@ -59,7 +59,7 @@ docstring-code-format = true

[tool.mypy]
strict = true
python_version = "3.10"
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true

@@ -15,7 +15,7 @@ dependencies = [
    "httpx>=0.24.0",
    "posthog>=3.20.0"
]
requires-python = ">=3.10"
requires-python = ">=3.11"

[tool.pdm]
distribution = true
@@ -26,11 +26,11 @@ source-includes = ["tests/", "README.md", "LICENSE"]

[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]

[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true

@@ -39,7 +39,7 @@ docstring-code-format = true

[tool.mypy]
strict = true
python_version = "3.10"
python_version = "3.11"
ignore_missing_imports = true
disallow_untyped_defs = true
check_untyped_defs = true

@@ -10,7 +10,6 @@

[](#)
[](#)
[](#install)
[](https://discord.com/invite/mVnXXpdE85)
</h1>
</div>

@@ -6,15 +6,15 @@ build-backend = "pdm.backend"
name = "cua-mcp-server"
description = "MCP Server for Computer-Use Agent (CUA)"
readme = "README.md"
requires-python = ">=3.10"
requires-python = ">=3.11"
version = "0.1.0"
authors = [
    {name = "TryCua", email = "gh@trycua.com"}
]
dependencies = [
    "mcp>=1.6.0,<2.0.0",
    "cua-agent[all]>=0.1.0,<0.2.0",
    "cua-computer>=0.1.0,<0.2.0",
    "cua-agent[all]>=0.2.0,<0.3.0",
    "cua-computer>=0.2.0,<0.3.0",
]

[project.scripts]
@@ -31,10 +31,10 @@ dev = [

[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]

[tool.ruff]
line-length = 100
target-version = "py310"
target-version = "py311"
select = ["E", "F", "B", "I"]
fix = true

@@ -43,13 +43,13 @@ dev = [

[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]

[tool.ruff]
fix = true
line-length = 100
select = ["B", "E", "F", "I"]
target-version = "py310"
target-version = "py311"

[tool.ruff.format]
docstring-code-format = true
@@ -58,7 +58,7 @@ docstring-code-format = true
check_untyped_defs = true
disallow_untyped_defs = true
ignore_missing_imports = true
python_version = "3.10"
python_version = "3.11"
show_error_codes = true
strict = true
warn_return_any = true

@@ -24,7 +24,7 @@ dependencies = [
    "typing-extensions>=4.9.0",
    "pydantic>=2.6.3"
]
requires-python = ">=3.10"
requires-python = ">=3.11"
readme = "README.md"
license = {text = "MIT"}
keywords = ["computer-vision", "ocr", "ui-analysis", "icon-detection"]

@@ -6,7 +6,7 @@
"source": [
"## Agent\n",
"\n",
"This notebook demonstrates how to use Cua's Agent to run a workflow in a virtual sandbox on Apple Silicon Macs."
"This notebook demonstrates how to use Cua's Agent to run workflows in virtual sandboxes, either using C/ua Cloud Containers or local VMs on Apple Silicon Macs."
]
},
{
@@ -68,7 +68,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Agent allows you to run an agentic workflow in a virtual sandbox instances on Apple Silicon. Here's a basic example:"
"Agent allows you to run an agentic workflow in virtual sandbox instances. You can choose between cloud containers or local VMs."
]
},
{
@@ -83,15 +83,17 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Get API keys from environment or prompt user\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or input(\"Enter your OpenAI API key: \")\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or \\\n",
"    input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or \\\n",
"    input(\"Enter your OpenAI API key: \")\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_key"
@@ -101,7 +103,165 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to Computer, you can either use the async context manager pattern or initialize the ComputerAgent instance directly."
"## Option 1: Agent with C/ua Cloud Containers\n",
"\n",
"Use cloud containers for running agents from any system without local setup."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites for Cloud Containers\n",
"\n",
"To use C/ua Cloud Containers, you need to:\n",
"1. Sign up at https://trycua.com\n",
"2. Create a Cloud Container\n",
"3. Generate an API Key\n",
"\n",
"Once you have these, you can connect to your cloud container and run agents on it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get C/ua API credentials and container details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cua_api_key = os.getenv(\"CUA_API_KEY\") or \\\n",
"    input(\"Enter your C/ua API Key: \")\n",
"container_name = os.getenv(\"CONTAINER_NAME\") or \\\n",
"    input(\"Enter your Cloud Container name: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Choose the OS type for your container (linux or macos)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os_type = input(\"Enter the OS type of your container (linux/macos) [default: linux]: \").lower() or \"linux\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an agent with cloud container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"# Connect to your existing cloud container\n",
"computer = Computer(\n",
"    os_type=os_type,\n",
"    api_key=cua_api_key,\n",
"    name=container_name,\n",
"    provider_type=VMProviderType.CLOUD,\n",
"    verbosity=logging.INFO\n",
")\n",
"\n",
"# Create agent\n",
"agent = ComputerAgent(\n",
"    computer=computer,\n",
"    loop=AgentLoop.OPENAI,\n",
"    model=LLM(provider=LLMProvider.OPENAI),\n",
"    save_trajectory=True,\n",
"    trajectory_dir=str(Path(\"trajectories\")),\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run tasks on cloud container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tasks = [\n",
"    \"Open a web browser and navigate to GitHub\",\n",
"    \"Search for the trycua/cua repository\",\n",
"    \"Take a screenshot of the repository page\"\n",
"]\n",
"\n",
"for i, task in enumerate(tasks):\n",
"    print(f\"\\nExecuting task {i+1}/{len(tasks)}: {task}\")\n",
"    async for result in agent.run(task):\n",
"        # print(result)\n",
"        pass\n",
"    print(f\"✅ Task {i+1}/{len(tasks)} completed: {task}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Option 2: Agent with Local VMs (Lume daemon)\n",
"\n",
"For Apple Silicon Macs, run agents on local VMs with near-native performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can create an agent, we need to initialize a local computer with Lume."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"\n",
"computer = Computer(\n",
"    verbosity=logging.INFO, \n",
"    provider_type=VMProviderType.LUME,\n",
"    display=\"1024x768\",\n",
"    memory=\"8GB\",\n",
"    cpu=\"4\",\n",
"    os_type=\"macos\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an agent with local VM"
]
},
{
@@ -117,22 +277,31 @@
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"computer = Computer(verbosity=logging.INFO, provider_type=VMProviderType.LUME)\n",
"\n",
"# Create agent with Anthropic loop and provider\n",
"agent = ComputerAgent(\n",
"    computer=computer,\n",
"    loop=AgentLoop.OPENAI,\n",
"    model=LLM(provider=LLMProvider.OPENAI),\n",
"    save_trajectory=True,\n",
"    trajectory_dir=str(Path(\"trajectories\")),\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO\n",
"    )\n",
"\n",
"    computer=computer,\n",
"    loop=AgentLoop.OPENAI,\n",
"    model=LLM(provider=LLMProvider.OPENAI),\n",
"    save_trajectory=True,\n",
"    trajectory_dir=str(Path(\"trajectories\")),\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run tasks on a local Lume VM"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tasks = [\n",
"    \"Look for a repository named trycua/cua on GitHub.\",\n",
"    \"Check the open issues, open the most recent one and read it.\",\n",
@@ -210,22 +379,6 @@
"The agent includes a Gradio-based user interface for easy interaction. To use it:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Get API keys from environment or prompt user\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or input(\"Enter your OpenAI API key: \")\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_key"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -237,6 +390,146 @@
"app = create_gradio_ui()\n",
"app.launch(share=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Agent Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using different agent loops"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use different agent loops depending on your needs:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. OpenAI Agent Loop"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"openai_agent = ComputerAgent(\n",
"    computer=computer,  # Can be cloud or local\n",
"    loop=AgentLoop.OPENAI,\n",
"    model=LLM(provider=LLMProvider.OPENAI),\n",
"    save_trajectory=True,\n",
"    trajectory_dir=str(Path(\"trajectories\")),\n",
"    verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Anthropic Agent Loop"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"anthropic_agent = ComputerAgent(\n",
"    computer=computer,\n",
"    loop=AgentLoop.ANTHROPIC,\n",
"    model=LLM(provider=LLMProvider.ANTHROPIC),\n",
"    save_trajectory=True,\n",
"    trajectory_dir=str(Path(\"trajectories\")),\n",
"    verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Omni Agent Loop (supports multiple providers)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"omni_agent = ComputerAgent(\n",
"    computer=computer,\n",
"    loop=AgentLoop.OMNI,\n",
"    model=LLM(provider=LLMProvider.ANTHROPIC, name=\"claude-3-7-sonnet-20250219\"),\n",
"    # model=LLM(provider=LLMProvider.OPENAI, name=\"gpt-4.5-preview\"),\n",
"    # model=LLM(provider=LLMProvider.OLLAMA, name=\"gemma3:12b-it-q4_K_M\"),\n",
"    save_trajectory=True,\n",
"    trajectory_dir=str(Path(\"trajectories\")),\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. UITARS Agent Loop (for local inference on Apple Silicon)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"uitars_agent = ComputerAgent(\n",
"    computer=computer,\n",
"    loop=AgentLoop.UITARS,\n",
"    model=LLM(provider=LLMProvider.UITARS),\n",
"    save_trajectory=True,\n",
"    trajectory_dir=str(Path(\"trajectories\")),\n",
"    verbosity=logging.INFO\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trajectory viewing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All agent runs save trajectories that can be viewed at https://trycua.com/trajectory-viewer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"Trajectories saved to: {Path('trajectories').absolute()}\")\n",
"print(\"Upload trajectory files to https://trycua.com/trajectory-viewer to visualize agent actions\")\n"
]
}
],
"metadata": {

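The task loops in the notebook cells consume `agent.run(task)` as an async generator; the control flow is plain asyncio, shown here with a stub agent standing in for `ComputerAgent` (the stub and its yielded step names are mine):

```python
import asyncio

class StubAgent:
    """Hypothetical stand-in for ComputerAgent: run() is an async generator."""
    async def run(self, task: str):
        for step in ("screenshot", "click", "done"):
            yield {"task": task, "step": step}

async def main() -> list[str]:
    agent = StubAgent()
    steps = []
    for task in ["open browser", "search repo"]:
        # Same consumption pattern as the notebook cells
        async for result in agent.run(task):
            steps.append(result["step"])
    return steps

print(asyncio.run(main()))  # ['screenshot', 'click', 'done', 'screenshot', 'click', 'done']
```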
@@ -6,7 +6,7 @@
|
||||
"source": [
|
||||
"## Computer\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to use Computer to operate a Lume sandbox VMs programmatically on Apple Silicon macOS systems."
|
||||
"This notebook demonstrates how to use Computer to operate sandbox VMs programmatically, either using C/ua Cloud Containers or local Lume VMs on Apple Silicon macOS systems."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -22,25 +22,23 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip uninstall -y cua-computer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip uninstall -y cua-computer\n",
|
||||
"!pip install \"cua-computer[all]\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If locally installed, use this instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# If locally installed, use this instead:\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.chdir('../libs/computer')\n",
|
||||
@@ -55,7 +53,126 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Lume daemon\n",
|
||||
"## Option 1: C/ua Cloud Containers\n",
|
||||
"\n",
|
||||
"C/ua Cloud Containers provide remote VMs that can be accessed from any system without local setup."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prerequisites for Cloud Containers\n",
|
||||
"\n",
|
||||
"To use C/ua Cloud Containers, you need to:\n",
|
||||
"1. Sign up at https://trycua.com\n",
|
||||
"2. Create a Cloud Container\n",
|
||||
"3. Generate an API Key\n",
|
||||
"\n",
|
"Once you have these, you can connect to your cloud container using its name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get API key and container name from environment or prompt user\n",
"import os\n",
"\n",
"cua_api_key = os.getenv(\"CUA_API_KEY\") or \\\n",
" input(\"Enter your C/ua API Key: \")\n",
"container_name = os.getenv(\"CONTAINER_NAME\") or \\\n",
" input(\"Enter your Cloud Container name: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Choose the OS type for your container (linux or macos)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os_type = input(\"Enter the OS type of your container (linux/macos) [default: linux]: \").lower() or \"linux\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to your Cloud Container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer, VMProviderType"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Connect to your existing C/ua Cloud Container"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"computer = Computer(\n",
" os_type=os_type, # Must match the OS type of your cloud container\n",
" api_key=cua_api_key,\n",
" name=container_name,\n",
" provider_type=VMProviderType.CLOUD,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Take a screenshot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"screenshot = await computer.interface.screenshot()\n",
"\n",
"with open(\"screenshot.png\", \"wb\") as f:\n",
" f.write(screenshot)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Option 2: Local VMs (Lume daemon)\n",
"\n",
"For Apple Silicon Macs, you can run VMs locally using the Lume daemon."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Lume daemon setup\n",
"\n",
"Refer to [../libs/lume/README.md](../libs/lume/README.md) for more details on the Lume CLI."
]
},
@@ -143,7 +260,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize a Computer instance"
"### Initialize a Local Computer instance"
]
},
{
@@ -190,7 +307,7 @@
" os_type=\"macos\",\n",
" provider_type=VMProviderType.LUME,\n",
") as computer:\n",
" await computer.run()\n",
" pass\n",
" # ... do something with the computer interface"
]
},
@@ -217,6 +334,15 @@
"await computer.run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computer Interface\n",
"\n",
"Both cloud and local computers provide the same interface for interaction."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -461,7 +587,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "cua312",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -475,7 +601,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
"version": "3.12.2"
}
},
"nbformat": 4,

@@ -9,7 +9,7 @@ description = "CUA (Computer Use Agent) mono-repo"
license = { text = "MIT" }
name = "cua-workspace"
readme = "README.md"
requires-python = ">=3.10"
requires-python = ">=3.11"
version = "0.1.0"

[project.urls]
@@ -53,13 +53,13 @@ respect-source-order = true

[tool.black]
line-length = 100
target-version = ["py310"]
target-version = ["py311"]

[tool.ruff]
fix = true
line-length = 100
select = ["B", "E", "F", "I"]
target-version = "py310"
target-version = "py311"

[tool.ruff.format]
docstring-code-format = true
@@ -68,7 +68,7 @@ docstring-code-format = true
check_untyped_defs = true
disallow_untyped_defs = true
ignore_missing_imports = true
python_version = "3.10"
python_version = "3.11"
show_error_codes = true
strict = true
warn_return_any = true

@@ -2,83 +2,173 @@

set -e

echo "🚀 Setting up CUA playground environment..."
echo "🚀 Launching C/ua Computer-Use Agent UI..."

# Check for Apple Silicon Mac
if [[ $(uname -s) != "Darwin" || $(uname -m) != "arm64" ]]; then
echo "❌ This script requires an Apple Silicon Mac (M1/M2/M3/M4)."
exit 1
fi
# Save the original working directory
ORIGINAL_DIR="$(pwd)"

# Check for macOS 15 (Sequoia) or newer
OSVERSION=$(sw_vers -productVersion)
if [[ $(echo "$OSVERSION 15.0" | tr " " "\n" | sort -V | head -n 1) != "15.0" ]]; then
echo "❌ This script requires macOS 15 (Sequoia) or newer. You have $OSVERSION."
exit 1
fi

# Create a temporary directory for our work
TMP_DIR=$(mktemp -d)
cd "$TMP_DIR"
# Directories used by the script
DEMO_DIR="$HOME/.cua-demo"
VENV_DIR="$DEMO_DIR/venv"

# Function to clean up on exit
cleanup() {
cd ~
rm -rf "$TMP_DIR"
rm -rf "$TMP_DIR" 2>/dev/null || true
}

# Create a temporary directory for our work
TMP_DIR=$(mktemp -d)
cd "$TMP_DIR"
trap cleanup EXIT

# Install Lume if not already installed
if ! command -v lume &> /dev/null; then
echo "📦 Installing Lume CLI..."
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
# Ask user to choose between local macOS VMs or C/ua Cloud Containers
echo ""
echo "Choose your C/ua setup:"
echo "1) ☁️ C/ua Cloud Containers (works on any system)"
echo "2) 🖥️ Local macOS VMs (requires Apple Silicon Mac + macOS 15+)"
echo ""
read -p "Enter your choice (1 or 2): " CHOICE

if [[ "$CHOICE" == "1" ]]; then
# C/ua Cloud Container setup
echo ""
echo "☁️ Setting up C/ua Cloud Containers..."
echo ""

# Add lume to PATH for this session if it's not already there
if ! command -v lume &> /dev/null; then
export PATH="$PATH:$HOME/.local/bin"
# Check if existing .env.local already has CUA_API_KEY (check current dir and demo dir)
# Look for .env.local in the original working directory (before cd to temp dir)
CURRENT_ENV_FILE="$ORIGINAL_DIR/.env.local"
DEMO_ENV_FILE="$DEMO_DIR/.env.local"

CUA_API_KEY=""

# First check current directory
if [[ -f "$CURRENT_ENV_FILE" ]] && grep -q "CUA_API_KEY=" "$CURRENT_ENV_FILE"; then
EXISTING_CUA_KEY=$(grep "CUA_API_KEY=" "$CURRENT_ENV_FILE" | cut -d'=' -f2- | tr -d '"' | tr -d "'" | xargs)
if [[ -n "$EXISTING_CUA_KEY" && "$EXISTING_CUA_KEY" != "your_cua_api_key_here" && "$EXISTING_CUA_KEY" != "" ]]; then
CUA_API_KEY="$EXISTING_CUA_KEY"
fi
fi

# Then check demo directory if not found in current dir
if [[ -z "$CUA_API_KEY" ]] && [[ -f "$DEMO_ENV_FILE" ]] && grep -q "CUA_API_KEY=" "$DEMO_ENV_FILE"; then
EXISTING_CUA_KEY=$(grep "CUA_API_KEY=" "$DEMO_ENV_FILE" | cut -d'=' -f2- | tr -d '"' | tr -d "'" | xargs)
if [[ -n "$EXISTING_CUA_KEY" && "$EXISTING_CUA_KEY" != "your_cua_api_key_here" && "$EXISTING_CUA_KEY" != "" ]]; then
CUA_API_KEY="$EXISTING_CUA_KEY"
fi
fi

# If no valid API key found, prompt for one
if [[ -z "$CUA_API_KEY" ]]; then
echo "To use C/ua Cloud Containers, you need to:"
echo "1. Sign up at https://trycua.com"
echo "2. Create a Cloud Container"
echo "3. Generate an API Key"
echo ""
read -p "Enter your C/ua API Key: " CUA_API_KEY

if [[ -z "$CUA_API_KEY" ]]; then
echo "❌ A C/ua API Key is required for Cloud Containers."
exit 1
fi
fi

USE_CLOUD=true

elif [[ "$CHOICE" == "2" ]]; then
# Local macOS VM setup
echo ""
echo "🖥️ Setting up local macOS VMs..."

# Check for Apple Silicon Mac
if [[ $(uname -s) != "Darwin" || $(uname -m) != "arm64" ]]; then
echo "❌ Local macOS VMs require an Apple Silicon Mac (M1/M2/M3/M4)."
echo "💡 Consider using C/ua Cloud Containers instead (option 1)."
exit 1
fi

# Check for macOS 15 (Sequoia) or newer
OSVERSION=$(sw_vers -productVersion)
if [[ $(echo "$OSVERSION 15.0" | tr " " "\n" | sort -V | head -n 1) != "15.0" ]]; then
echo "❌ Local macOS VMs require macOS 15 (Sequoia) or newer. You have $OSVERSION."
echo "💡 Consider using C/ua Cloud Containers instead (option 1)."
exit 1
fi

USE_CLOUD=false

else
echo "❌ Invalid choice. Please run the script again and choose 1 or 2."
exit 1
fi

# Pull the macOS CUA image if not already present
if ! lume ls | grep -q "macos-sequoia-cua"; then
# Check available disk space
IMAGE_SIZE_GB=30
AVAILABLE_SPACE_KB=$(df -k $HOME | tail -1 | awk '{print $4}')
AVAILABLE_SPACE_GB=$(($AVAILABLE_SPACE_KB / 1024 / 1024))

echo "📊 The macOS CUA image will use approximately ${IMAGE_SIZE_GB}GB of disk space."
echo " You currently have ${AVAILABLE_SPACE_GB}GB available on your system."

# Prompt for confirmation
read -p " Continue? [y]/n: " CONTINUE
CONTINUE=${CONTINUE:-y}

if [[ $CONTINUE =~ ^[Yy]$ ]]; then
echo "📥 Pulling macOS CUA image (this may take a while)..."
lume pull macos-sequoia-cua:latest
else
echo "❌ Installation cancelled."
exit 1
# Install Lume if not already installed (only for local VMs)
if [[ "$USE_CLOUD" == "false" ]]; then
if ! command -v lume &> /dev/null; then
echo "📦 Installing Lume CLI..."
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash

# Add lume to PATH for this session if it's not already there
if ! command -v lume &> /dev/null; then
export PATH="$PATH:$HOME/.local/bin"
fi
fi

# Pull the macOS CUA image if not already present
if ! lume ls | grep -q "macos-sequoia-cua"; then
# Check available disk space
IMAGE_SIZE_GB=30
AVAILABLE_SPACE_KB=$(df -k $HOME | tail -1 | awk '{print $4}')
AVAILABLE_SPACE_GB=$(($AVAILABLE_SPACE_KB / 1024 / 1024))

echo "📊 The macOS CUA image will use approximately ${IMAGE_SIZE_GB}GB of disk space."
echo " You currently have ${AVAILABLE_SPACE_GB}GB available on your system."

# Prompt for confirmation
read -p " Continue? [y]/n: " CONTINUE
CONTINUE=${CONTINUE:-y}

if [[ $CONTINUE =~ ^[Yy]$ ]]; then
echo "📥 Pulling macOS CUA image (this may take a while)..."
lume pull macos-sequoia-cua:latest
else
echo "❌ Installation cancelled."
exit 1
fi
fi
fi

# Create a Python virtual environment
echo "🐍 Setting up Python environment..."
PYTHON_CMD="python3"

# Check if Python 3.11+ is available
PYTHON_VERSION=$($PYTHON_CMD --version 2>&1 | cut -d" " -f2)
PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)
# Try different Python commands in order of preference
PYTHON_CMD=""
for cmd in python3.11 python3 python; do
if command -v $cmd &> /dev/null; then
# Check if this Python version is 3.11+
PYTHON_VERSION=$($cmd --version 2>&1 | cut -d" " -f2)
PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)

if [ "$PYTHON_MAJOR" -gt 3 ] || ([ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -ge 11 ]); then
PYTHON_CMD=$cmd
echo "✅ Found suitable Python: $cmd (version $PYTHON_VERSION)"
break
else
echo "⚠️ Found $cmd (version $PYTHON_VERSION) but it's too old, trying next..."
fi
fi
done

if [ "$PYTHON_MAJOR" -lt 3 ] || ([ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -lt 11 ]); then
echo "❌ Python 3.11+ is required. You have $PYTHON_VERSION."
# If no suitable Python was found, error out
if [ -z "$PYTHON_CMD" ]; then
echo "❌ Python 3.11+ is required but not found."
echo "Please install Python 3.11+ and try again."
exit 1
fi

# Create a virtual environment
VENV_DIR="$HOME/.cua-venv"
if [ ! -d "$VENV_DIR" ]; then
$PYTHON_CMD -m venv "$VENV_DIR"
fi
@@ -87,66 +177,144 @@ fi
|
||||
source "$VENV_DIR/bin/activate"
|
||||
|
||||
# Install required packages
|
||||
echo "📦 Updating CUA packages..."
|
||||
pip install -U pip
|
||||
echo "📦 Updating C/ua packages..."
|
||||
pip install -U pip setuptools wheel Cmake
|
||||
pip install -U cua-computer "cua-agent[all]"
|
||||
|
||||
# Temporary fix for mlx-vlm, see https://github.com/Blaizzy/mlx-vlm/pull/349
|
||||
pip install git+https://github.com/ddupont808/mlx-vlm.git@stable/fix/qwen2-position-id
|
||||
|
||||
# Create a simple demo script
|
||||
DEMO_DIR="$HOME/.cua-demo"
|
||||
mkdir -p "$DEMO_DIR"
|
||||
|
||||
cat > "$DEMO_DIR/run_demo.py" << 'EOF'
|
||||
import asyncio
|
||||
import os
|
||||
from computer import Computer
|
||||
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
|
||||
from agent.ui.gradio.app import create_gradio_ui
|
||||
|
||||
# Try to load API keys from environment
|
||||
api_key = os.environ.get("OPENAI_API_KEY", "")
|
||||
if not api_key:
|
||||
print("\n⚠️ No OpenAI API key found. You'll need to provide one in the UI.")
|
||||
|
||||
# Launch the Gradio UI and open it in the browser
|
||||
app = create_gradio_ui()
|
||||
app.launch(share=False, inbrowser=True)
|
||||
# Create .env.local file with API keys (only if it doesn't exist)
|
||||
if [[ ! -f "$DEMO_DIR/.env.local" ]]; then
|
||||
cat > "$DEMO_DIR/.env.local" << EOF
|
||||
# Uncomment and add your API keys here
|
||||
# OPENAI_API_KEY=your_openai_api_key_here
|
||||
# ANTHROPIC_API_KEY=your_anthropic_api_key_here
|
||||
CUA_API_KEY=your_cua_api_key_here
|
||||
EOF
|
||||
echo "📝 Created .env.local file with API key placeholders"
|
||||
else
|
||||
echo "📝 Found existing .env.local file - keeping your current settings"
|
||||
fi
|
||||
|
||||
if [[ "$USE_CLOUD" == "true" ]]; then
|
||||
# Add CUA API key to .env.local if not already present
|
||||
if ! grep -q "CUA_API_KEY" "$DEMO_DIR/.env.local"; then
|
||||
echo "CUA_API_KEY=$CUA_API_KEY" >> "$DEMO_DIR/.env.local"
|
||||
echo "🔑 Added CUA_API_KEY to .env.local"
|
||||
elif grep -q "CUA_API_KEY=your_cua_api_key_here" "$DEMO_DIR/.env.local"; then
|
||||
# Update placeholder with actual key
|
||||
sed -i.bak "s/CUA_API_KEY=your_cua_api_key_here/CUA_API_KEY=$CUA_API_KEY/" "$DEMO_DIR/.env.local"
|
||||
echo "🔑 Updated CUA_API_KEY in .env.local"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Create a convenience script to run the demo
|
||||
cat > "$DEMO_DIR/start_demo.sh" << EOF
|
||||
cat > "$DEMO_DIR/start_ui.sh" << EOF
|
||||
#!/bin/bash
|
||||
source "$VENV_DIR/bin/activate"
|
||||
cd "$DEMO_DIR"
|
||||
python run_demo.py
|
||||
EOF
|
||||
chmod +x "$DEMO_DIR/start_demo.sh"
|
||||
chmod +x "$DEMO_DIR/start_ui.sh"
|
||||
|
||||
echo "✅ Setup complete!"
|
||||
echo "🖥️ You can start the CUA playground by running: $DEMO_DIR/start_demo.sh"
|
||||
|
||||
# Check if the VM is running
|
||||
echo "🔍 Checking if the macOS CUA VM is running..."
|
||||
VM_RUNNING=$(lume ls | grep "macos-sequoia-cua" | grep "running" || echo "")
|
||||
if [[ "$USE_CLOUD" == "true" ]]; then
|
||||
# Create run_demo.py for cloud containers
|
||||
cat > "$DEMO_DIR/run_demo.py" << 'EOF'
|
||||
import asyncio
|
||||
import os
|
||||
from pathlib import Path
|
||||
from dotenv import load_dotenv
|
||||
from computer import Computer
|
||||
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
|
||||
from agent.ui.gradio.app import create_gradio_ui
|
||||
|
||||
if [ -z "$VM_RUNNING" ]; then
|
||||
echo "🚀 Starting the macOS CUA VM in the background..."
|
||||
lume run macos-sequoia-cua:latest &
|
||||
# Wait a moment for the VM to initialize
|
||||
sleep 5
|
||||
echo "✅ VM started successfully."
|
||||
# Load environment variables from .env.local
|
||||
load_dotenv(Path(__file__).parent / ".env.local")
|
||||
|
||||
# Check for required API keys
|
||||
cua_api_key = os.environ.get("CUA_API_KEY", "")
|
||||
if not cua_api_key:
|
||||
print("\n❌ CUA_API_KEY not found in .env.local file.")
|
||||
print("Please add your CUA API key to the .env.local file.")
|
||||
exit(1)
|
||||
|
||||
openai_key = os.environ.get("OPENAI_API_KEY", "")
|
||||
anthropic_key = os.environ.get("ANTHROPIC_API_KEY", "")
|
||||
|
||||
if not openai_key and not anthropic_key:
|
||||
print("\n⚠️ No OpenAI or Anthropic API keys found in .env.local.")
|
||||
print("Please add at least one API key to use AI agents.")
|
||||
|
||||
print("🚀 Starting CUA playground with Cloud Containers...")
|
||||
print("📝 Edit .env.local to update your API keys")
|
||||
|
||||
# Launch the Gradio UI and open it in the browser
|
||||
app = create_gradio_ui()
|
||||
app.launch(share=False, inbrowser=True)
|
||||
EOF
|
else
echo "✅ macOS CUA VM is already running."
# Create run_demo.py for local macOS VMs
cat > "$DEMO_DIR/run_demo.py" << 'EOF'
import asyncio
import os
from pathlib import Path
from dotenv import load_dotenv
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
from agent.ui.gradio.app import create_gradio_ui

# Load environment variables from .env.local
load_dotenv(Path(__file__).parent / ".env.local")

# Try to load API keys from environment
openai_key = os.environ.get("OPENAI_API_KEY", "")
anthropic_key = os.environ.get("ANTHROPIC_API_KEY", "")

if not openai_key and not anthropic_key:
print("\n⚠️ No OpenAI or Anthropic API keys found in .env.local.")
print("Please add at least one API key to use AI agents.")

print("🚀 Starting CUA playground with local macOS VMs...")
print("📝 Edit .env.local to update your API keys")

# Launch the Gradio UI and open it in the browser
app = create_gradio_ui()
app.launch(share=False, inbrowser=True)
EOF
fi

echo "☁️ CUA Cloud Container setup complete!"
echo "📝 Edit $DEMO_DIR/.env.local to update your API keys"
echo "🖥️ Start the playground by running: $DEMO_DIR/start_ui.sh"

# Check if the VM is running (only for local setup)
if [[ "$USE_CLOUD" == "false" ]]; then
echo "🔍 Checking if the macOS CUA VM is running..."
VM_RUNNING=$(lume ls | grep "macos-sequoia-cua" | grep "running" || echo "")

if [ -z "$VM_RUNNING" ]; then
echo "🚀 Starting the macOS CUA VM in the background..."
lume run macos-sequoia-cua:latest &
# Wait a moment for the VM to initialize
sleep 5
echo "✅ VM started successfully."
else
echo "✅ macOS CUA VM is already running."
fi
fi

# Ask if the user wants to start the demo now
echo
read -p "Would you like to start the CUA playground now? (y/n) " -n 1 -r
read -p "Would you like to start the C/ua Computer-Use Agent UI now? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "🚀 Starting the CUA playground..."
echo "🚀 Starting the C/ua Computer-Use Agent UI..."
echo ""
"$DEMO_DIR/start_demo.sh"
"$DEMO_DIR/start_ui.sh"
fi