@@ -106,7 +114,8 @@ For local development on Windows 10 Pro/Enterprise or Windows 11:
4. Configure your desktop application installation within the sandbox
- **Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
+ **Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For
+ production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
@@ -421,6 +430,7 @@ python hr_automation.py
```
The agent will:
+
1. Connect to your Windows environment (with VPN if configured)
2. Launch and navigate the desktop application
3. Execute each workflow step sequentially
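The sequential execution described above can be sketched as a plain-Python loop. This is illustrative only — the step names are hypothetical and the real agent drives the desktop application through vision and actions, not stubbed coroutines:

```python
import asyncio

# Hypothetical workflow steps; a real deployment maps each one to UI actions.
WORKFLOW = ["connect_environment", "launch_application", "fill_employee_form", "verify_results"]

async def run_step(step: str) -> str:
    # Placeholder for a real UI action executed by the agent.
    await asyncio.sleep(0)
    return f"{step}: ok"

async def run_workflow(steps):
    results = []
    for step in steps:  # steps run strictly in order, as described above
        results.append(await run_step(step))
    return results

results = asyncio.run(run_workflow(WORKFLOW))
print(results[-1])  # → "verify_results: ok"
```

Because each step awaits the previous one, a failure can stop the workflow before later steps run — the property that makes sequential execution safe for form-filling flows.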
@@ -506,6 +516,7 @@ agent = ComputerAgent(
### 1. Workflow Mining
Before deploying, analyze your actual workflows:
+
- Record user interactions with the application
- Identify common patterns and edge cases
- Map out decision trees and validation requirements
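A minimal sketch of the mining step above: given recorded sessions (toy data here, standing in for real interaction logs), count frequent action pairs to surface patterns worth automating:

```python
from collections import Counter

# Toy recorded interactions (hypothetical data) standing in for real session logs.
recordings = [
    ["open_app", "search_employee", "edit_record", "save"],
    ["open_app", "search_employee", "edit_record", "save"],
    ["open_app", "run_report", "export_pdf"],
]

# Mine the most frequent action bigrams to find common patterns.
bigrams = Counter(
    (a, b) for session in recordings for a, b in zip(session, session[1:])
)
top_pattern, count = bigrams.most_common(1)[0]
print(top_pattern, count)
```

Rare bigrams are candidates for the edge cases and validation requirements mentioned above.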
@@ -524,6 +535,7 @@ tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]
```
This provides:
+
- Better audit trails
- Approval gates at business logic level
- Higher success rates
@@ -547,12 +559,14 @@ agent = ComputerAgent(
Choose your deployment model:
**Managed (Recommended)**
+
- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
- You get UI/API endpoints for triggering workflows
- Automatic scaling, monitoring, and maintenance
- SLA guarantees and enterprise support
**Self-Hosted**
+
- You manage Windows VMs, VPN infrastructure, and agent deployment
- Full control over data and security
- Custom network configurations
diff --git a/docs/content/docs/index.mdx b/docs/content/docs/index.mdx
index acecca6d..f475db7f 100644
--- a/docs/content/docs/index.mdx
+++ b/docs/content/docs/index.mdx
@@ -5,7 +5,8 @@ title: Introduction
import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react';
-Cua is an open-source framework for building **Computer-Use Agents** - AI systems that see, understand, and interact with desktop applications through vision and action, just like humans do.
+ Cua is an open-source framework for building **Computer-Use Agents** - AI systems that see,
+ understand, and interact with desktop applications through vision and action, just like humans do.
## Why Cua?
diff --git a/docs/content/docs/libraries/computer-server/index.mdx b/docs/content/docs/libraries/computer-server/index.mdx
index e2f683dd..d5affd25 100644
--- a/docs/content/docs/libraries/computer-server/index.mdx
+++ b/docs/content/docs/libraries/computer-server/index.mdx
@@ -7,7 +7,14 @@ github:
---
- A corresponding Jupyter Notebook is available for this documentation.
+ A corresponding Jupyter Notebook is available for this documentation.
The Computer Server API reference documentation is currently under development.
diff --git a/docs/content/docs/libraries/cua-cli/commands.mdx b/docs/content/docs/libraries/cua-cli/commands.mdx
index e50a7c07..b425b9a4 100644
--- a/docs/content/docs/libraries/cua-cli/commands.mdx
+++ b/docs/content/docs/libraries/cua-cli/commands.mdx
@@ -15,6 +15,7 @@ The CUA CLI provides commands for authentication and sandbox management.
The CLI supports **two command styles** for flexibility:
**Flat style** (quick & concise):
+
```bash
cua list
cua create --os linux --size small --region north-america
@@ -22,6 +23,7 @@ cua start my-sandbox
```
**Grouped style** (explicit & clear):
+
```bash
cua sb list # or: cua sandbox list
cua sb create # or: cua sandbox create
@@ -54,9 +56,11 @@ cua login --api-key sk-your-api-key-here
```
**Options:**
+
- `--api-key` - Provide API key directly instead of browser flow
**Example:**
+
```bash
$ cua auth login
Opening browser for CLI auth...
@@ -75,12 +79,14 @@ cua env
```
**Example:**
+
```bash
$ cua auth env
Wrote /path/to/your/project/.env
```
The generated `.env` file will contain:
+
```
CUA_API_KEY=sk-your-api-key-here
```
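The generated file uses plain `KEY=value` lines, so any dotenv loader can consume it. As a sketch (not the CLI's own parser — in a real project, prefer `python-dotenv`), a minimal reader looks like this:

```python
from pathlib import Path

# Minimal parser for KEY=value lines, like the .env written by `cua auth env`.
def load_env(path: Path) -> dict:
    env = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

Path("example.env").write_text("CUA_API_KEY=sk-your-api-key-here\n")
print(load_env(Path("example.env"))["CUA_API_KEY"])
```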
@@ -97,6 +103,7 @@ cua logout
```
**Example:**
+
```bash
$ cua auth logout
Logged out
@@ -121,6 +128,7 @@ cua ps
```
**Example Output (default, passwords hidden):**
+
```
NAME STATUS HOST
my-dev-sandbox running my-dev-sandbox.sandbox.cua.ai
@@ -128,6 +136,7 @@ test-windows stopped test-windows.sandbox.cua.ai
```
**Example Output (with --show-passwords):**
+
```
NAME STATUS PASSWORD HOST
my-dev-sandbox running secure-pass-123 my-dev-sandbox.sandbox.cua.ai
@@ -143,11 +152,13 @@ cua create --os --size --region
```
**Required Options:**
+
- `--os` - Operating system: `linux`, `windows`, `macos`
- `--size` - Sandbox size: `small`, `medium`, `large`
- `--region` - Region: `north-america`, `europe`, `asia-pacific`, `south-america`
**Examples:**
+
```bash
# Create a small Linux sandbox in North America
cua create --os linux --size small --region north-america
@@ -162,6 +173,7 @@ cua create --os macos --size large --region asia-pacific
**Response Types:**
**Immediate (Status 200):**
+
```bash
Sandbox created and ready: my-new-sandbox-abc123
Password: secure-password-here
@@ -169,6 +181,7 @@ Host: my-new-sandbox-abc123.sandbox.cua.ai
```
**Provisioning (Status 202):**
+
```bash
Sandbox provisioning started: my-new-sandbox-abc123
Job ID: job-xyz789
@@ -184,6 +197,7 @@ cua start
```
**Example:**
+
```bash
$ cua start my-dev-sandbox
Start accepted
@@ -198,6 +212,7 @@ cua stop
```
**Example:**
+
```bash
$ cua stop my-dev-sandbox
stopping
@@ -212,6 +227,7 @@ cua restart
```
**Example:**
+
```bash
$ cua restart my-dev-sandbox
restarting
@@ -226,6 +242,7 @@ cua delete
```
**Example:**
+
```bash
$ cua delete old-test-sandbox
Sandbox deletion initiated: deleting
@@ -247,6 +264,7 @@ cua open
```
**Example:**
+
```bash
$ cua vnc my-dev-sandbox
Opening NoVNC: https://my-dev-sandbox.sandbox.cua.ai/vnc.html?autoconnect=true&password=...
@@ -254,7 +272,6 @@ Opening NoVNC: https://my-dev-sandbox.sandbox.cua.ai/vnc.html?autoconnect=true&p
This command automatically opens your default browser to the VNC interface with the correct password pre-filled.
-
## Global Options
### Help
@@ -273,18 +290,21 @@ cua list --help
The CLI provides clear error messages for common issues:
### Authentication Errors
+
```bash
$ cua list
Unauthorized. Try 'cua auth login' again.
```
### Sandbox Not Found
+
```bash
$ cua start nonexistent-sandbox
Sandbox not found
```
### Invalid Configuration
+
```bash
$ cua create --os invalid --size small --region north-america
Invalid request or unsupported configuration
@@ -293,6 +313,7 @@ Invalid request or unsupported configuration
## Tips and Best Practices
### 1. Use Descriptive Sandbox Names
+
```bash
# Good
cua create --os linux --size small --region north-america
@@ -304,6 +325,7 @@ cua list # Check the generated name
```
### 2. Environment Management
+
```bash
# Set up your project with API key
cd my-project
@@ -312,6 +334,7 @@ cua auth env
```
### 3. Quick Sandbox Access
+
```bash
# Create aliases for frequently used sandboxes
alias dev-sandbox="cua vnc my-development-sandbox"
@@ -319,6 +342,7 @@ alias prod-sandbox="cua vnc my-production-sandbox"
```
### 4. Monitoring Provisioning
+
```bash
# For sandboxes that need provisioning time
cua create --os windows --size large --region europe
diff --git a/docs/content/docs/libraries/cua-cli/index.mdx b/docs/content/docs/libraries/cua-cli/index.mdx
index e1adaad0..7a7ac914 100644
--- a/docs/content/docs/libraries/cua-cli/index.mdx
+++ b/docs/content/docs/libraries/cua-cli/index.mdx
@@ -34,16 +34,19 @@ cua sb list
## Use Cases
### Development Workflow
+
- Quickly spin up cloud sandboxes for testing
- Manage multiple sandboxes across different regions
- Integrate with CI/CD pipelines
### Team Collaboration
+
- Share sandbox configurations and access
- Standardize development environments
- Quick onboarding for new team members
### Automation
+
- Script sandbox provisioning and management
- Integrate with deployment workflows
- Automate environment setup
diff --git a/docs/content/docs/libraries/cua-cli/installation.mdx b/docs/content/docs/libraries/cua-cli/installation.mdx
index d05b42c0..9e08a7f0 100644
--- a/docs/content/docs/libraries/cua-cli/installation.mdx
+++ b/docs/content/docs/libraries/cua-cli/installation.mdx
@@ -11,24 +11,21 @@ import { Callout } from 'fumadocs-ui/components/callout';
The fastest way to install the CUA CLI is using our installation scripts:
-
- ```bash
- curl -LsSf https://cua.ai/cli/install.sh | sh
- ```
-
+ ```bash
+ curl -LsSf https://cua.ai/cli/install.sh | sh
+ ```
- ```powershell
- powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
+ ```powershell
+ powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
```
These scripts will automatically:
+
1. Install [Bun](https://bun.sh) (a fast JavaScript runtime)
2. Install the CUA CLI via `bun add -g @trycua/cli`
- The installation scripts will automatically detect your system and install the appropriate binary to your PATH.
+ The installation scripts will automatically detect your system and install the appropriate binary
+ to your PATH.
## Alternative: Install with Bun
@@ -44,8 +41,8 @@ bun add -g @trycua/cli
```
- Using Bun provides faster installation and better performance compared to npm.
- If you don't have Bun installed, the first command will install it for you.
+ Using Bun provides faster installation and better performance compared to npm. If you don't have
+ Bun installed, the first command will install it for you.
## Verify Installation
@@ -76,40 +73,21 @@ To update to the latest version:
- Re-run the installation script:
- ```bash
- # macOS/Linux
- curl -LsSf https://cua.ai/cli/install.sh | sh
-
- # Windows
- powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
- ```
-
-
- ```bash
- npm update -g @trycua/cli
+ Re-run the installation script:
+
+ ```bash
+ # macOS/Linux
+ curl -LsSf https://cua.ai/cli/install.sh | sh
+
+ # Windows
+ powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
```
+
+ ```bash
+ npm update -g @trycua/cli
+ ```
## Uninstalling
- Remove the binary from your PATH:
- ```bash
- # macOS/Linux
- rm $(which cua)
-
- # Windows
- # Remove from your PATH or delete the executable
- ```
-
-
- ```bash
- npm uninstall -g @trycua/cli
- ```
+ Remove the binary from your PATH:
+
+ ```bash
+ # macOS/Linux
+ rm $(which cua)
+
+ # Windows
+ # Remove from your PATH or delete the executable
+ ```
+
+ ```bash
+ npm uninstall -g @trycua/cli
+ ```
## Troubleshooting
@@ -128,17 +106,12 @@ If you encounter permission issues during installation:
- Try running with sudo (not recommended for the curl method):
- ```bash
- # If using npm
- sudo npm install -g @trycua/cli
- ```
+ Try running with sudo (not recommended for the curl method):
+
+ ```bash
+ # If using npm
+ sudo npm install -g @trycua/cli
+ ```
- Run PowerShell as Administrator:
- ```powershell
- # Right-click PowerShell and "Run as Administrator"
- powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
+ Run PowerShell as Administrator:
+
+ ```powershell
+ # Right-click PowerShell and "Run as Administrator"
+ powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
```
diff --git a/docs/content/docs/libraries/mcp-server/client-integrations.mdx b/docs/content/docs/libraries/mcp-server/client-integrations.mdx
index a95df6a9..43d76ab5 100644
--- a/docs/content/docs/libraries/mcp-server/client-integrations.mdx
+++ b/docs/content/docs/libraries/mcp-server/client-integrations.mdx
@@ -30,13 +30,15 @@ To use with Claude Desktop, add an entry to your Claude Desktop configuration (`
If you're working with the CUA source code:
**Standard VM Mode:**
+
```json
{
"mcpServers": {
"cua-agent": {
"command": "/usr/bin/env",
"args": [
- "bash", "-lc",
+ "bash",
+ "-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -45,13 +47,15 @@ If you're working with the CUA source code:
```
**Host Computer Control Mode:**
+
```json
{
"mcpServers": {
"cua-agent": {
"command": "/usr/bin/env",
"args": [
- "bash", "-lc",
+ "bash",
+ "-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -62,6 +66,7 @@ If you're working with the CUA source code:
**Note**: Replace `/path/to/cua` with the absolute path to your CUA repository directory.
**⚠️ Host Computer Control Setup**: When using `CUA_USE_HOST_COMPUTER_SERVER='true'`, you must also:
+
1. Install computer server dependencies: `python3 -m pip install uvicorn fastapi`
2. Install the computer server: `python3 -m pip install -e libs/python/computer-server --break-system-packages`
3. Start the computer server: `python -m computer_server --log-level debug`
diff --git a/docs/content/docs/libraries/mcp-server/configuration.mdx b/docs/content/docs/libraries/mcp-server/configuration.mdx
index cce1957c..30c3074f 100644
--- a/docs/content/docs/libraries/mcp-server/configuration.mdx
+++ b/docs/content/docs/libraries/mcp-server/configuration.mdx
@@ -4,19 +4,20 @@ title: Configuration
The server is configured using environment variables (can be set in the Claude Desktop config):
-| Variable | Description | Default |
-|----------|-------------|---------|
-| `CUA_MODEL_NAME` | Model string (e.g., "anthropic/claude-sonnet-4-20250514", "anthropic/claude-3-5-sonnet-20240620", "openai/computer-use-preview", "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", "omniparser+litellm/gpt-4o", "omniparser+ollama_chat/gemma3") | anthropic/claude-sonnet-4-20250514 |
-| `ANTHROPIC_API_KEY` | Your Anthropic API key (required for Anthropic models) | None |
-| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
-| `CUA_USE_HOST_COMPUTER_SERVER` | Target your local desktop instead of a VM. Set to "true" to use your host system. **Warning:** AI models may perform risky actions. | false |
+| Variable | Description | Default |
+| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
+| `CUA_MODEL_NAME` | Model string (e.g., "anthropic/claude-sonnet-4-20250514", "openai/computer-use-preview", "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", "omniparser+litellm/gpt-4o", "omniparser+ollama_chat/gemma3") | anthropic/claude-sonnet-4-20250514 |
+| `ANTHROPIC_API_KEY` | Your Anthropic API key (required for Anthropic models) | None |
+| `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 |
+| `CUA_USE_HOST_COMPUTER_SERVER` | Target your local desktop instead of a VM. Set to "true" to use your host system. **Warning:** AI models may perform risky actions. | false |
## Model Configuration
The `CUA_MODEL_NAME` environment variable supports various model providers through LiteLLM integration:
### Supported Providers
-- **Anthropic**: `anthropic/claude-sonnet-4-20250514`, `anthropic/claude-3-5-sonnet-20240620`, `anthropic/claude-3-haiku-20240307`
+
+- **Anthropic**: `anthropic/claude-sonnet-4-20250514`
- **OpenAI**: `openai/computer-use-preview`, `openai/gpt-4o`
- **Local Models**: `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
- **Omni + LiteLLM**: `omniparser+litellm/gpt-4o`, `omniparser+litellm/claude-3-haiku`
@@ -25,6 +26,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
### Example Configurations
**Claude Desktop Configuration:**
+
```json
{
"mcpServers": {
@@ -43,6 +45,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
```
**Local Model Configuration:**
+
```json
{
"mcpServers": {
@@ -61,6 +64,7 @@ The `CUA_MODEL_NAME` environment variable supports various model providers throu
## Session Management Configuration
The MCP server automatically manages sessions with the following defaults:
+
- **Max Concurrent Sessions**: 10
- **Session Timeout**: 10 minutes of inactivity
- **Computer Pool Size**: 5 instances
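The defaults above can be illustrated with a toy inactivity reaper. This is not the server's actual implementation — just a sketch of how a session table with a 10-minute timeout and a concurrency cap behaves:

```python
import time

SESSION_TIMEOUT = 10 * 60  # 10 minutes of inactivity, per the defaults above
MAX_SESSIONS = 10

class SessionTable:
    """Toy illustration of inactivity-based session expiry (not the server's real code)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.sessions = {}  # session_id -> last_activity timestamp

    def touch(self, session_id):
        if session_id not in self.sessions and len(self.sessions) >= MAX_SESSIONS:
            raise RuntimeError("max concurrent sessions reached")
        self.sessions[session_id] = self.clock()

    def reap(self):
        now = self.clock()
        expired = [s for s, t in self.sessions.items() if now - t > SESSION_TIMEOUT]
        for s in expired:
            del self.sessions[s]
        return expired

fake_now = [0.0]  # injected clock so the example is deterministic
table = SessionTable(clock=lambda: fake_now[0])
table.touch("abc123")
fake_now[0] = 11 * 60  # 11 minutes later, past the timeout
print(table.reap())  # → ['abc123']
```

Any activity calls `touch`, resetting the timer, so only truly idle sessions are cleaned up.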
diff --git a/docs/content/docs/libraries/mcp-server/installation.mdx b/docs/content/docs/libraries/mcp-server/installation.mdx
index e3e11a6b..b9c14f09 100644
--- a/docs/content/docs/libraries/mcp-server/installation.mdx
+++ b/docs/content/docs/libraries/mcp-server/installation.mdx
@@ -58,7 +58,8 @@ If you're working with the CUA source code directly (like in the CUA repository)
"cua-agent": {
"command": "/usr/bin/env",
"args": [
- "bash", "-lc",
+ "bash",
+ "-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -69,16 +70,19 @@ If you're working with the CUA source code directly (like in the CUA repository)
**For host computer control** (development setup):
1. **Install Computer Server Dependencies**:
+
```bash
python3 -m pip install uvicorn fastapi
python3 -m pip install -e libs/python/computer-server --break-system-packages
```
2. **Start the Computer Server**:
+
```bash
cd /path/to/cua
python -m computer_server --log-level debug
```
+
This will start the computer server on `http://localhost:8000` that controls your actual desktop.
3. **Configure Claude Desktop**:
@@ -88,7 +92,8 @@ If you're working with the CUA source code directly (like in the CUA repository)
"cua-agent": {
"command": "/usr/bin/env",
"args": [
- "bash", "-lc",
+ "bash",
+ "-lc",
"export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh"
]
}
@@ -110,6 +115,7 @@ If you're working with the CUA source code directly (like in the CUA repository)
- Check logs for specific error messages
2. **"Missing Anthropic API Key"** - Add your API key to the configuration:
+
```json
"env": {
"ANTHROPIC_API_KEY": "your-api-key-here"
@@ -118,8 +124,6 @@ If you're working with the CUA source code directly (like in the CUA repository)
3. **"model not found"** - Use a valid model name:
- ✅ `anthropic/claude-sonnet-4-20250514`
- - ✅ `anthropic/claude-3-5-sonnet-20240620`
- - ❌ `anthropic/claude-3-5-sonnet-20241022` (doesn't exist)
4. **Script not found** - If you get a `/bin/bash: ~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, try changing the path to the script to be absolute instead of relative.
@@ -130,6 +134,7 @@ If you're working with the CUA source code directly (like in the CUA repository)
- **Image size errors**: Use `CUA_MAX_IMAGES='1'` to reduce image context size
**Viewing Logs:**
+
```bash
tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
```
diff --git a/docs/content/docs/libraries/mcp-server/llm-integrations.mdx b/docs/content/docs/libraries/mcp-server/llm-integrations.mdx
index 6dedd52d..656def70 100644
--- a/docs/content/docs/libraries/mcp-server/llm-integrations.mdx
+++ b/docs/content/docs/libraries/mcp-server/llm-integrations.mdx
@@ -12,7 +12,7 @@ This MCP server features comprehensive liteLLM integration, allowing you to use
### Model String Examples:
-- **Anthropic**: `"anthropic/claude-3-5-sonnet-20241022"`
+- **Anthropic**: `"anthropic/claude-sonnet-4-5-20250929"`
- **OpenAI**: `"openai/computer-use-preview"`
- **UI-TARS**: `"huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"`
- **Omni + Any LiteLLM**: `"omniparser+litellm/gpt-4o"`, `"omniparser+litellm/claude-3-haiku"`, `"omniparser+ollama_chat/gemma3"`
diff --git a/docs/content/docs/libraries/mcp-server/tools.mdx b/docs/content/docs/libraries/mcp-server/tools.mdx
index 14901057..0b4616ef 100644
--- a/docs/content/docs/libraries/mcp-server/tools.mdx
+++ b/docs/content/docs/libraries/mcp-server/tools.mdx
@@ -45,17 +45,20 @@ The MCP server supports multi-client sessions with automatic resource management
## Usage Examples
### Basic Task Execution
+
```
"Open Chrome and navigate to github.com"
"Create a folder called 'Projects' on my desktop"
```
### Multi-Task Execution
+
```
"Run these tasks: 1) Open Finder, 2) Navigate to Documents, 3) Create a new folder called 'Work'"
```
### Session Management
+
```
"Take a screenshot of the current screen"
"Show me the session statistics"
diff --git a/docs/content/docs/libraries/mcp-server/usage.mdx b/docs/content/docs/libraries/mcp-server/usage.mdx
index 1748490a..d65fc644 100644
--- a/docs/content/docs/libraries/mcp-server/usage.mdx
+++ b/docs/content/docs/libraries/mcp-server/usage.mdx
@@ -16,27 +16,35 @@ Claude will automatically use your CUA agent to perform these tasks.
## Advanced Features
### Progress Reporting
+
The MCP server provides real-time progress updates during task execution:
+
- Task progress is reported as percentages (0-100%)
- Multi-task operations show progress for each individual task
- Progress updates are streamed to the MCP client for real-time feedback
### Error Handling
+
Robust error handling ensures reliable operation:
+
- Failed tasks return error messages with screenshots when possible
- Session state is preserved even when individual tasks fail
- Automatic cleanup prevents resource leaks
- Detailed error logging for troubleshooting
### Concurrent Task Execution
+
For improved performance, multiple tasks can run concurrently:
+
- Set `concurrent=true` in `run_multi_cua_tasks` for parallel execution
- Each task runs in its own context with isolated state
- Progress tracking works for both sequential and concurrent modes
- Resource pooling ensures efficient computer instance usage
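The sequential/concurrent distinction above can be sketched with `asyncio` (task bodies are placeholders, not the server's real task runner):

```python
import asyncio

# Sketch of sequential vs. concurrent multi-task execution, mirroring the
# `concurrent` flag described above.
async def run_task(name: str) -> str:
    await asyncio.sleep(0.01)  # stands in for real agent work
    return f"{name}: done"

async def run_multi(tasks, concurrent=False):
    if concurrent:
        # All tasks start at once, each in its own coroutine context.
        return await asyncio.gather(*(run_task(t) for t in tasks))
    results = []
    for t in tasks:  # one at a time, in order
        results.append(await run_task(t))
    return results

print(asyncio.run(run_multi(["open_chrome", "open_safari"], concurrent=True)))
```

`asyncio.gather` preserves input order in its results even though the tasks overlap in time, which is why progress tracking can report per-task status in both modes.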
### Session Management
+
Multi-client support with automatic resource management:
+
- Each client gets isolated sessions with separate computer instances
- Sessions automatically clean up after 10 minutes of inactivity
- Resource pooling prevents resource exhaustion
@@ -55,7 +63,8 @@ No additional configuration is needed - this is the default behavior.
### Option: Targeting Your Local Desktop
- **Warning:** When targeting your local system, AI models have direct access to your desktop and may perform risky actions. Use with caution.
+ **Warning:** When targeting your local system, AI models have direct access to your desktop and
+ may perform risky actions. Use with caution.
To have the MCP server control your local desktop instead of a VM:
@@ -82,13 +91,14 @@ Add the `CUA_USE_HOST_COMPUTER_SERVER` environment variable to your MCP client c
"command": "/bin/bash",
"args": ["~/.cua/start_mcp_server.sh"],
"env": {
- "CUA_MODEL_NAME": "anthropic/claude-3-5-sonnet-20241022",
+ "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-5-20250929",
"CUA_USE_HOST_COMPUTER_SERVER": "true"
}
}
}
}
```
+
Set the environment variable in your MCP client configuration:
@@ -98,6 +108,7 @@ Add the `CUA_USE_HOST_COMPUTER_SERVER` environment variable to your MCP client c
```
Then start your MCP client as usual.
+
@@ -108,6 +119,7 @@ Now Claude will control your local desktop directly when you ask it to perform c
## Usage Examples
### Single Task Execution
+
```
"Open Safari and navigate to apple.com"
"Create a new folder on the desktop called 'My Projects'"
@@ -115,16 +127,19 @@ Now Claude will control your local desktop directly when you ask it to perform c
```
### Multi-Task Execution (Sequential)
+
```
"Run these tasks in order: 1) Open Finder, 2) Navigate to Documents folder, 3) Create a new folder called 'Work'"
```
### Multi-Task Execution (Concurrent)
+
```
"Run these tasks simultaneously: 1) Open Chrome, 2) Open Safari, 3) Open Finder"
```
### Session Management
+
```
"Show me the current session statistics"
"Take a screenshot using session abc123"
@@ -132,6 +147,7 @@ Now Claude will control your local desktop directly when you ask it to perform c
```
### Error Recovery
+
```
"Try to open a non-existent application and show me the error"
"Find all files with .tmp extension and delete them safely"
@@ -140,13 +156,14 @@ Now Claude will control your local desktop directly when you ask it to perform c
## First-time Usage Notes
**API Keys**: Ensure you have valid API keys:
- - Add your Anthropic API key in the Claude Desktop config (as shown above)
- - Or set it as an environment variable in your shell profile
- - **Required**: The MCP server needs an API key to authenticate with the model provider
+
+- Add your Anthropic API key in the Claude Desktop config (as shown above)
+- Or set it as an environment variable in your shell profile
+- **Required**: The MCP server needs an API key to authenticate with the model provider
**Model Selection**: Choose the appropriate model for your needs:
- - **Claude Sonnet 4**: Latest model with best performance (`anthropic/claude-sonnet-4-20250514`)
- - **Claude 3.5 Sonnet**: Reliable performance (`anthropic/claude-3-5-sonnet-20240620`)
- - **Computer-Use Preview**: Specialized for computer tasks (`openai/computer-use-preview`)
- - **Local Models**: For privacy-sensitive environments
- - **Ollama**: For offline usage
+
+- **Claude Sonnet 4**: Latest model with best performance (`anthropic/claude-sonnet-4-20250514`)
+- **Computer-Use Preview**: Specialized for computer tasks (`openai/computer-use-preview`)
+- **Local Models**: For privacy-sensitive environments
+- **Ollama**: For offline usage
diff --git a/docs/content/docs/libraries/som/index.mdx b/docs/content/docs/libraries/som/index.mdx
index 7a210290..3eef53f1 100644
--- a/docs/content/docs/libraries/som/index.mdx
+++ b/docs/content/docs/libraries/som/index.mdx
@@ -7,7 +7,11 @@ github:
---
- A corresponding Python example is available for this documentation.
+ A corresponding Python example is available for this documentation.
## Overview
diff --git a/docs/public/img/grounding-with-gemini3.gif b/docs/public/img/grounding-with-gemini3.gif
new file mode 100644
index 00000000..57404ba0
Binary files /dev/null and b/docs/public/img/grounding-with-gemini3.gif differ
diff --git a/examples/agent_examples.py b/examples/agent_examples.py
index 6a3772ff..bb3ca6e5 100644
--- a/examples/agent_examples.py
+++ b/examples/agent_examples.py
@@ -53,6 +53,10 @@ async def run_agent_example():
# == Omniparser + Any LLM ==
# model="omniparser+anthropic/claude-opus-4-20250514",
# model="omniparser+ollama_chat/gemma3:12b-it-q4_K_M",
+ # == Omniparser + Vertex AI Gemini 3 (with thinking_level) ==
+ # model="omni+vertex_ai/gemini-3-flash",
+ # thinking_level="high", # or "low"
+ # media_resolution="medium", # or "low" or "high"
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.DEBUG,
diff --git a/libs/python/agent/README.md b/libs/python/agent/README.md
index 75b11914..40b901a3 100644
--- a/libs/python/agent/README.md
+++ b/libs/python/agent/README.md
@@ -51,7 +51,7 @@ async def main():
# Create agent
agent = ComputerAgent(
- model="anthropic/claude-3-5-sonnet-20241022",
+ model="anthropic/claude-sonnet-4-5-20250929",
tools=[computer],
only_n_most_recent_images=3,
trajectory_dir="trajectories",
diff --git a/libs/python/agent/agent/agent.py b/libs/python/agent/agent/agent.py
index 42f04a00..fbcab3e1 100644
--- a/libs/python/agent/agent/agent.py
+++ b/libs/python/agent/agent/agent.py
@@ -189,7 +189,7 @@ class ComputerAgent:
Initialize ComputerAgent.
Args:
- model: Model name (e.g., "claude-3-5-sonnet-20241022", "computer-use-preview", "omni+vertex_ai/gemini-pro")
+ model: Model name (e.g., "claude-sonnet-4-5-20250929", "computer-use-preview", "omni+vertex_ai/gemini-pro")
tools: List of tools (computer objects, decorated functions, etc.)
custom_loop: Custom agent loop function to use instead of auto-selection
only_n_most_recent_images: If set, only keep the N most recent images in message history. Adds ImageRetentionCallback automatically.
diff --git a/libs/python/agent/agent/cli.py b/libs/python/agent/agent/cli.py
index 10cb40f7..970214ce 100644
--- a/libs/python/agent/agent/cli.py
+++ b/libs/python/agent/agent/cli.py
@@ -7,7 +7,7 @@ Usage:
Examples:
python -m agent.cli openai/computer-use-preview
python -m agent.cli anthropic/claude-sonnet-4-5-20250929
- python -m agent.cli omniparser+anthropic/claude-3-5-sonnet-20241022
+ python -m agent.cli omniparser+anthropic/claude-sonnet-4-5-20250929
"""
try:
@@ -233,7 +233,7 @@ async def main():
Examples:
python -m agent.cli openai/computer-use-preview
python -m agent.cli anthropic/claude-sonnet-4-5-20250929
- python -m agent.cli omniparser+anthropic/claude-3-5-sonnet-20241022
+ python -m agent.cli omniparser+anthropic/claude-sonnet-4-5-20250929
python -m agent.cli huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B
""",
)
diff --git a/libs/python/agent/agent/loops/anthropic.py b/libs/python/agent/agent/loops/anthropic.py
index 42e33b5d..0fa08b96 100644
--- a/libs/python/agent/agent/loops/anthropic.py
+++ b/libs/python/agent/agent/loops/anthropic.py
@@ -671,11 +671,12 @@ def _convert_completion_to_responses_items(response: Any) -> List[Dict[str, Any]
# Handle custom function tools (not computer tools)
if tool_name != "computer":
from ..responses import make_function_call_item
- responses_items.append(make_function_call_item(
- function_name=tool_name,
- arguments=tool_input,
- call_id=call_id
- ))
+
+ responses_items.append(
+ make_function_call_item(
+ function_name=tool_name, arguments=tool_input, call_id=call_id
+ )
+ )
continue
# Computer tool - process actions
@@ -883,16 +884,17 @@ def _convert_completion_to_responses_items(response: Any) -> List[Dict[str, Any]
# Handle custom function tools
if tool_name != "computer":
from ..responses import make_function_call_item
+
# tool_call.function.arguments is a JSON string, need to parse it
try:
args_dict = json.loads(tool_call.function.arguments)
except json.JSONDecodeError:
args_dict = {}
- responses_items.append(make_function_call_item(
- function_name=tool_name,
- arguments=args_dict,
- call_id=tool_call.id
- ))
+ responses_items.append(
+ make_function_call_item(
+ function_name=tool_name, arguments=args_dict, call_id=tool_call.id
+ )
+ )
continue
# Handle computer tool
diff --git a/libs/python/agent/agent/loops/generic_vlm.py b/libs/python/agent/agent/loops/generic_vlm.py
index 2b44b18b..4696234b 100644
--- a/libs/python/agent/agent/loops/generic_vlm.py
+++ b/libs/python/agent/agent/loops/generic_vlm.py
@@ -20,6 +20,7 @@ from ..loops.base import AsyncAgentConfig
from ..responses import (
convert_completion_messages_to_responses_items,
convert_responses_items_to_completion_messages,
+ make_reasoning_item,
)
from ..types import AgentCapability
@@ -373,13 +374,23 @@ class GenericVlmConfig(AsyncAgentConfig):
if _on_usage:
await _on_usage(usage)
- # Parse tool call from text; then convert to responses items via fake tool_calls
+ # Extract response data
resp_dict = response.model_dump() # type: ignore
choice = (resp_dict.get("choices") or [{}])[0]
- content_text = ((choice.get("message") or {}).get("content")) or ""
- tool_call = _parse_tool_call_from_text(content_text)
+ message = choice.get("message") or {}
+ content_text = message.get("content") or ""
+ tool_calls_array = message.get("tool_calls") or []
+ reasoning_text = message.get("reasoning") or ""
output_items: List[Dict[str, Any]] = []
+
+ # Add reasoning if present (Ollama Cloud format)
+ if reasoning_text:
+ output_items.append(make_reasoning_item(reasoning_text))
+
+ # Priority 1: Try to parse tool call from content text (OpenRouter format)
+ tool_call = _parse_tool_call_from_text(content_text)
+
if tool_call and isinstance(tool_call, dict):
fn_name = tool_call.get("name") or "computer"
raw_args = tool_call.get("arguments") or {}
@@ -405,8 +416,50 @@ class GenericVlmConfig(AsyncAgentConfig):
],
}
output_items.extend(convert_completion_messages_to_responses_items([fake_cm]))
+ elif tool_calls_array:
+ # Priority 2: Use tool_calls field if present (Ollama Cloud format)
+ # Process and unnormalize coordinates in tool calls
+ processed_tool_calls = []
+ for tc in tool_calls_array:
+ function = tc.get("function", {})
+ fn_name = function.get("name", "computer")
+ args_str = function.get("arguments", "{}")
+
+ try:
+ args = json.loads(args_str)
+
+ # Unnormalize coordinates if present
+ if "coordinate" in args and last_rw is not None and last_rh is not None:
+ args = await _unnormalize_coordinate(args, (last_rw, last_rh))
+
+ # Convert Qwen format to Computer Calls format if this is a computer tool
+ if fn_name == "computer":
+ converted_action = convert_qwen_tool_args_to_computer_action(args)
+ if converted_action:
+ args = converted_action
+
+ processed_tool_calls.append(
+ {
+ "type": tc.get("type", "function"),
+ "id": tc.get("id", "call_0"),
+ "function": {
+ "name": fn_name,
+ "arguments": json.dumps(args),
+ },
+ }
+ )
+ except json.JSONDecodeError:
+ # Keep original if parsing fails
+ processed_tool_calls.append(tc)
+
+ fake_cm = {
+ "role": "assistant",
+ "content": content_text if content_text else "",
+ "tool_calls": processed_tool_calls,
+ }
+ output_items.extend(convert_completion_messages_to_responses_items([fake_cm]))
else:
- # Fallback: just return assistant text
+ # No tool calls found in either format, return text response
fake_cm = {"role": "assistant", "content": content_text}
output_items.extend(convert_completion_messages_to_responses_items([fake_cm]))
diff --git a/libs/python/agent/agent/loops/omniparser.py b/libs/python/agent/agent/loops/omniparser.py
index e15dfc5b..f671dce2 100644
--- a/libs/python/agent/agent/loops/omniparser.py
+++ b/libs/python/agent/agent/loops/omniparser.py
@@ -365,6 +365,22 @@ class OmniparserConfig(AsyncAgentConfig):
**kwargs,
}
+ # Add Vertex AI specific parameters if using vertex_ai models
+ if llm_model.startswith("vertex_ai/"):
+ import os
+
+ # Pass vertex_project and vertex_location to liteLLM
+ if "vertex_project" not in api_kwargs:
+ api_kwargs["vertex_project"] = os.getenv("GOOGLE_CLOUD_PROJECT")
+ if "vertex_location" not in api_kwargs:
+ api_kwargs["vertex_location"] = "global"
+
+ # Pass through Gemini 3-specific parameters if provided
+ if "thinking_level" in kwargs:
+ api_kwargs["thinking_level"] = kwargs["thinking_level"]
+ if "media_resolution" in kwargs:
+ api_kwargs["media_resolution"] = kwargs["media_resolution"]
+
# Call API start hook
if _on_api_start:
await _on_api_start(api_kwargs)
diff --git a/libs/python/agent/agent/loops/uitars2.py b/libs/python/agent/agent/loops/uitars2.py
index 5d46aced..4ecb3b04 100644
--- a/libs/python/agent/agent/loops/uitars2.py
+++ b/libs/python/agent/agent/loops/uitars2.py
@@ -5,13 +5,14 @@ UITARS-2 agent loop implementation using LiteLLM.
- Calls litellm.acompletion
- Parses ... outputs back into Responses items (computer actions)
"""
+
from __future__ import annotations
-import re
-from typing import Any, Dict, List, Optional, Tuple
import base64
import io
import json
+import re
+from typing import Any, Dict, List, Optional, Tuple
import litellm
from litellm.responses.litellm_completion_transformation.transformation import (
@@ -20,37 +21,45 @@ from litellm.responses.litellm_completion_transformation.transformation import (
from ..decorators import register_agent
from .omniparser import get_last_computer_call_output # type: ignore
+
try:
from PIL import Image # type: ignore
except Exception: # pragma: no cover
Image = None # type: ignore
from ..responses import (
+ convert_responses_items_to_completion_messages,
make_click_item,
make_double_click_item,
make_drag_item,
make_function_call_item,
make_keypress_item,
- make_screenshot_item,
make_move_item,
make_output_text_item,
make_reasoning_item,
+ make_screenshot_item,
make_scroll_item,
make_type_item,
make_wait_item,
- convert_responses_items_to_completion_messages,
)
from ..types import AgentCapability
-
TOOL_SCHEMAS: List[Dict[str, Any]] = [
- {"type": "function", "name": "open_computer", "parameters": {}, "description": "Open computer."},
+ {
+ "type": "function",
+ "name": "open_computer",
+ "parameters": {},
+ "description": "Open computer.",
+ },
{
"type": "function",
"name": "click",
"parameters": {
"type": "object",
"properties": {
- "point": {"type": "string", "description": "Click coordinates. The format is: x y"}
+ "point": {
+ "type": "string",
+ "description": "Click coordinates. The format is: x y",
+ }
},
"required": ["point"],
},
@@ -62,7 +71,10 @@ TOOL_SCHEMAS: List[Dict[str, Any]] = [
"parameters": {
"type": "object",
"properties": {
- "point": {"type": "string", "description": "Click coordinates. The format is: x y"}
+ "point": {
+ "type": "string",
+ "description": "Click coordinates. The format is: x y",
+ }
},
"required": ["point"],
},
@@ -74,7 +86,10 @@ TOOL_SCHEMAS: List[Dict[str, Any]] = [
"parameters": {
"type": "object",
"properties": {
- "point": {"type": "string", "description": "Click coordinates. The format is: x y"}
+ "point": {
+ "type": "string",
+ "description": "Click coordinates. The format is: x y",
+ }
},
"required": ["point"],
},
@@ -106,7 +121,10 @@ TOOL_SCHEMAS: List[Dict[str, Any]] = [
"parameters": {
"type": "object",
"properties": {
- "point": {"type": "string", "description": "Target coordinates. The format is: x y"}
+ "point": {
+ "type": "string",
+ "description": "Target coordinates. The format is: x y",
+ }
},
"required": ["point"],
},
@@ -117,7 +135,12 @@ TOOL_SCHEMAS: List[Dict[str, Any]] = [
"name": "hotkey",
"parameters": {
"type": "object",
- "properties": {"key": {"type": "string", "description": "Hotkeys you want to press. Split keys with a space and use lowercase."}},
+ "properties": {
+ "key": {
+ "type": "string",
+ "description": "Hotkeys you want to press. Split keys with a space and use lowercase.",
+ }
+ },
"required": ["key"],
},
"description": "Press hotkey.",
@@ -227,9 +250,7 @@ TOOL_SCHEMAS: List[Dict[str, Any]] = [
"name": "wait",
"parameters": {
"type": "object",
- "properties": {
- "time": {"type": "integer", "description": "Wait time in seconds."}
- },
+ "properties": {"time": {"type": "integer", "description": "Wait time in seconds."}},
"required": [],
},
"description": "Wait for a while.",
@@ -268,7 +289,12 @@ TOOL_SCHEMAS: List[Dict[str, Any]] = [
},
"description": "Type content.",
},
- {"type": "function", "name": "take_screenshot", "parameters": {}, "description": "Take screenshot."},
+ {
+ "type": "function",
+ "name": "take_screenshot",
+ "parameters": {},
+ "description": "Take screenshot.",
+ },
]
@@ -319,7 +345,9 @@ _PROMPT_SUFFIX = (
SYSTEM_PROMPT = _PROMPT_PREFIX + _format_tool_schemas_json_lines(TOOL_SCHEMAS) + _PROMPT_SUFFIX
-def _extract_function_schemas_from_tools(tools: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
+def _extract_function_schemas_from_tools(
+ tools: Optional[List[Dict[str, Any]]],
+) -> List[Dict[str, Any]]:
schemas: List[Dict[str, Any]] = []
if not tools:
return schemas
@@ -330,12 +358,14 @@ def _extract_function_schemas_from_tools(tools: Optional[List[Dict[str, Any]]])
params = fn.get("parameters", {})
desc = fn.get("description", "")
if name:
- schemas.append({
- "type": "function",
- "name": name,
- "parameters": params if isinstance(params, dict) else {},
- "description": desc,
- })
+ schemas.append(
+ {
+ "type": "function",
+ "name": name,
+ "parameters": params if isinstance(params, dict) else {},
+ "description": desc,
+ }
+ )
return schemas
@@ -392,7 +422,9 @@ def _denormalize_xy_from_uitars(nx: float, ny: float, width: int, height: int) -
return x, y
-def _map_computer_action_to_function(action: Dict[str, Any], width: int, height: int) -> Optional[Dict[str, Any]]:
+def _map_computer_action_to_function(
+ action: Dict[str, Any], width: int, height: int
+) -> Optional[Dict[str, Any]]:
"""Map a computer action item to a UITARS function + parameters dict of strings.
Returns dict like {"function": name, "parameters": {..}} or None if unknown.
"""
@@ -404,7 +436,10 @@ def _map_computer_action_to_function(action: Dict[str, Any], width: int, height:
return None
nx, ny = _normalize_xy_to_uitars(int(x), int(y), width, height)
if btn == "right":
- return {"function": "right_single", "parameters": {"point": f"{nx} {ny}"}}
+ return {
+ "function": "right_single",
+ "parameters": {"point": f"{nx} {ny}"},
+ }
return {"function": "click", "parameters": {"point": f"{nx} {ny}"}}
if atype == "double_click":
x, y = action.get("x"), action.get("y")
@@ -434,8 +469,19 @@ def _map_computer_action_to_function(action: Dict[str, Any], width: int, height:
nx, ny = _normalize_xy_to_uitars(int(x), int(y), width, height)
sx, sy = action.get("scroll_x", 0), action.get("scroll_y", 0)
# Our parser used positive sy for up
- direction = "up" if sy and sy > 0 else ("down" if sy and sy < 0 else ("right" if sx and sx > 0 else ("left" if sx and sx < 0 else "down")))
- return {"function": "scroll", "parameters": {"direction": direction, "point": f"{nx} {ny}"}}
+ direction = (
+ "up"
+ if sy and sy > 0
+ else (
+ "down"
+ if sy and sy < 0
+ else ("right" if sx and sx > 0 else ("left" if sx and sx < 0 else "down"))
+ )
+ )
+ return {
+ "function": "scroll",
+ "parameters": {"direction": direction, "point": f"{nx} {ny}"},
+ }
if atype == "drag":
path = action.get("path", [])
if isinstance(path, list) and len(path) >= 2:
@@ -461,7 +507,9 @@ def _map_computer_action_to_function(action: Dict[str, Any], width: int, height:
return None
-def _to_uitars_messages(messages: List[Dict[str, Any]], width: int, height: int) -> List[Dict[str, Any]]:
+def _to_uitars_messages(
+ messages: List[Dict[str, Any]], width: int, height: int
+) -> List[Dict[str, Any]]:
"""Convert responses items into completion messages tailored for UI-TARS.
- User content is passed through similar to convert_responses_items_to_completion_messages
@@ -505,7 +553,9 @@ def _to_uitars_messages(messages: List[Dict[str, Any]], width: int, height: int)
completion_content = []
for item in content:
if item.get("type") == "input_image":
- completion_content.append({"type": "image_url", "image_url": {"url": item.get("image_url")}})
+ completion_content.append(
+ {"type": "image_url", "image_url": {"url": item.get("image_url")}}
+ )
elif item.get("type") in ("input_text", "text"):
completion_content.append({"type": "text", "text": item.get("text")})
uitars_messages.append({"role": "user", "content": completion_content})
@@ -517,7 +567,11 @@ def _to_uitars_messages(messages: List[Dict[str, Any]], width: int, height: int)
if mtype == "reasoning":
# Responses reasoning stores summary list
summary = msg.get("summary", [])
- texts = [s.get("text", "") for s in summary if isinstance(s, dict) and s.get("type") == "summary_text"]
+ texts = [
+ s.get("text", "")
+ for s in summary
+ if isinstance(s, dict) and s.get("type") == "summary_text"
+ ]
if texts:
pending_think = "\n".join([t for t in texts if t])
continue
@@ -546,9 +600,15 @@ def _to_uitars_messages(messages: List[Dict[str, Any]], width: int, height: int)
pending_think, pending_functions = None, []
content = msg.get("content", [])
if isinstance(content, list):
- texts = [c.get("text", "") for c in content if isinstance(c, dict) and c.get("type") in ("output_text", "text")]
+ texts = [
+ c.get("text", "")
+ for c in content
+ if isinstance(c, dict) and c.get("type") in ("output_text", "text")
+ ]
if texts:
- uitars_messages.append({"role": "assistant", "content": "\n".join([t for t in texts if t])})
+ uitars_messages.append(
+ {"role": "assistant", "content": "\n".join([t for t in texts if t])}
+ )
elif isinstance(content, str) and content:
uitars_messages.append({"role": "assistant", "content": content})
continue
@@ -581,8 +641,12 @@ def _to_uitars_messages(messages: List[Dict[str, Any]], width: int, height: int)
return uitars_messages
+
def _to_response_items(
- actions: List[Dict[str, Any]], tool_names: Optional[set[str]] = None, width: Optional[int] = None, height: Optional[int] = None
+ actions: List[Dict[str, Any]],
+ tool_names: Optional[set[str]] = None,
+ width: Optional[int] = None,
+ height: Optional[int] = None,
) -> List[Any]:
"""Map parsed actions into Responses items (computer actions + optional reasoning)."""
items: List[Any] = []
@@ -736,8 +800,12 @@ class UITARS2Config:
# Build dynamic system prompt by concatenating built-in schemas and provided function tools
provided_fn_schemas = _extract_function_schemas_from_tools(tools)
- combined_schemas = TOOL_SCHEMAS + provided_fn_schemas if provided_fn_schemas else TOOL_SCHEMAS
- dynamic_system_prompt = _PROMPT_PREFIX + _format_tool_schemas_json_lines(combined_schemas) + _PROMPT_SUFFIX
+ combined_schemas = (
+ TOOL_SCHEMAS + provided_fn_schemas if provided_fn_schemas else TOOL_SCHEMAS
+ )
+ dynamic_system_prompt = (
+ _PROMPT_PREFIX + _format_tool_schemas_json_lines(combined_schemas) + _PROMPT_SUFFIX
+ )
# Prepend system prompt (based on training prompts + provided tools)
litellm_messages: List[Dict[str, Any]] = [
@@ -829,7 +897,10 @@ class UITARS2Config:
"role": "user",
"content": [
{"type": "text", "text": "Please return a single click action."},
- {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
+ {
+ "type": "image_url",
+ "image_url": {"url": f"data:image/png;base64,{image_b64}"},
+ },
],
},
]
@@ -841,7 +912,9 @@ class UITARS2Config:
"temperature": kwargs.get("temperature", 0.0),
"do_sample": kwargs.get("temperature", 0.0) > 0.0,
}
- api_kwargs.update({k: v for k, v in (kwargs or {}).items() if k not in ["max_tokens", "temperature"]})
+ api_kwargs.update(
+ {k: v for k, v in (kwargs or {}).items() if k not in ["max_tokens", "temperature"]}
+ )
response = await litellm.acompletion(**api_kwargs)
# Extract response content
@@ -852,7 +925,11 @@ class UITARS2Config:
msg = choices[0].get("message", {})
content_text = msg.get("content", "")
if isinstance(content_text, list):
- text_parts = [p.get("text", "") for p in content_text if isinstance(p, dict) and p.get("type") == "text"]
+ text_parts = [
+ p.get("text", "")
+ for p in content_text
+ if isinstance(p, dict) and p.get("type") == "text"
+ ]
content_text = "\n".join([t for t in text_parts if t])
if not isinstance(content_text, str):
return None
diff --git a/libs/python/agent/agent/proxy/examples.py b/libs/python/agent/agent/proxy/examples.py
index dfe6b87c..67aa8fb0 100644
--- a/libs/python/agent/agent/proxy/examples.py
+++ b/libs/python/agent/agent/proxy/examples.py
@@ -22,14 +22,14 @@ async def test_http_endpoint():
# Example 1: Simple text request
simple_request = {
- "model": "anthropic/claude-3-5-sonnet-20241022",
+ "model": "anthropic/claude-sonnet-4-5-20250929",
"input": "Tell me a three sentence bedtime story about a unicorn.",
"env": {"ANTHROPIC_API_KEY": anthropic_api_key},
}
# Example 2: Multi-modal request with image
multimodal_request = {
- "model": "anthropic/claude-3-5-sonnet-20241022",
+ "model": "anthropic/claude-sonnet-4-5-20250929",
"input": [
{
"role": "user",
@@ -47,7 +47,7 @@ async def test_http_endpoint():
# Example 3: Request with custom agent and computer kwargs
custom_request = {
- "model": "anthropic/claude-3-5-sonnet-20241022",
+ "model": "anthropic/claude-sonnet-4-5-20250929",
"input": "Take a screenshot and tell me what you see",
"env": {"ANTHROPIC_API_KEY": anthropic_api_key},
}
@@ -95,7 +95,7 @@ def curl_examples():
"""curl http://localhost:8000/responses \\
-H "Content-Type: application/json" \\
-d '{
- "model": "anthropic/claude-3-5-sonnet-20241022",
+ "model": "anthropic/claude-sonnet-4-5-20250929",
"input": "Tell me a three sentence bedtime story about a unicorn."
}'"""
)
@@ -105,7 +105,7 @@ def curl_examples():
"""curl http://localhost:8000/responses \\
-H "Content-Type: application/json" \\
-d '{
- "model": "anthropic/claude-3-5-sonnet-20241022",
+ "model": "anthropic/claude-sonnet-4-5-20250929",
"input": [
{
"role": "user",
@@ -126,7 +126,7 @@ def curl_examples():
"""curl http://localhost:8000/responses \\
-H "Content-Type: application/json" \\
-d '{
- "model": "anthropic/claude-3-5-sonnet-20241022",
+ "model": "anthropic/claude-sonnet-4-5-20250929",
"input": "Take a screenshot and tell me what you see",
"agent_kwargs": {
"save_trajectory": true,
@@ -166,7 +166,7 @@ async def test_p2p_client():
# Send a test request
request = {
- "model": "anthropic/claude-3-5-sonnet-20241022",
+ "model": "anthropic/claude-sonnet-4-5-20250929",
"input": "Hello from P2P client!",
}
await connection.send(json.dumps(request))
diff --git a/libs/python/agent/agent/ui/gradio/app.py b/libs/python/agent/agent/ui/gradio/app.py
index 1a2fb023..cb923bf5 100644
--- a/libs/python/agent/agent/ui/gradio/app.py
+++ b/libs/python/agent/agent/ui/gradio/app.py
@@ -6,9 +6,9 @@ with an advanced UI for model selection and configuration.
Supported Agent Models:
- OpenAI: openai/computer-use-preview
-- Anthropic: anthropic/claude-3-5-sonnet-20241022, anthropic/claude-3-7-sonnet-20250219
+- Anthropic: anthropic/claude-sonnet-4-5-20250929, anthropic/claude-3-7-sonnet-20250219
- UI-TARS: huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B
-- Omniparser: omniparser+anthropic/claude-3-5-sonnet-20241022, omniparser+ollama_chat/gemma3
+- Omniparser: omniparser+anthropic/claude-sonnet-4-5-20250929, omniparser+ollama_chat/gemma3
Requirements:
- Mac with Apple Silicon (M1/M2/M3/M4), Linux, or Windows
@@ -116,14 +116,12 @@ MODEL_MAPPINGS = {
"Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-20250514",
"Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-20250514",
"Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-3-7-sonnet-20250219",
- "Anthropic: Claude 3.5 Sonnet (20241022)": "anthropic/claude-3-5-sonnet-20241022",
},
"omni": {
"default": "omniparser+openai/gpt-4o",
"OMNI: OpenAI GPT-4o": "omniparser+openai/gpt-4o",
"OMNI: OpenAI GPT-4o mini": "omniparser+openai/gpt-4o-mini",
"OMNI: Claude 3.7 Sonnet (20250219)": "omniparser+anthropic/claude-3-7-sonnet-20250219",
- "OMNI: Claude 3.5 Sonnet (20241022)": "omniparser+anthropic/claude-3-5-sonnet-20241022",
},
"uitars": {
"default": "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B" if is_mac else "ui-tars",
diff --git a/libs/python/agent/agent/ui/gradio/ui_components.py b/libs/python/agent/agent/ui/gradio/ui_components.py
index d14f49a9..309dfb6c 100644
--- a/libs/python/agent/agent/ui/gradio/ui_components.py
+++ b/libs/python/agent/agent/ui/gradio/ui_components.py
@@ -44,13 +44,11 @@ def create_gradio_ui() -> gr.Blocks:
"Anthropic: Claude 4 Opus (20250514)",
"Anthropic: Claude 4 Sonnet (20250514)",
"Anthropic: Claude 3.7 Sonnet (20250219)",
- "Anthropic: Claude 3.5 Sonnet (20241022)",
]
omni_models = [
"OMNI: OpenAI GPT-4o",
"OMNI: OpenAI GPT-4o mini",
"OMNI: Claude 3.7 Sonnet (20250219)",
- "OMNI: Claude 3.5 Sonnet (20241022)",
]
# Check if API keys are available
diff --git a/libs/python/agent/example.py b/libs/python/agent/example.py
index b02ccbfd..b8f41083 100644
--- a/libs/python/agent/example.py
+++ b/libs/python/agent/example.py
@@ -102,7 +102,7 @@ async def main():
# model="anthropic/claude-opus-4-20250514",
# model="anthropic/claude-sonnet-4-20250514",
# model="anthropic/claude-3-7-sonnet-20250219",
- # model="anthropic/claude-3-5-sonnet-20241022",
+ # model="anthropic/claude-sonnet-4-5-20250929",
# == UI-TARS ==
# model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B",
# TODO: add local mlx provider
diff --git a/libs/python/agent/tests/conftest.py b/libs/python/agent/tests/conftest.py
index 8270c8e0..d60c1c54 100644
--- a/libs/python/agent/tests/conftest.py
+++ b/libs/python/agent/tests/conftest.py
@@ -24,7 +24,7 @@ def mock_litellm():
"id": "chatcmpl-test123",
"object": "chat.completion",
"created": 1234567890,
- "model": kwargs.get("model", "anthropic/claude-3-5-sonnet-20241022"),
+ "model": kwargs.get("model", "anthropic/claude-sonnet-4-5-20250929"),
"choices": [
{
"index": 0,
diff --git a/libs/python/agent/tests/test_computer_agent.py b/libs/python/agent/tests/test_computer_agent.py
index 936c984c..b6de1e86 100644
--- a/libs/python/agent/tests/test_computer_agent.py
+++ b/libs/python/agent/tests/test_computer_agent.py
@@ -18,18 +18,18 @@ class TestComputerAgentInitialization:
"""Test that agent can be initialized with a model string."""
from agent import ComputerAgent
- agent = ComputerAgent(model="anthropic/claude-3-5-sonnet-20241022")
+ agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929")
assert agent is not None
assert hasattr(agent, "model")
- assert agent.model == "anthropic/claude-3-5-sonnet-20241022"
+ assert agent.model == "anthropic/claude-sonnet-4-5-20250929"
@patch("agent.agent.litellm")
def test_agent_initialization_with_tools(self, mock_litellm, disable_telemetry, mock_computer):
"""Test that agent can be initialized with tools."""
from agent import ComputerAgent
- agent = ComputerAgent(model="anthropic/claude-3-5-sonnet-20241022", tools=[mock_computer])
+ agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929", tools=[mock_computer])
assert agent is not None
assert hasattr(agent, "tools")
@@ -41,7 +41,7 @@ class TestComputerAgentInitialization:
budget = 5.0
agent = ComputerAgent(
- model="anthropic/claude-3-5-sonnet-20241022", max_trajectory_budget=budget
+ model="anthropic/claude-sonnet-4-5-20250929", max_trajectory_budget=budget
)
assert agent is not None
@@ -79,7 +79,7 @@ class TestComputerAgentRun:
mock_litellm.acompletion = AsyncMock(return_value=mock_response)
- agent = ComputerAgent(model="anthropic/claude-3-5-sonnet-20241022")
+ agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929")
# Run should return an async generator
result_generator = agent.run(sample_messages)
@@ -92,7 +92,7 @@ class TestComputerAgentRun:
"""Test that agent has run method available."""
from agent import ComputerAgent
- agent = ComputerAgent(model="anthropic/claude-3-5-sonnet-20241022")
+ agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929")
# Verify run method exists
assert hasattr(agent, "run")
@@ -102,7 +102,7 @@ class TestComputerAgentRun:
"""Test that agent has agent_loop initialized."""
from agent import ComputerAgent
- agent = ComputerAgent(model="anthropic/claude-3-5-sonnet-20241022")
+ agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929")
# Verify agent_loop is initialized
assert hasattr(agent, "agent_loop")
@@ -132,7 +132,7 @@ class TestComputerAgentIntegration:
"""Test that agent can be initialized with Computer tool."""
from agent import ComputerAgent
- agent = ComputerAgent(model="anthropic/claude-3-5-sonnet-20241022", tools=[mock_computer])
+ agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929", tools=[mock_computer])
# Verify agent accepted the tool
assert agent is not None
diff --git a/libs/python/mcp-server/CONCURRENT_SESSIONS.md b/libs/python/mcp-server/CONCURRENT_SESSIONS.md
index 62e63dd2..f3e16099 100644
--- a/libs/python/mcp-server/CONCURRENT_SESSIONS.md
+++ b/libs/python/mcp-server/CONCURRENT_SESSIONS.md
@@ -133,7 +133,7 @@ await cleanup_session(ctx, "session-to-cleanup")
### Environment Variables
-- `CUA_MODEL_NAME`: Model to use (default: `anthropic/claude-3-5-sonnet-20241022`)
+- `CUA_MODEL_NAME`: Model to use (default: `anthropic/claude-sonnet-4-5-20250929`)
- `CUA_MAX_IMAGES`: Maximum images to keep (default: `3`)
### Session Manager Configuration
diff --git a/libs/python/mcp-server/README.md b/libs/python/mcp-server/README.md
index 4c24fd3e..9d8c95af 100644
--- a/libs/python/mcp-server/README.md
+++ b/libs/python/mcp-server/README.md
@@ -44,7 +44,7 @@ Add this to your MCP client configuration:
"args": [
"bash",
"-lc",
- "export CUA_MODEL_NAME='anthropic/claude-3-5-sonnet-20241022'; ~/.cua/start_mcp_server.sh"
+ "export CUA_MODEL_NAME='anthropic/claude-sonnet-4-5-20250929'; ~/.cua/start_mcp_server.sh"
]
}
}
diff --git a/libs/python/mcp-server/mcp_server/server.py b/libs/python/mcp-server/mcp_server/server.py
index 7d47cfd1..33e97f4c 100644
--- a/libs/python/mcp-server/mcp_server/server.py
+++ b/libs/python/mcp-server/mcp_server/server.py
@@ -156,7 +156,7 @@ def serve() -> FastMCP:
try:
# Get model name
- model_name = os.getenv("CUA_MODEL_NAME", "anthropic/claude-3-5-sonnet-20241022")
+ model_name = os.getenv("CUA_MODEL_NAME", "anthropic/claude-sonnet-4-5-20250929")
logger.info(f"Using model: {model_name}")
# Create agent with the new v0.4.x API
diff --git a/libs/python/mcp-server/quick_test_local_option.py b/libs/python/mcp-server/quick_test_local_option.py
index e997f6a9..6e2caab2 100755
--- a/libs/python/mcp-server/quick_test_local_option.py
+++ b/libs/python/mcp-server/quick_test_local_option.py
@@ -168,7 +168,7 @@ def print_usage_examples():
"command": "/bin/bash",
"args": ["~/.cua/start_mcp_server.sh"],
"env": {
- "CUA_MODEL_NAME": "anthropic/claude-3-5-sonnet-20241022"
+ "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-5-20250929"
}
}
}
@@ -192,7 +192,7 @@ Step 2: Configure MCP client:
"command": "/bin/bash",
"args": ["~/.cua/start_mcp_server.sh"],
"env": {
- "CUA_MODEL_NAME": "anthropic/claude-3-5-sonnet-20241022",
+ "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-5-20250929",
"CUA_USE_HOST_COMPUTER_SERVER": "true"
}
}
diff --git a/libs/typescript/agent/README.md b/libs/typescript/agent/README.md
index 27c152fb..42cb4184 100644
--- a/libs/typescript/agent/README.md
+++ b/libs/typescript/agent/README.md
@@ -32,7 +32,7 @@ const peerClient = new AgentClient('peer://my-agent-proxy');
// Send a simple text request
const response = await client.responses.create({
- model: 'anthropic/claude-3-5-sonnet-20241022',
+ model: 'anthropic/claude-sonnet-4-5-20250929',
input: 'Write a one-sentence bedtime story about a unicorn.',
// Optional per-request env overrides
env: {
@@ -47,7 +47,7 @@ console.log(response.output);
```typescript
const response = await client.responses.create({
- model: 'anthropic/claude-3-5-sonnet-20241022',
+ model: 'anthropic/claude-sonnet-4-5-20250929',
input: [
{
role: 'user',
@@ -74,7 +74,7 @@ const client = new AgentClient('https://localhost:8000', {
});
const response = await client.responses.create({
- model: 'anthropic/claude-3-5-sonnet-20241022',
+ model: 'anthropic/claude-sonnet-4-5-20250929',
input: 'Hello, world!',
agent_kwargs: {
save_trajectory: true,
diff --git a/libs/typescript/agent/examples/README.md b/libs/typescript/agent/examples/README.md
index d27eac59..68419600 100644
--- a/libs/typescript/agent/examples/README.md
+++ b/libs/typescript/agent/examples/README.md
@@ -42,7 +42,7 @@ A simple HTML page that demonstrates using the CUA Agent Client in a browser env
4. **Configure and test:**
- Enter an agent URL (e.g., `https://localhost:8000` or `peer://some-peer-id`)
- - Enter a model name (e.g., `anthropic/claude-3-5-sonnet-20241022`)
+ - Enter a model name (e.g., `anthropic/claude-sonnet-4-5-20250929`)
- Type a message and click "Send Message" or press Enter
- View the response in the output textarea
@@ -53,7 +53,7 @@ A simple HTML page that demonstrates using the CUA Agent Client in a browser env
**Example Models:**
-- `anthropic/claude-3-5-sonnet-20241022`
+- `anthropic/claude-sonnet-4-5-20250929`
- `openai/gpt-4`
- `huggingface-local/microsoft/UI-TARS-7B`
diff --git a/libs/typescript/cua-cli/package.json b/libs/typescript/cua-cli/package.json
index 35d894b6..5a79ef77 100644
--- a/libs/typescript/cua-cli/package.json
+++ b/libs/typescript/cua-cli/package.json
@@ -1,6 +1,6 @@
{
"name": "@trycua/cli",
- "version": "0.1.4",
+ "version": "0.1.5",
"packageManager": "bun@1.1.38",
"description": "Command-line interface for CUA cloud sandboxes and authentication",
"type": "module",
diff --git a/libs/typescript/cua-cli/src/cli.ts b/libs/typescript/cua-cli/src/cli.ts
index 9c086379..7ee7d080 100644
--- a/libs/typescript/cua-cli/src/cli.ts
+++ b/libs/typescript/cua-cli/src/cli.ts
@@ -17,7 +17,9 @@ export async function runCli() {
' cua sb Create and manage cloud sandboxes\n' +
' list View all your sandboxes\n' +
' create Provision a new sandbox\n' +
- ' start/stop Control sandbox state\n' +
+ ' start Start or resume a sandbox\n' +
+ ' stop Stop a sandbox (preserves disk)\n' +
+ ' suspend Suspend a sandbox (preserves memory)\n' +
' vnc Open remote desktop\n' +
'\n' +
'Documentation: https://docs.cua.ai/libraries/cua-cli/commands'
diff --git a/libs/typescript/cua-cli/src/commands/sandbox.ts b/libs/typescript/cua-cli/src/commands/sandbox.ts
index 36e66bf5..5d2bde93 100644
--- a/libs/typescript/cua-cli/src/commands/sandbox.ts
+++ b/libs/typescript/cua-cli/src/commands/sandbox.ts
const restartHandler = async (argv: Record<string, unknown>) => {
process.exit(1);
};
+const suspendHandler = async (argv: Record<string, unknown>) => {
+ const token = await ensureApiKeyInteractive();
+ const name = String((argv as any).name);
+ const res = await http(`/v1/vms/${encodeURIComponent(name)}/suspend`, {
+ token,
+ method: 'POST',
+ });
+ if (res.status === 202) {
+ const body = (await res.json().catch(() => ({}))) as {
+ status?: string;
+ };
+ console.log(body.status ?? 'suspending');
+ return;
+ }
+ if (res.status === 404) {
+ console.error('Sandbox not found');
+ process.exit(1);
+ }
+ if (res.status === 401) {
+ clearApiKey();
+ console.error("Unauthorized. Try 'cua login' again.");
+ process.exit(1);
+ }
+ if (res.status === 400 || res.status === 500) {
+ const body = (await res.json().catch(() => ({}))) as { error?: string };
+ console.error(
+ body.error ??
+ "Suspend not supported for this VM. Use 'cua sb stop' instead."
+ );
+ process.exit(1);
+ }
+ console.error(`Unexpected status: ${res.status}`);
+ process.exit(1);
+};
+
const openHandler = async (argv: Record<string, unknown>) => {
const token = await ensureApiKeyInteractive();
const name = String((argv as any).name);
@@ -296,6 +331,13 @@ export function registerSandboxCommands(y: Argv) {
y.positional('name', { type: 'string', describe: 'Sandbox name' }),
restartHandler
)
+ .command(
+ 'suspend <name>',
+ 'Suspend a sandbox, preserving memory state (use start to resume)',
+ (y) =>
+ y.positional('name', { type: 'string', describe: 'Sandbox name' }),
+ suspendHandler
+ )
.command(
['vnc <name>', 'open <name>'],
'Open remote desktop (VNC) connection in your browser',
@@ -378,6 +420,13 @@ export function registerSandboxCommands(y: Argv) {
y.positional('name', { type: 'string', describe: 'Sandbox name' }),
handler: restartHandler,
} as any)
+ .command({
+ command: 'suspend <name>',
+ describe: false as any, // Hide from help
+ builder: (y: Argv) =>
+ y.positional('name', { type: 'string', describe: 'Sandbox name' }),
+ handler: suspendHandler,
+ } as any)
.command({
command: ['vnc <name>', 'open <name>'],
describe: false as any, // Hide from help
diff --git a/libs/typescript/cua-cli/src/util.ts b/libs/typescript/cua-cli/src/util.ts
index 36c6e0d6..60147049 100644
--- a/libs/typescript/cua-cli/src/util.ts
+++ b/libs/typescript/cua-cli/src/util.ts
@@ -16,6 +16,8 @@ export type SandboxStatus =
| 'pending'
| 'running'
| 'stopped'
+ | 'suspended'
+ | 'suspending'
| 'terminated'
| 'failed';
export type SandboxItem = {
diff --git a/notebooks/ollama_nb.ipynb b/notebooks/ollama_nb.ipynb
index 63cc2ea8..9b5cb188 100644
--- a/notebooks/ollama_nb.ipynb
+++ b/notebooks/ollama_nb.ipynb
@@ -203,7 +203,7 @@
"\n",
"Examples:\n",
"- `openai/computer-use-preview+ollama/gemma3:4b`\n",
- "- `anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b`\n"
+ "- `anthropic/claude-sonnet-4-5-20250929+ollama/gemma3:4b`\n"
]
},
{
@@ -217,7 +217,7 @@
"import logging\n",
"\n",
"agent_composed = ComputerAgent(\n",
- " model=\"anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b\",\n",
+ " model=\"anthropic/claude-sonnet-4-5-20250929+ollama/gemma3:4b\",\n",
" tools=[computer],\n",
" trajectory_dir=\"trajectories\",\n",
" only_n_most_recent_images=3,\n",
@@ -234,7 +234,20 @@
"cell_type": "markdown",
"id": "section-3-conceptual",
"metadata": {},
- "source": "## 3) Customize your agent 🛠️\n\nFor a few customization options, see: https://cua.ai/docs/agent-sdk/customizing-computeragent\n\nLevels of customization you can explore:\n\n1) Simple — Prompt engineering\n2) Easy — Tools\n3) Intermediate — Callbacks\n4) Expert — Custom agent via `register_agent` (see `libs/python/agent/agent/decorators.py` → `register_agent`)\n\nor, incorporate the ComputerAgent into your own agent framework!"
+ "source": [
+ "## 3) Customize your agent 🛠️\n",
+ "\n",
+ "For a few customization options, see: https://cua.ai/docs/agent-sdk/customizing-computeragent\n",
+ "\n",
+ "Levels of customization you can explore:\n",
+ "\n",
+ "1) Simple — Prompt engineering\n",
+ "2) Easy — Tools\n",
+ "3) Intermediate — Callbacks\n",
+ "4) Expert — Custom agent via `register_agent` (see `libs/python/agent/agent/decorators.py` → `register_agent`)\n",
+ "\n",
+ "or, incorporate the ComputerAgent into your own agent framework!"
+ ]
},
{
"cell_type": "markdown",
@@ -274,4 +287,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
-}
\ No newline at end of file
+}
diff --git a/tests/agent_loop_testing/agent_test.py b/tests/agent_loop_testing/agent_test.py
index b31c8249..127282d1 100644
--- a/tests/agent_loop_testing/agent_test.py
+++ b/tests/agent_loop_testing/agent_test.py
@@ -184,7 +184,7 @@ if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Test CUA Agent with mock computer")
parser.add_argument(
- "--model", default="anthropic/claude-sonnet-4-20250514", help="CUA model to test"
+ "--model", default="anthropic/claude-sonnet-4-5-20250929", help="CUA model to test"
)
args = parser.parse_args()
diff --git a/uv.lock b/uv.lock
index 67698779..0e26ddcc 100644
--- a/uv.lock
+++ b/uv.lock
@@ -861,7 +861,7 @@ wheels = [
[[package]]
name = "cua-agent"
-version = "0.4.39"
+version = "0.4.53"
source = { editable = "libs/python/agent" }
dependencies = [
{ name = "aiohttp" },
@@ -885,7 +885,6 @@ all = [
{ name = "einops" },
{ name = "google-genai" },
{ name = "gradio" },
- { name = "hud-python" },
{ name = "mlx-vlm", marker = "sys_platform == 'darwin'" },
{ name = "pillow" },
{ name = "python-dotenv" },
@@ -975,7 +974,6 @@ requires-dist = [
{ name = "gradio", marker = "extra == 'all'", specifier = ">=5.23.3" },
{ name = "gradio", marker = "extra == 'ui'", specifier = ">=5.23.3" },
{ name = "httpx", specifier = ">=0.27.0" },
- { name = "hud-python", marker = "extra == 'all'", specifier = "==0.4.52" },
{ name = "hud-python", marker = "extra == 'hud'", specifier = "==0.4.52" },
{ name = "litellm", specifier = ">=1.74.12" },
{ name = "mlx-vlm", marker = "sys_platform == 'darwin' and extra == 'all'", specifier = ">=0.1.27" },
@@ -1015,7 +1013,7 @@ provides-extras = ["openai", "anthropic", "qwen", "omni", "uitars", "uitars-mlx"
[[package]]
name = "cua-computer"
-version = "0.4.12"
+version = "0.4.17"
source = { editable = "libs/python/computer" }
dependencies = [
{ name = "aiohttp" },