Changes for VLM launch

This commit is contained in:
f-trycua
2025-11-19 14:42:12 +01:00
parent e169907548
commit 76b52ece72
11 changed files with 538 additions and 59 deletions

View File

@@ -83,7 +83,7 @@ For long conversations, consider using the `only_n_most_recent_images` parameter
```python
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3
)

View File

@@ -16,7 +16,7 @@ def calculate(a: int, b: int) -> int:
# Use with agent
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer, calculate]
)
```
@@ -43,7 +43,7 @@ from computer import Computer
computer = Computer(...)
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer, read_file],
)
```

View File

@@ -67,7 +67,7 @@ Callbacks provide lifecycle hooks to preprocess messages, postprocess outputs, r
from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback, BudgetManagerCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
callbacks=[
ImageRetentionCallback(only_n_most_recent_images=3),

View File

@@ -9,18 +9,6 @@ All agent loops are compatible with any LLM provider supported by LiteLLM.
See [Running Models Locally](/agent-sdk/supported-model-providers/local-models) for how to use Hugging Face and MLX models on your own machine.
## UI-TARS-2
Next-generation UI-TARS via the Cua Router:
- `cua/bytedance/ui-tars-2`
```python
agent = ComputerAgent("cua/bytedance/ui-tars-2", tools=[computer])
async for _ in agent.run("Open a browser and search for Python tutorials"):
pass
```
## Gemini CUA
Gemini models with computer-use capabilities:

View File

@@ -0,0 +1,380 @@
---
title: CUA VLM Router
description: Intelligent vision-language model routing with cost optimization and unified access
---
# CUA VLM Router
The **CUA VLM Router** is an intelligent inference API that provides unified access to multiple vision-language model providers through a single API key. It offers cost optimization and detailed observability for production AI applications.
## Overview
Instead of managing multiple API keys and provider-specific code, CUA VLM Router acts as a smart cloud gateway that:
- **Unifies access** to multiple model providers
- **Optimizes costs** through intelligent routing and provider selection
- **Tracks usage** and costs with detailed metadata
- **Provides observability** with routing decisions and attempt logs
- **Manages infrastructure** - no need to handle provider API keys yourself
## Quick Start
### 1. Get Your API Key
Sign up at [cua.ai](https://cua.ai/signin) and get your CUA API key from the dashboard.
### 2. Set Environment Variable
```bash
export CUA_API_KEY="sk_cua-api01_..."
```
### 3. Use with Agent SDK
```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer],
    max_trajectory_budget=5.0
)

messages = [{"role": "user", "content": "Take a screenshot and tell me what's on screen"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])
```
## Available Models
The CUA VLM Router currently supports these models:
| Model ID | Provider | Description | Best For |
|----------|----------|-------------|----------|
| `cua/anthropic/claude-sonnet-4.5` | Anthropic | Claude Sonnet 4.5 | General-purpose tasks, recommended |
| `cua/anthropic/claude-haiku-4.5` | Anthropic | Claude Haiku 4.5 | Fast responses, cost-effective |
## How It Works
### Intelligent Routing
When you make a request to CUA VLM Router:
1. **Model Resolution**: Your model ID (e.g., `cua/anthropic/claude-sonnet-4.5`) is resolved to an upstream provider and model (see the sketch below)
2. **Provider Selection**: CUA routes your request to the appropriate model provider
3. **Response**: You receive an OpenAI-compatible response with metadata
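To make step 1 concrete, here is a minimal sketch of how a `cua/`-prefixed model ID decomposes into provider and model parts (a hypothetical helper for illustration; the actual resolution happens server-side):
```python
def resolve_model(model_id: str) -> tuple[str, str]:
    """Split a CUA model ID into (provider, model). Illustrative only --
    the real routing logic runs inside the CUA VLM Router service."""
    prefix, _, rest = model_id.partition("/")
    if prefix != "cua":
        raise ValueError(f"not a CUA-routed model ID: {model_id}")
    provider, _, model = rest.partition("/")
    return provider, model

# resolve_model("cua/anthropic/claude-sonnet-4.5")
# -> ("anthropic", "claude-sonnet-4.5")
```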
## API Reference
### Base URL
```
https://inference.cua.ai/v1
```
### Authentication
All requests require an API key in the Authorization header:
```bash
Authorization: Bearer sk_cua-api01_...
```
### Endpoints
#### List Available Models
```bash
GET /v1/models
```
**Response:**
```json
{
  "data": [
    {
      "id": "anthropic/claude-sonnet-4.5",
      "name": "Claude Sonnet 4.5",
      "object": "model",
      "owned_by": "cua"
    }
  ],
  "object": "list"
}
```
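For example, you can call this endpoint from Python with the `requests` library (a minimal sketch using the base URL and bearer token documented above):
```python
import os

import requests

resp = requests.get(
    "https://inference.cua.ai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['CUA_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])  # e.g. "anthropic/claude-sonnet-4.5"
```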
#### Chat Completions
```bash
POST /v1/chat/completions
Content-Type: application/json
```
**Request:**
```json
{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}
```
**Response:**
```json
{
  "id": "gen_...",
  "object": "chat.completion",
  "created": 1763554838,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22,
    "cost": 0.01,
    "is_byok": true
  }
}
```
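Because the endpoint is OpenAI-compatible, the official `openai` Python client should also work when pointed at the router's base URL (a sketch, assuming standard OpenAI compatibility):
```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.cua.ai/v1",
    api_key=os.environ["CUA_API_KEY"],
)

completion = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)
print(completion.choices[0].message.content)
```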
#### Streaming
Set `"stream": true` to receive server-sent events:
```bash
curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer sk_cua-api01_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```
**Response (SSE format):**
```
data: {"id":"gen_...","choices":[{"delta":{"content":"1"}}],"object":"chat.completion.chunk"}
data: {"id":"gen_...","choices":[{"delta":{"content":"\n2"}}],"object":"chat.completion.chunk"}
data: {"id":"gen_...","choices":[{"delta":{"content":"\n3\n4\n5"}}],"object":"chat.completion.chunk"}
data: {"id":"gen_...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{...}}
```
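The same OpenAI-compatible client can consume the stream from Python (again a sketch under the compatibility assumption above):
```python
import os

from openai import OpenAI

client = OpenAI(base_url="https://inference.cua.ai/v1", api_key=os.environ["CUA_API_KEY"])

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    # The final chunk may carry usage with no choices, so guard the access
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```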
#### Check Balance
```bash
GET /v1/balance
```
**Response:**
```json
{
  "balance": 211689.85,
  "currency": "credits"
}
```
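A periodic balance check from Python might look like this (sketch; endpoint and fields as documented above):
```python
import os

import requests

resp = requests.get(
    "https://inference.cua.ai/v1/balance",
    headers={"Authorization": f"Bearer {os.environ['CUA_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(f"Remaining: {data['balance']:.2f} {data['currency']}")
```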
## Cost Tracking
CUA VLM Router provides detailed cost information in every response:
### Credit System
Requests are billed in **credits**:
- Credits are deducted from your CUA account balance
- Prices vary by model and usage
- CUA manages all provider API keys and infrastructure
### Response Cost Fields
```json
{
  "usage": {
    "cost": 0.01,            // CUA gateway cost in credits
    "market_cost": 0.000065  // Actual upstream API cost
  }
}
```
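If you call the API directly, you can accumulate the per-request `cost` field to monitor spend over time (a small sketch; field names as documented above):
```python
class CostTracker:
    """Accumulate gateway credits from chat-completion responses."""

    def __init__(self) -> None:
        self.total_credits = 0.0

    def record(self, response_json: dict) -> None:
        # "cost" is the CUA gateway cost in credits (see above)
        self.total_credits += response_json.get("usage", {}).get("cost", 0.0)
```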
**Note:** CUA VLM Router is a fully managed cloud service. If you want to use your own provider API keys directly (BYOK), see the [Supported Model Providers](/agent-sdk/supported-model-providers/) page for direct provider access via the agent SDK.
## Response Metadata
CUA VLM Router includes metadata about routing decisions and costs in the response. This information helps with debugging and monitoring your application's model usage.
## Configuration
### Environment Variables
```bash
# Required: Your CUA API key
export CUA_API_KEY="sk_cua-api01_..."
# Optional: Custom endpoint (defaults to https://inference.cua.ai/v1)
export CUA_BASE_URL="https://custom-endpoint.cua.ai/v1"
```
### Python SDK Configuration
```python
from agent import ComputerAgent

# Using environment variables (recommended)
agent = ComputerAgent(model="cua/anthropic/claude-sonnet-4.5")

# Or explicit configuration
agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    # The CUA adapter automatically loads the key from CUA_API_KEY
)
```
## Benefits Over Direct Provider Access
| Feature | CUA VLM Router | Direct Provider (BYOK) |
|---------|---------------|------------------------|
| **Single API Key** | ✅ One key for all providers | ❌ Multiple keys to manage |
| **Managed Infrastructure** | ✅ No API key management | ❌ Manage multiple provider keys |
| **Usage Tracking** | ✅ Unified dashboard | ❌ Per-provider tracking |
| **Model Switching** | ✅ Change model string only | ❌ Change code + keys |
| **Setup Complexity** | ✅ One environment variable | ❌ Multiple environment variables |
## Error Handling
### Common Error Responses
#### Insufficient Credits
```json
{
  "detail": "Insufficient credits. Current balance: 0.00 credits"
}
```
#### Missing Authorization
```json
{
  "detail": "Missing Authorization: Bearer token"
}
```
#### Invalid Model
```json
{
  "detail": "Invalid or unavailable model"
}
```
### Best Practices
1. **Check balance periodically** using `/v1/balance`
2. **Handle rate limits** with exponential backoff (see the sketch below)
3. **Log generation IDs** for debugging
4. **Set up usage alerts** in your CUA dashboard
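For item 2, a simple retry wrapper with exponential backoff might look like this (a sketch using `requests`; tune the status codes, delays, and retry count to your needs):
```python
import time

import requests

def post_with_backoff(url: str, headers: dict, payload: dict, max_retries: int = 5) -> dict:
    """POST a JSON payload, backing off exponentially on 429 responses."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(delay)  # back off before retrying
        delay *= 2
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")
```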
## Examples
### Basic Usage
```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer]
)

messages = [{"role": "user", "content": "Open Firefox"}]

async for result in agent.run(messages):
    print(result)
```
### Direct API Call (curl)
```bash
curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer ${CUA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "max_tokens": 200
  }'
```
### With Custom Parameters
```python
agent = ComputerAgent(
    model="cua/anthropic/claude-haiku-4.5",
    tools=[computer],
    max_trajectory_budget=10.0,
    temperature=0.7
)
```
## Migration from Direct Provider Access
Switching from direct provider access (BYOK) to CUA VLM Router is simple:
**Before (Direct Provider Access with BYOK):**
```bash
# Required: provider-specific API key
export ANTHROPIC_API_KEY="sk-ant-..."
```
```python
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)
```
**After (CUA VLM Router - Cloud Service):**
```bash
# Required: CUA API key only (no provider keys needed)
export CUA_API_KEY="sk_cua-api01_..."
```
```python
agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",  # Add "cua/" prefix
    tools=[computer]
)
```
That's it! The code structure stays the same; only the model string changes. CUA manages all provider infrastructure and credentials for you.
## Support
- **Documentation**: [cua.ai/docs](https://cua.ai/docs)
- **Discord**: [Join our community](https://discord.com/invite/mVnXXpdE85)
- **Issues**: [GitHub Issues](https://github.com/trycua/cua/issues)
## Next Steps
- Explore [Agent Loops](/agent-sdk/agent-loops) to customize agent behavior
- Learn about [Cost Saving Callbacks](/agent-sdk/callbacks/cost-saving)
- Try [Example Use Cases](/example-usecases/form-filling)
- Review [Supported Model Providers](/agent-sdk/supported-model-providers/) for all options

View File

@@ -4,7 +4,27 @@ title: Supported Model Providers
## Supported Models
### Anthropic Claude (Computer Use API)
### CUA VLM Router (Recommended)
Use CUA's cloud inference API for intelligent routing and cost optimization with a single API key. CUA manages all provider infrastructure and credentials for you.
```python
model="cua/anthropic/claude-sonnet-4.5" # Claude Sonnet 4.5 (recommended)
model="cua/anthropic/claude-haiku-4.5" # Claude Haiku 4.5 (faster)
```
**Benefits:**
- Single API key for multiple providers
- Cost tracking and optimization
- Fully managed infrastructure (no provider keys to manage)
[Learn more about CUA VLM Router →](/agent-sdk/supported-model-providers/cua-vlm-router)
---
### Anthropic Claude (Computer Use API - BYOK)
Direct access to Anthropic's Claude models using your own Anthropic API key (BYOK - Bring Your Own Key).
```python
model="anthropic/claude-3-5-sonnet-20241022"
@@ -13,14 +33,22 @@ model="anthropic/claude-opus-4-20250514"
model="anthropic/claude-sonnet-4-20250514"
```
### OpenAI Computer Use Preview
**Setup:** Set `ANTHROPIC_API_KEY` environment variable with your Anthropic API key.
### OpenAI Computer Use Preview (BYOK)
Direct access to OpenAI's computer use models using your own OpenAI API key (BYOK).
```python
model="openai/computer-use-preview"
```
**Setup:** Set `OPENAI_API_KEY` environment variable with your OpenAI API key.
### UI-TARS (Local or Huggingface Inference)
Run UI-TARS models locally for privacy and offline use.
```python
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
model="ollama_chat/0000/ui-tars-1.5-7b"
@@ -28,6 +56,8 @@ model="ollama_chat/0000/ui-tars-1.5-7b"
### Omniparser + Any LLM
Combine Omniparser for UI understanding with any LLM provider.
```python
model="omniparser+ollama_chat/mistral-small3.2"
model="omniparser+vertex_ai/gemini-pro"

View File

@@ -34,7 +34,7 @@ You can then use this as a tool for your agent:
from agent import ComputerAgent
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[custom_computer],
)
@@ -122,7 +122,7 @@ class MyCustomComputer(AsyncComputerHandler):
custom_computer = MyCustomComputer()
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[custom_computer],
)

View File

@@ -1,5 +1,5 @@
---
title: Form Filling
title: PDF to Form Automation
description: Enhance and Automate Interactions Between Form Filling and Local File Systems
---
@@ -83,7 +83,7 @@ async def fill_application():
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,
@@ -189,7 +189,7 @@ async def fill_application():
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,
@@ -289,7 +289,7 @@ async def fill_application():
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,
@@ -388,7 +388,7 @@ async def fill_application():
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,

View File

@@ -232,7 +232,7 @@ async def scrape_linkedin_connections():
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,

View File

@@ -173,7 +173,7 @@ async def automate_hr_workflow():
# Configure agent with specialized instructions
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,
@@ -274,7 +274,7 @@ async def automate_hr_workflow():
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,
@@ -353,7 +353,7 @@ async def automate_hr_workflow():
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
only_n_most_recent_images=3,
verbosity=logging.INFO,
@@ -476,7 +476,7 @@ For long-running workflows, adjust budget limits:
```python
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
max_trajectory_budget=20.0, # Increase for complex workflows
# ... other params
@@ -535,7 +535,7 @@ Add approval gates for critical operations:
```python
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
model="cua/anthropic/claude-sonnet-4.5",
tools=[computer],
# Add human approval callback for sensitive operations
callbacks=[ApprovalCallback(require_approval_for=["payroll", "termination"])]

View File

@@ -30,9 +30,23 @@ You can run your Cua computer in the cloud (recommended for easiest setup), loca
<Tabs items={['Cloud Sandbox', 'Linux on Docker', 'macOS Sandbox', 'Windows Sandbox']}>
<Tab value="Cloud Sandbox">
Create and manage cloud sandboxes that run Linux (Ubuntu), Windows, or macOS using either the website or CLI.
Create and manage cloud sandboxes that run Linux (Ubuntu), Windows, or macOS.
**Option 1: Via CLI (Recommended)**
**First, create your API key:**
1. Go to [cua.ai/signin](https://cua.ai/signin)
2. Navigate to **Dashboard > API Keys > New API Key** to create your API key
3. **Important:** Copy and save your API key immediately - you won't be able to see it again (you'll need to regenerate it if lost)
**Then, create your sandbox using either option:**
**Option 1: Via Website**
1. Navigate to **Dashboard > Sandboxes > Create Sandbox**
2. Create a **Small** sandbox, choosing **Linux**, **Windows**, or **macOS**
3. Note your sandbox name
**Option 2: Via CLI**
1. Install the CUA CLI:
```bash
@@ -51,14 +65,7 @@ You can run your Cua computer in the cloud (recommended for easiest setup), loca
3. Note your sandbox name and password from the output
**Option 2: Via Website**
1. Go to [cua.ai/signin](https://cua.ai/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Small** sandbox, choosing **Linux**, **Windows**, or **macOS**
4. Note your sandbox name and API key
Your Cloud Sandbox will be automatically configured and ready to use with either method.
Your Cloud Sandbox will be automatically configured and ready to use.
</Tab>
<Tab value="Linux on Docker">
@@ -134,14 +141,19 @@ Connect to your Cua computer and perform basic interactions, such as taking scre
<Tabs items={['Cloud Sandbox', 'Linux on Docker', 'macOS Sandbox', 'Windows Sandbox', 'Your host desktop']}>
<Tab value="Cloud Sandbox">
Set your CUA API key (same key used for model inference):
```bash
export CUA_API_KEY="sk_cua-api01_..."
```
Then connect to your sandbox:
```python
from computer import Computer
computer = Computer(
os_type="linux", # or "windows" or "macos"
provider_type="cloud",
name="your-sandbox-name", # from CLI or website
api_key="your-api-key"
name="your-sandbox-name" # from CLI or website
)
await computer.run() # Connect to the sandbox
```
@@ -226,13 +238,18 @@ Connect to your Cua computer and perform basic interactions, such as taking scre
<Tabs items={['Cloud Sandbox', 'Linux on Docker', 'macOS Sandbox', 'Windows Sandbox', 'Your host desktop']}>
<Tab value="Cloud Sandbox">
Set your CUA API key (same key used for model inference):
```bash
export CUA_API_KEY="sk_cua-api01_..."
```
Then connect to your sandbox:
```typescript
import { Computer, OSType } from '@trycua/computer';
const computer = new Computer({
osType: OSType.LINUX, // or OSType.WINDOWS or OSType.MACOS
name: "your-sandbox-name", // from CLI or website
apiKey: "your-api-key"
name: "your-sandbox-name" // from CLI or website
});
await computer.run(); // Connect to the sandbox
```
@@ -322,24 +339,88 @@ Install the Cua agent Python SDK:
pip install "cua-agent[all]"
```
Then, use the `ComputerAgent` object:
Choose how you want to access vision-language models for your agent:
```python
from agent import ComputerAgent
<Tabs items={['CUA VLM Router', 'BYOK (Bring Your Own Key)']}>
<Tab value="CUA VLM Router">
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
tools=[computer],
max_trajectory_budget=5.0
)
Use CUA's inference API to access multiple model providers with a single API key (same key used for sandbox access). CUA VLM Router provides intelligent routing and cost optimization.
messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
**Set your CUA API key:**
```bash
export CUA_API_KEY="sk_cua-api01_..."
```
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```
**Use the agent with CUA models:**
```python
from agent import ComputerAgent
agent = ComputerAgent(
model="cua/anthropic/claude-sonnet-4.5", # CUA-routed model
tools=[computer],
max_trajectory_budget=5.0
)
messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```
**Available CUA models:**
- `cua/anthropic/claude-sonnet-4.5` - Claude Sonnet 4.5 (recommended)
- `cua/anthropic/claude-haiku-4.5` - Claude Haiku 4.5 (faster)
**Benefits:**
- Single API key for multiple providers
- Cost tracking and optimization
- No need to manage multiple provider keys
</Tab>
<Tab value="BYOK (Bring Your Own Key)">
Use your own API keys from model providers like Anthropic, OpenAI, or others.
**Set your provider API key:**
```bash
# For Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# For OpenAI
export OPENAI_API_KEY="sk-..."
```
**Use the agent with your provider:**
```python
from agent import ComputerAgent
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929", # Direct provider model
tools=[computer],
max_trajectory_budget=5.0
)
messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
async for result in agent.run(messages):
for item in result["output"]:
if item["type"] == "message":
print(item["content"][0]["text"])
```
**Supported providers:**
- `anthropic/claude-*` - Anthropic Claude models
- `openai/gpt-*` - OpenAI GPT models
- `openai/o1-*` - OpenAI o1 models
- `huggingface-local/*` - Local HuggingFace models
- And many more via LiteLLM
See [Supported Models](/agent-sdk/supported-model-providers/) for the complete list.
</Tab>
</Tabs>
Learn more about agents in [Agent Loops](/agent-sdk/agent-loops) and available models in [Supported Models](/agent-sdk/supported-model-providers/).