Merge pull request #582 from trycua/study-docs-structure

Fixes pre-launch week
Francesco Bonacci
2025-11-17 17:36:30 +01:00
committed by GitHub
20 changed files with 1083 additions and 1419 deletions

View File

@@ -4,11 +4,7 @@ description: Supported computer-using agent loops and models
---
<Callout>
A corresponding{' '}
<a href="https://github.com/trycua/cua/blob/main/notebooks/agent_nb.ipynb" target="_blank">
Jupyter Notebook
</a>{' '}
is available for this documentation.
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/agent_nb.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
</Callout>
An agent can be thought of as a loop - it generates actions, executes them, and repeats until done:
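As a rough illustration of that loop, here is a minimal, self-contained sketch. It is not the `ComputerAgent` implementation: `run_loop`, `fake_model`, and `execute` are hypothetical stand-ins, and the scripted model simply returns two actions before signaling that it is done.

```python
from typing import Callable, Optional


def run_loop(
    next_action: Callable[[list], Optional[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> list:
    """Generate an action, execute it, record the result, repeat until done."""
    history: list = []
    for _ in range(max_steps):
        action = next_action(history)
        if action is None:  # the model signals that the task is finished
            break
        history.append((action, execute(action)))
    return history


def fake_model(history: list) -> Optional[str]:
    """Hypothetical stand-in for a model: two scripted actions, then done."""
    script = ["screenshot", "click"]
    return script[len(history)] if len(history) < len(script) else None


def execute(action: str) -> str:
    """Hypothetical stand-in for a computer: pretend the action succeeded."""
    return f"{action}: ok"


steps = run_loop(fake_model, execute)
```

The `max_steps` cap is a design choice in this sketch so that a model that never signals completion cannot loop forever.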

View File

@@ -1,16 +1,9 @@
---
title: Customizing Your ComputerAgent
title: Customize ComputerAgent
---
<Callout>
A corresponding{' '}
<a
href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb"
target="_blank"
>
Jupyter Notebook
</a>{' '}
is available for this documentation.
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
</Callout>
The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems.

View File

@@ -4,11 +4,7 @@ description: Use ComputerAgent with HUD for benchmarking and evaluation
---
<Callout>
A corresponding{' '}
<a href="https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb" target="_blank">
Jupyter Notebook
</a>{' '}
is available for this documentation.
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
</Callout>
The HUD integration allows an agent to be benchmarked using the [HUD framework](https://www.hud.so/). Through the HUD integration, the agent controls a computer inside HUD, where tests are run to evaluate the success of each task.

View File

@@ -10,12 +10,10 @@
"customizing-computeragent",
"callbacks",
"custom-tools",
"custom-computer-handlers",
"prompt-caching",
"usage-tracking",
"telemetry",
"benchmarks",
"migration-guide",
"integrations"
]
}

View File

@@ -1,7 +1,11 @@
---
title: Computer UI
title: Computer UI (Deprecated)
---
<Callout type="warn" title="Deprecated">
The Computer UI is deprecated and will be replaced with a revamped playground experience soon. We recommend using VNC or Screen Sharing for precise control of the computer instead.
</Callout>
The computer module includes a Gradio UI for creating and sharing demonstration data. To make it easy to build community datasets for better computer-use models, the UI can upload recordings to Hugging Face.
```bash

View File

@@ -1,5 +1,5 @@
{
"title": "Computer SDK",
"description": "Build computer-using agents with the Computer SDK",
"pages": ["computers", "commands", "computer-ui", "tracing-api", "sandboxed-python"]
"pages": ["computers", "commands", "tracing-api", "sandboxed-python", "custom-computer-handlers", "computer-ui"]
}

View File

@@ -4,14 +4,7 @@ slug: sandboxed-python
---
<Callout>
A corresponding{' '}
<a
href="https://github.com/trycua/cua/blob/main/examples/sandboxed_functions_examples.py"
target="_blank"
>
Python example
</a>{' '}
is available for this documentation.
A corresponding <a href="https://github.com/trycua/cua/blob/main/examples/sandboxed_functions_examples.py" target="_blank">Python example</a> is available for this documentation.
</Callout>
You can run Python functions securely inside a sandboxed virtual environment on a remote Cua Computer. This is useful for executing untrusted user code, isolating dependencies, or providing a safe environment for automation tasks.

View File

@@ -3,7 +3,7 @@ title: Form Filling
description: Enhance and Automate Interactions Between Form Filling and Local File Systems
---
import { EditableCodeBlock, EditableValue, S } from '@/components/editable-code-block';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
## Overview
@@ -12,9 +12,17 @@ Cua can be used to automate interactions between form filling and local file sys
This preset usecase uses [Cua Computer](/computer-sdk/computers) to interact with a web page and local file systems along with [Agent Loops](/agent-sdk/agent-loops) to run the agent in a loop with message history.
## Quickstart
---
Create a `requirements.txt` file with the following dependencies:
<Steps>
<Step>
### Set Up Your Environment
First, install the required dependencies:
Create a `requirements.txt` file:
```text
cua-agent
@@ -22,33 +30,32 @@ cua-computer
python-dotenv>=1.0.0
```
And install:
Install the dependencies:
```bash
pip install -r requirements.txt
```
Create a `.env` file with the following environment variables:
Create a `.env` file with your API keys:
```text
ANTHROPIC_API_KEY=your-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
CUA_API_KEY=sk_cua-api01...
```
Select the environment you want to run the code in (_click on the underlined values in the code to edit them directly!_):
</Step>
<Tabs items={['☁️ Cloud', '🐳 Docker', '🍎 Lume', '🪟 Windows Sandbox']}>
<Tab value="☁️ Cloud">
<Step>
<EditableCodeBlock
key="cloud-tab"
lang="python"
defaultValues={{
"sandbox-name": "m-linux-...",
"api_key": "sk_cua-api01..."
}}
>
{`import asyncio
### Create Your Form Filling Script
Create a Python file (e.g., `form_filling.py`) and select your environment:
<Tabs items={['Cloud Sandbox', 'Linux on Docker', 'macOS Sandbox', 'Windows Sandbox']}>
<Tab value="Cloud Sandbox">
```python
import asyncio
import logging
import os
import signal
@@ -59,21 +66,21 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(**name**)
logger = logging.getLogger(__name__)
def handle_sigint(sig, frame):
print("\\n\\nExecution interrupted by user. Exiting gracefully...")
exit(0)
print("\n\nExecution interrupted by user. Exiting gracefully...")
exit(0)
async def fill_application():
try:
async with Computer(
os_type="linux",
provider_type=VMProviderType.CLOUD,
name="`}<EditableValue placeholder="sandbox-name" />{`",
api_key="`}<EditableValue placeholder="api_key" />{`",
verbosity=logging.INFO,
) as computer:
try:
async with Computer(
os_type="linux",
provider_type=VMProviderType.CLOUD,
name="your-sandbox-name", # Replace with your sandbox name
api_key=os.environ["CUA_API_KEY"],
verbosity=logging.INFO,
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
@@ -93,7 +100,7 @@ verbosity=logging.INFO,
history = []
for i, task in enumerate(tasks, 1):
print(f"\\n[Task {i}/{len(tasks)}] {task}")
print(f"\n[Task {i}/{len(tasks)}] {task}")
# Add user message to history
history.append({"role": "user", "content": task})
@@ -116,7 +123,7 @@ verbosity=logging.INFO,
print(f"✅ Task {i}/{len(tasks)} completed")
print("\\n🎉 All tasks completed successfully!")
print("\n🎉 All tasks completed successfully!")
except Exception as e:
logger.error(f"Error in fill_application: {e}")
@@ -124,18 +131,18 @@ verbosity=logging.INFO,
raise
def main():
try:
load_dotenv()
try:
load_dotenv()
if "ANTHROPIC_API_KEY" not in os.environ:
raise RuntimeError(
"Please set the ANTHROPIC_API_KEY environment variable.\\n"
"Please set the ANTHROPIC_API_KEY environment variable.\n"
"You can add it to a .env file in the project root."
)
if "CUA_API_KEY" not in os.environ:
raise RuntimeError(
"Please set the CUA_API_KEY environment variable.\\n"
"Please set the CUA_API_KEY environment variable.\n"
"You can add it to a .env file in the project root."
)
@@ -147,22 +154,15 @@ load_dotenv()
logger.error(f"Error running automation: {e}")
traceback.print_exc()
if **name** == "**main**":
main()`}
</EditableCodeBlock>
if __name__ == "__main__":
main()
```
</Tab>
<Tab value="🍎 Lume">
<Tab value="Linux on Docker">
<EditableCodeBlock
key="lume-tab"
lang="python"
defaultValues={{
"sandbox-name": "macos-sequoia-cua:latest"
}}
>
{`import asyncio
```python
import asyncio
import logging
import os
import signal
@@ -173,20 +173,20 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(**name**)
logger = logging.getLogger(__name__)
def handle_sigint(sig, frame):
print("\\n\\nExecution interrupted by user. Exiting gracefully...")
exit(0)
print("\n\nExecution interrupted by user. Exiting gracefully...")
exit(0)
async def fill_application():
try:
async with Computer(
os_type="macos",
provider_type=VMProviderType.LUME,
name="`}<EditableValue placeholder="sandbox-name" />{`",
verbosity=logging.INFO,
) as computer:
try:
async with Computer(
os_type="linux",
provider_type=VMProviderType.DOCKER,
image="trycua/cua-xfce:latest", # or "trycua/cua-ubuntu:latest"
verbosity=logging.INFO,
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
@@ -206,7 +206,7 @@ verbosity=logging.INFO,
history = []
for i, task in enumerate(tasks, 1):
print(f"\\n[Task {i}/{len(tasks)}] {task}")
print(f"\n[Task {i}/{len(tasks)}] {task}")
# Add user message to history
history.append({"role": "user", "content": task})
@@ -229,7 +229,7 @@ verbosity=logging.INFO,
print(f"✅ Task {i}/{len(tasks)} completed")
print("\\n🎉 All tasks completed successfully!")
print("\n🎉 All tasks completed successfully!")
except Exception as e:
logger.error(f"Error in fill_application: {e}")
@@ -237,12 +237,12 @@ verbosity=logging.INFO,
raise
def main():
try:
load_dotenv()
try:
load_dotenv()
if "ANTHROPIC_API_KEY" not in os.environ:
raise RuntimeError(
"Please set the ANTHROPIC_API_KEY environment variable.\\n"
"Please set the ANTHROPIC_API_KEY environment variable.\n"
"You can add it to a .env file in the project root."
)
@@ -254,20 +254,15 @@ load_dotenv()
logger.error(f"Error running automation: {e}")
traceback.print_exc()
if **name** == "**main**":
main()`}
</EditableCodeBlock>
if __name__ == "__main__":
main()
```
</Tab>
<Tab value="🪟 Windows Sandbox">
<Tab value="macOS Sandbox">
<EditableCodeBlock
key="windows-tab"
lang="python"
defaultValues={{}}
>
{`import asyncio
```python
import asyncio
import logging
import os
import signal
@@ -278,19 +273,20 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(**name**)
logger = logging.getLogger(__name__)
def handle_sigint(sig, frame):
print("\\n\\nExecution interrupted by user. Exiting gracefully...")
exit(0)
print("\n\nExecution interrupted by user. Exiting gracefully...")
exit(0)
async def fill_application():
try:
async with Computer(
os_type="windows",
provider_type=VMProviderType.WINDOWS_SANDBOX,
verbosity=logging.INFO,
) as computer:
try:
async with Computer(
os_type="macos",
provider_type=VMProviderType.LUME,
name="macos-sequoia-cua:latest",
verbosity=logging.INFO,
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
@@ -310,7 +306,7 @@ verbosity=logging.INFO,
history = []
for i, task in enumerate(tasks, 1):
print(f"\\n[Task {i}/{len(tasks)}] {task}")
print(f"\n[Task {i}/{len(tasks)}] {task}")
# Add user message to history
history.append({"role": "user", "content": task})
@@ -333,7 +329,7 @@ verbosity=logging.INFO,
print(f"✅ Task {i}/{len(tasks)} completed")
print("\\n🎉 All tasks completed successfully!")
print("\n🎉 All tasks completed successfully!")
except Exception as e:
logger.error(f"Error in fill_application: {e}")
@@ -341,12 +337,12 @@ verbosity=logging.INFO,
raise
def main():
try:
load_dotenv()
try:
load_dotenv()
if "ANTHROPIC_API_KEY" not in os.environ:
raise RuntimeError(
"Please set the ANTHROPIC_API_KEY environment variable.\\n"
"Please set the ANTHROPIC_API_KEY environment variable.\n"
"You can add it to a .env file in the project root."
)
@@ -358,22 +354,15 @@ load_dotenv()
logger.error(f"Error running automation: {e}")
traceback.print_exc()
if **name** == "**main**":
main()`}
</EditableCodeBlock>
if __name__ == "__main__":
main()
```
</Tab>
<Tab value="🐳 Docker">
<Tab value="Windows Sandbox">
<EditableCodeBlock
key="docker-tab"
lang="python"
defaultValues={{
"sandbox-name": "trycua/cua-ubuntu:latest"
}}
>
{`import asyncio
```python
import asyncio
import logging
import os
import signal
@@ -384,20 +373,19 @@ from computer import Computer, VMProviderType
from dotenv import load_dotenv
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(**name**)
logger = logging.getLogger(__name__)
def handle_sigint(sig, frame):
print("\\n\\nExecution interrupted by user. Exiting gracefully...")
exit(0)
print("\n\nExecution interrupted by user. Exiting gracefully...")
exit(0)
async def fill_application():
try:
async with Computer(
os_type="linux",
provider_type=VMProviderType.DOCKER,
name="`}<EditableValue placeholder="sandbox-name" />{`",
verbosity=logging.INFO,
) as computer:
try:
async with Computer(
os_type="windows",
provider_type=VMProviderType.WINDOWS_SANDBOX,
verbosity=logging.INFO,
) as computer:
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
@@ -417,7 +405,7 @@ verbosity=logging.INFO,
history = []
for i, task in enumerate(tasks, 1):
print(f"\\n[Task {i}/{len(tasks)}] {task}")
print(f"\n[Task {i}/{len(tasks)}] {task}")
# Add user message to history
history.append({"role": "user", "content": task})
@@ -440,7 +428,7 @@ verbosity=logging.INFO,
print(f"✅ Task {i}/{len(tasks)} completed")
print("\\n🎉 All tasks completed successfully!")
print("\n🎉 All tasks completed successfully!")
except Exception as e:
logger.error(f"Error in fill_application: {e}")
@@ -448,12 +436,12 @@ verbosity=logging.INFO,
raise
def main():
try:
load_dotenv()
try:
load_dotenv()
if "ANTHROPIC_API_KEY" not in os.environ:
raise RuntimeError(
"Please set the ANTHROPIC_API_KEY environment variable.\\n"
"Please set the ANTHROPIC_API_KEY environment variable.\n"
"You can add it to a .env file in the project root."
)
@@ -465,16 +453,41 @@ load_dotenv()
logger.error(f"Error running automation: {e}")
traceback.print_exc()
if **name** == "**main**":
main()`}
</EditableCodeBlock>
if __name__ == "__main__":
main()
```
</Tab>
</Tabs>
</Step>
<Step>
### Run Your Script
Execute your form filling automation:
```bash
python form_filling.py
```
The agent will:
1. Download the PDF resume from Overleaf
2. Extract information from the PDF
3. Fill out the JotForm with the extracted information
Monitor the output to see the agent's progress through each task.
</Step>
</Steps>
---
## Next Steps
- Learn more about [Cua computers](/computer-sdk/computers) and [computer commands](/computer-sdk/commands)
- Read about [Agent loops](/agent-sdk/agent-loops), [tools](/agent-sdk/custom-tools), and [supported model providers](/agent-sdk/supported-model-providers/)
- Experiment with different [Models and Providers](/agent-sdk/supported-model-providers/)
- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help

View File

@@ -1,5 +1,5 @@
{
"title": "Cookbook",
"description": "Real-world examples of building with Cua",
"pages": ["form-filling", "post-event-contact-export"]
"pages": ["windows-app-behind-vpn", "form-filling", "post-event-contact-export"]
}

View File

@@ -0,0 +1,615 @@
---
title: Windows App behind VPN
description: Automate legacy Windows desktop applications behind VPN with Cua
---
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
## Overview
This guide demonstrates how to automate Windows desktop applications (like eGecko HR/payroll systems) that run behind a corporate VPN. This is a common enterprise scenario in which legacy desktop applications require manual data entry, report generation, or workflow execution.
**Use cases:**
- HR/payroll processing (employee onboarding, payroll runs, benefits administration)
- Desktop ERP systems behind corporate networks
- Legacy financial applications requiring VPN access
- Compliance reporting from on-premise systems
**Architecture:**
- Client-side Cua agent (Python SDK or Playground UI)
- Windows VM/Sandbox with VPN client configured
- RDP/remote desktop connection to target environment
- Desktop application automation via computer vision and UI control
<Callout type="info">
**Production Deployment**: For production use, consider workflow mining and custom finetuning to create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI automation. This provides better audit trails and higher success rates.
</Callout>
---
## Video Demo
<div className="rounded-lg border bg-card text-card-foreground shadow-sm p-4 mb-6">
<video src="https://github.com/user-attachments/assets/8ab07646-6018-4128-87ce-53180cfea696" controls className="w-full rounded">
Your browser does not support the video tag.
</video>
<div className="text-sm text-muted-foreground mt-2">
Demo showing Cua automating an eGecko-like desktop application on Windows behind AWS VPN
</div>
</div>
---
<Steps>
<Step>
### Set Up Your Environment
Create a `requirements.txt` file listing the required dependencies:
```text
cua-agent
cua-computer
python-dotenv>=1.0.0
```
Install the dependencies:
```bash
pip install -r requirements.txt
```
Create a `.env` file with your API keys:
```text
ANTHROPIC_API_KEY=your-anthropic-api-key
CUA_API_KEY=sk_cua-api01...
CUA_SANDBOX_NAME=your-windows-sandbox
```
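Because the Cloud Sandbox script reads these values from the environment, it can help to fail fast when one is missing. The following is a minimal sketch, assuming the three variable names from the `.env` file above; `missing_env_vars` and `check_env` are hypothetical helpers, not part of any Cua package.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the .env file above
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "CUA_API_KEY", "CUA_SANDBOX_NAME")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def check_env(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise a RuntimeError naming every missing variable."""
    missing = missing_env_vars(env=env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_env()` after `load_dotenv()` and before creating the `Computer` would surface configuration problems before any sandbox is started.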
</Step>
<Step>
### Configure Windows Sandbox with VPN
<Tabs items={['Cloud Sandbox (Recommended)', 'Windows Sandbox', 'Self-Hosted VM']}>
<Tab value="Cloud Sandbox (Recommended)">
For enterprise deployments, use Cua Cloud Sandbox with pre-configured VPN:
1. Go to [cua.ai/signin](https://cua.ai/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Windows** sandbox (Medium or Large for desktop apps)
4. Configure VPN settings:
- Upload your AWS VPN Client configuration (`.ovpn` file)
- Or configure VPN credentials directly in the dashboard
5. Note your sandbox name and API key
Your Windows sandbox will launch with VPN automatically connected.
</Tab>
<Tab value="Windows Sandbox">
For local development on Windows 10 Pro/Enterprise or Windows 11:
1. Enable [Windows Sandbox](https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-install)
2. Install the `pywinsandbox` dependency:
```bash
pip install -U git+https://github.com/karkason/pywinsandbox.git
```
3. Create a VPN setup script that runs on sandbox startup
4. Configure your desktop application installation within the sandbox
<Callout type="warn">
**Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
</Callout>
</Tab>
<Tab value="Self-Hosted VM">
For self-managed infrastructure:
1. Deploy a Windows VM on your preferred cloud (AWS, Azure, GCP)
2. Install and configure a VPN client (AWS VPN Client, OpenVPN, etc.)
3. Install the target desktop application and any dependencies
4. Install `cua-computer-server`:
```bash
pip install cua-computer-server
python -m computer_server
```
5. Configure firewall rules to allow Cua agent connections
</Tab>
</Tabs>
</Step>
<Step>
### Create Your Automation Script
Create a Python file (e.g., `hr_automation.py`):
<Tabs items={['Cloud Sandbox', 'Windows Sandbox', 'Self-Hosted']}>
<Tab value="Cloud Sandbox">
```python
import asyncio
import logging
import os

from agent import ComputerAgent
from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()


async def automate_hr_workflow():
    """
    Automate HR/payroll desktop application workflow.

    This example demonstrates:
    - Launching Windows desktop application
    - Navigating complex desktop UI
    - Data entry and form filling
    - Report generation and export
    """
    try:
        # Connect to Windows Cloud Sandbox with VPN
        async with Computer(
            os_type="windows",
            provider_type=VMProviderType.CLOUD,
            name=os.environ["CUA_SANDBOX_NAME"],
            api_key=os.environ["CUA_API_KEY"],
            verbosity=logging.INFO,
        ) as computer:
            # Configure agent with specialized instructions
            agent = ComputerAgent(
                model="anthropic/claude-sonnet-4-5-20250929",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
You are automating a Windows desktop HR/payroll application.
IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Look for loading indicators and wait for them to disappear
- Verify each action by checking on-screen confirmation messages
- If a button or field is not visible, try scrolling or navigating tabs
- Desktop apps often have nested menus - explore systematically
- Save work frequently using File > Save or Ctrl+S
- Before closing, always verify changes were saved
COMMON UI PATTERNS:
- Menu bar navigation (File, Edit, View, etc.)
- Ribbon interfaces with tabs
- Modal dialogs that block interaction
- Data grids/tables for viewing records
- Form fields with validation
- Status bars showing operation progress
""".strip()
            )

            # Define workflow tasks
            tasks = [
                "Launch the HR application from the desktop or start menu",
                "Log in with the credentials shown in credentials.txt on the desktop",
                "Navigate to Employee Management section",
                "Create a new employee record with information from new_hire.xlsx on desktop",
                "Verify the employee was created successfully by searching for their name",
                "Generate an onboarding report for the new employee",
                "Export the report as PDF to the desktop",
                "Log out of the application"
            ]

            history = []
            for task in tasks:
                logger.info(f"\n{'='*60}")
                logger.info(f"Task: {task}")
                logger.info(f"{'='*60}\n")
                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                if block.get("type") == "text":
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

                logger.info("\nTask completed. Moving to next task...\n")

            logger.info("\n" + "="*60)
            logger.info("All tasks completed successfully!")
            logger.info("="*60)

    except Exception as e:
        logger.error(f"Error during automation: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())
```
</Tab>
<Tab value="Windows Sandbox">
```python
import asyncio
import logging
import os

from agent import ComputerAgent
from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()


async def automate_hr_workflow():
    try:
        # Connect to Windows Sandbox
        async with Computer(
            os_type="windows",
            provider_type=VMProviderType.WINDOWS_SANDBOX,
            verbosity=logging.INFO,
        ) as computer:
            agent = ComputerAgent(
                model="anthropic/claude-sonnet-4-5-20250929",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
You are automating a Windows desktop HR/payroll application.
IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Verify each action by checking on-screen confirmation messages
- Desktop apps often have nested menus - explore systematically
- Save work frequently using File > Save or Ctrl+S
""".strip()
            )

            tasks = [
                "Launch the HR application from the desktop",
                "Log in with credentials from credentials.txt on desktop",
                "Navigate to Employee Management and create new employee from new_hire.xlsx",
                "Generate and export onboarding report as PDF",
                "Log out of the application"
            ]

            history = []
            for task in tasks:
                logger.info(f"\nTask: {task}")
                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                if block.get("type") == "text":
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

            logger.info("\nAll tasks completed!")

    except Exception as e:
        logger.error(f"Error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())
```
</Tab>
<Tab value="Self-Hosted">
```python
import asyncio
import logging
import os

from agent import ComputerAgent
from computer import Computer
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()


async def automate_hr_workflow():
    try:
        # Connect to self-hosted Windows VM running computer-server
        async with Computer(
            use_host_computer_server=True,
            base_url="http://your-windows-vm-ip:5757",  # Update with your VM IP
            verbosity=logging.INFO,
        ) as computer:
            agent = ComputerAgent(
                model="anthropic/claude-sonnet-4-5-20250929",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
You are automating a Windows desktop HR/payroll application.
IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Verify each action by checking on-screen confirmation messages
- Save work frequently using File > Save or Ctrl+S
""".strip()
            )

            tasks = [
                "Launch the HR application",
                "Log in with provided credentials",
                "Complete the required HR workflow",
                "Generate and export report",
                "Log out"
            ]

            history = []
            for task in tasks:
                logger.info(f"\nTask: {task}")
                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                if block.get("type") == "text":
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

            logger.info("\nAll tasks completed!")

    except Exception as e:
        logger.error(f"Error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())
```
</Tab>
</Tabs>
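All three variants share the same nested loop for pulling text out of each result. One possible way to factor it out is sketched below, assuming results are dictionaries shaped like the ones the scripts iterate over. `extract_text` is a hypothetical helper, and the `computer_call` item in the sample is only an assumed example of a non-message item.

```python
from typing import Any, Dict, List


def extract_text(result: Dict[str, Any]) -> List[str]:
    """Collect the text blocks from every message item in one result chunk."""
    texts: List[str] = []
    for item in result.get("output", []):
        if item.get("type") != "message":
            continue  # skip anything that is not a message
        for block in item.get("content", []):
            if block.get("type") == "text":
                texts.append(block.get("text", ""))
    return texts


# Sample chunk for illustration; the non-message item type is an assumption
sample = {
    "output": [
        {"type": "computer_call", "action": {"type": "screenshot"}},
        {"type": "message", "content": [{"type": "text", "text": "Logged in."}]},
    ]
}
messages = extract_text(sample)
```

With a helper like this, the body of each `async for` loop reduces to logging and appending every string that `extract_text(result)` returns.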
</Step>
<Step>
### Run Your Automation
Execute the script:
```bash
python hr_automation.py
```
The agent will:
1. Connect to your Windows environment (with VPN if configured)
2. Launch and navigate the desktop application
3. Execute each workflow step sequentially
4. Verify actions and handle errors
5. Save trajectory logs for audit and debugging
Monitor the console output to see the agent's progress through each task.
</Step>
</Steps>
---
## Key Configuration Options
### Agent Instructions
The `instructions` parameter is critical for reliable desktop automation:
```python
instructions="""
You are automating a Windows desktop HR/payroll application.
IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Look for loading indicators and wait for them to disappear
- Verify each action by checking on-screen confirmation messages
- If a button or field is not visible, try scrolling or navigating tabs
- Desktop apps often have nested menus - explore systematically
- Save work frequently using File > Save or Ctrl+S
- Before closing, always verify changes were saved
COMMON UI PATTERNS:
- Menu bar navigation (File, Edit, View, etc.)
- Ribbon interfaces with tabs
- Modal dialogs that block interaction
- Data grids/tables for viewing records
- Form fields with validation
- Status bars showing operation progress
APPLICATION-SPECIFIC:
- Login is at top-left corner
- Employee records are under "HR Management" > "Employees"
- Reports are generated via "Tools" > "Reports" > "Generate"
- Always click "Save" before navigating away from a form
""".strip()
```
### Budget Management
For long-running workflows, adjust budget limits:
```python
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
tools=[computer],
max_trajectory_budget=20.0, # Increase for complex workflows
# ... other params
)
```
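To make the idea of a budget limit concrete, the sketch below accumulates cost and stops once a limit is exceeded. It illustrates the concept only and is not how Cua enforces `max_trajectory_budget`; `BudgetTracker`, `BudgetExceeded`, and the per-step costs are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when cumulative spend goes over the configured limit."""


class BudgetTracker:
    """Hypothetical helper that accumulates cost in dollars."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0

    def add(self, cost: float) -> None:
        """Record a cost and raise once the total passes the limit."""
        self.spent += cost
        if self.spent > self.limit:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.limit:.2f}")


tracker = BudgetTracker(limit=1.0)
completed = 0
try:
    for step_cost in [0.4, 0.4, 0.4]:  # made-up per-step costs
        tracker.add(step_cost)
        completed += 1
except BudgetExceeded:
    pass  # stop the workflow once the limit is passed
```

In this toy run the third step pushes the total past the limit, so only two steps complete. A real workflow would need a larger limit, or fewer steps per run, to finish.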
### Image Retention
Balance context and cost by retaining only recent screenshots:
```python
agent = ComputerAgent(
# ...
only_n_most_recent_images=3, # Keep last 3 screenshots
# ...
)
```
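Conceptually, image retention means pruning older screenshots from the message history while leaving text untouched. The sketch below shows that idea on a simplified message list. The message shape (a `type` key of `"image"` or `"text"`) and `keep_recent_images` are assumptions for illustration, not Cua's internal format or implementation.

```python
from typing import Any, Dict, List


def keep_recent_images(messages: List[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Drop all but the last n messages flagged as images; keep everything else."""
    image_positions = [i for i, m in enumerate(messages) if m.get("type") == "image"]
    # A slice ending at -0 would be empty, so n == 0 is handled explicitly
    drop = set(image_positions[:-n]) if n > 0 else set(image_positions)
    return [m for i, m in enumerate(messages) if i not in drop]


# Simplified, hypothetical history with three screenshots and one text message
history = [
    {"type": "image", "id": 1},
    {"type": "text", "text": "clicked Save"},
    {"type": "image", "id": 2},
    {"type": "image", "id": 3},
]
trimmed = keep_recent_images(history, n=2)
```

Here the oldest screenshot is dropped while the text message and the two newest screenshots remain, which is the trade-off the setting controls: less context for the model in exchange for fewer image tokens per request.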
---
## Production Considerations
<Callout type="warn" title="Production Deployment">
For enterprise production deployments, consider these additional steps:
</Callout>
### 1. Workflow Mining
Before deploying, analyze your actual workflows:
- Record user interactions with the application
- Identify common patterns and edge cases
- Map out decision trees and validation requirements
- Document application-specific quirks and timing issues
### 2. Custom Finetuning
Create vertical-specific actions instead of generic UI automation:
```python
# Instead of generic steps:
tasks = ["Click login", "Type username", "Type password", "Click submit"]
# Create semantic actions:
tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]
```
This provides:
- Better audit trails
- Approval gates at business logic level
- Higher success rates
- Easier maintenance and updates
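Short of finetuning, a lightweight way to prototype semantic actions is a lookup table that expands each action name into the generic steps it stands for. The sketch below is that swapped-in technique, not finetuning itself; `SEMANTIC_ACTIONS` and `expand_action` are hypothetical names, and the steps are borrowed from the example script above.

```python
from typing import Dict, List

# Hypothetical mapping from semantic action names to the steps they expand into
SEMANTIC_ACTIONS: Dict[str, List[str]] = {
    "onboard_employee": [
        "Navigate to Employee Management section",
        "Create a new employee record with information from new_hire.xlsx on desktop",
        "Generate an onboarding report for the new employee",
    ],
}


def expand_action(name: str) -> List[str]:
    """Return the step list for a semantic action, rejecting unknown names."""
    try:
        return list(SEMANTIC_ACTIONS[name])
    except KeyError:
        raise ValueError(f"Unknown semantic action: {name}") from None


tasks = expand_action("onboard_employee")
```

Rejecting unknown names up front keeps the audit trail at the business level: every run starts from a named, reviewed action rather than free-form instructions.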
### 3. Human-in-the-Loop
Add approval gates for critical operations:
```python
agent = ComputerAgent(
model="anthropic/claude-sonnet-4-5-20250929",
tools=[computer],
# Add human approval callback for sensitive operations
callbacks=[ApprovalCallback(require_approval_for=["payroll", "termination"])]
)
```
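The decision logic behind such a gate can be sketched independently of any callback API. The following plain-Python example checks a task description against sensitive keywords and defers to an approver function. `needs_approval`, `gate`, and the `approve` callable are hypothetical and do not describe how a Cua callback is implemented.

```python
from typing import Callable, Iterable


def needs_approval(task: str, keywords: Iterable[str]) -> bool:
    """True when the task text mentions any sensitive keyword (case-insensitive)."""
    lowered = task.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def gate(task: str, keywords: Iterable[str], approve: Callable[[str], bool]) -> bool:
    """Allow the task if it is not sensitive, or if the approver says yes."""
    if not needs_approval(task, keywords):
        return True
    return approve(task)


SENSITIVE = ["payroll", "termination"]

# An approver that always refuses, to show which tasks get stopped
allowed = gate("Export the report as PDF", SENSITIVE, approve=lambda t: False)
blocked = gate("Run payroll for November", SENSITIVE, approve=lambda t: False)
```

In interactive use, `approve` could wrap a prompt such as `input()`, or post to a ticketing or chat system and wait for a reviewer's answer.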
### 4. Deployment Options
Choose your deployment model:
**Managed (Recommended)**
- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
- You get UI/API endpoints for triggering workflows
- Automatic scaling, monitoring, and maintenance
- SLA guarantees and enterprise support
**Self-Hosted**
- You manage Windows VMs, VPN infrastructure, and agent deployment
- Full control over data and security
- Custom network configurations
- On-premise or your preferred cloud
---
## Troubleshooting
### VPN Connection Issues
If the agent cannot reach the application:
1. Verify VPN is connected: Check VPN client status in the Windows sandbox
2. Test network connectivity: Try pinging internal resources
3. Check firewall rules: Ensure RDP and application ports are open
4. Review VPN logs: Look for authentication or routing errors
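For the connectivity and firewall checks, a small TCP probe run from inside the Windows environment can confirm whether a given host and port are reachable. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, and any host name you pass to it is your own.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, `can_connect("your-internal-host", 3389)` probes the default RDP port. A `False` result with the VPN connected usually points at routing or firewall rules rather than the application itself.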
### Application Not Launching
If the desktop application fails to start:
1. Verify installation: Check the application is installed in the sandbox
2. Check dependencies: Ensure all required DLLs and frameworks are present
3. Review permissions: Application may require admin rights
4. Check logs: Look for error messages in Windows Event Viewer
### UI Element Not Found
If the agent cannot find buttons or fields:
1. Increase wait times: Some applications load slowly
2. Check screen resolution: UI elements may be off-screen
3. Verify DPI scaling: High DPI settings can affect element positions
4. Update instructions: Provide more specific navigation guidance
### Cost Management
If costs are higher than expected:
1. Reduce `max_trajectory_budget`
2. Decrease `only_n_most_recent_images`
3. Use prompt caching: Set `use_prompt_caching=True`
4. Optimize task descriptions: Be more specific to reduce retry attempts
---
## Next Steps
- **Explore custom tools**: Learn how to create [custom tools](/agent-sdk/custom-tools) for application-specific actions
- **Implement callbacks**: Add [monitoring and logging](/agent-sdk/callbacks) for production workflows
- **Join community**: Get help in our [Discord](https://discord.com/invite/mVnXXpdE85)
---
## Related Examples
- [Form Filling](/example-usecases/form-filling) - Web form automation
- [Post-Event Contact Export](/example-usecases/post-event-contact-export) - Data extraction workflows
- [Custom Tools](/agent-sdk/custom-tools) - Building application-specific functions

View File

@@ -3,5 +3,5 @@
"description": "Get started with Cua",
"defaultOpen": true,
"icon": "Rocket",
"pages": ["quickstart"]
"pages": ["../index", "quickstart"]
}

View File

@@ -8,7 +8,7 @@ import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import { Code, Terminal } from 'lucide-react';
Choose your quickstart path:
{/* Choose your quickstart path:
<div className="grid grid-cols-1 md:grid-cols-2 gap-6 mt-8 mb-8">
<Card icon={<Code />} href="#developer-quickstart" title="Developer Quickstart">
@@ -17,7 +17,7 @@ Choose your quickstart path:
<Card icon={<Terminal />} href="#cli-quickstart" title="CLI Quickstart">
Get started quickly with the command-line interface
</Card>
</div>
</div> */}
---
@@ -30,11 +30,11 @@ You can run your Cua computer in the cloud (recommended for easiest setup), loca
<Tabs items={['Cloud Sandbox', 'Linux on Docker', 'macOS Sandbox', 'Windows Sandbox']}>
<Tab value="Cloud Sandbox">
Cua Cloud Sandbox provides sandboxes that run Linux (Ubuntu) or Windows.
Cua Cloud Sandbox provides sandboxes that run Linux (Ubuntu), Windows, or macOS.
1. Go to [cua.ai/signin](https://cua.ai/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Small** sandbox, choosing either **Linux** or **Windows**
3. Create a **Small** sandbox, choosing **Linux**, **Windows**, or **macOS**
4. Note your sandbox name and API key
Your Cloud Sandbox will be automatically configured and ready to use.
@@ -117,7 +117,7 @@ Connect to your Cua computer and perform basic interactions, such as taking scre
from computer import Computer
computer = Computer(
os_type="linux",
os_type="linux", # or "windows" or "macos"
provider_type="cloud",
name="your-sandbox-name",
api_key="your-api-key"
@@ -192,6 +192,10 @@ Connect to your Cua computer and perform basic interactions, such as taking scre
</Tab>
<Tab value="TypeScript">
<Callout type="warn" title="TypeScript SDK Deprecated">
The TypeScript interface is currently deprecated. We're working on version 0.2.0 with improved TypeScript support. In the meantime, please use the Python SDK.
</Callout>
Install the Cua computer TypeScript SDK:
```bash
npm install @trycua/computer
@@ -205,7 +209,7 @@ Connect to your Cua computer and perform basic interactions, such as taking scre
import { Computer, OSType } from '@trycua/computer';
const computer = new Computer({
osType: OSType.LINUX,
osType: OSType.LINUX, // or OSType.WINDOWS or OSType.MACOS
name: "your-sandbox-name",
apiKey: "your-api-key"
});
@@ -328,7 +332,7 @@ Learn more about agents in [Agent Loops](/agent-sdk/agent-loops) and available m
- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help
- Try out [Form Filling](/example-usecases/form-filling) preset usecase
---
{/* ---
## CLI Quickstart
@@ -354,7 +358,7 @@ Get started quickly with the CUA CLI - the easiest way to manage cloud sandboxes
```bash
# Install Bun if you don't have it
curl -fsSL https://bun.sh/install | bash
# Install CUA CLI
bun add -g @trycua/cli
```
@@ -467,4 +471,4 @@ cua delete my-vm-abc123
---
For running models locally, see [Running Models Locally](/agent-sdk/supported-model-providers/local-models).
For running models locally, see [Running Models Locally](/agent-sdk/supported-model-providers/local-models). */}

View File

@@ -4,15 +4,9 @@ title: Introduction
import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react';
<Hero>
<div className="rounded-lg border bg-card text-card-foreground shadow-sm px-4 py-2 mb-6">
Cua is an open-source framework for building **Computer-Use Agents** - AI systems that see, understand, and interact with desktop applications through vision and action, just like humans do.
<br />
Go from prototype to production with everything you need: multi-provider LLM support, cross-platform sandboxes, and trajectory tracing. Whether you're running locally or deploying to the cloud, Cua gives you the tools to build reliable computer-use agents.
</Hero>
</div>
## Why Cua?
@@ -46,14 +40,14 @@ Follow the [Quickstart guide](/docs/get-started/quickstart) for step-by-step set
If you're new to computer-use agents, check out our [tutorials](https://cua.ai/blog), [examples](https://github.com/trycua/cua/tree/main/examples), and [notebooks](https://github.com/trycua/cua/tree/main/notebooks) to start building with Cua today.
<div className="grid grid-cols-1 md:grid-cols-2 gap-6 mt-8">
<Card icon={<Rocket />} href="/docs/get-started/quickstart" title="Quickstart">
<Card icon={<Rocket />} href="/get-started/quickstart" title="Quickstart">
Get up and running in 3 steps with Python or TypeScript.
</Card>
<Card icon={<BookOpen />} href="/agent-sdk/agent-loops" title="Learn Core Concepts">
Understand agent loops, callbacks, and model composition.
<Card icon={<Zap />} href="/agent-sdk/agent-loops" title="Agent Loops">
Learn how agents work and how to build your own.
</Card>
<Card icon={<Code />} href="/libraries/agent" title="API Reference">
Explore the full Agent SDK and Computer SDK APIs.
<Card icon={<BookOpen />} href="/computer-sdk/computers" title="Computer SDK">
Control desktop applications with the Computer SDK.
</Card>
<Card icon={<Monitor />} href="/example-usecases/form-filling" title="Example Use Cases">
See Cua in action with real-world examples.

View File

@@ -7,14 +7,7 @@ github:
---
<Callout>
A corresponding{' '}
<a
href="https://github.com/trycua/cua/blob/main/notebooks/computer_server_nb.ipynb"
target="_blank"
>
Jupyter Notebook
</a>{' '}
is available for this documentation.
A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/computer_server_nb.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.
</Callout>
The Computer Server API reference documentation is currently under development.

View File

@@ -7,11 +7,7 @@ github:
---
<Callout>
A corresponding{' '}
<a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">
Python example
</a>{' '}
is available for this documentation.
A corresponding <a href="https://github.com/trycua/cua/blob/main/examples/som_examples.py" target="_blank">Python example</a> is available for this documentation.
</Callout>
## Overview

View File

@@ -4,7 +4,6 @@
"root": true,
"defaultOpen": true,
"pages": [
"index",
"---[Rocket]Get Started---",
"...get-started",
"---[ChefHat]Cookbook---",

View File

@@ -37,6 +37,7 @@ export const baseOptions: BaseLayoutProps = {
Cua
</>
),
url: 'https://cua.ai',
},
githubUrl: 'https://github.com/trycua/cua',
links: [