Add blogpost and doc

f-trycua
2025-11-17 17:26:41 +01:00
parent 2b595f5de8
commit 5983a9b849
3 changed files with 735 additions and 1 deletions
+119
@@ -0,0 +1,119 @@
# Cloud Windows Sandboxes GA + macOS Preview
If you've been building with our `cua` libraries, you might've hit a limitation with local computer-use sandboxes: to run agents on Windows or macOS, you need to be on that OS—Windows Sandbox for Windows, Apple Virtualization for macOS. The only cross-platform option is Linux on Docker, which limits you to virtualizing Linux environments ([see all local options here](https://cua.ai/docs/computer-sdk/computers)).
Today that changes: we're announcing general availability of **Cloud Windows Sandboxes** and opening early preview access for **Cloud macOS Sandboxes**.
## Cloud Windows Sandboxes: Now GA
![Cloud Windows Sandboxes](./assets/cloud-windows-ga.png)
Cloud Windows Sandboxes are now generally available. You get a full Windows 11 desktop in your browser with Edge and Python pre-installed, working seamlessly with all our [Computer-Use libraries](https://github.com/trycua/cua) for RPA, UI automation, code execution, and agent development.
**What's new with this release:**
- Hot-start under 1 second
- Direct noVNC over HTTPS under our sandbox.cua.ai domain
- 3 sandbox sizes available:
| Size | CPU | RAM | Storage |
|------|-----|-----|---------|
| Small | 2 cores | 8 GB | 128 GB SSD |
| Medium | 4 cores | 16 GB | 128 GB SSD |
| Large | 8 cores | 32 GB | 256 GB SSD |
<div align="center">
<video src="./assets/demo_wsb.mp4" width="600" controls></video>
</div>
**Pricing:** Windows Sandboxes cost 8 credits/hour (Small), 15 credits/hour (Medium), or 31 credits/hour (Large).
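As a rough budgeting aid, the rates above can be turned into a small calculator. This is only a sketch: `CREDITS_PER_HOUR` and `estimate_credits` are illustrative names, not part of the `cua` SDK, and the code assumes billing scales linearly with runtime hours.

```python
# Hourly rates copied from the pricing line above (credits per hour).
CREDITS_PER_HOUR = {"small": 8, "medium": 15, "large": 31}


def estimate_credits(size: str, hours: float) -> float:
    """Estimate credits for running one sandbox of `size` for `hours`.

    Assumes cost is linear in runtime (an assumption of this sketch,
    not a documented billing rule).
    """
    rate = CREDITS_PER_HOUR[size.lower()]
    return rate * hours


if __name__ == "__main__":
    for size in CREDITS_PER_HOUR:
        print(f"{size}: {estimate_credits(size, 40)} credits for 40 hours")
```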
## Cloud macOS Sandboxes: Now in Preview
Running macOS locally comes with challenges: 30GB golden images, a maximum of 2 sandboxes per host, and unpredictable compatibility issues. With Cloud macOS Sandboxes, we provision bare-metal macOS hosts (M1, M2, M4) on-demand—giving you full desktop access without the overhead of managing local sandboxes.
![macOS Preview Waitlist](./assets/macOS-waitlist.png)
**Preview access:** Invite-only. [Join the waitlist](https://cua.ai/macos-waitlist) if you're building agents for macOS workflows.
## Getting Started Today
Sign up at [cua.ai/signin](https://cua.ai/signin) and grab your API key from the dashboard. Then connect to a sandbox:
```python
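from computer import Computer

# These snippets use `await`, so run them inside an async function
# (or a notebook/REPL that supports top-level await).
computer = Computer(
    os_type="windows",  # or "macos"
    provider_type="cloud",
    name="my-sandbox",
    api_key="your-api-key"
)

await computer.run()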
```
Manage existing sandboxes:
```python
from computer.providers.cloud.provider import CloudProvider

provider = CloudProvider(api_key="your-api-key")

async with provider:
    sandboxes = await provider.list_vms()
    await provider.run_vm("my-sandbox")
    await provider.stop_vm("my-sandbox")
```
Run an agent on Windows to automate a workflow:
```python
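from agent import ComputerAgent

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    max_trajectory_budget=5.0
)

# agent.run() yields results as the agent works, so iterate over it
messages = [{
    "role": "user",
    "content": "Open Excel, create a sales report with this month's data, and save it to the desktop",
}]
async for result in agent.run(messages):
    print(result)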
```
## FAQs
<details>
<summary><strong>Why not just use local Windows Sandbox?</strong></summary>
Local Windows Sandbox resets on every restart. No persistence, no hot-start, and you need Windows Pro. Our sandboxes persist state, hot-start in under a second, and work from any OS.
</details>
<details>
<summary><strong>What happens to my work when I stop a sandbox?</strong></summary>
Everything persists. Files, installed software, browser profiles—it's all there when you restart. You pay only for runtime, not storage.
</details>
<details>
<summary><strong>How's the latency for UI automation?</strong></summary>
We run in 4 regions so you can pick what's closest. The noVNC connection is optimized for automation, not video streaming. Your agent sees crisp screenshots, not compressed video.
</details>
<details>
<summary><strong>Are there software restrictions?</strong></summary>
No. Full admin access on both platforms. Install whatever you need—Visual Studio, Photoshop, custom enterprise software. It's your sandbox.
</details>
## Need help?
If you hit issues getting either platform working, reach out in [Discord](https://discord.gg/cua-ai). We respond quickly and prioritize fixes based on what people actually use.
---
Get started at [cua.ai](https://cua.ai) or [join the macOS waitlist](https://cua.ai/macos-waitlist).
+1 -1
@@ -1,5 +1,5 @@
{
"title": "Cookbook",
"description": "Real-world examples of building with Cua",
- "pages": ["form-filling", "post-event-contact-export"]
+ "pages": ["windows-app-behind-vpn", "form-filling", "post-event-contact-export"]
}
@@ -0,0 +1,615 @@
---
title: Windows App behind VPN
description: Automate legacy Windows desktop applications behind VPN with Cua
---
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
## Overview
This guide demonstrates how to automate Windows desktop applications (like eGecko HR/payroll systems) that run behind a corporate VPN. This is a common enterprise scenario where legacy desktop applications require manual data entry, report generation, or workflow execution.
**Use cases:**
- HR/payroll processing (employee onboarding, payroll runs, benefits administration)
- Desktop ERP systems behind corporate networks
- Legacy financial applications requiring VPN access
- Compliance reporting from on-premise systems
**Architecture:**
- Client-side Cua agent (Python SDK or Playground UI)
- Windows VM/Sandbox with VPN client configured
- RDP/remote desktop connection to target environment
- Desktop application automation via computer vision and UI control
<Callout type="info">
**Production Deployment**: For production use, consider workflow mining and custom finetuning to create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI automation. This provides better audit trails and higher success rates.
</Callout>
---
## Video Demo
<div className="rounded-lg border bg-card text-card-foreground shadow-sm p-4 mb-6">
<video src="https://github.com/user-attachments/assets/7abbdaf4-054f-4965-8260-81dab497c6ba" controls className="w-full rounded">
Your browser does not support the video tag.
</video>
<div className="text-sm text-muted-foreground mt-2">
Demo showing Cua automating an eGecko-like desktop application on Windows behind AWS VPN
</div>
</div>
---
<Steps>
<Step>
### Set Up Your Environment
Create a `requirements.txt` file listing the required dependencies:
```text
cua-agent
cua-computer
python-dotenv>=1.0.0
```
Install the dependencies:
```bash
pip install -r requirements.txt
```
Create a `.env` file with your API keys:
```text
ANTHROPIC_API_KEY=your-anthropic-api-key
CUA_API_KEY=sk_cua-api01...
CUA_SANDBOX_NAME=your-windows-sandbox
```
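Before starting a run, it can help to fail fast when one of these variables is missing. The sketch below uses only the standard library; `REQUIRED_VARS` and `missing_env_vars` are illustrative names, and in a real script you would call `load_dotenv()` first so that values from `.env` are visible in `os.environ`.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "CUA_API_KEY", "CUA_SANDBOX_NAME")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```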
</Step>
<Step>
### Configure Windows Sandbox with VPN
<Tabs items={['Cloud Sandbox (Recommended)', 'Windows Sandbox', 'Self-Hosted VM']}>
<Tab value="Cloud Sandbox (Recommended)">
For enterprise deployments, use Cua Cloud Sandbox with pre-configured VPN:
1. Go to [cua.ai/signin](https://cua.ai/signin)
2. Navigate to **Dashboard > Containers > Create Instance**
3. Create a **Windows** sandbox (Medium or Large for desktop apps)
4. Configure VPN settings:
- Upload your AWS VPN Client configuration (`.ovpn` file)
- Or configure VPN credentials directly in the dashboard
5. Note your sandbox name and API key
Your Windows sandbox will launch with VPN automatically connected.
</Tab>
<Tab value="Windows Sandbox">
For local development on Windows 10 or 11 (Pro, Enterprise, or Education editions):
1. Enable [Windows Sandbox](https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-install)
2. Install the `pywinsandbox` dependency:
```bash
pip install -U git+https://github.com/karkason/pywinsandbox.git
```
3. Create a VPN setup script that runs on sandbox startup
4. Configure your desktop application installation within the sandbox
<Callout type="warn">
**Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
</Callout>
</Tab>
<Tab value="Self-Hosted VM">
For self-managed infrastructure:
1. Deploy a Windows VM on your preferred cloud (AWS, Azure, GCP)
2. Install and configure a VPN client (AWS VPN Client, OpenVPN, etc.)
3. Install the target desktop application and any dependencies
4. Install `cua-computer-server`:
```bash
pip install cua-computer-server
python -m computer_server
```
5. Configure firewall rules to allow Cua agent connections
</Tab>
</Tabs>
</Step>
<Step>
### Create Your Automation Script
Create a Python file (e.g., `hr_automation.py`):
<Tabs items={['Cloud Sandbox', 'Windows Sandbox', 'Self-Hosted']}>
<Tab value="Cloud Sandbox">
```python
import asyncio
import logging
import os

from agent import ComputerAgent
from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()


async def automate_hr_workflow():
    """
    Automate HR/payroll desktop application workflow.

    This example demonstrates:
    - Launching Windows desktop application
    - Navigating complex desktop UI
    - Data entry and form filling
    - Report generation and export
    """
    try:
        # Connect to Windows Cloud Sandbox with VPN
        async with Computer(
            os_type="windows",
            provider_type=VMProviderType.CLOUD,
            name=os.environ["CUA_SANDBOX_NAME"],
            api_key=os.environ["CUA_API_KEY"],
            verbosity=logging.INFO,
        ) as computer:
            # Configure agent with specialized instructions
            agent = ComputerAgent(
                model="anthropic/claude-sonnet-4-5-20250929",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
                You are automating a Windows desktop HR/payroll application.

                IMPORTANT GUIDELINES:
                - Always wait for windows and dialogs to fully load before interacting
                - Look for loading indicators and wait for them to disappear
                - Verify each action by checking on-screen confirmation messages
                - If a button or field is not visible, try scrolling or navigating tabs
                - Desktop apps often have nested menus - explore systematically
                - Save work frequently using File > Save or Ctrl+S
                - Before closing, always verify changes were saved

                COMMON UI PATTERNS:
                - Menu bar navigation (File, Edit, View, etc.)
                - Ribbon interfaces with tabs
                - Modal dialogs that block interaction
                - Data grids/tables for viewing records
                - Form fields with validation
                - Status bars showing operation progress
                """.strip()
            )

            # Define workflow tasks
            tasks = [
                "Launch the HR application from the desktop or start menu",
                "Log in with the credentials shown in credentials.txt on the desktop",
                "Navigate to Employee Management section",
                "Create a new employee record with information from new_hire.xlsx on desktop",
                "Verify the employee was created successfully by searching for their name",
                "Generate an onboarding report for the new employee",
                "Export the report as PDF to the desktop",
                "Log out of the application"
            ]

            history = []

            for task in tasks:
                logger.info(f"\n{'='*60}")
                logger.info(f"Task: {task}")
                logger.info(f"{'='*60}\n")

                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                if block.get("type") == "text":
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

                logger.info("\nTask completed. Moving to next task...\n")

            logger.info("\n" + "="*60)
            logger.info("All tasks completed successfully!")
            logger.info("="*60)

    except Exception as e:
        logger.error(f"Error during automation: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())
```
</Tab>
<Tab value="Windows Sandbox">
```python
import asyncio
import logging
import os

from agent import ComputerAgent
from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()


async def automate_hr_workflow():
    try:
        # Connect to Windows Sandbox
        async with Computer(
            os_type="windows",
            provider_type=VMProviderType.WINDOWS_SANDBOX,
            verbosity=logging.INFO,
        ) as computer:
            agent = ComputerAgent(
                model="anthropic/claude-sonnet-4-5-20250929",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
                You are automating a Windows desktop HR/payroll application.

                IMPORTANT GUIDELINES:
                - Always wait for windows and dialogs to fully load before interacting
                - Verify each action by checking on-screen confirmation messages
                - Desktop apps often have nested menus - explore systematically
                - Save work frequently using File > Save or Ctrl+S
                """.strip()
            )

            tasks = [
                "Launch the HR application from the desktop",
                "Log in with credentials from credentials.txt on desktop",
                "Navigate to Employee Management and create new employee from new_hire.xlsx",
                "Generate and export onboarding report as PDF",
                "Log out of the application"
            ]

            history = []

            for task in tasks:
                logger.info(f"\nTask: {task}")
                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                if block.get("type") == "text":
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

            logger.info("\nAll tasks completed!")

    except Exception as e:
        logger.error(f"Error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())
```
</Tab>
<Tab value="Self-Hosted">
```python
import asyncio
import logging
import os

from agent import ComputerAgent
from computer import Computer
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()


async def automate_hr_workflow():
    try:
        # Connect to self-hosted Windows VM running computer-server
        async with Computer(
            use_host_computer_server=True,
            base_url="http://your-windows-vm-ip:5757",  # Update with your VM IP
            verbosity=logging.INFO,
        ) as computer:
            agent = ComputerAgent(
                model="anthropic/claude-sonnet-4-5-20250929",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
                You are automating a Windows desktop HR/payroll application.

                IMPORTANT GUIDELINES:
                - Always wait for windows and dialogs to fully load before interacting
                - Verify each action by checking on-screen confirmation messages
                - Save work frequently using File > Save or Ctrl+S
                """.strip()
            )

            tasks = [
                "Launch the HR application",
                "Log in with provided credentials",
                "Complete the required HR workflow",
                "Generate and export report",
                "Log out"
            ]

            history = []

            for task in tasks:
                logger.info(f"\nTask: {task}")
                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                if block.get("type") == "text":
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

            logger.info("\nAll tasks completed!")

    except Exception as e:
        logger.error(f"Error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())
```
</Tab>
</Tabs>
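All three scripts share the same nested loop for pulling text out of each result. One possible refactor is a small helper like the one below. It is a sketch that assumes results have the shape the loops above rely on (an `"output"` list whose `"message"` items carry `"text"` blocks); `extract_text` is an illustrative name, and the `computer_call` item in the sample is a made-up non-message entry.

```python
def extract_text(result: dict) -> list[str]:
    """Collect the text of every message block in one agent result."""
    texts = []
    for item in result.get("output", []):
        if item.get("type") != "message":
            continue  # skip anything that is not a message item
        for block in item.get("content", []):
            if block.get("type") == "text":
                texts.append(block.get("text", ""))
    return texts


# Hand-written sample in the assumed shape; not real agent output.
sample = {
    "output": [
        {"type": "message", "content": [{"type": "text", "text": "Logged in."}]},
        {"type": "computer_call", "action": {"type": "click"}},
    ]
}
print(extract_text(sample))
```

Inside the task loop, each returned string could then be logged and appended to `history` as an assistant message.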
</Step>
<Step>
### Run Your Automation
Execute the script:
```bash
python hr_automation.py
```
The agent will:
1. Connect to your Windows environment (with VPN if configured)
2. Launch and navigate the desktop application
3. Execute each workflow step sequentially
4. Verify actions and handle errors
5. Save trajectory logs for audit and debugging
Monitor the console output to see the agent's progress through each task.
</Step>
</Steps>
---
## Key Configuration Options
### Agent Instructions
The `instructions` parameter is critical for reliable desktop automation:
```python
instructions="""
You are automating a Windows desktop HR/payroll application.
IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Look for loading indicators and wait for them to disappear
- Verify each action by checking on-screen confirmation messages
- If a button or field is not visible, try scrolling or navigating tabs
- Desktop apps often have nested menus - explore systematically
- Save work frequently using File > Save or Ctrl+S
- Before closing, always verify changes were saved
COMMON UI PATTERNS:
- Menu bar navigation (File, Edit, View, etc.)
- Ribbon interfaces with tabs
- Modal dialogs that block interaction
- Data grids/tables for viewing records
- Form fields with validation
- Status bars showing operation progress
APPLICATION-SPECIFIC:
- Login is at top-left corner
- Employee records are under "HR Management" > "Employees"
- Reports are generated via "Tools" > "Reports" > "Generate"
- Always click "Save" before navigating away from a form
""".strip()
```
### Budget Management
For long-running workflows, adjust budget limits:
```python
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    max_trajectory_budget=20.0,  # Increase for complex workflows
    # ... other params
)
```
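To make the idea of a spending cap concrete, here is a minimal, library-independent sketch. `BudgetTracker` and `BudgetExceeded` are hypothetical names, the per-step costs are made up, and the SDK's own accounting behind `max_trajectory_budget` may work differently.

```python
class BudgetExceeded(Exception):
    """Raised when accumulated cost passes the configured limit."""


class BudgetTracker:
    """Hypothetical helper: accumulates per-step cost against a cap."""

    def __init__(self, max_budget: float):
        self.max_budget = max_budget
        self.spent = 0.0

    def add(self, step_cost: float) -> float:
        """Record one step's cost and return the remaining budget."""
        self.spent += step_cost
        if self.spent > self.max_budget:
            raise BudgetExceeded(
                f"spent {self.spent:.2f} of {self.max_budget:.2f}"
            )
        return self.max_budget - self.spent


tracker = BudgetTracker(max_budget=1.0)
for cost in (0.4, 0.4, 0.4):  # made-up per-step costs
    try:
        tracker.add(cost)
    except BudgetExceeded as exc:
        print(f"Stopping: {exc}")
        break
```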
### Image Retention
Balance context and cost by retaining only recent screenshots:
```python
agent = ComputerAgent(
    # ...
    only_n_most_recent_images=3,  # Keep last 3 screenshots
    # ...
)
```
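Conceptually, image retention drops older screenshots from the context while keeping everything else. The sketch below illustrates that idea on a simplified message list; the `"type": "image"` item format and the `keep_recent_images` name are assumptions for illustration, not the SDK's internal representation.

```python
def keep_recent_images(messages: list[dict], n: int) -> list[dict]:
    """Return a copy of `messages` keeping only the last `n` image items.

    Items whose "type" is "image" are treated as screenshots (an assumed
    format for this sketch); all other items are always kept.
    """
    image_positions = [
        i for i, m in enumerate(messages) if m.get("type") == "image"
    ]
    # Guard n <= 0: a slice of [-0:] would keep every image.
    keep = set(image_positions[-n:]) if n > 0 else set()
    return [
        m for i, m in enumerate(messages)
        if m.get("type") != "image" or i in keep
    ]


history = [
    {"type": "image", "id": 1},
    {"type": "text", "text": "Clicked Login"},
    {"type": "image", "id": 2},
    {"type": "image", "id": 3},
]
trimmed = keep_recent_images(history, 2)
print([m.get("id", m.get("text")) for m in trimmed])
```

A smaller `n` sends fewer images per request, at the price of the model seeing less visual history.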
---
## Production Considerations
<Callout type="warn" title="Production Deployment">
For enterprise production deployments, consider these additional steps:
</Callout>
### 1. Workflow Mining
Before deploying, analyze your actual workflows:
- Record user interactions with the application
- Identify common patterns and edge cases
- Map out decision trees and validation requirements
- Document application-specific quirks and timing issues
### 2. Custom Finetuning
Create vertical-specific actions instead of generic UI automation:
```python
# Instead of generic steps:
tasks = ["Click login", "Type username", "Type password", "Click submit"]
# Create semantic actions:
tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]
```
This provides:
- Better audit trails
- Approval gates at business logic level
- Higher success rates
- Easier maintenance and updates
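Short of finetuning, a lightweight way to get named, auditable actions is a registry that expands each business-level action into the task prompts the agent should run. The sketch below is one such stand-in: `SEMANTIC_ACTIONS` and `expand_action` are hypothetical names, the `export_onboarding_report` action is made up for illustration, and the prompts are reused from the script earlier in this guide.

```python
# Hypothetical mapping from business-level actions to agent task prompts.
SEMANTIC_ACTIONS = {
    "onboard_employee": [
        "Navigate to Employee Management section",
        "Create a new employee record with information from new_hire.xlsx on desktop",
        "Verify the employee was created successfully by searching for their name",
    ],
    "export_onboarding_report": [
        "Generate an onboarding report for the new employee",
        "Export the report as PDF to the desktop",
    ],
}


def expand_action(name: str) -> list[str]:
    """Return the task prompts for a named action, or raise ValueError."""
    try:
        return list(SEMANTIC_ACTIONS[name])
    except KeyError:
        raise ValueError(f"Unknown action: {name!r}") from None


for prompt in expand_action("onboard_employee"):
    print(prompt)
```

Logging the action name alongside each prompt gives an audit trail at the business level rather than the click level.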
### 3. Human-in-the-Loop
Add approval gates for critical operations:
```python
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    # Add human approval callback for sensitive operations
    callbacks=[ApprovalCallback(require_approval_for=["payroll", "termination"])]
)
```
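The snippet above does not show how `ApprovalCallback` is defined. Independent of any SDK callback interface, the gating logic itself can be sketched in plain Python: check each task for sensitive keywords and only let it through if an approver says yes. All names here (`SENSITIVE_KEYWORDS`, `needs_approval`, `filter_approved`) are illustrative.

```python
from typing import Callable, Iterable

# Keywords mirror the require_approval_for list in the snippet above.
SENSITIVE_KEYWORDS = ("payroll", "termination")


def needs_approval(task: str, keywords: Iterable[str] = SENSITIVE_KEYWORDS) -> bool:
    """Return True if the task text mentions any sensitive keyword."""
    lowered = task.lower()
    return any(keyword in lowered for keyword in keywords)


def filter_approved(tasks: Iterable[str], approve: Callable[[str], bool]) -> list[str]:
    """Keep tasks that are either non-sensitive or explicitly approved."""
    return [t for t in tasks if not needs_approval(t) or approve(t)]


# Deny-all approver for demonstration; in practice `approve` might prompt
# a human operator or call an internal approval service.
queue = ["Run payroll for March", "Export the report as PDF to the desktop"]
print(filter_approved(queue, approve=lambda task: False))
```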
### 4. Deployment Options
Choose your deployment model:
**Managed (Recommended)**
- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
- You get UI/API endpoints for triggering workflows
- Automatic scaling, monitoring, and maintenance
- SLA guarantees and enterprise support
**Self-Hosted**
- You manage Windows VMs, VPN infrastructure, and agent deployment
- Full control over data and security
- Custom network configurations
- On-premise or your preferred cloud
---
## Troubleshooting
### VPN Connection Issues
If the agent cannot reach the application:
1. Verify VPN is connected: Check VPN client status in the Windows sandbox
2. Test network connectivity: Try pinging internal resources
3. Check firewall rules: Ensure RDP and application ports are open
4. Review VPN logs: Look for authentication or routing errors
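For step 2, a quick TCP reachability check run from inside the sandbox can tell you whether an internal host and port are reachable over the VPN. This is a minimal standard-library sketch; `can_connect` is an illustrative name, and the host and port you pass in depend on your environment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("hr-app.internal.example", 3389)` (a hypothetical hostname) returning `False` while the VPN client reports "connected" points toward a routing or firewall problem rather than an authentication one.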
### Application Not Launching
If the desktop application fails to start:
1. Verify installation: Check the application is installed in the sandbox
2. Check dependencies: Ensure all required DLLs and frameworks are present
3. Review permissions: Application may require admin rights
4. Check logs: Look for error messages in Windows Event Viewer
### UI Element Not Found
If the agent cannot find buttons or fields:
1. Increase wait times: Some applications load slowly
2. Check screen resolution: UI elements may be off-screen
3. Verify DPI scaling: High DPI settings can affect element positions
4. Update instructions: Provide more specific navigation guidance
### Cost Management
If costs are higher than expected:
1. Reduce `max_trajectory_budget`
2. Decrease `only_n_most_recent_images`
3. Use prompt caching: Set `use_prompt_caching=True`
4. Optimize task descriptions: Be more specific to reduce retry attempts
---
## Next Steps
- **Explore custom tools**: Learn how to create [custom tools](/agent-sdk/custom-tools) for application-specific actions
- **Implement callbacks**: Add [monitoring and logging](/agent-sdk/callbacks) for production workflows
- **Join community**: Get help in our [Discord](https://discord.com/invite/mVnXXpdE85)
---
## Related Examples
- [Form Filling](/example-usecases/form-filling) - Web form automation
- [Post-Event Contact Export](/example-usecases/post-event-contact-export) - Data extraction workflows
- [Custom Tools](/agent-sdk/custom-tools) - Building application-specific functions