From 5983a9b849f2bc567ca8989c53f2d7b27ee0f3a0 Mon Sep 17 00:00:00 2001 From: f-trycua Date: Mon, 17 Nov 2025 17:26:41 +0100 Subject: [PATCH] Add blogpost and doc --- blog/cloud-windows-ga-macos-preview.md | 119 ++++ docs/content/docs/example-usecases/meta.json | 2 +- .../windows-app-behind-vpn.mdx | 615 ++++++++++++++++++ 3 files changed, 735 insertions(+), 1 deletion(-) create mode 100644 blog/cloud-windows-ga-macos-preview.md create mode 100644 docs/content/docs/example-usecases/windows-app-behind-vpn.mdx diff --git a/blog/cloud-windows-ga-macos-preview.md b/blog/cloud-windows-ga-macos-preview.md new file mode 100644 index 000000000..d1024af14 --- /dev/null +++ b/blog/cloud-windows-ga-macos-preview.md @@ -0,0 +1,119 @@ +# Cloud Windows Sandboxes GA + macOS Preview + +If you've been building with our `cua` libraries, you might've hit a limitation with local computer-use sandboxes: to run agents on Windows or macOS, you need to be on that OS—Windows Sandbox for Windows, Apple Virtualization for macOS. The only cross-platform option is Linux on Docker, which limits you to virtualizing Linux environments ([see all local options here](https://cua.ai/docs/computer-sdk/computers)). + +Today the story changes - we're announcing general availability of **Cloud Windows Sandboxes** and opening early preview access for **Cloud macOS Sandboxes**. + +## Cloud Windows Sandboxes: Now GA + +![Cloud Windows Sandboxes](./assets/cloud-windows-ga.png) + +Cloud Windows Sandboxes are now generally available. You get a full Windows 11 desktop in your browser with Edge and Python pre-installed, working seamlessly with all our [Computer-Use libraries](https://github.com/trycua/cua) for RPA, UI automation, code execution, and agent development. 
+ +**What's new with this release:** +- Hot-start under 1 second +- Direct noVNC over HTTPS under our sandbox.cua.ai domain +- 3 sandbox sizes available: + +| Size | CPU | RAM | Storage | +|------|-----|-----|---------| +| Small | 2 cores | 8 GB | 128 GB SSD | +| Medium | 4 cores | 16 GB | 128 GB SSD | +| Large | 8 cores | 32 GB | 256 GB SSD | + +
+ +
+ +**Pricing:** Windows Sandboxes start at 8 credits/hour (Small), 15 credits/hour (Medium), or 31 credits/hour (Large). + +## Cloud macOS Sandboxes: Now in Preview + +Running macOS locally comes with challenges: 30GB golden images, a maximum of 2 sandboxes per host, and unpredictable compatibility issues. With Cloud macOS Sandboxes, we provision bare-metal macOS hosts (M1, M2, M4) on-demand—giving you full desktop access without the overhead of managing local sandboxes. + +![macOS Preview Waitlist](./assets/macOS-waitlist.png) + +**Preview access:** Invite-only. [Join the waitlist](https://cua.ai/macos-waitlist) if you're building agents for macOS workflows. + +## Getting Started Today + +Sign up at [cua.ai/signin](https://cua.ai/signin) and grab your API key from the dashboard. Then connect to a sandbox: + +```python +from computer import Computer + +computer = Computer( + os_type="windows", # or "macos" + provider_type="cloud", + name="my-sandbox", + api_key="your-api-key" +) + +await computer.run() +``` + +Manage existing sandboxes: + +```python +from computer.providers.cloud.provider import CloudProvider + +provider = CloudProvider(api_key="your-api-key") +async with provider: + sandboxes = await provider.list_vms() + await provider.run_vm("my-sandbox") + await provider.stop_vm("my-sandbox") +``` + +Run an agent on Windows to automate a workflow: + +```python +from agent import ComputerAgent + +agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + max_trajectory_budget=5.0 +) + +response = await agent.run( + "Open Excel, create a sales report with this month's data, and save it to the desktop" +) +``` + +## FAQs + +
+Why not just use local Windows Sandbox? + +Local Windows Sandbox resets on every restart. No persistence, no hot-start, and you need Windows Pro. Our sandboxes persist state, hot-start in under a second, and work from any OS. + +
+ +
+What happens to my work when I stop a sandbox? + +Everything persists. Files, installed software, browser profiles: it's all there when you restart. You pay only for runtime, not storage. + +
+ +
+How's the latency for UI automation? + +We run in 4 regions so you can pick what's closest. The noVNC connection is optimized for automation, not video streaming. Your agent sees crisp screenshots, not compressed video. + +
+ +
+Are there software restrictions? + +No. Full admin access on both platforms. Install whatever you need—Visual Studio, Photoshop, custom enterprise software. It's your sandbox. + +
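As a rough aid for budgeting against the hourly rates listed in the pricing line earlier in this post, here is a minimal sketch of a credit estimator. The function name `estimate_credits` and the assumption that billing scales linearly with runtime hours are illustrative only; actual billing granularity may differ.

```python
# Credits per hour by sandbox size, taken from the pricing line in this post.
CREDITS_PER_HOUR = {"small": 8, "medium": 15, "large": 31}


def estimate_credits(size: str, runtime_hours: float) -> float:
    """Estimate credits for a runtime, ASSUMING simple linear hourly billing."""
    if runtime_hours < 0:
        raise ValueError("runtime_hours must be non-negative")
    try:
        rate = CREDITS_PER_HOUR[size.lower()]
    except KeyError:
        raise ValueError(f"unknown sandbox size: {size!r}") from None
    return rate * runtime_hours
```

Under that assumption, `estimate_credits("medium", 10)` comes to 150 credits.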
+ +## Need help? + +If you hit issues getting either platform working, reach out in [Discord](https://discord.gg/cua-ai). We respond fast and fix based on what people actually use. + +--- + +Get started at [cua.ai](https://cua.ai) or [join the macOS waitlist](https://cua.ai/macos-waitlist). diff --git a/docs/content/docs/example-usecases/meta.json b/docs/content/docs/example-usecases/meta.json index c7ec3895f..bfc88f1ca 100644 --- a/docs/content/docs/example-usecases/meta.json +++ b/docs/content/docs/example-usecases/meta.json @@ -1,5 +1,5 @@ { "title": "Cookbook", "description": "Real-world examples of building with Cua", - "pages": ["form-filling", "post-event-contact-export"] + "pages": ["windows-app-behind-vpn", "form-filling", "post-event-contact-export"] } diff --git a/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx b/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx new file mode 100644 index 000000000..e8f316177 --- /dev/null +++ b/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx @@ -0,0 +1,615 @@ +--- +title: Windows App behind VPN +description: Automate legacy Windows desktop applications behind VPN with Cua +--- + +import { Step, Steps } from 'fumadocs-ui/components/steps'; +import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; + +## Overview + +This guide demonstrates how to automate Windows desktop applications (like eGecko HR/payroll systems) that run behind corporate VPN. This is a common enterprise scenario where legacy desktop applications require manual data entry, report generation, or workflow execution. 
+ +**Use cases:** +- HR/payroll processing (employee onboarding, payroll runs, benefits administration) +- Desktop ERP systems behind corporate networks +- Legacy financial applications requiring VPN access +- Compliance reporting from on-premise systems + +**Architecture:** +- Client-side Cua agent (Python SDK or Playground UI) +- Windows VM/Sandbox with VPN client configured +- RDP/remote desktop connection to target environment +- Desktop application automation via computer vision and UI control + + + **Production Deployment**: For production use, consider workflow mining and custom finetuning to create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI automation. This provides better audit trails and higher success rates. + + +--- + +## Video Demo + +
+ +
+ Demo showing Cua automating an eGecko-like desktop application on Windows behind AWS VPN +
+
+ +--- + + + + + +### Set Up Your Environment + +Install the required dependencies: + +Create a `requirements.txt` file: + +```text +cua-agent +cua-computer +python-dotenv>=1.0.0 +``` + +Install the dependencies: + +```bash +pip install -r requirements.txt +``` + +Create a `.env` file with your API keys: + +```text +ANTHROPIC_API_KEY=your-anthropic-api-key +CUA_API_KEY=sk_cua-api01... +CUA_SANDBOX_NAME=your-windows-sandbox +``` + + + + + +### Configure Windows Sandbox with VPN + + + + +For enterprise deployments, use Cua Cloud Sandbox with pre-configured VPN: + +1. Go to [cua.ai/signin](https://cua.ai/signin) +2. Navigate to **Dashboard > Containers > Create Instance** +3. Create a **Windows** sandbox (Medium or Large for desktop apps) +4. Configure VPN settings: + - Upload your AWS VPN Client configuration (`.ovpn` file) + - Or configure VPN credentials directly in the dashboard +5. Note your sandbox name and API key + +Your Windows sandbox will launch with VPN automatically connected. + + + + +For local development on Windows 10 Pro/Enterprise or Windows 11: + +1. Enable [Windows Sandbox](https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-install) +2. Install the `pywinsandbox` dependency: + ```bash + pip install -U git+https://github.com/karkason/pywinsandbox.git + ``` +3. Create a VPN setup script that runs on sandbox startup +4. Configure your desktop application installation within the sandbox + + + **Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections. + + + + + +For self-managed infrastructure: + +1. Deploy Windows VM on your preferred cloud (AWS, Azure, GCP) +2. Install and configure VPN client (AWS VPN Client, OpenVPN, etc.) +3. Install target desktop application and any dependencies +4. 
Install `cua-computer-server`: + ```bash + pip install cua-computer-server + python -m computer_server + ``` +5. Configure firewall rules to allow Cua agent connections + + + + + + + + +### Create Your Automation Script + +Create a Python file (e.g., `hr_automation.py`): + + + + +```python +import asyncio +import logging +import os +from agent import ComputerAgent +from computer import Computer, VMProviderType +from dotenv import load_dotenv + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +load_dotenv() + +async def automate_hr_workflow(): + """ + Automate HR/payroll desktop application workflow. + + This example demonstrates: + - Launching Windows desktop application + - Navigating complex desktop UI + - Data entry and form filling + - Report generation and export + """ + try: + # Connect to Windows Cloud Sandbox with VPN + async with Computer( + os_type="windows", + provider_type=VMProviderType.CLOUD, + name=os.environ["CUA_SANDBOX_NAME"], + api_key=os.environ["CUA_API_KEY"], + verbosity=logging.INFO, + ) as computer: + + # Configure agent with specialized instructions + agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + only_n_most_recent_images=3, + verbosity=logging.INFO, + trajectory_dir="trajectories", + use_prompt_caching=True, + max_trajectory_budget=10.0, + instructions=""" +You are automating a Windows desktop HR/payroll application. + +IMPORTANT GUIDELINES: +- Always wait for windows and dialogs to fully load before interacting +- Look for loading indicators and wait for them to disappear +- Verify each action by checking on-screen confirmation messages +- If a button or field is not visible, try scrolling or navigating tabs +- Desktop apps often have nested menus - explore systematically +- Save work frequently using File > Save or Ctrl+S +- Before closing, always verify changes were saved + +COMMON UI PATTERNS: +- Menu bar navigation (File, Edit, View, etc.) 
+- Ribbon interfaces with tabs +- Modal dialogs that block interaction +- Data grids/tables for viewing records +- Form fields with validation +- Status bars showing operation progress + """.strip() + ) + + # Define workflow tasks + tasks = [ + "Launch the HR application from the desktop or start menu", + "Log in with the credentials shown in credentials.txt on the desktop", + "Navigate to Employee Management section", + "Create a new employee record with information from new_hire.xlsx on desktop", + "Verify the employee was created successfully by searching for their name", + "Generate an onboarding report for the new employee", + "Export the report as PDF to the desktop", + "Log out of the application" + ] + + history = [] + + for task in tasks: + logger.info(f"\n{'='*60}") + logger.info(f"Task: {task}") + logger.info(f"{'='*60}\n") + + history.append({"role": "user", "content": task}) + + async for result in agent.run(history): + for item in result.get("output", []): + if item.get("type") == "message": + content = item.get("content", []) + for block in content: + if block.get("type") == "text": + response = block.get("text", "") + logger.info(f"Agent: {response}") + history.append({"role": "assistant", "content": response}) + + logger.info("\nTask completed. 
Moving to next task...\n") + + logger.info("\n" + "="*60) + logger.info("All tasks completed successfully!") + logger.info("="*60) + + except Exception as e: + logger.error(f"Error during automation: {e}") + import traceback + traceback.print_exc() + +if __name__ == "__main__": + asyncio.run(automate_hr_workflow()) +``` + + + + +```python +import asyncio +import logging +import os +from agent import ComputerAgent +from computer import Computer, VMProviderType +from dotenv import load_dotenv + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +load_dotenv() + +async def automate_hr_workflow(): + try: + # Connect to Windows Sandbox + async with Computer( + os_type="windows", + provider_type=VMProviderType.WINDOWS_SANDBOX, + verbosity=logging.INFO, + ) as computer: + + agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + only_n_most_recent_images=3, + verbosity=logging.INFO, + trajectory_dir="trajectories", + use_prompt_caching=True, + max_trajectory_budget=10.0, + instructions=""" +You are automating a Windows desktop HR/payroll application. 
+ +IMPORTANT GUIDELINES: +- Always wait for windows and dialogs to fully load before interacting +- Verify each action by checking on-screen confirmation messages +- Desktop apps often have nested menus - explore systematically +- Save work frequently using File > Save or Ctrl+S + """.strip() + ) + + tasks = [ + "Launch the HR application from the desktop", + "Log in with credentials from credentials.txt on desktop", + "Navigate to Employee Management and create new employee from new_hire.xlsx", + "Generate and export onboarding report as PDF", + "Log out of the application" + ] + + history = [] + + for task in tasks: + logger.info(f"\nTask: {task}") + history.append({"role": "user", "content": task}) + + async for result in agent.run(history): + for item in result.get("output", []): + if item.get("type") == "message": + content = item.get("content", []) + for block in content: + if block.get("type") == "text": + response = block.get("text", "") + logger.info(f"Agent: {response}") + history.append({"role": "assistant", "content": response}) + + logger.info("\nAll tasks completed!") + + except Exception as e: + logger.error(f"Error: {e}") + import traceback + traceback.print_exc() + +if __name__ == "__main__": + asyncio.run(automate_hr_workflow()) +``` + + + + +```python +import asyncio +import logging +import os +from agent import ComputerAgent +from computer import Computer +from dotenv import load_dotenv + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +load_dotenv() + +async def automate_hr_workflow(): + try: + # Connect to self-hosted Windows VM running computer-server + async with Computer( + use_host_computer_server=True, + base_url="http://your-windows-vm-ip:5757", # Update with your VM IP + verbosity=logging.INFO, + ) as computer: + + agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + only_n_most_recent_images=3, + verbosity=logging.INFO, + trajectory_dir="trajectories", + 
use_prompt_caching=True, + max_trajectory_budget=10.0, + instructions=""" +You are automating a Windows desktop HR/payroll application. + +IMPORTANT GUIDELINES: +- Always wait for windows and dialogs to fully load before interacting +- Verify each action by checking on-screen confirmation messages +- Save work frequently using File > Save or Ctrl+S + """.strip() + ) + + tasks = [ + "Launch the HR application", + "Log in with provided credentials", + "Complete the required HR workflow", + "Generate and export report", + "Log out" + ] + + history = [] + + for task in tasks: + logger.info(f"\nTask: {task}") + history.append({"role": "user", "content": task}) + + async for result in agent.run(history): + for item in result.get("output", []): + if item.get("type") == "message": + content = item.get("content", []) + for block in content: + if block.get("type") == "text": + response = block.get("text", "") + logger.info(f"Agent: {response}") + history.append({"role": "assistant", "content": response}) + + logger.info("\nAll tasks completed!") + + except Exception as e: + logger.error(f"Error: {e}") + import traceback + traceback.print_exc() + +if __name__ == "__main__": + asyncio.run(automate_hr_workflow()) +``` + + + + + + + + +### Run Your Automation + +Execute the script: + +```bash +python hr_automation.py +``` + +The agent will: +1. Connect to your Windows environment (with VPN if configured) +2. Launch and navigate the desktop application +3. Execute each workflow step sequentially +4. Verify actions and handle errors +5. Save trajectory logs for audit and debugging + +Monitor the console output to see the agent's progress through each task. + + + + + +--- + +## Key Configuration Options + +### Agent Instructions + +The `instructions` parameter is critical for reliable desktop automation: + +```python +instructions=""" +You are automating a Windows desktop HR/payroll application. 
+ +IMPORTANT GUIDELINES: +- Always wait for windows and dialogs to fully load before interacting +- Look for loading indicators and wait for them to disappear +- Verify each action by checking on-screen confirmation messages +- If a button or field is not visible, try scrolling or navigating tabs +- Desktop apps often have nested menus - explore systematically +- Save work frequently using File > Save or Ctrl+S +- Before closing, always verify changes were saved + +COMMON UI PATTERNS: +- Menu bar navigation (File, Edit, View, etc.) +- Ribbon interfaces with tabs +- Modal dialogs that block interaction +- Data grids/tables for viewing records +- Form fields with validation +- Status bars showing operation progress + +APPLICATION-SPECIFIC: +- Login is at top-left corner +- Employee records are under "HR Management" > "Employees" +- Reports are generated via "Tools" > "Reports" > "Generate" +- Always click "Save" before navigating away from a form +""".strip() +``` + +### Budget Management + +For long-running workflows, adjust budget limits: + +```python +agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + max_trajectory_budget=20.0, # Increase for complex workflows + # ... other params +) +``` + +### Image Retention + +Balance context and cost by retaining only recent screenshots: + +```python +agent = ComputerAgent( + # ... + only_n_most_recent_images=3, # Keep last 3 screenshots + # ... +) +``` + +--- + +## Production Considerations + + + For enterprise production deployments, consider these additional steps: + + +### 1. Workflow Mining + +Before deploying, analyze your actual workflows: +- Record user interactions with the application +- Identify common patterns and edge cases +- Map out decision trees and validation requirements +- Document application-specific quirks and timing issues + +### 2. 
Custom Finetuning + +Create vertical-specific actions instead of generic UI automation: + +```python +# Instead of generic steps: +tasks = ["Click login", "Type username", "Type password", "Click submit"] + +# Create semantic actions: +tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"] +``` + +This provides: +- Better audit trails +- Approval gates at business logic level +- Higher success rates +- Easier maintenance and updates + +### 3. Human-in-the-Loop + +Add approval gates for critical operations: + +```python +agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + # Add human approval callback for sensitive operations + callbacks=[ApprovalCallback(require_approval_for=["payroll", "termination"])] +) +``` + +### 4. Deployment Options + +Choose your deployment model: + +**Managed (Recommended)** +- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime +- You get UI/API endpoints for triggering workflows +- Automatic scaling, monitoring, and maintenance +- SLA guarantees and enterprise support + +**Self-Hosted** +- You manage Windows VMs, VPN infrastructure, and agent deployment +- Full control over data and security +- Custom network configurations +- On-premise or your preferred cloud + +--- + +## Troubleshooting + +### VPN Connection Issues + +If the agent cannot reach the application: + +1. Verify VPN is connected: Check VPN client status in the Windows sandbox +2. Test network connectivity: Try pinging internal resources +3. Check firewall rules: Ensure RDP and application ports are open +4. Review VPN logs: Look for authentication or routing errors + +### Application Not Launching + +If the desktop application fails to start: + +1. Verify installation: Check the application is installed in the sandbox +2. Check dependencies: Ensure all required DLLs and frameworks are present +3. Review permissions: Application may require admin rights +4. 
Check logs: Look for error messages in Windows Event Viewer + +### UI Element Not Found + +If the agent cannot find buttons or fields: + +1. Increase wait times: Some applications load slowly +2. Check screen resolution: UI elements may be off-screen +3. Verify DPI scaling: High DPI settings can affect element positions +4. Update instructions: Provide more specific navigation guidance + +### Cost Management + +If costs are higher than expected: + +1. Reduce `max_trajectory_budget` +2. Decrease `only_n_most_recent_images` +3. Use prompt caching: Set `use_prompt_caching=True` +4. Optimize task descriptions: Be more specific to reduce retry attempts + +--- + +## Next Steps + +- **Explore custom tools**: Learn how to create [custom tools](/agent-sdk/custom-tools) for application-specific actions +- **Implement callbacks**: Add [monitoring and logging](/agent-sdk/callbacks) for production workflows +- **Join community**: Get help in our [Discord](https://discord.com/invite/mVnXXpdE85) + +--- + +## Related Examples + +- [Form Filling](/example-usecases/form-filling) - Web form automation +- [Post-Event Contact Export](/example-usecases/post-event-contact-export) - Data extraction workflows +- [Custom Tools](/agent-sdk/custom-tools) - Building application-specific functions
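
For the "Test network connectivity" step under VPN Connection Issues above, a small script run inside the Windows sandbox can confirm whether internal services are reachable over the VPN. The sketch below uses a plain TCP connection attempt rather than an ICMP ping (ping is often blocked on corporate networks). The helper names `can_reach` and `check_targets` are illustrative and not part of the Cua SDK.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_targets(
    targets: list[tuple[str, int]], timeout: float = 3.0
) -> dict[str, bool]:
    """Check each (host, port) pair and map "host:port" to reachability."""
    return {
        f"{host}:{port}": can_reach(host, port, timeout)
        for host, port in targets
    }
```

A call such as `check_targets([("hr-app.internal.example", 443)])` (the hostname is a placeholder for your own internal service) returns a dictionary of results. A `False` entry for a host that should be reachable points to a VPN routing or firewall problem rather than an agent problem.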