Add blogpost and doc

2026-05-12 11:29:41 -05:00 · 2025-11-17 17:26:41 +01:00
parent 2b595f5de8
commit 5983a9b849
3 changed files with 735 additions and 1 deletions
@@ -0,0 +1,119 @@
+# Cloud Windows Sandboxes GA + macOS Preview
+
+If you've been building with our `cua` libraries, you might have hit a limitation of local computer-use sandboxes: to run agents on Windows or macOS, you need to be on that OS (Windows Sandbox for Windows, Apple Virtualization for macOS). The only cross-platform option is Linux on Docker, which limits you to virtualizing Linux environments ([see all local options here](https://cua.ai/docs/computer-sdk/computers)).
+
+Today that changes: we're announcing general availability of **Cloud Windows Sandboxes** and opening early preview access for **Cloud macOS Sandboxes**.
+
+## Cloud Windows Sandboxes: Now GA
+
+![Cloud Windows Sandboxes](./assets/cloud-windows-ga.png)
+
+Cloud Windows Sandboxes are now generally available. You get a full Windows 11 desktop in your browser, with Edge and Python pre-installed, that works with all our [Computer-Use libraries](https://github.com/trycua/cua) for RPA, UI automation, code execution, and agent development.
+
+**What's new with this release:**
+- Hot-start under 1 second
+- Direct noVNC over HTTPS under our sandbox.cua.ai domain
+- 3 sandbox sizes available:
+
+| Size | CPU | RAM | Storage |
+|------|-----|-----|---------|
+| Small | 2 cores | 8 GB | 128 GB SSD |
+| Medium | 4 cores | 16 GB | 128 GB SSD |
+| Large | 8 cores | 32 GB | 256 GB SSD |
+
+<div align="center">
+  <video src="./assets/demo_wsb.mp4" width="600" controls></video>
+</div>
+
+**Pricing:** Windows Sandboxes are billed at 8 credits/hour (Small), 15 credits/hour (Medium), or 31 credits/hour (Large).
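
To budget a workload, multiply the hourly rate by the hours each sandbox runs. Below is a minimal sketch using the rates above. It assumes billing scales linearly with runtime, including fractional hours, which you should confirm in your dashboard. `estimate_credits` is a hypothetical helper, not part of the Cua SDK.

```python
# Hourly credit rates for Cloud Windows Sandboxes, taken from the pricing above.
WINDOWS_CREDITS_PER_HOUR = {"small": 8, "medium": 15, "large": 31}


def estimate_credits(size: str, hours: float, sandboxes: int = 1) -> float:
    """Estimate credits for running `sandboxes` sandboxes of `size` for `hours`.

    Hypothetical helper. Assumes linear per-hour billing and counts only
    runtime, since stopped sandboxes are not billed for storage.
    """
    rate = WINDOWS_CREDITS_PER_HOUR[size.lower()]
    if hours < 0 or sandboxes < 0:
        raise ValueError("hours and sandboxes must be non-negative")
    return rate * hours * sandboxes


# A Medium sandbox running for an 8-hour workday: 15 * 8 = 120 credits.
print(estimate_credits("medium", 8))
```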
+
+## Cloud macOS Sandboxes: Now in Preview
+
+Running macOS locally comes with challenges: 30 GB golden images, a maximum of 2 sandboxes per host, and unpredictable compatibility issues. With Cloud macOS Sandboxes, we provision bare-metal macOS hosts (M1, M2, M4) on demand. You get full desktop access without the overhead of managing local sandboxes.
+
+![macOS Preview Waitlist](./assets/macOS-waitlist.png)
+
+**Preview access:** Invite-only. [Join the waitlist](https://cua.ai/macos-waitlist) if you're building agents for macOS workflows.
+
+## Getting Started Today
+
+Sign up at [cua.ai/signin](https://cua.ai/signin) and grab your API key from the dashboard. Then connect to a sandbox:
+
+```python
+from computer import Computer
+
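+computer = Computer(
+    os_type="windows",      # or "macos" (preview, invite-only)
+    provider_type="cloud",
+    name="my-sandbox",      # sandbox name from the dashboard
+    api_key="your-api-key"  # API key from the dashboard
+)
+
+# Top-level await works in Jupyter or `python -m asyncio`. In a plain script,
+# put these calls inside an async function and run it with asyncio.run().
+await computer.run()  # connect to the cloud sandbox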
+```
+
+Manage existing sandboxes:
+
+```python
+from computer.providers.cloud.provider import CloudProvider
+
+provider = CloudProvider(api_key="your-api-key")
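+async with provider:  # needs an async context, e.g. Jupyter or `python -m asyncio`
+    sandboxes = await provider.list_vms()  # list sandboxes on your account
+    await provider.run_vm("my-sandbox")    # start a sandbox
+    await provider.stop_vm("my-sandbox")   # stop it; files and installed software persist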
+```
+
+Run an agent on Windows to automate a workflow:
+
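+```python
+from agent import ComputerAgent
+
+agent = ComputerAgent(
+    model="anthropic/claude-sonnet-4-5-20250929",
+    tools=[computer],            # the Computer connected above
+    max_trajectory_budget=5.0    # spending cap for this run
+)
+
+# agent.run() is an async generator, so iterate over it to stream results
+async for result in agent.run(
+    "Open Excel, create a sales report with this month's data, and save it to the desktop"
+):
+    for item in result.get("output", []):
+        if item.get("type") == "message":
+            for block in item.get("content", []):
+                if block.get("type") == "text":
+                    print(block.get("text"))  # the agent's text replies
+```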
+
+## FAQs
+
+<details>
+<summary><strong>Why not just use local Windows Sandbox?</strong></summary>
+
+Local Windows Sandbox discards everything when it closes: no persistence, no hot-start, and it requires Windows Pro, Enterprise, or Education. Our sandboxes persist state, hot-start in under a second, and work from any OS.
+
+</details>
+
+<details>
+<summary><strong>What happens to my work when I stop a sandbox?</strong></summary>
+
+Everything persists. Files, installed software, and browser profiles are all still there when you restart. You pay only for runtime, not storage.
+
+</details>
+
+<details>
+<summary><strong>How's the latency for UI automation?</strong></summary>
+
+We run in 4 regions so you can pick what's closest. The noVNC connection is optimized for automation, not video streaming. Your agent sees crisp screenshots, not compressed video.
+
+</details>
+
+<details>
+<summary><strong>Are there software restrictions?</strong></summary>
+
+No. You get full admin access on both platforms. Install whatever you need: Visual Studio, Photoshop, custom enterprise software. It's your sandbox.
+
+</details>
+
+## Need help?
+
+If you hit issues getting either platform working, reach out on [Discord](https://discord.gg/cua-ai). We respond quickly and prioritize fixes based on what people actually use.
+
+---
+
+Get started at [cua.ai](https://cua.ai) or [join the macOS waitlist](https://cua.ai/macos-waitlist).
@@ -1,5 +1,5 @@
 {
  "title": "Cookbook",
  "description": "Real-world examples of building with Cua",
-  "pages": ["form-filling", "post-event-contact-export"]
+  "pages": ["windows-app-behind-vpn", "form-filling", "post-event-contact-export"]
 }
@@ -0,0 +1,615 @@
+---
+title: Windows App behind VPN
+description: Automate legacy Windows desktop applications behind VPN with Cua
+---
+
+import { Step, Steps } from 'fumadocs-ui/components/steps';
+import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
+
+## Overview
+
+This guide demonstrates how to automate Windows desktop applications (such as eGecko HR/payroll systems) that run behind a corporate VPN. This is a common enterprise scenario: legacy desktop applications that require manual data entry, report generation, or workflow execution.
+
+**Use cases:**
+- HR/payroll processing (employee onboarding, payroll runs, benefits administration)
+- Desktop ERP systems behind corporate networks
+- Legacy financial applications requiring VPN access
+- Compliance reporting from on-premise systems
+
+**Architecture:**
+- Client-side Cua agent (Python SDK or Playground UI)
+- Windows VM/Sandbox with VPN client configured
+- RDP/remote desktop connection to target environment
+- Desktop application automation via computer vision and UI control
+
+<Callout type="info">
+  **Production Deployment**: For production use, consider workflow mining and custom finetuning to create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI automation. This provides better audit trails and higher success rates.
+</Callout>
+
+---
+
+## Video Demo
+
+<div className="rounded-lg border bg-card text-card-foreground shadow-sm p-4 mb-6">
+  <video src="https://github.com/user-attachments/assets/7abbdaf4-054f-4965-8260-81dab497c6ba" controls className="w-full rounded">
+    Your browser does not support the video tag.
+  </video>
+  <div className="text-sm text-muted-foreground mt-2">
+    Demo showing Cua automating an eGecko-like desktop application on Windows behind AWS VPN
+  </div>
+</div>
+
+---
+
+<Steps>
+
+<Step>
+
+### Set Up Your Environment
+
+Create a `requirements.txt` file listing the required dependencies:
+
+```text
+cua-agent
+cua-computer
+python-dotenv>=1.0.0
+```
+
+Install the dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
+Create a `.env` file with your API keys:
+
+```text
+ANTHROPIC_API_KEY=your-anthropic-api-key
+CUA_API_KEY=sk_cua-api01...
+CUA_SANDBOX_NAME=your-windows-sandbox
+```
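
It helps to fail fast on a missing key before the script tries to connect to a sandbox. Below is a minimal sketch. It assumes you call `load_dotenv()` first so the `.env` values are already in `os.environ`. `check_env` and `REQUIRED_VARS` are hypothetical names, not part of the Cua SDK.

```python
import os
from typing import Mapping, Optional, Sequence

# Variables read by the Cloud Sandbox script in this guide (hypothetical constant).
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "CUA_API_KEY", "CUA_SANDBOX_NAME")


def check_env(
    required: Sequence[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Raise RuntimeError naming every required variable that is unset or empty.

    Hypothetical helper. Call it after python-dotenv's load_dotenv() so values
    from .env are visible in os.environ.
    """
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

The Windows Sandbox and Self-Hosted variants below do not read `CUA_API_KEY` or `CUA_SANDBOX_NAME`, so pass `required=("ANTHROPIC_API_KEY",)` there.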
+
+</Step>
+
+<Step>
+
+### Configure Windows Sandbox with VPN
+
+<Tabs items={['Cloud Sandbox (Recommended)', 'Windows Sandbox', 'Self-Hosted VM']}>
+  <Tab value="Cloud Sandbox (Recommended)">
+
+For enterprise deployments, use a Cua Cloud Sandbox with a pre-configured VPN:
+
+1. Go to [cua.ai/signin](https://cua.ai/signin)
+2. Navigate to **Dashboard > Containers > Create Instance**
+3. Create a **Windows** sandbox (Medium or Large for desktop apps)
+4. Configure VPN settings:
+   - Upload your AWS VPN Client configuration (`.ovpn` file)
+   - Or configure VPN credentials directly in the dashboard
+5. Note your sandbox name and API key
+
+Your Windows sandbox will launch with VPN automatically connected.
+
+  </Tab>
+  <Tab value="Windows Sandbox">
+
+For local development on Windows 10 or Windows 11 (Pro, Enterprise, or Education edition):
+
+1. Enable [Windows Sandbox](https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-install)
+2. Install the `pywinsandbox` dependency:
+   ```bash
+   pip install -U git+https://github.com/karkason/pywinsandbox.git
+   ```
+3. Create a VPN setup script that runs on sandbox startup
+4. Configure your desktop application installation within the sandbox
+
+<Callout type="warn">
+  **Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.
+</Callout>
+
+  </Tab>
+  <Tab value="Self-Hosted VM">
+
+For self-managed infrastructure:
+
+1. Deploy a Windows VM on your preferred cloud (AWS, Azure, GCP)
+2. Install and configure a VPN client (AWS VPN Client, OpenVPN, etc.)
+3. Install the target desktop application and any dependencies
+4. Install `cua-computer-server`:
+   ```bash
+   pip install cua-computer-server
+   python -m computer_server
+   ```
+5. Configure firewall rules to allow inbound connections from the machine running the Cua agent to the `computer_server` port
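
Once `computer_server` is running and the firewall rule is in place, you can confirm from the agent machine that the server port accepts TCP connections before you start the agent. Below is a minimal standard-library sketch. The host is a placeholder, and port 5757 mirrors the `base_url` in the Self-Hosted script, so adjust both to your setup. `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper. Returns True once the port accepts a connection and
    False on timeout. A successful connect proves only that the port is
    reachable, not that the service behind it is healthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example with placeholder values:
# wait_for_port("your-windows-vm-ip", 5757)
```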
+
+  </Tab>
+</Tabs>
+
+</Step>
+
+<Step>
+
+### Create Your Automation Script
+
+Create a Python file (e.g., `hr_automation.py`):
+
+<Tabs items={['Cloud Sandbox', 'Windows Sandbox', 'Self-Hosted']}>
+  <Tab value="Cloud Sandbox">
+
+```python
+import asyncio
+import logging
+import os
+from agent import ComputerAgent
+from computer import Computer, VMProviderType
+from dotenv import load_dotenv
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+load_dotenv()
+
+async def automate_hr_workflow():
+    """
+    Automate HR/payroll desktop application workflow.
+
+    This example demonstrates:
+    - Launching a Windows desktop application
+    - Navigating a complex desktop UI
+    - Data entry and form filling
+    - Report generation and export
+    """
+    try:
+        # Connect to Windows Cloud Sandbox with VPN
+        async with Computer(
+            os_type="windows",
+            provider_type=VMProviderType.CLOUD,
+            name=os.environ["CUA_SANDBOX_NAME"],
+            api_key=os.environ["CUA_API_KEY"],
+            verbosity=logging.INFO,
+        ) as computer:
+
+            # Configure agent with specialized instructions
+            agent = ComputerAgent(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                tools=[computer],
+                only_n_most_recent_images=3,
+                verbosity=logging.INFO,
+                trajectory_dir="trajectories",
+                use_prompt_caching=True,
+                max_trajectory_budget=10.0,
+                instructions="""
+You are automating a Windows desktop HR/payroll application.
+
+IMPORTANT GUIDELINES:
+- Always wait for windows and dialogs to fully load before interacting
+- Look for loading indicators and wait for them to disappear
+- Verify each action by checking on-screen confirmation messages
+- If a button or field is not visible, try scrolling or navigating tabs
+- Desktop apps often have nested menus - explore systematically
+- Save work frequently using File > Save or Ctrl+S
+- Before closing, always verify changes were saved
+
+COMMON UI PATTERNS:
+- Menu bar navigation (File, Edit, View, etc.)
+- Ribbon interfaces with tabs
+- Modal dialogs that block interaction
+- Data grids/tables for viewing records
+- Form fields with validation
+- Status bars showing operation progress
+                """.strip()
+            )
+
+            # Define workflow tasks
+            tasks = [
+                "Launch the HR application from the desktop or start menu",
+                "Log in with the credentials shown in credentials.txt on the desktop",
+                "Navigate to Employee Management section",
+                "Create a new employee record with information from new_hire.xlsx on desktop",
+                "Verify the employee was created successfully by searching for their name",
+                "Generate an onboarding report for the new employee",
+                "Export the report as PDF to the desktop",
+                "Log out of the application"
+            ]
+
+            history = []
+
+            for task in tasks:
+                logger.info(f"\n{'='*60}")
+                logger.info(f"Task: {task}")
+                logger.info(f"{'='*60}\n")
+
+                history.append({"role": "user", "content": task})
+
+                async for result in agent.run(history):
+                    for item in result.get("output", []):
+                        if item.get("type") == "message":
+                            content = item.get("content", [])
+                            for block in content:
+                                if block.get("type") == "text":
+                                    response = block.get("text", "")
+                                    logger.info(f"Agent: {response}")
+                                    history.append({"role": "assistant", "content": response})
+
+                logger.info("\nTask completed. Moving to next task...\n")
+
+            logger.info("\n" + "="*60)
+            logger.info("All tasks completed successfully!")
+            logger.info("="*60)
+
+    except Exception as e:
+        logger.error(f"Error during automation: {e}")
+        import traceback
+        traceback.print_exc()
+
+if __name__ == "__main__":
+    asyncio.run(automate_hr_workflow())
+```
+
+  </Tab>
+  <Tab value="Windows Sandbox">
+
+```python
+import asyncio
+import logging
+import os
+from agent import ComputerAgent
+from computer import Computer, VMProviderType
+from dotenv import load_dotenv
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+load_dotenv()
+
+async def automate_hr_workflow():
+    try:
+        # Connect to Windows Sandbox
+        async with Computer(
+            os_type="windows",
+            provider_type=VMProviderType.WINDOWS_SANDBOX,
+            verbosity=logging.INFO,
+        ) as computer:
+
+            agent = ComputerAgent(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                tools=[computer],
+                only_n_most_recent_images=3,
+                verbosity=logging.INFO,
+                trajectory_dir="trajectories",
+                use_prompt_caching=True,
+                max_trajectory_budget=10.0,
+                instructions="""
+You are automating a Windows desktop HR/payroll application.
+
+IMPORTANT GUIDELINES:
+- Always wait for windows and dialogs to fully load before interacting
+- Verify each action by checking on-screen confirmation messages
+- Desktop apps often have nested menus - explore systematically
+- Save work frequently using File > Save or Ctrl+S
+                """.strip()
+            )
+
+            tasks = [
+                "Launch the HR application from the desktop",
+                "Log in with credentials from credentials.txt on desktop",
+                "Navigate to Employee Management and create new employee from new_hire.xlsx",
+                "Generate and export onboarding report as PDF",
+                "Log out of the application"
+            ]
+
+            history = []
+
+            for task in tasks:
+                logger.info(f"\nTask: {task}")
+                history.append({"role": "user", "content": task})
+
+                async for result in agent.run(history):
+                    for item in result.get("output", []):
+                        if item.get("type") == "message":
+                            content = item.get("content", [])
+                            for block in content:
+                                if block.get("type") == "text":
+                                    response = block.get("text", "")
+                                    logger.info(f"Agent: {response}")
+                                    history.append({"role": "assistant", "content": response})
+
+            logger.info("\nAll tasks completed!")
+
+    except Exception as e:
+        logger.error(f"Error: {e}")
+        import traceback
+        traceback.print_exc()
+
+if __name__ == "__main__":
+    asyncio.run(automate_hr_workflow())
+```
+
+  </Tab>
+  <Tab value="Self-Hosted">
+
+```python
+import asyncio
+import logging
+import os
+from agent import ComputerAgent
+from computer import Computer
+from dotenv import load_dotenv
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+load_dotenv()
+
+async def automate_hr_workflow():
+    try:
+        # Connect to self-hosted Windows VM running computer-server
+        async with Computer(
+            use_host_computer_server=True,
+            base_url="http://your-windows-vm-ip:5757",  # Update with your VM IP
+            verbosity=logging.INFO,
+        ) as computer:
+
+            agent = ComputerAgent(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                tools=[computer],
+                only_n_most_recent_images=3,
+                verbosity=logging.INFO,
+                trajectory_dir="trajectories",
+                use_prompt_caching=True,
+                max_trajectory_budget=10.0,
+                instructions="""
+You are automating a Windows desktop HR/payroll application.
+
+IMPORTANT GUIDELINES:
+- Always wait for windows and dialogs to fully load before interacting
+- Verify each action by checking on-screen confirmation messages
+- Save work frequently using File > Save or Ctrl+S
+                """.strip()
+            )
+
+            tasks = [
+                "Launch the HR application",
+                "Log in with provided credentials",
+                "Complete the required HR workflow",
+                "Generate and export report",
+                "Log out"
+            ]
+
+            history = []
+
+            for task in tasks:
+                logger.info(f"\nTask: {task}")
+                history.append({"role": "user", "content": task})
+
+                async for result in agent.run(history):
+                    for item in result.get("output", []):
+                        if item.get("type") == "message":
+                            content = item.get("content", [])
+                            for block in content:
+                                if block.get("type") == "text":
+                                    response = block.get("text", "")
+                                    logger.info(f"Agent: {response}")
+                                    history.append({"role": "assistant", "content": response})
+
+            logger.info("\nAll tasks completed!")
+
+    except Exception as e:
+        logger.error(f"Error: {e}")
+        import traceback
+        traceback.print_exc()
+
+if __name__ == "__main__":
+    asyncio.run(automate_hr_workflow())
+```
+
+  </Tab>
+</Tabs>
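
All three scripts repeat the same nested loop to pull the agent's text replies out of each streamed result. Below is a minimal sketch of that loop as a pure, testable helper. It assumes only the `output` → `message` → `content` → `text` shape that the scripts above already rely on. `extract_text` is a hypothetical name, not a Cua API.

```python
from typing import Any, Dict, List


def extract_text(result: Dict[str, Any]) -> List[str]:
    """Return the text blocks from every message item in one agent.run() result.

    Hypothetical helper. Items of other types (for example, computer actions)
    and non-text blocks are skipped, matching the inline loops above.
    """
    texts: List[str] = []
    for item in result.get("output", []):
        if item.get("type") != "message":
            continue
        for block in item.get("content", []):
            if block.get("type") == "text":
                texts.append(block.get("text", ""))
    return texts


# Usage inside the task loop of the scripts above:
# async for result in agent.run(history):
#     for response in extract_text(result):
#         logger.info(f"Agent: {response}")
#         history.append({"role": "assistant", "content": response})
```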
+
+</Step>
+
+<Step>
+
+### Run Your Automation
+
+Execute the script:
+
+```bash
+python hr_automation.py
+```
+
+The agent will:
+1. Connect to your Windows environment (over VPN, if configured)
+2. Launch and navigate the desktop application
+3. Execute each workflow step sequentially
+4. Check the result of each action and recover from errors where it can
+5. Save trajectory logs to the `trajectories` directory (set by `trajectory_dir`) for auditing and debugging
+
+Monitor the console output to see the agent's progress through each task.
+
+</Step>
+
+</Steps>
+
+---
+
+## Key Configuration Options
+
+### Agent Instructions
+
+The `instructions` parameter is critical for reliable desktop automation:
+
+```python
+instructions="""
+You are automating a Windows desktop HR/payroll application.
+
+IMPORTANT GUIDELINES:
+- Always wait for windows and dialogs to fully load before interacting
+- Look for loading indicators and wait for them to disappear
+- Verify each action by checking on-screen confirmation messages
+- If a button or field is not visible, try scrolling or navigating tabs
+- Desktop apps often have nested menus - explore systematically
+- Save work frequently using File > Save or Ctrl+S
+- Before closing, always verify changes were saved
+
+COMMON UI PATTERNS:
+- Menu bar navigation (File, Edit, View, etc.)
+- Ribbon interfaces with tabs
+- Modal dialogs that block interaction
+- Data grids/tables for viewing records
+- Form fields with validation
+- Status bars showing operation progress
+
+APPLICATION-SPECIFIC:
+- Login is at top-left corner
+- Employee records are under "HR Management" > "Employees"
+- Reports are generated via "Tools" > "Reports" > "Generate"
+- Always click "Save" before navigating away from a form
+""".strip()
+```
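
The three scripts above share most of their guidelines. One option is to keep the general desktop guidance in one place and append only the application-specific section for each app. Below is a minimal sketch. `BASE_GUIDELINES` and `build_instructions` are hypothetical names, and the guideline text is abbreviated from the example above.

```python
from typing import List, Optional

# General guidance shared by every desktop workflow (abbreviated from the example above).
BASE_GUIDELINES = """IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Verify each action by checking on-screen confirmation messages
- Save work frequently using File > Save or Ctrl+S"""


def build_instructions(app_description: str, app_specific: Optional[List[str]] = None) -> str:
    """Compose agent instructions from a one-line app description, the shared
    guidelines, and an optional APPLICATION-SPECIFIC section (hypothetical helper)."""
    sections = [f"You are automating {app_description}.", BASE_GUIDELINES]
    if app_specific:
        hints = "\n".join(f"- {hint}" for hint in app_specific)
        sections.append(f"APPLICATION-SPECIFIC:\n{hints}")
    # Separate sections with a blank line, as in the instructions string above.
    return "\n\n".join(sections)


instructions = build_instructions(
    "a Windows desktop HR/payroll application",
    [
        'Employee records are under "HR Management" > "Employees"',
        'Always click "Save" before navigating away from a form',
    ],
)
```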
+
+### Budget Management
+
+For long-running workflows, raise `max_trajectory_budget`, which caps how much a single run can spend on model calls:
+
+```python
+agent = ComputerAgent(
+    model="anthropic/claude-sonnet-4-5-20250929",
+    tools=[computer],
+    max_trajectory_budget=20.0,  # Increase for complex workflows
+    # ... other params
+)
+```
+
+### Image Retention
+
+Balance context and cost by retaining only recent screenshots:
+
+```python
+agent = ComputerAgent(
+    # ...
+    only_n_most_recent_images=3,  # Keep last 3 screenshots
+    # ...
+)
+```
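
Conceptually, image retention keeps the newest N screenshots in the conversation and drops older ones, so screenshot tokens stay bounded as a run gets longer. Below is a minimal sketch of that idea. It is not Cua's implementation. It assumes a simplified history in which screenshots are items with `"type": "image"`, whereas the SDK's real message format differs. `keep_recent_images` is a hypothetical helper.

```python
from typing import Any, Dict, List


def keep_recent_images(history: List[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Return a copy of `history` that keeps only the last `n` image items.

    Illustration only (hypothetical helper). Non-image items are always kept,
    in their original order.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    image_positions = [i for i, item in enumerate(history) if item.get("type") == "image"]
    # Drop every image except the newest n.
    drop = set(image_positions[: max(len(image_positions) - n, 0)])
    return [item for i, item in enumerate(history) if i not in drop]
```

Lowering N reduces cost, but it also gives the agent less visual history for checking whether earlier steps succeeded.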
+
+---
+
+## Production Considerations
+
+<Callout type="warn" title="Production Deployment">
+  For enterprise production deployments, consider these additional steps:
+</Callout>
+
+### 1. Workflow Mining
+
+Before deploying, analyze your actual workflows:
+- Record user interactions with the application
+- Identify common patterns and edge cases
+- Map out decision trees and validation requirements
+- Document application-specific quirks and timing issues
+
+### 2. Custom Finetuning
+
+Create vertical-specific actions instead of generic UI automation:
+
+```python
+# Instead of generic steps:
+tasks = ["Click login", "Type username", "Type password", "Click submit"]
+
+# Semantic actions (each backed by finetuning or a custom tool):
+tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]
+```
+
+This provides:
+- Better audit trails
+- Approval gates at business logic level
+- Higher success rates
+- Easier maintenance and updates
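
Before any finetuning, you can approximate semantic actions with a registry. It maps each business-level action to the UI steps the agent runs and records an audit entry for every action requested. Below is a minimal sketch. The action names come from the example above, but the step lists, `SEMANTIC_ACTIONS`, `AUDIT_LOG`, and `expand_action` are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical mapping from business-level actions to agent sub-tasks.
SEMANTIC_ACTIONS: Dict[str, List[str]] = {
    "onboard_employee": [
        "Navigate to Employee Management section",
        "Create a new employee record with information from new_hire.xlsx on desktop",
        "Verify the employee was created successfully by searching for their name",
    ],
    "generate_compliance_report": [
        "Open the compliance report screen",
        "Export the report as PDF to the desktop",
    ],
}

# In-memory audit trail; a real deployment would persist this.
AUDIT_LOG: List[dict] = []


def expand_action(name: str) -> List[str]:
    """Return a copy of the sub-tasks for a semantic action and log the request."""
    if name not in SEMANTIC_ACTIONS:
        raise KeyError(f"Unknown semantic action: {name}")
    steps = list(SEMANTIC_ACTIONS[name])
    AUDIT_LOG.append({
        "action": name,
        "steps": len(steps),
        "requested_at": datetime.now(timezone.utc).isoformat(),
    })
    return steps
```

You can feed the returned list into the same per-task loop as the `tasks` list in the scripts above. Audits and approvals then attach to the action name rather than to individual clicks.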
+
+### 3. Human-in-the-Loop
+
+Add approval gates for critical operations:
+
+```python
+agent = ComputerAgent(
+    model="anthropic/claude-sonnet-4-5-20250929",
+    tools=[computer],
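+    # ApprovalCallback is a placeholder for a callback you implement yourself
+    # (it is not imported here); it should pause for human sign-off on sensitive steps
+    callbacks=[ApprovalCallback(require_approval_for=["payroll", "termination"])]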
+)
+```
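
Whatever callback mechanism you use, the gate itself can be a small check that holds back sensitive tasks until a person approves them. Below is a minimal standard-library sketch of that logic, meant to run just before each task is sent to the agent. Keyword matching and the `approve` function are assumptions, and the sketch does not use any Cua callback API. All names are hypothetical.

```python
from typing import Callable, Sequence


def requires_approval(task: str, keywords: Sequence[str]) -> bool:
    """True if the task text mentions any sensitive keyword (case-insensitive)."""
    text = task.lower()
    return any(keyword.lower() in text for keyword in keywords)


def check_task(task: str, keywords: Sequence[str], approve: Callable[[str], bool]) -> bool:
    """Return True if the task may run. Non-sensitive tasks pass automatically;
    sensitive ones run only if approve(task) returns True."""
    return not requires_approval(task, keywords) or approve(task)


def ask_on_console(task: str) -> bool:
    """Simple interactive approver: a human types y to allow the task."""
    return input(f"Approve sensitive task '{task}'? [y/N] ").strip().lower() == "y"


# Inside the task loop of the scripts above:
# for task in tasks:
#     if not check_task(task, ["payroll", "termination"], ask_on_console):
#         logger.warning(f"Stopped: '{task}' was not approved")
#         break
#     ...
```

Checking inside the loop, rather than approving the whole list up front, lets the reviewer decide while the sandbox is in its current state.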
+
+### 4. Deployment Options
+
+Choose your deployment model:
+
+**Managed (Recommended)**
+- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
+- You get UI/API endpoints for triggering workflows
+- Automatic scaling, monitoring, and maintenance
+- SLA guarantees and enterprise support
+
+**Self-Hosted**
+- You manage Windows VMs, VPN infrastructure, and agent deployment
+- Full control over data and security
+- Custom network configurations
+- On-premise or your preferred cloud
+
+---
+
+## Troubleshooting
+
+### VPN Connection Issues
+
+If the agent cannot reach the application:
+
+1. Verify VPN is connected: Check VPN client status in the Windows sandbox
+2. Test network connectivity: Try pinging internal resources
+3. Check firewall rules: Ensure RDP and application ports are open
+4. Review VPN logs: Look for authentication or routing errors
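
The VPN client may report a connection while the application still cannot reach its server. In that case, check whether internal hostnames resolve inside the sandbox. One possible cause is that the VPN's DNS settings are not applied. Below is a minimal sketch that uses Python, which comes pre-installed on Cua's Windows sandboxes, and only the standard library. `resolve_hosts` is a hypothetical helper, and any hostnames you pass are your own internal servers.

```python
import socket
from typing import Dict, Iterable, List, Union


def resolve_hosts(hostnames: Iterable[str]) -> Dict[str, Union[List[str], str]]:
    """Map each hostname to its resolved IP addresses, or to an error message.

    Hypothetical helper; run it inside the sandbox. Names that resolve only
    through the VPN's DNS will fail here if those settings are not in effect.
    """
    results: Dict[str, Union[List[str], str]] = {}
    for name in hostnames:
        try:
            infos = socket.getaddrinfo(name, None)
            # Deduplicate addresses while keeping resolver order.
            results[name] = list(dict.fromkeys(info[4][0] for info in infos))
        except socket.gaierror as exc:
            results[name] = f"unresolved: {exc}"
    return results


# Example with a placeholder internal hostname:
# print(resolve_hosts(["hr-app.corp.example"]))
```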
+
+### Application Not Launching
+
+If the desktop application fails to start:
+
+1. Verify installation: Check the application is installed in the sandbox
+2. Check dependencies: Ensure all required DLLs and frameworks are present
+3. Review permissions: Application may require admin rights
+4. Check logs: Look for error messages in Windows Event Viewer
+
+### UI Element Not Found
+
+If the agent cannot find buttons or fields:
+
+1. Ask the agent to wait longer: some applications load slowly, so add explicit waiting guidance to `instructions`
+2. Check screen resolution: UI elements may be off-screen
+3. Verify DPI scaling: High DPI settings can affect element positions
+4. Update instructions: Provide more specific navigation guidance
+
+### Cost Management
+
+If costs are higher than expected:
+
+1. Reduce `max_trajectory_budget`
+2. Decrease `only_n_most_recent_images`
+3. Use prompt caching: Set `use_prompt_caching=True`
+4. Optimize task descriptions: Be more specific to reduce retry attempts
+
+---
+
+## Next Steps
+
+- **Explore custom tools**: Learn how to create [custom tools](/agent-sdk/custom-tools) for application-specific actions
+- **Implement callbacks**: Add [monitoring and logging](/agent-sdk/callbacks) for production workflows
+- **Join community**: Get help in our [Discord](https://discord.com/invite/mVnXXpdE85)
+
+---
+
+## Related Examples
+
+- [Form Filling](/example-usecases/form-filling) - Web form automation
+- [Post-Event Contact Export](/example-usecases/post-event-contact-export) - Data extraction workflows
+- [Custom Tools](/agent-sdk/custom-tools) - Building application-specific functions