From 2b595f5de8509a2917172f5e73b6fac4ab14b990 Mon Sep 17 00:00:00 2001 From: f-trycua Date: Mon, 17 Nov 2025 16:56:17 +0100 Subject: [PATCH 1/4] Fixes pre-launch week --- docs/content/docs/agent-sdk/agent-loops.mdx | 6 +- .../agent-sdk/customizing-computeragent.mdx | 11 +- .../docs/agent-sdk/integrations/hud.mdx | 6 +- docs/content/docs/agent-sdk/meta.json | 2 - .../content/docs/computer-sdk/computer-ui.mdx | 6 +- .../custom-computer-handlers.mdx | 0 docs/content/docs/computer-sdk/meta.json | 2 +- .../docs/computer-sdk/sandboxed-python.mdx | 9 +- .../docs/example-usecases/form-filling.mdx | 253 +-- .../post-event-contact-export.mdx | 1410 +++-------------- docs/content/docs/get-started/meta.json | 2 +- docs/content/docs/get-started/quickstart.mdx | 22 +- docs/content/docs/index.mdx | 20 +- .../docs/libraries/computer-server/index.mdx | 9 +- docs/content/docs/libraries/som/index.mdx | 6 +- docs/content/docs/meta.json | 1 - docs/src/app/layout.config.tsx | 1 + 17 files changed, 348 insertions(+), 1418 deletions(-) rename docs/content/docs/{agent-sdk => computer-sdk}/custom-computer-handlers.mdx (100%) diff --git a/docs/content/docs/agent-sdk/agent-loops.mdx b/docs/content/docs/agent-sdk/agent-loops.mdx index 2885a5c5..49d7e897 100644 --- a/docs/content/docs/agent-sdk/agent-loops.mdx +++ b/docs/content/docs/agent-sdk/agent-loops.mdx @@ -4,11 +4,7 @@ description: Supported computer-using agent loops and models --- - A corresponding{' '} - - Jupyter Notebook - {' '} - is available for this documentation. + A corresponding Jupyter Notebook is available for this documentation. 
An agent can be thought of as a loop - it generates actions, executes them, and repeats until done: diff --git a/docs/content/docs/agent-sdk/customizing-computeragent.mdx b/docs/content/docs/agent-sdk/customizing-computeragent.mdx index e7d3c030..82eace76 100644 --- a/docs/content/docs/agent-sdk/customizing-computeragent.mdx +++ b/docs/content/docs/agent-sdk/customizing-computeragent.mdx @@ -1,16 +1,9 @@ --- -title: Customizing Your ComputerAgent +title: Customize ComputerAgent --- - A corresponding{' '} - - Jupyter Notebook - {' '} - is available for this documentation. + A corresponding Jupyter Notebook is available for this documentation. The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems. diff --git a/docs/content/docs/agent-sdk/integrations/hud.mdx b/docs/content/docs/agent-sdk/integrations/hud.mdx index 7bfcbdea..9575ebf6 100644 --- a/docs/content/docs/agent-sdk/integrations/hud.mdx +++ b/docs/content/docs/agent-sdk/integrations/hud.mdx @@ -4,11 +4,7 @@ description: Use ComputerAgent with HUD for benchmarking and evaluation --- - A corresponding{' '} - - Jupyter Notebook - {' '} - is available for this documentation. + A corresponding Jupyter Notebook is available for this documentation. The HUD integration allows an agent to be benchmarked using the [HUD framework](https://www.hud.so/). Through the HUD integration, the agent controls a computer inside HUD, where tests are run to evaluate the success of each task. 
diff --git a/docs/content/docs/agent-sdk/meta.json b/docs/content/docs/agent-sdk/meta.json index b86632e7..0a733f28 100644 --- a/docs/content/docs/agent-sdk/meta.json +++ b/docs/content/docs/agent-sdk/meta.json @@ -10,12 +10,10 @@ "customizing-computeragent", "callbacks", "custom-tools", - "custom-computer-handlers", "prompt-caching", "usage-tracking", "telemetry", "benchmarks", - "migration-guide", "integrations" ] } diff --git a/docs/content/docs/computer-sdk/computer-ui.mdx b/docs/content/docs/computer-sdk/computer-ui.mdx index c731e4c4..9739398b 100644 --- a/docs/content/docs/computer-sdk/computer-ui.mdx +++ b/docs/content/docs/computer-sdk/computer-ui.mdx @@ -1,7 +1,11 @@ --- -title: Computer UI +title: Computer UI (Deprecated) --- + + The Computer UI is deprecated and will be replaced with a revamped playground experience soon. We recommend using VNC or Screen Sharing for precise control of the computer instead. + + The computer module includes a Gradio UI for creating and sharing demonstration data. We make it easy for people to build community datasets for better computer use models with an upload to Huggingface feature. 
```bash diff --git a/docs/content/docs/agent-sdk/custom-computer-handlers.mdx b/docs/content/docs/computer-sdk/custom-computer-handlers.mdx similarity index 100% rename from docs/content/docs/agent-sdk/custom-computer-handlers.mdx rename to docs/content/docs/computer-sdk/custom-computer-handlers.mdx diff --git a/docs/content/docs/computer-sdk/meta.json b/docs/content/docs/computer-sdk/meta.json index 547dde17..f2c124e7 100644 --- a/docs/content/docs/computer-sdk/meta.json +++ b/docs/content/docs/computer-sdk/meta.json @@ -1,5 +1,5 @@ { "title": "Computer SDK", "description": "Build computer-using agents with the Computer SDK", - "pages": ["computers", "commands", "computer-ui", "tracing-api", "sandboxed-python"] + "pages": ["computers", "commands", "tracing-api", "sandboxed-python", "custom-computer-handlers", "computer-ui"] } diff --git a/docs/content/docs/computer-sdk/sandboxed-python.mdx b/docs/content/docs/computer-sdk/sandboxed-python.mdx index bb1c1e9c..e66ad34c 100644 --- a/docs/content/docs/computer-sdk/sandboxed-python.mdx +++ b/docs/content/docs/computer-sdk/sandboxed-python.mdx @@ -4,14 +4,7 @@ slug: sandboxed-python --- - A corresponding{' '} - - Python example - {' '} - is available for this documentation. + A corresponding Python example is available for this documentation. You can run Python functions securely inside a sandboxed virtual environment on a remote Cua Computer. This is useful for executing untrusted user code, isolating dependencies, or providing a safe environment for automation tasks. 
diff --git a/docs/content/docs/example-usecases/form-filling.mdx b/docs/content/docs/example-usecases/form-filling.mdx index b6f60b05..7a15cd5f 100644 --- a/docs/content/docs/example-usecases/form-filling.mdx +++ b/docs/content/docs/example-usecases/form-filling.mdx @@ -3,7 +3,7 @@ title: Form Filling description: Enhance and Automate Interactions Between Form Filling and Local File Systems --- -import { EditableCodeBlock, EditableValue, S } from '@/components/editable-code-block'; +import { Step, Steps } from 'fumadocs-ui/components/steps'; import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; ## Overview @@ -12,9 +12,17 @@ Cua can be used to automate interactions between form filling and local file sys This preset usecase uses [Cua Computer](/computer-sdk/computers) to interact with a web page and local file systems along with [Agent Loops](/agent-sdk/agent-loops) to run the agent in a loop with message history. -## Quickstart +--- -Create a `requirements.txt` file with the following dependencies: + + + + +### Set Up Your Environment + +First, install the required dependencies: + +Create a `requirements.txt` file: ```text cua-agent @@ -22,33 +30,32 @@ cua-computer python-dotenv>=1.0.0 ``` -And install: +Install the dependencies: ```bash pip install -r requirements.txt ``` -Create a `.env` file with the following environment variables: +Create a `.env` file with your API keys: ```text -ANTHROPIC_API_KEY=your-api-key +ANTHROPIC_API_KEY=your-anthropic-api-key CUA_API_KEY=sk_cua-api01... 
``` -Select the environment you want to run the code in (_click on the underlined values in the code to edit them directly!_): + - - + - -{`import asyncio +### Create Your Form Filling Script + +Create a Python file (e.g., `form_filling.py`) and select your environment: + + + + +```python +import asyncio import logging import os import signal @@ -59,21 +66,21 @@ from computer import Computer, VMProviderType from dotenv import load_dotenv logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) +logger = logging.getLogger(__name__) def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) + print("\n\nExecution interrupted by user. Exiting gracefully...") + exit(0) async def fill_application(): -try: -async with Computer( -os_type="linux", -provider_type=VMProviderType.CLOUD, -name="`}{`", -api_key="`}{`", -verbosity=logging.INFO, -) as computer: + try: + async with Computer( + os_type="linux", + provider_type=VMProviderType.CLOUD, + name="your-sandbox-name", # Replace with your sandbox name + api_key=os.environ["CUA_API_KEY"], + verbosity=logging.INFO, + ) as computer: agent = ComputerAgent( model="anthropic/claude-sonnet-4-5-20250929", @@ -93,7 +100,7 @@ verbosity=logging.INFO, history = [] for i, task in enumerate(tasks, 1): - print(f"\\n[Task {i}/{len(tasks)}] {task}") + print(f"\n[Task {i}/{len(tasks)}] {task}") # Add user message to history history.append({"role": "user", "content": task}) @@ -116,7 +123,7 @@ verbosity=logging.INFO, print(f"✅ Task {i}/{len(tasks)} completed") - print("\\n🎉 All tasks completed successfully!") + print("\n🎉 All tasks completed successfully!") except Exception as e: logger.error(f"Error in fill_application: {e}") @@ -124,18 +131,18 @@ verbosity=logging.INFO, raise def main(): -try: -load_dotenv() + try: + load_dotenv() if "ANTHROPIC_API_KEY" not in os.environ: raise RuntimeError( - "Please
set the ANTHROPIC_API_KEY environment variable.\n" "You can add it to a .env file in the project root." ) if "CUA_API_KEY" not in os.environ: raise RuntimeError( - "Please set the CUA_API_KEY environment variable.\\n" + "Please set the CUA_API_KEY environment variable.\n" "You can add it to a .env file in the project root." ) @@ -147,22 +154,15 @@ load_dotenv() logger.error(f"Error running automation: {e}") traceback.print_exc() -if **name** == "**main**": -main()`} - - +if __name__ == "__main__": + main() +``` - + - -{`import asyncio +```python +import asyncio import logging import os import signal @@ -173,20 +173,20 @@ from computer import Computer, VMProviderType from dotenv import load_dotenv logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) +logger = logging.getLogger(__name__) def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) + print("\n\nExecution interrupted by user. Exiting gracefully...") + exit(0) async def fill_application(): -try: -async with Computer( -os_type="macos", -provider_type=VMProviderType.LUME, -name="`}{`", -verbosity=logging.INFO, -) as computer: + try: + async with Computer( + os_type="linux", + provider_type=VMProviderType.DOCKER, + image="trycua/cua-xfce:latest", # or "trycua/cua-ubuntu:latest" + verbosity=logging.INFO, + ) as computer: agent = ComputerAgent( model="anthropic/claude-sonnet-4-5-20250929", @@ -206,7 +206,7 @@ verbosity=logging.INFO, history = [] for i, task in enumerate(tasks, 1): - print(f"\\n[Task {i}/{len(tasks)}] {task}") + print(f"\n[Task {i}/{len(tasks)}] {task}") # Add user message to history history.append({"role": "user", "content": task}) @@ -229,7 +229,7 @@ verbosity=logging.INFO, print(f"✅ Task {i}/{len(tasks)} completed") - print("\\n🎉 All tasks completed successfully!") + print("\n🎉 All tasks completed successfully!") except Exception as e: logger.error(f"Error in fill_application: {e}") @@ -237,12 +237,12 @@
verbosity=logging.INFO, raise def main(): -try: -load_dotenv() + try: + load_dotenv() if "ANTHROPIC_API_KEY" not in os.environ: raise RuntimeError( - "Please set the ANTHROPIC_API_KEY environment variable.\\n" + "Please set the ANTHROPIC_API_KEY environment variable.\n" "You can add it to a .env file in the project root." ) @@ -254,20 +254,15 @@ load_dotenv() logger.error(f"Error running automation: {e}") traceback.print_exc() -if **name** == "**main**": -main()`} - - +if __name__ == "__main__": + main() +``` - + - -{`import asyncio +```python +import asyncio import logging import os import signal @@ -278,19 +273,20 @@ from computer import Computer, VMProviderType from dotenv import load_dotenv logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) +logger = logging.getLogger(__name__) def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) + print("\n\nExecution interrupted by user. Exiting gracefully...") + exit(0) async def fill_application(): -try: -async with Computer( -os_type="windows", -provider_type=VMProviderType.WINDOWS_SANDBOX, -verbosity=logging.INFO, -) as computer: + try: + async with Computer( + os_type="macos", + provider_type=VMProviderType.LUME, + name="macos-sequoia-cua:latest", + verbosity=logging.INFO, + ) as computer: agent = ComputerAgent( model="anthropic/claude-sonnet-4-5-20250929", @@ -310,7 +306,7 @@ verbosity=logging.INFO, history = [] for i, task in enumerate(tasks, 1): - print(f"\\n[Task {i}/{len(tasks)}] {task}") + print(f"\n[Task {i}/{len(tasks)}] {task}") # Add user message to history history.append({"role": "user", "content": task}) @@ -333,7 +329,7 @@ verbosity=logging.INFO, print(f"✅ Task {i}/{len(tasks)} completed") - print("\\n🎉 All tasks completed successfully!") + print("\n🎉 All tasks completed successfully!") except Exception as e: logger.error(f"Error in fill_application: {e}") @@ -341,12 +337,12 @@ verbosity=logging.INFO, raise def main():
-try: -load_dotenv() + try: + load_dotenv() if "ANTHROPIC_API_KEY" not in os.environ: raise RuntimeError( - "Please set the ANTHROPIC_API_KEY environment variable.\\n" + "Please set the ANTHROPIC_API_KEY environment variable.\n" "You can add it to a .env file in the project root." ) @@ -358,22 +354,15 @@ load_dotenv() logger.error(f"Error running automation: {e}") traceback.print_exc() -if **name** == "**main**": -main()`} - - +if __name__ == "__main__": + main() +``` - + - -{`import asyncio +```python +import asyncio import logging import os import signal @@ -384,20 +373,19 @@ from computer import Computer, VMProviderType from dotenv import load_dotenv logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) +logger = logging.getLogger(__name__) def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) + print("\n\nExecution interrupted by user. Exiting gracefully...") + exit(0) async def fill_application(): -try: -async with Computer( -os_type="linux", -provider_type=VMProviderType.DOCKER, -name="`}{`", -verbosity=logging.INFO, -) as computer: + try: + async with Computer( + os_type="windows", + provider_type=VMProviderType.WINDOWS_SANDBOX, + verbosity=logging.INFO, + ) as computer: agent = ComputerAgent( model="anthropic/claude-sonnet-4-5-20250929", @@ -417,7 +405,7 @@ verbosity=logging.INFO, history = [] for i, task in enumerate(tasks, 1): - print(f"\\n[Task {i}/{len(tasks)}] {task}") + print(f"\n[Task {i}/{len(tasks)}] {task}") # Add user message to history history.append({"role": "user", "content": task}) @@ -440,7 +428,7 @@ verbosity=logging.INFO, print(f"✅ Task {i}/{len(tasks)} completed") - print("\\n🎉 All tasks completed successfully!") + print("\n🎉 All tasks completed successfully!") except Exception as e: logger.error(f"Error in fill_application: {e}") @@ -448,12 +436,12 @@ verbosity=logging.INFO, raise def main(): -try: -load_dotenv() + try: + load_dotenv() if
"ANTHROPIC_API_KEY" not in os.environ: raise RuntimeError( - "Please set the ANTHROPIC_API_KEY environment variable.\\n" + "Please set the ANTHROPIC_API_KEY environment variable.\n" "You can add it to a .env file in the project root." ) @@ -465,16 +453,41 @@ load_dotenv() logger.error(f"Error running automation: {e}") traceback.print_exc() -if **name** == "**main**": -main()`} - - +if __name__ == "__main__": + main() +``` + + + + +### Run Your Script + +Execute your form filling automation: + +```bash +python form_filling.py +``` + +The agent will: +1. Download the PDF resume from Overleaf +2. Extract information from the PDF +3. Fill out the JotForm with the extracted information + +Monitor the output to see the agent's progress through each task. + + + + + +--- + ## Next Steps - Learn more about [Cua computers](/computer-sdk/computers) and [computer commands](/computer-sdk/commands) - Read about [Agent loops](/agent-sdk/agent-loops), [tools](/agent-sdk/custom-tools), and [supported model providers](/agent-sdk/supported-model-providers/) - Experiment with different [Models and Providers](/agent-sdk/supported-model-providers/) +- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help diff --git a/docs/content/docs/example-usecases/post-event-contact-export.mdx b/docs/content/docs/example-usecases/post-event-contact-export.mdx index fcc6e3f7..16131702 100644 --- a/docs/content/docs/example-usecases/post-event-contact-export.mdx +++ b/docs/content/docs/example-usecases/post-event-contact-export.mdx @@ -3,7 +3,7 @@ title: Post-Event Contact Export description: Run overnight contact extraction from LinkedIn, X, or other social platforms after networking events --- -import { EditableCodeBlock, EditableValue, S } from '@/components/editable-code-block'; +import { Step, Steps } from 'fumadocs-ui/components/steps'; import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; ## Overview @@ -26,7 +26,7 @@ This example focuses on LinkedIn but works across 
platforms. It uses [Cua Comput Traditional web scraping triggers anti-bot measures immediately. Cua's approach works across all platforms. -## What You Get +### What You Get The script generates two files with your extracted connections: @@ -36,10 +36,6 @@ The script generates two files with your extracted connections: first,last,role,company,met_at,linkedin John,Smith,Software Engineer,Acme Corp,Google Devfest Toronto,https://www.linkedin.com/in/johnsmith Sarah,Johnson,Product Manager,Tech Inc,Google Devfest Toronto,https://www.linkedin.com/in/sarahjohnson -Michael,Chen,Data Scientist,StartupXYZ,Google Devfest Toronto,https://www.linkedin.com/in/michaelchen -Emily,Rodriguez,UX Designer,Design Co,Google Devfest Toronto,https://www.linkedin.com/in/emilyrodriguez -David,Kim,Engineering Lead,BigTech,Google Devfest Toronto,https://www.linkedin.com/in/davidkim -... ``` **Messaging Links** (`linkedin_messaging_links_20250116_143022.txt`): @@ -50,15 +46,19 @@ LinkedIn Messaging Compose Links 1. https://www.linkedin.com/messaging/compose/?recipient=johnsmith 2. https://www.linkedin.com/messaging/compose/?recipient=sarahjohnson -3. https://www.linkedin.com/messaging/compose/?recipient=michaelchen -4. https://www.linkedin.com/messaging/compose/?recipient=emilyrodriguez -5. https://www.linkedin.com/messaging/compose/?recipient=davidkim -... ``` -## Quickstart +--- -Create a `requirements.txt` file with the following dependencies: + + + + +### Set Up Your Environment + +First, install the required dependencies: + +Create a `requirements.txt` file: ```text cua-agent @@ -66,20 +66,26 @@ cua-computer python-dotenv>=1.0.0 ``` -And install: +Install the dependencies: ```bash pip install -r requirements.txt ``` -Create a `.env` file with the following environment variables: +Create a `.env` file with your API keys: ```text -ANTHROPIC_API_KEY=your-api-key +ANTHROPIC_API_KEY=your-anthropic-api-key CUA_API_KEY=sk_cua-api01... CUA_CONTAINER_NAME=m-linux-... 
``` + + + + +### Log Into LinkedIn Manually + **Important**: Before running the script, manually log into LinkedIn through your VM: 1. Access your VM through the Cua dashboard @@ -88,31 +94,31 @@ CUA_CONTAINER_NAME=m-linux-... 4. Close the browser but leave the VM running 5. Your session is now saved and ready for automation! -**Configuration**: Customize the script by editing these variables: +This one-time manual login bypasses all bot detection. + + + + + +### Configure and Create Your Script + +Create a Python file (e.g., `contact_export.py`). You can customize: ```python # Where you met these connections (automatically added to CSV) MET_AT_REASON = "Google Devfest Toronto" -# Number of contacts to extract (line 134) +# Number of contacts to extract (in the main loop) for contact_num in range(1, 21): # Change 21 to extract more/fewer contacts ``` -Select the environment you want to run the code in (_click on the underlined values in the code to edit them directly!_): +Select your environment: - - + + - -{`import asyncio +```python +import asyncio import csv import logging import os @@ -125,28 +131,22 @@ from computer import Computer, VMProviderType from dotenv import load_dotenv logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) +logger = logging.getLogger(__name__) # Configuration: Define where you met these connections - -MET_AT_REASON = "`}{`" +MET_AT_REASON = "Google Devfest Toronto" def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) + print("\n\nExecution interrupted by user. Exiting gracefully...") + exit(0) def extract_public_id_from_linkedin_url(linkedin_url): -""" -Extract public ID from LinkedIn profile URL. -Example: https://www.linkedin.com/in/taylor-r-devries/?lipi=... 
-> taylor-r-devries -""" -if not linkedin_url: -return None + """Extract public ID from LinkedIn profile URL.""" + if not linkedin_url: + return None - # Remove query parameters and trailing slashes url = linkedin_url.split('?')[0].rstrip('/') - # Extract the part after /in/ if '/in/' in url: public_id = url.split('/in/')[-1] return public_id @@ -154,88 +154,70 @@ return None return None def extract_contact_from_response(result_output): -""" -Extract contact information from agent's response. -Expects the agent to return data in format: -FIRST: value -LAST: value -ROLE: value -COMPANY: value -LINKEDIN: value - - Note: met_at is auto-filled from MET_AT_REASON constant. + """ + Extract contact information from agent's response. + Expects format: + FIRST: value + LAST: value + ROLE: value + COMPANY: value + LINKEDIN: value """ contact = { 'first': '', 'last': '', 'role': '', 'company': '', - 'met_at': MET_AT_REASON, # Auto-fill from constant + 'met_at': MET_AT_REASON, 'linkedin': '' } - # Collect all text from messages for debugging - all_text = [] - for item in result_output: if item.get("type") == "message": content = item.get("content", []) for content_part in content: text = content_part.get("text", "") if text: - all_text.append(text) - # Parse structured output - look for the exact format - for line in text.split('\\n'): + for line in text.split('\n'): line = line.strip() - # Use case-insensitive matching and handle extra whitespace line_upper = line.upper() if line_upper.startswith("FIRST:"): - value = line[6:].strip() # Skip "FIRST:" prefix + value = line[6:].strip() if value and value.upper() != "N/A": contact['first'] = value elif line_upper.startswith("LAST:"): - value = line[5:].strip() # Skip "LAST:" prefix + value = line[5:].strip() if value and value.upper() != "N/A": contact['last'] = value elif line_upper.startswith("ROLE:"): - value = line[5:].strip() # Skip "ROLE:" prefix + value = line[5:].strip() if value and value.upper() != "N/A": 
contact['role'] = value elif line_upper.startswith("COMPANY:"): - value = line[8:].strip() # Skip "COMPANY:" prefix + value = line[8:].strip() if value and value.upper() != "N/A": contact['company'] = value elif line_upper.startswith("LINKEDIN:"): - value = line[9:].strip() # Skip "LINKEDIN:" prefix + value = line[9:].strip() if value and value.upper() != "N/A": contact['linkedin'] = value - # Debug logging - if not (contact['first'] or contact['last'] or contact['linkedin']): - logger.debug(f"Failed to extract. Full text content ({len(all_text)} messages):") - for i, text in enumerate(all_text[-3:]): # Show last 3 messages - logger.debug(f" Message {i}: {text[:200]}") - return contact async def scrape_linkedin_connections(): -""" -Scrape the first 20 connections from LinkedIn and export to CSV. -The agent extracts data, and Python handles CSV writing programmatically. -""" + """Scrape LinkedIn connections and export to CSV.""" - # Generate output filename with timestamp timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") csv_filename = f"linkedin_connections_{timestamp}.csv" csv_path = os.path.join(os.getcwd(), csv_filename) - # Initialize CSV file with headers + # Initialize CSV file with open(csv_path, 'w', newline='', encoding='utf-8') as csvfile: writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) writer.writeheader() - print(f"\\n🚀 Starting LinkedIn connections scraper") + print(f"\n🚀 Starting LinkedIn connections scraper") print(f"📁 Output file: {csv_path}") print(f"📍 Met at: {MET_AT_REASON}") print("=" * 80) @@ -244,8 +226,8 @@ The agent extracts data, and Python handles CSV writing programmatically.
async with Computer( os_type="linux", provider_type=VMProviderType.CLOUD, - name="`}{`", - api_key="`}{`", + name=os.environ["CUA_CONTAINER_NAME"], # Your sandbox name + api_key=os.environ["CUA_API_KEY"], verbosity=logging.INFO, ) as computer: @@ -263,1257 +245,224 @@ The agent extracts data, and Python handles CSV writing programmatically. # Task 1: Navigate to LinkedIn connections page navigation_task = ( - "STEP 1 - NAVIGATE TO LINKEDIN CONNECTIONS PAGE:\\n" - "1. Open a web browser (Chrome or Firefox)\\n" - "2. Navigate to https://www.linkedin.com/mynetwork/invite-connect/connections/\\n" - "3. Wait for the page to fully load (look for the connection list to appear)\\n" - "4. If prompted to log in, handle the authentication\\n" - "5. Confirm you can see the list of connections displayed on the page\\n" - "6. Ready to start extracting contacts one by one" + "STEP 1 - NAVIGATE TO LINKEDIN CONNECTIONS PAGE:\n" + "1. Open a web browser (Chrome or Firefox)\n" + "2. Navigate to https://www.linkedin.com/mynetwork/invite-connect/connections/\n" + "3. Wait for the page to fully load\n" + "4. Confirm you can see the list of connections\n" + "5. 
Ready to start extracting contacts" ) - print(f"\\n[Task 1/21] Navigating to LinkedIn connections page...") + print(f"\n[Task 1/21] Navigating to LinkedIn...") history.append({"role": "user", "content": navigation_task}) async for result in agent.run(history, stream=False): history += result.get("output", []) - for item in result.get("output", []): - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - print(f"✅ Navigation completed\\n") + print(f"✅ Navigation completed\n") - # Tasks 2-21: Extract each of the 20 contacts + # Extract 20 contacts contacts_extracted = 0 - linkedin_urls = [] # Track LinkedIn URLs for bonus messaging links - previous_contact_name = None # Track the previous contact's name for easy navigation + linkedin_urls = [] + previous_contact_name = None for contact_num in range(1, 21): - # Build extraction task based on whether this is the first contact or not + # Build extraction task if contact_num == 1: - # First contact - start from the top extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Look at the very first connection at the top of the list\\n" - f"2. Click on their name/profile link to open their LinkedIn profile page\\n" - f"3. Wait for their profile page to load completely\\n" - f"4.
Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' → 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"5. CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"6. If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"7. Do NOT add any extra text before or after these 5 lines\\n" - f"8. Navigate back to the connections list page" + f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\n" + f"1. Click on the first connection's profile\n" + f"2. Extract: FIRST, LAST, ROLE, COMPANY, LINKEDIN URL\n" + f"3. Return in exact format:\n" + f"FIRST: [value]\n" + f"LAST: [value]\n" + f"ROLE: [value]\n" + f"COMPANY: [value]\n" + f"LINKEDIN: [value]\n" + f"4. Navigate back to connections list" ) else: - # Subsequent contacts - reference the previous contact extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Find the contact named '{previous_contact_name}' in the list\\n" - f"2. If you don't see '{previous_contact_name}' on the screen, scroll down slowly until you find them\\n" - f"3. Once you find '{previous_contact_name}', look at the contact directly BELOW them\\n" - f"4.
Click on that contact's name/profile link (the one below '{previous_contact_name}') to open their profile page\\n" - f"5. Wait for their profile page to load completely\\n" - f"6. Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' → 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"7. CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"8. If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"9. Do NOT add any extra text before or after these 5 lines\\n" - f"10. Navigate back to the connections list page" + f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\n" + f"1. Find '{previous_contact_name}' in the list\n" + f"2. Click on the contact BELOW them\n" + f"3. Extract: FIRST, LAST, ROLE, COMPANY, LINKEDIN URL\n" + f"4. Return in exact format:\n" + f"FIRST: [value]\n" + f"LAST: [value]\n" + f"ROLE: [value]\n" + f"COMPANY: [value]\n" + f"LINKEDIN: [value]\n" + f"5.
Navigate back" ) print(f"[Task {contact_num + 1}/21] Extracting contact {contact_num}/20...") history.append({"role": "user", "content": extraction_task}) - # Collect all output from the agent all_output = [] async for result in agent.run(history, stream=False): output = result.get("output", []) history += output all_output.extend(output) - # Log agent output at debug level (only shown if verbosity increased) - for item in output: - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - - # Now extract contact information from ALL collected output (not just partial results) contact_data = extract_contact_from_response(all_output) - # Validate we got at least the critical fields (name or LinkedIn URL) has_name = bool(contact_data['first'] and contact_data['last']) has_linkedin = bool(contact_data['linkedin'] and 'linkedin.com' in contact_data['linkedin']) - # Write to CSV if we got at least name OR linkedin if has_name or has_linkedin: with open(csv_path, 'a', newline='', encoding='utf-8') as csvfile: writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) writer.writerow(contact_data) contacts_extracted += 1 - # Track LinkedIn URL for messaging links if contact_data['linkedin']: linkedin_urls.append(contact_data['linkedin']) - # Remember this contact's name for the next iteration if has_name: previous_contact_name = f"{contact_data['first']} {contact_data['last']}".strip() - # Success message with what we got name_str = f"{contact_data['first']} {contact_data['last']}" if has_name else "[No name]" - linkedin_str = "✓ LinkedIn" if has_linkedin else "✗ No LinkedIn" - role_str = f"({contact_data['role']})" if contact_data['role'] else "(No role)" - print(f"✅ Contact {contact_num}/20 saved: {name_str} {role_str} | {linkedin_str}") + print(f"✅ Contact {contact_num}/20 saved:
{name_str}") else: print(f"⚠️ Could not extract valid data for contact {contact_num}") - print(f" Got: first='{contact_data['first']}', last='{contact_data['last']}', linkedin='{contact_data['linkedin'][:50] if contact_data['linkedin'] else 'None'}'") - print(f" Check the agent's output above to see what was returned") - print(f" Total output items: {len(all_output)}") - # Progress update every 5 contacts if contact_num % 5 == 0: - print(f"\\n📈 Progress: {contacts_extracted}/{contact_num} contacts extracted so far...\\n") + print(f"\n📈 Progress: {contacts_extracted}/{contact_num} contacts extracted\n") - # BONUS: Create messaging compose links file + # Create messaging links file messaging_filename = f"linkedin_messaging_links_{timestamp}.txt" messaging_path = os.path.join(os.getcwd(), messaging_filename) with open(messaging_path, 'w', encoding='utf-8') as txtfile: - txtfile.write("LinkedIn Messaging Compose Links\\n") - txtfile.write("=" * 80 + "\\n\\n") + txtfile.write("LinkedIn Messaging Compose Links\n") + txtfile.write("=" * 80 + "\n\n") for i, linkedin_url in enumerate(linkedin_urls, 1): public_id = extract_public_id_from_linkedin_url(linkedin_url) if public_id: messaging_url = f"https://www.linkedin.com/messaging/compose/?recipient={public_id}" - txtfile.write(f"{i}. {messaging_url}\\n") - else: - txtfile.write(f"{i}. [Could not extract public ID from: {linkedin_url}]\\n") + txtfile.write(f"{i}. 
{messaging_url}\n") - print("\\n" + "="*80) + print("\n" + "="*80) print("🎉 All tasks completed!") print(f"πŸ“ CSV file saved to: {csv_path}") print(f"📊 Total contacts extracted: {contacts_extracted}/20") - print(f"💬 Bonus: Messaging links saved to: {messaging_path}") - print(f"πŸ“ Total messaging links: {len(linkedin_urls)}") + print(f"💬 Messaging links saved to: {messaging_path}") print("="*80) except Exception as e: - print(f"\\n❌ Error during scraping: {e}") + print(f"\n❌ Error: {e}") traceback.print_exc() raise def main(): -try: -load_dotenv() + try: + load_dotenv() if "ANTHROPIC_API_KEY" not in os.environ: - raise RuntimeError( - "Please set the ANTHROPIC_API_KEY environment variable.\\n" - "You can add it to a .env file in the project root." - ) + raise RuntimeError("Please set ANTHROPIC_API_KEY in .env") if "CUA_API_KEY" not in os.environ: - raise RuntimeError( - "Please set the CUA_API_KEY environment variable.\\n" - "You can add it to a .env file in the project root." 
- ) + raise RuntimeError("Please set CUA_API_KEY in .env") + + if "CUA_CONTAINER_NAME" not in os.environ: + raise RuntimeError("Please set CUA_CONTAINER_NAME in .env") signal.signal(signal.SIGINT, handle_sigint) asyncio.run(scrape_linkedin_connections()) except Exception as e: - print(f"\\n❌ Error running automation: {e}") + print(f"\n❌ Error: {e}") traceback.print_exc() -if **name** == "**main**": -main()`} - - +if __name__ == "__main__": + main() +``` - + - -{`import asyncio -import csv -import logging -import os -import signal -import traceback -from datetime import datetime +```python +# Same code as Cloud Sandbox, but change Computer initialization to: +async with Computer( + os_type="linux", + provider_type=VMProviderType.DOCKER, + image="trycua/cua-xfce:latest", + verbosity=logging.INFO, +) as computer: +``` -from agent import ComputerAgent -from computer import Computer, VMProviderType -from dotenv import load_dotenv - -logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) - -# Configuration: Define where you met these connections - -MET_AT_REASON = "`}{`" - -def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) - -def extract_public_id_from_linkedin_url(linkedin_url): -""" -Extract public ID from LinkedIn profile URL. -Example: https://www.linkedin.com/in/taylor-r-devries/?lipi=... -> taylor-r-devries -""" -if not linkedin_url: -return None - - # Remove query parameters and trailing slashes - url = linkedin_url.split('?')[0].rstrip('/') - - # Extract the part after /in/ - if '/in/' in url: - public_id = url.split('/in/')[-1] - return public_id - - return None - -def extract_contact_from_response(result_output): -""" -Extract contact information from agent's response. -Expects the agent to return data in format: -FIRST: value -LAST: value -ROLE: value -COMPANY: value -LINKEDIN: value - - Note: met_at is auto-filled from MET_AT_REASON constant. 
- """ - contact = { - 'first': '', - 'last': '', - 'role': '', - 'company': '', - 'met_at': MET_AT_REASON, # Auto-fill from constant - 'linkedin': '' - } - - # Collect all text from messages for debugging - all_text = [] - - for item in result_output: - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - text = content_part.get("text", "") - if text: - all_text.append(text) - # Parse structured output - look for the exact format - for line in text.split('\\n'): - line = line.strip() - # Use case-insensitive matching and handle extra whitespace - line_upper = line.upper() - - if line_upper.startswith("FIRST:"): - value = line[6:].strip() # Skip "FIRST:" prefix - if value and value.upper() != "N/A": - contact['first'] = value - elif line_upper.startswith("LAST:"): - value = line[5:].strip() # Skip "LAST:" prefix - if value and value.upper() != "N/A": - contact['last'] = value - elif line_upper.startswith("ROLE:"): - value = line[5:].strip() # Skip "ROLE:" prefix - if value and value.upper() != "N/A": - contact['role'] = value - elif line_upper.startswith("COMPANY:"): - value = line[8:].strip() # Skip "COMPANY:" prefix - if value and value.upper() != "N/A": - contact['company'] = value - elif line_upper.startswith("LINKEDIN:"): - value = line[9:].strip() # Skip "LINKEDIN:" prefix - if value and value.upper() != "N/A": - contact['linkedin'] = value - - # Debug logging - if not (contact['first'] or contact['last'] or contact['linkedin']): - logger.debug(f"Failed to extract. Full text content ({len(all_text)} messages):") - for i, text in enumerate(all_text[-3:]): # Show last 3 messages - logger.debug(f" Message {i}: {text[:200]}") - - return contact - -async def scrape_linkedin_connections(): -""" -Scrape the first 20 connections from LinkedIn and export to CSV. -The agent extracts data, and Python handles CSV writing programmatically. 
-""" - - # Generate output filename with timestamp - timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") - csv_filename = f"linkedin_connections_{timestamp}.csv" - csv_path = os.path.join(os.getcwd(), csv_filename) - - # Initialize CSV file with headers - with open(csv_path, 'w', newline='', encoding='utf-8') as csvfile: - writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) - writer.writeheader() - - print(f"\\nπŸš€ Starting LinkedIn connections scraper") - print(f"πŸ“ Output file: {csv_path}") - print(f"πŸ“ Met at: {MET_AT_REASON}") - print("=" * 80) - - try: - async with Computer( - os_type="linux", - provider_type=VMProviderType.DOCKER, - name="`}{`", - verbosity=logging.INFO, - ) as computer: - - agent = ComputerAgent( - model="anthropic/claude-sonnet-4-5-20250929", - tools=[computer], - only_n_most_recent_images=3, - verbosity=logging.INFO, - trajectory_dir="trajectories", - use_prompt_caching=True, - max_trajectory_budget=10.0, - ) - - history = [] - - # Task 1: Navigate to LinkedIn connections page - navigation_task = ( - "STEP 1 - NAVIGATE TO LINKEDIN CONNECTIONS PAGE:\\n" - "1. Open a web browser (Chrome or Firefox)\\n" - "2. Navigate to https://www.linkedin.com/mynetwork/invite-connect/connections/\\n" - "3. Wait for the page to fully load (look for the connection list to appear)\\n" - "4. If prompted to log in, handle the authentication\\n" - "5. Confirm you can see the list of connections displayed on the page\\n" - "6. 
Ready to start extracting contacts one by one" - ) - - print(f"\\n[Task 1/21] Navigating to LinkedIn connections page...") - history.append({"role": "user", "content": navigation_task}) - - async for result in agent.run(history, stream=False): - history += result.get("output", []) - for item in result.get("output", []): - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - - print(f"βœ… Navigation completed\\n") - - # Tasks 2-21: Extract each of the 20 contacts - contacts_extracted = 0 - linkedin_urls = [] # Track LinkedIn URLs for bonus messaging links - previous_contact_name = None # Track the previous contact's name for easy navigation - - for contact_num in range(1, 21): - # Build extraction task based on whether this is the first contact or not - if contact_num == 1: - # First contact - start from the top - extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Look at the very first connection at the top of the list\\n" - f"2. Click on their name/profile link to open their LinkedIn profile page\\n" - f"3. Wait for their profile page to load completely\\n" - f"4. Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' β†’ 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"5. 
CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"6. If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"7. Do NOT add any extra text before or after these 5 lines\\n" - f"8. Navigate back to the connections list page" - ) - else: - # Subsequent contacts - reference the previous contact - extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Find the contact named '{previous_contact_name}' in the list\\n" - f"2. If you don't see '{previous_contact_name}' on the screen, scroll down slowly until you find them\\n" - f"3. Once you find '{previous_contact_name}', look at the contact directly BELOW them\\n" - f"4. Click on that contact's name/profile link (the one below '{previous_contact_name}') to open their profile page\\n" - f"5. Wait for their profile page to load completely\\n" - f"6. Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' β†’ 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"7. CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"8. 
If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"9. Do NOT add any extra text before or after these 5 lines\\n" - f"10. Navigate back to the connections list page" - ) - - print(f"[Task {contact_num + 1}/21] Extracting contact {contact_num}/20...") - history.append({"role": "user", "content": extraction_task}) - - # Collect all output from the agent - all_output = [] - async for result in agent.run(history, stream=False): - output = result.get("output", []) - history += output - all_output.extend(output) - - # Log agent output at debug level (only shown if verbosity increased) - for item in output: - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - - # Now extract contact information from ALL collected output (not just partial results) - contact_data = extract_contact_from_response(all_output) - - # Validate we got at least the critical fields (name or LinkedIn URL) - has_name = bool(contact_data['first'] and contact_data['last']) - has_linkedin = bool(contact_data['linkedin'] and 'linkedin.com' in contact_data['linkedin']) - - # Write to CSV if we got at least name OR linkedin - if has_name or has_linkedin: - with open(csv_path, 'a', newline='', encoding='utf-8') as csvfile: - writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) - writer.writerow(contact_data) - contacts_extracted += 1 - - # Track LinkedIn URL for messaging links - if contact_data['linkedin']: - linkedin_urls.append(contact_data['linkedin']) - - # Remember this contact's name for the next iteration - if has_name: - previous_contact_name = f"{contact_data['first']} {contact_data['last']}".strip() - - # Success message with what we got - name_str = f"{contact_data['first']} {contact_data['last']}" if has_name else "[No name]" - linkedin_str = "βœ“ LinkedIn" if has_linkedin else "βœ— 
No LinkedIn" - role_str = f"({contact_data['role']})" if contact_data['role'] else "(No role)" - print(f"βœ… Contact {contact_num}/20 saved: {name_str} {role_str} | {linkedin_str}") - else: - print(f"⚠️ Could not extract valid data for contact {contact_num}") - print(f" Got: first='{contact_data['first']}', last='{contact_data['last']}', linkedin='{contact_data['linkedin'][:50] if contact_data['linkedin'] else 'None'}'") - print(f" Check the agent's output above to see what was returned") - print(f" Total output items: {len(all_output)}") - - # Progress update every 5 contacts - if contact_num % 5 == 0: - print(f"\\nπŸ“ˆ Progress: {contacts_extracted}/{contact_num} contacts extracted so far...\\n") - - # BONUS: Create messaging compose links file - messaging_filename = f"linkedin_messaging_links_{timestamp}.txt" - messaging_path = os.path.join(os.getcwd(), messaging_filename) - - with open(messaging_path, 'w', encoding='utf-8') as txtfile: - txtfile.write("LinkedIn Messaging Compose Links\\n") - txtfile.write("=" * 80 + "\\n\\n") - - for i, linkedin_url in enumerate(linkedin_urls, 1): - public_id = extract_public_id_from_linkedin_url(linkedin_url) - if public_id: - messaging_url = f"https://www.linkedin.com/messaging/compose/?recipient={public_id}" - txtfile.write(f"{i}. {messaging_url}\\n") - else: - txtfile.write(f"{i}. 
[Could not extract public ID from: {linkedin_url}]\\n") - - print("\\n" + "="*80) - print("πŸŽ‰ All tasks completed!") - print(f"πŸ“ CSV file saved to: {csv_path}") - print(f"πŸ“Š Total contacts extracted: {contacts_extracted}/20") - print(f"πŸ’¬ Bonus: Messaging links saved to: {messaging_path}") - print(f"πŸ“ Total messaging links: {len(linkedin_urls)}") - print("="*80) - - except Exception as e: - print(f"\\n❌ Error during scraping: {e}") - traceback.print_exc() - raise - -def main(): -try: -load_dotenv() - - if "ANTHROPIC_API_KEY" not in os.environ: - raise RuntimeError( - "Please set the ANTHROPIC_API_KEY environment variable.\\n" - "You can add it to a .env file in the project root." - ) - - signal.signal(signal.SIGINT, handle_sigint) - - asyncio.run(scrape_linkedin_connections()) - - except Exception as e: - print(f"\\n❌ Error running automation: {e}") - traceback.print_exc() - -if **name** == "**main**": -main()`} - - +And remove the `CUA_API_KEY` and `CUA_CONTAINER_NAME` requirements from `.env` and the validation checks. - + - -{`import asyncio -import csv -import logging -import os -import signal -import traceback -from datetime import datetime +```python +# Same code as Cloud Sandbox, but change Computer initialization to: +async with Computer( + os_type="macos", + provider_type=VMProviderType.LUME, + name="macos-sequoia-cua:latest", + verbosity=logging.INFO, +) as computer: +``` -from agent import ComputerAgent -from computer import Computer, VMProviderType -from dotenv import load_dotenv - -logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) - -# Configuration: Define where you met these connections - -MET_AT_REASON = "`}{`" - -def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) - -def extract_public_id_from_linkedin_url(linkedin_url): -""" -Extract public ID from LinkedIn profile URL. -Example: https://www.linkedin.com/in/taylor-r-devries/?lipi=... 
-> taylor-r-devries -""" -if not linkedin_url: -return None - - # Remove query parameters and trailing slashes - url = linkedin_url.split('?')[0].rstrip('/') - - # Extract the part after /in/ - if '/in/' in url: - public_id = url.split('/in/')[-1] - return public_id - - return None - -def extract_contact_from_response(result_output): -""" -Extract contact information from agent's response. -Expects the agent to return data in format: -FIRST: value -LAST: value -ROLE: value -COMPANY: value -LINKEDIN: value - - Note: met_at is auto-filled from MET_AT_REASON constant. - """ - contact = { - 'first': '', - 'last': '', - 'role': '', - 'company': '', - 'met_at': MET_AT_REASON, # Auto-fill from constant - 'linkedin': '' - } - - # Collect all text from messages for debugging - all_text = [] - - for item in result_output: - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - text = content_part.get("text", "") - if text: - all_text.append(text) - # Parse structured output - look for the exact format - for line in text.split('\\n'): - line = line.strip() - # Use case-insensitive matching and handle extra whitespace - line_upper = line.upper() - - if line_upper.startswith("FIRST:"): - value = line[6:].strip() # Skip "FIRST:" prefix - if value and value.upper() != "N/A": - contact['first'] = value - elif line_upper.startswith("LAST:"): - value = line[5:].strip() # Skip "LAST:" prefix - if value and value.upper() != "N/A": - contact['last'] = value - elif line_upper.startswith("ROLE:"): - value = line[5:].strip() # Skip "ROLE:" prefix - if value and value.upper() != "N/A": - contact['role'] = value - elif line_upper.startswith("COMPANY:"): - value = line[8:].strip() # Skip "COMPANY:" prefix - if value and value.upper() != "N/A": - contact['company'] = value - elif line_upper.startswith("LINKEDIN:"): - value = line[9:].strip() # Skip "LINKEDIN:" prefix - if value and value.upper() != "N/A": - contact['linkedin'] = value - - # 
Debug logging - if not (contact['first'] or contact['last'] or contact['linkedin']): - logger.debug(f"Failed to extract. Full text content ({len(all_text)} messages):") - for i, text in enumerate(all_text[-3:]): # Show last 3 messages - logger.debug(f" Message {i}: {text[:200]}") - - return contact - -async def scrape_linkedin_connections(): -""" -Scrape the first 20 connections from LinkedIn and export to CSV. -The agent extracts data, and Python handles CSV writing programmatically. -""" - - # Generate output filename with timestamp - timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") - csv_filename = f"linkedin_connections_{timestamp}.csv" - csv_path = os.path.join(os.getcwd(), csv_filename) - - # Initialize CSV file with headers - with open(csv_path, 'w', newline='', encoding='utf-8') as csvfile: - writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) - writer.writeheader() - - print(f"\\nπŸš€ Starting LinkedIn connections scraper") - print(f"πŸ“ Output file: {csv_path}") - print(f"πŸ“ Met at: {MET_AT_REASON}") - print("=" * 80) - - try: - async with Computer( - os_type="macos", - provider_type=VMProviderType.LUME, - name="`}{`", - verbosity=logging.INFO, - ) as computer: - - agent = ComputerAgent( - model="anthropic/claude-sonnet-4-5-20250929", - tools=[computer], - only_n_most_recent_images=3, - verbosity=logging.INFO, - trajectory_dir="trajectories", - use_prompt_caching=True, - max_trajectory_budget=10.0, - ) - - history = [] - - # Task 1: Navigate to LinkedIn connections page - navigation_task = ( - "STEP 1 - NAVIGATE TO LINKEDIN CONNECTIONS PAGE:\\n" - "1. Open a web browser (Chrome or Firefox)\\n" - "2. Navigate to https://www.linkedin.com/mynetwork/invite-connect/connections/\\n" - "3. Wait for the page to fully load (look for the connection list to appear)\\n" - "4. If prompted to log in, handle the authentication\\n" - "5. Confirm you can see the list of connections displayed on the page\\n" - "6. 
Ready to start extracting contacts one by one" - ) - - print(f"\\n[Task 1/21] Navigating to LinkedIn connections page...") - history.append({"role": "user", "content": navigation_task}) - - async for result in agent.run(history, stream=False): - history += result.get("output", []) - for item in result.get("output", []): - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - - print(f"βœ… Navigation completed\\n") - - # Tasks 2-21: Extract each of the 20 contacts - contacts_extracted = 0 - linkedin_urls = [] # Track LinkedIn URLs for bonus messaging links - previous_contact_name = None # Track the previous contact's name for easy navigation - - for contact_num in range(1, 21): - # Build extraction task based on whether this is the first contact or not - if contact_num == 1: - # First contact - start from the top - extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Look at the very first connection at the top of the list\\n" - f"2. Click on their name/profile link to open their LinkedIn profile page\\n" - f"3. Wait for their profile page to load completely\\n" - f"4. Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' β†’ 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"5. 
CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"6. If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"7. Do NOT add any extra text before or after these 5 lines\\n" - f"8. Navigate back to the connections list page" - ) - else: - # Subsequent contacts - reference the previous contact - extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Find the contact named '{previous_contact_name}' in the list\\n" - f"2. If you don't see '{previous_contact_name}' on the screen, scroll down slowly until you find them\\n" - f"3. Once you find '{previous_contact_name}', look at the contact directly BELOW them\\n" - f"4. Click on that contact's name/profile link (the one below '{previous_contact_name}') to open their profile page\\n" - f"5. Wait for their profile page to load completely\\n" - f"6. Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' β†’ 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"7. CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"8. 
If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"9. Do NOT add any extra text before or after these 5 lines\\n" - f"10. Navigate back to the connections list page" - ) - - print(f"[Task {contact_num + 1}/21] Extracting contact {contact_num}/20...") - history.append({"role": "user", "content": extraction_task}) - - # Collect all output from the agent - all_output = [] - async for result in agent.run(history, stream=False): - output = result.get("output", []) - history += output - all_output.extend(output) - - # Log agent output at debug level (only shown if verbosity increased) - for item in output: - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - - # Now extract contact information from ALL collected output (not just partial results) - contact_data = extract_contact_from_response(all_output) - - # Validate we got at least the critical fields (name or LinkedIn URL) - has_name = bool(contact_data['first'] and contact_data['last']) - has_linkedin = bool(contact_data['linkedin'] and 'linkedin.com' in contact_data['linkedin']) - - # Write to CSV if we got at least name OR linkedin - if has_name or has_linkedin: - with open(csv_path, 'a', newline='', encoding='utf-8') as csvfile: - writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) - writer.writerow(contact_data) - contacts_extracted += 1 - - # Track LinkedIn URL for messaging links - if contact_data['linkedin']: - linkedin_urls.append(contact_data['linkedin']) - - # Remember this contact's name for the next iteration - if has_name: - previous_contact_name = f"{contact_data['first']} {contact_data['last']}".strip() - - # Success message with what we got - name_str = f"{contact_data['first']} {contact_data['last']}" if has_name else "[No name]" - linkedin_str = "βœ“ LinkedIn" if has_linkedin else "βœ— 
No LinkedIn" - role_str = f"({contact_data['role']})" if contact_data['role'] else "(No role)" - print(f"βœ… Contact {contact_num}/20 saved: {name_str} {role_str} | {linkedin_str}") - else: - print(f"⚠️ Could not extract valid data for contact {contact_num}") - print(f" Got: first='{contact_data['first']}', last='{contact_data['last']}', linkedin='{contact_data['linkedin'][:50] if contact_data['linkedin'] else 'None'}'") - print(f" Check the agent's output above to see what was returned") - print(f" Total output items: {len(all_output)}") - - # Progress update every 5 contacts - if contact_num % 5 == 0: - print(f"\\nπŸ“ˆ Progress: {contacts_extracted}/{contact_num} contacts extracted so far...\\n") - - # BONUS: Create messaging compose links file - messaging_filename = f"linkedin_messaging_links_{timestamp}.txt" - messaging_path = os.path.join(os.getcwd(), messaging_filename) - - with open(messaging_path, 'w', encoding='utf-8') as txtfile: - txtfile.write("LinkedIn Messaging Compose Links\\n") - txtfile.write("=" * 80 + "\\n\\n") - - for i, linkedin_url in enumerate(linkedin_urls, 1): - public_id = extract_public_id_from_linkedin_url(linkedin_url) - if public_id: - messaging_url = f"https://www.linkedin.com/messaging/compose/?recipient={public_id}" - txtfile.write(f"{i}. {messaging_url}\\n") - else: - txtfile.write(f"{i}. 
[Could not extract public ID from: {linkedin_url}]\\n") - - print("\\n" + "="*80) - print("πŸŽ‰ All tasks completed!") - print(f"πŸ“ CSV file saved to: {csv_path}") - print(f"πŸ“Š Total contacts extracted: {contacts_extracted}/20") - print(f"πŸ’¬ Bonus: Messaging links saved to: {messaging_path}") - print(f"πŸ“ Total messaging links: {len(linkedin_urls)}") - print("="*80) - - except Exception as e: - print(f"\\n❌ Error during scraping: {e}") - traceback.print_exc() - raise - -def main(): -try: -load_dotenv() - - if "ANTHROPIC_API_KEY" not in os.environ: - raise RuntimeError( - "Please set the ANTHROPIC_API_KEY environment variable.\\n" - "You can add it to a .env file in the project root." - ) - - signal.signal(signal.SIGINT, handle_sigint) - - asyncio.run(scrape_linkedin_connections()) - - except Exception as e: - print(f"\\n❌ Error running automation: {e}") - traceback.print_exc() - -if **name** == "**main**": -main()`} - - +And remove the `CUA_API_KEY` and `CUA_CONTAINER_NAME` requirements from `.env` and the validation checks. - + - -{`import asyncio -import csv -import logging -import os -import signal -import traceback -from datetime import datetime +```python +# Same code as Cloud Sandbox, but change Computer initialization to: +async with Computer( + os_type="windows", + provider_type=VMProviderType.WINDOWS_SANDBOX, + verbosity=logging.INFO, +) as computer: +``` -from agent import ComputerAgent -from computer import Computer, VMProviderType -from dotenv import load_dotenv - -logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(**name**) - -# Configuration: Define where you met these connections - -MET_AT_REASON = "`}{`" - -def handle_sigint(sig, frame): -print("\\n\\nExecution interrupted by user. Exiting gracefully...") -exit(0) - -def extract_public_id_from_linkedin_url(linkedin_url): -""" -Extract public ID from LinkedIn profile URL. -Example: https://www.linkedin.com/in/taylor-r-devries/?lipi=... 
-> taylor-r-devries -""" -if not linkedin_url: -return None - - # Remove query parameters and trailing slashes - url = linkedin_url.split('?')[0].rstrip('/') - - # Extract the part after /in/ - if '/in/' in url: - public_id = url.split('/in/')[-1] - return public_id - - return None - -def extract_contact_from_response(result_output): -""" -Extract contact information from agent's response. -Expects the agent to return data in format: -FIRST: value -LAST: value -ROLE: value -COMPANY: value -LINKEDIN: value - - Note: met_at is auto-filled from MET_AT_REASON constant. - """ - contact = { - 'first': '', - 'last': '', - 'role': '', - 'company': '', - 'met_at': MET_AT_REASON, # Auto-fill from constant - 'linkedin': '' - } - - # Collect all text from messages for debugging - all_text = [] - - for item in result_output: - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - text = content_part.get("text", "") - if text: - all_text.append(text) - # Parse structured output - look for the exact format - for line in text.split('\\n'): - line = line.strip() - # Use case-insensitive matching and handle extra whitespace - line_upper = line.upper() - - if line_upper.startswith("FIRST:"): - value = line[6:].strip() # Skip "FIRST:" prefix - if value and value.upper() != "N/A": - contact['first'] = value - elif line_upper.startswith("LAST:"): - value = line[5:].strip() # Skip "LAST:" prefix - if value and value.upper() != "N/A": - contact['last'] = value - elif line_upper.startswith("ROLE:"): - value = line[5:].strip() # Skip "ROLE:" prefix - if value and value.upper() != "N/A": - contact['role'] = value - elif line_upper.startswith("COMPANY:"): - value = line[8:].strip() # Skip "COMPANY:" prefix - if value and value.upper() != "N/A": - contact['company'] = value - elif line_upper.startswith("LINKEDIN:"): - value = line[9:].strip() # Skip "LINKEDIN:" prefix - if value and value.upper() != "N/A": - contact['linkedin'] = value - - # 
Debug logging - if not (contact['first'] or contact['last'] or contact['linkedin']): - logger.debug(f"Failed to extract. Full text content ({len(all_text)} messages):") - for i, text in enumerate(all_text[-3:]): # Show last 3 messages - logger.debug(f" Message {i}: {text[:200]}") - - return contact - -async def scrape_linkedin_connections(): -""" -Scrape the first 20 connections from LinkedIn and export to CSV. -The agent extracts data, and Python handles CSV writing programmatically. -""" - - # Generate output filename with timestamp - timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") - csv_filename = f"linkedin_connections_{timestamp}.csv" - csv_path = os.path.join(os.getcwd(), csv_filename) - - # Initialize CSV file with headers - with open(csv_path, 'w', newline='', encoding='utf-8') as csvfile: - writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) - writer.writeheader() - - print(f"\\nπŸš€ Starting LinkedIn connections scraper") - print(f"πŸ“ Output file: {csv_path}") - print(f"πŸ“ Met at: {MET_AT_REASON}") - print("=" * 80) - - try: - async with Computer( - os_type="windows", - provider_type=VMProviderType.WINDOWS_SANDBOX, - verbosity=logging.INFO, - ) as computer: - - agent = ComputerAgent( - model="anthropic/claude-sonnet-4-5-20250929", - tools=[computer], - only_n_most_recent_images=3, - verbosity=logging.INFO, - trajectory_dir="trajectories", - use_prompt_caching=True, - max_trajectory_budget=10.0, - ) - - history = [] - - # Task 1: Navigate to LinkedIn connections page - navigation_task = ( - "STEP 1 - NAVIGATE TO LINKEDIN CONNECTIONS PAGE:\\n" - "1. Open a web browser (Chrome or Firefox)\\n" - "2. Navigate to https://www.linkedin.com/mynetwork/invite-connect/connections/\\n" - "3. Wait for the page to fully load (look for the connection list to appear)\\n" - "4. If prompted to log in, handle the authentication\\n" - "5. Confirm you can see the list of connections displayed on the page\\n" - "6. 
Ready to start extracting contacts one by one" - ) - - print(f"\\n[Task 1/21] Navigating to LinkedIn connections page...") - history.append({"role": "user", "content": navigation_task}) - - async for result in agent.run(history, stream=False): - history += result.get("output", []) - for item in result.get("output", []): - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - - print(f"βœ… Navigation completed\\n") - - # Tasks 2-21: Extract each of the 20 contacts - contacts_extracted = 0 - linkedin_urls = [] # Track LinkedIn URLs for bonus messaging links - previous_contact_name = None # Track the previous contact's name for easy navigation - - for contact_num in range(1, 21): - # Build extraction task based on whether this is the first contact or not - if contact_num == 1: - # First contact - start from the top - extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Look at the very first connection at the top of the list\\n" - f"2. Click on their name/profile link to open their LinkedIn profile page\\n" - f"3. Wait for their profile page to load completely\\n" - f"4. Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' β†’ 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"5. 
CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"6. If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"7. Do NOT add any extra text before or after these 5 lines\\n" - f"8. Navigate back to the connections list page" - ) - else: - # Subsequent contacts - reference the previous contact - extraction_task = ( - f"STEP {contact_num + 1} - EXTRACT CONTACT {contact_num} OF 20:\\n" - f"1. Find the contact named '{previous_contact_name}' in the list\\n" - f"2. If you don't see '{previous_contact_name}' on the screen, scroll down slowly until you find them\\n" - f"3. Once you find '{previous_contact_name}', look at the contact directly BELOW them\\n" - f"4. Click on that contact's name/profile link (the one below '{previous_contact_name}') to open their profile page\\n" - f"5. Wait for their profile page to load completely\\n" - f"6. Extract the following information from their profile:\\n" - f" - First name: Extract from their display name at the top (just the first name)\\n" - f" - Last name: Extract from their display name at the top (just the last name)\\n" - f" - Current role/title: Extract from the HEADLINE directly under their name (e.g., 'Software Engineer')\\n" - f" - Company name: Extract from the HEADLINE (typically after 'at' or '@', e.g., 'Software Engineer at Google' β†’ 'Google')\\n" - f" - LinkedIn profile URL: Copy the FULL URL from the browser address bar (must start with https://www.linkedin.com/in/)\\n" - f"7. CRITICAL: You MUST return ALL 5 fields in this EXACT format with each field on its own line:\\n" - f"FIRST: [first name]\\n" - f"LAST: [last name]\\n" - f"ROLE: [role/title from headline]\\n" - f"COMPANY: [company from headline]\\n" - f"LINKEDIN: [full profile URL]\\n" - f"\\n" - f"8. 
If any field is not available, write 'N/A' instead of leaving it blank\\n" - f"9. Do NOT add any extra text before or after these 5 lines\\n" - f"10. Navigate back to the connections list page" - ) - - print(f"[Task {contact_num + 1}/21] Extracting contact {contact_num}/20...") - history.append({"role": "user", "content": extraction_task}) - - # Collect all output from the agent - all_output = [] - async for result in agent.run(history, stream=False): - output = result.get("output", []) - history += output - all_output.extend(output) - - # Log agent output at debug level (only shown if verbosity increased) - for item in output: - if item.get("type") == "message": - content = item.get("content", []) - for content_part in content: - if content_part.get("text"): - logger.debug(f"Agent: {content_part.get('text')}") - - # Now extract contact information from ALL collected output (not just partial results) - contact_data = extract_contact_from_response(all_output) - - # Validate we got at least the critical fields (name or LinkedIn URL) - has_name = bool(contact_data['first'] and contact_data['last']) - has_linkedin = bool(contact_data['linkedin'] and 'linkedin.com' in contact_data['linkedin']) - - # Write to CSV if we got at least name OR linkedin - if has_name or has_linkedin: - with open(csv_path, 'a', newline='', encoding='utf-8') as csvfile: - writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'role', 'company', 'met_at', 'linkedin']) - writer.writerow(contact_data) - contacts_extracted += 1 - - # Track LinkedIn URL for messaging links - if contact_data['linkedin']: - linkedin_urls.append(contact_data['linkedin']) - - # Remember this contact's name for the next iteration - if has_name: - previous_contact_name = f"{contact_data['first']} {contact_data['last']}".strip() - - # Success message with what we got - name_str = f"{contact_data['first']} {contact_data['last']}" if has_name else "[No name]" - linkedin_str = "βœ“ LinkedIn" if has_linkedin else "βœ— 
No LinkedIn" - role_str = f"({contact_data['role']})" if contact_data['role'] else "(No role)" - print(f"βœ… Contact {contact_num}/20 saved: {name_str} {role_str} | {linkedin_str}") - else: - print(f"⚠️ Could not extract valid data for contact {contact_num}") - print(f" Got: first='{contact_data['first']}', last='{contact_data['last']}', linkedin='{contact_data['linkedin'][:50] if contact_data['linkedin'] else 'None'}'") - print(f" Check the agent's output above to see what was returned") - print(f" Total output items: {len(all_output)}") - - # Progress update every 5 contacts - if contact_num % 5 == 0: - print(f"\\nπŸ“ˆ Progress: {contacts_extracted}/{contact_num} contacts extracted so far...\\n") - - # BONUS: Create messaging compose links file - messaging_filename = f"linkedin_messaging_links_{timestamp}.txt" - messaging_path = os.path.join(os.getcwd(), messaging_filename) - - with open(messaging_path, 'w', encoding='utf-8') as txtfile: - txtfile.write("LinkedIn Messaging Compose Links\\n") - txtfile.write("=" * 80 + "\\n\\n") - - for i, linkedin_url in enumerate(linkedin_urls, 1): - public_id = extract_public_id_from_linkedin_url(linkedin_url) - if public_id: - messaging_url = f"https://www.linkedin.com/messaging/compose/?recipient={public_id}" - txtfile.write(f"{i}. {messaging_url}\\n") - else: - txtfile.write(f"{i}. 
[Could not extract public ID from: {linkedin_url}]\\n") - - print("\\n" + "="*80) - print("πŸŽ‰ All tasks completed!") - print(f"πŸ“ CSV file saved to: {csv_path}") - print(f"πŸ“Š Total contacts extracted: {contacts_extracted}/20") - print(f"πŸ’¬ Bonus: Messaging links saved to: {messaging_path}") - print(f"πŸ“ Total messaging links: {len(linkedin_urls)}") - print("="*80) - - except Exception as e: - print(f"\\n❌ Error during scraping: {e}") - traceback.print_exc() - raise - -def main(): -try: -load_dotenv() - - if "ANTHROPIC_API_KEY" not in os.environ: - raise RuntimeError( - "Please set the ANTHROPIC_API_KEY environment variable.\\n" - "You can add it to a .env file in the project root." - ) - - signal.signal(signal.SIGINT, handle_sigint) - - asyncio.run(scrape_linkedin_connections()) - - except Exception as e: - print(f"\\n❌ Error running automation: {e}") - traceback.print_exc() - -if **name** == "**main**": -main()`} - - +And remove the `CUA_API_KEY` and `CUA_CONTAINER_NAME` requirements from `.env` and the validation checks. + + + + +### Run Your Script + +Execute your contact extraction automation: + +```bash +python contact_export.py +``` + +The agent will: +1. Navigate to your LinkedIn connections page +2. Extract data from 20 contacts (first name, last name, role, company, LinkedIn URL) +3. Save contacts to a timestamped CSV file +4. Generate messaging compose links for easy follow-up + +Monitor the output to see the agent's progress. The script will show a progress update every 5 contacts. + + + + + +--- + ## How It Works This script demonstrates a practical workflow for extracting LinkedIn connection data: -1. **Session Persistence** - Manually log into LinkedIn through the VM once, and the VM saves your session so the agent appears as your regular browsing. -2. **Navigation** - The script navigates to your LinkedIn connections page using your saved authenticated session. -3. 
**Data Extraction** - For each contact, the agent clicks their profile, extracts name/role/company/URL, and navigates back to repeat. -4. **Python Processing** - Python parses the agent's responses, validates data, and writes to CSV incrementally to preserve progress. -5. **Output Files** - Generates a CSV with contact data and a text file with direct messaging URLs. +1. **Session Persistence** - Manually log into LinkedIn through the VM once, and the VM saves your session +2. **Navigation** - The script navigates to your connections page using your saved authenticated session +3. **Data Extraction** - For each contact, the agent clicks their profile, extracts data, and navigates back +4. **Python Processing** - Python parses responses, validates data, and writes to CSV incrementally +5. **Output Files** - Generates a CSV with contact data and a text file with messaging URLs ## Next Steps @@ -1521,3 +470,4 @@ This script demonstrates a practical workflow for extracting LinkedIn connection - Read about [Agent loops](/agent-sdk/agent-loops), [tools](/agent-sdk/custom-tools), and [supported model providers](/agent-sdk/supported-model-providers/) - Experiment with different [Models and Providers](/agent-sdk/supported-model-providers/) - Adapt this script for other platforms (Twitter/X, email extraction, etc.) 
+- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help diff --git a/docs/content/docs/get-started/meta.json b/docs/content/docs/get-started/meta.json index f7f9fac2..a14e8acb 100644 --- a/docs/content/docs/get-started/meta.json +++ b/docs/content/docs/get-started/meta.json @@ -3,5 +3,5 @@ "description": "Get started with Cua", "defaultOpen": true, "icon": "Rocket", - "pages": ["quickstart"] + "pages": ["../index", "quickstart"] } diff --git a/docs/content/docs/get-started/quickstart.mdx b/docs/content/docs/get-started/quickstart.mdx index e0b09980..23f47085 100644 --- a/docs/content/docs/get-started/quickstart.mdx +++ b/docs/content/docs/get-started/quickstart.mdx @@ -8,7 +8,7 @@ import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; import { Code, Terminal } from 'lucide-react'; -Choose your quickstart path: +{/* Choose your quickstart path:
} href="#developer-quickstart" title="Developer Quickstart"> @@ -17,7 +17,7 @@ Choose your quickstart path: } href="#cli-quickstart" title="CLI Quickstart"> Get started quickly with the command-line interface -
+ */} --- @@ -30,11 +30,11 @@ You can run your Cua computer in the cloud (recommended for easiest setup), loca - Cua Cloud Sandbox provides sandboxes that run Linux (Ubuntu) or Windows. + Cua Cloud Sandbox provides sandboxes that run Linux (Ubuntu), Windows, or macOS. 1. Go to [cua.ai/signin](https://cua.ai/signin) 2. Navigate to **Dashboard > Containers > Create Instance** - 3. Create a **Small** sandbox, choosing either **Linux** or **Windows** + 3. Create a **Small** sandbox, choosing **Linux**, **Windows**, or **macOS** 4. Note your sandbox name and API key Your Cloud Sandbox will be automatically configured and ready to use. @@ -117,7 +117,7 @@ Connect to your Cua computer and perform basic interactions, such as taking scre from computer import Computer computer = Computer( - os_type="linux", + os_type="linux", # or "windows" or "macos" provider_type="cloud", name="your-sandbox-name", api_key="your-api-key" @@ -192,6 +192,10 @@ Connect to your Cua computer and perform basic interactions, such as taking scre + + The TypeScript interface is currently deprecated. We're working on version 0.2.0 with improved TypeScript support. In the meantime, please use the Python SDK. 
+ + Install the Cua computer TypeScript SDK: ```bash npm install @trycua/computer @@ -205,7 +209,7 @@ Connect to your Cua computer and perform basic interactions, such as taking scre import { Computer, OSType } from '@trycua/computer'; const computer = new Computer({ - osType: OSType.LINUX, + osType: OSType.LINUX, // or OSType.WINDOWS or OSType.MACOS name: "your-sandbox-name", apiKey: "your-api-key" }); @@ -328,7 +332,7 @@ Learn more about agents in [Agent Loops](/agent-sdk/agent-loops) and available m - Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help - Try out [Form Filling](/example-usecases/form-filling) preset usecase ---- +{/* --- ## CLI Quickstart @@ -354,7 +358,7 @@ Get started quickly with the CUA CLI - the easiest way to manage cloud sandboxes ```bash # Install Bun if you don't have it curl -fsSL https://bun.sh/install | bash - + # Install CUA CLI bun add -g @trycua/cli ``` @@ -467,4 +471,4 @@ cua delete my-vm-abc123 --- -For running models locally, see [Running Models Locally](/agent-sdk/supported-model-providers/local-models). +For running models locally, see [Running Models Locally](/agent-sdk/supported-model-providers/local-models). */} diff --git a/docs/content/docs/index.mdx b/docs/content/docs/index.mdx index 9c47c293..acecca6d 100644 --- a/docs/content/docs/index.mdx +++ b/docs/content/docs/index.mdx @@ -4,15 +4,9 @@ title: Introduction import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react'; - - +
Cua is an open-source framework for building **Computer-Use Agents** - AI systems that see, understand, and interact with desktop applications through vision and action, just like humans do. - -
- -Go from prototype to production with everything you need: multi-provider LLM support, cross-platform sandboxes, and trajectory tracing. Whether you're running locally or deploying to the cloud, Cua gives you the tools to build reliable computer-use agents. - - +
## Why Cua? @@ -46,14 +40,14 @@ Follow the [Quickstart guide](/docs/get-started/quickstart) for step-by-step set If you're new to computer-use agents, check out our [tutorials](https://cua.ai/blog), [examples](https://github.com/trycua/cua/tree/main/examples), and [notebooks](https://github.com/trycua/cua/tree/main/notebooks) to start building with Cua today.
- } href="/docs/get-started/quickstart" title="Quickstart"> + } href="/get-started/quickstart" title="Quickstart"> Get up and running in 3 steps with Python or TypeScript. - } href="/agent-sdk/agent-loops" title="Learn Core Concepts"> - Understand agent loops, callbacks, and model composition. + } href="/agent-sdk/agent-loops" title="Agent Loops"> + Learn how agents work and how to build your own. - } href="/libraries/agent" title="API Reference"> - Explore the full Agent SDK and Computer SDK APIs. + } href="/computer-sdk/computers" title="Computer SDK"> + Control desktop applications with the Computer SDK. } href="/example-usecases/form-filling" title="Example Use Cases"> See Cua in action with real-world examples. diff --git a/docs/content/docs/libraries/computer-server/index.mdx b/docs/content/docs/libraries/computer-server/index.mdx index d5affd25..e2f683dd 100644 --- a/docs/content/docs/libraries/computer-server/index.mdx +++ b/docs/content/docs/libraries/computer-server/index.mdx @@ -7,14 +7,7 @@ github: --- - A corresponding{' '} - - Jupyter Notebook - {' '} - is available for this documentation. + A corresponding Jupyter Notebook is available for this documentation. The Computer Server API reference documentation is currently under development. diff --git a/docs/content/docs/libraries/som/index.mdx b/docs/content/docs/libraries/som/index.mdx index 3eef53f1..7a210290 100644 --- a/docs/content/docs/libraries/som/index.mdx +++ b/docs/content/docs/libraries/som/index.mdx @@ -7,11 +7,7 @@ github: --- - A corresponding{' '} - - Python example - {' '} - is available for this documentation. + A corresponding Python example is available for this documentation. 
## Overview diff --git a/docs/content/docs/meta.json b/docs/content/docs/meta.json index 30e90eb3..199556f1 100644 --- a/docs/content/docs/meta.json +++ b/docs/content/docs/meta.json @@ -4,7 +4,6 @@ "root": true, "defaultOpen": true, "pages": [ - "index", "---[Rocket]Get Started---", "...get-started", "---[ChefHat]Cookbook---", diff --git a/docs/src/app/layout.config.tsx b/docs/src/app/layout.config.tsx index 6d8e9e38..f47250c5 100644 --- a/docs/src/app/layout.config.tsx +++ b/docs/src/app/layout.config.tsx @@ -37,6 +37,7 @@ export const baseOptions: BaseLayoutProps = { Cua ), + url: 'https://cua.ai', }, githubUrl: 'https://github.com/trycua/cua', links: [ From 5983a9b849f2bc567ca8989c53f2d7b27ee0f3a0 Mon Sep 17 00:00:00 2001 From: f-trycua Date: Mon, 17 Nov 2025 17:26:41 +0100 Subject: [PATCH 2/4] Add blogpost and doc --- blog/cloud-windows-ga-macos-preview.md | 119 ++++ docs/content/docs/example-usecases/meta.json | 2 +- .../windows-app-behind-vpn.mdx | 615 ++++++++++++++++++ 3 files changed, 735 insertions(+), 1 deletion(-) create mode 100644 blog/cloud-windows-ga-macos-preview.md create mode 100644 docs/content/docs/example-usecases/windows-app-behind-vpn.mdx diff --git a/blog/cloud-windows-ga-macos-preview.md b/blog/cloud-windows-ga-macos-preview.md new file mode 100644 index 00000000..d1024af1 --- /dev/null +++ b/blog/cloud-windows-ga-macos-preview.md @@ -0,0 +1,119 @@ +# Cloud Windows Sandboxes GA + macOS Preview + +If you've been building with our `cua` libraries, you might've hit a limitation with local computer-use sandboxes: to run agents on Windows or macOS, you need to be on that OS (Windows Sandbox for Windows, Apple Virtualization for macOS). The only cross-platform option is Linux on Docker, which limits you to virtualizing Linux environments ([see all local options here](https://cua.ai/docs/computer-sdk/computers)). 
+ +Today the story changes - we're announcing general availability of **Cloud Windows Sandboxes** and opening early preview access for **Cloud macOS Sandboxes**. + +## Cloud Windows Sandboxes: Now GA + +![Cloud Windows Sandboxes](./assets/cloud-windows-ga.png) + +Cloud Windows Sandboxes are now generally available. You get a full Windows 11 desktop in your browser with Edge and Python pre-installed, working seamlessly with all our [Computer-Use libraries](https://github.com/trycua/cua) for RPA, UI automation, code execution, and agent development. + +**What's new with this release:** +- Hot-start under 1 second +- Direct noVNC over HTTPS under our sandbox.cua.ai domain +- 3 sandbox sizes available: + +| Size | CPU | RAM | Storage | +|------|-----|-----|---------| +| Small | 2 cores | 8 GB | 128 GB SSD | +| Medium | 4 cores | 16 GB | 128 GB SSD | +| Large | 8 cores | 32 GB | 256 GB SSD | + +
+ +
+ +**Pricing:** Windows Sandboxes cost 8 credits/hour (Small), 15 credits/hour (Medium), or 31 credits/hour (Large). + +## Cloud macOS Sandboxes: Now in Preview + +Running macOS locally comes with challenges: 30GB golden images, a maximum of 2 sandboxes per host, and unpredictable compatibility issues. With Cloud macOS Sandboxes, we provision bare-metal macOS hosts (M1, M2, M4) on demand, giving you full desktop access without the overhead of managing local sandboxes. + +![macOS Preview Waitlist](./assets/macOS-waitlist.png) + +**Preview access:** Invite-only. [Join the waitlist](https://cua.ai/macos-waitlist) if you're building agents for macOS workflows. + +## Getting Started Today + +Sign up at [cua.ai/signin](https://cua.ai/signin) and grab your API key from the dashboard. Then connect to a sandbox: + +```python +from computer import Computer + +computer = Computer( + os_type="windows", # or "macos" + provider_type="cloud", + name="my-sandbox", + api_key="your-api-key" +) + +await computer.run() +``` + +Manage existing sandboxes: + +```python +from computer.providers.cloud.provider import CloudProvider + +provider = CloudProvider(api_key="your-api-key") +async with provider: + sandboxes = await provider.list_vms() + await provider.run_vm("my-sandbox") + await provider.stop_vm("my-sandbox") +``` + +Run an agent on Windows to automate a workflow: + +```python +from agent import ComputerAgent + +agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + max_trajectory_budget=5.0 +) + +# agent.run() is an async generator: iterate over it instead of awaiting it +async for result in agent.run( + "Open Excel, create a sales report with this month's data, and save it to the desktop" +): + print(result.get("output", [])) +``` + +## FAQs + +
+Why not just use local Windows Sandbox? + +Local Windows Sandbox resets on every restart. No persistence, no hot-start, and you need Windows Pro. Our sandboxes persist state, hot-start in under a second, and work from any OS. + +
+ +
+What happens to my work when I stop a sandbox? + +Everything persists. Files, installed software, browser profiles: it's all there when you restart. Only pay for runtime, not storage. + +
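As a rough illustration of runtime-only billing, the per-hour rates quoted in the Pricing note (8, 15, and 31 credits/hour for Small, Medium, and Large) can be turned into a quick estimate. This is only a sketch: it assumes cost scales linearly with runtime, the actual billing granularity is not stated in this post, and `estimate_credits` is a hypothetical helper, not part of the Cua SDK.

```python
# Hypothetical helper for back-of-the-envelope estimates; not part of the Cua SDK.
# Rates are the Windows Sandbox credits/hour figures quoted in this post.
CREDITS_PER_HOUR = {"small": 8, "medium": 15, "large": 31}


def estimate_credits(size: str, runtime_hours: float) -> float:
    """Estimate credits used, assuming cost scales linearly with runtime."""
    if runtime_hours < 0:
        raise ValueError("runtime_hours must be non-negative")
    try:
        rate = CREDITS_PER_HOUR[size.lower()]
    except KeyError:
        raise ValueError(f"unknown sandbox size: {size!r}") from None
    return rate * runtime_hours


if __name__ == "__main__":
    # A Medium sandbox running 6 hours a day for 5 days
    print(estimate_credits("medium", 6 * 5))
```

Time spent stopped contributes nothing to the estimate, which matches the "runtime, not storage" statement above.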
+ +
+How's the latency for UI automation? + +We run in 4 regions so you can pick what's closest. The noVNC connection is optimized for automation, not video streaming. Your agent sees crisp screenshots, not compressed video. + +
+ +
+Are there software restrictions? + +No. Full admin access on both platforms. Install whatever you need: Visual Studio, Photoshop, custom enterprise software. It's your sandbox. + +
+ +## Need help? + +If you hit issues getting either platform working, reach out in [Discord](https://discord.gg/cua-ai). We respond fast and prioritize fixes based on what people actually use. + +--- + +Get started at [cua.ai](https://cua.ai) or [join the macOS waitlist](https://cua.ai/macos-waitlist). diff --git a/docs/content/docs/example-usecases/meta.json b/docs/content/docs/example-usecases/meta.json index c7ec3895..bfc88f1c 100644 --- a/docs/content/docs/example-usecases/meta.json +++ b/docs/content/docs/example-usecases/meta.json @@ -1,5 +1,5 @@ { "title": "Cookbook", "description": "Real-world examples of building with Cua", - "pages": ["form-filling", "post-event-contact-export"] + "pages": ["windows-app-behind-vpn", "form-filling", "post-event-contact-export"] } diff --git a/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx b/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx new file mode 100644 index 00000000..e8f31617 --- /dev/null +++ b/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx @@ -0,0 +1,615 @@ +--- +title: Windows App behind VPN +description: Automate legacy Windows desktop applications behind VPN with Cua +--- + +import { Step, Steps } from 'fumadocs-ui/components/steps'; +import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; + +## Overview + +This guide demonstrates how to automate Windows desktop applications (like eGecko HR/payroll systems) that run behind a corporate VPN. This is a common enterprise scenario where legacy desktop applications require manual data entry, report generation, or workflow execution. 
+ +**Use cases:** +- HR/payroll processing (employee onboarding, payroll runs, benefits administration) +- Desktop ERP systems behind corporate networks +- Legacy financial applications requiring VPN access +- Compliance reporting from on-premise systems + +**Architecture:** +- Client-side Cua agent (Python SDK or Playground UI) +- Windows VM/Sandbox with VPN client configured +- RDP/remote desktop connection to target environment +- Desktop application automation via computer vision and UI control + + + **Production Deployment**: For production use, consider workflow mining and custom finetuning to create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI automation. This provides better audit trails and higher success rates. + + +--- + +## Video Demo + +
+ +
+ Demo showing Cua automating an eGecko-like desktop application on Windows behind AWS VPN +
+
+ +--- + + + + + +### Set Up Your Environment + +Install the required dependencies: + +Create a `requirements.txt` file: + +```text +cua-agent +cua-computer +python-dotenv>=1.0.0 +``` + +Install the dependencies: + +```bash +pip install -r requirements.txt +``` + +Create a `.env` file with your API keys: + +```text +ANTHROPIC_API_KEY=your-anthropic-api-key +CUA_API_KEY=sk_cua-api01... +CUA_SANDBOX_NAME=your-windows-sandbox +``` + + + + + +### Configure Windows Sandbox with VPN + + + + +For enterprise deployments, use Cua Cloud Sandbox with pre-configured VPN: + +1. Go to [cua.ai/signin](https://cua.ai/signin) +2. Navigate to **Dashboard > Containers > Create Instance** +3. Create a **Windows** sandbox (Medium or Large for desktop apps) +4. Configure VPN settings: + - Upload your AWS VPN Client configuration (`.ovpn` file) + - Or configure VPN credentials directly in the dashboard +5. Note your sandbox name and API key + +Your Windows sandbox will launch with VPN automatically connected. + + + + +For local development on Windows 10 Pro/Enterprise or Windows 11: + +1. Enable [Windows Sandbox](https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-install) +2. Install the `pywinsandbox` dependency: + ```bash + pip install -U git+https://github.com/karkason/pywinsandbox.git + ``` +3. Create a VPN setup script that runs on sandbox startup +4. Configure your desktop application installation within the sandbox + + + **Manual VPN Setup**: Windows Sandbox requires manual VPN configuration each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections. + + + + + +For self-managed infrastructure: + +1. Deploy Windows VM on your preferred cloud (AWS, Azure, GCP) +2. Install and configure VPN client (AWS VPN Client, OpenVPN, etc.) +3. Install target desktop application and any dependencies +4. 
Install `cua-computer-server`: + ```bash + pip install cua-computer-server + python -m computer_server + ``` +5. Configure firewall rules to allow Cua agent connections + + + + + + + + +### Create Your Automation Script + +Create a Python file (e.g., `hr_automation.py`): + + + + +```python +import asyncio +import logging +import os +from agent import ComputerAgent +from computer import Computer, VMProviderType +from dotenv import load_dotenv + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +load_dotenv() + +async def automate_hr_workflow(): + """ + Automate HR/payroll desktop application workflow. + + This example demonstrates: + - Launching Windows desktop application + - Navigating complex desktop UI + - Data entry and form filling + - Report generation and export + """ + try: + # Connect to Windows Cloud Sandbox with VPN + async with Computer( + os_type="windows", + provider_type=VMProviderType.CLOUD, + name=os.environ["CUA_SANDBOX_NAME"], + api_key=os.environ["CUA_API_KEY"], + verbosity=logging.INFO, + ) as computer: + + # Configure agent with specialized instructions + agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + only_n_most_recent_images=3, + verbosity=logging.INFO, + trajectory_dir="trajectories", + use_prompt_caching=True, + max_trajectory_budget=10.0, + instructions=""" +You are automating a Windows desktop HR/payroll application. + +IMPORTANT GUIDELINES: +- Always wait for windows and dialogs to fully load before interacting +- Look for loading indicators and wait for them to disappear +- Verify each action by checking on-screen confirmation messages +- If a button or field is not visible, try scrolling or navigating tabs +- Desktop apps often have nested menus - explore systematically +- Save work frequently using File > Save or Ctrl+S +- Before closing, always verify changes were saved + +COMMON UI PATTERNS: +- Menu bar navigation (File, Edit, View, etc.) 
+- Ribbon interfaces with tabs +- Modal dialogs that block interaction +- Data grids/tables for viewing records +- Form fields with validation +- Status bars showing operation progress + """.strip() + ) + + # Define workflow tasks + tasks = [ + "Launch the HR application from the desktop or start menu", + "Log in with the credentials shown in credentials.txt on the desktop", + "Navigate to Employee Management section", + "Create a new employee record with information from new_hire.xlsx on desktop", + "Verify the employee was created successfully by searching for their name", + "Generate an onboarding report for the new employee", + "Export the report as PDF to the desktop", + "Log out of the application" + ] + + history = [] + + for task in tasks: + logger.info(f"\n{'='*60}") + logger.info(f"Task: {task}") + logger.info(f"{'='*60}\n") + + history.append({"role": "user", "content": task}) + + async for result in agent.run(history): + for item in result.get("output", []): + if item.get("type") == "message": + content = item.get("content", []) + for block in content: + if block.get("type") == "text": + response = block.get("text", "") + logger.info(f"Agent: {response}") + history.append({"role": "assistant", "content": response}) + + logger.info("\nTask completed. 
Moving to next task...\n") + + logger.info("\n" + "="*60) + logger.info("All tasks completed successfully!") + logger.info("="*60) + + except Exception as e: + logger.error(f"Error during automation: {e}") + import traceback + traceback.print_exc() + +if __name__ == "__main__": + asyncio.run(automate_hr_workflow()) +``` + + + + +```python +import asyncio +import logging +import os +from agent import ComputerAgent +from computer import Computer, VMProviderType +from dotenv import load_dotenv + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +load_dotenv() + +async def automate_hr_workflow(): + try: + # Connect to Windows Sandbox + async with Computer( + os_type="windows", + provider_type=VMProviderType.WINDOWS_SANDBOX, + verbosity=logging.INFO, + ) as computer: + + agent = ComputerAgent( + model="anthropic/claude-sonnet-4-5-20250929", + tools=[computer], + only_n_most_recent_images=3, + verbosity=logging.INFO, + trajectory_dir="trajectories", + use_prompt_caching=True, + max_trajectory_budget=10.0, + instructions=""" +You are automating a Windows desktop HR/payroll application. 
+
+IMPORTANT GUIDELINES:
+- Always wait for windows and dialogs to fully load before interacting
+- Verify each action by checking on-screen confirmation messages
+- Desktop apps often have nested menus - explore systematically
+- Save work frequently using File > Save or Ctrl+S
+                """.strip()
+            )
+
+            tasks = [
+                "Launch the HR application from the desktop",
+                "Log in with credentials from credentials.txt on desktop",
+                "Navigate to Employee Management and create new employee from new_hire.xlsx",
+                "Generate and export onboarding report as PDF",
+                "Log out of the application"
+            ]
+
+            history = []
+
+            for task in tasks:
+                logger.info(f"\nTask: {task}")
+                history.append({"role": "user", "content": task})
+
+                async for result in agent.run(history):
+                    for item in result.get("output", []):
+                        if item.get("type") == "message":
+                            content = item.get("content", [])
+                            for block in content:
+                                if block.get("type") == "text":
+                                    response = block.get("text", "")
+                                    logger.info(f"Agent: {response}")
+                                    history.append({"role": "assistant", "content": response})
+
+            logger.info("\nAll tasks completed!")
+
+    except Exception as e:
+        logger.error(f"Error: {e}")
+        import traceback
+        traceback.print_exc()
+
+if __name__ == "__main__":
+    asyncio.run(automate_hr_workflow())
+```
+
+
+
+
+```python
+import asyncio
+import logging
+import os
+from agent import ComputerAgent
+from computer import Computer
+from dotenv import load_dotenv
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+load_dotenv()
+
+async def automate_hr_workflow():
+    try:
+        # Connect to self-hosted Windows VM running computer-server
+        async with Computer(
+            use_host_computer_server=True,
+            base_url="http://your-windows-vm-ip:5757",  # Update with your VM IP
+            verbosity=logging.INFO,
+        ) as computer:
+
+            agent = ComputerAgent(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                tools=[computer],
+                only_n_most_recent_images=3,
+                verbosity=logging.INFO,
+                trajectory_dir="trajectories",
+                use_prompt_caching=True,
+                max_trajectory_budget=10.0,
+                instructions="""
+You are automating a Windows desktop HR/payroll application.
+
+IMPORTANT GUIDELINES:
+- Always wait for windows and dialogs to fully load before interacting
+- Verify each action by checking on-screen confirmation messages
+- Save work frequently using File > Save or Ctrl+S
+                """.strip()
+            )
+
+            tasks = [
+                "Launch the HR application",
+                "Log in with provided credentials",
+                "Complete the required HR workflow",
+                "Generate and export report",
+                "Log out"
+            ]
+
+            history = []
+
+            for task in tasks:
+                logger.info(f"\nTask: {task}")
+                history.append({"role": "user", "content": task})
+
+                async for result in agent.run(history):
+                    for item in result.get("output", []):
+                        if item.get("type") == "message":
+                            content = item.get("content", [])
+                            for block in content:
+                                if block.get("type") == "text":
+                                    response = block.get("text", "")
+                                    logger.info(f"Agent: {response}")
+                                    history.append({"role": "assistant", "content": response})
+
+            logger.info("\nAll tasks completed!")
+
+    except Exception as e:
+        logger.error(f"Error: {e}")
+        import traceback
+        traceback.print_exc()
+
+if __name__ == "__main__":
+    asyncio.run(automate_hr_workflow())
+```
+
+
+
+
+
+
+
+
+### Run Your Automation
+
+Execute the script:
+
+```bash
+python hr_automation.py
+```
+
+The agent will:
+1. Connect to your Windows environment (with VPN if configured)
+2. Launch and navigate the desktop application
+3. Execute each workflow step sequentially
+4. Verify actions and handle errors
+5. Save trajectory logs for audit and debugging
+
+Monitor the console output to see the agent's progress through each task.
+
+
+
+
+
+---
+
+## Key Configuration Options
+
+### Agent Instructions
+
+The `instructions` parameter is critical for reliable desktop automation:
+
+```python
+instructions="""
+You are automating a Windows desktop HR/payroll application.
+
+IMPORTANT GUIDELINES:
+- Always wait for windows and dialogs to fully load before interacting
+- Look for loading indicators and wait for them to disappear
+- Verify each action by checking on-screen confirmation messages
+- If a button or field is not visible, try scrolling or navigating tabs
+- Desktop apps often have nested menus - explore systematically
+- Save work frequently using File > Save or Ctrl+S
+- Before closing, always verify changes were saved
+
+COMMON UI PATTERNS:
+- Menu bar navigation (File, Edit, View, etc.)
+- Ribbon interfaces with tabs
+- Modal dialogs that block interaction
+- Data grids/tables for viewing records
+- Form fields with validation
+- Status bars showing operation progress
+
+APPLICATION-SPECIFIC:
+- Login is at top-left corner
+- Employee records are under "HR Management" > "Employees"
+- Reports are generated via "Tools" > "Reports" > "Generate"
+- Always click "Save" before navigating away from a form
+""".strip()
+```
+
+### Budget Management
+
+For long-running workflows, adjust budget limits:
+
+```python
+agent = ComputerAgent(
+    model="anthropic/claude-sonnet-4-5-20250929",
+    tools=[computer],
+    max_trajectory_budget=20.0,  # Increase for complex workflows
+    # ... other params
+)
+```
+
+### Image Retention
+
+Balance context and cost by retaining only recent screenshots:
+
+```python
+agent = ComputerAgent(
+    # ...
+    only_n_most_recent_images=3,  # Keep last 3 screenshots
+    # ...
+)
+```
+
+---
+
+## Production Considerations
+
+
+  For enterprise production deployments, consider these additional steps:
+
+
+### 1. Workflow Mining
+
+Before deploying, analyze your actual workflows:
+- Record user interactions with the application
+- Identify common patterns and edge cases
+- Map out decision trees and validation requirements
+- Document application-specific quirks and timing issues
+
+### 2. Custom Finetuning
+
+Create vertical-specific actions instead of generic UI automation:
+
+```python
+# Instead of generic steps:
+tasks = ["Click login", "Type username", "Type password", "Click submit"]
+
+# Create semantic actions:
+tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]
+```
+
+This provides:
+- Better audit trails
+- Approval gates at business logic level
+- Higher success rates
+- Easier maintenance and updates
+
+### 3. Human-in-the-Loop
+
+Add approval gates for critical operations:
+
+```python
+agent = ComputerAgent(
+    model="anthropic/claude-sonnet-4-5-20250929",
+    tools=[computer],
+    # Add human approval callback for sensitive operations
+    callbacks=[ApprovalCallback(require_approval_for=["payroll", "termination"])]
+)
+```
+
+### 4. Deployment Options
+
+Choose your deployment model:
+
+**Managed (Recommended)**
+- Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
+- You get UI/API endpoints for triggering workflows
+- Automatic scaling, monitoring, and maintenance
+- SLA guarantees and enterprise support
+
+**Self-Hosted**
+- You manage Windows VMs, VPN infrastructure, and agent deployment
+- Full control over data and security
+- Custom network configurations
+- On-premise or your preferred cloud
+
+---
+
+## Troubleshooting
+
+### VPN Connection Issues
+
+If the agent cannot reach the application:
+
+1. Verify VPN is connected: Check VPN client status in the Windows sandbox
+2. Test network connectivity: Try pinging internal resources
+3. Check firewall rules: Ensure RDP and application ports are open
+4. Review VPN logs: Look for authentication or routing errors
+
+### Application Not Launching
+
+If the desktop application fails to start:
+
+1. Verify installation: Check the application is installed in the sandbox
+2. Check dependencies: Ensure all required DLLs and frameworks are present
+3. Review permissions: Application may require admin rights
+4. Check logs: Look for error messages in Windows Event Viewer
+
+### UI Element Not Found
+
+If the agent cannot find buttons or fields:
+
+1. Increase wait times: Some applications load slowly
+2. Check screen resolution: UI elements may be off-screen
+3. Verify DPI scaling: High DPI settings can affect element positions
+4. Update instructions: Provide more specific navigation guidance
+
+### Cost Management
+
+If costs are higher than expected:
+
+1. Reduce `max_trajectory_budget`
+2. Decrease `only_n_most_recent_images`
+3. Use prompt caching: Set `use_prompt_caching=True`
+4. Optimize task descriptions: Be more specific to reduce retry attempts
+
+---
+
+## Next Steps
+
+- **Explore custom tools**: Learn how to create [custom tools](/agent-sdk/custom-tools) for application-specific actions
+- **Implement callbacks**: Add [monitoring and logging](/agent-sdk/callbacks) for production workflows
+- **Join community**: Get help in our [Discord](https://discord.com/invite/mVnXXpdE85)
+
+---
+
+## Related Examples
+
+- [Form Filling](/example-usecases/form-filling) - Web form automation
+- [Post-Event Contact Export](/example-usecases/post-event-contact-export) - Data extraction workflows
+- [Custom Tools](/agent-sdk/custom-tools) - Building application-specific functions

From e824d565be0b02aa59d7d60f12acadd8805a3b29 Mon Sep 17 00:00:00 2001
From: Francesco Bonacci
Date: Mon, 17 Nov 2025 17:33:17 +0100
Subject: [PATCH 3/4] Update images and links in macOS preview blog post

---
 blog/cloud-windows-ga-macos-preview.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/blog/cloud-windows-ga-macos-preview.md b/blog/cloud-windows-ga-macos-preview.md
index d1024af1..ff14d9a9 100644
--- a/blog/cloud-windows-ga-macos-preview.md
+++ b/blog/cloud-windows-ga-macos-preview.md
@@ -1,12 +1,12 @@
 # Cloud Windows Sandboxes GA + macOS Preview

-If you've been building with our `cua` libraries, you might've hit a limitation with local computer-use sandboxes: to run agents on Windows or macOS, you need to be on that OS—Windows Sandbox for Windows, Apple Virtualization for macOS. The only cross-platform option is Linux on Docker, which limits you to virtualizing Linux environments ([see all local options here](https://cua.ai/docs/computer-sdk/computers)).
+If you've been building with our `cua` libraries, you might've hit a limitation with local computer-use sandboxes: to run agents on Windows or macOS, you need to be on that OS - Windows Sandbox for Windows, Apple Virtualization for macOS. The only cross-platform option is Linux on Docker, which limits you to virtualizing Linux environments ([see all local options here](https://cua.ai/docs/computer-sdk/computers)).

 Today the story changes - we're announcing general availability of **Cloud Windows Sandboxes** and opening early preview access for **Cloud macOS Sandboxes**.

 ## Cloud Windows Sandboxes: Now GA

-![Cloud Windows Sandboxes](./assets/cloud-windows-ga.png)
+![Cloud Windows Sandboxes](https://github.com/user-attachments/assets/db15f4c4-70a4-425a-a264-82e629074de7)

 Cloud Windows Sandboxes are now generally available. You get a full Windows 11 desktop in your browser with Edge and Python pre-installed, working seamlessly with all our [Computer-Use libraries](https://github.com/trycua/cua) for RPA, UI automation, code execution, and agent development.

@@ -22,7 +22,7 @@ Cloud Windows Sandboxes are now generally available. You get a full Windows 11 d
 | Large | 8 cores | 32 GB | 256 GB SSD |
- +
 **Pricing:** Windows Sandboxes start at 8 credits/hour (Small), 15 credits/hour (Medium), or 31 credits/hour (Large).

@@ -31,7 +31,7 @@ Cloud Windows Sandboxes are now generally available. You get a full Windows 11 d
 Running macOS locally comes with challenges: 30GB golden images, a maximum of 2 sandboxes per host, and unpredictable compatibility issues. With Cloud macOS Sandboxes, we provision bare-metal macOS hosts (M1, M2, M4) on-demand—giving you full desktop access without the overhead of managing local sandboxes.

-![macOS Preview Waitlist](./assets/macOS-waitlist.png)
+![macOS Preview Waitlist](https://github.com/user-attachments/assets/343c9a3f-59d8-4b1a-bba8-6af91e8a9cf0)

 **Preview access:** Invite-only. [Join the waitlist](https://cua.ai/macos-waitlist) if you're building agents for macOS workflows.

From 5b89f0e4366d8ded8411f2397c578b19ef5ff68d Mon Sep 17 00:00:00 2001
From: Francesco Bonacci
Date: Mon, 17 Nov 2025 17:35:56 +0100
Subject: [PATCH 4/4] Change video source in demo section

Updated video source for Windows app demo behind VPN.
---
 docs/content/docs/example-usecases/windows-app-behind-vpn.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx b/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx
index e8f31617..7d8d3c81 100644
--- a/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx
+++ b/docs/content/docs/example-usecases/windows-app-behind-vpn.mdx
@@ -31,7 +31,7 @@ This guide demonstrates how to automate Windows desktop applications (like eGeck
 ## Video Demo
-