Merge branch 'main' into models/opencua

Dillon DuPont
2025-09-15 15:11:15 -04:00
35 changed files with 9754 additions and 137 deletions
+201
@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Customizing Your ComputerAgent\n",
"\n",
"This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n",
"\n",
"1. Simple: Prompt engineering (via optional `instructions`)\n",
"2. Easy: Tools (function tools and custom computer tools)\n",
"3. Intermediate: Callbacks\n",
"4. Expert: Custom `@register_agent` loops\n",
"\n",
"> Tip: The same patterns work in scripts and services — the notebook just makes it easy to iterate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"We'll import `ComputerAgent`, a simple Docker-based computer, and some utilities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from agent.agent import ComputerAgent\n",
"from agent.callbacks import LoggingCallback\n",
"from computer import Computer\n",
"\n",
"computer = Computer(\n",
"    os_type=\"linux\",\n",
"    provider_type=\"docker\",\n",
"    image=\"trycua/cua-ubuntu:latest\",\n",
"    name=\"my-cua-container\"\n",
")\n",
"\n",
"await computer.run() # Launch & connect to Docker container"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1) Simple: Prompt engineering\n",
"\n",
"You can guide your agent with system-like `instructions`.\n",
"\n",
"Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n",
"\n",
"This mirrors the recommended snippet in code:\n",
"\n",
"```python\n",
"effective_input = full_input\n",
"if instructions:\n",
"    effective_input = [{\"role\": \"user\", \"content\": instructions}] + full_input\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"instructions = (\n",
"    \"You are a meticulous software operator. Prefer safe, deterministic actions. \"\n",
"    \"Always confirm via on-screen text before proceeding.\"\n",
")\n",
"agent = ComputerAgent(\n",
"    model=\"openai/computer-use-preview\",\n",
"    tools=[computer],\n",
"    instructions=instructions,\n",
"    callbacks=[LoggingCallback(level=logging.INFO)],\n",
")\n",
"messages = [\n",
"    {\"role\": \"user\", \"content\": \"Open the settings and turn on dark mode.\"}\n",
"]\n",
"\n",
"# In notebooks, you may want to consume the async generator\n",
"import asyncio\n",
"async def run_once():\n",
"    async for chunk in agent.run(messages):\n",
"        # Print any assistant text outputs\n",
"        for item in chunk.get(\"output\", []):\n",
"            if item.get(\"type\") == \"message\":\n",
"                for c in item.get(\"content\", []):\n",
"                    if c.get(\"text\"):\n",
"                        print(c.get(\"text\"))\n",
"\n",
"await run_once()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2) Easy: Tools\n",
"\n",
"Add function tools to expose deterministic capabilities. Tools are auto-extracted to schemas and callable by the agent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def calculate_percentage(numerator: float, denominator: float) -> str:\n",
"    \"\"\"Calculate a percentage string.\n",
"\n",
"    Args:\n",
"        numerator: Numerator value\n",
"        denominator: Denominator value\n",
"    Returns:\n",
"        A formatted percentage string (e.g., '75.00%').\n",
"    \"\"\"\n",
"    if denominator == 0:\n",
"        return \"0.00%\"\n",
"    return f\"{(numerator/denominator)*100:.2f}%\"\n",
"\n",
"agent_with_tool = ComputerAgent(\n",
"    model=\"openai/computer-use-preview\",\n",
"    tools=[computer, calculate_percentage],\n",
"    instructions=\"When doing math, prefer the `calculate_percentage` tool when relevant.\",\n",
")\n"
]
},
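{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make \"auto-extracted to schemas\" concrete, here is a minimal sketch of how a function's signature and docstring could be turned into a tool schema. It is an illustration only, not the SDK's actual extraction code: `function_to_schema`, `_JSON_TYPES`, the example `add` function, and the output layout are all assumptions made for this example.\n",
"\n",
"```python\n",
"import inspect\n",
"from typing import Any, Callable, Dict\n",
"\n",
"# Assumed mapping from Python annotations to JSON Schema type names.\n",
"_JSON_TYPES = {str: \"string\", int: \"integer\", float: \"number\", bool: \"boolean\"}\n",
"\n",
"\n",
"def function_to_schema(func: Callable[..., Any]) -> Dict[str, Any]:\n",
"    \"\"\"Build a tool-style schema from a function's signature and docstring.\"\"\"\n",
"    properties: Dict[str, Any] = {}\n",
"    required = []\n",
"    for name, param in inspect.signature(func).parameters.items():\n",
"        # Unannotated or unknown types fall back to \"string\" in this sketch.\n",
"        properties[name] = {\"type\": _JSON_TYPES.get(param.annotation, \"string\")}\n",
"        if param.default is inspect.Parameter.empty:\n",
"            required.append(name)\n",
"    doc = inspect.getdoc(func) or \"\"\n",
"    return {\n",
"        \"name\": func.__name__,\n",
"        \"description\": doc.splitlines()[0] if doc else \"\",\n",
"        \"parameters\": {\n",
"            \"type\": \"object\",\n",
"            \"properties\": properties,\n",
"            \"required\": required,\n",
"        },\n",
"    }\n",
"\n",
"\n",
"def add(a: float, b: float = 0.0) -> float:\n",
"    \"\"\"Add two numbers.\"\"\"\n",
"    return a + b\n",
"\n",
"\n",
"schema = function_to_schema(add)\n",
"print(schema[\"name\"], schema[\"parameters\"][\"required\"])\n",
"```\n",
"\n",
"In a scheme like this, the type hints and the first docstring line carry most of what the model sees, so it is worth annotating and documenting every tool parameter."
]
},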
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Intermediate: Callbacks\n",
"\n",
"Callbacks offer lifecycle hooks. For example, limit recent images or record trajectories."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback\n",
"\n",
"agent_with_callbacks = ComputerAgent(\n",
"    model=\"anthropic/claude-3-5-sonnet-20241022\",\n",
"    tools=[computer],\n",
"    callbacks=[\n",
"        ImageRetentionCallback(only_n_most_recent_images=3),\n",
"        TrajectorySaverCallback(\"./trajectories\"),\n",
"    ],\n",
")\n"
]
},
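{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough picture of what \"limit recent images\" means, the sketch below trims a message history so that only the last `n` image entries survive. This is not how `ImageRetentionCallback` is implemented: `keep_recent_images` and the simplified message shape (`{\"type\": \"image\", ...}`) are assumptions made for this example.\n",
"\n",
"```python\n",
"from typing import Any, Dict, List\n",
"\n",
"Message = Dict[str, Any]\n",
"\n",
"\n",
"def keep_recent_images(messages: List[Message], n: int) -> List[Message]:\n",
"    \"\"\"Drop all but the n most recent image messages, preserving order.\"\"\"\n",
"    image_positions = [i for i, m in enumerate(messages) if m.get(\"type\") == \"image\"]\n",
"    # With n <= 0 every image is dropped; otherwise everything before the last n.\n",
"    to_drop = set(image_positions[:-n]) if n > 0 else set(image_positions)\n",
"    return [m for i, m in enumerate(messages) if i not in to_drop]\n",
"\n",
"\n",
"history = [\n",
"    {\"type\": \"image\", \"id\": 1},\n",
"    {\"type\": \"text\", \"text\": \"clicked\"},\n",
"    {\"type\": \"image\", \"id\": 2},\n",
"    {\"type\": \"image\", \"id\": 3},\n",
"]\n",
"trimmed = keep_recent_images(history, n=2)\n",
"print([m[\"id\"] for m in trimmed if m[\"type\"] == \"image\"])\n",
"```\n",
"\n",
"Trimming old screenshots before each model call is one way to keep the request size, and therefore cost, roughly constant as a task gets longer."
]
},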
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4) Expert: Custom `@register_agent`\n",
"\n",
"Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n",
"\n",
"See: `libs/python/agent/agent/loops/` for concrete examples."
]
},
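{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below illustrates the general decorator-registry pattern that a decorator such as `@register_agent` suggests. It does not use the SDK and should not be read as its API: `register_loop`, `find_loop`, `_REGISTRY`, `EchoLoop`, the `predict_step` signature shown here, and the `my-provider/...` model name are all hypothetical.\n",
"\n",
"```python\n",
"import re\n",
"from typing import Any, Dict, List, Optional, Pattern, Tuple, Type\n",
"\n",
"# Hypothetical registry of (compiled model pattern, loop class) pairs.\n",
"_REGISTRY: List[Tuple[Pattern[str], Type[Any]]] = []\n",
"\n",
"\n",
"def register_loop(models: str):\n",
"    \"\"\"Class decorator that registers a loop for model names matching a regex.\"\"\"\n",
"    def decorator(cls: Type[Any]) -> Type[Any]:\n",
"        _REGISTRY.append((re.compile(models), cls))\n",
"        return cls\n",
"    return decorator\n",
"\n",
"\n",
"def find_loop(model: str) -> Optional[Type[Any]]:\n",
"    \"\"\"Return the first registered loop class whose pattern matches the model.\"\"\"\n",
"    for pattern, cls in _REGISTRY:\n",
"        if pattern.fullmatch(model):\n",
"            return cls\n",
"    return None\n",
"\n",
"\n",
"@register_loop(models=r\"my-provider/.*\")\n",
"class EchoLoop:\n",
"    async def predict_step(self, messages: List[Dict[str, Any]]) -> Dict[str, Any]:\n",
"        # A real loop would call a model here; this one echoes the last message.\n",
"        return {\"output\": [{\"type\": \"message\", \"content\": messages[-1][\"content\"]}]}\n",
"\n",
"\n",
"print(find_loop(\"my-provider/demo\").__name__)\n",
"```\n",
"\n",
"Matching on the model string keeps loop selection declarative: the caller only passes a model name, and the registry decides which loop handles it."
]
},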
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"- Start with `instructions` for fast wins.\n",
"- Add function tools for determinism and reliability.\n",
"- Use callbacks to manage cost, logs, and safety.\n",
"- Build custom loops for specialized domains."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
+280
@@ -0,0 +1,280 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a5d6b2ed",
"metadata": {},
"source": [
"# Computer-Use Agents SOTA Challenge\n",
"\n",
"Congrats on joining the Cua + HUD hackathon at Hack The North 2025!\n",
"\n",
"This notebook will show you how to create a computer use agent with Cua and evaluate it using HUD."
]
},
{
"cell_type": "markdown",
"id": "cebe8572",
"metadata": {},
"source": [
"## 💻 Prerequisites\n",
"\n",
"Clone the Cua repository and install project dependencies."
]
},
{
"cell_type": "markdown",
"id": "3d7c38f9",
"metadata": {},
"source": [
"The easiest way to get started is by getting set up with the Cua development repository.\n",
"\n",
"Install [Docker](https://www.docker.com/products/docker-desktop/) and [pdm](https://pdm-project.org/en/latest/#recommended-installation-method).\n",
"\n",
"Clone the Cua repository:\n",
"\n",
"`git clone https://github.com/trycua/cua`\n",
"\n",
"Install the project dependencies:\n",
"\n",
"`cd cua && pdm install`\n",
"\n",
"Now, you should be able to run the `notebooks/hud_hackathon.ipynb` notebook in VS Code with the `.venv` virtual environment selected."
]
},
{
"cell_type": "markdown",
"id": "19f92431",
"metadata": {},
"source": [
"## ☁️ Connect to cloud services\n",
"\n",
"Create a free HUD account and load your API keys."
]
},
{
"cell_type": "markdown",
"id": "47171dc3",
"metadata": {},
"source": [
"1. Create a HUD account at https://www.hud.so/\n",
"2. Create a .env file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1757f145",
"metadata": {},
"outputs": [],
"source": [
"# Create a .env file if it doesn't exist\n",
"\n",
"ENV_TEMPLATE = \"\"\"# Required environment variables:\n",
"HUD_API_KEY=\n",
"\n",
"# Any LLM provider will work:\n",
"ANTHROPIC_API_KEY=\n",
"OPENAI_API_KEY=\n",
"\"\"\"\n",
"\n",
"import os\n",
"if not os.path.exists(\".env\"):\n",
"    # Use a context manager so the file is flushed and closed\n",
"    with open(\".env\", \"w\") as f:\n",
"        f.write(ENV_TEMPLATE)\n",
"    print(\"A .env file was created! Fill in the empty values.\")"
]
},
{
"cell_type": "markdown",
"id": "0949908d",
"metadata": {},
"source": [
"3. Fill in all missing values in the .env file"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f23828d",
"metadata": {},
"outputs": [],
"source": [
"# Read the .env file\n",
"# HUD requires the .env file to be in the same directory\n",
"\n",
"from dotenv import load_dotenv\n",
"load_dotenv(dotenv_path='.env', override=True)\n",
"\n",
"assert os.getenv(\"HUD_API_KEY\")"
]
},
{
"cell_type": "markdown",
"id": "5c8bef64",
"metadata": {},
"source": [
"## 🤖 Create a computer use agent\n",
"\n",
"Create a computer use agent using the Cua SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd4393b0",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"from agent import ComputerAgent\n",
"\n",
"# Here you can set the model and tools for your agent.\n",
"# Computer use models: https://www.trycua.com/docs/agent-sdk/supported-agents/computer-use-agents\n",
"# Composed agent models: https://www.trycua.com/docs/agent-sdk/supported-agents/composed-agents\n",
"# Custom tools: https://www.trycua.com/docs/agent-sdk/custom-tools\n",
"agent_config = {\n",
"    \"model\": \"openai/computer-use-preview\",\n",
"    \"trajectory_dir\": str(Path(\"trajectories\")),\n",
"    \"only_n_most_recent_images\": 3,\n",
"    \"verbosity\": logging.INFO\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "a07b09ee",
"metadata": {},
"source": [
"## 🖱️ Test your agent\n",
"\n",
"Run your agent on a test scenario in a Docker container."
]
},
{
"cell_type": "markdown",
"id": "12b9c22c",
"metadata": {},
"source": [
"Make sure Docker is running to launch the computer.\n",
"\n",
"You can view the live VNC stream from the Docker container at `http://localhost:8006/`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a210e959",
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer, VMProviderType\n",
"import webbrowser\n",
"\n",
"# Launch a local Docker container\n",
"computer = Computer(\n",
"    os_type=\"linux\",\n",
"    provider_type=VMProviderType.DOCKER,\n",
"    verbosity=logging.INFO\n",
")\n",
"await computer.run()\n",
"\n",
"agent_config[\"tools\"] = [ computer ]\n",
"\n",
"webbrowser.open(\"http://localhost:8006/\", new=0, autoraise=True)"
]
},
{
"cell_type": "markdown",
"id": "87a307e3",
"metadata": {},
"source": [
"Try running the computer use agent on a simple task.\n",
"\n",
"Trajectories are saved in the format: `trajectories/YYYY-MM-DD_computer-use-pre_XXX`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3a32ea8",
"metadata": {},
"outputs": [],
"source": [
"# Create agent\n",
"agent = ComputerAgent(**agent_config)\n",
"\n",
"tasks = [\n",
" \"Open the web browser and search for a repository named trycua/cua on GitHub.\"\n",
"]\n",
"\n",
"for i, task in enumerate(tasks, start=1):\n",
"    print(f\"\\nExecuting task {i}/{len(tasks)}: {task}\")\n",
"    async for result in agent.run(task):\n",
"        print(result)\n",
"\n",
"    print(f\"\\n✅ Task {i}/{len(tasks)} completed: {task}\")"
]
},
{
"cell_type": "markdown",
"id": "eb4edbb5",
"metadata": {},
"source": [
"## 🧐 Benchmark your agent\n",
"\n",
"Test your agent's performance on a selection of tasks from the OSWorld benchmark."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6bf0887e",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"from pprint import pprint\n",
"from agent.integrations.hud import run_full_dataset\n",
"\n",
"job_name = f\"osworld-test-{str(uuid.uuid4())[:4]}\"\n",
"\n",
"# Full dataset evaluation (runs via HUD's run_dataset under the hood)\n",
"# See the documentation here: https://docs.trycua.com/docs/agent-sdk/integrations/hud#running-a-full-dataset\n",
"results = await run_full_dataset(\n",
"    dataset=\"ddupont/OSWorld-Tiny-Public\",\n",
"    job_name=job_name,\n",
"    **agent_config,\n",
"    max_concurrent=20,\n",
"    max_steps=50,\n",
"    # split=\"train[:5]\"\n",
")\n",
"\n",
"# results is a list from hud.datasets.run_dataset; inspect/aggregate as needed\n",
"print(f\"Job: {job_name}\")\n",
"print(f\"Total results: {len(results)}\")\n",
"pprint(results[:3])"
]
},
{
"cell_type": "markdown",
"id": "5b89a103",
"metadata": {},
"source": [
"## 🦾 Improve your agent\n",
"\n",
"To improve your agent for OSWorld-Verified, experiment with different models and add custom tools that fit your use case. You can also dive into the ComputerAgent source code to design an improved version or subclass tailored to your needs.\n",
"\n",
"Learn more about [Customizing Your ComputerAgent](https://docs.trycua.com/docs/agent-sdk/customizing-computeragent) in the docs."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
+286
@@ -0,0 +1,286 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a5d6b2ed",
"metadata": {},
"source": [
"# Computer-Use Agents SOTA Challenge\n",
"\n",
"Congrats on joining the Cua + HUD hackathon at Hack The North 2025!\n",
"\n",
"This notebook will show you how to create a computer use agent with Cua and evaluate it using HUD."
]
},
{
"cell_type": "markdown",
"id": "cebe8572",
"metadata": {},
"source": [
"## 💻 Prerequisites\n",
"\n",
"Clone the Cua repository and install project dependencies."
]
},
{
"cell_type": "markdown",
"id": "3d7c38f9",
"metadata": {},
"source": [
"The easiest way to get started is by getting set up with the Cua development repository.\n",
"\n",
"First, clone the Cua repository:\n",
"\n",
"`git clone https://github.com/trycua/cua`\n",
"\n",
"Install [pdm](https://pdm-project.org/en/latest/#recommended-installation-method).\n",
"\n",
"Install the project dependencies:\n",
"\n",
"`cd cua && pdm install`\n",
"\n",
"Now, you should be able to run the `notebooks/hud_hackathon.ipynb` notebook in VS Code with the `.venv` virtual environment selected."
]
},
{
"cell_type": "markdown",
"id": "19f92431",
"metadata": {},
"source": [
"## ☁️ Connect to cloud services\n",
"\n",
"Create Cua and HUD accounts and load your API keys. "
]
},
{
"cell_type": "markdown",
"id": "47171dc3",
"metadata": {},
"source": [
"1. Create a Cua account at https://www.trycua.com/\n",
"2. Start a small Cua container at https://www.trycua.com/dashboard/containers (If you need credits, ask us!)\n",
"3. Create a HUD account at https://www.hud.so/\n",
"4. Create a .env file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1757f145",
"metadata": {},
"outputs": [],
"source": [
"# Create a .env file if it doesn't exist\n",
"\n",
"ENV_TEMPLATE = \"\"\"# Required environment variables:\n",
"CUA_API_KEY=\n",
"CUA_CONTAINER_NAME=\n",
"HUD_API_KEY=\n",
"\n",
"# Any LLM provider will work:\n",
"ANTHROPIC_API_KEY=\n",
"OPENAI_API_KEY=\n",
"\"\"\"\n",
"\n",
"import os\n",
"if not os.path.exists(\".env\"):\n",
"    # Use a context manager so the file is flushed and closed\n",
"    with open(\".env\", \"w\") as f:\n",
"        f.write(ENV_TEMPLATE)\n",
"    print(\"A .env file was created! Fill in the empty values.\")"
]
},
{
"cell_type": "markdown",
"id": "0949908d",
"metadata": {},
"source": [
"5. Fill in all missing values in the .env file"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f23828d",
"metadata": {},
"outputs": [],
"source": [
"# Read the .env file\n",
"# HUD requires the .env file to be in the same directory\n",
"\n",
"from dotenv import load_dotenv\n",
"load_dotenv(dotenv_path='.env', override=True)\n",
"\n",
"assert os.getenv(\"CUA_API_KEY\")\n",
"assert os.getenv(\"CUA_CONTAINER_NAME\")\n",
"assert os.getenv(\"HUD_API_KEY\")"
]
},
{
"cell_type": "markdown",
"id": "5c8bef64",
"metadata": {},
"source": [
"## 🤖 Create a computer use agent\n",
"\n",
"Create a computer use agent using the Cua SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd4393b0",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"from agent import ComputerAgent\n",
"\n",
"# Here you can set the model and tools for your agent.\n",
"# Computer use models: https://www.trycua.com/docs/agent-sdk/supported-agents/computer-use-agents\n",
"# Composed agent models: https://www.trycua.com/docs/agent-sdk/supported-agents/composed-agents\n",
"# Custom tools: https://www.trycua.com/docs/agent-sdk/custom-tools\n",
"agent_config = {\n",
"    \"model\": \"openai/computer-use-preview\",\n",
"    \"trajectory_dir\": str(Path(\"trajectories\")),\n",
"    \"only_n_most_recent_images\": 3,\n",
"    \"verbosity\": logging.INFO\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "a07b09ee",
"metadata": {},
"source": [
"## 🖱️ Test your agent\n",
"\n",
"Run your agent on a test scenario in a Cua cloud container."
]
},
{
"cell_type": "markdown",
"id": "12b9c22c",
"metadata": {},
"source": [
"Connect to an existing cloud container through the Cua SDK.\n",
"\n",
"You can access the computer through VNC on the [Cua Dashboard](https://www.trycua.com/dashboard)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a210e959",
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer, VMProviderType\n",
"\n",
"# Connect to your existing cloud container\n",
"computer = Computer(\n",
"    os_type=\"linux\",\n",
"    provider_type=VMProviderType.CLOUD,\n",
"    name=os.getenv(\"CUA_CONTAINER_NAME\") or \"\",\n",
"    api_key=os.getenv(\"CUA_API_KEY\"),\n",
"    verbosity=logging.INFO\n",
")\n",
"\n",
"agent_config[\"tools\"] = [ computer ]"
]
},
{
"cell_type": "markdown",
"id": "87a307e3",
"metadata": {},
"source": [
"Try running the computer use agent on a simple task.\n",
"\n",
"To view a replay of the agent's actions, upload the trajectory to the [trajectory viewer](https://www.trycua.com/trajectory-viewer).\n",
"\n",
"Trajectories are saved in the format: `trajectories/YYYY-MM-DD_computer-use-pre_XXX`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3a32ea8",
"metadata": {},
"outputs": [],
"source": [
"# Create agent\n",
"agent = ComputerAgent(**agent_config)\n",
"\n",
"tasks = [\n",
" \"Open the web browser and search for a repository named trycua/cua on GitHub.\"\n",
"]\n",
"\n",
"for i, task in enumerate(tasks, start=1):\n",
"    print(f\"\\nExecuting task {i}/{len(tasks)}: {task}\")\n",
"    async for result in agent.run(task):\n",
"        print(result)\n",
"\n",
"    print(f\"\\n✅ Task {i}/{len(tasks)} completed: {task}\")"
]
},
{
"cell_type": "markdown",
"id": "eb4edbb5",
"metadata": {},
"source": [
"## 🧐 Benchmark your agent\n",
"\n",
"Test your agent's performance on a selection of tasks from the OSWorld benchmark."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6bf0887e",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"from pprint import pprint\n",
"from agent.integrations.hud import run_full_dataset\n",
"\n",
"job_name = f\"osworld-test-{str(uuid.uuid4())[:4]}\"\n",
"\n",
"# Full dataset evaluation (runs via HUD's run_dataset under the hood)\n",
"# See the documentation here: https://docs.trycua.com/docs/agent-sdk/integrations/hud#running-a-full-dataset\n",
"results = await run_full_dataset(\n",
"    dataset=\"ddupont/OSWorld-Tiny-Public\",\n",
"    job_name=job_name,\n",
"    **agent_config,\n",
"    max_concurrent=20,\n",
"    max_steps=50,\n",
"    # split=\"train[:5]\"\n",
")\n",
"\n",
"# results is a list from hud.datasets.run_dataset; inspect/aggregate as needed\n",
"print(f\"Job: {job_name}\")\n",
"print(f\"Total results: {len(results)}\")\n",
"pprint(results[:3])"
]
},
{
"cell_type": "markdown",
"id": "5b89a103",
"metadata": {},
"source": [
"## 🦾 Improve your agent\n",
"\n",
"To improve your agent for OSWorld-Verified, experiment with different models and add custom tools that fit your use case. You can also dive into the ComputerAgent source code to design an improved version or subclass tailored to your needs.\n",
"\n",
"Learn more about [Customizing Your ComputerAgent](https://docs.trycua.com/docs/agent-sdk/customizing-computeragent) in the docs."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}