computer/notebooks/hud_hackathon.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a5d6b2ed",
   "metadata": {},
   "source": [
    "# Computer-Use Agents SOTA Challenge\n",
    "\n",
    "Congrats on joining the Cua + HUD hackathon at Hack The North 2025!\n",
    "\n",
    "This notebook will show you how to create a computer use agent with Cua and evaluate it using HUD."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19f92431",
   "metadata": {},
   "source": [
    "## ☁️ Connect to cloud services\n",
    "\n",
    "Create Cua and HUD accounts and load your API keys. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47171dc3",
   "metadata": {},
   "source": [
    "1. Create a Cua account at https://www.trycua.com/\n",
    "2. Start a Cua container at https://www.trycua.com/dashboard/containers\n",
    "3. Create a HUD account at https://www.hud.so/\n",
    "4. Create a .env file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1757f145",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a .env file if it doesn't exist\n",
    "\n",
    "ENV_TEMPLATE = \"\"\"# Required environment variables:\n",
    "CUA_API_KEY=\n",
    "CUA_CONTAINER_NAME=\n",
    "HUD_API_KEY=\n",
    "\n",
    "# Any LLM provider will work:\n",
    "ANTHROPIC_API_KEY=\n",
    "OPENAI_API_KEY=\n",
    "\"\"\"\n",
    "\n",
    "import os\n",
    "if not os.path.exists(\".env\"):\n",
    "    open(\".env\", \"w\").write(ENV_TEMPLATE)\n",
    "    print(\"A .env file was created! Fill in the empty values.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0949908d",
   "metadata": {},
   "source": [
    "5. Fill in all missing values in the .env file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f23828d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the .env file\n",
    "# HUD requires the .env file to be in the same directory\n",
    "\n",
    "from dotenv import load_dotenv\n",
    "load_dotenv(dotenv_path='.env', override=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c8bef64",
   "metadata": {},
   "source": [
    "## 🤖 Create a Computer Use Agent\n",
    "\n",
    "Create and run a computer use agent using the Cua SDK."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54338496",
   "metadata": {},
   "source": [
    "Connect to your running Cua container using the Cua SDK and initialize an agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd4393b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import os\n",
    "from pathlib import Path\n",
    "from agent import ComputerAgent\n",
    "from computer import Computer, VMProviderType\n",
    "\n",
    "api_key = os.getenv(\"CUA_API_KEY\")\n",
    "container_name = os.getenv(\"CUA_CONTAINER_NAME\")\n",
    "assert api_key and container_name\n",
    "\n",
    "# Connect to your existing cloud container\n",
    "computer = Computer(\n",
    "    os_type=\"linux\",\n",
    "    provider_type=VMProviderType.CLOUD,\n",
    "    api_key=api_key,\n",
    "    name=container_name,\n",
    "    verbosity=logging.INFO\n",
    ")\n",
    "\n",
    "agent_config = {\n",
    "    \"model\": \"openai/computer-use-preview\",\n",
    "    \"tools\": [computer],\n",
    "    \"trajectory_dir\": str(Path(\"trajectories\")),\n",
    "    \"only_n_most_recent_images\": 3,\n",
    "    \"verbosity\": logging.INFO\n",
    "}\n",
    "\n",
    "# Create agent\n",
    "agent = ComputerAgent(**agent_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12b9c22c",
   "metadata": {},
   "source": [
    "Try running the computer use agent on a simple task.\n",
    "\n",
    "Trajectories are saved in the format: `trajectories/YYYY-MM-DD_computer-use-pre_XXX`.\n",
    "\n",
    "To view a replay of the agent's actions, upload the trajectory to the [trajectory viewer](https://www.trycua.com/trajectory-viewer).\n",
    "\n",
    "You can also connect to an agent through VNC on the [Cua Dashboard](https://www.trycua.com/dashboard)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3a32ea8",
   "metadata": {},
   "outputs": [],
   "source": [
    "tasks = [\n",
    "    \"Look for a repository named trycua/cua on GitHub.\"\n",
    "]\n",
    "\n",
    "for i, task in enumerate(tasks):\n",
    "    print(f\"\\nExecuting task {i}/{len(tasks)}: {task}\")\n",
    "    async for result in agent.run(task):\n",
    "        print(result)\n",
    "        pass\n",
    "\n",
    "    print(f\"\\n✅ Task {i+1}/{len(tasks)} completed: {task}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb4edbb5",
   "metadata": {},
   "source": [
    "## 🧐 Evaluate the Agent with HUD\n",
    "\n",
    "Test your agent's performance on a selection of tasks from the OSWorld benchmark."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6bf0887e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import uuid\n",
    "from pprint import pprint\n",
    "from agent.integrations.hud import run_full_dataset\n",
    "\n",
    "# Full dataset evaluation (runs via HUD's run_dataset under the hood)\n",
    "job_name = f\"osworld-test-{str(uuid.uuid4())[:4]}\"\n",
    "\n",
    "results = await run_full_dataset(\n",
    "    dataset=\"ddupont/OSWorld-Tiny-Public\",          # You can also pass a Dataset or a list[dict]\n",
    "    job_name=job_name,                   # Optional; defaults to a timestamp for custom datasets\n",
    "    **agent_config,\n",
    "    max_concurrent=20,                   # Tune to your infra\n",
    "    max_steps=50,                        # Safety cap per task\n",
    "    #split=\"train[:5]\"                   # Limit to just 5 tasks\n",
    ")\n",
    "\n",
    "# results is a list from hud.datasets.run_dataset; inspect/aggregate as needed\n",
    "print(f\"Job: {job_name}\")\n",
    "print(f\"Total results: {len(results)}\")\n",
    "pprint(results[:3])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b89a103",
   "metadata": {},
   "source": [
    "# 🦾 Improve your Agent\n",
    "\n",
    "To improve your agent for OSWorld-Verified, experiment with different models and add custom tools that fit your use case. You can also dive into the ComputerAgent source code to design an improved version or subclass tailored to your needs.\n",
    "\n",
    "Learn more about [Customizing Your ComputerAgent](https://docs.trycua.com/docs/agent-sdk/customizing-computeragent) in the docs."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}