computer/notebooks/customizing_computeragent.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Customizing Your ComputerAgent\n",
    "\n",
    "This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n",
    "\n",
    "1. Simple: Prompt engineering (via optional `instructions`)\n",
    "2. Easy: Tools (function tools and custom computer tools)\n",
    "3. Intermediate: Callbacks\n",
    "4. Expert: Custom `@register_agent` loops\n",
    "\n",
 Tip: The same patterns">
    "> Tip: The same patterns work in scripts and services; the notebook just makes it easy to iterate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "We'll import `ComputerAgent`, a simple Docker-based computer, and some utilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "from agent.agent import ComputerAgent\n",
    "from agent.callbacks import LoggingCallback\n",
    "from computer import Computer\n",
    "\n",
    "computer = Computer(\n",
    "    os_type=\"linux\",\n",
    "    provider_type=\"docker\",\n",
    "    image=\"trycua/cua-ubuntu:latest\",\n",
    "    name=\"my-cua-container\",\n",
    ")\n",
    "\n",
    "await computer.run()  # Launch & connect to Docker container"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1) Simple: Prompt engineering\n",
    "\n",
    "You can guide your agent with system-like `instructions`.\n",
    "\n",
    "Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n",
    "\n",
    "This mirrors the recommended snippet in code:\n",
    "\n",
    "```python\n",
    "effective_input = full_input\n",
    "if instructions:\n",
    "    effective_input = [{\"role\": \"user\", \"content\": instructions}] + full_input\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instructions = (\n",
    "    \"You are a meticulous software operator. Prefer safe, deterministic actions. \"\n",
    "    \"Always confirm via on-screen text before proceeding.\"\n",
    ")\n",
    "agent = ComputerAgent(\n",
    "    model=\"openai/computer-use-preview\",\n",
    "    tools=[computer],\n",
    "    instructions=instructions,\n",
    "    callbacks=[LoggingCallback(level=logging.INFO)],\n",
    ")\n",
    "messages = [{\"role\": \"user\", \"content\": \"Open the settings and turn on dark mode.\"}]\n",
    "\n",
    "# In notebooks, you may want to consume the async generator\n",
    "import asyncio\n",
    "\n",
    "\n",
    "async def run_once():\n",
    "    async for chunk in agent.run(messages):\n",
    "        # Print any assistant text outputs\n",
    "        for item in chunk.get(\"output\", []):\n",
    "            if item.get(\"type\") == \"message\":\n",
    "                for c in item.get(\"content\", []):\n",
    "                    if c.get(\"text\"):\n",
    "                        print(c.get(\"text\"))\n",
    "\n",
    "\n",
    "await run_once()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2) Easy: Tools\n",
    "\n",
    "Add function tools to expose deterministic capabilities. Tools are auto-extracted to schemas and callable by the agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def calculate_percentage(numerator: float, denominator: float) -> str:\n",
    "    \"\"\"Calculate a percentage string.\n",
    "\n",
    "    Args:\n",
    "        numerator: Numerator value\n",
    "        denominator: Denominator value\n",
    "    Returns:\n",
    "        A formatted percentage string (e.g., '75.00%').\n",
    "    \"\"\"\n",
    "    if denominator == 0:\n",
    "        return \"0.00%\"\n",
    "    return f\"{(numerator/denominator)*100:.2f}%\"\n",
    "\n",
    "\n",
    "agent_with_tool = ComputerAgent(\n",
    "    model=\"openai/computer-use-preview\",\n",
    "    tools=[computer, calculate_percentage],\n",
    "    instructions=\"When doing math, prefer the `calculate_percentage` tool when relevant.\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3) Intermediate: Callbacks\n",
    "\n",
    "Callbacks offer lifecycle hooks. For example, limit recent images or record trajectories."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback\n",
    "\n",
    "agent_with_callbacks = ComputerAgent(\n",
    "    model=\"anthropic/claude-sonnet-4-5-20250929\",\n",
    "    tools=[computer],\n",
    "    callbacks=[\n",
    "        ImageRetentionCallback(only_n_most_recent_images=3),\n",
    "        TrajectorySaverCallback(\"./trajectories\"),\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4) Expert: Custom `@register_agent`\n",
    "\n",
    "Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n",
    "\n",
    "See: `libs/python/agent/agent/loops/` for concrete examples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "- Start with `instructions` for fast wins.\n",
    "- Add function tools for determinism and reliability.\n",
    "- Use callbacks to manage cost, logs, and safety.\n",
    "- Build custom loops for specialized domains."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}