{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Customizing Your ComputerAgent\n", "\n", "This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n", "\n", "1. Simple: Prompt engineering (via optional `instructions`)\n", "2. Easy: Tools (function tools and custom computer tools)\n", "3. Intermediate: Callbacks\n", "4. Expert: Custom `@register_agent` loops\n", "\n", "> Tip: The same patterns work in scripts and services — the notebook just makes it easy to iterate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "We'll import `ComputerAgent`, a simple Docker-based computer, and some utilities." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "from agent.agent import ComputerAgent\n", "from agent.callbacks import LoggingCallback\n", "from computer import Computer\n", "\n", "computer = Computer(\n", " os_type=\"linux\",\n", " provider_type=\"docker\",\n", " image=\"trycua/cua-ubuntu:latest\",\n", " name=\"my-cua-container\"\n", ")\n", "\n", "await computer.run() # Launch & connect to Docker container" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1) Simple: Prompt engineering\n", "\n", "You can guide your agent with system-like `instructions`.\n", "\n", "Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n", "\n", "This mirrors the recommended snippet in code:\n", "\n", "```python\n", "effective_input = full_input\n", "if instructions:\n", " effective_input = [{\"role\": \"user\", \"content\": instructions}] + full_input\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "instructions = (\n", " \"You are a meticulous software operator. Prefer safe, deterministic actions. \"\n", " \"Always confirm via on-screen text before proceeding.\"\n", ")\n", "agent = ComputerAgent(\n", " model=\"openai/computer-use-preview\",\n", " tools=[computer],\n", " instructions=instructions,\n", " callbacks=[LoggingCallback(level=logging.INFO)],\n", ")\n", "messages = [\n", " {\"role\": \"user\", \"content\": \"Open the settings and turn on dark mode.\"}\n", "]\n", "\n", "# In notebooks, you may want to consume the async generator\n", "import asyncio\n", "async def run_once():\n", " async for chunk in agent.run(messages):\n", " # Print any assistant text outputs\n", " for item in chunk.get(\"output\", []):\n", " if item.get(\"type\") == \"message\":\n", " for c in item.get(\"content\", []):\n", " if c.get(\"text\"):\n", " print(c.get(\"text\"))\n", "\n", "await run_once()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2) Easy: Tools\n", "\n", "Add function tools to expose deterministic capabilities. Tools are auto-extracted to schemas and callable by the agent." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def calculate_percentage(numerator: float, denominator: float) -> str:\n", " \"\"\"Calculate a percentage string.\n", "\n", " Args:\n", " numerator: Numerator value\n", " denominator: Denominator value\n", " Returns:\n", " A formatted percentage string (e.g., '75.00%').\n", " \"\"\"\n", " if denominator == 0:\n", " return \"0.00%\"\n", " return f\"{(numerator/denominator)*100:.2f}%\"\n", "\n", "agent_with_tool = ComputerAgent(\n", " model=\"openai/computer-use-preview\",\n", " tools=[computer, calculate_percentage],\n", " instructions=\"When doing math, prefer the `calculate_percentage` tool when relevant.\",\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3) Intermediate: Callbacks\n", "\n", "Callbacks offer lifecycle hooks. For example, limit recent images or record trajectories." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback\n", "\n", "agent_with_callbacks = ComputerAgent(\n", " model=\"anthropic/claude-3-5-sonnet-20241022\",\n", " tools=[computer],\n", " callbacks=[\n", " ImageRetentionCallback(only_n_most_recent_images=3),\n", " TrajectorySaverCallback(\"./trajectories\"),\n", " ],\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4) Expert: Custom `@register_agent`\n", "\n", "Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n", "\n", "See: `libs/python/agent/agent/loops/` for concrete examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps\n", "\n", "- Start with `instructions` for fast wins.\n", "- Add function tools for determinism and reliability.\n", "- Use callbacks to manage cost, logs, and safety.\n", "- Build custom loops for specialized domains." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10" } }, "nbformat": 4, "nbformat_minor": 5 }