computer/notebooks/customizing_computeragent.ipynb
2025-11-13 12:21:51 -05:00


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Customizing Your ComputerAgent\n",
"\n",
"This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n",
"\n",
"1. Simple: Prompt engineering (via optional `instructions`)\n",
"2. Easy: Tools (function tools and custom computer tools)\n",
"3. Intermediate: Callbacks\n",
"4. Expert: Custom `@register_agent` loops\n",
"\n",
"> Tip: The same patterns work in scripts and services — the notebook just makes it easy to iterate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"We'll import `ComputerAgent`, a simple Docker-based computer, and some utilities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from agent.agent import ComputerAgent\n",
"from agent.callbacks import LoggingCallback\n",
"from computer import Computer\n",
"\n",
"computer = Computer(\n",
"    os_type=\"linux\",\n",
"    provider_type=\"docker\",\n",
"    image=\"trycua/cua-ubuntu:latest\",\n",
"    name=\"my-cua-container\",\n",
")\n",
"\n",
"await computer.run()  # Launch and connect to the Docker container"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1) Simple: Prompt engineering\n",
"\n",
"You can guide your agent with system-like `instructions`.\n",
"\n",
"Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n",
"\n",
"The effect mirrors this recommended snippet:\n",
"\n",
"```python\n",
"effective_input = full_input\n",
"if instructions:\n",
"    effective_input = [{\"role\": \"user\", \"content\": instructions}] + full_input\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"instructions = (\n",
"    \"You are a meticulous software operator. Prefer safe, deterministic actions. \"\n",
"    \"Always confirm via on-screen text before proceeding.\"\n",
")\n",
"agent = ComputerAgent(\n",
"    model=\"openai/computer-use-preview\",\n",
"    tools=[computer],\n",
"    instructions=instructions,\n",
"    callbacks=[LoggingCallback(level=logging.INFO)],\n",
")\n",
"messages = [{\"role\": \"user\", \"content\": \"Open the settings and turn on dark mode.\"}]\n",
"\n",
"\n",
"# agent.run() is consumed as an async generator; iterate over it to drive the agent.\n",
"async def run_once():\n",
"    async for chunk in agent.run(messages):\n",
"        # Print any assistant text outputs\n",
"        for item in chunk.get(\"output\", []):\n",
"            if item.get(\"type\") == \"message\":\n",
"                for c in item.get(\"content\", []):\n",
"                    if c.get(\"text\"):\n",
"                        print(c.get(\"text\"))\n",
"\n",
"\n",
"await run_once()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2) Easy: Tools\n",
"\n",
"Add function tools to expose deterministic capabilities. Each function is automatically converted to a tool schema that the agent can call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def calculate_percentage(numerator: float, denominator: float) -> str:\n",
"    \"\"\"Calculate a percentage string.\n",
"\n",
"    Args:\n",
"        numerator: Numerator value\n",
"        denominator: Denominator value\n",
"\n",
"    Returns:\n",
"        A formatted percentage string (e.g., '75.00%'), or '0.00%' if the\n",
"        denominator is zero.\n",
"    \"\"\"\n",
"    if denominator == 0:\n",
"        return \"0.00%\"\n",
"    return f\"{(numerator / denominator) * 100:.2f}%\"\n",
"\n",
"\n",
"agent_with_tool = ComputerAgent(\n",
"    model=\"openai/computer-use-preview\",\n",
"    tools=[computer, calculate_percentage],\n",
"    instructions=\"When doing math, prefer the `calculate_percentage` tool when relevant.\",\n",
")"
]
},
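{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make schema extraction concrete, here is a minimal standard-library sketch that derives a JSON-style schema from a function's signature and docstring. It is illustrative only: `sketch_tool_schema` and `JSON_TYPES` are hypothetical names, and the SDK's actual extraction logic and schema layout may differ.\n",
"\n",
"```python\n",
"import inspect\n",
"from typing import get_type_hints\n",
"\n",
"# Hypothetical mapping from Python annotations to JSON Schema type names.\n",
"JSON_TYPES = {str: \"string\", int: \"integer\", float: \"number\", bool: \"boolean\"}\n",
"\n",
"\n",
"def sketch_tool_schema(func):\n",
"    \"\"\"Build a minimal tool schema from a signature and docstring.\"\"\"\n",
"    hints = get_type_hints(func)\n",
"    properties, required = {}, []\n",
"    for name, param in inspect.signature(func).parameters.items():\n",
"        # Unannotated or unrecognized types fall back to \"string\" in this sketch.\n",
"        properties[name] = {\"type\": JSON_TYPES.get(hints.get(name), \"string\")}\n",
"        # Parameters without a default value are treated as required.\n",
"        if param.default is inspect.Parameter.empty:\n",
"            required.append(name)\n",
"    doc = inspect.getdoc(func) or \"\"\n",
"    return {\n",
"        \"name\": func.__name__,\n",
"        \"description\": doc.splitlines()[0] if doc else \"\",\n",
"        \"parameters\": {\n",
"            \"type\": \"object\",\n",
"            \"properties\": properties,\n",
"            \"required\": required,\n",
"        },\n",
"    }\n",
"\n",
"\n",
"def add(a: int, b: int = 0) -> int:\n",
"    \"\"\"Add two integers.\"\"\"\n",
"    return a + b\n",
"\n",
"\n",
"print(sketch_tool_schema(add))\n",
"```\n",
"\n",
"Applied to `calculate_percentage` above, this sketch would report two required `number` parameters and use the first docstring line as the description, which is why clear type hints and docstrings matter for function tools."
]
},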
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Intermediate: Callbacks\n",
"\n",
"Callbacks hook into the agent's lifecycle. For example, you can cap how many recent images are retained or save trajectories to disk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback\n",
"\n",
"agent_with_callbacks = ComputerAgent(\n",
"    model=\"anthropic/claude-sonnet-4-5-20250929\",\n",
"    tools=[computer],\n",
"    callbacks=[\n",
"        ImageRetentionCallback(only_n_most_recent_images=3),\n",
"        TrajectorySaverCallback(\"./trajectories\"),\n",
"    ],\n",
")"
]
},
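{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the kind of work an image-retention callback might do before an LLM call, here is a standalone sketch that keeps only the `n` most recent image items in a message list. The message shape (`{\"type\": \"image\", ...}`) and the helper `keep_recent_images` are assumptions for illustration, not the SDK's actual implementation or message format.\n",
"\n",
"```python\n",
"def keep_recent_images(messages, n):\n",
"    \"\"\"Return a copy of messages with all but the n most recent images dropped.\"\"\"\n",
"    image_positions = [i for i, m in enumerate(messages) if m.get(\"type\") == \"image\"]\n",
"    # Positions of images older than the n most recent ones.\n",
"    to_drop = set(image_positions[:-n]) if n > 0 else set(image_positions)\n",
"    return [m for i, m in enumerate(messages) if i not in to_drop]\n",
"\n",
"\n",
"history = [\n",
"    {\"type\": \"image\", \"id\": 1},\n",
"    {\"type\": \"text\", \"text\": \"clicked Settings\"},\n",
"    {\"type\": \"image\", \"id\": 2},\n",
"    {\"type\": \"image\", \"id\": 3},\n",
"]\n",
"print(keep_recent_images(history, 2))\n",
"```\n",
"\n",
"With `n=2`, only the oldest image is dropped; the text item and the two newest images keep their original order. Trimming old images this way is one plausible route to lower token cost on long runs."
]
},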
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4) Expert: Custom `@register_agent`\n",
"\n",
"Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n",
"\n",
"See: `libs/python/agent/agent/loops/` for concrete examples."
]
},
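{
"cell_type": "markdown",
"metadata": {},
"source": [
"The registration pattern itself can be sketched without the SDK. The toy registry below maps a model-name regex to a loop class and dispatches to its `predict_step`. Every name other than `predict_step` (`register_loop`, `resolve_loop`, `EchoLoop`) is hypothetical, and the real `@register_agent` signature and `predict_step` arguments may differ; the loops in `libs/python/agent/agent/loops/` remain the reference for the actual interface.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"_REGISTRY = []  # (compiled pattern, loop class) pairs, checked in order\n",
"\n",
"\n",
"def register_loop(models):\n",
"    \"\"\"Class decorator: associate a loop class with a model-name regex.\"\"\"\n",
"    def decorator(cls):\n",
"        _REGISTRY.append((re.compile(models), cls))\n",
"        return cls\n",
"    return decorator\n",
"\n",
"\n",
"def resolve_loop(model):\n",
"    \"\"\"Instantiate the first registered loop whose pattern matches the model.\"\"\"\n",
"    for pattern, cls in _REGISTRY:\n",
"        if pattern.fullmatch(model):\n",
"            return cls()\n",
"    raise ValueError(f\"No loop registered for model: {model}\")\n",
"\n",
"\n",
"@register_loop(models=r\"echo/.*\")\n",
"class EchoLoop:\n",
"    async def predict_step(self, messages, model):\n",
"        # A real loop would call an LLM here; this one echoes the last message.\n",
"        return {\"output\": [{\"type\": \"message\", \"content\": messages[-1][\"content\"]}]}\n",
"\n",
"\n",
"async def demo():\n",
"    loop = resolve_loop(\"echo/demo\")\n",
"    return await loop.predict_step([{\"role\": \"user\", \"content\": \"hi\"}], \"echo/demo\")\n",
"\n",
"\n",
"# In a notebook cell: result = await demo()\n",
"# In a script: import asyncio; result = asyncio.run(demo())\n",
"```"
]
},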
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"- Start with `instructions` for fast wins.\n",
"- Add function tools for determinism and reliability.\n",
"- Use callbacks to manage cost, logs, and safety.\n",
"- Build custom loops for specialized domains."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}