mirror of
https://github.com/trycua/computer.git
synced 2026-02-18 12:28:51 -06:00
added notebook
This commit is contained in:
194
notebooks/customizing_computeragent.ipynb
Normal file
194
notebooks/customizing_computeragent.ipynb
Normal file
@@ -0,0 +1,194 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Customizing Your ComputerAgent\n\n",
|
||||
"This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n\n",
|
||||
"1. Simple: Prompt engineering (via optional `instructions`)\n",
|
||||
"2. Easy: Tools (function tools and custom computer tools)\n",
|
||||
"3. Intermediate: Callbacks\n",
|
||||
"4. Expert: Custom `@register_agent` loops\n\n",
|
||||
"> Tip: The same patterns work in scripts and services — the notebook just makes it easy to iterate."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n\n",
|
||||
"We'll import `ComputerAgent`, a simple computer shim, and some utilities."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import logging\n",
|
||||
"from agent.agent import ComputerAgent\n",
|
||||
"from agent.callbacks import PromptInstructionsCallback, LoggingCallback\n",
|
||||
"\n",
|
||||
"# A very small computer shim for demo purposes (for full computer handlers, see docs)\n",
|
||||
"class DummyComputer:\n",
|
||||
" async def screenshot(self):\n",
|
||||
" # Return a 1x1 transparent PNG as base64 string (placeholder)\n",
|
||||
" import base64\n",
|
||||
" png_bytes = base64.b64decode(\"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8Xw8AAr8B9k2m0oYAAAAASUVORK5CYII=\")\n",
|
||||
" return base64.b64encode(png_bytes).decode()\n",
|
||||
"\n",
|
||||
" async def click(self, x: int, y: int):\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" async def type(self, text: str):\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
"computer = DummyComputer()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 1) Simple: Prompt engineering\n\n",
|
||||
"You can guide your agent with system-like `instructions`.\n\n",
|
||||
"Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n\n",
|
||||
"This mirrors the recommended snippet in code:\n\n",
|
||||
"```python\n",
|
||||
"effective_input = full_input\n",
|
||||
"if instructions:\n",
|
||||
" effective_input = [{\"role\": \"user\", \"content\": instructions}] + full_input\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"instructions = (\n",
|
||||
" \"You are a meticulous software operator. Prefer safe, deterministic actions. \"\n",
|
||||
" \"Always confirm via on-screen text before proceeding.\"\n",
|
||||
")\n",
|
||||
"agent = ComputerAgent(\n",
|
||||
" model=\"openai/computer-use-preview\",\n",
|
||||
" tools=[computer],\n",
|
||||
" instructions=instructions,\n",
|
||||
" callbacks=[LoggingCallback(level=logging.INFO)],\n",
|
||||
")\n",
|
||||
"messages = [\n",
|
||||
" {\"role\": \"user\", \"content\": \"Open the settings and turn on dark mode.\"}\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"# In notebooks, you may want to consume the async generator\n",
|
||||
"import asyncio\n",
|
||||
"async def run_once():\n",
|
||||
" async for chunk in agent.run(messages):\n",
|
||||
" # Print any assistant text outputs\n",
|
||||
" for item in chunk.get(\"output\", []):\n",
|
||||
" if item.get(\"type\") == \"message\":\n",
|
||||
" for c in item.get(\"content\", []):\n",
|
||||
" if c.get(\"text\"):\n",
|
||||
" print(c.get(\"text\"))\n",
|
||||
"\n",
|
||||
"await run_once()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2) Easy: Tools\n\n",
|
||||
"Add function tools to expose deterministic capabilities. Tools are auto-extracted to schemas and callable by the agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def calculate_percentage(numerator: float, denominator: float) -> str:\n",
|
||||
" \"\"\"Calculate a percentage string.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" numerator: Numerator value\n",
|
||||
" denominator: Denominator value\n",
|
||||
" Returns:\n",
|
||||
" A formatted percentage string (e.g., '75.00%').\n",
|
||||
" \"\"\"\n",
|
||||
" if denominator == 0:\n",
|
||||
" return \"0.00%\"\n",
|
||||
" return f\"{(numerator/denominator)*100:.2f}%\"\n",
|
||||
"\n",
|
||||
"agent_with_tool = ComputerAgent(\n",
|
||||
" model=\"openai/computer-use-preview\",\n",
|
||||
" tools=[computer, calculate_percentage],\n",
|
||||
" instructions=\"When doing math, prefer the `calculate_percentage` tool when relevant.\",\n",
|
||||
")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3) Intermediate: Callbacks\n\n",
|
||||
"Callbacks offer lifecycle hooks. For example, limit recent images or record trajectories."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback\n",
|
||||
"\n",
|
||||
"agent_with_callbacks = ComputerAgent(\n",
|
||||
" model=\"anthropic/claude-3-5-sonnet-20241022\",\n",
|
||||
" tools=[computer],\n",
|
||||
" callbacks=[\n",
|
||||
" ImageRetentionCallback(only_n_most_recent_images=3),\n",
|
||||
" TrajectorySaverCallback(\"./trajectories\"),\n",
|
||||
" ],\n",
|
||||
")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 4) Expert: Custom `@register_agent`\n\n",
|
||||
"Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n\n",
|
||||
"See: `libs/python/agent/agent/loops/` for concrete examples."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next steps\n\n",
|
||||
"- Start with `instructions` for fast wins.\n",
|
||||
"- Add function tools for determinism and reliability.\n",
|
||||
"- Use callbacks to manage cost, logs, and safety.\n",
|
||||
"- Build custom loops for specialized domains."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python",
|
||||
"version": "3.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
Reference in New Issue
Block a user