mirror of
https://github.com/trycua/computer.git
synced 2026-01-01 11:00:31 -06:00
203 lines
5.9 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Customizing Your ComputerAgent\n",
    "\n",
    "This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n",
    "\n",
    "1. Simple: Prompt engineering (via optional `instructions`)\n",
    "2. Easy: Tools (function tools and custom computer tools)\n",
    "3. Intermediate: Callbacks\n",
    "4. Expert: Custom `@register_agent` loops\n",
    "\n",
    "> Tip: The same patterns work in scripts and services; the notebook just makes it easy to iterate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "We'll import `ComputerAgent`, a simple Docker-based computer, and some utilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "from agent.agent import ComputerAgent\n",
    "from agent.callbacks import LoggingCallback\n",
    "from computer import Computer\n",
    "\n",
    "computer = Computer(\n",
    "    os_type=\"linux\",\n",
    "    provider_type=\"docker\",\n",
    "    image=\"trycua/cua-ubuntu:latest\",\n",
    "    name=\"my-cua-container\",\n",
    ")\n",
    "\n",
    "await computer.run()  # Launch & connect to Docker container"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1) Simple: Prompt engineering\n",
    "\n",
    "You can guide your agent with system-like `instructions`.\n",
    "\n",
    "Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n",
    "\n",
    "This mirrors the recommended snippet in code:\n",
    "\n",
    "```python\n",
    "effective_input = full_input\n",
    "if instructions:\n",
    "    effective_input = [{\"role\": \"user\", \"content\": instructions}] + full_input\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instructions = (\n",
    "    \"You are a meticulous software operator. Prefer safe, deterministic actions. \"\n",
    "    \"Always confirm via on-screen text before proceeding.\"\n",
    ")\n",
    "agent = ComputerAgent(\n",
    "    model=\"openai/computer-use-preview\",\n",
    "    tools=[computer],\n",
    "    instructions=instructions,\n",
    "    callbacks=[LoggingCallback(level=logging.INFO)],\n",
    ")\n",
    "messages = [{\"role\": \"user\", \"content\": \"Open the settings and turn on dark mode.\"}]\n",
    "\n",
    "\n",
    "# Consume the async generator; notebooks support top-level `await`\n",
    "async def run_once():\n",
    "    async for chunk in agent.run(messages):\n",
    "        # Print any assistant text outputs\n",
    "        for item in chunk.get(\"output\", []):\n",
    "            if item.get(\"type\") == \"message\":\n",
    "                for c in item.get(\"content\", []):\n",
    "                    if c.get(\"text\"):\n",
    "                        print(c.get(\"text\"))\n",
    "\n",
    "\n",
    "await run_once()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2) Easy: Tools\n",
    "\n",
    "Add function tools to expose deterministic capabilities. Tools are auto-extracted to schemas and callable by the agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def calculate_percentage(numerator: float, denominator: float) -> str:\n",
    "    \"\"\"Calculate a percentage string.\n",
    "\n",
    "    Args:\n",
    "        numerator: Numerator value\n",
    "        denominator: Denominator value\n",
    "    Returns:\n",
    "        A formatted percentage string (e.g., '75.00%').\n",
    "    \"\"\"\n",
    "    if denominator == 0:\n",
    "        return \"0.00%\"\n",
    "    return f\"{(numerator/denominator)*100:.2f}%\"\n",
    "\n",
    "\n",
    "agent_with_tool = ComputerAgent(\n",
    "    model=\"openai/computer-use-preview\",\n",
    "    tools=[computer, calculate_percentage],\n",
    "    instructions=\"When doing math, prefer the `calculate_percentage` tool when relevant.\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3) Intermediate: Callbacks\n",
    "\n",
    "Callbacks offer lifecycle hooks. For example, you can limit how many recent images are kept or record trajectories."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback\n",
    "\n",
    "agent_with_callbacks = ComputerAgent(\n",
    "    model=\"anthropic/claude-sonnet-4-5-20250929\",\n",
    "    tools=[computer],\n",
    "    callbacks=[\n",
    "        ImageRetentionCallback(only_n_most_recent_images=3),\n",
    "        TrajectorySaverCallback(\"./trajectories\"),\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4) Expert: Custom `@register_agent` loops\n",
    "\n",
    "Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n",
    "\n",
    "See `libs/python/agent/agent/loops/` for concrete examples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "- Start with `instructions` for fast wins.\n",
    "- Add function tools for determinism and reliability.\n",
    "- Use callbacks to manage cost, logs, and safety.\n",
    "- Build custom loops for specialized domains."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}