From 665e65cb856a5515c04471dde336ce27f6ba48a2 Mon Sep 17 00:00:00 2001
From: Dillon DuPont
Date: Tue, 9 Sep 2025 11:00:52 -0400
Subject: [PATCH] Replaced computer shim with Docker computer

---
 notebooks/customizing_computeragent.ipynb | 65 +++++++++++++----------
 1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/notebooks/customizing_computeragent.ipynb b/notebooks/customizing_computeragent.ipynb
index b0234d24..56f0beb9 100644
--- a/notebooks/customizing_computeragent.ipynb
+++ b/notebooks/customizing_computeragent.ipynb
@@ -4,12 +4,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Customizing Your ComputerAgent\n\n",
-    "This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n\n",
+    "# Customizing Your ComputerAgent\n",
+    "\n",
+    "This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n",
+    "\n",
     "1. Simple: Prompt engineering (via optional `instructions`)\n",
     "2. Easy: Tools (function tools and custom computer tools)\n",
     "3. Intermediate: Callbacks\n",
-    "4. Expert: Custom `@register_agent` loops\n\n",
+    "4. Expert: Custom `@register_agent` loops\n",
+    "\n",
     "> Tip: The same patterns work in scripts and services — the notebook just makes it easy to iterate."
    ]
   },
@@ -17,8 +20,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Setup\n\n",
-    "We'll import `ComputerAgent`, a simple computer shim, and some utilities."
+    "## Setup\n",
+    "\n",
+    "We'll import `ComputerAgent`, a simple Docker-based computer, and some utilities."
    ]
   },
   {
@@ -29,33 +33,31 @@
    "source": [
     "import logging\n",
     "from agent.agent import ComputerAgent\n",
-    "from agent.callbacks import PromptInstructionsCallback, LoggingCallback\n",
+    "from agent.callbacks import LoggingCallback\n",
+    "from computer import Computer\n",
     "\n",
-    "# A very small computer shim for demo purposes (for full computer handlers, see docs)\n",
-    "class DummyComputer:\n",
-    "    async def screenshot(self):\n",
-    "        # Return a 1x1 transparent PNG as base64 string (placeholder)\n",
-    "        import base64\n",
-    "        png_bytes = base64.b64decode(\"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8Xw8AAr8B9k2m0oYAAAAASUVORK5CYII=\")\n",
-    "        return base64.b64encode(png_bytes).decode()\n",
+    "computer = Computer(\n",
+    "    os_type=\"linux\",\n",
+    "    provider_type=\"docker\",\n",
+    "    image=\"trycua/cua-ubuntu:latest\",\n",
+    "    name=\"my-cua-container\"\n",
+    ")\n",
     "\n",
-    "    async def click(self, x: int, y: int):\n",
-    "        pass\n",
-    "\n",
-    "    async def type(self, text: str):\n",
-    "        pass\n",
-    "\n",
-    "computer = DummyComputer()\n"
+    "await computer.run() # Launch & connect to Docker container"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 1) Simple: Prompt engineering\n\n",
-    "You can guide your agent with system-like `instructions`.\n\n",
-    "Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n\n",
-    "This mirrors the recommended snippet in code:\n\n",
+    "## 1) Simple: Prompt engineering\n",
+    "\n",
+    "You can guide your agent with system-like `instructions`.\n",
+    "\n",
+    "Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n",
+    "\n",
+    "This mirrors the recommended snippet in code:\n",
+    "\n",
     "```python\n",
     "effective_input = full_input\n",
     "if instructions:\n",
@@ -101,7 +103,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 2) Easy: Tools\n\n",
+    "## 2) Easy: Tools\n",
+    "\n",
     "Add function tools to expose deterministic capabilities. Tools are auto-extracted to schemas and callable by the agent."
    ]
   },
@@ -135,7 +138,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 3) Intermediate: Callbacks\n\n",
+    "## 3) Intermediate: Callbacks\n",
+    "\n",
     "Callbacks offer lifecycle hooks. For example, limit recent images or record trajectories."
    ]
   },
@@ -161,8 +165,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## 4) Expert: Custom `@register_agent`\n\n",
-    "Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n\n",
+    "## 4) Expert: Custom `@register_agent`\n",
+    "\n",
+    "Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n",
+    "\n",
     "See: `libs/python/agent/agent/loops/` for concrete examples."
    ]
   },
@@ -170,7 +176,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Next steps\n\n",
+    "## Next steps\n",
+    "\n",
     "- Start with `instructions` for fast wins.\n",
     "- Add function tools for determinism and reliability.\n",
     "- Use callbacks to manage cost, logs, and safety.\n",