Merge pull request #400 from trycua/docs/tips
Add simple guide for customizing computeragent
@@ -75,13 +75,7 @@ messages = [
## Message Types

- **user**: User input messages
- **computer_call**: Computer actions (click, type, keypress, etc.)
- **computer_call_output**: Results from computer actions (usually screenshots)
- **function_call**: Function calls (e.g., `computer.call`)
- **function_call_output**: Results from function calls
- **reasoning**: Agent's internal reasoning and planning
- **message**: Agent text responses

See the complete schema in [Message Format](./message-format).

### Memory Management
121 docs/content/docs/agent-sdk/customizing-computeragent.mdx Normal file
@@ -0,0 +1,121 @@
---
title: Customizing Your ComputerAgent
---

<Callout>A corresponding <a href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb" target="_blank">Jupyter Notebook</a> is available for this documentation.</Callout>

The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration and is a powerful framework for extending and building your own agentic systems.

This guide shows four proven ways to increase capabilities and success rate:

1. Simple: Prompt engineering
2. Easy: Tools
3. Intermediate: Callbacks
4. Expert: Custom `@register_agent`
## 1) Simple: Prompt engineering

Provide guiding instructions to shape behavior. `ComputerAgent` accepts an optional `instructions: str | None` that acts like a system-style preface. Internally, this uses a callback that prepends a user message before each LLM call.

```python
from agent.agent import ComputerAgent

# Assumes `computer` is a Computer instance created beforehand (see the accompanying notebook)
agent = ComputerAgent(
    model="openai/computer-use-preview",
    tools=[computer],
    instructions=(
        "You are a meticulous software operator. Prefer safe, deterministic actions. "
        "Always confirm via on-screen text before proceeding."
    ),
)
```
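
Under the hood, `PromptInstructionsCallback` effectively does the following before each LLM call. This mirrors the snippet shown in the accompanying notebook; treat it as an illustration rather than the exact implementation:

```python
# Prepend the instructions as a user message ahead of the existing input
effective_input = full_input
if instructions:
    effective_input = [{"role": "user", "content": instructions}] + full_input
```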

## 2) Easy: Tools

Expose deterministic capabilities as tools (Python functions or custom computer handlers). The agent will call them when appropriate.

```python
def calculate_percentage(numerator: float, denominator: float) -> str:
    """Calculate percentage as a string.

    Args:
        numerator: Numerator value
        denominator: Denominator value

    Returns:
        A formatted percentage string (e.g., '75.00%').
    """
    if denominator == 0:
        return "0.00%"
    return f"{(numerator/denominator)*100:.2f}%"


agent = ComputerAgent(
    model="openai/computer-use-preview",
    tools=[computer, calculate_percentage],
)
```

- See `docs/agent-sdk/custom-tools` for authoring function tools.
- See `docs/agent-sdk/custom-computer-handlers` for building full computer interfaces.

## 3) Intermediate: Callbacks

Callbacks provide lifecycle hooks to preprocess messages, postprocess outputs, record trajectories, manage costs, and more.

```python
from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback, BudgetManagerCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[
        ImageRetentionCallback(only_n_most_recent_images=3),
        TrajectorySaverCallback("./trajectories"),
        BudgetManagerCallback(max_budget=10.0, raise_error=True),
    ],
)
```

- Browse the built-in implementations in `libs/python/agent/agent/callbacks/`.
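
You can also write your own callback by subclassing `AsyncCallbackHandler` (exported from `agent.callbacks`). The sketch below is only illustrative: the `on_llm_start` hook name and signature are assumptions, so check the base class for the exact interface before relying on it.

```python
from agent.callbacks import AsyncCallbackHandler


class RedactSecretsCallback(AsyncCallbackHandler):
    """Illustrative callback that masks a secret string before each LLM call."""

    def __init__(self, secret: str):
        super().__init__()
        self.secret = secret

    async def on_llm_start(self, messages):  # hook name/signature are assumptions
        # Return a copy of the messages with the secret masked in any string content.
        redacted = []
        for msg in messages:
            msg = dict(msg)
            if isinstance(msg.get("content"), str):
                msg["content"] = msg["content"].replace(self.secret, "[REDACTED]")
            redacted.append(msg)
        return redacted
```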

## 4) Expert: Custom `@register_agent`

Build your own agent configuration class to control prompting, message shaping, and tool handling. This is the most flexible option for specialized domains.

- Register your own `model=...` loop using `@register_agent`.
- Browse implementations in `libs/python/agent/agent/loops/`.
- Implement `predict_step()` (and optionally `predict_click()`) and return the standardized output schema.

```python
from agent.decorators import register_agent


@register_agent(models=r".*my-special-model.*", priority=10)
class MyCustomAgentConfig:
    async def predict_step(self, messages, model, tools, **kwargs):
        # 1) Format messages for your provider
        # 2) Call provider
        # 3) Convert responses to the agent output schema
        return {"output": [], "usage": {}}

    async def predict_click(self, model, image_b64, instruction):
        # Optional: click-only capability
        return None

    def get_capabilities(self):
        return ["step"]
```
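
Once registered, any agent whose `model` string matches the pattern is routed through your config. For instance (the model name below is just a placeholder that matches the regex above):

```python
agent = ComputerAgent(
    model="my-special-model-v1",  # matched by r".*my-special-model.*"
    tools=[computer],
)
```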

## HUD integration (optional)

When using the HUD evaluation integration (`agent/integrations/hud/`), you can pass `instructions`, `tools`, and `callbacks` directly to the runners.

```python
from agent.integrations.hud import run_single_task

await run_single_task(
    dataset="username/dataset-name",
    model="openai/computer-use-preview",
    instructions="Operate carefully. Always verify on-screen text before actions.",
    # tools=[your_custom_function],
    # callbacks=[YourCustomCallback()],
)
```

201 docs/content/docs/agent-sdk/message-format.mdx Normal file
@@ -0,0 +1,201 @@

---
title: Message Format
---

This page documents the Python message and response schema used by the Agent SDK.
It mirrors the structure shown in [Chat History](./chat-history) and provides precise type definitions you can target in your own code.

All examples below use Python type hints with `TypedDict` and `Literal` from the standard `typing` module.

## Response

The agent yields response chunks as an async generator of objects with `output` and `usage`.

```python
from typing import List, TypedDict


class Usage(TypedDict, total=False):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    response_cost: float  # USD cost if available


class AgentResponse(TypedDict):
    output: List["AgentMessage"]
    usage: Usage
```
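
For example, a minimal consumer of this stream could look like the following (it assumes an `agent` and `messages` set up as in the customization guide):

```python
total_cost = 0.0

async for chunk in agent.run(messages):
    # Each chunk is an AgentResponse with "output" and "usage"
    for item in chunk["output"]:
        if item.get("type") == "message":
            for content in item.get("content", []):
                if content.get("type") == "output_text":
                    print(content["text"])
    total_cost += chunk.get("usage", {}).get("response_cost", 0.0)

print(f"Approximate cost: ${total_cost:.4f}")
```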

## Messages

Agent messages represent the state of the conversation and the agent's actions.

```python
from typing import List, Literal, Optional, TypedDict, Union

# Union of all message variants
AgentMessage = Union[
    "UserMessage",
    "AssistantMessage",
    "ReasoningMessage",
    "ComputerCallMessage",
    "ComputerCallOutputMessage",
    "FunctionCallMessage",
    "FunctionCallOutputMessage",
]


# Input message (role: user/system/developer)
class UserMessage(TypedDict, total=False):
    type: Literal["message"]  # optional for user input
    role: Literal["user", "system", "developer"]
    content: Union[str, List["InputContent"]]


# Output message (assistant text)
class AssistantMessage(TypedDict):
    type: Literal["message"]
    role: Literal["assistant"]
    content: List["OutputContent"]


# Output reasoning/thinking message
class ReasoningMessage(TypedDict):
    type: Literal["reasoning"]
    summary: List["SummaryContent"]


# Output computer action call (agent intends to act)
class ComputerCallMessage(TypedDict):
    type: Literal["computer_call"]
    call_id: str
    status: Literal["completed", "failed", "pending"]
    action: "ComputerAction"


# Output computer action result (always a screenshot)
class ComputerCallOutputMessage(TypedDict):
    type: Literal["computer_call_output"]
    call_id: str
    output: "ComputerResultContent"


# Output function call (agent calls a Python tool)
class FunctionCallMessage(TypedDict):
    type: Literal["function_call"]
    call_id: str
    status: Literal["completed", "failed", "pending"]
    name: str
    arguments: str  # JSON-serialized kwargs


# Output function call result (text)
class FunctionCallOutputMessage(TypedDict):
    type: Literal["function_call_output"]
    call_id: str
    output: str
```
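
For a concrete sense of how these variants fit together, here is a short, hypothetical trace for a single task (field values are illustrative):

```python
conversation: List[AgentMessage] = [
    {"role": "user", "content": "Open the settings and turn on dark mode."},
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "I need to open Settings first."}]},
    {"type": "computer_call", "call_id": "call_1", "status": "completed",
     "action": {"type": "click", "button": "left", "x": 512, "y": 384}},
    {"type": "computer_call_output", "call_id": "call_1",
     "output": {"type": "computer_screenshot", "image_url": "data:image/png;base64,..."}},
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Dark mode is now enabled."}]},
]
```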

## Message Content

These content items appear inside `content` arrays for the message types above.

```python
# Input content kinds
class InputContent(TypedDict):
    type: Literal["input_image", "input_text"]
    text: Optional[str]
    image_url: Optional[str]  # e.g., data URL


# Assistant output content
class OutputContent(TypedDict):
    type: Literal["output_text"]
    text: str


# Reasoning/summary output content
class SummaryContent(TypedDict):
    type: Literal["summary_text"]
    text: str


# Computer call outputs (screenshots)
class ComputerResultContent(TypedDict):
    type: Literal["computer_screenshot", "input_image"]
    image_url: str  # data URL (e.g., "data:image/png;base64,....")
```
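
For example, a user turn that combines text and an image would use `InputContent` items (the data URL is truncated here):

```python
user_turn: UserMessage = {
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is shown in this screenshot?"},
        {"type": "input_image", "image_url": "data:image/png;base64,..."},
    ],
}
```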

## Actions

Computer actions represent concrete operations the agent will perform on the computer.

Two broad families exist depending on the provider: OpenAI-style and Anthropic-style.

```python
# Union of all supported computer actions
ComputerAction = Union[
    "ClickAction",
    "DoubleClickAction",
    "DragAction",
    "KeyPressAction",
    "MoveAction",
    "ScreenshotAction",
    "ScrollAction",
    "TypeAction",
    "WaitAction",
    # Anthropic variants
    "LeftMouseDownAction",
    "LeftMouseUpAction",
]


# OpenAI Computer Actions
class ClickAction(TypedDict):
    type: Literal["click"]
    button: Literal["left", "right", "wheel", "back", "forward"]
    x: int
    y: int


class DoubleClickAction(TypedDict, total=False):
    type: Literal["double_click"]
    button: Literal["left", "right", "wheel", "back", "forward"]
    x: int
    y: int


class DragAction(TypedDict, total=False):
    type: Literal["drag"]
    button: Literal["left", "right", "wheel", "back", "forward"]
    path: List[tuple[int, int]]  # [(x1, y1), (x2, y2), ...]


class KeyPressAction(TypedDict):
    type: Literal["keypress"]
    keys: List[str]  # e.g., ["ctrl", "a"]


class MoveAction(TypedDict):
    type: Literal["move"]
    x: int
    y: int


class ScreenshotAction(TypedDict):
    type: Literal["screenshot"]


class ScrollAction(TypedDict):
    type: Literal["scroll"]
    scroll_x: int
    scroll_y: int
    x: int
    y: int


class TypeAction(TypedDict):
    type: Literal["type"]
    text: str


class WaitAction(TypedDict):
    type: Literal["wait"]


# Anthropic Computer Actions
class LeftMouseDownAction(TypedDict):
    type: Literal["left_mouse_down"]
    x: int
    y: int


class LeftMouseUpAction(TypedDict):
    type: Literal["left_mouse_up"]
    x: int
    y: int
```
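
Because every action carries a `type` discriminator, downstream code can branch on it. The helper below is a small illustrative sketch, not part of the SDK:

```python
def describe_action(action: ComputerAction) -> str:
    """Return a short human-readable description of a computer action."""
    kind = action["type"]
    if kind == "click":
        return f"{action['button']} click at ({action['x']}, {action['y']})"
    if kind == "type":
        return f"type text: {action['text']!r}"
    if kind == "keypress":
        return "press " + "+".join(action["keys"])
    if kind == "scroll":
        return f"scroll by ({action['scroll_x']}, {action['scroll_y']}) at ({action['x']}, {action['y']})"
    return kind  # screenshot, wait, drag, move, mouse down/up, ...
```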

## Notes

- The agent runtime may add provider-specific fields when available (e.g., usage cost). Unknown fields should be ignored for forward compatibility.
- Computer action outputs are screenshots as data URLs. For security and storage, some serializers may redact or omit large fields in persisted metadata.
- The message flow typically alternates between reasoning, actions, screenshots, and concluding assistant text. See [Chat History](./chat-history) for a step-by-step example.
@@ -6,6 +6,8 @@
    "supported-agents",
    "supported-model-providers",
    "chat-history",
    "message-format",
    "customizing-computeragent",
    "callbacks",
    "custom-tools",
    "custom-computer-handlers",
@@ -31,7 +31,8 @@ from .callbacks import (
    TrajectorySaverCallback,
    BudgetManagerCallback,
    TelemetryCallback,
    OperatorNormalizerCallback
    OperatorNormalizerCallback,
    PromptInstructionsCallback,
)
from .computers import (
    AsyncComputerHandler,
@@ -162,6 +163,7 @@ class ComputerAgent:
        custom_loop: Optional[Callable] = None,
        only_n_most_recent_images: Optional[int] = None,
        callbacks: Optional[List[Any]] = None,
        instructions: Optional[str] = None,
        verbosity: Optional[int] = None,
        trajectory_dir: Optional[str | Path | dict] = None,
        max_retries: Optional[int] = 3,
@@ -180,6 +182,7 @@ class ComputerAgent:
            custom_loop: Custom agent loop function to use instead of auto-selection
            only_n_most_recent_images: If set, only keep the N most recent images in message history. Adds ImageRetentionCallback automatically.
            callbacks: List of AsyncCallbackHandler instances for preprocessing/postprocessing
            instructions: Optional system instructions to be passed to the model
            verbosity: Logging level (logging.DEBUG, logging.INFO, etc.). If set, adds LoggingCallback automatically
            trajectory_dir: If set, saves trajectory data (screenshots, responses) to this directory. Adds TrajectorySaverCallback automatically.
            max_retries: Maximum number of retries for failed API calls
@@ -198,6 +201,7 @@ class ComputerAgent:
        self.custom_loop = custom_loop
        self.only_n_most_recent_images = only_n_most_recent_images
        self.callbacks = callbacks or []
        self.instructions = instructions
        self.verbosity = verbosity
        self.trajectory_dir = trajectory_dir
        self.max_retries = max_retries
@@ -211,6 +215,10 @@ class ComputerAgent:
        # Prepend operator normalizer callback
        self.callbacks.insert(0, OperatorNormalizerCallback())

        # Add prompt instructions callback if provided
        if self.instructions:
            self.callbacks.append(PromptInstructionsCallback(self.instructions))

        # Add telemetry callback if telemetry_enabled is set
        if self.telemetry_enabled:
            if isinstance(self.telemetry_enabled, bool):
@@ -9,6 +9,7 @@ from .trajectory_saver import TrajectorySaverCallback
from .budget_manager import BudgetManagerCallback
from .telemetry import TelemetryCallback
from .operator_validator import OperatorNormalizerCallback
from .prompt_instructions import PromptInstructionsCallback

__all__ = [
    "AsyncCallbackHandler",
@@ -18,4 +19,5 @@ __all__ = [
    "BudgetManagerCallback",
    "TelemetryCallback",
    "OperatorNormalizerCallback",
    "PromptInstructionsCallback",
]
@@ -20,6 +20,7 @@ from hud import trace

from agent.agent import ComputerAgent as BaseComputerAgent
from .proxy import FakeAsyncOpenAI
from agent.callbacks import PromptInstructionsCallback


# ---------------------------------------------------------------------------
@@ -47,6 +48,7 @@ class ProxyOperatorAgent(OperatorAgent):
        custom_loop: Any | None = None,
        only_n_most_recent_images: int | None = None,
        callbacks: list[Any] | None = None,
        instructions: str | None = None,
        verbosity: int | None = None,
        max_retries: int | None = 3,
        screenshot_delay: float | int = 0.5,
@@ -68,12 +70,17 @@ class ProxyOperatorAgent(OperatorAgent):
        if tools:
            agent_tools.extend(tools)

        # Build callbacks, injecting prompt instructions if provided
        agent_callbacks = list(callbacks or [])
        if instructions:
            agent_callbacks.append(PromptInstructionsCallback(instructions))

        computer_agent = BaseComputerAgent(
            model=model,
            tools=agent_tools,
            custom_loop=custom_loop,
            only_n_most_recent_images=only_n_most_recent_images,
            callbacks=callbacks,
            callbacks=agent_callbacks,
            verbosity=verbosity,
            trajectory_dir=trajectory_dir,
            max_retries=max_retries,
@@ -96,7 +103,6 @@ class ProxyOperatorAgent(OperatorAgent):
# Single-task runner
# ---------------------------------------------------------------------------


async def run_single_task(
    dataset: str | Dataset | list[dict[str, Any]],
    *,
@@ -108,6 +114,7 @@ async def run_single_task(
    custom_loop: Any | None = None,
    only_n_most_recent_images: int | None = None,
    callbacks: list[Any] | None = None,
    instructions: str | None = None,
    verbosity: int | None = None,
    trajectory_dir: str | dict | None = None,
    max_retries: int | None = 3,
@@ -140,6 +147,7 @@ async def run_single_task(
        custom_loop=custom_loop,
        only_n_most_recent_images=only_n_most_recent_images,
        callbacks=callbacks,
        instructions=instructions,
        verbosity=verbosity,
        trajectory_dir=trajectory_dir,
        max_retries=max_retries,
@@ -157,7 +165,6 @@ async def run_single_task(
# Full-dataset runner
# ---------------------------------------------------------------------------


async def run_full_dataset(
    dataset: str | Dataset | list[dict[str, Any]],
    *,
@@ -173,6 +180,7 @@ async def run_full_dataset(
    custom_loop: Any | None = None,
    only_n_most_recent_images: int | None = 5,
    callbacks: list[Any] | None = None,
    instructions: str | None = None,
    verbosity: int | None = None,
    max_retries: int | None = 3,
    screenshot_delay: float | int = 0.5,
@@ -207,6 +215,7 @@ async def run_full_dataset(
        "custom_loop": custom_loop,
        "only_n_most_recent_images": only_n_most_recent_images,
        "callbacks": callbacks,
        "instructions": instructions,
        "verbosity": verbosity,
        "max_retries": max_retries,
        "screenshot_delay": screenshot_delay,
201 notebooks/customizing_computeragent.ipynb Normal file
@@ -0,0 +1,201 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Customizing Your ComputerAgent\n",
        "\n",
        "This notebook demonstrates four practical ways to increase the capabilities and success rate of your `ComputerAgent` in the Agent SDK:\n",
        "\n",
        "1. Simple: Prompt engineering (via optional `instructions`)\n",
        "2. Easy: Tools (function tools and custom computer tools)\n",
        "3. Intermediate: Callbacks\n",
        "4. Expert: Custom `@register_agent` loops\n",
        "\n",
        "> Tip: The same patterns work in scripts and services — the notebook just makes it easy to iterate."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "We'll import `ComputerAgent`, a simple Docker-based computer, and some utilities."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "from agent.agent import ComputerAgent\n",
        "from agent.callbacks import LoggingCallback\n",
        "from computer import Computer\n",
        "\n",
        "computer = Computer(\n",
        "    os_type=\"linux\",\n",
        "    provider_type=\"docker\",\n",
        "    image=\"trycua/cua-ubuntu:latest\",\n",
        "    name=\"my-cua-container\"\n",
        ")\n",
        "\n",
        "await computer.run()  # Launch & connect to Docker container"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1) Simple: Prompt engineering\n",
        "\n",
        "You can guide your agent with system-like `instructions`.\n",
        "\n",
        "Under the hood, `ComputerAgent(instructions=...)` adds a `PromptInstructionsCallback` that prepends a user message before each LLM call.\n",
        "\n",
        "This mirrors the recommended snippet in code:\n",
        "\n",
        "```python\n",
        "effective_input = full_input\n",
        "if instructions:\n",
        "    effective_input = [{\"role\": \"user\", \"content\": instructions}] + full_input\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "instructions = (\n",
        "    \"You are a meticulous software operator. Prefer safe, deterministic actions. \"\n",
        "    \"Always confirm via on-screen text before proceeding.\"\n",
        ")\n",
        "agent = ComputerAgent(\n",
        "    model=\"openai/computer-use-preview\",\n",
        "    tools=[computer],\n",
        "    instructions=instructions,\n",
        "    callbacks=[LoggingCallback(level=logging.INFO)],\n",
        ")\n",
        "messages = [\n",
        "    {\"role\": \"user\", \"content\": \"Open the settings and turn on dark mode.\"}\n",
        "]\n",
        "\n",
        "# In notebooks, you may want to consume the async generator\n",
        "import asyncio\n",
        "async def run_once():\n",
        "    async for chunk in agent.run(messages):\n",
        "        # Print any assistant text outputs\n",
        "        for item in chunk.get(\"output\", []):\n",
        "            if item.get(\"type\") == \"message\":\n",
        "                for c in item.get(\"content\", []):\n",
        "                    if c.get(\"text\"):\n",
        "                        print(c.get(\"text\"))\n",
        "\n",
        "await run_once()\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2) Easy: Tools\n",
        "\n",
        "Add function tools to expose deterministic capabilities. Tools are auto-extracted to schemas and callable by the agent."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def calculate_percentage(numerator: float, denominator: float) -> str:\n",
        "    \"\"\"Calculate a percentage string.\n",
        "\n",
        "    Args:\n",
        "        numerator: Numerator value\n",
        "        denominator: Denominator value\n",
        "    Returns:\n",
        "        A formatted percentage string (e.g., '75.00%').\n",
        "    \"\"\"\n",
        "    if denominator == 0:\n",
        "        return \"0.00%\"\n",
        "    return f\"{(numerator/denominator)*100:.2f}%\"\n",
        "\n",
        "agent_with_tool = ComputerAgent(\n",
        "    model=\"openai/computer-use-preview\",\n",
        "    tools=[computer, calculate_percentage],\n",
        "    instructions=\"When doing math, prefer the `calculate_percentage` tool when relevant.\",\n",
        ")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3) Intermediate: Callbacks\n",
        "\n",
        "Callbacks offer lifecycle hooks. For example, limit recent images or record trajectories."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback\n",
        "\n",
        "agent_with_callbacks = ComputerAgent(\n",
        "    model=\"anthropic/claude-3-5-sonnet-20241022\",\n",
        "    tools=[computer],\n",
        "    callbacks=[\n",
        "        ImageRetentionCallback(only_n_most_recent_images=3),\n",
        "        TrajectorySaverCallback(\"./trajectories\"),\n",
        "    ],\n",
        ")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4) Expert: Custom `@register_agent`\n",
        "\n",
        "Register custom agent configs that implement `predict_step` (and optionally `predict_click`). This gives you full control over prompting, message shaping, and tool wiring.\n",
        "\n",
        "See: `libs/python/agent/agent/loops/` for concrete examples."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Next steps\n",
        "\n",
        "- Start with `instructions` for fast wins.\n",
        "- Add function tools for determinism and reliability.\n",
        "- Use callbacks to manage cost, logs, and safety.\n",
        "- Build custom loops for specialized domains."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}