Files
computer/docs/content/docs/agent-sdk/customizing-computeragent.mdx
T

131 lines
4.1 KiB
Plaintext

---
title: Customizing Your ComputerAgent
---
<Callout>
A corresponding{' '}
<a
href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb"
target="_blank"
>
Jupyter Notebook
</a>{' '}
is available for this documentation.
</Callout>
The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems.
This guide shows four proven ways to increase capabilities and success rate:
- 1 — Simple: Prompt engineering
- 2 — Easy: Tools
- 3 — Intermediate: Callbacks
- 4 — Expert: Custom `@register_agent`
## 1) Simple: Prompt engineering
Provide guiding instructions to shape behavior. `ComputerAgent` accepts an optional `instructions: str | None` which acts like a system-style preface. Internally, this uses a callback that pre-pends a user message before each LLM call.
```python
from agent.agent import ComputerAgent
agent = ComputerAgent(
model="openai/computer-use-preview",
tools=[computer],
instructions=(
"You are a meticulous software operator. Prefer safe, deterministic actions. "
"Always confirm via on-screen text before proceeding."
),
)
```
## 2) Easy: Tools
Expose deterministic capabilities as tools (Python functions or custom computer handlers). The agent will call them when appropriate.
```python
def calculate_percentage(numerator: float, denominator: float) -> str:
"""Calculate percentage as a string.
Args:
numerator: Numerator value
denominator: Denominator value
Returns:
A formatted percentage string (e.g., '75.00%').
"""
if denominator == 0:
return "0.00%"
return f"{(numerator/denominator)*100:.2f}%"
agent = ComputerAgent(
model="openai/computer-use-preview",
tools=[computer, calculate_percentage],
)
```
- See `docs/agent-sdk/custom-tools` for authoring function tools.
- See `docs/agent-sdk/custom-computer-handlers` for building full computer interfaces.
## 3) Intermediate: Callbacks
Callbacks provide lifecycle hooks to preprocess messages, postprocess outputs, record trajectories, manage costs, and more.
```python
from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback, BudgetManagerCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[
ImageRetentionCallback(only_n_most_recent_images=3),
TrajectorySaverCallback("./trajectories"),
BudgetManagerCallback(max_budget=10.0, raise_error=True),
],
)
```
- Browse implementations in `libs/python/agent/agent/loops/`.
## 4) Expert: Custom `@register_agent`
Build your own agent configuration class to control prompting, message shaping, and tool handling. This is the most flexible option for specialized domains.
- Register your own `model=...` loop using `@register_agent`
- Browse implementations in `libs/python/agent/agent/loops/`.
- Implement `predict_step()` (and optionally `predict_click()`) and return the standardized output schema.
```python
from agent.decorators import register_agent
@register_agent(models=r".*my-special-model.*", priority=10)
class MyCustomAgentConfig:
async def predict_step(self, messages, model, tools, **kwargs):
# 1) Format messages for your provider
# 2) Call provider
# 3) Convert responses to the agent output schema
return {"output": [], "usage": {}}
async def predict_click(self, model, image_b64, instruction):
# Optional: click-only capability
return None
def get_capabilities(self):
return ["step"]
```
## HUD integration (optional)
When using the HUD evaluation integration (`agent/integrations/hud/`), you can pass `instructions`, `tools`, and `callbacks` directly
```python
from agent.integrations.hud import run_single_task
await run_single_task(
dataset="username/dataset-name",
model="openai/computer-use-preview",
instructions="Operate carefully. Always verify on-screen text before actions.",
# tools=[your_custom_function],
# callbacks=[YourCustomCallback()],
)
```