mirror of
https://github.com/trycua/computer.git
synced 2026-05-06 23:21:32 -05:00
131 lines
4.1 KiB
Plaintext
131 lines
4.1 KiB
Plaintext
---
|
|
title: Customizing Your ComputerAgent
|
|
---
|
|
|
|
<Callout>
|
|
A corresponding{' '}
|
|
<a
|
|
href="https://github.com/trycua/cua/blob/main/notebooks/customizing_computeragent.ipynb"
|
|
target="_blank"
|
|
>
|
|
Jupyter Notebook
|
|
</a>{' '}
|
|
is available for this documentation.
|
|
</Callout>
|
|
|
|
The `ComputerAgent` interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems.
|
|
|
|
This guide shows four proven ways to increase capabilities and success rate:
|
|
|
|
- 1 — Simple: Prompt engineering
|
|
- 2 — Easy: Tools
|
|
- 3 — Intermediate: Callbacks
|
|
- 4 — Expert: Custom `@register_agent`
|
|
|
|
## 1) Simple: Prompt engineering
|
|
|
|
Provide guiding instructions to shape behavior. `ComputerAgent` accepts an optional `instructions: str | None` which acts like a system-style preface. Internally, this uses a callback that pre-pends a user message before each LLM call.
|
|
|
|
```python
|
|
from agent.agent import ComputerAgent
|
|
|
|
agent = ComputerAgent(
|
|
model="openai/computer-use-preview",
|
|
tools=[computer],
|
|
instructions=(
|
|
"You are a meticulous software operator. Prefer safe, deterministic actions. "
|
|
"Always confirm via on-screen text before proceeding."
|
|
),
|
|
)
|
|
```
|
|
|
|
## 2) Easy: Tools
|
|
|
|
Expose deterministic capabilities as tools (Python functions or custom computer handlers). The agent will call them when appropriate.
|
|
|
|
```python
|
|
def calculate_percentage(numerator: float, denominator: float) -> str:
|
|
"""Calculate percentage as a string.
|
|
|
|
Args:
|
|
numerator: Numerator value
|
|
denominator: Denominator value
|
|
Returns:
|
|
A formatted percentage string (e.g., '75.00%').
|
|
"""
|
|
if denominator == 0:
|
|
return "0.00%"
|
|
return f"{(numerator/denominator)*100:.2f}%"
|
|
|
|
agent = ComputerAgent(
|
|
model="openai/computer-use-preview",
|
|
tools=[computer, calculate_percentage],
|
|
)
|
|
```
|
|
|
|
- See `docs/agent-sdk/custom-tools` for authoring function tools.
|
|
- See `docs/agent-sdk/custom-computer-handlers` for building full computer interfaces.
|
|
|
|
## 3) Intermediate: Callbacks
|
|
|
|
Callbacks provide lifecycle hooks to preprocess messages, postprocess outputs, record trajectories, manage costs, and more.
|
|
|
|
```python
|
|
from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback, BudgetManagerCallback
|
|
|
|
agent = ComputerAgent(
|
|
model="anthropic/claude-3-5-sonnet-20241022",
|
|
tools=[computer],
|
|
callbacks=[
|
|
ImageRetentionCallback(only_n_most_recent_images=3),
|
|
TrajectorySaverCallback("./trajectories"),
|
|
BudgetManagerCallback(max_budget=10.0, raise_error=True),
|
|
],
|
|
)
|
|
```
|
|
|
|
- Browse implementations in `libs/python/agent/agent/loops/`.
|
|
|
|
## 4) Expert: Custom `@register_agent`
|
|
|
|
Build your own agent configuration class to control prompting, message shaping, and tool handling. This is the most flexible option for specialized domains.
|
|
|
|
- Register your own `model=...` loop using `@register_agent`
|
|
- Browse implementations in `libs/python/agent/agent/loops/`.
|
|
- Implement `predict_step()` (and optionally `predict_click()`) and return the standardized output schema.
|
|
|
|
```python
|
|
from agent.decorators import register_agent
|
|
|
|
@register_agent(models=r".*my-special-model.*", priority=10)
|
|
class MyCustomAgentConfig:
|
|
async def predict_step(self, messages, model, tools, **kwargs):
|
|
# 1) Format messages for your provider
|
|
# 2) Call provider
|
|
# 3) Convert responses to the agent output schema
|
|
return {"output": [], "usage": {}}
|
|
|
|
async def predict_click(self, model, image_b64, instruction):
|
|
# Optional: click-only capability
|
|
return None
|
|
|
|
def get_capabilities(self):
|
|
return ["step"]
|
|
```
|
|
|
|
## HUD integration (optional)
|
|
|
|
When using the HUD evaluation integration (`agent/integrations/hud/`), you can pass `instructions`, `tools`, and `callbacks` directly
|
|
|
|
```python
|
|
from agent.integrations.hud import run_single_task
|
|
|
|
await run_single_task(
|
|
dataset="username/dataset-name",
|
|
model="openai/computer-use-preview",
|
|
instructions="Operate carefully. Always verify on-screen text before actions.",
|
|
# tools=[your_custom_function],
|
|
# callbacks=[YourCustomCallback()],
|
|
)
|
|
```
|