Customizing Your ComputerAgent
The ComputerAgent
interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems.
This guide shows four proven ways to increase capabilities and success rate:
- 1 — Simple: Prompt engineering
- 2 — Easy: Tools
- 3 — Intermediate: Callbacks
- 4 — Expert: Custom
@register_agent
1) Simple: Prompt engineering
Provide guiding instructions to shape behavior. ComputerAgent
accepts an optional instructions: str | None
which acts like a system-style preface. Internally, this uses a callback that pre-pends a user message before each LLM call.
from agent.agent import ComputerAgent
agent = ComputerAgent(
model="openai/computer-use-preview",
tools=[computer],
instructions=(
"You are a meticulous software operator. Prefer safe, deterministic actions. "
"Always confirm via on-screen text before proceeding."
),
)
2) Easy: Tools
Expose deterministic capabilities as tools (Python functions or custom computer handlers). The agent will call them when appropriate.
def calculate_percentage(numerator: float, denominator: float) -> str:
"""Calculate percentage as a string.
Args:
numerator: Numerator value
denominator: Denominator value
Returns:
A formatted percentage string (e.g., '75.00%').
"""
if denominator == 0:
return "0.00%"
return f"{(numerator/denominator)*100:.2f}%"
agent = ComputerAgent(
model="openai/computer-use-preview",
tools=[computer, calculate_percentage],
)
- See
docs/agent-sdk/custom-tools
for authoring function tools. - See
docs/agent-sdk/custom-computer-handlers
for building full computer interfaces.
3) Intermediate: Callbacks
Callbacks provide lifecycle hooks to preprocess messages, postprocess outputs, record trajectories, manage costs, and more.
from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback, BudgetManagerCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[
ImageRetentionCallback(only_n_most_recent_images=3),
TrajectorySaverCallback("./trajectories"),
BudgetManagerCallback(max_budget=10.0, raise_error=True),
],
)
- Browse implementations in
libs/python/agent/agent/loops/
.
4) Expert: Custom @register_agent
Build your own agent configuration class to control prompting, message shaping, and tool handling. This is the most flexible option for specialized domains.
- Register your own
model=...
loop using@register_agent
- Browse implementations in
libs/python/agent/agent/loops/
. - Implement
predict_step()
(and optionallypredict_click()
) and return the standardized output schema.
from agent.decorators import register_agent
@register_agent(models=r".*my-special-model.*", priority=10)
class MyCustomAgentConfig:
async def predict_step(self, messages, model, tools, **kwargs):
# 1) Format messages for your provider
# 2) Call provider
# 3) Convert responses to the agent output schema
return {"output": [], "usage": {}}
async def predict_click(self, model, image_b64, instruction):
# Optional: click-only capability
return None
def get_capabilities(self):
return ["step"]
HUD integration (optional)
When using the HUD evaluation integration (agent/integrations/hud/
), you can pass instructions
, tools
, and callbacks
directly
from agent.integrations.hud import run_single_task
await run_single_task(
dataset="username/dataset-name",
model="openai/computer-use-preview",
instructions="Operate carefully. Always verify on-screen text before actions.",
# tools=[your_custom_function],
# callbacks=[YourCustomCallback()],
)