LogoCua Documentation

Customizing Your ComputerAgent

A corresponding Jupyter Notebook is available for this documentation.

The ComputerAgent interface provides an easy proxy to any computer-using model configuration, and it is a powerful framework for extending and building your own agentic systems.

This guide shows four proven ways to increase capabilities and success rate:

  • 1 — Simple: Prompt engineering
  • 2 — Easy: Tools
  • 3 — Intermediate: Callbacks
  • 4 — Expert: Custom @register_agent

1) Simple: Prompt engineering

Provide guiding instructions to shape behavior. ComputerAgent accepts an optional instructions: str | None which acts like a system-style preface. Internally, this uses a callback that pre-pends a user message before each LLM call.

from agent.agent import ComputerAgent

agent = ComputerAgent(
    model="openai/computer-use-preview",
    tools=[computer],
    instructions=(
        "You are a meticulous software operator. Prefer safe, deterministic actions. "
        "Always confirm via on-screen text before proceeding."
    ),
)

2) Easy: Tools

Expose deterministic capabilities as tools (Python functions or custom computer handlers). The agent will call them when appropriate.

def calculate_percentage(numerator: float, denominator: float) -> str:
    """Calculate percentage as a string.

    Args:
        numerator: Numerator value
        denominator: Denominator value
    Returns:
        A formatted percentage string (e.g., '75.00%').
    """
    if denominator == 0:
        return "0.00%"
    return f"{(numerator/denominator)*100:.2f}%"

agent = ComputerAgent(
    model="openai/computer-use-preview",
    tools=[computer, calculate_percentage],
)
  • See docs/agent-sdk/custom-tools for authoring function tools.
  • See docs/agent-sdk/custom-computer-handlers for building full computer interfaces.

3) Intermediate: Callbacks

Callbacks provide lifecycle hooks to preprocess messages, postprocess outputs, record trajectories, manage costs, and more.

from agent.callbacks import ImageRetentionCallback, TrajectorySaverCallback, BudgetManagerCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[
        ImageRetentionCallback(only_n_most_recent_images=3),
        TrajectorySaverCallback("./trajectories"),
        BudgetManagerCallback(max_budget=10.0, raise_error=True),
    ],
)
  • Browse implementations in libs/python/agent/agent/loops/.

4) Expert: Custom @register_agent

Build your own agent configuration class to control prompting, message shaping, and tool handling. This is the most flexible option for specialized domains.

  • Register your own model=... loop using @register_agent
  • Browse implementations in libs/python/agent/agent/loops/.
  • Implement predict_step() (and optionally predict_click()) and return the standardized output schema.
from agent.decorators import register_agent

@register_agent(models=r".*my-special-model.*", priority=10)
class MyCustomAgentConfig:
    async def predict_step(self, messages, model, tools, **kwargs):
        # 1) Format messages for your provider
        # 2) Call provider
        # 3) Convert responses to the agent output schema
        return {"output": [], "usage": {}}

    async def predict_click(self, model, image_b64, instruction):
        # Optional: click-only capability
        return None

    def get_capabilities(self):
        return ["step"]

HUD integration (optional)

When using the HUD evaluation integration (agent/integrations/hud/), you can pass instructions, tools, and callbacks directly

from agent.integrations.hud import run_single_task

await run_single_task(
    dataset="username/dataset-name",
    model="openai/computer-use-preview",
    instructions="Operate carefully. Always verify on-screen text before actions.",
    # tools=[your_custom_function],
    # callbacks=[YourCustomCallback()],
)