Agent Loops
Supported computer-using agent loops and models
A corresponding Jupyter Notebook is available for this documentation.
An agent can be thought of as a loop: it generates actions, executes them, and repeats until done.
- Generate: Your model generates `output_text`, `computer_call`, and `function_call` items
- Execute: The `computer` safely executes those items
- Complete: If the model has no more calls, it's done!
To run an agent loop, simply do:
from agent import ComputerAgent
from computer import Computer

computer = Computer()  # Connect to a cua container

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer]
)

prompt = "Take a screenshot and tell me what you see"

# Run inside an async context (e.g., the companion Jupyter Notebook or asyncio.run)
async for result in agent.run(prompt):
    if result["output"][-1]["type"] == "message":
        print("Agent:", result["output"][-1]["content"][0]["text"])
For a list of supported models and configurations, see the Supported Agents page.
Response Format
{
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "I can see..."}]
    },
    {
      "type": "computer_call",
      "action": {"type": "screenshot"},
      "call_id": "call_123"
    },
    {
      "type": "computer_call_output",
      "call_id": "call_123",
      "output": {"image_url": "data:image/png;base64,..."}
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 75,
    "total_tokens": 225,
    "response_cost": 0.01
  }
}
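As a minimal sketch of consuming this structure (the item types and `usage` fields are the ones shown above; `agent` is the instance from the first example):
async for result in agent.run("Take a screenshot and tell me what you see"):
    for item in result["output"]:
        if item["type"] == "message":
            print("Assistant:", item["content"][0]["text"])
        elif item["type"] == "computer_call":
            print("Action requested:", item["action"]["type"])
        elif item["type"] == "computer_call_output":
            print("Action output received for call:", item["call_id"])
    usage = result["usage"]
    print(f"Tokens used: {usage['total_tokens']} (cost: ${usage['response_cost']})")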
Environment Variables
Use the following environment variables to configure the agent and its access to cloud computers and LLM providers:
# Computer instance (cloud)
export CUA_CONTAINER_NAME="your-container-name"
export CUA_API_KEY="your-cua-api-key"
# LLM API keys
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENAI_API_KEY="your-openai-key"
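If you prefer configuring these from Python (for example in the companion notebook), a minimal sketch that sets the same variables before constructing the computer and agent; the values are placeholders:
import os

os.environ["CUA_CONTAINER_NAME"] = "your-container-name"
os.environ["CUA_API_KEY"] = "your-cua-api-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"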
Input and output
The input prompt passed to `Agent.run` can either be a string or a list of message dictionaries:
messages = [
    {
        "role": "user",
        "content": "Take a screenshot and describe what you see"
    },
    {
        "role": "assistant",
        "content": "I'll take a screenshot for you."
    }
]
The output is an AsyncGenerator that yields response chunks.
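One way to keep a multi-turn conversation going is to accumulate each chunk's output items back into the message list before the next call. This is a sketch based on the message and output shapes shown above, not a prescribed pattern:
history = [{"role": "user", "content": "Take a screenshot and describe what you see"}]

async for result in agent.run(history):
    # Append the assistant's output items so a follow-up prompt has context
    history += result["output"]

history.append({"role": "user", "content": "Now open the browser."})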
Parameters
The `ComputerAgent` constructor provides a wide range of options for customizing agent behavior, tool integration, callbacks, resource management, and more.
- `model` (`str`): Required. The LLM or agent model to use. Determines which agent loop is selected unless `custom_loop` is provided. (e.g., "claude-3-5-sonnet-20241022", "computer-use-preview", "omni+vertex_ai/gemini-pro")
- `tools` (`List[Any]`): List of tools the agent can use (e.g., `Computer`, sandboxed Python functions, etc.; see the sketch after this list).
- `custom_loop` (`Callable`): Optional custom agent loop function. If provided, overrides automatic loop selection.
- `only_n_most_recent_images` (`int`): If set, only the N most recent images are kept in the message history. Useful for limiting memory usage. Automatically adds `ImageRetentionCallback`.
- `callbacks` (`List[Any]`): List of callback instances for advanced preprocessing, postprocessing, logging, or custom hooks. See Callbacks & Extensibility.
- `verbosity` (`int`): Logging level (e.g., `logging.INFO`). If set, adds a logging callback.
- `trajectory_dir` (`str`): Directory path to save full trajectory data, including screenshots and responses. Adds `TrajectorySaverCallback`.
- `max_retries` (`int`): Default: `3`. Maximum number of retries for failed API calls.
- `screenshot_delay` (`float | int`): Default: `0.5`. Delay (in seconds) before taking screenshots.
- `use_prompt_caching` (`bool`): Default: `False`. Enables prompt caching for repeated prompts (mainly for Anthropic models).
- `max_trajectory_budget` (`float | dict`): If set (float or dict), adds a budget manager callback that tracks usage costs and stops execution if the budget is exceeded. A dict allows advanced options (e.g., `{"max_budget": 5.0, "raise_error": True}`).
- `**kwargs` (`any`): Any additional keyword arguments are passed through to the agent loop or model provider.
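A sketch of the `tools` parameter beyond `Computer`, assuming plain Python functions (with type hints and a docstring) can be passed directly as sandboxed tools; the `read_file` helper below is hypothetical:
def read_file(path: str) -> str:
    """Read a text file and return its contents."""
    with open(path, "r") as f:
        return f.read()

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer, read_file],  # assumes plain functions are accepted alongside Computer
)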
Example with advanced options:
import logging

from agent import ComputerAgent
from computer import Computer
from agent.callbacks import ImageRetentionCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[Computer(...)],
    only_n_most_recent_images=3,
    callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)],
    verbosity=logging.INFO,
    trajectory_dir="trajectories",
    max_retries=5,
    screenshot_delay=1.0,
    use_prompt_caching=True,
    max_trajectory_budget={"max_budget": 5.0, "raise_error": True}
)
Streaming Responses
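Pass `stream=True` to `agent.run` to process output items as they arrive: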
async for result in agent.run(messages, stream=True):
    # Process streaming chunks
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"], end="", flush=True)
        elif item["type"] == "computer_call":
            action = item["action"]
            print(f"\n[Action: {action['type']}]")
Error Handling
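Wrap the run loop in a try/except block to handle budget limits and other failures: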
try:
    async for result in agent.run(messages):
        # Process results
        pass
except BudgetExceededException:
    # Raised when max_trajectory_budget is configured with {"raise_error": True}
    print("Budget limit exceeded")
except Exception as e:
    print(f"Agent error: {e}")