Introduction
Overview of benchmarking in the c/ua agent framework
The c/ua agent framework uses benchmarks to test the performance of supported models and providers at various agentic tasks.
Benchmark Types
Computer-Agent benchmarks evaluate two key capabilities:
- Plan Generation: Breaking down complex tasks into a sequence of actions
- Coordinate Generation: Predicting precise click locations on GUI elements
Using State-of-the-Art Models
Let's see how to use the SOTA vision-language models in the c/ua agent framework.
Plan Generation + Coordinate Generation
OS-World - Benchmark for complete computer-use agents
This leaderboard tests models that can understand instructions and automatically perform the full sequence of actions needed to complete tasks.
# UI-TARS-1.5 is a SOTA unified plan generation + coordinate generation VLM
# This makes it suitable for agentic loops for computer-use
agent = ComputerAgent("huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", tools=[computer])
agent.run("Open Firefox and go to github.com")
# Success! 🎉
Coordinate Generation Only
GUI Agent Grounding Leaderboard - Benchmark for click prediction accuracy
This leaderboard tests models that specialize in finding exactly where to click on screen elements, but needs to be told what specific action to take.
# GTA1-7B is a SOTA coordinate generation VLM
# It can only generate coordinates, not plan:
agent = ComputerAgent("huggingface-local/HelloKKMe/GTA1-7B", tools=[computer])
agent.predict_click("find the button to open the settings") # (27, 450)
# This will raise an error:
# agent.run("Open Firefox and go to github.com")
Composed Agent
The c/ua agent framework also supports composed agents, which combine a planning model with a clicking model for the best of both worlds. Any liteLLM model can be used as the plan generation model.
# It can be paired with any LLM to form a composed agent:
# "gemini/gemini-1.5-pro" will be used as the plan generation LLM
agent = ComputerAgent("huggingface-local/HelloKKMe/GTA1-7B+gemini/gemini-1.5-pro", tools=[computer])
agent.run("Open Firefox and go to github.com")
# Success! 🎉