Benchmarks

Computer Agent SDK benchmarks for agentic GUI tasks

The benchmark system evaluates models on agentic GUI tasks, measuring agent loop success rate and click prediction accuracy. It supports two kinds of models:

  • Computer Agent SDK providers, referenced by model strings such as "huggingface-local/HelloKKMe/GTA1-7B"
  • Reference agent implementations: custom model classes implementing the ModelProtocol (see the sketch below)
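
The ModelProtocol route lets you benchmark any model you can wrap in a Python class. Below is a minimal sketch, assuming an async protocol with load/unload hooks and a click-prediction method; the actual method names and signatures are defined in the benchmark repository and may differ, so treat this as illustrative only:

from PIL import Image

class MyClickModel:
    """Hypothetical reference agent implementation for click-prediction benchmarks."""

    @property
    def model_name(self) -> str:
        return "my-click-model"

    async def load_model(self) -> None:
        # Load weights here (e.g. onto a GPU) before evaluation starts.
        ...

    async def unload_model(self) -> None:
        # Release resources between benchmark runs.
        ...

    async def predict_click(self, image: Image.Image, instruction: str) -> tuple[int, int] | None:
        # Return (x, y) pixel coordinates for the requested click,
        # or None if the instruction cannot be grounded in the screenshot.
        ...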

Available Benchmarks

Quick Start

# Clone the benchmark repository
git clone https://github.com/trycua/cua
cd cua/libs/python/agent/benchmarks

# Install dependencies
pip install "cua-agent[all]"

# Run a benchmark
python ss-v2.py
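
Each benchmark script evaluates a list of models, which can mix the two kinds described above. A minimal sketch of that idea; the variable name and registration mechanism here are assumptions, so check the script you are running (e.g. ss-v2.py) for the actual convention:

# Hypothetical model list inside a benchmark script.
models = [
    # Computer Agent SDK provider, referenced by its model string
    "huggingface-local/HelloKKMe/GTA1-7B",
    # Reference agent implementation: an instance of a ModelProtocol class
    MyClickModel(),
]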