ScreenSpot-v2

Standard-resolution GUI grounding benchmark

ScreenSpot-v2 is a benchmark that measures how accurately a model predicts click locations on standard-resolution GUI screenshots.
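
As a rough illustration of what click accuracy means here, the sketch below scores a predicted click as correct when it lands inside the target element's bounding box, which is the usual convention for ScreenSpot-style grounding benchmarks. The function name `is_hit`, the `(x1, y1, x2, y2)` pixel box layout, and the sample values are assumptions for illustration and are not taken from ss-v2.py.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels, top-left to bottom-right.
Box = Tuple[float, float, float, float]


def is_hit(click: Tuple[float, float], box: Box) -> bool:
    """Return True when the predicted click lies inside the box (edges inclusive)."""
    x, y = click
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


# Hypothetical sample: a button occupying a 100x40 pixel region.
target = (200.0, 300.0, 300.0, 340.0)
print(is_hit((250.0, 320.0), target))  # click inside the button
print(is_hit((150.0, 320.0), target))  # click to the left of the button
```

Treating the box edges as inclusive is a design choice in this sketch; the actual benchmark script may handle boundary pixels or normalized coordinates differently.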

Usage

# Run the benchmark
cd libs/python/agent/benchmarks
python ss-v2.py

# Run with a custom sample limit
python ss-v2.py --samples 100

Model	Accuracy	Failure Rate	Samples
Coming Soon	-	-	-

Results will be populated after running benchmarks with various models.
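
Until results are available, the following sketch shows one way the table columns could be derived from per-sample outcomes. It assumes each sample yields `True` (correct click), `False` (wrong click), or `None` (the model produced no usable prediction), that Failure Rate is the share of `None` outcomes, and that Accuracy is computed over all samples. The `summarize` helper is hypothetical, and the benchmark script may define these columns differently.

```python
from typing import Dict, List, Optional, Union


def summarize(outcomes: List[Optional[bool]]) -> Dict[str, Union[int, float]]:
    """Aggregate per-sample outcomes into Accuracy, Failure Rate, and Samples."""
    total = len(outcomes)
    if total == 0:
        # Avoid dividing by zero when no samples were evaluated.
        return {"accuracy": 0.0, "failure_rate": 0.0, "samples": 0}
    correct = sum(1 for o in outcomes if o is True)
    failed = sum(1 for o in outcomes if o is None)  # assumed meaning of "failure"
    return {
        "accuracy": correct / total,
        "failure_rate": failed / total,
        "samples": total,
    }


# Hypothetical run: two correct clicks, one wrong click, one missing prediction.
print(summarize([True, True, False, None]))
```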
