Cua Documentation

ScreenSpot-v2

Standard-resolution GUI grounding benchmark

ScreenSpot-v2 is a benchmark for evaluating click-prediction accuracy on standard-resolution GUI screenshots.
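To make "click prediction accuracy" concrete, here is a minimal sketch of how such a metric is commonly scored for GUI grounding: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. The names `is_hit` and `click_accuracy`, and the `(left, top, right, bottom)` pixel box layout, are assumptions for illustration and are not taken from `ss-v2.py`.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (left, top, right, bottom) in pixels


def is_hit(click: Optional[Point], target: Box) -> bool:
    """Return True if the predicted click lands inside the target box (edges included)."""
    if click is None:  # the model produced no usable prediction
        return False
    x, y = click
    left, top, right, bottom = target
    return left <= x <= right and top <= y <= bottom


def click_accuracy(clicks: Sequence[Optional[Point]], targets: Sequence[Box]) -> float:
    """Fraction of samples whose predicted click falls inside its target box."""
    if not targets:
        return 0.0
    hits = sum(is_hit(c, t) for c, t in zip(clicks, targets))
    return hits / len(targets)


# Hypothetical example data: two hits, one missing prediction, one miss.
box = (0, 0, 10, 10)
print(click_accuracy([(5, 5), None, (20, 20), (1, 1)], [box] * 4))  # 0.5
```

Treating a missing prediction as a miss keeps the denominator equal to the number of samples, so accuracy values stay comparable across models.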

Usage

# Run the benchmark
cd libs/python/agent/benchmarks
python ss-v2.py

# Run with a custom sample limit
python ss-v2.py --samples 100

Results

Model | Accuracy | Failure Rate | Samples
Coming Soon | - | - | -

Results will be populated after running benchmarks with various models.
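As a rough illustration of how one row of the results table could be derived from per-sample outcomes, the sketch below aggregates a run into the four columns. It assumes that "Failure Rate" means the share of samples for which the model returned no usable prediction; the page does not define this, and the `summarize` helper and the model name are hypothetical.

```python
from typing import List, Optional


def summarize(model: str, outcomes: List[Optional[bool]]) -> str:
    """Build one results row.

    Each outcome is True (hit), False (miss), or None (assumed meaning:
    the model failed to produce a usable prediction).
    """
    samples = len(outcomes)
    if samples == 0:
        return f"{model} | - | - | 0"
    hits = sum(1 for o in outcomes if o is True)
    failures = sum(1 for o in outcomes if o is None)
    return f"{model} | {hits / samples:.1%} | {failures / samples:.1%} | {samples}"


# Hypothetical run: two hits, one miss, one failed prediction.
print(summarize("example-model", [True, True, False, None]))
# example-model | 50.0% | 25.0% | 4
```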

