ScreenSpot-v2
Standard-resolution GUI grounding benchmark
ScreenSpot-v2 is a benchmark for evaluating click-prediction accuracy on standard-resolution GUI screenshots.
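As a rough sketch of what click-prediction accuracy usually means in GUI grounding: a predicted click is typically counted as correct when it lands inside the target element's bounding box. The function name `is_hit` and the `(x, y, width, height)` box layout below are assumptions for illustration and are not taken from `ss-v2.py`.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
# Assumed box layout: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def is_hit(click: Optional[Point], bbox: Box) -> bool:
    """Return True if the predicted click falls inside the target box.

    Hypothetical helper: a missing prediction (None) is treated as a miss.
    """
    if click is None:
        return False
    x, y = click
    left, top, width, height = bbox
    return left <= x <= left + width and top <= y <= top + height


# A click inside the box, then one outside it.
print(is_hit((120, 45), (100, 30, 50, 40)))
print(is_hit((10, 10), (100, 30, 50, 40)))
```

If the dataset stores boxes as corner coordinates instead, only the unpacking line would need to change.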
Usage
# Run the benchmark
cd libs/python/agent/benchmarks
python ss-v2.py
# Run with custom sample limit
python ss-v2.py --samples 100
Results
Model | Accuracy | Failure Rate | Samples |
---|---|---|---|
Coming Soon | - | - | - |
Results will be added once the benchmark has been run against a range of models.
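Until real numbers are available, the following minimal sketch shows one way the table's columns could be derived from per-sample outcomes. It assumes that "Failure Rate" means the share of samples for which the model returned no usable prediction; that definition, the `summarize` helper, and the toy input are all hypothetical and not taken from the benchmark script.

```python
from typing import Dict, List, Optional


def summarize(outcomes: List[Optional[bool]]) -> Dict[str, float]:
    """Aggregate per-sample outcomes into table columns.

    Each outcome is True (hit), False (miss), or None
    (assumed meaning: the model produced no prediction).
    """
    total = len(outcomes)
    if total == 0:
        return {"accuracy": 0.0, "failure_rate": 0.0, "samples": 0}
    hits = sum(1 for o in outcomes if o is True)
    failures = sum(1 for o in outcomes if o is None)
    return {
        "accuracy": hits / total,
        "failure_rate": failures / total,
        "samples": total,
    }


# Toy data only, not a benchmark result.
row = summarize([True, False, None, True])
print(row)
```

Here accuracy is computed over all samples, so failed predictions count against it; dividing by successful predictions only would be an equally plausible convention.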