Configuration
Detection Parameters
Box Threshold (0.3)
Controls the confidence threshold for accepting detections:

- Higher values (0.3) yield more precise but fewer detections
- Lower values (0.01) catch more potential icons but increase false positives
- Default is 0.3 for optimal precision/recall balance
IOU Threshold (0.1)
Controls how overlapping detections are merged:

- Lower values (0.1) more aggressively remove overlapping boxes
- Higher values (0.5) allow more overlapping detections
- Default is 0.1 to handle densely packed UI elements
OCR Configuration
-
Engine: EasyOCR
- Primary choice for all platforms
- Fast initialization and processing
- Built-in English language support
- GPU acceleration when available
-
Settings:
- Timeout: 5 seconds
- Confidence threshold: 0.5
- Paragraph mode: Disabled
- Language: English only
Performance
Hardware Acceleration
MPS (Metal Performance Shaders)
- Multi-scale detection (640px, 1280px, 1920px)
- Test-time augmentation enabled
- Half-precision (FP16)
- Average detection time: ~0.4s
- Best for production use when available
CPU
- Single-scale detection (1280px)
- Full-precision (FP32)
- Average detection time: ~1.3s
- Reliable fallback option
Example Output Structure
examples/output/
├── {timestamp}_no_ocr/
│ ├── annotated_images/
│ │ └── screenshot_analyzed.png
│ ├── screen_details.txt
│ └── summary.json
└── {timestamp}_ocr/
├── annotated_images/
│ └── screenshot_analyzed.png
├── screen_details.txt
└── summary.json