GUIrilla-Gold is a manually annotated test set derived from the GUIrilla-Task collection. It contains screenshots paired with natural language instructions and corresponding actions. The dataset was created by macpaw-research and was last updated on 2026-05-04.
Use Cases
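- Train or evaluate models for GUI automation using the 'task' and 'action' fields; because this is a test set, any rows used for training can no longer serve as held-out evaluation data.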
- Benchmark visual instruction-following models using the 'image' and 'task' pairs.
- Evaluate model robustness by comparing performance on 'raw_task' versus cleaned 'task' instructions.
- Develop computer vision models for GUI element detection using the 'image_cropped' field.
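The robustness comparison listed above can be sketched as follows. This is a minimal illustration, not the dataset authors' evaluation protocol: it assumes each row is a dict with 'image', 'task', 'raw_task', and 'action' keys, that a prediction counts as correct only when it equals the 'action' value exactly, and that `predict` is a hypothetical callable standing in for a real model. The toy rows are invented placeholders, not samples from the dataset.

```python
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]
Predictor = Callable[[Any, str], Any]


def accuracy(rows: Iterable[Row], predict: Predictor, instruction_field: str) -> float:
    """Fraction of rows whose predicted action exactly matches 'action'."""
    rows = list(rows)
    if not rows:
        return 0.0
    correct = sum(
        1
        for row in rows
        if predict(row.get("image"), row[instruction_field]) == row["action"]
    )
    return correct / len(rows)


def robustness_gap(rows: Iterable[Row], predict: Predictor) -> dict:
    """Compare accuracy on cleaned 'task' text against 'raw_task' text."""
    rows = list(rows)
    cleaned = accuracy(rows, predict, "task")
    raw = accuracy(rows, predict, "raw_task")
    return {"task": cleaned, "raw_task": raw, "gap": cleaned - raw}


# Toy stand-ins (assumptions): a keyword-based "model" and two invented rows.
def toy_predict(image: Any, instruction: str) -> str:
    return "click" if "click" in instruction.lower() else "type"


toy_rows = [
    {"image": None, "task": "Click the Save button", "raw_task": "click save", "action": "click"},
    {"image": None, "task": "Click the Close button", "raw_task": "close it", "action": "click"},
]

result = robustness_gap(toy_rows, toy_predict)
print(result)
```

A positive 'gap' means the model does better on cleaned instructions than on raw ones. Exact-match scoring is the simplest choice; if 'action' turns out to be structured (for example, a target element plus an action type), a field-wise comparison would be more appropriate.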
Strengths
- Data is manually annotated, which suggests more reliable labels than automatically generated annotations, although label quality has not been independently verified.
- Contains both raw and cleaned versions of tasks ('raw_task' and 'task'), allowing for robustness analysis.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Row count is unknown, so it is unclear whether the set is large enough for statistically meaningful benchmarking.
- Column-level documentation is absent; field semantics must be inferred after download.
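Since row count and field semantics have to be established after download, a small inspection helper can speed up that first pass. The sketch below works on any iterable of dict-like rows and reports the row count plus, per field, the Python types seen and the number of non-null values. The `summarize_fields` name is hypothetical, and the sample rows are invented placeholders rather than real dataset content.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def summarize_fields(rows: Iterable[Mapping[str, Any]]) -> dict:
    """Count rows and record, per field, the value types and non-null count."""
    row_count = 0
    types = defaultdict(set)
    non_null = defaultdict(int)
    for row in rows:
        row_count += 1
        for name, value in row.items():
            types[name].add(type(value).__name__)
            if value is not None:
                non_null[name] += 1
    return {
        "row_count": row_count,
        "fields": {
            name: {"types": sorted(types[name]), "non_null": non_null[name]}
            for name in types
        },
    }


# Invented sample rows (assumption) to show the shape of the summary.
sample_rows = [
    {"task": "Open Settings", "raw_task": None, "action": "click"},
    {"task": "Type hello", "raw_task": "type hello", "action": "type"},
]

summary = summarize_fields(sample_rows)
for field, info in summary["fields"].items():
    print(field, info)

# Hypothetical usage on the real data, assuming it is hosted on the
# Hugging Face Hub under this repository id (not confirmed by this card):
#   from datasets import load_dataset
#   splits = load_dataset("macpaw-research/GUIrilla-Gold")
#   for split_name, split in splits.items():
#       print(split_name, summarize_fields(split))
```

Running the helper per split gives both the missing row counts and a first hint at which fields hold text, images, or nested structures.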
Provenance
- Source: macpaw-research
- Collection Method: Manually annotated from the GUIrilla-Task collection.
- Freshness: Last updated 2026-05-04 09:00:03; freshness should be verified.