Description

ScreenSpot is a GUI grounding benchmark that pairs more than 1,200 text instructions with screens from iOS, Android, macOS, Windows, and web environments. Researchers from Nanjing University and the Shanghai AI Laboratory created it to test large multimodal models. The dataset was last updated in April 2024.

Use Cases

Benchmarking model accuracy in locating UI elements like buttons or text fields based on natural language instructions.
Training models to understand cross-platform UI semantics by leveraging annotated element types from different operating systems.
Evaluating the robustness of multimodal agents on real-world GUI interaction tasks spanning mobile and desktop environments.
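
As a rough illustration of the first use case, the sketch below scores a model by checking whether each predicted click point falls inside the target element's bounding box. This description does not specify the scoring rule or the annotation layout, so the point-in-box rule, the (x_min, y_min, x_max, y_max) box format, and the function names are all assumptions made for the example.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges count as inside)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def grounding_accuracy(predictions: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of predicted click points that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    # Hypothetical data: two instructions aimed at the same 10x10 button.
    preds = [(5.0, 5.0), (50.0, 50.0)]
    boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    print(grounding_accuracy(preds, boxes))
```

If the actual annotations store boxes as (x, y, width, height), convert them before scoring; the two layouts are easy to confuse and give silently wrong results.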

Strengths

Contains more than 1,200 distinct instructions for evaluation.
Covers five major GUI environments: iOS, Android, macOS, Windows, and web.

Limitations

The exact row count, column names, and number of samples per environment are not given in this description.
Whether samples are balanced across GUI platforms and element types is unknown.
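
Because platform and element-type balance is not documented, it may be worth checking before reporting aggregate scores. The sketch below counts records per category and reports the ratio between the largest and smallest group. The record structure and the field names "platform" and "element_type" are hypothetical, since the actual column names are not listed here.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_by_field(records: Iterable[Mapping[str, str]], field: str) -> Counter:
    """Count records per value of a field; missing values are grouped as 'unknown'."""
    return Counter(record.get(field, "unknown") for record in records)


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Ratio of the largest group to the smallest; 1.0 means perfectly balanced."""
    if not counts:
        return 0.0
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Hypothetical records; real field names and values may differ.
    records = [
        {"platform": "ios", "element_type": "icon"},
        {"platform": "ios", "element_type": "text"},
        {"platform": "web", "element_type": "text"},
    ]
    per_platform = count_by_field(records, "platform")
    print(dict(per_platform), imbalance_ratio(per_platform))
```

A ratio well above 1.0 suggests reporting per-platform accuracy alongside the overall number, so that a large group does not mask weak results on a small one.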

Provenance

Source: Researchers at Nanjing University and Shanghai AI Laboratory.
Collection Method: Created as an evaluation benchmark; the data collection process is not described.
Freshness: Last updated on 2024-04-10.

The full dataset description and further details are hosted externally on the Hugging Face dataset page. License information is not stated here.

GUI Grounding Benchmark Across Mobile and Desktop Platforms
