A11y-CUA is a multimodal dataset containing real computer-use task trajectories from sighted users, blind and low vision users, and AI agents. The dataset includes structured interaction logs, metadata, screen video, and system audio for each task. It was created by berkeley-hci and was last updated on Hugging Face in April 2026.
Use Cases
- Train or evaluate multimodal AI agents for accessibility based on real user interaction logs.
- Analyze differences in computer-use patterns between sighted and blind/low vision users.
- Develop models for screen understanding and action prediction based on screen video and system audio.
- Benchmark assistive technologies using structured task trajectories and metadata.
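As an illustration of the group-comparison use case, the sketch below averages trajectory length per user group. The record keys (`group`, `actions`), the group labels, and the sample values are hypothetical placeholders, because the dataset's actual field names are not documented here. Adapt them after inspecting the downloaded logs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trajectories: the keys "group" and "actions" are assumptions,
# not documented A11y-CUA field names.
sample_trajectories = [
    {"group": "sighted", "actions": ["click", "type", "click"]},
    {"group": "sighted", "actions": ["click"]},
    {"group": "blv", "actions": ["key", "key", "key", "key"]},
    {"group": "agent", "actions": ["click", "type"]},
]

def mean_actions_per_group(trajectories):
    """Return {group: mean number of actions per trajectory}."""
    lengths = defaultdict(list)
    for traj in trajectories:
        lengths[traj["group"]].append(len(traj["actions"]))
    return {group: mean(counts) for group, counts in lengths.items()}

summary = mean_actions_per_group(sample_trajectories)
for group, avg in sorted(summary.items()):
    print(f"{group}: {avg:.1f} actions per trajectory")
```

Trajectory length is only one possible measure. The same grouping pattern would apply to task duration or error counts, if the logs turn out to record them.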
Strengths
- Includes data from distinct user groups: sighted users, blind and low vision users, and AI agents (Claude-Sonnet-4.5, Qwen3-VL-32B-Instruct).
- Contains multiple data modalities per task, including structured logs, metadata, screen video, and system audio.
- Based on real task trajectories, likely providing ecologically valid interaction data.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and overall dataset scale are not stated, which makes it hard to judge suitability before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
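Because field semantics have to be inferred after download, a first pass might simply list which keys appear in the interaction logs and which value types each one holds. The sketch below assumes the logs are stored as JSON Lines with one object per line, which this listing does not confirm. The sample excerpt and its field names are invented for illustration.

```python
import json
from collections import defaultdict

def infer_fields(jsonl_text):
    """Map each top-level key to the sorted type names seen for its values."""
    seen = defaultdict(set)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # assumes each line is a JSON object
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}

# Hypothetical log excerpt; real A11y-CUA field names may differ.
sample_log = """
{"t": 0.0, "event": "key_down", "key": "Tab"}
{"t": 0.4, "event": "click", "x": 120, "y": 48}
{"t": 1, "event": "key_down", "key": "Enter"}
"""

fields = infer_fields(sample_log)
for name in sorted(fields):
    print(name, fields[name])
```

Keys that appear with more than one type, or only in some records, are good candidates for closer manual inspection.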
Provenance
- Source: berkeley-hci
- Collection Method: Likely collected from real user and AI agent computer-use sessions.
- Freshness: Last updated 2026-04-22 18:19:10 (time zone not stated); check the Hugging Face repository for newer revisions before relying on this date.
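To put the last-updated timestamp in context, the sketch below computes how many days have passed since it. Treating the timestamp as UTC is an assumption, since the listing gives no time zone. The reference date is fixed only to keep the example reproducible.

```python
from datetime import datetime, timezone

LAST_UPDATED = "2026-04-22 18:19:10"  # timestamp quoted in the listing

def age_in_days(last_updated, now):
    """Whole days between the listed timestamp and `now` (timestamp assumed UTC)."""
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    updated = updated.replace(tzinfo=timezone.utc)
    return (now - updated).days

# Fixed reference date for illustration; pass datetime.now(timezone.utc) in practice.
reference = datetime(2026, 5, 22, 18, 19, 10, tzinfo=timezone.utc)
print(age_in_days(LAST_UPDATED, reference))
```

This only measures the age of the listed date. It does not detect whether the repository has changed since then.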