Name: SightAct-Bench: A 14-Family Synthetic Benchmark for VLM Agent Safety
Creator: SightAct, Bench
Published: 2026-05-04T23:54:40
Keywords: Vision Language Models, Benchmark, Safety Evaluation, Browser Agents, Synthetic Benchmark, Synthetic, Multimodal

Description

SightAct-Bench is a synthetic benchmark of 14 task families for evaluating the safety of browser agents powered by vision-language models (VLMs). The metadata lists the creator as "SightAct, Bench"; the dataset is hosted on Harvard Dataverse and was last updated on 2026-05-04. It tests whether an agent safely handles task-relevant requests for sensitive information when a visually suspicious interaction is embedded in the workflow.

Use Cases

Benchmarking how agent safety protocols handle task-relevant sensitive-information requests.
Evaluating VLM robustness against visually suspicious interactions embedded in workflows.
Training or fine-tuning browser agents to recognize and avoid unsafe actions based on synthetic task families.

Strengths

Focuses on a specific safety evaluation scenario for multimodal agents.
Contains 14 distinct families of synthetic tasks for structured testing.
Hosted on the Harvard Dataverse platform, suggesting a research-oriented origin.

Limitations

Dataset size, file formats, and specific column structure are unknown.
Column-level documentation is absent; field semantics must be inferred after download.
Because the benchmark is synthetic, its applicability to real-world agent performance may require validation.
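
Since the file formats and column structure are undocumented, a first step after download could be a quick inventory of what the archive contains. The sketch below is a minimal example under stated assumptions: it assumes the files have been unpacked into a local directory, and that any tabular data is stored as CSV, TSV, or JSON Lines. Neither assumption is confirmed by the dataset record, and the function name `summarize_download` is hypothetical.

```python
import csv
import json
from collections import Counter
from pathlib import Path


def summarize_download(root):
    """Return (extension counts, {relative path: field names}) for a local copy.

    Assumption: tabular files, if any, are CSV, TSV, or JSON Lines.
    Other file types are only counted, never opened.
    """
    root = Path(root)
    extensions = Counter()
    fields = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        ext = path.suffix.lower()
        extensions[ext or "(none)"] += 1
        rel = path.relative_to(root).as_posix()
        if ext in (".csv", ".tsv"):
            # Read only the header row to recover column names.
            delimiter = "\t" if ext == ".tsv" else ","
            with path.open(newline="", encoding="utf-8-sig") as handle:
                fields[rel] = next(csv.reader(handle, delimiter=delimiter), [])
        elif ext == ".jsonl":
            # Use the keys of the first non-empty record as the field list.
            with path.open(encoding="utf-8") as handle:
                first = next((line for line in handle if line.strip()), "")
            record = json.loads(first) if first else {}
            fields[rel] = sorted(record) if isinstance(record, dict) else []
    return dict(extensions), fields
```

Calling `summarize_download("path/to/unpacked/dataset")` returns the count of files per extension and the field names found in each tabular file, which gives a starting point for inferring field semantics. It reads only headers or first records, so it says nothing about value types or completeness.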

Provenance

Source: Harvard Dataverse
Collection Method: Synthetically generated benchmark.
Freshness: Last updated 2026-05-04 23:54:40; check the Harvard Dataverse record for later revisions.

License information is unknown and should be checked before use.
