GenAI-Bench: A Human Preference Benchmark for Multimodal Reward Models

Name: GenAI-Bench: A Human Preference Benchmark for Multimodal Reward Models
Creator: TIGER-Lab
Published: 2024-05-30T13:40:05
Keywords: AI Evaluation, Human Preference, Multimodal LLM, Benchmark, Multimodal, Reward Model

Description

GenAI-Bench is a benchmark for evaluating multimodal large language models' ability to judge the quality of AI-generated content. The dataset is based on human preference data collected via the GenAI Arena platform and is maintained by TIGER-Lab. It was last updated on 2024-09-08.

Use Cases

Benchmarking MLLMs as multimodal reward models against human preference votes.
Training or fine-tuning models to predict human preferences for AI-generated content using the collected votes.
Studying the alignment between model judgments and human preferences for generative AI outputs.
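
As a rough illustration of the benchmarking and alignment use cases above, agreement between a model's judgments and the human votes can be measured as simple accuracy. This is a minimal sketch, not the benchmark's official scoring: the label values ("left", "right", "tie") and both function names are hypothetical, since the listing does not document the dataset's actual fields.

```python
from collections import Counter


def preference_agreement(human_votes, model_votes):
    """Return the fraction of items where the model vote equals the human vote."""
    if len(human_votes) != len(model_votes):
        raise ValueError("vote lists must have the same length")
    if not human_votes:
        raise ValueError("no votes to compare")
    matches = sum(h == m for h, m in zip(human_votes, model_votes))
    return matches / len(human_votes)


def confusion_counts(human_votes, model_votes):
    """Count (human, model) label pairs to show where disagreements fall."""
    return Counter(zip(human_votes, model_votes))


# Hypothetical labels; the real dataset may encode votes differently.
human = ["left", "right", "tie", "left"]
model = ["left", "left", "tie", "left"]

accuracy = preference_agreement(human, model)
pairs = confusion_counts(human, model)
```

The pair counts make it easier to see whether a model's errors cluster on ties or on one side, which a single accuracy number hides.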

Strengths

Dataset is explicitly designed for benchmarking multimodal reward models.
Data is sourced from human preferences collected via the GenAI Arena platform.
Votes were filtered with an NSFW filter.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not stated, which makes it hard to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
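
Because field semantics must be inferred after download, a first inspection pass can simply list each field and the value types seen in it. The helper below is a minimal sketch under stated assumptions: it expects records as a list of dictionaries, and the field names in the sample ("prompt", "vote", "score") are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of Python type names observed."""
    types = defaultdict(set)
    for row in records:
        for key, value in row.items():
            types[key].add(type(value).__name__)
    return {key: sorted(names) for key, names in types.items()}


# Hypothetical sample rows; replace with records loaded from the dataset.
sample = [
    {"prompt": "a red bicycle", "vote": "left", "score": None},
    {"prompt": "a blue house", "vote": "tie", "score": 0.5},
]

summary = summarize_fields(sample)
```

Fields that report more than one type (for example a value that is sometimes missing) are good candidates for closer manual inspection.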

Provenance

Source: TIGER-Lab via Hugging Face.
Collection Method: Human preference votes collected via the GenAI Arena platform, filtered with an NSFW filter.
Freshness: Last updated 2024-09-08 08:33:52; confirm that the data is still current before use.

License is unknown; terms of use must be verified before application.

Quality Score

Overall: D39
Description: 42
Source: 41
Reputation: 37
Access: 26

Community

Downloads: 665
Likes: 8
Views: 0

Dataset Info

Author: TIGER-Lab
Created: May 30, 2024
Updated: Sep 8, 2024
Last synced: May 28, 2026
