MM-RLHF: A Multimodal LLM Alignment Dataset and Reward Model | DataSalon

Multimodal & LLM

Name: MM-RLHF: A Multimodal LLM Alignment Dataset and Reward Model
Creator: yifanzhang114
Published: 2025-02-04T11:27:23
Keywords: RLHF, Alignment, Multimodal LLM, Benchmark, Human Feedback, Multimodal, Reward Model

by yifanzhang114 · Updated 1y ago

Available on 1 platform

Description

MM-RLHF is a project for aligning Multimodal Large Language Models with human preferences. The release includes a high-quality alignment dataset and a strong critique-based reward model. The project was open-sourced by yifanzhang114 in February 2025.

Use Cases

Fine-tuning MLLMs for alignment based on human preferences.
Training or evaluating reward models for multimodal tasks.
Benchmarking MLLM safety and performance using the included evaluation suites.

Strengths

Includes a critique-based reward model and its training algorithm.
Released as part of a comprehensive project for MLLM alignment.
Associated with the benchmark suites MM-RLHF-RewardBench and MM-RLHF-SafetyBench.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
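
Because the row count and column documentation are missing, a first inspection pass after download might look like the sketch below. It is a minimal, hedged example: the helper name `summarize_fields` and the sample records (including the field names `prompt`, `chosen`, and `rejected`) are hypothetical and are not taken from the dataset, whose actual schema is undocumented on this page.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> dict:
    """Count rows and, per field, the observed value types and missing values."""
    row_count = 0
    type_counts = defaultdict(Counter)
    for record in records:
        row_count += 1
        for name, value in record.items():
            type_counts[name][type(value).__name__] += 1

    fields = {}
    for name, counts in type_counts.items():
        present = sum(counts.values())
        fields[name] = {
            "types": dict(counts),
            # Missing = key absent from the record, or value explicitly None.
            "missing": (row_count - present) + counts.get("NoneType", 0),
        }
    return {"rows": row_count, "fields": fields}


# Hypothetical records: the real column names are NOT documented on this page.
sample = [
    {"prompt": "Describe the image.", "chosen": "A cat.", "rejected": "A dog."},
    {"prompt": "Count the cars.", "chosen": "Three.", "rejected": None},
    {"prompt": "Read the sign.", "chosen": "Stop."},
]

summary = summarize_fields(sample)
print(summary["rows"])
print(summary["fields"]["rejected"])
```

The same function can be pointed at any iterable of dictionary-like rows once the files are downloaded, which gives a quick row count and a rough view of which fields are sparsely populated before any training run.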

Provenance

Source: yifanzhang114 on Hugging Face
Freshness: Last updated 2025-04-21 02:43:14; freshness should be verified.

License is unknown; terms of use must be verified before application.

Quality Score

D37

Community

186 downloads

14 likes

0 views

Dataset Info

Author: yifanzhang114
Created: Feb 4, 2025
Updated: Apr 21, 2025
Last synced: May 15, 2026
