Sft Hh Rlhf: Human Feedback Data for Language Model Alignment | DataSalon

Multimodal & LLM

Name: Sft Hh Rlhf: Human Feedback Data for Language Model Alignment
Creator: Dahoas
Published: 2022-12-16T22:14:45
Keywords: Alignment, Text, Language Model, Reinforcement Learning, Human Feedback

Description

Sft Hh Rlhf is a dataset published on Hugging Face by Dahoas, with its last update recorded on 2022-12-22. The title suggests it contains data related to reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). The dataset's specific content, scale, and structure require verification after download.

Use Cases

Fine-tuning a language model with supervised data (inferred from domain, verify after download)
Training a reward model for reinforcement learning from human feedback (inferred from domain, verify after download)
Benchmarking alignment methods for language models (inferred from domain, verify after download)

Strengths

Published on the Hugging Face platform, a major repository for machine learning resources.
Authored by Dahoas, a known contributor in the AI alignment community.

Limitations

Metadata is minimal; actual content requires verification after download.
Row count, file formats, and column definitions are unknown, limiting suitability assessment.
Last updated 2022-12-22 16:46:10; freshness should be verified for current research.
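Because row count, columns, and value types are not documented here, a quick structural check after download can answer most of these questions. The sketch below is one way to do it, not a documented workflow for this dataset. The helper names, the repository id "Dahoas/sft-hh-rlhf", the "train" split, and the "prompt" and "response" columns in the example are all assumptions for illustration; confirm them on the dataset's Hugging Face page.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_rows(rows: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Count rows and record, per column, the value types seen and the number of empty values."""
    row_count = 0
    types: dict[str, Counter] = {}
    empty: Counter = Counter()
    for row in rows:
        row_count += 1
        for column, value in row.items():
            types.setdefault(column, Counter())[type(value).__name__] += 1
            # Treat None and the empty string as "empty"; adding 0 still registers the column.
            empty[column] += int(value is None or value == "")
    return {
        "row_count": row_count,
        "columns": sorted(types),
        "types": {column: dict(counts) for column, counts in types.items()},
        "empty": dict(empty),
    }


def load_rows(repo_id: str, split: str = "train"):
    """Load one split from the Hugging Face Hub (needs network and `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily so summarize_rows works without it

    return load_dataset(repo_id, split=split)


# Example call (repository id and split are assumptions, not confirmed by this listing):
# print(summarize_rows(load_rows("Dahoas/sft-hh-rlhf")))
```

The summary works on any iterable of dictionaries, so it can be tried on a small hand-written sample before downloading anything.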

Provenance

Source: Hugging Face
Collection Method: Likely collected or generated for language model alignment research.
Time Range: not specified
Freshness: 2022-12-22 16:46:10
Geography: not specified

License is unknown; users must verify permissions before use.
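One place to start the license check is the repository's tags on the Hugging Face Hub, where a declared license usually appears as a tag of the form "license:<id>". The sketch below assumes that convention and uses hypothetical helper names; the repository id in the example is also an assumption. An empty result only means no license tag was declared, so the dataset card and any upstream source still need to be read.

```python
def licenses_from_tags(tags: list[str]) -> list[str]:
    """Return the license identifiers found in Hub-style tags such as 'license:mit'."""
    prefix = "license:"
    return [tag[len(prefix):] for tag in tags if tag.startswith(prefix)]


def fetch_dataset_tags(repo_id: str) -> list[str]:
    """Fetch a dataset repository's tags (needs network and `pip install huggingface_hub`)."""
    from huggingface_hub import HfApi  # imported lazily so the parser works without it

    info = HfApi().dataset_info(repo_id)
    return list(info.tags or [])


# Example call (repository id is an assumption, not confirmed by this listing):
# print(licenses_from_tags(fetch_dataset_tags("Dahoas/sft-hh-rlhf")))
```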

Quality Score

D23

Community

24 downloads

5 likes

0 views

Dataset Info

Author: Dahoas
Created: Dec 16, 2022
Updated: Dec 22, 2022
Last synced: May 27, 2026
