Descriptiveness and Sentiment Preference Data for RLHF

Name: Descriptiveness and Sentiment Preference Data for RLHF
Creator: trl-internal-testing
Published: 2024-04-09T13:55:01
Keywords: Size category: 10K<n<100K, Library: polars, Modality: text, Library: mlcroissant, Text Preferences, Sentiment Analysis, Library: datasets, Library: pandas, Text, Parquet, Region: US, Reinforcement Learning, Natural Language Processing, Human Feedback

by trl-internal-testing, updated 2y ago

Available on 1 platform

Description

TRL's Sentiment and Descriptiveness Preference Dataset originates from an early RLHF paper by OpenAI (arXiv:1909.08593). The data has been preprocessed into the standard prompt, chosen, rejected format used for reinforcement learning from human feedback (RLHF). It was last updated on Hugging Face on 2024-04-09.
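
As a minimal sketch of the prompt, chosen, rejected layout described above, one record could be modeled as shown here. The field types (plain strings), the names PreferenceRecord and is_valid_record, and the sample text are all assumptions made for illustration; this page does not document the columns, and preference datasets of this kind may instead store chosen and rejected as lists of chat messages.

```python
from typing import TypedDict


class PreferenceRecord(TypedDict):
    """Hypothetical shape of one row; field types are assumed, not documented."""
    prompt: str    # text shown to the model
    chosen: str    # completion the annotator preferred
    rejected: str  # completion the annotator did not prefer


REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def is_valid_record(row: dict) -> bool:
    """Return True if all three fields are non-empty strings
    and the two completions differ."""
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return row["chosen"] != row["rejected"]


# Invented example row, not taken from the dataset.
example: PreferenceRecord = {
    "prompt": "The house stood at the end of the lane.",
    "chosen": " Its shutters were painted a faded blue.",
    "rejected": " It was a house.",
}
```

A check like this is only a first filter. If the real columns turn out to hold message lists rather than strings, the isinstance test would need to change accordingly.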

Use Cases

Fine-tuning reward models based on human preferences for descriptiveness and sentiment.
Training language models to align with human feedback using the provided prompt, chosen, rejected format.
Benchmarking RLHF algorithms on a dataset from an early, foundational paper in the field.
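
For the reward-model use case, a common objective on chosen and rejected pairs is a pairwise logistic (Bradley-Terry style) loss. The sketch below shows that loss on scalar reward scores only. It is an illustration of the general technique, not necessarily the objective used in the original paper, and the function names are hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen completion beats the rejected one
    under a Bradley-Terry model: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(scored_pairs: list[tuple[float, float]]) -> float:
    """Average the loss over (reward_chosen, reward_rejected) pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in scored_pairs) / len(scored_pairs)
```

The loss shrinks toward zero as the reward model scores the chosen completion further above the rejected one, and equals log 2 when the two scores are tied.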

Strengths

Dataset is derived from a foundational, cited RLHF paper (arXiv:1909.08593).
Data is preprocessed into a standardized format (prompt, chosen, rejected) for RLHF workflows.
Last update was recorded as 2024-04-09 16:29:51.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not stated; the size-category tag only places it between 10K and 100K rows, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
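
Because the columns are undocumented, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries; profile_rows and the sample rows are hypothetical and stand in for real data.

```python
from collections import Counter


def profile_rows(rows: list[dict]) -> dict:
    """Summarize the row count and, per column, the value types seen
    and the number of rows in which the column is absent."""
    columns: dict[str, Counter] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, Counter())[type(value).__name__] += 1
    return {
        "row_count": len(rows),
        "columns": {
            name: {
                "types": dict(types),
                "missing": len(rows) - sum(types.values()),
            }
            for name, types in columns.items()
        },
    }


# Invented rows standing in for a downloaded sample.
sample = [
    {"prompt": "p1", "chosen": "a", "rejected": "b"},
    {"prompt": "p2", "chosen": "c"},
]
report = profile_rows(sample)
```

With a local Parquet file, the rows could be obtained with something like pandas.read_parquet(path).to_dict("records"), where path is whatever location the file was saved to.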

Provenance

Source: OpenAI (original paper: arXiv:1909.08593)
Collection Method: Preprocessed from an early RLHF dataset into a standard format.
Freshness: 2024-04-09 16:29:51

License information is unknown and should be verified before use.

Quality Score

Overall: D36
Description: 39
Source: 36
Reputation: 37
Access: 22

Community

Downloads: 2.1K
Likes: 4
Views: 0

Dataset Info

Author: trl-internal-testing
Created: Apr 9, 2024
Updated: Apr 9, 2024
