Sign in to view source links and access this dataset
Description
A collection of conversational data structured for supervised fine-tuning (SFT) of language models. The dataset contains a list of messages from both users and assistants, with an associated language field. It was created by utter-project and last updated on February 6, -2026.
Use Cases
Training instruction-following models based on the provided user-assistant conversation structure.
Fine-tuning language models for conversational tasks based on the dialogue format.
Analyzing or benchmarking multilingual model performance based on the language metadata.
Strengths
Explicitly structured for supervised fine-tuning (SFT), a core machine learning task.
Contains conversational data with both user and assistant messages, providing a complete interaction context.
Includes a language metadata field, which suggests potential for multilingual analysis.
Limitations
Description metadata is limited; actual data quality, scale, and language accuracy require manual inspection after download.
Row count, column details, and license information are unknown, which may limit suitability assessment.
The language field may not be fully accurate, especially for conversations involving multiple languages.
Provenance
Source
utter-project on Hugging Face.
Collection Method
Likely gathered or curated for training the EuroLLM-22B model, as indicated by the citation.
Time Range
null
Freshness
Last updated 2026-02 06 02:21:40; freshness should be verified.
Geography
null
License is unknown; users must verify permissions before use.