Name: PolyglotTeachers SFT Synth: Multilingual Supervised Fine-Tuning Examples
Creator: ljvmiranda921
Published: 2026-04-05T15:21:17
Keywords: Language Models, Text, Multilingual, Synthetic Data, Supervised Fine Tuning, Synthetic

Description

This dataset contains synthetic supervised fine-tuning (SFT) examples generated by the teacher models evaluated in the Polyglot Teachers paper. It covers seven languages: Arabic, Czech, German, Indonesian, Japanese, Spanish, and Tagalog. It was created by ljvmiranda921 and last updated on April 5, 2026.

Use Cases

Training multilingual instruction-following models on the synthetic supervised fine-tuning examples.
Benchmarking teacher model performance for synthetic data generation across the seven covered languages.
Studying the characteristics of effective teacher models for data generation as described in the associated paper.

Strengths

Covers seven distinct languages: Arabic, Czech, German, Indonesian, Japanese, Spanish, and Tagalog.
Examples were generated by teacher models systematically characterized for quality in the associated research paper.
Last updated on April 5, 2026, the same day it was published, so the contents are current as of that date.

Limitations

Row count, file formats, and column-level documentation are unknown, limiting suitability assessment.
The dataset is synthetic, which may introduce artifacts not present in human-generated data.
License information is unknown, which may restrict usage.
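
Because the row count and column layout are undocumented, a quick profile of the records after download can help with the suitability check. The following is a minimal sketch, not the dataset's documented schema: it assumes the rows have already been loaded as dictionaries, and the field names "language", "prompt", and "response" are hypothetical placeholders chosen for illustration.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def profile_records(
    records: Iterable[Mapping[str, Any]],
    language_field: str = "language",  # assumed field name, not confirmed by the card
) -> dict:
    """Summarize row count, observed columns, and per-language row counts."""
    row_count = 0
    columns = set()
    per_language = Counter()
    for record in records:
        row_count += 1
        columns.update(record.keys())
        # Rows lacking the assumed language field are counted as "unknown".
        per_language[record.get(language_field, "unknown")] += 1
    return {
        "row_count": row_count,
        "columns": sorted(columns),
        "per_language": dict(per_language),
    }


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample = [
    {"language": "Tagalog", "prompt": "<prompt text>", "response": "<response text>"},
    {"language": "Czech", "prompt": "<prompt text>", "response": "<response text>"},
    {"language": "Tagalog", "prompt": "<prompt text>", "response": "<response text>"},
]
summary = profile_records(sample)
print(summary)
```

If the real records use a different name for the language column, pass it as language_field; a large "unknown" count is a sign that the assumed field name does not match the actual schema.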

Provenance

Source: ljvmiranda921 on Hugging Face.
Collection Method: Synthetic data generation by teacher language models evaluated in the 'Polyglot Teachers' research paper.
Time Range: Not specified.
Freshness: Last updated 2026-04-05 16:06:08; freshness should be verified.
Geography: Not specified.

License is unknown; users must verify permissions before use. The full description is hosted externally.
