Description
Synthetic supervised fine-tuning examples generated by the teacher models evaluated in the Polyglot Teachers paper. The dataset contains examples across seven languages: Arabic, Czech, German, Indonesian, Japanese, Spanish, and Tagalog. It was created by ljvmiranda921 and last updated on April 5, 2026.
Use Cases
Training multilingual instruction-following models on the synthetic supervised fine-tuning examples.
Benchmarking teacher model performance for synthetic data generation across the seven covered languages.
Studying the characteristics of effective teacher models for data generation as described in the associated paper.
Strengths
Covers seven distinct languages: Arabic, Czech, German, Indonesian, Japanese, Spanish, and Tagalog.
Examples were generated by teacher models systematically characterized for quality in the associated research paper.
Last updated on April 5, 2026, indicating recent maintenance.
Limitations
Row count, file formats, and column-level documentation are unknown, limiting suitability assessment.
The dataset is synthetic, which may introduce artifacts not present in human-generated data.
License information is unknown, which may restrict usage.
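Because the row count and column layout are undocumented, a quick profile of whatever records are obtained can help with the suitability assessment. The following is a minimal sketch, not part of the dataset or its tooling: `summarize_records` is a hypothetical helper, and the column names in `sample` (`language`, `prompt`, `response`) are invented for illustration, since the real schema is unknown.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def summarize_records(records: Iterable[Mapping[str, Any]]) -> dict:
    """Count rows and, per column, how often it appears and with which value types."""
    row_count = 0
    presence: Counter = Counter()
    types: defaultdict = defaultdict(Counter)
    for record in records:
        row_count += 1
        for column, value in record.items():
            presence[column] += 1
            types[column][type(value).__name__] += 1
    return {
        "rows": row_count,
        "columns": {
            column: {"present_in": presence[column], "types": dict(types[column])}
            for column in sorted(presence)
        },
    }


# Hypothetical rows; the real column names are not documented.
sample = [
    {"language": "Tagalog", "prompt": "Kumusta?", "response": "Mabuti naman."},
    {"language": "Czech", "prompt": "Ahoj", "response": None},
]
summary = summarize_records(sample)
print(summary["rows"])
print(summary["columns"]["response"])
```

The helper accepts any iterable of dict-like rows, so it should work with records parsed from JSON Lines, CSV, or another format once the actual file format is known.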
Provenance
Source
ljvmiranda921 on Hugging Face.
Collection Method
Synthetic data generation by teacher language models evaluated in the 'Polyglot Teachers' research paper.
Time Range
Not specified.
Freshness
Last updated 2026-04-05 16:06:08; freshness should be verified.
Geography
Not specified.
License is unknown; users must verify permissions before use. The full description is hosted externally.