A collection of 6,803 multi-turn Socratic dialogues covering elementary science topics for grades 1–6. The dataset was used to train SocratTeachLLM and was released with the KELE paper (Findings of EMNLP 2025). An English translation is available as ulises-c/SocratDataset-EN.
Use Cases
- Training conversational AI tutors based on Socratic dialogue structure
- Benchmarking question-answering models on elementary science topics
- Analyzing pedagogical patterns in multi-turn tutoring dialogues
- Developing cross-lingual NLP models using the available English translation
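For the tutor-training use case, a common preparation step is to split each multi-turn dialogue into (context, target) pairs. Each tutor turn becomes a target, and everything said before it becomes the context. The sketch below assumes a hypothetical per-dialogue schema: a list of turns with "role" and "content" keys, where the tutor role is "teacher". The dataset's real column names are undocumented and must be checked after download. The function name `tutor_training_pairs` and the sample dialogue are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: a dialogue is a list of turns, each a dict with
# "role" ("teacher" or "student") and "content". The real field names in the
# dataset are undocumented and must be verified after download.
Turn = Dict[str, str]


def tutor_training_pairs(
    dialogue: List[Turn], tutor_role: str = "teacher"
) -> List[Tuple[List[Turn], str]]:
    """Split one dialogue into (context, target) pairs for supervised training.

    Every tutor turn becomes a target, and its context is all preceding turns.
    A tutor turn at position 0 has no context and is skipped.
    """
    pairs = []
    for i, turn in enumerate(dialogue):
        if turn["role"] == tutor_role and i > 0:
            pairs.append((dialogue[:i], turn["content"]))
    return pairs


# Illustrative dialogue (invented, not taken from the dataset).
example = [
    {"role": "student", "content": "Why do plants need sunlight?"},
    {"role": "teacher", "content": "What do you think plants make with light?"},
    {"role": "student", "content": "Food?"},
    {"role": "teacher", "content": "Good. What is that process called?"},
]
pairs = tutor_training_pairs(example)
print(len(pairs))   # 2
print(pairs[1][1])  # Good. What is that process called?
```

Keeping the full preceding history as context lets a model learn to ask the next guiding question in light of the whole conversation, not just the last student reply.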
Strengths
- Contains 6,803 dialogues, providing a substantial corpus for training
- Focuses on a specific educational domain: elementary science for grades 1–6
- Follows a structured pedagogical framework (SocRule)
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- The row count is documented, but other metadata, such as file format and license, are not
- Data may reflect geographic and educational biases inherent to its source in Chinese elementary school science
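Column-level documentation is missing, so a first step after download is to survey which fields each record carries and what value types they hold. The sketch below works on any iterable of dict records. Iterating over a split loaded with the Hugging Face `datasets` library yields records in that form. The function name `summarize_schema` and the sample field names ("id", "dialogue", "grade") are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def summarize_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """For each field, count the records that contain or lack it, and list the value types seen."""
    presence: Counter = Counter()
    types = defaultdict(set)
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            presence[key] += 1
            types[key].add(type(value).__name__)
    return {
        key: {
            "present_in": presence[key],
            "missing_in": total - presence[key],
            "types": sorted(types[key]),
        }
        for key in sorted(presence)
    }


# Hypothetical records; real field names must be read from the downloaded data.
sample = [
    {"id": 1, "dialogue": ["turn 1", "turn 2"], "grade": 3},
    {"id": 2, "dialogue": ["turn 1"]},
]
summary = summarize_schema(sample)
for field, info in summary.items():
    print(field, info)
```

A field whose `missing_in` is nonzero, or that shows more than one type, is a sign that its meaning needs closer inspection before the data is used for training.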
Provenance
- Source: Hugging Face
- Collection Method: Original Chinese dataset used to train SocratTeachLLM, as published in the KELE paper.
- Freshness: Last updated 2026-05-04 07:44:39
- Geography: China (based on the language and educational context)