The BharatLLM K-12 Indian Curriculum Dataset contains 4,904,936 question-answer pairs covering the CBSE/NCERT K-12 curriculum. The dataset spans 12 languages: English, Hindi, Bengali, Telugu, Tamil, Kannada, Malayalam, Marathi, Gujarati, Odia, Punjabi, and Urdu. It was created by Hugging Face user 'krittus' and last updated on Hugging Face on April 12, 2026.
Use Cases
- Training question-answering models on CBSE/NCERT K-12 curriculum content.
- Developing multilingual educational chatbots from question-answer pairs in the 12 covered languages.
- Fine-tuning large language models for domain-specific knowledge in Indian education.
- Analyzing curriculum coverage and language distribution across the subjects in the dataset.
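The language-distribution analysis mentioned in the last use case can be sketched as follows. This is a minimal illustration, not the dataset's documented interface: the field name `language` and the sample rows are hypothetical, since the card does not document the columns.

```python
from collections import Counter


def language_distribution(records, language_field="language"):
    """Return each language's share of the records as a fraction of the total.

    `language_field` is an assumed column name; adjust it once the real
    schema is known.
    """
    counts = Counter(row.get(language_field, "unknown") for row in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


# Hypothetical sample rows standing in for downloaded data.
sample_rows = [
    {"language": "Hindi", "question": "Q1", "answer": "A1"},
    {"language": "Hindi", "question": "Q2", "answer": "A2"},
    {"language": "Tamil", "question": "Q3", "answer": "A3"},
    {"language": "English", "question": "Q4", "answer": "A4"},
]

shares = language_distribution(sample_rows)
```

Rows that lack the assumed field are counted under `"unknown"`, which makes a wrong field name easy to spot.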
Strengths
- Large scale: roughly 4.9 million question-answer pairs.
- Multilingual coverage across 12 major Indian languages.
- Specific curriculum alignment with CBSE/NCERT K-12 standards.
Limitations
- The dataset description is sparse; data quality can only be assessed by inspecting the data after download.
- Column-level documentation is absent; field semantics must be inferred after download.
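Because the columns are undocumented, a first pass over a downloaded sample might simply tabulate which fields appear, what types they hold, and how often they are empty. The sketch below assumes the rows have already been loaded as a list of dictionaries; the function name `summarize_fields` and the sample rows are hypothetical.

```python
def summarize_fields(records):
    """Map each field name to the value types seen and the number of empty values."""
    summary = {}
    for row in records:
        for name, value in row.items():
            info = summary.setdefault(name, {"types": set(), "empty": 0})
            info["types"].add(type(value).__name__)
            # Treat None and the empty string as "empty" for this rough check.
            if value is None or value == "":
                info["empty"] += 1
    return summary


# Hypothetical rows; the real field names must be read from the data itself.
sample_rows = [
    {"question": "What is 2 + 2?", "answer": "4", "grade": 3},
    {"question": "Define photosynthesis.", "answer": "", "grade": None},
]

field_summary = summarize_fields(sample_rows)
```

Fields with mixed types or many empty values are the ones most worth inspecting by hand before training on the data.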
Provenance
- Source: Hugging Face user 'krittus'.
- Collection Method: Likely compiled from CBSE/NCERT K-12 curriculum materials.
- Time Range: Not specified.
- Freshness: Last updated 2026-04-12 18:54:20; check the Hugging Face page for newer revisions before use.
- Geography: India.
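One simple way to check freshness is to compute how many days have passed since the last-updated timestamp listed above. The sketch assumes the timestamp and the comparison date share a time zone, which the card does not state; `age_in_days` is a hypothetical helper name.

```python
from datetime import datetime


def age_in_days(last_updated, as_of):
    """Return the whole number of days between `last_updated` and `as_of`.

    `last_updated` uses the "YYYY-MM-DD HH:MM:SS" format shown on the card;
    the time zone is an assumption, since the card does not specify one.
    """
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    return (as_of - updated).days


# Fixed comparison date for illustration; pass datetime.now() in practice.
example_age = age_in_days("2026-04-12 18:54:20", datetime(2026, 5, 12, 18, 54, 20))
```

A large age does not by itself mean the data is stale, but it is a prompt to look for a newer revision on Hugging Face.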