Nepali STEM Textbook Corpus with 94 Books and 4.45 Million Words

Name: Nepali STEM Textbook Corpus with 94 Books and 4.45 Million Words
Creator: InfoBayAI
Published: 2026-06-08T07:18:09
Keywords: Textbooks, Nepali Language, Multilingual Corpus, Text, Multilingual, Stem Education, Large Scale, Natural Language Processing, Multimodal

Description

This dataset contains 94 Nepali STEM textbooks totalling 4.45 million words and is intended to support NLP and AI model development for scientific understanding and concept learning. It was created by InfoBayAI as part of a larger multilingual educational corpus and was last updated on June 8, 2026.

Use Cases

Train language models for Nepali scientific text generation based on the textbook corpus.
Develop question-answering systems for STEM education using problem-solving material from the textbooks.
Build models for cross-lingual concept alignment by combining this subset with other languages in the larger multilingual corpus.
Fine-tune models for educational content summarization based on the textbook data.

Strengths

Contains 4.45 million words, providing a substantial text corpus for model training.
Comprises 94 distinct books, suggesting breadth across STEM subjects.
Part of a larger corpus of over 2.6 billion words, indicating potential for scaling.
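
To put the headline figures in proportion, the short calculation below derives the average book length and this subset's share of the parent corpus. It uses only the numbers stated in this listing. Because 4.45 million is a rounded total and the parent corpus is given only as "over 2.6 billion words", the average is approximate and the share is an upper bound.

```python
# Figures taken from this listing; WORDS is a rounded total.
BOOKS = 94
WORDS = 4_450_000
PARENT_CORPUS_WORDS = 2_600_000_000  # "over 2.6 billion", so a lower bound


def corpus_ratios(books, words, parent_words):
    """Return (approximate words per book, upper bound on share of the parent corpus)."""
    avg_per_book = words / books
    # The parent corpus is at least parent_words, so this share can only be smaller.
    share_upper_bound = words / parent_words
    return avg_per_book, share_upper_bound


avg, share = corpus_ratios(BOOKS, WORDS, PARENT_CORPUS_WORDS)
print(f"~{avg:,.0f} words per book; at most {share:.2%} of the parent corpus")
```

This works out to roughly 47,000 words per book, with the Nepali STEM subset making up under 0.2% of the larger corpus.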

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The row count is not stated, which makes it harder to judge suitability before downloading.
Description metadata is limited; actual data quality requires manual inspection after download.
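
Because the fields, row count and file format are undocumented, a first step after download is to profile the files. The sketch below assumes, purely for illustration, that the data arrives as JSON Lines (one JSON object per line) with the text in a field named `text`. Neither the format nor the field name is confirmed by this listing, and the file name in the usage note is hypothetical. The word count splits on whitespace, which is only a rough approximation for Nepali (Devanagari) text.

```python
import json
from collections import Counter
from pathlib import Path


def profile_jsonl(path, text_field="text"):
    """Count rows, key frequencies and whitespace-delimited words in a JSON Lines file.

    Assumption: one JSON object per line; text_field="text" is a hypothetical
    field name, since the listing does not document the schema.
    """
    rows = 0
    key_counts = Counter()
    words = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            rows += 1
            key_counts.update(record.keys())  # reveals which fields actually exist
            value = record.get(text_field)
            if isinstance(value, str):
                # Nepali separates words with spaces, so this is a rough word count.
                words += len(value.split())
    return {"rows": rows, "keys": dict(key_counts), "words": words}
```

For example, `profile_jsonl("nepali_stem.jsonl")` (hypothetical file name) reports how many records there are, which keys appear and how often, and an approximate word total that can be compared with the stated 4.45 million.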

Provenance

Source: InfoBayAI
Collection Method: Likely compiled from educational textbook sources.
Freshness: Last updated 2026-06-08 07:23:16; freshness should be verified.
Geography: Nepal (inferred from language and subject focus)

Quality Score

Overall: D37
Description: 42
Source: 36
Reputation: 39
Access: 22

Community

Downloads: 11
Likes: 1
Views: 0

Dataset Info

Author: InfoBayAI
Created: Jun 8, 2026
Updated: Jun 8, 2026
Last synced: Jun 15, 2026
