Available on 1 platform
InfoBayAI published this Arabic non-STEM textbook sample in March 2026, providing between 1,000 and 10,000 records for LLM training. It is derived from a larger multilingual corpus of 1.9 billion words across 27,000 textbooks and is structured for instruction tuning and evaluation.
This is a sample of a larger corpus; access to the full 1.9-billion-word dataset may require contacting InfoBayAI. The data is provided in Parquet format.