Available on one platform.
A collection of 4.79 million Slovenian web documents, enriched with metadata on educational value, domain classification, and web register. The dataset, created by zID4si, includes a filtered subset optimized for language model pre-training. It was last updated on February 13, 2026.
The license is unknown and must be verified before use.