Description
The Shipaker Dataset is a collection of health and medicine articles in the Karakalpak language, scraped from shipaker.uz. The site is run by a public health organization in Karakalpakstan, Uzbekistan, and publishes articles on topics such as disease prevention, nutrition, psychology, and general wellness. The dataset was uploaded by the author turdibek and last updated on 2026-05-01.
Use Cases
Train or fine-tune machine translation models on Karakalpak-language text.
Analyze public health communication strategies across article topics such as disease prevention and nutrition.
Develop text classification models for health articles using the provided titles and content.
Study linguistic patterns and terminology in a low-resource Turkic language using the text corpus.
Strengths
Articles are sourced from a public health organization, suggesting domain authority.
Content is provided in both original HTML and extracted plain text formats.
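Because the content ships as both HTML and extracted plain text, it can be useful to re-derive text from the HTML and compare it with the provided plain-text field. The following is a minimal sketch using only the Python standard library; the names `TextExtractor` and `html_to_text` are illustrative and are not part of the dataset, and the set of block-level tags is an assumption that may need adjusting for the site's markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should act as word boundaries.
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            # Insert a separator so adjacent blocks do not run together.
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return whitespace-normalized visible text from an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so the result is comparable to a plain-text field.
    return " ".join("".join(parser.parts).split())
```

Comparing `html_to_text(row_html)` against the dataset's own plain-text value for a sample of rows gives a rough check on how the extraction was done, for instance whether navigation or footer text was retained.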
Limitations
Row count is unknown, which makes it hard to assess suitability before download.
Column-level documentation is absent; field semantics must be inferred after download.
License information is unknown, so reuse rights are unclear and should be confirmed before use.
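Since the row count and field semantics are undocumented, a first step after download might be a quick profile of the records. The sketch below assumes the data has already been loaded as a list of dictionaries; the function name `profile_fields` and the field names `title` and `content` in the usage example are hypothetical, since the actual column names are not documented.

```python
from collections import defaultdict


def profile_fields(records):
    """Summarize each field: value types seen, non-empty count, mean string length."""
    stats = defaultdict(
        lambda: {"types": set(), "non_empty": 0, "total_len": 0, "n_str": 0}
    )
    for rec in records:
        for key, value in rec.items():
            s = stats[key]
            s["types"].add(type(value).__name__)
            # Count values that are neither missing nor empty strings.
            if value is not None and value != "":
                s["non_empty"] += 1
            if isinstance(value, str):
                s["total_len"] += len(value)
                s["n_str"] += 1

    summary = {}
    for key, s in stats.items():
        mean_len = s["total_len"] / s["n_str"] if s["n_str"] else None
        summary[key] = {
            "types": sorted(s["types"]),
            "non_empty": s["non_empty"],
            "mean_str_len": mean_len,
        }
    return summary


# Hypothetical records; real field names must be checked after download.
sample = [
    {"title": "A", "content": "abc"},
    {"title": "", "content": "abcde"},
]
report = profile_fields(sample)
row_count = len(sample)
```

Fields with a short mean length are likely titles or identifiers, while long ones are likely article bodies or raw HTML; `len(records)` supplies the missing row count.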
Provenance
Source
Articles scraped from shipaker.uz, a site run by a public health organization in Karakalpakstan, Uzbekistan.
Collection Method
Web scraping.
Freshness
Last updated 2026-05-01 11:03:03; freshness should be verified.
Geography
Karakalpakstan, Uzbekistan.
License
Unknown; usage rights should be confirmed before reuse.