Available on 1 platform
SIB-200 is a multilingual topic classification dataset covering 205 languages and dialects. It is built on the human-translated Flores-200 corpus, and its topic labels, which include categories such as science/technology, travel, and politics, were originally annotated on the English text. The dataset was created by the mteb organization and last updated in February 2026.
The license is unknown; users should verify the terms before use.