Description
Naija-Stopwords is a collection of stopwords from the four most widely spoken languages in Nigeria: Hausa, Igbo, Nigerian-Pidgin, and Yorùbá. It is part of the Naija-Senti project and was authored by HausaNLP. The dataset was last updated on June 18, 2023.
Use Cases
Filtering stopwords out of text during preprocessing, using the multilingual word list.
Improving sentiment analysis models for Nigerian languages by removing common, non-informative words.
Enhancing information retrieval systems for Nigerian language corpora.
Building language-specific NLP pipelines for Hausa, Igbo, Nigerian-Pidgin, or Yorùbá.
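The filtering use case can be sketched in a few lines of Python. This is a minimal illustration, not the project's own preprocessing code: the function names `normalize` and `remove_stopwords` are hypothetical, the tokenizer is a plain whitespace split, and the stopwords in the usage note are placeholders rather than entries taken from the dataset. Unicode NFC normalization is applied on both sides so that composed and decomposed diacritics, which are common in Yorùbá and Igbo text, compare equal.

```python
import string
import unicodedata
from collections.abc import Iterable


def normalize(token: str) -> str:
    """Return a comparison key: NFC-normalized and lowercased."""
    return unicodedata.normalize("NFC", token).lower()


def remove_stopwords(text: str, stopwords: Iterable[str]) -> list[str]:
    """Split text on whitespace and drop tokens found in the stopword list.

    Assumption: whitespace tokenization is good enough for a first pass.
    Surrounding ASCII punctuation is stripped from each token.
    """
    stopword_keys = {normalize(word) for word in stopwords}
    kept = []
    for raw in unicodedata.normalize("NFC", text).split():
        token = raw.strip(string.punctuation)
        # Skip tokens that were pure punctuation, then filter stopwords.
        if token and normalize(token) not in stopword_keys:
            kept.append(token)
    return kept
```

For example, `remove_stopwords("Ina da littafi a gida", {"da", "a"})` keeps only the tokens that are not in the supplied set. In practice the set would be loaded from the downloaded file for the relevant language, once its layout has been inspected.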
Strengths
Covers four major Nigerian languages, providing a multilingual resource.
Part of a larger, named project (Naija-Senti), suggesting a research context.
Has a specific last update timestamp (2023-06-18 15:38:04).
Limitations
Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
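Because the row count and column semantics are undocumented, a quick inspection after download is a reasonable first step. The sketch below assumes the file is a CSV with a header row, which the listing does not confirm; the helper name `summarize_csv` and the column names in the usage note are hypothetical.

```python
import csv
import io


def summarize_csv(handle) -> dict:
    """Report the header columns and the number of non-empty data rows.

    Assumption: the first row is a header. Adjust if the file turns out
    to be a bare word list or uses another format.
    """
    reader = csv.reader(handle)
    header = next(reader, None)
    # Count only rows that contain at least one non-blank cell.
    row_count = sum(1 for row in reader if any(cell.strip() for cell in row))
    return {"columns": header or [], "rows": row_count}


# Illustrative in-memory sample, not real dataset content.
sample = io.StringIO("language,word\nhausa,da\nigbo,na\n")
summary = summarize_csv(sample)
print(summary["columns"], summary["rows"])
```

With a real download, the `io.StringIO` sample would be replaced by `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved.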
Provenance
Source
HausaNLP
Collection Method
Stopwords collected into a list; the listing gives no further detail on how they were gathered.
Freshness
Last updated 2023-06-18 15:38:04.
Geography
Nigeria
License is unknown; verify the terms of use before using the dataset.