This dataset is a March 2022 database dump of Namuwiki, a Korean-language wiki, containing 867,024 entries. User 'heegyu' uploaded it to Hugging Face in October 2022. It captures the wiki's content as of the dump date and does not reflect later edits.
Use Cases
- Train Korean-language text generation models based on wiki article structure and content.
- Build Korean knowledge graph extraction tools based on the wiki's interconnected articles.
- Analyze Korean internet culture and slang based on the informal wiki entries.
- Create Korean question-answering systems based on the factual and explanatory text.
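For the knowledge graph use case, a minimal sketch of link extraction might look like the following. It rests on several assumptions that are not documented for this dataset: that each record is a dict with a title and a raw-markup body (the field names `title` and `text` are hypothetical), and that internal links appear in the double-bracket form `[[target]]`, `[[target|label]]`, or `[[target#anchor]]`. Verify both against the downloaded data before relying on it.

```python
import re

# Assumption: internal links use [[target]], [[target|label]] or [[target#anchor]].
# Group 1 captures only the target title; anchor and label are discarded.
LINK_PATTERN = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(text):
    """Return the link targets found in one article body, in order of appearance."""
    return [match.group(1).strip() for match in LINK_PATTERN.finditer(text)]


def build_edges(records, title_field="title", text_field="text"):
    """Build a set of (source_title, target_title) edges from an iterable of dict records.

    `title_field` and `text_field` are hypothetical names; adjust them to the
    actual columns once the schema has been inspected.
    """
    edges = set()
    for record in records:
        source = record.get(title_field)
        body = record.get(text_field) or ""
        if not source:
            continue
        for target in extract_links(body):
            if target != source:  # skip self-links
                edges.add((source, target))
    return edges
```

This sketch does not filter namespace-style targets (for example file or category links) or external URLs, so a real pipeline would likely need an extra filtering step.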
Strengths
- Contains 867,024 wiki entries, providing substantial textual data.
- Offers a fixed snapshot from March 2022, which gives a reproducible reference point and allows comparison against earlier or later dumps.
- Likely contains diverse topics and informal language typical of community wikis.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- The upload was last updated 2022-10-01 02:40:40 and the content dates from March 2022, so articles created or changed since then are not covered.
- Column-level documentation is absent; field semantics must be inferred after download.
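Because the fields are undocumented, a quick way to infer the schema is to sample a few records and tally which fields appear and what types they hold. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository id is `heegyu/namuwiki` with a `train` split; both the id and the split name are assumptions inferred from the uploader name and should be checked on the Hub.

```python
from collections import Counter
from itertools import islice


def summarize_fields(records, sample_size=100):
    """Tally field names and value types over the first `sample_size` records.

    `records` is any iterable of dicts, so this works on a streamed dataset
    as well as on a plain list.
    """
    summary = {}
    for record in islice(records, sample_size):
        for key, value in record.items():
            summary.setdefault(key, Counter())[type(value).__name__] += 1
    return summary


def stream_namuwiki(repo_id="heegyu/namuwiki", split="train"):
    """Open the dataset in streaming mode so nothing is fully downloaded.

    Assumption: `repo_id` and `split` match the actual Hub repository.
    """
    from datasets import load_dataset  # imported lazily; requires `pip install datasets`

    return load_dataset(repo_id, split=split, streaming=True)
```

Calling `summarize_fields(stream_namuwiki())` and printing the result gives a first impression of the columns; printing one or two full records afterwards helps confirm what each field means.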
Provenance
- Source: Namuwiki (namu.wiki)
- Collection Method: Database dump
- Time Range: Snapshot from March 2022