Norwegian Named Entities (NorNE) is a manually annotated corpus that extends the Norwegian Dependency Treebank with named entity annotations. It comprises around 600,000 tokens across the two written standards of Norwegian, Bokmål and Nynorsk. The corpus was created by NbAiLab and annotates entity types including persons, organizations, locations, geo-political entities, products, and events.
Use Cases
- Train named entity recognition models on the annotated entity types.
- Benchmark Norwegian NLP tools on a corpus that extends the Norwegian Dependency Treebank.
- Compare linguistic patterns across Bokmål and Nynorsk, since the corpus covers both standards.
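For the NER training use case, a common first step is turning per-token tags into entity spans. The following is a minimal sketch, assuming the annotations are exposed as BIO-style tags (for example `B-PER`, `I-PER`, `O`). The tag names, the function name `bio_to_spans`, and the example sentence are illustrative assumptions, not taken from the corpus documentation; check the downloaded files for the actual scheme.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes tags look like "B-TYPE", "I-TYPE", or "O" (an assumption about
    the corpus format). An "I-" tag that does not continue an open span of
    the same type is ignored.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close any open span first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag: close any open span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Close a span that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical example sentence and tags, for illustration only.
example_tokens = ["Erna", "Solberg", "besøkte", "Bergen", "."]
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
example_spans = bio_to_spans(example_tokens, example_tags)
```

With the hypothetical input above, `example_spans` holds one person span and one location span. The same function can be applied to each sentence once the real tag inventory has been confirmed.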
Strengths
- Annotations were produced manually by experts.
- Contains around 600,000 tokens, providing a substantial text base.
- Covers both official standards of written Norwegian (Bokmål and Nynorsk).
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not documented, which makes it harder to assess suitability before download.
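Because neither the columns nor the row count are documented, both have to be established after download. The sketch below shows one way to do that, assuming the data has already been loaded into an iterable of dictionaries (one per row). The helper name `summarize_fields` and the field names in the sample records are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count rows and collect, per field, the observed value types,
    how many rows contain the field, and the first non-None example."""
    fields = defaultdict(lambda: {"types": set(), "count": 0, "example": None})
    rows = 0
    for record in records:
        rows += 1
        for name, value in record.items():
            info = fields[name]
            info["types"].add(type(value).__name__)
            info["count"] += 1
            if info["example"] is None:
                info["example"] = value
    return {"rows": rows, "fields": dict(fields)}


# Hypothetical sample records; the real field names must be read from the data.
sample_records = [
    {"tokens": ["Oslo"], "ner_tags": ["B-LOC"]},
    {"tokens": ["Hei"], "ner_tags": ["O"], "lang": "bokmaal"},
]
summary = summarize_fields(sample_records)
```

A field whose `count` is lower than `rows` is missing from some records, and a field with more than one entry in `types` mixes value types; both are worth checking before training or benchmarking.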
Provenance
- Source: NbAiLab
- Collection Method: Manually annotated extension of the Norwegian Dependency Treebank.
- Freshness: Last updated 2026-04-29 13:20:26; freshness should be verified.
- Geography: Norway