Description
A dataset of approximately 67,000 sentences from German federal court decisions, annotated for Named Entity Recognition. It contains 54,000 annotated entities across 19 fine-grained semantic classes, using the BIO tagging scheme. The dataset was created by elenanereiss and last updated on the Hugging Face platform in July 2025.
Use Cases
Training NER models to recognize the 19 fine-grained entity classes.
Benchmarking legal NLP systems on German court documents.
Analyzing the distribution of named entities in German legal language.
Developing domain-specific language models for the German legal sector.
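As a minimal sketch of the entity-distribution use case, the snippet below counts entities per class from BIO tag sequences. It assumes each sentence is available as a list of tag strings such as "B-PER" or "O". The function name count_entities and the labels PER and ORG are illustrative placeholders, not confirmed field values or class names from this dataset.

```python
from collections import Counter


def count_entities(tagged_sentences):
    """Count entities per class.

    In the BIO scheme every entity starts with exactly one B- tag,
    so counting B- tags gives the number of entities per class.
    """
    counts = Counter()
    for tags in tagged_sentences:
        for tag in tags:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1  # strip the "B-" prefix to get the class
    return counts


# Hypothetical tag sequences; the real labels must be read from the data.
sample = [
    ["B-PER", "I-PER", "O", "B-ORG"],
    ["O", "B-PER", "O"],
]

distribution = count_entities(sample)
for label, n in distribution.most_common():
    print(label, n)
```

Counting B- tags rather than all non-O tags avoids inflating the counts for multi-token entities.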
Strengths
Approximately 67,000 sentences provide a substantial text corpus.
Human-annotated with 54,000 entities for supervised learning.
Includes 19 fine-grained semantic classes for detailed entity analysis.
Uses the standard BIO tagging scheme for model compatibility.
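To illustrate what the BIO scheme encodes, here is a short sketch that converts parallel token and tag lists into entity spans. B- marks the first token of an entity, I- marks a continuation, and O marks tokens outside any entity. The function name bio_to_spans, the example sentence, and the labels PER and ORG are assumptions for illustration; the dataset's actual column layout and class names should be checked after download.

```python
def bio_to_spans(tokens, tags):
    """Convert parallel token/tag lists into (label, start, end, text) spans.

    `end` is exclusive. A stray I- tag with no matching open entity is
    ignored, which is a lenient choice made for this sketch.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A matching I- tag simply extends the currently open entity.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the open entity, if there is one.
        if label is not None:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        # A B- tag opens a new entity.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    # Close an entity that runs to the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags), " ".join(tokens[start:])))
    return spans


# Hypothetical example sentence and tags.
tokens = ["Anna", "Schmidt", "klagte", "gegen", "die", "Beispiel", "GmbH", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG", "I-ORG", "O"]

for span in bio_to_spans(tokens, tags):
    print(span)
```

Some evaluation tools treat a stray I- tag as the start of a new entity instead of ignoring it, so the decoding rule should match whatever scorer is used for benchmarking.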
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not reported; only the approximate sentence count (about 67,000) is available, which may limit suitability assessment.
Data may reflect geographic and domain bias inherent to its source of German federal court decisions.
Provenance
Source
German federal court decisions.
Collection Method
Human-annotated.
Freshness
Last updated 2025-07-19 18:28:31.
Geography
Germany
License
Unknown; terms of use must be verified before the dataset is used.