Propella-1-4b, a small multilingual language model, generated these annotations for text documents across 18 properties. The annotations are organized into six categories, including core content, quality, and safety. The dataset was created by openeurollm and last updated on March 20, 2026.
Use Cases
- Filtering training datasets by the document quality and value scores that the dataset description mentions
- Selecting documents for specific audiences based on annotated purpose and audience properties
- Curating multilingual text corpora using geographic relevance and language annotations
- Screening documents for safety and compliance concerns using the annotated safety category
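The filtering and screening use cases above can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual schema: because column-level documentation is absent, the field names `content_quality`, `safety_flag`, and `language`, the score scale, and the sample records are all hypothetical placeholders that would need to be replaced with the real column names after inspecting the downloaded data.

```python
from typing import Iterable, Optional

# Hypothetical annotation records. The field names and values are invented
# for illustration; the real dataset's columns are undocumented.
SAMPLE_RECORDS = [
    {"id": "doc-1", "content_quality": 4, "safety_flag": False, "language": "de"},
    {"id": "doc-2", "content_quality": 2, "safety_flag": False, "language": "fr"},
    {"id": "doc-3", "content_quality": 5, "safety_flag": True, "language": "de"},
]


def filter_documents(
    records: Iterable[dict],
    min_quality: int = 3,
    languages: Optional[set] = None,
) -> list:
    """Keep records that meet a quality threshold, carry no safety flag,
    and (optionally) belong to one of the requested languages."""
    kept = []
    for record in records:
        if record["content_quality"] < min_quality:
            continue  # below the assumed quality threshold
        if record["safety_flag"]:
            continue  # flagged by the assumed safety annotation
        if languages is not None and record["language"] not in languages:
            continue  # outside the requested language set
        kept.append(record)
    return kept


if __name__ == "__main__":
    for record in filter_documents(SAMPLE_RECORDS):
        print(record["id"])
```

Keeping the threshold and language set as parameters makes it straightforward to adjust the filter once the actual score ranges and label values are known.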
Strengths
- Annotations cover 18 distinct properties across six categories, providing multi-faceted document labels
- Annotations were produced by a multilingual model, suggesting applicability to text in multiple languages
- Dataset was last updated on 2026-03-20, indicating recent maintenance
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is unknown, which makes it harder to assess whether the dataset suits a given task
- The dataset description defers to a full description on an external page, so complete details are not available in the listing itself
Provenance
- Source: huggingface
- Collection Method: Annotations generated by the propella-1-4b language model.
- Freshness: Last updated 2026-03-20 09:22:36