Description
Healthybench-German is a German-language dataset for benchmarking models on cautious, user-facing health and safety guidance. It contains 750 evaluation examples in a single test split, created by author 8Fai and last updated on April 27, 2026. Each example includes a user prompt, scoring rubrics, a category, a difficulty label, and reference answers.
Use Cases
Benchmarking model safety in everyday wellbeing scenarios, using the dataset's user prompts.
Evaluating triage and crisis-escalation advice against the provided rubric items and reference answers.
Testing medication-safety guidance in German, broken down by the dataset's categories and difficulty labels.
Assessing model performance on first-aid instructions, using the structured evaluation examples.
Strengths
750 evaluation examples provide a defined scale for benchmarking.
Includes structured components like rubrics, categories, difficulty labels, and reference answers for scoring.
Specifically targets German-language health-and-safety guidance, filling a niche for non-English evaluation.
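The category and difficulty labels make it possible to report results per group rather than as one overall number. The following is a minimal sketch of that aggregation, not the dataset's official scoring method: the field names `category`, `difficulty`, and `score`, and the sample values, are assumptions made for illustration and should be checked against the real columns after download.

```python
from collections import defaultdict


def mean_score_by_group(results, keys=("category", "difficulty")):
    """Average a per-example score within each (category, difficulty) group.

    `results` is a list of dicts. The key names are assumed, not documented.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of scores, count]
    for row in results:
        group = tuple(row[k] for k in keys)
        totals[group][0] += row["score"]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical graded outputs; real scores would come from applying the rubrics.
results = [
    {"category": "first_aid", "difficulty": "easy", "score": 1.0},
    {"category": "first_aid", "difficulty": "easy", "score": 0.5},
    {"category": "medication", "difficulty": "hard", "score": 0.0},
]
averages = mean_score_by_group(results)
```

Grouping on a tuple of labels keeps the function usable with only one key, for example `keys=("category",)`.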
Limitations
Only a single test split of 750 examples is provided, so the dataset suits evaluation rather than training or fine-tuning.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
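Since field semantics have to be inferred after download, a quick profile of field names and value types is a reasonable first step. The sketch below uses only the Python standard library and works on records already loaded as a list of dicts. The sample rows and their field names (`prompt`, `category`, `difficulty`, `rubrics`) are hypothetical placeholders, not the dataset's confirmed schema.

```python
from collections import Counter, defaultdict


def profile_fields(records):
    """Count which Python value types appear under each field name."""
    types_by_field = defaultdict(Counter)
    for record in records:
        for name, value in record.items():
            types_by_field[name][type(value).__name__] += 1
    return {name: dict(counts) for name, counts in types_by_field.items()}


# Hypothetical rows standing in for the downloaded test split.
sample = [
    {"prompt": "Beispiel A", "category": "first_aid",
     "difficulty": "easy", "rubrics": ["r1", "r2"]},
    {"prompt": "Beispiel B", "category": "medication",
     "difficulty": None, "rubrics": ["r1"]},
]
profile = profile_fields(sample)
```

A field that shows more than one type, or `NoneType` alongside `str`, is a sign of missing values worth inspecting manually before scoring.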
Provenance
Source
Hugging Face
Collection Method
Likely constructed for model evaluation, but specific gathering method is unknown.
Time Range
Not specified
Freshness
Last updated 2026-04-27 18:20:27; freshness should be verified.
Geography
Not specified
License
Unknown; restrictions must be checked before use.