Name: Healthybench-German: 750 German-Language Health and Safety Evaluation Examples
Creator: 8Fai
Published: 2026-04-27T18:18:45
Keywords: German Language, Evaluation Benchmark, Benchmark, Healthcare, Text, Medical Advice, Health Safety

Description

Healthybench-German is a German-language benchmark for evaluating how cautiously models give user-facing health and safety guidance. It contains 750 evaluation examples in a single test split, was created by 8Fai, and was last updated on April 27, 2026. Each example includes a user prompt, scoring rubrics, a category, a difficulty label, and reference answers.

Use Cases

Benchmarking model safety in everyday wellbeing scenarios, using the dataset's user prompts.
Evaluating triage and crisis-escalation advice against the provided rubric items and reference answers.
Testing German-language medication-safety guidance, broken down by category and difficulty label.
Assessing first-aid instructions against the structured evaluation examples.

Strengths

A fixed set of 750 evaluation examples in a single test split gives benchmark runs a defined, repeatable scale.
Each example carries structured components (rubrics, category, difficulty label, reference answers) that support scoring.
Specifically targets German-language health-and-safety guidance, filling a niche for non-English evaluation.
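
The category and difficulty labels mentioned above suggest reporting results per group instead of as one overall number. The following is a minimal sketch under stated assumptions: the field names "category", "difficulty", and "score", and the sample values, are hypothetical, since the card does not document the columns or how rubric scores are computed.

```python
from collections import defaultdict


def mean_score_by_group(results, keys=("category", "difficulty")):
    """Average a per-example score within each (category, difficulty) group.

    `results` is a list of dicts. The key names are assumptions, not the
    dataset's documented schema.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [score sum, example count]
    for row in results:
        group = tuple(row[k] for k in keys)
        totals[group][0] += row["score"]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical per-example results produced by some rubric grader.
results = [
    {"category": "first_aid", "difficulty": "easy", "score": 0.5},
    {"category": "first_aid", "difficulty": "easy", "score": 1.0},
    {"category": "medication", "difficulty": "hard", "score": 0.25},
]

for group, mean in mean_score_by_group(results).items():
    print(group, mean)
```

Grouping this way keeps a weak category from being hidden by a strong overall average; the real column names should be substituted once the data has been inspected.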

Limitations

Row count is not confirmed by the catalog metadata; the figure of 750 examples comes from the dataset description and should be verified after download.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
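
Because field semantics must be inferred after download, a quick pass over the rows can show which fields exist, what types they hold, and how often they are filled. This is a minimal sketch, not a documented workflow for this dataset: the sample rows and the field names "prompt", "category", and "rubrics" are hypothetical stand-ins for whatever the downloaded file actually contains.

```python
from collections import Counter


def summarize_fields(records):
    """Return, per field name, the observed value types and the non-empty count."""
    summary = {}
    for record in records:
        for name, value in record.items():
            info = summary.setdefault(name, {"types": Counter(), "non_empty": 0})
            info["types"][type(value).__name__] += 1
            # Treat None and empty containers/strings as "empty".
            if value not in (None, "", [], {}):
                info["non_empty"] += 1
    return summary


# Hypothetical rows; the real field names must be read from the downloaded data.
sample = [
    {"prompt": "Was tun bei Nasenbluten?", "category": "first_aid",
     "rubrics": ["placeholder rubric item"]},
    {"prompt": "Darf ich Ibuprofen mit Alkohol nehmen?", "category": None,
     "rubrics": []},
]

for name, info in summarize_fields(sample).items():
    print(name, dict(info["types"]), info["non_empty"])
```

Running the same function over all loaded rows also gives the row count (the sum of type counts for any always-present field), which helps check the stated 750 examples.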

Provenance

Source: Hugging Face
Collection Method: Likely constructed for model evaluation, but the specific gathering method is unknown.
Time Range: Not specified
Freshness: Last updated 2026-04-27 18:20:27; freshness should be verified.
Geography: Not specified

License is unknown; restrictions must be checked before use.
