Name: Med-ART: 120 Clinical Decision Tasks for AI Agent Benchmarking
Creator: CentificAIResearch
Published: 2026-06-11T07:30:27
Keywords: Benchmark, Healthcare, Tabular, Clinical Reasoning, Medical AI, Benchmark Tasks, FHIR Data, Synthetic

Description

ART is a programmatically generated clinical decision benchmark built on real FHIR patient data. This subset is a 120-task stratified sample targeting three dominant error categories in medical AI reasoning: retrieval failures, aggregation errors, and conditional-logic errors. The dataset was created by CentificAIResearch and uploaded to Hugging Face on June 11, 2026.
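The card does not say how the 120 tasks were allocated across the three categories. The following minimal sketch shows what a stratified sample means here. It assumes each task carries a hypothetical `category` field and that 40 tasks are drawn per category; neither assumption is documented by the dataset. Tasks are grouped by category and sampled independently within each group:

```python
import random
from collections import defaultdict

def stratified_sample(tasks, per_category, seed=0):
    """Draw a fixed number of tasks from each category, without replacement."""
    rng = random.Random(seed)  # seeded so the subset is reproducible
    by_category = defaultdict(list)
    for task in tasks:
        by_category[task["category"]].append(task)
    sample = []
    for category in sorted(by_category):
        pool = by_category[category]
        k = per_category.get(category, 0)
        if k > len(pool):
            raise ValueError(f"category {category!r} has only {len(pool)} tasks, need {k}")
        sample.extend(rng.sample(pool, k))
    return sample

# Hypothetical parent pool: 300 tasks per category (illustrative, not the real ART size).
categories = ["retrieval", "aggregation", "conditional_logic"]
pool = [{"id": f"{c}-{i}", "category": c} for c in categories for i in range(300)]
subset = stratified_sample(pool, {c: 40 for c in categories})
print(len(subset))  # 120
```

Sampling within each stratum guarantees every error category is represented at a fixed size. A simple random sample of 120 tasks could over- or under-represent one category.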

Use Cases

Benchmarking medical AI agents' reasoning capabilities across the three error categories described above.
Training models to avoid retrieval failures on real FHIR patient data.
Evaluating AI performance on conditional logic tasks in a clinical context.
Studying aggregation errors in medical information synthesis.
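Benchmarking across error categories usually means reporting accuracy per category instead of a single aggregate score. The following minimal sketch assumes graded outputs are records with hypothetical `category`, `expected`, and `predicted` fields. The sample records are invented for illustration, and the case-insensitive exact-match rule is an assumption; real tasks may need tolerant numeric or unit-aware comparison:

```python
from collections import defaultdict

def accuracy_by_category(records):
    """Return {category: fraction of records answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        cat = rec["category"]
        total[cat] += 1
        # Exact match after whitespace and case normalization (an assumed scoring rule).
        if str(rec["predicted"]).strip().lower() == str(rec["expected"]).strip().lower():
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in sorted(total)}

# Hypothetical graded outputs, not taken from the dataset.
results = [
    {"category": "retrieval", "expected": "140 mg/dL", "predicted": "140 mg/dL"},
    {"category": "retrieval", "expected": "7.2 %", "predicted": "6.9 %"},
    {"category": "aggregation", "expected": "3", "predicted": "3"},
    {"category": "conditional_logic", "expected": "yes", "predicted": "Yes"},
]
print(accuracy_by_category(results))
# {'aggregation': 1.0, 'conditional_logic': 1.0, 'retrieval': 0.5}
```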

Strengths

Built on real FHIR patient data, which grounds the tasks in real-world clinical information.
Targets three specific, defined error categories (retrieval, aggregation, conditional logic) for focused benchmarking.
Contains a 120-task stratified sample from the larger ART benchmark.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Beyond the stated 120 tasks, the row count and overall dataset size are not documented, which makes it harder to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
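Because field semantics have to be inferred, a reasonable first step after download is to list every field and the value types it takes. The following minimal sketch assumes the tasks are stored as JSON Lines, which the card does not confirm. The field names below are hypothetical. Mixed types (as with `answer`) or fields missing from some rows (as with `notes`) flag columns that need closer manual inspection:

```python
import json

def infer_schema(rows):
    """Map each field name to the sorted list of Python type names observed for it."""
    schema = {}
    for row in rows:
        for key, value in row.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in sorted(schema.items())}

# Hypothetical JSON Lines content; the real file names and fields are undocumented.
sample_lines = [
    '{"task_id": "t1", "category": "retrieval", "answer": 140}',
    '{"task_id": "t2", "category": "aggregation", "answer": "3", "notes": null}',
]
rows = [json.loads(line) for line in sample_lines]
print(infer_schema(rows))
# {'answer': ['int', 'str'], 'category': ['str'], 'notes': ['NoneType'], 'task_id': ['str']}
```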

Provenance

Source: CentificAIResearch via Hugging Face.
Collection Method: Programmatically generated benchmark based on real FHIR patient data.
Freshness: Last updated 2026-06-11 08:05:12; freshness should be verified.

License is unknown; terms of use must be verified before use.
