Available on 1 platform
ART is a programmatically generated clinical decision benchmark built on real FHIR patient data. This subset is a 120-task stratified sample targeting three dominant error categories in medical AI reasoning: retrieval failures, aggregation errors, and conditional-logic errors. The dataset was created by CentificAIResearch and uploaded to Hugging Face on June 11, 2026.
The license is unknown; verify the terms of use before using this dataset.