Description
This dataset pairs 14,685 unique cardiology diagnosis expressions, in Portuguese and English, with ICD-10 codes. It was developed by the Biomedical Informatics Laboratory at the Heart Institute, Hospital das Clínicas, University of São Paulo Medical School, originally to support internal research on large language models.
Use Cases
Training or evaluating named entity recognition models for cardiology terms based on the diagnosis expressions.
Developing machine translation systems for clinical text between Portuguese and English.
Building automated medical coding systems for cardiology diagnoses using the aligned ICD-10 codes.
Conducting research on the alignment and standardization of clinical terminology across languages.
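As a rough illustration of the medical coding use case, the sketch below builds an exact-match lookup from normalized diagnosis expressions to ICD-10 codes. The dataset's actual schema is not documented here, so the field names `expression_pt`, `expression_en`, and `icd10` are assumptions, and the three rows are illustrative examples written for this sketch rather than records taken from the dataset.

```python
import unicodedata

# Illustrative rows only; field names are assumed, not the dataset's real schema.
ROWS = [
    {"expression_pt": "hipertensão arterial sistêmica",
     "expression_en": "systemic arterial hypertension", "icd10": "I10"},
    {"expression_pt": "insuficiência cardíaca",
     "expression_en": "heart failure", "icd10": "I50.9"},
    {"expression_pt": "infarto agudo do miocárdio",
     "expression_en": "acute myocardial infarction", "icd10": "I21.9"},
]


def normalize(expression: str) -> str:
    """Lowercase, strip diacritics, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", expression)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def build_index(rows):
    """Map each normalized Portuguese and English expression to its code."""
    index = {}
    for row in rows:
        for field in ("expression_pt", "expression_en"):
            index[normalize(row[field])] = row["icd10"]
    return index


def lookup(index, expression):
    """Return the ICD-10 code for an exact normalized match, else None."""
    return index.get(normalize(expression))


INDEX = build_index(ROWS)
print(lookup(INDEX, "  Insuficiência   Cardíaca "))
print(lookup(INDEX, "Heart Failure"))
```

An exact-match table like this is only a baseline: it cannot code expressions it has never seen, which is where a trained classifier or a language model would come in. It can still be useful as a sanity check when evaluating such models.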
Strengths
Contains 14,685 unique diagnosis expressions, providing a substantial vocabulary.
Explicitly aligned with ICD-10 codes, a standard medical classification system.