Name: GPT-4 Annotated Severity Scores for Human Phenotype Ontology Abnormalities
Creator: Kitty B. Murphy
Published: 2026-05-21T05:44:09
License: CC-BY-4.0
Keywords: CSV, Rare Diseases, Human Phenotype Ontology, Bioinformatics, Healthcare, Tabular, Large Scale, Clinical Severity, LLM Annotation, Synthetic

Description

GPT-4 annotated the severity of over 17,500 phenotypic abnormalities catalogued in the Human Phenotype Ontology (HPO). Each annotation covers nine clinical characteristics and how frequently they occur. Benchmarked against ground-truth labels, the annotations reached a mean recall of 97%. Kitty B. Murphy published the dataset on figshare in May 2026.

Use Cases

Prioritize phenotypes for gene therapy based on the generated quantitative severity metrics.
Benchmark other LLMs or automated methods for clinical metadata annotation using the provided ground-truth comparisons.
Integrate severity scores into phenome-wide analyses to rank phenotypes by impact on health and quality of life.
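
The ranking use case can be sketched with the Python standard library alone. This is a minimal sketch under assumptions: the column names `hpo_id` and `severity_score` and the demo values are hypothetical, since the card does not document the file's actual schema.

```python
import csv
import io


def top_phenotypes(handle, score_column, n=5):
    """Return the n rows with the highest numeric value in score_column."""
    scored = []
    for row in csv.DictReader(handle):
        try:
            scored.append((float(row[score_column]), row))
        except (KeyError, ValueError, TypeError):
            # Skip rows where the score is missing or not numeric.
            continue
    # Sort by score, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:n]]


# Made-up demo data; NOT taken from the dataset.
demo = io.StringIO(
    "hpo_id,severity_score\n"
    "HP:0000001,12.5\n"
    "HP:0000002,40.0\n"
    "HP:0000003,n/a\n"
)
for row in top_phenotypes(demo, "severity_score", n=2):
    print(row["hpo_id"], row["severity_score"])
```

For the real file, pass a handle opened with `open(path, newline="")` and substitute whichever column turns out to hold the score.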

Strengths

Annotations cover over 17,500 phenotypic abnormalities across more than 8,600 rare diseases.
Benchmarking against ground-truth labels showed recall ranging from 89% to 100% (mean = 97%).
The severity scoring system integrates both the nature of nine clinical characteristics and their frequency of occurrence.
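
For readers reproducing the benchmark figures, recall is the share of ground-truth positives that the annotator recovered: TP / (TP + FN). The sketch below assumes the reported mean is an unweighted average of per-characteristic recall, which the card does not state. The characteristic names and counts are hypothetical.

```python
def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN); returns 0.0 when there are no positives."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def mean_recall(counts):
    """Unweighted mean of recall over a mapping name -> (TP, FN)."""
    values = [recall(tp, fn) for tp, fn in counts.values()]
    return sum(values) / len(values) if values else 0.0


# Hypothetical counts for illustration only; NOT taken from the dataset.
example_counts = {
    "characteristic_a": (89, 11),
    "characteristic_b": (100, 0),
}
print(mean_recall(example_counts))
```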

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, so coverage cannot be assessed before download.
The file is 56.2 KB, which is small for a dataset described as covering over 17,500 abnormalities; it likely contains aggregated scores or metadata rather than raw annotations.
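
Because the columns and row count are undocumented, a first pass after download might list the header, count the rows, and show a few sample records. This is a minimal sketch using only the Python standard library. The demo column names and values are made up for illustration and are not the dataset's actual schema.

```python
import csv
import io


def summarize_csv(handle, sample_size=3):
    """Return the header, data row count, and first few rows of a CSV."""
    reader = csv.reader(handle)
    header = next(reader, [])  # empty list if the file has no lines
    row_count = 0
    samples = []
    for row in reader:
        row_count += 1
        if len(samples) < sample_size:
            samples.append(row)
    return {"columns": header, "row_count": row_count, "sample_rows": samples}


# Made-up demo input; replace with the downloaded file.
demo = io.StringIO("hpo_id,severity_score\nHP:0000001,1.5\nHP:0000002,0.2\n")
print(summarize_csv(demo))
```

For the real file, open it with `open(path, newline="")` and pass the handle to `summarize_csv`. Comparing the resulting row count with the roughly 17,500 abnormalities described above indicates whether the file holds per-phenotype records or aggregates.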

Provenance

Source: figshare
Collection Method: GPT-4 was employed to annotate severity based on clinical characteristics, with outputs benchmarked against ground-truth labels within the HPO.
Freshness: Last updated 2026-05-21 05:44:10; verify against the figshare record before use.
