Date of Award
5-2024
Document Type
Thesis - Closed Access
Degree Name
MS in Human Genetics
First Advisor
Laura Hercher, MS, CGC
Second Advisor
Dr. John Greally
Third Advisor
Monisha Sebastin, MS, CGC
Abstract
Human Phenotype Ontology (HPO) terms help identify and rank causative genes in exome/genome sequencing for patients with rare disease, yet diagnostic rates remain low. GenomeDiver reanalyzes phenotypes to prioritize features that distinguish variants and diseases. Manual extraction of phenotypic terms from electronic health records is time-consuming, creating an opportunity for natural language processing (NLP) to support the diagnostic process. We evaluated the performance of two NLP systems, ClinPhen and Elastex, in extracting HPO terms for use in GenomeDiver. Fourteen patients with various note types were randomly selected from the NYCKidSeq study. Two annotators independently extracted HPO terms from the 56 total notes. A third investigator adjudicated, creating the gold standard (GS) dataset. Pooled kappa was used to determine interannotator agreement. The NLP systems were evaluated by comparing each system’s extracted HPO terms to the GS to obtain precision, recall, and F1. The GS, ClinPhen, and Elastex identified an average of 6.96, 5.66, and 14.9 HPO terms per note, respectively, for totals of 239, 183, and 337 unique HPO terms across all notes. Interannotator agreement for the GS was 0.67. Elastex’s recall was higher (0.69 vs. 0.44), while ClinPhen’s precision was higher (0.64 vs. 0.55). ClinPhen’s higher precision allows more curated terms to be sent back to clinicians through GenomeDiver. Yet systems with higher recall make it easier for providers to identify true positives and discard false positives from the list of phenotypic terms generated by NLP extraction. Awareness of the limitations of NLP systems may optimize the utility of automated HPO extraction for the purposes of GenomeDiver.
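The comparison of a system's extracted HPO terms against the gold standard can be illustrated with a minimal sketch. It assumes exact matching on HPO identifiers within a single note; the abstract does not state how matches were defined or how scores were aggregated across notes, so this is not necessarily the procedure used in the thesis. The function name `score_terms` and the example term sets are hypothetical and chosen only for illustration.

```python
def score_terms(predicted, gold):
    """Return (precision, recall, F1) for one note.

    Assumption: a predicted term counts as a true positive only if its
    HPO identifier matches a gold-standard identifier exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: share of extracted terms that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard terms that the system extracted.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative identifiers only; not data from the study.
gold_terms = {"HP:0001250", "HP:0001263", "HP:0000252", "HP:0001290"}
system_terms = {"HP:0001250", "HP:0001263", "HP:0004322"}

precision, recall, f1 = score_terms(system_terms, gold_terms)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case the system extracts fewer terms than the gold standard contains, so precision exceeds recall, the same pattern the abstract reports for ClinPhen.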
Recommended Citation
Rutter, Bailey; Kan, Maggie; and Rosales, Kathy, "Evaluating Natural Language Processing Algorithms for the Phenotype-Guided Genomic Diagnosis Platform, GenomeDiver" (2024). Human Genetics Theses. 131.
https://digitalcommons.slc.edu/genetics_etd/131
Included in
Congenital, Hereditary, and Neonatal Diseases and Abnormalities Commons, Diagnosis Commons, Genetics Commons, Genomics Commons, Molecular Genetics Commons, Quality Improvement Commons