To train high-quality machine learning models, it is essential to determine which samples in the training dataset did (and did not) experience the outcome of interest. Accurately identifying positives and negatives (referred to as “phenotyping”) is particularly challenging in healthcare because many confounding data points require clinician judgement to interpret, and clinical review is impractical for large-scale datasets. For sepsis, a common way to identify cases in retrospective datasets is the presence of ICD billing codes. Although billing codes have high precision (few false positives among the cases they flag), they suffer from low sensitivity (they miss many positive cases) and cannot be used to determine sepsis onset time. This study, published in Critical Care Explorations, describes a new method for sepsis phenotyping that outperforms other automated tools because it accounts for the comorbidities that confound them.
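To make the precision and sensitivity trade-off concrete, here is a minimal Python sketch. The ten-patient dataset, the variable names, and the helper function `precision_and_sensitivity` are all hypothetical and chosen only for illustration; they do not come from the study. Chart review stands in for the reference label and the billing code for the automated label.

```python
def precision_and_sensitivity(predicted, actual):
    """Return (precision, sensitivity) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p and a for p, a in pairs)        # flagged and truly positive
    false_pos = sum(p and not a for p, a in pairs)   # flagged but truly negative
    false_neg = sum(a and not p for p, a in pairs)   # truly positive but missed

    flagged = true_pos + false_pos
    positives = true_pos + false_neg
    precision = true_pos / flagged if flagged else float("nan")
    sensitivity = true_pos / positives if positives else float("nan")
    return precision, sensitivity


# Hypothetical toy data: 10 patients, 5 of whom had sepsis per chart review.
chart_review = [True, True, True, True, True, False, False, False, False, False]
# Hypothetical billing codes: only 2 of the 5 true cases were coded, none wrongly.
billing_code = [True, True, False, False, False, False, False, False, False, False]

precision, sensitivity = precision_and_sensitivity(billing_code, chart_review)
print(f"precision={precision:.2f}, sensitivity={sensitivity:.2f}")
```

In this toy example every coded patient truly had sepsis, so precision is perfect, yet three of the five true cases carry no code, so sensitivity is only 40%. A model trained on such labels would see those three missed cases as negatives.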
Rigorous definition of targets is one of many strategies Bayesian Health uses to develop best-in-class machine learning models that achieve high sensitivity (80-95%) with 300%-700%+ better precision than many other solutions.
Read the full research paper here.