The widespread adoption of electronic medical records (EMRs) in healthcare has given statistical machine learning researchers vast new amounts of data with which to model and predict patient health status, potentially enabling novel advances in treatment. However, significant barriers must be overcome to extract these insights from EMR data. First, EMR datasets consist of both static and dynamic observations of discrete and continuous-valued variables, many of which may be missing, precluding the application of standard multivariate analysis techniques. Second, the clinical populations observed via EMRs that are relevant to the study and management of debilitating conditions such as sepsis are often heterogeneous; properly accounting for this heterogeneity is critical. Here, we describe a joint probabilistic framework called a composite mixture model that can simultaneously accommodate the wide variety of data types commonly found in EMR datasets, stratify heterogeneous clinical populations into relevant subgroups, and handle missing observations. We demonstrate the efficacy of our approach by applying our framework to a large-scale sepsis cohort, identifying physiological trends and distinct subgroups of the dataset associated with elevated risk of mortality during hospitalization.
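As a rough illustration of how one mixture model can combine continuous and discrete variables while tolerating missing entries, the sketch below computes subgroup membership probabilities for a toy dataset. It is not the authors' implementation: it assumes that features are conditionally independent within each subgroup, that continuous features are Gaussian and discrete features categorical, and that a missing entry (`NaN` or `-1`) is simply dropped from the likelihood. The function name `responsibilities` and all parameter values are hypothetical.

```python
import numpy as np

def log_gaussian(x, mean, var):
    # Elementwise log density of a univariate Gaussian.
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def responsibilities(cont, disc, weights, means, variances, cat_probs):
    """Posterior subgroup probabilities under an illustrative mixed-type mixture.

    cont:      (n, d_c) floats, NaN marks a missing value
    disc:      (n, d_d) ints, -1 marks a missing value
    weights:   (k,) mixing proportions
    means, variances: (k, d_c) Gaussian parameters per subgroup
    cat_probs: (k, d_d, n_levels) categorical probabilities per subgroup
    """
    n, k = cont.shape[0], weights.shape[0]
    log_joint = np.tile(np.log(weights), (n, 1))  # (n, k), starts at log prior
    feature_idx = np.arange(disc.shape[1])
    observed_disc = disc >= 0
    safe_disc = np.where(observed_disc, disc, 0)  # placeholder level for missing
    for j in range(k):
        # Continuous part: missing entries contribute 0 to the log-likelihood,
        # which marginalizes them out under the independence assumption.
        ll = log_gaussian(cont, means[j], variances[j])
        log_joint[:, j] += np.where(np.isnan(cont), 0.0, ll).sum(axis=1)
        # Discrete part: look up the probability of each observed level.
        p = cat_probs[j][feature_idx, safe_disc]  # (n, d_d)
        log_joint[:, j] += np.where(observed_disc, np.log(p), 0.0).sum(axis=1)
    # Normalize in log space for numerical stability.
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm)

# Toy example (all values hypothetical): two subgroups, one continuous
# feature, one binary feature, three patients with varying missingness.
weights = np.array([0.6, 0.4])
means = np.array([[0.0], [5.0]])
variances = np.array([[1.0], [1.0]])
cat_probs = np.array([[[0.9, 0.1]], [[0.2, 0.8]]])
cont = np.array([[0.1], [np.nan], [np.nan]])
disc = np.array([[0], [1], [-1]])

r = responsibilities(cont, disc, weights, means, variances, cat_probs)
```

In this toy setup, the third patient has no observed values, so the membership probabilities fall back to the mixing proportions; the second patient is assigned using only the discrete feature.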
Source: Flexible Analysis of Electronic Medical Record Data with Composite Mixture Models | bioRxiv