Feature importance bi-clustering across diseases and predictors [1].

I am working to develop new methods for representing data in electronic health records (EHR) to improve predictive modeling and interpretation of patient outcomes. EHR data offer a promising opportunity to advance our understanding of how clinical decisions and patient conditions interact over time to influence patient health. However, EHR data are difficult to use for predictive modeling because of the variety of data types they contain (continuous, categorical, text, etc.), their longitudinal nature, the high rate of non-random missingness for certain measurements, and other concerns. Furthermore, patient outcomes often have heterogeneous causes and require information to be synthesized from several clinical lab measures and patient visits. The core challenge is overcoming the mismatch between data representations in the EHR and the assumptions underlying commonly used statistical and machine learning (ML) methods.

Our preliminary work in this area analyzed how intelligible the models produced by current ML tools are [1]. Each method makes its own assumptions about which factors are important in the model, so we examined how these assumptions lead to agreement, or disagreement, about the factors contributing to several patient outcomes.
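As a rough illustration of what measuring agreement between methods can look like, the sketch below compares feature importance vectors from several models using Spearman rank correlation. This is not the analysis from [1]: the rank-correlation measure is a simple stand-in chosen for illustration, and the method names, feature names, and importance values are all hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Rank values in descending order (1 = most important), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def pairwise_agreement(importances):
    """Map each pair of method names to the rank correlation of their importances."""
    return {
        (m1, m2): spearman(importances[m1], importances[m2])
        for m1, m2 in combinations(sorted(importances), 2)
    }


# Hypothetical importance scores for four hypothetical features
# (e.g. age, glucose, bmi, visits), one vector per hypothetical method.
importances = {
    "method_a": [0.50, 0.30, 0.15, 0.05],
    "method_b": [0.45, 0.35, 0.10, 0.10],
    "method_c": [0.05, 0.15, 0.30, 0.50],
}

agreement = pairwise_agreement(importances)
for pair, rho in agreement.items():
    print(pair, round(rho, 3))
```

A value near 1 means two methods rank the predictors similarly, while a value near -1 means they rank them in opposite order. Rank correlation ignores the magnitude of the scores, which is convenient when methods report importance on different scales.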

  1. La Cava, W., Bauer, C. R., Moore, J. H., & Pendergrass, S. A. (2019). Interpretation of machine learning predictions for patient outcomes in electronic health records. AMIA 2019 Annual Symposium. arXiv