The biomedical informatics field has shown increasing interest in developing and applying machine learning methods to improve predictions of outcomes of interest. A central obstacle to methods development is the lack of adequate benchmarking standards for comparing new methods to those available in the literature. To address this issue, we created the Penn Machine Learning Benchmark suite (PMLB) [1], a cleaned, standardized, and easily fetchable archive of more than 160 open-source benchmark datasets collected from around the web, including informatics applications from fundamental biology, genetics, and clinical decision making.

We conducted a comprehensive analysis of this suite using 14 open-source machine learning methods and rigorous statistical comparisons. Bi-clustering the results revealed groups of datasets on which certain machine learning methods outperform others. We found that a surprising number of widely used open-source datasets provide essentially redundant information about the relative performance of several classification methodologies, and we identified particular sets of datasets that are more useful for illuminating the strengths and weaknesses of different approaches. We also quantified the effect of hyperparameter tuning and model selection strategies. Finally, we distilled these findings into a short list of recommendations that biomedical researchers can use as a starting point for modeling their data [2]. Together, these results represent a much-needed advance in the practice of machine learning methods development and their use in informatics.
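As a brief illustration of how the suite can be used in practice, the sketch below fetches one PMLB dataset and compares two scikit-learn classifiers with cross-validation. It uses the `fetch_data` helper from the `pmlb` Python package; the particular dataset ('mushroom') and classifiers chosen here are illustrative assumptions, not recommendations drawn from the papers above.

```python
# Minimal sketch: fetch a PMLB dataset and compare two classifiers.
# Assumes `pmlb` and scikit-learn are installed (pip install pmlb scikit-learn).
from pmlb import fetch_data
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# fetch_data downloads (and caches) the named dataset;
# return_X_y=True yields a (features, labels) pair of arrays.
X, y = fetch_data('mushroom', return_X_y=True)

# Compare two off-the-shelf classifiers with 5-fold cross-validation.
for clf in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
    scores = cross_val_score(clf, X, y, cv=5)
    print(f'{type(clf).__name__}: {scores.mean():.3f} +/- {scores.std():.3f}')
```

The same loop extends naturally over many datasets (e.g., iterating over names in `pmlb.dataset_names`), which is the pattern the benchmarking analysis in [1] and [2] follows at scale.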

  1. Olson, R. S., La Cava, W., Orzechowski, P., Urbanowicz, R. J., & Moore, J. H. (2017). PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison. BioData Mining, in press; arXiv preprint available.

  2. Olson, R. S.*, La Cava, W.*, Mustahsan, Z., Varik, A., & Moore, J. H. (2017). Data-driven Advice for Applying Machine Learning to Bioinformatics Problems. Accepted at the Pacific Symposium on Biocomputing (PSB); arXiv preprint available. *Contributed equally.

  3. La Cava, W., Olson, R. S., Orzechowski, P., & Urbanowicz, R. J. (2017). New Standards for Benchmarking in Evolutionary Computation Research. Workshop at GECCO '17.