A Performance Comparison of Two Machine Learning Models to Predict the Formation of Pharmaceutical Cocrystals


  • Joaquin Urbina University of Belize
  • Paul Morgan Icahn School of Medicine at Mount Sinai, NY, USA
  • Alex Moralez University of Belize
  • Chelsea Herrera University of Belize


Pharmaceutical cocrystals, machine learning, cocrystal prediction, binary logistic regression model, random forest model


Machine learning has recently attracted interest from the pharmaceutical industry and academia because it can reliably predict the cocrystal formation outcomes of API-coformer combinations and thus support a more efficient cocrystal screening approach. In this study, binary logistic regression and random forest models were developed to compare their performance in predicting the cocrystal outcomes of a dataset of API-coformer combinations from a variety of inherent molecular features, and to identify which of these features influence cocrystal formation more than others. The feature importance data of both models indicated that the most basic acceptor site on an API (basic pKa1) appeared to be one of the most important features for reliably predicting cocrystal formation. The random forest model also showed higher predictive accuracy than the binary logistic regression model (0.901 vs 0.811, respectively), based on the ROC plots and confusion matrices. The cocrystal prediction power of these and other models will be further investigated by expanding the number and types of molecular properties and the size of the dataset.
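The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration only, assuming scikit-learn and a synthetic dataset generated with `make_classification`. It does not use the study's data, features, or hyperparameters, and the variable names (`models`, `results`, `rf_importance`, `lr_coef_magnitude`) are hypothetical. It fits a binary logistic regression and a random forest on the same train/test split, then collects accuracy, ROC AUC, the confusion matrix, and a simple feature-importance measure for each model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, NOT the study's dataset):
# 8 numeric columns play the role of molecular features, and
# y = 1 / 0 plays the role of "cocrystal formed" / "no cocrystal".
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Both models see exactly the same split so their metrics are comparable.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    # Probability of the positive class, used for the ROC curve / AUC.
    positive_prob = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, predicted),
        "roc_auc": roc_auc_score(y_test, positive_prob),
        "confusion_matrix": confusion_matrix(y_test, predicted),
    }

# Feature importance: impurity-based importances for the random forest,
# absolute standardized coefficients for the logistic regression.
rf_importance = models["random_forest"].feature_importances_
lr_coef_magnitude = np.abs(
    models["logistic_regression"].named_steps["logisticregression"].coef_[0]
)

for name, metrics in results.items():
    print(name, "accuracy:", round(metrics["accuracy"], 3),
          "ROC AUC:", round(metrics["roc_auc"], 3))
    print(metrics["confusion_matrix"])
print("random forest importances:", np.round(rf_importance, 3))
print("logistic |coefficients|:", np.round(lr_coef_magnitude, 3))
```

The two importance measures are on different scales, so in a sketch like this only the ranking of features within each model is comparable, not the raw values across models.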





Health, Natural Sciences, and Technology