A Performance Comparison of Two Machine Learning Models to Predict the Formation of Pharmaceutical Cocrystals

Joaquin Urbina; Paul Morgan; Alex Moralez; Chelsea Herrera

Authors

Joaquin Urbina, University of Belize
Paul Morgan, Icahn School of Medicine at Mount Sinai, NY, USA
Alex Moralez, University of Belize
Chelsea Herrera, University of Belize

Keywords:

Pharmaceutical cocrystals, machine learning, cocrystal prediction, binary logistic regression model, random forest model

Abstract

Machine learning has recently attracted interest from the pharmaceutical industry and academia because it can reliably predict the cocrystal formation outcomes of combinations of an active pharmaceutical ingredient (API) and a coformer, and can thus make cocrystal screening more efficient. In this study, binary logistic regression and random forest models were developed with two aims: to compare their performance in predicting the cocrystal outcomes of a dataset of API-coformer combinations from a variety of inherent molecular features, and to identify which of these features influence cocrystal formation more than others. The feature importance data of both models indicated that the most basic acceptor site on an API (basic pK_a1) appeared to be one of the most important features for reliably predicting cocrystal formation. The random forest model also showed better predictive accuracy than the binary logistic regression model (0.901 vs 0.811, respectively), based on the ROC plots and confusion matrices. The cocrystal prediction power of these and other models will be investigated further by expanding the number and types of molecular properties and the size of the dataset.
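
To illustrate how the two evaluation tools named in the abstract (confusion matrices and ROC plots) each reduce to a single performance number, here is a minimal Python sketch using only the standard library. The labels and scores are made-up toy values, not data from this study. The function names, the labels "model A" and "model B", and the 0.5 decision threshold are assumptions chosen for illustration; the abstract does not state which software or threshold the authors used.

```python
from typing import Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels, where 1 = cocrystal formed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions: (TP + TN) / total."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example is scored higher than a randomly chosen negative
    one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> list:
    """Turn predicted probabilities into 0/1 class labels (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]


# Toy data only: 1 = cocrystal observed, 0 = no cocrystal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]   # hypothetical model A
scores_b = [0.9, 0.6, 0.4, 0.3, 0.7, 0.45, 0.2, 0.1]  # hypothetical model B

for name, scores in (("model A", scores_a), ("model B", scores_b)):
    preds = threshold(scores)
    print(name, confusion_matrix(y_true, preds), accuracy(y_true, preds), roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds, so the two numbers can differ for the same model.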
