Advances in technology have recently opened new routes in pharmaceutical development, notably in small-molecule drug development. Solvates generally form when solvent molecules are incorporated into a solute crystal during crystallization. Because solvate formation can affect pharmaceutical development, crystallization process development methods that either prevent solvate formation or generate new solvates with enhanced physicochemical properties are highly desirable. Current approaches to predicting solvate forms rely mainly on small-scale experimental screening, which may not reflect material behavior in large-scale production. There is therefore a clear need for more efficient solvate formation prediction to support identification of new solid forms, risk assessment, and crystallization process development.
Among the available solvate prediction methods, thermodynamic and other energy-based approaches have advanced through better prediction algorithms. However, limitations such as neglecting molecular interactions in the solid state have led to inaccurate results. Statistical models, by contrast, have provided a better platform for systematic analysis of both solvate and non-solvate crystal structures. More recently, researchers have identified machine learning as a promising technique for solid-state property prediction. Because the applicability of such models is limited, an understanding of the chemical diversity of active pharmaceutical ingredients is highly desirable.
To this end, scientists at Boehringer Ingelheim Pharmaceuticals, Dr. Dongyue Xin, Dr. Nina Gonnella, Dr. Xiaorong He and Dr. Keith Horspool, explored two machine learning models based on the random forest and support vector machine algorithms and validated their potential for predicting solvate formation in pharmaceutical organic molecules. The data used to train and test the models were derived from the Cambridge Structural Database. The work is published in the journal Crystal Growth & Design.
In brief, the research team began by reviewing different solvate prediction methods for pharmaceutical molecules. Next, the data obtained from the Cambridge Structural Database were filtered to retain only structures resembling pharmaceutical molecules. Nine organic solvents that are commonly used in pharmaceutical crystallization and have large numbers of solvate and non-solvate structures were then investigated. Finally, the performance of the best models was tested on selected pharmaceutically relevant molecules.
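The filtering and per-solvent grouping described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the `CrystalRecord` type, the molecular-weight cutoff used as a "pharmaceutical-like" filter, and the solvent names are all hypothetical placeholders, since the article specifies neither the filtering criteria nor the nine solvents.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class CrystalRecord:
    """Hypothetical stand-in for one crystal structure entry."""
    molecule_id: str
    mol_weight: float        # g/mol, used only by the placeholder filter below
    solvent: Optional[str]   # None means a non-solvate structure


def build_solvent_datasets(
    records: Iterable[CrystalRecord],
    solvents: Iterable[str],
    keep: Callable[[CrystalRecord], bool],
) -> Dict[str, List[Tuple[str, int]]]:
    """Return {solvent: [(molecule_id, label), ...]}; label 1 = solvate, 0 = non-solvate."""
    kept = [r for r in records if keep(r)]
    non_solvates = [r.molecule_id for r in kept if r.solvent is None]
    datasets: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for solvent in solvents:
        # Positive examples: structures crystallized as a solvate of this solvent.
        datasets[solvent].extend((r.molecule_id, 1) for r in kept if r.solvent == solvent)
        # Negative examples: structures with no solvent in the crystal.
        datasets[solvent].extend((m, 0) for m in non_solvates)
    return dict(datasets)


# Toy records; identifiers, weights, and solvents are invented for illustration.
records = [
    CrystalRecord("mol-A", 310.0, "ethanol"),
    CrystalRecord("mol-B", 285.5, None),
    CrystalRecord("mol-C", 1250.0, "ethanol"),  # dropped by the placeholder filter
    CrystalRecord("mol-D", 402.1, "acetone"),
]
solvents = ["ethanol", "acetone"]  # placeholders, not the study's solvent list

# Assumed filter: a simple molecular-weight cap, not the study's actual criterion.
datasets = build_solvent_datasets(records, solvents, keep=lambda r: r.mol_weight <= 800.0)
for solvent, rows in datasets.items():
    print(solvent, rows)
```

Treating each solvent as its own binary classification task is one plausible way to organize such data; the study may have structured its training sets differently.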
The machine learning models required only a two-dimensional molecular structure as input. Both the random forest and support vector machine algorithms predicted solvate formation propensity for organic molecules with a high success rate of 86%, as demonstrated on twenty selected pharmaceutical molecules. Random forest performed slightly better than support vector machine. It is also worth noting that, according to the models, the driving force for solvate formation varies with the type of solvate.
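To make the model comparison concrete, here is a minimal scikit-learn sketch that trains a random forest and a support vector machine on the same binary classification task and reports test accuracy for each. It rests on several assumptions: the features are synthetic stand-ins for descriptors that would be computed from two-dimensional structures, and the hyperparameters and train/test split are arbitrary choices rather than the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: each row stands in for a molecule's descriptor vector,
# each label for "forms a solvate with a given solvent" (1) or not (0).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Tree ensembles do not need feature scaling.
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVMs are scale-sensitive, so standardize the features first.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

The accuracies printed here describe only the synthetic data and say nothing about the reported 86% success rate; the sketch is meant to show how two such classifiers can be evaluated side by side on a held-out set.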
In summary, the Boehringer Ingelheim Pharmaceuticals researchers presented two machine learning models, based on the random forest and support vector machine algorithms, for predicting solvate formation in pharmaceutical molecules. A set of 20 pharmaceutical molecules selected from the literature was used to validate the models. Overall, machine learning proved to be a promising practical tool for fast and accurate prediction of solvate formation in pharmaceutical molecules. The study therefore provides insights that could help expand experimental screening data.
Xin, D., Gonnella, N., He, X., & Horspool, K. (2019). Solvate Prediction for Pharmaceutical Organic Molecules with Machine Learning. Crystal Growth & Design, 19(3), 1903-1911.