Hybrid Models Based on Machine Learning and an Increasing Degree of Process Knowledge: Application to Cell Culture Processes


Mathematical models are powerful tools for optimizing the performance of biotechnological manufacturing processes: they can describe how a process has performed and predict how it will perform. Better mathematical modeling of (bio)pharmaceutical processes would accelerate the move towards automated processes, shorten production times, and reduce costs. Mathematical models are broadly grouped into data-driven (statistical) and knowledge-driven (mechanistic) models. The former rely on statistical methods to infer input-output relationships and are commonly used when those relationships are poorly understood or not understood at all. In contrast, the latter are used when the relationships between the inputs and outputs of the process (and/or product) are understood adequately and precisely. In both cases, appropriate mathematical expressions represent the relationships between the process inputs and outputs.

The choice of the most appropriate modeling approach depends on the availability of historical data, the available process knowledge, and the ability to generate ad hoc data. At present, simple data-driven models are commonly used to represent biopharmaceutical cell culture processes. There are three reasons for this: an inadequate understanding of the elementary physicochemical processes in the cell culture, the resulting difficulty of developing detailed mechanistic models, and the time- and resource-intensive experiments that cell culture requires.

Hybrid models have emerged as a pragmatic solution to the limitations of these approaches because they combine the merits of data-driven and knowledge-driven models. However, in previous studies the parts of a hybrid model were defined in advance and kept fixed throughout the modeling process. This approach can cause practical problems in some cases, for example when the incorporated knowledge or assumptions are strongly biased.

Herein, Dr. Harini Narayanan (currently a postdoctoral fellow at the Massachusetts Institute of Technology), Dr. Martin Luna, Dr. Michael Sokolov, Dr. Alessandro Butté, and Professor Massimo Morbidelli from ETH Zurich in Switzerland introduced the new concept of the degree of hybridization for cell culture process modeling, defined as the amount of process knowledge (or engineering know-how) incorporated into a data-driven model to generate a hybrid model. Using this concept, the authors showed that a family of hybrid models can be constructed whose two extremes are a fully mechanistic (100% hybridized) model and a fully data-driven (0% hybridized) model. They studied the performance of this family of hybrid models on several metrics: model accuracy, extrapolation capability, ease of practical use, and ability to generate new process understanding. Their work has been published in the journal Industrial & Engineering Chemistry Research.

The research team demonstrated the feasibility of designing hybrid models with varying degrees of hybridization. The Hybrid Rate (HR) model, which uses an artificial neural network to learn the lumped specific rates in the macro-kinetic mechanistic model of the system, had the optimal degree of hybridization in terms of model accuracy, extrapolation, amount of training data required, and practical applicability. This can be attributed to the balance the HR model strikes between adding process knowledge and increasing the number of model parameters. Hybridization could also serve as a useful tool for testing mechanistic hypotheses about the elementary cell culture processes, improving the understanding of key process features.
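As a rough illustration of where the neural network sits in an HR-type model, the generic fed-batch mass balances below are written under our own assumptions: a viable cell density $X_v$, species concentrations $c_i$, culture volume $V$, feed rate $F$, and feed concentrations $c_{i,F}$, with cell death, sampling, and bleed terms neglected. This is a sketch of the general structure, not the exact formulation used by Narayanan et al.; what it shares with their HR model is that the specific rates are not given by explicit kinetic laws but are supplied by a network with weights $\mathbf{w}$.

```latex
\begin{aligned}
\frac{dV}{dt} &= F, \\
\frac{d(V X_v)}{dt} &= \mu \, X_v V, \\
\frac{d(V c_i)}{dt} &= q_i \, X_v V + F \, c_{i,F}, \qquad i = 1, \dots, n, \\
\left[\mu,\; q_1,\; \dots,\; q_n\right] &= \mathrm{NN}\!\left(X_v, c_1, \dots, c_n;\, \mathbf{w}\right).
\end{aligned}
```

Replacing $\mathrm{NN}$ with explicit kinetic expressions (for example, Monod-type growth kinetics) moves such a model toward the fully mechanistic extreme, while letting a network predict the concentrations directly, without the mass balances, moves it toward the fully data-driven extreme.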

In the case of cell culture processes, each extreme had limitations. The data-driven model performed poorly when little data was available, could not enhance process understanding, extrapolated poorly, and was inefficient in practical application. The mechanistic model, on the other hand, exhibited poor accuracy because the addition of excess knowledge biased the model. The appropriate model should therefore be chosen based on the intended goal and the available data. For example, hybrid models that incorporate mass balances for each species, such as the HR model, performed better when transferred across different operation modes, whereas models with a higher degree of hybridization offered more possibilities for process interpretation.
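The transfer argument can be illustrated with a minimal Python sketch, assuming a toy system of viable cells, glucose, and lactate. The rate network (`rate_network`), the simulator (`simulate`), its explicit-Euler integration, and all numerical values are hypothetical, and the network weights are random rather than trained on data. The point is structural: the same network supplies the specific rates in both runs, while the operation mode (batch versus fed-batch) enters only through the feed terms of the mass balances.

```python
import numpy as np


def rate_network(state, weights):
    """Hypothetical one-hidden-layer network: state -> specific rates.

    state = [X_v, glucose, lactate]; returns [mu, q_glucose, q_lactate].
    """
    W1, b1, W2, b2 = weights
    hidden = np.tanh(W1 @ state + b1)
    return W2 @ hidden + b2


def simulate(weights, feed_rate, feed_conc, x0, V0, t_end, dt=0.01):
    """Explicit-Euler integration of the volume-based mass balances.

    feed_rate(t) (volumetric feed rate F) and feed_conc (feed concentrations
    of glucose and lactate) are the only places where the operation mode enters.
    """
    X = float(x0[0])                      # viable cell density
    c = np.asarray(x0[1:], dtype=float)   # species concentrations
    V = float(V0)                         # culture volume
    for k in range(int(round(t_end / dt))):
        F = feed_rate(k * dt)
        rates = rate_network(np.concatenate(([X], c)), weights)
        mu, q = rates[0], rates[1:]
        # d(V X)/dt = mu X V,  d(V c)/dt = q X V + F c_F,  dV/dt = F
        VX = V * X + dt * mu * X * V
        Vc = V * c + dt * (q * X * V + F * feed_conc)
        V = V + dt * F
        X, c = VX / V, Vc / V
    return X, c, V


# Hypothetical, untrained weights shared by both operation modes.
rng = np.random.default_rng(0)
weights = (
    0.1 * rng.standard_normal((8, 3)), np.zeros(8),
    0.1 * rng.standard_normal((3, 8)), np.zeros(3),
)
x0 = [0.5, 30.0, 0.0]  # [X_v, glucose, lactate], arbitrary units

batch = simulate(weights, lambda t: 0.0, np.zeros(2), x0, V0=1.0, t_end=10.0)
fed_batch = simulate(weights, lambda t: 0.05, np.array([100.0, 0.0]), x0,
                     V0=1.0, t_end=10.0)

print("batch:     X_v=%.3f  c=%s  V=%.2f" % (batch[0], batch[1], batch[2]))
print("fed-batch: X_v=%.3f  c=%s  V=%.2f" % (fed_batch[0], fed_batch[1], fed_batch[2]))
```

In a real HR model the weights would be fitted to measured concentration profiles, and a proper ODE solver would typically replace the fixed-step Euler loop; the separation between learned rates and known mass balances is what this sketch is meant to show.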

In summary, the study demonstrated the superiority of hybrid models over purely mechanistic or purely data-driven models for cell culture processes. Progressively moving from a data-driven toward a mechanistic model improved overall model performance, provided that the added knowledge was not too biased. In a statement to Advances in Engineering, Professor Massimo Morbidelli, the corresponding author and president of DataHow AG (Zurich), a company that helps pharmaceutical and biotechnology companies with advanced process digitalization, optimization, and modeling, said that their findings will expand the application of hybrid models to complex processes such as cell cultures, ultimately improving the production of therapeutic antibodies for cancer and autoimmune diseases.


Narayanan, H., Luna, M., Sokolov, M., Butté, A., & Morbidelli, M. (2022). Hybrid models based on machine learning and an increasing degree of process knowledge: Application to cell culture processes. Industrial & Engineering Chemistry Research, 61(25), 8658–8672.
