Significance
In today’s hyperconnected world, data centers play an indispensable role in supporting everything from cloud services and e-commerce to scientific computing and machine learning. But as our reliance on digital infrastructure grows, so too does the energy footprint of the systems that keep it running. Of particular concern is the energy required for thermal management. Cooling alone can account for as much as 40% of a data center’s total energy usage, which is significant given that global data center operations already consume around 1% of the world’s electricity—a number that’s projected to rise sharply in the coming decade. One of the persistent challenges engineers face is improving cooling efficiency without compromising the thermal safety of server hardware. Most facilities still use conventional room-based cooling layouts that are prone to inefficient airflow. This often leads to familiar issues like hot air recirculation and cold air bypass, both of which reduce system effectiveness and raise energy costs. Complicating matters further, these setups tend to rely on PID controllers—tried-and-true but inherently reactive. A PID system waits for deviations before adjusting output, which works fine for steady-state systems but struggles with the dynamic, high-density environments typical of modern data centers. When server workloads fluctuate rapidly, as they often do, these reactive controllers fall behind, causing either overheating risks or overcooling that wastes energy.
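For readers who want a concrete picture of what "reactive" means here, the sketch below shows a minimal PID-style fan loop of the kind such facilities typically run. The gains, setpoint, and variable names are illustrative assumptions, not values from the study; the point is simply that the controller only acts after the measured temperature has already drifted.

```python
import numpy as np

def pid_fan_controller(temps, setpoint=25.0, kp=0.08, ki=0.01, kd=0.02, dt=10.0):
    """Illustrative PID loop: the fan command changes only after the measured
    return-air temperature has deviated from the setpoint (reactive control).
    Gains and setpoint are assumed values for demonstration only."""
    integral, prev_error = 0.0, 0.0
    airflow = []  # normalized fan command in [0, 1]
    for t in temps:
        error = t - setpoint            # positive when the air is too warm
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        airflow.append(float(np.clip(u, 0.0, 1.0)))
        prev_error = error
    return airflow

# Example: a sudden workload spike; the controller ramps up only after
# the temperature has already risen.
measured = [25.0, 25.2, 26.5, 28.0, 27.1, 26.0, 25.4]
print(pid_fan_controller(measured))
```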
It’s not just a matter of tweaking a few fan speeds or changing temperature setpoints. At scale, optimizing a cooling system means accounting for complex interactions between airflow, server utilization, and spatial thermal gradients. Doing this in real time, without bogging down computing resources, is a tall order. And yet, it’s increasingly necessary if operators hope to keep energy consumption in check while maintaining reliability. A new research paper published in the journal Energy and Buildings, authored by Weiqi Deng, Professor Jiaqiang Wang, Chang Yue, and Yang Guo from Central South University together with Dr. Quan Zhang from Hunan University, proposes a more forward-looking solution that centers on a control framework that is not just reactive, but predictive and adaptive. Rather than rely on static models or trial-and-error tuning, the authors built a linear parameter-varying (LPV) state-space model that feeds into an economic model predictive control (EMPC) system. The idea is to forecast how thermal conditions will evolve based on current workloads, and then adjust both airflow rates and supply air temperature in a coordinated way.
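The paper gives the full LPV state-space and EMPC formulation; the sketch below is only a schematic illustration of the idea, with a single lumped temperature state, assumed parameter values, and scipy's generic SLSQP solver standing in for whatever optimizer the authors used. All names and constants (`lpv_step`, `empc_action`, the capacitance and power figures) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# --- Illustrative parameters (assumed values, not taken from the paper) ---
DT = 60.0            # control step [s]
C_TH = 8.0e4         # lumped thermal capacitance of the rack air volume [J/K]
CP_AIR = 1005.0      # specific heat of air [J/(kg K)]
M_MAX = 1.2          # maximum supply airflow [kg/s]
P_IDLE, P_PEAK = 1.5e3, 6.0e3   # rack IT power at 0% / 100% CPU load [W]
T_IN_MAX = 30.0      # upper limit on return-air temperature [degC]

def server_heat(util):
    """Rack heat output as a simple affine function of CPU utilization."""
    return P_IDLE + util * (P_PEAK - P_IDLE)

def lpv_step(temp, util, airflow, t_supply):
    """One step of a toy LPV-style model: treating airflow and utilization as
    scheduling parameters keeps the update linear in the state (return-air
    temperature) and in the supply-air temperature:
        T[k+1] = A(f) * T[k] + B(f) * Ts[k] + E(rho)
    """
    k = DT * airflow * M_MAX * CP_AIR / C_TH
    a, b = 1.0 - k, k
    e = DT * server_heat(util) / C_TH
    return a * temp + b * t_supply + e

def cooling_power(airflow, t_supply, temp):
    """Economic cost: cubic fan law plus a chiller term that gets cheaper as
    the supply-air setpoint rises (purely illustrative efficiency model)."""
    fan = 600.0 * airflow ** 3
    cop = 3.0 + 0.2 * (t_supply - 16.0)          # higher Ts -> better COP
    q_removed = airflow * M_MAX * CP_AIR * max(temp - t_supply, 0.0)
    return fan + q_removed / cop

def empc_action(temp_now, util_forecast, horizon=3):
    """Small receding-horizon economic MPC: choose airflow and supply-air
    temperature over the horizon to minimize predicted cooling energy while
    keeping the return-air temperature below T_IN_MAX (soft constraint)."""
    def rollout(u):
        flows, temps_sup = u[:horizon], u[horizon:]
        temp, energy, penalty = temp_now, 0.0, 0.0
        for i in range(horizon):
            energy += cooling_power(flows[i], temps_sup[i], temp) * DT
            temp = lpv_step(temp, util_forecast[i], flows[i], temps_sup[i])
            penalty += 1e4 * max(temp - T_IN_MAX, 0.0) ** 2
        return energy / 1e6 + penalty

    u0 = np.concatenate([np.full(horizon, 0.6), np.full(horizon, 18.0)])
    bounds = [(0.1, 1.0)] * horizon + [(14.0, 24.0)] * horizon
    res = minimize(rollout, u0, bounds=bounds, method="SLSQP")
    return res.x[0], res.x[horizon]   # apply only the first move, then re-solve

# Example: one control step under a forecast of rising CPU utilization.
airflow, t_supply = empc_action(temp_now=27.0, util_forecast=[0.4, 0.7, 0.9])
print(f"airflow fraction: {airflow:.2f}, supply air setpoint: {t_supply:.1f} degC")
```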
What makes their approach compelling is its balance between rigor and practicality. By simplifying the underlying model just enough to allow fast computation—without sacrificing physical accuracy—they’ve created a control method that could realistically be deployed in an operational setting. In other words, it’s not a theoretical exercise; it’s a tool that could be used on the ground to help data centers stay cool more efficiently, adapt to shifting workloads, and cut energy waste at the same time. To evaluate how their model-based control approach would perform in practice, the researchers built a detailed simulation that closely reflected conditions inside a real rack-based data center. The setup featured ten vertically stacked servers, each modeled individually to capture local variations in airflow and temperature. Cooling was provided from below using a rack-mounted unit, mimicking the airflow dynamics seen in actual facilities. To replicate real operating conditions, they used server workload data drawn from EPA web traces and translated it into 24-hour CPU utilization cycles. This ensured the control system was tested under realistic, fluctuating loads rather than idealized inputs.
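The paper does not need to be reproduced to see the general shape of this preprocessing step. A plausible minimal sketch of turning a day of per-interval request counts into a 24-hour utilization cycle for a ten-server rack might look like the following, where the synthetic trace, the linear scaling, and the per-server noise are all assumptions rather than the authors' exact procedure with the EPA data.

```python
import numpy as np

def requests_to_utilization(requests_per_interval, util_min=0.15, util_max=0.95):
    """Map a day's worth of request counts onto a normalized CPU-utilization
    profile. The linear scaling and the utilization floor/ceiling are
    illustrative assumptions, not the paper's exact preprocessing."""
    r = np.asarray(requests_per_interval, dtype=float)
    scaled = (r - r.min()) / max(r.max() - r.min(), 1e-9)
    return util_min + scaled * (util_max - util_min)

# Synthetic stand-in for a 24-hour web trace sampled every 5 minutes
# (288 intervals): a daytime peak plus random noise.
rng = np.random.default_rng(0)
t = np.arange(288)
requests = 400 + 350 * np.sin(2 * np.pi * (t - 60) / 288).clip(0) + rng.normal(0, 30, 288)

utilization = requests_to_utilization(requests)

# Spread the same daily cycle over ten stacked servers with small offsets,
# mimicking local variation between slots in the rack.
rack_profile = np.clip(
    utilization[None, :] + rng.normal(0, 0.03, size=(10, utilization.size)),
    0.0, 1.0)
print(rack_profile.shape)   # (10, 288): ten servers x 288 five-minute steps
```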
The new control strategy, based on EMPC, was tested alongside a standard PID controller for comparison. Both were run under identical conditions to ensure fairness. While PID adjusted airflow in response to temperature deviations, EMPC went a step further by anticipating changes ahead of time and adjusting both airflow and supply air temperature. This predictive element gave EMPC a key advantage—it could stabilize thermal conditions without waiting for a deviation to occur. The results made the contrast clear. Over a 24-hour simulation, EMPC reduced cooling energy consumption by almost 10% compared to PID. It achieved this by operating at a slightly higher average supply air temperature, which improved cooling efficiency without compromising safety. Because the EMPC strategy fine-tuned airflow more precisely, it also lowered the power used by the fans. Thermal regulation improved as well. The EMPC system kept both the server inlet and return air temperatures well within safe limits, even during spikes in workload. The PID system, on the other hand, responded too late during high-demand periods, causing wider temperature swings and higher energy use. The authors also tested EMPC across low, medium, and high server loads. In each case, the energy savings held steady—around 9–10%—highlighting the system’s adaptability. It didn’t rely on fixed conditions to perform well. Lastly, they looked at how the prediction window (or “horizon”) affected performance. After testing several options, they found that a three-minute horizon provided the best trade-off between control accuracy and computational demand. Longer horizons offered little added value and made the system slower to respond.
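The three-minute figure is the authors' result; the toy snippet below is only a generic illustration of why such a sweep involves a trade-off. It times a placeholder receding-horizon problem at several horizon lengths: the planned first move changes little beyond a few steps, while the solve time keeps growing. The dynamics, costs, and numbers here have no connection to the paper's model.

```python
import time
import numpy as np
from scipy.optimize import minimize

def solve_toy_empc(horizon, temp0=27.0):
    """Toy receding-horizon problem used only to illustrate how solve time
    grows with the prediction horizon; dynamics and costs are placeholders."""
    def cost(u):
        temp, total = temp0, 0.0
        for f in u:
            total += 600.0 * f ** 3                       # fan-power-like term
            temp = temp + 3.0 - 4.0 * f                   # toy heat-up vs. cooling
            total += 1e4 * max(temp - 30.0, 0.0) ** 2     # soft temperature limit
        return total

    t0 = time.perf_counter()
    res = minimize(cost, np.full(horizon, 0.6),
                   bounds=[(0.1, 1.0)] * horizon, method="SLSQP")
    return res.x[0], time.perf_counter() - t0

for steps in (1, 3, 5, 10):   # horizon length in one-minute control steps
    first_move, secs = solve_toy_empc(steps)
    print(f"horizon {steps:2d} min: first airflow move {first_move:.2f}, "
          f"solve time {secs * 1e3:6.2f} ms")
```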
In conclusion, the new study provides an important, timely, and practical step forward in how we manage energy use in data centers, particularly when it comes to cooling, an area that remains both costly and difficult to optimize. Traditional approaches, especially those based on PID control, tend to fall short in environments where workloads shift rapidly and unpredictably. They respond after the fact, rather than anticipating what’s coming. What the authors proposed here is a predictive control framework that not only accounts for system dynamics in real time but does so with enough computational efficiency to be usable in real deployments.
One of the more important implications of this work is its potential to improve energy efficiency without compromising system reliability. It’s well known that many data centers overcool as a precaution, not out of necessity. This safety margin comes at a price—higher operational costs and unnecessary energy use. By predicting workload-driven thermal changes before they happen, the EMPC strategy introduced in this study enables smarter adjustments that reduce energy waste while keeping server temperatures within safe limits. A particularly thoughtful aspect of the work is the use of a linear parameter-varying state-space model, which strikes a balance between physical accuracy and computational simplicity. Unlike black-box machine learning models that often struggle to generalize beyond their training data, this approach remains grounded in thermodynamic principles. It’s designed to adapt across a range of operating conditions without the need for dense sensor arrays or heavy computational loads. We believe another strength is how well the method performs across a variety of workload scenarios. Whether under light, moderate, or heavy server utilization, the model consistently delivers energy savings in the range of 9–10%. That kind of flexibility is crucial in real-world data centers, where workloads don’t follow predictable patterns. More broadly, the environmental and economic implications are significant. Cutting cooling energy use by even a small percentage—let alone 10%—can lead to substantial savings when scaled across multiple facilities. It also contributes to reducing the carbon footprint of the digital infrastructure we increasingly rely on. As power grids grow more strained and regulations tighten, solutions like this become not just beneficial, but necessary.
Reference
Weiqi Deng, Jiaqiang Wang, Chang Yue, Yang Guo, Quan Zhang, Model-based control strategy with linear parameter-varying state-space model for rack-based cooling data centers, Energy and Buildings, Volume 319, 2024, 114528.