Full-Space Latent Representation for Mixed Qualitative–Quantitative Factor Gaussian Processes

Significance 

Gaussian process modeling offers a disciplined way to emulate complex systems while preserving uncertainty structure. Its effectiveness rests on a simple assumption: distances between inputs encode how strongly outputs relate. That assumption holds cleanly when inputs are numerical, but difficulty emerges once design choices, material types, configurations, or operating modes enter the problem as categories. Engineers routinely encounter such mixed-variable settings; however, the mathematical machinery of Gaussian processes resists categorical reasoning because labels don’t admit subtraction, scaling, or smooth interpolation. Several strategies attempt to bridge this divide. Some approaches hard-code similarity through indicator functions, treating all distinct categories as equally separated. Others attempt to learn full correlation matrices for each qualitative factor, accepting a growing parameter burden as the number of categories increases. Latent-variable formulations offer a different logic: they embed categories into an abstract numerical space and let distance regain meaning. These constructions introduce flexibility, but they also introduce assumptions, often quietly. Many latent-variable Gaussian process formulations restrict qualitative factors to low-dimensional embeddings, motivated by parameter economy and arguments borrowed from dimension reduction. That restriction carries consequences. Certain correlation structures simply can’t be represented when multiple categories must remain equidistant (a two-dimensional latent space, for example, can hold at most three mutually equidistant points), or when ordinal structure needs to coexist with nominal freedom. The persistence of these limitations reflects a deeper tension. Gaussian process modeling relies on smoothness and stationarity, yet qualitative factors encode discontinuous choices. Treating all categories symmetrically discards order information; imposing order risks biasing nominal distinctions. Existing methods often compromise, either by inflating the number of parameters until estimation becomes fragile or by compressing the representation until expressiveness suffers.
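The equidistance limitation mentioned above can be checked numerically. The sketch below is not taken from the paper; it uses classical multidimensional scaling, a standard technique, to count how many Euclidean dimensions are needed before m categories can all sit at the same distance from one another. The function name min_embedding_dim and the tolerance are illustrative choices.

```python
import numpy as np

def min_embedding_dim(m: int, tol: float = 1e-9) -> int:
    """Smallest Euclidean dimension in which m points can be mutually equidistant."""
    # Squared-distance matrix: 0 on the diagonal, 1 for every distinct pair.
    d2 = np.ones((m, m)) - np.eye(m)
    # Classical MDS: double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(m) - np.ones((m, m)) / m
    gram = -0.5 * centering @ d2 @ centering
    # The number of positive eigenvalues is the dimension the configuration needs.
    eigvals = np.linalg.eigvalsh(gram)
    return int(np.sum(eigvals > tol))

dims = {m: min_embedding_dim(m) for m in range(2, 7)}
print(dims)  # {2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
```

Under these assumptions m equidistant categories need m - 1 dimensions, so a two-dimensional embedding already fails at four categories.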

A recent research paper published in Mechanical Systems and Signal Processing by Dr. Liming Chen and Dr. Qingshan Wang from Central South University, in collaboration with Dr. Chen Jiang, Dr. Haobo Qiu and Dr. Liang Gao from Huazhong University of Science and Technology, presents a Gaussian process modeling framework that maps ordinal and nominal qualitative variables into a full latent space matched to their structural properties. Ordinal factors occupy ordered one-dimensional latent coordinates with learned spacing, and nominal factors occupy axis-aligned latent spaces with one coordinate per category. These latent variables integrate directly into standard Gaussian process kernels through normalized distance and explicit length-scale estimation, yielding a flexible, parsimonious surrogate model for mixed-variable engineering problems.

Briefly, the research team treated ordinal variables by mapping their levels onto ordered points along a unit interval, with the spacing between points learned from data. This preserved rank information and avoided fixed assumptions about the distance between adjacent levels. For nominal variables, the investigators constructed an axis-aligned latent space whose dimension matched the number of categories, assigning each category to its own coordinate axis. That construction allowed any pair of categories to share equal distance, a configuration that lower-dimensional embeddings cannot reproduce. They embedded these latent coordinates directly into conventional Gaussian process correlation functions after renormalization. Quantitative variables and latent variables shared a common distance scale, with characteristic length parameters governing influence. The authors estimated latent coordinates and length scales jointly through maximum likelihood, using numerical strategies designed to manage nonconvexity and conditioning. To evaluate behavior across diverse structures, the study examined twelve benchmark problems drawn from the surrogate modeling literature, spanning ordinal-only, nominal-only, and fully mixed-variable cases. The researchers applied the proposed method alongside several established Gaussian process formulations, holding sampling schemes constant across replications. This design choice isolated representational effects from data availability.
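The latent construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the article does not give the exact parameterization, normalization, or kernel, so the exponential gap parameterization, the endpoints pinned at 0 and 1, the per-axis positions, the squared-exponential product kernel, and every function and variable name below are assumptions made for illustration. In a real model the latent coordinates and length scales would be estimated by maximum likelihood; here they are fixed by hand.

```python
import numpy as np

def ordinal_coords(raw_gaps):
    """Ordered 1-D latent points on [0, 1]; raw_gaps are free (hypothetical) parameters."""
    gaps = np.exp(np.asarray(raw_gaps, dtype=float))  # exp keeps every gap positive
    pos = np.concatenate(([0.0], np.cumsum(gaps)))    # strictly increasing positions
    return (pos / pos[-1])[:, None]                   # rescale so the last level sits at 1

def nominal_coords(axis_pos):
    """Category k sits at axis_pos[k] along its own axis (one axis per category)."""
    return np.diag(np.asarray(axis_pos, dtype=float))

def sq_exp_corr(z, length_scale):
    """Squared-exponential correlation between all rows of z."""
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / length_scale ** 2)

def mixed_corr(x, ord_idx, nom_idx, z_ord, z_nom, ls_x, ls_ord, ls_nom):
    """Product correlation over one quantitative, one ordinal, and one nominal input."""
    r_x = sq_exp_corr(x[:, None], ls_x)
    r_o = sq_exp_corr(z_ord[ord_idx], ls_ord)
    r_n = sq_exp_corr(z_nom[nom_idx], ls_nom)
    return r_x * r_o * r_n

z_ord = ordinal_coords([0.0, 1.0])             # three ordinal levels, unequal spacing
z_nom = nominal_coords(np.full(4, 2 ** -0.5))  # four categories, equal axis positions

# With equal axis positions, every pair of distinct categories is at distance 1.
nom_dist = np.linalg.norm(z_nom[:, None, :] - z_nom[None, :, :], axis=-1)

# Five made-up design points: quantitative value, ordinal level, nominal category.
x = np.array([0.1, 0.4, 0.4, 0.9, 0.7])
ord_idx = np.array([0, 1, 2, 2, 0])
nom_idx = np.array([0, 1, 1, 3, 2])
R = mixed_corr(x, ord_idx, nom_idx, z_ord, z_nom, ls_x=0.5, ls_ord=0.5, ls_nom=1.0)
```

Setting all axis positions equal reproduces the equidistant case; letting them differ would make some categories more alike than others without reducing the number of axes. Because each input enters through its own length scale on a unit-range coordinate, the three length scales in this sketch can be compared directly.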

Across analytically defined test functions, the investigators observed that performance varied by problem structure. Methods that rely on fixed correlation matrices struggled when category counts increased or when functional behavior shifted sharply across ordinal levels. By contrast, latent-variable methods handled interaction complexity more effectively, though low-dimensional embeddings showed stress in cases demanding symmetric relationships among many categories. The authors’ proposed full latent formulation maintained stable behavior across these settings, reflecting the absence of imposed dimensional compression. The study also examined a real engineering dataset involving cooling system noise, where qualitative design choices (fan speed levels and airflow configurations) interact with quantitative design parameters (air duct structural parameters and fan parameters). In this setting, the researchers demonstrated that the learned latent coordinates aligned with physically interpretable distinctions among categories. That interpretability emerged from the explicit geometry of the latent space. One trade-off remains implicit: expanding latent dimensionality increases representational freedom but shifts the burden onto likelihood optimization. The authors mitigated this through parsimonious parameterization and normalization.

To summarize, the new study by Dr. Liming Chen and colleagues reframes how qualitative variables enter Gaussian process models. Instead of forcing categorical information into distance metrics designed for numerical inputs, the paper treats representation as a geometric problem. Ordinal and nominal factors demand different geometries, and the model reflects that demand explicitly. That shift matters for engineers who rely on surrogate models to understand how design choices shape system response. By allowing nominal categories to occupy an axis-aligned latent space, the method removes an artificial hierarchy imposed by low-dimensional embeddings. Equal similarity among multiple categories becomes representable without distortion. For ordinal factors, learned spacing acknowledges that order doesn’t imply uniform effect. These choices influence correlation structure directly, shaping smoothness assumptions and sensitivity behavior. The real impact shows up when engineers start exploring design options and comparing alternatives. When Gaussian process surrogates guide optimization or uncertainty analysis, misrepresentation of qualitative factors can bias conclusions. A model that preserves symmetry or order as warranted reduces that risk. The explicit length-scale treatment further allows practitioners to compare the influence of qualitative and quantitative factors on a common footing, supporting sensitivity analysis that respects mixed-variable structure. Downstream use remains bounded. The method assumes noise-free observations and relies on likelihood-based estimation, which can strain under sparse sampling or high category counts. Its benefits appear strongest when qualitative distinctions materially alter system behavior, not when categories serve as minor modifiers. Within those bounds, the approach offers a clearer conceptual alignment between engineering knowledge and statistical modeling.

We believe the new study speaks to a problem engineers run into all the time. Real designs are shaped by choices that are not numbers (what material to use, which configuration to adopt, which operating mode to accept), and those choices often steer system behavior just as much as any continuous parameter. Standard surrogate models tend to flatten these decisions into convenient numeric stand-ins, and that simplification can quietly bend conclusions. Researchers from Central South University and Huazhong University of Science and Technology treated ordinal and nominal factors in a way that matches how engineers think about them in practice: order is preserved where it exists, and symmetry is respected where no order makes sense. That modeling decision carries through to correlations, length scales, and how trade-offs appear once variables interact. For engineers relying on Gaussian processes to explore design space or manage uncertainty, the work offers a way to represent qualitative decisions without forcing artificial rankings or paying a heavy price in parameters, which makes the resulting models easier to trust when they are used to support real engineering judgments.

About the author

Liming Chen is currently a lecturer in the College of Mechanical and Electrical Engineering at Central South University. He received his B.S. and Ph.D. degrees from Central South University and Huazhong University of Science and Technology in 2016 and 2021, respectively. His research interests include surrogate-based methods with applications to design optimization and digital twins.

Webpage link: https://www.researchgate.net/profile/Liming-Chen-17/research

About the author

Qingshan Wang is a professor in the College of Mechanical and Electrical Engineering at Central South University. His research interests include structural design and dynamics of engineering equipment, advanced materials and structural lightweight design.

Webpage link: https://www.researchgate.net/profile/Qingshan-Wang-7/research


About the author

Chen Jiang is currently a lecturer in the School of Mechanical Science and Engineering at Huazhong University of Science and Technology. Before that, he was a postdoctoral fellow with the Department of Industrial and Manufacturing Systems Engineering at the University of Michigan-Dearborn and the Department of Mechanical Engineering at Seoul National University. He received his Ph.D. degree from Huazhong University of Science and Technology in 2020 and his Bachelor’s degree from Hunan University in 2015. His research interests include design under uncertainty, uncertainty quantification, model verification and validation, and prognostics and health management.

Webpage link: https://www.researchgate.net/profile/Chen-Jiang-9/research


About the author

Haobo Qiu is a professor in the School of Mechanical Science and Engineering at Huazhong University of Science and Technology. His research interests include equipment health management, early fault prognostics, maintenance optimization, system reliability modeling and analysis, and surrogate-based design optimization.

Webpage link: https://www.researchgate.net/profile/Haobo-Qiu/research


About the author

Liang Gao is a professor in the School of Mechanical Science and Engineering at Huazhong University of Science and Technology. His research interests include operations research and optimization, big data, and machine learning.

Webpage link: https://www.researchgate.net/profile/Liang-Gao-17/research


 

Reference

Liming Chen, Qingshan Wang, Chen Jiang, Haobo Qiu, Liang Gao, Exploring a full latent space for Gaussian process modeling with qualitative and quantitative factors, Mechanical Systems and Signal Processing, Volume 239, 2025, 113297.

