Paramps: Tensor-Decomposed Convolutional Networks for Robust Heart Sound Classification and Cardiovascular Diagnosis

Significance 

Cardiovascular disease is the largest contributor to mortality worldwide and accounts for the largest share of deaths linked to chronic illness. In many low- and middle-income countries, a shortage of trained physicians makes even basic diagnosis a challenge, and auscultation with a stethoscope remains the first line of screening. Anyone who has spent time in a noisy ward knows how difficult it can be to pick out a faint murmur. Errors are therefore inevitable, and for conditions like valve disease or early arrhythmias, such errors can delay intervention at exactly the wrong time. Heart sounds themselves are rich in information. They capture valve motion and subtle changes in flow, but they are notoriously hard to interpret. Murmurs from regurgitation or stenosis, while textbook examples, rarely appear clean in recordings. They are distorted by body habitus, breathing, or simply the design of the stethoscope. Researchers initially leaned on filters and wavelet transforms to clean these signals, and while those methods helped to a degree, they never really solved the problem of extracting discriminative features from highly non-stationary data. Later, classifiers such as SVMs were brought in. They pushed accuracy up, but at the price of heavy computation and a limited ability to generalize beyond the datasets on which they were trained.

The advent of convolutional neural networks changed the conversation. By converting phonocardiograms into spectrograms, CNNs made it possible to learn features that human engineers could not have designed. Their success is real, but the practical obstacles are equally real: millions of parameters to optimize, sensitivity to recording conditions, and computational costs that rule out use in resource-constrained clinics. Ironically, the settings that would benefit most are the least able to run these models. To address this, a new research paper published in Signal Processing by Lin Duan, Professor Lidong Yang, and Dr. Yong Guo of the Inner Mongolia University of Science and Technology presents two models: a parallel CNN baseline (para-CNN) and a tensor-decomposed extension (para-MPS), which was later enhanced with an attention mechanism to form the final Paramps model. The novelty lies in embedding matrix product state (MPS) tensor decomposition within the CNN layers, which substantially reduces redundant parameters while preserving discriminative capacity.
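
To make the parameter saving concrete, the following sketch counts the weights of a dense layer against those of the same layer stored as a matrix product state (also known as a tensor train). It is an illustration only: the function name `mps_param_count`, the 1024 x 1024 layer, the mode factorization, and the bond dimension of 8 are assumptions chosen for the example, not values reported by the authors.

```python
from math import prod


def mps_param_count(in_modes, out_modes, bond_dims):
    """Count the parameters of an MPS-factorized weight matrix.

    The dense matrix has shape (prod(in_modes), prod(out_modes)).
    Core k has shape (r[k], in_modes[k], out_modes[k], r[k + 1]),
    where the boundary ranks r[0] and r[-1] are fixed to 1.
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(bond_dims) != len(in_modes) - 1:
        raise ValueError("need exactly one bond dimension between adjacent cores")
    ranks = [1, *bond_dims, 1]
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical layer: 1024 inputs and 1024 outputs, each split as 4 * 8 * 8 * 4.
in_modes = (4, 8, 8, 4)
out_modes = (4, 8, 8, 4)

dense_params = prod(in_modes) * prod(out_modes)
mps_params = mps_param_count(in_modes, out_modes, bond_dims=(8, 8, 8))

print(dense_params)  # 1048576
print(mps_params)    # 8448
```

The saving in this toy layer is far larger than the overall reduction reported for para-MPS, because a full network also contains layers that are not factorized. Which layers the authors decomposed, and at what bond dimensions, is not stated in this summary.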

The authors began their experimental pipeline by converting heart sound recordings into Mel spectrograms, a representation that preserves perceptually meaningful frequency information. Pre-emphasis, framing, windowing, Fourier transformation, and Mel filtering standardized the input data, so that the models could learn from consistent, high-resolution features. The team first tested their models on the widely used 2016 PhysioNet/CinC Challenge dataset, which contains over 3,000 labeled recordings gathered in both clinical and noisy environments. This dataset posed a significant hurdle because of its imbalanced class distribution, with far fewer abnormal cases than normal ones. Training used an 8:2 train-test split, a batch size of 32, and the Adam optimizer with a learning rate of 0.0001, with cross-entropy loss minimized over 300 epochs. The baseline para-CNN achieved 94.6% accuracy, while the para-MPS model, benefiting from tensor decomposition, reduced the parameter count from 6.41 million to 4.74 million (about 26% fewer) and improved accuracy to 96.1%. Importantly, specificity reached 98.7%, indicating that the model kept false alarms low, a critical requirement for clinical deployment. The researchers then integrated an attention mechanism to create the Paramps model. This refinement allowed the network to selectively emphasize informative regions within the spectrograms. In head-to-head comparisons, Paramps achieved an accuracy of 96.4%, with specificity rising to 99.1%. Sensitivity and F1-score also showed modest but consistent gains, pointing to improved reliability across both positive and negative cases. In subset analyses, the model excelled on most data groups, though some difficulty remained in subsets with highly imbalanced distributions, a limitation attributed to insufficient abnormal samples.
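
The preprocessing chain named at the start of this section (pre-emphasis, framing, windowing, Fourier transformation, Mel filtering) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the 2000 Hz sampling rate, the frame length, hop size, number of Mel bands, Hamming window, and pre-emphasis coefficient are all assumed values.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to sr / 2."""
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=2000, frame_len=256, hop=128,
                        n_fft=256, n_mels=32, pre_emph=0.97):
    """Return a (n_frames, n_mels) log-Mel spectrogram in decibels."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # 1. Pre-emphasis: boost high frequencies with a first-order difference.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing: overlapping frames; trailing samples that do not fill a frame are dropped.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # 3. Windowing: taper each frame with a Hamming window.
    frames = frames * np.hamming(frame_len)
    # 4. Fourier transformation: power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 5. Mel filtering, then log compression (small offset avoids log of zero).
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return 10.0 * np.log10(mel_energy + 1e-10)


# Usage on a synthetic one-second, 100 Hz tone (a stand-in for a real recording).
t = np.arange(2000) / 2000.0
spec = log_mel_spectrogram(np.sin(2.0 * np.pi * 100.0 * t))
print(spec.shape)  # (14, 32)
```

The resulting two-dimensional array is the kind of image-like input a CNN can consume. In practice a library such as librosa would typically be used instead of hand-written filters.
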
They then performed a second evaluation on the Yaseen open heart sound dataset, which contains balanced samples across five categories: normal, aortic stenosis, mitral stenosis, mitral regurgitation, and mitral valve prolapse. Here, Paramps achieved 99.2% accuracy and 99.8% specificity. The confusion matrices showed nearly perfect classification across categories, surpassing benchmark models such as WaveNet, multiclass composite classifiers, and traditional CNNs. Ablation studies confirmed that tensor decomposition and the attention mechanism both contribute: adding them raised accuracy by 1.5 and 0.3 percentage points, respectively.
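
For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and F1-score follow from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts in the usage example are made up for illustration; they are not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute common screening metrics from binary confusion-matrix counts.

    tp: abnormal recordings correctly flagged
    fn: abnormal recordings missed
    tn: normal recordings correctly passed
    fp: normal recordings wrongly flagged (false alarms)
    """
    def ratio(num, den):
        # Return 0.0 instead of failing when a class is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "sensitivity": ratio(tp, tp + fn),   # recall on abnormal cases
        "specificity": ratio(tn, tn + fp),   # recall on normal cases
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical test set: 100 abnormal and 500 normal recordings.
metrics = binary_metrics(tp=90, fn=10, tn=495, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced data such as this hypothetical set, accuracy is dominated by the majority (normal) class, which is why specificity and sensitivity are reported separately.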

The findings of Duan, Yang, and Guo establish Paramps as both a technically elegant architecture and a clinically meaningful advance. Its ability to maintain high accuracy on recordings gathered in noisy, real-world conditions distinguishes it from earlier CNN-based approaches, whose performance typically degraded outside controlled environments. The implications of this work extend beyond incremental performance gains. By integrating tensor decomposition into CNNs, the researchers have introduced a new approach to biomedical audio analysis. Parameter efficiency matters for practical adoption in clinics, rural health stations, and portable diagnostic devices, and Paramps, which reduces complexity without sacrificing fidelity, aligns with the global push toward equitable and accessible healthcare technologies.

What stands out to me in these results is not simply the accuracy but the very high specificity. In practice, a false positive is never just a number: it means extra imaging, repeat clinic visits, and a fair amount of stress for the patient. Specificity above 99% on two separate datasets is unusual, and it suggests that the model is doing more than overfitting to neat examples. Sensitivity is also important, of course, since missing a genuine murmur or arrhythmic case has obvious clinical consequences. Both measures together give some confidence that this approach might be useful outside of the lab. Still, there are issues left open. The imbalance in available datasets is real and can distort performance on rare conditions, so strategies like oversampling or synthetic augmentation will probably be necessary. Longer term, applying tensor-CNN hybrids to other biosignals, such as ECG and respiratory recordings, could be an even more telling test.

About the authors

Lin Duan is a master’s student at the School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology. Her research interests lie in audio signal processing, audio recognition, and audio scene classification.

Professor Lidong Yang holds a doctorate and is a Professor and Master’s Supervisor at the School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology. He has been recognized as a “Grassland Talent”, a young innovative talent of the Inner Mongolia Autonomous Region. His main research fields are audio signal processing and pattern recognition.

Yong Guo holds a doctorate and is an Associate Professor and Master’s Supervisor at the School of Science, Inner Mongolia University of Science and Technology. His main research fields include mathematical theories and methods in signal processing (fractional Fourier transform, time-frequency analysis).

Reference

Lin Duan, Lidong Yang, Yong Guo, Paramps: Convolutional neural networks based on tensor decomposition for heart sound signal analysis and cardiovascular disease diagnosis, Signal Processing, Volume 227, 2025, 109716.
