Optimized Vision Transformer Architecture for Cardiac Auscultation Classification using GAN augmented MFCC representations

Divya Lalita Sri Jalligampala; Gangadhara Rao Kancharla; Lalitha R.V.S

doi:10.54392/irjmt2567

Department of Computer Science Engineering, University College of Sciences, Acharya Nagarjuna University, Nagarjuna Nagar, Guntur, Andhra Pradesh, 522510, India.

Department of Computer Science Engineering, Aditya University, Surampalem, Andhra Pradesh, 533447, India.

Abstract

Heart auscultation is a key diagnostic tool for detecting cardiac abnormalities; however, human interpretation is subjective and prone to error. Recurrent neural networks such as LSTMs and BiLSTMs have been employed for computer-aided heart sound classification but struggle with acoustic variation, data sparsity, and long-range correlations in spectrograms. Standalone Vision Transformers (ViTs) improve feature extraction but require large datasets to perform well. This article introduces a hybrid model that combines a Generative Adversarial Network (GAN) with a Vision Transformer (ViT) to address these issues: GAN-based data augmentation increases training diversity, and the ViT's self-attention mechanism improves the interpretation of spectrograms. The data, collected with the iStethoscope Pro app and in clinical testing with DigiScope, comprised normal, murmur, and artifact classes. Preprocessing included silence trimming, resampling, and extraction of MFCCs, spectral contrast, chroma features, and root-mean-square energy (RMSE). The proposed GAN+ViT model was compared with BiLSTM, LSTM, and standalone ViT baselines. GAN+ViT outperformed all baseline models, with 90% accuracy, 0.90 F1-score, 0.91 precision, and 0.89 recall, and AUC-ROC values of 0.92 for artifacts, 0.93 for murmurs, and 0.91 for normal sounds. By contrast, BiLSTM (85%), LSTM (83%), and ViT (80%) performed worse, particularly in discriminating between murmurs and normal sounds. The improved classification power of the hybrid model is attributed to the combination of data augmentation and attention-based feature learning, which reduces misclassifications. These results suggest that GAN+ViT is a viable method for automated analysis of cardiac sounds, with high accuracy and generalizability for clinical applications. Future research could explore multimodal integration with ECG data and employ explainable AI methods to enhance diagnostic consistency.
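
To make the first preprocessing steps named in the abstract more concrete, the following is a minimal sketch of trimming silence, resampling, and computing frame-wise RMS energy in plain Python. It is not the authors' pipeline: the function names, frame length, hop size, energy threshold, and sample rates are all illustrative assumptions, the resampler uses simple linear interpolation without an anti-aliasing filter, and the input is a synthetic tone rather than a heart sound recording. MFCC, spectral contrast, and chroma extraction are not shown; in practice these would usually come from an audio library.

```python
import math

def frame_rms(signal, frame_len=256, hop=128):
    """Root-mean-square energy of each frame of a 1-D list of samples."""
    if not signal:
        return []
    rms = []
    last_start = max(len(signal) - frame_len, 0)
    for start in range(0, last_start + 1, hop):
        frame = signal[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms

def trim_silence(signal, frame_len=256, hop=128, threshold=0.01):
    """Drop leading and trailing frames whose RMS is below `threshold` (assumed value)."""
    rms = frame_rms(signal, frame_len, hop)
    active = [i for i, e in enumerate(rms) if e >= threshold]
    if not active:
        return []
    start = active[0] * hop
    end = min(active[-1] * hop + frame_len, len(signal))
    return signal[start:end]

def resample_linear(signal, src_rate, dst_rate):
    """Resample by linear interpolation (illustrative only, no anti-aliasing)."""
    if not signal:
        return []
    n_out = max(1, round(len(signal) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        lo = min(int(pos), len(signal) - 1)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

# Synthetic input at an assumed 4 kHz rate: silence, a 100 Hz tone, silence.
rate = 4000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
recording = [0.0] * 2000 + tone + [0.0] * 2000

trimmed = trim_silence(recording)                 # remove silent head and tail
resampled = resample_linear(trimmed, rate, 2000)  # downsample to an assumed 2 kHz
energy = frame_rms(resampled)                     # one RMS value per frame
```

Because trimming works on whole frames, a short margin of silence on either side of the active region survives; a smaller hop size would tighten that margin at the cost of more frames to evaluate.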

Keywords

Heart sound classification, Generative Adversarial Network (GAN), Vision Transformer (ViT), Data augmentation, Mel-Frequency Cepstral Coefficients (MFCCs)

Article Details

Volume 7, Issue 6, Year 2025

Published 2025-11-04

How to Cite

Jalligampala, Divya Lalita Sri, Gangadhara Rao Kancharla, and Lalitha R.V.S. 2025. “Optimized Vision Transformer Architecture for Cardiac Auscultation Classification Using GAN Augmented MFCC Representations”. International Research Journal of Multidisciplinary Technovation 7 (6):114-28. https://doi.org/10.54392/irjmt2567.

Copyrights & License

This work is licensed under a Creative Commons Attribution 4.0 International License.