Abstract

Heart auscultation is a key diagnostic tool for detecting cardiac abnormalities; however, human interpretation is subjective and prone to error. Recurrent models such as LSTMs and BiLSTMs have been used for computer-aided heart sound classification, but they struggle with acoustic variation, data sparsity, and long-range correlations in spectrograms. Standalone Vision Transformers (ViTs) improve feature extraction but need large datasets to perform well. This article introduces a hybrid model that combines a Generative Adversarial Network (GAN) with a Vision Transformer (ViT) to address these issues: GAN-based data augmentation increases the diversity of the training set, and the ViT's self-attention mechanism improves the interpretation of spectrograms. The data, collected through the iStethoscope Pro app and clinical testing with DigiScope, comprised normal, murmur, and artifact classes. Preprocessing included silence trimming, resampling, and extraction of MFCCs, spectral contrast, chroma features, and root-mean-square energy (RMSE). The proposed GAN+ViT model was compared with BiLSTM, LSTM, and standalone ViT baselines. GAN+ViT outperformed all baselines, with 90% accuracy, a 0.90 F1-score, 0.91 precision, 0.89 recall, and AUC-ROC values of 0.92 for artifacts, 0.93 for murmurs, and 0.91 for normal sounds. By contrast, BiLSTM (85%), LSTM (83%), and ViT (80%) reached lower accuracy and had particular difficulty discriminating between murmurs and normal sounds. The improved classification performance of the hybrid model is attributed to the combination of data augmentation and attention-based feature learning, which reduces misclassifications. These results suggest that GAN+ViT is a viable method for automated analysis of cardiac sounds, with high accuracy and generalizability for clinical applications. Future research could explore multimodal integration with ECG data and employ explainable AI methods to enhance diagnostic consistency.
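
As a rough illustration of two of the preprocessing steps named above (silence trimming and frame-wise root-mean-square energy), the following is a minimal sketch in plain Python. It is not the authors' pipeline: the function names, the frame length, the hop size, and the energy threshold are all hypothetical choices made for the example, and a real implementation would more likely rely on an audio library.

```python
import math


def frame_rms(signal, frame_len=4, hop=4):
    """Return the root-mean-square energy of each full frame.

    frame_len and hop are illustrative values, not taken from the article.
    """
    return [
        math.sqrt(sum(x * x for x in signal[i:i + frame_len]) / frame_len)
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def trim_silence(signal, frame_len=4, hop=4, threshold=0.05):
    """Drop leading and trailing frames whose RMS is below threshold.

    The threshold is an assumed value; the article does not state one.
    """
    rms = frame_rms(signal, frame_len, hop)
    active = [i for i, r in enumerate(rms) if r >= threshold]
    if not active:
        return []  # the whole signal is treated as silence
    start = active[0] * hop
    end = active[-1] * hop + frame_len
    return signal[start:end]


# Toy signal: silence, a short burst, then silence again.
toy = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
trimmed = trim_silence(toy)
```

Here `trimmed` keeps only the four-sample burst, and `frame_rms(toy)` gives one energy value per frame, which is the kind of per-frame feature that can be stacked alongside MFCCs, spectral contrast, and chroma features.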

Keywords

Heart sound classification, Generative Adversarial Network (GAN), Vision Transformer (ViT), Data augmentation, Mel-Frequency Cepstral Coefficients (MFCCs)
