Improving Medical Image Captioning with a Context-Aware Knowledge Graph Transformer Framework

Aarti Sahitya; Shilpa Shinde

doi:10.54392/irjmt25510

Department of Computer Engineering, Ramrao Adik Institute of Technology, D. Y. Patil Deemed to be University, Nerul 400706, India

Abstract

In this paper, we propose a context-aware knowledge graph transformer framework for improving the captioning of chest X-ray images. A radiologist typically interprets a chest X-ray or MRI image and summarizes the observed findings in a detailed report. To generate such a detailed summary automatically, the proposed framework is divided into three steps. The first step extracts visual features from the images using convolutional neural network models such as ResNet-50 and AlexNet. The second step employs a knowledge graph layer that calculates the similarity between tokens based on angle (cosine similarity) and token overlap (Jaccard similarity) to derive a context-aware meaning for each token. The third step uses a transformer-based decoder to generate the detailed caption. The performance of the proposed model is compared against baseline architectures, including LSTM, Conv2D, and Bi-LSTM. The proposed model outperforms these baselines, achieving scores of 63% (BLEU-1), 61% (BLEU-4), 79% (RIBES), 85% (precision), 82% (recall), 82% (SPICE), and 79% (METEOR), demonstrating its effectiveness in medical text summarization.
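
The abstract does not give the exact formula used by the knowledge graph layer, so the following is only a minimal sketch of the general idea under stated assumptions: each token is assumed to have an embedding vector (compared by the angle between vectors, i.e. cosine similarity) and a set of context words (compared by overlap, i.e. Jaccard similarity). The linear blend with weight `alpha`, the edge `threshold`, the function names, and the toy vectors are all hypothetical and for illustration only; they are not the authors' implementation or data.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def build_token_graph(embeddings, contexts, alpha=0.5, threshold=0.5):
    """Hypothetical graph construction: link two tokens when
    alpha * cosine + (1 - alpha) * jaccard >= threshold.
    `alpha` and `threshold` are illustrative, not values from the paper."""
    edges = {}
    for t1, t2 in combinations(sorted(embeddings), 2):
        score = (alpha * cosine_similarity(embeddings[t1], embeddings[t2])
                 + (1 - alpha) * jaccard_similarity(contexts[t1], contexts[t2]))
        if score >= threshold:
            edges[(t1, t2)] = round(score, 3)
    return edges


# Toy inputs (made up for illustration): 3-d "embeddings" and context-word sets.
embeddings = {
    "effusion": [0.9, 0.1, 0.0],
    "pleural": [0.8, 0.2, 0.1],
    "heart": [0.0, 0.3, 0.9],
}
contexts = {
    "effusion": {"pleural", "left", "small"},
    "pleural": {"effusion", "left", "small"},
    "heart": {"size", "normal"},
}

edges = build_token_graph(embeddings, contexts)
print(edges)  # {('effusion', 'pleural'): 0.742}
```

In this toy case only "effusion" and "pleural" are linked: their vectors point in nearly the same direction (cosine of about 0.98) and they share two of four context words (Jaccard = 0.5). A weighted sum is just one simple way to merge the angle-based and overlap-based signals; the paper's actual combination rule is not stated in this abstract.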

Keywords

Medical Image Captioning, Knowledge Graph, Transformers, Cosine Similarity, Jaccard Similarity

Article Details

Volume 7, Issue 5, Year 2025

Published 2025-09-28

How to Cite

Sahitya, Aarti, and Shilpa Shinde. 2025. “Improving Medical Image Captioning With a Context-Aware Knowledge Graph Transformer Framework”. International Research Journal of Multidisciplinary Technovation 7 (5):150-68. https://doi.org/10.54392/irjmt25510.