Abstract

Accurate segmentation of gliomas and their subregions in multimodal magnetic resonance imaging (MRI) plays a significant role in diagnosis, treatment planning, and disease surveillance. However, tumour heterogeneity, ambiguous boundaries, and modality-specific intensity variations make the task highly challenging. This work presents SLAM-FusionNet, a transformer-based architecture that combines a Multi-Modal Fusion (MMF) strategy with a Spatial Local Attention Module (SLAM) to capture both fine-grained local tumor detail and global contextual information. The MMF module strengthens cross-modality representation learning by merging the complementary characteristics of T1, T2, FLAIR, and T1ce images, while SLAM improves spatial localization by emphasizing spatially relevant, boundary-sensitive regions, which better differentiates intra-tumor subregions. The proposed network is built on a Swin Transformer backbone, whose primary strength is modeling long-range dependencies while preserving local spatial fidelity. Extensive experiments on the BraTS dataset demonstrate that SLAM-FusionNet achieves Dice scores of 95.6% for whole tumor (WT), 96.2% for tumor core (TC), and 94.8% for enhancing tumor (ET), for an average Dice of 95.5%. The average HD95 is also reduced to 3.95 mm, improving on state-of-the-art models such as Swin-UNet and nnU-Net. Ablation studies confirm the complementary contributions of MMF and SLAM. These results highlight the applicability and clinical potential of SLAM-FusionNet for computer-aided brain tumor segmentation in precision neuro-oncology.
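The Dice score and HD95 reported above follow the standard definitions used in BraTS evaluation: Dice measures volumetric overlap between predicted and ground-truth masks, while HD95 is the 95th percentile of symmetric boundary-to-boundary distances. A minimal pure-Python sketch of both metrics (the toy inputs are illustrative, not from the paper):

```python
import math

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks (flat 0/1 lists)."""
    inter = sum(p and g for p, g in zip(pred, gt))  # |A ∩ B|
    total = sum(pred) + sum(gt)                     # |A| + |B|
    return 2.0 * inter / total if total else 1.0

def hd95(points_a, points_b):
    """95th-percentile symmetric Hausdorff distance between two point sets
    (e.g. boundary voxel coordinates as (x, y) or (x, y, z) tuples)."""
    def directed(src, dst):
        # distance from each point in src to its nearest point in dst
        return [min(math.dist(p, q) for q in dst) for p in src]

    dists = sorted(directed(points_a, points_b) + directed(points_b, points_a))
    idx = min(len(dists) - 1, math.ceil(0.95 * len(dists)) - 1)
    return dists[idx]

# toy example: two overlapping 1-D masks
pred = [0, 1, 1, 1, 0]
gt   = [0, 0, 1, 1, 1]
print(round(dice(pred, gt), 3))  # 2*2 / (3+3) → 0.667
```

In practice HD95 is computed over full 3-D segmentation boundaries; toolkits typically use distance transforms rather than the brute-force pairwise search shown here, which is quadratic in the number of boundary points.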

Keywords

Brain Tumor Segmentation, Glioma, Swin Transformer, Spatial Local Attention, Multi-Modal MRI, Deep Learning

References

  1. K. Kamnitsas, C. Ledig, V.F. Newcombe, J.P. Simpson, A.D. Kane, D.K. Menon, D. Rueckert, B. Glocker, Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis, 36, (2017) 61–78. https://doi.org/10.1016/j.media.2016.10.004
  2. F. Isensee, P.F. Jaeger, S.A.A. Kohl, J. Petersen, K.H. Maier-Hein, nnU-Net: A self-configuring method for deep learning–based biomedical image segmentation. Nature Methods, 18(2), (2021) 203–211. https://doi.org/10.1038/s41592-020-01008-z
  3. Z. Zhou, M.M.R. Siddiquee, N. Tajbakhsh, J. Liang (2018). UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, Cham. https://doi.org/10.1007/978-3-030-00889-5_1
  4. F. Milletari, N. Navab, S.A. Ahmadi (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), IEEE, USA. https://doi.org/10.1109/3DV.2016.79
  5. Z. Xiong, ResSAXU-Net for multimodal brain tumor segmentation from brain MRI. Scientific Reports, 15, (2025) 24179. https://doi.org/10.1038/s41598-025-09539-1
  6. A. Myronenko (2019). 3D MRI brain tumor segmentation using autoencoder regularization. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2018, Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-030-11726-9_28
  7. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv.
  8. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Canada. https://doi.org/10.1109/ICCV48922.2021.00986
  9. J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A.L. Yuille, Y. Zhou (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv. https://doi.org/10.48550/arXiv.2102.04306
  10. A. Hatamizadeh, Y. Tang, V. Nath, D. Yang, A. Myronenko, B. Landman, H.R. Roth, D. Xu (2022). UNETR: Transformers for 3D medical image segmentation. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, USA. https://doi.org/10.1109/WACV51458.2022.00181
  11. H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, M. Wang (2021). Swin-Unet: Unet-like pure transformer for medical image segmentation. Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-031-25066-8_9
  12. C. Simionescu, Medformer: A multitask multimodal foundational model for medical imaging. Procedia Computer Science, 270, (2025) 446–455. https://doi.org/10.1016/j.procs.2025.09.163
  13. A. Chartsias, T. Joyce, R. Dharmakumar, S.A. Tsaftaris (2019). Factorised representation learning in multi-modal medical image analysis. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2019, Springer, Cham. https://doi.org/10.1007/978-3-030-32245-8_4
  14. G. Wang, W. Li, S. Ourselin, T. Vercauteren, (2017) Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2017. Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-319-75238-9_16
  15. O. Ronneberger, P. Fischer, T. Brox (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer, Cham. https://doi.org/10.1007/978-3-319-24574-4_28
  16. Ö. Çiçek, A. Abdulkadir, S.S. Lienkamp, T. Brox, O. Ronneberger (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_49
  17. J. Zhang, J. Zeng, P. Qin, L. Zhao, Brain tumor segmentation of multi-modality MR images via triple intersecting U-Nets. Neurocomputing, 426, (2021) 195–209. https://doi.org/10.1016/j.neucom.2020.09.016
  18. O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N.Y. Hammerla, B. Kainz, B. Glocker, D. Rueckert (2018). Attention U-Net: Learning where to look for the pancreas. arXiv. https://doi.org/10.48550/arXiv.1804.03999
  19. J. Schlemper, O. Oktay, M. Schaap, M. Heinrich, B. Kainz, B. Glocker, D. Rueckert, Attention gated networks: Learning to leverage salient regions in medical images. Medical image analysis, 53, (2019) 197-207. https://doi.org/10.1016/j.media.2019.01.012
  20. X. Wang, R. Girshick, A. Gupta, K. He (2018). Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, USA. https://doi.org/10.1109/CVPR.2018.00813
  21. J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, L. Lu (2019). Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, USA. https://doi.org/10.1109/CVPR.2019.00326
  22. S. Woo, J. Park, J.Y. Lee, I.S. Kweon (2018). CBAM: Convolutional block attention module. In European Conference on Computer Vision (ECCV), ECCV 2018. Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-030-01234-2_1
  23. J. Hu, L. Shen, G. Sun (2018). Squeeze-and-excitation networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, USA. https://doi.org/10.1109/CVPR.2018.00745
  24. A.G. Roy, N. Navab, C. Wachinger (2018). Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2018, Springer, Cham. https://doi.org/10.1007/978-3-030-00928-1_48
  25. S.E. Bekhouche, G. Maroun, F. Dornaika, A. Hadid (2025). SegDT: A diffusion transformer-based segmentation model for medical imaging. arXiv. https://doi.org/10.48550/arXiv.2507.15595
  26. H. Kuang, Y. Wang, X. Tan, J. Yang, J. Sun, J. Liu, W. Qiu, J. Zhang, J. Zhang, C. Yang, J. Wang, Y. Chen, LW-CTrans: A lightweight hybrid network of CNN and transformer for 3D medical image segmentation. Medical Image Analysis, 102, (2025) 103545. https://doi.org/10.1016/j.media.2025.103545
  27. Y.H. Xie, B.S. Huang, F. Li, UnetTransCNN: Integrating transformers with convolutional neural networks for enhanced medical image segmentation. Frontiers in Oncology, 15, (2025) 1467672. https://doi.org/10.3389/fonc.2025.1467672
  28. J. Zhang, Z. Ye, M. Chen, J. Yu, Y. Cheng, TransGraphNet: A novel network for medical image segmentation based on transformer and graph convolution. Biomedical Signal Processing and Control, 104, (2025) 107510. https://doi.org/10.1016/j.bspc.2025.107510
  29. X. Liu, J. Tian, S. Huang, W. Shen, Enhancing medical image segmentation via complementary CNN-transformer fusion and boundary perception. Frontiers in Computer Science, 7, (2025) 1677905. https://doi.org/10.3389/fcomp.2025.1677905
  30. L. Xu, A. Halike, G. Sen, M. Sha, Medical image segmentation model based on local enhancement driven global optimization. Scientific Reports, 15, (2025) 18281. https://doi.org/10.1038/s41598-025-02393-1