Abstract
Accurate segmentation of gliomas and their subregions in multimodal magnetic resonance imaging (MRI) plays a significant role in diagnosis, treatment planning, and disease monitoring. However, tumor heterogeneity, ambiguous boundaries, and modality-specific intensity variations make automatic segmentation highly challenging. This work presents SLAM-FusionNet, a transformer-based architecture that combines a Multi-Modal Fusion (MMF) strategy with a Spatial Local Attention Module (SLAM) to capture both local and global contextual information as well as fine-grained tumor detail. The MMF module improves cross-modality representation learning by merging the complementary features of the T1, T1ce, T2, and FLAIR sequences, while SLAM strengthens spatial localization by emphasizing spatially relevant, boundary-sensitive regions, which helps differentiate intra-tumor subregions. The network is built on a Swin Transformer backbone, whose primary strength is modeling long-range dependencies while preserving local spatial fidelity. Extensive experiments on the BraTS dataset show that SLAM-FusionNet achieves Dice scores of 95.6% for the whole tumor (WT), 96.2% for the tumor core (TC), and 94.8% for the enhancing tumor (ET), with an average Dice of 95.5%. The average HD95 is reduced to 3.95 mm, improving on state-of-the-art models such as Swin-UNet and nnU-Net. Ablation studies confirm the additive value of the MMF and SLAM modules. These results highlight the applicability and clinical potential of SLAM-FusionNet for computer-aided brain tumor segmentation in precision neuro-oncology.
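To make the two ideas in the abstract concrete, the sketch below shows a minimal NumPy analogue: early fusion stacks the four MRI modalities as channels, and a CBAM-style spatial attention gate pools across channels to produce a per-voxel weight map. Both functions are illustrative stand-ins for the paper's MMF and SLAM modules, not their actual implementations (the real modules are learned layers inside a Swin Transformer network).

```python
import numpy as np


def multimodal_fusion(t1, t1ce, t2, flair):
    """Early-fusion baseline: stack the four co-registered MRI
    modalities as input channels (hypothetical stand-in for MMF)."""
    return np.stack([t1, t1ce, t2, flair], axis=0)  # (4, H, W)


def spatial_local_attention(x):
    """CBAM-style spatial gate (hypothetical stand-in for SLAM):
    pool across channels, map each location to a weight in (0, 1),
    and rescale the feature map so salient regions are emphasized."""
    avg_pool = x.mean(axis=0, keepdims=True)  # (1, H, W)
    max_pool = x.max(axis=0, keepdims=True)   # (1, H, W)
    # A learned convolution would mix the pooled maps; we simply sum.
    logits = avg_pool + max_pool
    weights = 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> (1, H, W)
    return x * weights                        # broadcast over channels


fused = multimodal_fusion(*[np.random.rand(8, 8) for _ in range(4)])
gated = spatial_local_attention(fused)        # same shape as `fused`
```

In the actual network the pooling-and-sigmoid step is replaced by trainable layers, and the operation runs on 3D volumes rather than 2D slices; the broadcast multiply at the end is the essential reweighting mechanism in both cases.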
Keywords
Brain Tumor Segmentation, Glioma, Swin Transformer, Spatial Local Attention, Multi-Modal MRI, Deep Learning
References
- K. Kamnitsas, C. Ledig, V.F. Newcombe, J.P. Simpson, A.D. Kane, D.K. Menon, D. Rueckert, B. Glocker, Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis, 36, (2017) 61–78. https://doi.org/10.1016/j.media.2016.10.004
- F. Isensee, P.F. Jaeger, S.A.A. Kohl, J. Petersen, K.H. Maier-Hein, nnU-Net: A self-configuring method for deep learning–based biomedical image segmentation. Nature Methods, 18(2), (2021) 203–211. https://doi.org/10.1038/s41592-020-01008-z
- Z. Zhou, M.M.R. Siddiquee, N. Tajbakhsh, J. Liang (2018). UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, Cham. https://doi.org/10.1007/978-3-030-00889-5_1
- F. Milletari, N. Navab, S.A. Ahmadi (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), IEEE, USA. https://doi.org/10.1109/3DV.2016.79
- Z. Xiong, ResSAXU-Net for multimodal brain tumor segmentation from brain MRI. Scientific Reports, 15, (2025) 24179. https://doi.org/10.1038/s41598-025-09539-1
- A. Myronenko (2019). 3D MRI brain tumor segmentation using autoencoder regularization. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2018, Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-030-11726-9_28
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Canada. https://doi.org/10.1109/ICCV48922.2021.00986
- J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A.L. Yuille, Y. Zhou (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv. https://doi.org/10.48550/arXiv.2102.04306
- A. Hatamizadeh, Y. Tang, V. Nath, D. Yang, A. Myronenko, B. Landman, H.R. Roth, D. Xu (2022). UNETR: Transformers for 3D medical image segmentation. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, USA. https://doi.org/10.1109/WACV51458.2022.00181
- H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, M. Wang (2021). Swin-Unet: Unet-like pure transformer for medical image segmentation. Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-031-25066-8_9
- C. Simionescu, Medformer: A multitask multimodal foundational model for medical imaging. Procedia Computer Science, 270, (2025) 446–455. https://doi.org/10.1016/j.procs.2025.09.163
- A. Chartsias, T. Joyce, R. Dharmakumar, S.A. Tsaftaris (2019). Factorised representation learning in multi-modal medical image analysis. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2019, Springer, Cham. https://doi.org/10.1007/978-3-030-32245-8_4
- G. Wang, W. Li, S. Ourselin, T. Vercauteren (2017). Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2017. Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-319-75238-9_16
- O. Ronneberger, P. Fischer, T. Brox (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer, Cham. https://doi.org/10.1007/978-3-319-24574-4_28
- Ö. Çiçek, A. Abdulkadir, S.S. Lienkamp, T. Brox, O. Ronneberger (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_49
- J. Zhang, J. Zeng, P. Qin, L. Zhao, Brain tumor segmentation of multi-modality MR images via triple intersecting U-Nets. Neurocomputing, 426, (2021) 195–209. https://doi.org/10.1016/j.neucom.2020.09.016
- O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N.Y. Hammerla, B. Kainz, B. Glocker, D. Rueckert (2018). Attention U-Net: Learning where to look for the pancreas. arXiv. https://doi.org/10.48550/arXiv.1804.03999
- J. Schlemper, O. Oktay, M. Schaap, M. Heinrich, B. Kainz, B. Glocker, D. Rueckert, Attention gated networks: Learning to leverage salient regions in medical images. Medical Image Analysis, 53, (2019) 197–207. https://doi.org/10.1016/j.media.2019.01.012
- X. Wang, R. Girshick, A. Gupta, K. He (2018). Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, USA. https://doi.org/10.1109/CVPR.2018.00813
- J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, L. Lu (2019). Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, USA. https://doi.org/10.1109/CVPR.2019.00326
- S. Woo, J. Park, J.Y. Lee, I.S. Kweon (2018). CBAM: Convolutional block attention module. In European Conference on Computer Vision (ECCV), ECCV 2018. Lecture Notes in Computer Science, Springer, Cham. https://doi.org/10.1007/978-3-030-01234-2_1
- J. Hu, L. Shen, G. Sun (2018). Squeeze-and-excitation networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, USA. https://doi.org/10.1109/CVPR.2018.00745
- A.G. Roy, N. Navab, C. Wachinger (2018). Concurrent spatial and channel 'squeeze & excitation' in fully convolutional networks. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2018, Springer, Cham. https://doi.org/10.1007/978-3-030-00928-1_48
- S.E. Bekhouche, G. Maroun, F. Dornaika, A. Hadid (2025). SegDT: A diffusion transformer-based segmentation model for medical imaging. arXiv. https://doi.org/10.48550/arXiv.2507.15595
- H. Kuang, Y. Wang, X. Tan, J. Yang, J. Sun, J. Liu, W. Qiu, J. Zhang, J. Zhang, C. Yang, J. Wang, Y. Chen, LW-CTrans: A lightweight hybrid network of CNN and transformer for 3D medical image segmentation. Medical Image Analysis, 102, (2025) 103545. https://doi.org/10.1016/j.media.2025.103545
- Y.H. Xie, B.S. Huang, F. Li, UnetTransCNN: Integrating transformers with convolutional neural networks for enhanced medical image segmentation. Frontiers in Oncology, 15, (2025) 1467672. https://doi.org/10.3389/fonc.2025.1467672
- J. Zhang, Z. Ye, M. Chen, J. Yu, Y. Cheng, TransGraphNet: A novel network for medical image segmentation based on transformer and graph convolution. Biomedical Signal Processing and Control, 104, (2025) 107510. https://doi.org/10.1016/j.bspc.2025.107510
- X. Liu, J. Tian, S. Huang, W. Shen (2025). Enhancing medical image segmentation via complementary CNN-transformer fusion and boundary perception. Frontiers in Computer Science, 7, 1677905. https://doi.org/10.3389/fcomp.2025.1677905
- L. Xu, A. Halike, G. Sen, M. Sha (2025). Medical image segmentation model based on local enhancement driven global optimization. Scientific Reports, 15, 18281. https://doi.org/10.1038/s41598-025-02393-1