A Multimodal RAG and ArXiv-Integrated Conversational Framework for Automated Research Discovery

Ashwini Dalvi; Suswar Sawant; Amaan Syed; Irfan Siddavatam; Venkataramanan V

doi:10.54392/irjmt2626

Department of Information Technology, KJ Somaiya School of Engineering (formerly KJ Somaiya College of Engineering), Somaiya Vidyavihar University, Mumbai, 400077, Maharashtra, India

Abstract

Knowledge-rich fields rely on multimodal documents, which require advanced analysis systems to assess document content and support research exploration. This work introduces a unified system that combines Multimodal Retrieval-Augmented Generation (RAG) with automated arXiv research discovery, delivered through a user-friendly web application built on Streamlit. On a diverse Portable Document Format (PDF) dataset, the system achieves an average faithfulness of 0.86 and an answer relevancy of 0.81. A hierarchical retrieval architecture improves contextual content retention, and the complete document ingestion process handles 15 pages within 12 seconds, outperforming MultiModal-GPT and the other evaluated systems by 33 percent. The automated arXiv integration also returns paper recommendations with a relevance confidence of 80% in under 3 seconds. The streaming conversational interface, built with LangChain Expression Language (LCEL), makes the system a scalable, high-performance, real-world application of Conversational AI that emphasizes fast responsiveness and user-friendly multimodal document analysis.
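To make the arXiv discovery step concrete, the following is a minimal Python sketch of how keyword-driven paper lookup against the public arXiv API could work. It is not the authors' implementation: the function names, the sample feed, and the keyword-overlap score are all hypothetical, and the overlap score is only a simple stand-in for the relevance confidence reported in the abstract. The sketch builds the query URL and parses an Atom response, and it leaves the actual HTTP request to the caller.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Public arXiv API endpoint; responses are Atom feeds.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, hand-written feed used only to exercise the parser offline.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>Placeholder title on
      multimodal retrieval</title>
    <summary>Placeholder abstract about retrieval-augmented generation.</summary>
  </entry>
</feed>"""


def build_arxiv_query_url(keywords, max_results=5):
    """Build a query URL that searches all arXiv fields for every keyword."""
    search_query = " AND ".join(f'all:"{kw}"' for kw in keywords)
    params = {"search_query": search_query, "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(atom_xml):
    """Extract id, title and summary from each entry of an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        papers.append({
            "id": entry.findtext("atom:id", "", ATOM_NS).strip(),
            # Collapse the line breaks arXiv inserts into long titles/abstracts.
            "title": " ".join(entry.findtext("atom:title", "", ATOM_NS).split()),
            "summary": " ".join(entry.findtext("atom:summary", "", ATOM_NS).split()),
        })
    return papers


def keyword_overlap(keywords, paper):
    """Assumed scoring: fraction of keywords found in the title or summary."""
    if not keywords:
        return 0.0
    text = f"{paper['title']} {paper['summary']}".lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)
```

In a real pipeline the URL from `build_arxiv_query_url` would be fetched (for example with `urllib.request.urlopen`) and the response body passed to `parse_arxiv_feed`. A production system would likely replace the keyword overlap with an embedding-based similarity.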

Keywords

ArXiv Integration, Conversational AI, Hierarchical Retrieval, Multimodal RAG, RAGAS Evaluation, Research Discovery

Article Details

Volume 8, Issue 2, Year 2026

Published 2026-03-11

How to Cite

Ashwini Dalvi, Suswar Sawant, Amaan Syed, Irfan Siddavatam, and Venkataramanan V. 2026. “A Multimodal RAG and ArXiv-Integrated Conversational Framework for Automated Research Discovery”. International Research Journal of Multidisciplinary Technovation 8 (2):109-20. https://doi.org/10.54392/irjmt2626.