Abstract
Knowledge-rich fields rely on multimodal documents, which call for analysis systems that can assess document content and support research exploration. This work introduces a unified system that combines multimodal Retrieval-Augmented Generation (RAG) with automated arXiv research discovery, delivered through a user-friendly Streamlit web application. On a diverse Portable Document Format (PDF) dataset, the system achieves an average faithfulness of 0.86 and an answer relevancy of 0.81. A hierarchical retrieval architecture improves the retention of contextual content, and the complete document ingestion process handles 15 pages within 12 seconds, outperforming MultiModal-GPT and the other evaluated systems by 33 percent. The automated arXiv integration also returns paper recommendations with a relevance confidence of 80% in under 3 seconds. The streaming conversational interface, built with LangChain Expression Language (LCEL), makes the system a scalable, high-performance, real-world application of conversational AI that emphasizes fast responsiveness and user-friendly multimodal document analysis.
Keywords
ArXiv Integration, Conversational AI, Hierarchical Retrieval, Multimodal RAG, RAGAS Evaluation, Research Discovery
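
The abstract names a hierarchical retrieval architecture but does not describe its mechanism. The sketch below shows one common reading of the term, parent-child retrieval: small child chunks are matched against the query, and the full parent section is returned so that surrounding context is kept. Everything in it is an assumption for illustration, not the authors' implementation: the helper names (`embed`, `cosine`, `build_index`, `retrieve`), the `chunk_words` parameter, the sample sections, and the bag-of-words vectors that stand in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(sections, chunk_words=8):
    """Split each parent section into small child chunks that remember their parent."""
    index = []
    for parent_id, text in enumerate(sections):
        words = text.split()
        for start in range(0, len(words), chunk_words):
            chunk = " ".join(words[start:start + chunk_words])
            index.append((parent_id, embed(chunk)))
    return index


def retrieve(query, sections, index, top_k=1):
    """Rank child chunks by similarity, then return the distinct parent sections."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    parent_ids = []
    for parent_id, _ in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) == top_k:
            break
    return [sections[i] for i in parent_ids]


# Hypothetical sample sections, invented for this sketch.
sections = [
    "The encoder maps page images and text blocks into a shared vector space. "
    "Tables are stored as separate elements.",
    "Evaluation uses faithfulness and answer relevancy scores computed over a "
    "set of PDF questions.",
]
index = build_index(sections)
print(retrieve("how are tables stored", sections, index))
```

Matching on small chunks keeps the similarity score focused on a specific passage, while returning the parent section gives the generator more surrounding context. That trade-off is the usual motivation for this pattern, though the paper's own design may differ.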

