Abstract

Knowledge-rich fields rely on multimodal documents, which call for analysis systems that can assess document content and support research exploration. This work introduces a unified system that combines multimodal Retrieval-Augmented Generation (RAG) with automated arXiv research discovery, delivered through a user-friendly Streamlit web application. On a diverse Portable Document Format (PDF) dataset, the system achieves an average faithfulness of 0.86 and an answer relevancy of 0.81. A hierarchical retrieval architecture improves retention of contextual content, and the complete document ingestion process handles 15 pages within 12 seconds, outperforming MultiModal-GPT and the other evaluated systems by 33 percent. The automated arXiv integration also recommends papers with a relevance confidence of 80% in under 3 seconds. Its streaming conversational interface, built with the LangChain Expression Language (LCEL), makes the system a scalable, high-performance, real-world application of conversational AI that emphasizes fast responsiveness and user-friendly multimodal document analysis.
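The abstract does not spell out how the hierarchical retrieval is organized, so the following is only a minimal sketch under stated assumptions: a two-level scheme that first ranks pages by a short summary and then ranks the chunks inside the selected pages. The function names, the page/chunk data layout, and the toy bag-of-words similarity are all hypothetical stand-ins, not the system's actual embedding model or index.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hierarchical_retrieve(query, pages, top_pages=1, top_chunks=2):
    """Two-stage retrieval over a hypothetical page/chunk hierarchy.

    pages: list of dicts shaped like {"summary": str, "chunks": [str, ...]}.
    """
    q = embed(query)
    # Stage 1: coarse ranking of pages by their summaries.
    ranked_pages = sorted(
        pages, key=lambda p: cosine(q, embed(p["summary"])), reverse=True
    )[:top_pages]
    # Stage 2: fine ranking of chunks, restricted to the selected pages.
    candidates = [chunk for page in ranked_pages for chunk in page["chunks"]]
    return sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_chunks]


# Illustrative data only; not taken from the paper's dataset.
pages = [
    {
        "summary": "transformer attention architecture",
        "chunks": [
            "multi-head attention uses several attention heads",
            "positional encoding adds order information",
        ],
    },
    {
        "summary": "dataset and evaluation metrics",
        "chunks": [
            "the dataset contains PDF documents",
            "faithfulness measures attention to context",
        ],
    },
]

result = hierarchical_retrieve("how many attention heads", pages, top_pages=1, top_chunks=1)
```

Restricting the second stage to chunks from the top-ranked pages is what keeps a keyword match on an unrelated page (here, the "attention to context" chunk) out of the results, which is one plausible reading of how a hierarchy helps retain relevant context.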

Keywords

arXiv Integration, Conversational AI, Hierarchical Retrieval, Multimodal RAG, RAGAS Evaluation, Research Discovery
