Benchmarking YOLO Variants on an Unseen Video: A Comparison of Inference Speed, GFLOPs, and Recall

Vibha T.G; Theodore Chandra S; Sivaramakrishnan S

doi:10.54392/irjmt26212

Department of Electronics and Communication Engineering, Dayananda Sagar University, Bengaluru, Karnataka, India

School of Computer Science and Engineering, Presidency University, Bengaluru, Karnataka, India

Abstract

Object detection is a fundamental task in computer vision, and recent developments in the YOLO family aim to improve real-time detection performance. However, the generalization performance of recent YOLO models on unseen video datasets remains underexplored. Analysing model performance on unseen data is essential for assessing robustness in real-world deployments. This work provides a systematic comparison of recent YOLO versions and their variants. The study evaluates YOLOv9, YOLOv10 and YOLOv11 models on an unseen MOT20 video dataset for multi-pedestrian detection. Pedestrian detection is important because it forms the basis for many computer vision tasks involving human interaction, crowd monitoring, behaviour analysis, and traffic management. The models and their variants are evaluated using three metrics: recall, inference speed, and GFLOPs. The experimental results indicate that YOLOv9-m achieves the highest recall among all evaluated models (43.9%), while YOLOv11-n shows a marginally lower recall of 40.9%. However, YOLOv11-n is considerably faster than YOLOv9-m (9.6 ms vs 26.3 ms per image) and requires far less computation (6.5 vs 131.3 GFLOPs). In contrast, YOLOv10 exhibits a markedly lower recall (28%) despite its higher efficiency. These findings highlight the inherent accuracy-efficiency trade-offs in recent YOLO architectures. The study clarifies the strengths and limitations of modern YOLO models, aiding model selection for real-time computer vision applications.
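The abstract reports recall but does not state how detections are matched to ground truth. As a minimal sketch, assuming greedy matching of detections (highest confidence first) to MOT20-style ground-truth pedestrian boxes at an IoU threshold of 0.5 (a common convention, not confirmed as the authors' protocol), recall is the fraction of ground-truth boxes that receive a match: Recall = TP / (TP + FN). The box format and the helper names `iou`, `frame_true_positives`, and `recall` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]
# A detection is a box plus its confidence score.
Detection = Tuple[Box, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def frame_true_positives(gts: List[Box], dets: List[Detection], iou_thr: float = 0.5) -> int:
    """Greedily match detections (highest confidence first) to still-unmatched
    ground-truth boxes. Assumption: IoU >= iou_thr counts as a match, and each
    ground-truth box can be matched at most once."""
    matched = [False] * len(gts)
    tp = 0
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap >= best_iou:
                    best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
    return tp


def recall(frames: Iterable[Tuple[List[Box], List[Detection]]], iou_thr: float = 0.5) -> float:
    """Recall = TP / (TP + FN): matched ground-truth boxes over all
    ground-truth boxes, pooled across every frame of the video."""
    tp = total_gt = 0
    for gts, dets in frames:
        tp += frame_true_positives(gts, dets, iou_thr)
        total_gt += len(gts)
    return tp / total_gt if total_gt else 0.0


# Toy two-frame example with made-up boxes (not MOT20 data).
frames = [
    ([(0, 0, 10, 10), (20, 20, 30, 30)], [((1, 1, 10, 10), 0.9)]),
    ([(0, 0, 10, 10)], [((0, 0, 10, 10), 0.8), ((50, 50, 60, 60), 0.3)]),
]
print(round(recall(frames), 3))  # 2 of 3 pedestrians found -> 0.667
```

Because unmatched detections (false positives) never enter the formula, recall measures only how many pedestrians a model finds, not how many spurious boxes it produces.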

Keywords

Comparison, FLOPs, Inference speed, Recall, Unseen video, YOLO

Article Details

Volume 8, Issue 2, Year 2026

Published 2026-03-28

How to Cite

T.G, Vibha, Theodore Chandra S, and Sivaramakrishnan S. 2026. “Benchmarking YOLO Variants on an Unseen Video: A Comparison of Inference Speed, GFLOPs, and Recall”. International Research Journal of Multidisciplinary Technovation 8 (2):203-11. https://doi.org/10.54392/irjmt26212.