Abstract

Object detection is a fundamental task in computer vision, and recent developments in the YOLO family aim to enhance real-time detection performance. However, the generalization performance of recent YOLO models on unseen video datasets remains underexplored. Analysing model performance on unseen datasets is essential for assessing robustness in real-world deployments. This work provides a systematic comparison of recent YOLO versions and their variants, evaluating YOLOv9, YOLOv10 and YOLOv11 models on the unseen MOT20 video dataset for multi-pedestrian detection. Pedestrian detection is important because it forms the basis for many computer vision tasks involving human interaction, crowd monitoring, behaviour analysis, and traffic management. The models and their variants are evaluated using three metrics: recall, inference speed, and GFLOPs. The experimental results indicate that YOLOv9-m achieves the highest recall among all evaluated models (43.9%), while YOLOv11-n shows a marginally lower recall of 40.9%. However, YOLOv11-n is substantially faster than YOLOv9-m (9.6 ms vs. 26.3 ms per image) and requires far fewer computational resources (6.5 vs. 131.3 GFLOPs). In contrast, YOLOv10 exhibits substantially lower recall (28%) despite its higher efficiency. These findings highlight the inherent accuracy-efficiency trade-off in recent YOLO architectures. The study clarifies the strengths and limitations of modern YOLO models, aiding model selection for real-time computer vision applications.
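The abstract reports recall as the headline accuracy metric but does not state how detections are matched to the MOT20 ground-truth boxes. The following is a minimal sketch of one common way to compute detection recall, not the study's stated protocol. It assumes axis-aligned `(x1, y1, x2, y2)` boxes, greedy matching of predictions in descending confidence order, an IoU threshold of 0.5, and recall pooled over all frames. The helper names `iou`, `frame_true_positives` and `recall` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
# Assumed per-frame record: (ground-truth boxes, [(predicted box, confidence), ...]).
Frame = Tuple[List[Box], List[Tuple[Box, float]]]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def frame_true_positives(gts: List[Box], preds: List[Tuple[Box, float]],
                         iou_thr: float = 0.5) -> int:
    """Greedily match predictions (highest confidence first) to unmatched ground truths.

    Each ground-truth pedestrian can be matched at most once, so duplicate
    detections of the same person do not inflate the count.
    """
    matched = [False] * len(gts)
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp += 1
    return tp


def recall(frames: List[Frame], iou_thr: float = 0.5) -> float:
    """Recall = TP / (TP + FN): matched ground truths over all ground truths, pooled over frames."""
    total_tp = sum(frame_true_positives(gts, preds, iou_thr) for gts, preds in frames)
    total_gt = sum(len(gts) for gts, _ in frames)
    return total_tp / total_gt if total_gt else 0.0


# Toy example: three pedestrians across two frames, one of them missed.
frames = [
    ([(0, 0, 10, 20), (50, 0, 60, 20)], [((1, 0, 11, 20), 0.9)]),
    ([(100, 100, 110, 120)], [((100, 101, 110, 121), 0.8)]),
]
print(recall(frames))  # prints 0.6666666666666666 (2 of 3 pedestrians matched)
```

Recall counts only missed pedestrians (false negatives), so it says nothing about false positives. This is one reason the abstract pairs it with inference speed and GFLOPs rather than treating it as a complete accuracy measure. The reported figures also quantify the trade-off directly. YOLOv11-n runs about 2.7 times faster per image than YOLOv9-m (26.3 / 9.6) and uses roughly one twentieth of the compute (131.3 / 6.5 ≈ 20.2). The cost is 3 percentage points of recall (43.9% vs. 40.9%).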

Keywords

Comparison, FLOPs, Inference speed, Recall, Unseen video, YOLO
