DAP-YOLO: A Multi-scale Adaptive Fusion Human Detection Model for Aerial Photography Disaster Scenarios
DOI:
https://doi.org/10.6911/WSRJ.202512_11(12).0007
Keywords:
Aerial disaster imagery, PPA, Adaptive Spatial Feature Fusion, Tiny human target, Dilation-wise residual, YOLO11
Abstract
In emergency rescue operations following natural disasters and traffic accidents, rapidly and accurately detecting core disaster areas and identifying survivors in urgent need of rescue is of great practical importance. To this end, this study proposes DAP-YOLO, a multi-scale adaptive fusion human detection model for aerial disaster scenes. First, the PPA module is introduced; it uses multi-branch parallel fusion to strengthen the model's focus on small objects. Second, the C3k2_DWR module uses DWR (dilation-wise residual) to expand the receptive field, mitigating the loss of feature information during downsampling. Third, the ASFF (Adaptive Spatial Feature Fusion) module is integrated into the detection layer, dynamically assigning weights to adjust the contribution of feature maps at different scales. Finally, a micro-detection structure is designed for small objects, significantly reducing model parameters while improving detection performance. Experiments show that, on the C2A dataset, DAP-YOLO outperforms YOLO11n by 6.70% in recall, 5.30% in mAP50, 5.10% in mAP50-95, and 4.63% in F1-Score; on the SARD dataset, the corresponding improvements are 6.40%, 4.10%, 8.20%, and 4.78%. A further comparison of mAP50-95 shows that DAP-YOLO outperforms YOLOv12n by 5.10% and significantly surpasses the lightweight YOLOv13n model. These results validate the effectiveness of the proposed human detection model in disaster scenarios and provide technical support for drone-based post-disaster search and rescue operations.
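The adaptive weighting that the abstract attributes to ASFF can be illustrated with a minimal sketch. It assumes the softmax-normalised, per-location weighting commonly associated with adaptive spatial feature fusion; the abstract does not give DAP-YOLO's exact formulation, so the function names, the logits, and the feature values below are hypothetical and serve only to show how per-scale contributions could be balanced.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_at_location(features, logits):
    """Fuse per-scale feature vectors at one spatial location.

    features: list of equal-length vectors, one per scale, assumed to be
              already resized to a common resolution and channel count.
    logits:   one scalar per scale (hypothetical output of a small
              weight-prediction layer); softmax makes the weights sum to 1.
    """
    weights = softmax(logits)
    channels = len(features[0])
    fused = [
        sum(w * f[c] for w, f in zip(weights, features))
        for c in range(channels)
    ]
    return fused, weights

# Illustrative values only: three scales, two channels, equal logits.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = fuse_at_location(features, [0.0, 0.0, 0.0])
```

With equal logits each scale contributes one third; raising one logit shifts the fused vector toward that scale's features, which is the sense in which the weights adjust each scale's contribution.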
License
Copyright (c) 2025 World Scientific Research Journal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




