MBD-YOLO: Steel Defect Detection Model Combining Multi-scale Features and Routing Attention
DOI:
https://doi.org/10.6911/WSRJ.202509_11(9).0008
Keywords:
Deep learning; steel defect detection; YOLO; attention mechanism; small object detection.
Abstract
With the growing demand for high-quality steel, accurate and efficient surface defect detection is becoming increasingly important. Traditional methods often suffer from missed detections, inaccurate localization, and poor performance on small defects. This study introduces MBD-YOLO, an enhanced defect detection model based on the YOLOv8 framework. First, the model incorporates a novel attention module, Bivo, in the feature fusion stage, which captures both global and local features while enabling effective cross-scale information exchange; this significantly improves the accuracy of small-object detection in complex backgrounds. Second, a new DRM detection head is designed. Through multi-context enhancement and dual-branch pooling combined with upsampling, it strengthens global and local context information and significantly improves the detection of small, low-contrast, and direction-sensitive defects. Third, the backbone integrates MobileViT, combining CNN and Vision Transformer architectures to improve feature extraction for small targets. Experiments on the NEU-DET dataset show that MBD-YOLO achieves an mAP@0.5 of 85.7%, outperforming mainstream models. Extensive experiments confirm the combined effectiveness of the proposed modules, show strong performance across the defect categories, and verify the generalization of the model. MBD-YOLO provides a robust, high-precision solution for steel defect detection that meets the needs of modern industrial applications.
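The mAP@0.5 figure reported above depends on a matching rule: a predicted box counts as a correct detection only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only, not the evaluation code used in the study. The function names and the box coordinates are hypothetical and chosen purely for illustration; a full mAP computation would additionally rank predictions by confidence and average precision over recall levels and defect classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Apply the IoU >= 0.5 rule that the '@0.5' in mAP@0.5 refers to."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes, for illustration only.
ground_truth = (10, 10, 50, 50)
good_prediction = (12, 12, 52, 52)   # slightly shifted, large overlap
poor_prediction = (30, 30, 70, 70)   # shifted by half the box size

print(is_true_positive(good_prediction, ground_truth))
print(is_true_positive(poor_prediction, ground_truth))
```

For small defects this rule is demanding: a shift of a few pixels removes a large fraction of the overlap, which is one reason small-object localization weighs heavily on mAP@0.5.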
AUTHOR CONTRIBUTIONS
Conceptualization, Xvyue Zhang and Baoping Wang; methodology, Xvyue Zhang; software, Xvyue Zhang; validation, Xvyue Zhang, Qin Sun and Baoping Wang; formal analysis, Xvyue Zhang; investigation, Xvyue Zhang and Baoping Wang; resources, Da Zhao and Lifang Zhao; data curation, Xvyue Zhang; writing—original draft preparation, Xvyue Zhang; writing—review and editing, Baoping Wang, Qin Sun, Da Zhao and Lifang Zhao; visualization, Xvyue Zhang; supervision, Baoping Wang and Lifang Zhao; project administration, Baoping Wang and Lifang Zhao; funding, Baoping Wang and Lifang Zhao. All authors have read and agreed to the published version of the manuscript.
License
Copyright (c) 2025 World Scientific Research Journal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




