Abstract: In modern agricultural production, the harvesting of agricultural products is gradually shifting towards mechanization and intelligence, and an increasing number of robots are being introduced into actual production, progressively replacing traditional manual labor. However, in natural environments, factors such as weather, lighting, the similarity in color between fruits and their backgrounds, and mutual occlusion between fruits and branches significantly increase the difficulty of fruit target detection. To accurately detect pears in natural environments, a lightweight pear detection model, M-YOLO v7-SCSN+F, was designed on the basis of the YOLO v7-S baseline model. The model adopted MobileNetv3 as its backbone feature extraction network, reducing the number of network parameters. It incorporated a coordinate attention (CA) mechanism into the feature fusion layer to enhance the network's feature representation capability. The CIoU loss function of YOLO v7-S was replaced with SIoU, which was used together with the normalized Wasserstein distance (NWD) for small target detection, further improving detection accuracy for fragrant pears. A Fourier transform (FT) data augmentation method generated new image data by analyzing the frequency-domain information of images and reconstructing their amplitude components, thereby enhancing the model's generalization ability. Experimental results showed that the improved M-YOLO v7-SCSN+F model achieved a mean average precision (mAP), precision, and recall of 97.23%, 97.63%, and 93.66%, respectively, on the validation set, with a detection speed of 69.39 f/s.
On the validation set, the proposed model outperformed Faster R-CNN, SSD, YOLO v3, YOLO v4, YOLO v5s, YOLO v7-S, YOLO v8n, and RT-DETR-R50, with mAP improvements of 14.50, 26.58, 3.88, 2.40, 1.58, 0.16, 0.07, and 0.86 percentage points, respectively. Furthermore, compared with the advanced YOLO v8n and RT-DETR-R50 detection models, the improved M-YOLO v7-SCSN+F model reduced its parameter size by 16.47 MB and 13.30 MB, respectively. The proposed model detected mature pears effectively, offers a reference for detecting small objects against backgrounds of similar color, and provides effective technical support for the automation of pear harvesting.
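The abstract states that NWD was used alongside SIoU to improve small target detection. As a minimal sketch under stated assumptions (Python, standard library only), the code below follows the common NWD formulation: each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), the squared 2-Wasserstein distance between two such Gaussians reduces to a Euclidean distance on (cx, cy, w/2, h/2), and NWD = exp(-W₂/C). The names `Box`, `iou`, `nwd`, `nwd_loss`, and `combined_box_loss`, the constant C = 12.8, and the 0.5 blend ratio are hypothetical illustrations, not the authors' implementation; the SIoU term is passed in as a precomputed number rather than implemented here.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in center format (cx, cy, w, h), in pixels."""
    cx: float
    cy: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Plain IoU, included only to contrast its behavior with NWD."""
    ix = max(0.0, min(a.cx + a.w / 2, b.cx + b.w / 2) - max(a.cx - a.w / 2, b.cx - b.w / 2))
    iy = max(0.0, min(a.cy + a.h / 2, b.cy + b.h / 2) - max(a.cy - a.h / 2, b.cy - b.h / 2))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def wasserstein2_sq(a: Box, b: Box) -> float:
    """Squared 2-Wasserstein distance between the Gaussians
    N((cx, cy), diag(w^2/4, h^2/4)) that model boxes a and b."""
    return ((a.cx - b.cx) ** 2 + (a.cy - b.cy) ** 2
            + ((a.w - b.w) / 2) ** 2 + ((a.h - b.h) / 2) ** 2)


def nwd(a: Box, b: Box, c: float = 12.8) -> float:
    """Normalized Wasserstein distance in (0, 1]; 1.0 means identical boxes.
    `c` is dataset-dependent; 12.8 is an assumed placeholder value."""
    return math.exp(-math.sqrt(wasserstein2_sq(a, b)) / c)


def nwd_loss(pred: Box, target: Box, c: float = 12.8) -> float:
    """Regression loss derived from NWD: 0.0 for a perfect prediction."""
    return 1.0 - nwd(pred, target, c)


def combined_box_loss(siou_loss: float, pred: Box, target: Box,
                      ratio: float = 0.5, c: float = 12.8) -> float:
    """Hypothetical weighted blend of a precomputed SIoU loss and the NWD loss."""
    return ratio * siou_loss + (1.0 - ratio) * nwd_loss(pred, target, c)


if __name__ == "__main__":
    # The same 4-pixel horizontal shift applied to a small and a large box.
    small_gt, small_pred = Box(10, 10, 8, 8), Box(14, 10, 8, 8)
    large_gt, large_pred = Box(100, 100, 64, 64), Box(104, 100, 64, 64)
    print("IoU small/large:", iou(small_gt, small_pred), iou(large_gt, large_pred))
    print("NWD small/large:", nwd(small_gt, small_pred), nwd(large_gt, large_pred))
```

With a 4-pixel shift, IoU drops to about 0.33 for the 8 × 8 box but stays near 0.88 for the 64 × 64 box, whereas NWD equals exp(-4/12.8) ≈ 0.73 in both cases. This insensitivity to object scale is the usual motivation for pairing NWD with an IoU-family loss on small targets such as distant fruit.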