Abstract: To address the problems that most image inpainting algorithms based on generative adversarial networks (GANs) produce blurry texture details and fail to fully fuse the texture detail information and semantic information extracted by the network, a two-stage GAN-based inpainting model built on residual networks and feature fusion was proposed; by inpainting occluded images in the training set, it produced restored images consistent with the overall distribution of the training set. First, a lightweight multi-scale receptive-field residual module was designed, which extracts features through multiple convolution kernels with different receptive fields and improves the ability of the coarse inpainting network to preserve texture information. Second, a bilateral refinement network was constructed, which processes texture detail information and semantic information in separate branches and then aggregates them to achieve fine inpainting. Finally, experiments on the GWHD dataset verified the effectiveness of the algorithm. Compared with the CE, GL, PEN-Net, and CA algorithms, the proposed model reduced the L1 loss by 0.56–3.79 percentage points and improved PSNR and SSIM by 0.2–1.8 dB and 0.02–0.08, respectively, yielding restorations with clear texture structure and reasonable semantics under visual inspection. On a wheat dataset augmented with the proposed algorithm, the mean average precision (mAP) of wheat-ear detection using YOLOv5s increased by 1.41 percentage points compared with the original GWHD dataset, precision increased by 3.65 percentage points, and recall increased by 0.36 percentage points.