Abstract: Remote sensing technology for ground change detection has been widely used in agricultural planting planning and disaster assessment. Grapes are an important economic crop in China, and accurately obtaining spatial change information about grape growing areas is crucial for industrial planning and sustainable development. Nevertheless, the dispersed arrangement of grape growing areas, their diverse sizes, and the intricate variety of feature types, together with the heterogeneity between images from different times, collectively reduce the accuracy of change detection. Therefore, a change detection model based on an attention mechanism and multiscale difference features, the Multiscale Difference Feature Capture Net (MDFCNet), was proposed. The network adopted an encoder-decoder structure and incorporated the squeeze-and-excitation (SE) attention module into the ResNet101 backbone to improve the network's ability to extract change features from remote sensing images while suppressing interference from extraneous pixels. We also designed the cross difference feature capture (CDFC) module, which captured difference features with dense contextual information, improving detection accuracy in scenes with complex feature types. The supervised ensemble attention (SEA) module was designed to enrich multiscale features by fusing low-level detailed texture features and high-level abstract semantic features layer by layer, enhancing the network's ability to detect small planting areas. Comparison and ablation experiments were conducted on a change detection dataset of grape growing areas constructed for the city of Yinchuan, Ningxia Hui Autonomous Region. The experimental results showed that MDFCNet achieved the best detection results compared with the state-of-the-art change detection methods SNUNet, A2Net, DSIFN and ResNet-CD.
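For reference, the squeeze-and-excitation channel attention mentioned above can be sketched as follows. This is a minimal NumPy sketch of the standard SE operation (Hu et al.), not the authors' implementation; the function name, weight shapes, and reduction ratio are illustrative assumptions:

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-excitation channel attention (illustrative sketch).

    x  : feature map of shape (C, H, W)
    w1 : bottleneck weights of shape (C // r, C), r = reduction ratio
    w2 : expansion weights of shape (C, C // r)
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gate -> per-channel weights
    s = np.maximum(w1 @ z, 0.0)           # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # sigmoid, shape (C,)
    # Rescale: reweight each input channel by its learned importance
    return x * s[:, None, None]
```

With zero-initialized weights the sigmoid gate outputs 0.5 for every channel, so the feature map is uniformly halved; trained weights instead emphasize change-relevant channels and suppress extraneous ones.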
Compared with the second-best model (A2Net), the evaluation metrics of IoU, recall, F1 and precision were improved by 5.42, 5.62, 3.48 and 0.95 percentage points, respectively. The ablation experiments also demonstrated the effectiveness of fusing the modules: compared with the base network, adding the three modules increased IoU, recall, F1 and precision by 12.9, 5.63, 8.64 and 11.75 percentage points, respectively. The model extracted difference features with larger receptive fields to provide rich inferential information for change detection, and the fused multiscale features effectively reduced false and missed detections in the results. The extracted change areas were more complete and retained more edge detail, providing a solution to the task of change detection against the complex background of extensive grape growing areas.
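The metrics reported above follow the standard pixel-wise definitions for binary change maps; a minimal sketch (the function name `cd_metrics` is hypothetical, and pixels labeled 1 are assumed to mean "changed"):

```python
import numpy as np

def cd_metrics(pred, gt):
    """Pixel-wise change-detection metrics from binary masks (1 = changed).

    pred, gt : numpy arrays of the same shape with values in {0, 1}
    """
    tp = np.sum((pred == 1) & (gt == 1))  # changed pixels correctly detected
    fp = np.sum((pred == 1) & (gt == 0))  # false detections
    fn = np.sum((pred == 0) & (gt == 1))  # missed detections
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"IoU": iou, "recall": recall, "F1": f1, "precision": precision}
```

Note that IoU penalizes both false and missed detections in a single ratio, which is why it moves more than F1 when edge pixels of small planting areas are recovered.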