Abstract: Hyperspectral remote sensing estimates of soil heavy metal content suffer from weak spectral signals and poorly robust inversion models. To address these problems, this study constructed spatial features of pollution sources and sinks to quantify the spatial factors behind pollutant diffusion and aggregation. It then combined these features with spectral features to build an estimation model of soil heavy metal content based on extremely randomized trees (ERT). The study area was the cultivated land of Jiyuan City, where a total of 249 soil samples were collected. The effectiveness and influence mechanisms of spectral features, topographic features, and pollution-source spatial features in the inversion of soil Pb and Cd were analyzed. The multi-source features were screened with the permutation importance index, and the prediction accuracy of the ERT model was evaluated against several other regression models. The results showed that an ERT model built from transformed soil spectral features alone achieved a degree of inversion accuracy, and that accuracy improved significantly once topographic features and pollution-source spatial features were added. The pollution-source spatial features had the clearest advantage: the RMSE of the Pb ERT model decreased from 43.185 mg/kg to 22.301 mg/kg (a 48.36% reduction), and the RMSE of the Cd ERT model decreased from 0.738 mg/kg to 0.371 mg/kg (a 49.73% reduction), clearly demonstrating the effectiveness of the pollution-diffusion spatial features. In the multi-feature combination modeling experiments, the features with the highest permutation importance were the pollution-source spatial features, followed by the spectral features.
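The percentage reductions quoted above are the relative change in RMSE, (RMSE_before - RMSE_after) / RMSE_before. The short Python check below reproduces that arithmetic from the reported values only. The helper name `rmse_reduction_pct` is illustrative and not from the study.

```python
def rmse_reduction_pct(baseline: float, improved: float) -> float:
    """Relative RMSE reduction, in percent, from a baseline model to an improved one."""
    return (baseline - improved) / baseline * 100.0


# RMSE values (mg/kg) reported for the ERT models before and after adding
# the pollution-source spatial features.
pb = rmse_reduction_pct(43.185, 22.301)
cd = rmse_reduction_pct(0.738, 0.371)
print(f"Pb: {pb:.2f}%  Cd: {cd:.2f}%")  # prints "Pb: 48.36%  Cd: 49.73%"
```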
A model built only from the features selected by the permutation importance index came very close to the optimal accuracy obtained with all features, confirming the effectiveness of permutation-importance-based feature screening. Compared with multiple linear regression (MLR), support vector machine (SVM), random forest (RF), and gradient boosting decision tree (GBDT) models, the ERT model performed clearly better on every evaluation metric. On the test set, the Pb ERT model reached an R² of 0.964 and the Cd ERT model an R² of 0.923. These results show that an ERT model combining pollutant-diffusion spatial features with spectral features estimates soil heavy metal content with high accuracy and has practical value for wider application.
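The general workflow summarized above (fit an ERT regressor, rank features by permutation importance on held-out data, then refit on a screened subset and compare test R²) can be sketched with scikit-learn's `ExtraTreesRegressor` and `permutation_importance`. This is a minimal illustration under stated assumptions, not the study's implementation. The data are synthetic. The feature names, the synthetic target, the 70/30 split, the hyperparameters, and the top-k screening rule are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 249  # same sample count as the study; the data below are synthetic

# Hypothetical feature columns; names are illustrative only.
feature_names = ["dist_to_source", "spectral_band", "elevation", "noise"]
X = rng.normal(size=(n, len(feature_names)))
# Synthetic target: driven mostly by the "source" feature, then the spectral one;
# the last column has no effect on y.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

ert = ExtraTreesRegressor(n_estimators=300, random_state=0)
ert.fit(X_train, y_train)

# Permutation importance on held-out data: mean drop in R^2 (the regressor's
# default score) when a single column is shuffled.
result = permutation_importance(ert, X_test, y_test, n_repeats=20, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking:
    print(f"{feature_names[i]:>15s}  {result.importances_mean[i]:.3f}")

# Feature screening (assumed rule): keep the top-k features by mean importance.
k = 3
selected = ranking[:k]
ert_sel = ExtraTreesRegressor(n_estimators=300, random_state=0)
ert_sel.fit(X_train[:, selected], y_train)

r2_all = r2_score(y_test, ert.predict(X_test))
r2_sel = r2_score(y_test, ert_sel.predict(X_test[:, selected]))
print(f"test R^2, all features: {r2_all:.3f}; top-{k} features: {r2_sel:.3f}")
```

Permutation importance is computed on the test split rather than the training split, so it reflects how much each feature matters for generalization instead of how much the trees happened to fit to it.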