Abstract: Rapid and accurate estimation of wheat yield can improve breeding efficiency. Yield data for wheat lines and hyperspectral data from the grain-filling period were collected. First, feature wavelengths were selected as model input variables using recursive feature elimination. Then three linear algorithms (ridge regression [RR], partial least squares regression, and multiple linear regression) and six nonlinear algorithms (random forest, gradient boosting regression [GBR], eXtreme gradient boosting, Gaussian process regression, support vector regression [SVR], and K-nearest neighbor [KNN]) were used to build single-algorithm yield estimation models, and their accuracy was compared. Finally, the Stacking algorithm was used to build multi-model ensembles in order to identify the optimal ensemble model. The results showed that the accuracy of the yield estimation models varied significantly among algorithms, and that the nonlinear models outperformed the linear models. Among the single models, the GBR-based model performed best, with R² of 0.72, RMSE of 534.49 kg/hm², and NRMSE of 11.10% on the training set, and R² of 0.60, RMSE of 628.73 kg/hm², and NRMSE of 13.88% on the testing set. The performance of the Stacking-based ensemble models depended closely on the choice of primary and secondary models. The ensemble with KNN, RR, and SVR as primary models and GBR as the secondary model effectively improved yield estimation accuracy: compared with the single GBR model, R² increased by 1.39% on the training set and by 3.33% on the testing set. These results can serve as a reference for hyperspectral-based yield estimation of wheat lines.
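The three accuracy metrics quoted in the abstract, and the percentage gains reported for the Stacking ensemble, can be illustrated with a minimal Python sketch. It rests on two assumptions that the abstract does not state: NRMSE is taken as RMSE divided by the mean observed yield, and the 1.39% and 3.33% gains are read as relative (not absolute) increases in R². All function names are hypothetical, and the implied ensemble R² values are an inference from the reported figures, not results given by the authors.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. kg/hm²)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def nrmse_percent(y_true, y_pred):
    """RMSE as a percentage of the mean observed value.

    Assumption: normalization by the mean; the abstract does not say
    which normalizer (mean, range, ...) was used.
    """
    mean_y = sum(y_true) / len(y_true)
    return 100.0 * rmse(y_true, y_pred) / mean_y


def relative_gain_percent(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Reported single-model GBR scores (from the abstract).
gbr_train_r2 = 0.72
gbr_test_r2 = 0.60

# Assumption: the reported gains are relative increases in R².
# Under that reading, the implied ensemble scores are:
stack_train_r2 = gbr_train_r2 * (1 + 1.39 / 100)
stack_test_r2 = gbr_test_r2 * (1 + 3.33 / 100)

print(round(stack_train_r2, 2), round(stack_test_r2, 2))
```

Under the relative-gain reading, the ensemble R² would be roughly 0.73 on the training set and 0.62 on the testing set, which is consistent with the reported percentages when checked with `relative_gain_percent(0.72, 0.73)` and `relative_gain_percent(0.60, 0.62)`.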