Abstract: To enable fast and accurate tomato recognition for agricultural harvesting robots in greenhouse environments, an improved multiscale YOLO detection algorithm, named IMSYOLO, is presented. Building on the earlier YOLO algorithms, a new backbone network with one residual block, named darknet20, was designed and combined with a multiscale detection structure, yielding a neural network model for fast tomato recognition in complex environments. Because the network requires fewer layers, extracts more information, and uses the multiscale structure to predict both the detection categories and the bounding boxes, detection speed and accuracy were both improved. The IMSYOLO model was tested on our own tomato dataset. The detection performance of the network before and after the improvement was analyzed, as was the influence of the number of backbone layers on feature extraction capacity. The test results showed that the proposed method achieved a precision of 97.13%, an accuracy of 96.36%, a recall of 96.03%, an intersection over union (IoU) of 83.32%, and a detection time of 7.719 ms for tomato image detection. Compared with YOLO v2, YOLO v3, and the other neural networks considered, IMSYOLO meets the requirements for both detection accuracy and speed. Finally, the feasibility of applying the proposed algorithm to robots was verified by harvesting tests on ripe tomatoes in a greenhouse environment.
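The abstract reports IoU, precision, and recall without stating how they were computed. The sketch below assumes the standard definitions (IoU as the overlap area divided by the union area of two axis-aligned boxes, precision as TP / (TP + FP), recall as TP / (TP + FN)). The function names and the (x_min, y_min, x_max, y_max) box format are illustrative assumptions, not taken from the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    Assumed box format; the paper does not specify one.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical predicted and ground-truth boxes, for illustration only.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(tp=8, fp=2, fn=2))
```

Under these definitions, a reported IoU of 83.32% would mean that, on average, the overlap between a predicted box and its ground-truth box covers about five sixths of their combined area.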