Abstract: Green walnut harvesting is hampered by complex environments, large training workloads, and poor stability, all caused by the disordered growth of green walnuts and tree branches. To address these problems, a harvesting device based on a synchronous belt module and a manipulator was designed, and path planning for the harvesting manipulator was carried out with the twin delayed deep deterministic policy gradient algorithm combined with hindsight experience replay (HER-TD3). The HER algorithm was used to improve the agent's exploration ability and alleviate the sparse reward problem, and the TD3 algorithm was used to improve the agent's stability and reduce oscillation during training. To demonstrate the feasibility and generalization ability of the HER-TD3 algorithm, the TD3 and HER-DDPG algorithms were introduced for comparison, and the three deep reinforcement learning agents were trained with a dimensionality-reduction training method. The results showed that the success rate of the HER-TD3 model in completing path planning tasks reached 98%, which was 4 percentage points higher than that of HER-DDPG and 19 percentage points higher than that of TD3. A 3D simulation environment was built in CoppeliaSim, in which the initial attitude and collision detection were designed. YOLO v4 was used to recognize green walnuts, and the trained HER-TD3 model guided the virtual harvesting manipulator around tree branches and obstacles to the target position, completing collision-free path planning. In simulation, the path planning success rates were 91% without obstacles and 86% with obstacles. In green walnut picking experiments with a physical prototype, the path planning task was still completed well: the success rate without obstacles was 86.7%, with an average motion time of 12.8 s, and the success rate with obstacles was 80.0%, with an average motion time of 13.6 s. These results verified that the HER-TD3 algorithm has good adaptability and stability in complex environments.
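To make the two ingredients of HER-TD3 concrete, the following is a minimal Python sketch rather than the implementation used in this work. It shows hindsight relabeling under a sparse reward, and the clipped double-Q target that TD3 uses to damp value overestimation. The function names, the transition dictionary layout, the distance tolerance `tol`, the discount factor, and the choice of the "final" relabeling strategy are all illustrative assumptions.

```python
import numpy as np


def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward: 0.0 if the end effector is within tol of the goal, else -1.0.

    tol is a hypothetical distance threshold chosen for illustration.
    """
    dist = np.linalg.norm(np.asarray(achieved, dtype=float) - np.asarray(goal, dtype=float))
    return 0.0 if dist <= tol else -1.0


def her_relabel_final(episode, tol=0.05):
    """Hindsight relabeling with the 'final' strategy (an assumed choice).

    Each transition is copied with its goal replaced by the position actually
    reached at the end of the episode, and the reward is recomputed. A failed
    episode therefore still yields at least one successful transition.

    episode: list of dicts with keys 'obs', 'action', 'achieved_next', 'goal'.
    """
    new_goal = episode[-1]["achieved_next"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "obs": t["obs"],
            "action": t["action"],
            "achieved_next": t["achieved_next"],
            "goal": new_goal,
            "reward": sparse_reward(t["achieved_next"], new_goal, tol),
        })
    return relabeled


def td3_target(reward, q1_next, q2_next, done, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


# Toy episode: the arm moves along x but never reaches the original goal.
toy_episode = [
    {"obs": [0.0, 0.0, 0.0], "action": [0.5, 0.0, 0.0],
     "achieved_next": [0.5, 0.0, 0.0], "goal": [5.0, 5.0, 5.0]},
    {"obs": [0.5, 0.0, 0.0], "action": [0.5, 0.0, 0.0],
     "achieved_next": [1.0, 0.0, 0.0], "goal": [5.0, 5.0, 5.0]},
]
relabeled_episode = her_relabel_final(toy_episode)
```

In the toy episode every original transition would receive a reward of -1.0, whereas the relabeled copy ends with a reward of 0.0. This is the mechanism by which HER mitigates sparse rewards. Taking the minimum of two critic estimates in `td3_target` is the part of TD3 associated with steadier training.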