Abstract: In smart livestock farming research, deep learning-based instance segmentation of pig images is crucial for downstream tasks such as individual pig recognition, weight estimation, and behavior recognition. However, such models typically require a large number of pixel-wise annotated images for training, which is costly in manpower and time. To address this issue, a weakly supervised pig segmentation strategy was proposed, a weakly supervised dataset was constructed, and a feature extraction backbone network called RdsiNet was introduced. First, second-generation deformable convolution was incorporated into the residual modules of ResNet-50 to expand the network's receptive field. Second, a spatial attention mechanism was used to increase the weights the network assigns to important features. Finally, the involution operator was introduced to enhance deep spatial information and to link feature maps with semantic information through its spatial specificity and channel sharing. Ablation and comparative experiments demonstrated the effectiveness of RdsiNet on the weakly supervised dataset. With Mask R-CNN, RdsiNet achieved a mean mask AP of 88.6%, higher than that of a series of backbone networks such as ResNet-50 and GCNet. With BoxInst, its mean mask AP reached 95.2%, compared with only 76.7% for ResNet-50. Furthermore, segmentation results on the test set showed that RdsiNet also segmented better than ResNet-50: when pigs were stacked on one another, RdsiNet better distinguished each individual pig, and when trained with BoxInst, RdsiNet segmented pig outlines precisely, which is more conducive to downstream analysis.
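The abstract credits the involution operator's spatial specificity and channel sharing. The following is a minimal pure-Python sketch that makes those two properties concrete. It is not RdsiNet's implementation. It assumes a single group, stride 1, zero padding, and a single linear map (the hypothetical `weight` matrix) in place of the bottleneck kernel-generation branch used in the original operator. At each pixel, a k×k kernel is generated from that pixel's own feature vector (spatial specificity), and the same kernel is applied to every channel (channel sharing).

```python
from typing import List

# Feature map indexed as [channel][row][col].
Tensor3D = List[List[List[float]]]


def involution(x: Tensor3D, weight: List[List[float]], k: int = 3) -> Tensor3D:
    """Single-group involution sketch with stride 1 and zero padding.

    `weight` (hypothetical, shape (k*k, C)) linearly maps the C-dim feature
    vector at each pixel to a k*k kernel for that pixel. A simplification:
    the original operator generates kernels with a small bottleneck network.
    """
    c_in = len(x)
    h, w = len(x[0]), len(x[0][0])
    r = k // 2  # padding radius for a k x k window
    out = [[[0.0] * w for _ in range(h)] for _ in range(c_in)]
    for i in range(h):
        for j in range(w):
            # Spatial specificity: the kernel depends on the feature at (i, j).
            feat = [x[c][i][j] for c in range(c_in)]
            kernel = [sum(wt * f for wt, f in zip(row, feat)) for row in weight]
            # Channel sharing: the same kernel is applied to every channel.
            for c in range(c_in):
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        ii, jj = i + u - r, j + v - r
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding
                            acc += kernel[u * k + v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out
```

For example, with a 2-channel input and a `weight` whose only non-zero row is the kernel centre (index 4 for k = 3) set to `[1.0, 0.0]`, each output pixel becomes `x[0][i][j] * x[c][i][j]`. The per-pixel kernel is derived from channel 0 and then shared across both channels. A standard convolution, by contrast, would use one fixed kernel per channel, shared across all pixels.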