Abstract: Diagnosing, preventing, and controlling livestock and poultry diseases is of great significance to the healthy development of animal husbandry in China. Improving the word segmentation of livestock and poultry disease texts, a basic natural language processing step, can help raise the level of disease diagnosis. Such texts suffer from a shortage of corpora and contain many out-of-vocabulary words, such as names of epidemic diseases and domain phrases. To address these problems, a word segmentation model based on BERT-BiLSTM-CRF combined with dictionary matching was proposed. Taking sheep diseases as the research object, text datasets of common sheep diseases were constructed and combined with the general corpus PKU, and the texts were vectorized by the BERT pre-trained language model. Contextual semantic features were then extracted by a bidirectional long short-term memory network (BiLSTM), and the globally optimal label sequence was output by a conditional random field (CRF). On this basis, a dictionary-matching step that uses a dictionary of livestock and poultry disease terms was added after the CRF layer, which reduced the ambiguous segmentation caused by epidemic disease names and phrases and further improved segmentation accuracy. Results showed that the F1 value of the BERT-BiLSTM-CRF model combined with dictionary matching on the text datasets of common sheep diseases was 96.38%, which was 11.01, 10.62, 8.3, and 0.72 percentage points higher than that of jieba word segmentation, the BiLSTM-Softmax model, the BiLSTM-CRF model, and the BERT-BiLSTM-CRF model without dictionary matching, respectively, which verified the effectiveness of the proposed model.
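The abstract does not specify how the dictionary-matching step after the CRF layer works. As a minimal sketch, assuming the step merges adjacent model tokens whose concatenation is a dictionary term (forward maximum matching over tokens), it could look like the following. The function name `merge_with_dictionary`, the `max_span` parameter, and the example sentence are hypothetical and not taken from the paper.

```python
def merge_with_dictionary(tokens, dictionary, max_span=6):
    """Merge runs of adjacent tokens whose concatenation is a dictionary term.

    tokens:     words produced by the segmentation model (assumed input format)
    dictionary: set of domain terms, e.g. disease names
    max_span:   hypothetical cap on how many tokens one merge may cover
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest run of at least two tokens first.
        for j in range(min(len(tokens), i + max_span), i + 1, -1):
            candidate = "".join(tokens[i:j])
            if candidate in dictionary:
                merged.append(candidate)
                i = j
                break
        else:
            # No multi-token run matched; keep the model's token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Illustrative input: a disease name that the model split into two tokens.
model_output = ["羊", "口蹄", "疫", "的", "症状"]
disease_terms = {"口蹄疫"}
print(merge_with_dictionary(model_output, disease_terms))
# ['羊', '口蹄疫', '的', '症状']
```

This sketch only repairs terms that the model split into several tokens. It does not handle a dictionary term whose boundary falls inside a model token, which a full implementation would also need to address.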
Compared with a single corpus, the mixed corpus, which combined the general corpus PKU with the text datasets of common sheep diseases, allowed the model to accurately segment both the professional terms of livestock and poultry diseases and the common words in disease texts. The F1 values on both the general corpus and the disease text datasets exceeded 95%, which illustrated better generalization ability. The BERT-BiLSTM-CRF model can be effectively used for word segmentation of livestock and poultry disease texts.
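The abstract reports F1 values without defining how they are computed. A minimal sketch of the usual word-level convention for segmentation is given below, assuming a predicted word counts as correct only when both of its character boundaries match the reference. The function names and the example sentence are hypothetical.

```python
def to_spans(words):
    """Convert a word sequence to a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold, pred):
    """Return word-level (precision, recall, F1) for one segmented sentence."""
    gold_spans, pred_spans = to_spans(gold), to_spans(pred)
    correct = len(gold_spans & pred_spans)  # words with both boundaries right
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative sentence: the prediction splits one reference word in two.
gold = ["羊", "口蹄疫", "的", "症状"]
pred = ["羊", "口蹄", "疫", "的", "症状"]
precision, recall, f1 = segmentation_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
# P=0.6000 R=0.7500 F1=0.6667
```

In practice the counts would be accumulated over every sentence in a test set before computing precision, recall, and F1, rather than averaged per sentence.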