Abstract: To address overlapping triples and diverse entity expressions in maize breeding text, a joint entity and relation extraction method for maize breeding is proposed, based on a bidirectional encoder representations from transformers-conditional random field (BERT-CRF) model with embedded lexical information. First, the expression characteristics of the maize breeding corpus were analyzed, and a labeling strategy was adopted that annotates entity boundary, relation type, and entity position information simultaneously. Second, a BERT-CRF model with embedded lexical information was constructed for training and prediction. A self-built dictionary of maize breeding knowledge was used to embed lexical information in BERT, fusing character features and lexical features to strengthen the model's semantic representation, and the CRF layer output the globally optimal label sequence. An entity and relation triple matching algorithm (ERTM) was then designed to obtain triples by mapping and matching the predicted labels. Finally, to verify its effectiveness, the method was evaluated on a maize breeding dataset. The precision, recall, and F1 score were 91.84%, 95.84%, and 93.80%, respectively, an improvement over existing models. The method extracts maize breeding knowledge effectively and provides a data basis for constructing a maize breeding knowledge graph and for other downstream tasks.
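
As a rough illustration of how labels that combine entity boundary, relation type, and entity position can be mapped back to triples, the following minimal sketch may help. It is not the ERTM algorithm itself and does not attempt to handle overlapping triples. The tag format `B-<relation>-<position>` (position `1` for the head entity, `2` for the tail entity), the function names `extract_spans` and `match_triples`, the nearest-entity pairing rule, and the sample sentence are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (entity text, relation type, position "1"/"2", start index)
Span = Tuple[str, str, str, int]


def extract_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[Span]:
    """Group tokens into entity spans from tags of the assumed form
    'B-<relation>-<position>' / 'I-<relation>-<position>' / 'O'."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" acts as a sentinel so the last open span is flushed.
    for i, tag in enumerate(list(tags) + ["O"]):
        boundary, _, rest = tag.partition("-")
        # Close the open span unless this tag continues it.
        if start is not None and (boundary != "I" or rest != label):
            relation, position = label.rsplit("-", 1)
            spans.append((" ".join(tokens[start:i]), relation, position, start))
            start = None
        if boundary == "B":
            start, label = i, rest
    return spans


def match_triples(spans: List[Span]) -> List[Tuple[str, str, str]]:
    """Pair each head entity with the nearest tail entity that carries
    the same relation type (an assumed heuristic, not the paper's rule)."""
    heads = [s for s in spans if s[2] == "1"]
    tails = [s for s in spans if s[2] == "2"]
    triples = []
    for text, relation, _, pos in heads:
        candidates = [t for t in tails if t[1] == relation]
        if candidates:
            nearest = min(candidates, key=lambda t: abs(t[3] - pos))
            triples.append((text, relation, nearest[0]))
    return triples


# Made-up example sentence and tags, for illustration only.
tokens = ["Variety", "A", "resists", "disease", "X"]
tags = ["B-resists-1", "I-resists-1", "O", "B-resists-2", "I-resists-2"]
triples = match_triples(extract_spans(tokens, tags))
```

In this sketch the relation type travels with every entity tag, so matching reduces to grouping spans by relation and pairing heads with tails. A character-level corpus would join characters without spaces rather than with `" ".join`.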