Abstract: The rapid diagnosis of crop diseases is crucial for agricultural production. Plant electronic medical records (EMRs) hold a large amount of information on disease symptoms, drug prescriptions and environmental characteristics in both structured and unstructured forms, and can provide a high-quality source of knowledge for intelligent disease diagnosis. However, their small sample size, the lack of publicly available datasets and the co-existence of multiple types of data have posed difficulties for related research. To handle the mixture of data types in plant EMRs, a crop disease diagnosis model based on BERT-MLP data fusion and an attention mechanism (BM-Att) was proposed. Firstly, the BERT pre-trained language model was used to extract text semantic features from the unstructured part of the electronic medical record. Secondly, one-hot encoding and a multi-layer perceptron (MLP) were used to encode the structured data and increase the vector dimension. Finally, an attention mechanism was used to selectively highlight key features in the feature fusion phase, and multiple fully connected layers were used to perform disease diagnosis. To verify the validity of the model, a dataset covering 15 diseases of four crops, namely tomato, cucumber, lettuce and watermelon, was constructed, and the following experiments were carried out. Ablation experiments were conducted; representative deep learning models for text classification were compared, including CNN, RCNN, AttRNN, FastText, Transformer, BERT and ERNIE; and representative models with different approaches to structured data processing were compared, including BERT-ALEX, BERT-1dCNN, BERT-1dLSTM, BERT-1dAttLSTM, BERT-MLP, ERNIE-ALEX, ERNIE-1dCNN, ERNIE-1dLSTM, ERNIE-1dAttLSTM and ERNIE-MLP.
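The pipeline described above (text features, one-hot plus MLP encoding of structured fields, attention-weighted fusion, then classification) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the text vector is a random placeholder standing in for a BERT output, all weights are untrained random values, the structured field (`crop`), the hidden size, the learned query vector and the single output layer are assumptions made only so that the example runs. Only the four crops and the 15 disease classes come from the abstract.

```python
import math
import random

random.seed(0)

def one_hot(value, vocabulary):
    """Return a one-hot list marking the position of value in vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def linear(x, weights, bias):
    """Dense layer; weights holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_in, n_out):
    """Hypothetical untrained parameters, present only so the sketch runs."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

HIDDEN = 8            # assumed toy size; a real BERT-base vector has 768 dims
CROPS = ["tomato", "cucumber", "lettuce", "watermelon"]
N_DISEASES = 15

# Stage 1: text semantic features. Placeholder for a BERT sentence vector.
text_vec = [random.uniform(-1, 1) for _ in range(HIDDEN)]

# Stage 2: one-hot encode a structured field, then lift it to HIDDEN dims.
mlp_w, mlp_b = random_layer(len(CROPS), HIDDEN)
struct_vec = relu(linear(one_hot("lettuce", CROPS), mlp_w, mlp_b))

# Stage 3: attention over the two feature vectors (assumed scoring scheme:
# scaled dot product against a query vector that would normally be learned).
query = [random.uniform(-1, 1) for _ in range(HIDDEN)]
features = [text_vec, struct_vec]
scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(HIDDEN)
          for feat in features]
attn = softmax(scores)
fused = [sum(a * feat[i] for a, feat in zip(attn, features))
         for i in range(HIDDEN)]

# Stage 4: classification head (one layer here; the paper uses several).
clf_w, clf_b = random_layer(HIDDEN, N_DISEASES)
probs = softmax(linear(fused, clf_w, clf_b))
predicted = max(range(N_DISEASES), key=probs.__getitem__)
```

The attention weights in `attn` sum to one, so the fused vector is a convex combination of the text and structured features; training would adjust the query so that the more informative source receives the larger weight for a given record.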
The results showed that BM-Att achieved the best performance, with accuracy, precision, recall and F1-score of 95.82%, 96.38%, 95.48% and 95.85%, respectively, on the test set, indicating that effective diagnosis of crop diseases can be achieved. Adding an attention mechanism to the feature fusion stage improved the macro-averaged F1-score of the model by 1.47 percentage points, significantly improving the model’s classification of small-sample diseases such as lettuce downy mildew and watermelon nematode. These results can provide a reference for data mining of electronic medical records and the implementation of intelligent disease diagnosis.
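Because the macro-averaged F1-score weights every class equally, it is sensitive to performance on small-sample classes in a way that accuracy is not. The short sketch below illustrates this with invented toy labels (`common`, `rare`), which are not data from the study; the function is a plain restatement of the standard definition rather than the evaluation code used by the authors.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy example: 8 records of a common class, 2 of a rare class,
# with one rare record misclassified as common.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

In this toy case accuracy is 0.9, while the macro-averaged F1 is about 0.80, because the single error halves the recall of the rare class. A gain in macro F1 is therefore a reasonable indicator of better handling of the small-sample diseases.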