
Text Word Segmentation of Livestock and Poultry Diseases Based on BERT-BiLSTM-CRF Model

Fund Project:

Yunnan Provincial Major Science and Technology Special Program (202102AE090039), Beijing Academy of Agriculture and Forestry Sciences Innovation Capacity Building Special Project (KJCX20230204), and Beijing Digital Agriculture Innovation Team Construction Project (BAIC10-2023)

    Abstract:

    Diagnosing, preventing, and controlling livestock and poultry diseases is important for the healthy development of animal husbandry in China, and better word segmentation of disease texts supports diagnosis based on natural language processing. Text corpora for livestock and poultry diseases are scarce, and the texts contain many out-of-vocabulary words such as disease names and phrases. To address these problems, a BERT-BiLSTM-CRF word segmentation model combined with dictionary matching was proposed. Taking sheep diseases as the research object, a text dataset of common diseases was constructed and combined with the general-purpose PKU corpus. The BERT (bidirectional encoder representations from transformers) pre-trained language model produced vector representations of the text, a bidirectional long short-term memory network (BiLSTM) extracted contextual semantic features, and a conditional random field (CRF) output the globally optimal label sequence. A domain dictionary of livestock and poultry diseases was then applied after the CRF layer to correct the segmentation by dictionary matching, which reduced the ambiguous segmentation caused by disease names and phrases and further improved segmentation accuracy. Experimental results showed that the BERT-BiLSTM-CRF model with dictionary matching achieved an F1 value of 96.38% on the text dataset of common sheep diseases, which was 11.01, 10.62, 8.3, and 0.72 percentage points higher than the jieba word segmenter, the BiLSTM-Softmax model, the BiLSTM-CRF model, and the same BERT-BiLSTM-CRF model without dictionary matching, respectively, verifying the effectiveness of the method. Compared with a single corpus, the mixed corpus combining the general-purpose PKU corpus with the text dataset of common sheep diseases allowed the model to segment both the professional terms of livestock and poultry diseases and the common words in disease texts accurately, with F1 values above 95% on both the general corpus and the disease text dataset, which indicates good generalization ability. The method can be used for word segmentation of livestock and poultry disease texts.
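
The abstract does not specify how the dictionary-matching correction after the CRF layer is implemented, nor the tagging scheme. The following is a minimal sketch of one plausible reading, assuming BMES character tags from the CRF layer and a correction step that greedily merges adjacent segments whose concatenation appears in a domain lexicon. The function names, the `max_span` parameter, the example sentence, and the hypothetical model output are all illustrative assumptions, not the authors' code. The sketch also shows the usual span-based F1 for word segmentation, which is assumed to be the metric behind the reported values.

```python
from typing import List, Set, Tuple


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Decode a BMES tag sequence (assumed CRF output) into words."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):  # a word ends on E (end) or S (single)
            words.append(buf)
            buf = ""
    if buf:  # tolerate an unterminated final word
        words.append(buf)
    return words


def merge_with_dictionary(words: List[str], lexicon: Set[str],
                          max_span: int = 4) -> List[str]:
    """Greedily merge adjacent segments whose concatenation is a lexicon entry.

    Hypothetical correction rule: try the longest run of segments first,
    down to two segments, and keep the segment unchanged if nothing matches.
    """
    out, i = [], 0
    while i < len(words):
        for span in range(min(max_span, len(words) - i), 1, -1):
            candidate = "".join(words[i:i + span])
            if candidate in lexicon:
                out.append(candidate)
                i += span
                break
        else:  # no lexicon entry starts at this segment
            out.append(words[i])
            i += 1
    return out


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_f1(pred: List[str], gold: List[str]) -> float:
    """Span-based F1: a predicted word counts only if both boundaries match."""
    p, g = to_spans(pred), to_spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence: "a sheep has peste des petits ruminants".
sentence = "羊患小反刍兽疫"
# Hypothetical model output that splits the disease name into three pieces.
model_tags = ["S", "S", "B", "E", "B", "E", "S"]
gold = ["羊", "患", "小反刍兽疫"]
lexicon = {"小反刍兽疫"}  # assumed domain dictionary entry

raw = tags_to_words(sentence, model_tags)
fixed = merge_with_dictionary(raw, lexicon)
print(raw, segmentation_f1(raw, gold))
print(fixed, segmentation_f1(fixed, gold))
```

A merge-only rule like this can repair a disease name that was split too finely, but it cannot split a segment that wrongly absorbed neighboring characters; handling that case would need an additional matching pass over the raw characters.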

Cite this article

YU Ligen, GUO Xiaoli, ZHAO Hongtao, YANG Gan, ZHANG Jun, LI Qifeng. Text Word Segmentation of Livestock and Poultry Diseases Based on BERT-BiLSTM-CRF Model[J]. Transactions of the Chinese Society for Agricultural Machinery, 2024, 55(2): 287-294.

History
  • Received: 2023-11-13
  • Published online: 2024-02-10