Abstract: Obtaining key information from forestry text faces two main problems. First, key information is usually considered only from the perspective of keywords, and the information types of the words are neglected. Second, forestry text on the Internet has no unified description structure, which makes it difficult to extract the information types of words. By combining the two characteristics of “keywords + information types”, a method for extracting key information from forestry text was proposed based on improved TextRank and cluster filtering. The method consisted of six steps. First, the text keywords were extracted according to the keyword extraction formula. Second, the keywords were represented as vectors with Word2Vec. Third, the TextRank algorithm was improved, mainly by merging word features and introducing edge weights to construct the graph model of the text. Fourth, stable graph structures were obtained through iterative convergence and then merged to form clusters, whose quality was evaluated from three aspects: the uniformity of element distribution, the size of the clusters, and the universality of the clusters. Fifth, the final cluster set was formed in combination with the TextRank algorithm. Finally, the final clusters were labeled with information types. The data used in the experiments were 2000 forestry texts related to forestry policies and news. The experimental results showed that the compactness of the final cluster set was 0.9680, its separation was 0.0572, and its F1-measure was 0.8871, which showed that the information types of the clusters could be clearly marked. For the keywords of a text, the information type was obtained by calculating the cosine similarity between each keyword’s vector and the cluster centers.
The combination of keywords and information types constituted the key information of a forestry text. Meanwhile, on 400 manually labeled texts, this method achieved better results in MRR, Bpref, accuracy, and F1-measure than six algorithms, including TextRank and TF-IDF, which showed that this method had advantages in extracting forestry text keywords.
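
The last labeling step, in which a keyword receives the information type of the cluster center most similar to its vector under cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_keywords`, the three-dimensional toy vectors, and the type labels "policy" and "species" are all hypothetical stand-ins for Word2Vec vectors and the labeled final clusters.

```python
import numpy as np


def label_keywords(keyword_vecs, centers, type_labels):
    """Give each keyword the label of its most cosine-similar cluster center.

    Hypothetical helper for illustration only; it assumes the keyword
    vectors and cluster centers already live in the same vector space.
    """
    k = np.asarray(keyword_vecs, dtype=float)
    c = np.asarray(centers, dtype=float)
    # Scale every row to unit length so a dot product equals cosine similarity.
    # The small epsilon guards against division by zero for all-zero vectors.
    k_unit = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-12)
    c_unit = c / (np.linalg.norm(c, axis=1, keepdims=True) + 1e-12)
    sims = k_unit @ c_unit.T          # shape: (n_keywords, n_clusters)
    best = sims.argmax(axis=1)        # index of the closest center per keyword
    return [type_labels[i] for i in best], sims


# Toy data (assumed): two cluster centers and two keyword vectors.
centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
type_labels = ["policy", "species"]
keyword_vecs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

assigned, sims = label_keywords(keyword_vecs, centers, type_labels)
print(assigned)
```

In this toy setting each keyword is paired with exactly one information type. A similarity threshold for rejecting keywords that match no cluster well could be added, but the abstract does not state whether the method uses one.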