Automatic Classification of Long Mongolian Text Based on Word Embedding Distributed Representation Deep Learning Model


Gang Jin

Abstract

With the development of Mongolian-language information technology and the publication of the international standard for Mongolian character encoding, Mongolian electronic texts have proliferated in recent years. Research on automatic classification in the field of Mongolian information processing has therefore attracted considerable attention, since manual processing and classification are time-consuming and labor-intensive. Current research on automatic classification of long Mongolian text relies mainly on traditional algorithms such as Naive Bayes and Support Vector Machines, together with sparse, frequency-based feature representations such as TF-IDF or One-Hot encoding. However, owing to the characteristics of the Mongolian language itself, sentences in Mongolian news texts are usually long, so traditional algorithms struggle to extract semantic features and yield poor classification performance. This paper proposes a combined CNN and RNN model (CNN+RNN Model) built on word embedding distributed representations and evaluates it in experiments on automatic classification of long Mongolian text, using news articles as the dataset. Experimental results show that, compared with a CNN model or an RNN model used alone, the combined model is superior on metrics including precision, loss, accuracy, recall, and F1 score, and can effectively handle longer Mongolian texts.
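The abstract does not specify implementation details of the combined model. As a rough, hedged illustration of the kind of architecture described (an embedding layer feeding a convolutional layer for local feature extraction, followed by a recurrent layer for longer-range dependencies), the PyTorch sketch below uses arbitrarily chosen hyperparameters (vocabulary size, embedding dimension, number of classes) and a GRU as the recurrent unit; none of these choices are taken from the paper.

```python
# Minimal sketch of an embedding + CNN + RNN text classifier (illustrative only;
# hyperparameters and the choice of GRU are assumptions, not from the paper).
import torch
import torch.nn as nn

class CNNRNNClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_filters=128,
                 kernel_size=3, hidden_size=128, num_classes=6):
        super().__init__()
        # Word-embedding layer: distributed representation of word tokens
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # 1-D convolution extracts local n-gram features from the embedded sequence
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        self.relu = nn.ReLU()
        # Recurrent layer (GRU here) models longer-range dependencies in long texts
        self.rnn = nn.GRU(num_filters, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        x = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)              # (batch, embed_dim, seq_len) for Conv1d
        x = self.relu(self.conv(x))        # (batch, num_filters, seq_len)
        x = x.transpose(1, 2)              # (batch, seq_len, num_filters)
        _, h = self.rnn(x)                 # h: (1, batch, hidden_size), final state
        return self.fc(h.squeeze(0))       # class logits, (batch, num_classes)

# Example usage with dummy data: 4 documents of 400 tokens each
model = CNNRNNClassifier(vocab_size=20000)
dummy_batch = torch.randint(1, 20000, (4, 400))
logits = model(dummy_batch)                # shape: (4, 6)
```

In this arrangement the convolution shortens the effective path the recurrent layer must model, which is one common rationale for combining CNN and RNN layers when classifying long documents.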

Article Details

How to Cite
Gang Jin. (2021). Automatic Classification of Long Mongolian Text Based on Word Embedding Distributed Representation Deep Learning Model. CONVERTER, 2021(7), 1095-1101. Retrieved from https://converter-magazine.info/index.php/converter/article/view/602