An Optimized K-means Algorithm for Text Clustering

Jiani Zhao

Abstract

In data mining, K-means clustering analysis faces two major problems: choosing the initial cluster centers and choosing the value of k. The traditional K-means algorithm makes both choices subjectively, which directly affects the clustering result. This paper proposes an analysis method that combines a relational matrix with degree centrality to determine the initial center points and the k value of the K-means algorithm. The improved algorithm is applied to a clustering analysis of a collection of Chinese entrepreneurial policy texts, and the resulting topics are displayed visually as word clouds. This empirical analysis verifies the effectiveness and objectivity of the improved algorithm on large collections of long text documents in which the number of categories and their topics are not known in advance, and it also provides an approach to the objective classification of Chinese entrepreneurial policy texts.
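
The abstract does not spell out how the relational matrix is built or how centers are selected from it, so the following Python sketch is only one plausible reading, not the paper's method. It assumes that the relational matrix is a thresholded cosine-similarity matrix between document vectors, that degree centrality is a document's number of related documents divided by n - 1, and that seeds are chosen greedily by taking the most central remaining document and discarding its neighbours. The number of seeds found this way serves as k. The function names, the `threshold` parameter, and the greedy rule are all hypothetical.

```python
import numpy as np

def relational_matrix(X, threshold=0.5):
    """Binary document-document relation matrix (assumed: thresholded cosine similarity)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero for empty docs
    A = (unit @ unit.T >= threshold).astype(int)
    np.fill_diagonal(A, 0)  # a document is not related to itself
    return A

def degree_centrality(A):
    """Fraction of the other documents that each document is related to."""
    n = A.shape[0]
    return A.sum(axis=1) / max(n - 1, 1)

def pick_seeds(A):
    """Greedy seed selection (hypothetical rule): repeatedly take the document
    with the highest degree among the remaining ones, then remove it and its
    neighbours. Isolated documents end up as singleton seeds."""
    remaining = np.ones(A.shape[0], dtype=bool)
    seeds = []
    while remaining.any():
        deg = A[:, remaining].sum(axis=1).astype(float)  # degree within remaining docs
        deg[~remaining] = -1.0                           # exclude removed documents
        i = int(np.argmax(deg))
        seeds.append(i)
        remaining[i] = False
        remaining[A[i] == 1] = False
    return seeds

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def cluster(X, threshold=0.5):
    """Derive k and the initial centers from the relation graph, then run K-means."""
    X = np.asarray(X, dtype=float)
    seeds = pick_seeds(relational_matrix(X, threshold))
    labels, centers = kmeans(X, X[seeds])
    return len(seeds), labels, centers
```

In practice `X` would hold one row per document, for example TF-IDF weights, and the threshold would need tuning, since it controls how many seeds (and therefore how many clusters) the procedure produces.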

How to Cite
Zhao, J. (2021). An Optimized K-means Algorithm for Text Clustering. CONVERTER, 545 - 553. https://doi.org/10.17762/converter.85