An Optimized K-means Algorithm for Text Clustering

Jiani Zhao

doi:10.17762/converter.85

Published: Jul 10, 2021

Abstract

Two major problems in K-means clustering for data mining are choosing the initial cluster centers and choosing the value of k. The traditional K-means algorithm sets both subjectively, which directly affects clustering quality. This paper proposes an analysis method that combines a relational matrix with degree centrality to determine the initial center points and the value of k for the K-means algorithm. The improved algorithm is applied to cluster a collection of Chinese entrepreneurial policy texts, and the resulting cluster topics are visualized as word clouds. The empirical analysis verifies the effectiveness and objectivity of the improved algorithm on large collections of long text documents whose number of categories and category topics are unknown in advance, and it also provides an approach for objectively classifying Chinese entrepreneurial policy text collections.
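
The abstract does not spell out how the relational matrix and degree centrality are combined, so the following is only a minimal sketch of one plausible reading, not the paper's procedure. It assumes that the relational matrix is a thresholded cosine-similarity matrix over bag-of-words document vectors, that degree centrality is a document's neighbor count divided by n - 1, and that seeds are picked greedily in order of centrality while skipping documents already linked to a chosen seed. The function names, the `threshold` parameter, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relational_matrix(vectors, threshold):
    """Assumed form: entry [i][j] is True when documents i and j
    (i != j) have cosine similarity at or above the threshold."""
    n = len(vectors)
    return [
        [i != j and cosine(vectors[i], vectors[j]) >= threshold
         for j in range(n)]
        for i in range(n)
    ]


def degree_centrality(adj):
    """Neighbor count of each document, normalized by n - 1."""
    n = len(adj)
    return [sum(row) / max(n - 1, 1) for row in adj]


def pick_seeds(docs, threshold=0.3):
    """Return indices of documents to use as initial K-means centers.
    The number of seeds returned is the suggested value of k."""
    vectors = [Counter(doc.split()) for doc in docs]
    adj = relational_matrix(vectors, threshold)
    centrality = degree_centrality(adj)
    # Visit the most central documents first; ties break by index.
    order = sorted(range(len(docs)), key=lambda i: (-centrality[i], i))
    seeds, covered = [], set()
    for i in order:
        if i in covered:
            continue  # already linked to an earlier seed
        seeds.append(i)
        covered.add(i)
        covered.update(j for j, linked in enumerate(adj[i]) if linked)
    return seeds


# Toy corpus (hypothetical): two policy topics, two documents each.
docs = [
    "tax relief startup",
    "tax relief startup policy",
    "loan guarantee graduate",
    "loan guarantee graduate scheme",
]
seeds = pick_seeds(docs)
k = len(seeds)
```

Under these assumptions, the vectors of the seed documents would serve as the initial centers and `k` as the cluster count for any standard K-means implementation.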

How to Cite

Zhao, J. (2021). An Optimized K-means Algorithm for Text Clustering. CONVERTER, 545 - 553. https://doi.org/10.17762/converter.85

Issue

Vol 2021: No. 3

Section

Articles
