Recent Trends in Document Clustering with...
Document clustering is the process of organizing an electronic corpus of documents into subgroups that share similar textual characteristics. A number of statistical algorithms have been applied to cluster data, including text documents. More recent efforts aim to improve clustering performance with optimization-based algorithms such as evolutionary algorithms. As a result, document clustering with scalable algorithms has become an emerging topic that has attracted growing attention in recent years. This article presents an up-to-date review devoted entirely to evolutionary algorithms designed for document clustering. It first provides a comprehensive examination of the document clustering model, revealing its components and associated concepts. It then presents and analyzes the main research works in this area. Finally, it collects and classifies the objective functions used in the reviewed papers. The article concludes by addressing some important questions and challenges that may be the subject of future work.

The objective function (or fitness function) is the measure that evaluates the optimality of the solutions an evolutionary algorithm generates in the search space. In clustering, the fitness function measures the adequacy of a partitioning. It must therefore be formulated carefully, taking into account that clustering is an unsupervised process, so no ground-truth labels are available to score a solution. Different objective functions yield different solutions, even from the same evolutionary algorithm. Fitness may be formulated as either a minimization or a maximization function, and the algorithm may be formulated with one or more objective functions. To summarize, "choosing optimization...
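To make the notion of a clustering fitness function concrete, the following is a minimal sketch and not a formulation taken from the reviewed papers. It assumes a simple bag-of-words representation and scores a candidate partition by the mean cosine similarity between each document and its cluster centroid (a maximization objective), together with the equivalent minimization form. All names (`tf_vector`, `cosine`, `centroid`, `cohesion_fitness`, `as_minimization`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a document (assumed bag-of-words model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of term-frequency vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cohesion_fitness(vectors, labels):
    """Maximization objective: mean similarity of documents to their centroid.

    `labels[i]` is the cluster assigned to `vectors[i]`; in an evolutionary
    algorithm this label list would be one candidate solution (assumption).
    """
    clusters = {}
    for vec, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vec)
    centroids = {label: centroid(members) for label, members in clusters.items()}
    total = sum(cosine(vec, centroids[label]) for vec, label in zip(vectors, labels))
    return total / len(vectors)


def as_minimization(vectors, labels):
    """The same criterion expressed as a cost to minimize."""
    return 1.0 - cohesion_fitness(vectors, labels)


# Toy corpus (hypothetical) with two candidate partitions of the same documents.
docs = [
    "genetic algorithm fitness",
    "genetic algorithm crossover",
    "stock market prices",
    "stock market crash",
]
vectors = [tf_vector(d) for d in docs]
topical_partition = [0, 0, 1, 1]  # groups documents by topic
mixed_partition = [0, 1, 0, 1]    # mixes the two topics

fitness_topical = cohesion_fitness(vectors, topical_partition)
fitness_mixed = cohesion_fitness(vectors, mixed_partition)
```

Under this criterion the topical partition scores higher than the mixed one, so an evolutionary search that maximizes `cohesion_fitness` (or minimizes `as_minimization`) would favor it. A different objective, or a combination of several, could rank the same candidates differently, which is why the choice of fitness function needs care.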