MTECH PROJECTS
Parallel k-modes algorithm based on MapReduce

K-modes is a typical clustering algorithm for categorical data. First, we improve the K-modes procedure: as each categorical object is assigned to a cluster, the per-cluster count of each attribute value is updated, so that the new cluster modes can be computed after reading the whole dataset only once. To make K-modes capable of handling large-scale categorical data, we then implement it on Hadoop using the MapReduce parallel computing model. Experiments show that parallel K-modes achieves a good speedup ratio when dealing with large-scale categorical data.
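
The single-pass update described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' Hadoop implementation: the function names (hamming, map_partition, reduce_counts, new_modes) are hypothetical, distance is assumed to be simple matching (the number of differing attributes), and the map and reduce phases are simulated with ordinary function calls rather than the Hadoop API.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def map_partition(records, modes):
    """Map phase (simulated): assign each record to its nearest mode and
    update the per-cluster, per-attribute value counts in the same pass."""
    k = len(modes)
    n_attrs = len(modes[0])
    counts = [[Counter() for _ in range(n_attrs)] for _ in range(k)]
    for rec in records:
        # Ties between equally distant modes go to the lowest cluster index.
        c = min(range(k), key=lambda j: hamming(rec, modes[j]))
        for a, value in enumerate(rec):
            counts[c][a][value] += 1
    return counts


def reduce_counts(partials):
    """Reduce phase (simulated): sum the count tables from all partitions."""
    k = len(partials[0])
    n_attrs = len(partials[0][0])
    merged = [[Counter() for _ in range(n_attrs)] for _ in range(k)]
    for part in partials:
        for c in range(k):
            for a in range(n_attrs):
                merged[c][a].update(part[c][a])  # Counter.update adds counts
    return merged


def new_modes(counts, old_modes):
    """New mode of a cluster: the most frequent value of each attribute.
    An empty cluster keeps its old mode (an assumption of this sketch)."""
    return [
        tuple(
            cnt.most_common(1)[0][0] if cnt else old[a]
            for a, cnt in enumerate(cluster)
        )
        for cluster, old in zip(counts, old_modes)
    ]
```

One iteration then consists of calling map_partition on each data split, merging the results with reduce_counts, and passing the merged counts to new_modes. Because the counts are additive, merging per-partition tables gives the same result as one pass over the whole dataset, which is what makes the step suitable for MapReduce.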