The Effects of Ties on Convergence in K-Modes Variants for Clustering Categorical Data

Nicholas Orlowski, David Schlorff, Jason R. Blevins, Daniel Cañas, Moody T. Chu, and Robert E. Funderlic, Unpublished Manuscript, 2004.

Abstract. Clustering methods for categorical data can easily have ties. These ties occur when an ambiguity arises in the process of executing an algorithm. This paper identifies two types of ties and studies their effect on the k-modes method for categorical data. Three variants of the k-modes algorithm, each of which handles tie breaking and stopping criterion differently, are compared. It is shown via simple yet subtly constructed examples that the convergence and quality of results can be greatly affected by how the ties are resolved. The notion of cluster death is also discussed.