Statistics Seminar 2018

Monday, July 2, at 12:00.

Instituto de Cálculo FCEyN-UBA

Víctor Yohai (FCEyN-UBA and CONICET).

Robust Clustering.

Clustering is a useful tool in unsupervised data analysis. Several measurements are made on p items and used to obtain K natural groups called clusters. There are several approaches to clustering. K-means is a very popular nonparametric clustering procedure introduced by Steinhaus (1956). K-means has several nice properties; in particular, it is conceptually simple, since it is obtained by minimizing the average of the squared Euclidean distances of the points to their nearest centers. However, K-means is very sensitive to atypical points that lie far from all K clusters. We propose an alternative robust procedure called Tau-K-Means: instead of minimizing the mean of the squared distances, we minimize a tau-scale (Zamar and Yohai, 1988) of these values. Monte Carlo simulations show that this procedure is not much affected by a small percentage of outliers and performs well at finding clusters.
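
The contrast between the two criteria can be illustrated with a minimal sketch, which is not the speaker's implementation. It assumes the usual form of a tau-scale (an M-scale s multiplied by the average of a second bounded rho function evaluated at d/s), applies it to the unsquared distances, and uses bisquare rho functions with the tuning constants 1.547 and 6.08 that are conventional in robust regression. All of these choices, as well as the function names and the toy data, are assumptions made for illustration; the abstract does not state them.

```python
import numpy as np

def nearest_distances(X, centers):
    """Euclidean distance from each row of X to its closest center."""
    diff = X[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def kmeans_objective(X, centers):
    """Classical K-means criterion: average squared distance to the nearest center."""
    return np.mean(nearest_distances(X, centers) ** 2)

def rho_bisquare(t, c):
    """Bounded bisquare rho function, scaled so that its maximum is 1."""
    u = np.minimum(np.abs(t) / c, 1.0)
    return 1.0 - (1.0 - u ** 2) ** 3

def m_scale(d, c=1.547, b=0.5, n_iter=200):
    """M-scale s solving mean(rho(d / s)) = b, by fixed-point iteration."""
    s = np.median(np.abs(d))
    if s == 0:
        return 0.0  # degenerate case: at least half of the values are zero
    for _ in range(n_iter):
        s = s * np.sqrt(np.mean(rho_bisquare(d / s, c)) / b)
    return s

def tau_objective(X, centers, c1=1.547, c2=6.08, b=0.5):
    """Squared tau-scale of the distances (constants are illustrative assumptions)."""
    d = nearest_distances(X, centers)
    s = m_scale(d, c1, b)
    if s == 0:
        return 0.0
    return s ** 2 * np.mean(rho_bisquare(d / s, c2))

# Toy data: two clusters of 20 points each plus a single far outlier.
rng = np.random.default_rng(0)
A = rng.normal([0.0, 0.0], 1.0, size=(20, 2))
B = rng.normal([10.0, 0.0], 1.0, size=(20, 2))
X = np.vstack([A, B, [[1000.0, 0.0]]])

# Two candidate solutions for K = 2.
good = np.array([[0.0, 0.0], [10.0, 0.0]])   # one center per true cluster
bad = np.array([[5.0, 0.0], [1000.0, 0.0]])  # one center spent on the outlier

print("K-means criterion: good =", kmeans_objective(X, good),
      " bad =", kmeans_objective(X, bad))
print("tau criterion:     good =", tau_objective(X, good),
      " bad =", tau_objective(X, bad))
```

On this toy data the average of squared distances is lower when a center is spent on the outlier, whereas the tau criterion is lower for the solution that recovers the two real clusters, because the bounded rho functions cap the contribution of the single distant point. The sketch only evaluates the two criteria at fixed centers; it does not implement an algorithm for minimizing either of them.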