Saturday, July 8, 2017

Cluster analysis with Python - HAC and K-Means

This tutorial describes a cluster analysis process. We deal with a set of cheeses (29 instances) characterized by their nutritional properties (9 variables). The aim is to determine groups of homogeneous cheeses in view of their properties. We inspect and test two approaches using two Python procedures: the Hierarchical Agglomerative Clustering algorithm (SciPy package) ; and the K-Means algorithm (scikit-learn package).

One of the contributions of this tutorial is that we had conducted the same analysis with R previously, with the same steps. We can compare the commands used and the results provided by the available procedures. We observe that these tools have comparable behaviors and are substitutable in this context.

Keywords: python, scipy, scikit-learn, cluster analysis, clustering, hac, hierarchical agglomerative clustering, , k-means, principal component analysis, PCA
Turorial: hac and k-means with Python 
Dataset and cource code: hac_kmeans_with_python.zip
References :
Marie Chavent, Teaching Page, University of Bordeaux.
Tanagra Tutorials, "Cluster analysis with R - HAC and K-Means", July 2017.