
How to Cluster Data in MATLAB

Clustering is the process of grouping a set of data according to a given criterion. In this way it is possible to define subgroups of data, called clusters, that share common characteristics. Determining the internal structure of the data is important in exploratory data analysis, and clustering is also used for anomaly detection and as a preprocessing step for supervised learning.

MATLAB’s Statistics and Machine Learning Toolbox offers a wide set of functions that help you cluster your data. If you are not familiar with clustering, you can start with the k-means algorithm, which assigns each point to the cluster with the nearest centroid, by default measured with the squared Euclidean distance. K-means requires you to know in advance how many clusters are present in your data. A common criterion for estimating the optimal K is the Calinski-Harabasz method, which assigns a score to each candidate value of K. The Calinski-Harabasz score is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion, each normalized by its degrees of freedom, so compact and well-separated clusters score high. The optimal number of clusters is the one associated with the highest score. K-means is a simple yet effective clustering algorithm, but it is only one of the many algorithms that the Statistics and Machine Learning Toolbox offers. Related topics to explore in the toolbox documentation:

- Learn more about the K-means algorithm
- Evaluate the optimal number of clusters
- Density-based clustering
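The k-means workflow described above can be sketched in a few lines of MATLAB. This is only a minimal illustration, not a reference implementation: the data are synthetic (three 2-D Gaussian blobs made up for this example), and the candidate range 2:6 and the number of replicates are arbitrary choices. It assumes the Statistics and Machine Learning Toolbox is installed and a MATLAB release with implicit expansion (R2016b or later).

```matlab
% Minimal sketch: pick K with the Calinski-Harabasz criterion, then run k-means.
rng(1);                                  % fix the random seed for repeatability

% Synthetic data (assumption for this example): three 2-D blobs, 100 points each
X = [randn(100, 2);
     randn(100, 2) + [ 6 6];
     randn(100, 2) + [-6 6]];

% Score each candidate K using k-means as the clustering algorithm.
% The criterion is not defined for K = 1, so the list starts at 2.
eva   = evalclusters(X, 'kmeans', 'CalinskiHarabasz', 'KList', 2:6);
bestK = eva.OptimalK;                    % candidate K with the highest score

% Cluster with the selected K; 'Replicates' reruns k-means from several
% random initializations and keeps the solution with the lowest total distance.
[idx, C] = kmeans(X, bestK, 'Replicates', 5);

% Visualize the assignments and the centroids
gscatter(X(:, 1), X(:, 2), idx);
hold on
plot(C(:, 1), C(:, 2), 'kx', 'MarkerSize', 12, 'LineWidth', 2);
hold off
```

Here eva.CriterionValues holds the score for each candidate K, which is useful to inspect when the maximum is not clear-cut. For blobs as well separated as these, one would expect the criterion to favor K = 3, but on real data it is worth plotting the scores before trusting the suggested value.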
