Hands-on Tutorial
Introduction to the Fuzzy c-means Clustering Algorithm
A basic introduction to the Fuzzy c-means clustering algorithm and its implementation in Python
There are many clustering algorithms for numerical data. k-means is one of the most basic, and it is commonly used by researchers and analysts. But have you ever heard of Fuzzy c-means for clustering? If you haven’t, this article is for you.
In this short article, you will explore Fuzzy c-means, starting with the basic idea of a fuzzy partition, then the formula and manual calculation of Fuzzy c-means, and finally its implementation in Python using dummy data.
Okay, without further ado, let’s jump in!
Hard partition vs. fuzzy partition
Before discussing the theory behind Fuzzy c-means, let’s first look at how data points are allocated to clusters. There are two basic approaches: hard partition and fuzzy partition.
Hard partition: each data point is strictly allocated to exactly one cluster and is not a member of any other cluster, assuming that the number of clusters is known. k-means is one of the algorithms that use a hard partition.
For instance, suppose X = {x1, x2, …, x10} is a set of ten data points to be assigned to two clusters, cluster 1 and cluster 2. Unfortunately, x6 and x7 fall in a grey area between the two clusters.
Let U be the partition matrix for X. Its columns represent the data points and its rows represent the clusters, so each element states whether a data point belongs to a cluster.
Remember that in a hard partition the membership values are binary, either 0 or 1, so every data point must be assigned to exactly one cluster. In this case, x6 is assigned to cluster 1 and x7 to cluster 2.
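To make the hard partition concrete, here is a minimal Python sketch of such a matrix U. Only the assignments of x6 (cluster 1) and x7 (cluster 2) come from the example above. The labels of the other eight points, and the helper name `is_hard_partition`, are assumptions made purely for illustration.

```python
# Hypothetical cluster labels for x1..x10 (1 = cluster 1, 2 = cluster 2).
# Only x6 -> cluster 1 and x7 -> cluster 2 come from the text;
# the remaining labels are assumed for the sake of the example.
labels = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
n_clusters = 2

# Build U with one row per cluster and one column per data point.
U = [[1 if label == c else 0 for label in labels]
     for c in range(1, n_clusters + 1)]


def is_hard_partition(U):
    """Return True if every entry is 0 or 1 and every column sums to 1."""
    binary = all(u in (0, 1) for row in U for u in row)
    one_cluster_each = all(sum(col) == 1 for col in zip(*U))
    return binary and one_cluster_each


for row in U:
    print(row)
print(is_hard_partition(U))
```

Each column contains a single 1, which is just another way of saying that a data point cannot sit in the grey area: x6 and x7 are forced into one cluster each, even though they lie between the two.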