Welcome to Day 4 of the Machine Learning Advent Calendar.
During the first three days, we explored distance-based models for supervised learning: the k-NN classifier, the Nearest Centroid classifier, and the Gaussian models (LDA, QDA, and GNB).
In all these models, the idea was the same: we measure distances, and we decide the output based on the nearest points or nearest centers.
Today, we stay in this same family of ideas. But we use the distances in an unsupervised way: k-means.
Now, a question for those who already know this algorithm: which model does k-means look more similar to, the k-NN classifier or the Nearest Centroid classifier?
And if you remember, for all the models we have seen so far, there was not really a “training” phase or hyperparameter tuning.
- For k-NN, there is no training at all.
- For LDA, QDA, or GNB, training is just computing means and variances. And there are also no real hyperparameters.
Now, with k-means, we are going to implement a training algorithm that finally looks like “real” machine learning.
We start with a tiny 1D example. Then we move to 2D.
Goal of k-means
In the training dataset, there are no initial labels.
The goal of k-means is to create meaningful labels by grouping points that are close to each other.
Let us look at the illustration below. You can clearly see two groups of points. Each centroid (the red square and the green square) is in the middle of its cluster, and every point is assigned to the closest one.
This gives a very intuitive picture of how k-means discovers structure using only distances.
And here, k means the number of centers we try to find.

Now, let us answer the question: Which algorithm is k-means closer to, the k-NN classifier or the Nearest Centroid classifier?
Do not be fooled by the k in k-NN and k-means.
They do not mean the same thing:
- in k-NN, k is the number of neighbors, not the number of classes;
- in k-means, k is the number of centroids.
K-means is much closer to the Nearest Centroid classifier.
Both models are represented by centroids, and for a new observation we simply compute the distance to each centroid to decide to which one it belongs.
The difference, of course, is that in the Nearest Centroid classifier, we already know the centroids because they come from labeled classes.
In k-means, we do not know the centroids. The whole goal of the algorithm is to discover suitable ones directly from the data.
The business problem is completely different: instead of predicting labels, we are trying to create them.
And in k-means, the value of k (the number of centroids) is unknown. So it becomes a hyperparameter that we can tune.
k-means with Only One Feature
We start with a tiny 1D example so that everything is visible on one axis. And we will choose the values in such a trivial way that we can instantly see the two centroids.
1, 2, 3, 11, 12, 13
Where are the two centroids? Yes, 2 and 12: the mean of each group.
But how would the computer know? The machine will “learn” by guessing step by step.
This is where Lloyd’s algorithm comes in.
We will implement it in Excel with the following loop:
- choose initial centroids
- compute the distance from each point to each centroid
- assign each point to the nearest centroid
- recompute the centroids as the average of the points in each cluster
- repeat steps 2 to 4 until the centroids no longer move
1. Choose initial centroids
Pick two initial centers. They should be within the data range (between 1 and 13).

2. Compute distances
For each data point x:
- compute the distance to c_1,
- compute the distance to c_2.
Typically, in 1D we use the absolute distance: |x - c_1| and |x - c_2|.
We now have two distance values for each point.
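For example, suppose the six values sit in A2:A7 and the two current centroids in F1 and G1 (a cell layout I am assuming here for illustration; adapt it to your own sheet). The two distance columns could then be:

```
B2: =ABS($A2 - F$1)    → distance to centroid 1
C2: =ABS($A2 - G$1)    → distance to centroid 2
```

Fill both formulas down to row 7.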

3. Assign clusters
For each point:
- compare the two distances,
- assign the cluster of the smallest one (1 or 2).
In Excel, this is simple IF- or MIN-based logic.
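For instance, with the distances in columns B and C as in the hypothetical layout above, either formula below works in D2 (fill it down to row 7):

```
D2: =IF(B2 <= C2, 1, 2)               → 1 if centroid 1 is closer, 2 otherwise
D2: =MATCH(MIN(B2:C2), B2:C2, 0)      → MIN-based version, also works for k > 2
```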

4. Compute the new centroids
For each cluster:
- take the points assigned to that cluster,
- compute their average,
- this average becomes the new centroid.
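Sticking with the same assumed layout (clusters in D2:D7, points in A2:A7), AVERAGEIF does this in one formula. Putting the new centroids in F2 and G2, right below the current ones, makes the paste step of the next section easy:

```
F2: =AVERAGEIF($D$2:$D$7, 1, $A$2:$A$7)    → new centroid 1
G2: =AVERAGEIF($D$2:$D$7, 2, $A$2:$A$7)    → new centroid 2
```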

5. Iterate until convergence
Now in Excel, thanks to the formulas, we can simply paste the new centroid values into the cells of the initial centroids.
The update is immediate, and after doing this a few times, you will see that the values stop changing. That is when the algorithm has converged.
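For example, starting from the (deliberately bad) guesses c_1 = 1 and c_2 = 13: the first pass assigns 1, 2, 3 to cluster 1 and 11, 12, 13 to cluster 2, so the new centroids are 2 and 12, and the next iteration changes nothing. On this toy dataset, a single update is enough to converge.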

We can also record each step in Excel, so we can see how the centroids and clusters evolve over time.

k-means with Two Features
Now let us use two features. The process is exactly the same; we simply use the Euclidean distance in 2D.
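For example, if the two features are in A2:B7, centroid 1’s coordinates in G1:H1, and centroid 2’s in I1:J1 (again, an assumed layout), the distance columns become:

```
C2: =SQRT((A2 - G$1)^2 + (B2 - H$1)^2)    → Euclidean distance to centroid 1
D2: =SQRT((A2 - I$1)^2 + (B2 - J$1)^2)    → Euclidean distance to centroid 2
```

The assignment step is unchanged, and each new centroid now needs one AVERAGEIF per coordinate.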
You can either paste the new centroid values over the initial ones (with just a few cells to update),

or you can display all the intermediate steps to see the full evolution of the algorithm.

Visualizing the Moving Centroids in Excel
To make the process more intuitive, it is helpful to create plots that show how the centroids move.
Unfortunately, Excel and Google Sheets are not ideal for this kind of visualization, and the data tables quickly become complex to organize.
If you want to see a full example with detailed plots, you can read this article I wrote almost three years ago, where each step of the centroid movement is shown clearly.

As you can see in this picture, the worksheet became quite disorganized, especially compared to the earlier table, which was very straightforward.

Choosing the optimal k: The Elbow Method
So now we can try k = 2 and k = 3 for our dataset, compute the inertia for each one, and simply compare the values.
We can even begin with k = 1.
For each value of k:
- run k-means until convergence,
- compute the inertia, which is the sum of squared distances between each point and its assigned centroid.
In Excel:
- For each point, take the distance to its centroid and square it.
- Sum all these squared distances.
- This gives the inertia for this k.
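With two distance columns as in the 1D layout assumed earlier (B and C), this takes two short formulas; the same idea works in 2D with the Euclidean distances:

```
E2: =MIN(B2:C2)^2    → squared distance to the assigned centroid (fill down)
E8: =SUM(E2:E7)      → inertia for this value of k
```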
For example:
- for k = 1, the centroid is just the overall mean of x1 and x2,
- for k = 2 and k = 3, we take the converged centroids from the sheets where you ran the algorithm.
Then we can plot the inertia as a function of k (here, k = 1, 2, 3).
For this dataset:
- from 1 to 2, the inertia drops a lot,
- from 2 to 3, the improvement is much smaller.
The “elbow” is the value of k after which the decrease in inertia becomes marginal. In the example, it suggests that k = 2 is sufficient.
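As a sanity check, you can reproduce the same pattern on the 1D toy data from earlier: with k = 1, the centroid is the overall mean 7 and the inertia is 36 + 25 + 16 + 16 + 25 + 36 = 154; with k = 2 (centroids 2 and 12), it drops to 1 + 0 + 1 + 1 + 0 + 1 = 4; with k = 3, the best possible value is 2.5. The drop from 154 to 4 is huge; the drop from 4 to 2.5 is marginal. The elbow is at k = 2.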

Conclusion
K-means is a very intuitive algorithm once you see it step by step in Excel.
We start with simple centroids, compute distances, assign points, update the centroids, and repeat. Now, we can see how “machines learn”, right?
Well, this is only the beginning: we will see that different models “learn” in really different ways.
And here is the transition for tomorrow’s article: the unsupervised version of the Nearest Centroid classifier is indeed k-means.
So what would be the unsupervised version of LDA or QDA? We will answer this in the next article.




