Advent of Kotlin: Week 3
As artificial intelligence gets more and more important, this week we will implement an important AI algorithm: k-means clustering.
k-means clustering
This algorithm is used to find a clusters of similar points in our data. For instance, imagine that you have data about how your users act on your website. You could use k-means clustering to find a typical kinds of users.
I've recorded a presentation of the algorithm, together with a simple explanation:
Your task is to implement a generic solver of the algorithm in the KMeansSolver. It is used to find clusters in different kinds spaces:
The functions you need to implement are described in comments. Notice, that the position of the values representing each categories are called a dictionary. In other sources they are called centroids.
/**
* Solves k-means problem.
*
* The implementation should :
* 1. Find the initial dictionary with `createInitialMeans`.
* 2. Group points by the current state of our dictionary.
* 3. For each group, find the new mean position of the element in
* dictionary representing them.
* 4. If the new error is better then at the previous iteration
* (by more than `threshold`), go back to the step 2.
* Otherwise return the final dictionary.
*
* @param points the points in our dataset.
* @param dictSize how many elements do we expect in our dictionary (the number of means).
* @param threshold the expected minimal improvement with each iteration.
*/
fun solve(
points: List<P>,
dictSize: Int,
threshold: Double = 0.0001
): List<P> = TODO()
// Group points by the dictionary points representing categories
fun grouped(dictionary: List<P>, means: List<P>): Map<P, List<P>> = TODO()
// Calculates the error of the dataset, that is the sum of distances to the closest dictionary points
protected open fun calculateError(points: List<P>, means: List<P>): Double = TODO()
// Calculates the next positions of the dictionary points
protected open fun nextMeans(points: List<P>, means: List<P>): List<P> = TODO()
protected fun closestMean(point: P, means: List<P>): P =
means.minByOrNull { mean -> distanceBetween(point, mean) }!!
More videos about k-means clustering:
Ending
I wish you the best of luck in this and future challenges and can't wait to see your solutions :)