Advent of Kotlin: Week 3

As artificial intelligence gets more and more important, this week we will implement an important AI algorithm: k-means clustering.

k-means clustering

This algorithm is used to find a clusters of similar points in our data. For instance, imagine that you have data about how your users act on your website. You could use k-means clustering to find a typical kinds of users.

I've recorded a presentation of the algorithm, together with a simple explanation:

Your task is to implement a generic solver of the algorithm in the KMeansSolver. It is used to find clusters in different kinds spaces:

The functions you need to implement are described in comments. Notice, that the position of the values representing each categories are called a dictionary. In other sources they are called centroids.

/** * Solves k-means problem. * * The implementation should : * 1. Find the initial dictionary with `createInitialMeans`. * 2. Group points by the current state of our dictionary. * 3. For each group, find the new mean position of the element in * dictionary representing them. * 4. If the new error is better then at the previous iteration * (by more than `threshold`), go back to the step 2. * Otherwise return the final dictionary. * * @param points the points in our dataset. * @param dictSize how many elements do we expect in our dictionary (the number of means). * @param threshold the expected minimal improvement with each iteration. */ fun solve( points: List<P>, dictSize: Int, threshold: Double = 0.0001 ): List<P> = TODO() // Group points by the dictionary points representing categories fun grouped(dictionary: List<P>, means: List<P>): Map<P, List<P>> = TODO() // Calculates the error of the dataset, that is the sum of distances to the closest dictionary points protected open fun calculateError(points: List<P>, means: List<P>): Double = TODO() // Calculates the next positions of the dictionary points protected open fun nextMeans(points: List<P>, means: List<P>): List<P> = TODO() protected fun closestMean(point: P, means: List<P>): P = means.minByOrNull { mean -> distanceBetween(point, mean) }!!

More videos about k-means clustering:

Ending

I wish you the best of luck in this and future challenges and can't wait to see your solutions :)