Effective Kotlin Item 53: Consider using groupingBy instead of groupBy

This is a chapter from the book Effective Kotlin. You can find it on LeanPub or Amazon.

As a part of numerous complex collection processing, we need to group elements. Here are a few tasks, that require this operation:

  • Count the number of users in each city, based on a list of users.
  • Find the number of points received by each team, based on a list of players.
  • Find the best option in each category, based on the list of options.

groupBy

The easiest way to solve this problem is by using groupBy function. It returns a Map<K, List<V>>, where V is the type of elements on the collection we started from, and K is the type we are mapping to. So if we have a list of User that we group by an id of type String, then the returned map is Map<String, List<User>>. In other words, groupBy is dividing our collection into a smaller one for each key. This is how this function can be used to solve the above problems:

// Count the number of users in each city val usersCount: Map<City, Int> = users .groupBy { it.city } .mapValues { (_, users) -> users.size } // Find the number of points received by each team val pointsPerTeam: Map<Team, Int> = players .groupBy { it.team } .mapValues { (_, players) -> players.sumOf { it.points } } // Find the best option in each category val bestFormatPerQuality: Map<Quality, Resolution> = formats.groupBy { it.quality } .mapValues { (_, quality) -> quality.maxByOrNull { it.resolution }!! // it is fine to use !! here, because // this collection cannot be empty }

Those are good solutions. When we use groupBy, we receive a Map as a result, and we can use all the different methods defined on it. This makes groupBy a really nice intermediate step. I would even say, it should be preferred for convenience and readability.

groupingBy

On the other hand, if we are dealing with some performance-critical parts of our code, this step is not necessary. It takes some time to create as many collections as many categories we have. Instead, we could use the function groupingBy, that is not doing any additional operations. It just wraps the iterable, together with specified key selector.

public inline fun <T, K> Iterable<T>.groupingBy( crossinline keySelector: (T) -> K ): Grouping<T, K> { return object : Grouping<T, K> { override fun sourceIterator(): Iterator<T> = this@groupingBy.iterator() override fun keyOf(element: T): K = keySelector(element) } }

The returned Grouping can be considered a bit like a map from key to a list of elements, but it has much fewer operations supported. Although since using it might be an important optimization, let's analyze what are the options.

The first problem (counting users per city), can be solved easily. There is already eachCount function on the Kotlin Standard Library. We just need to use it, to have a map from the city to the number of users.

val usersCount = users.groupingBy { it.city } .eachCount()

Finding the number of points received by each team is a bit harder. We can use fold function. It is like a fold on iterable, but it is having a separate accumulator for each key. So calculating the number of points per team is very similar to calculating the number of all the points in a collection.

val pointsPerTeam = players .groupingBy { it.team } .fold(0) { acc, elem -> acc + elem.points }

It would make sense to extract an extension function to calculate a sum on each group. We might call it eachSumBy.

fun <T, K> Grouping<T, K>.eachSumBy( selector: (T) -> Int ): Map<K, Int> = fold(0) { acc, elem -> acc + selector(elem) } val pointsPerTeam = players .groupingBy { it.team } .eachSumBy { it.points }

Finally, the last problem. We need to find the biggest element in the group. We might use fold, but this would require some "zero" value, which we don't have. Instead, we can use reduce, that just starts from the first element. Its lambda has one more parameter, that is the reference to the key of the group (in the below example we don't use it, so there is _ instead).

val bestFormatPerQuality = formats .groupingBy { it.quality } .reduce { _, acc, elem -> if (acc.resolution > elem.resolution) acc else elem }

Now you might notice, that we could have used reduce in the previous problem as well. That is right, and such a solution would be more efficient. I just wanted to present both options.

Again, we can extract an extension function.

// Could be optimized to keep accumulator selector inline fun <T, K> Grouping<T, K>.eachMaxBy( selector: (T) -> Int ): Map<K, T> = reduce { _, acc, elem -> if (selector(acc) > selector(elem)) acc else elem } val bestFormatPerQuality = formats .groupingBy { it.quality } .eachMaxBy { it.resolution }

The last important function from the stdlib defined on Grouping is aggregate. It is very similar to fold and reduce. It iterates over all the elements and aggregates for each key. Its operation has 4 parameters: key of the current element; accumulator (also per element) or null for the first element with this key; reference to the element; boolean saying if it is the first element for this key. This is how our last problem can be solved using aggregate:

val bestFormatPerQuality = formats .groupingBy { it.quality } .aggregate { _, acc: VideoFormat?, elem: VideoFormat, _ -> when { acc == null -> elem acc.resolution > elem.resolution -> acc else -> elem } }

Summary

Function groupBy is a part of many collection processing processes. It is convenient to use, as it returns a Map, that has plenty of useful functions. Its alternative is groupingBy, which is better for performance, but generally harder to use. It currently supports the following functions: eachCount, fold, reduce, and aggregate. Using them, we can define other functions we might need, like in this chapter we defined eachSumBy and eachMaxBy.