Effective Kotlin Item 53: Consider using groupingBy instead of groupBy
As part of many types of complex collection processing, we need to group elements. Here are a few tasks that require this operation:
- Counting the number of users in a city, based on a list of users.
- Finding the number of points received by each team, based on a list of players.
- Finding the best option in each category, based on a list of options.
The easiest way to solve these problems is to use the groupBy function, which returns a Map<K, List<V>>, where V is the type of the elements in the collection we started from, and K is the type of the key we group by. So, if we have a list of User objects that we group by an id of type String, then the returned map is Map<String, List<User>>. In other words, groupBy divides our collection into multiple small collections: one for each key. Each of the problems above can be solved by calling groupBy and then processing every group.
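A minimal sketch of such groupBy-based solutions might look as follows. The User, Player, and Option classes and their properties are assumptions made only for this illustration, and the non-null maxBy assumes Kotlin 1.7 or newer.

```kotlin
// Hypothetical model classes, introduced only for this illustration.
data class User(val name: String, val city: String)
data class Player(val name: String, val team: String, val points: Int)
data class Option(val name: String, val category: String, val rating: Double)

// Number of users per city: group, then take the size of each group.
fun usersPerCity(users: List<User>): Map<String, Int> =
    users.groupBy { it.city }
        .mapValues { (_, group) -> group.size }

// Points per team: group, then sum the points within each group.
fun pointsPerTeam(players: List<Player>): Map<String, Int> =
    players.groupBy { it.team }
        .mapValues { (_, group) -> group.sumOf { it.points } }

// Best option per category: group, then pick the highest-rated element.
// Groups produced by groupBy are never empty, so maxBy cannot throw here.
fun bestPerCategory(options: List<Option>): Map<String, Option> =
    options.groupBy { it.category }
        .mapValues { (_, group) -> group.maxBy { it.rating } }
```

Each function first materializes a List for every key and only then reduces it to a single value, which is the intermediate step discussed in this item.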
These are good solutions. When we use groupBy, we receive a Map as a result, and we can use all the different methods defined on it. This makes groupBy a really nice intermediate step. I would even say that it should be preferred due to its convenience and readability.
On the other hand, in performance-critical parts of our code, this intermediate step is an unnecessary cost: it takes time and memory to create a list for each category we have. Instead, we could use the groupingBy function, which does not do any additional operations: it just wraps the iterable together with the specified key selector and returns a Grouping.
A Grouping can be considered a bit like a map from a key to a list of elements, but it supports far fewer operations. However, since using it might be an important optimization, let's analyze the options.
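To illustrate that groupingBy itself does no work, here is a small sketch that counts how many times the key selector runs before and after a terminal operation. The function name selectorCallsBeforeAndAfter is hypothetical, and the sketch assumes that every word is non-empty.

```kotlin
// Returns the number of key-selector calls observed right after groupingBy
// and after a terminal operation (eachCount) has been executed.
fun selectorCallsBeforeAndAfter(words: List<String>): Pair<Int, Int> {
    var calls = 0
    val grouping: Grouping<String, Char> = words.groupingBy { word ->
        calls++          // count how often the key selector runs
        word.first()     // assumes non-empty words
    }
    val before = calls   // groupingBy alone does not iterate the elements
    grouping.eachCount() // a terminal operation walks through the source
    val after = calls
    return before to after
}
```

For listOf("apple", "avocado", "banana") this returns (0, 3): the selector is not called until the grouping is actually consumed.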
The first problem (counting users per city) can be solved easily. The Kotlin Standard Library already has the eachCount function, which gives us a map from the city to the number of users.
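As a sketch, counting users per city with eachCount could look like this. The User class and its city property are assumptions made for the illustration.

```kotlin
// Hypothetical model class, introduced only for this illustration.
data class User(val name: String, val city: String)

// Counts users per city without building a list for each city.
fun usersPerCity(users: List<User>): Map<String, Int> =
    users.groupingBy { it.city }
        .eachCount()
```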
Finding the number of points received by each team is a bit harder. We can use the fold function, which is like a fold on an iterable, but it has a separate accumulator for each key. So, calculating the number of points per team is very similar to calculating the number of points in a collection.
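A possible version of this calculation, assuming a hypothetical Player class with team and points properties, might be:

```kotlin
// Hypothetical model class, introduced only for this illustration.
data class Player(val name: String, val team: String, val points: Int)

// Sums points per team; every team gets its own accumulator starting at 0.
fun pointsPerTeam(players: List<Player>): Map<String, Int> =
    players.groupingBy { it.team }
        .fold(0) { acc, player -> acc + player.points }
```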
It would make sense to extract an extension function that calculates the sum of elements in each group.
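One possible shape of such an extension is sketched here. The name eachSumBy and the Int-only selector are assumptions made for this illustration, as is the Player class used in the usage function.

```kotlin
// Hypothetical extension: sums the selected Int value separately for each key.
inline fun <T, K> Grouping<T, K>.eachSumBy(
    selector: (T) -> Int
): Map<K, Int> =
    fold(0) { acc, element -> acc + selector(element) }

// Hypothetical model class, used only to show the call site.
data class Player(val name: String, val team: String, val points: Int)

fun pointsPerTeam(players: List<Player>): Map<String, Int> =
    players.groupingBy { it.team }
        .eachSumBy { it.points }
```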
Finally, the last problem: we need to find the best element in each group. We might use fold, but this would require a "zero" value, which we don't have. Instead, we can use reduce, which just starts from the first element of each group. Its lambda has one additional parameter: the key of the group (when we don't need it, as in this problem, we can replace it with an underscore).
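A sketch of finding the best option in each category with reduce could look as follows. The Option class and its rating property are assumptions made for the illustration.

```kotlin
// Hypothetical model class, introduced only for this illustration.
data class Option(val name: String, val category: String, val rating: Double)

// Keeps the highest-rated option per category.
// The first lambda parameter is the group key, which is unused here.
// On equal ratings, the earlier element is kept.
fun bestPerCategory(options: List<Option>): Map<String, Option> =
    options.groupingBy { it.category }
        .reduce { _, best, option ->
            if (option.rating > best.rating) option else best
        }
```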
Now, you might have noticed that we could also have used reduce in the previous problem. That is right, and such a solution would be more efficient. I just wanted to present both options.
Again, we can extract an extension function.
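A minimal sketch of such an extension, under the name eachMaxBy, is given here. The exact signature (a generic Comparable selector) is an assumption, and the Option class exists only to show the call site.

```kotlin
// Picks, for each key, the element with the greatest selected value.
// Note that the selector is evaluated twice per comparison in this simple form.
inline fun <T, K, R : Comparable<R>> Grouping<T, K>.eachMaxBy(
    selector: (T) -> R
): Map<K, T> =
    reduce { _, best, element ->
        if (selector(element) > selector(best)) element else best
    }

// Hypothetical model class, used only to show the call site.
data class Option(val name: String, val category: String, val rating: Double)

fun bestPerCategory(options: List<Option>): Map<String, Option> =
    options.groupingBy { it.category }
        .eachMaxBy { it.rating }
```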
The last important function from the stdlib that is defined on Grouping is aggregate, which is very similar to reduce. It iterates over all the elements and aggregates them separately for each key. Its operation has four parameters: the key of the current element; the accumulator for this key, or null for the first element with this key; the current element; and a boolean that tells whether this element is the first one for this key. Our last problem can also be solved using aggregate.
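As a sketch, the best option per category could be found with aggregate like this. The Option class and its rating property are again assumptions made for the illustration.

```kotlin
// Hypothetical model class, introduced only for this illustration.
data class Option(val name: String, val category: String, val rating: Double)

fun bestPerCategory(options: List<Option>): Map<String, Option> =
    options.groupingBy { it.category }
        .aggregate { _, best: Option?, option: Option, first: Boolean ->
            when {
                // First element for this key: there is no accumulator yet.
                first || best == null -> option
                // A better candidate replaces the current accumulator.
                option.rating > best.rating -> option
                else -> best
            }
        }
```

Compared with reduce, the lambda has to handle the nullable accumulator itself, which is why reduce reads more naturally for this particular problem.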
The groupBy function is part of many collection processing pipelines. It is convenient to use as it returns a Map that has plenty of useful functions. Its alternative is groupingBy, which is better for performance but is generally harder to use. It currently supports the following functions: eachCount, fold, reduce, and aggregate. Using them, we can define other functions we might need, just as we defined eachMaxBy in this chapter.