Effective Kotlin Item 37: Use the data modifier to represent a bundle of data
This is a chapter from the book Effective Kotlin. You can find it on LeanPub or Amazon.
In modern projects, we operate almost exclusively on two kinds of objects:
- Active objects, like services, controllers, repositories, etc. Such classes don’t need to override any methods from `Any` because the default behavior is perfect for them. Each such object is considered unique because even if two accidentally have the same state, this state changes independently, so we don’t need to override `equals` and `hashCode`. We don’t want to expose such objects’ inner state in an uncontrolled way, so they don’t need to override `toString`.
- Data model class objects, which represent bundles of data. For such objects, we use the `data` modifier, which overrides the `toString`, `equals`, and `hashCode` methods. It makes two objects with the same data (the same primary constructor properties) equal. It also makes the `toString` method display the name of the class and the values and names of all primary constructor properties. It also makes the `hashCode` method coherent with `equals`. The `data` modifier also implements the `copy` and `componentN` (`component1`, `component2`, etc.) methods for convenience of modifying and destructuring such objects.
Let's start with a short overview of the methods that the `data` modifier overrides.
The methods that the `data` modifier overrides
When we add the `data` modifier, it generates the following methods:
- `toString`
- `equals` and `hashCode`
- `copy`
- `componentN` (`component1`, `component2`, etc.)
Let’s discuss them in turn.
`toString` displays the name of the class and the values and names of all primary constructor properties. It is useful for logging and debugging.
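As a minimal sketch, assume a hypothetical `Player` data class (the class name, properties, and values are illustrative only). Printing an instance shows the generated format:

```kotlin
// Hypothetical data class used only for illustration.
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)
    // The generated toString lists primary constructor properties in declaration order.
    println(player) // Player(id=0, name=Gecko, points=9999)
}
```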
`equals` checks if all primary constructor properties are equal. `hashCode` is coherent with it (see Item 41: Respect the contract of hashCode).
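The sketch below contrasts a hypothetical `Player` data class with a hypothetical regular class, `Service`, that keeps the identity-based `equals` inherited from `Any`. Both classes are assumptions made for the example:

```kotlin
// Hypothetical data class: equality is based on primary constructor properties.
data class Player(val id: Int, val name: String, val points: Int)

// Hypothetical regular class: equality is based on identity.
class Service(val name: String)

fun main() {
    val first = Player(0, "Gecko", 9999)
    val second = Player(0, "Gecko", 9999)

    println(first == second)                        // true: all properties match
    println(first === second)                       // false: two distinct instances
    println(first.hashCode() == second.hashCode())  // true: equal objects share a hash code
    println(first == Player(0, "Ross", 9999))       // false: name differs
    println(Service("auth") == Service("auth"))     // false: compared by identity
}
```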
`copy` is especially useful for immutable data classes. It creates a new object where each primary constructor property has the same value by default, but each of them can be changed using named arguments.
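A short usage sketch, again assuming a hypothetical `Player` data class, shows how one property is changed while the others are carried over:

```kotlin
// Hypothetical immutable data class used only for illustration.
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)
    // Only points is overridden; id and name keep their current values.
    val updated = player.copy(points = 10000)

    println(player)  // Player(id=0, name=Gecko, points=9999)
    println(updated) // Player(id=0, name=Gecko, points=10000)
}
```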
This is what `copy` would look like for the class `Person` if we wrote it ourselves:
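The original listing is not reproduced here, so the following is only a sketch. It assumes `Person` has two properties, `id` and `name`; the hand-written method mirrors what the `data` modifier would generate for such a class:

```kotlin
// Assumed shape of Person; a regular class, so copy is written by hand.
class Person(val id: Int, val name: String) {
    // Every parameter defaults to the current value,
    // so callers pass only the properties they want to change.
    fun copy(
        id: Int = this.id,
        name: String = this.name
    ): Person = Person(id, name)
}

fun main() {
    val person = Person(1, "Alex")
    val renamed = person.copy(name = "Sam")
    println(renamed.id)   // 1
    println(renamed.name) // Sam
}
```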
Notice that the `copy` method makes a shallow copy of an object, but this is not a problem when the object is immutable, as we do not need deep copies of such objects.
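To see why shallowness matters only for mutable state, consider a hypothetical `Team` data class that (unwisely) holds a `MutableList`. This is an illustrative assumption, not a recommended design:

```kotlin
// Hypothetical data class holding mutable state, used to show the shallow copy.
data class Team(val name: String, val members: MutableList<String>)

fun main() {
    val team = Team("A", mutableListOf("Ann"))
    val copied = team.copy(name = "B")

    // Both objects reference the same list, so a change through one is visible in the other.
    copied.members.add("Bob")
    println(team.members)                    // [Ann, Bob]
    println(team.members === copied.members) // true
}
```

With a read-only, immutable property type, this sharing is harmless because nobody can modify the shared value.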
`componentN` functions (`component1`, `component2`, etc.) allow position-based destructuring, as in the example below:
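A minimal sketch, assuming a hypothetical `Player` data class with three properties:

```kotlin
// Hypothetical data class used only for illustration.
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)
    // Values are assigned by position: id <- component1, name <- component2, pts <- component3.
    val (id, name, pts) = player
    println("$id $name $pts") // 0 Gecko 9999
}
```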
Destructuring in Kotlin translates directly into variable definitions using the `componentN` functions, so this is what the code above will be compiled to under the hood:
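For a declaration such as `val (id, name, pts) = player` on a hypothetical `Player` data class, the equivalent explicit form is roughly the following (a source-level sketch, not actual compiler output):

```kotlin
// Hypothetical data class used only for illustration.
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)
    // Explicit equivalent of: val (id, name, pts) = player
    val id: Int = player.component1()
    val name: String = player.component2()
    val pts: Int = player.component3()
    println("$id $name $pts") // 0 Gecko 9999
}
```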
These are currently all the functionalities that the `data` modifier provides. Don't use it if you don't need `toString`, `equals`, `hashCode`, `copy`, or destructuring. If you need some of these functionalities for a class representing a bundle of data, use the `data` modifier instead of implementing the methods yourself.
When and how should we use destructuring?
Kotlin currently provides only position-based property destructuring, which has pros and cons. The biggest advantage is that we can name variables however we want. We can also destructure anything we want as long as it provides `componentN` functions. This includes `List` and `Map.Entry`, both of which have `componentN` functions defined as extensions:
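The sketch below uses made-up sample data to show both cases with standard library types only:

```kotlin
fun main() {
    // List exposes component1()..component5() as extension functions.
    val visited = listOf("China", "Russia", "India")
    val (first, second, third) = visited
    println("$first $second $third") // China Russia India

    // Map.Entry exposes component1() (key) and component2() (value).
    val trip = mapOf(
        "China" to "Tianjin",
        "Russia" to "Petersburg",
        "India" to "Rishikesh"
    )
    for ((country, city) in trip) {
        println("We loved $city in $country")
    }
}
```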
On the other hand, position-based destructuring is dangerous. We need to adjust every destructuring when the order or number of elements in a data class changes. When we use this feature, it is very easy to introduce errors into our code by changing the order of the primary constructor’s properties.
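As an illustration, suppose a hypothetical `User` class was first declared as `User(name, surname)` and later reordered. Destructuring written against the old order still compiles because both properties are `String`, yet it silently swaps the values:

```kotlin
// Hypothetical class after a refactoring that swapped the property order.
// The earlier version was assumed to be: data class User(val name: String, val surname: String)
data class User(val surname: String, val name: String)

fun main() {
    val user = User(surname = "Smith", name = "Jake")
    // Written for the old order; the compiler accepts it, but the values are swapped.
    val (name, surname) = user
    println("name=$name surname=$surname") // name=Smith surname=Jake
}
```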
We need to be careful with destructuring. It helps to use the same names as the data class’s primary constructor properties. In the case of an incorrect order, an IntelliJ/Android Studio warning will be shown. It might even be useful to upgrade this warning to an error.
Do not destructure to get just the first value as this might be really confusing and misleading for anyone who will read your code in the future, especially when you destructure in lambda expressions.
Destructuring a single value in a lambda is very confusing, especially since parentheses around lambda arguments are optional in some languages and required in others.
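A small sketch with a hypothetical `User` data class shows how little separates the two forms:

```kotlin
// Hypothetical data class used only for illustration.
data class User(val name: String, val surname: String)

fun main() {
    val user = User("John", "Doe")

    // Plain lambda parameter: a is the whole object.
    user.let { a -> println(a) }   // User(name=John, surname=Doe)

    // Don't do this: the parentheses destructure, so a is only component1.
    user.let { (a) -> println(a) } // John
}
```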
Prefer data classes instead of tuples
Data classes offer more than what is generally provided by tuples. Historically, they replaced tuples in Kotlin since they are considered better practice¹. The only tuples that are left are `Pair` and `Triple`, but they are data classes under the hood:
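Rather than repeating the standard library declarations, the sketch below demonstrates the data class behavior of `Pair` and `Triple` (destructuring, `copy`, and structural equality) using sample values:

```kotlin
fun main() {
    val pair = Pair("Jake", 23)
    val (name, age) = pair              // component1 and component2
    println("$name $age")               // Jake 23
    println(pair.copy(second = 24))     // (Jake, 24)
    println(pair == Pair("Jake", 23))   // true: structural equality

    val triple = Triple(1, "one", 1.0)
    val (number, text, decimal) = triple
    println("$number $text $decimal")   // 1 one 1.0
}
```

Note that `Pair` and `Triple` override `toString` to print only the values in parentheses, without property names.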
These tuples remained because they are very useful for local purposes, like:
- When we immediately name values:
- To represent an aggregate not known in advance, as is commonly found in standard library functions:
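Both situations can be sketched as follows. The `describe` function and its temperature thresholds are hypothetical; `partition`, `mapOf`, and `to` are standard library functions:

```kotlin
// Hypothetical function: the Pair is created and named immediately, so it never leaks out.
fun describe(degrees: Int): String {
    val (description, color) = when {
        degrees < 5 -> "cold" to "blue"
        degrees < 23 -> "mild" to "yellow"
        else -> "hot" to "red"
    }
    return "$description ($color)"
}

fun main() {
    println(describe(30)) // hot (red)

    // partition cannot know what the two lists mean, so it returns a Pair of lists.
    val numbers = listOf(1, 2, 3, 4, 5)
    val (odd, even) = numbers.partition { it % 2 == 1 }
    println(odd)  // [1, 3, 5]
    println(even) // [2, 4]

    // mapOf accepts Pairs built with the infix function `to`.
    val cities = mapOf(1 to "San Francisco", 2 to "Amsterdam")
    println(cities[2]) // Amsterdam
}
```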
In other cases, we prefer data classes. Take a look at an example: let’s say that we need a function that parses a full name into a name and a surname. One might represent this name and surname as a `Pair<String, String>`:
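A minimal sketch of such a function follows. The name `parseName` and the rule of splitting on the last space are assumptions made for the example:

```kotlin
// Hypothetical parser: everything before the last space is the first name,
// everything after it is the surname. Returns null when there is no space.
fun String.parseName(): Pair<String, String>? {
    val trimmed = trim()
    val indexOfLastSpace = trimmed.lastIndexOf(' ')
    if (indexOfLastSpace < 0) return null
    val firstName = trimmed.take(indexOfLastSpace)
    val lastName = trimmed.drop(indexOfLastSpace + 1)
    return Pair(firstName, lastName)
}

fun main() {
    val fullName = "Jane Doe"
    val (firstName, lastName) = fullName.parseName() ?: return
    println("$firstName $lastName") // Jane Doe
}
```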
The problem is that when someone reads this code, it is not clear that `Pair<String, String>` represents a full name. What is more, it is not clear what the order of the values is, therefore someone could think that the surname goes first:
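The mistake could look like this, reusing the same hypothetical `parseName` function that returns a `Pair`:

```kotlin
// Hypothetical parser returning a Pair; nothing in the type says which element is which.
fun String.parseName(): Pair<String, String>? {
    val trimmed = trim()
    val indexOfLastSpace = trimmed.lastIndexOf(' ')
    if (indexOfLastSpace < 0) return null
    return Pair(trimmed.take(indexOfLastSpace), trimmed.drop(indexOfLastSpace + 1))
}

fun main() {
    val fullName = "Jane Doe"
    // The caller wrongly assumes the surname comes first; this compiles without complaint.
    val (lastName, firstName) = fullName.parseName() ?: return
    println("First name: $firstName") // First name: Doe
    println("Surname: $lastName")     // Surname: Jane
}
```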
To make usage safer and the function easier to read, we should use a data class instead:
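One possible shape of that change is sketched below. The `FullName` class and its property names are assumptions chosen for the example:

```kotlin
// Hypothetical data class that names both the aggregate and its parts.
data class FullName(val firstName: String, val lastName: String)

// Same hypothetical parsing rule as before, but with a self-describing return type.
fun String.parseName(): FullName? {
    val trimmed = trim()
    val indexOfLastSpace = trimmed.lastIndexOf(' ')
    if (indexOfLastSpace < 0) return null
    return FullName(
        firstName = trimmed.take(indexOfLastSpace),
        lastName = trimmed.drop(indexOfLastSpace + 1)
    )
}

fun main() {
    val fullName = "Jane Doe"
    val (firstName, lastName) = fullName.parseName() ?: return
    println("$firstName $lastName") // Jane Doe
}
```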
It costs nearly nothing and improves the function significantly:
- The return type of this function is clearer.
- The return type is shorter and easier to pass forward.
- If a user destructures to variables with correct names but in incorrect positions, a warning will be displayed.
If you don’t want this class in a wider scope, you can restrict its visibility. It can even be private if you need to use it for some local processing only in a single file or class. It is worth using data classes instead of tuples. Classes are cheap in Kotlin, so don’t be afraid to use them in your projects.
Summary
- Use the `data` modifier for classes that are used to represent a bundle of data.
- Be careful with destructuring, and when you use it, prefer matching the variable name with the property name.
- Prefer data classes instead of tuples. Defining a data class costs little, and it makes the code more readable and less error-prone.
¹ Kotlin had support for tuples when it was still in the beta version. We were able to define a tuple by brackets and a set of types, like `(Int, String, String, Long)`. What we achieved behaved the same as data classes in the end, but it was far less readable. Can you guess what type this set of types represents? It can be anything. Using tuples is tempting, but using data classes is nearly always better. This is why tuples were removed and only `Pair` and `Triple` are left.