Data classes in Kotlin
This is a chapter from the book Kotlin Essentials. You can find it on LeanPub or Amazon. It is also available as a course.
In Kotlin, we say that all classes inherit from the Any superclass, which is at the top of the class hierarchy [3]. Methods defined in Any can be called on all objects. These methods are:
- equals - used when two objects are compared using ==,
- hashCode - used by collections that use the hash table algorithm,
- toString - used to represent an object as a string, e.g., in a string template or the print function.
Thanks to these methods, we can represent any object as a string or check the equality of any two objects.
Truth be told, Any is represented as a class, but it should actually be considered the head of the type hierarchy, albeit one with some special functions. Consider the fact that Any is also the supertype of all interfaces, even though interfaces cannot inherit from classes.
The default implementations of equals
, hashCode
, and toString
are strongly based on the object’s address in memory. The equals
method returns true
only when the addresses of both objects are the same, which means the same object is on both sides. The hashCode
method typically transforms an address into a number. toString
produces a string that starts with the class name, then the at sign "@", then the unsigned hexadecimal representation of the hash code of the object.
By overriding these methods, we can decide how a class should behave. Consider the following class A
, which is equal to other instances of the same class and returns a constant hash code and string representation.
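A minimal sketch of such a class might look as follows. The constant values (123 and "A()") are chosen only for illustration.

```kotlin
// Every instance of A is considered equal to every other instance of A.
class A {
    override fun equals(other: Any?): Boolean = other is A

    // Equal objects must share a hash code, so a constant is consistent here.
    override fun hashCode(): Int = 123

    override fun toString(): String = "A()"
}

fun main() {
    val a1 = A()
    val a2 = A()
    println(a1 == a2)      // true
    println(a1.hashCode()) // 123
    println(a1)            // A()
}
```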
I've dedicated separate items in the Effective Kotlin book to implementing a custom equals
and hashCode
[0], but in practice we rarely need to do that. As it turns out, in modern projects we operate almost exclusively on two kinds of objects:
- Active objects, like services, controllers, repositories, etc. Such classes don’t need to override any methods from Any because the default behavior is perfect for them.
- Data model class objects, which represent bundles of data. For such objects, we use the data modifier, which overrides the toString, equals, and hashCode methods. The data modifier also implements the methods copy and componentN (component1, component2, etc.), which are not inherited and cannot be modified [1].
Let's discuss the aforementioned implicit data class methods and the differences between regular class behavior and data class behavior.
Transforming to a string
The default toString
transformation produces a string that starts with the class name, then the at sign "@", and then the unsigned hexadecimal representation of the hash code of the object. The purpose of this is to display the class name and to determine whether two strings represent the same object or not.
With the data
modifier, the compiler generates a toString
that displays the class name and then pairs with the name and value for each primary constructor property. We assume that data classes are represented by their primary constructor properties, so all these properties, together with their values, are displayed during a transformation to a string. This is useful for logging and debugging.
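The difference can be sketched as follows, assuming the JVM. The class names and values here are made up for illustration, and the part after "@" in the first output varies between runs.

```kotlin
// Regular class: keeps the default toString from Any.
class FakeUserRepository

// Data class: toString is generated from primary constructor properties.
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    println(FakeUserRepository())     // e.g. FakeUserRepository@8efb846
    println(Player(0, "Gecko", 9999)) // Player(id=0, name=Gecko, points=9999)
}
```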
Object equality
In Kotlin, we check the equality of two objects using ==
, which uses the equals
method from Any
. So, this method decides if two objects should be considered equal or not. By default, two different instances are never equal. This is perfect for active objects, i.e., objects that work independently of other instances of the same class and possibly have a protected mutable state.
Classes with the data
modifier represent bundles of data; they are considered equal to other instances if:
- both are of the same class,
- their primary constructor property values are equal.
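The two behaviors can be compared side by side. This is a small sketch with illustrative class names: PlainPlayer keeps the default equality, while Player uses the one generated by the data modifier.

```kotlin
// Regular class: two different instances are never equal.
class PlainPlayer(val id: Int, val name: String)

// Data class: equality is based on primary constructor properties.
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val plain = PlainPlayer(0, "Gecko")
    println(plain == plain)                   // true
    println(plain == PlainPlayer(0, "Gecko")) // false

    val player = Player(0, "Gecko", 9999)
    println(player == Player(0, "Gecko", 9999)) // true
    println(player == Player(0, "Ross", 9999))  // false
}
```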
This is what a simplified implementation of the equals
method generated by the data
modifier for the Player
class looks like:
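A hand-written approximation could look like the sketch below. It assumes a Player class with id, name, and points properties; the code the compiler actually generates may differ in details. A matching hashCode is included only to keep the two methods consistent, and the multiplier 31 is one common choice, not a requirement.

```kotlin
// Regular (non-data) class with equality written by hand.
class Player(val id: Int, val name: String, val points: Int) {
    override fun equals(other: Any?): Boolean =
        this === other ||
            (other is Player &&
                other.id == id &&
                other.name == name &&
                other.points == points)

    // Combines the hash codes of the same properties that equals compares.
    override fun hashCode(): Int {
        var result = id
        result = 31 * result + name.hashCode()
        result = 31 * result + points
        return result
    }
}

fun main() {
    println(Player(0, "Gecko", 9999) == Player(0, "Gecko", 9999)) // true
    println(Player(0, "Gecko", 9999) == Player(1, "Gecko", 9999)) // false
}
```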
Implementing a custom
equals
is described in Effective Kotlin, Item 42: Respect the contract of equals.
Hash code
Another method from Any
is hashCode
, which is used to transform an object into an Int
. With a hashCode
method, the object instance can be stored in the hash table data structure implementations that are part of many popular classes, including HashSet
and HashMap
. The most important rules of a hashCode implementation are that it should:
- be consistent with equals, so it should return the same Int for equal objects, and it must always return the same hash code for the same object,
- spread objects as uniformly as possible in the range of all possible Int values.
The default hashCode
is based on an object's address in memory. The hashCode
generated by the data
modifier is based on the hash codes of this object’s primary constructor properties. In both cases, the same number is returned for equal objects.
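The practical consequence is easiest to see with a hash-based collection. In this sketch (the Player class and its values are illustrative), two equal data class instances share a hash code, so a HashSet can find one using the other.

```kotlin
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val p1 = Player(0, "Gecko", 9999)
    val p2 = Player(0, "Gecko", 9999)

    // Equal objects return the same hash code.
    println(p1.hashCode() == p2.hashCode()) // true

    // Lookup in a hash table relies on hashCode first, then on equals.
    val seen = hashSetOf(p1)
    println(p2 in seen)                    // true
    println(Player(1, "Ross", 10) in seen) // false
}
```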
To learn more about the hash table algorithm and implementing a custom hashCode
method, see Effective Kotlin, Item 43: Respect the contract of hashCode
.
Copying objects
Another method generated by the data
modifier is copy
, which is used to create a new instance of a class but with a concrete modification. The idea is very simple: it is a function with parameters for each primary constructor property, but each of these parameters has a default value, i.e., the current value of the associated property.
This means we can call copy
with no parameters to make a copy of our object with no modifications, but we can also specify new values for the properties we want to change.
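For instance, with an illustrative User class, copy could be used like this:

```kotlin
data class User(val name: String, val surname: String)

fun main() {
    val user = User("John", "Doe")

    // No arguments: a new object with the same property values.
    println(user.copy())              // User(name=John, surname=Doe)

    // Named argument: only the given property changes.
    println(user.copy(name = "Jane")) // User(name=Jane, surname=Doe)

    // The original object is left untouched.
    println(user)                     // User(name=John, surname=Doe)
}
```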
Note that copy
creates a shallow copy of an object; so, if our object holds a mutable state, a change in one object will be a change in all its copies too.
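The sketch below illustrates the problem. StudentGroup is a hypothetical class that holds a mutable list; after copying, both objects still point to the same list.

```kotlin
data class StudentGroup(val name: String, val students: MutableList<String>)

fun main() {
    val group = StudentGroup("A", mutableListOf("Alice"))
    val copied = group.copy(name = "B")

    // The list itself was not copied, only the reference to it.
    copied.students.add("Bob")

    println(group.students)                     // [Alice, Bob]
    println(group.students === copied.students) // true
}
```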
We do not have this problem when we use copy
for immutable classes, i.e., classes with only val
properties that hold immutable values. copy
was introduced as special support for immutability (for details, see Effective Kotlin, Item 1: Limit mutability).
Notice that data classes are unsuitable for objects that must maintain invariant constraints on mutable properties. For example, in the User
example below, the class would not be able to guarantee that the name
and surname
values are not blank if these variables were mutable (so, defined with var
). Data classes are a perfect fit for immutable properties, whose constraints can be checked when these objects are created. In the example below, we can be sure that the name
and surname
values are not blank in an instance of User
.
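One way to express such constraints is an init block with require calls, as in this sketch (the error messages are illustrative). Since copy creates the new object through the primary constructor, the checks also run for copies.

```kotlin
data class User(val name: String, val surname: String) {
    init {
        require(name.isNotBlank()) { "name must not be blank" }
        require(surname.isNotBlank()) { "surname must not be blank" }
    }
}

fun main() {
    val user = User("John", "Doe")
    println(user) // User(name=John, surname=Doe)

    // A blank name is rejected, both on creation and on copy.
    println(runCatching { User("", "Doe") }.isFailure)         // true
    println(runCatching { user.copy(name = " ") }.isFailure)   // true
}
```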
Destructuring
Kotlin supports a feature called position-based destructuring, which lets us assign multiple variables to components of a single object. For that, we place our variable names in round brackets.
This mechanism relies on position, not names. The object on the right side of the equals sign needs to provide the functions component1
, component2
, etc., and the variables are assigned to the results of these methods.
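As a sketch, with an illustrative Player data class, a destructuring declaration and the roughly equivalent explicit calls look like this:

```kotlin
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)

    // Variables are assigned by position, not by name.
    val (id, name, pts) = player
    println(id)   // 0
    println(name) // Gecko
    println(pts)  // 9999

    // Roughly what the declaration above does under the hood.
    val id2 = player.component1()
    val name2 = player.component2()
    val pts2 = player.component3()
    println(id == id2 && name == name2 && pts == pts2) // true
}
```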
This code works because the data
modifier generates componentN
functions for each primary constructor parameter, according to their order in the constructor.
These are currently all the functionalities that the data
modifier provides. Don't use it if you don't need toString
, equals
, hashCode
, copy
or destructuring. If you need some of these functionalities for a class representing a bundle of data, use the data
modifier instead of implementing the methods yourself.
When and how should we use destructuring?
Position-based destructuring has pros and cons. Its biggest advantage is that we can name variables however we want, so we can use names like country
and city
in the example below. We can also destructure anything we want as long as it provides componentN
functions. This includes List
and Map.Entry
, both of which have componentN
functions defined as extensions:
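A short sketch of both cases follows. The describeTrip helper and the sample data are hypothetical and exist only to make the example self-contained.

```kotlin
// Destructures each Map.Entry into freely named variables.
fun describeTrip(trip: Map<String, String>): List<String> =
    trip.map { (country, city) -> "We loved $city in $country" }

fun main() {
    // List provides component1..component5 as extension functions.
    val (first, second, third) = listOf("a", "b", "c")
    println("$first $second $third") // a b c

    val trip = mapOf("Spain" to "Gran Canaria", "Italy" to "Rome")

    // Map.Entry provides component1 (key) and component2 (value).
    for ((country, city) in trip) {
        println("We loved $city in $country")
    }
    // We loved Gran Canaria in Spain
    // We loved Rome in Italy

    println(describeTrip(trip).size) // 2
}
```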
On the other hand, position-based destructuring is dangerous. We need to adjust every destructuring when the order or number of elements in a data class changes. When we use this feature, it is very easy to introduce errors into our code by changing the order of the primary constructor’s properties.
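To see how this can go wrong, suppose a hypothetical FullName class originally declared firstName before lastName and the order was later swapped. Destructuring written for the old order still compiles because both properties are of type String, but the values end up in the wrong variables.

```kotlin
// After the (hypothetical) refactoring, lastName comes first.
data class FullName(val lastName: String, val firstName: String)

fun main() {
    val elon = FullName("Musk", "Elon")

    // Written for the old order; assigned by position, so the names lie.
    val (firstName, lastName) = elon
    println("$firstName $lastName") // Musk Elon
}
```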
We need to be careful with destructuring. It is useful to use the same names as data class primary constructor properties. In the case of an incorrect order, an IntelliJ/Android Studio warning will be shown. It might even be useful to upgrade this warning to an error.
Destructuring a single value in a lambda is very confusing, especially since parentheses around lambda parameters are optional in some languages and required in others.
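In Kotlin, the parentheses change the meaning. In this sketch with an illustrative User class, the first lambda receives the whole object, while the second one destructures it and receives only its first component.

```kotlin
data class User(val name: String, val surname: String)

fun main() {
    val user = User("John", "Doe")

    // Without parentheses: a is the whole User.
    user.let { a -> println(a) }   // User(name=John, surname=Doe)

    // With parentheses: a is user.component1(), so just the name.
    user.let { (a) -> println(a) } // John
}
```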
Data class limitations
The idea behind data classes is that they represent a bundle of data; their constructors allow us to specify all this data, and we can access it through destructuring or by copying them to another instance with the copy
method. This is why only primary constructor properties are considered by the methods defined in data classes.
Data classes are supposed to keep all the essential properties in their primary constructor. Inside the body, we should only keep redundant immutable properties, which means properties whose value is fully determined by primary constructor properties, like fullName
, which is calculated from name
and surname
. Such values are also ignored by data class methods, but their value will always be correct because it will be calculated when a new object is created.
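A sketch of such a class could look as follows. The User class is illustrative; note that fullName does not appear in the generated toString, yet it is still correct after copy because the new object is built through the constructor.

```kotlin
data class User(val name: String, val surname: String) {
    // Redundant property: derived from constructor properties on creation.
    val fullName = "$name $surname"
}

fun main() {
    val user = User("John", "Doe")
    println(user.fullName) // John Doe
    println(user)          // User(name=John, surname=Doe)

    val jane = user.copy(name = "Jane")
    println(jane.fullName) // Jane Doe
}
```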
You should also remember that data classes must be final and so cannot be used as a super-type for inheritance.
Prefer data classes instead of tuples
Data classes offer more than what is generally provided by tuples. Historically, they replaced tuples in Kotlin since they are considered better practice [2]. The only tuples that are left are Pair
and Triple
, but these are data classes under the hood:
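Their data class nature shows in how they behave. The sketch below uses only the standard library; Pair exposes first and second, and Triple adds third. Both print their values in parentheses rather than in the usual data class format.

```kotlin
fun main() {
    val pair = Pair("ABC", 123)
    println(pair.first)               // ABC
    println(pair.second)              // 123
    println(pair)                     // (ABC, 123)
    println(pair == Pair("ABC", 123)) // true

    // copy and destructuring work as for any data class.
    println(pair.copy(second = 456))  // (ABC, 456)
    val (text, number) = pair
    println("$text $number")          // ABC 123

    val triple = Triple(1, "x", true)
    println(triple)                   // (1, x, true)
    println(triple.third)             // true
}
```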
The easiest way to create a Pair
is by using the to
function. This is a generic infix extension function, defined as follows (we will discuss both generic and extension functions in later chapters).
Thanks to the infix modifier, a method can be used by placing its name between arguments, as the infix name suggests. The result Pair
is typed, so the result type from the "ABC" to 123
expression is Pair<String, Int>
.
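To illustrate the shape of such a function without redefining the standard library one, the sketch below declares a look-alike under the hypothetical name pairedWith and compares it with to.

```kotlin
// Hypothetical function that mirrors the shape of the stdlib `to`:
// generic, infix, and defined as an extension on any type A.
infix fun <A, B> A.pairedWith(that: B): Pair<A, B> = Pair(this, that)

fun main() {
    // The result is typed: Pair<String, Int>.
    val p1: Pair<String, Int> = "ABC" to 123
    val p2: Pair<String, Int> = "ABC" pairedWith 123

    println(p1)       // (ABC, 123)
    println(p1 == p2) // true

    // An infix function can still be called with regular syntax.
    println("ABC".pairedWith(123) == p1) // true
}
```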
These tuples remain because they are very useful for local purposes, like:
- When we immediately name values:
- To represent an aggregate that is not known in advance, as is commonly the case in standard library functions:
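Both uses can be sketched with the standard library alone. The thresholds and labels in the first part are made up for illustration; partition and toList are standard library functions that return Pair-based results.

```kotlin
fun main() {
    val value = 42

    // Values are named immediately, so the Pair never leaks out.
    val (description, color) = when {
        value < 5 -> "cold" to "blue"
        value < 50 -> "mild" to "yellow"
        else -> "hot" to "red"
    }
    println("$description $color") // mild yellow

    // partition cannot know in advance what the two lists mean.
    val (odd, even) = (1..10).partition { it % 2 == 1 }
    println(odd)  // [1, 3, 5, 7, 9]
    println(even) // [2, 4, 6, 8, 10]

    // toList represents each map entry as a Pair of key and value.
    val map = mapOf(1 to "San Francisco", 2 to "Amsterdam")
    val list: List<Pair<Int, String>> = map.toList()
    println(list) // [(1, San Francisco), (2, Amsterdam)]
}
```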
In other cases, we prefer data classes. Take a look at an example: let’s say that we need a function that parses a full name into a name and a surname. One might represent this name and surname as a Pair<String, String>
:
The problem is that when someone reads this code, it is not clear that Pair<String, String>
represents a full name. What is more, it is not clear what the order of the values is, therefore someone might think that the surname goes first:
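A sketch of such a function and of the possible misreading is shown below. The parseName implementation is a hypothetical, simplified one that splits on the last space; the point is the return type, not the parsing.

```kotlin
// Hypothetical parser: everything before the last space is the name,
// everything after it is the surname.
fun parseName(fullName: String): Pair<String, String>? {
    val trimmed = fullName.trim()
    val index = trimmed.lastIndexOf(' ')
    if (index < 0) return null
    return Pair(trimmed.substring(0, index), trimmed.substring(index + 1))
}

fun main() {
    // Nothing stops the caller from assuming the surname goes first.
    val (surname, name) = parseName("Marcin Moskala") ?: return
    println("His name is $name")       // His name is Moskala
    println("His surname is $surname") // His surname is Marcin
}
```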
To make usage safer and the function easier to read, we should use a data class instead:
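The same hypothetical parser could be sketched with a dedicated data class as its return type. The FullName class and its property names are assumptions chosen to match the text.

```kotlin
data class FullName(val name: String, val surname: String)

// Hypothetical parser: splits on the last space, as a simplification.
fun parseName(fullName: String): FullName? {
    val trimmed = fullName.trim()
    val index = trimmed.lastIndexOf(' ')
    if (index < 0) return null
    return FullName(
        name = trimmed.substring(0, index),
        surname = trimmed.substring(index + 1)
    )
}

fun main() {
    val parsed = parseName("Marcin Moskala") ?: return

    // Properties are accessed by name, so the order cannot be confused.
    println(parsed.name)    // Marcin
    println(parsed.surname) // Moskala

    // Destructuring still works when variable names match the properties.
    val (name, surname) = parsed
    println("$name $surname") // Marcin Moskala
}
```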
This costs nearly nothing and improves the function significantly:
- The return type of this function is clearer.
- The return type is shorter and easier to pass forward.
- If a user destructures variables with correct names but in incorrect positions, a warning will be displayed in IntelliJ.
If you don’t want this class in a wider scope, you can restrict its visibility. It can even be private if you only need to use it for some local processing in a single file or class. It is worth using data classes instead of tuples. Classes are cheap in Kotlin, so don’t be afraid to use them in your projects.
Summary
In this chapter, we've learned about Any
, which is a superclass of all classes. We’ve also learned about methods defined by Any
: equals
, hashCode
, and toString
. We’ve also learned that there are two primary types of objects. Regular objects are considered unique and do not expose their details. Data class objects, which we made using the data
modifier, represent bundles of data (we keep them in primary constructor properties). They are equal when they hold the same data. When transformed to a string, they print all their data. They additionally support destructuring and making a copy with the copy
method. Two generic data classes in Kotlin stdlib are Pair
and Triple
, but (apart from certain cases) we prefer to use custom data classes instead of these. Also, for the sake of safety, when we destructure a data class, we prefer to match the variable names with the parameter names.
Now, let's move on to a topic dedicated to special Kotlin syntax that lets us create objects without defining a class.
[0] These are Item 42: Respect the contract of equals and Item 43: Respect the contract of hashCode.
[1] This type of class is so popular that in Java it is common practice to auto-generate equals, hashCode, and toString in IntelliJ or using the Lombok library.
[2] Kotlin had support for tuples when it was still in the beta version. We were able to define a tuple by brackets and a set of types, like (Int, String, String, Long). What we achieved behaved the same as data classes in the end, but it was far less readable. Can you guess what type this set of types represents? It can be anything. Using tuples is tempting, but using data classes is nearly always better. This is why tuples were removed, and only Pair and Triple are left.
[3] So Any is an analog to Object in Java, JavaScript or C#. There is no direct analog in C++.