Data classes in Kotlin
In Kotlin, we say that all classes inherit from the Any superclass, which is at the top of the class hierarchy [3]. Methods defined in Any can be called on all objects. These methods are:
- equals: used when two objects are compared using ==,
- hashCode: used by collections that use the hash table algorithm,
- toString: used to represent an object as a string, e.g., in a string template or the print function.
Thanks to these methods, we can represent any object as a string or check the equality of any two objects.
Truth be told, Any is represented as a class, but it should actually be considered the head of the type hierarchy with some special functions. Consider the fact that Any is also the supertype of all interfaces, even though interfaces cannot inherit from classes.
The default implementations of equals, hashCode, and toString are strongly based on the object’s address in memory. The equals method returns true only when the address of both objects is the same, which means the same object is on both sides. The hashCode method typically transforms an address into a number. The default toString produces a string that starts with the class name, then the at sign "@", then the unsigned hexadecimal representation of the hash code of the object.
By overriding these methods, we can decide how a class should behave. Consider the following class A, which is equal to other instances of the same class and returns a constant hash code and string representation.
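A minimal sketch of such a class could look as follows. The constant hash code 1 and the string "A()" are arbitrary values chosen for illustration.

```kotlin
class A {
    // Every instance of A is treated as equal to every other instance of A
    override fun equals(other: Any?): Boolean = other is A

    // Equal objects must share a hash code, so a constant satisfies the contract
    override fun hashCode(): Int = 1

    override fun toString(): String = "A()"
}

fun main() {
    val a1 = A()
    val a2 = A()
    println(a1 == a2)      // true
    println(a1.hashCode()) // 1
    println(a1)            // A()
}
```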
I've dedicated separate items in the Effective Kotlin book to implementing a custom equals and hashCode [0], but in practice we rarely need to do that. As it turns out, in modern projects we operate almost exclusively on two kinds of objects:
- Active objects, like services, controllers, repositories, etc. Such classes don’t need to override any methods from Any because the default behavior is perfect for them.
- Data model class objects, which represent bundles of data. For such objects, we use the data modifier, which overrides the toString, equals, and hashCode methods. The data modifier also implements the methods copy and componentN (component1, component2, etc.), which are not inherited and cannot be modified [1].
Let's discuss the aforementioned implicit data class methods and the differences between regular class behavior and data class behavior.
Transforming to a string
The default toString transformation produces a string that starts with the class name, then the at sign "@", and then the unsigned hexadecimal representation of the hash code of the object. The purpose of this is to display the class name and to determine whether two strings represent the same object or not.
With the data modifier, the compiler generates a toString that displays the class name and then a name-value pair for each primary constructor property. We assume that data classes are represented by their primary constructor properties, so all these properties, together with their values, are displayed during a transformation to a string. This is useful for logging and debugging.
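The difference can be sketched with two hypothetical classes, FakeUserRepository (a regular class) and Player (a data class). The hash part of the first output varies between runs, so the value in the comment is only an example.

```kotlin
class FakeUserRepository

data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val repository = FakeUserRepository()
    println(repository) // e.g. FakeUserRepository@8efb846

    val player = Player(0, "Gecko", 9999)
    println(player) // Player(id=0, name=Gecko, points=9999)
}
```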
In Kotlin, we check the equality of two objects using ==, which uses the equals method from Any. So, this method decides if two objects should be considered equal or not. By default, two different instances are never equal. This is perfect for active objects, i.e., objects that work independently of other instances of the same class and possibly have a protected mutable state.
Classes with the data modifier represent bundles of data; they are considered equal to other instances if:
- both are of the same class,
- their primary constructor property values are equal.
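The two behaviors can be compared side by side. FakeUserRepository and Player are hypothetical classes used only for this sketch.

```kotlin
class FakeUserRepository

data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val repo1 = FakeUserRepository()
    val repo2 = FakeUserRepository()
    println(repo1 == repo1) // true, the same instance
    println(repo1 == repo2) // false, different instances

    val player = Player(0, "Gecko", 9999)
    println(player == Player(0, "Gecko", 9999)) // true, same class and values
    println(player == Player(0, "Ross", 9999))  // false, name differs
}
```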
This is what a simplified implementation of the equals method generated by the data modifier for the Player class looks like:
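The following is a hand-written approximation, assuming Player has the properties id, name, and points. It is written on a regular class and is not the exact code the compiler emits. hashCode is overridden as well, only to keep it consistent with equals.

```kotlin
class Player(val id: Int, val name: String, val points: Int) {
    // Equal only to another Player with the same primary constructor values
    override fun equals(other: Any?): Boolean =
        other is Player &&
            other.id == this.id &&
            other.name == this.name &&
            other.points == this.points

    // Combines the hash codes of the same properties that equals compares
    override fun hashCode(): Int = 31 * (31 * id + name.hashCode()) + points
}

fun main() {
    println(Player(0, "Gecko", 9999) == Player(0, "Gecko", 9999)) // true
    println(Player(0, "Gecko", 9999) == Player(1, "Gecko", 9999)) // false
}
```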
Implementing a custom equals is described in Effective Kotlin, Item 42: Respect the contract of equals.
Another method from Any is hashCode, which is used to transform an object into an Int. With a hashCode method, the object instance can be stored in the hash table data structure implementations that are part of many popular classes, including HashMap. The most important rules of the hashCode implementation are that it should:
- be consistent with equals, so it should return the same Int for equal objects, and it must always return the same hash code for the same object,
- spread objects as uniformly as possible in the range of all possible Int values.
The default hashCode is based on an object's address in memory. The hashCode generated by the data modifier is based on the hash codes of this object’s primary constructor properties. In both cases, the same number is returned for equal objects.
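A small sketch of why this consistency matters, using a hypothetical Player data class: two equal instances produce the same hash code, so a hash-based collection finds one when given the other.

```kotlin
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)
    val samePlayer = Player(0, "Gecko", 9999)

    // Equal data class instances share a hash code
    println(player.hashCode() == samePlayer.hashCode()) // true

    // Lookup in a hash set relies on both hashCode and equals
    val seen = hashSetOf(player)
    println(samePlayer in seen) // true
}
```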
To learn more about the hash table algorithm and implementing a custom hashCode method, see Effective Kotlin, Item 43: Respect the contract of hashCode.
Another method generated by the data modifier is copy, which is used to create a new instance of a class but with a concrete modification. The idea is very simple: it is a function with parameters for each primary constructor property, but each of these parameters has a default value, i.e., the current value of the associated property.
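To make the idea concrete, here is roughly what such a function would look like if we wrote it by hand on a regular class. This is a sketch for a hypothetical Player class, not the code the compiler generates.

```kotlin
class Player(val id: Int, val name: String, val points: Int) {
    // Each parameter defaults to the current value of the matching property
    fun copy(
        id: Int = this.id,
        name: String = this.name,
        points: Int = this.points
    ): Player = Player(id, name, points)
}

fun main() {
    val player = Player(0, "Gecko", 9999)
    val updated = player.copy(points = 10000)
    println("${updated.id} ${updated.name} ${updated.points}") // 0 Gecko 10000
}
```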
This means we can call copy with no parameters to make a copy of our object with no modifications, but we can also specify new values for the properties we want to change.
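A short usage sketch with a hypothetical Player data class:

```kotlin
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val p = Player(0, "Gecko", 9999)

    println(p.copy())
    // Player(id=0, name=Gecko, points=9999)

    println(p.copy(id = 1, name = "New name"))
    // Player(id=1, name=New name, points=9999)

    println(p.copy(points = p.points + 1))
    // Player(id=0, name=Gecko, points=10000)
}
```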
copy creates a shallow copy of an object; so, if our object holds a mutable state, a change made to that state through one object will be visible in all its copies too.
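The effect can be reproduced with a hypothetical StudentGrades class that holds a mutable list. Both instances end up pointing at the same list.

```kotlin
data class StudentGrades(
    val studentId: String,
    // Mutable state: this reference is shared by every shallow copy
    val grades: MutableList<Int>
)

fun main() {
    val grades1 = StudentGrades("1", mutableListOf())
    val grades2 = grades1.copy(studentId = "2")

    grades1.grades.add(5)

    println(grades1) // StudentGrades(studentId=1, grades=[5])
    println(grades2) // StudentGrades(studentId=2, grades=[5])
}
```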
We do not have this problem when we use copy for immutable classes, i.e., classes with only val properties that hold immutable values. copy was introduced as special support for immutability (for details, see Effective Kotlin, Item 1: Limit mutability).
Notice that data classes are unsuitable for objects that must maintain invariant constraints on mutable properties. For example, in the User example below, the class would not be able to guarantee that the name and surname values are not blank if these variables were mutable (so, defined with var). Data classes are a perfect fit for immutable properties, whose constraints can be checked when the object is created. In the example below, we can be sure that the name and surname values are not blank in an instance of User.
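A minimal sketch of such a User class, assuming the constraints are checked with require in an init block (the error messages are illustrative). Because copy goes through the constructor, the same checks also apply to copies.

```kotlin
data class User(val name: String, val surname: String) {
    init {
        // Runs on every construction, including instances created by copy
        require(name.isNotBlank()) { "Name must not be blank" }
        require(surname.isNotBlank()) { "Surname must not be blank" }
    }
}

fun main() {
    val user = User("Jane", "Doe")
    println(user) // User(name=Jane, surname=Doe)

    // Both attempts fail with IllegalArgumentException
    println(runCatching { User("", "Doe") }.isFailure)         // true
    println(runCatching { user.copy(surname = " ") }.isFailure) // true
}
```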
Kotlin supports a feature called position-based destructuring, which lets us assign multiple variables to components of a single object. For that, we place our variable names in round brackets.
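For instance, with a hypothetical Player data class:

```kotlin
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)

    // Three variables are declared at once, assigned by position
    val (id, name, pts) = player

    println(id)   // 0
    println(name) // Gecko
    println(pts)  // 9999
}
```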
This mechanism relies on position, not names. The object on the right side of the equals sign needs to provide the functions component1, component2, etc., and the variables are assigned to the results of these functions.
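In other words, a destructuring declaration behaves roughly like a series of explicit calls. The sketch below spells them out for a hypothetical Player data class.

```kotlin
data class Player(val id: Int, val name: String, val points: Int)

fun main() {
    val player = Player(0, "Gecko", 9999)

    // Roughly what `val (id, name, pts) = player` stands for
    val id: Int = player.component1()
    val name: String = player.component2()
    val pts: Int = player.component3()

    println("$id $name $pts") // 0 Gecko 9999
}
```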
This code works because the data modifier generates componentN functions for each primary constructor parameter, according to their order in the constructor.
These are currently all the functionalities that the data modifier provides. Don't use it if you don't need toString, equals, hashCode, copy, or destructuring. If you need some of these functionalities for a class representing a bundle of data, use the data modifier instead of implementing the methods yourself.
When and how should we use destructuring?
Position-based destructuring has pros and cons. Its biggest advantage is that we can name variables however we want, so we can use names like city in the example below. We can also destructure anything we want as long as it provides componentN functions. This includes List and Map.Entry, both of which have componentN functions defined as extensions:
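A sketch of both cases. The place names and the variable names country and city are made up for illustration.

```kotlin
fun main() {
    // List provides component1 to component5 as extension functions
    val visited = listOf("Spain", "Morocco", "India")
    val (first, second, third) = visited
    println("$first $second $third") // Spain Morocco India

    // Map.Entry provides component1 (key) and component2 (value)
    val trip = mapOf(
        "Spain" to "Gran Canaria",
        "Morocco" to "Taghazout",
        "India" to "Rishikesh"
    )
    for ((country, city) in trip) {
        println("We loved $city in $country")
    }
    // We loved Gran Canaria in Spain
    // We loved Taghazout in Morocco
    // We loved Rishikesh in India
}
```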
On the other hand, position-based destructuring is dangerous. We need to adjust every destructuring when the order or number of elements in a data class changes. When we use this feature, it is very easy to introduce errors into our code by changing the order of the primary constructor’s properties.
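The following sketch shows how easily this goes wrong. Assume a hypothetical FullName class that originally had only a first and a last name, and that a middle property was added later. The destructuring still compiles, but surname now silently receives the wrong value.

```kotlin
data class FullName(
    val firstName: String,
    val secondName: String, // added later, which shifted lastName to position 3
    val lastName: String
)

fun main() {
    val elon = FullName("Elon", "Reeve", "Musk")

    // Written when lastName was still the second property
    val (name, surname) = elon

    println("It is $name $surname!") // It is Elon Reeve!
}
```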
We need to be careful with destructuring. It is useful to use the same names as data class primary constructor properties. In the case of an incorrect order, an IntelliJ/Android Studio warning will be shown. It might even be useful to upgrade this warning to an error.
Destructuring a single value in a lambda is very confusing, especially since parentheses around lambda parameters are optional in some languages and required in others.
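A short sketch of the two forms, using a hypothetical User data class. The only visible difference is a pair of parentheses, yet the meaning changes completely.

```kotlin
data class User(val name: String, val surname: String)

fun main() {
    val user = User("John", "Doe")

    // No parentheses: `a` is the whole User
    user.let { a -> println(a) } // User(name=John, surname=Doe)

    // Parentheses: `a` is destructured from the User, so it is component1()
    user.let { (a) -> println(a) } // John
}
```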
Data class limitations
The idea behind data classes is that they represent a bundle of data; their constructors allow us to specify all this data, and we can access it through destructuring or by copying them to another instance with the copy method. This is why only primary constructor properties are considered by the methods defined in data classes.
Data classes are supposed to keep all the essential properties in their primary constructor. Inside the body, we should only keep redundant immutable properties, which means properties whose value is fully determined by the primary constructor properties, like fullName, which is calculated from name and surname. Such values are also ignored by data class methods, but their value will always be correct because it will be calculated when a new object is created.
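A sketch of such a class, assuming a hypothetical User with a derived fullName property declared in the body:

```kotlin
data class User(val name: String, val surname: String) {
    // Derived from primary constructor properties, recomputed for every new instance
    val fullName = "$name $surname"
}

fun main() {
    val user = User("Jane", "Doe")

    // Body properties are not part of the generated toString
    println(user) // User(name=Jane, surname=Doe)

    // copy builds a new object, so fullName stays consistent
    println(user.copy(surname = "Smith").fullName) // Jane Smith
}
```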
You should also remember that data classes must be final, so they cannot be used as a supertype for inheritance.
Prefer data classes instead of tuples
Data classes offer more than what is generally provided by tuples. Historically, they replaced tuples in Kotlin since they are considered better practice [2]. The only tuples that are left are Pair and Triple, but these are data classes under the hood:
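Rather than repeating the standard library declarations, the sketch below shows the data class behavior from the outside: value-based equality, copy, and destructuring all work on Pair and Triple.

```kotlin
fun main() {
    val pair = Pair(1, "A")
    println(pair)                    // (1, A)
    println(pair == Pair(1, "A"))    // true
    println(pair.copy(second = "B")) // (1, B)

    val (number, letter) = pair
    println("$number $letter") // 1 A

    val triple = Triple(1, "A", true)
    println(triple)       // (1, A, true)
    println(triple.third) // true
}
```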
The easiest way to create a Pair is by using the to function. This is a generic infix extension function, defined as follows (we will discuss both generic and extension functions in later chapters).
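Since the standard library function cannot be redeclared here under the same name, the sketch below uses a hypothetical stand-in called pairedWith that has the same shape: a generic infix extension function that wraps its receiver and its argument in a Pair.

```kotlin
// Hypothetical stand-in that mirrors the shape of the standard library's `to`
infix fun <A, B> A.pairedWith(that: B): Pair<A, B> = Pair(this, that)

fun main() {
    val fromStdlib: Pair<String, Int> = "ABC" to 123
    val fromSketch: Pair<String, Int> = "ABC" pairedWith 123

    println(fromStdlib)               // (ABC, 123)
    println(fromStdlib == fromSketch) // true
}
```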
Thanks to the infix modifier, a function can be called by placing its name between its arguments, as the name infix suggests. The resulting Pair is typed, so the type of the "ABC" to 123 expression is Pair<String, Int>.
These tuples remain because they are very useful for local purposes, like:
- When we immediately name values:
- To represent an aggregate that is not known in advance, as is commonly the case in standard library functions:
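Both uses can be sketched as follows. describeTemperature is a hypothetical helper introduced only for this example; partition and mapOf are standard library functions.

```kotlin
// Hypothetical helper: returns a description together with a color name
fun describeTemperature(degrees: Int): Pair<String, String> = when {
    degrees < 5 -> "cold" to "blue"
    degrees < 23 -> "mild" to "yellow"
    else -> "hot" to "red"
}

fun main() {
    // Immediately naming both values of the returned Pair
    val (description, color) = describeTemperature(20)
    println("$description, $color") // mild, yellow

    // partition cannot know what the two groups mean, so it returns a Pair
    val numbers = listOf(1, 2, 3, 4, 5)
    val (odd, even) = numbers.partition { it % 2 == 1 }
    println(odd)  // [1, 3, 5]
    println(even) // [2, 4]

    // mapOf accepts key-value pairs
    val map = mapOf(1 to "San Francisco", 2 to "Amsterdam")
    println(map[2]) // Amsterdam
}
```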
In other cases, we prefer data classes. Take a look at an example: let’s say that we need a function that parses a full name into a name and a surname. One might represent this name and surname as a Pair<String, String>:
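One possible shape of such a function is sketched below. The name parseName and the rule of splitting on the last space are assumptions made for illustration.

```kotlin
// Hypothetical parser: everything before the last space is the first name
fun String.parseName(): Pair<String, String>? {
    val trimmed = trim()
    val indexOfLastSpace = trimmed.lastIndexOf(' ')
    if (indexOfLastSpace < 0) return null
    val firstName = trimmed.take(indexOfLastSpace)
    val lastName = trimmed.drop(indexOfLastSpace + 1)
    return Pair(firstName, lastName)
}

fun main() {
    val (firstName, lastName) = "Jane Doe".parseName() ?: return
    println("$firstName $lastName") // Jane Doe
}
```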
The problem is that when someone reads this code, it is not clear that Pair<String, String> represents a full name. What is more, it is not clear what the order of the values is, so someone might think that the surname goes first:
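A sketch of that mistake, using a hypothetical parseName that returns the first name followed by the last name. Nothing in the Pair type stops the caller from swapping the variables.

```kotlin
// Hypothetical parser returning (first name, last name)
fun String.parseName(): Pair<String, String>? {
    val parts = trim().split(' ')
    if (parts.size < 2) return null
    return Pair(parts.dropLast(1).joinToString(" "), parts.last())
}

fun main() {
    // The caller wrongly assumes the surname comes first
    val (lastName, firstName) = "Jane Doe".parseName() ?: return
    println("First name: $firstName") // First name: Doe
    println("Last name: $lastName")   // Last name: Jane
}
```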
To make usage safer and the function easier to read, we should use a data class instead:
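A sketch of the data class variant, with a hypothetical FullName class and the same illustrative parsing rule (split on the last space):

```kotlin
data class FullName(val firstName: String, val lastName: String)

// Hypothetical parser: everything before the last space is the first name
fun String.parseName(): FullName? {
    val trimmed = trim()
    val indexOfLastSpace = trimmed.lastIndexOf(' ')
    if (indexOfLastSpace < 0) return null
    return FullName(
        firstName = trimmed.take(indexOfLastSpace),
        lastName = trimmed.drop(indexOfLastSpace + 1)
    )
}

fun main() {
    val parsed = "Jane Doe".parseName() ?: return
    println(parsed) // FullName(firstName=Jane, lastName=Doe)

    // Destructuring still works, and the property names document the order
    val (firstName, lastName) = parsed
    println("$firstName $lastName") // Jane Doe
}
```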
This costs nearly nothing and improves the function significantly:
- The return type of this function is clearer.
- The return type is shorter and easier to pass forward.
- If a user destructures variables with correct names but in incorrect positions, a warning will be displayed in IntelliJ.
If you don’t want this class in a wider scope, you can restrict its visibility. It can even be private if you only need to use it for some local processing in a single file or class. It is worth using data classes instead of tuples. Classes are cheap in Kotlin, so don’t be afraid to use them in your projects.
In this chapter, we've learned about Any, which is a superclass of all classes. We’ve also learned about the methods defined by Any: equals, hashCode, and toString. We’ve also learned that there are two primary types of objects. Regular objects are considered unique and do not expose their details. Data class objects, which we create using the data modifier, represent bundles of data (we keep them in primary constructor properties). They are equal when they hold the same data. When transformed to a string, they print all their data. They additionally support destructuring and making a copy with the copy method. Two generic data classes in Kotlin stdlib are Pair and Triple, but (apart from certain cases) we prefer to use custom data classes instead of these. Also, for the sake of safety, when we destructure a data class, we prefer to match the variable names with the parameter names.
Now, let's move on to a topic dedicated to special Kotlin syntax that lets us create objects without defining a class.
[0] These are Item 42: Respect the contract of equals and Item 43: Respect the contract of hashCode.
[1] This type of class is so popular that in Java it is common practice to auto-generate equals, hashCode, and toString in IntelliJ or using the Lombok library.
[2] Kotlin had support for tuples when it was still in the beta version. We were able to define a tuple by brackets and a set of types, like (Int, String, String, Long). What we achieved behaved the same as data classes in the end, but it was far less readable. Can you guess what type this set of types represents? It can be anything. Using tuples is tempting, but using data classes is nearly always better. This is why tuples were removed, and only Pair and Triple are left.
[3] Any is an analog to Object in Java.