Effective Kotlin Item 42: Respect the contract of equals
This is a chapter from the book Effective Kotlin. You can find it on LeanPub or Amazon.
In Kotlin, every object extends Any, which has a few methods with well-established contracts. These methods are:
- equals
- hashCode
- toString
Their contract is described in their comments and is elaborated in the official documentation. As I described in Item 31: Respect abstraction contracts, every subtype of a type with a contract should respect this contract. The aforementioned methods have an important position in Kotlin as they have been defined since the beginning of Java, therefore many objects and functions depend on their contracts. Breaking these contracts will often cause some objects or functions to not work properly; therefore, in this and the next items we will talk about overriding these functions and about their contracts. Let’s start with equals.
Equality
In Kotlin, there are two types of equality:
- Structural equality - checked with the equals method or the == operator (and its negated counterpart !=). a == b translates to a.equals(b) when a is not nullable, otherwise it translates to a?.equals(b) ?: (b === null).
- Referential equality - checked with the === operator (and its negated counterpart !==); returns true when both sides point to the same object.
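The difference between the two kinds of equality can be seen in a minimal sketch. The Point class and the values used here are hypothetical and serve only as an illustration:

```kotlin
// Hypothetical data holder; data classes compare their constructor properties.
data class Point(val x: Int, val y: Int)

fun main() {
    val a = Point(1, 2)
    val b = Point(1, 2)
    // A nullable value that happens to be null.
    val c: Point? = emptyList<Point>().firstOrNull()

    println(a == b)  // true: the same data, so structurally equal
    println(a === b) // false: two distinct instances
    println(a === a) // true: the same instance
    println(c == a)  // false: c?.equals(a) is null and a is not null
}
```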
Since equals is implemented in Any, which is the superclass of every class, we can check the equality of any two objects. However, using operators to check equality is not allowed when objects are not of the same type.
Objects either need to have the same type, or one needs to be a subtype of another:
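Here is a small sketch of this rule. The Animal, Book and Cat classes are hypothetical; the comparisons that the compiler is expected to reject are left as comments so that the rest can be run:

```kotlin
open class Animal
class Book
class Cat : Animal()

fun main() {
    // Unrelated types: these lines are expected to be rejected by the compiler.
    // Animal() == Book()
    // Animal() === Book()

    // Cat is a subtype of Animal, so both operators are allowed.
    println(Animal() == Cat())  // false: two different instances
    println(Animal() === Cat()) // false

    val cat = Cat()
    val animal: Animal = cat
    println(animal == cat)  // true: the same instance seen through two types
    println(animal === cat) // true
}
```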
This limitation originates in the fact that it does not make sense to check the equality of two objects of different types, as will become clear when we explain the contract of equals.
Why do we need equals?
The default implementation of equals checks if another object is exactly the same instance, just like the referential equality (===). It means that every object is unique by default:
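A minimal sketch of the default behavior, using a hypothetical Name class that does not override equals:

```kotlin
// A regular class: equals is inherited from Any and compares references.
class Name(val name: String)

fun main() {
    val name1 = Name("Alex")
    val name2 = Name("Alex")
    val name1Ref = name1

    println(name1 == name1)    // true
    println(name1 == name2)    // false: different instances, even though the data is the same
    println(name1 == name1Ref) // true: the same instance
}
```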
Such behavior is useful for many objects. It is perfect for active elements, like a database connection, a repository, or a thread. However, there are objects for which we need to represent equality differently. A popular alternative is a data class equality that checks if all primary constructor properties are equal:
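For comparison, here is a sketch of data class equality with a hypothetical FullName class:

```kotlin
// The data modifier generates equals based on the primary constructor properties.
data class FullName(val name: String, val surname: String)

fun main() {
    val name1 = FullName("Alex", "Smith")
    val name2 = FullName("Alex", "Smith")
    val name3 = FullName("Maja", "Smith")

    println(name1 == name1)  // true
    println(name1 == name2)  // true: the data is the same
    println(name1 == name3)  // false
    println(name1 === name2) // false: still two separate instances
}
```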
Such behavior is perfect for classes that are represented by the data they hold, so we often use the data modifier in data model classes or in other data holders.
Notice that data class equality also helps when we need to compare some but not all properties, e.g., when we want to compare everything except for a cache or other redundant properties. Here is an example of an object that represents date and time and has the properties asStringCache and changed, which should not be compared by equality checking:
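A sketch of such a class could look as follows. The millis and timeZoneId properties, as well as the way the cache is filled, are assumptions made for this illustration:

```kotlin
import java.util.Objects

class DateTime(
    // Assumed representation: milliseconds since the epoch plus a zone id.
    private var millis: Long = 0L,
    private var timeZoneId: String? = null
) {
    // Redundant state that must not take part in equality.
    private var asStringCache = ""
    private var changed = false

    override fun equals(other: Any?): Boolean =
        other is DateTime &&
            other.millis == millis &&
            other.timeZoneId == timeZoneId

    // Kept in line with equals: based on the same two properties.
    override fun hashCode(): Int = Objects.hash(millis, timeZoneId)

    override fun toString(): String {
        if (asStringCache.isEmpty() || changed) {
            asStringCache = "DateTime(millis=$millis, timeZoneId=$timeZoneId)"
            changed = false
        }
        return asStringCache
    }
}

fun main() {
    val a = DateTime(1_000L, "UTC")
    val b = DateTime(1_000L, "UTC")
    a.toString()    // fills the cache of a only
    println(a == b) // true: the cache is ignored by equals
}
```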
The same can be achieved using a data modifier:
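A sketch of the data class variant, under the same assumptions about the constructor properties. The markChanged and isChanged functions are hypothetical helpers added only to make the behavior observable:

```kotlin
data class DateTime(
    private var millis: Long = 0L,
    private var timeZoneId: String? = null
) {
    // Declared in the body, so ignored by the generated equals, hashCode and copy.
    private var asStringCache = ""
    private var changed = false

    // Hypothetical helpers used only by this demonstration.
    fun markChanged() { changed = true }
    fun isChanged(): Boolean = changed
}

fun main() {
    val a = DateTime(1_000L, "UTC")
    val b = DateTime(1_000L, "UTC")
    a.markChanged()

    println(a == b)               // true: changed is not compared
    println(a.copy().isChanged()) // false: copy does not carry body properties over
}
```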
Just notice that copy in such a case will not copy properties that are not declared in the primary constructor. Such behavior is correct only when these additional properties are truly redundant (the object will behave correctly if they are lost).
Thanks to these two alternatives, namely default and data class equality, we rarely need to implement equality ourselves in Kotlin.
An example of when we might need to implement equality is when we would like to compare only one property. For instance, a User class might have an assumption that two users are equal when their id is identical.
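A minimal sketch of such a User class. The name and surname properties are hypothetical; only id takes part in equality:

```kotlin
class User(
    val id: Int,
    val name: String,
    val surname: String
) {
    // Two users are equal when their ids are equal.
    override fun equals(other: Any?): Boolean =
        other is User && other.id == id

    // hashCode is based on the same property as equals.
    override fun hashCode(): Int = id
}

fun main() {
    val before = User(1, "Alex", "Smith")
    val after = User(1, "Alex", "Jones")
    println(before == after)                 // true: the same id
    println(before == User(2, "Alex", "Smith")) // false: a different id
}
```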
As you can see, we implement equals ourselves when:
- We need its logic to differ from the default logic.
- We need to compare only a subset of properties.
- We do not want our object to be a data class, or the properties we need to compare are not in the primary constructor.
The contract of equals
This is how equals is described in its comments (Kotlin 1.9.0, formatted):
Indicates whether some other object is "equal to" this one. Implementations must fulfil the following requirements:
- Reflexive: for any non-null value x, x.equals(x) should return true.
- Symmetric: for any non-null values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
- Transitive: for any non-null values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
- Consistent: for any non-null values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
- Never equal to null: for any non-null value x, x.equals(null) should return false.
Additionally, we expect equals, toString and hashCode to be fast. This is not a part of the official contract, but it would be highly unexpected to wait a few seconds to check if two elements are equal.
All these requirements are important. They have been assumed since the beginning of the JVM platform, so now many objects depend on these assumptions. To understand this contract well, let's discuss each of these requirements separately.
Object equality should be reflexive, meaning that x.equals(x) returns true. It sounds obvious, but this can be violated. For instance, someone might want to make a Time object that can compare milliseconds as well as represent the current time:
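A sketch of how such a class might be written. The millisArg and isNow names are assumptions made for this illustration, and the implementation is deliberately broken:

```kotlin
// DO NOT DO THIS: equality depends on the clock, so it is neither reflexive nor consistent.
class Time(
    val millisArg: Long = -1,
    val isNow: Boolean = false
) {
    val millis: Long
        get() = if (isNow) System.currentTimeMillis() else millisArg

    override fun equals(other: Any?): Boolean =
        other is Time && millis == other.millis
}

fun main() {
    val now = Time(isNow = true)
    // Each comparison reads the clock twice, so the results may differ between runs.
    println(now == now)
    println(List(100_000) { now }.all { it == now })
}
```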
Notice that here the result is inconsistent, so it also violates the last principle.
When an object is not equal to itself, it might not be found in most collections even if it is there when we check using the contains method. Such an object will not work correctly in most unit test assertions either.
When a result is not consistent, we cannot trust it. We can never be sure if a result is correct or is just a result of inconsistency.
How should we improve our Time class? A simple solution is checking separately if the object represents the current time; if it doesn’t, we should check if it has the same timestamp. However, this is a typical example of a tagged class, so, as described in Item 40: Prefer class hierarchies instead of tagged classes, it would be even better to use a class hierarchy instead:
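One possible shape of such a hierarchy is sketched below. The TimePoint and Now names are assumptions made for this illustration:

```kotlin
sealed class Time

// A concrete moment: data class equality compares the timestamps.
data class TimePoint(val millis: Long) : Time()

// The current time: a single object, always equal to itself.
object Now : Time()

fun main() {
    val first: Time = TimePoint(5L)
    val second: Time = TimePoint(5L)
    val now: Time = Now

    println(first == second) // true: the same timestamp
    println(now == now)      // true: reflexive
    println(now == first)    // false: the current time is never equal to a time point
}
```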
Object equality should be symmetric, meaning that the result of x == y and y == x should always be the same. This can easily be violated when we accept objects of a different type in our equality. For instance, let’s say that we implemented a class to represent complex numbers and made its equality accept Double:
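A sketch of such a class, together with a comparison made in both directions. The property names are assumptions made for this illustration:

```kotlin
class Complex(
    val real: Double,
    val imaginary: Double
) {
    // DO NOT DO THIS: accepting Double violates symmetry.
    override fun equals(other: Any?): Boolean {
        if (other is Double) {
            return imaginary == 0.0 && real == other
        }
        return other is Complex &&
            real == other.real &&
            imaginary == other.imaginary
    }

    override fun hashCode(): Int = 31 * real.hashCode() + imaginary.hashCode()
}

fun main() {
    println(Complex(1.0, 0.0).equals(1.0)) // true: Complex accepts Double
    println(1.0.equals(Complex(1.0, 0.0))) // false: Double knows nothing about Complex
}
```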
The problem is that Double does not accept equality with Complex. Therefore, the result depends on the order of the elements.
Lack of symmetry means, for instance, unexpected results from contains on collections or from unit test assertions.
When equality is not symmetric, and it is used by another object, we cannot trust the result because it depends on whether this object compares x to y or y to x. This fact is not documented, and it is not a part of the contract as object creators assume that both should work in the same way (they assume symmetry). Also, creators might do some refactoring at any time, thus accidentally changing the order of these values. If your object is not symmetric, it might lead to unexpected and really hard-to-debug errors in your implementation. This is why when we implement equals, we should always consider symmetry.
The general solution is that we should not accept equality between different classes. I’ve never seen a case in which it would be reasonable. Notice that similar classes are not equal to each other in Kotlin. 1 is not equal to 1.0, and 1.0 is not equal to 1.0F. These are different types, and they are not even comparable. In Kotlin we cannot use the == operator between two different types that do not have a common superclass other than Any:
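A short sketch with numeric literals. The comparisons that are expected to be rejected by the compiler are left as comments; calling equals directly compiles, but it returns false because the types differ:

```kotlin
fun main() {
    // Expected to be rejected by the compiler, because the types are unrelated:
    // println(1 == 1.0)
    // println(1.0 == 1.0F)

    // Calling equals directly is allowed, but values of different types are not equal.
    println(1.equals(1.0))    // false: Int versus Double
    println(1.0.equals(1.0F)) // false: Double versus Float

    // An explicit conversion makes the intent clear.
    println(1.toDouble() == 1.0) // true
}
```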
Object equality should be transitive, meaning that for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true. The biggest problem with transitivity is when we implement different kinds of equality that check different subsets of properties. For instance, let’s say that we have Date and DateTime defined this way:
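A sketch of what such definitions might look like, followed by three comparisons. The property names and the sample values are assumptions made for this illustration:

```kotlin
import java.util.Objects

open class Date(val year: Int, val month: Int, val day: Int) {
    // DO NOT DO THIS: symmetric, but not transitive.
    override fun equals(other: Any?): Boolean = when (other) {
        is DateTime -> this == other.date
        is Date -> other.day == day && other.month == month && other.year == year
        else -> false
    }

    override fun hashCode(): Int = Objects.hash(year, month, day)
}

class DateTime(
    val date: Date,
    val hour: Int,
    val minute: Int,
    val second: Int
) : Date(date.year, date.month, date.day) {
    // DO NOT DO THIS: symmetric, but not transitive.
    override fun equals(other: Any?): Boolean = when (other) {
        is DateTime -> other.date == date && other.hour == hour &&
            other.minute == minute && other.second == second
        is Date -> date == other
        else -> false
    }
    // hashCode is inherited from Date, so equal objects still share a hash code.
}

fun main() {
    val o1 = DateTime(Date(1992, 10, 20), 12, 30, 0)
    val o2 = Date(1992, 10, 20)
    val o3 = DateTime(Date(1992, 10, 20), 14, 45, 30)

    println(o1 == o2) // true
    println(o2 == o3) // true
    println(o1 == o3) // false
}
```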
The problem with the above implementation is that when we compare two DateTime instances, we check more properties than when we compare DateTime and Date. Therefore, two DateTime instances with the same day but a different time will not be equal to each other, but they’ll both be equal to the same Date. As a result, their relation is not transitive.
Notice that the restriction to compare only objects of the same type doesn’t help here because we’ve used inheritance. Such inheritance violates the Liskov substitution principle and should not be used. In this case, use composition instead of inheritance (Item 36: Prefer composition over inheritance). When you do use composition instead of inheritance, do not compare two objects of different types. These classes are perfect examples of objects that hold data, so representing them this way is a good choice:
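A sketch of the composition-based variant, with the same assumed property names as before. Objects of the two types are never equal to each other, and the date part can still be compared explicitly:

```kotlin
data class Date(val year: Int, val month: Int, val day: Int)

// DateTime holds a Date instead of extending it.
data class DateTime(
    val date: Date,
    val hour: Int,
    val minute: Int,
    val second: Int
)

fun main() {
    val o1 = DateTime(Date(1992, 10, 20), 12, 30, 0)
    val o2 = Date(1992, 10, 20)
    val o3 = DateTime(Date(1992, 10, 20), 14, 45, 30)

    println(o1.equals(o2))      // false: different types are never equal
    println(o3.equals(o2))      // false
    println(o1 == o3)           // false: the times differ
    println(o1.date == o2)      // true: comparing Date with Date
    println(o1.date == o3.date) // true
}
```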
Equality should be consistent, meaning that the method invoked on two objects should always return the same result unless one of these objects was modified. For immutable objects, the result should always be the same. In other words, we expect equals to be a pure function (it should not modify the state of an object) whose result always depends only on the input and the state of its receiver. We’ve seen a Time class which violated this principle. This rule was also famously violated in java.net.URL.equals(), which will be explained soon.
An object other than null should never be equal to null: for any non-null value x, x.equals(null) must return false. This is important because null should be unique, and no object should be equal to it.
The problem with equals in java.net.URL
One example of a really poorly designed equals is the one from java.net.URL. The equality of two java.net.URL objects depends on a network operation as two hosts are considered equivalent if both hostnames can be resolved to the same IP addresses. Take a look at the following example:
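A minimal sketch of such a comparison. The two addresses are chosen only for illustration, and the printed value is intentionally not stated in the code because it depends on name resolution:

```kotlin
import java.net.URL

fun main() {
    // Illustrative addresses; the URL(String) constructor is deprecated
    // on recent JDKs but is still available.
    val enWiki = URL("https://en.wikipedia.org/")
    val wiki = URL("https://wikipedia.org/")

    // The comparison may resolve both hostnames, so the result
    // depends on the state of the network.
    println(enWiki == wiki)
}
```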
Should it return true or false? According to the contract, it should be true, but the result is inconsistent. In normal conditions, it should print true because the IP address for both URLs is resolved as the same; however, if you have the internet disconnected, it will print false. You can check this yourself. This is a big mistake! Equality should not be network-dependent.
Here are the most important problems with this solution:
- This behavior is inconsistent. For instance, two URLs could be equal when the internet connection is available but unequal when it is not. Also, IP addresses resolved by a URL can change over time, so the result might be inconsistent.
- The network may be slow, and we expect equals and hashCode to be fast. A typical problem is when we check if a URL is present in a list. Such an operation would require a network call for each element in the list. Also, on some platforms, like Android, network operations are prohibited on the main thread. As a result, even adding a URL to a set needs to be started on a separate thread.
- The defined behavior is known to be inconsistent with virtual hosting in HTTP. Equal IP addresses do not imply equal content. Virtual hosting allows unrelated sites to share an IP address. This method could report two otherwise unrelated URLs to be equal because they're hosted on the same server.
In Android, this problem was fixed in Android 4.0 (Ice Cream Sandwich). Since that release, URLs are only equal if their hostnames are equal. When we use Kotlin/JVM on other platforms, it is recommended to use java.net.URI instead of java.net.URL.
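For contrast, here is a sketch of the same kind of comparison with java.net.URI, which compares the components of the identifiers and does not touch the network. The addresses are again chosen only for illustration:

```kotlin
import java.net.URI

fun main() {
    val enWiki = URI("https://en.wikipedia.org/")
    val wiki = URI("https://wikipedia.org/")

    println(enWiki == wiki)                             // false: the hosts differ
    println(enWiki == URI("https://en.wikipedia.org/")) // true: identical components
}
```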
Implementing equals
I recommend against implementing equals yourself unless you have a good reason. Instead, use the default implementation or data class equality. If you do need custom equality, always consider whether your implementation is reflexive, symmetric, transitive, and consistent. The typical implementation of equals looks like this:
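A sketch of this common pattern, using a hypothetical Player class with two compared properties. It is one reasonable shape and not the only correct one:

```kotlin
import java.util.Objects

// Hypothetical class; final by default, so no subclass can change its equality.
class Player(val id: Int, val nickname: String) {

    override fun equals(other: Any?): Boolean {
        if (this === other) return true    // cheap shortcut that also guarantees reflexivity
        if (other !is Player) return false // rejects null and objects of other types
        return id == other.id && nickname == other.nickname
    }

    // Based on the same properties as equals.
    override fun hashCode(): Int = Objects.hash(id, nickname)
}

fun main() {
    val a = Player(1, "neo")
    val b = Player(1, "neo")
    println(a == b)                   // true
    println(a == Player(1, "trinity")) // false
    println(a.equals(null))           // false
}
```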
Make classes with custom equals final, or beware that subclasses should not change how equality behaves. It is hard to combine custom equality with inheritance. Some even say it is impossible [1]. This is one of the reasons why data classes are final.
Summary
- The == operator translates to equals and checks structural equality. The === operator checks referential equality, i.e., if two values are exactly the same object.
- Equality is reflexive, symmetric, transitive, and consistent. If you implement equals yourself, make sure it follows these rules.
- To fulfill the contract of equals, only classes of the same type should be considered equal.
- equals should be fast and should not require an internet connection. A famous example of a poor implementation from Java stdlib is java.net.URL equality.
[1] As Effective Java by Joshua Bloch (third edition) claims in Item 10: Obey the general contract when overriding equals: "There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you’re willing to forgo the benefits of object-oriented abstraction". I have a feeling it is true, but I cannot prove it, so I avoid definitive statements.