Effective Kotlin Item 47: Avoid unnecessary object creation
Object creation can sometimes be expensive and always costs something. This is why avoiding unnecessary object creation can be an important optimization. It can be done on many levels. For instance, in JVM it is guaranteed that a string object will be reused by any other code running in the same virtual machine that happens to contain the same string literal2:
Boxed primitives (Integer, Long) are also reused in JVM when they are small (by default Integer Cache holds numbers in a range from -128 to 127).
Reference equality (
===) shows that this is the same object. If we use number that is either smaller than -128 or bigger than 127 though, different objects will be produced:
Notice that a nullable type is used to force
Integer instead of
int under the hood. When we use
Int, it is generally compiled to the primitive
int, but when we make it nullable or when we use it as a type argument,
Integer is used instead. It is because primitive cannot be
null and cannot be used as a type argument.
Knowing that such mechanisms were introduced in the language, you might wonder how significant they are. Is object creation expensive?
Is object creation expensive?
Wrapping something into an object has 3 parts of cost:
Objects take additional space. In a modern 64-bit JDK, an object has a 12-byte header, padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. For 32-bit JVMs, the overhead is 8 bytes. Additionally, object references take space as well. Typically, references are 4 bytes on 32bit platforms or on 64bit platforms up to -Xmx32G, and 8 bytes above 32Gb (-Xmx32G). Those are relatively small numbers, but they can add up to a significant cost. When we think about such small elements like integers, they make a difference.
Intas a primitive fits in 4 bytes, when as a wrapped type on 64-bit JDK we mainly use today, it requires 16 bytes (it fits in the 4 bytes after the header) + its reference requires 4 or 8 bytes. In the end, it takes 5 or 6 times more space1. This is why an array of primitive integers (
IntArray) takes 5-times less space than an array of wrapped integers (
Array<Int>) as explained in the Item 54: Consider Arrays with primitives for performance-critical processing.
Access requires an additional function call when elements are encapsulated. That is again a small cost as function use is really fast, but it can add-up when we need to operate on a huge pool of objects. In Item 48: Use inline modifier for functions with parameters of functional types and Item 49: Consider using inline classes we will see how this cost can be eliminated.
Objects need to be created. An object needs to be created, allocated in the memory, a reference needs to be created, etc. Those are small numbers, but they can add up with many objects. In the below snippet, you can see a cost of object creation.
By eliminating objects, we can avoid all three costs. By reusing objects, we can eliminate the first and the third one. Knowing that, you might start thinking about limiting the number of unnecessary objects in your code. Let’s see some ways we can do that.
A very simple way to reuse an object instead of creating it every time is using object declaration (singleton). To see an example, let’s imagine that you need to implement a linked list. The linked list can be either empty, or it can be a node containing an element and pointing to the rest. This is an example implementation:
One problem with this implementation is that we need to create an instance of
Empty every time we create a list. Instead, we should just have one instance and use it universally. The only problem is the generic type. What generic type should we use? We want this empty list to be subtype of all lists. We cannot use all types, but we also don’t need to. A solution is that we can make it a list of
Nothing is a subtype of every type, and so
LinkedList<Nothing> will be a subtype of every
LinkedList once this list is covariant (
out modifier). Making type arguments covariant truly makes sense here since the list is immutable and this type is used only in out positions (Item 24: Consider variance for generic types). So this is the improved code:
This is a useful trick that is often used, especially when we define immutable sealed classes. Immutable, because using it for mutable objects can lead to subtle and hard to detect bugs connected to shared state management. The general rule is that mutable objects should not be cached (Item 1: Limit mutability). Apart from object declaration, there are more ways to reuse objects. Another one is a factory function with a cache.
Factory function with a cache
Every time we use a constructor, we create a new object. It is not necessarily true when we use a factory method, as they can have cache. The simplest case is when a factory function always returns the same object. This is, for instance, how
emptyList from stdlib is implemented:
Sometimes we have a set of objects, and we return one of them. For instance, when we use the default dispatcher in the Kotlin coroutines library
Dispatchers.Default, it has a pool of threads, and whenever we start anything using it, it will start it in one that is not in use. Similarly, we might have a pool of connections with a database. Having a pool of objects is a good solution when object creation is heavy, and we might need to use a few mutable objects at the same time.
Caching can also be done for parameterized factory methods. In such a case, we might keep our objects in a map:
Caching can be used for all pure functions. In such a case, we call it memoization. Here is, for instance, a function that calculates the Fibonacci number at a position based on the definition:
Now our method during the first run is nearly as efficient as a linear solution, and later it gives result immediately if it was already calculated. Comparison between this and classic linear fibonacci implementation on an example machine is presented in the below table. Also the iterative implementation we compare it to is presented below.
|n = 100||n = 200||n = 300||n = 400|
|fibIter||1997 ns||5234 ns||7008 ns||9727 ns|
|fib (first)||4413 ns||9815 ns||15484 ns||22205 ns|
|fib (later)||8 ns||8 ns||8 ns||8 ns|
You can see that using this function for the first time is slower than using the classic approach as there is additional overhead on checking if the value is in the cache and adding it there. Once values are added, the retrieval is nearly instantaneous.
It has a significant drawback though: we are reserving and using more memory since the
Map needs to be stored somewhere. Everything would be fine if this was cleared at some point. But take into account that for the Garbage Collector (GC), there is no difference between a cache and any other static field that might be necessary in the future. It will hold this data as long as possible, even if we never use the
fib function again. One thing that helps is using a soft reference that can be removed by the GC when memory is needed. It should not be confused with weak reference. In simple words, the difference is:
- Weak references do not prevent Garbage Collector from cleaning-up the value. So once no other reference (variable) is using it, the value will be cleaned.
- Soft references are not guaranteeing that the value won’t be cleaned up by the GC either, but in most JVM implementations, this value won’t be cleaned unless memory is needed. Soft references are perfect when we implement a cache.
This is an example property delegate (details in Item 21: Use property delegation to extract common property patterns) that on-demand creates a map and lets us use it, but does not stop Garbage Collector from recycling this map when memory is needed (full implementation should include thread synchronization):
Designing a cache well is not easy, and there are already some well-implemented libraries providing them. I've used
Ehcache. I am happy from both, but I am still waiting for a well-implemented cache with support for suspending functions.
In the end, caching is always a tradeoff: performance for memory. Remember this, and use caches wisely. No-one wants to move from performance issues to lack of memory issues.
Heavy object lifting
A very useful trick for performance is lifting a heavy object to an outer scope. Surely, we should lift if possible all heavy operations from collection processing functions to a general processing scope. For instance, in this function we can see that we will need to find the maximal element for every element in the
A better solution is to extract the maximal element to the level of
This solution is better for performance because we will not need to find the maximal element on the receiver on every iteration. Notice that it also improves readability by making it visible that
max is called on the extension receiver and so it is the same through all iterations.
Extracting a value to an outer scope to not recalculate it unnecessarily is an important practice. It might sound obvious, but it is not always so clear. Just take a look at this function where we use a regex to validate if a string contains a valid IP address:
The problem with this function is that
Regex object needs to be created every time we use it. It is a serious disadvantage since regex pattern compilation is a complex operation. This is why this function is not suitable to be used repeatedly in performance-constrained parts of our code. Though we can improve it by lifting regex up to the top-level:
If this function is in a file together with some other functions and we don’t want to create this object if it is not used, we can even initialize the regex lazily:
Making properties lazy is also useful when we are dealing with classes.
Often when we need to create a heavy class, it is better to do that lazily. For instance, imagine that class
A needs instances of
D that are heavy. If we just create them during class creation,
A creation will be very heavy, because it will need to create
D and then the rest of its body. The heaviness of objects creation will just accumulates.
There is a cure though. We can just initialize those heavy objects lazily:
Each object will then be initialized just before its first usage. The cost of those objects creation will be spread instead of accumulated.
Keep in mind that this sword is double-edged. You might have a case where object creation can be heavy, but you need methods to be as fast as possible. Imagine that
A is a controller in a backend application that responds to HTTP calls. It starts quickly, but the first call requires all heavy objects initialization. So the first call needs to wait significantly longer for the response, and it doesn’t matter for how long our application runs. This is not the desired behavior. This is also something that might clutter our performance tests.
In JVM we have a special built-in type to represent basic elements like number or character. They are called primitives, and they are used by Kotlin/JVM compiler under the hood wherever possible. Although there are some cases where a wrapped class needs to be used instead. The two main cases are:
- When we operate on a nullable type (primitives cannot be
- When we use the type as a generic
So in short:
|Kotlin type||Java type|
Knowing that, you can optimize your code to have primitives under the hood instead of wrapped types. Such optimization makes sense mainly on Kotlin/JVM and on some flavours of Kotlin/Native. Not at all on Kotlin/JS. It also needs to be remembered that it makes sense only when operations on a number are repeated many times. Access of both primitive and wrapped types are relatively fast compared to other operations. The difference manifests itself when we deal with a really big collections (we will discuss it in the Item 54: Consider Arrays with primitives for performance critical processing) or when we operate on an object intensively. Also remember that forced changes might lead to less readable code. This is why I suggest this optimization only for performance critical parts of our code and in libraries. You can find out what is performance critical using a profiler.
To see an example, imagine that you implement a standard library for Kotlin, and you want to introduce a function that will return the maximal element or
null if this iterable is empty. You don’t want to iterate over the iterable more than once. This is not a trivial problem, but it can be solved with the following function:
This implementation has serious disadvantages:
We need to use an Elvis operator in every step
We use a nullable value, so under the hood in JVM, there will be an
Integerinstead of an
Resolving these two problems requires us to implement the iteration using while loop:
For a collection of elements from 1 to 10 million, in my computer, the optimized implementation took 289 ms, while the previous one took 518 ms. This is nearly 2 times faster, but remember that this is an extreme case designed to show the difference. Such optimization is rarely reasonable in a code that is not performance-critical. Though if you implement a Kotlin standard library, everything is performance critical. This is why the second approach is chosen here:
In this item, you’ve seen different ways to avoid object creation. Some of them are cheap in terms of readability: those should be used freely. For instance, heavy object lifting out of a loop or function is generally a good idea. It is a good practice for performance, and also thanks to this extraction, we can name this object, so we make our function easier to read. Those optimizations that are tougher or require bigger changes should possibly be skipped. We should avoid premature optimization unless we have such guidelines in our project or we develop a library that might be used in who-knows-what-way. We’ve also learned some optimizations that might be used to optimize the performance critical parts of our code.
To measure the size of concrete fields on JVM objects, use Java Object Layout.
Java Language Specification, Java SE 8 edition, 3.10.5