Effective Kotlin Item 47: Avoid unnecessary object creation
This is a chapter from the book Effective Kotlin. You can find it on LeanPub or Amazon.
Object creation always costs something and can sometimes be expensive. This is why avoiding unnecessary object creation can be an important optimization. It can be done on many levels. For instance, in JVM it is guaranteed that a string object will be reused by other code running in the same virtual machine that happens to contain the same string literal2:
Boxed primitives (
Long) are also reused in JVM when they are small (by default, the Integer Cache holds numbers in the range from -128 to 127).
Reference equality (
===) shows that this is the same object. However, if we use a number that is either smaller than -128 or bigger than 127, different objects will be created:
Notice that a nullable type is used to force
intunder the hood. When we use
Int, it is generally compiled to the primitive
int, but if we make it nullable or when we use it as a type argument,
Integeris used instead. This is because a primitive cannot be
nulland cannot be used as a type argument.
Knowing that such mechanisms are available in Kotlin, you might wonder how significant they are. Is object creation expensive?
Is object creation expensive?
Wrapping something into an object has 3 costs:
Objects take additional space. In a modern 64-bit JDK, an object has a 12-byte header that is padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. For 32-bit JVMs, the overhead is 8 bytes. Additionally, object references also take space. Typically, references are 4 bytes on 32-bit or 64-bit platforms up to -Xmx32G, and they are 8 bytes for memory allocation pool set above 32Gb (-Xmx32G). These are relatively small numbers, but they can add up to a significant cost. When we think about small elements like integers, they make a difference.
Intas a primitive fit in 4 bytes, but when it is a wrapped type on the 64-bit JDK we mainly use today, it requires 16 bytes (it fits in the 4 bytes after the header), and its reference requires 4 or 8 bytes. In the end, it takes 5 or 6 times more space1. This is why an array of primitive integers (
IntArray) takes 5 times less space than an array of wrapped integers (
Array<Int>), as explained in the Item 55: Consider Arrays with primitives for performance-critical processing.
Access requires an additional function call when elements are encapsulated. Again, this is a small cost as function use is very fast, but it can add up when we need to operate on a huge pool of objects. We will see how this cost can be eliminated in Item 48: Use the inline modifier for functions with parameters of functional types and Item 49: Consider using inline classes.
Objects need to be created and allocated in memory, references need to be created, etc. These are small numbers, but they can rapidly accumulate when there are many objects. In the snippet below, you can see the cost of object creation.
By eliminating objects, we can avoid all three of these costs. By reusing objects, we can eliminate the first and the third ones. If we know the costs of objects, we can start considering how we can minimize these costs in our applications by limiting the number of unnecessary objects. Let’s see some ways we can do this.
A very simple way to reuse an object instead of creating it every time is using an object declaration (singleton). To see an example, let’s imagine that you need to implement a linked list. This list can either be empty or it can be a node containing an element and a reference to the rest of the list. This is an example implementation:
One problem with this implementation is that we need to create an instance of
Empty every time we create a list. Instead, we should have just one instance and use it universally. The only problem is which generic type we should use. We want this empty list to be a subtype of all lists. We cannot use all types, but we don’t need to. A solution is that we can make
Empty a list of
Nothing, which is a subtype of every type. If this list is covariant (what we achieve with the
out modifier), then
LinkedList<Nothing> is a subtype of every
LinkedList. Making type arguments covariant truly makes sense here since
LinkedList is immutable and its generic type is used only in "out" positions (Item 24: Consider variance for generic types). So, this is the improved code:
This is a useful and popular trick, especially when we define immutable sealed classes. We use it when we define messages or specify a set of options using a sealed class or interface, etc.
Apart from object declaration, there are other ways to reuse objects, one of which is a factory function with a cache.
Factory function with a cache
Every time we use a constructor, we create a new object, but this is not necessarily true when we use factory functions as they can have a cache. The simplest case is when a factory function always returns the same object. This is, for instance, how
emptyList from stdlib is implemented:
Sometimes we have a set of objects, and we return one of them. For instance,
Dispatchers.Default in the Kotlin coroutines library has a pool of threads; whenever we start anything using the default dispatcher, it will be started in a thread that is not in use. Similarly, we might have a pool of connections with a database. Having a pool of objects is a good solution when object creation is heavy and we might need to use a few mutable objects at the same time.
Caching can also be done for parameterized factory functions. In such a case, we might keep our objects in a map:
Caching can be used for all pure functions. In such cases, we call it memoization. Here is a function that calculates a number in a given position in the Fibonacci sequence on the basis of the definition:
Now, during the first run our function is nearly as efficient as a linear solution, and later it gives a result immediately if it has already been calculated. A comparison between this and a classic linear Fibonacci number implementation on an example machine is presented in the below table. Also, the iterative implementation we compare it to is presented below.
|n = 100||n = 200||n = 300||n = 400|
|fibIter||1997 ns||5234 ns||7008 ns||9727 ns|
|fib (first)||4413 ns||9815 ns||15484 ns||22205 ns|
|fib (later)||8 ns||8 ns||8 ns||8 ns|
You can see that using this function for the first time is slower than using the classic approach as checking if the value is in the cache and adding it there causes additional overhead. However, once the values have been added, the retrieval is nearly instantaneous.
This has a significant drawback though: we are reserving and using more memory since the
Map needs to be stored somewhere. Everything would be fine if this were cleared at some point, but take into account the fact that for the Garbage Collector (GC) there is no difference between a cache and any other static field that might be necessary in the future. The cache will hold this data as long as possible, even if we never use the
fib function again. One thing that helps is using a soft reference that can be removed by the GC when memory is needed. This should not be confused with weak references. In simple words, the difference is:
- A weak reference does not prevent the Garbage Collector from cleaning up a value. So, if no other reference (variable) is using this value, it will be cleaned up.
- A soft reference does not guarantee that a value won’t be cleaned up by the GC either, but in most JVM implementations this value won’t be cleaned up unless memory is needed. Soft references are perfect when we implement a cache.
This is an example of a property delegate (details in Item 21: Use property delegation to extract common property patterns) that creates a map on demand and lets us use it, but this can be cleared when memory is needed (full implementation should include thread synchronization):
Designing a cache well is not easy, and there are already some well-implemented libraries providing them. I've used
Ehcache. I am happy with both, but I am still waiting for a well-implemented cache with support for suspending functions.
In the end, caching is always a tradeoff: we buy performance at the cost of memory. Remember this and use caches wisely. No one wants to move from performance issues to lack of memory issues.
Heavy object lifting
A very useful trick for performance is lifting a heavy object to an outer scope. Surely, if possible, we should lift all heavy operations from collection processing functions to a general processing scope. For instance, here is a function that counts the number of values equal to the maximum value:
A better solution is to extract the biggest element to the level of the
This solution is better for performance because we don’t need to find the biggest element on the receiver in every iteration. Notice that it also improves readability by making it clear that
max is called on the extension receiver, therefore it is the same throughout all iterations.
Extracting a value to an outer scope so as to not recalculate it unnecessarily is an important practice. This might sound obvious, but it is not always very clear. Just take a look at this function, where we use a regex to determine whether a string contains a valid IP address:
The problem with this function is that the
Regex object needs to be created every time we use it. This is a serious disadvantage since regex pattern compilation is a complex operation. This is why this function is not suitable for repeated use in performance-constrained parts of our code. However, we can improve it by lifting the regex up to the top level:
If this function is in a file together with some other functions and we don’t want to create this object unless it is used, we can even initialize the regex lazily:
Making properties lazy is also useful when we are dealing with classes.
Often, when we need to create a heavy class, it is better to do so lazily. For instance, imagine that class
A needs instances of
D, all of which are heavy. If we just create them during class creation, the creation of
A will be very heavy because it will need to create
D and then the rest of its body. Therefore, the heaviness of object creation will just accumulate.
There is a cure though. We can just initialize these heavy objects lazily:
Each object will then be initialized just before its first usage. The cost of creating these objects will be spread instead of accumulated.
In JVM we have a special built-in type to represent basic elements like numbers or characters. They are called primitives and are used by the Kotlin/JVM compiler under the hood wherever possible. However, there are some cases where a wrapped class needs to be used instead. The two main cases are:
- When we operate on a nullable type (primitives cannot be
- When we use a type as a generic.
So, in short:
|Kotlin type||Java type|
Now you know that you can optimize your code to have primitives under the hood instead of wrapped types. Such optimization makes sense mainly on Kotlin/JVM and on some flavors of Kotlin/Native, but it doesn’t make any sense on Kotlin/JS. It should also be remembered that it makes sense only when operations on a number are repeated many times. Access to both primitive and wrapped types is relatively fast compared to other operations. The difference manifests itself when we deal with really big collections (we will discuss this in Item 55: Consider Arrays with primitives for performance-critical processing) or when we operate on an object intensively. Also, remember that forced changes might lead to less readable code. This is why I suggest this optimization only for performance-critical parts of code and in libraries. You can find out what is performance-critical using a profiler.
To see an example, imagine that you are implementing a standard library for Kotlin and you want to introduce a function that will return the biggest element or
null if this iterable is empty. You don’t want to iterate over the iterable more than once. This is not a trivial problem, but it can be solved with the following function:
This implementation has serious disadvantages:
Resolving these two problems requires us to implement the iteration using a while loop:
For a collection of elements from 1 to 10 million, this optimized implementation took 289 ms on my computer, while the previous one took 518 ms. This is nearly 2 times faster, but remember that this is an extreme case designed to show the difference. Such optimization is rarely reasonable in code that is not performance-critical, but everything is performance-critical if you implement a Kotlin standard library. This is why the second approach is chosen here:
In this item, you’ve seen different ways to avoid object creation. Some of them are cheap in terms of readability, so these should be used freely. For instance, lifting heavy objects out of a loop or function is generally a good idea. This is good practice for performance, and thanks to this extraction we can name this object, so we make our function easier to read. Optimizations that are tougher or require bigger changes should possibly be skipped. We should avoid premature optimization unless we have such guidelines in our project or we are developing a library that might be used in unknown ways. We’ve also learned some optimizations that might be used to optimize the performance-critical parts of our code.
To measure the size of concrete fields in JVM objects, use Java Object Layout.
Java Language Specification, Java SE 8 edition, 3.10.5