Type modelling in Kotlin
I often see codebases where everything is defined as if it belongs in the square hole.
In 1983, an Air Canada plane ran out of fuel due to calculation errors as a result of switching from imperial units to the metric system.
In 1999, the Mars Climate Orbiter, a $125 million space probe, was lost as a result of NASA and Lockheed Martin using different units for thrust.
In medical history, you’ll find many cases of people dying after being given the wrong dosage because it was calculated from the patient’s weight in kilograms instead of pounds.
And these are but a few of many examples, yet in most codebases we still use primitives for everything. We could agree on just using one standard, but how’s that been working out so far? We have four different codes for the same language, country codes can be either two or three characters, the US won’t switch to the metric system like the rest of the world, and the UK wants to go back to imperial units.
Part1: Value Classes
Here’s a simple example of how most of us would have written and used an API for the Mars Climate Orbiter:
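A minimal sketch of such an API and its caller, assuming a hypothetical `thrust` method that takes a bare number (the class shape, the method name, and the `fireThrusters` helper are illustrative assumptions, not the original listing):

```kotlin
// Hypothetical API: nothing in the signature says which unit `force` is in.
class MarsClimateOrbiter {
    // Internally the orbiter works in pound-force.
    fun thrust(force: Int): String = "applying $force Pound of thrust"
}

// The caller assumes the API wants Newtons and passes a bare number.
fun fireThrusters(orbiter: MarsClimateOrbiter): String {
    val thrustInNewtons = 120
    return orbiter.thrust(thrustInNewtons)
}
```

Printing `fireThrusters(MarsClimateOrbiter())` shows the message the orbiter reports.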
The result of this code is:
applying 120 Pound of thrust
Oooof, we would have lost the probe!
If we look at the MarsClimateOrbiter class, we’ll see that it actually expects Pound-Force and not Newtons, but unless we looked at the source code, we wouldn’t have known this, and even if we had looked, it might have changed since.
Let’s fix this by defining some types first, and while we’re at it, let’s define some helper extension properties:
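One way this could look, as a sketch: the `Newton` and `PoundForce` names, the conversion helpers, and the extension property names are assumptions chosen for illustration. The conversion factor is the standard definition of pound-force.

```kotlin
@JvmInline
value class Newton(val value: Double) {
    fun toPoundForce(): PoundForce = PoundForce(value / PER_POUND_FORCE)

    companion object {
        // Exact by definition: 1 lbf = 4.4482216152605 N.
        const val PER_POUND_FORCE = 4.4482216152605
    }
}

@JvmInline
value class PoundForce(val value: Double) {
    fun toNewton(): Newton = Newton(value * Newton.PER_POUND_FORCE)
}

// Extension properties so call sites read as `120.newtons` or `26.9.poundForce`.
val Double.newtons: Newton get() = Newton(this)
val Double.poundForce: PoundForce get() = PoundForce(this)
val Int.newtons: Newton get() = Newton(toDouble())
val Int.poundForce: PoundForce get() = PoundForce(toDouble())
```

Because `Newton` and `PoundForce` are distinct types, a value in one unit can only reach an API expecting the other through an explicit `toPoundForce()` or `toNewton()` call.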
Let’s deprecate the existing thrust method and add two new methods that take values in Newtons and Pound-Force.
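A sketch of what the deprecation might look like, assuming the old method took a `Double` and that the orbiter works in pound-force internally; the type names and the message text are illustrative:

```kotlin
@JvmInline
value class Newton(val value: Double)

@JvmInline
value class PoundForce(val value: Double)

class MarsClimateOrbiter {
    // The old, unit-less entry point stays for compatibility but now warns at every call site.
    @Deprecated(
        "The unit is ambiguous; pass a PoundForce or a Newton instead",
        ReplaceWith("thrust(PoundForce(force))")
    )
    fun thrust(force: Double): String = thrust(PoundForce(force))

    // The unit is now part of the signature.
    fun thrust(force: PoundForce): String = "applying ${force.value} Pound of thrust"

    // Newtons are converted before being applied (1 lbf = 4.4482216152605 N).
    fun thrust(force: Newton): String = thrust(PoundForce(force.value / 4.4482216152605))
}
```

The three overloads can coexist on the JVM because Kotlin mangles the names of functions that take value class parameters.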
Going back to our main method, we now have the option to convert this unit-less input to the expected units.
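A sketch of such a call site, under the assumption that the input of 120 is known to be in Newtons; all names here are illustrative:

```kotlin
@JvmInline
value class PoundForce(val value: Double)

@JvmInline
value class Newton(val value: Double) {
    // 1 lbf = 4.4482216152605 N
    fun toPoundForce(): PoundForce = PoundForce(value / 4.4482216152605)
}

val Int.newtons: Newton get() = Newton(toDouble())

class MarsClimateOrbiter {
    fun thrust(force: PoundForce): String = "applying ${force.value} Pound of thrust"
    fun thrust(force: Newton): String = thrust(force.toPoundForce())
}

fun fireThrusters(orbiter: MarsClimateOrbiter): String {
    val input = 120 // unit-less input that we know to be in Newtons
    // orbiter.thrust(input) would not compile: a bare Int is neither Newton nor PoundForce.
    return orbiter.thrust(input.newtons)
}
```

The caller has to state the unit once, at the boundary, and the conversion to pound-force then happens inside the API rather than in the caller's head.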
Et voilà, it’s now blatantly obvious which unit a method is expecting, and if you pass in the wrong unit, the code simply won’t compile unless you first convert it. Nothing stops you from using regular or data classes, but value classes are inlined to their underlying type wherever possible, so there’s usually no runtime overhead (they are still boxed when used as a generic or nullable type). See https://kotlinlang.org/docs/inline-classes.html
Part2: Handling Errors
We already learned from our mistake and we’re now diligently using value classes instead of primitive types.
…but when we test it in simulation, we lose our orbiter again:
Exception in thread "main" mco.MarsClimateOrbiter$SensorOfflineException
If only we knew that getElevation could throw an exception. Unlike Java, whose checked exceptions force callers to either handle or declare them, Kotlin only has unchecked exceptions, so unless we look at the source of getElevation, nothing tells us that it can throw.
Let’s fix this by using the
We are now forced to handle errors at the call-site either by using
link to documentation.
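A minimal sketch of this kind of fix, assuming the type in question is Kotlin's standard `Result<T>` (the benchmark options further down mention a result class) and using `fold` and `getOrElse` as two of the ways to handle it at the call site. `Meter`, the `sensorOnline` flag, and the helper functions are hypothetical:

```kotlin
@JvmInline
value class Meter(val value: Double)

class SensorOfflineException : Exception("sensor offline")

class MarsClimateOrbiter(private val sensorOnline: Boolean) {
    // Returning Result makes the possible failure visible in the signature.
    fun getElevation(): Result<Meter> =
        if (sensorOnline) Result.success(Meter(250000.0))
        else Result.failure(SensorOfflineException())
}

// Call site, variant 1: handle both outcomes explicitly.
fun describeElevation(orbiter: MarsClimateOrbiter): String =
    orbiter.getElevation().fold(
        onSuccess = { "elevation: ${it.value} m" },
        onFailure = { "elevation unavailable: ${it.message}" }
    )

// Call site, variant 2: fall back to a default when the sensor is offline.
fun elevationOrDefault(orbiter: MarsClimateOrbiter, default: Meter): Meter =
    orbiter.getElevation().getOrElse { default }
```

The caller cannot get at the `Meter` without first deciding what happens in the failure case.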
This certainly solves the problem of error handling, but at what cost?
Let’s loop through each of these options a million times so we can get some good average measurements.
Option1: return a result class
Option2: throwing an exception
Option3: using a sealed class to hold the error
Option4: the same as the third option, but we change the object to a data class in the sealed interface
Let’s create a mini benchmark that first warms up the JVM and then does a bunch of runs to get the average time taken (I’m too lazy to set up JMH; share your results if you’ve tested it with JMH).
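A rough sketch of how such a mini benchmark could be put together. Every name in it is an assumption, the warm-up and run counts are arbitrary defaults, and a hand-rolled loop like this is no substitute for JMH, so the numbers it prints will vary by machine and JVM:

```kotlin
import kotlin.system.measureNanoTime

class SensorOfflineException : Exception("sensor offline")

// Options 3 and 4 model the error as an ordinary value in a sealed hierarchy.
sealed interface Elevation
data class Reading(val meters: Double) : Elevation
object SensorOffline : Elevation                        // Option 3: error as an object
data class SensorError(val reason: String) : Elevation  // Option 4: error as a data class

// Every second call (odd i) fails, in all four variants.
fun viaResult(i: Int): Result<Double> =
    if (i % 2 == 0) Result.success(1.0) else Result.failure(SensorOfflineException())

fun viaException(i: Int): Double =
    if (i % 2 == 0) 1.0 else throw SensorOfflineException()

fun viaSealedObject(i: Int): Elevation = if (i % 2 == 0) Reading(1.0) else SensorOffline

fun viaSealedData(i: Int): Elevation = if (i % 2 == 0) Reading(1.0) else SensorError("offline")

// Each option is reduced to "did call i fail?" so that all four share one loop.
val options: Map<String, (Int) -> Boolean> = mapOf(
    "Option1 Result class" to { i: Int -> viaResult(i).isFailure },
    "Option2 Exception" to { i: Int ->
        try { viaException(i); false } catch (e: SensorOfflineException) { true }
    },
    "Option3 Sealed object" to { i: Int -> viaSealedObject(i) is SensorOffline },
    "Option4 Sealed data class" to { i: Int -> viaSealedData(i) is SensorError },
)

fun countErrors(iterations: Int, failed: (Int) -> Boolean): Int {
    var errors = 0
    for (i in 0 until iterations) if (failed(i)) errors++
    return errors
}

// Warm up first, then average the wall-clock time of the measured runs.
fun averageMillis(iterations: Int, warmups: Int, runs: Int, failed: (Int) -> Boolean): Double {
    var sink = 0L // accumulate results so the loop's work is actually used
    repeat(warmups) { sink += countErrors(iterations, failed) }
    var totalNanos = 0L
    repeat(runs) { totalNanos += measureNanoTime { sink += countErrors(iterations, failed) } }
    check(sink >= 0)
    return totalNanos.toDouble() / runs / 1_000_000.0
}

fun runBenchmark(iterations: Int = 1_000_000, warmups: Int = 5, runs: Int = 10) {
    for ((name, failed) in options) {
        println("$name: ${averageMillis(iterations, warmups, runs, failed)} ms")
    }
}
```

Calling `runBenchmark()` prints one average per option.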
Option1 — Result Class: ~376ms
Option2 — Throwing an Exception: ~376ms
Option3 — Sealed Interface (error object): ~3.84ms
Option4 — Sealed Interface (error data class): ~2.15ms
Woah, using the sealed interface is ~100x faster, but this may be an exaggerated example since every second call is an error. That being said, if an error is part of your expected flow (validation error, no result found, etc.), don’t use an exception for it: you’re taking a huge performance penalty when doing so.
Part3: Bounded context on untyped data
The Mars Climate Orbiter is sending back some raw data to Earth in bytes that need to be queried in mcoql (Mars Climate Orbiter Query Language, a made-up language for the purpose of this article). One team specializes in writing queries for this data, and another team works with the repository created by the low-level team.
The team that processes the data often needs to look at functions like this to figure out what data they need to send and what data to expect back.
For someone new on the project who has never used mcoql, it’s not obvious what format the data needs to be in and what format the result is returned in. We can see that it’s returning Array<Array<ByteArray>>, which is a grid, but we can’t see the number of columns being returned just by looking at the method signature.
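To make the problem concrete, here is a hypothetical function of that kind. The function name, its parameters, and the column layout (a 4-byte integer followed by an 8-byte integer) are all made up for illustration, and the body returns canned data instead of running a real query:

```kotlin
import java.nio.ByteBuffer

// Hypothetical low-level repository function: raw bytes in, a grid of raw bytes out.
fun findReadings(fromTime: ByteArray, toTime: ByteArray): Array<Array<ByteArray>> {
    // A real implementation would send an mcoql query; this sketch returns one canned row.
    val elevation = ByteBuffer.allocate(4).putInt(250).array()
    val recordedAt = ByteBuffer.allocate(8).putLong(1_700_000_000L).array()
    return arrayOf(arrayOf(elevation, recordedAt))
}

// The caller has to know, from somewhere other than the signature, that column 0
// is a 4-byte integer and column 1 is an 8-byte integer.
fun readFirstRow(rows: Array<Array<ByteArray>>): Pair<Int, Long> {
    val row = rows[0]
    return ByteBuffer.wrap(row[0]).getInt() to ByteBuffer.wrap(row[1]).getLong()
}
```

Nothing stops a caller from reading column 1 as a 4-byte integer or asking for a column 2 that does not exist; both mistakes only show up at runtime.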
Adding some helper functions to TimeStamp allows the user to convert from their preferred date-time format to whatever date-time format is required to run the query.
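A sketch of what such helpers might look like, assuming (purely for illustration) that the query expects timestamps as 8-byte big-endian epoch seconds; the factory and conversion function names are hypothetical:

```kotlin
import java.nio.ByteBuffer
import java.time.Instant

// Wraps the raw bytes the query expects, so callers never build them by hand.
@JvmInline
value class TimeStamp(val bytes: ByteArray) {
    fun toInstant(): Instant = Instant.ofEpochSecond(ByteBuffer.wrap(bytes).getLong())

    companion object {
        fun ofEpochSeconds(seconds: Long): TimeStamp =
            TimeStamp(ByteBuffer.allocate(8).putLong(seconds).array())

        fun of(instant: Instant): TimeStamp = ofEpochSeconds(instant.epochSecond)
    }
}
```

Callers can start from whichever representation they already have, for example `TimeStamp.of(Instant.now())`, and the encoding detail stays in one place.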
Wouldn’t it be great if we could write query<INT4, …, INT8> and, depending on the number of generics we provide, get exactly that number of results to destructure?
Ultimately we want to return a data class and make sense of these values, but at the low level we want to stay as close as possible to what the documentation says. If the documentation says that running this query returns an INT4 and an INT8, the above format makes the intent very easy to see, whereas converting directly to a data class obscures the low-level intent. The same applies when writing database drivers (see, for example, the PostgreSQL protocol message format documentation: https://www.postgresql.org/docs/14/protocol-message-formats.html), working with ISO bank integrations, or writing code that interacts with ATMs, planes, or just about anything that goes over the network in binary format.
Eventually, we want to end up with something like this:
Getting this exact structure is left as an exercise to the reader. In this article, we’ll only explore getting List<Result4<A,B,C,D>> from query<A,B,C,D> instead of a List<List<…>>.
Let’s start with the TypeMarker class, which will allow us up to 5 columns (you can increase this to however many you want, but to keep things short, we’re only going up to 5 here).
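One plausible reading of what such a class is for, sketched under that assumption: Kotlin does not allow two functions that differ only in their number of type parameters, so each overload gets an otherwise unused, defaulted parameter of a different marker type. The nested `T1` to `T5` names and the `columnCount` demo function are hypothetical:

```kotlin
// Marker types carry no data; they exist only to give each overload a distinct parameter list.
sealed class TypeMarker {
    class T1<A> : TypeMarker()
    class T2<A, B> : TypeMarker()
    class T3<A, B, C> : TypeMarker()
    class T4<A, B, C, D> : TypeMarker()
    class T5<A, B, C, D, E> : TypeMarker()
}

// Without the marker parameter these declarations would be conflicting overloads.
// Callers never pass the marker; the number of explicit type arguments picks the overload.
fun <A> columnCount(marker: TypeMarker.T1<A>? = null): Int = 1
fun <A, B> columnCount(marker: TypeMarker.T2<A, B>? = null): Int = 2
fun <A, B, C> columnCount(marker: TypeMarker.T3<A, B, C>? = null): Int = 3
```

With these in place, `columnCount<Int>()` and `columnCount<Int, Long>()` resolve to different functions even though neither call passes an argument.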
Now for the upgraded query functions:
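A sketch of how such query functions could be written, limited to one and two columns. It assumes that `INT4` and `INT8` are value classes, that rows arrive as lists of byte arrays (standing in for what a real mcoql query would return), and that values are big-endian; `decode`, `Result1`, and `Result2` are hypothetical names:

```kotlin
import java.nio.ByteBuffer

@JvmInline
value class INT4(val value: Int)

@JvmInline
value class INT8(val value: Long)

sealed class TypeMarker {
    class T1<A> : TypeMarker()
    class T2<A, B> : TypeMarker()
}

data class Result1<A>(val first: A)
data class Result2<A, B>(val first: A, val second: B)

// Turns one raw column into the requested column type.
inline fun <reified T : Any> decode(bytes: ByteArray): T = when (T::class) {
    INT4::class -> INT4(ByteBuffer.wrap(bytes).getInt()) as T
    INT8::class -> INT8(ByteBuffer.wrap(bytes).getLong()) as T
    else -> throw IllegalArgumentException("unsupported column type ${T::class}")
}

// One overload per column count; the unused marker parameter keeps the signatures distinct.
inline fun <reified A : Any> query(
    rows: List<List<ByteArray>>,
    marker: TypeMarker.T1<A>? = null
): List<Result1<A>> = rows.map { Result1(decode<A>(it[0])) }

inline fun <reified A : Any, reified B : Any> query(
    rows: List<List<ByteArray>>,
    marker: TypeMarker.T2<A, B>? = null
): List<Result2<A, B>> = rows.map { Result2(decode<A>(it[0]), decode<B>(it[1])) }
```

The type arguments now double as documentation: `query<INT4, INT8>(rows)` states both how many columns come back and how each one is decoded.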
This now allows us to extract the values with correct types using destructuring:
The following snippet will still compile, but we’re throwing away the second value:
The following one, however, will not compile since we’re expecting 3 values, but only 2 are returned, we’re requesting more data than what is returned.
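These three cases can be illustrated with the result type alone. In this sketch `Result2`, the sample rows, and the helper functions are assumptions; the non-compiling case is left as a comment:

```kotlin
data class Result2<A, B>(val first: A, val second: B)

fun sampleRows(): List<Result2<Int, Long>> = listOf(Result2(1, 10L), Result2(2, 20L))

// Both values are destructured with their declared types.
fun sumBoth(rows: List<Result2<Int, Long>>): Long {
    var total = 0L
    for ((id, timestamp) in rows) total += id + timestamp
    return total
}

// Still compiles: only component1() is used, the second value is thrown away.
fun sumFirstOnly(rows: List<Result2<Int, Long>>): Int {
    var total = 0
    for ((id) in rows) total += id
    return total
}

// Does not compile, because Result2 has no component3():
// for ((id, timestamp, extra) in rows) { /* ... */ }
```

Destructuring is resolved through the `componentN()` functions a data class generates, which is why asking for a third value on a two-column result is a compile-time error.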
What about handling 4, 5, …, 100+ values? Kotlin only has Pair and Triple.
We’ll need a Tuple. I’m only writing a Tuple4 here, but you’ll need one of these for each number of columns you want to support.
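A minimal sketch of such a type; the property names and the `describe` helper are arbitrary choices for illustration:

```kotlin
// A data class gets component1()..component4() for free, which is all destructuring needs.
data class Tuple4<A, B, C, D>(val first: A, val second: B, val third: C, val fourth: D)

fun describe(tuple: Tuple4<Int, String, Long, Boolean>): String {
    val (id, name, timestamp, online) = tuple
    return "$id $name $timestamp $online"
}
```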
Ultimately, you will write some simple code that generates Tuple1 to Tuple100 as text output using a simple for loop, and copy-paste the result into your source file. I don’t recommend writing it by hand unless you’re looking for an excuse to practice your DVORAK or something.
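A small sketch of such a generator. The `T1..Tn` type parameter names and `v1..vn` property names are arbitrary, and the function names are made up for this example:

```kotlin
// Builds the source text of one tuple class with n type parameters.
fun generateTuple(n: Int): String {
    val typeParams = (1..n).joinToString(", ") { "T$it" }
    val properties = (1..n).joinToString(", ") { "val v$it: T$it" }
    return "data class Tuple$n<$typeParams>($properties)"
}

// One declaration per line, ready to paste into a source file.
fun generateTuples(upTo: Int): String =
    (1..upTo).joinToString("\n") { generateTuple(it) }
```

Printing `generateTuples(100)` gives the full set of declarations in one go.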
allowing us to do
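For instance, with a four-column result, destructuring works the same way as with the smaller result types. The sample data and column meanings below are made up for this sketch:

```kotlin
data class Tuple4<A, B, C, D>(val first: A, val second: B, val third: C, val fourth: D)

// Hypothetical decoded rows: id, timestamp, elevation, sensor online.
fun sampleReadings(): List<Tuple4<Int, Long, Double, Boolean>> = listOf(
    Tuple4(1, 1_700_000_000L, 250.5, true),
    Tuple4(2, 1_700_000_060L, 249.0, false),
)

// Destructuring in lambda parameters; underscores skip the columns we do not need.
fun onlineElevations(rows: List<Tuple4<Int, Long, Double, Boolean>>): List<Double> =
    rows.filter { (_, _, _, online) -> online }
        .map { (_, _, elevation, _) -> elevation }
```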
Now the engineers working at the byte level can at least encapsulate their queries, giving the untyped data some form of bounded context. The code matches the documentation terminology 1:1, and one layer further up you can convert to something that’s closer to the user-space.
Using some of these ideas, you’ll end up improving your APIs so that not everything goes into the square hole.
Addendum: here’s an example of this syntax in the wild (this is from the Coroutines PostgreSQL driver in the Alumonium library). Since we have query<UUID>, we’re expecting QueryResults1<UUID>, and if we had query<UUID, INT4>, we’d be expecting QueryResults2<UUID, INT4>, etc.