Job and children awaiting in Kotlin Coroutines
This is a chapter from the book Kotlin Coroutines. You can find it on LeanPub or Amazon.
When a coroutine is suspended, the only thing that remains is its continuation. This continuation includes references to local variables, labels marking where each suspending function has stopped, and the coroutine context. That is it. However, coroutines need to keep more information than that. They need to know their state, their relationships (parent and children), and more. So they keep it in a special context called Job.
Job is the most important context for each coroutine. Every coroutine has its own job, and it is the only context not inherited from the parent. It cannot be inherited: every coroutine has its own state and its own relationships, so Job cannot be shared. It also cannot be set from the outside, as every coroutine builder creates and controls its own job.
Job is a context identified by the `Job` key and implementing the `Job` interface. It is a cancellable thing with a lifecycle. It has a state, and it can be used to cancel coroutines, track their state, and much more. It is really important, so this and the next two chapters are dedicated to the Job context and the essential Kotlin Coroutine mechanisms that are connected to it.
Job and relationships
Every coroutine has its own job, which can be accessed from its context using the `Job` key.
There is also an extension property, `job`, which lets us access the job more easily.
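Both ways of reaching the job can be sketched as follows. This is a minimal illustration that assumes `kotlinx-coroutines-core` is on the classpath; the helper name `sameJobBothWays` is made up for this example.

```kotlin
import kotlinx.coroutines.Job
import kotlinx.coroutines.job
import kotlinx.coroutines.runBlocking
import kotlin.coroutines.coroutineContext

// Hypothetical helper: reads the current job in both ways and compares them.
suspend fun sameJobBothWays(): Boolean {
    val byKey: Job? = coroutineContext[Job]    // lookup by key, nullable
    val byProperty: Job = coroutineContext.job // extension property, throws if there is no job
    return byKey === byProperty
}

fun main(): Unit = runBlocking {
    println(sameJobBothWays()) // true
}
```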
Asynchronous coroutine builders return their jobs, so they can be used elsewhere. This is clearly visible for `launch`, where `Job` is an explicit result type.
The type returned by the `async` function is `Deferred<T>`, and `Deferred<T>` also implements the `Job` interface, so it can be used in the same way.
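A short sketch of both result types, assuming `kotlinx-coroutines-core` is available. The function name `builderResults` is only illustrative.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper showing what launch and async return.
suspend fun builderResults(): String = coroutineScope {
    val job: Job = launch { delay(100) }
    val deferred: Deferred<String> = async {
        delay(100)
        "Result"
    }
    val deferredAsJob: Job = deferred // Deferred<T> is also a Job

    job.join()
    deferredAsJob.join()
    deferred.await() // already completed, so this returns immediately
}

fun main(): Unit = runBlocking {
    println(builderResults()) // Result
}
```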
`Job` is the only coroutine context that is not inherited by a coroutine from its parent. Every coroutine creates its own `Job`, and the job from an argument or parent coroutine is used as the parent of this new job [0].
The parent can reference all its children, and the children can refer to the parent. This parent-child relationship enables the implementation of cancellation and exception handling inside a coroutine’s scope. In most cases, Job is passed implicitly, in the scope of a coroutine builder. In the example below, `runBlocking` is the parent of `launch`, because `launch` can find its job in the scope provided by `runBlocking`.
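One way to observe this relationship is to compare the job returned by `launch` with the children of the enclosing scope's job. The sketch below uses `coroutineScope` inside `runBlocking` so the check can live in a reusable function; `childIsRegistered` is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: checks that launch creates a new job that is a child of the scope's job.
suspend fun childIsRegistered(): Boolean = coroutineScope {
    val parentJob: Job = coroutineContext.job
    val childJob: Job = launch { delay(100) }

    // The child has its own job, and the parent can reference it.
    childJob !== parentJob && childJob in parentJob.children
}

fun main(): Unit = runBlocking {
    println(childIsRegistered()) // true
}
```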
Structured concurrency mechanisms will not work if a new `Job` context replaces the one from the parent.
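A minimal sketch of this problem, assuming `kotlinx-coroutines-core`. Here the enclosing scope is a `coroutineScope` rather than `runBlocking` directly, so the effect can be returned as a value; `detachedStillRunning` is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: the scope does not wait for a coroutine started with its own Job().
suspend fun detachedStillRunning(): Boolean {
    val detached: Job = coroutineScope {
        // Job() replaces the job from the scope, so this coroutine is not a child of the scope.
        launch(Job()) {
            delay(10_000)
            println("Will not be printed")
        }
    }
    // coroutineScope returned at once, although the launched coroutine has not finished.
    val running = detached.isActive
    detached.cancel() // clean up, since nothing else will cancel it
    return running
}

fun main(): Unit = runBlocking {
    println(detachedStillRunning()) // true
}
```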
In the above example, `runBlocking` does not wait for `launch`, because it has no relation with it. This is because `launch` uses the job from the argument as a parent.
When a coroutine has its own (independent) job, it has nearly no relation to its parent. It inherits other contexts, but other consequences of the parent-child relationship will not apply. This causes us to lose structured concurrency, which is a problematic situation that should be avoided.
Coroutine lifecycle
Every coroutine has its own state, and this state is managed by its job. State lifecycle is essential for the basic mechanisms of coroutines, like cancellation and synchronization. Here is a graph of states and the transitions between them:
In the "Active" state, a job is running. If the job was created with a coroutine builder, this is the state in which the body of the coroutine is executed. In this state, we can start child coroutines. Most coroutines start in the "Active" state; only those that are started lazily start in the "New" state, and they need to be started in order to move to the "Active" state. While a coroutine is executing its body, it is certainly in the "Active" state. When body execution is finished, its state changes to "Completing", where the coroutine waits for its children to complete. Once all its children have completed, the job (coroutine) changes its state to "Completed", which is a terminal state. Alternatively, if a job is cancelled or fails during the "Active" or "Completing" state, its state changes to "Cancelling". In this state, we have the last chance to do some clean-up, like closing connections or freeing resources (we will see how to do this in the next chapter). Once this is done, the job moves to the "Cancelled" state.
The state is displayed in a job’s `toString` [2]. In the example below, we see different jobs as their states change. The last one is started lazily, which means it does not start automatically. All the others will immediately become active once created.
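The sketch below collects the string form of a few jobs as they move through their states. The class name and the hash that surround the state differ between job types and runs, so the comments show only the state part; `jobSnapshots` is a hypothetical name, and the strings are meant for debugging output only.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: records toString() of jobs at different points of their lifecycle.
suspend fun jobSnapshots(): List<String> = coroutineScope {
    val snapshots = mutableListOf<String>()

    val job = Job()                    // created with the factory function
    snapshots += job.toString()        // ...{Active}...
    job.complete()                     // no children, so it completes at once
    snapshots += job.toString()        // ...{Completed}...

    val activeJob = launch { delay(100) }
    snapshots += activeJob.toString()  // ...{Active}...
    activeJob.join()
    snapshots += activeJob.toString()  // ...{Completed}...

    val lazyJob = launch(start = CoroutineStart.LAZY) { delay(100) }
    snapshots += lazyJob.toString()    // ...{New}...
    lazyJob.start()
    snapshots += lazyJob.toString()    // ...{Active}...
    lazyJob.join()
    snapshots += lazyJob.toString()    // ...{Completed}...

    snapshots
}

fun main(): Unit = runBlocking {
    jobSnapshots().forEach(::println)
}
```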
To check the state in code, we use the properties `isActive`, `isCompleted`, and `isCancelled`.
State | isActive | isCompleted | isCancelled |
---|---|---|---|
New (optional initial state) | false | false | false |
Active (default initial state) | true | false | false |
Completing (transient state) | true | false | false |
Cancelling (transient state) | false | false | true |
Cancelled (final state) | false | true | true |
Completed (final state) | false | true | false |
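Some rows of this table can be reproduced with a small experiment. The sketch below is only an illustration: `Flags`, `flags` and `observeFlags` are names invented for it, and the transient "Completing" and "Cancelling" states are not captured.

```kotlin
import kotlinx.coroutines.*

// Hypothetical holder for the three state properties of a job.
data class Flags(val isActive: Boolean, val isCompleted: Boolean, val isCancelled: Boolean)

fun Job.flags() = Flags(isActive, isCompleted, isCancelled)

// Hypothetical helper: records the flags of a job in four of its states.
suspend fun observeFlags(): Map<String, Flags> = coroutineScope {
    val seen = linkedMapOf<String, Flags>()

    val job = launch(start = CoroutineStart.LAZY) { delay(100) }
    seen["New"] = job.flags()
    job.start()
    seen["Active"] = job.flags()
    job.join()
    seen["Completed"] = job.flags()

    val cancelledJob = launch { delay(10_000) }
    cancelledJob.cancelAndJoin()
    seen["Cancelled"] = cancelledJob.flags()

    seen
}

fun main(): Unit = runBlocking {
    observeFlags().forEach { (state, flags) -> println("$state: $flags") }
}
```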
The `Job` interface also offers some useful functions that can be used to interact with the job. Let's start with `join`, which is used to wait for the job to complete.
Awaiting job completion
A coroutine's job can be used to wait until the coroutine completes. For that, we use the `join` method, which suspends until the job reaches a final state (either "Cancelled" or "Completed").
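A minimal sketch of `join`, assuming everything runs on the single thread of `runBlocking` (the shared list is not protected against concurrent access). The name `joinDemo` is hypothetical.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: waits for two coroutines and records the order of events.
suspend fun joinDemo(): List<String> = coroutineScope {
    val log = mutableListOf<String>()
    val job1 = launch {
        delay(200)
        log += "job1 done"
    }
    val job2 = launch {
        delay(100)
        log += "job2 done"
    }
    job1.join() // suspends until job1 reaches a final state
    job2.join() // job2 is already completed, so this returns at once
    log += "both joined"
    log
}

fun main(): Unit = runBlocking {
    println(joinDemo()) // [job2 done, job1 done, both joined]
}
```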
The `Job` interface also exposes a `children` property that lets us reference all its children. We might as well use it to wait until all children are in a final state.
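Waiting for all children could look like this. It is a sketch under the assumption of single-threaded execution (as in `runBlocking`), and `joinChildren` is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: joins every child of the current scope's job.
suspend fun joinChildren(): Int = coroutineScope {
    var finished = 0
    launch {
        delay(100)
        finished++
    }
    launch {
        delay(200)
        finished++
    }
    val children: Sequence<Job> = coroutineContext.job.children
    children.forEach { it.join() }
    finished // both children are done at this point
}

fun main(): Unit = runBlocking {
    println(joinChildren()) // 2
}
```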
It is not uncommon to use `join` for synchronizing coroutines. For instance, you could use it to make sure that a coroutine is done before you start another one:
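One possible shape of such code is sketched below. `SequentialRunner` and `sequentialDemo` are hypothetical names, and the class is not thread-safe: it assumes `run` is always called from the same thread.

```kotlin
import kotlinx.coroutines.*

// Hypothetical class: each task starts only after the previously submitted one is done.
class SequentialRunner(private val scope: CoroutineScope) {
    private var previous: Job? = null

    fun run(task: suspend () -> Unit): Job {
        val toAwait = previous
        val job = scope.launch {
            toAwait?.join() // wait for the earlier task, if there was one
            task()
        }
        previous = job
        return job
    }
}

suspend fun sequentialDemo(): List<Int> = coroutineScope {
    val log = mutableListOf<Int>()
    val runner = SequentialRunner(this)
    runner.run { delay(200); log += 1 }
    runner.run { delay(50); log += 2 }.join() // shorter, but still runs second
    log
}

fun main(): Unit = runBlocking {
    println(sequentialDemo()) // [1, 2]
}
```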
This use case can also be solved using `Mutex`.
We can also use it to cancel the previous job before starting a new one. This is especially useful when we want to make sure that only one coroutine is running at a time.
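A sketch of this idea, with the same caveats as before: `LatestOnlyRunner` and `latestOnlyDemo` are hypothetical names, and the class assumes calls from a single thread. This variant only calls `cancel`; using `cancelAndJoin` from a suspending function would additionally wait until the previous coroutine has finished its clean-up.

```kotlin
import kotlinx.coroutines.*

// Hypothetical class: starting a new task cancels the one started before it.
class LatestOnlyRunner(private val scope: CoroutineScope) {
    private var current: Job? = null

    fun run(task: suspend () -> Unit): Job {
        current?.cancel() // the previous task is no longer needed
        return scope.launch { task() }.also { current = it }
    }
}

suspend fun latestOnlyDemo(): List<Int> = coroutineScope {
    val log = mutableListOf<Int>()
    val runner = LatestOnlyRunner(this)
    runner.run { delay(200); log += 1 }       // cancelled by the next call
    runner.run { delay(50); log += 2 }.join()
    log
}

fun main(): Unit = runBlocking {
    println(latestOnlyDemo()) // [2]
}
```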
Here is an example of a more complex use case. We have an order that needs to be completed: we need to create an order, create an invoice, deliver the order, and send an email. We want to make sure that the order is created before we mark it as invoiced, that the invoice is created before we mark the order as delivered, and that the order is marked as invoiced and delivered before we send an email. We can use `join` to synchronize these operations.
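The dependencies described above can be sketched as follows. Real services are replaced here by a hypothetical `step` function that only waits and writes to a log, so the step names and delays are illustrative assumptions.

```kotlin
import kotlinx.coroutines.*

// Hypothetical sketch: each step is simulated with a delay and a log entry.
suspend fun completeOrder(): List<String> = coroutineScope {
    val log = mutableListOf<String>()
    suspend fun step(name: String, millis: Long) {
        delay(millis)
        log += name
    }

    val createOrderJob = launch { step("order created", 100) }
    val invoiceJob = launch {
        step("invoice created", 50)
        createOrderJob.join() // the order must exist before it is marked as invoiced
        step("order marked as invoiced", 10)
    }
    val deliveryJob = launch {
        step("delivery ordered", 20)
        invoiceJob.join() // invoicing must be finished before marking as delivered
        step("order marked as delivered", 10)
    }
    invoiceJob.join()
    deliveryJob.join()
    step("email sent", 10) // runs only after both previous jobs are done
    log
}

fun main(): Unit = runBlocking {
    completeOrder().forEach(::println)
}
```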
Instead of using `join`, you might also use `await` from `async` to wait for the result of a coroutine. The only difference is that `await` returns the result of the coroutine, while `join` returns `Unit`.
Job factory function
A `Job` can be created without a coroutine using the `Job()` factory function. `Job()` creates a job that isn't associated with any coroutine and can be used as a context. This also means that we can use such a job as a parent of many coroutines. However, using such a job as a parent is tricky, and I recommend avoiding it.
A common mistake is to create a job using the `Job()` factory function, use it as a parent for some coroutines, and then use `join` on the job. Such a program will never end because the `Job` is still in the "Active" state, even when all its children are finished. This is because this context is still ready to be used by other coroutines.
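To observe this without hanging the program, the sketch below bounds `join` with a timeout. The timeout and the final `cancel` are additions made only for the demonstration, and `joinOnFactoryJob` is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: returns true only if join() on the factory-created job returned in time.
suspend fun joinOnFactoryJob(): Boolean = coroutineScope {
    val job = Job()
    launch(job) { delay(100) }
    launch(job) { delay(200) }

    // Without the timeout, this join would suspend forever.
    val joined = withTimeoutOrNull(500) { job.join() } != null
    job.cancel() // demonstration only: do not leave the job hanging
    joined
}

fun main(): Unit = runBlocking {
    println(joinOnFactoryJob()) // false
}
```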
A better approach would be to join all the current children of the job.
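A sketch of that approach, assuming single-threaded execution as in `runBlocking`; `joinChildrenOfJob` is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: waits for the current children instead of the job itself.
suspend fun joinChildrenOfJob(): Int = coroutineScope {
    val job = Job()
    var finished = 0
    launch(job) {
        delay(100)
        finished++
    }
    launch(job) {
        delay(200)
        finished++
    }
    job.children.forEach { it.join() } // returns once both coroutines are done
    finished
}

fun main(): Unit = runBlocking {
    println(joinChildrenOfJob()) // 2
}
```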
`Job()` is an example of the fake constructor pattern [1]. At first, you might think that you're calling a constructor of `Job`, but you might then realize that `Job` is an interface, and interfaces cannot have constructors. The reality is that it is a simple function that looks like a constructor. Moreover, the actual type returned by this function is not `Job` but its subinterface `CompletableJob`.
The `CompletableJob` interface extends the functionality of the `Job` interface by providing two additional methods:
- `complete(): Boolean` - used to change this job’s state to "Completing". In this state, the job waits for all its children to complete, and once they are done, it changes its state to "Completed". Once a job is "Completing" or "Completed", it cannot move back to the "Active" state. The result of `complete` is `true` if this job was completed as a result of this invocation; otherwise, it is `false` (if it was already completed).
- `completeExceptionally(exception: Throwable): Boolean` - completes this job with a given exception. This means that all children will be cancelled immediately (with `CancellationException` wrapping the exception provided as an argument). The result of `completeExceptionally` is `true` if this job was completed as a result of this invocation; otherwise, it is `false` (if it was already completed).
The `complete` function can be used after we start the last coroutine on a job. Thanks to this, we can just wait for the job completion using the `join` function.
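A minimal sketch of this usage; `completeThenJoin` is a hypothetical name, and the returned triple holds the results of two `complete` calls followed by the final `isCompleted` value.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: completes the job after starting the last child, then joins it.
suspend fun completeThenJoin(): Triple<Boolean, Boolean, Boolean> = coroutineScope {
    val job = Job()
    launch(job) { delay(100) }
    launch(job) { delay(200) }

    val firstCall = job.complete()  // moves the job to "Completing"
    val secondCall = job.complete() // the job is already completing
    job.join()                      // returns once both children are done
    Triple(firstCall, secondCall, job.isCompleted)
}

fun main(): Unit = runBlocking {
    println(completeThenJoin()) // (true, false, true)
}
```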
You can pass a reference to the parent as an argument of the `Job` function. Thanks to this, such a job will be cancelled when the parent is.
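This can be checked with a short sketch; `cancelledWithParent` is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: cancelling the parent also cancels a job created with Job(parent).
suspend fun cancelledWithParent(): Boolean = coroutineScope {
    val parentJob = Job()
    val job = Job(parentJob) // parentJob becomes the parent of job
    val child = launch(job) { delay(10_000) }

    parentJob.cancel() // cancellation propagates to job and then to child
    child.join()
    job.isCancelled && child.isCancelled
}

fun main(): Unit = runBlocking {
    println(cancelledWithParent()) // true
}
```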
Synchronizing coroutines
It is not uncommon to use `join` from `Job` for synchronizing coroutines. For instance, if you want to make sure that an operation is started after another coroutine is finished, you can use `join` from the job of the first coroutine.
In a similar way, we could collect a whole collection of jobs and wait for all of them to finish. The same can be done with `async` and `await`. The result of `async` is `Deferred`, which is a subtype of `Job`, so we can also use `join`, but more often we use `await`, which additionally returns the result of the coroutine.
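For whole collections, kotlinx.coroutines offers the `joinAll` and `awaitAll` extension functions. The sketch below uses them; `waitForAll` is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: waits for a list of jobs, then collects results from a list of deferreds.
suspend fun waitForAll(): List<Int> = coroutineScope {
    val jobs: List<Job> = List(3) { i ->
        launch { delay(50L * (i + 1)) }
    }
    jobs.joinAll() // suspends until every job has finished

    val deferreds: List<Deferred<Int>> = List(3) { i ->
        async {
            delay(50)
            i * i
        }
    }
    deferreds.awaitAll() // results are returned in the order of the list
}

fun main(): Unit = runBlocking {
    println(waitForAll()) // [0, 1, 4]
}
```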
An exceptionally useful class for synchronizing coroutines is `CompletableDeferred`. It represents a deferred value with a completion function. So it is like a box for a value that can be completed with a value (`complete`) or an exception (`completeExceptionally`), and that has a waiting point, where a coroutine can wait using `await` until this `CompletableDeferred` is completed.
`CompletableDeferred` is useful when some coroutines need to await a value or event that is produced by another coroutine. `CompletableDeferred` accepts only one value, which can be awaited multiple times by multiple coroutines. If you want to have multiple values, you should use `Channel` instead. `Channel` is explained in a dedicated chapter.
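A minimal sketch of one producer and two awaiting coroutines; the name `shareValue` and the value `"abc"` are illustrative assumptions.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: two coroutines await one value completed by a third coroutine.
suspend fun shareValue(): List<String> = coroutineScope {
    val token = CompletableDeferred<String>()

    val readers = List(2) { i ->
        async { "reader $i got ${token.await()}" } // suspends until token is completed
    }
    launch {
        delay(100)
        token.complete("abc") // resumes every coroutine waiting in await
    }
    readers.awaitAll()
}

fun main(): Unit = runBlocking {
    println(shareValue()) // [reader 0 got abc, reader 1 got abc]
}
```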
Summary
In this chapter, we learned that:
- `Job` is the most important context for each coroutine. It is a cancellable thing with a lifecycle. It has a state, and it can be used to cancel coroutines, track their state, and much more.
- Every coroutine has its own job, and it is the only context not inherited from the parent. The job from an argument or parent coroutine is used as the parent of this new job.
- Coroutines can be in one of the following states: "New", "Active", "Completing", "Completed", "Cancelling", and "Cancelled". Regular coroutines start in the "Active" state. When they finish their body execution, they move to the "Completing" state, and once their children are completed, they move to the "Completed" state.
- You should avoid using `Job()` as an explicit parent of coroutines, as it can lead to unexpected behavior.
- `Job` can be used to synchronize coroutines. We can use `join` to wait for a coroutine to complete, or we can use `CompletableDeferred` to wait for a value produced by another coroutine.
The next two chapters describe cancellation and exception handling in Kotlin Coroutines. These two important mechanisms fully depend on the child-parent relationship created using `Job`.
[0] Yes, I repeat myself, but if there is one thing that I want you to remember, it is that `Job` is not inherited.
[1] A pattern that is well described in Effective Kotlin Item 32: Consider factory functions instead of constructors.
[2] I hope I do not need to remind the reader that `toString` should be used for debugging and logging purposes; it should not be parsed in code as this would break this function’s contract, as I described in Effective Kotlin.