Cancellation in Kotlin Coroutines

This is a chapter from the book Kotlin Coroutines. You can find it on LeanPub or Amazon.

One of the most important mechanisms of Kotlin Coroutines for Android developers is cancellation. This is because on Android nearly every coroutine is associated with some view, and if this view is destroyed, its coroutines are not needed, so they should be cancelled. Other coroutines should be cancelled when the application is finished. This is a crucial capability, which in many other libraries required a lot of effort from developers. Kotlin Coroutines offer a simple and safe cancellation mechanism. This mechanism is also used on the backend, especially when we deal with long-lived connections, like WebSockets or long polling. What is more, cancellation often works in situations we do not even realize, freeing resources and making our application more efficient.

Cancellation is so important that some classes and libraries use suspending functions primarily to support cancellation [1]. There is a good reason for this: a good cancellation mechanism is worth its weight in gold [2]. Just killing a thread is a terrible solution, as there should be an opportunity to close connections and free resources. Forcing developers to frequently check if some state is still active isn't convenient either. The problem of cancellation waited for a good solution for a very long time, but what Kotlin Coroutines offer is surprisingly simple, convenient and safe. This is the best cancellation mechanism I've seen in my career. Let's explore it.

Basic cancellation

The basic idea behind cancellation is simple: calling cancel on a Job object changes its state to "Cancelling". Here are the consequences of this change:

  • All the children of this job are also cancelled.
  • The job cannot be used as a parent for any new coroutines.
  • At the first suspension point, a CancellationException is thrown. If this coroutine is currently suspended, it will be resumed immediately with CancellationException. CancellationException is ignored by the coroutine builder, so it is not necessary to catch it, but it is used to complete this coroutine body as soon as possible.
  • Once the coroutine body is completed, and all its children are completed too, it changes its state to "Cancelled".

Take a look at the following example:

import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.coroutineScope

suspend fun main(): Unit = coroutineScope {
    val job = launch {
        repeat(1_000) { i ->
            delay(200)
            println("Printing $i")
        }
    }

    delay(1100)
    job.cancel()
    job.join()
    println("Cancelled successfully")
}
// (0.2 sec)
// Printing 0
// (0.2 sec)
// Printing 1
// (0.2 sec)
// Printing 2
// (0.2 sec)
// Printing 3
// (0.2 sec)
// Printing 4
// (0.1 sec)
// Cancelled successfully

launch starts a process that prints numbers every 200 ms. However, after 1100 ms, we cancel this process. The coroutine then changes its state to "Cancelling", and at the first suspension point, a CancellationException is thrown. This exception ends the coroutine body, so the coroutine changes its state to "Cancelled", join resumes, and we see "Cancelled successfully".
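The state changes described above can be observed through the isActive, isCancelled and isCompleted properties of Job. The following is only a minimal sketch: it assumes runBlocking instead of suspend main, so that everything runs on a single thread and the intermediate "Cancelling" state can be observed reliably. The helper describe is a hypothetical name introduced for illustration.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: summarizes the public state flags of a job
fun describe(job: Job): String =
    "isActive=${job.isActive}, " +
        "isCancelled=${job.isCancelled}, " +
        "isCompleted=${job.isCompleted}"

fun main(): Unit = runBlocking {
    val job = launch { delay(1000) }
    yield() // let the child start and suspend on delay
    println(describe(job)) // "Active"

    job.cancel()
    // The child has not been resumed yet (we occupy the only thread),
    // so the job is still in the "Cancelling" state
    println(describe(job))

    job.join()
    // The body has finished, so the job is now "Cancelled"
    println(describe(job))
}
// isActive=true, isCancelled=false, isCompleted=false
// isActive=false, isCancelled=true, isCompleted=false
// isActive=false, isCancelled=true, isCompleted=true
```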

We might cancel with a different exception (by passing an exception as an argument to the cancel function) to specify the cause. This cause needs to be a subtype of CancellationException, because only an exception of this type can be used to cancel a coroutine.
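As a rough illustration of passing a cause, the sketch below cancels a job with a CancellationException that carries a message, and reads that message inside the cancelled coroutine. The function name cancelWithCause and the message text are assumptions made up for this example.

```kotlin
import kotlinx.coroutines.*

// Returns the message of the exception observed
// inside the cancelled coroutine
suspend fun cancelWithCause(): String? = coroutineScope {
    var observed: String? = null
    val job = launch {
        try {
            delay(1000)
        } catch (e: CancellationException) {
            observed = e.message
            throw e
        }
    }
    delay(100)
    // The message below is an illustrative assumption
    job.cancel(CancellationException("User left the screen"))
    job.join()
    observed
}

suspend fun main() {
    println(cancelWithCause())
}
// (0.1 sec)
// User left the screen
```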

finally block

The cancellation mechanism is simple, but it is very powerful. It guarantees that if we have a finally block, it will be executed, and all the resources will be freed. It also allows us to perform an action only in case of cancellation, by catching CancellationException. It is not necessary to rethrow this exception, because it is ignored by the coroutine builder, but it is considered good practice to do so, in case there is some outer scope that should know about the cancellation.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = launch {
        try {
            repeat(1_000) { i ->
                delay(200)
                println("Printing $i")
            }
        } catch (e: CancellationException) {
            println("Cancelled with $e")
            throw e
        } finally {
            println("Finally")
        }
    }
    delay(700)
    job.cancel()
    job.join()
    println("Cancelled successfully")
    delay(1000)
}
// (0.2 sec)
// Printing 0
// (0.2 sec)
// Printing 1
// (0.2 sec)
// Printing 2
// (0.1 sec)
// Cancelled with JobCancellationException...
// Finally
// Cancelled successfully

Notice that in the above example join is used to wait for the cancellation to finish before we can proceed. Without this, we would have a race condition, and we would (most likely) see "Cancelled successfully" before "Cancelled..." and "Finally". That is why when we cancel a coroutine, we often also add join to wait for the cancellation to finish. Since this is a common pattern, the kotlinx.coroutines library offers a convenient extension function with a self-descriptive name, cancelAndJoin [5].

public suspend fun Job.cancelAndJoin() {
    cancel()
    return join()
}

invokeOnCompletion

Another way to handle coroutine cancellation or completion is to use the invokeOnCompletion function from Job. It is used to set a handler to be called when the job reaches a terminal state, namely either "Completed" or "Cancelled". The handler receives, as its parameter, the exception that finished the coroutine (if any). The handler is guaranteed to be called exactly once (assuming this coroutine ever completes), even if the job was already completed when the handler was set.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = launch {
        repeat(1_000) { i ->
            delay(200)
            println("Printing $i")
        }
    }
    job.invokeOnCompletion {
        if (it is CancellationException) {
            println("Cancelled with $it")
        }
        println("Finally")
    }
    delay(700)
    job.cancel()
    job.join()
    println("Cancelled successfully")
    delay(1000)
}
// (0.2 sec)
// Printing 0
// (0.2 sec)
// Printing 1
// (0.2 sec)
// Printing 2
// (0.1 sec)
// Cancelled with JobCancellationException...
// Finally
// Cancelled successfully

The invokeOnCompletion handler is called with:

  • null if the job finished with no exception;
  • CancellationException if the coroutine was cancelled;
  • the exception that finished a coroutine (more about this in the next chapter).

invokeOnCompletion is called synchronously during cancellation, and we do not control the thread on which it will run. It can be further customized with the onCancelling [3] and invokeImmediately [4] parameters.
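The guarantee that a handler is invoked even when it is set on an already finished job can be illustrated with a small sketch. The variable names are arbitrary; the example only assumes the standard invokeOnCompletion overload that takes a single handler.

```kotlin
import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val completed = launch { /* finishes immediately */ }
    completed.join()
    // The job is already in the "Completed" state,
    // so the handler is invoked immediately with null
    completed.invokeOnCompletion { println("Completed with $it") }

    val cancelled = launch { delay(1000) }
    cancelled.cancelAndJoin()
    // The job is already "Cancelled", so the handler
    // immediately receives a CancellationException
    cancelled.invokeOnCompletion {
        println("Is CancellationException: ${it is CancellationException}")
    }
}
// Completed with null
// Is CancellationException: true
```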

Children cancellation

When a job is cancelled, all its children are also cancelled. So cancellation propagates down the hierarchy of coroutines. This is a very useful feature, as when you cancel a process, it also cancels all its subprocesses. This is a good practice, as it prevents memory leaks and frees resources.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    var childJob: Job? = null
    val job = launch {
        launch {
            try {
                delay(1000)
                println("A")
            } finally {
                println("A finished")
            }
        }
        childJob = launch {
            try {
                delay(2000)
                println("B")
            } catch (e: CancellationException) {
                println("B cancelled")
            }
        }
        launch {
            delay(3000)
            println("C")
        }.invokeOnCompletion {
            println("C finished")
        }
    }
    delay(100)
    job.cancel()
    job.join()
    println("Cancelled successfully")
    println(childJob?.isCancelled)
}
// (0.1 sec)
// (the below order might be different)
// A finished
// B cancelled
// C finished
// Cancelled successfully
// true

Cancellation in a coroutine scope

A job created using the Job() factory function can be cancelled in the same way. We often specify a job when we construct a coroutine scope. If we don't specify it explicitly, CoroutineScope creates a default job.

fun CoroutineScope(
    context: CoroutineContext
): CoroutineScope = ContextScope(
    if (context[Job] != null) context
    else context + Job()
)

So when such a scope is used as a parent for coroutines, we can cancel all of them at once by cancelling the parent's job. That can be done using the cancel function on CoroutineScope, which calls cancel on the job.

fun CoroutineScope.cancel(cause: CancellationException? = null) {
    val job = coroutineContext[Job] ?: error("...")
    job.cancel(cause)
}

This capability is often used to cancel all the tasks started by a class. That might be useful, for instance, to complete all processes in unit testing.

class OfferUploader {
    private val scope = CoroutineScope(Dispatchers.IO)

    fun upload(offer: Offer) {
        scope.launch {
            // upload
        }
    }

    fun cancel() {
        scope.cancel()
    }
}

However, you must remember that once a job is cancelled, it cannot be used as a parent for new coroutines. So a scope with a cancelled job is no longer useful. It is even dangerous, as trying to start a new coroutine in such a scope will silently do nothing.

import kotlinx.coroutines.*

suspend fun main() {
    val scope = CoroutineScope(Job())
    scope.cancel()
    val job = scope.launch { // will be ignored
        println("Will not be printed")
    }
    job.join()
}

That is why I recommend using the cancelChildren function on CoroutineContext, which cancels all the children of a job but leaves the job itself in the active state. Keeping the job active costs us nothing, and it gives us some extra safety. On Android, cancellation happens automatically when we use viewModelScope or lifecycleScope provided by Android KTX. We might also create a scope ourselves, and then we should remember to cancel its children when this scope is no longer needed. It is a good practice to use SupervisorJob as a parent for such a scope, which we will explain in the next chapter.

class ProfileViewModel : ViewModel() {
    private val scope =
        CoroutineScope(Dispatchers.Main + SupervisorJob())

    fun onCreate() {
        scope.launch { loadUserData() }
    }

    override fun onCleared() {
        scope.coroutineContext.cancelChildren()
    }

    // ...
}
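The practical difference between cancelling a scope and cancelling only its children can be sketched as follows. This is a minimal standalone example rather than Android code; the variable names are arbitrary.

```kotlin
import kotlinx.coroutines.*

suspend fun main() {
    val scope = CoroutineScope(SupervisorJob())
    val first = scope.launch { delay(1000) }

    // Cancels the children, but not the scope's own job
    scope.coroutineContext.cancelChildren()
    first.join()
    println("First cancelled: ${first.isCancelled}")
    println("Scope active: ${scope.isActive}")

    // The scope can still be used to start new coroutines
    val second = scope.launch {
        println("Second coroutine executed")
    }
    second.join()
}
// First cancelled: true
// Scope active: true
// Second coroutine executed
```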

Just one more call

When we cancel a coroutine, it changes state to "Cancelling", where it should only clean up resources and complete. However, what if cleaning up resources requires making suspending calls or starting coroutines? We cannot do that, because those operations are not allowed in the "Cancelling" state. Calling suspending functions in this state will throw CancellationException. Starting a new coroutine will be ignored.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = Job()
    launch(job) {
        try {
            println("Coroutine started")
            delay(200)
            println("Coroutine finished")
        } finally {
            println("Finally")
            launch {
                println("Children executed")
            }
            delay(1000L)
            println("Cleanup done")
        }
    }
    delay(100)
    job.cancelAndJoin()
    println("Done")
}
// Coroutine started
// (0.1 sec)
// Finally
// Done

For such situations, there is a special structure: withContext(NonCancellable). We should use withContext(NonCancellable) for all suspending calls that should be executed even in the "Cancelling" state (that is, calls that must survive cancellation). NonCancellable is a job that is always active, and it should not be used outside this particular situation, where we need to either make a suspending call or start a new coroutine even in case of cancellation.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = Job()
    launch(job) {
        try {
            println("Coroutine started")
            delay(200)
            println("Coroutine finished")
        } finally {
            println("Finally")
            withContext(NonCancellable) {
                launch {
                    println("Children executed")
                }
                delay(1000L)
                println("Cleanup done")
            }
        }
    }
    delay(100)
    job.cancelAndJoin()
    println("Done")
}
// Coroutine started
// (0.1 sec)
// Finally
// Children executed
// (1 sec)
// Cleanup done
// Done

Even if you just implement a suspending function, and you specify some cleanup in the finally block, and this cleanup requires a suspending call, you should use withContext(NonCancellable) to make sure that the cleanup will be done even in case of cancellation.

suspend fun operation() {
    try {
        // operation
    } finally {
        withContext(NonCancellable) {
            // cleanup that requires suspending call
        }
    }
}

Stopping the unstoppable

Because cancellation happens at the suspension points, it will not happen until a suspension. To simulate such a situation, we could use Thread.sleep instead of delay. This is a terrible practice, so please don’t do this in any real-life projects. We are just trying to simulate a case in which we are using our coroutines extensively but not suspending them. In practice, such a situation might happen if we have some complex calculations, like neural network learning (yes, we also use coroutines for such cases in order to simplify processing parallelization), or when we need to do some blocking calls (for instance, reading files).

The example below presents a situation in which a coroutine cannot be cancelled because there is no suspension point inside it (we use Thread.sleep instead of delay). The execution needs over 3 minutes, even though it should be cancelled after 1 second.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = Job()
    launch(job) {
        repeat(1_000) { i ->
            Thread.sleep(200) // We might have some
            // complex operations or reading files here
            println("Printing $i")
        }
    }
    delay(1000)
    job.cancelAndJoin()
    println("Cancelled successfully")
    delay(1000)
}
// Printing 0
// Printing 1
// Printing 2
// ... (up to 999)

There are a few ways to deal with such situations. The first one is to use the yield() function from time to time. This function suspends and immediately resumes a coroutine. This gives an opportunity to do whatever is needed during suspension (or resuming), including cancellation (or changing a thread using a dispatcher).

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = Job()
    launch(job) {
        repeat(1_000) { i ->
            Thread.sleep(200)
            yield()
            println("Printing $i")
        }
    }
    delay(1100)
    job.cancelAndJoin()
    println("Cancelled successfully")
    delay(1000)
}
// Printing 0
// Printing 1
// Printing 2
// Printing 3
// Printing 4
// Cancelled successfully

It is a good practice to use yield in suspend functions, between blocks of non-suspended CPU-intensive or time-intensive operations.

suspend fun cpu() = withContext(Dispatchers.Default) {
    cpuIntensiveOperation1()
    yield()
    cpuIntensiveOperation2()
    yield()
    cpuIntensiveOperation3()
}

Another option is to track the state of the job. Inside a coroutine builder, this (the receiver) references the scope of this builder. CoroutineScope has a context we can reference using the coroutineContext property. Thus, we can access the coroutine job (using coroutineContext[Job] or coroutineContext.job) and check what its current state is. Since a job is often used to check if a coroutine is active, the Kotlin Coroutines library provides an extension property to simplify that:

public val CoroutineScope.isActive: Boolean
    get() = coroutineContext[Job]?.isActive ?: true

We can use the isActive property to check if a job is still active and stop calculations when it is inactive.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = Job()
    launch(job) {
        do {
            Thread.sleep(200)
            println("Printing")
        } while (isActive)
    }
    delay(1100)
    job.cancelAndJoin()
    println("Cancelled successfully")
}
// Printing
// Printing
// Printing
// Printing
// Printing
// Printing
// Cancelled successfully

Alternatively, we might use the ensureActive() function, which throws CancellationException if Job is not active.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    val job = Job()
    launch(job) {
        repeat(1000) { num ->
            Thread.sleep(200)
            ensureActive()
            println("Printing $num")
        }
    }
    delay(1100)
    job.cancelAndJoin()
    println("Cancelled successfully")
}
// Printing 0
// Printing 1
// Printing 2
// Printing 3
// Printing 4
// Cancelled successfully

The results of ensureActive() and yield() seem similar, but the two functions are very different. The function ensureActive() needs to be called on a CoroutineScope (or CoroutineContext, or Job). All it does is throw an exception if the job is no longer active, so it is lighter. The function yield is a regular top-level suspending function. It does not need any scope, so it can be used in regular suspending functions. Since it suspends and resumes, it causes redispatching, which means that if there is a queue to the dispatcher, this coroutine will give its thread back and wait in the queue. That is considered positive when our operations are demanding for threads, as it prevents the starvation of other coroutines. yield should be used in suspending functions that make multiple CPU-intensive or blocking operations.
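Since ensureActive() is also available on CoroutineContext, it can be used in a regular suspending function through the coroutineContext property. The sketch below shows one possible way to do this in a long computation; the function sumUpTo and the choice to check once per 1000 iterations are assumptions made for illustration.

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.coroutineContext

// Hypothetical CPU-bound function with no suspension point of its own
suspend fun sumUpTo(limit: Long): Long {
    var sum = 0L
    for (i in 1..limit) {
        // Cheap cancellation check once per 1000 iterations;
        // unlike yield, it never suspends or redispatches
        if (i % 1_000 == 0L) coroutineContext.ensureActive()
        sum += i
    }
    return sum
}

suspend fun main(): Unit = coroutineScope {
    println(sumUpTo(10_000)) // 50005000

    // Practically endless computation, stopped by cancellation
    val job = launch { sumUpTo(Long.MAX_VALUE) }
    delay(100)
    job.cancelAndJoin()
    println("Cancelled successfully")
}
```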

CancellationException does not propagate to its parent

All exceptions that extend CancellationException are treated in a special way. They only cause cancellation of the current coroutine. CancellationException is an open class, so it can be extended by our own classes or objects.

import kotlinx.coroutines.*

class MyNonPropagatingException : CancellationException()

suspend fun main(): Unit = coroutineScope {
    launch { // 1
        launch { // 2
            delay(2000)
            println("Will not be printed")
        }
        delay(1000)
        throw MyNonPropagatingException() // 3
    }
    launch { // 4
        delay(2000)
        println("Will be printed")
    }
}
// (2 sec)
// Will be printed

In the above snippet, we start two coroutines with builders at 1 and 4. After 1 second, we throw a MyNonPropagatingException exception at 3, which is a subtype of CancellationException. This exception is caught by launch (started at 1). This builder cancels itself, then it also cancels its children, namely the builder defined at 2. But the exception is not propagated to the parent (because it is of type CancellationException), so coroutineScope and its children (the coroutine started at 4) are not affected. That is why the coroutine started at 4 prints "Will be printed" after 2 seconds.

In some projects, I have seen a pattern of defining exceptions that extend CancellationException. That is a dangerous practice because it might lead to unexpected results. Suppose that someone wrote the following code:

import kotlinx.coroutines.*
import kotlin.coroutines.cancellation.CancellationException

// Poor practice, do not do this
class UserNotFoundException : CancellationException()

suspend fun main() {
    try {
        updateUserData()
    } catch (e: UserNotFoundException) {
        println("User not found")
    }
}

suspend fun updateUserData() {
    updateUser()
    updateTweets()
}

suspend fun updateTweets() {
    delay(1000)
    println("Updating...")
}

suspend fun updateUser() {
    throw UserNotFoundException()
}
// User not found

This code works, but well... accidentally. It works only because between throwing an exception and catching it, there are no coroutines. Adding launch in between might change the result significantly.

import kotlinx.coroutines.*
import kotlin.coroutines.cancellation.CancellationException

// Poor practice, do not do this
class UserNotFoundException : CancellationException()

suspend fun main() {
    try {
        updateUserData()
    } catch (e: UserNotFoundException) {
        println("User not found")
    }
}

suspend fun updateUserData() = coroutineScope {
    launch { updateUser() }
    launch { updateTweets() }
}

suspend fun updateTweets() {
    delay(1000)
    println("Updating...")
}

suspend fun updateUser() {
    throw UserNotFoundException()
}
// (1 sec)
// Updating...

In the above example, updateUser throws UserNotFoundException, but it is caught by launch from updateUserData, and only causes its cancellation. updateTweets is not affected, so it prints "Updating...", and the exception is not caught in main. This behavior is entirely different from the behavior of the previous snippet, just because we extended CancellationException. It is a trap that someone can accidentally fall into.

Such a situation is not common: if we used async with await instead of launch, then await would throw UserNotFoundException and this exception would propagate. However, it is better to avoid such a practice, as it might lead to unexpected results that are hard to debug. It is safer to extend Exception or RuntimeException instead of CancellationException.

import kotlinx.coroutines.*

class UserNotFoundException : RuntimeException()

suspend fun main() {
    try {
        updateUserData()
    } catch (e: UserNotFoundException) {
        println("User not found")
    }
}

suspend fun updateUserData() {
    updateUser()
    updateTweets()
}

suspend fun updateTweets() {
    delay(1000)
    println("Updating...")
}

suspend fun updateUser() {
    throw UserNotFoundException()
}
// User not found

withTimeout

If you want to start a certain operation with a timeout, you can use the withTimeout function. It behaves just like coroutineScope, until the timeout is exceeded. Then it cancels its body and children, and throws TimeoutCancellationException (a subtype of CancellationException).

import kotlinx.coroutines.*

suspend fun test(): Int = withTimeout(1500) {
    delay(1000)
    println("Still thinking")
    delay(1000)
    println("Done!")
    42
}

suspend fun main(): Unit = coroutineScope {
    try {
        test()
    } catch (e: TimeoutCancellationException) {
        println("Cancelled")
    }
    delay(1000) // Extra timeout does not help,
    // `test` body was cancelled
}
// (1 sec)
// Still thinking
// (0.5 sec)
// Cancelled

The function withTimeout is especially useful for testing. It can be used to test if some function takes more or less than some time. If it is used inside runTest, it will operate in virtual time. We also use it inside runBlocking just to limit the execution time of some function (this is then like setting the timeout parameter on the @Test annotation from JUnit).

// will not start, because runTest requires kotlinx-coroutines-test,
// but you can copy it to your project
import kotlinx.coroutines.*
import kotlinx.coroutines.test.runTest
import org.junit.Test

class Test {
    @Test
    fun testTime2() = runTest {
        withTimeout(1000) {
            // something that should take less than 1000
            delay(900) // virtual time
        }
    }

    @Test(expected = TimeoutCancellationException::class)
    fun testTime1() = runTest {
        withTimeout(1000) {
            // something that should take more than 1000
            delay(1100) // virtual time
        }
    }

    @Test
    fun testTime3() = runBlocking {
        withTimeout(1000) {
            // normal test, that should not take too long
            delay(900) // really waiting 900 ms
        }
    }
}

Beware that withTimeout throws TimeoutCancellationException, which is a subtype of CancellationException (the same exception that is thrown when a coroutine is cancelled). So, when this exception is thrown in a coroutine builder, it only cancels it and does not affect its parent.

import kotlinx.coroutines.*

suspend fun main(): Unit = coroutineScope {
    launch { // 1
        launch { // 2, cancelled by its parent
            delay(2000)
            println("Will not be printed")
        }
        withTimeout(1000) { // we cancel launch
            delay(1500)
        }
    }
    launch { // 3
        delay(2000)
        println("Done")
    }
}
// (2 sec)
// Done

In the above example, delay(1500) takes longer than withTimeout(1000) expects, so withTimeout throws TimeoutCancellationException. The exception is caught by launch from 1, which cancels itself and its children, namely the launch from 2. The launch started at 3 is not affected, so it prints "Done" after 2 seconds.

A less aggressive variant of withTimeout is withTimeoutOrNull, which does not throw an exception. If the timeout is exceeded, it just cancels its body and returns null. I find withTimeoutOrNull useful for wrapping functions in which waiting times that are too long signal that something went wrong. For instance, network operations: if we wait over 5 seconds for a response, it is unlikely we will ever receive it (some libraries might wait forever).

import kotlinx.coroutines.*

class User()

suspend fun fetchUser(): User {
    // Runs forever
    while (true) {
        yield()
    }
}

suspend fun getUserOrNull(): User? =
    withTimeoutOrNull(5000) {
        fetchUser()
    }

suspend fun main(): Unit = coroutineScope {
    val user = getUserOrNull()
    println("User: $user")
}
// (5 sec)
// User: null
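Because withTimeoutOrNull returns null on timeout, it combines naturally with the Elvis operator when a fallback value is acceptable. The sketch below shows both outcomes; fetchName, nameOrDefault and the 500 ms limit are hypothetical and chosen only for illustration.

```kotlin
import kotlinx.coroutines.*

// Hypothetical operation that answers after the given time
suspend fun fetchName(responseTimeMillis: Long): String {
    delay(responseTimeMillis)
    return "Alice"
}

// Falls back to a default value when the timeout is exceeded
suspend fun nameOrDefault(responseTimeMillis: Long): String =
    withTimeoutOrNull(500) {
        fetchName(responseTimeMillis)
    } ?: "Unknown"

suspend fun main() {
    println(nameOrDefault(100))
    println(nameOrDefault(2000))
}
// (0.1 sec)
// Alice
// (0.5 sec)
// Unknown
```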

suspendCancellableCoroutine

Here, you might recall the suspendCancellableCoroutine function introduced in the How does suspension work? chapter. It behaves like suspendCoroutine, but its continuation is wrapped into CancellableContinuation<T>, which provides some additional methods. The most important one is invokeOnCancellation, which we use to define what should happen when a coroutine is cancelled. Most often we use it to cancel processes in a library or to free some resources.

suspend fun someTask() = suspendCancellableCoroutine { cont ->
    cont.invokeOnCancellation {
        // do cleanup
    }
    // rest of the implementation
}

The CancellableContinuation<T> also lets us check the job state (using the isActive, isCompleted and isCancelled properties) and cancel this continuation with an optional cancellation cause.
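To make this more concrete, here is a sketch of wrapping a callback-based API. The Timer class is a hypothetical API written only for this example (built on the standard ScheduledExecutorService), and await is an assumed name for the suspending wrapper.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledFuture
import java.util.concurrent.TimeUnit
import kotlin.coroutines.resume

// Hypothetical callback-based API, defined only for illustration
class Timer {
    private val executor =
        Executors.newSingleThreadScheduledExecutor()

    fun schedule(millis: Long, callback: () -> Unit): ScheduledFuture<*> =
        executor.schedule(
            Runnable { callback() },
            millis,
            TimeUnit.MILLISECONDS
        )

    fun shutdown() {
        executor.shutdown()
    }
}

// Suspending wrapper that supports cancellation
suspend fun Timer.await(millis: Long): Unit =
    suspendCancellableCoroutine { cont ->
        val future = schedule(millis) { cont.resume(Unit) }
        // Stop the underlying task when the coroutine is cancelled
        cont.invokeOnCancellation { future.cancel(false) }
    }

suspend fun main(): Unit = coroutineScope {
    val timer = Timer()
    val job = launch {
        timer.await(1000)
        println("Will not be printed")
    }
    delay(100)
    job.cancelAndJoin()
    println("Cancelled successfully")
    timer.shutdown()
}
// (0.1 sec)
// Cancelled successfully
```

Without invokeOnCancellation, the scheduled task would stay alive after cancellation, even though nobody is waiting for its result anymore.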

Summary

Cancellation is a powerful feature. It is generally easy to use, but it can sometimes be tricky. So, it is important to understand how it works. From this chapter, you should remember that:

  • When we cancel a coroutine, it changes its state to "Cancelling", and cancels all its children.
  • A coroutine in "Cancelling" state does not start child coroutines and throws CancellationException when we try to suspend it or if it is suspended.
  • It is guaranteed that the body of the finally block and the invokeOnCompletion handler will be executed.
  • We can invoke an operation only in case of cancellation by catching CancellationException, but we should rethrow it to inform the outer scope about the cancellation.
  • To start a new coroutine or make a suspending call in the "Cancelling" state, we can use withContext(NonCancellable).
  • To allow cancellation between non-suspending operations, we can use yield or ensureActive.
  • CancellationException does not propagate to its parent.
  • We can use withTimeout or withTimeoutOrNull to start a coroutine with a timeout.
  • Always use suspendCancellableCoroutine instead of suspendCoroutine when you need to transform a callback-based API into a suspending function, and use invokeOnCancellation to define what should happen when a coroutine is cancelled.

A properly used cancellation means fewer wasted resources and fewer memory leaks. It is important for our application’s performance, and I hope you will use these advantages from now on.

1:

A good example is CoroutineWorker on Android, where according to the presentation Understand Kotlin Coroutines on Android on Google I/O'19 by Sean McQuillan and Yigit Boyar (both working on Android at Google), support for coroutines was added primarily to use the cancellation mechanism.

2:

Actually, it’s worth much more since the code is currently not very heavy (it used to be, when it was stored on punched cards).

3:

If true, the function is called in the "Cancelling" state (i.e., before "Cancelled"). false by default.

4:

This parameter determines whether the handler should be called immediately if the handler is set when a coroutine is already in the desired state. true by default.

5:

This function first calls cancel, and then join, so it is called cancelAndJoin. Uncle Bob would be proud.