Site icon JoinAppStudio

Why has my background Worker stopped? Exploring Android WorkManger’s StopReason | by Paolo Rotolo | Sep, 2023

Why has my background Worker stopped? Exploring Android WorkManger’s StopReason | by Paolo Rotolo | Sep, 2023

Between platform changes, new rules, and Android customizations by OEMs, scheduling long-running background jobs has become a quite challenging task.

A practical solution that abstracts most of the underlying problems for developers is WorkManager, which supports both short, immediate jobs (Expedited Work) and long-running workers (with Foreground Services). But even WorkManager, can’t guarantee that a worker runs indefinitely in background so… what to do in that scenario?

WorkManager already had some debug tools like extensive logging, adb dumps, and diagnosing info to collect more insights about a failed schedulation of a worker or its premature stop. But let’s be real, in the majority of cases, situations like this don’t happen on developer devices, but out there in the open, where the app is used in real-life scenarios on heterogeneous devices in different conditions.

Introducing Stop Reason

If you have a look at WorkManager’s changelog, you’ll notice a new interesting feature being introduced starting from version 2.9.0-alpha02:

As you might have guessed, the goal of getStopReason() is to provide a clear hint of why the previous execution of a worker has been halted. Plus, the good news is that you can programmatically collect the result and eventually have it logged somewhere to gain useful information and take corrective actions.

If you had the chance to use JobScheduler’s API directly (which WorkManager uses under the hood in most of the cases), you’ll find out that stop reasons already existed there as a property in JobParameters.

class ExampleJobService : JobService() {
override fun onStopJob(params: JobParameters): Boolean {
Log.d(TAG, "Job stopped (${params.stopReason})")
return true
}

Git blaming the source code of JobParameters, we find out that getStopReason() was actually introduced in 2016:

So WorkManager 2.9.0 is basically exposing the stop reason of the internal JobService used to run the work to developers, plus some more WorkManager-specific stop reasons.

There’s also a little difference between WorkManager 2.9.0-alpha02 and WorkManager 2.9.0-beta01.

Looking at WorkManager’s WorkInfo class, version 2.9.0-alpha02 has just two Stop Reasons defined. Those are the additional stop reasons specific to WorkManager. To find out the other StopReasons constants, developers would have to use constants from JobParameters (JobParameters.STOP_REASON_*.).

WorkManager 2.9.0-alpha02, WorkInfo.kt

Fast forward a month later, WorkManger 2.9.0-beta01 gets released, with the whole constants of StopReasons mirrored in WorkInfo class. So now developers don’t need to query JobParameters constants anymore.

WorkManager 2.9.0-beta01, WorkInfo.kt

Test time

Now that we have an idea of the new API, let’s see it in action. For our tests, we simulate a worker running a job for 10 seconds.

class ExampleWorker(
appContext: Context,
params: WorkerParameters,
) : CoroutineWorker(appContext, params) {
override suspend fun doWork(): Result {
Log.d(TAG, "Starting work")
delay(10.seconds)
Log.d(TAG, "Work done")
return Result.success()
}
}

In our application, let’s see what happens if we start the Worker using a OneTimeWorkRequest with no constraints, observing the WorkInfo’s new StopReason.

  val name = "exampleWork"
val workManager = WorkManager.getInstance(applicationContext)
val workRequest = OneTimeWorkRequestBuilder<ExampleWorker>()
.build()

scope.launch {
workManager.getWorkInfoByIdFlow(workRequest.id).collect {
Log.d(TAG, PrintableStopReason(it.stopReason).toString())
}
}

workManager.enqueueUniqueWork(name, ExistingWorkPolicy.REPLACE, workRequest)

For convenience, I’ve wrapped stopReason in a utility class that prints the appropriate constant name. You can find it in this gist.

After an app run, this is the log:

com.paolorotolo.workmanagertest      D  STOP_REASON_NOT_STOPPED
com.paolorotolo.workmanagertest D Starting work
com.paolorotolo.workmanagertest D Work done

STOP_REASON_NOT_STOPPED

As we expected, the stop reason returned in this case is… not stopped. That’s because our work is in fact running (or enqueued) and has not been stopped before.

If we’re debugging a stopped worker, there’s no need to log this case. We can improve the code with this additional check:

val name = "exampleWork"
val workManager = WorkManager.getInstance(applicationContext)
val workRequest = OneTimeWorkRequestBuilder<ExampleWorker>()
.build()

scope.launch {
workManager.getWorkInfoByIdFlow(workRequest.id).collect {
if (it.runAttemptCount > 0 && it.state == WorkInfo.State.ENQUEUED) {
Log.d(TAG, PrintableStopReason(it.stopReason).toString())
}
}
}

workManager.enqueueUniqueWork(name, ExistingWorkPolicy.REPLACE, workRequest)

We’re now checking if the Worker has run before (runAttemptCount > 0) and it’s actually in the running queue (state == ENQUEUED), to avoid getting stopReason logs each time the WorkInfo changes when state == RUNNING.

Note that it might be a case when STOP_REASON_NOT_STOPPED would be logged again. That’s when a Worker exits with Result.retry(). In that case, runAttemptCount will be > 0 since Worker has run before, and state will be enqueued when the system schedules the next run.

Get stopReason in Worker itself

Instead of observing WorkInfo from the outside, it is also possible to get (and possibly log) the stop reason from the worker itself, right before it stops.

For regular Workers, you can override the onStopped() method and get the stop reason there. For CoroutineWorkers, the worker cancellation is signaled by a CancellationException in doWork(). So we can run something after the exception in the finally block.

  override suspend fun doWork(): Result {
try {
Log.d(TAG, "Starting work")
delay(10.seconds)
Log.d(TAG, "Work done")
return Result.success()
} finally {
Log.d(MainActivity.TAG, "${PrintableStopReason(getStopReason())}")
}
}

STOP_REASON_CANCELLED_BY_APP

Now, let’s make things more interesting.

workManager.enqueueUniqueWork(name, ExistingWorkPolicy.REPLACE, workRequest)
delay(5.seconds)
workManager.enqueueUniqueWork(name, ExistingWorkPolicy.REPLACE, workRequest)

Here we simulate the enqueue of another Worker with the same name. Since our policy is REPLACE, and there can be only one Unique Worker running at a time, we expect the first worker to stop.

com.paolorotolo.workmanagertest      D  Starting work
com.paolorotolo.workmanagertest D STOP_REASON_CANCELLED_BY_APP
com.paolorotolo.workmanagertest D Starting work
com.paolorotolo.workmanagertest D Work done

STOP_REASON_CANCELLED_BY_APP advises us that the app itself stopped the first service (replacing it with another one). The result would have been the same if the app stopped the worker with WorkManager’s cancelUniqueWork.

STOP_REASON_DEVICE_STATE

Until now, we did some tests with short jobs (our worker simulated a job with a function suspending for 10 seconds). What happens if we try to perform a background job for more time? Let’s raise the simulated job time to 15 minutes. This time, we count the minutes to know if the job is still running:

    override suspend fun doWork(): Result {
Log.d(TAG, "Starting work")
for (i in 0..15) {
Log.d(TAG, "$i minutes elapsed")
delay(1.minutes)
}
Log.d(TAG, "Work done")
return Result.success()
}

Since this could run for a lot of time, why not speed things up by activating Battery Saver?

We run the worker now and see the logs:

com.paolorotolo.workmanagertest      D  Starting work
com.paolorotolo.workmanagertest D 0 minutes elapsed
com.paolorotolo.workmanagertest D 1 minutes elapsed
com.paolorotolo.workmanagertest D 2 minutes elapsed
com.paolorotolo.workmanagertest D 3 minutes elapsed
com.paolorotolo.workmanagertest D 4 minutes elapsed
com.paolorotolo.workmanagertest D STOP_REASON_DEVICE_STATE
com.paolorotolo.workmanagertest D Starting work
com.paolorotolo.workmanagertest D 0 minutes elapsed

So, we learned that STOP_REASON_DEVICE_STATE triggers when the device state needs to stop the worker for:

  • Battery saver on
  • Doze
  • High memory usage

and more…

In this case, a background worker survived 4 minutes on my idle emulator with the battery saver on.

STOP_REASON_TIMEOUT

Should we try with Battery Saver off now? I’m testing on an Emulator with Android 13 (S). According to JobScheduler documentation:

In Android version Build.VERSION_CODES.LOLLIPOP, jobs had a maximum execution time of one minute. Starting with Android version Build.VERSION_CODES.M and ending with Android version Build.VERSION_CODES.R, jobs had a maximum execution time of 10 minutes. Starting from Android version Build.VERSION_CODES.S, jobs will still be stopped after 10 minutes if the system is busy or needs the resources, but if not, jobs may continue running longer than 10 minutes.

So, I should be able to run jobs for more than 15 minutes, if the system is not under load. Let’s try it out:

com.paolorotolo.workmanagertest      D  Starting work
com.paolorotolo.workmanagertest D 0 minutes elapsed
com.paolorotolo.workmanagertest D 1 minutes elapsed
com.paolorotolo.workmanagertest D 2 minutes elapsed
com.paolorotolo.workmanagertest D 3 minutes elapsed
com.paolorotolo.workmanagertest D 4 minutes elapsed
com.paolorotolo.workmanagertest D 5 minutes elapsed
com.paolorotolo.workmanagertest D 6 minutes elapsed
com.paolorotolo.workmanagertest D 7 minutes elapsed
com.paolorotolo.workmanagertest D 8 minutes elapsed
com.paolorotolo.workmanagertest D 9 minutes elapsed
com.paolorotolo.workmanagertest D 10 minutes elapsed
com.paolorotolo.workmanagertest D 11 minutes elapsed
com.paolorotolo.workmanagertest D 12 minutes elapsed
com.paolorotolo.workmanagertest D 13 minutes elapsed
com.paolorotolo.workmanagertest D 14 minutes elapsed
com.paolorotolo.workmanagertest D 15 minutes elapsed
com.paolorotolo.workmanagertest D Work done

We were lucky. The emulator was just idling, so all system resources were available to run the service. On lower versions of Android or on busy systems, you would probably STOP_REASON_TIMEOUT. As you may have guessed, that happens when the job uses its maximum execution time (refer to the documentation linked above to know what is the max execution time for each Android version).

STOP_REASON_USER

Since we tried to run our service for more than 10 minutes, why not promote it to a Foreground Service?

override suspend fun doWork(): Result {
setForeground(createForegroundInfo())

Log.d(TAG, "Starting work")
for (i in 0..(15)) {
Log.d(TAG, "$i minutes elapsed")
delay(1.minutes)
}
Log.d(TAG, "Work done")
return Result.success()
}

This also ensures that a notification is shown to the user since we’re running a Foreground Service. What if we add a button on this notification to let the user stop the service?

private fun createForegroundInfo(): ForegroundInfo {
val title = "Example Service"

// Pending Intent to cancel the worker
val intent = WorkManager.getInstance(applicationContext)
.createCancelPendingIntent(getId())

createNotificationChannel("channel_id", "Example channel")

val notification = NotificationCompat.Builder(applicationContext, "CHANNEL ID")
.setContentTitle(title)
.setTicker(title)
.setContentText("Running")
.setSmallIcon(R.mipmap.ic_launcher)
.setOngoing(true)
.addAction(android.R.drawable.ic_delete, "STOP", intent)
.build()

return ForegroundInfo(42, notification)
}

When the user clicks on Cancel, our Worker will be canceled. Let’s run the app.

Then after 1 minute, we click on STOP.

com.paolorotolo.workmanagertest      D  Starting work
com.paolorotolo.workmanagertest D 0 minutes elapsed
com.paolorotolo.workmanagertest D 1 minutes elapsed
com.paolorotolo.workmanagertest D STOP_REASON_CANCELLED_BY_APP

We have already seen this StopReason above. So user action on the notification button is still considered CANCELLED_BY_APP, since it is the app canceling the worker in reaction to the button press.

I wonder what happens when the user kills the service using the Task Manager.

com.paolorotolo.workmanagertest      D  Starting work
com.paolorotolo.workmanagertest D 0 minutes elapsed

---------------------------- PROCESS ENDED (12255) for package com.paolorotolo.workmanagertest ----------------------------

Uh, that actually killed the whole process, including the Application listening to events. Even the Worker didn’t have time to run its finally block and report back the stopReason.

According to documentation, we should get STOP_REASON_USER when:

The user stopped the job. This can happen either through force-stop, adb shell commands, uninstalling, or some other UI.

However, I didn’t find a way to log STOP_REASON_USER, even when the system re-schedules the worker when restarting the app. Also not sure how to recover the stopReason when the user uninstalls the app. If you have some insight, please let me know in the comments.

A workaround to log this case may be using the ApplicationExitInfo API. when the app restarts. According to the documentation, you should get ApplicationExitInfo.REASON_USER_REQUESTED when

[…] application process was killed because of the user request, for example, user clicked the “Force stop” button of the application in the Settings, or removed the application away from Recents.

App Buckets

Starting from Android 9, the system dynamically assigns each app to a priority bucket:

  • Active: app is being used or was used very recently.
  • Working set: app is in regular use.
  • Frequent: app is often used but not daily.
  • Rare: app isn’t frequently used.
  • Restricted: app consumes a lot of system resources or might exhibit undesirable behavior.

That affects the ability of the app to schedule background jobs. Some stopReasons could be delivered according to the app restrictions and its current bucket:

  • STOP_REASON_APP_STANDBY: The current standby bucket requires that the job stop now.
  • STOP_REASON_QUOTA: The app has consumed all of its current quota. The amount of execution time available to an app is based on the standby bucket and process importance.

This should be the quota that the system assigns to run Expedite Jobs. In fact, the user can even react to the app reaching its maximum running time by setting an OutOfQuotaPolicy. If OutOfQuotaPolicy.DROP_WORK_REQUEST is selected, you’ll probably end up with this stop reason.

  • STOP_REASON_ESTIMATED_APP_LAUNCH_TIME_CHANGED: The system’s estimate of when the app will be launched changed significantly enough to decide this worker shouldn’t be running right now.
  • STOP_REASON_BACKGROUND_RESTRICTION: The app is restricted from running in the background. The user could manually set background restrictions from the app’s battery usage page:

You can also programmatically get and log which bucket the app is currently on with getAppStandbyBucket() .

Worker’s Constraints

Finally, WorkManager can schedule work only when some constraints are met. For example, the system could automatically schedule an enqueued worker when:

  • there is unmetered Internet connectivity
  • the phone is charging or idle
  • the phone storage is not low

and when many other constraints are met.

If a worker starts and those constraints aren’t met anymore, the worker gets canceled, and guess what the stopReason will be?

  • STOP_REASON_CONSTRAINT_BATTERY_NOT_LOW: when the requested battery-not-low constraint is no longer satisfied
  • STOP_REASON_CONSTRAINT_CHARGING: when the requested charging constraint is no longer satisfied
  • STOP_REASON_CONSTRAINT_CONNECTIVITY: when the requested connectivity constraint is no longer satisfied
  • STOP_REASON_CONSTRAINT_DEVICE_IDLE: when the requested idle constraint is no longer satisfied
  • STOP_REASON_CONSTRAINT_STORAGE_NOT_LOW: when the requested storage-not-low constraint is no longer satisfied.

Conclusions and availability

In the end, StopReason is a powerful tool to gain useful info about the unexpected behavior of the background workers in our app. But… does it work for all Android versions?

Unfortunately, not. As you can see in WorkManager’s source code, WorkInfo.stopReason only returns a specific reason on SDK ≥ 31. Android versions below, will always get STOP_REASON_UNKNOWN:

    // SystemJobService.java

int stopReason;
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
stopReason = Api31Impl.getStopReason(params);
} else {
stopReason = WorkInfo.STOP_REASON_UNKNOWN;
}
//
mWorkLauncher.stopWorkWithReason(runId, stopReason);

Moreover, using getStopReason() in a Worker or CoroutineWorker requires API level 31.

Why? Digging into AOSP code review platform, we find out the answer:

So, at least on SDK ≥ 31, have fun debugging your workers!

Source link

Exit mobile version