What did I do?

I tried out the two most significant frameworks for on-device machine learning: TensorFlow Lite and PyTorch Mobile.

I created an Object Detection app and implemented the same functionality with both frameworks, inspired by the demo apps they provide in their official documentation.

The app opens the camera and starts feeding the captured images to either TFLite or PyTorch Mobile, depending on our selection. The framework runs the model on the image and returns a set of bounding boxes and labels for the objects it finds.

Object detection in action: Laptop, couch, and Ronnie.

For the UI I used Compose, Coroutines/Flows for passing data between layers, Hilt for Dependency Injection, and MVVM for the presentation layer architecture, something like this:

Architecture diagram (generated with DALL·E 3)
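To make that data flow more concrete, here is a rough sketch of how the pieces could fit together; the class and property names below are illustrative, not taken from the actual repository:

```kotlin
import android.graphics.Bitmap
import android.graphics.RectF
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import dagger.hilt.android.lifecycle.HiltViewModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch
import javax.inject.Inject

// Illustrative names only, not taken from the repository.
data class DetectionResult(val label: String, val score: Float, val box: RectF)

// One implementation per framework (TFLite / PyTorch Mobile) can be bound via Hilt.
interface ObjectDetectionRepository {
    suspend fun detect(bitmap: Bitmap): List<DetectionResult>
}

@HiltViewModel
class DetectionViewModel @Inject constructor(
    private val repository: ObjectDetectionRepository
) : ViewModel() {

    private val _results = MutableStateFlow<List<DetectionResult>>(emptyList())
    val results: StateFlow<List<DetectionResult>> = _results // collected by the Compose UI

    // Called for every camera frame; inference runs inside a coroutine.
    fun onFrame(bitmap: Bitmap) {
        viewModelScope.launch {
            _results.value = repository.detect(bitmap)
        }
    }
}
```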

I am a developer, and as such I will focus on the aspects developers struggle with the most: ease of implementation, size, support, and reliability. I won’t dive deep into complex benchmarks comparing performance across multiple inference scenarios, as that is not my area of expertise. So let’s get started…

1. Ease of implementation and APIs

This is a critical point for any developer. We want something that we can add to our project and start using right away without too much configuration and hassle.

Both libraries can be included in your project as a normal Gradle dependency. Each offers a core artifact plus more granular libraries for the specific vision APIs. Models are added as regular assets, and you need to make sure they are not compressed. Overall, it was equally straightforward to get both of them up and running; a rough setup sketch follows.
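For reference, the setup in the module’s build.gradle.kts looks roughly like this. The artifact coordinates are the official ones, but the version numbers are only illustrative and the exact file in the repository may differ:

```kotlin
// build.gradle.kts (module). Version numbers are illustrative; check the official docs for current ones.
android {
    // Keep the model files uncompressed inside the APK so the frameworks can read them efficiently.
    androidResources {
        noCompress.addAll(listOf("tflite", "ptl"))
    }
}

dependencies {
    // TensorFlow Lite: core runtime, Task Vision library (ObjectDetector, ImageProcessor, ...) and GPU delegate.
    implementation("org.tensorflow:tensorflow-lite:2.13.0")
    implementation("org.tensorflow:tensorflow-lite-task-vision:0.4.4")
    implementation("org.tensorflow:tensorflow-lite-gpu:2.13.0")

    // PyTorch Mobile: lite runtime plus the torchvision helpers for bitmap-to-tensor conversion.
    implementation("org.pytorch:pytorch_android_lite:1.13.1")
    implementation("org.pytorch:pytorch_android_torchvision_lite:1.13.1")
}
```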

For the specific object detection use case, TFLite has an ObjectDetector class that contains a set of APIs to simplify the implementation. You can easily set base parameters such as the number of threads, the number of objects to detect, the minimum confidence score, and other settings that make the integration seamless.
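A minimal sketch of that setup, assuming a model shipped in assets/ (the file name and parameter values here are placeholders):

```kotlin
import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.core.BaseOptions
import org.tensorflow.lite.task.vision.detector.Detection
import org.tensorflow.lite.task.vision.detector.ObjectDetector

// "model.tflite" is a placeholder for whatever model you ship in assets/.
fun createDetector(context: Context): ObjectDetector {
    val options = ObjectDetector.ObjectDetectorOptions.builder()
        .setBaseOptions(BaseOptions.builder().setNumThreads(4).build()) // number of threads
        .setMaxResults(5)          // maximum number of objects to detect
        .setScoreThreshold(0.5f)   // minimum confidence score
        .build()
    return ObjectDetector.createFromFileAndOptions(context, "model.tflite", options)
}

fun detect(detector: ObjectDetector, bitmap: Bitmap): List<Detection> =
    detector.detect(TensorImage.fromBitmap(bitmap))
```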

I was still able to provide the same functionality with PyTorch Mobile, but I had to implement it myself from scratch.

TFLite also comes with an ImageProcessor object that performs a whole set of transformations on images to get them ready to be fed into the model. On PyTorch Mobile I had to implement these manually.
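For example, a resize-plus-normalize pipeline with TFLite’s support library looks roughly like this (the input size and normalization constants are model-dependent placeholders):

```kotlin
import android.graphics.Bitmap
import org.tensorflow.lite.support.common.ops.NormalizeOp
import org.tensorflow.lite.support.image.ImageProcessor
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.support.image.ops.ResizeOp

// Target size and normalization constants depend on the model you use; these are placeholders.
val imageProcessor = ImageProcessor.Builder()
    .add(ResizeOp(300, 300, ResizeOp.ResizeMethod.BILINEAR)) // resize to the model's input size
    .add(NormalizeOp(127.5f, 127.5f))                         // scale pixel values to roughly [-1, 1]
    .build()

fun preprocess(bitmap: Bitmap): TensorImage =
    imageProcessor.process(TensorImage.fromBitmap(bitmap))
```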

Before either framework can process an image, the image first has to be converted to a tensor. Both libraries have APIs that make this easy.
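On the PyTorch Mobile side, a hedged sketch of the equivalent flow (loading the module, converting the bitmap to a tensor, and running inference) might look like this; the model file name is a placeholder, and the output decoding depends entirely on the model you export:

```kotlin
import android.graphics.Bitmap
import org.pytorch.IValue
import org.pytorch.LiteModuleLoader
import org.pytorch.Module
import org.pytorch.Tensor
import org.pytorch.torchvision.TensorImageUtils

// The path is a placeholder; in practice the asset is usually copied to internal
// storage first and loaded from that absolute path.
fun loadModule(modelPath: String): Module = LiteModuleLoader.load(modelPath)

fun runInference(module: Module, bitmap: Bitmap): Tensor {
    // Convert the bitmap to a float tensor using the standard torchvision normalization.
    val inputTensor = TensorImageUtils.bitmapToFloat32Tensor(
        bitmap,
        TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
        TensorImageUtils.TORCHVISION_NORM_STD_RGB
    )
    // The raw output still has to be decoded (boxes, labels, scores) and filtered by
    // confidence and max results manually; there is no ObjectDetector-style helper.
    return module.forward(IValue.from(inputTensor)).toTensor()
}
```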

TFLite is the winner in this category due to its more mature and extensive set of APIs.

2. Inference speed

To provide an even-handed comparison of inference speed, I should have used exactly the same model with exactly the same number of parameters on both frameworks, which wasn’t the case in my experiment. In practice it didn’t matter much, because TFLite is the only one of the two with GPU support out of the box. PyTorch Mobile has released an initial version of GPU support, but it is still in its early stages and only available as a prototype for now.

To understand how important GPU usage is and how it affects inference time, the table below shows the average object detection time for different compute settings on TFLite.

| Type               | Inference time (ms)* |
|--------------------|----------------------|
| TFLite CPU         | 28.58                |
| TFLite GPU (NNAPI) | 11.18                |

* Average over 10 samples; inference time includes converting the bitmap to a tensor plus the actual inference.

Using the GPU, inference is roughly 2.5x faster. Considering that most modern phones already ship with a GPU, TFLite is the winner in this section, at least until GPU support for PyTorch Mobile is fully stable and we can compare them properly.
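For reference, switching the TFLite Task Library between these compute modes is a small change in the base options. A minimal sketch, assuming the device actually supports the chosen delegate (production code should fall back to CPU when it does not):

```kotlin
import org.tensorflow.lite.task.core.BaseOptions

// Assumes the device supports the chosen delegate; real code should fall back to CPU otherwise.
fun baseOptionsFor(useGpu: Boolean, useNnapi: Boolean): BaseOptions {
    val builder = BaseOptions.builder()
    when {
        useGpu -> builder.useGpu()        // GPU delegate
        useNnapi -> builder.useNnapi()    // NNAPI delegate (may run on GPU/DSP/NPU)
        else -> builder.setNumThreads(4)  // plain CPU with multiple threads
    }
    return builder.build()
}
```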

3. Size

One of the key reasons why we can’t run certain models on mobile devices is the size of the models themselves. Storage on mobile devices is limited. On top of that, models cannot be shrunk or obfuscated by Proguard/R8 out of the box, so the model size directly impacts the app size.

For that reason, we need the framework running the model to be as lightweight as possible. I checked the size of both frameworks in the release variant, minified with Proguard/R8, and these were the results (in the comparison, Old Size is PyTorch Mobile and New Size is TFLite).
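For context, the release configuration for this kind of per-architecture comparison looks roughly like the sketch below; it illustrates the approach, not the exact file from the repository:

```kotlin
// build.gradle.kts (module): a sketch of a release setup for measuring per-ABI APK sizes,
// not the exact configuration from the repository.
android {
    buildTypes {
        release {
            isMinifyEnabled = true    // enable R8 code shrinking/obfuscation
            isShrinkResources = true  // strip unused resources
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro"
            )
        }
    }
    // Per-ABI splits produce one APK per architecture, so sizes can be compared per ABI.
    splits {
        abi {
            isEnable = true
            reset()
            include("arm64-v8a", "armeabi-v7a", "x86_64")
            isUniversalApk = false
        }
    }
}
```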

Depending on the architecture, we see a reduction of between 19.6 MB and 24.8 MB when using TFLite, making it the clear winner of this section.

4. Official Support and Community

Last but definitely not least: having a trusted community and official support gives developers peace of mind. Knowing that there is a team listening to bug reports and feature requests, and actively contributing to the library, is a key factor.

We don’t want to spend time integrating a library into our project if we know that it will become stale soon and no longer be maintained.

A quick search through the TFLite and PyTorch Android open issues shows that there is an active community contributing to and keeping track of both projects. The documentation for TFLite and PyTorch Android is equally good, and both are open source.

TFLite has more official demo apps than PyTorch Mobile (19 vs 7 samples), but both cover the main use cases: Image Segmentation, Object Detection, Speech Recognition, and Question Answering.

It’s a tie on this final section.

Conclusion

It is no surprise that TensorFlow Lite ends up being the recommended framework for mobile inference. Maturity, GPU support, library size, and the large range of APIs are the key reasons behind this decision.

PyTorch Mobile remains a plausible option, especially considering that PyTorch itself (the full-sized library, not the mobile version) has become a standard among the research community. I’d also keep an eye on the official GPU support release to re-evaluate this decision.

If you want to check out the GitHub repository and experiment yourself:

Source link