Google introduced the Android Neural Networks API (NNAPI) for neural network accelerators starting with Android 8.1. The API abstracts the device hardware, allowing app developers to run neural network computations without worrying about the actual underlying hardware implementation.
When the user's device (smartphone, tablet, etc.) has no dedicated neural network hardware, NNAPI falls back to the GPU, DSP, or CPU to carry out the computations.
NNAPI supports inference using pre-trained models.
Running an inference consists of these steps:
- App code loads a computation graph into the API. The graph precisely specifies the sequence of operations – e.g. convolve layer X with filter Y, apply a ReLU activation, and so on
- App code instructs NNAPI to “compile” the computation graph into lower-level code that runs on the actual underlying hardware
- App code instructs NNAPI to allocate memory buffers, then fills them with input data and weights
- NNAPI runs the computation
- App code reads the computed output back from the memory buffers
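The steps above can be sketched with NNAPI's C API from `<android/NeuralNetworks.h>` (available from API level 27, and only buildable with the Android NDK). This is a minimal illustration, not a production implementation: the model here is a hypothetical single-operation graph (ReLU over a 4-element float tensor) chosen for brevity, and all error checking of return codes is omitted.

```c
#include <android/NeuralNetworks.h>
#include <stdint.h>

/* Hypothetical one-op model: output = ReLU(input), on a 1-D float32 tensor
 * of 4 elements. Error checking omitted for brevity. */
int run_relu_inference(const float input[4], float output[4]) {
    /* Step 1: build the computation graph. */
    ANeuralNetworksModel* model = NULL;
    ANeuralNetworksModel_create(&model);

    uint32_t dims[1] = {4};
    ANeuralNetworksOperandType tensorType = {
        .type = ANEURALNETWORKS_TENSOR_FLOAT32,
        .dimensionCount = 1,
        .dimensions = dims,
        .scale = 0.0f,
        .zeroPoint = 0,
    };
    /* Operand 0 = input tensor, operand 1 = output tensor. */
    ANeuralNetworksModel_addOperand(model, &tensorType);
    ANeuralNetworksModel_addOperand(model, &tensorType);

    uint32_t inIdx[1] = {0}, outIdx[1] = {1};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_RELU,
                                      1, inIdx, 1, outIdx);
    ANeuralNetworksModel_identifyInputsAndOutputs(model, 1, inIdx, 1, outIdx);
    ANeuralNetworksModel_finish(model);

    /* Step 2: "compile" the graph for whatever hardware is available. */
    ANeuralNetworksCompilation* compilation = NULL;
    ANeuralNetworksCompilation_create(model, &compilation);
    ANeuralNetworksCompilation_finish(compilation);

    /* Step 3: bind input and output buffers to an execution. */
    ANeuralNetworksExecution* execution = NULL;
    ANeuralNetworksExecution_create(compilation, &execution);
    ANeuralNetworksExecution_setInput(execution, 0, NULL,
                                      input, 4 * sizeof(float));
    ANeuralNetworksExecution_setOutput(execution, 0, NULL,
                                       output, 4 * sizeof(float));

    /* Step 4: run the computation and wait for it to finish. */
    ANeuralNetworksEvent* event = NULL;
    ANeuralNetworksExecution_startCompute(execution, &event);
    ANeuralNetworksEvent_wait(event);

    /* Step 5: the results are now in `output`; release resources. */
    ANeuralNetworksEvent_free(event);
    ANeuralNetworksExecution_free(execution);
    ANeuralNetworksCompilation_free(compilation);
    ANeuralNetworksModel_free(model);
    return 0;
}
```

A real model would add many more operands (weights, biases, activation constants set via `ANeuralNetworksModel_setOperandValue`) and operations, but the create/compile/bind/compute/read lifecycle stays the same.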