Imagination Technologies PowerVR 2NX architecture overview
Streaming architecture, apparently 8-bit inference.
- Weights and activations flow from “DDR” over a bus interface to the “NN Compute Core/Engine”
- “NN Compute Core/Engine” looks like a multiplier array
- Next, multiplication results proceed to an “Accumulation Buffer”
- Next, summed results pass through the Activation/Pool/Normalize/“Element Engine” modules and end up in a “Shared Buffer”
- Lastly, data from the “Shared Buffer” streams into the “Output Formatter” and on via the bus interface to DDR
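The stages listed above can be sketched in a few lines of Python to make the data path concrete. This is only an illustrative model, not the 2NX implementation: the function names, the choice of ReLU as the activation, the 8-bit signed output range and the `shift` rescaling parameter are all assumptions made for the sketch.

```python
def multiplier_array(weights, activations):
    """Multiplier array: one product per weight/activation pair."""
    return [w * a for w, a in zip(weights, activations)]


def accumulation_buffer(products):
    """Accumulation buffer: sum the products at wider-than-8-bit precision."""
    return sum(products)


def activation(x):
    """Activation stage. ReLU is assumed here purely for illustration."""
    return max(0, x)


def output_formatter(x, shift=0):
    """Rescale by a right shift and saturate to signed 8 bits (assumed format)."""
    x >>= shift
    return max(-128, min(127, x))


def stream_one_output(weights, activations, shift=0):
    """Push one set of weights and activations through all stages in order."""
    acc = accumulation_buffer(multiplier_array(weights, activations))
    return output_formatter(activation(acc), shift)


if __name__ == "__main__":
    print(stream_one_output([1, 2, 3], [4, 5, 6]))
    print(stream_one_output([127, 127], [127, 127], shift=8))
```

The point of the sketch is the ordering: products are formed at 8-bit width, summed at a wider width, and only narrowed again at the output stage before going back to memory.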
To reduce DRAM size, bandwidth and power consumption, the bit width of weights and activations is configurable up to a maximum of 8 bits.
- Values narrower than 8 bits appear to be stored in DDR in a packed format to save DDR size and bandwidth.
- After being fetched from DDR, weights and activations narrower than 8 bits are padded to the full 8-bit width (e.g. with zeros) before continuing into the multiplier array.
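As a rough illustration of the packing and padding described above, the following Python sketch packs unsigned values of a configurable width into bytes and then unpacks them, zero-extending each one to a full 8-bit value. The bit order (least significant bits first), the unsigned interpretation and the function names `pack` and `unpack` are assumptions; the actual in-memory layout used by the hardware is not known from the notes.

```python
def pack(values, bits):
    """Pack unsigned `bits`-wide values into bytes, LSB-first (assumed layout)."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        acc |= v << nbits
        nbits += bits
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, upper bits left as zero
    return bytes(out)


def unpack(data, bits, count):
    """Unpack `count` values; each result is zero-extended to fit 8 bits."""
    mask = (1 << bits) - 1
    acc = 0
    nbits = 0
    out = []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= bits and len(out) < count:
            out.append(acc & mask)  # upper (8 - bits) bits are zero
            acc >>= bits
            nbits -= bits
    return out


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 0]
    packed = pack(values, bits=3)
    print(len(packed), unpack(packed, bits=3, count=len(values)))
```

With 3-bit values, eight entries occupy 3 bytes in packed form instead of 8, which is where the DDR size and bandwidth saving comes from; after unpacking, every value again occupies a full 8-bit lane for the multiplier array.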