AI Products Group at Intel has released a preprint of “Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks”.
- Replaces traditional 32-bit floating point (FP32) with a 16-bit “Flexpoint” format
- Tensor values are stored as int16 mantissas
- A single 5-bit exponent is shared by the entire tensor
- Since there is only one exponent per tensor, elementwise multiplication and addition of two tensors become fixed-point operations
- On the other hand, a shared per-tensor exponent reduces the dynamic range of values the tensor can represent
- To counteract the reduced dynamic range, the shared exponent is “dynamically adjusted [managed] to minimize overflows and maximize available dynamic range”
- The format is validated on AlexNet, a deep residual network (ResNet), and a generative adversarial network (GAN), with no need to tweak model hyper-parameters
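A minimal NumPy sketch of the idea (illustrative only, not Intel's implementation; the function names, saturating rounding, and choice of exponent are assumptions): encode a float tensor as int16 mantissas with one shared power-of-two exponent, and observe that once two tensors share the same exponent, elementwise addition is plain integer arithmetic.

```python
import numpy as np

def to_flex(x, exp):
    """Quantize a float tensor to int16 mantissas with a shared
    scale of 2**exp (illustrative sketch, not Intel's code)."""
    m = np.round(x / 2.0**exp)
    # Saturate instead of wrapping on overflow.
    return np.clip(m, -32768, 32767).astype(np.int16)

def from_flex(m, exp):
    """Decode int16 mantissas back to floats."""
    return m.astype(np.float64) * 2.0**exp

x = np.array([0.5, -1.25, 3.0])
y = np.array([0.25, 0.75, -2.0])
exp = -10  # shared exponent chosen to cover this value range

mx, my = to_flex(x, exp), to_flex(y, exp)
# With a common exponent, elementwise addition is pure integer
# arithmetic (widened to int32 here to avoid mantissa overflow):
s = mx.astype(np.int32) + my.astype(np.int32)
print(from_flex(s, exp))  # → [ 0.75 -0.5   1.  ]
```

Multiplication is analogous: the integer mantissas multiply directly, and only the shared exponents need to be added, once per tensor rather than once per element.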
The shared-exponent management algorithm, called Autoflex, assumes that the ranges of values in the network change slowly enough during training that “exponents can be predicted with high accuracy based on historical trends”. Autoflex adjusts the common exponent up and down as it detects overflows and underflows.
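The adjustment loop can be sketched as follows. This is a toy stand-in for Autoflex, not the paper's actual algorithm: it tracks only the current tensor's peak magnitude (Autoflex uses historical statistics for prediction), and the headroom thresholds here are assumptions.

```python
import numpy as np

INT16_MAX = 32767

def adjust_exponent(exp, max_abs_value):
    """Toy shared-exponent update (illustrative thresholds): raise exp
    when the peak mantissa would overflow int16, lower it when there is
    excessive headroom and small values risk underflowing to zero."""
    mantissa_peak = max_abs_value / 2.0**exp
    if mantissa_peak > INT16_MAX:
        # Overflow: raise the exponent just enough to fit the peak.
        exp += int(np.ceil(np.log2(mantissa_peak / INT16_MAX)))
    elif 0 < mantissa_peak < INT16_MAX / 4:
        # Too much headroom: lower the exponent to reclaim precision.
        exp -= int(np.floor(np.log2(INT16_MAX / (2 * mantissa_peak))))
    return exp

# Peak values drift slowly during training, so the exponent tracks them:
exp = -10
for peak in [3.0, 3.2, 3.5, 60.0, 58.0]:  # slowly changing tensor maxima
    exp = adjust_exponent(exp, peak)
    assert peak / 2.0**exp <= INT16_MAX   # peak now fits in int16
```

The slow-drift assumption is what makes this cheap: the exponent only needs occasional single-step corrections rather than a full re-scan of the tensor on every operation.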