NVIDIA has open-sourced its “Deep Learning Accelerator” (NVDLA), available on GitHub. It comes with the whole package:
- Synthesizable RTL
- Synthesis scripts
- Verification testbench
- C-model (to be released)
- Documentation
- Linux drivers
Licensing-wise and patent-grant-wise, there seem to be no strings attached: anyone can integrate NVDLA into a commercial product, sell that product, and owe nothing to NVIDIA.
NVIDIA intends to continue NVDLA development in the open, via community contributions on GitHub.
Architecture-wise, NVDLA appears to be a convolution accelerator:
- Input data streams from memory, via the “Memory interface block” and the “Convolution buffer” (4Kb..32Kb), into the “Convolution core”
- The “Convolution core” is a “wide MAC pipeline”
- Followed by “Activation engine”
- Followed by “Pooling engine”
- Followed by “Local response normalization” block
- Followed by “Reshape” block
- and streaming back out to the “Memory interface block” (a rough behavioral sketch of this dataflow follows below)
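To make the stage ordering concrete, here is a minimal behavioral sketch in plain C. Everything in it (the function names, the toy 1-D data, the choice of ReLU and max pooling) is a hypothetical stand-in for the blocks above, not code from NVDLA’s C-model or drivers.

```c
/* Hypothetical behavioral sketch of the NVDLA dataflow described above.
 * All names are illustrative only -- this is not NVDLA C-model or driver
 * code. One tiny 1-D "layer" passes through the stages in order:
 * convolution -> activation -> pooling -> local response normalization,
 * with "reshape" left as a comment since it is only a layout transform. */
#include <math.h>
#include <stdio.h>

#define N 8   /* toy feature-map length */
#define K 3   /* toy kernel length      */

/* "Convolution core": a wide MAC pipeline in hardware, a plain MAC loop here. */
static void convolution_core(const float in[N], const float w[K], float out[N]) {
    for (int i = 0; i < N; i++) {
        float acc = 0.0f;
        for (int k = 0; k < K; k++) {
            int j = i + k - K / 2;                   /* same-padding index */
            if (j >= 0 && j < N) acc += in[j] * w[k];
        }
        out[i] = acc;
    }
}

/* "Activation engine": ReLU as a representative non-linearity. */
static void activation_engine(float x[N]) {
    for (int i = 0; i < N; i++) x[i] = x[i] > 0.0f ? x[i] : 0.0f;
}

/* "Pooling engine": 2-wide max pooling with stride 2. */
static void pooling_engine(const float in[N], float out[N / 2]) {
    for (int i = 0; i < N / 2; i++)
        out[i] = in[2 * i] > in[2 * i + 1] ? in[2 * i] : in[2 * i + 1];
}

/* "Local response normalization": a simplified stand-in that normalizes each
 * value by its own energy; real LRN normalizes across neighboring channels. */
static void local_response_norm(float x[N / 2]) {
    for (int i = 0; i < N / 2; i++)
        x[i] /= sqrtf(1.0f + x[i] * x[i]);
}

int main(void) {
    float input[N]   = {1, -2, 3, -4, 5, -6, 7, -8};
    float weights[K] = {0.25f, 0.5f, 0.25f};
    float conv[N], pooled[N / 2];

    convolution_core(input, weights, conv);   /* streams in from "memory"          */
    activation_engine(conv);                  /* followed by the activation engine */
    pooling_engine(conv, pooled);             /* followed by the pooling engine    */
    local_response_norm(pooled);              /* followed by LRN                   */
    /* "Reshape" would reorder the layout before streaming back out to memory. */

    for (int i = 0; i < N / 2; i++) printf("%.4f\n", pooled[i]);
    return 0;
}
```

In the real hardware these stages are pipelined and stream blocks of data rather than whole tensors, but the ordering is the same.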
The architecture is configurable via RTL synthesis parameters and supports (a hypothetical configuration sketch follows this list):
- Data type choice of Binary, INT4, INT8, INT16, INT32, FP16, FP32, FP64
- Winograd convolution
- Sparse compression for both weights and feature data to reduce memory storage and bandwidth, especially useful for fully-connected layers
- Second memory interface for on-chip buffering, to increase bandwidth and reduce latency vs. DRAM access
- Batching of 1..32 samples
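To give a feel for what those synthesis parameters cover, here is a hedged sketch of the configuration space expressed as a C struct. The type names, field names, and default values are invented for illustration; the real knobs live in NVDLA’s RTL configuration and are not exposed in this form.

```c
/* Hypothetical sketch of NVDLA build-time configuration knobs. The enum,
 * struct, and field names are invented for illustration and do not match
 * the actual RTL parameter names or spec-file format. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum {
    DTYPE_BINARY, DTYPE_INT4, DTYPE_INT8, DTYPE_INT16,
    DTYPE_INT32, DTYPE_FP16, DTYPE_FP32, DTYPE_FP64
} nvdla_dtype_t;

typedef struct {
    nvdla_dtype_t data_type;          /* one of the supported precisions           */
    uint32_t      conv_buffer_size;   /* convolution buffer, 4..32 as listed above */
    bool          winograd_enable;    /* Winograd convolution support              */
    bool          sparse_weights;     /* sparse compression of weights             */
    bool          sparse_features;    /* sparse compression of feature data        */
    bool          second_mem_if;      /* second memory interface for on-chip SRAM  */
    uint32_t      max_batch;          /* batching, 1..32 samples                   */
} nvdla_build_config_t;

int main(void) {
    /* Example: a mid-size INT8 build with Winograd and sparsity enabled. */
    const nvdla_build_config_t cfg = {
        .data_type        = DTYPE_INT8,
        .conv_buffer_size = 32,
        .winograd_enable  = true,
        .sparse_weights   = true,
        .sparse_features  = true,
        .second_mem_if    = true,
        .max_batch        = 16,
    };
    printf("conv buffer: %u, max batch: %u\n",
           (unsigned)cfg.conv_buffer_size, (unsigned)cfg.max_batch);
    return 0;
}
```

The point of exposing these as synthesis parameters is to let an integrator trade silicon area and power against throughput at build time, rather than shipping one fixed design.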