Transprecision matrix multiplication (GEMM) for deep neural networks (DNN)

DNNs with fully connected layers make extensive use of GEMM as a computational kernel. Most deep learning frameworks rely on a GEMM implementation (such as Intel MKL) that is optimized for large matrices. However, for many DNN models this kernel becomes a memory-bound operation because of the small matrices involved.

We are looking for proposals to decouple the storage format used during training and inference from the arithmetic supported by the target hardware. The ultimate goal is to combine a compact low-precision (possibly non-standard) format for storage with standard-precision arithmetic for the matrix multiplication, as sketched below. This transprecision GEMM should be integrated into a current framework such as TensorFlow or Caffe2.
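A minimal sketch of the idea in NumPy, assuming float16 as a stand-in for the compact (possibly non-standard) storage format and float32 as the standard compute precision; the function name transprecision_gemm and the layer sizes are illustrative only, not part of any existing framework API.

import numpy as np

def transprecision_gemm(a_stored, b_stored, compute_dtype=np.float32):
    """Multiply two matrices kept in a compact storage format.

    a_stored, b_stored: operands stored in a low-precision dtype
    (float16 here, standing in for a possibly non-standard format).
    They are up-converted to `compute_dtype` so the actual arithmetic
    runs at the standard precision supported by the target hardware.
    """
    a = a_stored.astype(compute_dtype)   # decode storage format -> compute format
    b = b_stored.astype(compute_dtype)
    return a @ b                         # standard-precision GEMM

# Example: a small fully connected layer (the memory-bound case)
rng = np.random.default_rng(0)
activations = rng.standard_normal((32, 64)).astype(np.float16)   # compact storage
weights     = rng.standard_normal((64, 128)).astype(np.float16)
out = transprecision_gemm(activations, weights)
print(out.dtype, out.shape)  # float32 (32, 128)

In a real integration the astype conversion would be replaced by a decoder for the chosen low-precision format, ideally fused into the GEMM so the decoded tiles never leave cache.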
For additional information, you can contact: Andrés Tomás Domínguez