Structured Pruning of RRAM Crossbars for Efficient In-Memory Computing Acceleration of Deep Neural Networks
2021; Institute of Electrical and Electronics Engineers; Volume: 68; Issue: 5 Language: English
DOI: 10.1109/tcsii.2021.3069011
ISSN: 1558-3791
Authors: Jian Meng, Yang Li, Xiaochen Peng, Shimeng Yu, Deliang Fan, Jae-sun Seo
Topic(s): Advanced Neural Network Applications
Abstract: The high computational complexity and large number of parameters of deep neural networks (DNNs) impose a heavy burden on deep learning hardware design, limiting efficient storage and deployment. With the advantages of high-density storage, non-volatility, and low energy consumption, resistive RAM (RRAM) crossbar-based in-memory computing (IMC) has emerged as a promising technique for DNN acceleration. To fully exploit crossbar-based IMC efficiency, a systematic compression design that considers both the hardware and the algorithm is necessary. In this brief, we present a system-level design that considers low-precision weights and activations, structured pruning, and RRAM crossbar mapping. The proposed multi-group Lasso algorithm and hardware implementations have been evaluated on ResNet/VGG models for the CIFAR-10/ImageNet datasets. With the fully quantized 4-bit ResNet-18 for CIFAR-10, we achieve up to 65.4× compression compared to the full-precision software baseline, and 7× energy reduction compared to the 4-bit unpruned RRAM IMC hardware, with 1.1% accuracy loss. For the fully quantized 4-bit ResNet-18 model on the ImageNet dataset, we achieve up to 10.9× structured compression with 1.9% accuracy degradation.
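The compression described in the abstract hinges on a group-Lasso-style penalty whose groups are aligned with the physical crossbar layout, so that a pruned group corresponds to a whole row/column segment of one RRAM array that can be skipped in hardware. Below is a minimal PyTorch sketch of such a crossbar-aligned group-Lasso term; the `group_lasso_penalty` function name, the 128-row crossbar block size, and the column-block grouping are illustrative assumptions, not the paper's exact multi-group Lasso formulation.

```python
import torch
import torch.nn.functional as F

def group_lasso_penalty(weight: torch.Tensor, xbar_size: int = 128) -> torch.Tensor:
    """Group-Lasso penalty with groups aligned to RRAM crossbar columns (illustrative).

    Each output channel of a conv layer maps to a crossbar column; the flattened
    input dimension is split into blocks of `xbar_size` rows, so a zeroed group
    removes an entire column segment of one physical crossbar.
    """
    # Flatten (out_ch, in_ch, kH, kW) -> (out_ch, in_ch * kH * kW)
    w = weight.reshape(weight.shape[0], -1)
    # Pad the row dimension so it splits evenly into crossbar-sized blocks
    rows = w.shape[1]
    pad = (-rows) % xbar_size
    if pad:
        w = F.pad(w, (0, pad))
    # (out_ch, n_blocks, xbar_size): one group per (column, crossbar) pair
    groups = w.reshape(w.shape[0], -1, xbar_size)
    # Sum of per-group L2 norms: the group-Lasso regularization term
    return groups.norm(dim=2).sum()

# Usage sketch: add the penalty to the task loss during training, e.g.
#   loss = criterion(model(x), y) + lam * sum(
#       group_lasso_penalty(m.weight) for m in model.modules()
#       if isinstance(m, torch.nn.Conv2d))
```

In this kind of scheme, groups whose norm is driven near zero by the penalty would be pruned and excluded from the crossbar mapping, which is what yields the structured (hardware-friendly) compression the abstract reports.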