Article; National Production; Peer-reviewed

Analyzing and Increasing the Reliability of Convolutional Neural Networks on GPUs

2018; Institute of Electrical and Electronics Engineers; Volume: 68; Issue: 2; Language: English

10.1109/tr.2018.2878387

ISSN

1558-1721

Authors

Fernando Fernandes dos Santos, Pedro Foletto Pimenta, Caio Lunardi, Lucas Klein Draghetti, Luigi Carro, David Kaeli, Paolo Rech

Topic(s)

Adversarial Robustness in Machine Learning

Abstract

Graphics processing units (GPUs) are playing a critical role in convolutional neural networks (CNNs) for image detection. As GPU-enabled CNNs move into safety-critical environments, reliability is becoming a growing concern. In this paper, we evaluate and propose strategies to improve the reliability of object detection algorithms, as run on three NVIDIA GPU architectures. We consider three algorithms: 1) You Only Look Once (YOLO); 2) a faster region-based CNN (Faster R-CNN); and 3) a residual network (ResNet), and we expose live hardware running them to neutron beams. We complement our beam experiments with fault injection to better characterize fault propagation in CNNs. We show that a single fault occurring in a GPU tends to propagate to multiple active threads, significantly reducing the reliability of a CNN. Moreover, relying on error-correcting codes dramatically reduces the number of silent data corruptions (SDCs), but does not reduce the number of critical errors (i.e., errors that could potentially impact safety-critical applications). Based on observations of how faults propagate on GPU architectures, we propose effective strategies to improve CNN reliability. We also consider the benefits of using an algorithm-based fault-tolerance technique for matrix multiplication, which can correct more than 87% of the critical SDCs in a CNN, while redesigning maxpool layers of the CNN to detect up to 98% of critical SDCs.
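
The abstract does not describe how the algorithm-based fault-tolerance (ABFT) technique is implemented. As an illustration only, the sketch below shows the classic checksum idea for matrix multiplication in plain Python: row and column checksums of the product are predicted from the inputs, and a single corrupted element is located and repaired where a mismatching row meets a mismatching column. The function names (`matmul`, `abft_check`), the tolerance `tol`, and the status strings are hypothetical choices made for this example; this is not the authors' GPU implementation.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of two lists of lists."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def abft_check(a, b, c, tol=1e-9):
    """Check a (possibly corrupted) product c = a @ b with checksums.

    Returns (matrix, status) where status is "ok", "corrected" or
    "detected". Hypothetical helper written for illustration.
    """
    rows, cols, inner = len(c), len(c[0]), len(b)

    # Expected sum of each row of c: a @ (row sums of b).
    b_row_sums = [sum(row) for row in b]
    expected_row = [sum(a[i][k] * b_row_sums[k] for k in range(inner))
                    for i in range(rows)]

    # Expected sum of each column of c: (column sums of a) @ b.
    a_col_sums = [sum(a[i][k] for i in range(rows)) for k in range(inner)]
    expected_col = [sum(a_col_sums[k] * b[k][j] for k in range(inner))
                    for j in range(cols)]

    bad_rows = [i for i in range(rows)
                if abs(sum(c[i]) - expected_row[i]) > tol]
    bad_cols = [j for j in range(cols)
                if abs(sum(c[i][j] for i in range(rows)) - expected_col[j]) > tol]

    if not bad_rows and not bad_cols:
        return c, "ok"

    # Exactly one bad row and one bad column pinpoint a single element.
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        fixed = [row[:] for row in c]
        fixed[i][j] += expected_row[i] - sum(c[i])
        return fixed, "corrected"

    # Several corrupted elements: report the error without repairing it.
    return c, "detected"


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    c[0][1] = 1000  # simulate a silent data corruption in one element
    print(abft_check(a, b, c))
```

In this sketch a single faulty element is corrected, while a fault that spreads to several elements is only detected. That distinction matters here because the abstract reports that one GPU fault tends to propagate to multiple active threads.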
