Teacher or supervisor? Effective online knowledge distillation via guided collaborative learning

Peer-reviewed article

2023; Elsevier BV; Volume: 228; Language: English

10.1016/j.cviu.2023.103632

ISSN

1090-235X

Authors

Diana Borza, Tudor Alexandru Ileni, Alexandru Ion Marinescu, Sergiu Adrian Darabant

Topic(s)

Advanced Neural Network Applications

Abstract

Knowledge distillation is a widely-used and effective technique to boost the performance of a lightweight student network, by having it mimic the behavior of a more powerful teacher network. This paper presents an end-to-end online knowledge distillation strategy, in which several peer students are trained together and their predictions are aggregated into a powerful teacher ensemble via an effective ensembling technique that uses an online supervisor network to determine the optimal way of combining the student logits. Intuitively, this supervisor network learns the area of expertise of each student and assigns a weight to each student accordingly: it has knowledge of the input image, the ground truth data, and the predictions of each individual student, and tries to answer the following question: "how much can we rely on each student's prediction, given the current input image with this ground truth class?". The proposed technique can be thought of as an inference optimization mechanism as it improves the overall accuracy over the same number of parameters. The experiments we performed show that the proposed knowledge distillation consistently improves the performance of the knowledge-distilled students vs. the independently trained students.
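
The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the supervisor is built or trained, so the supervisor is replaced here by a fixed list of made-up scores, and the softmax normalization of those scores, the temperature-softened KL loss, and all function names (`ensemble_logits`, `distillation_loss`) are assumptions chosen for illustration.

```python
import math

def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_logits(student_logits, supervisor_scores):
    """Combine per-student logits into teacher (ensemble) logits.

    student_logits: list of S lists, each holding C class logits.
    supervisor_scores: list of S raw scores; a hypothetical stand-in
    for the output of the supervisor network.
    """
    weights = softmax(supervisor_scores)  # non-negative, sums to 1 (assumed)
    num_classes = len(student_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, student_logits))
        for c in range(num_classes)
    ]

def distillation_loss(student, teacher, temperature=3.0):
    """KL(teacher || student) on temperature-softened distributions.

    A common distillation loss; assumed here, not taken from the paper.
    """
    p = softmax(teacher, temperature)
    q = softmax(student, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: three students, four classes.
student_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.3, 1.8, 0.2, -0.5],
    [1.5, 0.4, 0.9, 0.0],
]
supervisor_scores = [1.2, -0.3, 0.4]  # made-up values for illustration
teacher = ensemble_logits(student_logits, supervisor_scores)
losses = [distillation_loss(s, teacher) for s in student_logits]
```

In this sketch, a student with a higher supervisor score contributes more to the teacher logits, and each student is then pulled toward the ensemble by its own distillation loss. In the method described above, the scores would instead come from a network that sees the input image, the ground truth, and the student predictions.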
