Teacher or supervisor? Effective online knowledge distillation via guided collaborative learning
2023; Elsevier BV; Volume: 228; Language: English
DOI: 10.1016/j.cviu.2023.103632
ISSN: 1090-235X
Authors: Diana Borza, Tudor Alexandru Ileni, Alexandru Ion Marinescu, Sergiu Adrian Darabant
Topic(s): Advanced Neural Network Applications
Abstract: Knowledge distillation is a widely used and effective technique to boost the performance of a lightweight student network by having it mimic the behavior of a more powerful teacher network. This paper presents an end-to-end online knowledge distillation strategy in which several peer students are trained together and their predictions are aggregated into a powerful teacher ensemble via an effective ensembling technique that uses an online supervisor network to determine the optimal way of combining the student logits. Intuitively, this supervisor network learns the area of expertise of each student and assigns a weight to each student accordingly: it has knowledge of the input image, the ground truth data, and the predictions of each individual student, and tries to answer the following question: "how much can we rely on each student's prediction, given the current input image with this ground truth class?". The proposed technique can be thought of as an inference optimization mechanism, as it improves the overall accuracy over the same number of parameters. The experiments we performed show that the proposed knowledge distillation consistently improves the performance of the knowledge-distilled students vs. the independently trained students.
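The weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract states only that a supervisor sees the input image, the ground truth, and every student's prediction, and that it assigns a weight to each student. Everything else here is an assumption made for illustration: the single linear layer standing in for the supervisor, the softmax normalization of the weights, the temperature-scaled KL distillation loss, and all names (`supervisor_weights`, `ensemble_teacher_logits`, `distillation_loss`, `W`, `b`).

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


def softmax(x, axis=-1):
    return np.exp(log_softmax(x, axis=axis))


def supervisor_weights(image_features, labels_onehot, student_logits, W, b):
    # ASSUMPTION: the supervisor is modeled as one linear layer over the
    # concatenation of image features, one-hot ground truth, and all student
    # logits. The abstract does not specify the real architecture.
    batch = student_logits.shape[0]
    x = np.concatenate(
        [image_features, labels_onehot, student_logits.reshape(batch, -1)], axis=1
    )
    # ASSUMPTION: softmax makes the per-student weights positive and sum to 1.
    return softmax(x @ W + b, axis=1)  # shape: (batch, n_students)


def ensemble_teacher_logits(student_logits, weights):
    # Weighted sum of student logits, one weight per (sample, student).
    # student_logits: (batch, n_students, n_classes); weights: (batch, n_students)
    return np.einsum("bs,bsc->bc", weights, student_logits)


def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    # ASSUMPTION: standard temperature-scaled KL(teacher || student), a common
    # distillation loss; the abstract does not name the loss that is used.
    log_p_teacher = log_softmax(teacher_logits / temperature)
    log_p_student = log_softmax(student_logits / temperature)
    kl = np.sum(np.exp(log_p_teacher) * (log_p_teacher - log_p_student), axis=-1)
    return float(np.mean(kl) * temperature**2)


# Toy data: 4 samples, 3 peer students, 10 classes, 16-dim image features.
rng = np.random.default_rng(0)
batch, n_students, n_classes, feat_dim = 4, 3, 10, 16
image_features = rng.normal(size=(batch, feat_dim))
labels_onehot = np.eye(n_classes)[rng.integers(0, n_classes, size=batch)]
student_logits = rng.normal(size=(batch, n_students, n_classes))

# Randomly initialized (untrained) supervisor parameters.
in_dim = feat_dim + n_classes + n_students * n_classes
W = rng.normal(scale=0.1, size=(in_dim, n_students))
b = np.zeros(n_students)

weights = supervisor_weights(image_features, labels_onehot, student_logits, W, b)
teacher_logits = ensemble_teacher_logits(student_logits, weights)
losses = [
    distillation_loss(student_logits[:, s], teacher_logits)
    for s in range(n_students)
]
```

Because the supervisor in this sketch takes the ground-truth label as input, it can only run during training. At inference time each distilled student would be evaluated on its own, which is consistent with the abstract's claim of better accuracy for the same number of parameters.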