paper
https://arxiv.org/abs/2006.05525

Knowledge distillation is a technique for transferring knowledge from a large (teacher) model to a small (student) model.
With knowledge distillation, the small model's performance can be improved beyond what it reaches when trained on the labels alone.
Here, knowledge can be anything related to the outputs of the model's components (final predictions, intermediate features, and so on).


Response-based knowledge refers to the neural response of the last output layer of the teacher model, i.e., its logits or soft class probabilities.
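Below is a minimal sketch of the standard response-based distillation loss (a softened-softmax KL term plus the usual cross-entropy), written in PyTorch. The function name and the default temperature/weight values are my own illustrative choices, not something prescribed by the survey.

import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soften both distributions with temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # Multiply by T^2 so the gradient scale stays comparable to the hard-label term.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

Here alpha balances imitating the teacher against fitting the ground-truth labels; in practice both T and alpha are tuned on a validation set.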






Pros: response-based knowledge distillation uses only the output of the last layer, so it is simple and applies to any teacher whose predictions we can query.
However, if we can also use the outputs of hidden layers, we can transfer richer knowledge than response-based KD alone.
Feature-based knowledge refers to the outputs of the intermediate layers of the teacher model.
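A minimal sketch of a feature-based (FitNets-style hint) loss follows: the student's intermediate feature map is projected through a small learned adapter and matched to the teacher's feature map with an L2 loss. The class name and the 1x1-conv adapter are illustrative assumptions, not the only way to align feature dimensions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureKDLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv projects student features into the teacher's channel space
        # when the two networks have different widths.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # L2 distance between adapted student features and detached teacher features.
        return F.mse_loss(self.adapter(student_feat), teacher_feat.detach())

In training, this term is usually added to the response-based loss with its own weight.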

A representative example is "Knowledge transfer via distillation of activation boundaries formed by hidden neurons" (Heo et al., AAAI 2019), which transfers whether each hidden neuron is activated rather than its exact output value.
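As a rough sketch of that idea (my simplification, not the paper's exact objective): encourage the student's pre-activations to fall on the same side of zero as the teacher's, with a margin, using a squared-hinge penalty. The function name and margin value are assumptions for illustration.

import torch

def activation_boundary_loss(student_pre, teacher_pre, margin=1.0):
    # 1 where the teacher neuron is active (pre-activation > 0), else 0.
    teacher_active = (teacher_pre > 0).float()
    # Active teacher neuron: penalize student pre-activations below +margin.
    # Inactive teacher neuron: penalize student pre-activations above -margin.
    loss = teacher_active * torch.clamp(margin - student_pre, min=0) ** 2 \
         + (1 - teacher_active) * torch.clamp(margin + student_pre, min=0) ** 2
    return loss.mean()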


