What is Knowledge Distillation?
TL;DR
A technique that transfers knowledge from a large teacher model to a smaller student model, enabling lightweight AI deployment.
Knowledge Distillation: Definition & Explanation
Knowledge Distillation is a technique that transfers the "knowledge" of a large, high-performing "teacher model" to a smaller, lighter "student model." By using the teacher model's outputs (soft labels) as training targets for the student model, the student can achieve performance close to the teacher's with far fewer parameters. This enables efficient AI execution on resource-constrained devices such as edge hardware and mobile phones.

The technique is used commercially — for example, Google's Gemma was created by distilling knowledge from Gemini. In the development of SLMs (Small Language Models), distillation is crucial for efficiently condensing the knowledge of large models into compact ones, contributing to cost-effective AI solutions.
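To make the soft-label idea concrete, here is a minimal pure-Python sketch of the standard distillation loss (Hinton et al.'s formulation): the teacher's logits are softened with a temperature, the student is trained to match that distribution via KL divergence, and the result is blended with the usual cross-entropy on the ground-truth label. The function names and the `temperature`/`alpha` values are illustrative, not from the original text.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend of a soft-label KL term and a hard-label cross-entropy term.

    alpha weights the distillation term; the T^2 factor rescales its
    gradients, as in the standard formulation. Values are illustrative.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student))
    # Ordinary cross-entropy against the ground-truth label (T = 1)
    ce = -math.log(softmax(student_logits)[true_label])
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce

# Hypothetical logits for a 3-class problem: the loss is small when the
# student already tracks the teacher, larger when it diverges.
loss = distillation_loss([2.5, 0.8, 0.3], [4.0, 1.0, 0.2], true_label=0)
```

In practice this is computed over batches with a framework such as PyTorch, but the structure is the same: one term pulls the student toward the teacher's soft distribution, the other keeps it anchored to the true labels.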