Self-Distillation as Privileged-Context Distillation
Overview

Several recent self-distillation papers share a strikingly consistent core structure:

- Self-Distillation Enables Continual Learning
- Reinforcement Learning via Self-Distillation
- Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

None of these is distillation in the traditional "large model distills a small model" sense. A more accurate framing: a single model plays both student and teacher, and the teacher differs only in that it sees one extra piece of privileged context. ...
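To make the shared structure concrete, here is a minimal sketch of the objective this setup implies: the same model produces two next-token distributions, one conditioned on the prompt alone (student) and one conditioned on the prompt plus privileged context (teacher), and the student is pushed toward the teacher with a KL term. The `model_logits` function is a hypothetical toy stand-in (not from any of the papers); in practice both roles are the same LLM.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy stand-in for the shared model's next-token logits.
# In the papers both roles are one and the same LLM; here a seeded RNG
# just makes the output depend deterministically on the input text.
def model_logits(text, vocab=8):
    rng = np.random.default_rng(sum(map(ord, text)))
    return rng.normal(size=vocab)

prompt = "Q: 17 * 3 = ?"
privileged = " [hint: the answer is 51]"  # context only the teacher sees

p_teacher = softmax(model_logits(prompt + privileged))  # teacher: prompt + privileged context
p_student = softmax(model_logits(prompt))               # student: prompt only

# Self-distillation objective (one token): KL(teacher || student).
# Minimizing this moves the student toward what it "would have said"
# had it seen the privileged context.
kl = float(np.sum(p_teacher * np.log(p_teacher / p_student)))
print(f"KL(teacher || student) = {kl:.4f}")
```

In a real training loop this KL is summed over the tokens of a sampled response and minimized by gradient descent on the student-side parameters, with the teacher-side forward pass treated as a fixed target.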