Brain age has been shown to be a phenotype relevant to cognitive performance and brain disease. With the development of deep learning, the accuracy of brain age estimation has greatly improved. However, such methods may overfit and generalize poorly, especially when brain imaging data are insufficient. This paper presents a novel regularization method that penalizes the predictive distribution using knowledge distillation and introduces additional knowledge to reinforce the learning process. During knowledge distillation, we propose a gated distillation mechanism that enables the student model to attentively learn key knowledge from the teacher model, under the assumption that the teacher may not always be correct. Moreover, to strengthen knowledge transfer, the similarity of hint representations is also used to regularize model training. We evaluate the model on a cohort of 3655 subjects from 4 public datasets, demonstrating that the proposed method outperforms several well-established models, achieving a mean absolute error of 2.129 years in estimated age.
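The abstract does not give the loss formulation, so the following is only a minimal sketch of what a gated distillation objective with a hint-similarity penalty might look like for age regression. All function names, the gating rule (trust the teacher only where its own error is below a threshold `tau`), and the cosine form of the hint penalty are assumptions for illustration, not the paper's actual method:

```python
import math

def gated_distillation_loss(student_pred, teacher_pred, target, tau=2.0):
    """Hypothetical gated distillation loss for age regression.

    The gate keeps the teacher-imitation term only where the teacher's own
    error is below `tau` years, reflecting the assumption that the teacher
    may not always be correct. `tau` is an illustrative choice, not from
    the paper.
    """
    total = 0.0
    for s, t, y in zip(student_pred, teacher_pred, target):
        gate = 1.0 if abs(t - y) < tau else 0.0
        total += (s - y) ** 2          # supervised regression term
        total += gate * (s - t) ** 2   # gated distillation term
    return total / len(target)

def hint_similarity_penalty(student_feat, teacher_feat):
    """One minus cosine similarity between intermediate ("hint") features;
    minimizing it pushes the student's representation toward the teacher's."""
    dot = sum(a * b for a, b in zip(student_feat, teacher_feat))
    ns = math.sqrt(sum(a * a for a in student_feat))
    nt = math.sqrt(sum(b * b for b in teacher_feat))
    return 1.0 - dot / (ns * nt)
```

In practice both terms would be computed over mini-batches of a deep network's outputs and weighted against each other; the scalar version above only shows the structure of the gating idea.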