To learn

  • train self supervvised
  • distilation: big model teacher is fully trained on a dataset, then the teacher teaches, together with the dataset, to the student