Notes, hypotheses, and findings from our work on end-to-end diffusion training and visual representation learning.
Because ImageNet alone is no longer sufficient.