Blog
Blog
Notes, paper digests, and the occasional running log.
Taming Outlier Tokens in Diffusion Transformers
Outlier patch tokens hurt both ViT encoders and diffusion transformers in RAE-DiT pipelines. Our Dual-Stage Registers (DSR) patch both sides and improve ImageNet-256 FID from 5.89 → 4.58 at 80 epochs.
Uni-Instruct: One-step Diffusion through Unified Divergence Instruction
A single f-divergence framework that subsumes 10+ one-step diffusion distillation methods — and a new SoTA one-step FID of 1.02 on ImageNet 64×64. NeurIPS 2025.
EMA: A Quiet Hyperparameter That Moves Diffusion Leaderboards
We benchmark EMA decay across pixel-, latent-, and representation-space diffusion models and show that the popular 0.9999 default silently trades recall for precision.
