Education
I expect to graduate in June 2027 and am seeking postdoctoral or research positions.
If you are interested in diffusion language models or unified diffusion models, please feel free to reach out.
• 2018–2022: Bachelor’s in Computer Science, Xi’an Jiaotong University
• 2022–(expected 2027): PhD in Artificial Intelligence, Renmin University of China (Advisor: Prof. Chongxuan Li)
Research
I work on deep generative models, especially multimodal diffusion models. One of my favorite papers is the Vision Transformer (ViT). ViT taught me that removing inductive biases (e.g., translation equivariance for images, the left-to-right paradigm for text) and relying on large-scale training benefits deep learning algorithms. This insight also aligns with "The Bitter Lesson". My research therefore focuses on removing inductive biases and developing scalable generative models.
Selected Publications
-
Large Language Diffusion Models
Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li
Advances in Neural Information Processing Systems (NeurIPS), 2025.
Oral, NeurIPS 2025
Long Paper Best Paper Award, DeLTa@ICLR 2025
The first diffusion language model comparable to advanced LLMs (e.g., LLaMA), released earlier than Inception Labs' Mercury and Google's Gemini Diffusion.
[Paper]
-
All are Worth Words: A ViT Backbone for Diffusion Models
Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, Jun Zhu
Computer Vision and Pattern Recognition Conference (CVPR), 2023.
The first diffusion transformer, earlier than the DiT used in OpenAI's Sora.
[Paper]
Full Publications
-
Scaling up Masked Diffusion Models on Text
Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Ming Lin, Chongxuan Li
International Conference on Learning Representations (ICLR), 2025.
[Paper]
-
The Blessing of Randomness: SDE Beats ODE in General Diffusion-Based Image Editing
Shen Nie, Hanzhong Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li
International Conference on Learning Representations (ICLR), 2024.
[Paper]
-
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhanguo Li, Chongxuan Li
International Conference on Learning Representations (ICLR), 2025.
[Paper]
-
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, Jun Hu, Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li
arXiv preprint, 2025.
[Paper]
-
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You, Shen Nie, Xiaolu Zhang, Jun Hu, Jun Zhou, Zhiwu Lu, Ji-Rong Wen, Chongxuan Li
arXiv preprint, 2025.
[Paper]
-
UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
Guangxin He, Shen Nie, Fengqi Zhu, Yuankang Zhao, Tianyi Bai, Ran Yan, Jie Fu, Chongxuan Li, Binhang Yuan
arXiv preprint, 2025.
[Paper]
-
Masked Diffusion Models as Energy Minimization
Sitong Chen, Shen Nie, Jiacheng Sun, Zijing Feng, Zhengguo Li, Ji-Rong Wen, Chongxuan Li
Advances in Neural Information Processing Systems (NeurIPS), 2025.
[Paper]
-
Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations
Kaiwen Xue, Yuhao Zhou, Shen Nie, Xu Min, Xiaolu Zhang, Jun Zhou, Chongxuan Li
International Conference on Machine Learning (ICML), 2024.
[Paper]
-
Real-Time Identity Defenses against Malicious Personalization of Diffusion Models
Hanzhong Guo, Shen Nie, Chao Du, Tianyu Pang, Hao Sun, Chongxuan Li
arXiv preprint, 2024.
[Paper]
-
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Fan Bao, Shen Nie, Kaiwen Xue, Chongxuan Li, Shi Pu, Yaole Wang, Gang Yue, Yue Cao, Hang Su, Jun Zhu
International Conference on Machine Learning (ICML), 2023.
[Paper]
Experience
-
ByteDance
Top Seed Research Intern, 2025.03 - Present
Focus: Diffusion Language Models
-
Ant Group
Research Intern, 2024.12 - 2025.02
Focus: Diffusion Language Models
-
Sea AI Lab
Research Intern, 2024.03 - 2024.11
Focus: Diffusion Language Models
-
Kuaishou (Kwai)
Research Intern, 2023.11 - 2024.01
Focus: Text-to-Image/Video Models
-
Shengshu
Research Intern, 2023.03 - 2023.10
Focus: Text-to-Image/Video Models, Unified Diffusion Model
Current Interests
-
Infrastructure. Infrastructure is key in today's AI, and I am currently learning it.
-
RL for Diffusion Language Models. Reinforcement learning for dLLMs shares roots with RL for autoregressive models, but presents significant and fundamental differences.
-
Normalizing Flows. For example, [FARMER]. This is a very interesting topic and may become a reliable method or component for future unified models.
Academic Services
Reviewer for ICLR, ICML, NeurIPS, CVPR, ACM MM, and TPAMI
© 2025 Shen Nie