Score Entropy Discrete Diffusion Models
Speaker: Stefano Ermon
Abstract: Diffusion models are at the core of many state-of-the-art generative AI systems for content such as images, videos, and audio. These models crucially rely on estimating gradients of the data distribution (scores), and efforts to generalize score-based modeling to discrete structures have had limited success. As a result, state-of-the-art generative models for discrete data such as language are based on autoregressive modeling (i.e., next-token prediction). In this work, we bridge this gap by proposing a framework that extends score matching to discrete spaces and integrates seamlessly into the construction of discrete diffusion models. The resulting Score Entropy Discrete Diffusion (SEDD) models are an alternative probabilistic modeling technique that achieves highly competitive performance at the scale of GPT-2 while introducing distinct algorithmic benefits. Our empirical results challenge the longstanding dominance of autoregressive modeling and could pave the way for an alternative class of language models built on radically different principles.
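For readers unfamiliar with the idea, the "score" in a discrete setting can be framed as a ratio of probabilities between neighboring sequences, and score entropy is a loss that drives a model's estimated ratios toward the true ones. The sketch below is a rough illustration, not the speaker's or the paper's implementation: the function name is invented for this example, the true ratio p(y)/p(x) is assumed to be available purely for demonstration (in practice the loss is estimated in a denoising form without it), and per-transition weights are omitted.

```python
import math

def score_entropy_term(s: float, ratio: float) -> float:
    """One pointwise term of a score-entropy-style loss (illustrative sketch).

    s     -- the model's estimated probability ratio for a neighboring
             sequence y, i.e. an estimate of p(y)/p(x); must be > 0
    ratio -- the true ratio p(y)/p(x); must be > 0

    The term is non-negative and is zero exactly when s == ratio, so
    minimizing it over the data drives the model's estimates toward the
    true ratios (a discrete analogue of matching the score).
    """
    # K(a) = a * (log(a) - 1) is constant in s and makes the minimum zero.
    K = ratio * (math.log(ratio) - 1.0)
    return s - ratio * math.log(s) + K

# Sanity check: zero at the true ratio, positive elsewhere.
print(score_entropy_term(0.5, 0.5))  # ~0.0
print(score_entropy_term(0.9, 0.5))  # > 0
```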
Bio: Stefano Ermon is an Associate Professor in the Department of Computer Science at Stanford University, where he is affiliated with the Artificial Intelligence Laboratory. His research centers on techniques for probabilistic modeling of data and is motivated by applications in the emerging field of computational sustainability. He has won several awards, including multiple Best Paper Awards, an NSF CAREER Award, ONR and AFOSR Young Investigator Awards, a Microsoft Research Fellowship, a Sloan Fellowship, and the IJCAI Computers and Thought Award. Stefano earned his Ph.D. in Computer Science at Cornell University in 2015.