AI-Driven Scientific Discovery
Yuanqi Du, Cornell University
Abstract: Scientific discovery, the main driver of human progress, has long been constrained by two rate-limiting factors: the exponential expansion of hypothesis spaces and the prohibitive cost of validating hypotheses through high-fidelity simulation or experiment. Recent advances in AI present an unprecedented opportunity to overcome these limitations. In this talk, I will present a unified computational perspective on accelerating discovery through probabilistic machine learning. I will demonstrate how a single probabilistic paradigm, encompassing generative models, probabilistic inference, and large language models, drives progress across three critical stages of the discovery pipeline: search, validate, and automate. First, I will show how generative models, trained on massive data, can serve as flexible, data-driven priors that guide and accelerate hypothesis search across vast, structured hypothesis spaces. Second, I will show how advances in probabilistic inference, grounded in nonequilibrium thermodynamics, make it possible to efficiently discover rare but critical reaction behaviors in physical chemistry simulations. Finally, I will present how large language models can function as generalist agents that close the loop by automating the iterative search-and-validate cycle of scientific discovery.
