Multimodal AI Agents
Ruslan Salakhutdinov
Join us for a talk by Ruslan Salakhutdinov, UPMC Professor of Computer Science in the Machine Learning Department of the School of Computer Science at Carnegie Mellon University. This talk is part of the Kempner Seminar Series, a research-level seminar series on recent advances in the field.
In recent years, the rise of Large Language Models (LLMs) with advanced general capabilities has paved the way towards building language-guided agents that can perform complex, multi-step tasks on behalf of users, much like human assistants. Building agents that can perceive, plan, and act autonomously has long been a central goal of artificial intelligence research. In this talk, I will introduce multimodal AI agents capable of planning, reasoning, and executing actions on the web. These agents can not only comprehend textual information but also effectively navigate and interact with visual settings. I will next present an inference-time search algorithm that lets agents explicitly perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space and is complementary to most existing state-of-the-art agents. Finally, I will introduce VisualWebArena, a novel framework for evaluating multimodal autonomous language agents, and offer insights towards building stronger autonomous agents for both digital and physical environments.
Russ Salakhutdinov earned his PhD in computer science from the University of Toronto, where he was advised by Nobel Laureate Geoffrey Hinton. After spending two post-doctoral years at MIT, he joined the University of Toronto and later moved to CMU. He also served as a director of AI research at Apple. Russ’s primary interests lie in deep learning, machine learning, and generative AI. He is an action editor of the Journal of Machine Learning Research, has served on the senior program committees of several top-tier machine learning conferences, including NeurIPS, ICLR, and ICML, and was a program co-chair for ICML 2019 and general chair for ICML 2024. He has authored over 250 research papers, and his work has received over 200,000 citations according to Google Scholar. He is an Alfred P. Sloan Research Fellow and a Microsoft Research Faculty Fellow, and a recipient of the Early Researcher Award, Google Faculty Award, and Nvidia’s Pioneers of AI award.
Coming from Longwood? Sign up to take the shuttle.