Self-Improving AI: From Test-Time Scaling to Local Intelligence
Azalia Mirhoseini
Join us for a talk by Azalia Mirhoseini, Assistant Professor of Computer Science at Stanford University and Founder of Ricursive Intelligence. This talk is part of the Kempner Seminar Series, a research-level seminar series covering topics related to the basis of intelligence in natural and artificial systems.
Pre-training scaling has dominated AI progress in recent years, but we’ve entered a new era where models are no longer just frozen entities; they are active sources of intelligence that continually think, reason, and learn. In this talk, I present test-time scaling as a new frontier for AI progress, one that enables models to iteratively improve through two compounding forces: expanding compute at inference and training on model-generated experiences.
Drawing on our recent work, including Language Monkeys, KernelBench, and SWiRL, I’ll demonstrate why this loop represents one of the most promising levers for advancing AI. However, scale alone isn’t the whole story. As we build and deploy AI systems, we must answer an equally important question: how efficiently can AI actually scale? Through our Intelligence Per Watt framework (task accuracy per unit of power), we show that local models already handle the vast majority of real-world queries and that local efficiency is compounding rapidly via better architectures and hardware. This leads to OpenJarvis, our open-source framework for personal AI agents that run entirely on-device, built on composable primitives and a continual self-improvement loop. Together, these threads point toward a world where self-improving AI operates at both greater scale and greater efficiency.
Azalia Mirhoseini is Founder of Ricursive Intelligence, a frontier lab dedicated to recursive self-improvement through AI that designs the chips that fuel it. She is also an Assistant Professor of Computer Science at Stanford University, where she directs Scaling Intelligence, a lab focused on developing scalable, self-improving AI systems and methodologies toward the goal of artificial general intelligence. Previously, she spent several years in industry AI labs, including Google Brain, Anthropic, and Google DeepMind, working on the development of Claude and Gemini. Her past work includes Mixture-of-Experts (MoE) neural architectures, now widely used in leading generative AI models; AlphaChip, which applied deep reinforcement learning to layout optimization and was used in the design of advanced chips such as Google's AI accelerators (TPUs) and data center CPUs; and pioneering research on LLM test-time scaling. Her work has been recognized through the Okawa Research Grant, the Google ML and Systems Junior Faculty Award, MIT Technology Review's 35 Under 35 Award, and the Best ECE Thesis Award at Rice University; through publications in flagship venues such as Nature; and through coverage by media outlets including WSJ, NYT, Forbes, MIT Technology Review, IEEE Spectrum, WIRED, and TechCrunch.
