Towards Autonomous Language Model Systems

Ofir Press

Date: Thursday, April 3, 2025
Time: 2:30 - 3:30pm

Abstract: Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (GitHub Copilot) or search (Google’s AI Overviews). But can we build language model systems that are able to autonomously complete entire tasks end-to-end? In this talk I’ll discuss our efforts to build autonomous LM systems, focusing on the software engineering domain. I’ll present SWE-bench, our novel method for measuring how well AI systems can fix real issues in popular software libraries. I’ll then discuss SWE-agent, our system for solving SWE-bench tasks. SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have a substantial impact in steering the research community towards building autonomous systems that can complete challenging tasks.

Speaker Bio: Ofir Press is a postdoctoral fellow at Princeton University, where he works mainly with Karthik Narasimhan’s lab. Ofir previously completed his PhD at the University of Washington in Seattle, where he was advised by Noah Smith. During his PhD, Ofir spent two years at Facebook AI Research Labs on Luke Zettlemoyer’s team.

View this event on the Harvard SEAS website.