
Large Language Model Distributed Inference Virtual Workshop

Naeem Khoshnevis, Lead ML Research Engineer and PyTorch Ambassador

Date: Thursday, March 26, 2026
Time: 9:30am - 12:30pm
Location: Virtual

This hands-on, virtual workshop covers high-performance LLM inference using vLLM. You will work with leading open-weight models like Qwen and LLaMA to gain practical, real-world experience hosting and running inference for large language models that exceed the memory capacity of a single GPU.

What will attendees learn from this workshop?

  • Basics of distributed computing for large language model inference
  • Using vLLM, a popular library for LLM inference, to host a server
  • Setting up, prompting, and extracting logits from large language models on an HPC cluster
  • Using offline batch inference with large language models
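As a preview of the kind of setup covered, vLLM can serve a model too large for one GPU by sharding its weights across several GPUs with tensor parallelism. The model name and GPU count below are illustrative, not the workshop's actual configuration:

```shell
# Shard the model across 4 GPUs with tensor parallelism
# (model and GPU count are illustrative; choose ones matching your hardware)
vllm serve Qwen/Qwen2.5-72B-Instruct --tensor-parallel-size 4

# The server exposes an OpenAI-compatible API (port 8000 by default)
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-72B-Instruct", "prompt": "Hello", "max_tokens": 16}'
```

Because the server speaks the OpenAI API, any OpenAI-compatible client can be pointed at it for prompting and batch inference.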

Prerequisites

  • FAS Research Computing account
  • Familiarity with Python
  • Familiarity with large language models
  • Familiarity with HPC clusters
  • Completion of pre-work

Who can attend?

  • Any Harvard affiliate with an FASRC account, with priority given to Kempner community members.

Contact Information:
For any questions about the workshop, please contact kempnereducation@harvard.edu.

Distributed Inference Workshop Flyer