Workshops @ Kempner
Large Language Model Distributed Inference Virtual Workshop
Naeem Khoshnevis, Lead ML Research Engineer and PyTorch Ambassador
This hands-on, virtual workshop covers high-performance LLM inference using vLLM. You will work with leading open-weight models such as Qwen and LLaMA to gain practical, real-world experience hosting and running inference for large language models that exceed the memory capacity of a single GPU.
What will attendees learn from this workshop?
- Basics of distributed computing for large language model inference
- Using vLLM, a popular library for LLM inference, to host a server
- Setting up, prompting, and extracting logits from large language models on an HPC cluster
- Using offline batch inference with large language models
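To give a flavor of the serving workflow covered above, here is a minimal sketch of hosting a model with vLLM's OpenAI-compatible server and prompting it over HTTP. The model name, port, and parallelism degree are illustrative assumptions, not workshop-specific settings; this requires vLLM installed on a GPU node (on an HPC cluster, it would typically run inside a batch or interactive job).

```shell
# Illustrative sketch: serve an open-weight model with vLLM.
# --tensor-parallel-size shards the model across 2 GPUs, which is how
# vLLM handles models too large for a single GPU's memory.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --port 8000 \
  --tensor-parallel-size 2

# From another shell: prompt the server via its OpenAI-compatible API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

The same engine can also be used without a server for offline batch inference, where prompts are passed directly to vLLM's Python API in bulk.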
Prerequisites
- FAS Research Computing account
- Familiarity with Python
- Familiarity with large language models
- Familiarity with HPC clusters
- Completion of pre-work
Who can attend?
- Any Harvard affiliate with an FASRC account, with priority given to Kempner community members.
Contact Information:
For any questions about the workshop, please contact kempnereducation@harvard.edu.
