Workshops @ Kempner
Large Language Model Distributed Inference Virtual Workshop
Naeem Khoshnevis, Lead ML Research Engineer and PyTorch Ambassador
This hands-on, virtual workshop covers high-performance LLM inference using vLLM. You will work with leading open-weight models such as Qwen and LLaMA to gain practical, real-world experience hosting and running inference for large language models that exceed the memory capacity of a single GPU.
What will attendees learn from this workshop?
- Basics of distributed computing for large language model inference
- Using vLLM, a popular library for LLM inference, to host a server
- Setting up, prompting, and extracting logits from large language models on an HPC cluster
- Using offline batch inference with large language models
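To give a flavor of the serving workflow covered above, here is a minimal sketch of hosting a model with vLLM's OpenAI-compatible server and prompting it over HTTP. The model name, port, and parallelism degree are illustrative assumptions, not workshop-specific settings; this requires vLLM installed on a GPU node (on an HPC cluster, it would typically run inside a batch or interactive job).

```shell
# Illustrative sketch: serve an open-weight model with vLLM.
# --tensor-parallel-size shards the model across 2 GPUs, which is how
# vLLM handles models too large for a single GPU's memory.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --port 8000 \
  --tensor-parallel-size 2

# From another shell: prompt the server via its OpenAI-compatible API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

The same engine can also be used without a server for offline batch inference, where prompts are passed directly to vLLM's Python API in bulk.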
Prerequisites
- FAS Research Computing account
- Familiarity with Python
- Familiarity with large language models
- Familiarity with HPC clusters
- Completion of pre-work
Who can attend?
- Any Harvard affiliate with an FASRC account, with priority given to Kempner community members.
Contact Information:
For any questions about the workshop, please contact kempnereducation@harvard.edu.
