Large Language Model Distributed Inference Workshop
The Large Language Model Distributed Inference workshop, part of the Workshops @ Kempner series, will provide hands-on training on hosting and running inference for large language models that exceed the memory capacity of a single GPU. Participants will work with the vLLM library to host large Llama models, including those with 70 billion and 405 billion parameters, on an HPC cluster. The workshop will cover prompting these models, extracting logits, and parallelizing prompts to make efficient use of GPU resources.
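For a sense of what this looks like in practice, the sketch below shows one common pattern: loading a 70-billion-parameter Llama model sharded across several GPUs with vLLM's tensor parallelism, then sending it a single prompt. The model name, GPU count, and sampling settings are illustrative assumptions, not the workshop's actual configuration.

```python
from vllm import LLM, SamplingParams

# Shard the model's weights across 4 GPUs on one node: a 70B-parameter
# model in 16-bit precision (~140 GB of weights alone) cannot fit on a
# single GPU, so tensor parallelism splits each layer across devices.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative model choice
    tensor_parallel_size=4,                     # illustrative GPU count
)

sampling = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], sampling)
print(outputs[0].outputs[0].text)
```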
Date: Tuesday, December 17th
Time: 12:30 pm - 3:00 pm
Location: Kempner Large Conference Room (SEC 6.242)
Presenters: Tim Ngotiaoco and Ella Batty
Who can attend this workshop?
Any Harvard-affiliated student, postdoc, or faculty member, with priority given to Kempner community members.
What will attendees learn from this workshop?
- Basics of distributed computing for large language model inference
- Using vLLM, a popular library for LLM inference, to host a server
- Setting up, prompting, and extracting logits from large Llama models on an HPC cluster
- Using offline batch inference with large Llama models (see the sketch after this list)
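To make the last two bullets concrete, here is a minimal sketch of offline batch inference with vLLM, reusing the illustrative model and GPU count from above. A single generate() call schedules the whole list of prompts together to keep the GPUs busy, and the logprobs option of SamplingParams is the usual route to logit-level information (per-token log-probabilities) from the model; the prompts themselves are placeholders.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative model choice
    tensor_parallel_size=4,                     # illustrative GPU count
)

prompts = [
    "Give a one-sentence definition of distributed inference.",
    "Name one advantage of tensor parallelism.",
    "What does an HPC job scheduler do?",
]

# logprobs=5 asks vLLM to return the top-5 token log-probabilities at
# each generation step, alongside the sampled token's log-probability.
sampling = SamplingParams(temperature=0.0, max_tokens=32, logprobs=5)

# One call batches all prompts; vLLM's continuous batching keeps the
# GPUs saturated instead of running prompts one at a time.
for request in llm.generate(prompts, sampling):
    completion = request.outputs[0]
    print(request.prompt, "->", completion.text)
    # completion.logprobs holds one dict per generated token, mapping
    # token id -> Logprob(logprob=..., rank=..., decoded_token=...)
    first_step = completion.logprobs[0]
```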
Prerequisites:
- Familiarity with Python programming
- Familiarity with large language models
- Familiarity with working on an HPC cluster
- Access to the FASRC cluster
Registration:
Please register your interest here as soon as possible. Space is limited.
Contact Information:
For any questions about the workshop, please contact kempnereducation@harvard.edu.