Large Language Model Distributed Inference Workshop
The Large Language Model Distributed Inference workshop, part of the Workshops @ Kempner series, will provide hands-on training on hosting and running inference for large language models that exceed the memory capacity of a single GPU. Participants will work with the vLLM library to host large Llama models, including those with 70 billion and 405 billion parameters, on an HPC cluster. The workshop will cover prompting these models, extracting logits, and parallelizing prompts to make efficient use of GPU resources.
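For a sense of what this looks like in practice, the sketch below shows one common pattern: loading a 70-billion-parameter Llama model sharded across several GPUs with vLLM's tensor parallelism, then sending it a single prompt. The model name, GPU count, and sampling settings are illustrative assumptions, not the workshop's actual configuration.

```python
from vllm import LLM, SamplingParams

# Shard the model's weights across 4 GPUs on one node: a 70B-parameter
# model in 16-bit precision (~140 GB of weights alone) cannot fit on a
# single GPU, so tensor parallelism splits each layer across devices.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative model choice
    tensor_parallel_size=4,                     # illustrative GPU count
)

sampling = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], sampling)
print(outputs[0].outputs[0].text)
```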
Date: Tuesday, December 17th
Time: 12:30 pm - 3:00 pm
Location: Kempner Large Conference Room (SEC 6.242)
Presenters: Tim Ngotiaoco and Ella Batty
Who can attend this workshop?
Any Harvard-affiliated student, postdoc, or faculty member, with priority given to Kempner community members.
What will attendees learn from this workshop?
- Basics of distributed computing for large language model inference
- Using vLLM, a popular library for LLM inference, to host a server
- Setting up, prompting, and extracting logits from large Llama models on an HPC cluster
- Using offline batch inference with large Llama models (see the sketch after this list)
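To make the last two bullets concrete, here is a minimal sketch of offline batch inference with vLLM, reusing the illustrative model and GPU count from above. A single generate() call schedules the whole list of prompts together to keep the GPUs busy, and the logprobs option of SamplingParams is the usual route to logit-level information (per-token log-probabilities) from the model; the prompts themselves are placeholders.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative model choice
    tensor_parallel_size=4,                     # illustrative GPU count
)

prompts = [
    "Give a one-sentence definition of distributed inference.",
    "Name one advantage of tensor parallelism.",
    "What does an HPC job scheduler do?",
]

# logprobs=5 asks vLLM to return the top-5 token log-probabilities at
# each generation step, alongside the sampled token's log-probability.
sampling = SamplingParams(temperature=0.0, max_tokens=32, logprobs=5)

# One call batches all prompts; vLLM's continuous batching keeps the
# GPUs saturated instead of running prompts one at a time.
for request in llm.generate(prompts, sampling):
    completion = request.outputs[0]
    print(request.prompt, "->", completion.text)
    # completion.logprobs holds one dict per generated token, mapping
    # token id -> Logprob(logprob=..., rank=..., decoded_token=...)
    first_step = completion.logprobs[0]
```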
Prerequisites:
- Familiarity with Python programming
- Familiarity with large language models
- Familiarity with working on an HPC cluster
- Access to the FASRC cluster
Registration:
Please register your interest here as soon as possible. Space is limited.
Contact Information:
For any questions about the workshop, please contact kempnereducation@harvard.edu.