Workshops @ Kempner

Large Language Model Distributed Training Workshop

Date: Friday, October 18, 2024
Time: 10:00 am – 1:00 pm
Location: Kempner Large Conference Room (SEC 6.242)

The Large Language Model Distributed Training workshop, part of the Workshops @ Kempner series, highlights various parallelization techniques for training large language models. We’ll cover techniques such as Distributed Data Parallelism (DDP), Model Parallelism (MP), Tensor Parallelism (TP), Pipeline Parallelism (PP), and Fully Sharded Data Parallelism (FSDP). In addition to reviewing the advantages of each technique and their use cases, this workshop will provide a few hands-on examples to help with understanding LLM distributed training approaches.
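To give a flavor of the hands-on portion, below is a minimal, illustrative sketch (not taken from the workshop materials) of wrapping a toy model in PyTorch's DistributedDataParallel; the model, hyperparameters, and launch command are placeholders.

```python
# Minimal DDP sketch: each process trains a replica of the model and
# gradients are averaged across processes during backward().
# Launch with, for example: torchrun --nproc_per_node=2 ddp_example.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A toy linear layer stands in for a transformer.
    model = torch.nn.Linear(512, 512).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(32, 512, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```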

Presenters: Yasin Mazloumi and Ella Batty

Who can attend this workshop? The workshop is open to the Kempner community; Harvard affiliates may join if space is available.

What will attendees learn from this workshop?

  • Different parallelization techniques for LLM training using GPUs
  • Different GPU collective communication primitives (see the sketch after this list)
  • How to train a transformer in a distributed fashion using DDP and FSDP on GPUs
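As a preview of the collective communication primitives mentioned above, here is a minimal, illustrative sketch (file name and tensor contents are placeholders) of an all-reduce, the primitive DDP uses to average gradients, assuming PyTorch with the NCCL backend.

```python
# all_reduce_example.py -- illustrative all-reduce sketch.
# Launch with, for example: torchrun --nproc_per_node=2 all_reduce_example.py
import os

import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each rank starts with a tensor filled with its own rank id.
t = torch.full((4,), float(rank), device=local_rank)

# all_reduce sums the tensors element-wise across all ranks, so every
# rank ends up holding the same result.
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {rank}: {t.tolist()}")

dist.destroy_process_group()
```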

Prerequisites:

  • Familiarity with the PyTorch framework and Python programming
  • Familiarity with LLMs
  • Familiarity with HPC clusters
  • Attending the Intro to Distributed Computing workshop is helpful but not required
  • (Optional) Set up the OLMo environment on the cluster

Registration:

Please register your interest here as soon as possible. Space is limited.


Contact Information:
For any questions about the workshop, please contact kempnereducation@harvard.edu.