Large Language Model Distributed Training Workshop
The Large Language Model Distributed Training workshop, part of the Workshops @ Kempner series, highlights parallelization techniques for training large language models. We'll cover Distributed Data Parallelism (DDP), Model Parallelism (MP), Tensor Parallelism (TP), Pipeline Parallelism (PP), and Fully Sharded Data Parallelism (FSDP). In addition to reviewing the advantages and use cases of each technique, the workshop includes hands-on examples to build a working understanding of LLM distributed training approaches.
Date: Friday, October 18th
Time: 10 am – 1 pm
Location: Kempner Large Conference Room (SEC 6.242)
Presenters: Yasin Mazloumi and Ella Batty
Who can attend this workshop? Open to the Kempner community. Harvard affiliates may join, subject to space availability.
What will attendees learn from this workshop?
- Different parallelization techniques for LLM training using GPUs
- Different GPU collective communication primitives
- How to train a transformer in a distributed fashion using DDP and FSDP on GPUs (see the sketch after this list)
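As a preview of the hands-on portion, the sketch below shows the general shape of DDP training in PyTorch: each process holds a full copy of a model, and gradients are averaged across processes via an all-reduce collective after every backward pass. The explicit `all_reduce` on the loss also illustrates one of the collective communication primitives mentioned above. This is a minimal illustration, not the workshop's actual materials; the model, data, and hyperparameters are placeholder stand-ins.

```python
# Minimal DDP sketch: one process per GPU (or per CPU worker), gradients
# synchronized automatically by DistributedDataParallel.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank: int, world_size: int):
    # Every process joins the same process group; NCCL is the usual backend
    # on GPUs, gloo works on CPU-only machines.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    # Assumes one GPU per process when CUDA is available.
    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(16, 4).to(device)  # placeholder stand-in for a transformer
    ddp_model = DDP(model, device_ids=[rank] if torch.cuda.is_available() else None)
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(5):
        # Each rank would normally see a different shard of the data
        # (e.g. via DistributedSampler); random tensors stand in here.
        x = torch.randn(8, 16, device=device)
        y = torch.randn(8, 4, device=device)

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()   # DDP all-reduces (averages) gradients across ranks here
        optimizer.step()

        # A collective primitive used explicitly: average the loss across ranks.
        with torch.no_grad():
            avg_loss = loss.detach().clone()
            dist.all_reduce(avg_loss, op=dist.ReduceOp.SUM)
            avg_loss /= world_size
            if rank == 0:
                print(f"step {step}: loss {avg_loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # number of processes (one per GPU, or CPU workers)
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
```

Switching this skeleton from DDP to FSDP mainly changes the wrapper around the model (parameters are sharded across ranks rather than replicated); the workshop walks through that difference in detail.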
Prerequisites:
- Familiarity with the PyTorch framework and Python programming
- Familiarity with LLMs
- Familiarity with HPC clusters
- Attending the Intro to Distributed Computing workshop is helpful but not required
- (Optional) Set up the OLMo environment on the cluster
Registration:
Please register your interest as soon as possible here. Space is limited.
Contact Information:
For any questions about the workshop, please contact kempnereducation@harvard.edu