
Andrew Gu

Fully Sharded Data Parallel training (FSDP)

Co-authored PyTorch FSDP, with practical lessons for scaling fully sharded training workloads.

Highlights

Focus: Fully Sharded Data Parallel training (FSDP)
Why it matters: Co-authored PyTorch FSDP, with practical lessons for scaling fully sharded training workloads.
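
For context on the focus area above, here is a minimal sketch of wrapping a model with PyTorch's FullyShardedDataParallel API. The toy model, hyperparameters, and launch setup are illustrative assumptions, not taken from this profile.

```python
# Minimal FSDP sketch. Assumes torchrun launches one process per GPU;
# the model and hyperparameters are illustrative, not from this profile.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")  # reads rank/world size from torchrun env vars
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # gathering full parameters only around each forward/backward pass.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

This would be launched with, e.g., torchrun --nproc_per_node=8 train.py (filename hypothetical); each process holds only its shard of the model state between layer computations.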

Research Areas

Systems, Training, PyTorch, Scaling