
Mostofa Patwary

Model-parallel training at scale (Megatron-LM)

Co-authored Megatron-LM, a core reference for scaling transformer training via model parallelism.
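Megatron-LM's central technique is tensor model parallelism: the weight matrices of each transformer layer are sharded across devices so that a single layer's matmuls run in parallel. A minimal single-process sketch is below, using NumPy to simulate the shards of a two-layer MLP (a real implementation places each shard on its own GPU and uses an all-reduce where this sketch sums partial results); the shard count and layer sizes are illustrative assumptions, not values from Megatron-LM.

```python
import numpy as np

# Sketch of Megatron-style tensor parallelism for an MLP block:
# Y = GeLU(X A) B. NumPy arrays stand in for per-GPU shards.

rng = np.random.default_rng(0)
n_shards = 4                       # simulated "GPUs" (illustrative)
batch, d_model, d_ff = 8, 16, 64   # illustrative sizes

X = rng.standard_normal((batch, d_model))
A = rng.standard_normal((d_model, d_ff))   # first linear layer
B = rng.standard_normal((d_ff, d_model))   # second linear layer

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Reference: unsharded forward pass.
Y_ref = gelu(X @ A) @ B

# Column-parallel first layer: split A by columns, so each shard
# computes an independent slice of the hidden activation and the
# elementwise nonlinearity needs no communication.
A_shards = np.split(A, n_shards, axis=1)
# Row-parallel second layer: split B by rows to match each hidden slice.
B_shards = np.split(B, n_shards, axis=0)

# Each "device" computes a partial output; summing the partials
# plays the role of the all-reduce across devices.
partials = [gelu(X @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]
Y = sum(partials)

assert np.allclose(Y, Y_ref)  # sharded result matches the full matmul
```

The column-then-row split is what keeps communication to a single all-reduce per MLP block: the GeLU is applied shard-locally, so no synchronization is needed between the two matmuls.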

Highlights

Systems, Training, Scaling
Focus: Model-parallel training at scale (Megatron-LM)
Why it matters: Co-authored Megatron-LM, a core reference for scaling transformer training via model parallelism.

Research Areas

Systems, Training, Scaling
Mostofa Patwary - AI Researcher Profile | 500AI