
Siyuan Zhuang

Fast, cheap LLM serving (PagedAttention)

Co-author of vLLM, a widely used serving stack for efficient LLM inference.
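The profile names PagedAttention, the memory-management idea at the core of vLLM: the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping logical positions to physical blocks, much like OS paging. Below is a toy sketch of that block-table idea only; the names (`BlockAllocator`, `Sequence`, `BLOCK_SIZE`) are hypothetical and the real system manages GPU memory, not Python lists.

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative value)

class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

class Sequence:
    """Maps a sequence's logical token positions to physical blocks."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full,
        # so memory grows in block-sized steps rather than being
        # reserved up front for the maximum sequence length.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

allocator = BlockAllocator(num_blocks=16)
seq = Sequence(allocator)
for _ in range(6):  # 6 tokens with block size 4 -> 2 blocks
    seq.append_token()
print(len(seq.block_table))  # 2
```

Because blocks are allocated on demand and can be shared or freed independently, fragmentation drops and more concurrent sequences fit in the same memory, which is what makes serving "fast and cheap".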

Highlights

vLLM · Serving Systems
Focus: Fast, cheap LLM serving (PagedAttention)
Why it matters: Co-author of vLLM, a widely used serving stack for efficient LLM inference.

Research Areas

vLLM · Serving Systems