Zhuohan Li
Fast, cheap LLM serving (PagedAttention)
Co-author of vLLM, a widely used serving stack for efficient LLM inference.
Research Areas
vLLM · Serving Systems