Ion Stoica
Fast, cheap LLM serving (PagedAttention)
Co-authored vLLM, a widely used serving stack for efficient LLM inference.
Research Areas
vLLM, Serving Systems