
Joseph E. Gonzalez

Fast, cheap LLM serving (PagedAttention)

Co-authored vLLM, a widely used serving stack for efficient LLM inference.

Highlights

vLLM, Serving, Systems
Focus: Fast, cheap LLM serving (PagedAttention)
Why it matters: Co-authored vLLM, a widely used serving stack for efficient LLM inference.

Research Areas

vLLM, Serving, Systems