
Yaniv Leviathan

Faster LLM inference via speculative decoding

Co-authored speculative decoding, a core technique for cutting LLM inference latency while preserving output quality.

Highlights

Inference · Speculative decoding · Serving
Focus: Faster LLM inference via speculative decoding
Why it matters: Co-authored speculative decoding, a core technique for cutting LLM inference latency while preserving output quality.
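
The idea behind speculative decoding: a cheap draft model proposes several tokens autoregressively, and the expensive target model verifies them all in one parallel pass, so multiple tokens can be emitted per target-model step without changing the output. A minimal sketch of the greedy-verification variant, using toy next-token functions (the function names and toy models here are illustrative, not from the source; the full method uses rejection sampling to match the target's sampling distribution):

```python
def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    """Greedy speculative-decoding sketch (toy, illustrative).

    target_next / draft_next: fn(context: list[int]) -> int, the model's
    greedy next token. The draft proposes k tokens; the target verifies
    them. In a real system the target scores all k positions in one
    batched forward pass; here we call target_next per position for clarity.
    The output is identical to decoding greedily with the target alone.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) Draft proposes k tokens autoregressively (cheap model).
        ctx = list(out)
        proposals = []
        for _ in range(k):
            t = draft_next(ctx)
            proposals.append(t)
            ctx.append(t)
        # 2) Target verifies the proposals (one parallel pass in practice).
        accepted = []
        for t in proposals:
            want = target_next(out + accepted)
            if want == t:
                accepted.append(t)      # draft matched the target: keep it
            else:
                accepted.append(want)   # first mismatch: take target's token
                break
        else:
            # All k proposals accepted; the same target pass yields a bonus token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new]
```

Each loop iteration emits at least one token (the target's own choice at the first mismatch) and up to k + 1 tokens when the draft agrees with the target throughout, which is where the latency win comes from.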

Research Areas

Inference · Speculative decoding · Serving