Yaniv Leviathan
Faster LLM inference via speculative decoding
Co-author of speculative decoding, a core technique for cutting LLM inference latency while preserving output quality.
Research Areas
Inference · Speculative decoding · Serving