Matan Kalman

Faster LLM inference via speculative decoding

Co-authored speculative decoding, a core technique for cutting LLM inference latency while preserving output quality.

Highlights

Inference · Speculative decoding · Serving
Focus: Faster LLM inference via speculative decoding
Why it matters: Speculative decoding is a core technique for cutting LLM inference latency while preserving output quality.
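The idea behind speculative decoding: a small, fast draft model proposes several tokens, and the large target model verifies them all in a single forward pass, accepting each with probability min(1, p/q) and resampling from the residual distribution on rejection, so the output distribution exactly matches the target model's. Below is a minimal sketch of that loop under toy assumptions: VOCAB, GAMMA, and the toy_dist / draft_dist / target_dist helpers are illustrative stand-ins for real models, not the paper's implementation.

```python
import numpy as np

VOCAB = 8   # toy vocabulary size (illustrative assumption)
GAMMA = 4   # number of tokens drafted per verification step

def toy_dist(prefix, seed):
    # Deterministic stand-in for a model's next-token distribution.
    rng = np.random.default_rng(hash((tuple(prefix), seed)) % (2**32))
    logits = rng.standard_normal(VOCAB)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def draft_dist(prefix):   # small, fast "draft" model (assumption)
    return toy_dist(prefix, seed=1)

def target_dist(prefix):  # large "target" model (assumption)
    return toy_dist(prefix, seed=2)

def speculative_step(prefix):
    """One round: draft GAMMA tokens, then accept/reject against the target."""
    rng = np.random.default_rng()
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(GAMMA):
        q = draft_dist(ctx)
        tok = rng.choice(VOCAB, p=q)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # In a real system the target model scores all drafted positions in a
    # single forward pass; here we just query the toy function per position.
    accepted = []
    ctx = list(prefix)
    for tok, q in zip(drafted, q_dists):
        p = target_dist(ctx)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)  # accepted token matches target distribution
            ctx.append(tok)
        else:
            # Rejection: resample from the normalized residual max(p - q, 0),
            # which corrects the draft's bias exactly.
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(rng.choice(VOCAB, p=residual))
            return accepted
    # All drafts accepted: the target's extra position yields one bonus token.
    accepted.append(rng.choice(VOCAB, p=target_dist(ctx)))
    return accepted

print(speculative_step([0]))
```

Each step emits between one and GAMMA + 1 tokens for roughly one target-model pass, which is where the latency win comes from; the accept/reject rule guarantees the sampled sequence is distributed exactly as if the target model had decoded alone.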

Research Areas

Inference · Speculative decoding · Serving