Speculative Decoding — Step by Step

Draft model proposes tokens ahead, full model verifies · EXD Ep5
Speculative decoding: a lightweight draft model proposes several tokens ahead.
The full model then verifies all of them in a single forward pass, keeping the guesses up to the first mismatch.
■ Green = correct guess (free speed) · ■ Red = wrong guess (wasted, struck out)
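The propose-then-verify loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the demo's own implementation: the two "models" are hypothetical toy functions that replay fixed sentences, verification is greedy (exact token match) rather than the probabilistic accept/reject rule used with sampling, and the names `make_toy_model`, `speculative_step`, and `depth` are invented for this example. A real system would verify all positions in one batched forward pass; the loop below only mimics the outcome.

```python
from typing import Callable, List, Tuple

Token = str
NextToken = Callable[[List[Token]], Token]


def make_toy_model(text: List[Token]) -> NextToken:
    """Hypothetical stand-in for a language model: replays a fixed sentence."""
    def next_token(context: List[Token]) -> Token:
        i = len(context)
        return text[i] if i < len(text) else "<eos>"
    return next_token


def speculative_step(
    prefix: List[Token], draft_next: NextToken, target_next: NextToken, depth: int
) -> Tuple[List[Token], int, int]:
    """One round of greedy speculative decoding.

    Returns (new_tokens, accepted_count, rejected_count).
    """
    # 1. The draft model guesses `depth` tokens ahead, one at a time.
    proposed: List[Token] = []
    for _ in range(depth):
        proposed.append(draft_next(prefix + proposed))

    # 2. The full model checks every guessed position.
    #    (Conceptually a single forward pass; looped here for clarity.)
    accepted: List[Token] = []
    for i, guess in enumerate(proposed):
        expected = target_next(prefix + proposed[:i])
        if guess != expected:
            # First wrong guess: keep the full model's token and
            # discard this guess and everything after it.
            return accepted + [expected], len(accepted), depth - len(accepted)
        accepted.append(guess)

    # 3. Every guess was correct: the same pass also yields one extra token.
    bonus = target_next(prefix + proposed)
    return accepted + [bonus], depth, 0


# Toy data (assumed for illustration): the draft disagrees on one word.
target = make_toy_model("the quick brown fox jumps over the lazy dog".split())
draft = make_toy_model("the quick brown cat jumps over the lazy dog".split())

print(speculative_step(["the"], draft, target, depth=4))
# -> (['quick', 'brown', 'fox'], 2, 2)
```

In this run, two guesses are accepted (green), the wrong guess and the one after it are discarded (red), and the full model still contributes its own corrected token, so the round never produces fewer tokens than ordinary decoding would. A larger `depth` gains more when the draft is usually right and wastes more when it is not.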