Draft model proposes tokens ahead, full model verifies · EXD Ep5
[Interactive visualization: "Pipeline" and "Tokens" panels, starting at Step 0]
Speculative decoding: a lightweight draft model guesses tokens ahead, and the full model verifies them all in one forward pass.
■ Green = correct guess (free speed)
■ Red = wrong guess (wasted, struck out)
Controls: Space = next, ← = previous. Switch depth to compare.
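
The propose-then-verify step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the visualization: `speculative_step`, `draft_next`, `full_next`, and the toy sentence are all hypothetical, `depth` is assumed to mean the number of tokens the draft model proposes per step, and acceptance is simplified to greedy exact match (sampling-based variants use a probabilistic acceptance rule instead). A real system scores all proposed positions in a single batched forward pass of the full model; the loop here only mimics that result.

```python
from typing import Callable, List, Tuple

Token = str
# Hypothetical interface: a model is a function from a context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    prefix: List[Token],
    draft_next: NextToken,
    full_next: NextToken,
    depth: int,
) -> Tuple[List[Token], int, int]:
    """Run one propose/verify round.

    Returns (new_tokens, n_correct, n_wasted), where n_correct counts the
    "green" draft guesses and n_wasted counts the "red" ones.
    """
    # 1. The draft model proposes `depth` tokens, one after another.
    proposed: List[Token] = []
    for _ in range(depth):
        proposed.append(draft_next(prefix + proposed))

    # 2. The full model checks every proposed position.
    #    (Assumption: simulated with a loop; in practice this is one forward pass.)
    new_tokens: List[Token] = []
    for i, guess in enumerate(proposed):
        target = full_next(prefix + proposed[:i])
        if guess == target:
            new_tokens.append(guess)  # correct guess: kept
        else:
            # First mismatch: keep the full model's token and discard
            # this guess and every guess after it.
            new_tokens.append(target)
            return new_tokens, i, depth - i

    # 3. Every guess matched, so the same verification also yields one extra token.
    new_tokens.append(full_next(prefix + proposed))
    return new_tokens, depth, 0


# Toy models (assumptions for illustration only).
TEXT = "the quick brown fox jumps over the lazy dog".split()


def full_next(ctx: List[Token]) -> Token:
    """'Full model': recites TEXT, then emits an end marker."""
    return TEXT[len(ctx)] if len(ctx) < len(TEXT) else "<eos>"


def draft_next(ctx: List[Token]) -> Token:
    """'Draft model': agrees with the full model except for one word."""
    tok = full_next(ctx)
    return "cat" if tok == "fox" else tok


if __name__ == "__main__":
    for depth in (2, 4):
        tokens, correct, wasted = speculative_step([], draft_next, full_next, depth)
        print(f"depth={depth}: tokens={tokens} correct={correct} wasted={wasted}")
```

In this toy setup, a depth of 2 has both guesses accepted and gains one extra token from the verification, while a depth of 4 hits the draft model's wrong word on its fourth guess, so three guesses are kept and one is discarded. Either way the output matches what the full model would have produced on its own; only the number of full-model passes changes.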