MODAL
High Performance Inference: Intro to Speculative Decoding

Wednesday, March 11

4:00 PM - 5:00 PM

Register

Speculative decoding is a tried-and-true approach to low-latency, high-performance inference.

But how can you take advantage of speculative decoding effectively?

Join us for a webinar with Shankha Biswas, who has worked with DFlash and EAGLE3 draft models on Modal's inference optimization team. Learn the basics of speculative decoding and when draft models can unlock major gains in inference efficiency.

We'll dive into:

  • Inference performance with and without speculative decoding
  • Training draft models (EAGLE3 → custom speculators)
  • Lessons from Modal’s work with Decagon
  • Resources to get started