Modal

Coding agents actually work. At Ramp, over half of merged PRs were written by their background AI agent.

Hosted by [Rebecka Storm](https://www.linkedin.com/in/rebeckastorm/) and [Paul Butler](https://www.linkedin.com/in/paulgb/), this live session will dive into how AI agents code at scale. We’ll walk through:

1.  How customers like Ramp and Lovable use Modal sandboxes today
2.  Security, networking and scaffolding considerations for your coding agent
3.  Deploying Modal sandboxes at scale


Inside Modal Sandboxes: How agents code at scale

The era of actually open AI is here. We’ve spent the past year helping leading organizations deploy open models and inference engines in production at scale.

Hosted by [Charles Frye](https://x.com/charles_irl), this live session will walk you through:

1.  The three types of LLM workloads: offline, online, and semi-online
2.  The challenges engineers face and our recommended solutions to control cost, latency, and throughput
3.  How you can implement those solutions on [our cloud platform](https://modal.com/)


High Performance LLM Inference in Production

For low-latency, high-performance inference, speculative decoding is a tried-and-true acceleration technique.

But how can you take advantage of speculative decoding effectively?

Join us for a webinar with [Shankha Biswas](https://www.linkedin.com/in/amartyashankha/), who has worked with DFlash and EAGLE3 draft models on Modal's inference optimization team. Learn the basics of speculative decoding and when draft models can unlock major gains in inference efficiency.

We'll dive into:

*   Inference performance with and without speculative decoding
*   Training draft models (EAGLE3 → custom speculators)
*   Lessons from Modal’s work with Decagon
*   Resources to get started


High Performance Inference: Intro to Speculative Decoding

Replays

Inside Modal Sandboxes: How agents code at scale

High Performance LLM Inference in Production