High Performance LLM Inference in Production
Wednesday, February 11
5:00 PM - 6:00 PM
The era of actually open AI is here. We’ve spent the past year helping leading organizations deploy open models and inference engines in production at scale.
Hosted by Charles Frye, this live session will walk you through:
- The three types of LLM workloads: offline, online, and semi-online
- The challenges engineers face and our recommended solutions to control cost, latency, and throughput
- How you can implement those solutions on our cloud platform
Speaker
Charles Frye
GPU Enjoyer @ Modal