High Performance LLM Inference in Production
Wednesday, February 11
5:00 PM - 6:00 PM
The era of actually open AI is here. We’ve spent the past year helping leading organizations deploy open models and inference engines in production at scale.
Hosted by Charles Frye, this live session will walk you through:
- The three types of LLM workloads: offline, online, and semi-online
- The challenges engineers face and our recommended solutions to control cost, latency, and throughput
- How you can implement those solutions on our cloud platform
Speaker
Charles Frye
GPU Enjoyer @ Modal