Smart model routing: reduce token spend without sacrificing code quality
Augment Code
42:03
Engineering leaders aren’t struggling to get access to powerful AI models anymore. Instead, they’re struggling to keep costs under control while maintaining the quality their teams expect.
In this session, we’re introducing Prism, our new intelligent model router that lives directly in the Augment model picker. Prism routes each turn to the model that best fits the work, across Claude, GPT, Gemini, and Kimi, so teams can keep frontier-level quality while reducing token spend.
Augment Code already helps developers ship production-grade code faster on large, complex codebases through our leading agent harness and Context Engine. With Prism, those same workflows become materially more cost-efficient.
Under the hood, Prism builds on Augment’s model-agnostic platform. Instead of committing an entire session to a single frontier model, Prism evaluates each turn in real time and decides whether to stay on the current model or switch to a more efficient one. It switches only when the expected win outweighs the cache and routing costs.
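This page does not describe Prism's actual decision logic. To illustrate the trade-off named above, switching only when the expected win outweighs cache and routing costs, here is a minimal Python sketch built on stated assumptions. The `ModelOption` fields, the prices, the per-turn quality scores, `turn_cost`, and `choose_model` are all hypothetical and are not Prism's implementation or API. The sketch assumes that staying on the current model re-reads the context from a warm prompt cache, while switching forces the new model to read the full context at the uncached price.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    """A candidate model. All prices and scores here are hypothetical."""
    name: str
    input_price: float         # $ per 1M uncached input tokens
    cached_input_price: float  # $ per 1M cache-hit input tokens
    output_price: float        # $ per 1M output tokens
    expected_quality: float    # assumed 0..1 quality estimate for this turn


def turn_cost(model: ModelOption, context_tokens: int, output_tokens: int,
              cache_warm: bool) -> float:
    """Estimated dollar cost of one turn on `model`."""
    in_price = model.cached_input_price if cache_warm else model.input_price
    return (context_tokens * in_price + output_tokens * model.output_price) / 1_000_000


def choose_model(current: ModelOption, candidates: list[ModelOption],
                 context_tokens: int, output_tokens: int,
                 min_quality: float, routing_overhead: float) -> ModelOption:
    """Stay on `current` unless a candidate clears the quality bar and its
    savings exceed the cache-miss penalty plus the given routing overhead."""
    # The current model's prompt cache is warm, so its context is cheap to re-read.
    stay_cost = turn_cost(current, context_tokens, output_tokens, cache_warm=True)
    best, best_net_win = current, 0.0
    for model in candidates:
        if model.name == current.name or model.expected_quality < min_quality:
            continue
        # A switch forfeits the cache: the new model re-reads the full context uncached.
        switch_cost = turn_cost(model, context_tokens, output_tokens, cache_warm=False)
        net_win = stay_cost - switch_cost - routing_overhead
        if net_win > best_net_win:
            best, best_net_win = model, net_win
    return best


# Hypothetical models and prices, for illustration only.
frontier = ModelOption("frontier-model", 15.0, 1.5, 75.0, 0.97)
efficient = ModelOption("efficient-model", 3.0, 0.3, 15.0, 0.90)

for ctx in (100_000, 10_000):
    pick = choose_model(frontier, [frontier, efficient], ctx, 2_000,
                        min_quality=0.8, routing_overhead=0.001)
    print(f"{ctx:>7} context tokens -> {pick.name}")
```

With these made-up numbers, a 100,000-token cached context makes the uncached re-read on the cheaper model cost more than the turn saves, so the sketch stays on `frontier-model`. With a 10,000-token context, the switch to `efficient-model` pays off. A real router would also weigh later turns that reuse the new model's cache, which this one-turn sketch ignores.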
For engineering and platform teams, that means you can stay within your preferred model family while letting Prism optimize for both quality and efficiency behind the scenes.
In this webinar, we’ll cover:
- An introduction to Prism: How model routing works inside Augment’s agent harness, and how it differs from simply “picking a cheaper model.”
- Cost and token efficiency in the real world: How enterprises and fast-growing teams are using Prism to reduce AI token usage by 20–70% without sacrificing quality.
- Routing across multiple frontier models: The two variants of Prism, and how to choose between them based on your preferred frontier model.
- Prism for enterprise and commercial tenants: Deployment patterns, where Prism is available today (IDE, CLI, web), and how to roll it out alongside existing CBP / Token+ strategies.
If you’re a CTO, VP of Engineering, or senior engineering leader under pressure to reduce AI costs without slowing delivery, this session will give you a concrete blueprint for model routing that is optimized for cost without sacrificing quality, and provider-agnostic, so your AI investment stays affordable over time.
Speakers
Robbert Kaufman
Solutions Architect
John Mu
Technical Staff