Ollama: Speed Revolution - MTP Decoding and Smart Caching
Three major performance-focused PRs landed today, headlined by Patrick Devine's massive MTP speculative decoding implementation for Gemma4 models and Parth Sareen's server-side caching system for model metadata. Daniel Hiltgen also continued the ongoing codebase cleanup with imagegen layout improvements.
Duration: PT4M15S
Episode overview
This episode is a short developer briefing from Ollama.
It explains recent repository work in plain language.
- Show: Ollama
- Published: 2026-05-06T10:00:58Z
Transcript excerpt
Hey there, fellow developers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have an exciting day to talk about! If you're working with AI models and care about performance - and let's be honest, who doesn't - today's changes are going to make you smile.
We've got three fantastic pull requests that just merged, and the theme is crystal clear: speed, efficiency, and making your development experience smoother. So grab your favorite beverage and let's dive right in!
Our headline story today comes from Patrick Devine with what I can only describe as a performance game-changer. Patrick just merged PR 15980, which brings MTP - that's multi-token prediction - speculative decoding to Gemma4 models. Now, if you're not familiar with speculative decoding, think of it like this: instead…
But here's what makes Patrick's implementation so thorough - it's not just the core algorithm. He's added support for importing safetensors-based draft models with ollama create, introduced a new DRAFT command in Modelfiles, and even included a quantize-draft flag. Plus, there's proper cache support and changes to…
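For readers new to the draft-and-verify idea behind speculative decoding, here is a minimal sketch of the generic greedy variant. It is not Ollama's code and does not model MTP heads, the DRAFT command, or any Gemma4 specifics. The names `target_next`, `draft_next`, `greedy_decode`, and `speculative_decode` are hypothetical, and both "models" are toy arithmetic rules standing in for real networks. A cheap draft proposes a few tokens, the expensive target checks them, and the output stays identical to what the target alone would have produced.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    # Hypothetical stand-in for the large, slow model (a toy deterministic rule).
    return (context[-1] * 3 + len(context)) % 11


def draft_next(context: List[Token]) -> Token:
    # Hypothetical stand-in for the cheap draft: it agrees with the target
    # except when the context length is a multiple of 5.
    guess = (context[-1] * 3 + len(context)) % 11
    return guess if len(context) % 5 else (guess + 1) % 11


def greedy_decode(model: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextTokenFn,
    draft: NextTokenFn,
    prompt: List[Token],
    n_new: int,
    k: int = 3,
) -> Tuple[List[Token], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(out) < goal:
        # 1. The draft proposes k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target verifies the proposal. A real system would score all
        #    positions in one batched forward pass; this loop only simulates
        #    that, so we count it as a single pass.
        verify_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], verify_passes


if __name__ == "__main__":
    prompt = [1]
    baseline = greedy_decode(target_next, prompt, 12)
    spec, passes = speculative_decode(target_next, draft_next, prompt, 12, k=3)
    print(spec == baseline, passes)
```

In this toy run the speculative loop reproduces the baseline sequence while needing fewer verification passes than the baseline needs model calls. Any real speedup depends on how often the draft agrees with the target and on how cheap the draft is relative to the target.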
Speaking of performance improvements, Parth Sareen…
Here's…