Ollama: MLX Performance Breakthrough and Smarter Caching

The Ollama team shipped a major MLX update that brings a 6.4x speedup through new CUDA kernels, plus smarter caching logic for transformer models. Daniel Hiltgen led the MLX update, while Jesse Gross improved cache performance with better partial matching.

2026-03-24T10:04:16Z

Duration: PT4M8S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

Show: Ollama
Published: 2026-03-24T10:04:16Z
Audio duration: PT4M8S

Transcript excerpt

This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.

Hey there, code crafters! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have some exciting performance stories to dive into today. Grab your favorite beverage because we're talking about some seriously impressive speed improvements that are going to make your day.

So picture this - you're running a model and suddenly it's over six times faster. Not 6 percent, not 60 percent, but 6.4 times faster! That's exactly what happened with the massive MLX update that Daniel Hiltgen just merged. This isn't just a small tweak - we're talking about a complete refresh of the MLX…

Here's the story behind this update. Daniel pulled in the latest MLX changes from March 16th, but the real magic happened when they added the CUDA Fast Gated Delta kernel. When they tested it on a Qwen 3.5 model with an RTX 5090, the prefill speed jumped from 529 tokens per second to over 3,300 tokens per second.…
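As a quick sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. The two throughput figures come from the excerpt; the 8,192-token prompt length is a hypothetical value chosen only for illustration, and 3,300 is treated as a lower bound because the excerpt says "over 3,300".

```python
# Figures quoted in the excerpt (tokens per second during prefill).
BEFORE_TPS = 529.0
AFTER_TPS_LOWER_BOUND = 3300.0  # the excerpt says "over 3,300", so this is a floor

# Hypothetical prompt length, not from the episode; used only to show wall-clock impact.
PROMPT_TOKENS = 8192


def speedup(before_tps: float, after_tps: float) -> float:
    """Return how many times faster the new throughput is."""
    return after_tps / before_tps


def prefill_seconds(prompt_tokens: int, tps: float) -> float:
    """Return the time needed to prefill a prompt at a given throughput."""
    return prompt_tokens / tps


ratio_floor = speedup(BEFORE_TPS, AFTER_TPS_LOWER_BOUND)

# Throughput that an exact 6.4x speedup over 529 tok/s would imply.
implied_after_tps = 6.4 * BEFORE_TPS

if __name__ == "__main__":
    print(f"speedup is at least {ratio_floor:.2f}x")
    print(f"a 6.4x speedup implies about {implied_after_tps:.0f} tok/s")
    print(f"prefill before: {prefill_seconds(PROMPT_TOKENS, BEFORE_TPS):.1f} s")
    print(f"prefill after:  {prefill_seconds(PROMPT_TOKENS, AFTER_TPS_LOWER_BOUND):.1f} s or less")
```

The floor of 3,300 tokens per second gives a bit over 6.2x, and the headline 6.4x corresponds to roughly 3,386 tokens per second, which is consistent with "over 3,300".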

But Daniel didn't stop at the performance improvements. They also cleaned up some technical debt that had been lurking in the codebase. You know how it goes - sometimes when you're moving fast, little vendoring bugs creep in. This update caught and fixed those issues,…

Now,…

…
