Ollama: Simplifying the Sampling Story

Patrick Devine merged a significant refactor that streamlines how Ollama's MLX runner handles text generation sampling. The change replaces a complex chain of sampling interfaces with a single, stateful sampler that's much easier to work with and maintain.

Published: 2026-03-08T10:03:36Z

Duration: PT4M5S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

Show: Ollama

Transcript excerpt

This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.

Hey everyone, and welcome back to another episode of the Ollama podcast! I'm your host, and wow, do we have an interesting story about code simplification today. You know those moments when someone looks at a complex system and says "there's got to be a better way"? Well, that's exactly what happened yesterday, and…

Let's dive right into our main story. Patrick Devine just merged a fantastic refactor that tackles the sampling system in our MLX runner. Now, if you're not familiar with sampling in AI models, think of it like this: when an AI generates text, it doesn't just pick the most obvious next word every…

The old system used what's called a "chain of interfaces": imagine having separate little workers, each handling one aspect of sampling. One worker handled TopP sampling, another handled TopK, another managed repetition penalties, and so on. While this worked, it created a lot of complexity. You had to…

Patrick's solution is beautifully simple. Instead of all these separate interfaces, he collapsed everything into a single, stateful Sampler struct. Think of it like replacing a relay team with one really capable runner who can handle the whole race. This…
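For contrast, a single stateful sampler might look something like the Go sketch below. It is only an illustration of the general idea: the field names, the Sample method, the order of the steps, and the particular repetition-penalty rule are all assumptions for this example, not the actual Sampler struct from the refactor.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// Sampler is a hypothetical all-in-one sampler: settings plus its own history.
type Sampler struct {
	Temperature   float64 // assumed > 0
	TopK          int     // 0 disables top-k filtering
	RepeatPenalty float64 // assumed > 0; 1.0 means no penalty
	history       []int   // tokens emitted so far (the "stateful" part)
}

// Sample picks the next token; u is a uniform draw in [0, 1).
func (s *Sampler) Sample(logits []float64, u float64) int {
	work := append([]float64(nil), logits...)

	// 1. Repetition penalty: make previously emitted tokens less likely.
	seen := map[int]bool{}
	for _, tok := range s.history {
		if tok < 0 || tok >= len(work) || seen[tok] {
			continue
		}
		seen[tok] = true
		if work[tok] > 0 {
			work[tok] /= s.RepeatPenalty
		} else {
			work[tok] *= s.RepeatPenalty
		}
	}

	// 2. Top-k: mask everything below the K-th largest logit.
	if s.TopK > 0 && s.TopK < len(work) {
		sorted := append([]float64(nil), work...)
		sort.Sort(sort.Reverse(sort.Float64Slice(sorted)))
		cutoff := sorted[s.TopK-1]
		for i, l := range work {
			if l < cutoff {
				work[i] = math.Inf(-1)
			}
		}
	}

	// 3. Temperature-scaled softmax (unnormalized weights plus their sum).
	maxLogit := math.Inf(-1)
	for _, l := range work {
		if l > maxLogit {
			maxLogit = l
		}
	}
	weights := make([]float64, len(work))
	sum := 0.0
	for i, l := range work {
		weights[i] = math.Exp((l - maxLogit) / s.Temperature)
		sum += weights[i]
	}

	// 4. Draw a token, skipping masked entries, and remember it.
	token, cum := 0, 0.0
	for i, w := range weights {
		if w == 0 {
			continue
		}
		token = i
		cum += w / sum
		if u < cum {
			break
		}
	}
	s.history = append(s.history, token)
	return token
}

func main() {
	s := &Sampler{Temperature: 0.8, TopK: 2, RepeatPenalty: 1.5}
	logits := []float64{2.0, 1.0, 0.1}
	for step := 0; step < 3; step++ {
		fmt.Println("token:", s.Sample(logits, rand.Float64()))
	}
}
```

With this shape the caller only constructs one value and calls one method; the ordering of the steps and the token history live inside the sampler.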
