Ollama: The Caching Revolution

Jesse Gross delivered a major performance improvement by enabling KV cache sharing across conversations with common prefixes, while Bruce MacDonald polished the user experience with multiple fixes for model selection and headless systems. The team also updated references from minimax-m2.5 to m2.7 across the codebase.

2026-03-19T10:04:52Z

Duration: PT4M9S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

Show: Ollama
Published: 2026-03-19T10:04:52Z
Audio duration: PT4M9S

Transcript excerpt

This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.

Hey there, code adventurers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have some exciting developments to dive into today. Grab your favorite beverage because we're talking about some serious performance magic that just landed in the codebase.

Let's jump right into the star of the show - Jesse Gross just merged what I can only describe as a caching masterpiece. We're talking about PR 14887, which enables KV cache sharing across conversations with common prefixes. Now, if you're thinking "what does that actually mean for me?" - here's the beautiful part.…

What Jesse built is essentially a smart memory system using something called a prefix trie. Think of it like a family tree for your conversations. When conversations share the same beginning - like that system prompt we mentioned - the system now says "hey, I already computed this part, let me just reuse it and only…
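To make the prefix trie idea concrete, here is a minimal sketch in Python. It is not the code from the PR: the class names, method names, and token ids are all made up for illustration, and a real KV cache would attach cached attention state to each node rather than just recording that a token was seen. The sketch only shows how a trie finds the part of a new conversation that an earlier one already covered.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps the next token id to the child node that continues the sequence.
    children: dict[int, "TrieNode"] = field(default_factory=dict)


class PrefixCache:
    """Toy prefix trie over token ids (hypothetical, not Ollama's actual code)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def longest_prefix(self, tokens: list[int]) -> int:
        """Return how many leading tokens are already stored in the trie."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: list[int]) -> int:
        """Store a sequence and return how many tokens were newly added."""
        node, added = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = TrieNode()
                added += 1
            node = node.children[tok]
        return added


cache = PrefixCache()
system_prompt = [101, 7, 7, 42]   # made-up token ids for a shared system prompt
first = system_prompt + [5, 6]    # first conversation
second = system_prompt + [9]      # second conversation, same beginning

cache.insert(first)
reused = cache.longest_prefix(second)
print(reused, len(second) - reused)   # prints: 4 1
```

In this toy run the second conversation shares its first four tokens with the first one, so only the one remaining token would need fresh computation.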

This isn't a small change either - we're looking at over 2,700 lines of additions across 12 files, including a whole new trie data structure and comprehensive test coverage. Jesse didn't just write the code; they built it right, with 859 lines of tests in the cache test file…

But…

Nearby episodes from Ollama

Precision Revolution - New Float Formats and Testing Powerhouse 2026-03-25T10:04:04Z
MLX Performance Breakthrough and Smarter Caching 2026-03-24T10:04:16Z
Nvidia Partnership Takes Center Stage 2026-03-21T10:02:43Z
Bug Squashing Bonanza 2026-03-20T10:03:36Z
Bug Squashing and Launch Improvements 2026-03-16T00:00:00Z
Launch Command Gets a Major Polish 2026-03-14T10:11:48Z
Spring Cleaning and Performance Gains 2026-03-13T10:04:50Z
Thinking Streams and Local Tool Power-ups 2026-03-12T10:06:42Z