Ollama: Fixing the Inconsistencies That Matter
Today we're diving into 7 merged PRs and 9 commits that tackle some really important quality-of-life issues in Ollama. The team fixed false "out of date" model warnings that were bugging users, improved tool calling reliability for Anthropic models, and enhanced Qwen 3.5's streaming capabilities. Special shoutouts to Bruce MacDonald for solving those pesky manifest comparison issues and the whole team for making MLX integration smoother.
Duration: 4 minutes 15 seconds
Transcript
Hey there, fellow developers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have a treat for you today. It's March 28th, 2026, and the Ollama team has been absolutely crushing it with some really thoughtful fixes that are going to make your development experience so much smoother.
You know those annoying bugs that don't break everything but just make you go "ugh" every time you encounter them? Well, today's episode is all about squashing those irritating inconsistencies, and I'm genuinely excited to walk through what the team accomplished.
Let's jump right into the main story. Bruce MacDonald had a fantastic day yesterday, landing two PRs that solve a problem I bet many of you have encountered - those false "out of date" model warnings. You know the ones, right? You pull a model, and then Ollama immediately tells you it's outdated when you know it's fresh. Super frustrating!
Here's what was happening under the hood, and it's actually a really interesting technical story. When Ollama pulled model manifests from the registry, it would unmarshal the JSON into Go structs and then re-marshal them before writing to disk. But here's the kicker - Go's JSON encoder doesn't reproduce the formatting or field ordering of the original registry response, so the file on disk could differ byte-for-byte from what the registry served. That meant the SHA256 hashes wouldn't match, triggering false staleness warnings.
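To make that concrete, here's a minimal Go sketch of the round-trip problem. The `Manifest` type, the sample JSON, and the helper names are assumptions made up for illustration, not Ollama's actual types; the only point is that decoding and re-encoding changes the bytes, and therefore the digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Manifest is a hypothetical, trimmed-down stand-in for a registry manifest.
type Manifest struct {
	SchemaVersion int    `json:"schemaVersion"`
	MediaType     string `json:"mediaType"`
}

// digest returns the hex-encoded SHA-256 of b.
func digest(b []byte) string {
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

// roundTrip decodes raw into a Manifest and encodes it again,
// mirroring the unmarshal-then-re-marshal flow described above.
func roundTrip(raw []byte) ([]byte, error) {
	var m Manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	// json.Marshal emits struct fields in declaration order with no
	// extra whitespace, regardless of how the input was laid out.
	return json.Marshal(m)
}

func main() {
	// Sample "registry" bytes: different key order and spacing than Go emits.
	raw := []byte(`{ "mediaType": "example/manifest", "schemaVersion": 2 }`)

	out, err := roundTrip(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("same bytes: ", string(raw) == string(out))
	fmt.Println("same digest:", digest(raw) == digest(out))
}
```

Both comparisons come out false here, even though the two documents mean the same thing as JSON.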
Bruce's solution is elegant - instead of re-serializing, just preserve the raw bytes from the registry and write them directly to disk. Boom! Byte-for-byte identical files, and those annoying warnings disappear. He also moved the staleness comparison server-side, which gives much more reliable access to file modification times.
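The raw-bytes idea could look something like the sketch below. This is an illustration under assumptions, not the code from the PRs: `saveManifest`, `sameDigest`, `demo`, and the `Manifest` type are hypothetical names, and a temp directory stands in for the model store.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Manifest is a hypothetical type, used here only to validate the payload.
type Manifest struct {
	SchemaVersion int `json:"schemaVersion"`
}

// saveManifest checks that raw parses as a manifest, then writes the
// untouched registry bytes to path instead of a re-encoded copy.
func saveManifest(path string, raw []byte) error {
	var m Manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return fmt.Errorf("invalid manifest: %w", err)
	}
	return os.WriteFile(path, raw, 0o644)
}

// sameDigest reports whether the file at path hashes to the same
// SHA-256 as the bytes the registry currently serves.
func sameDigest(path string, remote []byte) (bool, error) {
	local, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	return sha256.Sum256(local) == sha256.Sum256(remote), nil
}

// demo saves a sample manifest into a temp directory and compares digests.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "manifest-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	// Odd spacing and an unknown field, both of which survive as-is.
	raw := []byte(`{ "schemaVersion": 2,  "extra": true }`)
	path := filepath.Join(dir, "manifest.json")
	if err := saveManifest(path, raw); err != nil {
		return false, err
	}
	return sameDigest(path, raw)
}

func main() {
	ok, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("local copy matches registry digest:", ok)
}
```

Because the struct is only used for validation, unknown fields and unusual whitespace in the registry response end up on disk unchanged.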
Speaking of reliability, Jesse Gross tackled a really subtle issue with Anthropic models and KV cache reuse. The problem was with tool call argument reordering - you know how Go maps don't guarantee key order? Well, that was degrading performance because reordered arguments change the prompt text, so the cache couldn't match requests it had already seen. Jesse switched to using typed structs instead of maps, and now your tool calls will be much more consistent and performant.
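Here's a rough Go sketch of why the container type matters for ordering. The `SearchArgs` type and both helper functions are hypothetical, and this is not the code from Jesse's PR. It only shows standard `encoding/json` behavior: map keys are emitted sorted, while struct fields are emitted in declaration order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchArgs is a hypothetical tool-call argument type.
// Its fields always encode in declaration order: query, then limit.
type SearchArgs struct {
	Query string `json:"query"`
	Limit int    `json:"limit"`
}

// viaMap round-trips arguments through a map. encoding/json sorts map
// keys when encoding, so the order the model produced is lost.
func viaMap(args string) (string, error) {
	var m map[string]any
	if err := json.Unmarshal([]byte(args), &m); err != nil {
		return "", err
	}
	b, err := json.Marshal(m)
	return string(b), err
}

// viaStruct round-trips arguments through a typed struct, which yields
// one fixed field order no matter how the input was ordered.
func viaStruct(args string) (string, error) {
	var a SearchArgs
	if err := json.Unmarshal([]byte(args), &a); err != nil {
		return "", err
	}
	b, err := json.Marshal(a)
	return string(b), err
}

func main() {
	original := `{"query":"ollama","limit":5}`

	m, err := viaMap(original)
	if err != nil {
		panic(err)
	}
	s, err := viaStruct(original)
	if err != nil {
		panic(err)
	}
	fmt.Println("map round-trip matches original:   ", m == original)
	fmt.Println("struct round-trip matches original:", s == original)
}
```

In this sketch the map path reorders the keys alphabetically, while the struct path reproduces the original text, which is what a prefix-matching cache needs to see.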
The Qwen 3.5 model got some love too, with Jeffrey Morgan adding regression tests for streaming tool-call parsing, and Alfredo Matas fixing an issue where the model would start tool call blocks without properly closing think blocks. These might sound like small details, but they make a huge difference in how smoothly your AI interactions flow.
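For a feel of the think-block problem, here's a simplified, non-streaming Go sketch. The tag strings and the `splitThinking` function are assumptions for illustration only; the real parser works on streamed chunks and isn't shown here. The idea is just that a tool call opening inside an unclosed think block gets treated as an implicit close.

```go
package main

import (
	"fmt"
	"strings"
)

// Tag strings are assumptions for this sketch.
const (
	thinkOpen  = "<think>"
	thinkClose = "</think>"
	toolOpen   = "<tool_call>"
)

// splitThinking separates reasoning text from the rest of a response.
// If a tool call opens before the think block is closed, the tool-call
// tag is treated as an implicit end of the think block.
func splitThinking(s string) (thinking, rest string) {
	if !strings.HasPrefix(s, thinkOpen) {
		return "", s
	}
	body := s[len(thinkOpen):]
	closeAt := strings.Index(body, thinkClose)
	toolAt := strings.Index(body, toolOpen)

	switch {
	case closeAt >= 0 && (toolAt < 0 || closeAt < toolAt):
		// Well-formed: the think block closes before any tool call.
		return strings.TrimSpace(body[:closeAt]), body[closeAt+len(thinkClose):]
	case toolAt >= 0:
		// The tool call starts while the think block is still open.
		return strings.TrimSpace(body[:toolAt]), body[toolAt:]
	default:
		// The think block never closes and no tool call follows.
		return strings.TrimSpace(body), ""
	}
}

func main() {
	inputs := []string{
		"<think>look up the weather</think><tool_call>{}</tool_call>",
		"<think>look up the weather<tool_call>{}</tool_call>",
	}
	for _, in := range inputs {
		thinking, rest := splitThinking(in)
		fmt.Printf("thinking=%q rest=%q\n", thinking, rest)
	}
}
```

Both inputs produce the same split in this sketch, so downstream tool-call parsing sees the same text whether or not the model remembered to close its think block.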
Patrick Devine improved MLX integration by fixing vision capabilities and minimum version handling, while Parth Sareen made sure MLX models skip unnecessary context length warnings - because when context length gets allocated automatically, why should you worry about it?
And let's not forget Devon Rifkin's fix for empty inputs in Anthropic content blocks. Sometimes it's the smallest details that make the biggest difference in maintaining API compatibility.
What I love about today's changes is that they're all about developer experience. None of these are flashy new features, but every single one removes friction from your workflow. The team added comprehensive tests too - I counted over 200 lines of new test coverage across multiple PRs. That's the kind of thoroughness that gives me confidence in a codebase.
Daniel Hiltgen also hardened the CUDA include path handling for Windows users, which shows the team is thinking about developers across all platforms. These infrastructure improvements might not be visible day-to-day, but they prevent those "works on my machine" moments we all dread.
For today's focus, if you've been experiencing any of those "out of date" model warnings, definitely update to get Bruce's fixes. And if you're working with tool calls on Anthropic models or Qwen 3.5, you should see noticeably more consistent behavior.
The testing improvements also mean this is a great time to contribute back to the project if you've been thinking about it. The team has set a really high bar for test coverage, which makes the codebase more welcoming for new contributors.
That's a wrap on today's episode! Seven merged PRs, comprehensive testing, and a bunch of those "finally!" moments we all love. Keep building amazing things, and I'll catch you in the next episode. Until then, happy coding!