Ollama: Tool Calling Gets Smarter
Four significant pull requests merged today, focusing on tool calling improvements and system reliability. Jeffrey Morgan and Parth Sareen led major enhancements to Qwen3 and GLM parsers for better tool calling behavior, while Eva H fixed a critical Windows startup crash and Patrick Devine added peak memory usage visibility for MLX models.
Duration: 3 min 47 s
https://podlog.io/listen/ollama-3aed006f/episode/ollama-tool-calling-gets-smarter-055b8d95
Transcript
Hey there, wonderful developers! Welcome back to another episode of the Ollama podcast. I'm so excited to share what's been happening in the codebase because today feels like one of those days where everything just clicks into place.
So grab your favorite beverage and let's dive into what the team accomplished yesterday. We had four fantastic pull requests merge, and honestly, there's a beautiful theme running through all of this work around making Ollama smarter and more reliable.
Let's start with the star of the show - Jeffrey Morgan's work on fixing Qwen3 tool calling in thinking mode. Now, if you've ever worked with language models that can use tools, you know how tricky the parsing can get, especially when the model is in that "thinking" state where it's reasoning through a problem. Jeffrey tackled a really nuanced issue here - aligning Ollama's Qwen parser behavior with what Hugging Face's transformers serve does. The key insight was allowing tool call parsing even while the model is still in thinking mode. It's like teaching the system to walk and chew gum at the same time, if you will. The technical implementation involved detecting those tool call tags before the thinking state ends, and handling cases where streaming might split tags across chunks. Plus, Jeffrey added comprehensive tests to make sure this all works reliably.
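To make "parse tool calls while still thinking" concrete, here is a minimal Go sketch. It is not Ollama's parser: the Parser type, NewParser, and Add are hypothetical names, and the sketch assumes Qwen3-style <think>/</think> and <tool_call>/</tool_call> tags. The two ideas it shows are checking for a tool-call opener before the thinking block closes, and holding back any trailing text that could be the first half of a tag split across streamed chunks.

```go
package main

import (
	"fmt"
	"strings"
)

// Tag strings assume Qwen3-style chat-template conventions.
const (
	thinkClose = "</think>"
	toolOpen   = "<tool_call>"
	toolClose  = "</tool_call>"
)

// Parser is a hypothetical, simplified streaming parser (not Ollama's code).
type Parser struct {
	pending   string // unclassified text; may end with a partial tag
	thinking  bool   // true until </think> is seen
	inTool    bool   // true between <tool_call> and </tool_call>
	Thinking  strings.Builder
	Content   strings.Builder
	ToolCalls []string // raw JSON payloads
}

func NewParser(startThinking bool) *Parser { return &Parser{thinking: startThinking} }

// emit routes text to the thinking or content stream based on current state.
func (p *Parser) emit(text string) {
	if p.thinking {
		p.Thinking.WriteString(text)
	} else {
		p.Content.WriteString(text)
	}
}

// partialSuffix returns the length of the longest suffix of s that is a
// proper prefix of any tag, i.e. a tag that may continue in the next chunk.
func partialSuffix(s string, tags ...string) int {
	keep := 0
	for _, tag := range tags {
		for n := min(len(s), len(tag)-1); n > keep; n-- {
			if strings.HasSuffix(s, tag[:n]) {
				keep = n
				break
			}
		}
	}
	return keep
}

// Add consumes one streamed chunk.
func (p *Parser) Add(chunk string) {
	p.pending += chunk
	for {
		if p.inTool {
			end := strings.Index(p.pending, toolClose)
			if end < 0 {
				return // wait for the rest of the tool call
			}
			p.ToolCalls = append(p.ToolCalls, strings.TrimSpace(p.pending[:end]))
			p.pending = p.pending[end+len(toolClose):]
			p.inTool = false
			continue
		}
		toolAt, thinkAt := strings.Index(p.pending, toolOpen), -1
		if p.thinking {
			thinkAt = strings.Index(p.pending, thinkClose)
		}
		switch {
		case toolAt >= 0 && (thinkAt < 0 || toolAt < thinkAt):
			// A tool call is accepted even while still in thinking mode.
			p.emit(p.pending[:toolAt])
			p.pending = p.pending[toolAt+len(toolOpen):]
			p.inTool = true
		case thinkAt >= 0:
			p.emit(p.pending[:thinkAt])
			p.pending = p.pending[thinkAt+len(thinkClose):]
			p.thinking = false
		default:
			// No complete tag: flush everything except a possible partial tag.
			tags := []string{toolOpen}
			if p.thinking {
				tags = append(tags, thinkClose)
			}
			cut := len(p.pending) - partialSuffix(p.pending, tags...)
			p.emit(p.pending[:cut])
			p.pending = p.pending[cut:]
			return
		}
	}
}

func main() {
	p := NewParser(true)
	for _, c := range []string{"Let me check the wea", "ther.<tool_", `call>{"name":"get_weather"}</tool`, "_call>", "</think>It is sunny."} {
		p.Add(c)
	}
	fmt.Printf("thinking=%q tools=%q content=%q\n", p.Thinking.String(), p.ToolCalls, p.Content.String())
}
```

In this sketch the parser stays in thinking mode after a tool call closes; whether a real model resumes thinking or answering at that point is an assumption the sketch does not settle.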
Building on that theme, Parth Sareen jumped in with some excellent work on stable tool call indexing for both GLM47 and Qwen3 parsers. This might sound like a small detail, but consistent indexing is absolutely crucial when you're dealing with multiple tool calls. Think about it - if your model wants to use three different tools in sequence, you need rock-solid indexing to keep track of which call is which. Parth's implementation spans multiple parser files and includes robust test coverage. It's the kind of foundational work that makes everything else possible.
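One way to picture "stable" indexing is a counter that lives for the whole response instead of being recomputed per streamed chunk, so the third call is always index 2 no matter which chunk completed it. The Go sketch below assumes that framing; ToolCall, Indexer, and Assign are hypothetical names, and the JSON shape {"name": ..., "arguments": ...} is an assumption, not a statement about Ollama's parser internals.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is a hypothetical parsed call; Index is its position in the response.
type ToolCall struct {
	Index     int
	Name      string
	Arguments map[string]any
}

// Indexer keeps its counter for the whole response, so indices do not
// restart at zero when a later streaming chunk completes another call.
type Indexer struct{ next int }

// Assign decodes one raw tool-call payload and gives it the next index.
// The counter only advances on success (a design choice of this sketch).
func (ix *Indexer) Assign(raw string) (ToolCall, error) {
	var payload struct {
		Name      string         `json:"name"`
		Arguments map[string]any `json:"arguments"`
	}
	if err := json.Unmarshal([]byte(raw), &payload); err != nil {
		return ToolCall{}, err
	}
	tc := ToolCall{Index: ix.next, Name: payload.Name, Arguments: payload.Arguments}
	ix.next++
	return tc, nil
}

func main() {
	ix := &Indexer{}
	// Each payload stands in for a tool call completed in a separate chunk.
	for _, raw := range []string{
		`{"name":"get_weather","arguments":{"city":"Paris"}}`,
		`{"name":"get_time","arguments":{"zone":"UTC"}}`,
		`{"name":"get_weather","arguments":{"city":"Tokyo"}}`,
	} {
		tc, err := ix.Assign(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(tc.Index, tc.Name, tc.Arguments)
	}
}
```

Keeping the counter on the parser state, rather than deriving it from the current chunk, is what keeps two calls to the same tool distinguishable downstream.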
Now, switching gears to reliability, Eva H came through with a fix that Windows users are going to really appreciate. There was this nasty crash happening on startup when there was a pending update - you know, that dreaded nil pointer dereference that would just kill the app before it even got going. Eva's solution is elegant in its simplicity: add a nil guard to prevent crashes when the tray isn't initialized yet, and then re-check for updates once everything is properly set up. It's a perfect example of defensive programming that makes the user experience so much smoother.
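The nil-guard-plus-re-check pattern described here can be sketched in a few lines of Go. This is an illustration of the pattern only, not the Ollama app's code: Tray, App, OnUpdateAvailable, and InitTray are hypothetical names. An update notification that arrives before the tray exists is remembered instead of dereferencing a nil pointer, and is surfaced once the tray is initialized.

```go
package main

import (
	"fmt"
	"sync"
)

// Tray is a hypothetical stand-in for the system tray UI.
type Tray struct{ Notices []string }

func (t *Tray) ShowUpdateAvailable(version string) {
	t.Notices = append(t.Notices, "update available: "+version)
}

// App holds a tray that may still be nil early in startup.
type App struct {
	mu            sync.Mutex
	tray          *Tray
	pendingUpdate string
}

// OnUpdateAvailable may fire before the tray exists.
func (a *App) OnUpdateAvailable(version string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.pendingUpdate = version
	if a.tray == nil {
		return // nil guard: remember the update instead of dereferencing nil
	}
	a.tray.ShowUpdateAvailable(version)
}

// InitTray installs the tray, then re-checks for an update that arrived earlier.
func (a *App) InitTray(t *Tray) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.tray = t
	if a.pendingUpdate != "" {
		t.ShowUpdateAvailable(a.pendingUpdate)
	}
}

func main() {
	app := &App{}
	app.OnUpdateAvailable("v9.9.9") // arrives before the tray: no crash
	tray := &Tray{}
	app.InitTray(tray)
	fmt.Println(tray.Notices)
}
```

The mutex keeps the "check tray, store pending update" step atomic with respect to InitTray, so an update arriving during initialization is not lost.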
And finally, Patrick Devine added something I think we'll all find really useful - peak memory usage visibility for MLX-based models. If you're running models on Apple Silicon, this is going to give you much better insight into your memory usage patterns. It's not just about current usage anymore; you can see those memory spikes and plan accordingly. The implementation touches several files in the MLX runner system, adding this visibility throughout the pipeline.
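The difference between current and peak usage is easiest to see with a high-water-mark tracker. The episode does not describe how the MLX runner measures memory, so the Go sketch below is a generic illustration with hypothetical names (PeakTracker, Alloc, Free, ResetPeak), not the runner's API: every allocation raises the peak if the new total exceeds it, and the peak stays put when memory is freed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// PeakTracker records current usage and its high-water mark.
// It is a generic sketch, not the MLX runner's API.
type PeakTracker struct {
	current atomic.Int64
	peak    atomic.Int64
}

// Alloc adds n bytes and raises the peak if the new total exceeds it.
func (p *PeakTracker) Alloc(n int64) {
	cur := p.current.Add(n)
	for {
		old := p.peak.Load()
		// CAS loop: retry only if the peak changed since it was read.
		if cur <= old || p.peak.CompareAndSwap(old, cur) {
			return
		}
	}
}

func (p *PeakTracker) Free(n int64)   { p.current.Add(-n) }
func (p *PeakTracker) Current() int64 { return p.current.Load() }
func (p *PeakTracker) Peak() int64    { return p.peak.Load() }

// ResetPeak starts a new measurement window, e.g. per request.
func (p *PeakTracker) ResetPeak() { p.peak.Store(p.current.Load()) }

func main() {
	var pt PeakTracker
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pt.Alloc(1 << 20) // 1 MiB per worker
			pt.Free(1 << 20)
		}()
	}
	wg.Wait()
	fmt.Printf("current=%d peak=%d\n", pt.Current(), pt.Peak())
}
```

Current usage returns to zero after every worker frees its buffer, while the peak keeps the spike (anywhere from 1 MiB to 4 MiB here, depending on scheduling), which is exactly the information a current-usage number alone would hide.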
What I love about today's changes is how they all work together. You've got smarter tool calling, more reliable startup behavior, and better observability. It's like the team is building a more mature, production-ready system one thoughtful improvement at a time.
For today's focus, if you're working with tool-calling models, definitely test out these Qwen3 improvements. The thinking mode fixes could make a real difference in how your models handle complex reasoning tasks. And if you're on Apple Silicon, keep an eye on that new peak memory usage info - it might help you optimize your model loading strategies.
The test coverage across all these changes is really impressive too. It shows a team that cares about long-term maintainability, not just getting features out the door.
That's a wrap on today's episode! Keep building amazing things, and I'll catch you tomorrow with more updates from the Ollama universe. Until then, happy coding!