Ollama: MLX Performance Breakthrough and Anthropic Search
The Ollama team delivered some impressive performance wins today with a major MLX runner overhaul that boosts GLM 4.7 Flash performance by 150%, plus enabling web search for Anthropic APIs. Patrick Devine led the MLX improvements while Parth Sareen added the Anthropic web search feature and fixed a PowerShell search bug.
Duration: PT3M50S
Transcript
Hey there, developer friends! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have some exciting updates to dive into today - February 14th, 2026. Happy Valentine's Day, by the way! And speaking of love, you're going to love the performance improvements the team has been cooking up.
Let's jump right into the big story of the day - we've got three fantastic pull requests that just landed, and they're all about making your experience smoother and faster.
First up, Patrick Devine has been working some serious magic with the MLX runner, and the results are absolutely stunning. This is one of those PRs that makes you sit up and pay attention - we're talking about a 150 percent performance improvement for the GLM 4.7 Flash model. That's not a typo, folks - one hundred and fifty percent faster!
What Patrick did was dive deep into how the MLX runner handles the Safetensors-based GLM 4.7 Flash model. The key breakthrough was fixing how scalar data types were being handled. You know how sometimes the smallest details can have the biggest impact? This is a perfect example. By getting those data type operations just right, the whole evaluation process became dramatically more efficient.
But that's not all Patrick tackled in this monster pull request - and I do mean monster, we're looking at over 750 lines added across 19 files. He also restored compatibility with some older experimental models that had stopped working, including the flux2-klein and z-image-turbo models. Plus, there's now a hidden flag that gives you more control over which runner handles your GLM 4.7 Flash model. It's like having a secret performance switch in your toolkit.
Now, while Patrick was supercharging the MLX runner, Parth Sareen was busy expanding Ollama's capabilities in a completely different direction. Parth implemented web search functionality for the Anthropic APIs, which is huge for anyone building applications that need real-time information. This opens up so many possibilities for your projects - imagine your AI models being able to pull in fresh data from the web during conversations. The implementation is solid too, with comprehensive test coverage that shows Parth really thought through all the edge cases.
And because no good deed goes unpunished in software development, Parth also had to tackle a PowerShell-specific bug in the TUI selector. You know how it is - sometimes the smallest platform differences can cause the most annoying issues. In this case, PowerShell was receiving runes instead of selections, which was breaking the search functionality. It's one of those fixes that might seem minor, but it makes a world of difference for Windows developers using PowerShell as their primary terminal.
What I love about today's updates is how they showcase the different dimensions of improvement happening in Ollama. We've got raw performance gains, new feature capabilities, and platform compatibility fixes - all landing on the same day. It's like watching a well-orchestrated symphony where different instruments come together to create something beautiful.
The MLX performance improvements are particularly exciting because they show the team is really digging into the fundamentals and finding those hidden optimization opportunities. A 150 percent speed boost doesn't just happen - it comes from understanding your code at a deep level and being willing to get your hands dirty with the details.
So here's today's focus for all you Ollama enthusiasts out there: if you're working with GLM 4.7 Flash models, definitely pull down this latest update and feel that performance difference for yourself. And if you've been wanting to build applications that combine AI with real-time web data, now's the time to explore those Anthropic web search capabilities.
Keep building amazing things, keep pushing the boundaries, and remember - every performance improvement, every new feature, and every bug fix is someone's day getting a little bit better. Until next time, happy coding!