Ollama: Cloud Integration Drama and AI Model Expansion
The Ollama team had quite the eventful day: a major cloud integration feature was merged, reverted, reapplied, and reverted again, which shows how careful they are about releases. Meanwhile, they expanded AI model support with Qwen 3.5 Next MoE integration and fixed a critical image processing bug in the GLM-OCR model.
Duration: 4:12
Transcript
Hey there, amazing developers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have a story for you today from March 4th, 2026. Grab your coffee because this one's got more plot twists than a Netflix series!
So here's what happened: the team merged seven pull requests and landed eight additional commits, but the real drama centered on one particular feature, and it tells us a lot about how thoughtful development works in practice.
Let me paint you the picture. Devon Rifkin submitted this massive pull request - we're talking 2,843 lines of code changes across 23 files - to eliminate the need for pulling stub models when working with cloud integrations. Pretty cool feature, right? The idea was that instead of having to first download a placeholder model that just points to a cloud service, you could directly use cloud models through the API. Much cleaner user experience.
But here's where it gets interesting. Jeffrey Morgan, one of the maintainers, merged it, then reverted it, then reapplied it, then reverted it again! Now, before you write this off as chaos, it's actually beautiful engineering discipline in action. They needed to get a critical bug fix out the door, specifically Victor's fix for the GLM-OCR image processing issue, and they didn't want to risk any complications from the large cloud integration changes.
Speaking of Victor's fix, it solved a real problem that users had been experiencing since version 0.17.1. The GLM-OCR model was returning empty markdown when processing images because it wasn't getting the proper image placeholder tags. Victor implemented a clean solution by injecting indexed image tags into the user content, and boom, OCR is working properly again. Sometimes the most important fixes are the ones that seem simple but restore functionality that people depend on every day.
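To make "indexed image tags" concrete, here is a minimal Python sketch. It assumes a hypothetical `[img-N]` placeholder syntax and a hypothetical `Message` type; the episode does not say which tag format Victor's fix emits or where in Ollama's code it lives. What it illustrates is the idea: every attached image gets a placeholder in the user text, and the index keeps counting across the whole conversation so each tag lines up with exactly one image.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical chat message: role, text content, and raw image payloads.
    role: str
    content: str
    images: list[bytes] = field(default_factory=list)


def inject_image_tags(messages: list[Message]) -> list[Message]:
    """Prefix each user message with one indexed placeholder per attached image.

    The "[img-N]" format is an illustrative assumption, not Ollama's confirmed syntax.
    Indices are global across the conversation, so the Nth image anywhere maps to [img-N].
    """
    next_index = 0
    out: list[Message] = []
    for msg in messages:
        if msg.role == "user" and msg.images:
            tags = "".join(f"[img-{next_index + i}]" for i in range(len(msg.images)))
            next_index += len(msg.images)
            # Build a new message so the caller's original list is left untouched.
            out.append(Message(msg.role, tags + msg.content, msg.images))
        else:
            out.append(msg)
    return out
```

Returning new messages instead of mutating the input keeps the original conversation intact, which makes it safe to re-render the same history more than once.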
Now, the star of the technical show today has to be Patrick Devine's work on Qwen 3.5 Next MoE support. This is a beast of an implementation - over 2,400 lines added across 14 files. Patrick didn't just add basic model support; he built out recurrent cache support, introduced a hybrid cache type for mixed cache storage, added a Gated Delta Metal kernel for fast inference, and implemented new MLX operations like Conv1d and DepthwiseConv1d. This is the kind of foundational work that makes advanced AI models actually usable in production.
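For listeners curious what a "gated delta" recurrence computes, here is a pure-Python sketch of one decoding step of the gated delta rule as published for Gated DeltaNet-style linear attention: S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with output o_t = S_t q_t. It is an assumption that Patrick's Metal kernel implements exactly this form; normalization, the short convolution, and chunked parallel computation are all omitted. The matrix S is a fixed-size per-sequence state, which is presumably what a recurrent cache stores instead of a growing KV cache, and why a hybrid cache is useful when some layers use attention and others use this recurrence.

```python
from __future__ import annotations


def gated_delta_step(
    S: list[list[float]],
    q: list[float],
    k: list[float],
    v: list[float],
    alpha: float,
    beta: float,
) -> tuple[list[list[float]], list[float]]:
    """One recurrent step of the gated delta rule (illustrative, not optimized).

    S     : state matrix of shape (d_v, d_k), carried from token to token
    q, k  : query and key vectors of length d_k
    v     : value vector of length d_v
    alpha : decay gate in [0, 1] that shrinks the previous state
    beta  : write strength in [0, 1] for the new key/value pair
    """
    d_v, d_k = len(S), len(k)
    # S k: the value the current state associates with key k.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # S (I - beta k k^T) == S - beta (S k) k^T; decay by alpha, then write beta v k^T.
    S_new = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]
    # Read out the updated state with the query.
    o = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, o
```

With alpha = 1, beta = 1 and a unit-length key, writing a new value for the same key replaces the old one instead of adding to it; that correction step is the "delta" in the name, while alpha is the gate that lets old memory fade.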
What I love about Patrick's approach is how comprehensive it is. He didn't just make the models load - he thought about performance, memory management, and the entire pipeline integration. The fact that he included proper quantization for stacked expert tensors shows he's thinking about real-world deployment scenarios where memory efficiency matters.
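As a rough illustration of why stacked expert tensors need care, the sketch below quantizes a tensor laid out as [num_experts][rows][cols] to 8-bit integers, with a separate symmetric scale for each row of each expert. This is a generic per-channel scheme chosen for illustration; the episode does not describe the quantization format Patrick's change actually uses, and the function names here are hypothetical.

```python
from __future__ import annotations


def quantize_stacked_experts(
    weights: list[list[list[float]]], bits: int = 8
) -> tuple[list[list[list[int]]], list[list[float]]]:
    """Symmetric per-row quantization applied independently inside each expert.

    weights: nested lists shaped [num_experts][rows][cols].
    Returns integer codes with the same shape plus one scale per (expert, row).
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    codes: list[list[list[int]]] = []
    scales: list[list[float]] = []
    for expert in weights:
        expert_codes, expert_scales = [], []
        for row in expert:
            max_abs = max(abs(x) for x in row)
            # An all-zero row gets scale 1.0 so we never divide by zero.
            scale = max_abs / qmax if max_abs > 0 else 1.0
            expert_codes.append([max(-qmax, min(qmax, round(x / scale))) for x in row])
            expert_scales.append(scale)
        codes.append(expert_codes)
        scales.append(expert_scales)
    return codes, scales


def dequantize_stacked_experts(
    codes: list[list[list[int]]], scales: list[list[float]]
) -> list[list[list[float]]]:
    # Multiply each integer code back by the scale of its own expert row.
    return [
        [[c * s for c in row] for row, s in zip(expert_codes, expert_scales)]
        for expert_codes, expert_scales in zip(codes, scales)
    ]
```

Because scales never span two experts, one expert with large weights cannot wash out the precision of an expert whose weights are small; a single scale for the whole stacked tensor would round most of the small expert's weights to zero.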
The team also did some important housekeeping, removing old image generation LLM models that have been superseded by the MLX runner implementation. Jeffrey added proper renderer and parser support for Qwen 3.5 models too, making sure the user experience is smooth end-to-end.
You know what this whole episode teaches us? Sometimes the most professional thing you can do is hit the brakes. The cloud integration feature will likely be back; the reverts were about getting a bug-fix release out safely. But prioritizing a user-facing bug fix and being willing to temporarily step back from a big feature? That's the kind of judgment that separates good teams from great ones.
Today's focus for anyone following along: if you're working with GLM-OCR models, definitely update to get Victor's fix. If you're interested in the Qwen 3.5 models, now's a great time to experiment with Patrick's new MoE support. And for all of us building software, remember that sometimes the bravest thing you can do is revert a change to keep your users happy while you perfect the experience.
That's a wrap on today's Ollama adventure! Keep coding, keep learning, and remember - even the reversals are progress. Catch you tomorrow!