Ollama

New Model Support and Memory Management Wins

Today brought some exciting developments with GLM-4.7-Flash model support landing alongside crucial memory management fixes for image generation models. Jeffrey Morgan led the charge with major architecture additions, while the team also tackled integration improvements and API consistency issues that'll make life easier for downstream developers.

Duration: 4 min 7 s

https://podlog.io/listen/ollama-3aed006f/episode/new-model-support-and-memory-management-wins-8fe62399

Transcript

Hey there, fellow developers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have some fascinating changes to dive into today from January 20th. Grab your favorite beverage because we're about to explore some really cool model architecture work and some clever performance optimizations.

Let's start with the big story of the day - Jeffrey Morgan just merged support for a brand new model architecture called GLM-4.7-Flash. This is huge! We're talking about over 2,500 lines of code changes across 17 files to add the `Glm4MoeLiteForCausalLM` architecture. Now, if you're wondering what makes this interesting, Jeffrey noted something really smart in his description - this implementation shares a ton of overlap with DeepSeek V3, which opens up some exciting opportunities for code unification down the road. I love seeing these patterns emerge because it means the team is building a solid foundation that can support multiple model families efficiently.

The implementation includes new converters, model definitions, and parsers - it's like watching a well-orchestrated symphony of code organization. And here's a fun detail: Jeffrey mentioned they should probably rename the renderer and parser to "glm47" to keep things consistent. These kinds of naming considerations might seem small, but they're the difference between a codebase that's easy to navigate and one that becomes a maze over time.

Now, speaking of smart optimizations, Jeffrey also tackled a subtle performance issue around image generation models. Picture this scenario - you're trying to delete a model with `ollama rm`, but the system accidentally loads the entire model into memory just to immediately unload it. That's like opening your front door to tell someone to go away! The fix was elegantly simple: move the unload check before the image generation dispatch, so an unload request returns before any loading can happen. It's one of those changes that makes you think "of course!" once you see it.
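For readers following along in text, here is a minimal Go sketch of that ordering idea. It is not Ollama's actual code: the `server` type, the `handle` function, and its parameters are hypothetical names made up for illustration. The only point it shows is that checking for an unload request first means the expensive load path is never reached.

```go
package main

import "fmt"

// server is a hypothetical stand-in that records each step it performs.
type server struct {
	steps []string
}

// loadImageModel simulates the expensive load done by image generation dispatch.
func (s *server) loadImageModel(name string) {
	s.steps = append(s.steps, "load "+name)
}

// unload simulates releasing a model from memory.
func (s *server) unload(name string) {
	s.steps = append(s.steps, "unload "+name)
}

// handle sketches the fixed ordering: an unload-only request returns
// before the image generation dispatch is ever reached.
func (s *server) handle(name string, isImageModel, unloadOnly bool) {
	if unloadOnly {
		s.unload(name)
		return
	}
	if isImageModel {
		s.loadImageModel(name)
		s.steps = append(s.steps, "generate with "+name)
	}
}

func main() {
	s := &server{}
	// An unload-only request for an image model should never trigger a load.
	s.handle("example-image-model", true, true)
	fmt.Println(s.steps)
}
```

If the `unloadOnly` check sat below the image generation branch instead, the same request would record a load step first, which is the wasted work described above.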

But wait, there's more to this story. The fix also caught a sneaky bug where deleting multiple models would only unload the first one. The unload step was using `args[0]` instead of each `arg` as it iterated through the list. These are the kinds of bugs that can hide in plain sight until someone takes a careful look at the deletion flow.
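To make that `args[0]` mistake concrete, here is a small Go sketch of the pattern. The function names `unloadBuggy` and `unloadFixed` and the callback parameter are hypothetical, and the real code in Ollama may be shaped differently. This only illustrates how a loop can run once per argument and still touch just the first one.

```go
package main

import "fmt"

// unloadBuggy mirrors the described mistake: the loop runs once per
// argument but always passes the first argument to unload.
func unloadBuggy(args []string, unload func(string)) {
	for range args {
		unload(args[0])
	}
}

// unloadFixed passes each argument to unload in turn.
func unloadFixed(args []string, unload func(string)) {
	for _, arg := range args {
		unload(arg)
	}
}

func main() {
	args := []string{"model-a", "model-b", "model-c"}

	var buggy, fixed []string
	unloadBuggy(args, func(name string) { buggy = append(buggy, name) })
	unloadFixed(args, func(name string) { fixed = append(fixed, name) })

	fmt.Println("buggy:", buggy)
	fmt.Println("fixed:", fixed)
}
```

With a single model on the command line the two versions behave identically, which is probably why this kind of bug can go unnoticed for a while.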

Daniel Hiltgen contributed some solid testing infrastructure with new image generation test cases. Testing might not be the most glamorous part of development, but it's what gives us confidence that these complex model interactions actually work as expected. Plus, he fixed a regression in the tools test along the way - I love those two-for-one improvements!

And here's a change that really shows the team thinking about the developer experience beyond just Ollama itself. Devon Rifkin made a small but important fix to the `/api/show` endpoint. The issue was that missing `model_info` fields were causing problems for integrators, including an Android Studio integration. Instead of leaving the field undefined, the API now defaults to an empty `model_info` object. It's a perfect example of how a tiny change - literally just a few lines - can remove friction for everyone building on top of your platform.
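Here is a rough Go sketch of why an empty default helps clients. The `showResponse` struct and `encodeWithDefault` function are hypothetical and much smaller than the real `/api/show` response. The sketch assumes the field is a map: in Go's `encoding/json`, a nil map encodes as `null`, while an initialized empty map encodes as `{}`, which strict client-side parsers tend to handle more gracefully.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// showResponse is a hypothetical, trimmed-down response shape.
type showResponse struct {
	ModelInfo map[string]any `json:"model_info"`
}

// encodeRaw marshals the response as-is; a nil map becomes JSON null.
func encodeRaw(r showResponse) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

// encodeWithDefault substitutes an empty map before marshalling, so
// clients always receive a JSON object for model_info.
func encodeWithDefault(r showResponse) (string, error) {
	if r.ModelInfo == nil {
		r.ModelInfo = map[string]any{}
	}
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	raw, err := encodeRaw(showResponse{})
	if err != nil {
		panic(err)
	}
	fixed, err := encodeWithDefault(showResponse{})
	if err != nil {
		panic(err)
	}
	fmt.Println("without default:", raw)
	fmt.Println("with default:   ", fixed)
}
```

A client that expects an object can then read `model_info` without a separate null or missing-field branch.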

What I find inspiring about today's changes is how they span the entire stack. We've got new model architecture support at the core level, memory management optimizations in the server logic, better testing coverage, and API improvements for external integrators. Each change makes the system more capable, more efficient, and more developer-friendly.

Today's Focus: If you're working with Ollama, now's a great time to test out that GLM-4.7-Flash support if it fits your use case. And if you've been building integrations, that `/api/show` consistency improvement should make your error handling a bit cleaner.

That's a wrap for today's episode! The Ollama project continues to evolve with thoughtful improvements at every level. Keep building, keep experimenting, and I'll catch you tomorrow with more updates from the world of local AI development. Until then, happy coding!