Memory Magic and Command Makeover

Today brought some serious memory optimization wizardry with MLA absorption for GLM models - though it took a few tries to get the CUDA builds just right! Plus, the team made the CLI more intuitive by renaming `ollama config` to `ollama launch`, and we got some nice fixes for image generation support.

2026-01-24T11:07:21Z

Duration: PT3M58S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

Show: Ollama

Transcript excerpt

This is an excerpt of the transcript. Listen to the episode or use the RSS feed for the full update.

Hey there, fellow developers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow - what a day it's been in the codebase! Grab your favorite beverage because we've got some really exciting stuff to dive into.

So the big story today is all about memory optimization, and let me tell you - it's been quite the journey! Jeffrey Morgan has been working on something called MLA absorption for GLM models, which is essentially a way to compress the KV cache and use way less memory. Think of it like organizing your closet - instead…
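To put rough numbers on "way less memory", the sketch below compares the per-token KV cache size of standard multi-head attention (MHA) with an MLA-style latent cache. Every dimension is a hypothetical value chosen for illustration, not GLM's actual configuration. The MLA side assumes that, as in the DeepSeek-V2 formulation of MLA, each layer caches one compressed latent vector plus a small decoupled rotary key per token instead of full per-head keys and values.

```python
# Toy KV-cache size comparison: standard multi-head attention (MHA) vs. an
# MLA-style latent cache. All dimensions below are HYPOTHETICAL, chosen only
# for illustration; they are not taken from any GLM checkpoint.
N_LAYERS = 32
N_HEADS = 32
HEAD_DIM = 128
KV_LORA_RANK = 512   # width of the compressed latent vector MLA caches
ROPE_DIM = 64        # decoupled rotary key part, also cached per token
BYTES_PER_ELEM = 2   # fp16/bf16 cache entries


def mha_cache_bytes_per_token() -> int:
    # MHA stores a full key and a full value vector for every head in every layer.
    return N_LAYERS * 2 * N_HEADS * HEAD_DIM * BYTES_PER_ELEM


def mla_cache_bytes_per_token() -> int:
    # MLA stores one shared latent vector plus the rotary key per layer;
    # per-head keys and values are derived from the latent at attention time.
    return N_LAYERS * (KV_LORA_RANK + ROPE_DIM) * BYTES_PER_ELEM


if __name__ == "__main__":
    ctx = 8192  # hypothetical context length in tokens
    mha = mha_cache_bytes_per_token() * ctx
    mla = mla_cache_bytes_per_token() * ctx
    print(f"MHA cache: {mha / 2**20:.0f} MiB, MLA cache: {mla / 2**20:.0f} MiB")
```

With these made-up dimensions the script prints `MHA cache: 4096 MiB, MLA cache: 288 MiB`, roughly a 14x reduction. Real savings depend entirely on the model's actual head count, latent rank, and cache precision.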

Now here's where it gets interesting - this feature had quite the adventure getting merged. It went through what I like to call the "third time's the charm" dance. First it got merged, then it had to be reverted because of some CUDA build issues, and then Jeffrey came back with a fix and got it merged again. It's…

The technical bits are pretty fascinating if you like getting into the weeds. They're splitting combined KV_B tensors into separate K_B and V_B tensors, enabling this Multi-head Latent Attention compression. The tricky part was getting all the CUDA configurations just right across different GPU architectures. There were…
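Why split KV_B into K_B and V_B? A common reason is the "absorption" trick: since q . (W_kb c) equals (W_kb^T q) . c, the key projection can be folded into the query, and since a weighted sum of W_vb c_t equals W_vb applied to the weighted sum of the c_t, the value projection can be applied once after attention. Attention then runs directly over the cached latent vectors. The pure-Python sketch below checks that equivalence numerically for a single head at toy sizes. It is not Ollama's implementation: `split_kv_b`, the stacked "K rows above V rows" layout of the combined matrix, and all dimensions are hypothetical, and the rotary key component is ignored.

```python
import math
import random


def matvec(m, v):
    # m is a list of rows; returns the matrix-vector product m @ v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def split_kv_b(kv_b, k_dim):
    # HYPOTHETICAL layout: the first k_dim rows of the combined matrix produce
    # keys (K_B) and the remaining rows produce values (V_B).
    return kv_b[:k_dim], kv_b[k_dim:]


def attend_naive(q, latents, w_kb, w_vb):
    # Decompress every cached latent into a full key and value, then attend.
    keys = [matvec(w_kb, c) for c in latents]
    vals = [matvec(w_vb, c) for c in latents]
    scale = 1.0 / math.sqrt(len(q))
    p = softmax([dot(q, k) * scale for k in keys])
    return [sum(pt * v[i] for pt, v in zip(p, vals)) for i in range(len(vals[0]))]


def attend_absorbed(q, latents, w_kb, w_vb):
    # Absorb K_B into the query: q . (W_kb c) == (W_kb^T q) . c
    q_lat = matvec(transpose(w_kb), q)
    scale = 1.0 / math.sqrt(len(q))
    p = softmax([dot(q_lat, c) * scale for c in latents])
    # Attend over the latents directly, then apply V_B once:
    # sum_t p_t (W_vb c_t) == W_vb (sum_t p_t c_t)
    mixed = [sum(pt * c[j] for pt, c in zip(p, latents)) for j in range(len(latents[0]))]
    return matvec(w_vb, mixed)


def demo(seed=0, rank=8, head_dim=4, tokens=5):
    # Toy sizes: latent rank, per-head key/value width, number of cached tokens.
    rng = random.Random(seed)
    kv_b = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(2 * head_dim)]
    w_kb, w_vb = split_kv_b(kv_b, head_dim)
    latents = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(tokens)]
    q = [rng.gauss(0, 1) for _ in range(head_dim)]
    return attend_naive(q, latents, w_kb, w_vb), attend_absorbed(q, latents, w_kb, w_vb)


if __name__ == "__main__":
    naive, absorbed = demo()
    # Largest elementwise difference between the two computation paths.
    print(max(abs(a - b) for a, b in zip(naive, absorbed)))
```

The absorbed path never materializes per-token keys or values, which is what lets the cache hold only the compact latent vectors.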

Moving on to user…

We…

Nearby episodes from Ollama

Smooth Onboarding for New Users (2026-02-02T11:01:39Z)
Polish and Perfectionism - The Art of Getting the Details Right (2026-01-30T11:01:45Z)
Cleaning Up the Config Game (2026-01-28T11:01:39Z)
Speed Boost and Model Magic (2026-01-25T11:04:03Z)
Making Ollama Play Nice with Everyone (2026-01-23T11:05:03Z)
The Great Cleanup - Manifests Get Their Own Home (2026-01-22T11:03:05Z)
New Model Architecture and Image Generation Fixes (2026-01-21T11:05:07Z)
New Model Support and Memory Management Wins (2026-01-20T11:03:12Z)