Ollama: Performance Lessons and Gemma4 Refinements

The Ollama team worked on Gemma4 performance issues, including a change that enabled flash attention and was then reverted after benchmarks showed a 60% performance regression in prefill operations. Other improvements included reworking tool call handling with cleaner code and fixing ROCm build issues for better GPU compatibility.

2026-04-04T10:00:33Z

Duration: PT3M53S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

Show: Ollama

Transcript excerpt

This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.

Hey there, developers! Welcome back to another episode of the Ollama podcast. I'm so excited to catch up with you today because we've got a really interesting story from yesterday's development work: one of those classic tales that reminds us why thorough performance testing is absolutely crucial in our field.

Let's dive right into the main event, because this is honestly fascinating. The team has been working hard on Gemma4 improvements, and there's this perfect example of how software development really works in practice. Daniel Hiltgen submitted a pull request to enable flash attention for Gemma4 - which sounds great,…

But here's where it gets interesting. Sometimes in software development, what looks like a good idea on paper doesn't work out in practice. The team ran their performance benchmarks and discovered that enabling flash attention actually caused a massive 60% performance regression for Gemma4 prefill operations. That's…

Speaking of Gemma4 improvements, Devon Rifkin made some really excellent changes to the tool call handling. This is one of those refactoring wins that I absolutely love to see. Devon replaced a custom argument normalizer with what they call a…

Jesse…

And…

Nearby episodes from Ollama

Gemma4 Parser Improvements 2026-04-11T00:00:00Z
Model Updates and Tool Call Fixes 2026-04-10T00:00:00Z
Error Handling and Modelfile Fixes 2026-04-09T00:00:00Z
Weekly Recap - Gemma4 Integration & Audio Support 2026-04-06T00:00:00Z
Gemma4 Arrives with Audio Magic 2026-04-03T10:00:29Z
Modernizing Codex Configuration 2026-04-02T10:00:33Z
Tokenizer Love and Better Model Support 2026-04-01T10:00:33Z
Legacy Compatibility and Developer Experience Wins 2026-03-30T10:00:58Z