Ollama: Weekly Recap - Infrastructure Modernization
Ollama completed a major architectural shift this week, removing CGO engines and standardizing on llama-server for all GGUF models. The team also addressed compatibility issues for newer model formats including Gemma 4.
Duration: PT2M19S
Transcript
Good morning. This is your Ollama weekly recap for May 25th through June 1st, 2026.
Four PRs were merged this week, along with four additional commits.
This week marked a significant infrastructure milestone for Ollama: the completion of a major architectural modernization that should let the project adopt new capabilities from upstream llama.cpp more quickly.
The headline change came through PR 16031, which removed the entire CGO-based inference engine in favor of using llama-server exclusively for GGUF-based models. This represents months of engineering work to eliminate the vendored GGML and llama.cpp backends, the CGO runner, and the Go-based model implementations. The new architecture keeps safetensors-based models running on the MLX engine while standardizing everything else on llama-server built from upstream llama.cpp.
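The split described above, with GGUF models going to llama-server and safetensors models going to MLX, can be illustrated with a minimal sketch. It is not taken from the Ollama codebase: the function name `pick_engine` and the returned engine labels are hypothetical, and the sketch assumes the engine is chosen by sniffing the file's leading bytes. The only facts it relies on are that GGUF files start with the four-byte magic `GGUF` and that safetensors files start with an 8-byte little-endian header length followed by a JSON object.

```python
import struct

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes


def pick_engine(header: bytes) -> str:
    """Return a hypothetical engine label for a model file's leading bytes."""
    if header[:4] == GGUF_MAGIC:
        return "llama-server"
    # safetensors: 8-byte little-endian length, then a JSON header object
    if len(header) >= 9:
        (json_len,) = struct.unpack("<Q", header[:8])
        if json_len > 0 and header[8:9] == b"{":
            return "mlx"
    raise ValueError("unrecognized model format")
```

In practice a caller would read the first few bytes of the model file and pass them in; how Ollama actually makes this decision is not stated in the recap.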
For developers, this change means faster access to new llama.cpp features and fixes. On Windows, however, it requires a recent AMD driver version that supports ROCm 7. The architectural shift also brought significant build system improvements: a revised CMake configuration that improves the developer experience, and enhanced GPU discovery.
Model compatibility received focused attention this week. PR 16367 added proper handling of beginning-of-sequence token overrides for Gemma 4 and LFM2 models in llama-server. Meanwhile, PR 16362 delivered improvements to the laguna parser and renderer system. These changes ensure newer model formats work correctly within the updated infrastructure.
The week also included essential maintenance work. PR 16355 fixed a development mode search path issue introduced during the llama-server transition, where MLX library resolution wasn't updated to match the new superbuild structure.
Looking ahead, developers can expect more rapid integration of upstream llama.cpp improvements thanks to the simplified architecture. The infrastructure changes also position Ollama to better handle the growing variety of model formats being released by the community.
That's your Ollama recap for this week.