Ollama

Ollama: Major Architecture Overhaul Removes CGO Dependencies

Ollama has completed a major refactoring that removes its CGO engines and switches exclusively to llama-server for GGML models. A follow-up fix repairs the MLX development-mode search paths. The changes span over 1,100 files and streamline the inference architecture.

Duration: 1 minute 55 seconds

https://podlog.io/listen/ollama-3aed006f/episode/ollama-major-architecture-overhaul-removes-cgo-dependencies-cfffb02e

Transcript

Good morning, I'm your host with the Ollama developer briefing for Thursday, May 30th, 2026.

Daniel Hiltgen merged a massive architectural overhaul that removes the CGO engines and adopts llama-server exclusively for GGML models. The change eliminates the vendored GGML and llama.cpp backend, the CGO runner, and the Go model implementations. The refactoring spans 1,100 files, with over 28,000 additions and 430,000 deletions. Safetensors-based models continue to run on the MLX engine. The change also enables faster adoption of upstream llama.cpp capabilities and fixes. On Windows systems, the update requires a recent AMD driver that supports ROCm version 7.

Hiltgen also merged a smaller fix addressing MLX development mode search paths that were broken during the llama-server transition. This update corrects library resolution code to match the new superbuild structure.

The architectural changes also include significant improvements to GPU discovery: instead of parsing llama-server output, Ollama now uses dynamic library loading for more reliable hardware detection. The update also introduces better batch sizing for performance and enhanced Vulkan support, including integrated GPU detection on Windows.

Notable technical additions include compatibility layers for Ollama-format GGUF files in llama-server, support for multiple new model architectures including Gemma4, Qwen3.5, and Mistral3, and improved multi-GPU filtering capabilities.

What's next: Teams should test the new llama-server integration thoroughly, particularly on Windows systems with AMD hardware. The simplified architecture should enable faster feature adoption from upstream llama.cpp releases.

That's your Ollama briefing. Back tomorrow with more developer updates.