Ollama: Major Architecture Overhaul Removes CGO Dependencies

Ollama has completed a major refactoring that removes the CGO engines and uses llama-server exclusively for GGML models. A follow-up fix restores the MLX development mode search paths. The changes span over 1,100 files and streamline the inference architecture.

2026-05-30T10:00:31Z

Duration: PT1M55S

Episode overview

This episode is a short developer briefing from Ollama that explains recent repository work in plain language.

Show: Ollama
Published: 2026-05-30T10:00:31Z
Audio duration: PT1M55S

Transcript excerpt


Good morning, I'm your host with the Ollama developer briefing for Saturday, May 30th, 2026.

Daniel Hiltgen merged a massive architectural overhaul that removes CGO engines and adopts llama-server exclusively for GGML models. This change eliminates the vendored GGML and llama.cpp backend, the CGO runner, and Go model implementations. The refactoring spans 1,100 files with over 28,000 additions and 430,000…

Hiltgen also merged a smaller fix addressing MLX development mode search paths that were broken during the llama-server transition. This update corrects library resolution code to match the new superbuild structure.
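Library resolution of this kind can be pictured as an ordered search over candidate directories. The following is a minimal sketch only and is not Ollama's code, which is written in Go. The function name `resolve_library`, the directory names, and the library file name are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_library(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file called `name` in `search_dirs`, or None.

    Directories are checked in the order given, so earlier entries win.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical layout: a development build directory checked before an
    # installed location. Neither path is taken from the Ollama repository.
    dirs = [Path("build/lib"), Path("/usr/local/lib/example")]
    print(resolve_library("libexample.so", dirs))
```

A search list like this breaks quietly when a build layout changes: the directories still exist in the list, but the library is no longer inside them, so the lookup returns nothing until the list is updated to match the new structure.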

The architectural changes include significant improvements to GPU discovery: instead of parsing llama-server output, Ollama now uses dynamic library loading for more reliable hardware detection. The update also introduces better batch sizing for performance optimization and enhanced Vulkan support with Windows…
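To illustrate the general idea of detecting hardware support by loading libraries rather than reading another program's output, here is a small sketch in Python. It is an assumption-laden illustration, not Ollama's implementation: the helper `probe_libraries` is hypothetical, and the library names in the usage section are only typical Linux examples.

```python
import ctypes
from typing import Callable, Dict, Iterable


def probe_libraries(
    names: Iterable[str],
    loader: Callable[[str], object] = ctypes.CDLL,
) -> Dict[str, bool]:
    """Try to load each shared library and record whether it succeeded.

    `ctypes.CDLL` raises OSError when a library cannot be found or loaded,
    so a failed load is treated as "backend not available".
    """
    available = {}
    for name in names:
        try:
            loader(name)
            available[name] = True
        except OSError:
            available[name] = False
    return available


if __name__ == "__main__":
    # Illustrative library names for Linux; real names differ by platform.
    print(probe_libraries(["libcuda.so.1", "libvulkan.so.1"]))
```

The appeal of this approach is that a load either succeeds or fails with a clear error, whereas scraping text output depends on a log format that can change between releases.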

Notable technical additions include compatibility layers for Ollama-format GGUF files in llama-server, support for multiple new model architectures including Gemma4, Qwen3.5, and Mistral3, and improved multi-GPU filtering capabilities.

What's next: Teams should…
