Ollama: MLX Create Pipeline Rewrite Lands
The MLX engine's create and import path was rewritten into a clear five-phase pipeline, with immediate follow-up fixes for Qwen model tensor handling and new CI coverage — the biggest structural change to model creation in this update.
Duration: PT2M24S
Episode overview
This episode is a short developer briefing from Ollama.
It explains recent repository work in plain language.
- Show: Ollama
- Published: 2026-07-04T13:00:33Z
- Audio duration: PT2M24S
Transcript excerpt
Good morning. It's July 4th, and today's developer briefing centers on one big architectural shift: how Ollama creates and imports MLX models.
The headline is PR 16919, Patrick Devine's rewrite of the create functionality for the MLX engine. Instead of one tangled import process, it's now broken into five distinct phases: read, classify, plan, write, and manifest creation. Each model architecture gets its own "policy" governing how it converts correctly,…
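To make the five phases concrete, here is a minimal sketch in Python of how a read, classify, plan, write, and manifest pipeline with per-architecture policies could be laid out. It is not Ollama's code: the Policy class, the POLICIES registry, the "example-arch" name, and every function name are hypothetical, and tensors are reduced to (name, dtype) pairs so the example stays runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-architecture conversion rules."""
    drop_prefixes: tuple = ()    # tensor name prefixes to leave out of the model
    target_dtype: str = "bf16"   # dtype every kept tensor is stored as


# Hypothetical registry: one policy per model architecture.
POLICIES = {"example-arch": Policy()}


def read(source):
    # Phase 1: load raw tensor records; here the source is already a list of pairs.
    return list(source)


def classify(arch):
    # Phase 2: pick the policy that governs this architecture.
    return POLICIES[arch]


def plan(tensors, policy):
    # Phase 3: decide what will be written, without writing anything yet.
    return [
        (name, policy.target_dtype)
        for name, _dtype in tensors
        if not name.startswith(policy.drop_prefixes)
    ]


def write(planned):
    # Phase 4: emit blobs; a dict stands in for files on disk.
    return {name: f"blob:{name}:{dtype}" for name, dtype in planned}


def make_manifest(arch, blobs):
    # Phase 5: record what was produced.
    return {"architecture": arch, "layers": sorted(blobs)}


def create(source, arch):
    tensors = read(source)
    policy = classify(arch)
    planned = plan(tensors, policy)
    blobs = write(planned)
    return make_manifest(arch, blobs)
```

Keeping the plan phase free of side effects is the design point of this sketch: the conversion decisions for an architecture can be inspected or tested before anything is written.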
That rewrite already has ripple effects. PR 17020, from pd95, fixes a real regression it exposed: Qwen 3.5 and Qwen 3.6 models were getting their MTP tensors processed twice — once during creation and again at runtime — corrupting generation. The fix restores dropping those tensors at create time until the runtime…
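As a rough illustration of dropping tensors at create time, the sketch below filters a list of tensor names before anything is written. The "mtp." prefix and the function name are assumptions made for this example; the briefing does not say how Ollama identifies MTP tensors.

```python
# Assumption: MTP tensors can be recognized by an "mtp." name prefix.
MTP_PREFIX = "mtp."


def drop_mtp_tensors(tensor_names):
    """Split names into (kept, dropped) so the create step writes only the kept ones."""
    kept = [n for n in tensor_names if not n.startswith(MTP_PREFIX)]
    dropped = [n for n in tensor_names if n.startswith(MTP_PREFIX)]
    return kept, dropped
```

Returning the dropped names as well makes it easy to log what was left out, which helps when checking that a tensor is handled in exactly one place.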
Meanwhile, Danny Hiltgen's PR 17022 wires up MLX unit tests for pull request runs, using a helper script to fetch a matching release build rather than running the expensive full MLX build. Worth noting: that PR depends on 16919 landing first, so expect a rebase there.
Two smaller but concrete fixes round out the day: PR 17027 addresses OpenAI-compatible streaming clients like Chatbox and OpenWebUI appearing to hang, by…
Wha…
Nearby episodes from Ollama
- Agent Harness Lands, Hardware Support Gets a Cleanup
- Gemma 4 Support and Platform Improvements
- Weekly Recap - MLX Performance & Path Handling
- Memory Management and Multimodal Parsing Fixes
- GPU Offloading and Tool Call Fixes
- Performance Optimizations and Model Handling Improvements
- Infrastructure Updates and Platform Fixes
- Multimodal Fixes and Developer Experience Updates