Ollama: MLX Create Pipeline Rewrite Lands

The MLX engine's create and import path was rewritten into a clear five-phase pipeline, with immediate follow-up fixes for Qwen model tensor handling and new CI coverage — the biggest structural change to model creation in this update.

2026-07-04T13:00:33Z

Duration: 2 min 24 s (PT2M24S)

Episode overview

This episode is a short developer briefing from Ollama that explains recent repository work in plain language.

Show: Ollama

Transcript excerpt

This is a shortened excerpt. Listen to the episode or use the RSS feed for the full update.

Good morning. It's July 4th, and today's developer briefing centers on one big architectural shift: how Ollama creates and imports MLX models.

The headline is PR 16919, Patrick Devine's rewrite of the create functionality for the MLX engine. Instead of one tangled import process, it's now broken into five distinct phases: read, classify, plan, write, and manifest creation. Each model architecture gets its own "policy" governing how it converts correctly,…
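To make the five-phase shape concrete, here is a minimal Go sketch of a read, classify, plan, write, and manifest pipeline driven by a per-architecture policy. Everything in it (the `Tensor`, `Policy`, and `Manifest` types, the `Create` function, and the trivial `keepAll` policy) is hypothetical and written for illustration only. It is not Ollama's actual code or API, and the real phases do file parsing and blob I/O that this sketch replaces with in-memory stand-ins.

```go
package main

import (
	"fmt"
	"sort"
)

// Tensor is a hypothetical, simplified stand-in for a source weight tensor.
type Tensor struct {
	Name  string
	Shape []int
}

// Class is the label the classify phase attaches to each tensor (hypothetical).
type Class int

const (
	Keep Class = iota // convert and write the tensor
	Drop              // leave the tensor out of the created model
)

// Policy is a hypothetical per-architecture hook deciding how tensors convert.
type Policy interface {
	Classify(t Tensor) Class
}

// keepAll is a trivial illustrative policy that keeps every tensor.
type keepAll struct{}

func (keepAll) Classify(Tensor) Class { return Keep }

// Manifest is a hypothetical summary emitted by the final phase.
type Manifest struct {
	Arch    string
	Tensors []string
}

// Phase 1, read: a real importer would parse files; here we just copy the input.
func read(src []Tensor) []Tensor {
	return append([]Tensor(nil), src...)
}

// Phase 2, classify: ask the architecture's policy about every tensor.
func classify(p Policy, ts []Tensor) []Class {
	classes := make([]Class, len(ts))
	for i, t := range ts {
		classes[i] = p.Classify(t)
	}
	return classes
}

// Phase 3, plan: decide which tensors will be written, without doing any I/O.
func plan(ts []Tensor, classes []Class) []Tensor {
	var out []Tensor
	for i, t := range ts {
		if classes[i] == Keep {
			out = append(out, t)
		}
	}
	return out
}

// Phase 4, write: a real implementation would write blobs; here we return names.
func write(planned []Tensor) []string {
	names := make([]string, 0, len(planned))
	for _, t := range planned {
		names = append(names, t.Name)
	}
	return names
}

// Phase 5, manifest creation: record what was written, in a stable order.
func manifest(arch string, written []string) Manifest {
	sorted := append([]string(nil), written...)
	sort.Strings(sorted)
	return Manifest{Arch: arch, Tensors: sorted}
}

// Create runs the five phases in order.
func Create(arch string, p Policy, src []Tensor) Manifest {
	ts := read(src)
	return manifest(arch, write(plan(ts, classify(p, ts))))
}

func main() {
	src := []Tensor{{Name: "lm_head.weight"}, {Name: "embed.weight"}}
	m := Create("example-arch", keepAll{}, src)
	fmt.Println(m.Arch, m.Tensors) // example-arch [embed.weight lm_head.weight]
}
```

Keeping the plan phase free of I/O is the design point the sketch tries to show: every decision about what to convert is made, and can be inspected, before anything is written.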

That rewrite already has ripple effects. PR 17020, from pd95, fixes a real regression it exposed: Qwen 3.5 and Qwen 3.6 models were getting their MTP tensors processed twice — once during creation and again at runtime — corrupting generation. The fix restores dropping those tensors at create time until the runtime…
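The "drop at create time" fix can be pictured as a filter applied before any tensor is written, so the runtime never receives the MTP tensors and cannot process them a second time. The Go sketch below is hypothetical: it assumes MTP (multi-token prediction) tensors can be recognised by an `mtp.` name prefix, and the `Tensor` type and `dropMTP` helper are illustrative names, not the code in PR 17020. The tensor naming actually used by Qwen 3.5 and Qwen 3.6 checkpoints may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Tensor is a hypothetical, simplified stand-in for a source weight tensor.
type Tensor struct{ Name string }

// isMTP reports whether a tensor belongs to the multi-token-prediction head.
// ASSUMPTION: MTP tensors carry an "mtp." name prefix; real names may differ.
func isMTP(t Tensor) bool {
	return strings.HasPrefix(t.Name, "mtp.")
}

// dropMTP removes MTP tensors at create time, so nothing downstream sees them.
func dropMTP(ts []Tensor) []Tensor {
	kept := make([]Tensor, 0, len(ts))
	for _, t := range ts {
		if !isMTP(t) {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	in := []Tensor{
		{Name: "model.layers.0.mlp.weight"},
		{Name: "mtp.layers.0.proj.weight"},
		{Name: "lm_head.weight"},
	}
	for _, t := range dropMTP(in) {
		fmt.Println(t.Name) // prints the two non-MTP names, in input order
	}
}
```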

Meanwhile, Danny Hiltgen's PR 17022 wires up MLX unit tests for pull request runs, using a helper script to fetch a matching release build rather than running the expensive full MLX build. Worth noting: that PR depends on 16919 landing first, so expect a rebase there.

Two smaller but concrete fixes round out the day: PR 17027 addresses OpenAI-compatible streaming clients like Chatbox and OpenWebUI appearing to hang, by…

Wha…

Nearby episodes from Ollama

Agent Harness Lands, Hardware Support Gets a Cleanup 2026-07-03T14:05:17Z
Gemma 4 Support and Platform Improvements 2026-06-15T13:01:10Z
Weekly Recap - MLX Performance & Path Handling 2026-06-15T09:08:58Z
Memory Management and Multimodal Parsing Fixes 2026-06-14T13:00:49Z
GPU Offloading and Tool Call Fixes 2026-06-13T13:01:39Z
Performance Optimizations and Model Handling Improvements 2026-06-12T13:01:28Z
Infrastructure Updates and Platform Fixes 2026-06-11T13:00:54Z
Multimodal Fixes and Developer Experience Updates 2026-06-10T13:00:43Z