Ollama: Weekly Recap - MLX Performance & Codex Integration
Sixteen pull requests were merged this week, focusing on MLX runner improvements, speculative decoding, and a new Codex App integration. Major infrastructure updates include optimized release builds and hardened update flows.
Duration: PT2M30S
Transcript
Good morning. This is your Ollama weekly recap for May 10th through 17th, 2026.
Sixteen PRs were merged this week, along with sixteen additional commits, with a significant focus on MLX performance and new integrations.
Starting with new features: The team shipped Codex App integration as a launch option, joining Claude Code, Hermes, and OpenClaw in the top integrations list. This includes full install, open, and configuration handling. The OpenCode launch integration now supports image modalities for vision-capable models, automatically detecting and advertising image input capabilities through model probing.
On the performance front, major MLX runner improvements landed. The team added DFlash speculative decoding with support for Qwen 3.6 models, draft model recurrent cache playback, and enhanced RoPE/YaRN implementations. The MLX sampler was completely reworked: the transform chain was replaced with an explicit distribution pipeline that keeps sparse token data on the GPU instead of full-vocabulary scores. The same change also fixes seed parameter handling, so seeded sampling is reproducible.
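To make the sampler change concrete, here is a minimal sketch of the general idea: reduce the full vocabulary to a small sparse set of candidates, build a distribution over only those, and draw from it with a seeded generator. It is plain CPU Python, not the MLX runner's code. The function names, the choice of top-k as the sparsifying step, and the temperature parameter are all assumptions made for illustration.

```python
import heapq
import math
import random


def top_k_sparse(logits, k):
    """Return the k highest-scoring (token_id, logit) pairs, best first."""
    return heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])


def sparse_softmax(pairs, temperature=1.0):
    """Convert sparse (token_id, logit) pairs to (token_id, probability) pairs."""
    peak = max(logit for _, logit in pairs)  # subtract the max for numerical stability
    weights = [math.exp((logit - peak) / temperature) for _, logit in pairs]
    total = sum(weights)
    return [(tok, w / total) for (tok, _), w in zip(pairs, weights)]


def sample(logits, k, seed, temperature=1.0):
    """Draw one token id from the top-k distribution using a seeded generator."""
    rng = random.Random(seed)  # same seed and inputs give the same draw
    dist = sparse_softmax(top_k_sparse(logits, k), temperature)
    tokens = [tok for tok, _ in dist]
    probs = [p for _, p in dist]
    return rng.choices(tokens, weights=probs, k=1)[0]
```

Every stage after the top-k step only touches k entries rather than the whole vocabulary, which is the property the recap describes. Calling `sample` twice with the same logits and seed returns the same token id.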
Infrastructure updates include optimized release builds that should significantly reduce build times. The team switched to Ninja's load targeting instead of fixed parallelism, lowered compression levels (Windows 7-Zip from level 9 to 7, Linux zstd from 22 to 19), and split MLX out into separate archive files. App update flows were hardened with new unit tests for Mac and Windows verification.
Several critical fixes were implemented. MLX status timeouts during inference were resolved by caching memory samples and implementing best-effort refresh when the worker thread is busy. A macOS 26 target leakage issue in Metal v3 libraries was fixed by relinking AIR files with metallib. The Anthropic adapter now properly preserves Claude's local image-path tool results while maintaining renderer-owned prompt formatting.
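The status timeout fix follows a common pattern: serve the last known value when the resource you would need to refresh it is busy. The sketch below illustrates that pattern only. The class name, the lock standing in for a busy worker thread, and the `read_memory` callable are hypothetical and are not taken from the Ollama codebase.

```python
import threading
import time


class MemoryStatusCache:
    """Serve cached memory samples, refreshing only when the worker is free."""

    def __init__(self, read_memory):
        self._read_memory = read_memory      # hypothetical callable that samples memory
        self.worker_lock = threading.Lock()  # assumed to be held during inference
        self._cached = None
        self.cached_at = None                # monotonic timestamp of the last refresh

    def status(self):
        # Best-effort refresh: try to take the lock without blocking.
        if self.worker_lock.acquire(blocking=False):
            try:
                self._cached = self._read_memory()
                self.cached_at = time.monotonic()
            finally:
                self.worker_lock.release()
        # If the worker was busy, return the previous sample instead of waiting.
        return self._cached
```

A status request therefore never waits on inference. The trade-off is that the reported sample may be slightly stale, and `cached_at` lets a caller judge how stale.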
Additional commits this week largely mirrored the merged pull requests, indicating focused development without significant experimental branches.
Looking ahead, the MLX performance improvements and Codex integration suggest continued emphasis on both runtime optimization and developer tooling expansion.
That's your Ollama weekly recap. I'm your host, and we'll see you next week.