Ollama: Speculative Decoding and Codex App Updates
The Ollama team merged five pull requests: DFlash speculative decoding for the MLX runner, aimed at inference performance, and four Codex app refinements covering restart reliability, UI copy, and documentation.
Transcript
Good morning, this is your Ollama developer briefing for May 15th, 2026.
Patrick Devine merged a significant performance enhancement that adds DFlash speculative decoding to the MLX runner. The roughly 1,900-line change introduces block-diffusion speculative decoding with support for Qwen 3.6 models, in both mixture-of-experts and dense variants. The implementation includes recurrent-cache playback for the draft model, RoPE and YaRN optimizations, and support for greedy sampling as well as the Leviathan and Chen speculative sampling methods.
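For listeners unfamiliar with the verification step, here is a minimal Python sketch of the standard acceptance rule from Leviathan et al. and Chen et al. (2023), plus the greedy variant. It is not Ollama's MLX runner code and does not model DFlash's block-diffusion drafter; the function names are hypothetical. It assumes the draft model has already proposed k tokens with per-position distributions `draft_probs`, and that the target model scored k + 1 positions in one pass (`target_probs`, with the extra row used for a bonus token).

```python
import random
from typing import List, Sequence


def _argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=probs.__getitem__)


def greedy_verify(draft_tokens: List[int],
                  target_probs: List[List[float]]) -> List[int]:
    """Hypothetical greedy verification: keep drafts while they match the target's argmax."""
    out: List[int] = []
    for i, tok in enumerate(draft_tokens):
        best = _argmax(target_probs[i])
        if tok != best:
            out.append(best)  # target's choice replaces the first mismatching draft
            return out
        out.append(tok)
    # Every draft matched: the extra target row yields one bonus token.
    out.append(_argmax(target_probs[len(draft_tokens)]))
    return out


def speculative_sample(draft_tokens: List[int],
                       draft_probs: List[List[float]],
                       target_probs: List[List[float]],
                       rng: random.Random) -> List[int]:
    """Hypothetical sketch of the Leviathan/Chen acceptance rule.

    Accept draft token x with probability min(1, p(x) / q(x)). On the first
    rejection, resample from the residual max(0, p - q) (normalized) and stop.
    The emitted tokens are distributed exactly as if sampled from the target p.
    """
    out: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i][tok], draft_probs[i][tok]  # q > 0: the draft sampled tok
        if rng.random() < min(1.0, p / q):
            out.append(tok)
            continue
        # Rejection implies p(tok) < q(tok), so the residual has positive mass.
        residual = [max(0.0, pt - qt) for pt, qt in zip(target_probs[i], draft_probs[i])]
        out.append(rng.choices(range(len(residual)), weights=residual)[0])
        return out
    # All drafts accepted: sample a bonus token from the target's extra row.
    last = target_probs[len(draft_tokens)]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out
```

The greedy path is deterministic, so it suits temperature-zero decoding; the sampling path trades a little extra work on rejection for output that matches the target model's distribution exactly, which is why every accepted draft token is effectively free throughput.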
Parth Sareen merged four pull requests centered on the Codex app. The first improves restart reliability by making the restart mechanism more robust while keeping the existing safeguards in place. Another updates UI copy across the launch commands and registry components. A third adds comprehensive Codex app documentation with annotated screenshots, along with updated integration guides, and a fourth temporarily hides that documentation until the official launch.
The additional commits mirror these merged pull requests, with no standalone changes beyond the integrated work.
What's next: The team appears focused on stabilizing the Codex app for launch while continuing MLX runner optimizations. Performance testing of the new speculative decoding implementation will likely be a priority.
That's your Ollama update for today. Back tomorrow with more developer news.