Ollama: Weekly Recap - Performance Optimization & Launch System Improvements
This week brought significant performance improvements with reduced startup times for large model stores and extensive updates to the launch system integrations. A major MLX runner feature was reverted due to architectural concerns.
Duration: PT2M33S
Transcript
Good morning, I'm your host with the Ollama weekly recap for May 17th through 24th, 2026.
Five pull requests were merged this week, along with eight additional commits, focusing heavily on performance and integration improvements.
Starting with performance enhancements: Daniel Hiltgen merged a significant optimization that reduces model hydration at startup. The change introduces a lightweight model list cache for tags and launch inventory, while keeping show cache population lazy. It specifically targets users with large model stores, who were experiencing slow startup times. The implementation spans 16 files with over 1,300 lines added, including new caching infrastructure and comprehensive test coverage.
Moving to integration updates: The launch system saw substantial improvements this week. Eva H added a Codex model metadata catalog that generates model configuration files and wires them into launch profiles, which eliminates the model metadata fallback warnings users were encountering. Parth Sareen followed with enriched model inventory changes, updating interfaces so that enriched models are populated centrally rather than through per-integration logic. This refactoring touched 34 files and uses the new tags endpoint to avoid show API calls.
A smaller but important Codex fix came from Bruce MacDonald, removing the patch tool type that was causing schema compatibility issues with newer Codex versions.
In our fixes category: Jesse Gross reverted the DFlash speculative decoding feature for the MLX runner. The integration was deemed too invasive because it threaded DFlash-specific logic through pipelines, base model interfaces, and cache layers. The revert removes over 1,600 lines across 13 files. Gross noted that useful components like YaRN RoPE improvements and draft architecture detection will be reintroduced as separate, more focused commits.
Notable additional commits came from Jesse Gross, who implemented several MLX runner improvements: keeping gated-delta recurrent state in float32 for better precision, adding automatic draft architecture detection from config files, and moving YaRN RoPE helpers into shared neural network utilities for broader reuse.
Next week, we expect to see the reintroduction of those self-contained MLX runner improvements and continued refinements to the launch system based on user feedback.
That's your Ollama weekly recap. Stay tuned for next week's developments.