Ollama
Track Ollama development. Run large language models locally.
https://podlog.io/listen/ollama-3aed006f
Episodes
-
Ollama: LLaMA Server Integration Hardening
Ollama development focused heavily on stabilizing the new LLaMA server integration introduced in version 0.30, with multiple fixes for load timeouts, token counting, and streaming behavior. Additional work expanded hardware support and…
-
Ollama: Integration Platform Expansion
Major expansion of third-party integration support with new Qwen Code and Cline integrations, plus critical llama.cpp server fixes addressing multi-GPU support and embedding API consistency.
-
Ollama: Model Integration Updates
Two focused updates improve model compatibility in Ollama, addressing parsing and server handling for newer model variants. Both changes target specific integration points in the Llama server and Laguna parser components.
-
Ollama: Weekly Recap - Infrastructure Modernization
Ollama completed a major architectural shift this week, removing CGO engines and standardizing on llama-server for all GGUF models. The team also addressed compatibility issues for newer model formats including Gemma 4.
-
Ollama: Major Architecture Overhaul Removes CGO Dependencies
Ollama has completed a massive refactoring, removing CGO engines and switching exclusively to llama-server for GGML models while fixing MLX development paths. The changes span over 1,100 files and streamline the inference architecture.
-
Ollama: MLX Model Display Fixes and Template Parser Cleanup
Two pull requests were merged addressing MLX model information display issues and removing duplicate template parsing code. Both changes include comprehensive test coverage.
-
Ollama: Weekly Recap - Performance Optimization & Launch System Improvements
This week brought significant performance improvements with reduced startup times for large model stores and extensive updates to the launch system integrations. A major MLX runner feature was reverted due to architectural concerns.
-
Ollama: DFlash Speculative Decoding Rollback
Jesse Gross reverted the recently merged DFlash speculative decoding feature due to invasive code integration, then re-implemented useful components as separate, cleaner commits. The rollback removed over 1,600 lines of code while…
-
Ollama: Model Inventory Refactoring
Parth Sareen merged a significant refactoring that centralizes model inventory logic and introduces a new `/api/tags` endpoint, streamlining integration handling across 34 files.
-
Ollama: Startup Performance Optimization
Two pull requests were merged focusing on performance improvements and compatibility fixes. The main change introduces a lightweight model list cache to reduce startup times for users with large model collections.
-
Ollama: Codex Integration Enhancement
Developer Eva H merged a pull request adding automated model metadata catalog generation for Codex integration. The change eliminates fallback warnings users were experiencing when launching Ollama models with Codex.
-
Ollama: Weekly Recap - MLX Performance & Codex Integration
Sixteen pull requests were merged this week focusing on MLX runner improvements, speculative decoding, and new Codex App integration. Major infrastructure updates include optimized release builds and hardened update flows.
-
Ollama: Release Build Optimization
Daniel Hiltgen merged two pull requests focused on improving build performance and reducing log verbosity. The main change optimizes release builds to save several minutes by adjusting parallelism and compression settings.
-
Ollama: Speculative Decoding and Codex App Updates
The Ollama team merged five pull requests focusing on MLX runner performance improvements through DFlash speculative decoding and several Codex app refinements including restart mechanisms and documentation updates.
-
Ollama: MLX Sampler Overhaul and Codex Integration
Three pull requests were merged focusing on MLX sampler improvements and new Codex app integration. The MLX sampler received a complete rework with an explicit distribution pipeline, while the launch system gained Codex App integration…
-
Ollama: Vision Model Integration Enhancement
A pull request was merged to add image input capabilities for vision models in the OpenCode launch integration. The update enables proper capability metadata reporting for models that support both text and image inputs.
-
Ollama: MLX Threading and Claude Image Fixes
Five pull requests were merged, addressing MLX runner stability issues, Claude image handling improvements, and macOS build compatibility fixes. Key changes include thread affinity updates for MLX runners and enhanced Windows update flows.
-
Ollama: Model Transfer Optimization and Test Reliability
Two pull requests were merged, focusing on improved model push/pull performance for safetensors models and integration test stability. The updates include reduced default parallelism and new configuration options for network transfer optimization.
-
Ollama: Claude Desktop Integration Removed
Ollama has disabled its Claude Desktop integration following Anthropic's decision to block third-party access for non-Anthropic models. The change affects 15 files across the launch system and user interface.
-
Ollama: Launch Command Enhancements
Two significant updates to the Ollama launch command were merged, improving backup management for integrations and adding plan-aware model access controls. Both changes focus on streamlining user experience while adding sophisticated…
-
Ollama: Speed Revolution - MTP Decoding and Smart Caching
Three major performance-focused PRs landed today, headlined by Patrick Devine's massive MTP speculative decoding implementation for Gemma4 models and Parth Sareen's server-side caching system for model metadata. Daniel Hiltgen also…
-
Ollama: Go 1.26 Runtime Update
The Ollama team upgraded their Go runtime to version 1.26, streamlining their testing infrastructure and removing deprecated experimental features. The update included migrating synchronous testing APIs and cleaning up build configurations.
-
Ollama: Weekly Recap - MLX Threading & Model Recommendations
Major MLX threading fixes for macOS stability, new experimental model recommendations endpoint, and significant batching improvements landed this week. The team also expanded launch integrations with Claude App support and improved…
-
Ollama: MLX Threading Fixes and Claude App Integration
Two significant updates merged today: threading fixes for MLX on macOS addressing OS-thread-local execution state, and full Claude App integration with launch commands.
-
Ollama: Model Recommendations and Windows Gateway Fix
Two pull requests merged today focused on improving the app's model recommendation system and fixing a Windows-specific OpenClaw gateway timeout issue. The changes streamline the frontend architecture while resolving a critical…
-
Ollama: Metal GPU Stability and Gemma4 Updates
Four pull requests were merged focusing on Metal GPU initialization hardening, Gemma4 renderer improvements, and VRAM-based model recommendations. The main highlight is a significant stability fix for Metal GPU systems experiencing…
-
Ollama: Launch Experience Improvements and Model Recommendations
Three pull requests merged today focus on enhancing Ollama's launch functionality, including a new experimental model recommendations cache endpoint and UI alignment between the app and command-line launcher.
-
Ollama: Multi-Sequence Batching and New Model Support
Ollama merged four major pull requests on April 28th, including foundational work for multi-sequence batching in the MLX runner and support for new model architectures. The team also fixed a critical desktop app issue that was…
-
Ollama: Tokenizer Bug Fix for BPE Processing
A critical tokenizer bug affecting multi-regex byte-pair encoding has been resolved, preventing text duplication and inflated token counts in multi-stage BPE tokenizers.
-
Ollama: Weekly Recap - MLX Performance & Launch Integrations
This week brought significant MLX runner optimizations with logprobs support and batched sampling, plus new Kimi CLI integration and improved OpenClaw onboarding flows. Performance improvements of up to 1.5% were achieved across multiple…
-
Ollama: MLX Sampling Performance Enhancement
Jesse Gross merged a significant pull request implementing batch sampling across multiple sequences in the MLX runner, along with optimizations to use fixed-size ring buffers for tracking sampler history.
-
Ollama: OpenAI Reasoning Integration
Ollama merged two pull requests from Parth Sareen that add support for "max" thinking values and integrate OpenAI's reasoning effort mapping. The changes include comprehensive test coverage and validation for unsupported effort values.
-
Ollama: Launch System Improvements and Integration Fixes
Three pull requests merged to the Ollama repository on April 23rd, focusing on launch system enhancements including model ordering fixes, OpenClaw onboarding improvements, and configuration drift resolution.
-
Ollama: Launch System Overhaul and Documentation Updates
Two pull requests were merged today focusing on simplifying the OpenClaw web search integration and updating structured outputs documentation for cloud compatibility.
-
Ollama: MLX Performance Boost and Model Updates
Six pull requests were merged, with significant MLX runner optimizations delivering a 1.5% throughput improvement and better concurrent processing. Model recommendations were updated to feature kimi-k2.6.
-
Ollama: New CLI Integration and Performance Improvements
Ollama added Kimi CLI integration through the launch command and optimized MLX model performance with sigmoid router fusion for approximately 1% speed improvement.
-
Ollama: Weekly Recap - MLX Performance & Launch Integration Expansion
Ollama merged 14 pull requests this week focusing on MLX performance improvements, new launch integrations for Copilot CLI and Hermes, and enhanced Gemma4 model rendering capabilities.
-
Ollama: MLX Sampler Improvements
Developer Daniel Hiltgen merged improvements to repeat penalty handling in the MLX runner's sampling system. The update refactors penalty application logic across seven files with expanded test coverage.
-
Ollama: Windows WSL Integration Simplified
Ollama simplified its Windows WSL integration by removing the automatic handoff functionality, and it also updated the Hermes documentation. The changes reduce complexity and provide clearer guidance for Windows users.
-
Ollama: Gemma4 Enhancements and Copilot CLI Integration
Ollama merged seven pull requests focused on Gemma4 model improvements and new GitHub Copilot CLI integration. Key changes include size-based rendering for Gemma4 models and fixes for MLX platform compatibility.
-
Ollama: Hermes Agent Integration and Gemma4 Improvements
Ollama added Hermes agent integration through a new launch command and implemented several Gemma4 performance and quality improvements. The development team also fixed model recommendation ordering in the launch interface.
-
Ollama: Gemma 4 MLX Support and Mixed-Precision Improvements
The Ollama team merged significant MLX backend improvements including mixed-precision quantization and capability detection enhancements. A major addition brings Gemma 4 model support to the MLX runtime with text-only functionality.
-
Ollama: Weekly Recap - Model Integration and Tooling Enhancements
This week saw significant improvements to model integrations and developer tooling with 29 commits focused on enhancing API error handling, updating model renderers, and refining the launch system. Notable updates include Gemma4 template…
-
Ollama: ROCm 7.2.1 Performance Update
The Ollama team merged a pull request upgrading ROCm to version 7.2.1, bringing performance improvements and bug fixes to the Linux build pipeline.
-
Ollama: Gemma4 Parser Improvements
Devon Rifkin improved the Gemma4 model parser to be more flexible with whitespace handling before bare keys. The update includes extensive test coverage with 141 new test lines.
-
Ollama: Model Updates and Tool Call Fixes
Four commits landed today focusing on model integrations, template updates, and bug fixes. Key changes include new Hermes Agent documentation, Gemma4 template synchronization, and parallel tool call index corrections across multiple models.
-
Ollama: Error Handling and Modelfile Fixes
Five commits were merged today focusing on API error improvements, modelfile functionality fixes, and dependency cleanup. Key changes include better error messages for the responses API and fixes to the /save command for…
-
Ollama: Weekly Recap - Gemma4 Integration & Audio Support
This week featured the major addition of Gemma4 model support with comprehensive text, vision, and audio capabilities, alongside critical bug fixes for ROCm builds and tokenization improvements.
-
Ollama: Performance Lessons and Gemma4 Refinements
The Ollama team tackled critical Gemma4 performance issues, with a fascinating story of enabling flash attention only to revert it due to a 60% performance regression. Major improvements included reworking tool call handling with cleaner…
-
Ollama: Gemma4 Arrives with Audio Magic
A landmark day for Ollama with the massive Gemma4 integration bringing text, vision, and audio capabilities in one comprehensive update. The team also shipped a new launch tab to help users discover integrations, plus important tokenizer…
-
Ollama: Modernizing Codex Configuration
Today we're diving into a nice cleanup effort by Eva H, who tackled some deprecation warnings in the Ollama-Codex integration. The main pull request replaces the old OPENAI_BASE_URL environment variable with a proper config.toml profile…
-
Ollama: Tokenizer Love and Better Model Support
Today we're diving into some fantastic tokenizer improvements that make Ollama even more versatile! Daniel Hiltgen delivered two key enhancements: adding SentencePiece-style BPE support for better model compatibility, and fixing a…
-
Ollama: Legacy Compatibility and Developer Experience Wins
Jeffrey Morgan delivered two solid improvements to Ollama today, tackling both backward compatibility and developer workflow enhancements. The main highlight is a compatibility fix for the qwen3-next model that resolves legacy projection…
-
Ollama: Smoothing the Launch Experience
The Ollama team focused heavily on polishing the launch functionality with four major pull requests merged. Key improvements include removing the launch banner, auto-installing pi with npm, reverting context length warnings for…
-
Ollama: Fixing the Inconsistencies That Matter
Today we're diving into 7 merged PRs and 9 commits that tackle some really important quality-of-life issues in Ollama. The team fixed false "out of date" model warnings that were bugging users, improved tool calling reliability for…
-
Ollama: Smart Caching and Better User Experience
Today brings exciting performance improvements with smart caching snapshots for long prompts, plus thoughtful user experience enhancements. The team focused on making Ollama more reliable for heavy workloads while polishing the developer…
-
Ollama: VS Code Integration Takes Center Stage
Today we're diving into a major milestone for Ollama: native VS Code integration! Eva H and the team delivered a comprehensive `ollama launch vscode` command that seamlessly connects your local models with GitHub Copilot Chat. Plus,…
-
Ollama: Precision Revolution - New Float Formats and Testing Powerhouse
The Ollama team delivered three major improvements focused on precision and testing capabilities. Patrick Devine introduced support for cutting-edge float formats (mxfp4, mxfp8, nvfp4) that promise better model efficiency, while Daniel…
-
Ollama: MLX Performance Breakthrough and Smarter Caching
The Ollama team delivered major MLX improvements with a massive update that brings 6.4x speed improvements through new CUDA kernels, plus smarter caching logic for transformer models. Daniel Hiltgen led the MLX update while Jesse Gross…
-
Ollama: Nvidia Partnership Takes Center Stage
Bruce MacDonald led the charge with two key integrations focusing on Nvidia's ecosystem. The team added comprehensive NemoClaw documentation and fixed a critical headless installation issue with OpenClaw gateway health checks. These…
-
Ollama: Bug Squashing Bonanza
Today we're celebrating some serious stability improvements with 4 merged pull requests and 5 additional commits that tackle everything from desktop app connectivity issues to subprocess deadlocks. Notable contributions from hoyyeva…
-
Ollama: The Caching Revolution
Jesse Gross delivered a massive performance breakthrough with smart KV cache sharing across conversations, while Bruce MacDonald polished the user experience with multiple fixes for model selection and headless systems. The team also…
-
Ollama: Bug Squashing and Launch Improvements
Today we're diving into three solid commits that make Ollama more reliable and feature-rich. The team tackled a sneaky error handling bug in model allocation, improved container compatibility for the launch command, and enhanced web…
-
Ollama: Launch Command Gets a Major Polish
This episode covers 11 merged PRs focused heavily on improving the launch command experience, with significant work from ParthSareen on headless flows and TUI model ordering. The team also tackled cloud integration fixes, Anthropic…
-
Ollama: Spring Cleaning and Performance Gains
The Ollama team delivered a major refactor of the TUI and launch system, removing over 5,000 lines of divergent code paths to make integration testing much easier. Performance enthusiasts will love the MLX improvements that streamlined…
-
Ollama: Thinking Streams and Local Tool Power-ups
The Ollama team delivered three solid improvements focusing on AI streaming capabilities and local model empowerment. ParthSareen tackled the complex challenge of properly splitting mixed thinking streams in OpenAI compatibility, while…
-
Ollama: Stability First - Error Handling and Performance Fixes
The Ollama team focused on stability and reliability with 5 merged pull requests, including crucial MLX error handling improvements and a performance-related revert. Notable contributions came from dhiltgen's error handling hardening,…
-
Ollama: MLX Gets a Major Upgrade and Web Search Goes Live
Daniel Hiltgen led a massive MLX infrastructure overhaul with header vendoring that simplifies the build process, while Parth Sareen introduced experimental web search capabilities. The team also updated ROCm to version 7.2 for Linux and…
-
Ollama: Simplifying the Sampling Story
Patrick Devine merged a significant refactor that streamlines how Ollama's MLX runner handles text generation sampling. The change replaces a complex chain of sampling interfaces with a single, stateful sampler that's much easier to work…
-
Ollama: Cloud Models Get Smarter & Build Performance Boost
Today we're diving into a busy day with 6 merged PRs and 7 commits that brought some major improvements to Ollama! The team tackled cloud model handling, fixed XML parsing issues with GLM models, and made Docker builds way more…
-
Ollama: Cloud Integrations Get Some Love
Today we're diving into a focused day of polish and bug fixes for Ollama's cloud integrations. Parth Sareen led the charge with two substantial PRs fixing model limit lookups and context window handling, while the team also cleaned up…
-
Ollama: Smarter Constraints and Qwen3.5 Boost
Today we're diving into two focused improvements from ParthSareen that make Ollama more flexible and capable. The first loosens thinking level constraints by removing unnecessary validation at the routes level, while the second adds…
-
Ollama: Cloud Integration Drama and AI Model Expansion
The Ollama team had quite an eventful day, with a major cloud integration feature getting merged, reverted, reapplied, and reverted again, showing how careful they are about releases. Meanwhile, they expanded AI model support with Qwen…
-
Ollama: Smarter Sampling and Crash Prevention
Jeffrey Morgan merged two key improvements today: a substantial enhancement to the sampling system with repeat-based sampling capabilities, and a crucial fix preventing crashes in the Qwen3Next model's DeltaNet when using split…
-
Ollama: Building Bridges for Better Model Compatibility
Today we're diving into a fantastic compatibility improvement for the Qwen3next model architecture. Jeffrey Morgan merged a substantial pull request that adds support for imported GGUF models, solving a real user pain point with over 200…
-
Ollama: MLX Runner Gets Rock Solid
Jesse Gross delivered a comprehensive overhaul of the MLX runner with two major pull requests and supporting commits focused on memory management and reliability. The changes include proper memory reporting through `ollama ps`, context…
-
Ollama: Tool Calling Gets Smarter
Four significant pull requests merged today focusing on tool calling improvements and system reliability. Jeffrey Morgan and Parth Sareen led major enhancements to Qwen3 and GLM parsers for better tool calling behavior, while Eva H fixed…
-
Ollama: Cleaner Shutdowns and Faster Startups
Today we're diving into two fantastic merged PRs that make Ollama more reliable and responsive. Jesse Gross tackled a tricky issue with MLX runner request cancellation that could cause background computation to continue and even trigger…
-
Ollama: Qwen 3.5 Architecture Lands with Safety Upgrades
The Ollama team delivered major model architecture updates with full Qwen 3.5 support, including the new 27B parameter variant. Meanwhile, Bruce MacDonald strengthened the foundation with tensor validation improvements that catch sizing…
-
Ollama: Memory Management Revolution
The Ollama team shipped seven major pull requests focused heavily on memory optimization and user experience improvements. Jesse Gross led a complete overhaul of MLX memory management, fixing critical memory leaks and crashes, while Eva…
-
Ollama: Nemotron Architecture Lands with Unified Cache Vision
Jeffrey Morgan merged a massive pull request adding Nemotron architecture support to Ollama, bringing over 3,000 lines of new code across 22 files. This foundational change introduces a unified recurrent cache system that paves the way…
-
Ollama: Fixing the WSL Plugin Problem
Today we're diving into a clever fix for a tricky cross-platform issue that was blocking plugins on Windows Subsystem for Linux. Parth Sareen tackled a security permission problem by moving web search plugin installation from the package…
-
Ollama: Smarter UIs and Smoother Onboarding
The Ollama team shipped three solid improvements focused on user experience: exposing server context length to eliminate duplicate logic in the UI, a comprehensive onboarding flow for openclaw integration, and cleaning up noisy error…
-
Ollama: Tokenizer Consolidation & MLX Library Improvements
The Ollama team merged two significant improvements on February 19th: a major tokenizer consolidation by Patrick Devine that adds a new unified tokenizer package with BPE and SentencePiece support, and an MLX library loading fix by…
-
Ollama: Rolling Back and Rolling Forward
Today's episode covers a classic development story - sometimes you need to take a step back to move forward! The team rolled back MLX bindings due to toolchain compatibility issues, while simultaneously shipping important improvements to…
-
Ollama: Editor Integration Revolution
A massive leap forward in editor integration with 7 merged PRs focusing on improved user experience and MLX model support. Parth Sareen led the charge with enhanced model selection workflows, new CLI support for Cline, and better…
-
Ollama: MLX Display Bug Squashing Day
Patrick Devine had a productive bug-fixing session, tackling two MLX-related issues that were causing display problems and missing functionality. The first fix ensures parameter counts show up correctly when using `ollama show` with MLX…
-
Ollama: MLX Runner Gets Major Model Upgrades
Patrick Devine delivered two significant PRs expanding MLX Runner support with Gemma3 and Llama3 architectures, plus streamlined quantization code. The team also cleaned up documentation with a macOS download link fix, making it a solid…
-
Ollama: MLX Performance Breakthrough and Anthropic Search
The Ollama team delivered some impressive performance wins today with a major MLX runner overhaul that boosts GLM 4.7 Flash performance by 150%, plus enabling web search for Anthropic APIs. Patrick Devine led the MLX improvements while…
-
Ollama: MLX Runner Revolution and Documentation Polish
Today we're diving into a massive infrastructure upgrade with Patrick Devine's new MLX runner implementation, bringing method-based bindings and GLM4-MoE-Lite model support in nearly 15,000 lines of new code. We also saw great community…
-
Ollama: Refactoring Rollercoaster and Developer Experience Wins
The Ollama team had a busy day with 9 merged PRs focusing on major code reorganization and developer experience improvements. Notable highlights include a significant tokenizer refactoring (with a quick revert fix), enhanced Claude Code…
-
Ollama: Bug Squashing Bonanza
Today's episode covers six important fixes that landed in Ollama, including a crucial token counting bug that was shortchanging users by one token, improvements to error messaging for cloud models, and several deep fixes to the Qwen3…
-
Ollama: Smooth Onboarding for New Users
Jeffrey Morgan merged a thoughtful user experience improvement that ensures fresh Ollama installations run through proper onboarding before accessing the gateway. This change adds smart detection to check if users have completed the…
-
Ollama: Polish and Perfectionism - The Art of Getting the Details Right
Today we're diving into three beautifully focused pull requests that show how great software is built through attention to detail. From capitalizing brand names properly to improving user experience with placeholder text, plus updating…
-
Ollama: Cleaning Up the Config Game
Today we're diving into a focused improvement from Gabe Hart that streamlines how Ollama handles API configuration across multiple launch config packages. This small but meaningful change touches five files and shows how thoughtful…
-
Speed Boost and Model Magic
Today we're diving into a fantastic performance boost for Ollama with compiler optimizations and some impressive AI model improvements. Jeffrey and Parth shipped 5 solid PRs focusing on build optimizations, GLM4 model fixes, and better…
-
Memory Magic and Command Makeover
Today brought some serious memory optimization wizardry with MLA absorption for GLM models, though it took a few tries to get the CUDA builds just right! Plus, the team made the CLI more intuitive by renaming `ollama config` to `ollama…
-
Making Ollama Play Nice with Everyone
Today brought some fantastic integration improvements to Ollama! The standout feature is a brand new `ollama config` command that makes it super easy to connect Ollama with popular tools like Claude, Codex, and Droid. Plus, we got some…
-
The Great Cleanup - Manifests Get Their Own Home
Today we're diving into some serious spring cleaning in the Ollama codebase! Patrick Devine led a major refactor moving manifest code to its own directory and consolidating model path handling, while Jeffrey Morgan streamlined the image…
-
New Model Architecture and Image Generation Fixes
Ollama adds support for the LFM2 architecture with the new LFM2.5-1.2B-Thinking model, while also fixing crucial image generation bugs around model path resolution. The team merged 4 pull requests with significant architectural additions…