Ollama: Tokenizer Consolidation & MLX Library Improvements
The Ollama team merged two significant improvements on February 19th: a major tokenizer consolidation by Patrick Devine that adds a new unified tokenizer package with BPE and SentencePiece support, and an MLX library loading fix by natl-set that improves compatibility with package managers like Homebrew. These changes streamline the codebase while making Ollama more accessible to developers.
Duration: PT4M13S
Transcript
Hey there, fellow developers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have some exciting updates to dive into today. Grab your favorite morning beverage because we're talking about some really solid improvements that landed in the codebase yesterday.
So February 19th was quite the day for the Ollama project. We saw two fantastic pull requests get merged that are going to make life better for both the core team and all of us using Ollama in our projects. Let me paint you the picture of what went down.
First up, and this is the big one, folks, Patrick Devine just delivered what I'm calling a masterclass in code organization. PR 14327 is all about consolidating the tokenizer, and when I say this is substantial, I mean it – we're talking about 1,807 lines added across 18 files. But here's the beautiful part: this isn't just throwing more code at the problem. This is smart, intentional refactoring.
Patrick created a brand new tokenizer package that brings together BPE and SentencePiece tokenizers under one roof. Think of it like cleaning out your toolshed and organizing everything so you can actually find what you need. The old system had dependencies scattered around, especially with those imagegen tokenizers, and now we're moving toward something much cleaner and more maintainable.
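To make the "one roof" idea concrete, here is a minimal Go sketch of how two tokenizer algorithms could sit behind a single interface. Everything in it is an assumption for illustration: the `Tokenizer` interface, the `vocab` helper, and the `toyBPE` and `toySPM` types are hypothetical names, not the API of the package in the PR. The second type borrows only SentencePiece's "▁" word-boundary marker and uses greedy longest-match in place of a real segmentation algorithm.

```go
package main

import (
	"fmt"
	"strings"
)

// Tokenizer is a hypothetical shared interface, not Ollama's actual API.
type Tokenizer interface {
	Encode(text string) []int
	Decode(ids []int) string
}

// vocab maps pieces to ids and back; id 0 is reserved for unknown pieces.
type vocab struct {
	pieces []string
	ids    map[string]int
}

func newVocab(pieces ...string) vocab {
	v := vocab{pieces: append([]string{"<unk>"}, pieces...), ids: map[string]int{}}
	for i, p := range v.pieces {
		v.ids[p] = i
	}
	return v
}

// lookup converts pieces to ids; a missing piece yields 0, the <unk> id.
func (v vocab) lookup(symbols []string) []int {
	out := make([]int, len(symbols))
	for i, s := range symbols {
		out[i] = v.ids[s]
	}
	return out
}

// Decode concatenates pieces. It assumes the ids came from Encode.
func (v vocab) Decode(ids []int) string {
	var b strings.Builder
	for _, id := range ids {
		b.WriteString(v.pieces[id])
	}
	return b.String()
}

// toyBPE starts from single characters and keeps merging the adjacent
// pair with the lowest rank until no known pair is left.
type toyBPE struct {
	vocab
	ranks map[[2]string]int
}

func (t toyBPE) Encode(text string) []int {
	symbols := strings.Split(text, "")
	for {
		at, best := -1, 0
		for i := 0; i+1 < len(symbols); i++ {
			r, ok := t.ranks[[2]string{symbols[i], symbols[i+1]}]
			if ok && (at < 0 || r < best) {
				at, best = i, r
			}
		}
		if at < 0 {
			return t.lookup(symbols)
		}
		symbols[at] += symbols[at+1]
		symbols = append(symbols[:at+1], symbols[at+2:]...)
	}
}

// toySPM borrows only the "▁" word-boundary marker from SentencePiece;
// greedy longest-match stands in for the real segmentation algorithm.
type toySPM struct{ vocab }

func (t toySPM) Encode(text string) []int {
	s := "▁" + strings.ReplaceAll(text, " ", "▁")
	var ids []int
	for len(s) > 0 {
		end := len(s)
		for end > 1 && t.ids[s[:end]] == 0 {
			end--
		}
		ids = append(ids, t.ids[s[:end]]) // 0 (<unk>) if not even one byte matched
		s = s[end:]
	}
	return ids
}

func (t toySPM) Decode(ids []int) string {
	joined := t.vocab.Decode(ids)
	return strings.TrimPrefix(strings.ReplaceAll(joined, "▁", " "), " ")
}

func main() {
	var bpe Tokenizer = toyBPE{
		vocab: newVocab("l", "o", "w", "lo", "low"),
		ranks: map[[2]string]int{{"l", "o"}: 0, {"lo", "w"}: 1},
	}
	var spm Tokenizer = toySPM{vocab: newVocab("▁hello", "▁world")}
	a, b := bpe.Encode("lowlow"), spm.Encode("hello world")
	fmt.Println(a, bpe.Decode(a))
	fmt.Println(b, spm.Decode(b))
}
```

The point of the sketch is that callers in `main` only ever see `Tokenizer`, so a further algorithm could be added later without touching them.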
What really gets me excited about this change is the attention to detail. Patrick didn't just move code around – they fixed multibyte decoding issues in the pipeline and added proper benchmark tests. You know what that means? Tokenizer performance that can actually be measured and tracked, and fewer weird edge cases that make you scratch your head at 2 AM wondering why your text isn't processing correctly.
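The episode doesn't spell out what the multibyte bug was, so treat the following as an illustration of one common class of problem rather than the actual fix. When tokens carry raw bytes, a single UTF-8 character can be split across two tokens, and decoding each token on its own produces broken text. The Go sketch below, with a hypothetical `streamDecoder` type, holds back an incomplete trailing character until the rest of its bytes arrive.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// streamDecoder is a hypothetical helper: it buffers token bytes and only
// emits text that ends on a complete UTF-8 character.
type streamDecoder struct {
	pending []byte
}

// Write appends the raw bytes of one token and returns the text that is
// now safe to emit. An unfinished trailing character stays buffered.
func (d *streamDecoder) Write(tokenBytes []byte) string {
	d.pending = append(d.pending, tokenBytes...)
	end := len(d.pending)
	// Walk back over at most one character's worth of bytes to find the
	// start of the last character, then check whether it is complete.
	for i := 1; i <= utf8.UTFMax && i <= len(d.pending); i++ {
		start := len(d.pending) - i
		if utf8.RuneStart(d.pending[start]) {
			if !utf8.FullRune(d.pending[start:]) {
				end = start // hold the partial character back
			}
			break
		}
	}
	out := string(d.pending[:end])
	d.pending = append(d.pending[:0], d.pending[end:]...)
	return out
}

func main() {
	d := &streamDecoder{}
	// "é" is the two bytes 0xC3 0xA9, delivered here in separate tokens.
	for _, tok := range [][]byte{[]byte("caf"), {0xC3}, {0xA9}} {
		fmt.Printf("%q\n", d.Write(tok))
	}
}
```

A real decoder would also need a way to flush whatever is still buffered at end of stream; that is left out to keep the sketch short.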
Now, Patrick mentioned they're not done yet. The WordPiece tokenizer for BERT models is coming when they add embedding model support, and those old imagegen tokenizers are getting the boot in a follow-up PR. I love seeing this kind of methodical, phased approach to refactoring. It's like renovating your house one room at a time instead of tearing down all the walls at once.
The second PR might seem smaller at first glance, but trust me, it's one of those changes that's going to save so many people from pulling their hair out. natl-set tackled a really annoying issue with MLX library loading on macOS. You know that frustrating moment when you install something via Homebrew, everything should work, but then Ollama can't find the MLX libraries even though they're sitting right there?
Well, that's history now. The problem was that the existing code was being a bit too aggressive, manually searching directories and bypassing the system's rpath entirely. natl-set added a simple but elegant solution – try loading via rpath first, using just the library name like the system expects, and only fall back to directory searching if that doesn't work.
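The load order described here can be sketched in Go as follows. This is not the code from the PR: `loadLibrary`, the `opener` type, and the library name `libexample.dylib` are all hypothetical, and the real platform call (which would go through the system's dynamic loader) is abstracted behind a function so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// opener stands in for the platform's dynamic loader call. It is an
// assumption of this sketch so the example runs without cgo.
type opener func(path string) error

var errNotFound = errors.New("not found")

// loadLibrary tries the bare library name first, so the system's own
// search rules get the first chance, and only then falls back to joining
// the name onto each explicit search directory.
func loadLibrary(name string, searchDirs []string, open opener) (string, error) {
	candidates := []string{name}
	for _, dir := range searchDirs {
		candidates = append(candidates, filepath.Join(dir, name))
	}
	var errs []error
	for _, c := range candidates {
		if err := open(c); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c, err))
			continue
		}
		return c, nil
	}
	return "", errors.Join(errs...)
}

func main() {
	// A fake loader that only knows the library by its bare name, as if
	// the system search path had resolved it.
	systemFinds := func(path string) error {
		if path == "libexample.dylib" {
			return nil
		}
		return errNotFound
	}
	got, err := loadLibrary("libexample.dylib", []string{"/opt/fallback"}, systemFinds)
	fmt.Println(got, err)
}
```

Keeping the directory search as a fallback, rather than removing it, means setups that relied on the old behavior keep working.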
This is exactly the kind of fix that makes me appreciate good software engineering. It's following standard Unix and macOS conventions, which means it plays nicely with package managers and respects how developers actually install and manage their dependencies. No more fumbling around with the OLLAMA_LIBRARY_PATH environment variable just to get things working.
What I love about both of these changes is they're making Ollama more accessible and reliable. The tokenizer consolidation makes the codebase easier to understand and contribute to, while the MLX fix removes friction for developers getting started, especially on macOS with Homebrew installations.
Today's focus should be on testing these improvements if you're running the latest build. Try out those tokenization features, see how the performance feels, and if you're on macOS, check if your MLX setup is smoother now. And hey, if you've been thinking about contributing to Ollama, this cleaner tokenizer architecture might be a great entry point to understand how text processing works under the hood.
That's a wrap for today's episode! Keep building amazing things, and remember – great software is built one thoughtful commit at a time. Catch you tomorrow!