LangChain

LangChain: Multimodal Token Counting Gets Smarter

Today we're diving into a game-changing improvement to LangChain's token counting system that finally handles images properly! The star of the show is jiaming2li's multimodal support enhancement that fixes a major overestimation issue, plus we've got some helpful middleware exports and quality-of-life improvements from Mason Daugherty.

Duration: 3 min 55 s

https://podlog.io/listen/langchain-3d585e97/episode/langchain-multimodal-token-counting-gets-smarter-c722020b

Transcript

Hey there, brilliant builders! Welcome back to another episode of the LangChain podcast. I'm your host, and I am genuinely excited to dig into what's been happening in the codebase. Grab your favorite beverage because we've got some really thoughtful improvements to talk about today.

So let's jump right into the main event - and this one's a real problem-solver. Our contributor jiaming2li just merged a fantastic PR that tackles something I bet many of you have run into: wildly inaccurate token counting when working with images.

Here's the story - you know how LangChain has this `count_tokens_approximately` function that helps you estimate token usage? Well, it had a pretty significant blind spot when it came to multimodal content. When you fed it base64-encoded images, it treated the encoded data like regular text and counted each image as roughly twenty-five thousand tokens! That's a massive overestimate, and it was causing some serious headaches.

The real pain point was that this overestimation was breaking context management tools like `trim_messages`. Imagine you're building a chatbot that handles images, and every single image is being counted as if it's a short novel. Your context trimming logic would go haywire, thinking you're way over your token limits when you're actually nowhere close.
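To make that failure mode concrete, here is a minimal sketch of budget-based trimming. It is not LangChain's `trim_messages` implementation; the helper `trim_to_budget`, the 8,000-token budget, and the per-message token counts are all hypothetical and chosen only for illustration.

```python
def trim_to_budget(token_counts, max_tokens):
    """Keep the most recent messages whose estimated tokens fit the budget.

    token_counts: estimated tokens per message, oldest first (hypothetical data).
    Returns the indices of the messages that survive trimming.
    """
    kept = []
    used = 0
    # Walk backwards from the newest message and stop at the first one
    # that would push the running total over the budget.
    for index in range(len(token_counts) - 1, -1, -1):
        cost = token_counts[index]
        if used + cost > max_tokens:
            break
        kept.append(index)
        used += cost
    return sorted(kept)


# Four messages; the second one carries an image.
inflated = [40, 25_000, 60, 30]   # image counted as base64 text
realistic = [40, 85, 60, 30]      # image counted at a fixed cost

print(trim_to_budget(inflated, max_tokens=8_000))   # [2, 3]
print(trim_to_budget(realistic, max_tokens=8_000))  # [0, 1, 2, 3]
```

With the inflated estimate, the image message and everything older than it are dropped even though the conversation is nowhere near the budget.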

jiaming2li's solution is beautifully pragmatic. Instead of trying to tokenize base64 image data as text, the updated function now uses a fixed penalty of about 85 tokens per image. This is so much more realistic and gives you a far better approximation for proper context management. They added a new `tokens_per_image` parameter that defaults to this sensible value, but you can adjust it if needed.
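The idea of a fixed per-image cost can be sketched in a few lines. This is a standalone illustration, not the code from the PR: the function names `naive_tokens` and `approx_tokens`, the four-characters-per-token heuristic, and the content-block layout are assumptions. Only the figure of 85 tokens per image and the `tokens_per_image` parameter name come from the episode.

```python
import base64
import math

CHARS_PER_TOKEN = 4      # assumed rough heuristic for text
TOKENS_PER_IMAGE = 85    # fixed per-image cost mentioned in the episode


def naive_tokens(blocks):
    """Old-style estimate: treat everything, including base64 data, as text."""
    chars = 0
    for block in blocks:
        if block["type"] == "text":
            chars += len(block["text"])
        else:
            chars += len(block["image_url"]["url"])
    return math.ceil(chars / CHARS_PER_TOKEN)


def approx_tokens(blocks, tokens_per_image=TOKENS_PER_IMAGE):
    """Sketch of the new idea: text by length, images at a flat cost."""
    total = 0
    for block in blocks:
        if block["type"] == "text":
            total += math.ceil(len(block["text"]) / CHARS_PER_TOKEN)
        else:
            # The size of the encoded payload is ignored entirely.
            total += tokens_per_image
    return total


# 75,000 zero bytes stand in for an image; base64 expands them to 100,000 chars.
payload = base64.b64encode(bytes(75_000)).decode()
message = [
    {"type": "text", "text": "What is in this picture?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64," + payload}},
]

print(naive_tokens(message))   # a bit over 25,000
print(approx_tokens(message))  # 91 (6 for the text + 85 for the image)
```

Passing a different `tokens_per_image` value changes only the image term, which is the kind of knob the episode describes for models that charge more or less per image.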

What I love about this change is that it's solving a real developer pain point with a thoughtful, measured approach. The PR includes solid test coverage too - 65 new lines of tests to make sure this behavior stays reliable.

Now, speaking of thoughtful contributions, Mason Daugherty has been busy with a couple of helpful updates. First up, there's a small but important addition to the middleware exports. They've added `ToolCallRequest` to the exports in the agents middleware module. This might seem tiny, but having the right things exported in the right places makes your development experience so much smoother. No more hunting around for imports or having to dig into internal modules.

Mason also tackled some code quality improvements in what they modestly called "nits" - those small cleanup items that keep a codebase healthy. These kinds of maintenance commits might not be glamorous, but they're the foundation of a project that's pleasant to work with long-term.

You know what strikes me about today's changes? They're all about making LangChain work better for the real scenarios you're building. Handling images properly, making imports more intuitive, keeping the code clean - these aren't flashy features, but they're the kind of improvements that remove friction from your daily development work.

Let's talk about today's focus. If you're working with multimodal applications, this token counting improvement is going to be a game-changer for you. Take a look at how you're currently handling context management with images. You might find that you can keep far more conversation history in your context window now that images are no longer counted at an inflated size.

And if you're building agents or working with tools, definitely check out that new `ToolCallRequest` export. Clean imports lead to cleaner code, and cleaner code leads to fewer headaches down the road.

Before we wrap up, I want to give a huge shoutout to jiaming2li for tackling that multimodal token counting issue. It's contributors like you who make LangChain better for everyone. And thanks to Mason for the steady stream of thoughtful improvements.

That's a wrap for today's episode! Keep building amazing things, and remember - every small improvement in your development experience adds up to something bigger. Catch you next time!