PostgreSQL

PostgreSQL: Query Optimizer Gets Smarter with NOT IN Magic

Today's PostgreSQL update brings a major query optimization breakthrough with Richard Guo's work converting NOT IN subqueries to anti-joins when safe, plus important improvements to CHECK constraint management and buffer handling. We also see streaming read optimizations for VACUUM operations and several buffer management refinements that collectively boost performance across the board.

Duration: PT3M56S

https://podlog.io/listen/postgresql-9847372b/episode/postgresql-query-optimizer-gets-smarter-with-not-in-magic-db6109f0

Transcript

Hey there, PostgreSQL enthusiasts! Welcome back to another episode of our daily dive into the world's most advanced open source database. I'm your host, and wow, do we have some exciting stuff to unpack today from March 12th.

You know that feeling when you write a query with NOT IN and it just... crawls? Well, Richard Guo just gave us a Christmas present in March with a absolutely brilliant optimization. He's taught the PostgreSQL query planner how to convert "NOT IN" subqueries into anti-joins when it's safe to do so.

Now, this might sound like database wizardry, but here's the story in simple terms. Traditionally, when you write something like "SELECT customers WHERE id NOT IN (SELECT customer_id FROM banned_users)", PostgreSQL has had to handle this very conservatively because of how SQL handles NULL values. The planner couldn't optimize it the way it wanted to because NOT IN and anti-joins behave differently when NULLs are involved.

But Richard figured out something clever - if we can prove that neither side of the comparison can be NULL, and the operator itself won't return NULL, then NOT IN and anti-joins behave identically. This unlocks the planner to treat your subquery as a first-class citizen in join ordering optimization. The result? Potentially massive performance improvements for large datasets. This is the kind of under-the-hood improvement that makes your existing queries faster without you changing a single line of code.

Speaking of making things better, Jian He, with fantastic review work from the community, added support for altering CHECK constraint enforceability. You can now use ALTER TABLE to make CHECK constraints enforced or not enforced, just like you could with foreign keys. When you switch from NOT ENFORCED to ENFORCED, PostgreSQL thoughtfully validates your existing data to make sure everything still makes sense. It's these kinds of quality-of-life improvements that make database administration so much smoother.

On the performance front, we're seeing some serious I/O optimizations. The buffer management system got multiple improvements from Andres Freund and others. There's smarter LSN handling that avoids unnecessary locking on architectures that can read 8 bytes atomically, plus cleanup of some obsolete buffer state tracking. These might seem like small details, but they add up to real performance gains.

And here's something that caught my eye - Michael Paquier and Xuneng Zhou have been on a mission to optimize VACUUM operations using streaming reads. Today we see this applied to both bloom and GIN indexes. Xuneng reported a 5x improvement in runtime for GIN vacuum cleanup under certain conditions, and about 30% better performance for bloom indexes. When VACUUM runs faster, your database stays healthier with less impact on your application.

There's also some great developer experience improvements. Michael Paquier documented those handy environment variables for path substitution in SQL tests - you know, those PG_ABS_SRCDIR and friends that make testing extensions so much easier. It's the kind of documentation that saves developers hours of head-scratching.

Today's Focus: If you're running queries with NOT IN subqueries on large datasets, this optimizer improvement could be a game-changer for you. Consider reviewing your query patterns and maybe running some before-and-after benchmarks when you upgrade. And if you've been wanting more flexibility with CHECK constraints, the new enforceability controls give you exactly that.

The PostgreSQL community continues to impress me with this combination of deep technical improvements and practical developer experience enhancements. Every commit here represents someone caring deeply about making the database better for all of us.

That's a wrap for today's episode! Keep coding, keep learning, and remember - every query you write benefits from the incredible work these contributors are doing. Until tomorrow, happy developing!