PostgreSQL

PostgreSQL: Performance Revolution - ARM Acceleration and Smart Index Scanning

Today's PostgreSQL development focused heavily on performance optimization: ARM CRC32C acceleration, AVX2-powered page checksums, and smarter index scanning strategies. Notable contributors include John Naylor (ARM crypto extension and AVX2 work), Peter Geoghegan (index scan improvements), and Etsuro Fujita (a postgres_fdw transaction fix).

Duration: 4:23

https://podlog.io/listen/postgresql-9847372b/episode/postgresql-performance-revolution-arm-acceleration-and-smart-index-scanning-3dcc2ebb

Transcript

Hey there, PostgreSQL developers! Welcome back to another episode of the PostgreSQL podcast. I'm absolutely buzzing with excitement today because we've got some incredible performance work to dive into from April 5th, 2026. Grab your favorite beverage because we're talking about some seriously cool optimizations that are going to make PostgreSQL fly even faster.

So here's what's fascinating about today's activity - we didn't have any merged pull requests, but we had seventeen commits that are absolutely packed with performance goodness. Sometimes the most impactful work happens in these focused development sessions, and today is a perfect example.

Let me start with something that's going to make ARM users very happy. John Naylor has been doing incredible work on hardware acceleration, and today he landed ARM CRC32C support using the ARMv8 cryptography extension. If you're running PostgreSQL on ARM processors that have that extension, you're looking at roughly double the performance for CRC32C calculations on longer inputs. What I love about this implementation is that it includes runtime detection: PostgreSQL checks which instructions the CPU actually supports and automatically uses the fastest available method. It's like having PostgreSQL shift into high gear as soon as it detects a sports car underneath.
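
The runtime-detection pattern can be sketched in C as a function pointer that initially points at a "chooser"; the first call probes the CPU, rebinds the pointer, and every later call goes straight to the selected implementation. This is a minimal sketch, not PostgreSQL's code: `cpu_has_fast_crc32c()` and `crc32c_hw()` are hypothetical placeholders (a real ARM build would query the CPU and use hardware instructions), so here the probe returns false and the portable bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn)(uint32_t crc, const void *data, size_t len);

/* Portable bit-at-a-time CRC-32C using the reflected Castagnoli polynomial. */
static uint32_t crc32c_portable(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = data;

    assert(data != NULL || len == 0);
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical CPU probe; a real build would ask the OS/CPU. Always false here. */
static bool cpu_has_fast_crc32c(void)
{
    return false;
}

/* Hypothetical stand-in for a hardware-accelerated path; same result by definition. */
static uint32_t crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    return crc32c_portable(crc, data, len);
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Starts at the chooser; rebound to the best implementation on first use. */
static crc32c_fn crc32c_impl = crc32c_choose;

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    crc32c_impl = cpu_has_fast_crc32c() ? crc32c_hw : crc32c_portable;
    return crc32c_impl(crc, data, len);
}

/* Standard CRC-32C: initial value and final XOR are both 0xFFFFFFFF. */
static uint32_t crc32c(const void *data, size_t len)
{
    return crc32c_impl(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

With this setup, `crc32c("123456789", 9)` yields the standard CRC-32C check value 0xE3069283, and after that first call `crc32c_impl` no longer points at the chooser, so the detection cost is paid once.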

But John wasn't done there! He also committed AVX2 support for page checksum calculations, and this is where things get really exciting for x86 users. We're talking about several-fold speedups from using 256-bit registers and vector multiplication instead of the older shift-and-add approach. The beauty of this implementation is that the checksum routine is called through a function pointer that gets set on first use, so systems without AVX2 simply keep the portable code path while systems that have it get the gains. This matters especially if you're using io_uring, because in that case checksum computation isn't offloaded to I/O worker processes.
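
What makes page checksums vectorizable is that the page is hashed as many independent lanes rather than one long dependency chain. The scalar sketch below is loosely modeled on the modified FNV-1a scheme with 32 partial sums that PostgreSQL's checksum code is built around, but it is only an illustration: `page_checksum_sketch` is a hypothetical name, the per-lane seeds are made-up placeholders rather than PostgreSQL's real offsets, and details such as excluding the page's own checksum field are omitted, so its output will not match real page checksums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_LANES   32          /* independent partial sums */
#define FNV_PRIME 16777619u

/* FNV-1a style mixing step: xor in the value, then multiply and fold. */
#define MIX(sum, value) \
    do { \
        uint32_t tmp_ = (sum) ^ (value); \
        (sum) = (tmp_ * FNV_PRIME) ^ (tmp_ >> 17); \
    } while (0)

static uint16_t page_checksum_sketch(const uint32_t *words, size_t nwords,
                                     uint32_t blkno)
{
    uint32_t sums[N_LANES];
    uint32_t result = 0;

    assert(nwords % N_LANES == 0);

    /* Hypothetical per-lane seeds (placeholders, not PostgreSQL's values). */
    for (int j = 0; j < N_LANES; j++)
        sums[j] = 0x9E3779B9u * (uint32_t) (j + 1);

    /* Lane j only ever sees words j, j + 32, j + 64, ... */
    for (size_t i = 0; i < nwords; i += N_LANES)
        for (int j = 0; j < N_LANES; j++)
            MIX(sums[j], words[i + j]);

    /* Combine the lanes, fold in the block number, reduce to 1..65535. */
    for (int j = 0; j < N_LANES; j++)
        result ^= sums[j];
    result ^= blkno;
    return (uint16_t) ((result % 65535u) + 1u);
}
```

Because each lane depends only on its own previous value, the inner loop over `j` has no cross-lane dependency. With 256-bit registers a compiler can process eight 32-bit lanes per instruction, and a packed 32-bit multiply replaces what would otherwise have to be emulated.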

Now, let's talk about Peter Geoghegan's work on index scanning, because this is some really thoughtful optimization. He's been refactoring and improving how heapam handles index scans, and today we saw several commits that lay the groundwork for upcoming I/O prefetching. The key insight here is keeping buffer pins across index scan resets. This might sound technical, but it means that nested loop joins that repeatedly rescan an index, and merge joins that frequently restore saved marks, avoid the costly dance of repeatedly pinning and unpinning the same heap pages. It's like keeping your parking spot when you know you'll be back in five minutes.
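
To see why keeping the pin helps, here is a toy model that just counts simulated pin and unpin operations when a join keeps resetting a scan that lands on the same heap block. Everything here (`ScanPinSketch`, `scan_fetch`, `scan_reset`, `simulate_restores`) is hypothetical and only illustrates the bookkeeping; it is not heapam's code or the buffer manager API.

```c
#include <assert.h>
#include <stdbool.h>

#define InvalidBlock (-1)

typedef struct
{
    int  pinned_block;      /* heap block currently pinned, or InvalidBlock */
    int  pin_calls;         /* simulated buffer-manager pin operations */
    int  unpin_calls;       /* simulated buffer-manager unpin operations */
    bool keep_pin_on_reset; /* the optimization being illustrated */
} ScanPinSketch;

static void scan_init(ScanPinSketch *s, bool keep_pin_on_reset)
{
    s->pinned_block = InvalidBlock;
    s->pin_calls = 0;
    s->unpin_calls = 0;
    s->keep_pin_on_reset = keep_pin_on_reset;
}

/* Fetch from heap block `block`, pinning it only if it isn't already pinned. */
static void scan_fetch(ScanPinSketch *s, int block)
{
    assert(block >= 0);
    if (s->pinned_block == block)
        return;                     /* reuse the existing pin */
    if (s->pinned_block != InvalidBlock)
        s->unpin_calls++;           /* release the previously pinned block */
    s->pin_calls++;
    s->pinned_block = block;
}

/* Reset the scan (a rescan or a restore of a saved mark). */
static void scan_reset(ScanPinSketch *s)
{
    if (!s->keep_pin_on_reset && s->pinned_block != InvalidBlock)
    {
        s->unpin_calls++;
        s->pinned_block = InvalidBlock;
    }
}

/* A join that resets the scan n times and lands on the same block each time. */
static void simulate_restores(ScanPinSketch *s, int n, int block)
{
    for (int i = 0; i < n; i++)
    {
        scan_reset(s);
        scan_fetch(s, block);
    }
}
```

For 100 resets on the same block, dropping the pin on every reset costs 100 pins and 99 unpins, while keeping it costs a single pin and no unpins.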

Peter also moved the index scan code into its own dedicated file, which might seem like housekeeping, but it's actually preparing for substantial expansions to the slot-based table AM interface for index scans. Sometimes the best performance work starts with good organization.

There's also a really important fix from Etsuro Fujita for postgres_fdw that addresses transaction consistency between local and remote servers. It ensures that READ ONLY and DEFERRABLE transactions behave consistently whether you're working with local or foreign data. It's one of those fixes that makes the system more predictable and reliable, which is just as important as raw performance.
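
The idea can be illustrated by building the command that starts the remote transaction so that it carries the local transaction's properties. This is a hedged sketch, not the actual patch: `build_remote_begin`, `LocalIsolation`, and `REMOTE_BEGIN_MAXLEN` are hypothetical names. The isolation mapping follows postgres_fdw's documented behavior (SERIALIZABLE remotely when the local transaction is SERIALIZABLE, otherwise REPEATABLE READ); passing READ ONLY and DEFERRABLE through is the behavior the fix is described as providing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define REMOTE_BEGIN_MAXLEN 96  /* hypothetical; longest command below is 72 chars */

typedef enum
{
    XACT_READ_COMMITTED,
    XACT_REPEATABLE_READ,
    XACT_SERIALIZABLE
} LocalIsolation;

/* Build a START TRANSACTION command mirroring the local transaction. */
static void build_remote_begin(char *buf, size_t buflen, LocalIsolation iso,
                               bool read_only, bool deferrable)
{
    assert(buf != NULL && buflen >= REMOTE_BEGIN_MAXLEN);

    strcpy(buf, "START TRANSACTION ISOLATION LEVEL ");
    /* Remote side is never weaker than REPEATABLE READ. */
    strcat(buf, iso == XACT_SERIALIZABLE ? "SERIALIZABLE" : "REPEATABLE READ");
    if (read_only)
        strcat(buf, ", READ ONLY");
    if (deferrable)
        strcat(buf, ", DEFERRABLE");
}
```

A local SERIALIZABLE READ ONLY DEFERRABLE transaction would produce `START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE`. DEFERRABLE only changes behavior when the transaction is also SERIALIZABLE and READ ONLY; otherwise PostgreSQL accepts it and it has no effect.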

And I have to give a shout-out to the work on read streams by Andres Freund. The optimization to only increase read-ahead distance when actually waiting for I/O is brilliant. It avoids the CPU overhead of pinning unnecessary buffers when the I/O subsystem is already keeping up, and prevents wasteful prefetching in scenarios like nested loop joins where you might not consume the entire stream.
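
The difference between growing read-ahead on every cache miss and growing it only when the consumer actually waits can be shown with a small distance controller. This is a simplified sketch under stated assumptions: `LookaheadSketch`, `lookahead_update`, and the doubling-with-a-cap growth rule are hypothetical illustrations, not the read stream implementation.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    GROW_ON_MISS,   /* older style: grow whenever a read had to be issued */
    GROW_ON_WAIT    /* newer style: grow only when we actually blocked on I/O */
} GrowPolicy;

typedef struct
{
    int        distance;      /* how many blocks to keep pinned/in flight ahead */
    int        max_distance;  /* hypothetical upper bound */
    GrowPolicy policy;
} LookaheadSketch;

static void lookahead_init(LookaheadSketch *s, int max_distance, GrowPolicy policy)
{
    s->distance = 1;
    s->max_distance = max_distance;
    s->policy = policy;
}

/*
 * Called as each block is consumed.
 * needed_io: the block was not cached, so a read had to be issued.
 * waited:    the consumer actually blocked waiting for that read.
 */
static void lookahead_update(LookaheadSketch *s, bool needed_io, bool waited)
{
    bool grow;

    assert(!waited || needed_io);   /* can't wait on I/O that was never issued */
    grow = (s->policy == GROW_ON_MISS) ? needed_io : waited;
    if (grow)
    {
        int doubled = s->distance * 2;
        s->distance = doubled > s->max_distance ? s->max_distance : doubled;
    }
}
```

If storage keeps up (every block needs I/O but the consumer never waits), ten updates push the miss-driven controller to its cap of 16 while the wait-driven one stays at 1, avoiding pins on buffers that may never be consumed. Once real waits appear, the wait-driven controller ramps up too: three waits take it from 1 to 8.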

What stands out today is how these performance optimizations work together. The ARM and AVX2 improvements give us better hardware utilization, the index scanning changes prepare us for smarter I/O patterns, and the read stream optimizations ensure we're not wasting resources. If you're working on performance-critical PostgreSQL deployments, keep an eye on these changes as they make their way through the development cycle.

That's a wrap for today's episode! The PostgreSQL community continues to push the boundaries of what's possible with database performance. Until next time, keep coding, keep optimizing, and remember - every commit gets us closer to an even better PostgreSQL. See you tomorrow!