Go

SIMD Gets Smarter - CPU Feature Detection Overhaul

Austin Clements led a major cleanup of Go's SIMD CPU feature detection system, fixing which CPU features FMA operations require and streamlining feature requirements across the board. Also covered: clearer naming for the runtime's memory allocation functions and new ML-DSA support in the FIPS 140-3 functional tests.

Duration: PT4M5S

https://podlog.io/listen/go-e282e2e6/episode/simd-gets-smarter-cpu-feature-detection-overhaul-742c1ffa

Transcript

Hey there, Gophers! Welcome back to another episode of the Go podcast. I'm your host, and wow, what a fascinating day in the Go codebase! Grab your favorite beverage because we're diving into some really clever optimizations and architectural improvements that happened on January 14th.

So today we had no merged pull requests, but don't let that fool you - we've got four absolutely fantastic commits that tell a really compelling story about making Go's SIMD operations smarter and more efficient.

The star of today's show is definitely Austin Clements, who's been doing some incredible work on our SIMD CPU feature detection system. Austin tackled a really interesting problem that I think showcases the kind of detective work that makes compiler engineering so fascinating.

Here's the story: Austin discovered that Go was being overly cautious with FMA operations - that's Fused Multiply-Add for those keeping track. The system was requiring AVX-512 support even for the smaller 128-bit and 256-bit operations, which should work just fine on AVX-class hardware without AVX-512. It turns out this was happening because of a quirky naming convention in Intel's XED instruction database, where FMA operations don't have "AVX" in their extension name, even though they're totally part of the AVX family.

What I love about this fix is that it's not just about correctness - it actually deleted a ton of generated code! We're talking about removing unnecessary AVX-512 encodings that were cluttering up the system. There's something deeply satisfying about a change that both fixes a bug and makes the codebase cleaner at the same time.

Austin didn't stop there though. In a follow-up commit, he completely revamped how we handle feature implications in the SIMD system. This is the kind of foundational work that makes everything else better. Now we have a proper table showing which CPU features imply which other features, and the documentation is much clearer about what requirements each operation actually has.

One particularly neat improvement involves the AVXAES feature. Previously, developers had to manually check for both AVX and AES support separately, which was kind of clunky and forced the SIMD API to expose AES checks that weren't really SIMD-related. Now there's a clean AVXAES feature check function that handles all of that complexity behind the scenes.
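
The shape of that change might look something like the sketch below. The cpuFeatures type and its fields are hypothetical stand-ins, not the real API, and the flags are hard-coded here where real code would read them from the CPU.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for detected CPU capabilities.
// Real code would fill these in from CPUID rather than from literals.
type cpuFeatures struct {
	avx bool
	aes bool
}

// AVXAES reports whether both AVX and AES are available, so callers
// make one check instead of testing the two features separately.
func (c cpuFeatures) AVXAES() bool {
	return c.avx && c.aes
}

func main() {
	cpu := cpuFeatures{avx: true, aes: false}
	if cpu.AVXAES() {
		fmt.Println("using the AVX AES path")
	} else {
		fmt.Println("falling back to the generic path")
	}
}
```

The point of the combined check is that callers never see a bare AES flag, which keeps non-SIMD details out of the SIMD-facing surface.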

We also saw Michael Matloob doing some housekeeping in the runtime, renaming the mallocTiny functions to mallocgcTinySize. Now, this might sound like a simple rename, but it's actually part of a larger effort to make the codebase more navigable. By having all memory allocation functions start with "mallocgc", developers can quickly identify and understand the allocation landscape. It's these kinds of thoughtful naming conventions that make large codebases maintainable.

And Filippo Valsorda added ML-DSA support to our FIPS 140-3 functional tests. This is exciting because ML-DSA is the post-quantum digital signature scheme that NIST standardized as FIPS 204, and having robust testing infrastructure in place shows Go's commitment to staying ahead of the cryptographic curve.

What really strikes me about today's changes is how they all contribute to making Go more efficient and developer-friendly without breaking anything. Austin's work means narrower FMA operations no longer demand AVX-512, and there's less generated code to carry around. Michael's renaming makes the codebase easier to navigate. Filippo's test additions ensure we're ready for the cryptographic future.

Today's Focus: If you're working with SIMD operations in Go, this is a great time to review your CPU feature detection logic. The new feature implication system might simplify some of your code. And if you're doing any cryptographic work, keep an eye on the evolving post-quantum standards - it's an exciting time to be in this space.

That's a wrap for today! These kinds of behind-the-scenes improvements might not be flashy, but they're the foundation that makes everything else possible. Keep coding, keep learning, and I'll see you tomorrow with more Go adventures!