PyTorch

PyTorch: Complex Math Gets Smarter & Build Improvements

Today we're covering 3 merged pull requests and 30 additional commits that bring some really solid improvements to PyTorch. The standout changes include better handling of conjugated tensors in matrix operations, build timeout fixes for ROCm users, and some nice performance optimizations in the inductor. Plus we've got Angela Yi making enums more flexible and several contributors tackling edge cases across the codebase.

Duration: PT4M20S

https://podlog.io/listen/pytorch-2496be96/episode/pytorch-complex-math-gets-smarter-build-improvements-eb4677e1

Transcript

Hey there, PyTorch developers! Welcome back to another episode. I'm your host, and wow, what a productive day March 21st was for the PyTorch community. We've got 3 merged pull requests and 30 additional commits that are really going to make your development experience smoother. Grab your coffee, and let's dive into what the team has been cooking up.

Let's start with our merged PRs, because these are the changes that are going live and ready for you to benefit from right now.

First up, we've got a really important fix for anyone working with complex numbers and matrix operations. The MPS backend now properly handles conjugated tensors in matrix multiplication. This might sound super technical, but if you've ever worked with complex-valued neural networks or any kind of signal processing, this is huge. The fix came through from pytorchbot and touches both the `bmm` and `addmm` operations. What I love about this change is that it includes regression tests, so we're not just fixing the bug, we're making sure it stays fixed. It also resolves a specific issue with SVD operations on complex64 tensors that was causing headaches.
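For anyone following along in text, here is a minimal pure-Python sketch of what can go wrong with a conjugated tensor, assuming conjugation is tracked as a flag on the data rather than applied eagerly. `LazyMatrix`, `matmul`, `matmul_ignoring_flag`, and `bmm` are hypothetical names made up for illustration; this is not the MPS kernel code or the actual patch.

```python
from dataclasses import dataclass


@dataclass
class LazyMatrix:
    """Hypothetical matrix whose conjugation is a flag, not a copy."""
    rows: list          # list of rows of complex numbers
    conj: bool = False  # True means "read every element as its conjugate"

    def conjugate(self):
        # Flip the flag and share the stored values (no data is rewritten).
        return LazyMatrix(self.rows, not self.conj)

    def at(self, i, j):
        value = self.rows[i][j]
        return value.conjugate() if self.conj else value


def matmul(a, b):
    """Matrix product that honors each operand's conjugation flag."""
    n, k, m = len(a.rows), len(b.rows), len(b.rows[0])
    return [[sum(a.at(i, p) * b.at(p, j) for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_ignoring_flag(a, b):
    """The kind of bug a regression test should catch: raw storage is read."""
    n, k, m = len(a.rows), len(b.rows), len(b.rows[0])
    return [[sum(a.rows[i][p] * b.rows[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def bmm(batch_a, batch_b):
    """Batched product: one matmul per pair of matrices."""
    return [matmul(a, b) for a, b in zip(batch_a, batch_b)]


a = LazyMatrix([[1 + 2j, 3 - 1j]])
b = LazyMatrix([[1j], [2 + 0j]])
correct = matmul(a.conjugate(), b)
wrong = matmul_ignoring_flag(a.conjugate(), b)
```

In this sketch `correct` and `wrong` differ, which is exactly the kind of silent mismatch a regression test comparing against an eagerly conjugated input would flag.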

Next, we've got some infrastructure love for our ROCm users. If you're running PyTorch on AMD hardware, you've probably noticed some build timeouts lately. Well, the team bumped up the timeout values for both libtorch and manywheel builds. It's not the most glamorous change, but these are exactly the kinds of quality-of-life improvements that make the whole ecosystem more reliable.

And our third merged PR tackles FakeProcessGroup, a stand-in process group that lets you exercise distributed code without real communication between processes. The fix handles uneven output tensor sizes in allgather operations. If you're testing distributed training code and running into weird edge cases, this one might just save your day.
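To picture what "uneven output sizes" means, here is a small single-process simulation of an all-gather in which each rank contributes a different number of elements. The function name `all_gather_uneven` and the list-based buffers are assumptions for illustration only; this does not reflect how FakeProcessGroup is implemented.

```python
def all_gather_uneven(inputs_by_rank):
    """Simulate an all-gather where rank i contributes len(inputs_by_rank[i]) items.

    Returns one entry per rank; each entry is that rank's list of output
    buffers, one buffer per peer, sized to match that peer's contribution.
    """
    world_size = len(inputs_by_rank)
    sizes = [len(chunk) for chunk in inputs_by_rank]
    outputs_by_rank = []
    for _ in range(world_size):
        # Output buffers are allocated per peer, so they may differ in length.
        buffers = [[None] * n for n in sizes]
        for src, chunk in enumerate(inputs_by_rank):
            buffers[src][:] = chunk
        outputs_by_rank.append(buffers)
    return outputs_by_rank


gathered = all_gather_uneven([[1, 2, 3], [4], [5, 6]])
```

Code that assumes every output buffer has the same length as the local input would break on the second and third buffers here, which is the edge case in question.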

Now, let me tell you about some of the standalone commits that really caught my attention. Angela Yi has been absolutely crushing it with not one, but two significant contributions. First, she made enums opaque by default, which means you can now use `make_fx` with enum inputs. This opens up a lot more flexibility in how you structure your programs. Then she optimized the split_module function to support lazy recompilation, which is saving hundreds of milliseconds for graphs with lots of partitions. When you're iterating fast, those milliseconds really add up.
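The lazy recompilation idea can be sketched in a few lines of plain Python, assuming "lazy" means edits only mark the generated code as stale and the rebuild happens on first use. `LazyModule`, `add_op`, and `compile_count` are hypothetical names for this illustration; they are not part of `split_module` or any PyTorch API.

```python
class LazyModule:
    """Hypothetical module whose callable is rebuilt only when it is needed."""

    def __init__(self, ops=()):
        self.ops = list(ops)
        self._fn = None         # compiled callable, built on demand
        self.compile_count = 0  # how many times the rebuild actually ran

    def add_op(self, op):
        self.ops.append(op)
        self._fn = None  # mark stale instead of recompiling right away

    def _compile(self):
        self.compile_count += 1
        ops = tuple(self.ops)

        def run(x):
            for op in ops:
                x = op(x)
            return x

        return run

    def __call__(self, x):
        if self._fn is None:
            self._fn = self._compile()
        return self._fn(x)


module = LazyModule()
module.add_op(lambda x: x + 1)
module.add_op(lambda x: x * 2)
module.add_op(lambda x: x - 3)
compiles_before_first_call = module.compile_count
first_result = module(1)
second_result = module(5)
```

Three edits trigger a single rebuild instead of three, and with many partitions that saved work is where the time goes.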

Eddie Yan fixed an attention mask conversion issue in cuDNN that was causing problems with scaled dot-product attention. And here's what I love about this fix - the pull request shows some great collaboration between Eddie and the community, with multiple approvals and even a co-author credit. That's the kind of teamwork that makes open source beautiful.

We've also got some nice performance work from eellison, who deferred input size and stride assertions until they're actually needed. Instead of doing all these checks upfront, the system now waits until just before the first kernel that actually reads the input. It's a simple change that reduces CPU overhead, and those kinds of optimizations compound over time.
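As a rough picture of that deferral, here is a pure-Python sketch in which an input's size check runs the first time something reads it, not when the input is handed over. `DeferredInput`, `sum_kernel`, and `checks_run` are made-up names, and a list length stands in for the real size and stride assertions; this is not Inductor's generated code.

```python
class DeferredInput:
    """Hypothetical input wrapper that validates itself on first read."""

    def __init__(self, name, values, expected_len):
        self.name = name
        self.values = values
        self.expected_len = expected_len
        self._checked = False
        self.checks_run = 0

    def read(self):
        if not self._checked:
            # The check is paid once, just before the first consumer runs.
            self.checks_run += 1
            if len(self.values) != self.expected_len:
                raise ValueError(
                    f"{self.name}: expected length {self.expected_len}, "
                    f"got {len(self.values)}"
                )
            self._checked = True
        return self.values


def sum_kernel(inp):
    """Stand-in for the first kernel that actually reads the input."""
    return sum(inp.read())


x = DeferredInput("x", [1, 2, 3], expected_len=3)
checks_before_use = x.checks_run
total = sum_kernel(x)
total_again = sum_kernel(x)
```

One trade-off in this sketch is that a malformed input nobody reads is never reported, so the error surfaces later than it would with upfront checks.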

The testing and infrastructure improvements are worth celebrating too. We had a revert of a Dynamo test detection feature that wasn't quite ready, and sometimes the best move is knowing when to step back and get something right. There were also several typing and documentation improvements to the pytree modules.

Today's focus for all of you: if you're working with complex numbers in your models, definitely test out that MPS conjugated tensor fix. And if you're doing distributed training, keep an eye on how that FakeProcessGroup improvement affects your workflows. For anyone using enums in their PyTorch programs, Angela's changes might unlock some new patterns you haven't considered before.

What I'm seeing in today's changes is a really healthy mix of performance improvements, bug fixes, and infrastructure hardening. The PyTorch team is clearly listening to the community and addressing real pain points while keeping an eye on the future.

That's a wrap on today's episode! Keep building amazing things, and I'll catch you tomorrow with more updates from the PyTorch world. Until then, happy coding!