TL;DR: I’m not entirely sure yet whether my CPU is completely hosed and I’ll need a new one soon, but I was running into some insane errors while compiling Rust recently. Turns out my CPU can’t compile Rust reliably when it’s thermal throttled.
I’ve been on this PC build for a little over a year now. The CPU is an Intel i9-13900K. It’s also a custom build. Now it is a bit of my fault that I didn’t really observe CPU temperatures after getting it up and running beyond some basic tests. I think I was more concerned with GPU temps at the time.
Then one day I started getting errors like these:
thread 'optimize module godot_bindings.341dabc26bea8f32-cgu.3' has overflowed its stack
And then rustc would randomly fail with various errors like these:
My first thought? Oh boy, must be nightly Rust having some issues. Nope. Ok, maybe I finally hit some sort of weird compiler limit with regards to generics, thread stack sizes, etc.? Nope. Ok… maybe this is just some weird Windows tomfoolery where an update borked my LLVM toolchain? Nope.
It was days of madness. I seriously started to reconsider using Rust as a game dev tool because these kinds of errors were really draining my motivation to work on things. Google, ChatGPT, and Stack Overflow gave zero insight into what was going on. Having, for once, run out of things to blame in software, whether my own doing or someone else’s, I started to suspect my hardware. Had it not been for the one blue screen I got during this troubleshooting process, I think I would’ve still been toiling over “what app screwed me”.
At this point I tried compiling on a laptop and an M2 mini. No issues. Slower, but ultimately no issues. Then I learned about the -j flag in cargo build. I could indirectly limit the speed of a compile by just limiting the number of cores it would utilize. I tried cargo build -j2. Took forever, but it compiled fine. What about -j10? Errors again. Hmmmm….
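If you want to repeat the -j experiment a bit more systematically, here is a minimal Python sketch of the idea. It is not what I actually ran (I just typed the commands by hand), and the function names build_command, run_trial, and sweep are made up for illustration. It assumes cargo is on your PATH and that you run it from the crate root. Because the failures are intermittent, each job count is tried a few times and the failures are counted.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def build_command(jobs: int) -> List[str]:
    """Return the cargo invocation that caps parallel jobs at `jobs`."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["cargo", "build", "-j", str(jobs)]


def run_trial(jobs: int) -> bool:
    """Do one full build with the given job count; True if it succeeded."""
    # Clean first so every trial recompiles everything instead of
    # reusing artifacts from the previous run.
    subprocess.run(["cargo", "clean"], check=False)
    result = subprocess.run(build_command(jobs), check=False)
    return result.returncode == 0


def sweep(
    job_counts: Iterable[int],
    repeats: int = 3,
    trial: Callable[[int], bool] = run_trial,
) -> Dict[int, int]:
    """Map each job count to how many of `repeats` builds failed."""
    failures: Dict[int, int] = {}
    for jobs in job_counts:
        failures[jobs] = sum(1 for _ in range(repeats) if not trial(jobs))
    return failures
```

Calling sweep([2, 4, 10]) from the project directory returns a dict of job count to failure count. A job count that fails only some of the time is still a strong hint that the problem is not in your code, since a real compiler error would fail every time.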
Eventually I started obsessively monitoring my CPU temps and saw that they were basically pegged at 100°C any time a heavy load started. In my case that was Rust compilations. And now that I think about it, I wonder how many “quirks” of my setup were due to CPU issues. My PC was quite finicky when it came to audio, and the 2nd monitor would flicker occasionally after setting changes where that wouldn’t make sense. My monitor supports up to 240Hz, but now I’m wondering if the weirdness I observed at 240Hz was due to the CPU. Namely, alt-tabbing would take forever. Guess I’ll start playing with those settings again to see.
Long story short, I ended up setting the power limits (PL1 and PL2) to something lower (190W) so that the CPU wouldn’t get hot enough to throttle. It cost a tiny bit of performance compared to before, but that’s a trade worth making if it means I don’t have to deal with faulty compiles. According to this video, my mobo was simply “letting it rip” on the CPU in hopes that the thermal throttling would keep it in check. Truth be told, I’m not sure why throttling would make the CPU start to go flaky, but after a week-long bout of illness and dealing with software bugs, I’m ready to just put it to bed and move on 😆.