AMD Threadripper 2990wx freezing on very high load
I have recently bought a new machine with an AMD Threadripper 2990wx CPU (not overclocked), 4*16GB RAM, Gigabyte X399 Designare EX motherboard, an NVIDIA RTX 2080 GPU and a 1200 Watt PSU. I have Ubuntu 18.04.1 installed on it.
I noticed that when I put very high load on it, the machine will freeze up within a few seconds to the point that it responds to nothing except for the magic SysRq codes which I use to reboot it. Even switching Num Lock doesn't work!
I've been reading a lot about various Ryzen bugs and have tried the following to resolve the issue, without success:
Added "idle=nomwait" to the kernel command line (although this is meant to fix a freeze at idle)
Added "rcu_nocbs=0-63" to the kernel command line
Upgraded the kernel from 4.15.0-45-generic to 4.18.0-15-generic
Installed the "amd64-microcode" package
Used the "ZenStates" utility to disable the C6 core state
Disabled the C6 states in the BIOS settings
Disabled "CPU Performance Boost" in the BIOS settings
Added "processor.max_cstate=1" to the kernel command line
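Since several of the items above are kernel command-line parameters, it may be worth confirming that they actually reached the running kernel. The following is a minimal sketch, assuming a Linux system that exposes /proc/cmdline; the helper names parse_cmdline and missing_params are made up for this illustration, and the simple whitespace split does not handle quoted values that contain spaces.

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Parameters without '=' (for example 'quiet') map to None.
    Assumption: no quoted values containing spaces.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(text, expected):
    """Return the expected 'name=value' tokens that are not active in text."""
    active = parse_cmdline(text)
    missing = []
    for token in expected:
        name, _, value = token.partition("=")
        if active.get(name) != value:
            missing.append(token)
    return missing


# The three parameters mentioned in the post.
EXPECTED = ["idle=nomwait", "rcu_nocbs=0-63", "processor.max_cstate=1"]

if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        print("not active:", missing_params(proc.read_text(), EXPECTED) or "none")
```

Run as a script, it prints whichever of the three parameters are absent from the command line of the currently booted kernel.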
I can't find anything else to try anywhere, and it seems that no one has come across the same issue. I've seen the Ryzen segfault bug and the idle freeze issue but not a high-load freeze issue.
I encountered this when I tried to compile my own kernel. When I run "make -j modules", the compilation starts, all cores jump to 100% load, and after 4-5 seconds the machine freezes. If I use "make -j 64" instead, all cores still sit at 100%, but the build completes with no issues.
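One difference between the two commands is worth spelling out: GNU make treats a bare -j (no number) as "no limit on simultaneous jobs", so "make -j modules" may start far more compiler processes than the 64 jobs of the command that works. Whether that explains the freeze is not established in this thread. A minimal sketch of always passing an explicit limit, with make_command as a hypothetical helper name:

```python
import os


def make_command(target, jobs=None):
    """Build a make invocation with an explicit job limit.

    jobs=None falls back to the number of logical CPUs reported by the OS.
    """
    if jobs is None:
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", f"-j{jobs}", target]


# Example (run from the kernel source tree):
#   import subprocess
#   subprocess.run(make_command("modules"), check=True)
```

Building the argument list this way means the job count is never left unbounded by accident.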
I went through /var/log/syslog, /var/log/dmesg, /var/log/kern.log and every other log file I could find there, but found nothing related to the freeze. Everything looks normal, then there is a gap of about a minute, and then the output of a new kernel boot begins: no error message, no warning, no oops or panic, no soft lockup messages.
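To pin down exactly when the log goes silent before each reboot, the gaps can be located mechanically. This is a rough sketch, assuming the log lines start with classic syslog stamps of the form "Mon DD HH:MM:SS"; the names parse_stamp and find_gaps and the 30-second threshold are choices made for this illustration.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line.

    The stamp carries no year, so the caller supplies one.
    Returns None if the line does not start with such a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, min_gap=timedelta(seconds=30), year=2000):
    """Yield (previous_stamp, next_stamp) for silences of at least min_gap.

    year defaults to 2000 only so that every date, including Feb 29,
    parses; gaps across New Year are not handled.
    """
    previous = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a recognizable stamp
        if previous is not None and stamp - previous >= min_gap:
            yield previous, stamp
        previous = stamp
```

Feeding it the lines of /var/log/syslog (opened with errors="replace") yields each pair of timestamps that brackets a silence, which shows the last entries written before a freeze.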
Any ideas on what this could be and what I should try next?
From bitter hardware experience: next, try slowing things down (CPU and bus speeds). Then run memtest again, if it failed the first time. There's also a PSU 'power good' line, and that will spring a reboot pronto. You're trying, I gather, to isolate and return a faulty part. Substitute what you can, and check contacts and conductivity (electrical and thermal). Good hunting.