LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 12-22-2020, 09:57 AM   #76
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 8,792

Rep: Reputation: 6656

The issue that the rcu_nocbs option fixes is an idle crash, meaning if the system is left at idle, eventually it will lock up.

Since yours seems tied to system activity (and you're running a 3xxx series when the problem seems to be relegated to just the 1xxx series), I'd imagine there's something else going on with your processor and an RMA might be the best option.
 
1 members found this post helpful.
Old 12-22-2020, 12:26 PM   #77
garpu
Senior Member
 
Registered: Oct 2009
Distribution: Slackware
Posts: 1,611

Rep: Reputation: 932
Yeah, with me and the lock-on-idle issue, if it idled for any length of time, it would hang. If I kept vlc streaming my local NPR station overnight, it wouldn't hang.

Are the mitigations still needed in the 5.10 kernel? I haven't tried without, because if it ain't broke, as the saying goes...
 
Old 12-23-2020, 12:54 AM   #78
cycojesus
Member
 
Registered: Dec 2005
Location: Lyon, France
Distribution: Slackware-current
Posts: 116

Rep: Reputation: 79
Quote:
Originally Posted by bassmadrigal View Post
The issue that the rcu_nocbs option fixes is an idle crash, meaning if the system is left at idle, eventually it will lock up.

Since yours seems tied to system activity (and you're running a 3xxx series when the problem seems to be relegated to just the 1xxx series), I'd imagine there's something else going on with your processor and an RMA might be the best option.
Quote:
Originally Posted by garpu View Post
Yeah, with me and the lock-on-idle issue, if it idled for any length of time, it would hang. If I kept vlc streaming my local NPR station overnight, it wouldn't hang.

Are the mitigations still needed in the 5.10 kernel? I haven't tried without, because if it ain't broke, as the saying goes...
I see, I didn't read all the thread and from what you say it may not be the same issue. I'll see about my options to get a new CPU.

I did manage to compile a kernel with SMT disabled and rcu_nocbs=0-5, but at the same time Firefox tabs crashed several times.

Regarding mitigations, my understanding is that they are the kernel's strategies for working around CPU vulnerabilities. So as long as the CPU doesn't change, they have no reason to become obsolete.
This being a desktop machine and not an internet-facing server, I'm not too concerned about mitigating those vulnerabilities, preferring the extra performance instead.
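
For anyone who wants to see which of these mitigations the running kernel reports, recent kernels expose one status file per vulnerability under /sys/devices/system/cpu/vulnerabilities. Below is a minimal Python sketch that reads them. The function name read_mitigation_status is made up for illustration, and the directory is a parameter so the function can be pointed elsewhere if that path is missing on a given kernel.

```python
from pathlib import Path

# Usual sysfs location on recent kernels (assumed to exist; the function
# returns an empty dict when it does not).
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_line} for each file in vuln_dir.

    read_mitigation_status is a hypothetical helper name, not part of any
    existing tool.
    """
    vuln_dir = Path(vuln_dir)
    status = {}
    if not vuln_dir.is_dir():
        return status  # older kernel or non-Linux system
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status
```

Calling read_mitigation_status() with no arguments and printing the result gives one line per vulnerability, each saying whether the CPU is affected and whether a mitigation is in place.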

EDIT: not sure what to do about it. Could it be a faulty RAM module? I'll have to investigate more...

Last edited by cycojesus; 12-23-2020 at 04:19 AM.
 
Old 12-23-2020, 09:15 AM   #79
garpu
Senior Member
 
Registered: Oct 2009
Distribution: Slackware
Posts: 1,611

Rep: Reputation: 932
Quote:

EDIT: not sure what to do about it. Could it be a faulty RAM module? I'll have to investigate more...
Yeah, that doesn't sound like the same problem. Have you done a memtest?
 
Old 12-23-2020, 10:10 AM   #80
cycojesus
Member
 
Registered: Dec 2005
Location: Lyon, France
Distribution: Slackware-current
Posts: 116

Rep: Reputation: 79
Quote:
Originally Posted by garpu View Post
Yeah, that doesn't sound like the same problem. Have you done a memtest?
Not yet. I still have the previous hardware around (i5-7500 & 32GB RAM + motherboard and all to make it run) so I'll swap the RAM between the 2 machines. This way I'll check if the problems persist on the Ryzen 3600 with the old (known good) RAM and I'll memtest the new RAM in the old machine. But that'll have to wait until I get back home after some days.

Thank you all.
 
Old 12-28-2020, 04:19 PM   #81
cycojesus
Member
 
Registered: Dec 2005
Location: Lyon, France
Distribution: Slackware-current
Posts: 116

Rep: Reputation: 79

Quote:
Originally Posted by cycojesus View Post
Not yet. I still have the previous hardware around (i5-7500 & 32GB RAM + motherboard and all to make it run) so I'll swap the RAM between the 2 machines. This way I'll check if the problems persist on the Ryzen 3600 with the old (known good) RAM and I'll memtest the new RAM in the old machine. But that'll have to wait until I get back home after some days.

Thank you all.
And, sure enough, the memtest didn't even finish, as it complained about finding too many errors... Compiled 5.11-rc1 successfully and it's generally running fine using the other RAM.
Time to RMA the new RAM...
 
Old 01-31-2021, 12:17 PM   #82
bifferos
Member
 
Registered: Jul 2009
Posts: 401

Original Poster
Rep: Reputation: 149
Oh dear.....

My Ryzen 3700X has arrived now and I've been running it for a while in an attempt to see if it was the 1700X that was the problem. I continued to get the lockups.

So the only thing on my system left to change (excluding the case!) was the Corsair PSU. Having changed that to a SilverStone, I haven't seen a lock-up since. I want to leave this a good few weeks to check, but I was previously locking up once a night, and it's been three days now without issue, so I think I've found the culprit.

Interestingly, with the Gigabyte mobo (patched) my lock-ups were every week. With the Asrock mobo they changed to every night (same 1700X CPU). Either the Corsair deteriorated over time, or the motherboards have different patterns of power draw (with the same CPU), or the Gigabyte manages better regulation of a slightly suspect power rail. A different graphics card didn't seem to make any difference to the pattern.

The other thing about this is that I am not a gamer. I switched to a single stick of 16GB RAM (I suspected the RAM initially), and only have a single SSD. I wouldn't expect this arrangement to tax a 500W PSU. Things seemed to be worse with more RAM, and as a result I thought my RAM was faulty and spent a long time swapping the four sticks around (2 x 8GB, 2 x 16GB) to see what made a difference.
 
Old 02-06-2021, 06:23 PM   #83
slackerDude
Member
 
Registered: Jan 2016
Posts: 158

Rep: Reputation: Disabled
Just an update.

My lilo.conf has this:
append="idle=nomwait rcu_nocbs=0-15"
(I have a Ryzen 1700, so 16 threads)
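
The rcu_nocbs value is a list of logical CPU numbers, so to cover every CPU it runs from 0 to one less than the number of CPUs the kernel sees (16 here, hence 0-15). A small Python sketch of that arithmetic follows. The helper names rcu_nocbs_range and lilo_append_line are invented for illustration, and os.cpu_count() is assumed to report the same logical CPU count the kernel uses.

```python
import os


def rcu_nocbs_range(logical_cpus=None):
    """Return an 'rcu_nocbs=...' string covering every logical CPU.

    Hypothetical helper: defaults to os.cpu_count() when no count is given.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    if logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    if logical_cpus == 1:
        return "rcu_nocbs=0"
    # CPU numbering starts at 0, so the last CPU is count - 1.
    return f"rcu_nocbs=0-{logical_cpus - 1}"


def lilo_append_line(logical_cpus=None):
    """Build a lilo.conf append line like the one quoted above."""
    return f'append="idle=nomwait {rcu_nocbs_range(logical_cpus)}"'
```

For example, lilo_append_line(16) reproduces the line above, while a CPU with 12 threads would get rcu_nocbs=0-11.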

As well, I had to set my BIOS power idle setting to something non-default, whichever one was the "use most power" choice. Not auto, but typical, or something. MSI B350 Gaming Plus. I think the setting only appeared in one of the more recent BIOS versions. I may have also disabled some power states, can't remember.

I think I also recompiled the kernel (5.4.53) in July, probably with some combination of the rcu/nocbs settings.

It now seems 100% stable. I haven't re-run the GCC 16-core test (I got an RMA after failing this initially in 2019). I now suspect my motherboard / settings / kernel were more to blame.

In any case, I now have 12 cores constantly running FaH with no issues. Have compiled with 16 cores, stressed the system, etc, no issues.

I'm still a little grumpy it took ~2 years before I was able to get to this level, but now I'm reasonably confident / not unhappy with it. 16 cores is overkill, but hey, it's not like I spent that much more compared to like a 4 or 6 core Intel..
 
1 members found this post helpful.
Old 02-21-2021, 05:51 PM   #84
bifferos
Member
 
Registered: Jul 2009
Posts: 401

Original Poster
Rep: Reputation: 149
The lock-ups are back with the new PSU. So the situation I have now is the following:

PC1:
3200G continues to run flawlessly with Asrock mobo, and a platinum fanless PSU. Just as well as it's my router and DNS I suppose. Funny that the cheapest AMD CPU in my house now is the only reliable one.

PC2:
R7 1700 is now in the GIGABYTE AB350 Gaming 3 mobo (patched) and still locking from time to time. I don't know if RAM has anything to do with it, but that's got 16GB. I already tried the rcu option there, made no difference. I couldn't find the option about the power supply in the BIOS on that motherboard.

PC3:
3700X, 32GB RAM, ASRock B450M Pro4, locking quite frequently (3 times per day). I've set the 'use most power' option in BIOS, set the "idle=nomwait rcu_nocbs=0-15" as well. That's got a better new PSU now, and it seemed to make a difference at first, then went back to its old ways.

I've run overnight memory checks on both these systems, and there were no errors. I'm just wondering whether I should purchase the pro version of memtest.

I'm getting fed up with this. Two different CPUs, two different motherboards from different manufacturers, two different cases. In a good 10 years of using AMD CPUs and building PCs I've never had anything like this, and I'm a couple of weeks away from never buying AMD again. Sick of this bullshit. Sorry for the rant, but seriously WTF? Has the world just turned to sh*t? Do I have to buy a flipping Apple these days to get something that works?

@slackerDude, you are lucky it's just 2 years. 3 years for me and still not solved!

Last edited by bifferos; 02-21-2021 at 06:55 PM.
 
Old 02-21-2021, 06:36 PM   #85
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 8,792

Rep: Reputation: 6656
I'm guessing your issue is not related to the issues I, and many Ryzen users, have had. I've never heard of the issues not being solved with the rcu_nocbs option, and it seemed to only affect the 1st gen Ryzens (the 1x00 series). The only machine that should be tied to the idle system lockups is PC2, since it has the Ryzen 1700X, but if the rcu_nocbs option didn't affect it, then it is likely not the same issue many of us saw. PC3 issues are likely something completely different. My 2200G runs without issue and without requiring any kernel parameters, and AFAIK, my brother's 3700X runs fine on his Linux Mint install and I don't believe he's using any kernel parameters.

It may be worth trying different mobos, RAM, PSUs, and CPUs (I know it isn't that easy if you don't have spare components).
 
1 members found this post helpful.
Old 02-21-2021, 06:45 PM   #86
Daedra
Senior Member
 
Registered: Dec 2005
Location: Springfield, MO
Distribution: Slackware64-15.0
Posts: 2,730

Rep: Reputation: 1393
When I was running a 1600X as a stopgap before I built my current machine, I was also having lockups. The rcu_nocbs option fixed most of them, but I would still get rare, occasional lockups. For me it was my 2666 MHz RAM, even though it passed memtest and gave no apparent errors. I had to drop it all the way down to 2133 MHz to get the system stable.

Last edited by Daedra; 02-21-2021 at 06:51 PM.
 
1 members found this post helpful.
Old 02-21-2021, 06:46 PM   #87
slackerDude
Member
 
Registered: Jan 2016
Posts: 158

Rep: Reputation: Disabled
Which mobos for the 1700X / 3700X?

I have a B350-based mobo for the 1700, and I could not run a high load until I did:
-append="idle=nomwait rcu_nocbs=0-15"
-disable C and/or P wait / sleep states (I'd have to reboot to look at them)
-set idle current to normal/high instead of auto
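
A quick way to confirm that the append= line actually reached the running kernel is to look at /proc/cmdline after a reboot. Here is a minimal Python sketch of that check. The function names are hypothetical, and the parser assumes a simple command line with no quoted values containing spaces.

```python
def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplifying assumption: parameters are separated by whitespace and no
    value contains a quoted space.
    """
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def check_workarounds(cmdline_text, expected=("idle", "rcu_nocbs")):
    """Report the value of each expected parameter, or None if it is absent."""
    params = parse_cmdline(cmdline_text)
    return {name: params.get(name) for name in expected}
```

Feeding it the contents of /proc/cmdline, for instance check_workarounds(open("/proc/cmdline").read()), shows at a glance whether idle= and rcu_nocbs= were passed or silently dropped by the boot loader configuration.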

I've only tried 12 CPUs busy and haven't repeated the gcc-compile-kernel stress test.

Maybe your ASRock mobo has the options? Is it feasible to swap CPUs?
3700X is 12 cores, right? Then rcu_nocbs=0-11, correct?

Also, with my kernel, there was some weird thing about disabling the rcu_nocbs command-line option. I think I had to re-enable that option during compilation so that the boot option would actually do something - it was disabled by default, maybe? Or maybe I read that it was, but worked ok, can't remember 100%.
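
For reference, the rcu_nocbs= boot parameter only takes effect in kernels built with CONFIG_RCU_NOCB_CPU=y. A rough Python sketch to check a kernel config for it is below. The function names are invented for illustration, and the two config locations (/proc/config.gz and /boot/config) are assumptions that depend on how the kernel was built and installed.

```python
import gzip
from pathlib import Path


def config_option_enabled(config_text, option="CONFIG_RCU_NOCB_CPU"):
    """Return True if the kernel config text sets the option to 'y'."""
    for line in config_text.splitlines():
        if line.strip() == f"{option}=y":
            return True
    return False


def load_kernel_config():
    """Return the kernel config as text, or None if it cannot be found.

    Assumed locations: /proc/config.gz (needs CONFIG_IKCONFIG_PROC) and
    /boot/config (present on a typical Slackware install).
    """
    proc_cfg = Path("/proc/config.gz")
    if proc_cfg.exists():
        with gzip.open(proc_cfg, "rt") as fh:
            return fh.read()
    boot_cfg = Path("/boot/config")
    if boot_cfg.exists():
        return boot_cfg.read_text()
    return None
```

If load_kernel_config() returns text and config_option_enabled() says False for it, the boot option would be ignored and the kernel needs rebuilding with that option turned on.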
 
1 members found this post helpful.
Old 02-21-2021, 06:51 PM   #88
bifferos
Member
 
Registered: Jul 2009
Posts: 401

Original Poster
Rep: Reputation: 149
@bassmadrigal
That's the option I'm considering now. Pull the PSU out of the only 'working' machine and use it on one of the ones that locks up. The problem with this kind of mix-and-match is that these days it seems nobody can sell anything for any length of time. The 3200G that works isn't available any more. Neither is the fanless PSU. I wonder whether it's a coincidence that the PSU in the system that works is the most expensive, and the only platinum one.

I might also try Mint on the 1700 as it's now the media center so a reinstall won't be too painful. I can't try it on the 3700X unfortunately.
 
Old 02-21-2021, 11:43 PM   #89
garpu
Senior Member
 
Registered: Oct 2009
Distribution: Slackware
Posts: 1,611

Rep: Reputation: 932
When I was having problems with the lock-on-idle, I could run a CPU stress test, it would pass with flying colors, and then I'd surf, and it would lock. Is this what's happening with your box?

Have you tried the zenstates.py script to turn off C6 states? (Willy and I have to do that, otherwise I get a lock-up every couple of months. I haven't gotten one that wasn't related to some sort of video card issue since.) https://github.com/r4m0n/ZenStates-Linux (Willy's got directions further up in the thread to add it so it runs as part of the boot process.)

If you keep VLC running, streaming something in the background, is that enough to keep it from locking? (I'd have VLC streaming my local NPR station, and it would be enough to keep the lock-ups from happening.)

If your RAM is good, and the PSU is OK...have you tested the voltage of the outlet your computer is plugged into? Or the power strip/surge protector?

Also, have you swapped out the cable to your monitor? I'm serious! A bad monitor cable can look a lot like a failing video card or some other lockup problem.
 
1 members found this post helpful.
Old 02-22-2021, 04:36 AM   #90
bifferos
Member
 
Registered: Jul 2009
Posts: 401

Original Poster
Rep: Reputation: 149
I think it's probably not the video cable. I do get video freezes, where the mouse continues to work, but I can't click on anything. But I also get random shut-downs as well. That's on the 3700X. Usually in the latter case I see some text on the screen, and then the machine reboots, or just becomes unusable from that point. However, I'd say around 50% of the time I'm just dumped out of the KDE session. It's almost as if someone hit Ctrl-Alt-Backspace while I was working.

On the 1700 it's a different pattern. So far on that system I've only seen the screen freezing (but the mouse can still move). However since it's a media centre I don't spend as much time on that system. Perhaps it would exhibit both patterns eventually.

I will give the zen states script a try tonight.
 
  


All times are GMT -5.
