LinuxQuestions.org

pd27 04-05-2023 12:38 PM

PC shutting down during games
 
I see the following in the logs:
Code:

kernel: amdgpu 0000:04:00.0: amdgpu: ERROR: GPU over temperature range(SW CTF) detected!
kernel: amdgpu 0000:04:00.0: amdgpu: ERROR: System is going to shutdown due to GPU SW CTF!

Question: How long has this problem been in the kernel? Why does it happen?
I'm monitoring the temperature and never see more than 72℃. The critical temperature is 110℃. If critical is 110℃, why does the kernel shut down the PC when the video card only reaches 72℃ at most?
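
To see how often the shutdown was triggered, the kernel log can be filtered for these messages. The following is a minimal sketch, not a definitive tool: the regular expression is built only from the two lines quoted above, so other kernel versions may word the message differently, and the function name `find_ctf_events` is made up for illustration.

```python
import re

# Pattern derived from the two amdgpu log lines quoted in this post.
# Assumption: other kernel versions may phrase these messages differently.
CTF_PATTERN = re.compile(
    r"amdgpu (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"amdgpu: ERROR: (?P<msg>.*SW CTF.*)"
)


def find_ctf_events(lines):
    """Return (pci_address, message) pairs for every SW CTF log line."""
    events = []
    for line in lines:
        match = CTF_PATTERN.search(line)
        if match:
            events.append((match.group("pci"), match.group("msg")))
    return events


if __name__ == "__main__":
    # Sample input: the two lines from the post plus one unrelated line.
    sample = [
        "kernel: amdgpu 0000:04:00.0: amdgpu: ERROR: GPU over temperature range(SW CTF) detected!",
        "kernel: usb 1-1: new high-speed USB device",
        "kernel: amdgpu 0000:04:00.0: amdgpu: ERROR: System is going to shutdown due to GPU SW CTF!",
    ]
    for pci, msg in find_ctf_events(sample):
        print(f"{pci}: {msg}")
```

On a system with a persistent journal, the same function could be fed the output of `journalctl -k -b -1` (kernel messages from the previous boot) instead of the sample list.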

uteck 04-05-2023 03:49 PM

What distro and kernel are you using? Also, what hardware?
Kernel 5.8 had patches to deal with this issue.

Jan K. 04-05-2023 07:21 PM

"Only" 72℃? :confused:

I would definitely check fans and thermal paste asap!

Critical 110℃? That would probably kill any processor...

dugan 04-05-2023 07:43 PM

Your kernel is detecting that your video card is overheating, and you think the problem is with the kernel?

szboardstretcher 04-05-2023 08:15 PM

This is a config issue. Or maybe a kernel issue.

My ATI has a maximum SAFE range of up to 110C (not a thing wrong with it in 4 years).

I wouldn't want my machine deciding to shut down when it's running as advertised either.

https://www.pcgamer.com/fretting-ove...-spec-on-navi/

But it's all down to the hardware. Clearly the OP is expecting to run at higher temperatures, like mine?

frankbell 04-05-2023 08:43 PM

In addition to what Jan K. suggested, also check to ensure that cooling vents are free of dust and other obstructions.

pd27 04-06-2023 12:51 AM

OpenSuse Leap 15.5
Kernel: 5.14.21-150500.46-default
CPU AMD 3700X
Vcard RX 6700XT nitro+
Coolers are clean, radiator is clean.

The previous card, an RX 570x, had the same temperatures but never shut down.

szboardstretcher 04-06-2023 12:55 PM

https://www.techpowerup.com/review/a...700-xt/33.html

This indicates that the hotspot for that card is 95C and that it games at around 80C. So it seems to me that the 70C shutdown is unnecessary.

I might as well point out the obvious: *BEFORE DOING ANY OF THIS*, make sure you really know what you are getting into, because you can raise this safety check too high and smoke your card or system.

To change this, first find the appropriate hwmon directory for your GPU in /sys/class/hwmon.

Then cat the temp1_crit file to see what the setting is. Once you have found the correct directory and that file, you can change it.

It's likely going to be 70000 (which is 70C), and you can change it to something more befitting the operating temperatures of that card.
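
The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `find_hwmon_dirs` and `read_thresholds` are hypothetical, and the driver is assumed to report itself as `amdgpu` in the `name` file of its hwmon directory. The sketch only reads the thresholds, because on some drivers these files are read-only and cannot be changed this way.

```python
from pathlib import Path


def find_hwmon_dirs(driver_name="amdgpu", root="/sys/class/hwmon"):
    """Return the hwmon directories whose 'name' file equals driver_name."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    matches = []
    for hwmon in sorted(root_path.iterdir()):
        name_file = hwmon / "name"
        if name_file.is_file() and name_file.read_text().strip() == driver_name:
            matches.append(hwmon)
    return matches


def read_thresholds(hwmon_dir):
    """Map each sensor label to its critical threshold in degrees Celsius."""
    thresholds = {}
    for crit_file in sorted(Path(hwmon_dir).glob("temp*_crit")):
        # temp1_crit pairs with temp1_label, temp2_crit with temp2_label, ...
        label_file = crit_file.with_name(crit_file.name.replace("_crit", "_label"))
        if label_file.is_file():
            label = label_file.read_text().strip()
        else:
            label = crit_file.name
        # sysfs reports millidegrees Celsius, so 110000 means 110 C.
        thresholds[label] = int(crit_file.read_text()) / 1000
    return thresholds


if __name__ == "__main__":
    for hwmon in find_hwmon_dirs():
        print(hwmon, read_thresholds(hwmon))
```

Matching on the `name` file avoids hard-coding a directory such as hwmon1, since the numbering can differ between machines and boots.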

pd27 04-11-2023 01:32 AM

Code:

~> cat /sys/class/hwmon/hwmon1/temp1_crit
110000
~> cat /sys/class/hwmon/hwmon1/temp1_emergency
115000
~> cat /sys/class/hwmon/hwmon1/temp1_input
50000
~> cat /sys/class/hwmon/hwmon1/temp1_label
edge
~> cat /sys/class/hwmon/hwmon1/temp2_crit
110000
~> cat /sys/class/hwmon/hwmon1/temp2_emergency
115000
~> cat /sys/class/hwmon/hwmon1/temp2_input
57000
~> cat /sys/class/hwmon/hwmon1/temp2_label
junction
~> cat /sys/class/hwmon/hwmon1/temp3_crit
105000
~> cat /sys/class/hwmon/hwmon1/temp3_emergency
110000
~> cat /sys/class/hwmon/hwmon1/temp3_input
54000
~> cat /sys/class/hwmon/hwmon1/temp3_label
mem

I suspect that the "junction" temperature goes over the limit in games, and that this is exactly what triggers the shutdown. I'll check later.
But what is this "junction" temperature?
I don't see any value like 70000.
I think that if the 6700 gets too hot it should throttle thermally instead of shutting down the whole system.
I have this card: https://www.techpowerup.com/review/s...-nitro/33.html
At idle my card runs 5℃-6℃ hotter: I see 49℃-50℃ instead of the temperature shown on the site above.
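
The sysfs values above are in millidegrees Celsius, so 110000 means 110℃. On AMD cards the "junction" sensor is generally the hotspot reading, i.e. the hottest point on the die, while "edge" is the figure most monitoring tools show as the GPU temperature. As a rough sketch, the idle readings from this post can be turned into per-sensor headroom like this; the dictionary layout and function names are made up for illustration, and the numbers are copied from the output above.

```python
# Readings copied from the sysfs output in this post (millidegrees Celsius).
READINGS = {
    "edge":     {"input": 50000, "crit": 110000, "emergency": 115000},
    "junction": {"input": 57000, "crit": 110000, "emergency": 115000},
    "mem":      {"input": 54000, "crit": 105000, "emergency": 110000},
}


def headroom_celsius(readings):
    """Return the degrees Celsius left before each sensor reaches 'crit'."""
    return {
        label: (values["crit"] - values["input"]) / 1000
        for label, values in readings.items()
    }


def closest_to_limit(readings):
    """Return the label of the sensor with the least headroom."""
    margins = headroom_celsius(readings)
    return min(margins, key=margins.get)


if __name__ == "__main__":
    for label, margin in headroom_celsius(READINGS).items():
        print(f"{label}: {margin:.1f} C below crit")
    print("closest to its limit:", closest_to_limit(READINGS))
```

These are idle values; to test the suspicion about the junction sensor, the same calculation would have to be run on readings taken under gaming load.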

pd27 04-17-2023 01:16 PM

I replaced the thermal compound and found that part of the thermal pad on the power circuitry was missing.
Now I no longer have any shutdowns, and the GPU temperature is lower.


All times are GMT -5.