LinuxQuestions.org


UrbanDesimator 12-26-2023 01:05 PM

Nvidia-470-kernel issues with kernel-6.6 and above
 
Nvidia kernel module causing RIPs and stack traces on kernel 6.6 (vanilla and RT) and above. Cured by replacing the r8169 driver with the r8168 version.

A bit of info that I hope helps others running nvidia-legacy470-kernel version 470.223.02 on kernel 6.6 and above.

The short answer (:-)) and my fix.
The problem looked like an ASPM conflict between the r8169 and nvidia drivers. After some experimentation, I fixed it by removing the r8169 driver and replacing it with r8168-8.052.01.tar.gz from https://github.com/mtorromeo/r8168,
using these module options in:
/lib/modprobe.d/r8168.conf
disable_wol_support=1 dynamic_aspm_packet_threshold=0 eee_enable=0 hwoptimize=1
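
For anyone unsure how those options go into the file, here is a minimal sketch. It assumes the usual modprobe.d syntax of `options <module> <param>=<value> ...`; the helper name `render_r8168_options` is made up for illustration, and the option values are just the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: prints the modprobe.d line for the r8168 options above.
# Assumes standard modprobe.d syntax: "options <module> <param>=<value> ...".
render_r8168_options() {
    printf 'options r8168 %s\n' \
        "disable_wol_support=1 dynamic_aspm_packet_threshold=0 eee_enable=0 hwoptimize=1"
}

# Print the line so it can be reviewed before it goes anywhere.
render_r8168_options
```

To use it, redirect the output as root into /lib/modprobe.d/r8168.conf and reload the module. Running `modinfo -p r8168` should list the parameters your build of the module actually accepts, which is worth checking first because they can differ between r8168 versions.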

It was only after trying different options, and adding dynamic_aspm_packet_threshold=0 to the other three, that all the RIPs/stack traces stopped. I haven't yet checked whether dynamic_aspm_packet_threshold works on its own or whether it's the combination. Those options are not available with the r8169 driver. I am going to email the r8169 devs with my findings to see if they can determine whether changes need to be made to their code, or the options re-enabled if they are present in their code.
Always back up any settings or configs before making changes.
I hope this helps anyone with similar issues.
UrbanMusic

Below are more details of how I got to this fix.

I tried various patches and inttf-kernel-patcher.sh from
https://nvidia.if-not-true-then-false.com/patcher/, plus patches from github/slackbuilds. Nothing cured the RIPs/stack traces. Some were RT related ("scheduling while atomic"); others were not related to the RT kernel.
I tracked and traced the issues as they happened, sometimes a few minutes apart, at most an hour apart. It looked like they were triggered by ASPM and the nvidia driver's apparent sensitivity to what other drivers do with ASPM.

I knew from experience that the r8169 driver needed the pcie_aspm=force boot option to be able to disable ASPM on my ASUS Sabertooth 990FX (AMD) board. The nvidia driver wanted the pcie_aspm=off boot option set, which stopped the r8169 device working altogether.
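
To see which pcie_aspm setting a system actually booted with, something like the sketch below may help. The function name `aspm_from_cmdline` is made up for illustration; /proc/cmdline and /sys/module/pcie_aspm/parameters/policy are the standard kernel interfaces, but the policy file is only present when the kernel is built with ASPM support, so both reads are guarded.

```shell
#!/bin/sh
# Hypothetical helper: print the value of pcie_aspm= from a kernel command line.
# Prints nothing if the option is not present.
aspm_from_cmdline() {
    # $1 is intentionally unquoted so the command line splits into words.
    for word in $1; do
        case "$word" in
            pcie_aspm=*) printf '%s\n' "${word#pcie_aspm=}" ;;
        esac
    done
}

# Report the boot option of the running kernel, if /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    value=$(aspm_from_cmdline "$(cat /proc/cmdline)")
    echo "pcie_aspm boot option: ${value:-not set}"
fi

# Report the active ASPM policy, if the kernel exposes it.
if [ -r /sys/module/pcie_aspm/parameters/policy ]; then
    echo "ASPM policy: $(cat /sys/module/pcie_aspm/parameters/policy)"
fi
```

Running `lspci -vv` as root and looking at the LnkCtl lines should also show whether ASPM is enabled or disabled per device, which makes it easier to compare the NIC and the GPU.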

After much hunting, debugging my system and searching online, I found no cure, so I opted for listing the options and trying each one.

The ./autorun.sh script in the r8168-8.052.01 pkg will take care of blacklisting the r8169 driver. If you find it doesn't help and wish to change back, blacklist the r8168 driver, rename the renamed r8169 module in /lib/modules/<your kernel>/kernel/drivers/net/realtek/ back to r8169.ko, and then issue depmod -a. You will then be able to load and use the r8169 driver again.
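
The change-back steps could be sketched roughly as below. This is a dry run that only prints the commands so they can be checked before being run as root. The function name `print_revert_steps`, the blacklist file name blacklist-r8168.conf, and the example backup name r8169.bak are all assumptions; look in the realtek directory to see what the r8169 module was actually renamed to on your system.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands for switching back to r8169.
#   $1 = kernel version (e.g. the output of uname -r)
#   $2 = current file name of the renamed r8169 module (an assumption; check yours)
print_revert_steps() {
    kver=$1
    renamed=$2
    moddir="/lib/modules/$kver/kernel/drivers/net/realtek"
    # Stop r8168 from loading at boot (the file name is an arbitrary choice).
    echo "echo 'blacklist r8168' > /etc/modprobe.d/blacklist-r8168.conf"
    # Put the original r8169 module back under its real name.
    echo "mv '$moddir/$renamed' '$moddir/r8169.ko'"
    # Rebuild the module dependency list, then swap the loaded drivers.
    echo "depmod -a"
    echo "modprobe -r r8168"
    echo "modprobe r8169"
}

# Example: show the steps for the running kernel, assuming a .bak backup name.
print_revert_steps "$(uname -r)" "r8169.bak"
```

If the r8169 blacklist entry is still in place, it would presumably also need removing before the driver loads automatically at boot.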

