LinuxQuestions.org


thund3rstruck 03-28-2022 09:09 AM

Network Doesn't Work With Any Kernel After 5.4.0-65
 
Last year I posted this thread https://www.linuxquestions.org/quest...6/#post6218320

Long story short, I ran apt-get update && apt-get upgrade, and after booting the new kernel, all networking was lost.

I solved that problem by locking grub to kernel 5.4.0-65 and went about my business.
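
For anyone wanting to pin a kernel the same way, here is a minimal sketch of what the GRUB setting can look like. It assumes Ubuntu's stock menu titles ("Advanced options for Ubuntu" and "Ubuntu, with Linux <version>"); the helper name grub_default_for is made up for illustration, and the real titles should be checked in /boot/grub/grub.cfg before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the GRUB_DEFAULT line that selects one specific
# kernel from the "Advanced options" submenu. The menu titles are an
# assumption (Ubuntu defaults); verify them in /boot/grub/grub.cfg.
grub_default_for() {
    kver="$1"    # e.g. 5.4.0-65-generic
    printf 'GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux %s"\n' "$kver"
}

grub_default_for 5.4.0-65-generic
# Put the printed line in /etc/default/grub, then run: sudo update-grub
```

The ">" separates the submenu title from the entry title, so the setting keeps pointing at the same kernel even when newer kernels are added to the menu.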

This morning I figured it's been 14 months now, so surely the Linux devs have fixed whatever breaks networking on kernels after 5.4.0-65 by now?

I updated the system to 5.13.0-37 this morning and nope, same result. No networking.

So does this mean this server is just stuck on kernel 5.4.0-65 forever?
Is there no way to get networking to work on any kernel above 5.4.0-65?

From kernels on 5.4.0-65 and earlier (Working)

Code:

sudo lshw -C network
  *-network               
      description: Ethernet interface
      product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
      vendor: Realtek Semiconductor Co., Ltd.
      physical id: 0
      bus info: pci@0000:05:00.0
      logical name: enp5s0
      version: 15
      serial: a8:a1:59:1a:22:4e
      size: 1Gbit/s
      capacity: 1Gbit/s
      width: 64 bits
      clock: 33MHz
      capabilities: pm msi pciexpress msix bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
      configuration: autonegotiation=on broadcast=yes driver=r8169 duplex=full firmware=rtl8168h-2_0.0.2 02/26/15 ip=192.168.2.5 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
      resources: irq:36 ioport:f000(size=256) memory:f7504000-f7504fff memory:f7500000-f7503fff

From kernels on 5.8.0-49 and newer (Broken)

Code:

sudo lshw -C network
  *-network UNCLAIMED               
      description: Ethernet controller
      product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
      vendor: Realtek Semiconductor Co., Ltd.
      physical id: 0
      bus info: pci@0000:05:00.0
      version: 15
      width: 64 bits
      clock: 33MHz
      capabilities: bus_master cap_list
      resources: ioport:f000(size=256) memory:f7504000-f7504fff memory:f7500000-f7503fff

Something to do with the realtek modules?

Code:

modprobe: FATAL: Module r8169 not found in directory /lib/modules/5.13.0-37-generic
IDK, I had no choice but to roll back to 5.4.0-65 so I could get networking back.

boughtonp 03-28-2022 09:25 AM

Quote:

Originally Posted by thund3rstruck (Post 6342138)
I updated the system to 5.13.0-37 this morning and nope, same result. No networking.

Is there a reason you pick 5.8 and 5.13 instead of 5.10 or 5.15 or 5.17 ?

Or have you actually tested every version when you assert "5.8.0-49 and newer" are broken? (What about versions 5.5/5.6/5.7?)


NuAngel 03-28-2022 12:00 PM

First and foremost, I'm the farthest thing from an expert. Relative newbie to all things Linux to be sure. However I did experience something similar in the past and I noticed that in your "broken" output, it does not show: logical name: enp5s0.

If your setup is anything like mine, then without the logical name many other features (firewalls, DNS, etc.) might not work correctly.

In my case, I had to solve this by creating a new file at /etc/udev/rules.d/70-persistent-net.rules
and then adding the following line:

Code:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="", NAME="enp2s0f0"
In your case, of course, I think the name would just be "enp5s0" since that was your original name. Once you've saved that file, give your server a reboot and see if that helps.

This is a stab in the dark from me, but wanted to offer up something for you to try. Best of luck!
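
A minimal sketch of generating such a rule for this machine, using the MAC address and interface name from the working lshw output earlier in the thread. The helper name make_net_rule is hypothetical. Note that a naming rule can only rename an interface that a driver has already created, so it will not help while lshw still reports the device as UNCLAIMED.

```shell
#!/bin/sh
# Hypothetical helper: print a udev rule that names the interface with the
# given MAC address. The rule only applies once a driver has bound the device.
make_net_rule() {
    mac="$1"     # lower-case, colon-separated MAC address
    name="$2"    # desired interface name
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$mac" "$name"
}

# Values taken from the working lshw output (serial and logical name).
make_net_rule a8:a1:59:1a:22:4e enp5s0
# Redirect the output to /etc/udev/rules.d/70-persistent-net.rules as root.
```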

hazel 03-28-2022 12:13 PM

The lost driver (r8169.ko) should be in /lib/modules/5.13.0-37-generic/kernel/drivers/net/ethernet/realtek. Try going down this tree by hand and see at what point the path fails.
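
Walking the tree by hand can also be sketched as a small shell function. This is only an illustration: the function name first_missing is made up, and it expects an absolute path. It prints the first path component that does not exist and returns non-zero, or prints nothing if the whole path is there.

```shell
#!/bin/sh
# Walk an absolute path one component at a time and print the first
# component that does not exist. Returns 0 if the whole path exists.
first_missing() {
    rest="${1#/}"
    cur=""
    while [ -n "$rest" ]; do
        part="${rest%%/*}"          # next path component
        cur="$cur/$part"
        if [ ! -e "$cur" ]; then
            echo "$cur"
            return 1
        fi
        case "$rest" in
            */*) rest="${rest#*/}" ;;   # drop the component just checked
            *)   rest="" ;;             # that was the last one
        esac
    done
    return 0
}

# Usage on the affected kernel:
#   first_missing /lib/modules/5.13.0-37-generic/kernel/drivers/net/ethernet/realtek
```

If the function prints nothing, the directory exists and the next thing to check is whether r8169.ko (possibly with a compression suffix, depending on the distribution) is inside it.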

smallpond 03-28-2022 12:35 PM

Could you also please post the output of:

Code:

lspci -v -n -s 0000:05:00.0
I suspect you have the 8169 driver but it is not recognizing your vendor/device/subsystem combination.
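
One way to check which module claims a given vendor/device pair is to look it up in the kernel's modules.alias file. The sketch below assumes the usual alias format (for example "alias pci:v000010ECd00008168sv*sd*bc*sc*i* r8169"); the function name pci_alias_modules is hypothetical, and it only finds aliases that spell out both IDs, not wildcard entries.

```shell
#!/bin/sh
# Print the module(s) that a modules.alias file maps to a PCI vendor:device
# pair. IDs are 4 hex digits each, as shown by `lspci -n` (e.g. 10ec 8168).
pci_alias_modules() {
    vendor=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    device=$(printf '%s' "$2" | tr 'a-f' 'A-F')
    alias_file="$3"
    # Match alias lines whose pattern starts with the exact vendor/device.
    awk -v pat="pci:v0000${vendor}d0000${device}" \
        '$1 == "alias" && index($2, pat) == 1 { print $3 }' "$alias_file" | sort -u
}

# Usage on a real system (usual location, adjust if needed):
#   pci_alias_modules 10ec 8168 "/lib/modules/$(uname -r)/modules.alias"
```

No output for 10ec 8168 would suggest the module is not installed for that kernel at all, rather than installed but failing to recognize the device.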

thund3rstruck 03-28-2022 12:38 PM

Quote:

Originally Posted by boughtonp (Post 6342141)
Is there a reason you pick 5.8 and 5.13 instead of 5.10 or 5.15 or 5.17 ?
Or have you actually tested every version when you assert "5.8.0-49 and newer" are broken? (What about versions 5.5/5.6/5.7?)

5.13 is what apt wanted to install, so that's what I went with. At this point I have tried 5.8 (the kernel version where the failure started), 5.10, and 5.13, and all of them fail in the same way. Not sure it's worth the trouble to test every kernel when every kernel >= 5.8 fails.

ondoho 03-28-2022 12:55 PM

Quote:

Originally Posted by thund3rstruck (Post 6342138)
Something to do with the realtek modules?

Have you followed up on this?
Maybe something changed in how the kernel ships or doesn't ship this module?

Is it weird that lshw reports RTL8111/8168/8411 (for both kernel versions), and modprobe complains about r8169?

Also, have you tried the LTS kernel?

thund3rstruck 03-28-2022 01:08 PM

Quote:

Originally Posted by smallpond (Post 6342194)
Could you also please post the output of:

Code:

lspci -v -n -s 0000:05:00.0
I suspect you have the 8169 driver but it is not recognizing your vendor/device/subsystem combination.

From kernels on 5.4.0-65 and earlier (Working)
Code:

sudo lspci -v -n -s 0000:05:00.0
[sudo] password for developer:
05:00.0 0200: 10ec:8168 (rev 15)
        Subsystem: 1849:8168
        Flags: bus master, fast devsel, latency 0, IRQ 36
        I/O ports at f000 [size=256]
        Memory at f7504000 (64-bit, non-prefetchable) [size=4K]
        Memory at f7500000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 4e-22-1a-59-a1-a8-00-00
        Capabilities: [170] Latency Tolerance Reporting
        Capabilities: [178] L1 PM Substates
        Kernel driver in use: r8169
        Kernel modules: r8169

From kernels on 5.8.0-49 and newer (Broken)
Code:

sudo lspci -v -n -s 0000:05:00.0
[sudo] password for developer:
05:00.0 0200: 10ec:8168 (rev 15)
        Subsystem: 1849:8168
        Flags: bus master, fast devsel, latency 0, IRQ 10
        I/O ports at f000 [size=256]
        Memory at f7504000 (64-bit, non-prefetchable) [size=4K]
        Memory at f7500000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 4e-22-1a-59-a1-a8-00-00
        Capabilities: [170] Latency Tolerance Reporting
        Capabilities: [178] L1 PM Substates

Note the bottom 2 lines in the working kernel:
Code:

        Kernel driver in use: r8169
        Kernel modules: r8169

Last year when I went down this rabbit-hole I tried all the modprobe suggestions but none of them brought my network back.
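
The difference between the two lspci outputs can be checked mechanically. A minimal sketch, with a made-up function name bound_driver: it reads lspci -v output on stdin and prints the bound driver, or "none" when there is no "Kernel driver in use:" line.

```shell
#!/bin/sh
# Read `lspci -v` output on stdin and print the driver bound to the device,
# or "none" if no "Kernel driver in use:" line is present.
bound_driver() {
    awk -F': ' '/Kernel driver in use:/ { print $2; found = 1 }
                END { if (!found) print "none" }'
}

# Usage:
#   lspci -v -s 0000:05:00.0 | bound_driver
```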

thund3rstruck 03-28-2022 01:34 PM

Okay, so digging through the bug reports for Ubuntu 20.04, it seems the Ubuntu team shipped a broken update: it did not include the linux-modules-extra package, which contains the r8169 driver for Realtek Ethernet devices.

1. Roll back to kernel 5.4.0-* (to get internet back) by changing grub to boot the older kernel

2. Reinstall linux kernel + headers
Code:

sudo apt install --reinstall linux-generic
3. Reinstall linux core drivers and modules
Code:

sudo apt install --reinstall linux-modules-extra-5.8.0-36-generic
4. Remove bad network module (if present)
Code:

sudo apt purge r8168-dkms
5. Reboot server, network is now working properly

Thank god this is solved, well until the next system update breaks it again. :(
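
To catch this before the next reboot, the installed package list can be checked for kernels that lack a matching linux-modules-extra package. This is a rough sketch under the assumption that package names follow the linux-image-<version> and linux-modules-extra-<version> pattern used in the steps above; the function name kernels_missing_extra is made up.

```shell
#!/bin/sh
# Read installed package names on stdin (one per line) and print every
# kernel version that has a linux-image package but no matching
# linux-modules-extra package.
kernels_missing_extra() {
    awk '
        /^linux-image-[0-9]/         { sub(/^linux-image-/, "");         image[$0] = 1 }
        /^linux-modules-extra-[0-9]/ { sub(/^linux-modules-extra-/, ""); extra[$0] = 1 }
        END { for (k in image) if (!(k in extra)) print k }
    ' | sort
}

# Usage on a Debian/Ubuntu system:
#   dpkg -l | awk '/^ii/ { print $2 }' | kernels_missing_extra
```

Any version it prints is a kernel that would likely boot without the r8169 module, so installing the corresponding linux-modules-extra package before rebooting should avoid the outage.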

boughtonp 03-28-2022 01:45 PM

Quote:

Originally Posted by thund3rstruck (Post 6342196)
5.13 is what apt wanted to install so that's what I went with.

Fair enough, but if this is Ubuntu Server 20.04 then I would have expected it to limit itself to LTS kernels.

Quote:

Not sure it's worth the trouble to test every kernel when every kernel >= 5.8 fails.

Well it's worth testing LTS kernels because they might get fixes which non-LTS kernels wouldn't, but you say you've checked the latest 5.10 also.


If this is your hardware maybe the mtorromeo/r8168 listed under "other drivers" option is worth investigating?


Just seen you've solved it... looks like the Hardware for Linux site has notes on some items, which might help others, but no idea on how they get added.


smallpond 03-28-2022 02:25 PM

According to this site Realtek uses the same PCI ID for different chips and distinguishes them by the rev. When I look at the Windows INF file I can see that there are chips with the same vendor, device and revision, but different subsystem, which have different names. Realtek seems to make hundreds of variations.

rkelsen 03-28-2022 03:27 PM

Realtek support has been spotty for a few years. I'm not sure why, but you have a 50/50 chance of the in-kernel driver working.

You can get a better driver from Realtek:
https://www.realtek.com/en/component...press-software

You need to compile it for your kernel.

selfprogrammed 08-07-2022 10:35 PM

The problem is that someone added some new software that does some PHY ID checking, and it is now breaking drivers that were working.
The second problem is that there is an attitude that they can blame the Gigabyte BIOS. Meaning they do not want to fix what they broke.
Users are supposed to fix their hardware (upgrade the BIOS); apparently the driver would be dirtied if it had to accept that PHY ID that has been reported the last 10 years.
The suggestion was actually made to not use that hardware and buy another network card.

At this point my blood is boiling a bit, and I will avoid saying too much. There are too many knee-jerk defenders here that will leap in to bash anything that sounds like a complaint.

Read it for yourself.

https://bugzilla.kernel.org/show_bug.cgi?id=204343

https://bugzilla.kernel.org/show_bug.cgi?id=213469


There is a patch for kernel 5.10 there:
The patch simply adds the PHY ID reported by the Gigabyte BIOS to the list of known PHY.
It matches what I see on my hardware.
I will be trying it on my kernel build.

The driver has been modified since, so the patch will not apply directly.
No, they have not fixed this problem. I still see it on 5.15.19 (Slackware).


Quote:

petr.bahula 2021-06-17 10:13:49 UTC

Hi,
we have two GIGABITE MB with this onboard chip.
The chip is detected differ on each MB:

[ 1.702543] r8169 0000:03:00.0: no dedicated PHY driver found for PHY ID 0xc2077002, maybe realtek.ko needs to be added to initramfs?
[ 1.702544] r8169 0000:03:00.0: no dedicated PHY driver found for PHY ID 0xc1071002, maybe realtek.ko needs to be added to initramfs?

In my case following (not fully correct, but working) patch for kernel 5.10.27 helped:

--- a/drivers/net/phy/realtek.c 2020-12-13 23:41:30.000000000 +0100
+++ b/drivers/net/phy/realtek.c 2021-06-17 11:51:00.854994117 +0200
@@ -674,6 +674,14 @@
 		.config_intr = genphy_no_config_intr,
 		.suspend = genphy_suspend,
 		.resume = genphy_resume,
+	}, {
+		.phy_id = 0xc0070002,
+		.phy_id_mask = 0xf0ff0fff,
+		.name = "Generic RTL PHY",
+		.get_features = genphy_read_abilities,
+		.suspend = genphy_suspend,
+		.resume = genphy_resume,
+		.set_loopback = genphy_loopback,
 	},
 };
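
To see why one added entry covers both PHY IDs from the log lines above, here is a small sketch of the mask comparison. It assumes the kernel matches a PHY to a driver entry by comparing only the bits selected by phy_id_mask, which is how I understand the matching to work; the function name phy_id_matches is made up for illustration.

```shell
#!/bin/sh
# Return success if a reported PHY ID matches a driver entry with the given
# phy_id and phy_id_mask, comparing only the bits set in the mask.
phy_id_matches() {
    reported=$(( $1 ))
    entry=$(( $2 ))
    mask=$(( $3 ))
    [ $(( reported & mask )) -eq $(( entry & mask )) ]
}

# The two PHY IDs from the dmesg lines, tested against the patch's entry.
for id in 0xc2077002 0xc1071002; do
    if phy_id_matches "$id" 0xc0070002 0xf0ff0fff; then
        echo "$id matches"
    else
        echo "$id does not match"
    fi
done
```

Both IDs reduce to 0xc0070002 under the mask 0xf0ff0fff, so the single "Generic RTL PHY" entry would match either board.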

thund3rstruck 08-08-2022 08:11 AM

Quote:

Originally Posted by selfprogrammed (Post 6372573)
At this point my blood is boiling a bit, and I will avoid saying too much. There are too many knee-jerk defenders here that will leap in to bash anything that sounds like a complaint.

Thank you for posting this. I've given up trying to explain this problem to people, so every time I see there is a kernel update I know it's going to break my networking, and I pre-emptively follow the steps I mentioned in this thread before I reboot the server. Pretty soon I am going to need to upgrade to 22.04 and I'm 100% certain that is going to fail and break this server :(

