LinuxQuestions.org

selfprogrammed 03-13-2024 08:15 PM

Gigabyte Ethernet RTL8168 broken on new Kernel releases
 
Can the Slackware release of the kernel include a patch so that those of us with Gigabyte motherboards can have working Ethernet?

The patch is just to add a PHY recognition code to the RealTek driver. This has been discussed on the kernel bug reporting, and is a known problem. It is not getting fixed because the maintainer is blocking it because of the "purity" of his software. He claims the BIOS is buggy and users must replace their hardware or try upgrading the BIOS, rather than have him include the recognition of that PHY the way it is on existing hardware.


I just got burned by this, AGAIN, when I tried to use the Ethernet after having upgraded to a newer kernel using the Slackware patches.
The Ethernet interface just does not come up. This is going to happen every time I upgrade the Slackware kernel.

It cost me 8 hours of trying to figure out why the Ethernet was not working.
The failure does not report anything that is easy to find. There is a single message in dmesg, which you only find once you realize the problem is a broken kernel driver.
It worked in previous kernels, so the question became how it could suddenly be broken. Too many wrong rabbit holes had to be investigated.
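
For anyone else chasing this, here is a minimal sketch of pulling that one message out of the kernel log. The message text is the one quoted from the bug report further down in this post; the function name `unknown_phy_ids` is only illustrative.

```shell
# Print every PHY ID that r8169 could not find a dedicated driver for,
# one per line. Reads kernel log text on stdin, so it works with live
# dmesg output or with a saved log file.
unknown_phy_ids() {
    sed -n 's/.*no dedicated PHY driver found for PHY ID \(0x[0-9a-fA-F]*\).*/\1/p'
}

# Typical use (may need root if dmesg access is restricted):
#   dmesg | unknown_phy_ids
```

If this prints nothing, the driver did not log that complaint and the cause is probably something else.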

https://www.linuxquestions.org/quest...-a-4175710077/


---
https://bugzilla.kernel.org/show_bug.cgi?id=204343

https://bugzilla.kernel.org/show_bug.cgi?id=213469


There is a patch for kernel 5.10 there.
The patch simply adds the PHY ID reported by the Gigabyte BIOS to the list of known PHYs.
It matches what I see on my hardware.

The driver has been modified since then, so the patch will not apply directly.
No, they have not fixed this problem. I still see it on 5.15.19 (Slackware).

petr.bahula 2021-06-17 10:13:49 UTC

Hi,
we have two GIGABYTE motherboards with this onboard chip.
The chip is detected differently on each board:

[ 1.702543] r8169 0000:03:00.0: no dedicated PHY driver found for PHY ID 0xc2077002, maybe realtek.ko needs to be added to initramfs?
[ 1.702544] r8169 0000:03:00.0: no dedicated PHY driver found for PHY ID 0xc1071002, maybe realtek.ko needs to be added to initramfs?

In my case the following (not fully correct, but working) patch for kernel 5.10.27 helped:

Code:

--- a/drivers/net/phy/realtek.c 2020-12-13 23:41:30.000000000 +0100
+++ b/drivers/net/phy/realtek.c 2021-06-17 11:51:00.854994117 +0200
@@ -674,6 +674,14 @@
 		.config_intr = genphy_no_config_intr,
 		.suspend = genphy_suspend,
 		.resume = genphy_resume,
+	}, {
+		.phy_id = 0xc0070002,
+		.phy_id_mask = 0xf0ff0fff,
+		.name = "Generic RTL PHY",
+		.get_features = genphy_read_abilities,
+		.suspend = genphy_suspend,
+		.resume = genphy_resume,
+		.set_loopback = genphy_loopback,
 	},
 };
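
Why one entry covers both boards: as far as I understand phylib, a PHY driver is selected by comparing the ID read from the hardware with the driver's `phy_id`, looking only at the bits set in `phy_id_mask`. Below is a minimal sketch of that comparison in shell arithmetic. The function name is made up for illustration; the values come from the patch and the log lines above.

```shell
# Succeeds when the hardware ID ($1) equals the driver ID ($2)
# on every bit that is set in the mask ($3).
phy_id_matches() {
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# Check the two IDs from the log against the entry added by the patch.
for id in 0xc2077002 0xc1071002; do
    if phy_id_matches "$id" 0xc0070002 0xf0ff0fff; then
        echo "$id: matched by the patch entry"
    else
        echo "$id: not matched"
    fi
done
```

Both logged IDs differ from 0xc0070002 only in the two hex digits that the mask zeroes out, so both match.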


henca 03-14-2024 02:10 AM

Usually Slackware ships unpatched kernels as provided from the upstream source.

What kernel were you installing using Slackware patches? The 5.15.19 kernel was the original kernel in Slackware 15.0 and has since then been updated with a few security updates.

The right way to "fix" this kind of problem is to make sure that the fix goes into the upstream kernel. If there is a good reason for the kernel developers not to accept such a patch, that is probably also a good reason not to apply it to the kernels shipped with Slackware.

Regardless of what kernel developers and Slackware maintainers say, you are of course free to patch your own kernels as much as you want. You will probably not have to recompile your entire kernel; most likely it will be enough to recompile the patched kernel module for your NIC.

regards Henrik

mw.decavia 03-14-2024 07:37 AM

Just as a comparison, the driver for the rtl8812au (wifi) has not been included at all as a kernel driver in Slackware, and it looks like that will never change. So there is a GitHub repo for the driver, which is quite popular.

Maybe some Gigabyte realtek users could maintain a github repo for an alternate driver?

There does seem to be some anti-Realtek hardware snobbery among developers. They make less effort to write a good driver because (they say) "it's just a realtek, they should get better hardware". But they still support hardware that is older and worse than a Realtek.

drumz 03-14-2024 10:33 AM

Quote:

Originally Posted by selfprogrammed (Post 6489583)
The patch is just to add a PHY recognition code to the RealTek driver. This has been discussed on the kernel bug reporting, and is a known problem. It is not getting fixed because the maintainer is blocking it because of the "purity" of his software. He claims the BIOS is buggy and users must replace their hardware or try upgrading the BIOS, rather than have him include the recognition of that PHY the way it is on existing hardware.

If this is truly the case, escalate the issue. All the way to Linus, if necessary. He doesn't put up with such nonsense.

garpu 03-14-2024 11:20 AM

Quote:

Originally Posted by drumz (Post 6489699)
If this is truly the case, escalate the issue. All the way to Linus, if necessary. He doesn't put up with such nonsense.

Yeah, this sounds like breaking user space. We all know how he feels about that.

hazel 03-14-2024 11:39 AM

I'm puzzled. I'm using kernel 5.15.145 with that driver and my ethernet works OK.
Code:

$ lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
        Subsystem: Lenovo RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at b0604000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at b0600000 (64-bit, prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169
$ ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether c0:3f:d5:e6:c9:30 brd ff:ff:ff:ff:ff:ff


business_kid 03-14-2024 11:49 AM

I'm surprised by this.

Sure, Realtek doesn't support linux. Many motherboards use the same 8111/8168/8211/8411 IP core. I have a hardware background. Much as I dislike their scorn of Linux, Realtek's hardware works - all of it.

Since the year dot, this driver has been stable and working in the code. It was there in 2008, the first time I needed it. Has someone broken the Golden Rule: "If it works, don't fix it?"

mw.decavia 03-14-2024 12:00 PM

I think that (maybe) what the OP meant was that the r8169 kernel driver ends up broken on Gigabyte (brand) motherboards with a builtin realtek ethernet chip, because Gigabyte builds them with a slightly non-standard "phy" chip.

From the look of your "lspci" fragment, it looks like you are using a Lenovo. Is it built using a Gigabyte motherboard?

Quote:

Originally Posted by hazel (Post 6489709)
I'm puzzled. I'm using kernel 5.15.145 with that driver and my ethernet works OK.
Code:

$ lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
        Subsystem: Lenovo RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at b0604000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at b0600000 (64-bit, prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169
$ ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether c0:3f:d5:e6:c9:30 brd ff:ff:ff:ff:ff:ff



hazel 03-14-2024 12:13 PM

Quote:

Originally Posted by mw.decavia (Post 6489711)
From the look of your "lspci" fragment, it looks like you are using a Lenovo. Is it built using a Gigabyte motherboard?

I have no idea. Hardware has never been my forte. It's a Lenovo Thinkcentre if that's any use. It's also a bit weird in that it runs on an external power unit; someone once told me it's really a laptop in a tower case.

business_kid 03-14-2024 01:01 PM

Then it could be anything. What would be best is to take a close-up of the biggest black IC on the motherboard that is not the cpu, post it on imgur, and I can likely identify it as a hardware guy. Then we can locate what the issue is. Here is another line
Code:

sudo lspci |grep -i Chipset
If that returns anything, post the results.

Paulo2 03-14-2024 01:01 PM

Mine works OK too with stock-upgraded Slackware 15.0 kernel (and also with 6.7.2 currently running in Slackware 15.0).
It seems to be related to the NIC revision (https://www.linuxquestions.org/quest...-a-4175731772/); mine is rev 0c, same as hazel's.

Maybe the 8168 driver at SBo would work in that case?
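
To compare NIC revisions between machines, a small sketch that pulls the revision out of `lspci` output. The function name is illustrative, and the `10ec:8168` vendor:device pair in the usage comment is the one mentioned elsewhere in this thread; other boards may report a different pair.

```shell
# Print the revision (for example "0c") of every Ethernet controller
# found in lspci output supplied on stdin.
nic_revision() {
    sed -n 's/.*Ethernet controller.*(rev \([0-9a-fA-F]*\)).*/\1/p'
}

# Typical use, limited to one vendor:device pair:
#   lspci -d 10ec:8168 | nic_revision
```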

hazel 03-14-2024 01:10 PM

Quote:

Originally Posted by business_kid (Post 6489721)
Then it could be anything. What would be best is to take a close-up of the biggest black IC on the motherboard that is not the cpu, post it on imgur, and I can likely identify it as a hardware guy. Then we can locate what the issue is. Here is another line
Code:

sudo lspci |grep -i Chipset
If that returns anything, post the results.

Nope, doesn't return anything. I don't have a camera so I can't show you the mobo. It's probably not relevant to the OP's problem and I don't want to hijack the thread, but clearly some RTL Gigabit chips are OK with this driver.

nons1 03-14-2024 03:37 PM

This issue doesn't affect all Gigabyte mainboards, just a few from 2009/2010 with a RTL8168d NIC.
On these boards the BIOS is buggy and on boot the NIC reports an invalid PHY ID.
You can verify that it's invalid by checking the PHY ID again after loading r8169 with the hack mentioned by the OP.
On these boards you'll find that /sys/class/net/<if>/phydev/phy_id then reports the correct value, 0x001cc912.
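
A minimal sketch of that check, assuming the sysfs path given above. The function names and the optional second argument (an alternative sysfs root, only there so the function can be tried against a test directory) are illustrative.

```shell
# Print the PHY ID the kernel currently reports for an interface.
# $1 = interface name, $2 = sysfs root (defaults to /sys).
read_phy_id() {
    cat "${2:-/sys}/class/net/$1/phydev/phy_id"
}

# Succeeds if the given ID is the value reported as correct in this thread.
is_correct_phy_id() {
    [ $(( $1 )) -eq $(( 0x001cc912 )) ]
}

# Typical use, assuming the interface is called eth0:
#   id=$(read_phy_id eth0) && is_correct_phy_id "$id" && echo "PHY ID OK: $id"
```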

For at least some of the affected boards, Gigabyte fixed the issue with a BIOS update.
For the GA-880GA-UD3H rev 2.1, that is BIOS version F5.

As described by some users in the bug report comments, enabling the LAN boot ROM option in the BIOS may also help.
Another alternative is to use Realtek's r8168 driver on these boards.

the3dfxdude 03-14-2024 05:21 PM

Realtek supplies GPLv2 source code for their NICs. Sure they have been known to be ugly, and may have bugs, but they do support linux and I've known them to work ok. Their r8168 driver supports back to 2.4.20.

I have a Gigabyte board from 2009-2010 with an onboard r8168 (10ec:8168 rev2), and it is working fine. I guess my board had a BIOS update? Looks like it may have for the NIC. The PHY reports a proper ID during probing. It's the RTL8211B. Your bugzilla seems to indicate that you have the rev3 and your BIOS is still busted?

The thing I have been trying to work out here is how your NIC / PHY used to work. The PHY drivers have been around for a while. If they changed the probing to be more exact, and now believe a lying BIOS over what used to work, that seems like a step in the wrong direction. I've seen buggy BIOSes before, and worked with kernel devs who were more than happy to supply a way to fix the problem while ignoring the buggy BIOS. If anything, the kernel should not rely only on the BIOS here. If the only issue is that they need something a little more reassuring than "Generic Realtek PHY", then what was it detected as before? They should go straight back to that in this case, or someone should supply the proper ID for your board. Do the official Realtek drivers work? Do they do any probing? I'd have to study it. I'm guessing the official driver is simple and the Linux code is doing too much and doing it wrong. I'd have to study the kernel code to figure out what they decided to do here, then go back to the official driver and compare, assuming the official driver is working.

I agree with others, that you should escalate this to a higher level maintainer. This is a regression.

nons1 03-15-2024 02:40 AM

The Realtek vendor driver is a big, monolithic mess and far from meeting mainline code standards. Like kernels before 4.19, it assumes a specific PHY model based on the NIC ID.
Since 4.19, r8169 makes use of phylib, which solved several issues. However, phylib doesn't have access to the NIC ID; it reads the PHY ID from the PHY and treats it accordingly.
For a few kernel versions after 4.19, PHYs like the OP's were handled by the genphy driver (because of the unknown, invalid PHY ID). That caused other problems, because certain actions of the genphy driver can cause an RTL8211B to hang.
The hack mentioned by the OP may work on this board, but may result in PHYs being falsely detected as Realtek PHYs on other systems.
So it's a tradeoff: any known workaround for this specific old board model may negatively impact other, more common systems.
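
To put a rough number on that tradeoff: every zero bit in a `phy_id_mask` is a "don't care" bit, so the looser the mask, the more distinct PHY IDs the entry will claim. Here is a sketch that counts them for a 32-bit ID. The function name is hypothetical; the mask is the one from the patch in the first post.

```shell
# Print how many distinct 32-bit PHY IDs a driver entry with the given
# mask would match: 2 to the power of the number of zero bits in the mask.
ids_matched_by_mask() {
    free=$(( ~($1) & 0xffffffff ))   # bits the mask ignores
    count=1
    while [ "$free" -ne 0 ]; do
        if [ $(( free & 1 )) -eq 1 ]; then
            count=$(( count * 2 ))
        fi
        free=$(( free >> 1 ))
    done
    echo "$count"
}

ids_matched_by_mask 0xf0ff0fff   # mask used by the patch
ids_matched_by_mask 0xffffffff   # exact-match mask, for comparison
```

The patch mask leaves 8 bits free, so that single entry claims 256 different IDs, where an exact-match mask claims one.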

