LinuxQuestions.org
Linux - Networking This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.

01-23-2017, 09:50 AM   #1
CJ Chitwood
Member
 
Registered: Dec 2006
Location: Northern Half of Florida
Distribution: PCLinuxOS on one home machine, Debian Buster on the other. I forget what's on the laptops.
Posts: 146

Rep: Reputation: 28
[Solved] [I Think] e1000e detected hardware unit hang on Lenovo Thinkpad P50 running OpenSuSE LEAP 42.1


Hello all,

Normally I come here looking for help. Today I did, but I think I resolved it, so I decided to go ahead and post anyway in hopes that it will help someone else.

I had some trouble getting OpenSuSE LEAP 42.1 to work correctly on my Lenovo P50. There are guides and information galore for other things like getting Bumblebee / Optimus to work (which sadly do not seem to work for me but I don't absolutely need them), and there are guides for getting power management to work so that the battery will last longer than a mere two hours (I now get around 4 hours by switching from the default power management to tuned). However, I have not found any information on getting rid of a pesky little problem with the gigabit ethernet adapter.

I came here to post my problem looking for help, but in gathering information for this post, a few things became apparent which helped me (I think) resolve this issue. Since I had already typed up most of it, I'm posting it anyway; I'm sure there are others looking for something similar for which the prescribed Google medication just doesn't knock out the disease.



For now at least, I believe the fix was updating drivers not from OpenSuSE (which would have been my preference) but rather directly from the manufacturer, Intel.






Summary / TL;DR
In short, my adapter decides at oddball times to quit working. Unfortunately, I did not save the logs and I'm not sure where to get them now (/var/log/?/), but the key line was:

e1000e detected hardware unit hang

The link would renegotiate at 100 Mbps instead of gigabit, if I was lucky, but even then, it would not pass traffic.

I would lose connection intermittently, and it would reconnect for about ten or twenty seconds before losing it again, eventually failing outright, never to return. A reboot would resolve it for a random duration, sometimes only a minute or two. The usual Google results, while perhaps pertinent, did not resolve my issue, and nobody seems to be discussing the case where those fixes don't help.
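For anyone who wants to check their own logs for the same message, here is a minimal sketch. It assumes the kernel log is reachable through dmesg, the systemd journal, or a syslog file such as /var/log/messages (I have not confirmed which of these applies to a given install), and the function name is just something made up for illustration.

```shell
#!/bin/sh
# Count kernel log lines that report the e1000e hang.
# Reads the file given as an argument, or stdin when none is given.
# The match is case-insensitive because the capitalization in the
# kernel message may differ from how it is quoted in this post.
count_unit_hangs() {
    grep -ci 'detected hardware unit hang' "$@" || true
}

# Typical use on a live system (may need root):
#   dmesg | count_unit_hangs
#   journalctl -k -b | count_unit_hangs     # assumes a systemd journal
#   count_unit_hangs /var/log/messages      # assumes a syslog file at this path
```

A count that keeps climbing between checks would suggest the hang is still happening.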

What I've tried
I checked dmesg to get the logs for this card's issue, and used key words I found there to search on Google to find many pages relating to similar symptoms with an apparently different root cause (as the fixes found did not work).

Before posting this today, I searched here for "e1000e detected hardware unit hang" with and without the e1000e, no results. I then searched "detected hardware unit hang" with and without quotes. No joy on the few results I found (which appeared to be completely not network related).

Google results recommend, like doctors prescribing Ibuprofen when they don't have any other answers, disabling TCP Segmentation Offload (tso off) using ethtool, along with a few other offload features. However, contrary to the majority of those results, disabling these features with ethtool does not help. Likewise, disabling PCIe Active State Power Management by booting with the kernel parameter "pcie_aspm=off" does not help.
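For reference, the commonly suggested workarounds look roughly like the sketch below. The function names are made up, and which "other features" to turn off (gso and gro here) is an assumption on my part. As noted above, none of this cured the hang on this machine; the helper only makes it easy to confirm that the kernel parameter really was applied at boot.

```shell
#!/bin/sh
# Turn off the offload features usually blamed for the hang.
# $1: interface name, e.g. eth0. Needs root. Not persistent across reboots.
disable_offloads() {
    ethtool -K "$1" tso off gso off gro off
}

# Succeed (exit 0) if pcie_aspm=off is present on the kernel command line.
# $1: file holding the command line (defaults to /proc/cmdline).
aspm_disabled() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'pcie_aspm=off'
}

# Typical use:
#   disable_offloads eth0
#   aspm_disabled && echo "ASPM is off" || echo "ASPM parameter not set"
```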


Details
I have a Lenovo P50 (model type 20EQ-S1B200 manufactured April of 2016).

I have tried updating my driver (e1000e) to the latest available on the OpenSuSE repositories, which at first seemed to solve the issue, until I went to configure a wireless controller again and couldn't connect. Maybe it helped at first, but now the behavior has returned.

I was beginning to believe I had a physical hardware fault, possibly due to heat. This laptop does get warm at times, but sitting on a desk the fans almost never come on, so I assume it's not "too" hot for hardware.

I have been unable to lock it down to a specific action or set of actions that cause the issue to occur, so I don't know if it's at all related to suspending the system or physically disconnecting and reconnecting it to different switch gear. It doesn't seem to be, so I'm unsure.


lspci -vv (that's two "v", not a "w") (pertinent lines only) (after problem resolved -- may be different now)
Code:
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
	Subsystem: Lenovo Device 2233
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 135
	Region 0: Memory at d5800000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee003b8  Data: 0000
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: e1000e
	Kernel modules: e1000e
To be thorough for people searching, if I run "lspci -vvn" instead, those first two lines change:
Code:
00:1f.6 0200: 8086:15b7 (rev 31)
	Subsystem: 17aa:2233
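
If you want to check whether your machine has the same controller, a small sketch along these lines should do it. It assumes "lspci -n" prints lines in the same layout as the output above; the function names are hypothetical.

```shell
#!/bin/sh
# Print the vendor:device ID of every device line in "lspci -n" output
# read from stdin. Subsystem lines are skipped because their third
# field is empty.
pci_ids() {
    awk '$3 ~ /^[0-9a-f]+:[0-9a-f]+$/ { print $3 }'
}

# Succeed (exit 0) if a device with the ID shown above (8086:15b7) is present.
has_same_nic() {
    lspci -n -d 8086:15b7 | grep -q .
}

# Typical use:
#   lspci -n | pci_ids
#   has_same_nic && echo "same I219-LM variant as in this post"
```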

ethtool -k eth0
Code:
Features for eth0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [requested on]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-switch-offload: off [fixed]
ethtool -i eth0
Code:
driver: e1000e
version: 2.3.2-k
firmware-version: 0.8-3
expansion-rom-version: 
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
This is peculiar: my software management clearly shows that I've installed intel-e1000e version 3.3.5-1.1. Interesting... But when I look at the package's file list, only five files were installed, and none of them is the driver.
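
A quick way to spot this kind of mismatch is to compare the version the running driver reports with what the module file on disk claims. This is only a sketch: the function names are made up, and it assumes "ethtool -i" output shaped like the block above.

```shell
#!/bin/sh
# Extract the driver version from "ethtool -i <iface>" output on stdin.
# Only the line whose key is exactly "version" matches, so
# "firmware-version" and "expansion-rom-version" are ignored.
ethtool_driver_version() {
    awk -F': *' '$1 == "version" { print $2; exit }'
}

# Show which e1000e module file would be loaded, and its version.
show_module_on_disk() {
    modinfo -F filename e1000e
    modinfo -F version e1000e
}

# Typical use:
#   ethtool -i eth0 | ethtool_driver_version
#   show_module_on_disk
```

If the two versions disagree with what the package manager says is installed, the package probably did not put a driver module where the kernel looks for it.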

So I went to Intel's source code download page to get it straight from the horse's mouth, and after unpacking it just ran "make install" and rebooted when it finished (rather than screwing with unload/reload and also to give a firm hardware reset). Now I'm running version 3.3.4-NAPI:

ethtool -i eth0
Code:
driver: e1000e
version: 3.3.4-NAPI
firmware-version: 0.8-3
expansion-rom-version: 
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
However, I was still linking at 100 megabit, so after some more searching I found that my card wasn't advertising that it's gigabit capable.

ethtool eth0
Code:
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: 100Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: on (auto)
	Supports Wake-on: pumbg
	Wake-on: d
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes

I ran "ethtool -s eth0 speed 1000", and it replied that it could not advertise that speed. Odd. So I ran "ethtool -s eth0 advertise 1000" and it linked up fine, after a moment, at a gigabit. Now, the "ethtool eth0" shows an additional line for advertisements: "1000baseT/Full"
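
To check the advertisement without eyeballing the output, something like the following sketch works on text shaped like the "ethtool eth0" block above (the helper names are hypothetical). One caveat, taken from the ethtool man page rather than from my own testing: "advertise" is documented as taking a hexadecimal bitmask, where 0x020 means 1000baseT/Full and 0x02f would cover the five modes this card lists as supported. The literal "1000" is simply what happened to work here, so the mask form may be the safer thing to try.

```shell
#!/bin/sh
# Both helpers read the output of "ethtool <iface>" on stdin.

# Print the negotiated speed, e.g. "100Mb/s".
link_speed() {
    awk -F': *' '/^[ \t]*Speed:/ { print $2; exit }'
}

# Succeed (exit 0) only if 1000baseT/Full is listed under
# "Advertised link modes". Any other "key:" line ends that list, so the
# 1000baseT/Full entry under "Supported link modes" does not count.
advertises_gigabit() {
    awk '
        /Advertised link modes:/          { in_block = 1 }
        !/Advertised link modes:/ && /:/  { in_block = 0 }
        in_block && /1000baseT\/Full/     { found = 1 }
        END { exit !found }
    '
}

# Typical use:
#   ethtool eth0 | link_speed
#   ethtool eth0 | advertises_gigabit || echo "gigabit not advertised"
# Man-page form of the advertise setting (assumption, not what I ran):
#   ethtool -s eth0 advertise 0x02f
```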


So far I haven't lost connection yet on the e1000e since doing this.



CONCLUSION
It appears that one must upgrade from the stock OpenSuSE LEAP 42.1 driver -- version 2.3.2-k -- to something (anything) newer. However, OpenSuSE's download page for the driver appears to my untrained eye to be broken, so it's best to get it directly from Intel's page.

Run "make install" (with appropriate credentials) on the driver after untarring it to an appropriate location in the filesystem, then reboot.

Run "ethtool -s eth0 advertise 1000" (if necessary) to ensure the driver will advertise to switch gear that it's gig capable.

Run "ethtool -s eth0 speed 1000" (if still necessary) to attempt to force gigabit connection (assuming the switch gear is capable and also configured for gig).

So far, the connection has been stable for me for the past half hour or so. I hope this helps someone else along the way. As I said, I looked for similar threads but found bupkes.
 
  


Tags
hang, hardware, lenovo, opensuse, unit


