My Intel Linux NICs Have Developed A Nasty Habit Of Becoming Hung
Over the past few days I've run into a strange issue where all of my systems with Intel NICs using the e1000e Linux driver keep ending up in a hung state. For the past two mornings I've woken up to find my main system without any working network connectivity, with the e1000e driver reporting "detected hardware unit hang." Many of my other test systems that use the e1000e driver are left hung as well.
This is really strange, as I've made no network changes in the past few days and haven't added any new volatile systems to the network. The problem shows up on a mix of Ubuntu / Fedora / Clear Linux / Debian boxes with a range of different motherboards; the only common denominator seems to be that the affected systems use the e1000e NIC driver. The Linux kernels on these distributions aren't even the same, though they are all running Linux 4.x.
Aside from magically getting hung overnight, they also tend to hang a handful of times during the morning. I haven't figured out the cause yet. The router in use is the high-end ASUS AC88U. I upgraded its firmware yesterday to see if that was the cause of all these e1000e hangs, but the problem persisted this morning, so I've even tried a beta firmware to see if something is going awry on that end, given the range of hardware affected by the network issues. Still, it doesn't seem like it should be possible for a faulty network device to leave a number of NICs with their hardware in a hung state...
At this point I am not too sure what is going on with this extremely annoying problem. Usually after rebooting the affected systems, things will work fine for a short time before hanging again. As the day progresses, though, the NICs generally settle down and stay fine until the night.
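One way to check the overnight-and-morning pattern would be to tally the hang messages in the kernel log by hour of day. The following is only a minimal sketch, not something taken from the affected systems. It assumes syslog-style timestamps at the start of each line (such as the default output of `journalctl -k`), it matches the message text quoted earlier case-insensitively, and the function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Message text as quoted in the article; matched case-insensitively because the
# exact capitalization in the kernel log is an assumption here.
HANG_PATTERN = re.compile(r"detected hardware unit hang", re.IGNORECASE)

# Assumes each line starts with a syslog-style timestamp, e.g. "Mar 14 03:12:45".
TIMESTAMP_PATTERN = re.compile(r"^\w{3}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}")


def count_hangs_by_hour(lines):
    """Return a Counter mapping hour of day (0-23) to number of hang messages."""
    counts = Counter()
    for line in lines:
        if not HANG_PATTERN.search(line):
            continue  # not a hang report
        match = TIMESTAMP_PATTERN.match(line)
        if match:  # lines without a recognizable timestamp are skipped
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines, for illustration only (not real log output).
sample_lines = [
    "Mar 14 03:12:45 box kernel: e1000e eth0: Detected Hardware Unit Hang",
    "Mar 14 03:40:01 box kernel: e1000e eth0: Detected Hardware Unit Hang",
    "Mar 14 08:05:10 box kernel: e1000e eth0: Detected Hardware Unit Hang",
]

for hour, hangs in sorted(count_hangs_by_hour(sample_lines).items()):
    print(f"{hour:02d}:00  {hangs} hang(s)")
```

The same function accepts any iterable of log lines, so on a real system it could be fed an open log file or the captured output of `journalctl -k` instead of the sample list.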
With the number of systems being benchmarked at any given time, any network disruption is absolutely frustrating, especially when said problems happen on a daily basis.
If anyone has run into similar e1000e issues recently, I'd love to hear about it. It's been quite a while since I last encountered such network hardware hangs with the Intel driver, and from searching around, most of the related e1000e hang bug reports are old. If you have any insight, feel free to share via Twitter or by commenting on this article in our forums.