Linux - Kernel: This forum is for all discussion relating to the Linux kernel.
So why didn't I get the usual panic report with a traceback?
Maybe the buffer was filled with so much output the report did not fit anymore. But that's just a guess.
The kernel developers usually recommend trying to ssh into the machine to check whether it is really frozen solid or whether only the display froze.
But to report anything upstream, you must reproduce it on the most recent version of any given branch.
Quote:
Originally Posted by petelq
Take your laptop to M&S and use their wifi. It's free and no snags to start, no strings.
You can have a very nice coffee while you work.
I thought I'd seen a post about you having a laptop at some point so I just assumed and you know what's said about assuming.
I stand corrected.
I think we're about the same age and I have a smart phone and I build roms for it. I would have thought you were much more tech savvy than me (and you probably are).
1) The fault is reproducible, although the output differs this morning. I guess it depends on the precise moment of the crash. This time I got some lines of kernel output, repeated over and over. I transcribed them by hand (hope this is accurate):
Code:
BUG: Unable to handle page fault for address ffffffffd83ea180
#PF: supervisor read access in kernel mode
#PF: error_code (0x0000) - not-present page
PGD 15612067 P4D 15612067 PUD 15614067 PMD 0
general protection fault, probably for non-canonical address 0x720072007200720:0000 [#12] PREEMPT SMP PT1
CPU: 0 PID 119539488 Comm: /*d*r*i*v*e*r*s*/*x* Not tainted 5.15.19scroll #1
Hardware name: LENOVO 90BX0018UK/Aptio CRB, BIOS 007KT39AUS 06/18/2014
2) When I tried to ssh from the laptop I got:
Code:
ssh: connect to host bigboy port 22: No route to host
Bigboy is in the /etc/hosts file btw with the address 192.168.2.100. When I pinged that address, I got destination unreachable. So it really was a crash, not just a console freeze.
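For anyone who wants to script the same check from a second machine, here is a minimal sketch. It only tests whether a TCP connection to the ssh port can be opened; `port_open` is a name made up for this example, and the 3-second timeout is an arbitrary assumption.

```python
import socket

def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused" and timeouts.
        return False
```

Calling `port_open("192.168.2.100")` right after the screen locks up should return False if the whole machine has gone down, and True if only the console has frozen and sshd is still answering.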
Now I need to repeat all that with a standard unpatched kernel.
I am now running the 5.15.27 official Slackware kernel and there is no bug. Of course, that's not a 100% reliable test because it's a later version and there could be a real kernel bug that got fixed. To be certain, I shall need to rebuild 5.15.19 without the scrolling tty patch and try booting from that. But I'm fairly certain now that the bug is in the patch and not in the main kernel. After all, this is a strictly unofficial patch.
I think I have a hazy idea of what is going on here. It has to do with memory management. PF stands for page fault, and PGD, P4D, PUD and PMD are the levels of the kernel's page tables. d*r*i*v*e*r*s probably means that a driver (the patched tty driver) is triggering the problem. Output to a scrolling console has to be stored in memory so that you can get it back when you scroll upwards. I vaguely remember from past reading that it goes into video memory, but I have a system-on-a-chip with built-in video, so there probably isn't that much physical difference between video and main memory on my machine. Anyway, I think what is happening is that the patched driver and the part of the kernel that does memory management aren't quite in sync. When you cause a lot of output to be dumped at once, as I do in this test, maybe it can't get written to memory fast enough and something gets out of step.
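Two things about the addresses in the transcribed output can be checked mechanically. This is only a sketch: it assumes 48-bit virtual addresses (4-level paging), which the log does not confirm, and `is_canonical` and `vga_cells` are names invented here. The second function treats the faulting value as four 16-bit VGA text-mode cells (attribute in the high byte, character in the low byte).

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones

def vga_cells(value):
    """Split a 64-bit value into four (attribute, character) text-mode cells."""
    words = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    return [(w >> 8, chr(w & 0xFF)) for w in words]

fault_addr = 0xFFFFFFFFD83EA180  # from the "page fault" line
gpf_addr = 0x720072007200720     # from the "general protection fault" line

print(is_canonical(fault_addr))  # True
print(is_canonical(gpf_addr))    # False
print(vga_cells(gpf_addr))       # [(7, ' '), (7, ' '), (7, ' '), (7, ' ')]
```

If the transcription is accurate, the non-canonical value is just 0x0720 repeated, which is what a blank console cell (a space with attribute 7) looks like in a text-mode screen buffer. That might hint that a pointer got overwritten with screen contents, but that is a guess, not a diagnosis.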
If anyone else is using this patch, perhaps they could try to replicate the bug. Just give a command that dumps several screenfuls of output almost instantaneously and see what happens.
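Something like the following would do as a test load. It is a rough sketch, not the command used in this thread; `flood` and its default line count and width are arbitrary choices. Be warned that on a kernel with the patch it may well bring the machine down, so sync your disks first.

```python
import sys

def flood(lines=5000, width=78, out=sys.stdout):
    """Write `lines` numbered lines to `out` as fast as possible; return the count."""
    for i in range(lines):
        prefix = f"{i:06d} "
        # Pad each line to a fixed width so every one fills most of a console row.
        out.write(prefix + "x" * (width - len(prefix)) + "\n")
    out.flush()
    return lines
```

Run `flood()` from a Python prompt on the text console itself, not over ssh or inside X, since it is the console driver that needs to be exercised.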
I just booted from an unpatched 5.15.19 kernel and there is no bug! So the problem is either in the patch itself or possibly in a mismatch between the requirements of a scrolling console and the expectations of a memory manager that no longer supports it. In either case, there's nothing that needs to be reported upwards.
hazel, glad you got it fixed. I was interested in your comments regarding "age" related stuff. Born in '44, I'm just now using some of the smartness of my phone. It's usually off most of the time. No need to thumb my life. When I exercise outside, I laugh at all the "thumb" people out and about. I want to enjoy the sights, smells, etc. of the outdoors. I witness people stop running when the phone rings?!? Can't leave home without it, it seems. I'm too conservative for today's society.
It isn't fixed! It's unfixable (by me anyway). But the question is solved because we now know what's causing the problem. Since it only manifests when you use this unauthorised patch, reporting it upwards isn't going to do anything useful.
How serious it is depends on what you want to use a scrolling console for. If you just want to be able to examine kernel messages after a failed boot, that's fine because they don't come out fast enough to trigger the problem. But if you wanted to use it to build extra software on a skeletal LFS system which has no graphical interface as yet, you probably would crash it, because messages from a build come out very fast indeed. You would have to either direct the output into a file and examine it afterwards or use something like screen or tmux. And that takes away the purpose of using a scrolling console in the first place. You might as well use screen/tmux with a standard kernel.
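The "direct the output into a file" workaround can be sketched like this. It assumes Python is available on the system, which may not be true on a skeletal LFS install; `run_logged` and the log path are made-up names for illustration.

```python
import subprocess

def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr sent to log_path; return the exit status."""
    with open(log_path, "wb") as log:
        # Nothing reaches the console, so the tty driver never sees the flood.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `run_logged(["make"], "build.log")` keeps the build quiet on the console, and the log can be paged through afterwards with less.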