Weird kernel behaviour on my Lenovo: has anyone else seen symptoms like this before?
I have installed in Slackware-15 a recent kernel image and its modules from Slackware-current. This kernel is guaranteed to be configured properly! I will make an initrd for it using Patrick's wonderful script and set it up in elilo.conf as an alternative Slack boot.
The generic kernel from slackware-current now has filesystem drivers built in, so it doesn't usually need an initrd any more. (The generic kernel of slackware 15 still needs an initrd.)
Brilliant! But I'm so edgy now, I want to use belt and braces. I can foresee a nightmare a year or so down the line when I am stuck with hardware I can't use any more. I want to find out what's going wrong and fix it while I still feel I can.
Update: Thank God! It isn't me misconfiguring things, it really is the kernel code that's misbehaving on my machine. Even with the official Slackware build, 6.9.0 still halts and buzzes. Incidentally, tpm.ko is built into this kernel.
Linux-6.4.0 boots with my tpm patch. This is a definite step forward. 6.7.4 still doesn't, so I shall need to do a new bisection. I'm guessing that there is a further problem with the tpm driver, as later kernels have a lot of extra code in this area. Over the next few days I'll work on narrowing it down to two successive releases as I did before, then clone the relevant twig overnight and do a proper bisect. Thank God for the at command!
I'm convinced that the buzz which these later kernels produce is an error signal of some kind. I can't imagine random machine noise producing that perfect V-sign.
Note: This is interesting: I found it in an arcolinux forum:
Quote:
Every time I try to run the installation from grub of ArcoLinux I get an error :
'Unkown TPM Error
Unkown TPM Error
Unkown TPM Error
Failed to load /arch/boot/x86_64/vmlinuz-linux' (3 times the Unknown TPM Error and then the other one)
Notice the pattern: three plus one. If these messages are accompanied by buzzes (the poster doesn't say), it would give the same pattern as I have been observing.
I think you have already done everything possible.
You said that 6.3.6 was good, so that means 6.3.0 was good. And 6.3.7 was bad. You already found the culprit between the stable 6.3.6 and 6.3.7 ("tpm, tpm_tis: Request threaded interrupt handler"). I think you also said that 6.4.0 worked with that patch reverted, so unpatched 6.4.0 was bad, if I understood you correctly. That patch was added to the stable tree while 6.4 was still being worked on, before 6.4 was ready. This is the patch as it was added to Linus's tree (mainline): https://git.kernel.org/pub/scm/linux...1d7de7c3eb2cea. You also said that 6.9 does not work, so it's not fixed yet.
This is from ChangeLog-6.4:
Code:
commit 1a0beef98b582b69a2ba44e468f7dfecbcfab48e
Merge: dc7e22a368c2a bd8621ca1510e
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon Apr 24 11:40:26 2023 -0700
Merge tag 'tpmdd-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
- The .machine keyring, used for Machine Owner Keys (MOK), acquired the
ability to store only CA enforced keys, and put rest to the .platform
keyring, thus separating the code signing keys from the keys that are
used to sign certificates.
This essentially unlocks the use of the .machine keyring as a trust
anchor for IMA. It is an opt-in feature, meaning that the additional
contraints won't brick anyone who does not care about them.
- Enable interrupt based transactions with discrete TPM chips (tpm_tis).
There was code for this existing but it never really worked so I
consider this a new feature rather than a bug fix. Before the driver
just fell back to the polling mode.
Link: https://lore.kernel.org/linux-integrity/a93b6222-edda-d43c-f010-a59701f2aeef@gmx.de/
Link: https://lore.kernel.org/linux-integrity/20230302164652.83571-1-eric.snowberg@oracle.com/
* tag 'tpmdd-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (29 commits)
tpm: Add !tpm_amd_is_rng_defective() to the hwrng_unregister() call site
tpm_tis: fix stall after iowrite*()s
tpm/tpm_tis_synquacer: Convert to platform remove callback returning void
tpm/tpm_tis: Convert to platform remove callback returning void
tpm/tpm_ftpm_tee: Convert to platform remove callback returning void
tpm: tpm_tis_spi: Mark ACPI and OF related data as maybe unused
tpm: st33zp24: Mark ACPI and OF related data as maybe unused
tpm, tpm_tis: Enable interrupt test
tpm, tpm_tis: startup chip before testing for interrupts
tpm, tpm_tis: Claim locality when interrupts are reenabled on resume
tpm, tpm_tis: Claim locality in interrupt handler
tpm, tpm_tis: Request threaded interrupt handler
tpm, tpm: Implement usage counter for locality
tpm, tpm_tis: do not check for the active locality in interrupt handler
tpm, tpm_tis: Move interrupt mask checks into own function
tpm, tpm_tis: Only handle supported interrupts
tpm, tpm_tis: Claim locality before writing interrupt registers
tpm, tpm_tis: Do not skip reset of original interrupt vector
tpm, tpm_tis: Disable interrupts if tpm_tis_probe_irq() failed
tpm, tpm_tis: Claim locality before writing TPM_INT_ENABLE register
...
Maybe it's time to report it?
Last edited by Petri Kaukasoina; 05-27-2024 at 08:46 AM.
But the problem is only half-solved! The patch that I found was not the only one I need. I noticed right at the beginning that 6.3.4 and some earlier kernels refused to boot but did so silently. Later kernels not only refuse to boot but make that buzzing noise. So there are at least two problems here, maybe more.
The patch that I found takes me as far as 6.4.16 and maybe a little bit further, but kernels from 6.5.6 onward still won't boot even with this patch in place and they buzz at me too. So somewhere between 6.4.16 and 6.5.6 is another piece of bad code that I also need to track down. And maybe there are more of these: the tpm driver has been heavily worked over of late.
I do intend to report my problem, but not until I've found all the patches that I need for the official LFS 12 kernel, 6.7.4. Then it will be easy to see whether I can get there with just two patches or not.
Even with the official Slackware build, 6.9.0 still halts and buzzes. Incidentally, tpm.ko is built into this kernel.
Yes, it is: CONFIG_TCG_TPM=y
One thing that you could still do is to revert the culprit in a current kernel to see if that still helps. There is a difference of one empty line between the code now and then, but here is a version of the patch that applies to 6.9.2, without adding the empty line. (It's the same kind of reverse patch as the one before.)
So somewhere between 6.4.16 and 6.5.6 is another piece of bad code that I also need to track down.
To bisect stable kernels, I'd think it would be a good idea to stay in one stable branch. For example, if 6.5.0 is good and 6.5.6 is bad, bisect between them. 6.4.16 and 6.5.6 are in sort of different branches of the tree. In fact, stable kernels 6.4.16 and 6.5.3 were released on the same day, and they probably received about the same patches from the development of upcoming 6.6.
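To make the "stay in one stable branch" advice concrete, here is a minimal sketch that maps a release tag to its stable branch and checks whether two releases share one. The helper names are made up for illustration, and the `linux-X.Y.y` naming is assumed to match how the stable tree names its branches; release-candidate tags are deliberately not handled.

```python
import re

def parse_tag(tag):
    """Split a release tag such as 'v6.4.16' or 'v6.5' into (major, minor, patch)."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.(\d+))?", tag)
    if m is None:
        raise ValueError(f"not a plain release tag: {tag!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)

def stable_branch(tag):
    """Name of the stable branch a release belongs to (assumed naming: linux-X.Y.y)."""
    major, minor, _ = parse_tag(tag)
    return f"linux-{major}.{minor}.y"

def same_stable_branch(a, b):
    """True if both releases sit on the same stable branch."""
    return stable_branch(a) == stable_branch(b)

# 6.5 and 6.5.6 share a branch; 6.4.16 and 6.5.6 do not.
print(same_stable_branch("v6.5", "v6.5.6"))
print(same_stable_branch("v6.4.16", "v6.5.6"))
```

If the two endpoints are on different branches, the history between them runs back through the mainline release where the branches forked, so a bisection would cover far more commits than the version numbers suggest.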
To bisect stable kernels, I'd think it would be a good idea to stay in one stable branch.
That's exactly what I do. I try different kernel releases in a rough bisection until I find a good and a bad one adjacent to each other. Then I clone overnight the twig that ends in the first bad release and use git bisect to close in on the actual bad commit. That's how I found the previous patch and I'm sure I can find the next one in the same way. I know now that it's somewhere on the 6.5 branch, not later than 6.5.6.
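The rough bisection over releases described above could be sketched like this. It is only an illustration: `boots` is a hypothetical stand-in for "build it, install it, reboot and see", and the sketch assumes a single good-to-bad transition in the list, which this thread suggests may not always hold.

```python
def first_bad(releases, boots):
    """Return the adjacent (last_good, first_bad) pair from an ordered release list.

    `boots` is a hypothetical callback: it takes a release tag and returns
    True if that kernel boots. The first entry must be good, the last bad.
    """
    lo, hi = 0, len(releases) - 1
    if not boots(releases[lo]) or boots(releases[hi]):
        raise ValueError("need a good first release and a bad last release")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(releases[mid]):
            lo = mid      # still good: the breakage is later
        else:
            hi = mid      # already bad: the breakage is here or earlier
    return releases[lo], releases[hi]

# Simulated results only, loosely modelled on the 6.3.6 / 6.3.7 boundary
# mentioned earlier in the thread; no real kernels are tested here.
releases = ["v6.3"] + [f"v6.3.{n}" for n in range(1, 8)]
simulated_boots = lambda tag: releases.index(tag) <= 6
print(first_bad(releases, simulated_boots))
```

The pair it returns is what would then be handed to git bisect as the good and bad revisions.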
When I've found it, I'll try patching 6.7.4 with both patches and, if it then boots, that's the end of the search. If it doesn't, there will be a third patch to be tracked down.
The point is that Slackware and AntiX will eventually be using these kernels (6.7 is hardly bleeding edge!) and so I need to be ready for them.
So the second patch will be somewhere between 6.4.16 and 6.5. And I think I am going to need a third one, because 6.5 with my first patch halts silently but the patched 6.5.1 halts with buzzing.
Last night I put on a new clone to run at 12:15AM. But I think I did it wrong. My clone instruction was
I got a clone and git can find v6.5 on it as my bad revision, but can't find v6.4.16. It says
Code:
error: Bad rev input: v6.4.16
Should I have used 6.5.y as my branch rather than 6.5?
PS: I just got an email from my ISP that I am approaching my download limit for the month and I've still got 19 days to go! So no more clones until after June 19th. I do have a buffer allowance (currently 9.6 GB) but I don't want to chance it.
OK, I need more help. I just ran "git tag -l" on my clone because I wanted to find out what tags were actually available for my bisection, and the only ones listed are minor releases: 6.5, 6.4, 6.3, etc, together with their rc release candidates. No patch releases at all. That explains why it couldn't find v6.4.16, which I wanted to set as my last good kernel.
Obviously the clone command works differently for x.y twigs and x.y.z twigs. I'm very annoyed because I have almost used up my monthly allocation for what seems to be a dud download. What git command ought I to have used for a case like this?
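One way to avoid another wasted download might be to ask the remote which tags it has before cloning anything. The sketch below wraps `git ls-remote --tags --refs`, which only transfers the list of refs. As far as I know the x.y.z patch-release tags are published in the separate stable tree rather than in Linus's mainline tree, so the URL below is an assumption to verify, and the function names are invented for illustration.

```python
import subprocess

# Assumption: the stable tree that carries the x.y.z tags lives at this URL.
STABLE_URL = "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git"

def parse_ls_remote(output):
    """Extract tag names from `git ls-remote --tags --refs` output."""
    tags = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].startswith("refs/tags/"):
            tags.add(parts[1][len("refs/tags/"):])
    return tags

def remote_tags(url=STABLE_URL):
    """List the tags a remote offers without cloning it (needs network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", "--refs", url],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote(result.stdout)

def missing_tags(wanted, available):
    """Tags from `wanted` that the remote does not offer."""
    return [tag for tag in wanted if tag not in available]

# Offline example with made-up hashes: a remote that only has x.y tags.
sample = "1111111\trefs/tags/v6.4\n2222222\trefs/tags/v6.5\n"
print(missing_tags(["v6.4.16", "v6.5"], parse_ls_remote(sample)))
```

Calling `missing_tags(["v6.4.16", "v6.5"], remote_tags(url))` against a candidate URL would show up front whether both bisection endpoints can be found there.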