9.2 Pro won't boot into GUI after YOU (online update)

Steerpike · 11-26-2004, 12:34 AM

Did a clean, default install of 9.2 Pro last week on my Dell Latitude D600 laptop. Got wireless networking working using ndiswrapper, no problems. Rebooted several times.

Yesterday, enabled some powersave modules (ACPI) and tested, worked fine - suspended to disk, resumed, etc.

Today, ran 'YOU' (yast online update) and applied all updates. This was my first run of 'YOU'.

On reboot, problems - the GUI 'splash' comes up (the Welcome screen), with a mouse cursor, but no login prompt. The mouse moves the cursor.

Ctrl-Alt-F1 to tty1 console. See nothing bad on the screen, other than perhaps
"Starting syslog services syslogd: cannot create /dev/log: Input/output error"

move down to /var/log, do an ls -l -t to see latest logs, open up /var/log/boot.msg ... only messages of interest reported here -

<4>ReiserFS: warning: is_leaf: free space seems wrong: level-1, nr_items-33, free_space=65448 rdkey
<4>ReiserFS: hda2: warning: vs-5150: search_by_key: invalid format found in block 2425132. Fsck?
<4>ReiserFS: hda2: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occured trying to find stat data of [6 62774 0x0 SD]

There are quite a few more of these 'bad'-looking messages, then a few good ones ("usbcore: registered new driver usbfs", etc.), then a bunch more of these 'ReiserFS: warning: is_leaf: ...' messages.

Then I see this, further down:

Checking root file system ...
fsck 1.35 (28-feb-2004)
Reiserfs super block in block 16 on 0x302 of format 3.6 with standard journal
...
Filesystem is NOT clean
Filesystem seems mounted read-only. Skipping journal replay.
Checking internal tree..finished
done<notice>exit status of (boot.rootfsck) is (0)

...

Activating device mapper ...
...
<notice>exit status of (boot.md boot.device-mapper) is (0 0)
...
Scanning for LVM volume groups...
Reading all physical volumes. ...
No volume groups found
Activating LVM volume groups ...
No volume groups found
done
...
showconsole: can not follow disk: Permission denied
showconsole: can not follow lirc: Permission denied
Checking file systems...
...
done
...
Mounting local file systems...
proc on /proc type proc (rw)
tmpfs on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw, mode....)
...
done<notice>exit status of (boot.localfs) is (0)
...
Loading required kernel modules
doneCreating /var/log/boot.msg
modprobe: FATAL: Error inserting hw_random (...) ... Input/output error
[[[note: I've seen this "random" error before on 'good' boots]]]

...
Setting up hostname 'linux'done
...
System Boot Control: The system has been set up
System Boot Control: Running /etc/init.d/boot.local
done<notice>exit status of (boot.ipconfig) is (0)
<notice>killproc: kill(1537,3)

INIT: Entering runlevel: 5

Boot logging started ....

...
Starting syslog services ... ...
@@@@@@@ syslogd: cannot create /dev/log: Input/output error
...

[[[lots of 'starting ...' messages, all seem fine]]]

Master Resource Control: runlevel 5 has been reached.
Failed services in runlevel 5: network
Skipped services in runlevel 5: smbfs nfs
<notice>exit status of (cron) is (0)
<notice>killproc: kill(3406,3)

Boot logging started

=============================
Any ideas what may have happened ...?

rjlee · 11-26-2004, 03:55 AM

It looks like fsck.reiserfs has signed off your filesystem as broken, and refused to let it be mounted read-write in case this breaks it further.

Normally at this stage, one would expect fsck.reiserfs to run to fix the problem, but that doesn't seem to have happened. You should take a copy of your boot log for this session, add a note to explain what you did and when (on which date) you did it, and email it to SuSE as a bug report. If it's silently ignoring a dirty filesystem then you may have found a rather nasty bug.

To get you up and running again, first check whether the filesystem is mounted read-only:

Code:

grep hda2 /proc/mounts

You should see “ro” in the output, which means the filesystem can be checked from where you are. If not, reboot into a rescue system (type “rescue” at the boot prompt where you would normally type “linux”).
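If you'd rather not eyeball the grep output, here is a minimal sketch that pulls just the ro/rw flag out of the options field. The function name mount_mode is made up for illustration, and it assumes the usual /proc/mounts column layout (device, mount point, type, options).

```shell
#!/bin/sh
# Print "ro" or "rw" for a device, based on the options column of a mounts table.
# Usage: mount_mode DEVICE [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
# mount_mode is a hypothetical helper name, not part of any package.
mount_mode() {
    dev=$1
    table=${2:-/proc/mounts}
    # Columns: device mountpoint fstype options dump pass.
    # The last matching line wins, since later mounts shadow earlier ones.
    awk -v dev="$dev" '
        $1 == dev {
            n = split($4, opts, ",")
            mode = ""
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode != "") print mode; else exit 1 }
    ' "$table"
}

# Example (run by hand):
#   mount_mode /dev/hda2
```

It returns a non-zero status if the device isn't listed at all. On some kernels the root filesystem may show up in /proc/mounts as /dev/root rather than /dev/hda2, so try that name if you get nothing back.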

Next, run the command

Code:

fsck.reiserfs /dev/hda2

You may be prompted if you want to fix various faults (this is normally a good idea) or even told to run fsck again with different command-line options.
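If fsck does tell you to run again with different options, the usual order is a read-only check first, then the lighter repair, and a tree rebuild only as a last resort. Here is a rough sketch, assuming the reiserfsck from reiserfsprogs with its --check, --fix-fixable and --rebuild-tree modes. The fsck_step function and the DRY_RUN variable are names I made up so you can see what would be run before running it.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it.
# (run, fsck_step and DRY_RUN are hypothetical names for this sketch.)
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fsck_step STEP DEVICE
#   check   - read-only consistency check, changes nothing
#   fix     - repair only the corruptions that need no tree rebuild
#   rebuild - rebuild the whole tree; last resort, back up the partition first
fsck_step() {
    step=$1
    dev=$2
    case $step in
        check)   run reiserfsck --check "$dev" ;;
        fix)     run reiserfsck --fix-fixable "$dev" ;;
        rebuild) run reiserfsck --rebuild-tree "$dev" ;;
        *)       echo "unknown step: $step" >&2; return 2 ;;
    esac
}

# Example (run by hand, filesystem unmounted or mounted read-only):
#   DRY_RUN=1 fsck_step check /dev/hda2    # just shows the command
#   fsck_step check /dev/hda2              # actually runs it
```

Only move on to the next step when the previous one tells you to. In particular, don't go to the rebuild unless the check explicitly asks for it.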

After this, you can

Code:

/sbin/reboot

and hopefully get back into a working system.

Steerpike · 11-26-2004, 04:22 PM

Quote:

Originally posted by rjlee
It looks like fsck.reiserfs has signed off your filesystem as broken, and refused to let it be mounted read-write in case this breaks it further.

Normally at this stage, one would expect fsck.reiserfs to run to fix the problem, but that doesn't seem to have happened. You should take a copy of your boot log for this session, add a note to explain what you did and when (on which date) you did it, and email it to SuSE as a bug report. If it's silently ignoring a dirty filesystem then you may have found a rather nasty bug.

Dumb question, but is boot.msg recreated on each boot, or is it cumulative? I see entries that seem to belong to earlier boots, judging by their timestamps. That could also just be the 8-hour difference between universal time and Pacific time, if the early boot messages are logged before the clock is adjusted to local time.

If cumulative, can I just rename existing boot log and reboot, or should I be using some logging utility to do this?

Quote:

Originally posted by rjlee
To get you up and running again, first check whether the filesystem is mounted read-only:

Code:

grep hda2 /proc/mounts

you should see “ro” in the output. If not, reboot into a rescue system (type “rescue” at the boot prompt where you would normally type “linux”).

Next, run the command

Code:

fsck.reiserfs /dev/hda2

You may be prompted if you want to fix various faults (this is normally a good idea) or even told to run fsck again with different command-line options.

Prior to reading your response, I googled around and decided to do the following:

Code:

shutdown now
umount /dev/hda2
reiserfsck --rebuild-tree /dev/hda2

This found lots of errors, and eventually finished. I decided to run it again, and it again found lots of errors (I was expecting to get a 'clean' run at some point). Ran it about 6 times, kept getting various errors.

Finally, booted into the second instance of SuSE I have on the hard drive (installed to /dev/hda3), and ran reiserfsck /dev/hda2, with no qualifiers, and it suggested all was well.

Then was able to boot ok into the previously damaged environment (hooray!).

Just now, for giggles, I did a shutdown, unmount, and ran just the check and it came out 'clean'. But then - just for giggles - I ran the --rebuild-tree option one more time, to see what happened, and sure enough, it AGAIN found problems:
...block xxx The number of items (2) is incorrect, should be (1) - corrected
...block xxx The free space (0) is incorrect, should be (xxx) - corrected
...pass0: vpf-10110: block xxx, item (0): unknown item type found ....

Passes 1 and 2 are fine, pass 3 complains about /lost+found: vpf-10650: The directory [xxx] has the wrong size in the StatData ... corrected ...

That's pretty much it.

If I re-run it again, it again finds things, seemingly different things.

But if I run reiserfsck /dev/hda2 (no rebuild-tree), it says all is well.

Is it illogical to expect a 'rebuild-tree' to run without error if the readonly consistency check passes?

Now, I can reboot and all is well. Boot.msg indicates no problems whatsoever.

The only issue now is, probably unrelated, that I can't use tty1 (the main boot console?). When I use ctrl-alt-F1, I see the end of the boot sequence, but the last message on the screen is

INIT: Id "1" respawning too fast: disabled for 5 minutes

This repeats every 5 mins or so and the console is unusable.

I can switch to other consoles with no problem.

From other console, I do 'more messages | grep mingetty'
and see
... linux mingetty[<number>]: /dev/tty1: No such file or directory

Lots and lots of them. The GUI seems to work fine, though.

rjlee · 11-27-2004, 06:53 PM

On my (SuSE) system, the boot.msg file is created by the /etc/init.d/boot.klog script, which overwrites the file every time. This may not be true of every distro, though.
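Since the file gets overwritten, copy it somewhere before you reboot if you want to keep it for a bug report. A small sketch follows; keep_log is just a name I picked, and the timestamp suffix is an arbitrary choice.

```shell
#!/bin/sh
# Copy a log to a timestamped name so the next boot cannot overwrite it.
# Usage: keep_log [FILE]   (FILE defaults to /var/log/boot.msg)
# keep_log is a hypothetical helper name.
keep_log() {
    src=${1:-/var/log/boot.msg}
    dest="$src.$(date +%Y%m%d-%H%M%S)"
    # -p preserves the original modification time and permissions.
    cp -p "$src" "$dest" && echo "$dest"
}

# Example (run by hand, as root):
#   keep_log
```

It prints the name of the copy, so you know which file to attach.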

--rebuild-tree will look for leaf nodes that don't have parents. I suspect that this could include files that have been deleted; such files are often partially overwritten and so fragments will still exist that are not valid files (i.e. it will find errors). But that's only a guess; I haven't studied ReiserFS from a structural point of view.

The error message for terminal 1 means that mingetty (the program that handles the user logins on text consoles) is dying very frequently. The file you need to look at here is /etc/inittab. You should have a set of lines something like:
1:2345:respawn:/sbin/mingetty --noclear tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

Each entry is for a different terminal (set by tty1, tty2 etc.) Make sure that the entry for tty1 is like the other entries; the --noclear flag here means that the previous contents of the terminal (the boot log) won't get cleared.

There are various other options to mingetty, and even other variant programs that will run the login, so your best bet is to base the entry for tty1 on tty2.
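Your log also says "/dev/tty1: No such file or directory", so it is worth checking the device node itself as well as the inittab entry. A quick sketch of both checks; the function names getty_entries and tty_node_ok are made up, and the grep pattern assumes entries shaped like the ones above.

```shell
#!/bin/sh
# List the getty lines from an inittab-style file (default /etc/inittab).
# getty_entries and tty_node_ok are hypothetical helper names.
getty_entries() {
    grep -E '^[0-9]+:[0-9]*:respawn:.*getty' "${1:-/etc/inittab}"
}

# Succeed only if the given path exists and is a character device
# (default /dev/tty1).
tty_node_ok() {
    [ -c "${1:-/dev/tty1}" ]
}

# Example (run by hand):
#   getty_entries
#   tty_node_ok && echo "tty1 node present" || echo "tty1 node missing"
```

If the node really is missing, the filesystem repair may have lost it. Virtual consoles are character devices with major number 4 and the console number as minor, so as root "mknod /dev/tty1 c 4 1" should recreate it, though I haven't tried that on your setup. After editing /etc/inittab, "telinit q" makes init re-read it without a reboot.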