LinuxQuestions.org
Visit Jeremy's Blog.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel
User Name
Password
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

Notices


Reply
  Search this Thread
Old 08-28-2023, 08:26 AM   #1
plznobug
LQ Newbie
 
Registered: Aug 2023
Posts: 10

Rep: Reputation: 0
IO hung because request_queue is quiesced


Hi guys,
There is a strange problem with my machine. There are many processes that have been suspended. I artificially crash the kernel to generate a vmcore for analysis. I analyzed the vmcore using the crash tool and found that some processes in UD are scheduled because of blk_mq_get_tag.

The backtrace of one of those process is as follow.
#0 [ffffba09c97f7890] __schedule at ffffffffa1074402
#1 [ffffba09c97f7930] schedule at ffffffffa1074a68
#2 [ffffba09c97f7938] io_schedule at ffffffffa08e3f32
#3 [ffffba09c97f7948] blk_mq_get_tag at ffffffffa0bf28f9
#4 [ffffba09c97f79b8] blk_mq_get_request at ffffffffa0bed45a
#5 [ffffba09c97f79f0] blk_mq_submit_bio at ffffffffa0befb30
#6 [ffffba09c97f7a78] generic_make_request at ffffffffa0be430f
#7 [ffffba09c97f7ad0] submit_bio at ffffffffa0be4595
#8 [ffffba09c97f7b10] ext4_mpage_readpages at ffffffffc05c6727 [ext4]
#9 [ffffba09c97f7bf8] read_pages at ffffffffa0a2934b
#10 [ffffba09c97f7c70] __do_page_cache_readahead at ffffffffa0a29631
#11 [ffffba09c97f7d08] ondemand_readahead at ffffffffa0a29849
#12 [ffffba09c97f7d50] generic_file_buffered_read at ffffffffa0a1707c
#13 [ffffba09c97f7e40] new_sync_read at ffffffffa0ac38e1
#14 [ffffba09c97f7ec8] vfs_read at ffffffffa0ac6131
#15 [ffffba09c97f7f00] ksys_read at ffffffffa0ac656f
#16 [ffffba09c97f7f38] do_syscall_64 at ffffffffa08041db
#17 [ffffba09c97f7f50] entry_SYSCALL_64_after_hwframe at ffffffffa12000ad


The scheduler is md-deadline.
The system has 64 schedule tags 32 driver tags.
I found several strange things by using crash tool.
1) The difference between rq_dispatched and rq_completed is greater than the number of schedule/driver tag.
crash> struct blk_mq_ctx.cpu,rq_dispatched,rq_completed ffffd9f9bf807a40
cpu = 0
rq_dispatched = {2573711, 8197327}
rq_completed = {2573709, 8197214}
crash> p (8197327-8197214)
$1 = 113
crash> struct blk_mq_ctx.cpu,rq_dispatched,rq_completed ffffd9f9bf847a40
cpu = 1
rq_dispatched = {2562919, 7252082}
rq_completed = {2562919, 7251995}
crash> p (7252082 - 7251995)
$2 = 87
crash> struct blk_mq_ctx.cpu,rq_dispatched,rq_completed ffffd9f9bf887a40
cpu = 2
rq_dispatched = {2891531, 7214861}
rq_completed = {2891531, 7214803}
crash> pd (7214861-7214803)
$3 = 58

2)The sched_tags have been used up, but the tags are still available.
struct blk_mq_hw_ctx.tags ffff9ff8bc1b8000
crash> struct blk_mq_tags.bitmap_tags 0xffff9ff8b2f77440
bitmap_tags = {
sb = {
depth = 32,
shift = 3,
map_nr = 4,
map = 0xffff9ff8b2f65000
},
alloc_hint = 0x3a0100204c10,
wake_batch = 4,
wake_index = {
counter = 0
},
ws = 0xffff9ff8b2e70200,
ws_active = {
counter = 0
},
round_robin = true,
min_shallow_depth = 4294967295
}
crash> px ((struct sbitmap_word *)0xffff9ff8b2f65000)[0]
$13 = {
depth = 0x8,
word = 0xff,
cleared = 0xac,
...
}
crash> px ((struct sbitmap_word *)0xffff9ff8b2f65000)[1]
$14 = {
depth = 0x8,
word = 0xff,
cleared = 0xf4,
...
}
crash> px ((struct sbitmap_word *)0xffff9ff8b2f65000)[2]
$15 = {
depth = 0x8,
word = 0xff,
cleared = 0x18,
...
}
crash> px ((struct sbitmap_word *)0xffff9ff8b2f65000)[3]
$16 = {
depth = 0x8,
word = 0xff,
cleared = 0x6,
...
}

Here is sched_tags
crash> struct blk_mq_hw_ctx.sched_tags ffff9ff8bc1b8000
sched_tags = 0xffff9fe983baa540
crash> struct blk_mq_tags 0xffff9fe983baa540
struct blk_mq_tags {
nr_tags = 64,
nr_reserved_tags = 0,
active_queues = {
counter = 0
},
bitmap_tags = {
sb = {
depth = 64,
shift = 4,
map_nr = 4,
map = 0xffff9ff8bc1bac00
},
alloc_hint = 0x3a0100207694,
wake_batch = 8,
wake_index = {
counter = 1
},
ws = 0xffff9ff8bb200c00,
ws_active = {
counter = 41
},
round_robin = true,
min_shallow_depth = 4294967295
},
....
}

crash> px ((struct sbitmap_word *)0xffff9ff8bc1bac00)[0]
$4 = {
depth = 0x10,
word = 0xffff,
cleared = 0x0,
}
crash> px ((struct sbitmap_word *)0xffff9ff8bc1bac00)[1]
$5 = {
depth = 0x10,
word = 0xffff,
cleared = 0x0,
}
crash> px ((struct sbitmap_word *)0xffff9ff8bc1bac00)[2]
$6 = {
depth = 0x10,
word = 0xffff,
cleared = 0x0,
}
crash> px ((struct sbitmap_word *)0xffff9ff8bc1bac00)[3]
$7 = {
depth = 0x10,
word = 0xffff,
cleared = 0x0,
}
crash> px ((struct sbitmap_word *)0xffff9ff8bc1bac00)[4]
$8 = {
depth = 0x0,
word = 0x0,
cleared = 0x0,
}


3) The request_queue is in QUEUE_FLAG_QUIESCED but the scsi_device.sdev_state is SDEV_RUNNING.
crash> dev -d
MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC
8 ffff9fe9833c1000 sda ffff9ff8a02489b8 1930 2 1928

crash> struct request_queue.queue_flags ffff9ff8a02489b8
queue_flags = 219287104
crash> eval (219287104>>26)
hexadecimal: 3
decimal: 3
octal: 3
binary: 0000000000000000000000000000000000000000000000000000000000000011

crash> struct request_queue ffff9ff8a02489b8 | grep queuedata
queuedata = 0xffff9ff8bc1c2000,
crash> struct scsi_device 0xffff9ff8bc1c2000 | grep sdev_state
sdev_state = SDEV_RUNNING,

These phenomenon make me feel like there is a bug in the kernel. Any help?

Last edited by plznobug; 08-28-2023 at 09:19 PM. Reason: More information
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
killing a hung up process with a script lockejr Linux - General 2 04-13-2006 11:40 PM
cannot login because file system check fails because of bad magic number trutnev Linux - Newbie 1 03-23-2004 06:44 PM
LFS 4.1 book install hung at section 6.14: Glibc-2.3.1 SparceMatrix Linux From Scratch 4 05-30-2003 04:56 PM
Redhat 9 install hung up at kernel boot up(usb probe failure) digizer0 Linux - General 0 05-25-2003 08:14 AM
LILO hung up after RH installs? r_jensen11 Linux - Newbie 1 05-16-2003 10:30 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel

All times are GMT -5. The time now is 01:38 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration