The problem environment is as follows:
1. Host OS kernel (uname -r): 3.0.93-0.8-xen
2. Guest OS kernel: 3.10 (Red Hat 7.1)
3. The guest OS is configured with 4 vCPUs (cat /proc/cpuinfo, entry for processor 0):
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping : 4
microcode : 0x424
cpu MHz : 2100.060
cache size : 15360 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips : 4200.12
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
4. Guest OS kernel command line (cat /proc/cmdline, with some irrelevant options removed):
crash_kexec_post_notifiers console=ttyS0,115200 console=tty0 earlyprintk=serial,ttyS0,115200 softlockup_panic=1 systemd.debug rd.debug panic=3 irqpoll nr_cpus=1 reset_devices mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug systemd.debug rd.debug disable_cpu_apicid=0 elfcorehdr=867700K
The problem occurs 100% of the time under the following condition:
if an oops occurs on an AP vCPU, the crash kernel hangs. (Here I assume the BSP is vCPU0 in the system kernel.)
The last call trace is:
smp_reboot_interrupt -> stop_this_cpu -> halt
All of the above is on Xen. I would like the crash kernel to start up successfully.
I have also tried KVM, and there it works fine.
Note that I modified the guest OS kernel code to adapt it to Xen, in two ways:
1. Unregistered the panic_notifier_list entry in xen_panic_handler_init.
2. Removed the if (is_kdump_kernel()) check in xen_hvm_platform.
My question:
If an oops occurs on an AP vCPU, the crash kernel is started by that AP vCPU, which sends reboot_vector to all other vCPUs, including the BSP vCPU. Since the BSP vCPU has the BSP flag, it does not handle the reboot vector, so the whole system stops. Am I right? Are there any patches for this bug?