Hi Rachita,
I'm looking at an alt-sysrq-c generated crash on an x86_64 kernel,
and I am seeing something similar to what you are with respect
to the transition from the interrupt stack back to the process
stack.
Here's a "bt -a", where cpu 0 shows that it received a shutdown
NMI while in the idle loop, and the transition from the NMI
exception stack back to the process stack was clean. But on
the cpu which took the alt-sysrq-c keyboard interrupt, the
transition from the per-cpu interrupt stack back to the
process stack is similar to what you're seeing:
crash> bt -a
PID: 0 TASK: ffffffff8034ce60 CPU: 0 COMMAND: "swapper"
#0 [ffffffff80481f30] crash_nmi_callback at ffffffff8007742f
#1 [ffffffff80481f40] do_nmi at ffffffff80063c2c
#2 [ffffffff80481f50] nmi at ffffffff8006312f
[exception RIP: mwait_idle+54]
RIP: ffffffff800553b7 RSP: ffffffff80437f90 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff80055381 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8034e838
RBP: 0000000000099000 R8: ffffffff80436000 R9: 000000000000003e
R10: ffff810037d0c038 R11: 0000000000000048 R12: 0000000000090000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <exception stack> ---
#3 [ffffffff80437f90] mwait_idle at ffffffff800553b7
#4 [ffffffff80437f90] cpu_idle at ffffffff800473be
PID: 0 TASK: ffff81003fe48100 CPU: 1 COMMAND: "swapper"
#0 [ffff81003fe6bb40] crash_kexec at ffffffff800ab7c4
#1 [ffff81003fe6bbc8] mwait_idle at ffffffff800553b7
#2 [ffff81003fe6bc00] sysrq_handle_crashdump at ffffffff8019301f
#3 [ffff81003fe6bc10] __handle_sysrq at ffffffff80192e1c
#4 [ffff81003fe6bc50] kbd_event at ffffffff8018dbc1
#5 [ffff81003fe6bca0] input_event at ffffffff801e9b9f
#6 [ffff81003fe6bcd0] hidinput_hid_event at ffffffff801e42cb
#7 [ffff81003fe6bd00] hid_process_event at ffffffff801df66b
#8 [ffff81003fe6bd40] hid_input_report at ffffffff801df9d9
#9 [ffff81003fe6bdc0] hid_irq_in at ffffffff801e0dc0
#10 [ffff81003fe6bde0] usb_hcd_giveback_urb at ffffffff801d33d8
#11 [ffff81003fe6be10] uhci_giveback_urb at ffffffff88168724
#12 [ffff81003fe6be50] uhci_scan_schedule at ffffffff88168e07
#13 [ffff81003fe6bed0] uhci_irq at ffffffff8816ac08
#14 [ffff81003fe6bf10] usb_hcd_irq at ffffffff801d3dc7
#15 [ffff81003fe6bf20] handle_IRQ_event at ffffffff80010704
#16 [ffff81003fe6bf50] __do_IRQ at ffffffff800b5238
#17 [ffff81003fe6bf58] __do_softirq at ffffffff80011c0b
#18 [ffff81003fe6bf90] do_IRQ at ffffffff8006a762
--- <IRQ stack> ---
#19 [ffff81003fe65e70] ret_from_intr at ffffffff8005bac9
[exception RIP: cpu_idle+149]
RIP: ffffffff800473be RSP: ffffffff8042e220 RFLAGS: ffffffff80074188
RAX: ffffffffffffff16 RBX: 0000000000000000 RCX: ffffffff800553b7
RDX: 0000000000000010 RSI: 0000000000000246 RDI: ffff81003fe65ef0
RBP: ffff81003fe64000 R8: ffffffff8034e838 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000003f
R13: ffff810037d0c008 R14: 0000000000000246 R15: 0000000000000001
ORIG_RAX: 0000000000000018 CS: 0020 SS: 0000
bt: WARNING: possibly bogus exception frame
crash>
crash dutifully reports that the exception frame looks bogus
because of the CS value.
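
As a rough illustration of that kind of sanity check (a sketch only, not
crash's actual verification code): the frames in this dump that look valid
carry CS 0010 and SS 0018, so a checker that assumes those are the
kernel-mode selectors would reject the CS 0020 / SS 0000 frame shown above.
The function name and the two constants are assumptions taken from the
values in this dump.

```python
# Assumed kernel-mode selector values, taken from the valid frames in this dump.
KERNEL_CS = 0x10
KERNEL_SS = 0x18

def looks_like_kernel_eframe(cs, ss):
    """Return True if the saved CS/SS pair matches the assumed kernel selectors."""
    return cs == KERNEL_CS and ss == KERNEL_SS

# Frame found on cpu 0 (and later by "bt -e"): CS 0010, SS 0018
print(looks_like_kernel_eframe(0x0010, 0x0018))
# Frame reported for cpu 1 by "bt -a": CS 0020, SS 0000
print(looks_like_kernel_eframe(0x0020, 0x0000))
```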
The end of the per-cpu interrupt stack looks like this:
crash> rd -s ffff81003fe68000 2048
...
ffff81003fe6bf30: 000000000000e900 00000000000000e9
ffff81003fe6bf40: ffff810037ca4bc0 irq_desc+59708
ffff81003fe6bf50: __do_IRQ+164 __do_softirq+94
ffff81003fe6bf60: 00000000000000e9 ffff81003fe65e48
ffff81003fe6bf70: 00000000000000ff cpu_data+256
ffff81003fe6bf80: 0000000000000100 cpu_core_map+32
ffff81003fe6bf90: do_IRQ+231 ffff81003fe65e70
ffff81003fe6bfa0: mwait_idle ffff81003fe65e70
ffff81003fe6bfb0: ret_from_intr ffff81003fe65e70
ffff81003fe6bfc0: 0000000000000000 0000000000000000
ffff81003fe6bfd0: 0000000000000000 0000000000000000
ffff81003fe6bfe0: 0000000000000000 0000000000000000
ffff81003fe6bff0: 0000000000000000 0000000000000000
crash>
...hence the supposed pointer to the generating exception frame
is presumed to be ffff81003fe65e70 (which is bogus).
Interestingly, though, if I do an exception frame search
for that task, I do find the "real" exception frame:
crash> bt -e
PID: 0 TASK: ffff81003fe48100 CPU: 1 COMMAND: "swapper"
KERNEL-MODE EXCEPTION FRAME AT: ffff81003fe65e48
RIP: ffffffff800553b7 RSP: ffff81003fe65ef0 RFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8034e838
RBP: 0000000000000000 R8: ffff81003fe64000 R9: 000000000000003f
R10: ffff810037d0c008 R11: 0000000000000246 R12: ffff810037fef040
R13: 0000000000000001 R14: ffff81003fe482f0 R15: 0000000002245f5e
ORIG_RAX: ffffffffffffff16 CS: 0010 SS: 0018
crash> sym ffffffff800553b7
ffffffff800553b7 (t) mwait_idle+0x36 include/asm/thread_info.h: 63
crash>
and if I just grep for that address within the per-cpu interrupt
stack, I see several references to it:
crash> rd -s ffff81003fe68000 2048 | grep ffff81003fe65e48
ffff81003fe6bb30: ffff81003ac7c000 ffff81003fe65e48
ffff81003fe6bc60: 00ffffff8001c1fd ffff81003fe65e48
ffff81003fe6bce0: ffff81003ec68000 ffff81003fe65e48
ffff81003fe6bd50: ffff81003fe65e48 0000000180087720
ffff81003fe6bda0: ffff810037cb7400 ffff81003fe65e48
ffff81003fe6bdb0: ffff810037cb7550 ffff81003fe65e48
ffff81003fe6be60: ffff81003cf37488 ffff81003fe65e48
ffff81003fe6bec0: ffff81003fe65e48 ffff81003fe65e48
ffff81003fe6bed0: uhci_irq+0x13f ffff81003fe65e48
ffff81003fe6bf00: ffff81003fe65e48 ffff81003fe65e48
ffff81003fe6bf60: 00000000000000e9 ffff81003fe65e48
crash>
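
The same search can be done on a raw copy of the stack rather than on the
"rd -s" text. A minimal sketch, assuming the stack contents are available
as little-endian 64-bit words in a bytes object; find_word() is a
hypothetical helper, and the four-word buffer below is synthetic data
modelled on the ffff81003fe6bf60 line above, not a real dump.

```python
import struct

def find_word(stack_bytes, base_addr, target):
    """Return the address of every little-endian 64-bit word equal to target."""
    count = len(stack_bytes) // 8
    words = struct.unpack("<%dQ" % count, stack_bytes[:count * 8])
    return [base_addr + 8 * i for i, w in enumerate(words) if w == target]

# Synthetic four-word sample starting at ffff81003fe6bf60 (last word is filler).
base = 0xffff81003fe6bf60
sample = struct.pack("<4Q", 0xe9, 0xffff81003fe65e48, 0xff, 0)

hits = find_word(sample, base, 0xffff81003fe65e48)
print([hex(a) for a in hits])
```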
But none of them are located at the "fixed" location of one
word below the 64-byte block at the top of the interrupt
stack.
So I don't know what's going on in this case...
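
For reference, the arithmetic behind that "fixed" location can be sketched
as follows. It assumes the 2048 words dumped with "rd -s" above cover the
whole per-cpu interrupt stack; the variable names are just for illustration.

```python
IRQ_STACK_BASE = 0xffff81003fe68000   # start address given to "rd -s" above
IRQ_STACK_WORDS = 2048                # word count given to "rd -s" above
WORD = 8                              # bytes per 64-bit word

stack_top = IRQ_STACK_BASE + IRQ_STACK_WORDS * WORD
link_slot = stack_top - 64 - WORD     # one word below the top 64-byte block

print(hex(stack_top))
print(hex(link_slot))

bogus = 0xffff81003fe65e70            # value found in that slot
real = 0xffff81003fe65e48             # frame address reported by "bt -e"
print(bogus - real)                   # distance in bytes between the two
```

The slot works out to ffff81003fe6bfb8, the second word on the
ffff81003fe6bfb0 line of the dump, which holds ffff81003fe65e70. That value
sits 0x28 bytes (five words) above the frame that "bt -e" finds.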
I don't ever recall seeing such a bogus interrupt-to-process
stack transition on any netdump or diskdump generated vmcores.
And all the "test" kdump kernel dumpfiles I've used have only
been generated by using "echo c > /proc/sysrq-trigger", so the
crash path never went off the process stack.
So without blatantly pointing the finger, I wonder whether there's
something that the kexec/kdump code path does that could possibly
be tinkering with the contents of the interrupt stack?
I also want to try to force another crash, but with all cpus
forcibly running something other than the idle task, in case
there's something strange about the interrupt bringing the
cpu out of that "mwait" instruction. Grasping at straws a
bit here...
Also, that's why I'm always asking for back-trace tests that
do something "real" -- instead of ones that just have the kernel
call panic(), or that do a sys_write() to /proc/sysrq-trigger
to force an oops on the process stack. At least an alt-sysrq-c
on the console keyboard generates an interrupt, as does your
forced jprobes deal...
BTW, I also note that the module unwind tables are being read
as invalid data, because the read is done before the
non-unity-mapped address translation can even work! In other
words, vmalloc addresses can only be read after vm_init() is
complete. So I've added another machdep_init() argument
(POST_VM): machdep_init(POST_VM) is called just after vm_init(),
and in the case of x86_64 (and x86), it can call
init_unwind_table().
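
The ordering constraint can be modelled roughly like this. It is only a toy
sketch of the idea, not the crash source: machdep_init(), vm_init(),
init_unwind_table() and POST_VM are the names used above, while Session,
read_vmalloc(), main_init() and the PRE_VM stage are hypothetical stand-ins.

```python
PRE_VM, POST_VM = "PRE_VM", "POST_VM"   # PRE_VM is a hypothetical earlier stage

class Session:
    """Hypothetical container for initialization state."""
    def __init__(self):
        self.vm_ready = False
        self.unwind_table_loaded = False

def read_vmalloc(session):
    """Stand-in for reading a vmalloc address; fails before vm_init()."""
    if not session.vm_ready:
        raise RuntimeError("vmalloc address read before vm_init()")
    return b"unwind data"

def init_unwind_table(session):
    """Stand-in for loading the module unwind tables from vmalloc space."""
    read_vmalloc(session)
    session.unwind_table_loaded = True

def vm_init(session):
    session.vm_ready = True

def machdep_init(session, stage):
    # Only the POST_VM stage touches vmalloc-backed data.
    if stage == POST_VM:
        init_unwind_table(session)

def main_init():
    s = Session()
    machdep_init(s, PRE_VM)    # too early for vmalloc reads
    vm_init(s)
    machdep_init(s, POST_VM)   # address translation works from here on
    return s

print(main_init().unwind_table_loaded)
```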
Thanks,
Dave