Hi Rachita,

I've figured out why the x86_64 interrupt-stack-to-process-stack
transition shows a bogus exception frame in crash's backtraces.
It's not kdump or jprobes -- I think it may have been introduced
with the DWARF CFI changes.

Anyway, in older x86_64 kernels, when an interrupt was taken,
the pt_regs exception frame would be laid down on the current stack,
and the rdi register would contain a pointer to it. Then the stack
pointer would be switched to the per-cpu interrupt stack. (Actually,
it is switched to a point 64 bytes below the top of the interrupt
stack, presumably for cache-line purposes.) The first thing done
after switching to the interrupt stack is to push the rdi register,
which, again, contains a pointer to the exception frame on the other
stack. Then it calls the interrupt handler.

Here's the "old" code. The last 4 instructions in the macro below
perform the steps outlined above:
1. get the per-cpu interrupt stack address,
2. move it into rsp, which effectively switches stacks (cmoveq only
   does the move if the addl just brought pda_irqcount to zero, so
   a nested interrupt stays on the stack it's already on),
3. push the rdi register,
4. and call the interrupt handler:

        .macro interrupt func
        CFI_STARTPROC simple
        CFI_DEF_CFA rsp,(SS-RDI)
        CFI_REL_OFFSET rsp,(RSP-ORIG_RAX)
        CFI_REL_OFFSET rip,(RIP-ORIG_RAX)
        cld
#ifdef CONFIG_DEBUG_INFO
        SAVE_ALL
        movq %rsp,%rdi
        /*
         * Setup a stack frame pointer. This allows gdb to trace
         * back to the original stack.
         */
        movq %rsp,%rbp
        CFI_DEF_CFA_REGISTER rbp
#else
        SAVE_ARGS
        leaq -ARGOFFSET(%rsp),%rdi      # arg1 for handler
#endif
        testl $3,CS(%rdi)
        je 1f
        swapgs
1:      addl $1,%gs:pda_irqcount        # RED-PEN should check preempt count
        movq %gs:pda_irqstackptr,%rax
        cmoveq %rax,%rsp
        pushq %rdi                      # save old stack
        call \func
        .endm

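As a rough illustration of the conditional stack switch, here is a minimal C model of those last four instructions (the call itself is not modeled). It is a sketch under stated assumptions: pda_irqcount is taken to start at -1 so that the addl yields zero only on a first-level interrupt, the "stacks" are plain arrays of 64-bit words, and struct model_pda and old_interrupt_entry() are made-up names, not kernel or crash code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two per-cpu PDA fields the macro uses. */
struct model_pda {
    int irqcount;          /* assumed to start at -1: not on the IRQ stack */
    uint64_t *irqstackptr; /* 64 bytes below the top of the IRQ stack */
};

/*
 * Models the old macro's tail, minus the call itself:
 *   addl $1,%gs:pda_irqcount      ZF is set only if the count became 0
 *   movq %gs:pda_irqstackptr,%rax
 *   cmoveq %rax,%rsp              switch stacks only if ZF is set
 *   pushq %rdi                    leave the pt_regs pointer on top
 * Returns the stack pointer as it would be just before the call.
 */
static uint64_t *old_interrupt_entry(struct model_pda *pda, uint64_t *rsp,
                                     uint64_t regs_addr)
{
    assert(pda->irqstackptr != NULL);
    int zf = (++pda->irqcount == 0);
    if (zf)
        rsp = pda->irqstackptr;
    *--rsp = regs_addr;
    return rsp;
}
```

For a first-level interrupt the pt_regs pointer ends up as the topmost word on the interrupt stack; a nested interrupt leaves irqcount positive and simply pushes another pointer onto the interrupt stack it's already on. The matching decrement on exit isn't modeled.
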
However, in current x86_64 kernels, the interrupt macro has changed
to look like this:

        .macro interrupt func
        cld
        SAVE_ARGS
        leaq -ARGOFFSET(%rsp),%rdi      # arg1 for handler
        pushq %rbp
        CFI_ADJUST_CFA_OFFSET 8
        CFI_REL_OFFSET rbp, 0
        movq %rsp,%rbp
        CFI_DEF_CFA_REGISTER rbp
        testl $3,CS(%rdi)
        je 1f
        swapgs
1:      incl %gs:pda_irqcount           # RED-PEN should check preempt count
        cmoveq %gs:pda_irqstackptr,%rsp
        push %rbp                       # backlink for old unwinder
        /*
         * We entered an interrupt context - irqs are off:
         */
        TRACE_IRQS_OFF
        call \func
        .endm

Note that rdi still contains the pt_regs pointer, as evidenced by
the "testl $3,CS(%rdi)" instruction, which checks the CS value saved
in pt_regs to see whether the CPU was running in user space when the
interrupt occurred. But more importantly, note that just before
calling the handler, it does a "push %rbp" instead of the
"pushq %rdi" it used to do.

I'm pretty sure it's being done on purpose: instead of having the
"old unwinder" dump kernel text addresses starting inside the pt_regs
exception frame, it bumps the starting point up to whatever's
contained in rbp, which is higher up on the old stack than the start
of the exception frame. So it would avoid dumping text return
addresses that happen to be sitting in the pt_regs register dump.

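To check where that pushed rbp actually points, here is a small C sketch. It assumes a struct mirroring the x86_64 pt_regs layout of this era (r15 first, ss last; written out here rather than taken from kernel headers), and that SAVE_ARGS leaves rsp at the R11 slot with ARGOFFSET equal to that slot's offset. Under those assumptions "pushq %rbp" stores into the RBX slot, so rbp ends up 40 bytes past the start of pt_regs. The struct, macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86_64 pt_regs layout of this era (written out, not #included). */
struct model_pt_regs {
    uint64_t r15, r14, r13, r12, rbp, rbx;   /* saved only by SAVE_ALL */
    uint64_t r11, r10, r9, r8, rax, rcx, rdx, rsi, rdi; /* saved by SAVE_ARGS */
    uint64_t orig_rax;                       /* pushed by the entry stub */
    uint64_t rip, cs, eflags, rsp, ss;       /* pushed by the CPU */
};
static_assert(sizeof(struct model_pt_regs) == 21 * 8, "pt_regs is 21 words");

/* Assumption: SAVE_ARGS leaves rsp pointing at the R11 slot (ARGOFFSET). */
#define MODEL_ARGOFFSET offsetof(struct model_pt_regs, r11)

/*
 * From the rsp value right after SAVE_ARGS, compute what the new macro
 * leaves in rdi (the pt_regs pointer) and in rbp (the pushed backlink).
 */
static void new_macro_pointers(uint64_t rsp_after_save_args,
                               uint64_t *rdi, uint64_t *rbp)
{
    *rdi = rsp_after_save_args - MODEL_ARGOFFSET; /* leaq -ARGOFFSET(%rsp),%rdi */
    *rbp = rsp_after_save_args - 8;               /* pushq %rbp; movq %rsp,%rbp */
}
```

The difference between the two values doesn't depend on where the frame lands, so the backlink sits at the same offset from pt_regs on every interrupt.
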
Just to verify, I patched the current kernel to push rdi instead
of rbp. Again, here's what the unpatched alt-sysrq-c backtrace
looks like:

crash> bt
PID: 0 TASK: ffff81003fe48100 CPU: 1 COMMAND: "swapper"
 #0 [ffff81003fe6bb40] crash_kexec at ffffffff800ab798
 #1 [ffff81003fe6bbc8] mwait_idle at ffffffff80055375
 #2 [ffff81003fe6bc00] sysrq_handle_crashdump at ffffffff80192fdc
 #3 [ffff81003fe6bc10] __handle_sysrq at ffffffff80192dae
 #4 [ffff81003fe6bc50] kbd_event at ffffffff8018db52
 #5 [ffff81003fe6bca0] input_event at ffffffff801e9b6d
 #6 [ffff81003fe6bcd0] hidinput_hid_event at ffffffff801e4299
 #7 [ffff81003fe6bd00] hid_process_event at ffffffff801df639
 #8 [ffff81003fe6bd40] hid_input_report at ffffffff801df9a7
 #9 [ffff81003fe6bdc0] hid_irq_in at ffffffff801e0d8e
#10 [ffff81003fe6bde0] usb_hcd_giveback_urb at ffffffff801d33a2
#11 [ffff81003fe6be10] uhci_giveback_urb at ffffffff8817b724
#12 [ffff81003fe6be50] uhci_scan_schedule at ffffffff8817be07
#13 [ffff81003fe6bed0] uhci_irq at ffffffff8817dc08
#14 [ffff81003fe6bf10] usb_hcd_irq at ffffffff801d3d91
#15 [ffff81003fe6bf20] handle_IRQ_event at ffffffff800106fd
#16 [ffff81003fe6bf50] __do_IRQ at ffffffff800b520c
#17 [ffff81003fe6bf58] __do_softirq at ffffffff80011bfa
#18 [ffff81003fe6bf90] do_IRQ at ffffffff8006a729
--- <IRQ stack> ---
#19 [ffff81003fe65e70] ret_from_intr at ffffffff8005ba89
    [exception RIP: cpu_idle+149]
    RIP: ffffffff800473a7  RSP: ffffffff8042e220  RFLAGS: ffffffff80074153
    RAX: ffffffffffffff16  RBX: 0000000000000000  RCX: ffffffff80055375
    RDX: 0000000000000010  RSI: 0000000000000246  RDI: ffff81003fe65ef0
    RBP: ffff81003fe64000   R8: ffffffff8034e818   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000003f
    R13: ffff810037d0c008  R14: 0000000000000246  R15: 0000000000000001
    ORIG_RAX: 0000000000000018  CS: 0020  SS: 0000
bt: WARNING: possibly bogus exception frame
crash>

And when the kernel is patched to push rdi instead, the
"old" behavior is emulated:

crash> bt
PID: 0 TASK: ffffffff8034ce60 CPU: 0 COMMAND: "swapper"
 #0 [ffffffff8047eb40] crash_kexec at ffffffff800ab798
 #1 [ffffffff8047ebc8] mwait_idle at ffffffff80055375
 #2 [ffffffff8047ec00] sysrq_handle_crashdump at ffffffff80192fdc
 #3 [ffffffff8047ec10] __handle_sysrq at ffffffff80192dae
 #4 [ffffffff8047ec50] kbd_event at ffffffff8018db52
 #5 [ffffffff8047eca0] input_event at ffffffff801e9b6d
 #6 [ffffffff8047ecd0] hidinput_hid_event at ffffffff801e4299
 #7 [ffffffff8047ecd8] ip_route_input at ffffffff8003662f
 #8 [ffffffff8047ed00] hid_process_event at ffffffff801df639
 #9 [ffffffff8047ed40] hid_input_report at ffffffff801df9a7
#10 [ffffffff8047edc0] hid_irq_in at ffffffff801e0d8e
#11 [ffffffff8047ede0] usb_hcd_giveback_urb at ffffffff801d33a2
#12 [ffffffff8047ee10] uhci_giveback_urb at ffffffff88126724
#13 [ffffffff8047ee50] uhci_scan_schedule at ffffffff88126e07
#14 [ffffffff8047eed0] uhci_irq at ffffffff88128c08
#15 [ffffffff8047ef10] usb_hcd_irq at ffffffff801d3d91
#16 [ffffffff8047ef20] handle_IRQ_event at ffffffff800106fd
#17 [ffffffff8047ef50] __do_IRQ at ffffffff800b520c
#18 [ffffffff8047ef58] __do_softirq at ffffffff80011bfa
#19 [ffffffff8047ef90] do_IRQ at ffffffff8006a729
--- <IRQ stack> ---
#20 [ffffffff80437ee8] ret_from_intr at ffffffff8005ba89
    [exception RIP: mwait_idle+54]
    RIP: ffffffff80055375  RSP: ffffffff80437f90  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000099000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: ffffffff8034e818
    RBP: 0000000000099000   R8: ffffffff80436000   R9: 000000000000003e
    R10: ffff810037d0c038  R11: ffff81003f48e580  R12: ffff810037fef7a0
    R13: 0000000000000000  R14: ffffffff8034d050  R15: 0000000002246128
    ORIG_RAX: ffffffffffffff16  CS: 0010  SS: 0018
#21 [ffffffff80437f90] cpu_idle at ffffffff800473a7
crash>

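Lining the two dumps up field by field supports the idea that the pushed rbp is being taken for the pt_regs pointer. The C sketch below (names hypothetical, pt_regs layout assumed as in this era's kernels) lays out the patched run's exception frame exactly as printed above, followed by the word at its RSP (the cpu_idle return address shown as frame #21), and re-reads it as a pt_regs starting 40 bytes higher. The two runs are different boots on different CPUs, so only the fields that don't depend on the particular CPU can be compared; those match the unpatched dump: the bogus RCX is the real RIP (mwait_idle+54), RAX is ORIG_RAX, RDX is CS, RSI is RFLAGS, R8 is RDI, and the bogus "exception RIP: cpu_idle+149" is really the return address sitting just past the real frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed x86_64 pt_regs layout of this era, 21 words, r15 first. */
struct model_pt_regs {
    uint64_t r15, r14, r13, r12, rbp, rbx, r11, r10, r9, r8;
    uint64_t rax, rcx, rdx, rsi, rdi, orig_rax, rip, cs, eflags, rsp, ss;
};

/* Patched run's frame (from the dump above), the word at its RSP, padding. */
static const uint64_t stack_words[26] = {
    0x0000000002246128, 0xffffffff8034d050, 0x0000000000000000, /* r15 r14 r13 */
    0xffff810037fef7a0, 0x0000000000099000, 0x0000000000099000, /* r12 rbp rbx */
    0xffff81003f48e580, 0xffff810037d0c038, 0x000000000000003e, /* r11 r10 r9 */
    0xffffffff80436000, 0x0000000000000000, 0x0000000000000000, /* r8 rax rcx */
    0x0000000000000000, 0x0000000000000001, 0xffffffff8034e818, /* rdx rsi rdi */
    0xffffffffffffff16, 0xffffffff80055375, 0x0000000000000010, /* orig_rax rip cs */
    0x0000000000000246, 0xffffffff80437f90, 0x0000000000000018, /* eflags rsp ss */
    0xffffffff800473a7,          /* return address into cpu_idle (frame #21) */
    0, 0, 0, 0                   /* unknown contents, padding only */
};

/* Interpret the bytes starting byte_offset into stack_words as a pt_regs. */
static struct model_pt_regs regs_at(size_t byte_offset)
{
    struct model_pt_regs r;
    assert(byte_offset + sizeof(r) <= sizeof(stack_words));
    memcpy(&r, (const unsigned char *)stack_words + byte_offset, sizeof(r));
    return r;
}
```

In other words, the bogus frame looks like the real one read five 8-byte slots too high.
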
Anyway, we'll have to come up with a differentiator so that
both types of interrupt-stack linkage are handled. It looks
like the rbp value is at a fixed offset from the exception
frame, so something can be done.

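One possible shape for such a differentiator, sketched in C under several assumptions: the word pushed at the top of the interrupt stack is the only linkage used; the rbp backlink always sits 40 bytes into pt_regs (the RBX slot); a genuine interrupt frame has a CS of 0x10, 0x33 or 0x23 (kernel, 64-bit user, 32-bit compat user); and the interrupt entry stubs push the complemented vector, so ORIG_RAX falls in [-256, -1]. readmem_fn, flat_mem and the other names are hypothetical and are not crash's actual readmem() interface. The bogus frame above fails both checks (CS 0020, ORIG_RAX 18).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KERNEL_CS 0x10           /* assumed selector values */
#define USER_CS   0x33
#define USER32_CS 0x23
#define RBP_BACKLINK_OFFSET 40   /* RBX slot: where the new macro's rbp points */

/* Word indices within pt_regs (r15 = 0 ... ss = 20). */
enum { PT_ORIG_RAX = 15, PT_CS = 17, PT_NWORDS = 21 };

/* Hypothetical reader: copy len bytes at a kernel address; false on failure. */
typedef bool (*readmem_fn)(uint64_t addr, void *buf, size_t len, void *ctx);

static bool plausible_irq_frame(readmem_fn readmem, void *ctx, uint64_t regs)
{
    uint64_t w[PT_NWORDS];
    if (!readmem(regs, w, sizeof(w), ctx))
        return false;
    uint64_t cs = w[PT_CS];
    int64_t orig_rax = (int64_t)w[PT_ORIG_RAX];
    bool cs_ok = cs == KERNEL_CS || cs == USER_CS || cs == USER32_CS;
    bool vec_ok = orig_rax >= -256 && orig_rax <= -1;  /* ~vector */
    return cs_ok && vec_ok;
}

/* Returns the pt_regs address for the pushed word, or 0 if neither fits. */
static uint64_t irq_backlink_to_pt_regs(readmem_fn readmem, void *ctx,
                                        uint64_t backlink)
{
    if (plausible_irq_frame(readmem, ctx, backlink))
        return backlink;                         /* old: pushq %rdi */
    if (backlink >= RBP_BACKLINK_OFFSET &&
        plausible_irq_frame(readmem, ctx, backlink - RBP_BACKLINK_OFFSET))
        return backlink - RBP_BACKLINK_OFFSET;   /* new: push %rbp */
    return 0;
}

/* Usage/test helper: a flat word buffer standing in for kernel memory. */
struct flat_mem { uint64_t base; const uint64_t *words; size_t nwords; };

static bool flat_readmem(uint64_t addr, void *buf, size_t len, void *ctx)
{
    const struct flat_mem *m = ctx;
    if (addr < m->base || (addr - m->base) % 8 != 0)
        return false;
    size_t first = (addr - m->base) / 8;
    if (first > m->nwords || len > (m->nwords - first) * 8)
        return false;
    memcpy(buf, m->words + first, len);
    return true;
}
```

The old interpretation is tried first, so older kernels behave exactly as before; returning 0 lets the caller fall back to the existing "possibly bogus exception frame" warning.
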
Just FYI,
Dave