----- "Lucas Silacci" <Lucas.Silacci(a)teradata.com> wrote:
My only guess is that there is something in the transition between the
regular kernel and the kdump kernel (somewhere in the kexec path) that
re-opens the door for a queued up NMI to come in just before the kdump
kernel takes over. I've been digging through that code, but so far
haven't come up with anything that explains it yet.
Right -- I'm wondering who called smp_send_stop() while it was running
on the NMI exception stack?
PID: 0 TASK: ffffffff8038c340 CPU: 0 COMMAND: "swapper"
#0 [ffffffff8046dc50] machine_kexec at ffffffff8011a95b
#1 [ffffffff8046dd20] crash_kexec at ffffffff80154351
#2 [ffffffff8046dde0] panic at ffffffff801327fa
#3 [ffffffff8046ded0] dumpsw_notify at ffffffff8831c0c3
#4 [ffffffff8046dee0] notifier_call_chain at ffffffff8032481f
#5 [ffffffff8046df00] default_do_nmi at ffffffff80322fab
#6 [ffffffff8046df40] do_nmi at ffffffff80323365
#7 [ffffffff8046df50] nmi at ffffffff8032268f
[exception RIP: smp_send_stop+84]
RIP: ffffffff80116e44 RSP: ffffffff8046ddd8 RFLAGS: 00000246
RAX: 00000000000000ff RBX: ffffffff8831c1f8 RCX: 000041049c7256e8
RDX: 0000000000000005 RSI: 000000005238a938 RDI: 00000000002896a0
RBP: ffffffff8046df08 R8: 00000000000040fb R9: 000000005238a7e8
R10: 0000000000000002 R11: 0000ffff0000ffff R12: 000000000000000c
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#8 [ffffffff8046ddd8] smp_send_stop at ffffffff80116e44
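
For what it's worth, the "already on the NMI exception stack" observation can be checked from the numbers in the trace alone. The following is only a rough sketch in Python: it compares the RSP saved in the exception frame against the bracketed frame addresses of frames #0 through #7, all copied from the backtrace above. The variable names are made up for illustration, and the real bounds of the NMI stack are not shown in the trace, so this only demonstrates that the saved RSP falls inside the span the NMI handler's own frames now occupy.

```python
# Bracketed frame addresses copied from the backtrace (frames #0 to #7,
# printed before the "<NMI exception stack>" marker).
nmi_frames = {
    0: 0xffffffff8046dc50,  # machine_kexec
    1: 0xffffffff8046dd20,  # crash_kexec
    2: 0xffffffff8046dde0,  # panic
    3: 0xffffffff8046ded0,  # dumpsw_notify
    4: 0xffffffff8046dee0,  # notifier_call_chain
    5: 0xffffffff8046df00,  # default_do_nmi
    6: 0xffffffff8046df40,  # do_nmi
    7: 0xffffffff8046df50,  # nmi
}

# RSP saved in the exception frame, i.e. the stack pointer in use when
# smp_send_stop+84 was interrupted by the NMI.
interrupted_rsp = 0xffffffff8046ddd8

lowest = min(nmi_frames.values())
highest = max(nmi_frames.values())

# True if the interrupted context's RSP lies between the lowest and highest
# frame addresses of the NMI handler's own call chain.
within_nmi_frames = lowest <= interrupted_rsp <= highest

# The two neighbouring frames whose addresses bracket the saved RSP.
below = max((n for n, a in nmi_frames.items() if a <= interrupted_rsp),
            key=nmi_frames.get)
above = min((n for n, a in nmi_frames.items() if a > interrupted_rsp),
            key=nmi_frames.get)

print(f"saved RSP inside frames #0..#7 span: {within_nmi_frames}")
print(f"saved RSP sits between frame #{below} and frame #{above}")
```

With the values above, the saved RSP lands between the crash_kexec and panic frames, which is consistent with smp_send_stop() having been running on the same stack that the later NMI path then reused.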