On Sat, 24 May 2014 20:24:30 +0800
oliver yang <yangoliver(a)gmail.com> wrote:
2014-04-29 19:27 GMT+08:00 Petr Tesarik <ptesarik(a)suse.cz>:
>
> It will show an incorrect register dump, but the backtrace continues.
> For example:
>
Hi Petr,
The back trace looks good.
How did you know the register dump is incorrect?
The saved registers did not make any sense in the interrupted code. ;-)
And one of them, which should have been a pointer, looked like RFLAGS.
At least the value of RSP saved on the NMI stack seemed to be good,
RSP: ffff880232b2ff18
Yes, SS, RSP, RFLAGS, CS, and RIP may look good, because they are
pushed onto the stack by the CPU. But they may point back to an NMI if it
was a nested NMI. See my comments below.
Recently, I have been working on a core file analysis, and found that the
crash tool couldn't give the correct NMI back trace.
But I can find the right stack trace by using the IST pointer.
I'm wondering whether your patch could work for my cases.
Maybe I can try your fix once it is ready.
See
https://www.redhat.com/archives/crash-utility/2014-April/msg00038.html
It's now also in crash git, see
commit 8e15958e1b7183bbfbdf004f0ad8f2b62f023f9f.
So, how do you recognize a wrong register dump?
Some symptoms:
> PID: 0 TASK: ffff880232b2c440 CPU: 7 COMMAND: "kworker/0:1"
> #0 [ffff88023fdc7e40] crash_nmi_callback at ffffffff8102428f
> #1 [ffff88023fdc7e50] notifier_call_chain at ffffffff81461ec7
> #2 [ffff88023fdc7e80] __atomic_notifier_call_chain at ffffffff81461f0d
> #3 [ffff88023fdc7e90] notify_die at ffffffff81461f5d
> #4 [ffff88023fdc7ec0] default_do_nmi at ffffffff8145f3a7
> #5 [ffff88023fdc7ee0] do_nmi at ffffffff8145f5d8
> #6 [ffff88023fdc7ef0] restart_nmi at ffffffff8145eb2d
> [exception RIP: mwait_idle+423]
> RIP: ffffffff8100b217 RSP: ffff880232b2ff18 RFLAGS: 00000246
> RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
RAX is the kernel code segment (copied CS)
RBX is the kernel code segment (saved CS)
RCX looks like RFLAGS (note the typical 246 at the end).
> RDX: ffff880232b2ff18 RSI: 0000000000000018 RDI: 0000000000000001
RDX points to a kernel stack
RSI is the kernel data segment (copied SS)
RDI is always 1 (the NMI executing flag)
> RBP: ffffffff8100b217 R8: ffffffff8100b217 R9: 0000000000000018
RBP points to kernel text
R8 points to kernel text
R9 is the kernel data segment (saved SS)
> R10: ffff880232b2ff18 R11: 0000000000000246 R12: ffffffffffffffff
R10 points to a kernel stack
R11 looks like RFLAGS
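
To make the symptoms above concrete, here is a minimal sketch in Python of
how one might flag such a dump automatically. It is only an illustration:
the function name nmi_dump_symptoms and the dictionary layout are
hypothetical and not part of crash. The segment values 0x10 and 0x18 are
taken from the CS and SS shown in this dump. The "duplicates a value from
the exception frame" test (exact equality with the saved RIP, RSP or
RFLAGS) is an assumption of this sketch, a stricter stand-in for "points to
kernel text/stack" and "looks like RFLAGS".

```python
# Hypothetical helper, not part of crash: flags a register dump that looks
# like copies of the NMI exception frame rather than real saved registers.

KERNEL_CS = 0x10  # CS value shown in the dump above
KERNEL_SS = 0x18  # SS value shown in the dump above


def nmi_dump_symptoms(regs):
    """Return a list of reasons why `regs` looks like misread NMI stack data."""
    # Values the CPU pushed itself; these are expected to be trustworthy.
    frame = {regs["RIP"], regs["RSP"], regs["RFLAGS"]}
    symptoms = []
    for name in ("RAX", "RBX"):
        if regs[name] == KERNEL_CS:
            symptoms.append(f"{name} equals the kernel code segment")
    for name in ("RSI", "R9"):
        if regs[name] == KERNEL_SS:
            symptoms.append(f"{name} equals the kernel data segment")
    if regs["RDI"] == 1:
        symptoms.append("RDI is 1 (the NMI executing flag)")
    # Assumption of this sketch: a general-purpose register that exactly
    # repeats the saved RIP, RSP or RFLAGS is treated as suspicious.
    for name in ("RCX", "RDX", "RBP", "R8", "R10", "R11"):
        if regs[name] in frame:
            symptoms.append(f"{name} duplicates a value from the exception frame")
    return symptoms


# Register values copied from the dump quoted in this mail.
sample = {
    "RIP": 0xFFFFFFFF8100B217, "RSP": 0xFFFF880232B2FF18, "RFLAGS": 0x246,
    "RAX": 0x10, "RBX": 0x10, "RCX": 0x246,
    "RDX": 0xFFFF880232B2FF18, "RSI": 0x18, "RDI": 0x1,
    "RBP": 0xFFFFFFFF8100B217, "R8": 0xFFFFFFFF8100B217, "R9": 0x18,
    "R10": 0xFFFF880232B2FF18, "R11": 0x246,
}

for reason in nmi_dump_symptoms(sample):
    print(reason)
```

Any single match proves nothing on its own (RDI may legitimately be 1, for
instance), so a real check would probably want several symptoms to agree
before distrusting the dump.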
HTH,
Petr Tesarik
> R13: ffffffff81d36108 R14: ffff880232b2ffd8 R15: 0000000000000000
> ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
> --- <NMI exception stack> ---
> #7 [ffff880232b2ff18] mwait_idle at ffffffff8100b217
> #8 [ffff880232b2ff30] cpu_idle at ffffffff81002126
>
> If there is a nested NMI, reading the code suggests crash may loop again
> to the NMI stack, but I don't have a sample dump file ATM.
>
> Petr T
>