Re: [Crash-utility] Broken backtrace with nested NMIs

Saturday, 24 May 2014

On Sat, 24 May 2014 20:24:30 +0800
oliver yang <yangoliver(a)gmail.com> wrote:

...
 2014-04-29 19:27 GMT+08:00 Petr Tesarik <ptesarik(a)suse.cz>:

 >
 > It will show an incorrect register dump, but the backtrace continues.
 > For example:
 >

 Hi Petr,

 The back trace looks good.

 How did you know the register dump is incorrect? 
The saved registers did not make any sense in the interrupted code. ;-)
And one of them, which should have been a pointer, looked like RFLAGS.

...
 At least the value of RSP saved in NMI stack seemed to be good,

 RSP: ffff880232b2ff18 
Yes, SS, RSP, RFLAGS, CS, and RIP may look good, because they are
pushed onto the stack by the CPU. But they may point back to an NMI if
it was a nested NMI. See my comments below.
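
For reference, the CPU-pushed part of the frame is five 64-bit words,
with RIP at the lowest address and SS at the highest. Below is a minimal
Python sketch of decoding it and checking whether the saved RSP points
back into the NMI stack. The names decode_hw_frame and rsp_in_stack are
made up for illustration (they are not crash functions), and the NMI
stack bounds in the usage example are an assumption (a 4 KiB stack
around the addresses seen in the backtrace), not something read from the
dump.

```python
import struct
from collections import namedtuple

# Memory order from the lowest address upward: the CPU pushes SS first
# and RIP last, so RIP sits at the lowest address of the frame.
HwFrame = namedtuple("HwFrame", "rip cs rflags rsp ss")


def decode_hw_frame(raw):
    """Decode the five little-endian 64-bit words pushed by the CPU."""
    return HwFrame(*struct.unpack("<5Q", raw[:40]))


def rsp_in_stack(frame, stack_base, stack_size):
    """Return True if the saved RSP lies within the given stack area."""
    return stack_base <= frame.rsp <= stack_base + stack_size


# Usage with the values from the example dump. The stack base and size
# are assumed for illustration only.
ASSUMED_NMI_STACK_BASE = 0xFFFF88023FDC7000
ASSUMED_NMI_STACK_SIZE = 4096

raw_frame = struct.pack(
    "<5Q",
    0xFFFFFFFF8100B217,  # RIP
    0x10,                # CS
    0x246,               # RFLAGS
    0xFFFF880232B2FF18,  # RSP
    0x18,                # SS
)
frame = decode_hw_frame(raw_frame)
nested = rsp_in_stack(frame, ASSUMED_NMI_STACK_BASE, ASSUMED_NMI_STACK_SIZE)
```

If the saved RSP falls inside the NMI stack, the interrupted context was
itself running on that stack, which is the nested case.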

...
 Recently, I'm working on a core file analysis, and found crash tool
 couldn't give the correct NMI back trace.
 But I can find right stack trace by using IST pointer.

 I'm wondering whether your patch could work for my cases.

 May I can try your fix after it is ready. 
See https://www.redhat.com/archives/crash-utility/2014-April/msg00038.html

It's now also in crash git, see
commit 8e15958e1b7183bbfbdf004f0ad8f2b62f023f9f.

So, how do you recognize a wrong register dump?
Some symptoms:

...
 > PID: 0      TASK: ffff880232b2c440  CPU: 7   COMMAND: "kworker/0:1"
 >  #0 [ffff88023fdc7e40] crash_nmi_callback at ffffffff8102428f
 >  #1 [ffff88023fdc7e50] notifier_call_chain at ffffffff81461ec7
 >  #2 [ffff88023fdc7e80] __atomic_notifier_call_chain at ffffffff81461f0d
 >  #3 [ffff88023fdc7e90] notify_die at ffffffff81461f5d
 >  #4 [ffff88023fdc7ec0] default_do_nmi at ffffffff8145f3a7
 >  #5 [ffff88023fdc7ee0] do_nmi at ffffffff8145f5d8
 >  #6 [ffff88023fdc7ef0] restart_nmi at ffffffff8145eb2d
 >     [exception RIP: mwait_idle+423]
 >     RIP: ffffffff8100b217  RSP: ffff880232b2ff18  RFLAGS: 00000246
 >     RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000246 
RAX is the kernel code segment (copied CS)
RBX is the kernel code segment (saved CS)
RCX looks like RFLAGS (note the typical 246 at the end).

...
 >     RDX: ffff880232b2ff18  RSI: 0000000000000018  RDI: 0000000000000001
RDX points to a kernel stack
RSI is the kernel data segment (copied SS)
RDI is always 1 (the NMI executing flag)

...
 >     RBP: ffffffff8100b217   R8: ffffffff8100b217   R9: 0000000000000018
RBP points to kernel text
 R8 points to kernel text
 R9 is the kernel data segment (saved SS)

...
 >     R10: ffff880232b2ff18  R11: 0000000000000246  R12: ffffffffffffffff
R10 points to a kernel stack
R11 looks like RFLAGS
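
The symptoms above can be turned into a rough check. This is only a
sketch: the function name suspicious_registers and the MIRRORS table are
made up for illustration, and comparing each general-purpose register
against the CS/RFLAGS/RSP/SS/RIP values printed in the same dump is an
assumption based on this one example, not something crash does. A single
hit (RDI == 1, say) can easily be legitimate; many hits at once are the
warning sign.

```python
# Which general-purpose registers mirror which hardware-pushed value in
# the example dump (an assumption drawn from that one dump).
MIRRORS = {
    "CS": ("RAX", "RBX"),
    "RFLAGS": ("RCX", "R11"),
    "RSP": ("RDX", "R10"),
    "SS": ("RSI", "R9"),
    "RIP": ("RBP", "R8"),
}


def suspicious_registers(regs):
    """Return the registers that look like copies of the hardware frame.

    regs maps register names (as printed by crash) to integer values and
    must contain every register named in MIRRORS plus RDI.
    """
    hits = []
    for hw_name, gprs in MIRRORS.items():
        for gpr in gprs:
            if regs[gpr] == regs[hw_name]:
                hits.append(gpr)
    if regs["RDI"] == 1:  # the "NMI executing" flag
        hits.append("RDI")
    return hits


# Register values taken from the example dump in this thread.
example = {
    "RIP": 0xFFFFFFFF8100B217, "RSP": 0xFFFF880232B2FF18,
    "RFLAGS": 0x246, "CS": 0x10, "SS": 0x18,
    "RAX": 0x10, "RBX": 0x10, "RCX": 0x246,
    "RDX": 0xFFFF880232B2FF18, "RSI": 0x18, "RDI": 0x1,
    "RBP": 0xFFFFFFFF8100B217, "R8": 0xFFFFFFFF8100B217, "R9": 0x18,
    "R10": 0xFFFF880232B2FF18, "R11": 0x246,
}
hits = suspicious_registers(example)
```

For the dump above, every register listed in the symptoms is flagged.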

HTH,
Petr Tesarik

...
 >     R13: ffffffff81d36108  R14: ffff880232b2ffd8  R15: 0000000000000000
 >     ORIG_RAX: 0000000000000000  CS: 0010  SS: 0018
 > --- <NMI exception stack> ---
 >  #7 [ffff880232b2ff18] mwait_idle at ffffffff8100b217
 >  #8 [ffff880232b2ff30] cpu_idle at ffffffff81002126
 >
 > If there is a nested NMI, reading the code suggests crash may loop again
 > to the NMI stack, but I don't have a sample dump file ATM.
 >
 > Petr T
 >
 > --
 > Crash-utility mailing list
 > Crash-utility(a)redhat.com
 > https://www.redhat.com/mailman/listinfo/crash-utility
 >
