2014-04-29 19:27 GMT+08:00 Petr Tesarik <ptesarik@suse.cz>:

It will show an incorrect register dump, but the backtrace continues.
For example:

Hi Petr,

The back trace looks good. 

How did you know the register dump is incorrect?

At least the value of RSP saved on the NMI stack seems to be good:

RSP: ffff880232b2ff18 
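
As a rough sanity check of that, the sketch below (Python, for illustration only) tests whether the saved RSP lies in the same stack as frames #7 and #8 of the back trace, and in a different one from the NMI frames. It assumes 8 KiB, size-aligned kernel stacks, which is my assumption and not confirmed for this kernel; the helper names are hypothetical.

```python
# Assumption: kernel stacks are THREAD_SIZE bytes and THREAD_SIZE-aligned.
# 8 KiB is a guess for this kernel, not something stated in the dump.
THREAD_SIZE = 8192


def stack_base(addr, thread_size=THREAD_SIZE):
    """Round an address down to the start of its aligned stack region."""
    return addr & ~(thread_size - 1)


def same_stack(a, b, thread_size=THREAD_SIZE):
    """True if both addresses fall inside the same aligned stack region."""
    return stack_base(a, thread_size) == stack_base(b, thread_size)


# Values copied from the back trace in this mail.
saved_rsp = 0xffff880232b2ff18   # RSP in the exception frame
frame8 = 0xffff880232b2ff30      # frame #8 (cpu_idle)
nmi_frame0 = 0xffff88023fdc7e40  # frame #0 (crash_nmi_callback)

print(hex(stack_base(saved_rsp)))
print(same_stack(saved_rsp, frame8))
print(same_stack(saved_rsp, nmi_frame0))
```

If the assumption holds, the saved RSP should share a stack with frame #8 but not with frame #0, which is consistent with RSP pointing back into the interrupted task's stack.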

Recently I have been analyzing a core file and found that the crash tool could not produce the correct NMI back trace.
However, I can find the right stack trace by using the IST pointer.

I'm wondering whether your patch would work for my case.

May I try your fix once it is ready?
 

PID: 0      TASK: ffff880232b2c440  CPU: 7   COMMAND: "kworker/0:1"
 #0 [ffff88023fdc7e40] crash_nmi_callback at ffffffff8102428f
 #1 [ffff88023fdc7e50] notifier_call_chain at ffffffff81461ec7
 #2 [ffff88023fdc7e80] __atomic_notifier_call_chain at ffffffff81461f0d
 #3 [ffff88023fdc7e90] notify_die at ffffffff81461f5d
 #4 [ffff88023fdc7ec0] default_do_nmi at ffffffff8145f3a7
 #5 [ffff88023fdc7ee0] do_nmi at ffffffff8145f5d8
 #6 [ffff88023fdc7ef0] restart_nmi at ffffffff8145eb2d
    [exception RIP: mwait_idle+423]
    RIP: ffffffff8100b217  RSP: ffff880232b2ff18  RFLAGS: 00000246
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000246
    RDX: ffff880232b2ff18  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff8100b217   R8: ffffffff8100b217   R9: 0000000000000018
    R10: ffff880232b2ff18  R11: 0000000000000246  R12: ffffffffffffffff
    R13: ffffffff81d36108  R14: ffff880232b2ffd8  R15: 0000000000000000
    ORIG_RAX: 0000000000000000  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #7 [ffff880232b2ff18] mwait_idle at ffffffff8100b217
 #8 [ffff880232b2ff30] cpu_idle at ffffffff81002126

If there is a nested NMI, reading the code suggests crash may loop again to the NMI stack, but I don't have a sample dump file ATM.

Petr T



--
------------------
Oliver Yang