On Fri, Feb 3, 2023 at 9:36 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
On 2023/02/02 19:53, lijiang wrote:
>> Thank you for the comment, Kazu.
>>
>> On Thu, Feb 2, 2023 at 5:16 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com <mailto:k-hagio-ab@nec.com>> wrote:
>>
>>     Hi Lianbo,
>>
>>     thanks for the patch.
>>
>>     On 2023/02/02 16:01, Lianbo Jiang wrote:
>>     > Currently, the "bt" command may print a bogus exception frame
>>     > and the rest frame will be truncated on x86 64 when using the
>>     > "virsh send-key <kvm guest> KEY_LEFTALT KEY_SYSRQ KEY_C" command
>>     > to trigger a panic from the KVM host. For example:
>>     >
>>     >    crash> bt
>>     >    PID: 0        TASK: ffff9e7a47e32f00  CPU: 3    COMMAND: "swapper/3"
>>     >     #0 [ffffba7900118bb8] machine_kexec at ffffffff87e5c2c7
>>     >     #1 [ffffba7900118c08] __crash_kexec at ffffffff87f9500d
>>     >     #2 [ffffba7900118cd0] panic at ffffffff87edfff9
>>     >     #3 [ffffba7900118d50] sysrq_handle_crash at ffffffff883ce2c1
>>     >     #4 [ffffba7900118d58] __handle_sysrq.cold.15 at ffffffff883ceb56
>>     >     #5 [ffffba7900118d88] sysrq_filter at ffffffff883ce9a2
>>     >     #6 [ffffba7900118dc0] input_to_handler at ffffffff884fb0bf
>>     >     #7 [ffffba7900118df8] input_pass_values at ffffffff884fc1b7
>>     >     #8 [ffffba7900118e20] input_handle_event at ffffffff884fe278
>>     >     #9 [ffffba7900118e50] input_event at ffffffff884fe74b
>>     >     #10 [ffffba7900118e88] atkbd_interrupt at ffffffff88504e2f
>>     >     #11 [ffffba7900118ee0] serio_interrupt at ffffffff884f7516
>>     >     #12 [ffffba7900118f10] i8042_interrupt at ffffffff884f8b04
>>     >     #13 [ffffba7900118f50] __handle_irq_event_percpu at ffffffff87f51430
>>     >     #14 [ffffba7900118f90] handle_irq_event_percpu at ffffffff87f51590
>>     >     #15 [ffffba7900118fb8] handle_irq_event at ffffffff87f51616
>>     >     #16 [ffffba7900118fd8] handle_edge_irq at ffffffff87f559f2
>>     >     #17 [ffffba7900118ff0] asm_call_on_stack at ffffffff88800fa2
>>     >     --- <IRQ stack> ---
>>     >     #18 [ffffba790008bda0] asm_call_on_stack at ffffffff88800fa2
>>     >         RIP: ffffffffffffffff  RSP: 0000000000000124  RFLAGS: 00000003
>>     >         RAX: 0000000000000000  RBX: 0000000000000001  RCX: 0000000000000000
>>     >         RDX: ffffffff88800c1e  RSI: 0000000000000000  RDI: 0000000000000000
>>     >         RBP: 0000000000000001   R8: 0000000000000000   R9: 0000000000000000
>>     >         R10: 0000000000000000  R11: ffffffff88760555  R12: ffffba790008be08
>>     >         R13: ffffffff87f18002  R14: ffff9e7a47e32f00  R15: ffff9e7bb6198e00
>>     >         ORIG_RAX: 0000000000000000  CS: 0003  SS: 0000
>>     >    bt: WARNING: possibly bogus exception frame
>>     >    crash>
>>
>>     Could I have the bt result with the patch?
>>
>> Sure. I tested it on upstream kernel with the patch:
>>
>> crash> bt
>> PID: 0        TASK: ffff9e7a47e32f00  CPU: 3    COMMAND: "swapper/3"
>>  #0 [ffffba7900118bb8] machine_kexec at ffffffff87e5c2c7
>>  #1 [ffffba7900118c08] __crash_kexec at ffffffff87f9500d
>>  #2 [ffffba7900118cd0] panic at ffffffff87edfff9
>>  #3 [ffffba7900118d50] sysrq_handle_crash at ffffffff883ce2c1
>>  #4 [ffffba7900118d58] __handle_sysrq.cold.15 at ffffffff883ceb56
>>  #5 [ffffba7900118d88] sysrq_filter at ffffffff883ce9a2
>>  #6 [ffffba7900118dc0] input_to_handler at ffffffff884fb0bf
>>  #7 [ffffba7900118df8] input_pass_values at ffffffff884fc1b7
>>  #8 [ffffba7900118e20] input_handle_event at ffffffff884fe278
>>  #9 [ffffba7900118e50] input_event at ffffffff884fe74b
>> #10 [ffffba7900118e88] atkbd_interrupt at ffffffff88504e2f
>> #11 [ffffba7900118ee0] serio_interrupt at ffffffff884f7516
>> #12 [ffffba7900118f10] i8042_interrupt at ffffffff884f8b04
>> #13 [ffffba7900118f50] __handle_irq_event_percpu at ffffffff87f51430
>> #14 [ffffba7900118f90] handle_irq_event_percpu at ffffffff87f51590
>> #15 [ffffba7900118fb8] handle_irq_event at ffffffff87f51616
>> #16 [ffffba7900118fd8] handle_edge_irq at ffffffff87f559f2
>> #17 [ffffba7900118ff0] asm_call_on_stack at ffffffff88800fa2
>> --- <IRQ stack> ---
>> #18 [ffffba790008bd50] asm_call_on_stack at ffffffff88800fa2
>>     [exception RIP: common_interrupt+181]
>>     RIP: ffffffff88760555  RSP: 0000000000000000  RFLAGS: 00000000
>>     RAX: ffff9e7bb6198e00  RBX: ffff9e7bb6198e00  RCX: ffff9e7a47e32f00
>>     RDX: ffffffff87f18002  RSI: ffffba790008be08  RDI: 0000000000000001
>>     RBP: 0000000d25808fab   R8: ffffffff87f17d1c   R9: ffffffff87e29ec5
>>     R10: ffffffff87e63dcd  R11: ffff9e7a47e32f00  R12: ffffffff8808ea76
>>     R13: ffffffff8808e1d2  R14: 0000000000000046  R15: ffffffff87f762f1
>>     ORIG_RAX: 0000000000000001  CS: 0000  SS: 0000
>> #19 [ffffba790008be00] asm_common_interrupt at ffffffff88800c1e
>> #20 [ffffba790008be88] default_idle at ffffffff8876ffc5
>> #21 [ffffba790008bed8] do_idle at ffffffff87f1982d
>> #22 [ffffba790008bf20] cpu_startup_entry at ffffffff87f19a29
>> #23 [ffffba790008bf30] start_secondary at ffffffff87e5118f
>> #24 [ffffba790008bf50] secondary_startup_64 at ffffffff87e000e6

Thanks.  Hmm, I think it's strange that RIP is in commont_interrupt and
RSP is zero.  I guess RIP should be in default_idle.


Thank you for pointing out this issue. The RIP is in the default_idle according to the kernel log output.

I tried to hardcode and get the expected eframe information, but the backtrace is incorrect:

crash> bt
PID: 0        TASK: ffff9e7a47e32f00  CPU: 3    COMMAND: "swapper/3"
 #0 [ffffba7900118bb8] machine_kexec at ffffffff87e5c2c7
 #1 [ffffba7900118c08] __crash_kexec at ffffffff87f9500d
 #2 [ffffba7900118cd0] panic at ffffffff87edfff9
 #3 [ffffba7900118d50] sysrq_handle_crash at ffffffff883ce2c1
 #4 [ffffba7900118d58] __handle_sysrq.cold.15 at ffffffff883ceb56
 #5 [ffffba7900118d88] sysrq_filter at ffffffff883ce9a2
 #6 [ffffba7900118dc0] input_to_handler at ffffffff884fb0bf
 #7 [ffffba7900118df8] input_pass_values at ffffffff884fc1b7
 #8 [ffffba7900118e20] input_handle_event at ffffffff884fe278
 #9 [ffffba7900118e50] input_event at ffffffff884fe74b
#10 [ffffba7900118e88] atkbd_interrupt at ffffffff88504e2f
#11 [ffffba7900118ee0] serio_interrupt at ffffffff884f7516
#12 [ffffba7900118f10] i8042_interrupt at ffffffff884f8b04
#13 [ffffba7900118f50] __handle_irq_event_percpu at ffffffff87f51430
#14 [ffffba7900118f90] handle_irq_event_percpu at ffffffff87f51590
#15 [ffffba7900118fb8] handle_irq_event at ffffffff87f51616
#16 [ffffba7900118fd8] handle_edge_irq at ffffffff87f559f2
#17 [ffffba7900118ff0] asm_call_on_stack at ffffffff88800fa2
--- <IRQ stack> ---
#18 [ffffba790008be08] asm_call_on_stack at ffffffff88800fa2
    [exception RIP: default_idle+37]
    RIP: ffffffff8876ffc5  RSP: ffffba790008beb8  RFLAGS: 00000246
    RAX: ffffffff8876ffa0  RBX: 0000000000000003  RCX: 0000000000000001
    RDX: 000000000001a6de  RSI: 0000000000000087  RDI: 0000000000000003
    RBP: 0000000000000003   R8: 000000146ccbc1f0   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000124  R12: ffffffffffffffff
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#19 [ffffba790008bed8] do_idle at ffffffff87f1982d
#20 [ffffba790008bf20] cpu_startup_entry at ffffffff87f19a29
#21 [ffffba790008bf30] start_secondary at ffffffff87e5118f
#22 [ffffba790008bf50] secondary_startup_64 at ffffffff87e000e6
crash> 
 
Let me debug it more and to see what happened.

What is printed as the panic message in "log" command?


The kernel panic message:
 
[   57.542538] Kernel panic - not syncing: sysrq triggered crash
[   57.545512] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.7.0+ #7
[   57.549160] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
[   57.552655] Call Trace:
[   57.553994]  <IRQ>
[   57.555143]  dump_stack+0x57/0x70
[   57.556930]  panic+0xfb/0x2d7
[   57.558516]  ? printk+0x58/0x6f
[   57.560194]  sysrq_handle_crash+0x11/0x20
[   57.562371]  __handle_sysrq.cold.15+0x48/0x102
[   57.564663]  sysrq_filter+0x312/0x3c0
[   57.566579]  input_to_handler+0x4f/0xf0
[   57.568583]  input_pass_values.part.7+0x117/0x130
[   57.571188]  input_handle_event+0x138/0x5c0
[   57.573380]  input_event+0x4b/0x70
[   57.575173]  atkbd_interrupt+0x61f/0x6a0
[   57.577228]  serio_interrupt+0x46/0x90
[   57.579192]  i8042_interrupt+0x154/0x240
[   57.581328]  __handle_irq_event_percpu+0x40/0x170
[   57.583788]  handle_irq_event_percpu+0x30/0x80
[   57.586081]  handle_irq_event+0x36/0x53
[   57.588071]  handle_edge_irq+0x82/0x190
[   57.590096]  asm_call_on_stack+0x12/0x20
[   57.592142]  </IRQ>
[   57.593264]  common_interrupt+0xb5/0x140
[   57.595311]  asm_common_interrupt+0x1e/0x40
[   57.597474] RIP: 0010:default_idle+0x25/0x150
[   57.599735] Code: cc cc cc cc cc 66 66 66 66 90 41 55 41 54 55 65 8b 2d af 23 8a 77 53 66 66 66 66 90 e9 07 00 00 00 0f 00 2d 9f 37 49 00 fb f4 <65> 8b 2d 94 23 8a 77 66 66 66 66 90 5b 5d 41 5c 41 5d c3 65 8b 05
[   57.609196] RSP: 0018:ffffba790008beb8 EFLAGS: 00000246
[   57.611875] RAX: ffffffff8876ffa0 RBX: 0000000000000003 RCX: 0000000000000001
[   57.615506] RDX: 000000000001a6de RSI: 0000000000000087 RDI: 0000000000000003
[   57.619142] RBP: 0000000000000003 R08: 000000146ccbc1f0 R09: 0000000000000000
[   57.622770] R10: 0000000000000000 R11: 0000000000000124 R12: ffffffffffffffff
[   57.626386] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   57.630029]  ? __cpuidle_text_start+0x8/0x8
[   57.632209]  do_idle+0x1bd/0x240
[   57.633914]  cpu_startup_entry+0x19/0x20
[   57.635966]  start_secondary+0x15f/0x1b0
[   57.638021]  secondary_startup_64+0xb6/0xc0
 
Thanks.
Lianbo

>> crash>
>>
>>     I've still not learned the idtentry and etc. enough, but at least,
>>     a wrong exception frame is still shown at my end.  This is a RHEL9.1
>>     vmcore that reproduces the bogus exception frame error.
>>
>>     crash> bt
>>     PID: 0        TASK: ffffffff8881a940  CPU: 0    COMMAND: "swapper/0"
>>       #0 [ffffa5ab40003bb0] machine_kexec at ffffffff86e6973d
>>       #1 [ffffa5ab40003c00] __crash_kexec at ffffffff86fbe29d
>>       #2 [ffffa5ab40003cc8] panic at ffffffff8788cb6e
>>     ...
>>     #16 [ffffa5ab40003fc8] __common_interrupt at ffffffff86e2666e
>>     #17 [ffffa5ab40003ff0] common_interrupt at ffffffff878d83ee
>>     --- <IRQ stack> ---
>>     #18 [ffffffff88803d80] common_interrupt at ffffffff878d83ee
>>          [exception RIP: unknown or invalid address]             <<<-- invalid
>>          RIP: 0000000000000000  RSP: 000000007d3187b0  RFLAGS: ffffffff87a00ce2
>>          RAX: ffffffff86e71564  RBX: 0000000000000c56  RCX: ffffffff878da876
>>          RDX: 0000000000000000  RSI: ffffffff878dab3e  RDI: ffffffff878d8384
>>          RBP: 0000000000000000   R8: 0000000000000000   R9: 000000134ab46fed
>>          R10: ffff9143bbc2afc0  R11: ffff9143b9f2d540  R12: 0000000000000000
>>          R13: ffffffff86f390ae  R14: ffff9143b9f2d540  R15: ffff9143b9f2d540
>>          ORIG_RAX: 0000000000000000  CS: 0000  SS: 7d3187a8
>>     #19 [ffffffff88803e98] default_idle at ffffffff878e8570
>>     #20 [ffffffff88803ec0] default_idle_call at ffffffff878e86d3
>>     #21 [ffffffff88803ec8] cpuidle_idle_call at ffffffff86f4ac3d
>>     #22 [ffffffff88803f00] do_idle at ffffffff86f4ad1b
>>     ...
>>
>>     crash> dis default_idle
>>     0xffffffff878e8560 <default_idle>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
>>     0xffffffff878e8565 <default_idle+5>:    xchg   %ax,%ax
>>     0xffffffff878e8567 <default_idle+7>:    verw   0x51b2c2(%rip)        # 0xffffffff87e03830 <ds.0>
>>     0xffffffff878e856e <default_idle+14>:   sti
>>     0xffffffff878e856f <default_idle+15>:   hlt
>>     0xffffffff878e8570 <default_idle+16>:   ret
>>
>>     I guess "0xffffffff878e8570 <default_idle+16>" should be the exception
>>     RIP here..

I also should have check the panic message...
[   82.842430]  common_interrupt+0x7e/0xa0
[   82.842741]  </IRQ>
[   82.842927]  asm_common_interrupt+0x22/0x40
[   82.843166] RIP: 0010:default_idle+0x10/0x20
[   82.843616] Code: cc 0f ae f0 0f ae 38 0f ae f0 eb b5 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 66 90 0f 00 2d c2 b2 51 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 65
[   82.844714] RSP: 0018:ffffffff88803ec0 EFLAGS: 00000242
[   82.845002] RAX: ffffffff878e8560 RBX: ffffffff8881a940 RCX: 0000000000000001
[   82.845396] RDX: 000000000000f92b RSI: 0000000000000083 RDI: 000000000000f92c
[   82.845853] RBP: 0000000000000000 R08: 00000014e57bc624 R09: 0000000000000001
[   82.846263] R10: 0000000000000400 R11: 0000000000000000 R12: 0000000000000000
[   82.846733] R13: 000000007da66a18 R14: 000000007d3187a8 R15: 000000007d3187b0
[   82.847143]  ? mwait_idle+0x80/0x80
[   82.847349]  default_idle_call+0x33/0xe0
[   82.847640]  cpuidle_idle_call+0x15d/0x1c0

In this case, RIP is "0xffffffff878e8570 <default_idle+16>" and
at ffffffff88803e98.  This will be pt_regs.ip.

crash> bt -r
PID: 0        TASK: ffffffff8881a940  CPU: 0    COMMAND: "swapper/0"
...
ffffffff88803e70:  0000000000000001 000000000000f92b
ffffffff88803e80:  0000000000000083 000000000000f92c
ffffffff88803e90:  ffffffffffffffff default_idle+16  <<<
ffffffff88803ea0:  0000000000000010 0000000000000242
ffffffff88803eb0:  init_thread_union+16064 0000000000000018
...
crash> struct -o pt_regs.ip
struct pt_regs {
   [128] unsigned long ip;
}
crash> px 0xffffffff88803e98-128
$6 = 0xffffffff88803e18
crash> struct -x pt_regs 0xffffffff88803e18
struct pt_regs {
   r15 = 0x7d3187b0,
   r14 = 0x7d3187a8,
   r13 = 0x7da66a18,
   r12 = 0x0,
   bp = 0x0,
   bx = 0xffffffff8881a940,
   r11 = 0x0,
   r10 = 0x400,
   r9 = 0x1,
   r8 = 0x14e57bc624,
   ax = 0xffffffff878e8560,
   cx = 0x1,
   dx = 0xf92b,
   si = 0x83,
   di = 0xf92c,
   orig_ax = 0xffffffffffffffff,
   ip = 0xffffffff878e8570,
   cs = 0x10,
   flags = 0x242,
   sp = 0xffffffff88803ec0,
   ss = 0x18
}

This looks same as the saved regs in the panic message.
>>
>>
>> The above address of RIP is zero. Maybe this is a specific RHEL9 vmcore issue, or it
>> is related to some kernel specific config? I will dig into the details later.
>>
>>     >
>>     > Kernel commit fa5e5c409213 ("x86/entry: Use idtentry for interrupts")
>>     > change causes the current issue, crash needs to adjust the value of
>>     > irq eframe.
>>     >
>>     > Signed-off-by: Lianbo Jiang <lijiang@redhat.com <mailto:lijiang@redhat.com>>
>>     > ---
>>     >   x86_64.c | 15 +++++++++++----
>>     >   1 file changed, 11 insertions(+), 4 deletions(-)
>>     >
>>     > diff --git a/x86_64.c b/x86_64.c
>>     > index 7a5d6f050c89..6504d16fb17e 100644
>>     > --- a/x86_64.c
>>     > +++ b/x86_64.c
>>     > @@ -4647,10 +4647,15 @@ x86_64_exception_frame(ulong flags, ulong kvaddr, char *local,
>>     >       r14 = ULONG(pt_regs_buf + ms->pto.r14);
>>     >       r15 = ULONG(pt_regs_buf + ms->pto.r15);
>>     >
>>     > -        verified = x86_64_eframe_verify(bt,
>>     > -             kvaddr ? kvaddr : (local - bt->stackbuf) + bt->stackbase,
>>     > -             cs, ss, rip, rsp, rflags);
>>     > -
>>     > +     /*
>>     > +      * Do not verify when the flags is not set to EFRAME_VERIFY.
>>     > +      */
>>     > +     if (symbol_search("asm_common_interrupt") && !(flags & EFRAME_VERIFY))
>>     > +             verified = TRUE;
>>     > +     else
>>     > +             verified = x86_64_eframe_verify(bt,
>>     > +                     kvaddr ? kvaddr : (local - bt->stackbuf) + bt->stackbase,
>>     > +                     cs, ss, rip, rsp, rflags);
>>     >       /*
>>     >        *  If it's print-if-verified request, don't print bogus eframes.
>>     >        */
>>     > @@ -6577,6 +6582,8 @@ x86_64_irq_eframe_link(ulong stkref, struct bt_info *bt, FILE *ofp)
>>     >               return stkref;
>>     >
>>     >       irq_eframe = stkref - machdep->machspec->irq_eframe_link;
>>     > +     if (symbol_search("asm_common_interrupt"))
>>     > +             irq_eframe -= 80;
>>
>>     So, where does this 80 come from?
>>
>>
>> crash> dis asm_common_interrupt
>> 0xffffffff88800c00 <asm_common_interrupt>:      data16 xchg %ax,%ax
>> 0xffffffff88800c03 <asm_common_interrupt+3>:    call   0xffffffff88801130 <error_entry>
>> 0xffffffff88800c08 <asm_common_interrupt+8>:    mov    %rsp,%rdi
>> 0xffffffff88800c0b <asm_common_interrupt+11>:   mov    0x78(%rsp),%rsi
>> 0xffffffff88800c10 <asm_common_interrupt+16>:   movq   $0xffffffffffffffff,0x78(%rsp)
>> 0xffffffff88800c19 <asm_common_interrupt+25>:   call   0xffffffff887604a0 <common_interrupt>
>> 0xffffffff88800c1e <asm_common_interrupt+30>:   jmp    0xffffffff88801210 <error_return>
>> crash>
>>
>> The rsp offset is 0x78, because the irq_eframe_link is initialized to 40, so the irq eframe offset needs to
>> subtract 80( 0x78 = 120 = 40 + 80).

hmm, I will check how the irq_eframe_link is initialized.

Thanks,
Kazu