On Thu, Feb 16, 2023 at 2:22 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
On 2023/02/16 13:05, lijiang wrote:
>>> On Thu, Feb 16, 2023 at 8:48 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com <mailto:k-hagio-ab@nec.com>> wrote:
>>>
>>>     On 2023/02/15 17:24, Lianbo Jiang wrote:
>>>     > Kernel commit 7d65f4a65532 ("irq: Consolidate do_softirq() arch overriden
>>>     > implementations") renamed the call_softirq to do_softirq_own_stack.
>>>     > Crash may incorrectly output the eframe stack with a warning as below:
>>>     >
>>>     >    crash> foreach bt
>>>     >    ...
>>>     >    PID: 0        TASK: ffff914f820a8000  CPU: 25   COMMAND: "swapper/25"
>>>     >     #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763
>>>     >     #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423
>>>     >     #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9
>>>     >     #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f
>>>     >     #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8
>>>     >        [exception RIP: clone_endio+172]
>>>     >        RIP: ffffffffc005c1ec  RSP: ffffa1d403d08e98  RFLAGS: 00000246
>>>     >        RAX: 0000000000000000  RBX: ffff915326fba230  RCX: 0000000000000018
>>>     >        RDX: ffffffffc0075400  RSI: 0000000000000000  RDI: ffff915326fba230
>>>     >        RBP: ffff915326fba1c0   R8: 0000000000001000   R9: ffff915308d6d2a0
>>>     >        R10: 000000a97dfe5e10  R11: ffffa1d40038fe98  R12: ffff915302babc40
>>>     >        R13: ffff914f94360000  R14: 0000000000000000  R15: 0000000000000000
>>>     >        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>     >    --- <NMI exception stack> ---
>>>     >     #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec [dm_mod]
>>>     >     #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954
>>>     >     #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968
>>>     >     #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e
>>>     >     #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95
>>>     >     #10 [ffffa1d403d08fa0] __softirqentry_text_start at ffffffffa72000dc
>>>     >     #11 [ffffa1d403d08ff0] do_softirq_own_stack at ffffffffa7000f9a
>>>     >    --- <IRQ stack> ---
>>>     >     #12 [ffffa1d40038fe70] do_softirq_own_stack at ffffffffa7000f9a
>>>     >        [exception RIP: unknown or invalid address]
>>>     >        RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS: 00000000
>>>     >        RAX: ffffffffa672eae5  RBX: ffffffffa83b34e0  RCX: ffffffffa672eb12
>>>     >        RDX: 0000000000000010  RSI: 8b7d6c8869010c00  RDI: 0000000000000085
>>>     >        RBP: 0000000000000286   R8: ffff914f820a8000   R9: ffffffffa67a94e0
>>>     >        R10: 0000000000000286  R11: ffffffffa66fb4c5  R12: ffffffffa67a898b
>>>     >        R13: 0000000000000000  R14: fffffffffffffff8  R15: ffffffffa67a1e68
>>>     >        ORIG_RAX: 0000000000000000  CS: 0000  SS: ffffffffa672edff
>>>     >     bt: WARNING: possibly bogus exception frame
>>>     >     #13 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c
>>>     >     #14 [ffffa1d40038ff50] secondary_startup_64_no_verify at ffffffffa6600116
>>>     >     ...
>>>
>>>     Thank you for the patch.  Could I have "bt -c 25" output with the patch
>>>     and "bt -r -c 25 | tail -n 40" output for checking?
>>>
>>> Sure.

Thank you, looks better.  Applied with Georges' suggestion and a few
commit log changes.
https://github.com/crash-utility/crash/commit/e0e6e4a7ee03b3d00b50a9e4db2f2ea6f7da0da3


Looks good.
 
The irq_eframe_link calculated on RHEL8 also may be wrong, but
 
Those kernel commits are not introduced:
    [1] v5.8: 931b94145981 ("x86/entry: Provide helpers for executing on the irqstack")
    [2] v5.8: fa5e5c409213 ("x86/entry: Use idtentry for interrupts")
    [3] v5.12: 52d743f3b712 ("x86/softirq: Remove indirection in do_softirq_own_stack()")

I tested it(based on 452) and didn't get any errors:

crash> bt
PID: 0        TASK: ffff9ed24128a800  CPU: 3    COMMAND: "swapper/3"
 #0 [ffffb7c300db8b80] machine_kexec at ffffffff8f46b063
 #1 [ffffb7c300db8bd8] __crash_kexec at ffffffff8f5b357a
 #2 [ffffb7c300db8c98] panic at ffffffff8f4f50ff
 #3 [ffffb7c300db8d18] sysrq_handle_crash at ffffffff8f9f75e1
 #4 [ffffb7c300db8d20] __handle_sysrq.cold.13 at ffffffff8f9f7f04
 #5 [ffffb7c300db8d50] sysrq_filter at ffffffff8f9f7cbd
 #6 [ffffb7c300db8d88] input_to_handler at ffffffff8fb3780f
 #7 [ffffb7c300db8dc0] input_pass_values at ffffffff8fb388ab
 #8 [ffffb7c300db8de8] input_handle_event at ffffffff8fb3a9e8
 #9 [ffffb7c300db8e18] input_event at ffffffff8fb3ae9f
#10 [ffffb7c300db8e50] atkbd_interrupt at ffffffff8fb413a3
#11 [ffffb7c300db8ea8] serio_interrupt at ffffffff8fb33a86
#12 [ffffb7c300db8ed8] i8042_interrupt at ffffffff8fb35134
#13 [ffffb7c300db8f18] __handle_irq_event_percpu at ffffffff8f569d70
#14 [ffffb7c300db8f58] handle_irq_event_percpu at ffffffff8f569ef0
#15 [ffffb7c300db8f80] handle_irq_event at ffffffff8f569f76
#16 [ffffb7c300db8fa0] handle_edge_irq at ffffffff8f56e3d2
#17 [ffffb7c300db8fb8] handle_irq at ffffffff8f42924c
#18 [ffffb7c300db8fc0] do_IRQ at ffffffff8fe01e89
--- <IRQ stack> ---
#19 [ffffb7c300cf3de8] ret_from_intr at ffffffff8fe00a8f
    [exception RIP: native_safe_halt+14]
    RIP: ffffffff8fdf7eae  RSP: ffffb7c300cf3e98  RFLAGS: 00000206
    RAX: ffffffff8fdf7d40  RBX: 0000000000000003  RCX: 0000000000000001
    RDX: 0000000000929b82  RSI: 0000000000000083  RDI: 0000000000000003
    RBP: 0000000000000003   R8: 0000b7ac3793def2   R9: 000000000004f515
    R10: 0000000000000000  R11: 0000000000000019  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffffffffffff  R15: ffff9ed24128a800
    ORIG_RAX: ffffffffffffffdb  CS: 0010  SS: 0018
#20 [ffffb7c300cf3e98] default_idle at ffffffff8fdf7d4a
#21 [ffffb7c300cf3ea0] default_idle_call at ffffffff8fdf8034
#22 [ffffb7c300cf3ec0] do_idle at ffffffff8f52eb63
#23 [ffffb7c300cf3f10] cpu_startup_entry at ffffffff8f52edff
#24 [ffffb7c300cf3f30] start_secondary at ffffffff8f45fa2c
#25 [ffffb7c300cf3f50] secondary_startup_64_no_verify at ffffffff8f400116

crash> log
...
[161537.153436] Call Trace:
[161537.161011]  <IRQ>
[161537.162223]  dump_stack+0x41/0x60
[161537.166883]  panic+0xe7/0x2ac
[161537.168950]  ? printk+0x58/0x73
[161537.170718]  sysrq_handle_crash+0x11/0x20
[161537.177654]  __handle_sysrq.cold.13+0x48/0xff
[161537.180096]  sysrq_filter+0x32d/0x3f0
[161537.182079]  input_to_handler+0x4f/0xf0
[161537.185589]  input_pass_values.part.7+0x11b/0x140
[161537.188160]  input_handle_event+0x138/0x5a0
[161537.190729]  input_event+0x4f/0x80
[161537.192653]  atkbd_interrupt+0x623/0x6b0
[161537.195040]  serio_interrupt+0x46/0x90
[161537.201579]  i8042_interrupt+0x154/0x240
[161537.203750]  __handle_irq_event_percpu+0x40/0x190
[161537.208421]  handle_irq_event_percpu+0x30/0x80
[161537.210869]  handle_irq_event+0x36/0x57
[161537.212946]  handle_edge_irq+0x82/0x190
[161537.215345]  handle_irq+0x1c/0x30
[161537.217128]  do_IRQ+0x49/0xd0
[161537.219876]  common_interrupt+0xf/0xf
[161537.221828]  </IRQ>
[161537.222944] RIP: 0010:native_safe_halt+0xe/0x20
[161537.225239] Code: 00 f0 80 48 02 20 48 8b 00 a8 08 75 c0 e9 79 ff ff ff 90 9
0 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 96 22 41 00 fb f4 <e9> 8d 95 2
0 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[161537.234417] RSP: 0018:ffffb7c300cf3e98 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffdb
[161537.238201] RAX: ffffffff8fdf7d40 RBX: 0000000000000003 RCX: 0000000000000001
[161537.241756] RDX: 0000000000929b82 RSI: 0000000000000083 RDI: 0000000000000003
[161537.245308] RBP: 0000000000000003 R08: 0000b7ac3793def2 R09: 000000000004f515
[161537.248868] R10: 0000000000000000 R11: 0000000000000019 R12: 0000000000000000
[161537.252420] R13: 0000000000000000 R14: ffffffffffffffff R15: ffff9ed24128a800
[161537.255983]  ? tdx_safe_halt+0x80/0x80
[161537.259858]  default_idle+0xa/0x20
[161537.261677]  default_idle_call+0x44/0xf0
[161537.263719]  do_idle+0x213/0x2d0
[161537.265694]  cpu_startup_entry+0x6f/0x80
[161537.267750]  start_secondary+0x18c/0x1d0
[161537.270058]  secondary_startup_64_no_verify+0xd1/0xdb
...   
 
crash> help -m |grep link
          irq_eframe_link: 40

I have a plan of changing that part and will revisit with it.


Sounds good.

Thanks.
Lianbo
 
Thanks,
Kazu

>>>
>>> crash> bt -c 25
>>> PID: 0        TASK: ffff914f820a8000  CPU: 25   COMMAND: "swapper/25"
>>>  #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763
>>>  #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423
>>>  #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9
>>>  #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f
>>>  #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8
>>>     [exception RIP: clone_endio+172]
>>>     RIP: ffffffffc005c1ec  RSP: ffffa1d403d08e98  RFLAGS: 00000246
>>>     RAX: 0000000000000000  RBX: ffff915326fba230  RCX: 0000000000000018
>>>     RDX: ffffffffc0075400  RSI: 0000000000000000  RDI: ffff915326fba230
>>>     RBP: ffff915326fba1c0   R8: 0000000000001000   R9: ffff915308d6d2a0
>>>     R10: 000000a97dfe5e10  R11: ffffa1d40038fe98  R12: ffff915302babc40
>>>     R13: ffff914f94360000  R14: 0000000000000000  R15: 0000000000000000
>>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>> --- <NMI exception stack> ---
>>>  #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec [dm_mod]
>>>  #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954
>>>  #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968
>>>  #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e
>>>  #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95
>>> #10 [ffffa1d403d08fa0] __softirqentry_text_start at ffffffffa72000dc
>>> #11 [ffffa1d403d08ff0] do_softirq_own_stack at ffffffffa7000f9a
>>> --- <IRQ stack> ---
>>> #12 [ffffa1d40038fe70] update_ts_time_stats at ffffffffa67a1e68
>>> #13 [ffffa1d40038fea0] do_softirq at ffffffffa66fb4c5
>>> #14 [ffffa1d40038feb0] flush_smp_call_function_queue at ffffffffa67a94e0
>>> #15 [ffffa1d40038fec0] do_idle at ffffffffa672eae5
>>> #16 [ffffa1d40038ff10] cpu_startup_entry at ffffffffa672edff
>>> #17 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c
>>> #18 [ffffa1d40038ff50] secondary_startup_64_no_verify at ffffffffa6600116
>>> crash>
>>> crash> bt -r -c 25 | tail -n 40
>>> ffffa1d40038fd80:  ffff9156ff26afc0 0000002027c3e7a1
>>> ffffa1d40038fd90:  __next_timer_interrupt+166 ffff9156ff25aa68
>>> ffffa1d40038fda0:  00000000fffd85da 0000000000000082
>>> ffffa1d40038fdb0:  0000000000000082 hrtimer_get_next_event+78
>>> ffffa1d40038fdc0:  ffff9156ff25aa40 000000200cd15980
>>> ffffa1d40038fdd0:  __hrtimer_next_event_base+192 ffff9156ff25cf40
>>> ffffa1d40038fde0:  7fffffffffffffff ffff9156ff25d580
>>> ffffa1d40038fdf0:  0000000000000082 ffffffffffffffff
>>> ffffa1d40038fe00:  0000000000000082 hrtimer_next_event_without+96
>>> ffffa1d40038fe10:  ffff9156ff25d580 000000200cff2040
>>> ffffa1d40038fe20:  000000200cdb406e ffff9153052f8c00
>>> ffffa1d40038fe30:  sched_clock+5    sched_clock_cpu+12
>>> ffffa1d40038fe40:  acpi_idle_driver+136 ffff9153052f8c00
>>> ffffa1d40038fe50:  cpuidle_enter_state+181 000000203501afdb
>>> ffffa1d40038fe60:  00ffffffa82c44c0 read_tsc
>>> ffffa1d40038fe70:  update_ts_time_stats+88 fffffffffffffff8
>>> ffffa1d40038fe80:  0000000000000000 __flush_smp_call_function_queue+219
>>> ffffa1d40038fe90:  0000000000000286 __cpu_online_mask
>>> ffffa1d40038fea0:  do_softirq+69    0000000000000286
>>> ffffa1d40038feb0:  flush_smp_call_function_queue+96 ffff914f820a8000
>>> ffffa1d40038fec0:  do_idle+405      do_idle+450
>>> ffffa1d40038fed0:  0000000000000010 8b7d6c8869010c00
>>> ffffa1d40038fee0:  0000000000000085 0000000000000000
>>> ffffa1d40038fef0:  0000000000000000 0000000000000000
>>> ffffa1d40038ff00:  0000000000000000 0000000000000000
>>> ffffa1d40038ff10:  cpu_startup_entry+111 8b7d6c8869010c00
>>> ffffa1d40038ff20:  534e004fc6e00600 ffffa1d40038ff38
>>> ffffa1d40038ff30:  start_secondary+396 534e004fc6e00600
>>> ffffa1d40038ff40:  0000000000000000 0000000040000000
>>> ffffa1d40038ff50:  secondary_startup_64_no_verify+209 0000000000000000
>>> ffffa1d40038ff60:  0000000000000000 0000000000000000
>>> ffffa1d40038ff70:  0000000000000000 0000000000000000
>>> ffffa1d40038ff80:  0000000000000000 0000000000000000
>>> ffffa1d40038ff90:  0000000000000000 0000000000000000
>>> ffffa1d40038ffa0:  0000000000000000 0000000000000000
>>> ffffa1d40038ffb0:  0000000000000000 0000000000000000
>>> ffffa1d40038ffc0:  0000000000000000 0000000000000000
>>> ffffa1d40038ffd0:  0000000000000000 0000000000000000
>>> ffffa1d40038ffe0:  0000000000000000 0000000000000000
>>> ffffa1d40038fff0:  0000000000000000 0000000000000000
>>> crash>
>>> Thanks.
>>> Lianbo
>>>
>>>     Thanks,
>>>     Kazu
>>>
>>>     >
>>>     > Also no exception frame when coming from do_softirq_own_stack.
>>>     >
>>>     > Reported-by: Marco Patalano <mpatalan@redhat.com <mailto:mpatalan@redhat.com>>
>>>     > Signed-off-by: Lianbo Jiang <lijiang@redhat.com <mailto:lijiang@redhat.com>>
>>>     > ---
>>>     >   x86_64.c | 5 +++--
>>>     >   1 file changed, 3 insertions(+), 2 deletions(-)
>>>     >
>>>     > diff --git a/x86_64.c b/x86_64.c
>>>     > index 5b671bd97775..3428bed417df 100644
>>>     > --- a/x86_64.c
>>>     > +++ b/x86_64.c
>>>     > @@ -3825,10 +3825,11 @@ in_exception_stack:
>>>     >               up -= 1;
>>>     >                   bt->instptr = *up;
>>>     >               /*
>>>     > -              *  No exception frame when coming from call_softirq.
>>>     > +              *  No exception frame when coming from call_softirq
>>>     > +              *  or do_softirq_own_stack.
>>>     >                */
>>>     >               if ((sp = value_search(bt->instptr, &offset)) &&
>>>     > -                 STREQ(sp->name, "call_softirq"))
>>>     > +                 (STREQ(sp->name, "call_softirq") || STREQ(sp->name, "do_softirq_own_stack")))
>>>     >                       irq_eframe = 0;
>>>     >                   bt->frameptr = 0;
>>>     >                   done = FALSE;
>>>