On 2024/03/26 17:25, lijiang wrote:
On Tue, Mar 26, 2024 at 2:59 PM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab(a)nec.com> wrote:
On 2024/03/26 15:44, lijiang wrote:
> Thanks for the comment, Kazu.
>
> On Tue, Mar 26, 2024 at 10:28 AM HAGIO KAZUHITO(萩尾 一仁)
> <k-hagio-ab(a)nec.com> wrote:
>
> Hi Lianbo,
>
> thanks for the patch.
>
> What is the kernel version of this vmcore?
>
>
> The kernel version is 5.14.0, but I have not reproduced the issue myself;
> it does not seem easy to reproduce.
I see, thanks.
If it's a RHEL kernel, please let me know the release number,
e.g. 5.14.0-362.8.1.el9_3.x86_64?
Not 8.1; it is 5.14.0-362.2.1.el9_3.x86_64.
>
> and could I have "bt 0 -c 8 | tail -n 30" output?
>
> crash> bt 0 -c 8 | tail -n 30
Oh, my bad, I left out the "-r" option...
How about "bt 0 -c 8 -r | tail -n 30"?
crash> bt 0 -c 8 -r | tail -n 30
ffffbec3c022fe20: 0000000000000000 0000000000000000
ffffbec3c022fe30: ffff9948c08f6278 pick_next_task+82
ffffbec3c022fe40: ffffbec3c022fea0 0000000000000000
ffffbec3c022fe50: 0000000000000000 __switch_to_asm+58
ffffbec3c022fe60: finish_task_switch+140 0000000000000000
ffffbec3c022fe70: ffff9948c08f5640 ffff9948e6f03980
ffffbec3c022fe80: 0000000000000000 tick_nohz_next_event+90
ffffbec3c022fe90: ffff994c2f2a2ae0 0000000000000000
ffffbec3c022fea0: 0000000000000000 0000000000000008
ffffbec3c022feb0: ct_kernel_enter.constprop.0+64 0000000000000046
ffffbec3c022fec0: read_tsc ktime_get+56
ffffbec3c022fed0: 0000000000000000 __flush_smp_call_function_queue+206
ffffbec3c022fee0: 0000000000000286 ffff9948c08f5640
ffffbec3c022fef0: 0000000000000046 0000000000000286
ffffbec3c022ff00: flush_smp_call_function_queue+72 0000000000000008
ffffbec3c022ff10: do_idle+168 0000000040000000
ffffbec3c022ff20: 0000000000000094 cpu_startup_entry+25
ffffbec3c022ff30: 0000000000000000 start_secondary+269
ffffbec3c022ff40: 000000089726a2d0 e48885e126bc1600
ffffbec3c022ff50: secondary_startup_64_no_verify+229 0000000000000000
ffffbec3c022ff60: 0000000000000000 0000000000000000
ffffbec3c022ff70: 0000000000000000 0000000000000000
ffffbec3c022ff80: 0000000000000000 0000000000000000
ffffbec3c022ff90: 0000000000000000 0000000000000000
ffffbec3c022ffa0: 0000000000000000 0000000000000000
ffffbec3c022ffb0: 0000000000000000 0000000000000000
ffffbec3c022ffc0: 0000000000000000 0000000000000000
ffffbec3c022ffd0: 0000000000000000 0000000000000000
ffffbec3c022ffe0: 0000000000000000 0000000000000000
ffffbec3c022fff0: 0000000000000000 0000000000000000
crash>
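Reading the raw dump above against the x86_64 struct pt_regs field order (r15 first, ss last, 21 words) suggests where the bogus frame in the original report, quoted further down, came from: the words starting at ffffbec3c022ff18 are ordinary idle-path stack contents, but read as registers they give exactly the values shown there. The following is only a sketch to make that mapping visible. The struct name regs_sketch and the helper overlay_regs() are made up for illustration and are not crash code; symbol+offset entries from the dump are replaced by the addresses that "bt" printed for the same frames.

```c
#include <stdint.h>
#include <string.h>

/* Field order of x86_64 struct pt_regs, lowest address first (21 words). */
struct regs_sketch {
	uint64_t r15, r14, r13, r12, bp, bx;
	uint64_t r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di;
	uint64_t orig_ax, ip, cs, flags, sp, ss;
};

/*
 * Stack words from ffffbec3c022ff18 through ffffbec3c022ffb8, copied from
 * the "bt 0 -c 8 -r" dump in this thread.
 */
static const uint64_t stack_words[21] = {
	0x0000000040000000ULL,	/* ff18 */
	0x0000000000000094ULL,	/* ff20 */
	0xffffffff973684a9ULL,	/* ff28: cpu_startup_entry+25 */
	0x0000000000000000ULL,	/* ff30 */
	0xffffffff9726a3ddULL,	/* ff38: start_secondary+269 */
	0x000000089726a2d0ULL,	/* ff40 */
	0xe48885e126bc1600ULL,	/* ff48 */
	0xffffffff9720015aULL,	/* ff50: secondary_startup_64_no_verify+229 */
	/* ff58 .. ffb8 are all zero in the dump, so the rest stays 0 */
};

/* Hypothetical helper: interpret 21 consecutive stack words as registers. */
static struct regs_sketch overlay_regs(const uint64_t *words)
{
	struct regs_sketch r;

	memcpy(&r, words, sizeof(r));
	return r;
}
```

Calling overlay_regs(stack_words) yields r13 = ffffffff973684a9, rbp = ffffffff9726a3dd and r10 = ffffffff9720015a, with ip, cs, sp and ss all zero, which lines up with the register block printed under "#21 ... do_idle" in the report. That is consistent with the stack holding plain return addresses rather than a saved exception frame.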
Thanks,
Kazu
> #4 [fffffe1788788ef0] end_repeat_nmi at ffffffff980015f9
> [exception RIP: __update_load_avg_se+13]
> RIP: ffffffff9736b16d RSP: ffffbec3c08acc78 RFLAGS: 00000046
> RAX: 0000000000000000 RBX: ffff994c2f2b1a40 RCX: ffffbec3c08acdc0
> RDX: ffff9948e4fe1d80 RSI: ffff994c2f2b1a40 RDI: 0000001d7ad7d55d
> RBP: ffffbec3c08acc88 R8: 0000001d921fca6f R9: ffff994c2f2b1328
> R10: 00000000fffd0010 R11: ffffffff98e060c0 R12: 0000001d7ad7d55d
> R13: 0000000000000005 R14: ffff994c2f2b19c0 R15: 0000000000000001
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> --- <NMI exception stack> ---
> #5 [ffffbec3c08acc78] __update_load_avg_se at ffffffff9736b16d
> #6 [ffffbec3c08acce0] enqueue_entity at ffffffff9735c9ab
> #7 [ffffbec3c08acd28] enqueue_task_fair at ffffffff9735cef8
> #8 [ffffbec3c08acd60] enqueue_task at ffffffff973481fa
> #9 [ffffbec3c08acd88] ttwu_do_activate at ffffffff9734aeed
> #10 [ffffbec3c08acdb0] try_to_wake_up at ffffffff9734c7d7
> #11 [ffffbec3c08ace08] __queue_work at ffffffff9732a4d2
> #12 [ffffbec3c08ace50] queue_work_on at ffffffff9732a6a4
> #13 [ffffbec3c08ace60] iomap_dio_bio_end_io at ffffffff976a7b4c
> #14 [ffffbec3c08ace90] clone_endio at ffffffffc090315f [dm_mod]
> #15 [ffffbec3c08aced0] blk_update_request at ffffffff9779b49d
> #16 [ffffbec3c08acf28] scsi_end_request at ffffffff97a3d5a7
> #17 [ffffbec3c08acf58] scsi_io_completion at ffffffff97a3e606
> #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
> #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
> #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
> --- <IRQ stack> ---
> #21 [ffffbec3c022ff28] cpu_startup_entry at ffffffff973684a9
> #22 [ffffbec3c022ff38] start_secondary at ffffffff9726a3dd
> #23 [ffffbec3c022ff50] secondary_startup_64_no_verify at ffffffff9720015a
> crash>
>
> If it's RHEL9, probably that do_softirq is called with this path.
>
> cpu_startup_entry
> do_idle
> flush_smp_call_function_queue
> do_softirq
>
> but do_idle is skipped as below, I'd like to check just in case..
>
> Good question. I noticed the call trace, but this may be another issue.
Thank you for the bt -r output. Yes, it looks like those frames are
skipped, probably because of x86_64_irq_eframe_link, but I don't have a
good idea for that right now. Let's fix the bogus exception frame first.
I've moved the "do_softirq" check so that it is tested first, and applied the patch.
https://github.com/crash-utility/crash/commit/ce47cb8dabb56c88e2d753026a9...
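For readers following the thread, the name test discussed here can be sketched as a small standalone C function. This is an illustration only, not the code in the commit: the helper name softirq_entry_without_eframe() is made up, plain strcmp() stands in for crash's STREQ() macro, and the ordering with "do_softirq" first simply follows the note above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a function in crash: returns true when the
 * symbol found at the top of the IRQ stack is one of the softirq entry
 * points that do not leave an exception frame behind.
 */
static bool softirq_entry_without_eframe(const char *name)
{
	/* "do_softirq" is listed first, following the note in the thread. */
	static const char *const entries[] = {
		"do_softirq",
		"do_softirq_own_stack",
		"call_softirq",
	};
	size_t i;

	if (name == NULL)
		return false;

	for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
		/* Whole-string match, so "__do_softirq" does not qualify. */
		if (strcmp(name, entries[i]) == 0)
			return true;
	}
	return false;
}
```

In the patch quoted below, the equivalent test runs on the symbol returned by value_search(bt->instptr, &offset), and a match clears irq_eframe so that no exception frame is read from the process stack.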
Thanks,
Kazu
> > Thanks
> > Lianbo
> >
> > > #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
> > > --- <IRQ stack> ---
> > > #21 [ffffbec3c022ff28] cpu_startup_entry at ffffffff973684a9
> >
> > Thanks,
> > Kazu
> >
> >
> > On 2024/03/19 16:59, Lianbo Jiang wrote:
> > > > The "bogus exception frame" warning was observed again on a specific
> > > > vmcore, and the remaining frames were truncated on an x86_64 machine
> > > > when executing the "bt" command as below:
> > >
> > > > crash> bt 0 -c 8
> > > > PID: 0 TASK: ffff9948c08f5640 CPU: 8 COMMAND: "swapper/8"
> > > > #0 [fffffe1788788e58] crash_nmi_callback at ffffffff972672bb
> > > > #1 [fffffe1788788e68] nmi_handle at ffffffff9722eb8e
> > > > #2 [fffffe1788788eb0] default_do_nmi at ffffffff97e51cd0
> > > > #3 [fffffe1788788ed0] exc_nmi at ffffffff97e51ee1
> > > > #4 [fffffe1788788ef0] end_repeat_nmi at ffffffff980015f9
> > > > [exception RIP: __update_load_avg_se+13]
> > > > RIP: ffffffff9736b16d RSP: ffffbec3c08acc78 RFLAGS: 00000046
> > > > RAX: 0000000000000000 RBX: ffff994c2f2b1a40 RCX: ffffbec3c08acdc0
> > > > RDX: ffff9948e4fe1d80 RSI: ffff994c2f2b1a40 RDI: 0000001d7ad7d55d
> > > > RBP: ffffbec3c08acc88 R8: 0000001d921fca6f R9: ffff994c2f2b1328
> > > > R10: 00000000fffd0010 R11: ffffffff98e060c0 R12: 0000001d7ad7d55d
> > > > R13: 0000000000000005 R14: ffff994c2f2b19c0 R15: 0000000000000001
> > > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > > --- <NMI exception stack> ---
> > > > #5 [ffffbec3c08acc78] __update_load_avg_se at ffffffff9736b16d
> > > > #6 [ffffbec3c08acce0] enqueue_entity at ffffffff9735c9ab
> > > > #7 [ffffbec3c08acd28] enqueue_task_fair at ffffffff9735cef8
> > > > #8 [ffffbec3c08acd60] enqueue_task at ffffffff973481fa
> > > > #9 [ffffbec3c08acd88] ttwu_do_activate at ffffffff9734aeed
> > > > #10 [ffffbec3c08acdb0] try_to_wake_up at ffffffff9734c7d7
> > > > #11 [ffffbec3c08ace08] __queue_work at ffffffff9732a4d2
> > > > #12 [ffffbec3c08ace50] queue_work_on at ffffffff9732a6a4
> > > > #13 [ffffbec3c08ace60] iomap_dio_bio_end_io at ffffffff976a7b4c
> > > > #14 [ffffbec3c08ace90] clone_endio at ffffffffc090315f [dm_mod]
> > > > #15 [ffffbec3c08aced0] blk_update_request at ffffffff9779b49d
> > > > #16 [ffffbec3c08acf28] scsi_end_request at ffffffff97a3d5a7
> > > > #17 [ffffbec3c08acf58] scsi_io_completion at ffffffff97a3e606
> > > > #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
> > > > #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
> > > > #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
> > > > --- <IRQ stack> ---
> > > > #21 [ffffbec3c022ff18] do_idle at ffffffff97368288
> > > > [exception RIP: unknown or invalid address]
> > > > RIP: 0000000000000000 RSP: 0000000000000000 RFLAGS: 00000000
> > > > RAX: 0000000000000000 RBX: 000000089726a2d0 RCX: 0000000000000000
> > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > > RBP: ffffffff9726a3dd R8: 0000000000000000 R9: 0000000000000000
> > > > R10: ffffffff9720015a R11: e48885e126bc1600 R12: 0000000000000000
> > > > R13: ffffffff973684a9 R14: 0000000000000094 R15: 0000000040000000
> > > > ORIG_RAX: 0000000000000000 CS: 0000 SS: 0000
> > > > bt: WARNING: possibly bogus exception frame
> > > > crash>
> > >
> > > > Actually there is no exception frame when called from do_softirq().
> > > With the patch:
> > >
> > > > crash> bt 0 -c 8
> > > > PID: 0 TASK: ffff9948c08f5640 CPU: 8 COMMAND: "swapper/8"
> > > > #0 [fffffe1788788e58] crash_nmi_callback at ffffffff972672bb
> > > > #1 [fffffe1788788e68] nmi_handle at ffffffff9722eb8e
> > > > #2 [fffffe1788788eb0] default_do_nmi at ffffffff97e51cd0
> > > > #3 [fffffe1788788ed0] exc_nmi at ffffffff97e51ee1
> > > > #4 [fffffe1788788ef0] end_repeat_nmi at ffffffff980015f9
> > > > [exception RIP: __update_load_avg_se+13]
> > > > RIP: ffffffff9736b16d RSP: ffffbec3c08acc78 RFLAGS: 00000046
> > > > RAX: 0000000000000000 RBX: ffff994c2f2b1a40 RCX: ffffbec3c08acdc0
> > > > RDX: ffff9948e4fe1d80 RSI: ffff994c2f2b1a40 RDI: 0000001d7ad7d55d
> > > > RBP: ffffbec3c08acc88 R8: 0000001d921fca6f R9: ffff994c2f2b1328
> > > > R10: 00000000fffd0010 R11: ffffffff98e060c0 R12: 0000001d7ad7d55d
> > > > R13: 0000000000000005 R14: ffff994c2f2b19c0 R15: 0000000000000001
> > > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > > --- <NMI exception stack> ---
> > > > #5 [ffffbec3c08acc78] __update_load_avg_se at ffffffff9736b16d
> > > > #6 [ffffbec3c08acce0] enqueue_entity at ffffffff9735c9ab
> > > > #7 [ffffbec3c08acd28] enqueue_task_fair at ffffffff9735cef8
> > > > #8 [ffffbec3c08acd60] enqueue_task at ffffffff973481fa
> > > > #9 [ffffbec3c08acd88] ttwu_do_activate at ffffffff9734aeed
> > > > #10 [ffffbec3c08acdb0] try_to_wake_up at ffffffff9734c7d7
> > > > #11 [ffffbec3c08ace08] __queue_work at ffffffff9732a4d2
> > > > #12 [ffffbec3c08ace50] queue_work_on at ffffffff9732a6a4
> > > > #13 [ffffbec3c08ace60] iomap_dio_bio_end_io at ffffffff976a7b4c
> > > > #14 [ffffbec3c08ace90] clone_endio at ffffffffc090315f [dm_mod]
> > > > #15 [ffffbec3c08aced0] blk_update_request at ffffffff9779b49d
> > > > #16 [ffffbec3c08acf28] scsi_end_request at ffffffff97a3d5a7
> > > > #17 [ffffbec3c08acf58] scsi_io_completion at ffffffff97a3e606
> > > > #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
> > > > #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
> > > > #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
> > > > --- <IRQ stack> ---
> > > > #21 [ffffbec3c022ff28] cpu_startup_entry at ffffffff973684a9
> > > > #22 [ffffbec3c022ff38] start_secondary at ffffffff9726a3dd
> > > > #23 [ffffbec3c022ff50] secondary_startup_64_no_verify at ffffffff9720015a
> > > > crash>
> > >
> > > > Reported-by: Jie Li <jieli(a)redhat.com>
> > > > Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
> > > ---
> > > x86_64.c | 7 ++++---
> > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/x86_64.c b/x86_64.c
> > > index 502817d3b2bd..c672a0c3e8fc 100644
> > > --- a/x86_64.c
> > > +++ b/x86_64.c
> > > @@ -3841,11 +3841,12 @@ in_exception_stack:
> > > up -= 1;
> > > bt->instptr = *up;
> > > /*
> > > - * No exception frame when coming from
> > do_softirq_own_stack
> > > - * or call_softirq.
> > > + * No exception frame when coming from
> > do_softirq_own_stack,
> > > + * call_softirq or do_softirq.
> > > */
> > > if ((sp = value_search(bt->instptr,
&offset)) &&
> > > - (STREQ(sp->name,
"do_softirq_own_stack") ||
> > STREQ(sp->name, "call_softirq")))
> > > + (STREQ(sp->name,
"do_softirq_own_stack") ||
> > STREQ(sp->name, "call_softirq")
> > > + || STREQ(sp->name,
"do_softirq")))
> > > irq_eframe = 0;
> > > bt->frameptr = 0;
> > > done = FALSE;
> >
>