[Crash-utility] Re: [PATCH v4 07/16] Stop stack unwinding at non-kernel address

Thursday, 25 July 2024

Hi Lijiang,

On Thu, Jul 25, 2024 at 9:08 PM lijiang <lijiang(a)redhat.com&gt; wrote:
...

 On Thu, Jul 25, 2024 at 3:31 PM Tao Liu <ltao(a)redhat.com&gt; wrote:
>
> Hi Lijiang,
>
> On Thu, Jul 25, 2024 at 6:27 PM lijiang <lijiang(a)redhat.com&gt; wrote:
> >
> > On Thu, Jul 25, 2024 at 7:59 AM Tao Liu <ltao(a)redhat.com&gt; wrote:
> >>
> >> Hi Lijiang,
> >>
> >> On Wed, Jul 24, 2024 at 6:37 PM lijiang <lijiang(a)redhat.com&gt; wrote:
> >> >
> >> > Hi, Tao
> >> >
> >> > On Tue, Jul 23, 2024 at 2:24 PM lijiang <lijiang(a)redhat.com&gt;
wrote:
> >> >>
> >> >> On Tue, Jul 23, 2024 at 1:55 PM Tao Liu <ltao(a)redhat.com&gt;
wrote:
> >> >>>
> >> >>> Hi Aditya & Lianbo,
> >> >>>
> >> >>> On Mon, Jun 24, 2024 at 11:46 PM Aditya Gupta
<adityag(a)linux.ibm.com&gt; wrote:
> >> >>> >
> >> >>> > Hello Lianbo,
> >> >>> >
> >> >>> > On 24/06/24 05:32PM, lijiang wrote:
> >> >>> > > > <...snip...>
> >> >>> > > >
> >> >>> > > > Before:
> >> >>> > > > crash> gdb bt
> >> >>> > > >  #0  0xffffffff816a8f65 in context_switch ...
> >> >>> > > >  #1  __schedule () ...
> >> >>> > > >  #2  0xffffffff816a94e9 in schedule ...
> >> >>> > > >  #3  0xffffffff816a86fd in
schedule_hrtimeout_range_clock ...
> >> >>> > > >  #4  0xffffffff816a8733 in
schedule_hrtimeout_range ...
> >> >>> > > >  #5  0xffffffff8124bb7e in ep_poll ...
> >> >>> > > >  #6  0xffffffff8124d00d in SYSC_epoll_wait ...
> >> >>> > > >  #7  SyS_epoll_wait ...
> >> >>> > > >  #8  <signal handler called>
> >> >>> > > >  #9  0x00007f0449407923 in ?? ()
> >> >>> > > >  #10 0xffff880100000001 in ?? ()
> >> >>> > > >  #11 0xffff880169b3c010 in ?? ()
> >> >>> > > >  #12 0x0000000000000040 in irq_stack_union ()
> >> >>> > > >  #13 0xffff880169b3c058 in ?? ()
> >> >>> > > >  #14 0xffff880169b3c048 in ?? ()
> >> >>> > > >  #15 0xffff880169b3c050 in ?? ()
> >> >>> > > >  #16 0x0000000000000000 in ?? ()
> >> >>> > > >
> >> >>> > > > After:
> >> >>> > > > crash> gdb bt
> >> >>> > > >  #0  0xffffffff816a8f65 in context_switch ...
> >> >>> > > >  #1  __schedule () ...
> >> >>> > > >  #2  0xffffffff816a94e9 in schedule () ...
> >> >>> > > >  #3  0xffffffff816a86fd in
schedule_hrtimeout_range_clock ...
> >> >>> > > >  #4  0xffffffff816a8733 in
schedule_hrtimeout_range ...
> >> >>> > > >  #5  0xffffffff8124bb7e in ep_poll ...
> >> >>> > > >  #6  0xffffffff8124d00d in SYSC_epoll_wait ...
> >> >>> > > >  #7  SyS_epoll_wait ...
> >> >>> > > >  #8  <signal handler called>
> >> >>> > > >  #9  0x00007f0449407923 in ?? ()
> >> >>> > > >
> >> >>> > > >
> >> >>> > > It seems that there are still some non-kernel
addresses that do not get
> >> >>> > > filtered. Can you help double check?
> >> >>>
> >> >>> Yes, it is a non-kernel address which does not get filtered.
> >> >>>
> >> >>> > >
> >> >>> > >
> >> >>> > > For example:
> >> >>> > >
> >> >>> > > crash> gdb bt
> >> >>> > > #0  crash_setup_regs (newregs=0xffffb5bb4f197938,
oldregs=0x0) at
> >> >>> > > ./arch/x86/include/asm/kexec.h:114
> >> >>> > > #1  0xffffffff8e61e32e in __crash_kexec
(regs=regs@entry=0x0) at
> >> >>> > > kernel/crash_core.c:122
> >> >>> > > #2  0xffffffff8e51a64d in panic
(fmt=fmt@entry=0xffffffff8fa51609 "sysrq
> >> >>> > > triggered crash\n") at kernel/panic.c:366
> >> >>> > > #3  0xffffffff8ec21f86 in sysrq_handle_crash
(key=<optimized out>) at
> >> >>> > > drivers/tty/sysrq.c:154
> >> >>> > > #4  0xffffffff8ec22550 in __handle_sysrq
(key=<optimized out>,
> >> >>> > > check_mask=check_mask@entry=false) at
drivers/tty/sysrq.c:612
> >> >>> > > #5  0xffffffff8ec22bf5 in write_sysrq_trigger
(file=<optimized out>,
> >> >>> > > buf=<optimized out>, count=2,
ppos=<optimized out>) at
> >> >>> > > drivers/tty/sysrq.c:1183
> >> >>> > > #6  0xffffffff8e935ae5 in pde_write
(ppos=<optimized out>, count=<optimized
> >> >>> > > out>, buf=<optimized out>,
file=0xffffb5bb4f197938, pde=0xffff98338b78e0c0)
> >> >>> > > at fs/proc/inode.c:334
> >> >>> > > #7  proc_reg_write (file=0xffffb5bb4f197938, buf=0x0,
count=1, ppos=0x0) at
> >> >>> > > fs/proc/inode.c:346
> >> >>> > > #8  0xffffffff8e88d382 in vfs_write
(file=file@entry=0xffff98338b789200,
> >> >>> > > buf=buf@entry=0x5614d58a22c0 <error: Cannot access
memory at address
> >> >>> > > 0x5614d58a22c0>, count=count@entry=2,
pos=pos@entry=0xffffb5bb4f197b78) at
> >> >>> > > fs/read_write.c:588
> >> >>> > > #9  0xffffffff8e88d9ff in ksys_write
(fd=<optimized out>,
> >> >>> > > buf=0x5614d58a22c0 <error: Cannot access memory at
address 0x5614d58a22c0>,
> >> >>> > > count=2) at fs/read_write.c:643
> >> >>> > > #10 0xffffffff8f124429 in do_syscall_x64 (nr=1,
regs=0xffffb5bb4f197f58) at
> >> >>> > > arch/x86/entry/common.c:52
> >> >>> > > #11 do_syscall_64 (regs=0xffffb5bb4f197f58, nr=1) at
> >> >>> > > arch/x86/entry/common.c:83
> >> >>> > > #12 0xffffffff8f20012b in entry_SYSCALL_64 () at
> >> >>> > > arch/x86/entry/entry_64.S:121
> >> >>> > > #13 0x00007f9a147f69e0 in ?? ()
> >> >>> > >
> >> >>> > > The frame #13 looks like a non-kernel address.
> >> >>>
> >> >>> The address usually to be the user space address before
entering
> >> >>> kernel, you can see it by:
> >> >>>
> >> >>> crash> gdb bt
> >> >>> ...snip...
> >> >>> #7  SyS_epoll_wait ...
> >> >>> #8  <signal handler called>
> >> >>> #9  0x00007f0449407923 in ?? ()
> >> >>>
> >> >>> crash> bt
> >> >>> ...snip...
> >> >>> #6 [ffff880169b3bf80] system_call_fastpath at ffffffff816b5009
> >> >>>     RIP: 00007f0449407923 ...
> >> >>>
> >> >>> So I think leaving the last frame here is useful and
shouldn't be
> >> >>> filtered. Though it looks like some garbage data, it can help
for some
> >> >>> experienced users...
> >> >>
> >> >>
> >> >>
> >> >> Hmm, normally it should be filtered, otherwise this looks weird.
> >> >>
> >> >> Let me dig into the details and see if that can be filtered out.
> >> >
> >> >
> >> Thanks for your patch!
> >>
> >> > Can you help try this? I saw the non-kernel addresses are filtered
out.
> >> >
> >> > + #ifdef CRASH_MERGE
> >> > + extern "C" int is_kvaddr(ulong);
> >> > + #endif
> >> >
> >> > static void
> >> > print_frame (const frame_print_options &fp_opts,
> >> >              frame_info *frame, int print_level,
> >> >              enum print_what print_what, int print_args,
> >> >              struct symtab_and_line sal)
> >> > {
> >> >   struct gdbarch *gdbarch = get_frame_arch (frame);
> >> >   struct ui_out *uiout = current_uiout;
> >> >   enum language funlang = language_unknown;
> >> >   struct value_print_options opts;
> >> >   struct symbol *func;
> >> >   CORE_ADDR pc = 0;
> >> >   int pc_p;
> >> >
> >> >   pc_p = get_frame_pc_if_available (frame, &pc);
> >> > + #ifdef CRASH_MERGE
> >> > +  if (!is_kvaddr(pc))
> >> > +        return;
> >> > + #endif
> >> > ...
> >> > }
> >>
> >> Your code change is "stop printing for non-kernel address",
however I
> >> would prefer to "stop stack unwinding for non-kernel address", so
I
> >
> >
> > Yes, they are different.
> >
> > If "stop stack unwinding for non-kernel address" won't truncate
the backtrace, I would also prefer it.
> > Otherwise I would prefer another one.
>
> OK, it won't truncate the backtrace.
>
> >
> >>
> >> think it is better to modify it within the for loop, this is where
> >> stack unwinding happens.
> >>
> >> for (fi = trailing; fi && count--; fi = get_prev_frame (fi))
> >>       ...
> >>       print_frame_info (fp_opts, fi, 1, LOCATION, 1, 0);
> >>
> >> I made the following code change:
> >>
> >>       for (fi = trailing; fi && count--; fi = get_prev_frame (fi))
> >>         {
> >>           QUIT;
> >>           ...
> >> +         CORE_ADDR pc = 0;
> >> +          get_frame_pc_if_available (fi, &pc);
> >> +         if (!is_kvaddr(pc)) {
> >> +               fi = NULL;
> >> +                break;
> >>  +        }
> >>           print_frame_info (fp_opts, fi, 1, LOCATION, 1, 0);
> >>
> >> With the change:
> >>
> >> crash> gdb bt
> >> #0  blk_mq_rq_timed_out (req=0xffff880fdb246000,
> >> reserved=reserved@entry=false) at block/blk-mq.c:640
> >> #1  0xffffffff8130504c in blk_mq_check_expired
> >> (hctx=hctx@entry=0xffff880fda56bc00, rq=<optimized out>,
> >> priv=priv@entry=0xffff880fcf68fde8, reserved=reserved@entry=false) at
> >> block/blk-mq.c:697
> >> #2  0xffffffff81305eb4 in bt_for_each
> >> (hctx=hctx@entry=0xffff880fda56bc00, bt=bt@entry=0xffff88014c14c310,
> >> off=32, fn=fn@entry=0xffffffff81304ff0 <blk_mq_check_expired>,
> >> data=data@entry=0xffff880fcf68fde8, reserved=reserved@entry=false) at
> >> block/blk-mq-tag.c:431
> >> #3  0xffffffff8130686e in blk_mq_queue_tag_busy_iter
> >> (q=q@entry=0xffff88004912a340, fn=fn@entry=0xffffffff81304ff0
> >> <blk_mq_check_expired>, priv=priv@entry=0xffff880fcf68fde8) at
> >> block/blk-mq-tag.c:530
> >> #4  0xffffffff81301d2b in blk_mq_timeout_work
> >> (work=0xffff88004912ab68) at block/blk-mq.c:730
> >> #5  0xffffffff810a881a in process_one_work
> >> (worker=worker@entry=0xffff880fd86efa00, work=0xffff88004912ab68) at
> >> kernel/workqueue.c:2252
> >> #6  0xffffffff810a94e6 in worker_thread (__worker=0xffff880fd86efa00)
> >> at kernel/workqueue.c:2380
> >> #7  0xffffffff810b098f in kthread (_create=0xffff8801695f7d38) at
> >> kernel/kthread.c:202
> >> #8  <signal handler called>
> >
> I don't know, I didn't dive into gdb for this inspection. Frankly I
> didn't have much knowledge on how gdb unwinded each stack frame. All I
> know is, gdb has different handlers for processing each frame
> unwinding, such as the one to process function inline. So I guess for
> this case, gdb thinks it is some signal processing routine.
>

 Thanks for the explanation, Tao.

 One more question:
 Will the backtrace('#8  <signal handler called>') be printed if my changes
are applied?
 Yes, still the same output:

#6  0xffffffff8124d00d in SYSC_epoll_wait (timeout=<optimized out>,
maxevents=29, events=<optimized out>, epfd=<optimized out>) at
fs/eventpoll.c:2043
#7  SyS_epoll_wait (epfd=<optimized out>, events=140721208415648,
maxevents=29, timeout=4294967295) at fs/eventpoll.c:2008
#8  <signal handler called>

Thanks,
Tao Liu

...
 Thanks
 Lianbo

>
> Personally I'm OK with this, if you see the original stack trace:
>
> #7  SyS_epoll_wait (epfd=<optimized out>, events=140721208415648,
> maxevents=29, timeout=4294967295) at fs/eventpoll.c:2008
> #8  <signal handler called>
> #9  0x00007f0449407923 in ?? ()
>
> It usually happens right before the syscall, which is switching from
> user space to kernel space. Maybe it looks like a signal or interrupt
> handling routine. Just my guess...
>
>
> >     ^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > What's this for?
> >
> > Thanks
> > Lianbo
> >
> >>
> >> crash>
> >>
> >> What do you think?
> >>
> >> Thanks,
> >> Tao Liu
> >>
> >> >
> >> > Thanks
> >> > Lianbo
> >> >
> >> >>
> >> >> Thanks
> >> >> Lianbo
> >> >>
> >> >>>
> >> >>> >
> >> >>> > True. Though it seems to be okay for it to print the last
frame with a
> >> >>> > non-kernel address, as in this snippet from gdb:
> >> >>> >
> >> >>> >     for (fi = trailing; fi && count--; fi =
get_prev_frame (fi))
> >> >>> >         ...
> >> >>> >           print_frame_info (fp_opts, fi, 1, LOCATION, 1,
0);
> >> >>> >
> >> >>> > Seems that frame #13, fi was not NULL.
> >> >>> >
> >> >>> > Seeing Tao's change, it compares the current
frame's NIP/PC to see if
> >> >>> > it should return NULL (which I think is nice and works).
Here the
> >> >>> > 'this_frame' would have been frame 12, (which
would have called
> >> >>> > `'get_prev_frame' to get the frame 13)
> >> >>> >
> >> >>> > ```
> >> >>> >   frame_pc_p = get_frame_pc_if_available (this_frame,
&frame_pc);
> >> >>> > #ifdef CRASH_MERGE
> >> >>> >   if (!is_kvaddr(frame_pc)) {
> >> >>> >         return NULL;
> >> >>> >   }
> >> >>> > #endif
> >> >>> > ```
> >> >>> >
> >> >>> > Tao's condition will hit when
'get_prev_frame(this_frame=frame#13)' will
> >> >>> > be called to get the frame #14, which will return NULL and
hence break
> >> >>> > out of the loop.
> >> >>> >
> >> >>> > This is based on what I recall and a quick look at the
implementation,
> >> >>> > please feel free to correct Lianbo/Tao.
> >> >>>
> >> >>> Thanks Aditya for the detailed inspection, which I didn't
dive into.
> >> >>> When I notice the last frame to be the userspace address, I
just keep
> >> >>> it as it is.
> >> >>>
> >> >>> Thanks,
> >> >>> Tao Liu
> >> >>>
> >> >>>
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Aditya Gupta
> >> >>> >
> >> >>> >
> >> >>> > >
> >> >>> > >
> >> >>> > > Thanks
> >> >>> > > Lianbo
> >> >>> > >
> >> >>> > >
> >> >>> > > > Cc: Sourabh Jain
<sourabhjain(a)linux.ibm.com&gt;
> >> >>> > > > Cc: Hari Bathini <hbathini(a)linux.ibm.com&gt;
> >> >>> > > > Cc: Mahesh J Salgaonkar
<mahesh(a)linux.ibm.com&gt;
> >> >>> > > > Cc: Naveen N. Rao
<naveen.n.rao(a)linux.vnet.ibm.com&gt;
> >> >>> > > > Cc: Lianbo Jiang <lijiang(a)redhat.com&gt;
> >> >>> > > > Cc: HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab(a)nec.com&gt;
> >> >>> > > > Cc: Tao Liu <ltao(a)redhat.com&gt;
> >> >>> > > > Cc: Alexey Makhalov
<alexey.makhalov(a)broadcom.com&gt;
> >> >>> > > > Signed-off-by: Tao Liu <ltao(a)redhat.com&gt;
> >> >>> > > > ---
> >> >>> > > >  defs.h          |  1 +
> >> >>> > > >  gdb-10.2.patch  | 26
++++++++++++++++++++++++++
> >> >>> > > >  gdb_interface.c |  6 ++++++
> >> >>> > > >  3 files changed, 33 insertions(+)
> >> >>> > > >
> >> >>> > > > diff --git a/defs.h b/defs.h
> >> >>> > > > index 012ffdc..c0e6a29 100644
> >> >>> > > > --- a/defs.h
> >> >>> > > > +++ b/defs.h
> >> >>> > > > @@ -7902,6 +7902,7 @@ extern unsigned char
*gdb_prettyprint_arrays;
> >> >>> > > >  extern unsigned int
*gdb_repeat_count_threshold;
> >> >>> > > >  extern unsigned char *gdb_stop_print_at_null;
> >> >>> > > >  extern unsigned int *gdb_output_radix;
> >> >>> > > > +int is_kvaddr(ulong);
> >> >>> > > >
> >> >>> > > >  /*
> >> >>> > > >   *  gdb/top.c
> >> >>> > > > diff --git a/gdb-10.2.patch b/gdb-10.2.patch
> >> >>> > > > index 0bed96a..3ed40c0 100644
> >> >>> > > > --- a/gdb-10.2.patch
> >> >>> > > > +++ b/gdb-10.2.patch
> >> >>> > > > @@ -16171,3 +16171,29 @@ exit 0
> >> >>> > > >   }
> >> >>> > > >
> >> >>> > > >   /*
> >> >>> > > > +--- gdb-10.2/gdb/frame.c.orig
> >> >>> > > > ++++ gdb-10.2/gdb/frame.c
> >> >>> > > > +@@ -2331,6 +2331,10 @@ inside_entry_func
(frame_info *this_frame)
> >> >>> > > > +    This function should not contain
target-dependent tests, such as
> >> >>> > > > +    checking whether the program-counter is
zero.  */
> >> >>> > > > +
> >> >>> > > > ++#ifdef CRASH_MERGE
> >> >>> > > > ++extern "C" int is_kvaddr(ulong);
> >> >>> > > > ++#endif
> >> >>> > > > ++
> >> >>> > > > + struct frame_info *
> >> >>> > > > + get_prev_frame (struct frame_info
*this_frame)
> >> >>> > > > + {
> >> >>> > > > +@@ -2353,7 +2357,11 @@ get_prev_frame (struct
frame_info *this_frame)
> >> >>> > > > +     get_frame_id (this_frame);
> >> >>> > > > +
> >> >>> > > > +   frame_pc_p = get_frame_pc_if_available
(this_frame, &frame_pc);
> >> >>> > > > +-
> >> >>> > > > ++#ifdef CRASH_MERGE
> >> >>> > > > ++  if (!is_kvaddr(frame_pc)) {
> >> >>> > > > ++        return NULL;
> >> >>> > > > ++  }
> >> >>> > > > ++#endif
> >> >>> > > > +   /* tausq/2004-12-07: Dummy frames are
skipped because it doesn't make
> >> >>> > > > much
> >> >>> > > > +      sense to stop unwinding at a dummy frame.
 One place where a dummy
> >> >>> > > > +      frame may have an address
"inside_main_func" is on HPUX.  On HPUX,
> >> >>> > > > the
> >> >>> > > > diff --git a/gdb_interface.c b/gdb_interface.c
> >> >>> > > > index b13d5fd..e76ecc6 100644
> >> >>> > > > --- a/gdb_interface.c
> >> >>> > > > +++ b/gdb_interface.c
> >> >>> > > > @@ -947,6 +947,12 @@
gdb_lookup_module_symbol(ulong addr, ulong *offset)
> >> >>> > > >         }
> >> >>> > > >  }
> >> >>> > > >
> >> >>> > > > +int
> >> >>> > > > +is_kvaddr(ulong addr)
> >> >>> > > > +{
> >> >>> > > > +       return IS_KVADDR(addr);
> >> >>> > > > +}
> >> >>> > > > +
> >> >>> > > >  /*
> >> >>> > > >   *  Used by gdb_interface() to catch
gdb-related errors, if desired.
> >> >>> > > >   */
> >> >>> > > > --
> >> >>> > > > 2.40.1
> >> >>> > > >
> >> >>> >
> >> >>>
> >>
> 

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

[Crash-utility] Re: [PATCH v4 07/16] Stop stack unwinding at non-kernel address