[Crash-utility] Re: [PATCH] Fix for "set" command failure

Sunday, 2 November 2025

On Fri, Oct 31, 2025 at 5:31 AM Tao Liu <ltao(a)redhat.com> wrote:

 Hi lianbo,

 On Tue, Oct 28, 2025 at 9:57 PM Lianbo Jiang <lijiang(a)redhat.com> wrote:
 >
 > Recently we have observed failures like the following:
 >
 >   crash> set 2276866
 >   set: invalid kernel virtual address: 0  type: "stack contents"
 >   set: read of stack at 0 failed
 >
 >   crash> ps 2276866
 >         PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
 >     2276866 2276750  47  ff3a19fbd3c80000  ZO   0.0        0        0  sh
 >
 > This is a regression introduced by the addition of gdb stack unwind
 > support. Before reading from the stack, we first need to check whether
 > the stack exists; otherwise the read may fail in some corner cases, e.g.
 > zombie processes (ZO) whose stack no longer exists. Furthermore, this
 > may also break thread switching in gdb.
 >
 > With the patch:
 >   crash> set 2276866
 >       PID: 2276866
 >   COMMAND: "sh"
 >      TASK: ff3a19fbd3c80000  [THREAD_INFO: ff3a19fbd3c80000]
 >       CPU: 47
 >     STATE: EXIT_DEAD|EXIT_ZOMBIE
 >
 > Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
 > Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
 > ---
 >  arm64.c  | 2 ++
 >  ppc64.c  | 2 ++
 >  x86_64.c | 2 ++
 >  3 files changed, 6 insertions(+)
 >
 > diff --git a/arm64.c b/arm64.c
 > index 354d17ab6a19..17235950bb60 100644
 > --- a/arm64.c
 > +++ b/arm64.c
 > @@ -234,6 +234,8 @@ arm64_get_current_task_reg(int regno, const char *name,
 >
 >         BZERO(&bt_setup, sizeof(struct bt_info));
 >         clone_bt_info(&bt_setup, &bt_info, tc);
 > +       if (bt_info.stackbase == 0)
 > +               return FALSE;
 >         fill_stackbuf(&bt_info);
 >
 >         get_dumpfile_regs(&bt_info, &sp, &ip);
 > diff --git a/ppc64.c b/ppc64.c
 > index d1a506773c93..9c5c0a460c7a 100644
 > --- a/ppc64.c
 > +++ b/ppc64.c
 > @@ -2606,6 +2606,8 @@ ppc64_get_current_task_reg(int regno, const char *name, int size,
 >
 >         BZERO(&bt_setup, sizeof(struct bt_info));
 >         clone_bt_info(&bt_setup, &bt_info, tc);
 > +       if (bt_info.stackbase == 0)
 > +               return FALSE;
 >         fill_stackbuf(&bt_info);
 >
 >         // reusing the get_dumpfile_regs function to get pt regs structure
 > diff --git a/x86_64.c b/x86_64.c
 > index d7da536d20d8..b2cddbf8ba3d 100644
 > --- a/x86_64.c
 > +++ b/x86_64.c
 > @@ -9383,6 +9383,8 @@ x86_64_get_current_task_reg(int regno, const char *name,
 >
 >         BZERO(&bt_setup, sizeof(struct bt_info));
 >         clone_bt_info(&bt_setup, &bt_info, tc);
 > +       if (bt_info.stackbase == 0)
 > +               return FALSE;

 The fix makes sense to me. However, returning directly leaves the
 register cache unrefreshed. That is, with the "return FALSE", "set
 2276866" will succeed in switching tasks, but the register cache still
 holds the old values, so "gdb bt" still outputs the previous stack trace,
 which is not 2276866's stack. I suggest adding a warning telling users

Actually, I haven't seen the case you mentioned, and it works as expected:

Without the patch:
crash> set 2276866
set: invalid kernel virtual address: 0  type: "stack contents"
set: read of stack at 0 failed

crash> bt
PID: 2276866  TASK: ff3a19fbd3c80000  CPU: 47   COMMAND: "sh"
(no stack)

crash> gdb bt
#0  crash_setup_regs (oldregs=0x0, newregs=0xff43e468633c7d38) at ./arch/x86/include/asm/processor.h:58
#1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:952
#2  0xffffffff86cf976f in panic (fmt=fmt@entry=0xffffffff87f69f99 "sysrq triggered crash\n") at kernel/panic.c:230
#3  0xffffffff87210201 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:142
#4  0xffffffff87210b24 in __handle_sysrq (key=99, check_mask=<optimized out>) at drivers/tty/sysrq.c:559
#5  0xffffffff872109cb in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1106
#6  0xffffffff86ff5fc9 in proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:241
#7  0xffffffff86f6e845 in vfs_write (pos=0xff43e468633c7f08, count=2, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, file=0xff3a19e92ee37b00) at fs/read_write.c:549
#8  vfs_write (file=0xff3a19e92ee37b00, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=<optimized out>, pos=0xff43e468633c7f08) at fs/read_write.c:533
#9  0xffffffff86f6eacf in ksys_write (fd=<optimized out>, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=2) at fs/read_write.c:598
#10 0xffffffff86c03cab in do_syscall_64 (nr=1, regs=0xff43e468633c7f58) at arch/x86/entry/common.c:303
#11 0xffffffff8780012e in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:147
crash>

The above case breaks thread switching in gdb, as described in the patch
log.

With the patch:
crash> set 2276866
    PID: 2276866
COMMAND: "sh"
   TASK: ff3a19fbd3c80000  [THREAD_INFO: ff3a19fbd3c80000]
    CPU: 47
  STATE: EXIT_DEAD|EXIT_ZOMBIE

crash> bt
PID: 2276866  TASK: ff3a19fbd3c80000  CPU: 47   COMMAND: "sh"
(no stack)

crash> gdb bt
crash>

That is the expected behavior, and I did not see the case you pointed out.

 that gdb related commands such as 'bt', 'frame', 'up', 'down', 'info locals'
 are not workable, like:

Have you reproduced the case where the register cache is not refreshed?

Thanks
Lianbo

 Warning: registers unable to refresh, the outputs of the following gdb
 related commands are not reliable: 'bt', 'frame', 'up', 'down',
 'info locals'.

 What do you think?

 Thanks,
 Tao Liu

 >         fill_stackbuf(&bt_info);
 >
 >         // reusing the get_dumpfile_regs function to get pt regs structure
 > --
 > 2.50.1
