On Fri, Oct 31, 2025 at 5:31 AM Tao Liu <ltao(a)redhat.com> wrote:
> Hi Lianbo,
> On Tue, Oct 28, 2025 at 9:57 PM Lianbo Jiang <lijiang(a)redhat.com> wrote:
>
> Recently we have observed some failures as below:
>
> crash> set 2276866
> set: invalid kernel virtual address: 0 type: "stack contents"
> set: read of stack at 0 failed
>
> crash> ps 2276866
> PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 2276866 2276750 47 ff3a19fbd3c80000 ZO 0.0 0 0 sh
>
> This is a regression introduced by adding gdb stack unwind support.
> Before reading from the stack, we first need to check whether the
> stack exists; otherwise the read may fail in some corner cases, e.g.
> zombie processes (ZO) for which the stack does not exist.
> Furthermore, this may also break thread switching in gdb.
>
> With the patch:
> crash> set 2276866
> PID: 2276866
> COMMAND: "sh"
> TASK: ff3a19fbd3c80000 [THREAD_INFO: ff3a19fbd3c80000]
> CPU: 47
> STATE: EXIT_DEAD|EXIT_ZOMBIE
>
> Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
> Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
> ---
> arm64.c | 2 ++
> ppc64.c | 2 ++
> x86_64.c | 2 ++
> 3 files changed, 6 insertions(+)
>
> diff --git a/arm64.c b/arm64.c
> index 354d17ab6a19..17235950bb60 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -234,6 +234,8 @@ arm64_get_current_task_reg(int regno, const char *name,
>
> BZERO(&bt_setup, sizeof(struct bt_info));
> clone_bt_info(&bt_setup, &bt_info, tc);
> + if (bt_info.stackbase == 0)
> + return FALSE;
> fill_stackbuf(&bt_info);
>
> get_dumpfile_regs(&bt_info, &sp, &ip);
> diff --git a/ppc64.c b/ppc64.c
> index d1a506773c93..9c5c0a460c7a 100644
> --- a/ppc64.c
> +++ b/ppc64.c
> @@ -2606,6 +2606,8 @@ ppc64_get_current_task_reg(int regno, const char *name, int size,
>
> BZERO(&bt_setup, sizeof(struct bt_info));
> clone_bt_info(&bt_setup, &bt_info, tc);
> + if (bt_info.stackbase == 0)
> + return FALSE;
> fill_stackbuf(&bt_info);
>
> // reusing the get_dumpfile_regs function to get pt regs structure
> diff --git a/x86_64.c b/x86_64.c
> index d7da536d20d8..b2cddbf8ba3d 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -9383,6 +9383,8 @@ x86_64_get_current_task_reg(int regno, const char *name,
>
> BZERO(&bt_setup, sizeof(struct bt_info));
> clone_bt_info(&bt_setup, &bt_info, tc);
> + if (bt_info.stackbase == 0)
> + return FALSE;
> The fix makes sense to me. However, returning directly will leave the
> register cache unrefreshed. That is, with the "return FALSE", "set
> 2276866" will succeed in switching tasks, but the register cache is
> still the old one, so "gdb bt" still outputs the previous stack trace,
> which is not 2276866's stack. I suggest adding a warning telling users
Actually, I haven't seen the case you mentioned, and it works as expected:
Without the patch:
crash> set 2276866
set: invalid kernel virtual address: 0 type: "stack contents"
set: read of stack at 0 failed
crash> bt
PID: 2276866 TASK: ff3a19fbd3c80000 CPU: 47 COMMAND: "sh"
(no stack)
crash> gdb bt
#0 crash_setup_regs (oldregs=0x0, newregs=0xff43e468633c7d38) at ./arch/x86/include/asm/processor.h:58
#1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:952
#2 0xffffffff86cf976f in panic (fmt=fmt@entry=0xffffffff87f69f99 "sysrq triggered crash\n") at kernel/panic.c:230
#3 0xffffffff87210201 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:142
#4 0xffffffff87210b24 in __handle_sysrq (key=99, check_mask=<optimized out>) at drivers/tty/sysrq.c:559
#5 0xffffffff872109cb in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1106
#6 0xffffffff86ff5fc9 in proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:241
#7 0xffffffff86f6e845 in vfs_write (pos=0xff43e468633c7f08, count=2, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, file=0xff3a19e92ee37b00) at fs/read_write.c:549
#8 vfs_write (file=0xff3a19e92ee37b00, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=<optimized out>, pos=0xff43e468633c7f08) at fs/read_write.c:533
#9 0xffffffff86f6eacf in ksys_write (fd=<optimized out>, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=2) at fs/read_write.c:598
#10 0xffffffff86c03cab in do_syscall_64 (nr=1, regs=0xff43e468633c7f58) at arch/x86/entry/common.c:303
#11 0xffffffff8780012e in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:147
crash>
The above case breaks thread switching in gdb, just as I mentioned in
the patch log.
With the patch:
crash> set 2276866
PID: 2276866
COMMAND: "sh"
TASK: ff3a19fbd3c80000 [THREAD_INFO: ff3a19fbd3c80000]
CPU: 47
STATE: EXIT_DEAD|EXIT_ZOMBIE
crash> bt
PID: 2276866 TASK: ff3a19fbd3c80000 CPU: 47 COMMAND: "sh"
(no stack)
crash> gdb bt
crash>
That is the expected behavior, and I did not see the case you pointed out.
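For reference, the control flow of the guard added by the patch can be modelled in isolation. The following is a minimal, self-contained C sketch, not crash code. `struct bt_info_model`, `fill_stackbuf_model()` and `get_task_reg_model()` are hypothetical stand-ins for crash's `struct bt_info`, `fill_stackbuf()` and the per-architecture `*_get_current_task_reg()` functions, reduced to the one field the patch tests. The sketch shows that a task with `stackbase == 0`, such as the zombie above, returns FALSE before any stack read is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for crash's struct bt_info:
 * only the field tested by the patch, plus a flag for observation. */
struct bt_info_model {
    unsigned long stackbase;   /* 0 when the task has no stack (e.g. a zombie) */
    bool read_attempted;       /* set when the simulated stack read happens */
};

/* Simulated fill_stackbuf(): in crash this reads the task's kernel stack,
 * which is presumably where "read of stack at 0 failed" came from. */
static void fill_stackbuf_model(struct bt_info_model *bt)
{
    bt->read_attempted = true;
}

/* Mirrors the shape of the patched *_get_current_task_reg() path:
 * bail out before touching the stack if it does not exist. */
static bool get_task_reg_model(struct bt_info_model *bt)
{
    assert(bt != NULL);
    if (bt->stackbase == 0)
        return false;          /* the guard added by the patch */
    fill_stackbuf_model(bt);
    /* ... registers would be extracted from the stack buffer here ... */
    return true;
}
```

With this shape, the zombie task never reaches the stack read. That matches the "set 2276866" output shown with the patch applied.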
> that gdb-related commands such as 'bt', 'frame', 'up', 'down' and
> 'info locals' do not work, like:
Have you reproduced the case where the register cache is not refreshed?
Thanks
Lianbo
> Warning: unable to refresh registers; the output of the following
> gdb-related commands is not reliable: 'bt', 'frame', 'up', 'down',
> 'info locals'.
> What do you think?
> Thanks,
> Tao Liu
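Tao's concern can be modelled the same way. The sketch below is hypothetical and not taken from crash or gdb. `struct regcache_model`, `struct task_model` and `refresh_regcache_model()` are invented for illustration, and the warning text paraphrases the one proposed above. The sketch shows the scenario Tao describes: an early return on a stackless task leaves the previous task's registers in the cache, and a warning is printed alongside it. Lianbo reports above that he could not reproduce that staleness, so treat this as a model of the concern rather than of observed behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the register cache: it remembers which task's
 * registers it currently holds. */
struct regcache_model {
    int pid;                  /* task whose registers are cached */
    unsigned long ip;         /* one representative register */
};

/* Hypothetical task descriptor: stackbase == 0 means "no stack". */
struct task_model {
    int pid;
    unsigned long stackbase;
    unsigned long saved_ip;   /* would be recovered from the stack */
};

/* Refresh the cache for 'task'. On a stackless task the early return
 * keeps the previous task's registers, so warn instead of failing silently. */
static bool refresh_regcache_model(struct regcache_model *rc,
                                   const struct task_model *task)
{
    assert(rc != NULL && task != NULL);
    if (task->stackbase == 0) {
        fprintf(stderr,
                "WARNING: unable to refresh registers for PID %d; the output "
                "of gdb 'bt', 'frame', 'up', 'down' and 'info locals' is not "
                "reliable\n", task->pid);
        return false;         /* rc still holds the old task's registers */
    }
    rc->pid = task->pid;
    rc->ip = task->saved_ip;
    return true;
}
```

In crash itself, such a message would presumably be emitted through its own error-reporting routine rather than `fprintf()`.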
> fill_stackbuf(&bt_info);
>
> // reusing the get_dumpfile_regs function to get pt regs structure
> --
> 2.50.1