On Mon, Nov 3, 2025 at 11:57 AM Tao Liu <ltao(a)redhat.com> wrote:
On Mon, Nov 3, 2025 at 2:58 PM lijiang <lijiang(a)redhat.com> wrote:
>
> On Fri, Oct 31, 2025 at 5:31 AM Tao Liu <ltao(a)redhat.com> wrote:
>>
>> Hi lianbo,
>>
>> On Tue, Oct 28, 2025 at 9:57 PM Lianbo Jiang <lijiang(a)redhat.com> wrote:
>> >
>> > Recently we have observed some failures as below:
>> >
>> > crash> set 2276866
>> > set: invalid kernel virtual address: 0 type: "stack contents"
>> > set: read of stack at 0 failed
>> >
>> > crash> ps 2276866
>> > PID PPID CPU TASK ST %MEM VSZ RSS COMM
>> > 2276866 2276750 47 ff3a19fbd3c80000 ZO 0.0 0 0 sh
>> >
>> > This is a regression introduced by adding gdb stack unwind
>> > support. Before attempting to read from the stack, crash needs to
>> > check whether the stack exists; otherwise the read may fail in some
>> > corner cases, e.g. for zombie processes (ZO) whose stack no longer
>> > exists. Furthermore, this may also break thread switching in gdb.
>> >
>> > With the patch:
>> > crash> set 2276866
>> > PID: 2276866
>> > COMMAND: "sh"
>> > TASK: ff3a19fbd3c80000 [THREAD_INFO: ff3a19fbd3c80000]
>> > CPU: 47
>> > STATE: EXIT_DEAD|EXIT_ZOMBIE
>> >
>> > Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
>> > Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
>> > ---
>> > arm64.c | 2 ++
>> > ppc64.c | 2 ++
>> > x86_64.c | 2 ++
>> > 3 files changed, 6 insertions(+)
>> >
>> > diff --git a/arm64.c b/arm64.c
>> > index 354d17ab6a19..17235950bb60 100644
>> > --- a/arm64.c
>> > +++ b/arm64.c
>> > @@ -234,6 +234,8 @@ arm64_get_current_task_reg(int regno, const char *name,
>> >
>> > BZERO(&bt_setup, sizeof(struct bt_info));
>> > clone_bt_info(&bt_setup, &bt_info, tc);
>> > + if (bt_info.stackbase == 0)
>> > + return FALSE;
>> > fill_stackbuf(&bt_info);
>> >
>> > get_dumpfile_regs(&bt_info, &sp, &ip);
>> > diff --git a/ppc64.c b/ppc64.c
>> > index d1a506773c93..9c5c0a460c7a 100644
>> > --- a/ppc64.c
>> > +++ b/ppc64.c
>> > @@ -2606,6 +2606,8 @@ ppc64_get_current_task_reg(int regno, const char *name, int size,
>> >
>> > BZERO(&bt_setup, sizeof(struct bt_info));
>> > clone_bt_info(&bt_setup, &bt_info, tc);
>> > + if (bt_info.stackbase == 0)
>> > + return FALSE;
>> > fill_stackbuf(&bt_info);
>> >
>> > // reusing the get_dumpfile_regs function to get pt regs structure
>> > diff --git a/x86_64.c b/x86_64.c
>> > index d7da536d20d8..b2cddbf8ba3d 100644
>> > --- a/x86_64.c
>> > +++ b/x86_64.c
>> > @@ -9383,6 +9383,8 @@ x86_64_get_current_task_reg(int regno, const char *name,
>> >
>> > BZERO(&bt_setup, sizeof(struct bt_info));
>> > clone_bt_info(&bt_setup, &bt_info, tc);
>> > + if (bt_info.stackbase == 0)
>> > + return FALSE;
>>
>> The fix makes sense to me; however, returning directly will leave the
>> register cache unrefreshed. That is, with the "return FALSE", "set
>> 2276866" will succeed in task switching, but the register cache is
>> still the old one, so "gdb bt" still outputs the previous stack trace,
>> which is not 2276866's stack. I suggest adding a warning telling users
>
>
> Actually, I haven't seen the case you mentioned, and it works as expected:
>
> Without the patch:
> crash> set 2276866
> set: invalid kernel virtual address: 0 type: "stack contents"
> set: read of stack at 0 failed
>
> crash> bt
> PID: 2276866 TASK: ff3a19fbd3c80000 CPU: 47 COMMAND: "sh"
> (no stack)
>
> crash> gdb bt
> #0 crash_setup_regs (oldregs=0x0, newregs=0xff43e468633c7d38) at ./arch/x86/include/asm/processor.h:58
> #1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:952
> #2 0xffffffff86cf976f in panic (fmt=fmt@entry=0xffffffff87f69f99 "sysrq triggered crash\n") at kernel/panic.c:230
> #3 0xffffffff87210201 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:142
> #4 0xffffffff87210b24 in __handle_sysrq (key=99, check_mask=<optimized out>) at drivers/tty/sysrq.c:559
> #5 0xffffffff872109cb in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1106
> #6 0xffffffff86ff5fc9 in proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:241
> #7 0xffffffff86f6e845 in vfs_write (pos=0xff43e468633c7f08, count=2, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, file=0xff3a19e92ee37b00) at fs/read_write.c:549
> #8 vfs_write (file=0xff3a19e92ee37b00, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=<optimized out>, pos=0xff43e468633c7f08) at fs/read_write.c:533
> #9 0xffffffff86f6eacf in ksys_write (fd=<optimized out>, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=2) at fs/read_write.c:598
> #10 0xffffffff86c03cab in do_syscall_64 (nr=1, regs=0xff43e468633c7f58) at arch/x86/entry/common.c:303
> #11 0xffffffff8780012e in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:147
> crash>
>
> The above case breaks thread switching in gdb, as I described in the
> patch log.
>
> With the patch:
> crash> set 2276866
> PID: 2276866
> COMMAND: "sh"
> TASK: ff3a19fbd3c80000 [THREAD_INFO: ff3a19fbd3c80000]
> CPU: 47
> STATE: EXIT_DEAD|EXIT_ZOMBIE
>
> crash> bt
> PID: 2276866 TASK: ff3a19fbd3c80000 CPU: 47 COMMAND: "sh"
> (no stack)
>
> crash> gdb bt
> crash>
>
> That is expected behavior, and I did not see the case that you pointed out.
>
>
>> that gdb related commands such as 'bt', 'frame', 'up', 'down', 'info
>> locals' are not workable, like:
>
>
> Have you reproduced the case that the register cache is unrefreshed?
Right, I re-tested the patch and it works as expected. Sorry for the
confusion. For the patch: ack.
No worries. Thanks for the review, Tao.
Lianbo
Thanks,
Tao Liu
>
> Thanks
> Lianbo
>
>>
>> Warning: registers unable to refresh, the outputs of the following gdb
>> related commands are not reliable: 'bt', 'frame', 'up', 'down', 'info
>> locals'.
>>
>> What do you think?
>>
>> Thanks,
>> Tao Liu
>>
>>
>>
>> > fill_stackbuf(&bt_info);
>> >
>> > // reusing the get_dumpfile_regs function to get pt regs structure
>> > --
>> > 2.50.1
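
For readers following the thread without the crash source at hand, the guard added by the patch can be sketched in isolation as below. This is only an illustration of the pattern, not the crash code itself: struct demo_bt_info, demo_fill_stackbuf() and demo_get_current_task_reg() are hypothetical stand-ins for struct bt_info, fill_stackbuf() and the per-architecture *_get_current_task_reg() functions, and the stack read is simulated with a memset.

```c
#include <assert.h>
#include <string.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical stand-in for crash's struct bt_info; only the parts
 * needed to show the guard are modelled here. */
struct demo_bt_info {
    unsigned long stackbase;  /* 0 when the task has no stack (e.g. a zombie) */
    char stackbuf[64];        /* simulated copy of the stack contents */
    int stack_read;           /* set to 1 once the simulated read has run */
};

/* Hypothetical stand-in for fill_stackbuf(): pretends to read the stack. */
static void demo_fill_stackbuf(struct demo_bt_info *bt)
{
    memset(bt->stackbuf, 0xAA, sizeof(bt->stackbuf));
    bt->stack_read = 1;
}

/* Mirrors the shape of the patched code path: bail out with FALSE
 * before touching the stack when stackbase is 0. */
static int demo_get_current_task_reg(struct demo_bt_info *bt,
                                     unsigned long *value)
{
    assert(bt != NULL && value != NULL);

    if (bt->stackbase == 0)
        return FALSE;         /* no stack: skip the read entirely */

    demo_fill_stackbuf(bt);
    *value = bt->stackbase;   /* placeholder for a real register value */
    return TRUE;
}
```

With a zero stackbase the function returns FALSE without running the simulated read and leaves *value untouched; with a non-zero stackbase it performs the read and returns TRUE.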