Hi, Aditya
Sorry for the late reply, and thank you for the update.

On Wed, Aug 9, 2023 at 4:38 AM <crash-utility-request@redhat.com> wrote:
Date: Wed,  9 Aug 2023 02:03:17 +0530
From: Aditya Gupta <adityag@linux.ibm.com>
To: crash-utility@redhat.com
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>, Sourabh Jain
        <sourabhjain@linux.ibm.com>, Hari Bathini <hbathini@linux.ibm.com>
Subject: [Crash-utility] [RFC PATCH v2 0/4] Improve stack unwind on
        ppc64
Message-ID: <20230808203321.241732-1-adityag@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8

The Problem:
============

Currently crash is unable to show function arguments and local variables, as
 
That's true. Today we have to infer their values by hand, because they may be held either in registers or on the stack. This is not friendly to most kernel developers.

Anyway, this is a good point. It would be even better if inlined functions could be displayed as well.

gdb can do. And functionality for moving between frames ('up'/'down') is not
working in crash.

Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
'bt', 'frame', 'info locals', 'up', 'down' are not working either, because
gdb does not get the register values from `crash_target::fetch_registers`,
which in turn calls `machdep->get_cpu_reg`, which is not implemented for PPC64.

Proposed Solution:
==================

Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
This way, "gdb mode in crash" will support this feature for both ELF and
kdump-compressed vmcore formats, whereas plain "gdb" would only have
supported the ELF format.
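
A minimal sketch of what such a register callback could look like. This is
not the code from the patch: the struct layout, the names prefixed with
"sketch_", the register number used for the PC, and the 1/0 return convention
are all assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified per-CPU register snapshot (not crash's layout). */
#define SKETCH_NGPRS    32
#define SKETCH_REGNO_PC 64   /* assumed gdb register number for the PC */

struct sketch_regs {
    uint64_t gpr[SKETCH_NGPRS];  /* r0..r31; r1 is the stack pointer on ppc64 */
    uint64_t nip;                /* next instruction pointer, i.e. the PC */
};

/*
 * Copy one register into 'value'. Returns 1 on success, and 0 when the
 * register is unknown or the buffer size does not match, so that the
 * caller can report the register as unavailable.
 */
int sketch_get_cpu_reg(const struct sketch_regs *regs, int regno,
                       int size, void *value)
{
    const uint64_t *src;

    if (regs == NULL || value == NULL || size != (int)sizeof(uint64_t))
        return 0;

    if (regno >= 0 && regno < SKETCH_NGPRS)
        src = &regs->gpr[regno];
    else if (regno == SKETCH_REGNO_PC)
        src = &regs->nip;
    else
        return 0;

    memcpy(value, src, sizeof(uint64_t));
    return 1;
}
```

The point of the sketch is that gdb can only unwind once it is handed at
least the PC and the stack pointer; any register the callback cannot supply
is reported as failed rather than silently zero-filled.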

Implications on Architectures:
==============================

No architecture other than PPC64 is affected, except in the case of the
'frame' command.

 
BTW: Can this feature be implemented on other architectures, such as x86_64? Have you investigated?

 
As mentioned in patch #2, since 'frame' will no longer be prohibited, it will print:

        crash> frame
        #0  <unavailable> in ?? ()

Instead of the previous 'prohibited' message:

        crash> frame
        crash: prohibited gdb command: frame

On PPC64, the default mode ("crash mode") will not have ANY OTHER changes,
other than 'frame' as mentioned above.

The major change is in 'gdb mode' on PPC64: it will now print the frames and
local variables, instead of failing with errors saying that there is no
frame or that the PC could not be read.

Testing:
========

Git tree with this patch series applied:
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-rfc2

To test gdb passthroughs:

        crash> set gdb on
        gdb> thread 3 # or any other thread number to change context in gdb
        gdb> bt
        gdb> frame
        gdb> up
        gdb> down
        gdb> info locals

 
I ran a simple test, shown below (kernel commit: 99d99825fc07):

gdb> info threads
  Id   Target Id         Frame
  1    CPU 0             <unavailable> in ?? ()
  2    CPU 1            
gdb> thread 2
[Switching to thread 2 (CPU 1)]
#0  0xc0000000002843f8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc00000003dbd7958) at ./arch/powerpc/include/asm/kexec.h:69
69                      ppc_save_regs(newregs);
gdb> bt
#0  0xc0000000002843f8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc00000003dbd7958) at ./arch/powerpc/include/asm/kexec.h:69
#1  __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:1064
#2  0xc00000000014e018 in panic (fmt=0xc000000001443d80 "sysrq triggered crash\n") at kernel/panic.c:359
#3  0xc0000000009b8978 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
#4  0xc0000000009b946c in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602
#5  0xc0000000009b9ce8 in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
#6  0xc0000000006919fc in pde_write (ppos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>, pde=0xc00000000556fcc0) at fs/proc/inode.c:340
#7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352
#8  0xc0000000005b7cb8 in vfs_write (file=file@entry=0xc000000036fa5f00, buf=buf@entry=0x10027835560 <error: Cannot access memory at address 0x10027835560>, count=count@entry=2, pos=pos@entry=0xc00000003dbd7de0) at fs/read_write.c:582
#9  0xc0000000005b83a4 in ksys_write (fd=<optimized out>, buf=0x10027835560 <error: Cannot access memory at address 0x10027835560>, count=2) at fs/read_write.c:637
#10 0xc000000000031454 in system_call_exception (regs=0xc00000003dbd7e80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
#11 0xc00000000000cedc in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:198
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
gdb> frame 7
#7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352
352                     rv = pde_write(pde, file, buf, count, ppos);
gdb> info rv
gdb: gdb request failed: info rv
gdb>
 
It seems that the 'info locals' command is not working as expected. I haven't investigated the details.


Known Issues:
=============

1. In gdb mode, 'info threads' might hang for a few seconds, and print only 2
   threads

Hmm, on my side it prints only 2 threads, one of which is unavailable. Can you dig into the details?
 
2. In gdb mode, 'bt' might fail to show a backtrace in a few vmcores collected
   from older kernels. This is a known issue due to register mismatch, and
   its fix has been merged upstream:

Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79

TODO:
=====

1. Introduce automatic thread selection in gdb mode, to select the crashing
   thread in gdb, eliminating the need to manually run "thread <id>" after
   switching to gdb mode.
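
One possible shape for this, as a rough sketch only: build the "thread <id>"
command from the panicking CPU number. It assumes gdb thread ids are CPU
number + 1, which matches an 'info threads' listing where Id 1 is CPU 0 and
Id 2 is CPU 1; the function name and return convention are hypothetical, and
nothing here is taken from the patch series.

```c
#include <stdio.h>

/*
 * Format the gdb command that selects the thread for 'panic_cpu'.
 * Assumption: gdb thread id == CPU number + 1.
 * Returns the command length, or -1 on bad input or truncation.
 */
int sketch_thread_cmd(int panic_cpu, char *buf, size_t len)
{
    int n;

    if (panic_cpu < 0 || buf == NULL || len == 0)
        return -1;

    n = snprintf(buf, len, "thread %d", panic_cpu + 1);
    if (n < 0 || (size_t)n >= len)
        return -1;   /* formatting error or output did not fit */

    return n;
}
```

For a panic on CPU 1 this produces "thread 2", the same command that has to
be typed by hand today after 'set gdb on'.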

Changelog:
==========

RFC V2:
  - removed patch implementing 'frame', 'up', 'down' in crash
  - updated the cover letter by removing the mention of those commands other
        than the respective gdb passthrough


In addition, get_dumpfile_regs() is added in [patch 1] but not invoked there; I would suggest moving it into [patch 2]. This is based on a quick glance only, as I haven't reviewed the patchset carefully yet.

Thanks.
Lianbo

Aditya Gupta (4):
  add generic get_dumpfile_regs to read registers
  ppc64: fix gdb passthrough by implementing machdep->get_cpu_reg
  remove 'frame' from prohibited commands list
  make cpu context change transparent to crash/gdb

 defs.h          | 125 ++++++++++++++++++++++++++++++++++++++++++++++++
 gdb-10.2.patch  |  28 +++++++++++
 gdb_interface.c |   2 +-
 kernel.c        |  33 +++++++++++++
 ppc64.c         | 105 ++++++++++++++++++++++++++++++++++++++--
 tools.c         |  12 +++--
 6 files changed, 298 insertions(+), 7 deletions(-)

--
2.41.0