Hi, Aditya
Sorry for the late reply, and thank you for the update.
On Wed, Aug 9, 2023 at 4:38 AM <crash-utility-request(a)redhat.com> wrote:
 Date: Wed,  9 Aug 2023 02:03:17 +0530
 From: Aditya Gupta <adityag(a)linux.ibm.com>
 To: crash-utility(a)redhat.com
 Cc: Mahesh J Salgaonkar <mahesh(a)linux.ibm.com>, Sourabh Jain
         <sourabhjain(a)linux.ibm.com>, Hari Bathini <hbathini(a)linux.ibm.com>
 Subject: [Crash-utility] [RFC PATCH v2 0/4] Improve stack unwind on
         ppc64
 Message-ID: <20230808203321.241732-1-adityag(a)linux.ibm.com>
 Content-Type: text/plain; charset=UTF-8
 The Problem:
 ============
 Currently crash is unable to show function arguments and local variables, as
 
That's true. Today we have to infer their values manually from the stack and
registers, because they may be stored in either place. This is not friendly to
most kernel developers and debuggers.
Anyway, this is a good point. It would be even better if inline functions
could also be displayed.
 gdb can do. And functionality for moving between frames ('up'/'down') is not
 working in crash.
 Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
 'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
 gdb not getting the register values from `crash_target::fetch_registers`,
 which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64.
 Proposed Solution:
 ==================
 Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
 This way, "gdb mode in crash" will support this feature for both ELF and
 kdump-compressed vmcore formats, while "gdb" would only have supported ELF
 format.
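For reference, my understanding of the idea is roughly as in the sketch below.
It is only an illustration under assumptions, not the code from the patch
series: the struct, the function name, its parameters, and the register
numbering (PC as number 64) are all hypothetical. It shows a per-CPU register
lookup that reports failure for a CPU without saved registers, which is the
case where gdb would show "<unavailable>".

```c
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; they are not crash or gdb identifiers. */
#define SKETCH_NR_GPRS 32
#define SKETCH_REG_PC  64   /* assumed register number for the PC */

/* Per-CPU register snapshot, as it might be read from the dumpfile. */
struct sketch_regs {
	uint64_t gpr[SKETCH_NR_GPRS];
	uint64_t nip;   /* next instruction pointer (PC) */
	int valid;      /* nonzero if registers were found for this CPU */
};

/*
 * Copy register 'regno' of 'cpu' into 'value'.
 * Returns 1 on success and 0 if the register is not available, so the
 * caller can print "<unavailable>" instead of failing the whole command.
 */
static int sketch_get_cpu_reg(const struct sketch_regs *cpus, int ncpus,
			      int cpu, int regno, int size, void *value)
{
	const struct sketch_regs *r;

	/* Reject unknown CPUs and anything that is not a 64-bit register. */
	if (cpu < 0 || cpu >= ncpus || size != (int)sizeof(uint64_t))
		return 0;

	r = &cpus[cpu];
	if (!r->valid)
		return 0;

	if (regno >= 0 && regno < SKETCH_NR_GPRS)
		memcpy(value, &r->gpr[regno], (size_t)size);
	else if (regno == SKETCH_REG_PC)
		memcpy(value, &r->nip, (size_t)size);
	else
		return 0;

	return 1;
}
```

A caller would invoke this once per register that gdb asks for, and treat a
return value of 0 as "register not available" rather than as a fatal error.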
 Implications on Architectures:
 ==============================
 No architecture other than PPC64 is affected, except for the 'frame'
 command.
 
BTW: Can this feature be implemented on other architectures, such as x86_64?
Have you investigated that?
 As mentioned in patch #2, since 'frame' will no longer be prohibited, it
 will print:
         crash> frame
         #0  <unavailable> in ?? ()
 Instead of the earlier 'prohibited' message:
         crash> frame
         crash: prohibited gdb command: frame
 On PPC64, the default mode ("crash mode") will not have ANY OTHER changes,
 other than 'frame' as mentioned above.
 The major change will be in 'gdb mode' on PPC64: it will print the frames
 and local variables, instead of failing with errors saying that there is no
 frame or that it couldn't get the PC.
 Testing:
 ========
 Git tree with this patch series applied:
 
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-rfc2
 To test gdb passthroughs:
         crash> set gdb on
         gdb> thread 3 # or any other thread number to change context in gdb
         gdb> bt
         gdb> frame
         gdb> up
         gdb> down
         gdb> info locals
 
I did a simple test as below (kernel commit: 99d99825fc07):
gdb> info threads
  Id   Target Id         Frame
  1    CPU 0             <unavailable> in ?? ()
  2    CPU 1
gdb> thread 2
[Switching to thread 2 (CPU 1)]
#0  0xc0000000002843f8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc00000003dbd7958) at ./arch/powerpc/include/asm/kexec.h:69
69                      ppc_save_regs(newregs);
gdb> bt
#0  0xc0000000002843f8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc00000003dbd7958) at ./arch/powerpc/include/asm/kexec.h:69
#1  __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:1064
#2  0xc00000000014e018 in panic (fmt=0xc000000001443d80 "sysrq triggered crash\n") at kernel/panic.c:359
#3  0xc0000000009b8978 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
#4  0xc0000000009b946c in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602
#5  0xc0000000009b9ce8 in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
#6  0xc0000000006919fc in pde_write (ppos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>, pde=0xc00000000556fcc0) at fs/proc/inode.c:340
#7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352
#8  0xc0000000005b7cb8 in vfs_write (file=file@entry=0xc000000036fa5f00, buf=buf@entry=0x10027835560 <error: Cannot access memory at address 0x10027835560>, count=count@entry=2, pos=pos@entry=0xc00000003dbd7de0) at fs/read_write.c:582
#9  0xc0000000005b83a4 in ksys_write (fd=<optimized out>, buf=0x10027835560 <error: Cannot access memory at address 0x10027835560>, count=2) at fs/read_write.c:637
#10 0xc000000000031454 in system_call_exception (regs=0xc00000003dbd7e80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
#11 0xc00000000000cedc in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:198
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
gdb> frame 7
#7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352
352                     rv = pde_write(pde, file, buf, count, ppos);
gdb> info rv
gdb: gdb request failed: info rv
gdb>
It seems that the 'info locals' command is not working as expected. I haven't
investigated the details yet.
Known Issues:
 =============
 1. In gdb mode, 'info threads' might hang for a few seconds, and print only 2
    threads
 
Hmm, it only prints 2 threads on my side, and one of them is unavailable.
Can you try to dig into the details?
 2. In gdb mode, 'bt' might fail to show the backtrace in a few vmcores
    collected from older kernels. This is a known issue due to register
    mismatch, and its fix has been merged upstream:
 Commit:
 
https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
 TODO:
 =====
 1. Introduce automatic thread selection in gdb mode, to select the crashing
    thread in gdb, eliminating the need to manually run "thread <id>" after
    switching to gdb mode.
 Changelog:
 ==========
 RFC V2:
   - removed patch implementing 'frame', 'up', 'down' in crash
   - updated the cover letter by removing the mention of those commands other
     than the respective gdb passthrough
 
In addition, get_dumpfile_regs() is not invoked in [patch 1], so I would
suggest moving it into [patch 2]. This is just from a quick glance; I haven't
looked at the patchset carefully yet.
Thanks.
Lianbo
Aditya Gupta (4):
   add generic get_dumpfile_regs to read registers
   ppc64: fix gdb passthrough by implementing machdep->get_cpu_reg
   remove 'frame' from prohibited commands list
   make cpu context change transparent to crash/gdb
  defs.h          | 125 ++++++++++++++++++++++++++++++++++++++++++++++++
  gdb-10.2.patch  |  28 +++++++++++
  gdb_interface.c |   2 +-
  kernel.c        |  33 +++++++++++++
  ppc64.c         | 105 ++++++++++++++++++++++++++++++++++++++--
  tools.c         |  12 +++--
  6 files changed, 298 insertions(+), 7 deletions(-)
 --
 2.41.0