On Mon, Aug 26, 2024 at 11:55 AM <devel-request@lists.crash-utility.osci.io> wrote:
Date: Mon, 26 Aug 2024 15:52:26 +1200
From: Tao Liu <ltao@redhat.com>
Subject: [Crash-utility] [PATCH v6 00/14] gdb stack unwinding support
        for crash utility
To: devel@lists.crash-utility.osci.io
Cc: Tao Liu <ltao@redhat.com>
Message-ID: <20240826035240.14781-1-ltao@redhat.com>
Content-Type: text/plain; charset=UTF-8

This patchset is a rebase/merged version of the following 3 patchsets:

1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
3): Clean up on top of one-thread-v2 [3]

A complete description of gdb stack unwinding support for crash can be
found in [1].

This patchset can be divided into the following 2 parts:

1) part1: arch independent, mainly modify on the
   crash_target.c/gdb_interface.c files, in preparation of the
   gdb side.
2) part2: arch specific part, for implementing ppc64/x86_64/arm64/vmware
   gdb stack unwinding support.

=== part 2

- arm64:
arm64: Add gdb stack unwinding support

- vmware:
vmware_guestdump: Various format versions support
set_context(): check if context is already current

- x86_64:
x86_64: Fix invalid input "=>" for bt command
Fix cpumask_t recursive dependence issue
x86_64: Add gdb stack unwinding support

- ppc64:
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg

=== part 1

Stop stack unwinding at non-kernel address
Fix gdb_interface: restore gdb's output streams at end of gdb_interface
Print task pid/command instead of CPU index
Rename get_cpu_reg to get_current_task_reg
Let crash change gdb context
Leave only one gdb thread for crash
Remove 'frame' from prohibited commands list
===

v6 -> v5:
1) Refactor patch 4 & 9, which changed the function signature of struct
   get_cpu_reg/get_current_task_reg, and let each patch compile with no
   error when added on.
2) Rebased the patchset on top of latest upstream:
   ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash --osrelease' or
   crash loading")


Thank you for the update, Tao.

I have a few more comments here:

[1] [PATCH v6 04/14]:

+       * Task is active, grab CPU's registers
+       */
+       if (is_task_active(tc->task) && VMSS_DUMPFILE())
+               return vmware_vmss_get_cpu_reg(tc->processor, regno, name, size, value);

Can you help confirm that it only works for the active task? Is this expected behavior?

[2] [PATCH v6 07/14]

#ifdef CRASH_MERGE
          CORE_ADDR pc = 0;
          get_frame_pc_if_available (fi, &pc);
          if (!is_kvaddr(pc)) {
            printf_filtered (_("Backtrace stopped due to non-kernel addr: %lx\n"),pc);
            fi = NULL;
            break;
          }
#endif

I would suggest removing the above printf_filtered(...), otherwise it will be always displayed.

[3] [PATCH v6 08/14]

a. warning
cc -c -g -DPPC64 -m64 -DLZO -DGDB_10_2  ppc64.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
ppc64.c: In function ‘ppc64_get_current_task_reg’:
ppc64.c:2519:15: warning: unused variable ‘task’ [-Wunused-variable]
 2519 |         ulong task;
      |               ^~~~

b. There are different results between 'bt' and 'gdb bt' commands. Can you help double check?
crash> bt
PID: 6298     TASK: c000000050bbcb00  CPU: 2    COMMAND: "bash"
 R0:  c0000000002be63c    R1:  c00000005102b710    R2:  c000000001bf8f00  
 R3:  c00000005102b728    R4:  0000000000000000    R5:  0000000000000000  
 R6:  c00000005102b8d8    R7:  c00000000f5d0000    R8:  0000000000000000  
 R9:  c0000000557d3c00    R10: 0000000000000001    R11: 0000000000002000  
 R12: c0000000027634a8    R13: c00000000f6cdf00    R14: 0000000000000000  
 R15: 0000000000000000    R16: 0000000000000000    R17: 0000000000000000  
 R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000000  
 R21: 0000000000000000    R22: 0000000000000000    R23: 0000000000000000  
 R24: 0000000000000007    R25: 0000000000000000    R26: 0000000000000000  
 R27: c00000000274d898    R28: c000000002d0a8d0    R29: c000000002e23df0  
 R30: 0000000000000000    R31: c000000002e23ddc  
 NIP: c0000000002bdf64    MSR: 8000000000009033    OR3: 0000000000000000
 CTR: 0000000000000000    LR:  c0000000002be63c    XER: 0000000020040004
 CCR: 0000000024222280    MQ:  0000000000000001    DAR: 0000000000000000
 DSISR: 0000000000000000     Syscall Result: 0000000000000000
 [NIP  : crash_setup_regs+68]
 [LR   : __crash_kexec+156]
 #0 [c00000005102b710] crash_setup_regs at c0000000002bdf64
crash> gdb bt
#0  0xc0000000002bdf64 in crash_setup_regs (newregs=0xc00000005102b728, oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133
#1  0xc0000000002be658 in __crash_kexec (regs=0x0) at kernel/crash_core.c:122
#2  0xc00000000016c284 in panic (fmt=0xc0000000015dd018 "sysrq triggered crash\n") at kernel/panic.c:367
#3  0xc000000000a66e78 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:154
#4  0xc000000000a67994 in __handle_sysrq (key=key@entry=99 'c', check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
#5  0xc000000000a68454 in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1181
#6  0xc00000000072868c in pde_write (pde=0xc000000003671f80, file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:334
#7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346
#8  0xc00000000063bf20 in vfs_write (file=0xc000000009cb5b00, buf=0x100123a71a0 <error: Cannot access memory at address 0x100123a71a0>, count=2, pos=0xc00000005102bc00) at fs/read_write.c:588
#9  vfs_write (file=0xc000000009cb5b00, buf=0x100123a71a0 <error: Cannot access memory at address 0x100123a71a0>, count=<optimized out>, pos=0xc00000005102bc00) at fs/read_write.c:570
#10 0xc00000000063c4d0 in ksys_write (fd=<optimized out>, buf=0x100123a71a0 <error: Cannot access memory at address 0x100123a71a0>, count=2) at fs/read_write.c:643
#11 0xc000000000031a28 in system_call_exception (regs=0xc00000005102be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
#12 0xc00000000000d05c in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:198
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
crash> 

[4] [PATCH v6 10/14]
diff --git a/xen_hyper.c b/xen_hyper.c
index 32e56fa..54c7f57 100644
--- a/xen_hyper.c
+++ b/xen_hyper.c
@@ -52,7 +52,7 @@ xen_hyper_init(void)
         */
        xht->xen_virt_start &= 0xffffffffc0000000;
 #endif
-
+       STRUCT_SIZE_INIT(cpumask_t, "cpumask_t");
        if (machine_type("X86_64") &&
            symbol_exists("xen_phys_start") && !xen_phys_start())
                error(WARNING,

Could you please explain why the above changes are needed?

[5] [PATCH v6 14/14]

a. warning
cc -c -g -DARM64 -DLZO -DGDB_10_2  arm64.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
arm64.c: In function ‘arm64_get_stack_frame’:
arm64.c:4138:13: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
 4138 |         int ret;
      |             ^~~

b.  indentation issue
-                               bt->task);
+                ur_bitmap = (struct user_regs_bitmap_struct *)GETBUF(sizeof(*ur_bitmap));
+                memset(ur_bitmap, 0, sizeof(*ur_bitmap));
+               ur_bitmap->ur.pc = stackframe.pc;

[6] The 'info threads' command only displays the current task, not all known tasks. And this looks different behavior from the gdb
gdb> info threads
  Id   Target Id         Frame
* 1    6298 bash         0xc0000000002bdf64 in crash_setup_regs (newregs=0xc00000005102b728, oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133

[7] the 'thread' command can not switch to another task.
crash> ps
...
     6297    6287   0  c000000057517e80  IN   0.0    22784    11776  sshd-session
>    6298    6297   2  c000000050bbcb00  RU   0.0     9664     6272  bash
...

gdb> thread 6297
gdb: gdb request failed: thread 6297
gdb> 

Finally I have to do it with 'set' command as below:

crash> set 6297
    PID: 6297
COMMAND: "sshd-session"
   TASK: c000000057517e80  [THREAD_INFO: c000000057517e80]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE
crash> set gdb on
gdb: on
gdb> bt
#0  0xc00000009cd57650 in ?? ()
gdb: gdb request failed: bt
gdb> 



In addition, the code is changed, but its patch log is not updated accordingly. Could you please double check? I won't list them here.

Thanks.
Lianbo

v5 -> v4:
1) Plenty of code refactoring based on Lianbo's comments on v4.
2) Removed the magic number when dealing with regs bitmap, see [6].
3) Rebased the patchset on top of latest upstream:
   ("1c6da3eaff8207 arm64: Fix bt command show wrong stacktrace on ramdump source")

v4 -> v3:
Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface: restore gdb's
output streams at end of gdb_interface.

v3 -> v2:
1) Updated CC list as pointed out in [4]
2) Compiling issues as in [5]

v2 -> v1:
1) Added the patch: x86_64: Fix invalid input "=>" for bt command,
   thanks for Kazu's testing.
2) Modify the patch: x86_64: Add gdb stack unwinding support, added the
   pcp_save, spp_save and sp, for restoring the value in match of the original
   code logic.

[1]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00469.html
[2]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00488.html
[3]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00554.html
[4]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00681.html
[5]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00715.html
[6]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00819.html

Aditya Gupta (3):
  Remove 'frame' from prohibited commands list
  Fix gdb_interface: restore gdb's output streams at end of
    gdb_interface
  ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg

Alexey Makhalov (2):
  set_context(): check if context is already current
  vmware_guestdump: Various format versions support

Tao Liu (9):
  Leave only one gdb thread for crash
  Let crash change gdb context
  Rename get_cpu_reg to get_current_task_reg
  Print task pid/command instead of CPU index
  Stop stack unwinding at non-kernel address
  x86_64: Add gdb stack unwinding support
  Fix cpumask_t recursive dependence issue
  x86_64: Fix invalid input "=>" for bt command
  arm64: Add gdb stack unwinding support

 arm64.c            | 115 +++++++++++++++-
 crash_target.c     |  71 ++++++----
 defs.h             | 194 ++++++++++++++++++++++++++-
 gdb-10.2.patch     |  82 ++++++++++++
 gdb_interface.c    |  35 ++---
 kernel.c           |  63 +++++++--
 ppc64.c            | 175 +++++++++++++++++++++++-
 symbols.c          |  15 +++
 task.c             |  34 +++--
 tools.c            |  10 +-
 unwind_x86_64.h    |   4 -
 vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++-------------
 x86_64.c           | 323 ++++++++++++++++++++++++++++++++++++++++-----
 xen_hyper.c        |   2 +-
 14 files changed, 1225 insertions(+), 219 deletions(-)

--
2.40.1