New subject: [PATCH v6 00/14] gdb stack unwinding support for crash utility

Friday, 30 August 2024

On Mon, Aug 26, 2024 at 11:55 AM <devel-request(a)lists.crash-utility.osci.io>
wrote:

...
 Date: Mon, 26 Aug 2024 15:52:26 +1200
 From: Tao Liu <ltao(a)redhat.com>
 Subject: [Crash-utility] [PATCH v6 00/14] gdb stack unwinding support
         for crash utility
 To: devel(a)lists.crash-utility.osci.io
 Cc: Tao Liu <ltao(a)redhat.com>
 Message-ID: <20240826035240.14781-1-ltao(a)redhat.com>
 Content-Type: text/plain; charset=UTF-8

 This patchset is a rebase/merged version of the following 3 patchsets:

 1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
 2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
 3): Clean up on top of one-thread-v2 [3]

 A complete description of gdb stack unwinding support for crash can be
 found in [1].

 This patchset can be divided into the following 2 parts:

 1) part1: arch independent, mainly modifying the
    crash_target.c/gdb_interface.c files, in preparation for the
    gdb side.
 2) part2: arch specific part, for implementing ppc64/x86_64/arm64/vmware
    gdb stack unwinding support.

 === part 2

 - arm64:
 arm64: Add gdb stack unwinding support

 - vmware:
 vmware_guestdump: Various format versions support
 set_context(): check if context is already current

 - x86_64:
 x86_64: Fix invalid input "=>" for bt command
 Fix cpumask_t recursive dependence issue
 x86_64: Add gdb stack unwinding support

 - ppc64:
 ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg

 === part 1

 Stop stack unwinding at non-kernel address
 Fix gdb_interface: restore gdb's output streams at end of gdb_interface
 Print task pid/command instead of CPU index
 Rename get_cpu_reg to get_current_task_reg
 Let crash change gdb context
 Leave only one gdb thread for crash
 Remove 'frame' from prohibited commands list
 ===

 v6 -> v5:
 1) Refactor patch 4 & 9, which changed the function signature of struct
    get_cpu_reg/get_current_task_reg, and let each patch compile with no
    error when added on.
 2) Rebased the patchset on top of latest upstream:
    ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash --osrelease'
    or crash loading")

 Thank you for the update, Tao.

I have a few more comments here:

[1] [PATCH v6 04/14]:

+       * Task is active, grab CPU's registers
+       */
+       if (is_task_active(tc->task) && VMSS_DUMPFILE())
+               return vmware_vmss_get_cpu_reg(tc->processor, regno, name, size, value);

Can you help confirm that it only works for the active task? Is this
expected behavior?

[2] [PATCH v6 07/14]

#ifdef CRASH_MERGE
          CORE_ADDR pc = 0;
          get_frame_pc_if_available (fi, &pc);
          if (!is_kvaddr(pc)) {
            printf_filtered (_("Backtrace stopped due to non-kernel addr: %lx\n"),pc);
            fi = NULL;
            break;
          }
#endif

I would suggest removing the above printf_filtered(...) call; otherwise the
message will always be displayed.
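
To illustrate the suggestion, here is a minimal, self-contained sketch of a
frame walk that stops silently at the first non-kernel PC. It is not the gdb
code from the patch: is_kvaddr_stub(), ASSUMED_KERNEL_BASE and
count_kernel_frames() are hypothetical names invented for this example, and
the kernel base value is only an assumption chosen to match the ppc64
addresses shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for crash's is_kvaddr(): treats any address at or
 * above an assumed kernel base as a kernel virtual address. */
#define ASSUMED_KERNEL_BASE 0xc000000000000000ULL

static bool is_kvaddr_stub(uint64_t addr)
{
    return addr >= ASSUMED_KERNEL_BASE;
}

/* Walk an array of frame PCs, innermost first, and return how many frames
 * would be shown. The walk ends silently at the first non-kernel PC. */
static size_t count_kernel_frames(const uint64_t *pcs, size_t nframes)
{
    size_t shown = 0;

    for (size_t i = 0; i < nframes; i++) {
        if (!is_kvaddr_stub(pcs[i]))
            break;              /* stop without emitting any message */
        shown++;
    }
    return shown;
}
```

With this shape, a backtrace that runs into a user-space address simply ends
at the last kernel frame, and no extra line is printed on every 'bt'.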

[3] [PATCH v6 08/14]

a. warning
cc -c -g -DPPC64 -m64 -DLZO -DGDB_10_2  ppc64.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
ppc64.c: In function ‘ppc64_get_current_task_reg’:
ppc64.c:2519:15: warning: unused variable ‘task’ [-Wunused-variable]
 2519 |         ulong task;
      |               ^~~~

b. The 'bt' and 'gdb bt' commands give different results. Can you help
double check?
crash> bt
PID: 6298     TASK: c000000050bbcb00  CPU: 2    COMMAND: "bash"
 R0:  c0000000002be63c    R1:  c00000005102b710    R2:  c000000001bf8f00
 R3:  c00000005102b728    R4:  0000000000000000    R5:  0000000000000000
 R6:  c00000005102b8d8    R7:  c00000000f5d0000    R8:  0000000000000000
 R9:  c0000000557d3c00    R10: 0000000000000001    R11: 0000000000002000
 R12: c0000000027634a8    R13: c00000000f6cdf00    R14: 0000000000000000
 R15: 0000000000000000    R16: 0000000000000000    R17: 0000000000000000
 R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000000
 R21: 0000000000000000    R22: 0000000000000000    R23: 0000000000000000
 R24: 0000000000000007    R25: 0000000000000000    R26: 0000000000000000
 R27: c00000000274d898    R28: c000000002d0a8d0    R29: c000000002e23df0
 R30: 0000000000000000    R31: c000000002e23ddc
 NIP: c0000000002bdf64    MSR: 8000000000009033    OR3: 0000000000000000
 CTR: 0000000000000000    LR:  c0000000002be63c    XER: 0000000020040004
 CCR: 0000000024222280    MQ:  0000000000000001    DAR: 0000000000000000
 DSISR: 0000000000000000     Syscall Result: 0000000000000000
 [NIP  : crash_setup_regs+68]
 [LR   : __crash_kexec+156]
 #0 [c00000005102b710] crash_setup_regs at c0000000002bdf64
crash> gdb bt
#0  0xc0000000002bdf64 in crash_setup_regs (newregs=0xc00000005102b728, oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133
#1  0xc0000000002be658 in __crash_kexec (regs=0x0) at kernel/crash_core.c:122
#2  0xc00000000016c284 in panic (fmt=0xc0000000015dd018 "sysrq triggered crash\n") at kernel/panic.c:367
#3  0xc000000000a66e78 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:154
#4  0xc000000000a67994 in __handle_sysrq (key=key@entry=99 'c', check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
#5  0xc000000000a68454 in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1181
#6  0xc00000000072868c in pde_write (pde=0xc000000003671f80, file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:334
#7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346
#8  0xc00000000063bf20 in vfs_write (file=0xc000000009cb5b00, buf=0x100123a71a0 <error: Cannot access memory at address 0x100123a71a0>, count=2, pos=0xc00000005102bc00) at fs/read_write.c:588
#9  vfs_write (file=0xc000000009cb5b00, buf=0x100123a71a0 <error: Cannot access memory at address 0x100123a71a0>, count=<optimized out>, pos=0xc00000005102bc00) at fs/read_write.c:570
#10 0xc00000000063c4d0 in ksys_write (fd=<optimized out>, buf=0x100123a71a0 <error: Cannot access memory at address 0x100123a71a0>, count=2) at fs/read_write.c:643
#11 0xc000000000031a28 in system_call_exception (regs=0xc00000005102be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
#12 0xc00000000000d05c in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:198
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
crash>

[4] [PATCH v6 10/14]

diff --git a/xen_hyper.c b/xen_hyper.c
index 32e56fa..54c7f57 100644
--- a/xen_hyper.c
+++ b/xen_hyper.c
@@ -52,7 +52,7 @@ xen_hyper_init(void)
         */
        xht->xen_virt_start &= 0xffffffffc0000000;
 #endif
-
+       STRUCT_SIZE_INIT(cpumask_t, "cpumask_t");
        if (machine_type("X86_64") &&
            symbol_exists("xen_phys_start") && !xen_phys_start())
                error(WARNING,

Could you please explain why the above changes are needed?

[5] [PATCH v6 14/14]

a. warning
cc -c -g -DARM64 -DLZO -DGDB_10_2  arm64.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
arm64.c: In function ‘arm64_get_stack_frame’:
arm64.c:4138:13: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
 4138 |         int ret;
      |             ^~~
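
For reference, a -Wunused-but-set-variable warning is usually resolved in one
of two ways: act on the stored result, or drop the variable and discard the
value explicitly. The sketch below shows both. It is not the arm64.c code;
fetch_frame_stub(), get_frame_checked() and get_frame_ignored() are
hypothetical names made up for this illustration, and which option fits
depends on whether a failure there needs handling.

```c
#include <stdbool.h>

/* Hypothetical helper standing in for a call whose result was stored in
 * 'ret' and then never read. Returns true on success. */
static bool fetch_frame_stub(unsigned long sp, unsigned long *pc_out)
{
    if (sp == 0)
        return false;
    *pc_out = sp + 8;   /* placeholder value, not a real unwind rule */
    return true;
}

/* Option 1: act on the result, so the variable is genuinely used. */
static int get_frame_checked(unsigned long sp, unsigned long *pc_out)
{
    bool ret = fetch_frame_stub(sp, pc_out);

    if (!ret)
        return -1;      /* propagate the failure to the caller */
    return 0;
}

/* Option 2: the result does not matter here, so drop the variable and
 * discard the return value explicitly. */
static void get_frame_ignored(unsigned long sp, unsigned long *pc_out)
{
    (void)fetch_frame_stub(sp, pc_out);
}
```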

b.  indentation issue
-                               bt->task);
+                ur_bitmap = (struct user_regs_bitmap_struct *)GETBUF(sizeof(*ur_bitmap));
+                memset(ur_bitmap, 0, sizeof(*ur_bitmap));
+               ur_bitmap->ur.pc = stackframe.pc;

[6] The 'info threads' command only displays the current task, not all
known tasks. This looks different from gdb's own behavior:
gdb> info threads
  Id   Target Id         Frame
* 1    6298 bash         0xc0000000002bdf64 in crash_setup_regs (newregs=0xc00000005102b728, oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133

[7] The 'thread' command cannot switch to another task.
crash> ps
...
     6297    6287   0  c000000057517e80  IN   0.0    22784    11776  sshd-session
...
     6298    6297   2  c000000050bbcb00  RU   0.0     9664     6272  bash
...

gdb> thread 6297
gdb: gdb request failed: thread 6297
gdb>

In the end, I had to switch tasks with the 'set' command, as below:

crash> set 6297
    PID: 6297
COMMAND: "sshd-session"
   TASK: c000000057517e80  [THREAD_INFO: c000000057517e80]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE
crash> set gdb on
gdb: on
gdb> bt
#0  0xc00000009cd57650 in ?? ()
gdb: gdb request failed: bt
gdb>



In addition, some of the code has changed, but the corresponding patch logs
have not been updated accordingly. Could you please double check? I won't
list them here.

Thanks.
Lianbo

v5 -> v4:
...
 1) Plenty of code refactoring based on Lianbo's comments on v4.
 2) Removed the magic number when dealing with regs bitmap, see [6].
 3) Rebased the patchset on top of latest upstream:
    ("1c6da3eaff8207 arm64: Fix bt command show wrong stacktrace on ramdump
    source")

 v4 -> v3:
 Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface: restore gdb's
 output streams at end of gdb_interface.

 v3 -> v2:
 1) Updated CC list as pointed out in [4]
 2) Compiling issues as in [5]

 v2 -> v1:
 1) Added the patch: x86_64: Fix invalid input "=>" for bt command,
    thanks to Kazu's testing.
 2) Modified the patch: x86_64: Add gdb stack unwinding support, adding
    pcp_save, spp_save and sp to restore the values in line with the
    original code logic.

 [1]:
 https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00469.html
 [2]:
 https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00488.html
 [3]:
 https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00554.html
 [4]:
 https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00681.html
 [5]:
 https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00715.html
 [6]:
 https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00819.html

 Aditya Gupta (3):
   Remove 'frame' from prohibited commands list
   Fix gdb_interface: restore gdb's output streams at end of
     gdb_interface
   ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg

 Alexey Makhalov (2):
   set_context(): check if context is already current
   vmware_guestdump: Various format versions support

 Tao Liu (9):
   Leave only one gdb thread for crash
   Let crash change gdb context
   Rename get_cpu_reg to get_current_task_reg
   Print task pid/command instead of CPU index
   Stop stack unwinding at non-kernel address
   x86_64: Add gdb stack unwinding support
   Fix cpumask_t recursive dependence issue
   x86_64: Fix invalid input "=>" for bt command
   arm64: Add gdb stack unwinding support

  arm64.c            | 115 +++++++++++++++-
  crash_target.c     |  71 ++++++----
  defs.h             | 194 ++++++++++++++++++++++++++-
  gdb-10.2.patch     |  82 ++++++++++++
  gdb_interface.c    |  35 ++---
  kernel.c           |  63 +++++++--
  ppc64.c            | 175 +++++++++++++++++++++++-
  symbols.c          |  15 +++
  task.c             |  34 +++--
  tools.c            |  10 +-
  unwind_x86_64.h    |   4 -
  vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++-------------
  x86_64.c           | 323 ++++++++++++++++++++++++++++++++++++++++-----
  xen_hyper.c        |   2 +-
  14 files changed, 1225 insertions(+), 219 deletions(-)

 --
 2.40.1
 
    
