On Wed, Sep 18, 2024 at 3:37 PM <devel-request@lists.crash-utility.osci.io> wrote:
Date: Wed, 18 Sep 2024 11:42:04 +1200
From: Tao Liu <ltao@redhat.com>
Subject: [Crash-utility] [PATCH v1 2/3] ppc64: check sp at the start
        of stack back trace
To: devel@lists.crash-utility.osci.io
Cc: adityag@linux.ibm.com,      Tao Liu <ltao@redhat.com>
Message-ID: <20240917234205.7783-3-ltao@redhat.com>
Content-Type: text/plain; charset="US-ASCII"; x-default=true

Previously the newsp read from stack is unchecked, it would expect
the newsp hold a valid sp pointer of the next stack frame by default,
which is not always true, see the following stack:

I guess it might be related to the Power Arch 64-Bit ELF V2 ABI changes, please refer to this link(see the section: The Stack Frame):
[1] https://ftp.rtems.org/pub/rtems/people/sebh/ABI64BitOpenPOWERv1.1_16July2015_pub.pdf
[2] https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#STACK
[3] https://openpowerfoundation.org/specifications/64bitelfabi/
Before that It was always correct, but It might not be correct after ABI changes. I copied it here with a minor change, maybe it is not very accurate.
High Address

          +-> Back chain
          |   Floating point register save area
          |   General register save area
          |   VRSAVE save word (32-bits)
          |   Alignment padding (4 or 12 bytes)
          |   Vector register save area (quadword aligned)
          |   Local variable space
          |   Parameter save area    (SP + 48)
          |   TOC save area          (SP + 40)
          |   link editor doubleword (SP + 32)
          |   compiler doubleword    (SP + 24)
          |   LR save area           (SP + 16)
          |   CR save area           (SP + 8)
SP  --->  +-- Back chain             (SP + 0)

Low Address
|
|
\/
High Address

          +-> Back chain
          |   Floating point register save area
          |   General register save area
          |   Alignment padding (4 or 12 bytes)
          |   Vector register save area (quadword aligned)
          |   Local variable space
          |   Parameter save area    (SP + 32)
          |   TOC Pointer Doubleword (SP + 24)
          |   LR save area           (SP + 16)
          |   Reserved               (SP + 12)
          |   CR save area           (SP + 8)
SP  --->  +-- Back chain             (SP + 0)

Low Address


    R1: c00000000a327660  NIP: c0000000002b9984
 #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
    c00000000a327660: c00000000015a7b0 c000000001bc8600
    c00000000a327670: 0000000000000000 c000000002e13b54
    c00000000a327680: c00000000a327850 0000000000004000
    c00000000a327690: c0000000002ba078 c0000000026f2ca8

The newsp will be c00000000015a7b0, which is not a valid stack pointer
and as a result it will fail the stack back trace afterwards. This usually
happens at the beginning of back trace, once we get the correct sp value,
the stack can be back traced successfully.

This patch will search and check for the valid newsp at the beginning of
stack back trace. For the above example, a valid newsp would be assigned
as c00000000a327850.

Before:
  crash> bt
  ...
  #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
After:
  crash> bt
  ...
  #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
  #1 [c00000000a327850] panic at c000000000167de4
  #2 [c00000000a3278f0] sysrq_handle_crash at c000000000a4d3a8
  #3 [c00000000a327950] __handle_sysrq at c000000000a4dec4
  #4 [c00000000a3279f0] write_sysrq_trigger at c000000000a4e984
  ...

Signed-off-by: Tao Liu <ltao@redhat.com>
---
 ppc64.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/ppc64.c b/ppc64.c
index 6e5f155..8af7b8a 100644
--- a/ppc64.c
+++ b/ppc64.c
@@ -2148,6 +2148,19 @@ ppc64_back_trace(struct gnu_request *req, struct bt_info *bt)

        while (INSTACK(req->sp, bt)) {
                newsp = *(ulong *)&bt->stackbuf[req->sp - bt->stackbase];
+
+               if (!newpc) {
+                       ulong *up;
+                       int i;
+                       for (i = 0, up = (ulong *)&bt->stackbuf[req->sp - bt->stackbase];
+                            i < (req->sp - bt->stackbase) / sizeof(ulong); i++, up++) {
+                               if (INSTACK(*up, bt) && is_kernel_text(*(up + 2))) {
+                                       newsp = *up;
+                                       break;
+                               }
+                       }
+               }
+

Can we decide what to do based on whether the ELF ABI V2 is enabled or not? Because the Stack Frame Organization won't be changed in the spec v2. Just an idea.

Thanks
Lianbo


                if ((req->name = closest_symbol(req->pc)) == NULL) {
                        if (CRASHDEBUG(1)) {
                                error(FATAL,
--
2.40.1