On Wed, Jun 7, 2023 at 8:00 PM <crash-utility-request@redhat.com> wrote:
Date: Wed,  7 Jun 2023 18:37:33 +0900
From: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
To: crash-utility@redhat.com
Cc: d.hatayama@fujitsu.com
Subject: [Crash-utility] [PATCH 1/2] Revert "Fix segfault in
        arm64_is_kernel_exception_frame() when corrupt stack pointer address
        is given"
Message-ID: <20230607093734.247-1-d.hatayama@fujitsu.com>
Content-Type: text/plain; charset="US-ASCII"; x-default=true

This reverts commit 9868ebc8e648e5791764a51567a23efae7170d9b.

The commit 9868ebc8e648e5791764a51567a23efae7170d9b causes the issue
that bt command fails to show backtraces for the tasks that is running
in the user mode at the moment of the kernel panic as follows:

  crash> bt 1734
  PID: 1734     TASK: ffff000000392200  CPU: 4     COMMAND: "insmod"
  bt: invalid stack pointer is given


Thank you for pointing out this issue, HATAYAMA.

Anyway, I did not reproduce the above issue. Seems it can not always be reproduced.

# ./crash /home/vmlinux /var/crash/127.0.0.1-2023-06-09-05\:20\:38/vmcore -s
WARNING: cpu 2: invalid NT_PRSTATUS note (n_type != NT_PRSTATUS)
WARNING: cpu 1: cannot find NT_PRSTATUS note
WARNING: cpu 2: cannot find NT_PRSTATUS note
crash> ps insmod
      PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
     1684    1683   0  ffff06738f1cdd00  ZO   0.0        0        0  insmod
crash> bt 1684
PID: 1684     TASK: ffff06738f1cdd00  CPU: 0    COMMAND: "insmod"
(no stack)
crash>

Thanks.
Lianbo
 
The root cause is that while the commit added a sanity check into
STACK_OFFSET_TYPE() to validate if a given candidate address of any
interrupt or exception frame is contained within the range of the
corresponding kernel stack, the premise that the STACK_OFFSET_TYPE()
should not return out-of-the-buffer address, is wrong.

Reexamining the relevant surrounding part of the backtracing code, it
looks to me now that the STACK_OFFSET_TYPE() is originally expected to
return an out-of-the-buffer address, like the address of the top of
the corresponding kernel stack, e.g. at here:

  static int
  arm64_in_kdump_text(struct bt_info *bt, struct arm64_stackframe *frame)
  {
  ...
          if (bt->flags & BT_USER_SPACE)
                  start = (ulong *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(bt->stacktop))];
          else {

Note that the above bt 1734 aborts here.

Hence, the current implementation policy around STACK_OFFSET_TYPE()
looks that the caller side is responsible for understanding the fact
in advance and for avoiding making buffer overrun carefully.

To fix this issue, revert the commit.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
---
 defs.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/defs.h b/defs.h
index bfda0c4..3e7d6cf 100644
--- a/defs.h
+++ b/defs.h
@@ -976,10 +976,7 @@ struct bt_info {

 #define STACK_OFFSET_TYPE(OFF) \
   (((ulong)(OFF) > STACKSIZE()) ? \
-  (((ulong)(OFF) < (ulong)(bt->stackbase) || (ulong)(OFF) >= (ulong)(bt->stackbase) + STACKSIZE()) ? \
-  error(FATAL, "invalid stack pointer is given\n") :                   \
-   (ulong)((ulong)(OFF) - (ulong)(bt->stackbase))) :                   \
-   (ulong)(OFF))
+  (ulong)((ulong)(OFF) - (ulong)(bt->stackbase)) : (ulong)(OFF))

 #define GET_STACK_ULONG(OFF) \
  *((ulong *)((char *)(&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(OFF))])))
--
2.25.1