Date: Wed, 7 Jun 2023 18:37:33 +0900
From: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
To: crash-utility@redhat.com
Cc: d.hatayama@fujitsu.com
Subject: [Crash-utility] [PATCH 1/2] Revert "Fix segfault in
arm64_is_kernel_exception_frame() when corrupt stack pointer address
is given"
Message-ID: <20230607093734.247-1-d.hatayama@fujitsu.com>
Content-Type: text/plain; charset="US-ASCII"; x-default=true
This reverts commit 9868ebc8e648e5791764a51567a23efae7170d9b.
The commit 9868ebc8e648e5791764a51567a23efae7170d9b causes the issue
that bt command fails to show backtraces for the tasks that is running
in the user mode at the moment of the kernel panic as follows:
crash> bt 1734
PID: 1734 TASK: ffff000000392200 CPU: 4 COMMAND: "insmod"
bt: invalid stack pointer is given
Thank you for pointing out this issue, HATAYAMA.
Anyway, I did not reproduce the above issue. Seems it can not always be reproduced.
# ./crash /home/vmlinux /var/crash/127.0.0.1-2023-06-09-05\:20\:38/vmcore -s
WARNING: cpu 2: invalid NT_PRSTATUS note (n_type != NT_PRSTATUS)
WARNING: cpu 1: cannot find NT_PRSTATUS note
WARNING: cpu 2: cannot find NT_PRSTATUS note
crash> ps insmod
PID PPID CPU TASK ST %MEM VSZ RSS COMM
1684 1683 0 ffff06738f1cdd00 ZO 0.0 0 0 insmod
crash> bt 1684
PID: 1684 TASK: ffff06738f1cdd00 CPU: 0 COMMAND: "insmod"
(no stack)
crash>
Thanks.
Lianbo
The root cause is that while the commit added a sanity check into
STACK_OFFSET_TYPE() to validate if a given candidate address of any
interrupt or exception frame is contained within the range of the
corresponding kernel stack, the premise that the STACK_OFFSET_TYPE()
should not return out-of-the-buffer address, is wrong.
Reexamining the relevant surrounding part of the backtracing code, it
looks to me now that the STACK_OFFSET_TYPE() is originally expected to
return an out-of-the-buffer address, like the address of the top of
the corresponding kernel stack, e.g. at here:
static int
arm64_in_kdump_text(struct bt_info *bt, struct arm64_stackframe *frame)
{
...
if (bt->flags & BT_USER_SPACE)
start = (ulong *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(bt->stacktop))];
else {
Note that the above bt 1734 aborts here.
Hence, the current implementation policy around STACK_OFFSET_TYPE()
looks that the caller side is responsible for understanding the fact
in advance and for avoiding making buffer overrun carefully.
To fix this issue, revert the commit.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
---
defs.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/defs.h b/defs.h
index bfda0c4..3e7d6cf 100644
--- a/defs.h
+++ b/defs.h
@@ -976,10 +976,7 @@ struct bt_info {
#define STACK_OFFSET_TYPE(OFF) \
(((ulong)(OFF) > STACKSIZE()) ? \
- (((ulong)(OFF) < (ulong)(bt->stackbase) || (ulong)(OFF) >= (ulong)(bt->stackbase) + STACKSIZE()) ? \
- error(FATAL, "invalid stack pointer is given\n") : \
- (ulong)((ulong)(OFF) - (ulong)(bt->stackbase))) : \
- (ulong)(OFF))
+ (ulong)((ulong)(OFF) - (ulong)(bt->stackbase)) : (ulong)(OFF))
#define GET_STACK_ULONG(OFF) \
*((ulong *)((char *)(&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(OFF))])))
--
2.25.1