On Mon, Jun 12, 2023 at 7:44 PM Daisuke Hatayama (Fujitsu) <d.hatayama@fujitsu.com> wrote:
> I still have one question: why does only this call site need to be
> fixed, while the others do not (is it certain they cannot go out of
> range)? STACK_OFFSET_TYPE() is invoked multiple times in arm64.c, and
> similar calls can be seen on other arches (grep -nr "GET_STACK_ULONG"
> *.c or grep -nr "GET_STACK_DATA" *.c).
>
> # grep -nr "STACK_OFFSET_TYPE" *.c
> arm64.c:2384:        regs = (struct arm64_pt_regs *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(stkptr))];
> arm64.c:2821: ptregs = (struct arm64_pt_regs *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(orig_sp))];
> arm64.c:3476: base = (ulong *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(bt->stackbase))];
> arm64.c:3478: start = (ulong *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(bt->stacktop))];
> arm64.c:3481: start = (ulong *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(frame->fp))];
> arm64.c:3483: start = (ulong *)&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(bt->stacktop))];
> arm64.c:3801: &bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(sp))];
> arm64.c:3822:       &bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(pt_regs))];
> x86.c:1075: if (STACK_OFFSET_TYPE(ep->eframe_addr) > STACKSIZE())
> # grep -nr "STACK_OFFSET_TYPE" *.h
> defs.h:977:#define STACK_OFFSET_TYPE(OFF) \
> defs.h:985: *((ulong *)((char *)(&bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(OFF))])))
> defs.h:988:    (void *)(&bt->stackbuf[(ulong)STACK_OFFSET_TYPE(OFF)]), (size_t)(SZ))

As explained in the patch description, STACK_OFFSET_TYPE() is used in a
different context at each occurrence. Checking whether each one is
implemented correctly requires understanding what each call site does. I
don't know whether there is another place where the value returned by
STACK_OFFSET_TYPE() is handled incorrectly. My quick look didn't find
any part that might touch an invalid range of memory. The reason I'm
trying to fix arm64_is_kernel_exception_frame() is that I found the
issue there.

Got it. Thanks.


So far I haven't observed this issue on my side. As you mentioned, the
corrupt stack pointer address may be caused by a kernel bug or a
hardware issue. Since the real cause of the corruption is not clear, how
about adding some debugging information when the check fails? Just an
idea, HATAYAMA and Kazu.

+       if (stkptr > STACKSIZE() && !INSTACK(stkptr, bt)) {
+               if (CRASHDEBUG(1))
+                       error(WARNING, "The stkptr(0x%lx) is an address outside the range of kernel stack.\n", stkptr);
+               return FALSE;
+       }
+
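To illustrate why the check matters, here is a minimal standalone sketch. It is not the crash source: struct bt_model, stack_offset(), in_stack(), stkptr_is_usable() and the STACK_SIZE value are hypothetical stand-ins, and the offset rule (small values are taken as offsets, larger values are made relative to the stack base) is my assumption about how STACK_OFFSET_TYPE() behaves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stack size, standing in for STACKSIZE(). */
#define STACK_SIZE UINT64_C(16384)

/* Hypothetical, reduced stand-in for the backtrace context. */
struct bt_model {
	uint64_t stackbase;	/* lowest address of the stack */
	uint64_t stacktop;	/* one past the highest address */
};

/*
 * Assumed model of the offset computation: values no larger than the
 * stack size are already offsets, anything else is treated as an
 * address and made relative to stackbase. No range check happens here.
 */
static uint64_t stack_offset(const struct bt_model *bt, uint64_t off)
{
	return (off > STACK_SIZE) ? off - bt->stackbase : off;
}

/* Assumed model of the in-stack test: stackbase <= addr < stacktop. */
static bool in_stack(const struct bt_model *bt, uint64_t addr)
{
	return addr >= bt->stackbase && addr < bt->stacktop;
}

/*
 * Same condition as the proposed hunk: reject a value that looks like
 * an address but does not lie inside the stack, before it can be used
 * to index a STACK_SIZE-byte buffer.
 */
static bool stkptr_is_usable(const struct bt_model *bt, uint64_t stkptr)
{
	if (stkptr > STACK_SIZE && !in_stack(bt, stkptr))
		return false;
	return true;
}
```

Under these assumptions, a corrupt stkptr above stacktop yields an offset far larger than the stack buffer, so the guard has to run before the buffer is indexed. A value that is already a small offset, or an address inside the stack, passes unchanged.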


Thanks.
Lianbo

Thanks.
HATAYAMA, Daisuke