Hi Aditya & lianbo,
On Mon, Sep 23, 2024 at 5:00 AM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
Hi Tao,
On 18/09/24 05:12, Tao Liu wrote:
> Previously the newsp read from stack is unchecked, it would expect
> the newsp hold a valid sp pointer of the next stack frame by default,
> which is not always true, see the following stack:
>
> R1: c00000000a327660 NIP: c0000000002b9984
> #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
> c00000000a327660: c00000000015a7b0 c000000001bc8600
> c00000000a327670: 0000000000000000 c000000002e13b54
> c00000000a327680: c00000000a327850 0000000000004000
> c00000000a327690: c0000000002ba078 c0000000026f2ca8
>
> The newsp will be c00000000015a7b0, which is not a valid stack pointer
> and as a result it will fail the stack back trace afterwards. This usually
> happens at the beginning of back trace, once we get the correct sp value,
> the stack can be back traced successfully.
Seems weird that this issue occurred in the first place. The stack frame
should have a valid previous sp at 0th offset to sp, according to the
PowerISA document.
I finally figured out the reason for the faulty stack. The reason is
due to gcc, the previous inlined function crash_setup_regs is no
longer inlined, which led to a malformed stack, see the following:
crash> bt -f
R1: c00000000a327660 NIP: c0000000002b9984
[NIP : crash_setup_regs+68]
[LR : __crash_kexec+156]
#0 [c00000000a327660] crash_setup_regs at c0000000002b9984
c00000000a327660: c00000000015a7b0 c000000001bc8600
c00000000a327670: 0000000000000000 c000000002e13b54
crash> dis __crash_kexec
...
0xc0000000002ba054 <__crash_kexec+148>: addi r3,r1,40
0xc0000000002ba058 <__crash_kexec+152>: bl 0xc0000000002b9948
<crash_setup_regs+8>
0xc0000000002ba05c <__crash_kexec+156>: bl 0xc0000000002b7c58
<crash_save_vmcoreinfo+8>
0xc0000000002ba060 <__crash_kexec+160>: nop
0xc0000000002ba064 <__crash_kexec+164>: addi r3,r1,40
0xc0000000002ba068 <__crash_kexec+168>: bl 0xc00000000015a6e8
<machine_crash_shutdown+8>
0xc0000000002ba06c <__crash_kexec+172>: nop
0xc0000000002ba070 <__crash_kexec+176>: ld r3,0(r29)
0xc0000000002ba074 <__crash_kexec+180>: bl 0xc00000000015a738
<machine_kexec+8>
0xc0000000002ba078 <__crash_kexec+184>: nop
...
crash> dis crash_setup_regs
...
0xc0000000002b9948 <crash_setup_regs+8>: mflr r0
0xc0000000002b994c <crash_setup_regs+12>: cmpdi r4,0
0xc0000000002b9950 <crash_setup_regs+16>: std r0,16(r1)
0xc0000000002b9954 <crash_setup_regs+20>: stdu r1,-32(r1)
0xc0000000002b9958 <crash_setup_regs+24>: beq
0xc0000000002b9980 <crash_setup_regs+64>
...
0xc0000000002b9980 <crash_setup_regs+64>: bl
0xc00000000007a964 <ppc_save_regs>
0xc0000000002b9984 <crash_setup_regs+68>: nop
...
crash> dis ppc_save_regs
0xc00000000007a964 <ppc_save_regs>: addi r3,r3,-48
0xc00000000007a968 <ppc_save_regs+4>: std r0,48(r3)
0xc00000000007a96c <ppc_save_regs+8>: std r2,64(r3)
...
0xc00000000007a9fc <ppc_save_regs+152>: mflr r0
0xc00000000007aa00 <ppc_save_regs+156>: std r0,304(r3)
...
As we can see, NIP is c0000000002b9984, which is right after "bl
ppc_save_regs", which is expected because the NIP is assigned as the
value of LR in instruction 0xc00000000007a9fc and 0xc00000000007aa00,
and R1(sp) is also stored in ppc_save_regs. Since there is no stack
push in ppc_save_regs, the stack pointer remains to be the same as
crash_setup_regs()'s. crash_setup_regs() have stack push in
0xc0000000002b9954, so it has a lower stack frame than its parent
function __crash_kexec.
Then the execution will finally enter into machine_kexec():
crash_setup_regs(&fixed_regs, regs);
crash_save_vmcoreinfo();
machine_crash_shutdown(&fixed_regs);
machine_kexec(kexec_crash_image);
So the stack frame will be modified, that's why the c00000000a327660
address doesn't hold a valid stack address. However if function
crash_setup_regs() is inlined, it will have the same stack as its
parent function __crash_kexec, and later function calling will grow
down the stack thus will not modify this stack frame.
In fact, the crash_setup_regs() should always be inlined, in order to
have the same stack frame as its parent function __crash_kexec()'s. We
can see the function should be inlined in different archs:
static inline void crash_setup_regs(...)
But according to the disassembly above, crash_setup_regs() is not
inlined, which leads to this bug. For the above kernel, it's
6.10.0-15.el10.ppc64le compiled by gcc (GCC) 14.1.1. If we compile the
same kernel code with rhel9's gcc, aka gcc (GCC) 11.5.0, then the
function can be inlined as expected.
The same issue also exists in x86_64 arch. However I cannot dive into
gcc code why newer versions will make the inline function like this.
Or we can modify crash_setup_regs() to be the following:
static inline void __attribute__((always_inline)) crash_setup_regs(...)
Any thoughts?
Thanks,
Tao Liu
Anyways this patch is also good, doesn't seem problematic for the
previous working cases.
Thanks,
Aditya Gupta
> This patch will search and check for the valid newsp at the beginning of
> stack back trace. For the above example, a valid newsp would be assigned
> as c00000000a327850.
>
> Before:
> crash> bt
> ...
> #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
> After:
> crash> bt
> ...
> #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
> #1 [c00000000a327850] panic at c000000000167de4
> #2 [c00000000a3278f0] sysrq_handle_crash at c000000000a4d3a8
> #3 [c00000000a327950] __handle_sysrq at c000000000a4dec4
> #4 [c00000000a3279f0] write_sysrq_trigger at c000000000a4e984
> ...
> Signed-off-by: Tao Liu <ltao(a)redhat.com>
> ---
> ppc64.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/ppc64.c b/ppc64.c
> index 6e5f155..8af7b8a 100644
> --- a/ppc64.c
> +++ b/ppc64.c
> @@ -2148,6 +2148,19 @@ ppc64_back_trace(struct gnu_request *req, struct bt_info
*bt)
>
> while (INSTACK(req->sp, bt)) {
> newsp = *(ulong *)&bt->stackbuf[req->sp -
bt->stackbase];
> +
> + if (!newpc) {
> + ulong *up;
> + int i;
> + for (i = 0, up = (ulong *)&bt->stackbuf[req->sp -
bt->stackbase];
> + i < (req->sp - bt->stackbase) / sizeof(ulong);
i++, up++) {
> + if (INSTACK(*up, bt) && is_kernel_text(*(up +
2))) {
> + newsp = *up;
> + break;
> + }
> + }
> + }
> +
> if ((req->name = closest_symbol(req->pc)) == NULL) {
> if (CRASHDEBUG(1)) {
> error(FATAL,