Hi Aditya,
On Tue, Oct 8, 2024 at 10:18 PM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
Hi Tao,
On 03/10/24 16:04, Tao Liu wrote:
> Hi Aditya & lianbo,
>
> On Mon, Sep 23, 2024 at 5:00 AM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
>> Hi Tao,
>>
>> On 18/09/24 05:12, Tao Liu wrote:
>>> Previously the newsp read from stack is unchecked, it would expect
>>> the newsp hold a valid sp pointer of the next stack frame by default,
>>> which is not always true, see the following stack:
>>>
>>> R1: c00000000a327660 NIP: c0000000002b9984
>>> #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
>>> c00000000a327660: c00000000015a7b0 c000000001bc8600
>>> c00000000a327670: 0000000000000000 c000000002e13b54
>>> c00000000a327680: c00000000a327850 0000000000004000
>>> c00000000a327690: c0000000002ba078 c0000000026f2ca8
>>>
>>> The newsp will be c00000000015a7b0, which is not a valid stack pointer
>>> and as a result it will fail the stack back trace afterwards. This usually
>>> happens at the beginning of back trace, once we get the correct sp value,
>>> the stack can be back traced successfully.
>> Seems weird that this issue occurred in the first place. The stack frame
>> should have a valid previous sp at 0th offset to sp, according to the
>> PowerISA document.
> I finally figured out the reason for the faulty stack. The reason is
> due to gcc, the previous inlined function crash_setup_regs is no
> longer inlined, which led to a malformed stack, see the following:
>
> crash> bt -f
> R1: c00000000a327660 NIP: c0000000002b9984
> [NIP : crash_setup_regs+68]
> [LR : __crash_kexec+156]
> #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
> c00000000a327660: c00000000015a7b0 c000000001bc8600
> c00000000a327670: 0000000000000000 c000000002e13b54
>
> crash> dis __crash_kexec
> ...
> 0xc0000000002ba054 <__crash_kexec+148>: addi r3,r1,40
> 0xc0000000002ba058 <__crash_kexec+152>: bl 0xc0000000002b9948
> <crash_setup_regs+8>
> 0xc0000000002ba05c <__crash_kexec+156>: bl 0xc0000000002b7c58
> <crash_save_vmcoreinfo+8>
> 0xc0000000002ba060 <__crash_kexec+160>: nop
> 0xc0000000002ba064 <__crash_kexec+164>: addi r3,r1,40
> 0xc0000000002ba068 <__crash_kexec+168>: bl 0xc00000000015a6e8
> <machine_crash_shutdown+8>
> 0xc0000000002ba06c <__crash_kexec+172>: nop
> 0xc0000000002ba070 <__crash_kexec+176>: ld r3,0(r29)
> 0xc0000000002ba074 <__crash_kexec+180>: bl 0xc00000000015a738
> <machine_kexec+8>
> 0xc0000000002ba078 <__crash_kexec+184>: nop
> ...
>
> crash> dis crash_setup_regs
> ...
> 0xc0000000002b9948 <crash_setup_regs+8>: mflr r0
> 0xc0000000002b994c <crash_setup_regs+12>: cmpdi r4,0
> 0xc0000000002b9950 <crash_setup_regs+16>: std r0,16(r1)
> 0xc0000000002b9954 <crash_setup_regs+20>: stdu r1,-32(r1)
> 0xc0000000002b9958 <crash_setup_regs+24>: beq
> 0xc0000000002b9980 <crash_setup_regs+64>
> ...
> 0xc0000000002b9980 <crash_setup_regs+64>: bl
> 0xc00000000007a964 <ppc_save_regs>
> 0xc0000000002b9984 <crash_setup_regs+68>: nop
> ...
>
> crash> dis ppc_save_regs
> 0xc00000000007a964 <ppc_save_regs>: addi r3,r3,-48
> 0xc00000000007a968 <ppc_save_regs+4>: std r0,48(r3)
> 0xc00000000007a96c <ppc_save_regs+8>: std r2,64(r3)
> ...
> 0xc00000000007a9fc <ppc_save_regs+152>: mflr r0
> 0xc00000000007aa00 <ppc_save_regs+156>: std r0,304(r3)
> ...
>
> As we can see, NIP is c0000000002b9984, which is right after "bl
> ppc_save_regs", which is expected because the NIP is assigned as the
> value of LR in instruction 0xc00000000007a9fc and 0xc00000000007aa00,
> and R1(sp) is also stored in ppc_save_regs. Since there is no stack
> push in ppc_save_regs, the stack pointer remains to be the same as
> crash_setup_regs()'s. crash_setup_regs() have stack push in
> 0xc0000000002b9954, so it has a lower stack frame than its parent
> function __crash_kexec.
>
> Then the execution will finally enter into machine_kexec():
>
> crash_setup_regs(&fixed_regs, regs);
> crash_save_vmcoreinfo();
> machine_crash_shutdown(&fixed_regs);
> machine_kexec(kexec_crash_image);
>
> So the stack frame will be modified, that's why the c00000000a327660
> address doesn't hold a valid stack address. However if function
> crash_setup_regs() is inlined, it will have the same stack as its
> parent function __crash_kexec, and later function calling will grow
> down the stack thus will not modify this stack frame.
>
> In fact, the crash_setup_regs() should always be inlined, in order to
> have the same stack frame as its parent function __crash_kexec()'s. We
> can see the function should be inlined in different archs:
>
> static inline void crash_setup_regs(...)
>
> But according to the disassembly above, crash_setup_regs() is not
> inlined, which leads to this bug. For the above kernel, it's
> 6.10.0-15.el10.ppc64le compiled by gcc (GCC) 14.1.1. If we compile the
> same kernel code with rhel9's gcc, aka gcc (GCC) 11.5.0, then the
> function can be inlined as expected.
>
> The same issue also exists in x86_64 arch. However I cannot dive into
> gcc code why newer versions will make the inline function like this.
> Or we can modify crash_setup_regs() to be the following:
>
> static inline void __attribute__((always_inline)) crash_setup_regs(...)
>
> Any thoughts?
Interesting. I had fixed a similar issue in the kernel earlier, with the
ppc64 register saving part.
Let's see, GCC apart, crash & gdb should have been able to handle this
inlining/not inlined right ?
Yes, inline or not inline is not a problem for crash/gdb. But for the
not inlined case, the data on the stack is malformed, aka the CPU
register state no longer matches with the state of stack. So when we
analyse the vmcore based on the register state and stack, there are
problems.
To make the story short, assume the following are assembly instructions:
__crash_kexec:
...
call crash_setup_regs <<-- save all registers status
call xxx
call yyy
call machine_kexec <<--- jump into the 2nd kernel
crash_setup_regs:
push sp <<----- prologue of function
...
sub sp, <stack_size> <<---- create a stack for this function
...
mov r0, <some_where> <<-- saving r0
mov r1, <some_where>
mov sp, <some_where> <<--- saving sp
...
if crash_setup_regs is not inlined, there will be a prologue of the
function, so sp will be decreased and saved. So the sp will represent
the stack frame of crash_setup_regs(). Since there are other functions
called after crash_setup_regs(), the same stack frame area will be
reused by them, and the data within the area will be modified. So the
stack status no longer matches with the saved cpu register status.
But if crash_setup_regs is inlined, there will be no prologue if
crash_setup_regs(), when sp is saved, it represents __crash_kexec()'s
stack, which will not be modified by calling sub functions xxx/yyy, so
the cpu registers matches with stack, thus no problem for later vmcore
analyze.
Thanks,
Tao Liu
Will look into it. Currently on leave till next 2 days. Will look after
that.
Sure, thanks for your help!
Thanks,
Tao Liu
Thanks,
Aditya Gupta
>
> Thanks,
> Tao Liu
>
>
>
>
>
>
>
>> Anyways this patch is also good, doesn't seem problematic for the
>> previous working cases.
>>
>>
>> Thanks,
>>
>> Aditya Gupta
>>
>>> This patch will search and check for the valid newsp at the beginning of
>>> stack back trace. For the above example, a valid newsp would be assigned
>>> as c00000000a327850.
>>>
>>> Before:
>>> crash> bt
>>> ...
>>> #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
>>> After:
>>> crash> bt
>>> ...
>>> #0 [c00000000a327660] crash_setup_regs at c0000000002b9984
>>> #1 [c00000000a327850] panic at c000000000167de4
>>> #2 [c00000000a3278f0] sysrq_handle_crash at c000000000a4d3a8
>>> #3 [c00000000a327950] __handle_sysrq at c000000000a4dec4
>>> #4 [c00000000a3279f0] write_sysrq_trigger at c000000000a4e984
>>> ...
>>> Signed-off-by: Tao Liu <ltao(a)redhat.com>
>>> ---
>>> ppc64.c | 13 +++++++++++++
>>> 1 file changed, 13 insertions(+)
>>>
>>> diff --git a/ppc64.c b/ppc64.c
>>> index 6e5f155..8af7b8a 100644
>>> --- a/ppc64.c
>>> +++ b/ppc64.c
>>> @@ -2148,6 +2148,19 @@ ppc64_back_trace(struct gnu_request *req, struct
bt_info *bt)
>>>
>>> while (INSTACK(req->sp, bt)) {
>>> newsp = *(ulong *)&bt->stackbuf[req->sp -
bt->stackbase];
>>> +
>>> + if (!newpc) {
>>> + ulong *up;
>>> + int i;
>>> + for (i = 0, up = (ulong
*)&bt->stackbuf[req->sp - bt->stackbase];
>>> + i < (req->sp - bt->stackbase) /
sizeof(ulong); i++, up++) {
>>> + if (INSTACK(*up, bt) &&
is_kernel_text(*(up + 2))) {
>>> + newsp = *up;
>>> + break;
>>> + }
>>> + }
>>> + }
>>> +
>>> if ((req->name = closest_symbol(req->pc)) == NULL) {
>>> if (CRASHDEBUG(1)) {
>>> error(FATAL,