Re: [Crash-utility] [PATCH] arm64: fix backtraces of KASAN kernel dumpfile truncated

Wednesday, 7 December 2022

Hi Kazu,

On 2022/12/5 9:05, HAGIO KAZUHITO(萩尾 一仁) wrote:
...
 On 2022/12/02 17:31, dinghui wrote:
> On 2022/12/2 15:44, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> On 2022/12/01 16:01, Ding Hui wrote:
>>> We found that the "bt" command displays truncated backtraces on a
>>> KASAN kernel vmcore, like this:
>>>
>>> crash> bt
>>> PID: 4131   TASK: ffff8001521df000  CPU: 3   COMMAND: "bash"
>>>     #0 [ffff2000224b0cb0] machine_kexec_prepare at ffff2000200bff4c
>>>
>>> After digging into the root cause, it turns out that arm64_in_kdump_text()
>>> found the wrong bt->bptr in the "machine_kexec" branch.
>>>
>>> Disassemble machine_kexec() of KASAN vmlinux (gcc 7.3.0):
>>>
>>> crash> dis -x machine_kexec
>>> 0xffff2000200bff50 <machine_kexec>:     stp     x29, x30, [sp,#-208]!
>>> 0xffff2000200bff54 <machine_kexec+0x4>: mov     x29, sp
>>> 0xffff2000200bff58 <machine_kexec+0x8>: stp     x19, x20, [sp,#16]
>>> 0xffff2000200bff5c <machine_kexec+0xc>: str     x24, [sp,#56]
>>> 0xffff2000200bff60 <machine_kexec+0x10>:        str     x26, [sp,#72]
>>> 0xffff2000200bff64 <machine_kexec+0x14>:        mov     x2, #0x8ab3
>>> 0xffff2000200bff68 <machine_kexec+0x18>:        add     x1, x29, #0x70
>>> 0xffff2000200bff6c <machine_kexec+0x1c>:        lsr     x1, x1, #3
>>> 0xffff2000200bff70 <machine_kexec+0x20>:        movk    x2, #0x41b5, lsl #16
>>> 0xffff2000200bff74 <machine_kexec+0x24>:        mov     x19, #0x200000000000
>>> 0xffff2000200bff78 <machine_kexec+0x28>:        adrp    x3, 0xffff2000224b0000
>>> 0xffff2000200bff7c <machine_kexec+0x2c>:        movk    x19, #0xdfff, lsl #48
>>> 0xffff2000200bff80 <machine_kexec+0x30>:        add     x3, x3, #0xcb0
>>> 0xffff2000200bff84 <machine_kexec+0x34>:        add     x4, x1, x19
>>> 0xffff2000200bff88 <machine_kexec+0x38>:        stp     x2, x3, [x29,#112]
>>> 0xffff2000200bff8c <machine_kexec+0x3c>:        adrp    x2, 0xffff2000200bf000 <swsusp_arch_resume+0x1e8>
>>> 0xffff2000200bff90 <machine_kexec+0x40>:        add     x2, x2, #0xf50
>>> 0xffff2000200bff94 <machine_kexec+0x44>:        str     x2, [x29,#128]
>>> 0xffff2000200bff98 <machine_kexec+0x48>:        mov     w2, #0xf1f1f1f1
>>> 0xffff2000200bff9c <machine_kexec+0x4c>:        str     w2, [x1,x19]
>>> 0xffff2000200bffa0 <machine_kexec+0x50>:        mov     w2, #0xf200
>>> 0xffff2000200bffa4 <machine_kexec+0x54>:        mov     w1, #0xf3f3f3f3
>>> 0xffff2000200bffa8 <machine_kexec+0x58>:        movk    w2, #0xf2f2, lsl #16
>>> 0xffff2000200bffac <machine_kexec+0x5c>:        stp     w2, w1, [x4,#4]
>>>
>>> We notice that:
>>> 1. machine_kexec() start address is 0xffff2000200bff50
>>> 2. the instruction at machine_kexec+0x44 stores the same value
>>>       0xffff2000200bff50 (which comes from 0xffff2000200bf000 + 0xf50)
>>>       into stack position [x29,#128].
>>>
>>> When arm64_in_kdump_text() searches the stack for the LR, it meets
>>> 0xffff2000200bff50 first, so it gets the wrong bt->bptr.
>>>
>>> We know that the real LR is always greater than the start address
>>
>> Seems true.
>>
>> One question, do you see which kernel code stores that value?
>>
>
> Actually, no C code stores that value. The source code looks like this:
>
> void machine_kexec(struct kimage *kimage)
> {
>       phys_addr_t reboot_code_buffer_phys;
>       void *reboot_code_buffer;
>       bool in_kexec_crash = (kimage == kexec_crash_image);
>       bool stuck_cpus = cpus_are_stuck_in_kernel();
>
>       BUG_ON(!in_kexec_crash && (stuck_cpus || (num_online_cpus() > 1)));
>       WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
>           "Some CPUs may be stale, kdump will be unreliable.\n");
> ...
>
> The point is CONFIG_KASAN=y
>
> I compared the gcc args used to compile machine_kexec.o with KASAN enabled [1] and
> with KASAN enabled but KASAN_SANITIZE_machine_kexec.o := n set [2]. The difference is:
>
> [1]: -fsanitize=kernel-address -fasan-shadow-offset=0xdfff200000000000
>      --param asan-globals=1 --param asan-instrumentation-with-call-threshold=10000
>      --param asan-stack=1
>
> [2]: -fno-builtin
>
> If I remove `--param asan-stack=1` but keep the other asan args when compiling
> machine_kexec.o, those assembly instructions disappear.
>

 I see, thanks.

 I can see a similar pattern with CONFIG_KASAN=y on x86_64 as well, which
 stores the function start address and uses 0xf1f1f1f1 (ASAN_STACK_MAGIC_LEFT
 in gcc), etc.

 (gdb) disas machine_kexec
 Dump of assembler code for function machine_kexec:
      0xffffffff8109b1c0 <+0>:     callq  0xffffffff81099e60 <__fentry__>
 ...
      0xffffffff8109b208 <+72>:    movq   $0xffffffff8109b1c0,0x20(%rsp)
      0xffffffff8109b211 <+81>:    add    %r12,%rax
      0xffffffff8109b214 <+84>:    movl   $0xf1f1f1f1,(%rax)

 (gdb) disas crash_save_cpu
 Dump of assembler code for function crash_save_cpu:
      0xffffffff8126e7e0 <+0>:     callq  0xffffffff81099e60 <__fentry__>
 ...
      0xffffffff8126e817 <+55>:    movq   $0xffffffff8126e7e0,0x10(%rsp)
      0xffffffff8126e820 <+64>:    add    %rbp,%rax
      0xffffffff8126e823 <+67>:    movl   $0xf1f1f1f1,(%rax)

 I wondered whether excluding only their start address was enough to fix
 the issue, but now it seems ok to me.

I found a description of asan-stack here:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/asan.cc;h=dc7b7f4bcf1803d...

  139    The 32 bytes of LEFT red zone at the bottom of the stack can be
  140    decomposed as such:
...
  156      3/ The following 8 bytes contain the PC of the current function which
  157      will be used by the run-time library to print an error message.
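
That matches the disassembly above: the words stored at [x29,#112] and [x29,#120] are 0x41b58ab3 and 0xffff2000224b0cb0, followed by the function's own start address at [x29,#128]. As a minimal sketch of why the comparison against the start address should be strict (this is not the actual crash code; find_lr_slot() and its parameters are hypothetical names used only for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Scan stack words for the first value that looks like a return
 * address inside the function [func_start, func_end).
 * Returns the index of that word, or -1 if none is found.
 */
static long find_lr_slot(const uint64_t *stack, size_t nwords,
                         uint64_t func_start, uint64_t func_end)
{
    for (size_t i = 0; i < nwords; i++) {
        /*
         * Strictly greater than func_start: a real LR points past
         * the first instruction, while the asan-stack left red zone
         * holds a word exactly equal to func_start.
         */
        if (stack[i] > func_start && stack[i] < func_end)
            return (long)i;
    }
    return -1;
}
```

With a ">=" comparison the scan would stop at the red zone word that equals the function start address, which is the wrong-bptr case described above.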

...
 Acked-by: Kazuhito Hagio <k-hagio-ab(a)nec.com>

 Let's wait for Lianbo's test and review.

 Thanks,
 Kazu 
-- 
Thanks,
- Ding Hui
