Hi Lianbo, Tao,
On 2024/06/11 18:05, lijiang wrote:
> > With kernel patch [1], x86_64 will add extra padding for kernel stack,
> > as a result, the pt_regs will be shifted down by the offset of padding.
> > Without the patch, the values of registers read from pt_regs will be
> > incorrect.
> >
> > Though currently the TOP_OF_KERNEL_STACK_PADDING is configured by
> > Kconfig, according to kernel code comment [2], the value may be made
> > dynamically later. In addition there might be systems compiled without
> > Kconfig available. So in this patch, we will calculate the value of
> > TOP_OF_KERNEL_STACK_PADDING.
> >
> > The calculation is as follows:
> >
> > 1) in startup_64(), there is a lea instruction as:
> > leaq (__end_init_task - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE)(%rip), %rsp
> >
> > 2) in rewind_stack_and_make_dead(), there is a lea instruction as:
> > leaq -PTREGS_SIZE(%rax), %rsp
> >
> > The disassembled 2 instructions will be like:
> >
> > 1) 0xffffffff93a0007d <startup_64+3>: lea 0x1e03ec4(%rip),%rsp # 0xffffffff95803f48
> >                                                                ^^^^^^^^^^^^^^^^^^^^
> >
> > 2) 0xffffffff93a0465a <rewind_stack_and_make_dead+10>: lea -0xa8(%rax),%rsp
> >                                                             ^^^^
> > 0xffffffff95803f48 is the value of (__end_init_task -
> > TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE), and 0xa8 is the value of
> > PTREGS_SIZE. __end_init_task can be obtained by reading the symbol.
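
The arithmetic described above can be sketched in C as follows. This is only
an illustration under stated assumptions, not the code from the patch: the
helper names rip_relative_target() and top_of_stack_padding() are hypothetical,
and the 7-byte length assumes the usual encoding of "lea disp32(%rip), %rsp"
(REX.W prefix, opcode, ModRM, 32-bit displacement).

```c
#include <stdint.h>

/* Assumed length in bytes of `lea disp32(%rip), %rsp`:
 * REX.W (1) + opcode (1) + ModRM (1) + disp32 (4). */
#define LEA_RIP_REL_LEN 7

/* Hypothetical helper: a RIP-relative operand is resolved against the
 * address of the NEXT instruction, i.e. insn_addr + instruction length. */
static uint64_t rip_relative_target(uint64_t insn_addr, int32_t disp)
{
    return insn_addr + LEA_RIP_REL_LEN + (uint64_t)(int64_t)disp;
}

/* Hypothetical helper: the startup_64 lea target equals
 *   __end_init_task - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE
 * so the padding is whatever remains after removing PTREGS_SIZE
 * and the lea target from __end_init_task. */
static uint64_t top_of_stack_padding(uint64_t end_init_task,
                                     uint64_t lea_target,
                                     uint64_t ptregs_size)
{
    return end_init_task - ptregs_size - lea_target;
}
```

With the disassembly quoted above, rip_relative_target(0xffffffff93a0007d,
0x1e03ec4) gives 0xffffffff95803f48, matching the address in the comment of
the first instruction. The value of __end_init_task is not shown in this
thread, so any padding computed from it here would rest on an assumed
symbol address.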
>
> Calculating the value of TOP_OF_KERNEL_STACK_PADDING looks good, but it
> relies heavily on the compiler.
> Normally we would not use this approach unless there is no other choice.
>
> How about the following changes? Although it doesn't handle the case that
> the value is dynamic, let's see how it changes in the kernel in the future,
> and then consider how to reflect it in crash-utility.
>
Sure, looks good to me, so let's go with this, and update it later
when the kernel changes.
Ok. Thanks, Tao.
Applied with minor changes:
https://github.com/crash-utility/crash/commit/48764a14bc5856f0b0bb3068533...
It looks like there is a regression with kernels without CONFIG_X86_FRED.
Could you check?
crash> bt 1
bt: invalid structure size: fred_frame
FILE: x86_64.c LINE: 4089 FUNCTION: x86_64_low_budget_back_trace_cmd()
[/home/k-hagio/bin/crash] error trace: 588df3 => 5cbc72 => 5eb3e1 => 5eb366
PID: 1 TASK: ffff9f94c024b980 CPU: 2 COMMAND: "systemd"
#0 [ffffade44001bca8] __schedule at ffffffffb948ebbb
#1 [ffffade44001bd10] schedule at ffffffffb948f04d
#2 [ffffade44001bd20] schedule_hrtimeout_range_clock at ffffffffb9494fef
#3 [ffffade44001bda8] ep_poll at ffffffffb8c91be8
#4 [ffffade44001be48] do_epoll_wait at ffffffffb8c91d11
#5 [ffffade44001be80] __x64_sys_epoll_wait at ffffffffb8c92590
#6 [ffffade44001bed0] do_syscall_64 at ffffffffb947f459
#7 [ffffade44001bf50] entry_SYSCALL_64_after_hwframe at ffffffffb96000ea
5eb366: SIZE_verify.part.42+70
5eb3e1: SIZE_verify+49
5cbc72: x86_64_low_budget_back_trace_cmd+3010
588df3: back_trace+1523
bt: invalid structure size: fred_frame
FILE: x86_64.c LINE: 4089 FUNCTION: x86_64_low_budget_back_trace_cmd()
crash> sys
KERNEL: vmlinux
DUMPFILE: vmcore
CPUS: 4
DATE: Wed Jun 12 02:28:24 JST 2024
UPTIME: 8 days, 22:17:18
LOAD AVERAGE: 0.03, 0.01, 0.00
TASKS: 182
NODENAME: rhel94u
RELEASE: 5.14.0-427.13.1.el9_4.x86_64
VERSION: #1 SMP PREEMPT_DYNAMIC Wed Apr 10 10:29:16 EDT 2024
MACHINE: x86_64 (3408 Mhz)
MEMORY: 4 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
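
One way to avoid this kind of failure is to treat a missing structure as
"no padding" instead of asking for its size unconditionally. The sketch below
is a self-contained illustration, not crash-utility code: struct_size,
lookup_size() and kernel_stack_padding() are hypothetical names, and the
padding value is passed in by the caller because this thread does not state
it. In crash itself, a check such as VALID_SIZE(fred_frame) would presumably
play the role of the lookup.

```c
#include <stddef.h>
#include <string.h>

#define SIZE_UNKNOWN (-1L)   /* marker for "structure not in this kernel" */

/* Hypothetical stand-in for a debuginfo structure-size table. */
struct struct_size {
    const char *name;
    long size;
};

/* Return the recorded size of `name`, or SIZE_UNKNOWN if it is absent. */
static long lookup_size(const struct struct_size *tbl, size_t n,
                        const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].name, name) == 0)
            return tbl[i].size;
    return SIZE_UNKNOWN;
}

/* Apply the padding only when fred_frame exists in the kernel being
 * analyzed; otherwise fall back to 0 rather than raising an error. */
static long kernel_stack_padding(const struct struct_size *tbl, size_t n,
                                 long padding_if_present)
{
    if (lookup_size(tbl, n, "fred_frame") == SIZE_UNKNOWN)
        return 0;
    return padding_if_present;
}
```

For a kernel like the 5.14.0-427.13.1.el9_4 one shown above, where fred_frame
is not defined, this returns 0, so the pt_regs location would be computed as
it was before the padding change.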
Thanks,
Kazu