Thank you for pointing out this issue, Kazu.
On 6/12/24 10:42 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
Hi Lianbo, Tao,
On 2024/06/11 18:05, lijiang wrote:
> > > With kernel patch [1], x86_64 will add extra padding for kernel
stack,
> > > as a result, the pt_regs will be shift down by the offset of
padding.
> > > Without the patch, the values of registers read from pt_regs will be
> > > incorrect.
> > >
> > > Though currently the TOP_OF_KERNEL_STACK_PADDING is configured by
> > > Kconfig, according to kernel code comment [2], the value may be made
> > > dynamicly later. In addition there might be systems compiled without
> > > Kconfig avaliable. So in this patch, we will calculate the value of
> > > TOP_OF_KERNEL_STACK_PADDING.
> > >
> > > The calculation is as follows:
> > >
> > > 1) in startup_64(), there is a lea instruction as:
> > > leaq (__end_init_task - TOP_OF_KERNEL_STACK_PADDING -
PTREGS_SIZE)(%rip), %rsp
> > >
> > > 2) in rewind_stack_and_make_dead(), there is a lea instruction as:
> > > leaq -PTREGS_SIZE(%rax), %rsp
> > >
> > > The disassembled 2 instructions will be like:
> > >
> > > 1) 0xffffffff93a0007d <startup_64+3>: lea
0x1e03ec4(%rip),%rsp # 0xffffffff95803f48
> > >
^^^^^^^^^^^^^^^^^^^^
> > > 2) 0xffffffff93a0465a <rewind_stack_and_make_dead+10>: lea
-0xa8(%rax),%rsp
> > >
^^^^
> > > 0xffffffff95803f48 is the value of (__end_init_task -
> > > TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE), and 0xa8 is the value of
> > > PTREGS_SIZE, __end_init_task can be get by symbol reading.
> >
> > Calculating the value of TOP_OF_KERNEL_STACK_PADDING, which looks good,
but it heavily relies on compiler.
> > Normally we would use this way unless there is no other choice.
> >
> > How about the following changes? Although it doesn't handle the case
that the value is dynamic, let's see
> > how to change in the kernel in future, and then consider how to reflect it
in crash-utility.
> >
> Sure, looks good to me, so let's go with this, and update it later
> when kernel changes.
>
>
> Ok. Thanks, Tao.
>
> Applied with minor changes:
>
>
https://github.com/crash-utility/crash/commit/48764a14bc5856f0b0bb3068533...
<
https://github.com/crash-utility/crash/commit/48764a14bc5856f0b0bb3068533...
It looks like there is a regression with kernels without CONFIG_X86_FRED.
Could you check?
Can you help to check if this can work for you?
diff --git a/x86_64.c b/x86_64.c
index 6777c93e6b47..469d26b05e24 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -4086,7 +4086,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8) : 0;
+ long stack_padding_size = VALID_SIZE(fred_frame) ? (2*8)
: 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
@@ -4408,7 +4408,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8) :
0;diff --git a/x86_64.c b/x86_64.c
index 6777c93e6b47..469d26b05e24 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -4086,7 +4086,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8) : 0;
+ long stack_padding_size = VALID_SIZE(fred_frame) ? (2*8)
: 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
@@ -4408,7 +4408,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8) : 0;
+ long stack_padding_size = VALID_SIZE(fred_frame) ? (2*8)
: 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
+ long stack_padding_size = VALID_SIZE(fred_frame) ? (2*8)
: 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
I will fix it later with the patch if it works fine from your side.
Thanks
Lianbo
crash> bt 1
bt: invalid structure size: fred_frame
FILE: x86_64.c LINE: 4089 FUNCTION: x86_64_low_budget_back_trace_cmd()
[/home/k-hagio/bin/crash] error trace: 588df3 => 5cbc72 => 5eb3e1 => 5eb366
PID: 1 TASK: ffff9f94c024b980 CPU: 2 COMMAND: "systemd"
#0 [ffffade44001bca8] __schedule at ffffffffb948ebbb
#1 [ffffade44001bd10] schedule at ffffffffb948f04d
#2 [ffffade44001bd20] schedule_hrtimeout_range_clock at ffffffffb9494fef
#3 [ffffade44001bda8] ep_poll at ffffffffb8c91be8
#4 [ffffade44001be48] do_epoll_wait at ffffffffb8c91d11
#5 [ffffade44001be80] __x64_sys_epoll_wait at ffffffffb8c92590
#6 [ffffade44001bed0] do_syscall_64 at ffffffffb947f459
#7 [ffffade44001bf50] entry_SYSCALL_64_after_hwframe at ffffffffb96000ea
5eb366: SIZE_verify.part.42+70
5eb3e1: SIZE_verify+49
5cbc72: x86_64_low_budget_back_trace_cmd+3010
588df3: back_trace+1523
bt: invalid structure size: fred_frame
FILE: x86_64.c LINE: 4089 FUNCTION: x86_64_low_budget_back_trace_cmd()
crash> sys
KERNEL: vmlinux
DUMPFILE: vmcore
CPUS: 4
DATE: Wed Jun 12 02:28:24 JST 2024
UPTIME: 8 days, 22:17:18
LOAD AVERAGE: 0.03, 0.01, 0.00
TASKS: 182
NODENAME: rhel94u
RELEASE: 5.14.0-427.13.1.el9_4.x86_64
VERSION: #1 SMP PREEMPT_DYNAMIC Wed Apr 10 10:29:16 EDT 2024
MACHINE: x86_64 (3408 Mhz)
MEMORY: 4 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
Thanks,
Kazu