Thank you for pointing out this issue, Kazu.
On 6/12/24 10:42 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Lianbo, Tao,
>
> On 2024/06/11 18:05, lijiang wrote:
>
>> > > With kernel patch [1], x86_64 will add extra padding for
>> kernel stack,
>> > > as a result, the pt_regs will be shift down by the offset
>> of padding.
>> > > Without the patch, the values of registers read from
>> pt_regs will be
>> > > incorrect.
>> > >
>> > > Though currently the TOP_OF_KERNEL_STACK_PADDING is
>> configured by
>> > > Kconfig, according to kernel code comment [2], the value
>> may be made
>> > > dynamicly later. In addition there might be systems
>> compiled without
>> > > Kconfig avaliable. So in this patch, we will calculate the
>> value of
>> > > TOP_OF_KERNEL_STACK_PADDING.
>> > >
>> > > The calculation is as follows:
>> > >
>> > > 1) in startup_64(), there is a lea instruction as:
>> > > leaq (__end_init_task - TOP_OF_KERNEL_STACK_PADDING -
>> PTREGS_SIZE)(%rip), %rsp
>> > >
>> > > 2) in rewind_stack_and_make_dead(), there is a lea
>> instruction as:
>> > > leaq -PTREGS_SIZE(%rax), %rsp
>> > >
>> > > The disassembled 2 instructions will be like:
>> > >
>> > > 1) 0xffffffff93a0007d <startup_64+3>: lea
>> 0x1e03ec4(%rip),%rsp # 0xffffffff95803f48
>> > > ^^^^^^^^^^^^^^^^^^^^
>> > > 2) 0xffffffff93a0465a <rewind_stack_and_make_dead+10>:
>> lea -0xa8(%rax),%rsp
>> > > ^^^^
>> > > 0xffffffff95803f48 is the value of (__end_init_task -
>> > > TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE), and 0xa8 is
>> the value of
>> > > PTREGS_SIZE, __end_init_task can be get by symbol reading.
>> >
>> > Calculating the value of TOP_OF_KERNEL_STACK_PADDING, which
>> looks good, but it heavily relies on compiler.
>> > Normally we would use this way unless there is no other choice.
>> >
>> > How about the following changes? Although it doesn't handle
>> the case that the value is dynamic, let's see
>> > how to change in the kernel in future, and then consider how
>> to reflect it in crash-utility.
>> >
>> Sure, looks good to me, so let's go with this, and update it later
>> when kernel changes.
>>
>>
>> Ok. Thanks, Tao.
>>
>> Applied with minor changes:
>>
>>
https://github.com/crash-utility/crash/commit/48764a14bc5856f0b0bb3068533...
>>
<
https://github.com/crash-utility/crash/commit/48764a14bc5856f0b0bb3068533...
>>
> It looks like there is a regression with kernels without
> CONFIG_X86_FRED.
> Could you check?
Can you help to check if this can work for you?
diff --git a/x86_64.c b/x86_64.c
index 6777c93e6b47..469d26b05e24 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -4086,7 +4086,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8)
: 0;
+ long stack_padding_size = VALID_SIZE(fred_frame) ?
(2*8) : 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
@@ -4408,7 +4408,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8)
: 0;diff --git a/x86_64.c b/x86_64.c
index 6777c93e6b47..469d26b05e24 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -4086,7 +4086,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8)
: 0;
+ long stack_padding_size = VALID_SIZE(fred_frame) ?
(2*8) : 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
@@ -4408,7 +4408,7 @@ in_exception_stack:
if (!irq_eframe && !is_kernel_thread(bt->tc->task) &&
(GET_STACKBASE(bt->tc->task) == bt->stackbase)) {
- long stack_padding_size = SIZE(fred_frame) > 0 ? (2*8)
: 0;
+ long stack_padding_size = VALID_SIZE(fred_frame) ?
(2*8) : 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
+ long stack_padding_size = VALID_SIZE(fred_frame) ?
(2*8) : 0;
user_mode_eframe = bt->stacktop - SIZE(pt_regs);
if (last_process_stack_eframe < user_mode_eframe)
x86_64_exception_frame(EFRAME_PRINT, 0,
bt->stackbuf +
I will fix it later with the patch if it works fine from your side.
Thanks
Lianbo
>
> crash> bt 1
>
> bt: invalid structure size: fred_frame
> FILE: x86_64.c LINE: 4089 FUNCTION:
> x86_64_low_budget_back_trace_cmd()
>
> [/home/k-hagio/bin/crash] error trace: 588df3 => 5cbc72 => 5eb3e1 =>
> 5eb366
> PID: 1 TASK: ffff9f94c024b980 CPU: 2 COMMAND: "systemd"
> #0 [ffffade44001bca8] __schedule at ffffffffb948ebbb
> #1 [ffffade44001bd10] schedule at ffffffffb948f04d
> #2 [ffffade44001bd20] schedule_hrtimeout_range_clock at
> ffffffffb9494fef
> #3 [ffffade44001bda8] ep_poll at ffffffffb8c91be8
> #4 [ffffade44001be48] do_epoll_wait at ffffffffb8c91d11
> #5 [ffffade44001be80] __x64_sys_epoll_wait at ffffffffb8c92590
> #6 [ffffade44001bed0] do_syscall_64 at ffffffffb947f459
> #7 [ffffade44001bf50] entry_SYSCALL_64_after_hwframe at
> ffffffffb96000ea
>
> 5eb366: SIZE_verify.part.42+70
> 5eb3e1: SIZE_verify+49
> 5cbc72: x86_64_low_budget_back_trace_cmd+3010
> 588df3: back_trace+1523
>
> bt: invalid structure size: fred_frame
> FILE: x86_64.c LINE: 4089 FUNCTION:
> x86_64_low_budget_back_trace_cmd()
>
> crash> sys
> KERNEL: vmlinux
> DUMPFILE: vmcore
> CPUS: 4
> DATE: Wed Jun 12 02:28:24 JST 2024
> UPTIME: 8 days, 22:17:18
> LOAD AVERAGE: 0.03, 0.01, 0.00
> TASKS: 182
> NODENAME: rhel94u
> RELEASE: 5.14.0-427.13.1.el9_4.x86_64
> VERSION: #1 SMP PREEMPT_DYNAMIC Wed Apr 10 10:29:16 EDT 2024
> MACHINE: x86_64 (3408 Mhz)
> MEMORY: 4 GB
> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>
> Thanks,
> Kazu