On Wed, Nov 29, 2023 at 8:45 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com
wrote:
On 2023/11/28 11:31, Lianbo Jiang wrote:
> Currently, the "bt pid" command may not print enough stack trace and the
> remaining frames will be truncated on x86_64. For example:
> Without the patch:
> crash> bt 493113
> PID: 493113 TASK: ff2e34ecbd3ca2c0 CPU: 27 COMMAND:
"sriov_fec_daemo"
> #0 [ff77abc4e81cfb08] __schedule at ffffffff81b239cb
> #1 [ff77abc4e81cfb70] schedule at ffffffff81b23e2d
> #2 [ff77abc4e81cfb88] schedule_timeout at ffffffff81b2c9e8
> RIP: 000000000047cdbb RSP: 000000c0000975a8 RFLAGS: 00000216
> RAX: ffffffffffffffda RBX: 000000c00004e000 RCX:
000000000047cdbb
> RDX: 000000000000000c RSI: 000000c000097798 RDI:
0000000000000009
> RBP: 000000c0000975f8 R8: 0000000000000001 R9:
000000c00098d680
> R10: 000000000000000c R11: 0000000000000216 R12:
000000c000097688
> R13: 0000000000000000 R14: 000000c0006c3520 R15:
00007f5e359946b7
> ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
> With the patch:
> crash> bt 493113
> PID: 493113 TASK: ff2e34ecbd3ca2c0 CPU: 27 COMMAND:
"sriov_fec_daemo"
> #0 [ff77abc4e81cfb08] __schedule at ffffffff81b239cb
> #1 [ff77abc4e81cfb70] schedule at ffffffff81b23e2d
> #2 [ff77abc4e81cfb88] schedule_timeout at ffffffff81b2c9e8
> #3 [ff77abc4e81cfc68] vfio_unregister_group_dev at ffffffffc10e76ae
[vfio]
> #4 [ff77abc4e81cfca8] vfio_pci_core_unregister_device at
ffffffffc11bb599 [vfio_pci_core]
> #5 [ff77abc4e81cfcc0] vfio_pci_remove at ffffffffc103e045 [vfio_pci]
> #6 [ff77abc4e81cfcd0] pci_device_remove at ffffffff815d7513
> #7 [ff77abc4e81cfcf0] device_release_driver_internal at
ffffffff81708baa
> #8 [ff77abc4e81cfd20] unbind_store at ffffffff81705f6f
> #9 [ff77abc4e81cfd50] kernfs_fop_write_iter at ffffffff81454bf1
> #10 [ff77abc4e81cfd88] new_sync_write at ffffffff813aad8c
> #11 [ff77abc4e81cfe20] vfs_write at ffffffff813adb36
> #12 [ff77abc4e81cfe58] ksys_write at ffffffff813adeb2
> #13 [ff77abc4e81cfe90] do_syscall_64 at ffffffff81b17159
> #14 [ff77abc4e81cff50] entry_SYSCALL_64_after_hwframe at
ffffffff81c0009b
> RIP: 000000000047cdbb RSP: 000000c0000975a8 RFLAGS: 00000216
> RAX: ffffffffffffffda RBX: 000000c00004e000 RCX:
000000000047cdbb
> RDX: 000000000000000c RSI: 000000c000097798 RDI:
0000000000000009
> RBP: 000000c0000975f8 R8: 0000000000000001 R9:
000000c00098d680
> R10: 000000000000000c R11: 0000000000000216 R12:
000000c000097688
> R13: 0000000000000000 R14: 000000c0006c3520 R15:
00007f5e359946b7
> ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
> Let's add a check function that jump to
schedule_timeout(), just like
> the schedule_timeout_*() in x86_64_function_called_by().
Thank you for the comments, Kazu.
Sorry, but could you elaborate the cause of this issue and the
reason
why the patch can fix it? I didn't get it with the commit log.
I did the debugging, and also saw that it gets the called function, but
it's not the target function, here, the target function is
schedule_timeout(). So, we need to correct it.
After the instructions 'x/i 0xffffffffc10e76a9', the stack trace was
truncated, but it is the actual back trace. Let's see the following output
for debugging:
crash> bt 493113
PID: 493113 TASK: ff2e34ecbd3ca2c0 CPU: 27 COMMAND: "sriov_fec_daemo"
#0 [ff77abc4e81cfb08] __schedule at ffffffff81b239cb
#1 [ff77abc4e81cfb70] schedule at ffffffff81b23e2d
bt: x86_64_function_called_by 4469: x/i 0xffffffff81b2c9e3
bt: x86_64_function_called_by 4475: 0xffffffff81b2c9e3
<schedule_timeout+131>: call 0xffffffff81b23dd0 <schedule
bt: x86_64_function_called_by 4488: value:ffffffff81b23dd0 offset:0
name:schedule
#2 [ff77abc4e81cfb88] schedule_timeout at ffffffff81b2c9e8
bt: x86_64_function_called_by 4469: x/i 0xffffffffc10e76a9
bt: x86_64_function_called_by 4475: 0xffffffffc10e76a9
<vfio_unregister_group_dev+121>: call 0xffffffff81b24c70
<wait_for_completion_interruptible_timeout
bt: x86_64_function_called_by 4488: value:ffffffff81b24c70 offset:0
name:wait_for_completion_interruptible_timeout
bt: x86_64_function_called_by 4469: x/i 0xffffffffc11bb594
bt: x86_64_function_called_by 4475: 0xffffffffc11bb594
<vfio_pci_core_unregister_device+20>: call 0xffffffffc10e7630
<vfio_unregister_group_dev
bt: x86_64_function_called_by 4488: value:ffffffffc10e7630 offset:0
name:vfio_unregister_group_dev
bt: x86_64_function_called_by 4469: x/i 0xffffffffc103e040
bt: x86_64_function_called_by 4475: 0xffffffffc103e040
<vfio_pci_remove+16>: call 0xffffffffc11bb580
<vfio_pci_core_unregister_device
bt: x86_64_function_called_by 4488: value:ffffffffc11bb580 offset:0
name:vfio_pci_core_unregister_device
bt: x86_64_function_called_by 4469: x/i 0xffffffff81708ba5
bt: x86_64_function_called_by 4475: 0xffffffff81708ba5
<device_release_driver_internal+421>: call 0xffffffff817075a0
<device_remove
bt: x86_64_function_called_by 4488: value:ffffffff817075a0 offset:0
name:device_remove
bt: x86_64_function_called_by 4469: x/i 0xffffffff81705f6a
bt: x86_64_function_called_by 4475: 0xffffffff81705f6a
<unbind_store+282>: call 0xffffffff81708c40 <device_driver_detach
bt: x86_64_function_called_by 4488: value:ffffffff81708c40 offset:0
name:device_driver_detach
bt: x86_64_function_called_by 4469: x/i 0xffffffff813adb31
bt: x86_64_function_called_by 4475: 0xffffffff813adb31 <vfs_write+593>:
call 0xffffffff813aac90 <new_sync_write
bt: x86_64_function_called_by 4488: value:ffffffff813aac90 offset:0
name:new_sync_write
bt: x86_64_function_called_by 4469: x/i 0xffffffff813adead
bt: x86_64_function_called_by 4475: 0xffffffff813adead <ksys_write+93>:
call 0xffffffff813ad8e0 <vfs_write
bt: x86_64_function_called_by 4488: value:ffffffff813ad8e0 offset:0
name:vfs_write
crash
Also, could I have the following information?
Sure.
- output of "bt -t 493113" with the patch
crash> bt -t 493113
PID: 493113 TASK: ff2e34ecbd3ca2c0 CPU: 27 COMMAND: "sriov_fec_daemo"
START: __schedule at ffffffff81b239cb
[ff77abc4e81cfb08] __schedule at ffffffff81b239cb
[ff77abc4e81cfb70] schedule at ffffffff81b23e2d
[ff77abc4e81cfb88] schedule_timeout at ffffffff81b2c9e8
[ff77abc4e81cfbb0] process_timeout at ffffffff811a09d0
[ff77abc4e81cfbf0] __wait_for_common at ffffffff81b24abb
[ff77abc4e81cfbf8] schedule_timeout at ffffffff81b2c960
[ff77abc4e81cfc68] vfio_unregister_group_dev at ffffffffc10e76ae [vfio]
[ff77abc4e81cfca8] vfio_pci_core_unregister_device at ffffffffc11bb599
[vfio_pci_core]
[ff77abc4e81cfcc0] vfio_pci_remove at ffffffffc103e045 [vfio_pci]
[ff77abc4e81cfcd0] pci_device_remove at ffffffff815d7513
[ff77abc4e81cfcf0] device_release_driver_internal at ffffffff81708baa
[ff77abc4e81cfd20] unbind_store at ffffffff81705f6f
[ff77abc4e81cfd50] kernfs_fop_write_iter at ffffffff81454bf1
[ff77abc4e81cfd88] new_sync_write at ffffffff813aad8c
[ff77abc4e81cfe20] vfs_write at ffffffff813adb36
[ff77abc4e81cfe58] ksys_write at ffffffff813adeb2
[ff77abc4e81cfe90] do_syscall_64 at ffffffff81b17159
[ff77abc4e81cfea0] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cfeb0] syscall_exit_to_user_mode at ffffffff81b1b50d
[ff77abc4e81cfec0] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cfed0] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cfed8] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cfee0] syscall_exit_to_user_mode at ffffffff81b1b50d
[ff77abc4e81cfef0] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cfef8] __context_tracking_enter at ffffffff81b1b6fc
[ff77abc4e81cff08] syscall_exit_to_user_mode at ffffffff81b1b50d
[ff77abc4e81cff18] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cff20] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cff28] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cff30] do_syscall_64 at ffffffff81b17169
[ff77abc4e81cff38] __context_tracking_enter at ffffffff81b1b6fc
[ff77abc4e81cff50] entry_SYSCALL_64_after_hwframe at ffffffff81c0009b
RIP: 000000000047cdbb RSP: 000000c0000975a8 RFLAGS: 00000216
RAX: ffffffffffffffda RBX: 000000c00004e000 RCX: 000000000047cdbb
RDX: 000000000000000c RSI: 000000c000097798 RDI: 0000000000000009
RBP: 000000c0000975f8 R8: 0000000000000001 R9: 000000c00098d680
R10: 000000000000000c R11: 0000000000000216 R12: 000000c000097688
R13: 0000000000000000 R14: 000000c0006c3520 R15: 00007f5e359946b7
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash
- output of "bt 493113" after "set debug 2" without the patch.
crash> set debug 2
debug: 2
crash> bt 493113
PID: 493113 TASK: ff2e34ecbd3ca2c0 CPU: 27 COMMAND: "sriov_fec_daemo"
rsp: ff77abc4e81cfad8 rbp: ff2e34ff19aeb140
#0 [ff77abc4e81cfb08] __schedule at ffffffff81b239cb
orc_find:
ip: ffffffff81b239cb idx: 45625
start = orc_lookup[45625]: 419274
stop = orc_lookup[45626] + 1: 419283
ip table start: ffffffff8312bbcc
unwind table start: ffffffff833a0ff8
__orc_find:
ip: ffffffff81b239cb num_entries: 9
orc_ip(ffffffff8312bbcc): fe9f7bbf -> ffffffff81b2378b / __schedule+27 -
ffffffff833a0ff8
orc_ip(ffffffff8312bbd0): fe9f7e1b -> ffffffff81b239eb / __schedule+635
-> ffffffff833a0ffe
orc_ip(ffffffff8312bbd4): fe9f7e18 -> ffffffff81b239ec / __schedule+636
-> ffffffff833a1004
orc_ip(ffffffff8312bbd8): fe9f7e15 -> ffffffff81b239ed / __schedule+637
-> ffffffff833a100a
orc_ip(ffffffff8312bbdc): fe9f7e13 -> ffffffff81b239ef / __schedule+639
-> ffffffff833a1010
orc_ip(ffffffff8312bbe0): fe9f7e11 -> ffffffff81b239f1 / __schedule+641
-> ffffffff833a1016
orc_ip(ffffffff8312bbe4): fe9f7e0f -> ffffffff81b239f3 / __schedule+643
-> ffffffff833a101c
orc_ip(ffffffff8312bbe8): fe9f7e0d -> ffffffff81b239f5 / __schedule+645
-> ffffffff833a1022
orc_ip(ffffffff8312bbec): fe9f7e0e -> ffffffff81b239fa / __schedule+650
-> ffffffff833a1028
found: ffffffff8312bbcc index: 0
orc_entry: ffffffff833a0ff8 sp_offset: 104 bp_offset: -48 sp_reg: 5
bp_reg: 1 type: 0 end: 0
rsp: ff77abc4e81cfb08 textaddr: ffffffff81b239cb -> spo: 104 bpo: -48 spr:
5 bpr: 1 type: 0 end: 0
rsp: ff77abc4e81cfb08 prev_sp: ff77abc4e81cfb78 framesize: 96
rsp: ff77abc4e81cfb08 prev_sp: ff77abc4e81cfb78 prev_bp: ff77abc4e81cfb48
-> ff2e34ecbd3ca2c0
#1 [ff77abc4e81cfb70] schedule at ffffffff81b23e2d
< enable BT_CHECK_CALLER for schedule
orc_find:
ip: ffffffff81b23e2d idx: 45630
start = orc_lookup[45630]: 419286
stop = orc_lookup[45631] + 1: 419298
ip table start: ffffffff8312bbfc
unwind table start: ffffffff833a1040
__orc_find:
ip: ffffffff81b23e2d num_entries: 12
orc_ip(ffffffff8312bbfc): fe9f81e4 -> ffffffff81b23de0 / schedule+16 -
ffffffff833a1040
orc_ip(ffffffff8312bc00): fe9f8254 -> ffffffff81b23e54 / schedule+132 -
ffffffff833a1046
orc_ip(ffffffff8312bc04): fe9f8251 -> ffffffff81b23e55 / schedule+133 -
ffffffff833a104c
orc_ip(ffffffff8312bc08): fe9f8252 -> ffffffff81b23e5a / schedule+138 -
ffffffff833a1052
orc_ip(ffffffff8312bc0c): fe9f8260 -> ffffffff81b23e6c / schedule+156 -
ffffffff833a1058
orc_ip(ffffffff8312bc10): fe9f825d -> ffffffff81b23e6d / schedule+157 -
ffffffff833a105e
orc_ip(ffffffff8312bc14): fe9f825e -> ffffffff81b23e72 / schedule+162 -
ffffffff833a1064
orc_ip(ffffffff8312bc18): fe9f827d -> ffffffff81b23e95 / schedule+197 -
ffffffff833a106a
orc_ip(ffffffff8312bc1c): fe9f8284 -> ffffffff81b23ea0 / yield+0 -
ffffffff833a1070
orc_ip(ffffffff8312bc20): fe9f8298 -> ffffffff81b23eb8 / yield+24 -
ffffffff833a1076
orc_ip(ffffffff8312bc24): fe9f829c -> ffffffff81b23ec0 / io_schedule+0 -
ffffffff833a107c
orc_ip(ffffffff8312bc28): fe9f829e -> ffffffff81b23ec6 / io_schedule+6 -
ffffffff833a1082
found: ffffffff8312bbfc index: 0
orc_entry: ffffffff833a1040 sp_offset: 24 bp_offset: -16 sp_reg: 5
bp_reg: 1 type: 0 end: 0
rsp: ff77abc4e81cfb70 textaddr: ffffffff81b23e2d -> spo: 24 bpo: -16 spr: 5
bpr: 1 type: 0 end: 0
rsp: ff77abc4e81cfb70 prev_sp: ff77abc4e81cfb90 framesize: 16
rsp: ff77abc4e81cfb70 prev_sp: ff77abc4e81cfb90 prev_bp: ff77abc4e81cfb80
-> 101047510
#2 [ff77abc4e81cfb88] schedule_timeout at ffffffff81b2c9e8
< enable BT_CHECK_CALLER for schedule_timeout
orc_find:
ip: ffffffff81b2c9e8 idx: 45769
start = orc_lookup[45769]: 420322
stop = orc_lookup[45770] + 1: 420333
ip table start: ffffffff8312cc2c
unwind table start: ffffffff833a2888
__orc_find:
ip: ffffffff81b2c9e8 num_entries: 11
orc_ip(ffffffff8312cc2c): fe9ffcc3 -> ffffffff81b2c8ef /
usleep_range_state+31 -> ffffffff833a2888
orc_ip(ffffffff8312cc30): fe9ffd18 -> ffffffff81b2c948 /
usleep_range_state+120 -> ffffffff833a288e
orc_ip(ffffffff8312cc34): fe9ffd15 -> ffffffff81b2c949 /
usleep_range_state+121 -> ffffffff833a2894
orc_ip(ffffffff8312cc38): fe9ffd12 -> ffffffff81b2c94a /
usleep_range_state+122 -> ffffffff833a289a
orc_ip(ffffffff8312cc3c): fe9ffd10 -> ffffffff81b2c94c /
usleep_range_state+124 -> ffffffff833a28a0
orc_ip(ffffffff8312cc40): fe9ffd11 -> ffffffff81b2c951 /
usleep_range_state+129 -> ffffffff833a28a6
orc_ip(ffffffff8312cc44): fe9ffd12 -> ffffffff81b2c956 /
usleep_range_state+134 -> ffffffff833a28ac
orc_ip(ffffffff8312cc48): fe9ffd18 -> ffffffff81b2c960 /
schedule_timeout+0 -> ffffffff833a28b2
orc_ip(ffffffff8312cc4c): fe9ffd1b -> ffffffff81b2c967 /
schedule_timeout+7 -> ffffffff833a28b8
orc_ip(ffffffff8312cc50): fe9ffd18 -> ffffffff81b2c968 /
schedule_timeout+8 -> ffffffff833a28be
orc_ip(ffffffff8312cc54): fe9ffd18 -> ffffffff81b2c96c /
schedule_timeout+12 -> ffffffff833a28c4
found: ffffffff8312cc54 index: 10
orc_entry: ffffffff833a28c4 sp_offset: 104 bp_offset: -24 sp_reg: 5
bp_reg: 1 type: 0 end: 0
rsp: ff77abc4e81cfb88 textaddr: ffffffff81b2c9e8 -> spo: 104 bpo: -24 spr:
5 bpr: 1 type: 0 end: 0
rsp: ff77abc4e81cfb88 prev_sp: ff77abc4e81cfbf8 framesize: 96
rsp: ff77abc4e81cfb88 prev_sp: ff77abc4e81cfbf8 prev_bp: ff77abc4e81cfbe0
-> 2710
< ignoring __wait_for_common() -- makes indirect call and NOT
schedule_timeout()
< ignoring
vfio_unregister_group_dev() -- calls
wait_for_completion_interruptible_timeout() and NOT schedule_timeout()
< ignoring vfio_pci_core_unregister_device() -- calls
vfio_unregister_group_dev() and NOT schedule_timeout()
<
ignoring vfio_pci_remove() -- calls vfio_pci_core_unregister_device() and
NOT schedule_timeout()
< ignoring pci_device_remove() --
makes indirect call and NOT
schedule_timeout()
< ignoring
device_release_driver_internal() -- calls device_remove() and
NOT schedule_timeout()
< ignoring unbind_store() --
calls device_driver_detach() and NOT
schedule_timeout()
< ignoring
kernfs_fop_write_iter() -- makes indirect call and NOT
schedule_timeout()
< ignoring new_sync_write() --
makes indirect call and NOT
schedule_timeout()
< ignoring vfs_write() -- calls
new_sync_write() and NOT schedule_timeout()
<
ignoring ksys_write() -- calls vfs_write() and NOT schedule_timeout()
< ignoring do_syscall_64() -- makes indirect call and NOT
schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring
syscall_exit_to_user_mode() -- calls __context_tracking_enter()
and NOT schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring
syscall_exit_to_user_mode() -- calls __context_tracking_enter()
and NOT schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring
__context_tracking_enter() -- calls __this_cpu_preempt_check()
and NOT schedule_timeout()
< ignoring
syscall_exit_to_user_mode() -- calls __context_tracking_enter()
and NOT schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring do_syscall_64() --
calls syscall_exit_to_user_mode() and NOT
schedule_timeout()
< ignoring
__context_tracking_enter() -- calls __this_cpu_preempt_check()
and NOT schedule_timeout()
< ignoring
entry_SYSCALL_64_after_hwframe() -- calls do_syscall_64() and
NOT schedule_timeout()
< exception frame at:
ff77abc4e81cff58
RIP: 000000000047cdbb RSP:
000000c0000975a8 RFLAGS: 00000216
RAX: ffffffffffffffda RBX: 000000c00004e000 RCX: 000000000047cdbb
RDX: 000000000000000c RSI: 000000c000097798 RDI: 0000000000000009
RBP: 000000c0000975f8 R8: 0000000000000001 R9: 000000c00098d680
R10: 000000000000000c R11: 0000000000000216 R12: 000000c000097688
R13: 0000000000000000 R14: 000000c0006c3520 R15: 00007f5e359946b7
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash
Thanks.
Lianbo
> Thanks,
> Kazu
>
> > Signed-off-by: Lianbo
Jiang <lijiang(a)redhat.com
> > ---
> > x86_64.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
>
> > diff --git a/x86_64.c b/x86_64.c
> > index 42ade4817ad9..16850d98dc2d 100644
> > --- a/x86_64.c
> > +++ b/x86_64.c
> > @@ -4487,7 +4487,8 @@ x86_64_function_called_by(ulong rip)
> > */
> > if (sp) {
> > if ((STREQ(sp->name,
"schedule_timeout_interruptible") ||
> > - STREQ(sp->name,
"schedule_timeout_uninterruptible")))
> > + STREQ(sp->name,
"schedule_timeout_uninterruptible") ||
> > + STREQ(sp->name,
> "wait_for_completion_interruptible_timeout")))
> > sp = symbol_search("schedule_timeout");
>
> > if (STREQ(sp->name,
"__cond_resched"))