On 2023/11/28 11:31, Lianbo Jiang wrote:
Currently, the "bt pid" command may not print enough stack
trace and the
remaining frames will be truncated on x86_64. For example:
Without the patch:
crash> bt 493113
PID: 493113 TASK: ff2e34ecbd3ca2c0 CPU: 27 COMMAND: "sriov_fec_daemo"
#0 [ff77abc4e81cfb08] __schedule at ffffffff81b239cb
#1 [ff77abc4e81cfb70] schedule at ffffffff81b23e2d
#2 [ff77abc4e81cfb88] schedule_timeout at ffffffff81b2c9e8
RIP: 000000000047cdbb RSP: 000000c0000975a8 RFLAGS: 00000216
RAX: ffffffffffffffda RBX: 000000c00004e000 RCX: 000000000047cdbb
RDX: 000000000000000c RSI: 000000c000097798 RDI: 0000000000000009
RBP: 000000c0000975f8 R8: 0000000000000001 R9: 000000c00098d680
R10: 000000000000000c R11: 0000000000000216 R12: 000000c000097688
R13: 0000000000000000 R14: 000000c0006c3520 R15: 00007f5e359946b7
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
With the patch:
crash> bt 493113
PID: 493113 TASK: ff2e34ecbd3ca2c0 CPU: 27 COMMAND: "sriov_fec_daemo"
#0 [ff77abc4e81cfb08] __schedule at ffffffff81b239cb
#1 [ff77abc4e81cfb70] schedule at ffffffff81b23e2d
#2 [ff77abc4e81cfb88] schedule_timeout at ffffffff81b2c9e8
#3 [ff77abc4e81cfc68] vfio_unregister_group_dev at ffffffffc10e76ae [vfio]
#4 [ff77abc4e81cfca8] vfio_pci_core_unregister_device at ffffffffc11bb599
[vfio_pci_core]
#5 [ff77abc4e81cfcc0] vfio_pci_remove at ffffffffc103e045 [vfio_pci]
#6 [ff77abc4e81cfcd0] pci_device_remove at ffffffff815d7513
#7 [ff77abc4e81cfcf0] device_release_driver_internal at ffffffff81708baa
#8 [ff77abc4e81cfd20] unbind_store at ffffffff81705f6f
#9 [ff77abc4e81cfd50] kernfs_fop_write_iter at ffffffff81454bf1
#10 [ff77abc4e81cfd88] new_sync_write at ffffffff813aad8c
#11 [ff77abc4e81cfe20] vfs_write at ffffffff813adb36
#12 [ff77abc4e81cfe58] ksys_write at ffffffff813adeb2
#13 [ff77abc4e81cfe90] do_syscall_64 at ffffffff81b17159
#14 [ff77abc4e81cff50] entry_SYSCALL_64_after_hwframe at ffffffff81c0009b
RIP: 000000000047cdbb RSP: 000000c0000975a8 RFLAGS: 00000216
RAX: ffffffffffffffda RBX: 000000c00004e000 RCX: 000000000047cdbb
RDX: 000000000000000c RSI: 000000c000097798 RDI: 0000000000000009
RBP: 000000c0000975f8 R8: 0000000000000001 R9: 000000c00098d680
R10: 000000000000000c R11: 0000000000000216 R12: 000000c000097688
R13: 0000000000000000 R14: 000000c0006c3520 R15: 00007f5e359946b7
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
Let's add a check function that jump to schedule_timeout(), just like
the schedule_timeout_*() in x86_64_function_called_by().
Sorry, but could you elaborate the cause of this issue and the reason
why the patch can fix it? I didn't get it with the commit log.
Also, could I have the following information?
- output of "bt -t 493113" with the patch
- output of "bt 493113" after "set debug 2" without the patch.
Thanks,
Kazu
>
> Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
> ---
> x86_64.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/x86_64.c b/x86_64.c
> index 42ade4817ad9..16850d98dc2d 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -4487,7 +4487,8 @@ x86_64_function_called_by(ulong rip)
> */
> if (sp) {
> if ((STREQ(sp->name, "schedule_timeout_interruptible") ||
> - STREQ(sp->name, "schedule_timeout_uninterruptible")))
> + STREQ(sp->name, "schedule_timeout_uninterruptible") ||
> + STREQ(sp->name, "wait_for_completion_interruptible_timeout")))
> sp = symbol_search("schedule_timeout");
>
> if (STREQ(sp->name, "__cond_resched"))