Hi Lianbo,
On Fri, Sep 20, 2024 at 2:17 PM lijiang <lijiang(a)redhat.com> wrote:
On Wed, Sep 11, 2024 at 3:32 PM lijiang <lijiang(a)redhat.com> wrote:
>
> On Wed, Sep 11, 2024 at 2:36 PM Tao Liu <ltao(a)redhat.com> wrote:
>>
>> Hi Lianbo,
>>
>> On Wed, Sep 11, 2024 at 2:26 PM lijiang <lijiang(a)redhat.com> wrote:
>> >
>> > Hi, Tao
>> >
>> > Thank you for the update.
>> >
>> > The following patch fixes a regression, so I would prefer to discuss it as a
>> > separate patch.
>> > [PATCH v7 01/15] Fix the regression of cpumask_t for xen hyper
>
>
> Can you also post a v2 for this one? I have two comments about it:
> [1] Is it possible to avoid introducing the xen hyper related code into a common
> module such as tools.c?
> [2] For the IA64 arch, I saw that machdep->get_irq_affinity = generic_get_irq_affinity
> is registered (see ia64_init()).
>
>
>> >
>> > In addition, I found another issue in my tests (on ppc64le): "gdb bt" can
>> > display the backtrace for the panic task, but after I switch to another task,
>> > "gdb bt" cannot display the backtrace:
>> >
>> > crash> gdb bt
>> > #0 0xc0000000002bde04 in crash_setup_regs (newregs=0xc00000003264b858, oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133
>> > #1 0xc0000000002be4f8 in __crash_kexec (regs=0x0) at kernel/crash_core.c:122
>> > #2 0xc00000000016c254 in panic (fmt=0xc0000000015eef20 "sysrq triggered crash\n") at kernel/panic.c:373
>> > #3 0xc000000000a708b8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:154
>> > #4 0xc000000000a713d4 in __handle_sysrq (key=key@entry=99 'c', check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
>> > #5 0xc000000000a71e94 in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1181
>> > #6 0xc00000000073260c in pde_write (pde=0xc00000000af9cc00, file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:334
>> > #7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346
>> > #8 0xc00000000063c0e0 in vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, count=2, pos=0xc00000003264bd30) at fs/read_write.c:588
>> > #9 vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, count=<optimized out>, pos=0xc00000003264bd30) at fs/read_write.c:570
>> > #10 0xc00000000063c690 in ksys_write (fd=<optimized out>, buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, count=2) at fs/read_write.c:643
>> > #11 0xc000000000031a28 in system_call_exception (regs=0xc00000003264be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
>> > #12 0xc00000000000d05c in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:198
>> >
>> > crash> ps
>> >     PID   PPID  CPU        TASK        ST  %MEM    VSZ    RSS  COMM
>> >       0      0    0  c000000002bda980  RU   0.0      0      0  [swapper/0]
>> > >     0      0    1  c000000003864c80  RU   0.0      0      0  [swapper/1]
>> > ...
>> >    8017    923    0  c000000043a20000  IN   0.2  22528  16256  sshd-session
>> >    8025   8017    6  c000000032271880  IN   0.1  22784  11840  sshd-session
>> > >  8026   8025    0  c000000043a26600  RU   0.1   9664   6208  bash
>> > ...
>> >   11645      2    3  c000000032264c80  ID   0.0      0      0  [kworker/u32:2]
>> >   11738   6188    2  c00000003811b180  IN   0.1  43520   9408  pickup
>> >   12326      2    0  c00000003226b280  ID   0.0      0      0  [kworker/0:1]
>> >   13112   6089    2  c00000000c809900  IN   0.0   7232   3456  sleep
>> >
>> > Let's take the "pickup" task as an example:
>> >
>> > crash> set 11738
>> > PID: 11738
>> > COMMAND: "pickup"
>> > TASK: c00000003811b180 [THREAD_INFO: c00000003811b180]
>> > CPU: 2
>> > STATE: TASK_INTERRUPTIBLE
>> >
>> > crash> gdb bt
>> > #0 0xc0000000a7f876a0 in ?? ()
>> > gdb: gdb request failed: bt
>> > crash> set gdb on
>> > gdb: on
>> > gdb> bt
>> > #0 0xc0000000a7f876a0 in ?? ()
>> > gdb>
>>
>> There is a bug in crash for ppc64 with newer kernels: the code that
>> determines the address of pt_regs on the stack is outdated. See the
>> following code in crash:
>>
>> ppc64.c:get_ppc64_frame()
>> readmem(sp+STACK_FRAME_OVERHEAD, KVADDR, &regs, sizeof(struct ppc64_pt_regs), "PPC64 pt_regs", FAULT_ON_ERROR);
>>
>> pt_regs is expected to be placed at sp+STACK_FRAME_OVERHEAD, i.e. sp+112.
>>
>> However, since kernel v6.2 this offset is no longer correct:
>>
>> linux kernel:arch/powerpc/kernel/process.c:copy_thread():
>> kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS);
>> p->thread.ksp = sp;
>>
>> linux kernel:arch/powerpc/include/asm/ptrace.h:
>> #ifdef CONFIG_PPC64_ELF_ABI_V2
>> #define STACK_FRAME_MIN_SIZE 32
>> #define STACK_SWITCH_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16)
>>
>
> Good findings, Tao.
>
>>
>> If we apply the following change to crash:
>> readmem(sp+0x30, KVADDR, &regs, sizeof(struct ppc64_pt_regs), "PPC64 pt_regs", FAULT_ON_ERROR);
>>
>> the stack unwinding works as expected. You can test it locally to see
>> whether the above change works for you.
>>
>> So to me this bug isn't related to the gdb stack unwinding support; it
>> is just a bug related to newer kernel versions.
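The offset arithmetic above can be sketched in C. This is a minimal illustration, assuming a 64-bit ELFv2 (CONFIG_PPC64_ELF_ABI_V2) kernel: the three macro values are copied from the kernel snippet quoted above (so STACK_SWITCH_FRAME_REGS = 32 + 16 = 48 = 0x30), and the v6.2 cut-over is taken from the text above. KVER(), ppc64_pt_regs_offset(), ppc64_init_regs_off() and ppc64_pt_regs_addr() are hypothetical names for this sketch, not crash's API.

```c
#include <stdint.h>

/* Values quoted above from arch/powerpc/include/asm/ptrace.h, assuming a
 * 64-bit ELFv2 (CONFIG_PPC64_ELF_ABI_V2) kernel. */
#define STACK_FRAME_OVERHEAD    112  /* offset crash assumes today */
#define STACK_FRAME_MIN_SIZE    32
#define STACK_SWITCH_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16)

/* 32 + 16 == 48 == 0x30, the value used in the readmem() change above. */
_Static_assert(STACK_SWITCH_FRAME_REGS == 0x30, "pt_regs offset for v6.2+");

/* Hypothetical version encoding, for this sketch only. */
#define KVER(a, b, c) \
    (((unsigned int)(a) << 16) | ((unsigned int)(b) << 8) | (unsigned int)(c))

/* Hypothetical helper: offset of the saved pt_regs from a switched-out
 * task's kernel stack pointer (thread.ksp), chosen by kernel version. */
static unsigned int ppc64_pt_regs_offset(unsigned int kver)
{
    return kver >= KVER(6, 2, 0) ? STACK_SWITCH_FRAME_REGS
                                 : STACK_FRAME_OVERHEAD;
}

/* Set once at init time, so every caller that hard-codes
 * STACK_FRAME_OVERHEAD today could use the same value instead. */
static unsigned int ppc64_regs_off = STACK_FRAME_OVERHEAD;

static void ppc64_init_regs_off(unsigned int kver)
{
    ppc64_regs_off = ppc64_pt_regs_offset(kver);
}

/* Address from which the pt_regs of a switched-out task would be read. */
static uint64_t ppc64_pt_regs_addr(uint64_t sp)
{
    return sp + ppc64_regs_off;
}
```

For a v6.2+ kernel, the saved registers sit 64 bytes (112 - 48) below the address crash reads today. That is consistent with the bogus "#0 0xc0000000a7f876a0 in ?? ()" frame reported for the "pickup" task. Kernels before v6.2 keep the existing sp+112 behaviour in this sketch.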
>>
>
> Agreed. For [PATCH v7 02/15] - [PATCH v7 15/15]: Ack.
>
> I will put them in the merge queue; once the current issue is resolved, we
> can merge them together. Otherwise it may not work on the ppc64 arch.
BTW: I observed another issue in my tests: the "gdb bt" command does not work
well on old kernels, and it cannot display any backtrace at all on 2.6 kernels.
Given that, can we state in the patch log on which kernel versions gdb stack
unwinding works well? Or can it be improved in a next step?
Thanks for the feedback. According to my tests, 2.6 kernels should work, so
there might be a bug; it would be better to have the vmcore for debugging
and inspection.
case [1]:
crash> bt
PID: 0 TASK: ffffffff8cc10740 CPU: 0 COMMAND: "swapper/0"
#0 [fffffe0000008a10] machine_kexec at ffffffff8ba5176e
#1 [fffffe0000008a68] __crash_kexec at ffffffff8bb4776d
#2 [fffffe0000008b30] panic at ffffffff8baa6aa8
#3 [fffffe0000008bb8] watchdog_overflow_callback.cold.7 at ffffffff8bb79d94
#4 [fffffe0000008bc8] __perf_event_overflow at ffffffff8bbdc572
#5 [fffffe0000008bf8] intel_pmu_handle_irq at ffffffff8ba0cbc6
#6 [fffffe0000008e40] perf_event_nmi_handler at ffffffff8ba05ffd
#7 [fffffe0000008e58] nmi_handle at ffffffff8ba20663
#8 [fffffe0000008eb0] default_do_nmi at ffffffff8ba20a0e
#9 [fffffe0000008ed0] do_nmi at ffffffff8ba20bd2
#10 [fffffe0000008ef0] end_repeat_nmi at ffffffff8c4014d8
[exception RIP: cfb_imageblit+1102]
RIP: ffffffff8be751be RSP: ffff97587fa036e8 RFLAGS: 00000046
RAX: 0000000000000000 RBX: ffffa4c120650cf0 RCX: 0000000000000006
RDX: 000000000000000e RSI: 0000000000000000 RDI: ffffa4c120650c00
RBP: 0000000000000002 R8: ffff97587d15fb5b R9: 00000000ad55ad55
R10: 0000000000000000 R11: ffffa4c120650c04 R12: 0000000000000800
R13: 0000000000000003 R14: ffffffff8c899170 R15: ffff97587d15fb45
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#11 [ffff97587fa036e8] cfb_imageblit at ffffffff8be751be
#12 [ffff97587fa03750] drm_fb_helper_cfb_imageblit at ffffffffc04c0e12 [drm_kms_helper]
#13 [ffff97587fa03768] bit_putcs at ffffffff8be6f6f1
#14 [ffff97587fa03870] fbcon_putcs at ffffffff8be6baa9
#15 [ffff97587fa038c8] fbcon_redraw at ffffffff8be6bd1d
#16 [ffff97587fa03928] fbcon_scroll at ffffffff8be6d434
#17 [ffff97587fa03988] con_scroll at ffffffff8befe900
#18 [ffff97587fa039c0] lf at ffffffff8befe9b0
#19 [ffff97587fa039e8] vt_console_print at ffffffff8bf00816
...skipping...
#22 [ffff97587fa03af0] printk at ffffffff8bb09186
#23 [ffff97587fa03b50] irq_work_run_list at ffffffff8bbbca5d
#24 [ffff97587fa03b78] irq_work_run at ffffffff8bbbca94
#25 [ffff97587fa03b80] smp_irq_work_interrupt at ffffffff8c401e02
#26 [ffff97587fa03b90] irq_work_interrupt at ffffffff8c401bbf
#27 [ffff97587fa03ba0] irq_work_interrupt at ffffffff8c401bba
#28 [ffff97587fa03c28] update_group_capacity at ffffffff8bae4a2f
#29 [ffff97587fa03c80] find_busiest_group at ffffffff8bae4c6e
#30 [ffff97587fa03e10] load_balance at ffffffff8bae57eb
#31 [ffff97587fa03f00] rebalance_domains at ffffffff8bae62ca
#32 [ffff97587fa03f68] __softirqentry_text_start at ffffffff8c6000e3
#33 [ffff97587fa03fc8] irq_exit at ffffffff8baacaf5
#34 [ffff97587fa03fd8] smp_apic_timer_interrupt at ffffffff8c40241c
#35 [ffff97587fa03ff0] apic_timer_interrupt at ffffffff8c401a7f
--- <IRQ stack> ---
#36 [ffffffff8cc03d68] apic_timer_interrupt at ffffffff8c401a7f
[exception RIP: finish_task_switch+125]
RIP: ffffffff8bad131d RSP: ffffffff8cc03e10 RFLAGS: 00000246
RAX: ffff97587eb18000 RBX: ffffffff8cc10740 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff97587fa22940
RBP: ffffffff8cc03e38 R8: 000001e44b0102cc R9: 0000000000000002
R10: 0000000000000040 R11: 0000000000000000 R12: ffff97587fa22940
R13: ffff97587f3c7380 R14: ffff97587eb18000 R15: 0000000000000402
ORIG_RAX: ffffffffffffff13 CS: 0010 SS: 0018
#37 [ffffffff8cc03e40] __schedule at ffffffff8c20855b
#38 [ffffffff8cc03ea8] schedule_idle at ffffffff8c208dee
#39 [ffffffff8cc03eb0] do_idle at ffffffff8bad9fc5
#40 [ffffffff8cc03ef0] cpu_startup_entry at ffffffff8bada2af
#41 [ffffffff8cc03f10] start_kernel at ffffffff8d1b2055
#42 [ffffffff8cc03f50] secondary_startup_64 at ffffffff8ba000d5
crash> gdb bt
#0 crash_setup_regs (oldregs=<optimized out>, newregs=<optimized out>) at ./arch/x86/include/asm/processor.h:55
#1 __crash_kexec (regs=0x0) at kernel/kexec_core.c:945
#2 0xffffffff8baa6aa8 in panic (fmt=0xffffffff8ca8b1e1 "%s") at kernel/panic.c:197
#3 0xffffffff8baa6c04 in nmi_panic (regs=<optimized out>, msg=<optimized out>) at kernel/panic.c:121
#4 0xffffffff8bb79d94 in watchdog_overflow_callback (event=<optimized out>, data=<optimized out>, regs=0xfffffe0000008ef8) at kernel/watchdog_hld.c:155
#5 0xffffffff8bbdc572 in __perf_event_overflow (event=0xffff974947c35800, throttle=<optimized out>, data=0xfffffe0000008c40, regs=0xfffffe0000008ef8) at kernel/events/core.c:7763
#6 0xffffffff8bbe7d40 in perf_event_overflow (event=<optimized out>, data=<optimized out>, regs=<optimized out>) at kernel/events/core.c:7777
#7 0xffffffff8ba0cbc6 in intel_pmu_handle_irq (regs=<optimized out>) at arch/x86/events/intel/core.c:2325
#8 0xffffffff8ba05ffd in perf_event_nmi_handler (cmd=<optimized out>, regs=0xfffffe0000008ef8) at arch/x86/events/core.c:1511
#9 0xffffffff8ba20663 in nmi_handle (type=<optimized out>, regs=<optimized out>) at arch/x86/kernel/nmi.c:137
#10 0xffffffff8ba20a0e in default_do_nmi (regs=0xfffffe0000008ef8) at arch/x86/kernel/nmi.c:335
#11 0xffffffff8ba20bd2 in do_nmi (regs=0xfffffe0000008ef8, error_code=<optimized out>) at arch/x86/kernel/nmi.c:521
#12 0xffffffff8c4014d8 in nmi () at arch/x86/entry/entry_64.S:1627
#13 0xffff97587d15fb45 in ?? ()
#14 0xffffffff8c899170 in ?? ()
The crash bt command presented the NMI exception stack as well as the IRQ
stack, i.e. multiple stacks. Currently the gdb stack unwinding doesn't
support unwinding across multiple stacks: as you can see, it only outputs
the 1st stack and fails at the stack boundary. I think this is a feature
that can be implemented later.
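The stack-boundary problem can be illustrated with a small C sketch. Each stack is modelled as an address range, and an unwinder that only handles one range stops as soon as the caller's stack pointer falls outside it. The region bounds below are hypothetical, picked only so that they contain the frame addresses printed by "bt" in case [1]. Real bounds would come from the per-CPU NMI/IRQ stacks and the task's stack. stack_of() and crosses_stack_boundary() are illustrative names, not functions in crash or gdb.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous kernel stack: [base, base + size). */
struct stack_region {
    const char *name;
    uint64_t base;
    uint64_t size;
};

/* Return the index of the region containing sp, or -1 if none does. */
static int stack_of(const struct stack_region *r, size_t n, uint64_t sp)
{
    for (size_t i = 0; i < n; i++)
        if (sp >= r[i].base && sp - r[i].base < r[i].size)
            return (int)i;
    return -1;
}

/* A single-stack unwinder stops when the caller's frame is not on the
 * same stack as the current one; a multi-stack unwinder would instead
 * switch to the new region and keep going. */
static int crosses_stack_boundary(const struct stack_region *r, size_t n,
                                  uint64_t sp, uint64_t caller_sp)
{
    int cur = stack_of(r, n, sp);
    int next = stack_of(r, n, caller_sp);

    return cur < 0 || next < 0 || cur != next;
}

/* Hypothetical bounds, chosen only to contain the frame addresses that
 * "bt" printed for the NMI stack, the IRQ stack and the task stack. */
static const struct stack_region case1_stacks[] = {
    { "NMI",  0xfffffe0000008000ULL, 0x1000 },
    { "IRQ",  0xffff97587fa00000ULL, 0x4000 },
    { "task", 0xffffffff8cc00000ULL, 0x4000 },
};
```

With these ranges, stepping from the NMI frame at 0xfffffe0000008ef0 to the IRQ frame at 0xffff97587fa036e8 is reported as a boundary crossing. That crossing is the point where the single-stack gdb unwinding gives up.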
crash> sys
...
CPUS: 128
DATE: Tue Sep 11 09:26:50 EDT 2018
UPTIME: 00:03:33
LOAD AVERAGE: 83.25, 19.46, 6.46
TASKS: 2017
NODENAME: xxxx
RELEASE: 4.18.0-3.el8.x86_64
VERSION: #1 SMP Fri Aug 24 11:43:33 UTC 2018
MACHINE: x86_64 (2260 Mhz)
MEMORY: 512 GB
PANIC: "Kernel panic - not syncing: Hard LOCKUP"
crash>
The backtraces from "bt" and "gdb bt" are different.
case [2]:
crash> gdb bt
#0 0xffffffffa04e111c in ?? ()
#1 0xffffc9000654fa70 in ?? ()
crash> sys
...
CPUS: 2
DATE: Thu Sep 7 16:44:12 CST 2017
UPTIME: 01:33:24
LOAD AVERAGE: 6.61, 3.05, 1.80
TASKS: 222
NODENAME: xxx
RELEASE: 4.11.0-22.el7a.x86_64
VERSION: #1 SMP Fri Aug 4 09:27:17 EDT 2017
MACHINE: x86_64 (1999 Mhz)
MEMORY: 4 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000fb9"
crash> bt
PID: 24709 TASK: ffff88013900bfc0 CPU: 1 COMMAND: "mkdir"
#0 [ffffc9000654f630] machine_kexec at ffffffff8105640b
#1 [ffffc9000654f690] __crash_kexec at ffffffff8113e0b2
#2 [ffffc9000654f760] crash_kexec at ffffffff8113e1ac
#3 [ffffc9000654f780] oops_end at ffffffff8102a461
#4 [ffffc9000654f7a8] no_context at ffffffff8106321e
#5 [ffffc9000654f808] __bad_area_nosemaphore at ffffffff8106355e
#6 [ffffc9000654f858] bad_area at ffffffff810a28c5
#7 [ffffc9000654f880] __do_page_fault at ffffffff81064073
#8 [ffffc9000654f8f0] trace_do_page_fault at ffffffff810641e3
#9 [ffffc9000654f928] do_async_page_fault at ffffffff8105df8a
#10 [ffffc9000654f940] async_page_fault at ffffffff8177eba8
[exception RIP: SMB2_open+1468]
RIP: ffffffffa04e111c RSP: ffffc9000654f9f0 RFLAGS: 00010282
RAX: ffff88000cd56f01 RBX: 0000000000000fb9 RCX: 00000000001848cd
RDX: 00000000001848cc RSI: ffff88000cd57c00 RDI: ffff88013a65e140
RBP: ffffc9000654faf8 R8: 0000000000021dd0 R9: ffffffff811c8827
R10: ffff88013fd21dd0 R11: ffffea0000335580 R12: ffff8800a5edf800
R13: 00000000fffffe00 R14: ffffc9000654fb10 R15: ffffc9000654fb18
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#11 [ffffc9000654fb00] smb2_query_symlink at ffffffffa04d982d [cifs]
#12 [ffffc9000654fbd0] cifs_get_link at ffffffffa04c5d2b [cifs]
#13 [ffffc9000654fc40] link_path_walk at ffffffff81272354
#14 [ffffc9000654fcb0] path_lookupat at ffffffff8127252d
#15 [ffffc9000654fcd8] filename_lookup at ffffffff812744ff
#16 [ffffc9000654fde8] user_path_at_empty at ffffffff812746b6
#17 [ffffc9000654fe10] vfs_statx at ffffffff812690f7
#18 [ffffc9000654fe70] SYSC_newstat at ffffffff8126965a
#19 [ffffc9000654ff18] sys_newstat at ffffffff81269b9e
#20 [ffffc9000654ff28] do_syscall_64 at ffffffff81003a47
RIP: 00007fd9efba0105 RSP: 00007fff171dea58 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007fff171e0a1f RCX: 00007fd9efba0105
RDX: 00007fff171dea90 RSI: 00007fff171dea90 RDI: 00007fff171e0a7a
RBP: 00007fff171dec80 R8: 00000000000001ff R9: 00000000004029f0
R10: 00007fff171de770 R11: 0000000000000246 R12: 0000000000000011
R13: 0000000000402c40 R14: 00007fff171decd0 R15: 00000000000001ff
ORIG_RAX: 0000000000000004 CS: 0033 SS: 002b
crash>
I don't think this is a bug. The address in "#0 0xffffffffa04e111c in ?? ()"
belongs to the cifs module (see "exception RIP: SMB2_open+1468"). So if we
load the cifs module's symbols, e.g. with "mod -s cifs", gdb bt can work
without problems.
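A minimal C sketch illustrates why an address in a module whose symbols are not loaded shows up as "??". A symbolizer can only resolve addresses that fall inside a symbol table it has actually loaded. The cifs text range below is hypothetical. The address of SMB2_open (0xffffffffa04e0b60) is derived from the source: it is 0xffffffffa04e111c minus 1468, from the "exception RIP" line above. struct symtab and resolve() are illustrative, not crash's or gdb's data structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sym {
    uint64_t addr;
    const char *name;
};

/* Symbols for one text range, sorted by address. */
struct symtab {
    const char *owner;       /* "vmlinux" or a module name */
    uint64_t start, end;     /* text covered: [start, end) */
    const struct sym *syms;
    size_t nsyms;
    int loaded;              /* 0 until the module's symbols are loaded */
};

/* Format addr as "name+offset", or "??" when no loaded table covers it. */
static const char *resolve(const struct symtab *t, size_t nt, uint64_t addr,
                           char *buf, size_t len)
{
    for (size_t i = 0; i < nt; i++) {
        const struct sym *best = NULL;

        if (!t[i].loaded || addr < t[i].start || addr >= t[i].end)
            continue;
        /* Nearest symbol at or below addr. */
        for (size_t j = 0; j < t[i].nsyms && t[i].syms[j].addr <= addr; j++)
            best = &t[i].syms[j];
        if (best) {
            snprintf(buf, len, "%s+%llu", best->name,
                     (unsigned long long)(addr - best->addr));
            return buf;
        }
    }
    snprintf(buf, len, "??");
    return buf;
}

/* SMB2_open = 0xffffffffa04e111c - 1468, from the bt output above. */
static const struct sym cifs_syms[] = {
    { 0xffffffffa04e0b60ULL, "SMB2_open" },
};

static struct symtab cifs_tab = {
    .owner  = "cifs",
    .start  = 0xffffffffa04b0000ULL,   /* hypothetical module text start */
    .end    = 0xffffffffa0500000ULL,   /* hypothetical module text end */
    .syms   = cifs_syms,
    .nsyms  = sizeof(cifs_syms) / sizeof(cifs_syms[0]),
    .loaded = 0,
};
```

With cifs_tab.loaded left at 0, resolving 0xffffffffa04e111c yields "??", as in the gdb output. After the table is marked loaded, the same address resolves to "SMB2_open+1468".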
case [3](2.6 kernel):
crash> bt
PID: 0 TASK: ffffffff8173a0c0 CPU: 0 COMMAND: "swapper"
[exception RIP: native_safe_halt+11]
RIP: ffffffff8103baab RSP: ffffffff81717ec8 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff81717fd8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff81a181e8
RBP: ffffffff81717ec8 R8: 0000000000000000 R9: ffff88000a211e88
R10: 0000000000000000 R11: 0000000000fffcb4 R12: ffffffff818aa220
R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000093780
CS: 0010 SS: 0018
#0 [ffffffff81717ed0] default_idle at ffffffff8101bd4d
#1 [ffffffff81717ef0] cpu_idle at ffffffff81011e96
crash> gdb bt
crash>
Hard to say. Since "gdb bt" produces no output, I don't know why it fails.
It would be better to debug it with the vmcore.
Thanks,
Tao Liu
Thanks
Lianbo
>
>> I think we can post a separate patch to deal with this issue. Since
>> there are plenty of places in crash that use the old
>> STACK_FRAME_OVERHEAD value, they may all need to be updated.
>>
>
> Please go ahead.
>
> Thanks
> Lianbo
>
>>
>> Thanks,
>> Tao Liu
>> >
>> > Anyway, I did the same test on x86_64 and aarch64, and it works as
>> > expected. Can you help double check on the ppc64 architecture?
>> >
>> > x86_64:
>> > crash> set 14599
>> > PID: 14599
>> > COMMAND: "pickup"
>> > TASK: ffff8f57a0d7c180 [THREAD_INFO: ffff8f57a0d7c180]
>> > CPU: 41
>> > STATE: TASK_INTERRUPTIBLE
>> > crash> gdb bt
>> > #0 0xffffffff8b3efe29 in context_switch (rq=0xffff8f6f1f835900, prev=0xffff8f57a0d7c180, next=0xffff8f5786720000, rf=0xffff9df22fea7b80) at kernel/sched/core.c:5208
>> > #1 __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549
>> > #2 0xffffffff8b3f0217 in __schedule_loop (sched_mode=<optimized out>) at kernel/sched/core.c:6626
>> > #3 schedule () at kernel/sched/core.c:6641
>> > #4 0xffffffff8b3f6eef in schedule_hrtimeout_range_clock (expires=expires@entry=0xffff9df22fea7cb0, delta=<optimized out>, delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS, clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
>> > #5 0xffffffff8b3f7003 in schedule_hrtimeout_range (expires=expires@entry=0xffff9df22fea7cb0, delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340
>> > #6 0xffffffff8aae301c in ep_poll (ep=0xffff8f5790d15d40, events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, timeout=timeout@entry=0xffff9df22fea7d58) at fs/eventpoll.c:2062
>> > #7 0xffffffff8aae3138 in do_epoll_wait (epfd=epfd@entry=8, events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, to=0xffff9df22fea7d58) at fs/eventpoll.c:2464
>> > #8 0xffffffff8aae44a1 in __do_sys_epoll_wait (epfd=<optimized out>, events=0x7ffea91b6b90, maxevents=<optimized out>, timeout=<optimized out>) at fs/eventpoll.c:2476
>> > #9 __se_sys_epoll_wait (epfd=<optimized out>, events=<optimized out>, maxevents=<optimized out>, timeout=<optimized out>) at fs/eventpoll.c:2471
>> > #10 __x64_sys_epoll_wait (regs=<optimized out>) at fs/eventpoll.c:2471
>> > #11 0xffffffff8b3e293d in do_syscall_x64 (regs=0xffff9df22fea7f48, nr=232) at arch/x86/entry/common.c:52
>> > #12 do_syscall_64 (regs=0xffff9df22fea7f48, nr=232) at arch/x86/entry/common.c:83
>> > #13 0xffffffff8b40012f in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:121
>> > crash>
>> >
>> >
>> > aarch64:
>> > crash> set 9338
>> > PID: 9338
>> > COMMAND: "pickup"
>> > TASK: ffff0000c7b05400 [THREAD_INFO: ffff0000c7b05400]
>> > CPU: 3
>> > STATE: TASK_INTERRUPTIBLE
>> > crash> gdb bt
>> > #0 __switch_to (prev=<unavailable>, prev@entry=0xffff0000c7b05400, next=next@entry=<unavailable>) at arch/arm64/kernel/process.c:555
>> > #1 0xffffafc5b5ebd744 in context_switch (rq=0xffff00077bbd0ec0, prev=0xffff0000c7b05400, next=<unavailable>, rf=0xffff80008ac63a60) at kernel/sched/core.c:5208
>> > #2 __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549
>> > #3 0xffffafc5b5ebdc2c in __schedule_loop (sched_mode=<optimized out>) at kernel/sched/core.c:6626
>> > #4 schedule () at kernel/sched/core.c:6641
>> > #5 0xffffafc5b5ec6030 in schedule_hrtimeout_range_clock (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS, clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
>> > #6 0xffffafc5b5ec618c in schedule_hrtimeout_range (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340
>> > #7 0xffffafc5b545d33c in ep_poll (ep=<unavailable>, events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, timeout=timeout@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2062
>> > #8 0xffffafc5b545d4e4 in do_epoll_wait (epfd=epfd@entry=8, events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, to=to@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2464
>> > #9 0xffffafc5b545d534 in do_epoll_pwait (epfd=epfd@entry=8, events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, to=to@entry=0xffff80008ac63ce0, sigsetsize=<optimized out>, sigmask=<optimized out>) at fs/eventpoll.c:2498
>> > #10 0xffffafc5b545e7c8 in do_epoll_pwait (epfd=8, events=0xffffde5c3f68, maxevents=100, to=0xffff80008ac63ce0, sigmask=<optimized out>, sigsetsize=<optimized out>) at fs/eventpoll.c:2495
>> > #11 __do_sys_epoll_pwait (epfd=8, events=0xffffde5c3f68, maxevents=100, timeout=<optimized out>, sigmask=<optimized out>, sigsetsize=<optimized out>) at fs/eventpoll.c:2511
>> > #12 __se_sys_epoll_pwait (epfd=8, events=281474412330856, maxevents=100, timeout=<optimized out>, sigmask=<optimized out>, sigsetsize=<optimized out>) at fs/eventpoll.c:2505
>> > #13 __arm64_sys_epoll_pwait (regs=<optimized out>) at fs/eventpoll.c:2505
>> > #14 0xffffafc5b4fa99bc in __invoke_syscall (regs=0xffff80008ac63eb0, syscall_fn=<optimized out>) at arch/arm64/kernel/syscall.c:35
>> > #15 invoke_syscall (regs=regs@entry=0xffff80008ac63eb0, scno=<optimized out>, sc_nr=sc_nr@entry=463, syscall_table=<optimized out>) at arch/arm64/kernel/syscall.c:49
>> > #16 0xffffafc5b4fa9ac8 in el0_svc_common (sc_nr=463, syscall_table=<optimized out>, regs=0xffff80008ac63eb0, scno=<optimized out>) at arch/arm64/kernel/syscall.c:132
>> > #17 do_el0_svc (regs=regs@entry=0xffff80008ac63eb0) at arch/arm64/kernel/syscall.c:151
>> > #18 0xffffafc5b5eb6fa4 in el0_svc (regs=0xffff80008ac63eb0) at arch/arm64/kernel/entry-common.c:712
>> > #19 0xffffafc5b5eb74c0 in el0t_64_sync_handler (regs=<optimized out>) at arch/arm64/kernel/entry-common.c:730
>> > #20 0xffffafc5b4f91634 in el0t_64_sync () at arch/arm64/kernel/entry.S:598
>> > crash>
>> >
>> > BTW: the other changes look fine to me.
>> >
>> > Thanks
>> > Lianbo
>> >
>> > On Wed, Sep 4, 2024 at 3:54 PM <devel-request(a)lists.crash-utility.osci.io> wrote:
>> >>
>> >> Date: Wed, 4 Sep 2024 19:49:25 +1200
>> >> From: Tao Liu <ltao(a)redhat.com>
>> >> Subject: [Crash-utility] [PATCH v7 00/15] gdb stack unwinding support
>> >> for crash utility
>> >> To: devel(a)lists.crash-utility.osci.io
>> >> Cc: Tao Liu <ltao(a)redhat.com>
>> >> Message-ID: <20240904074940.21331-1-ltao(a)redhat.com>
>> >> Content-Type: text/plain; charset=UTF-8
>> >>
>> >> This patchset is a rebased and merged version of the following 3 patchsets:
>> >>
>> >> 1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
>> >> 2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
>> >> 3): Clean up on top of one-thread-v2 [3]
>> >>
>> >> A complete description of gdb stack unwinding support for crash can be
>> >> found in [1].
>> >>
>> >> This patchset can be divided into the following 3 parts:
>> >>
>> >> 1) part1: preparations for stack unwinding support, i.e. fixes for some
>> >> bugs/regressions found while drafting this patchset.
>> >> 2) part2: the common part for all CPU archs, mainly in the
>> >> crash_target.c/gdb_interface.c files, in order to
>> >> support different archs.
>> >> 3) part3: arch-specific stack unwinding support for
>> >> ppc64/x86_64/arm64/vmware.
>> >>
>> >> === part 3
>> >> arm64: Add gdb stack unwinding support
>> >> vmware_guestdump: Various format versions support
>> >> x86_64: Add gdb stack unwinding support
>> >> ppc64: correct gdb passthroughs by implementing machdep->get_current_task_reg
>> >>
>> >> === part 2
>> >> Conditionally output gdb stack unwinding stop reasons
>> >> Stop stack unwinding at non-kernel address
>> >> Print task pid/command instead of CPU index
>> >> Rename get_cpu_reg to get_current_task_reg
>> >> Let crash change gdb context
>> >> Leave only one gdb thread for crash
>> >> Remove 'frame' from prohibited commands list
>> >>
>> >> === part 1
>> >> Fix gdb_interface: restore gdb's output streams at end of gdb_interface
>> >> x86_64: Fix invalid input "=>" for bt command
>> >> Fix cpumask_t recursive dependence issue
>> >> Fix the regression of cpumask_t for xen hyper
>> >> ===
>> >>
>> >> v7 -> v6:
>> >> 1) Reorganized the patchset into 3 parts instead of the previous 2.
>> >> 2) Reworked the cpumask_t part, which addresses comment No. 4
>> >> pointed out by Lianbo in [4].
>> >> 3) Added conditional output of the gdb stack unwinding failure message,
>> >> see [PATCH 11/15] Conditionally output gdb stack unwinding stop reasons.
>> >> 4) Redrafted the commit messages and updated some outdated info.
>> >> 5) Merged "Let crash change gdb context" and "set_context(): check if
>> >> context is already current" into one patch.
>> >>
>> >> [4]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01067.html
>> >>
>> >> v6 -> v5:
>> >> 1) Refactored patches 4 & 9, which change the function signature of
>> >> get_cpu_reg/get_current_task_reg, so that each patch compiles without
>> >> errors when applied in order.
>> >> 2) Rebased the patchset on top of the latest upstream:
>> >> ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash --osrelease' or crash loading")
>> >>
>> >> v5 -> v4:
>> >> 1) Plenty of code refactoring based on Lianbo's comments on v4.
>> >> 2) Removed the magic number when dealing with the regs bitmap, see [6].
>> >> 3) Rebased the patchset on top of the latest upstream:
>> >> ("1c6da3eaff8207 arm64: Fix bt command show wrong stacktrace on ramdump source")
>> >>
>> >> v4 -> v3:
>> >> Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface: restore gdb's
>> >> output streams at end of gdb_interface.
>> >>
>> >> v3 -> v2:
>> >> 1) Updated the CC list as pointed out in [4].
>> >> 2) Fixed the compilation issues reported in [5].
>> >>
>> >> v2 -> v1:
>> >> 1) Added the patch "x86_64: Fix invalid input "=>" for bt command",
>> >> thanks to Kazu's testing.
>> >> 2) Modified the patch "x86_64: Add gdb stack unwinding support": added
>> >> pcp_save, spp_save and sp to restore the values, matching the original
>> >> code logic.
>> >>
>> >> [1]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00469.html
>> >> [2]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00488.html
>> >> [3]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00554.html
>> >> [4]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00681.html
>> >> [5]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00715.html
>> >> [6]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00819.html
>> >>
>> >> Aditya Gupta (3):
>> >> Fix gdb_interface: restore gdb's output streams at end of
>> >> gdb_interface
>> >> Remove 'frame' from prohibited commands list
>> >> ppc64: correct gdb passthroughs by implementing
>> >> machdep->get_current_task_reg
>> >>
>> >> Alexey Makhalov (1):
>> >> vmware_guestdump: Various format versions support
>> >>
>> >> Tao Liu (11):
>> >> Fix the regression of cpumask_t for xen hyper
>> >> Fix cpumask_t recursive dependence issue
>> >> x86_64: Fix invalid input "=>" for bt command
>> >> Leave only one gdb thread for crash
>> >> Let crash change gdb context
>> >> Rename get_cpu_reg to get_current_task_reg
>> >> Print task pid/command instead of CPU index
>> >> Stop stack unwinding at non-kernel address
>> >> Conditionally output gdb stack unwinding stop reasons
>> >> x86_64: Add gdb stack unwinding support
>> >> arm64: Add gdb stack unwinding support
>> >>
>> >> arm64.c | 120 +++++++++++++++--
>> >> crash_target.c | 71 ++++++----
>> >> defs.h | 194 ++++++++++++++++++++++++++-
>> >> gdb-10.2.patch | 96 ++++++++++++++
>> >> gdb_interface.c | 39 ++----
>> >> kernel.c | 63 +++++++--
>> >> ppc64.c | 174 +++++++++++++++++++++++-
>> >> symbols.c | 15 +++
>> >> task.c | 34 +++--
>> >> tools.c | 16 ++-
>> >> unwind_x86_64.h | 4 -
>> >> vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++-------------
>> >> x86_64.c | 323 ++++++++++++++++++++++++++++++++++++++++-----
>> >> 13 files changed, 1247 insertions(+), 223 deletions(-)
>> >>
>> >> --
>> >> 2.40.1
>>