On Wed, Sep 11, 2024 at 3:32 PM lijiang <lijiang@redhat.com> wrote:
On Wed, Sep 11, 2024 at 2:36 PM Tao Liu <ltao@redhat.com> wrote:
Hi Lianbo,

On Wed, Sep 11, 2024 at 2:26 PM lijiang <lijiang@redhat.com> wrote:
>
> Hi, Tao
>
> Thank you for the update.
>
> The following patch fixes a regression, so I'd prefer to discuss it as a separate patch.
> [PATCH v7 01/15] Fix the regression of cpumask_t for xen hyper

Can you also post v2 for this one? I have two comments about it:
[1] Is it possible to avoid introducing the hyper-related code into a common module such as tools.c?
[2] For the IA64 arch, I see that machdep->get_irq_affinity = generic_get_irq_affinity is registered (see ia64_init()).


>
> In addition, I found another issue in my tests (on ppc64le): "gdb bt" can display the backtrace for the panic task, but when I switch to another task, "gdb bt" cannot display it:
>
> crash> gdb bt
> #0  0xc0000000002bde04 in crash_setup_regs (newregs=0xc00000003264b858, oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133
> #1  0xc0000000002be4f8 in __crash_kexec (regs=0x0) at kernel/crash_core.c:122
> #2  0xc00000000016c254 in panic (fmt=0xc0000000015eef20 "sysrq triggered crash\n") at kernel/panic.c:373
> #3  0xc000000000a708b8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:154
> #4  0xc000000000a713d4 in __handle_sysrq (key=key@entry=99 'c', check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
> #5  0xc000000000a71e94 in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1181
> #6  0xc00000000073260c in pde_write (pde=0xc00000000af9cc00, file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:334
> #7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346
> #8  0xc00000000063c0e0 in vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, count=2, pos=0xc00000003264bd30) at fs/read_write.c:588
> #9  vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, count=<optimized out>, pos=0xc00000003264bd30) at fs/read_write.c:570
> #10 0xc00000000063c690 in ksys_write (fd=<optimized out>, buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, count=2) at fs/read_write.c:643
> #11 0xc000000000031a28 in system_call_exception (regs=0xc00000003264be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
> #12 0xc00000000000d05c in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:198
>
> crash> ps
>       PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
>         0       0   0  c000000002bda980  RU   0.0        0        0  [swapper/0]
> >       0       0   1  c000000003864c80  RU   0.0        0        0  [swapper/1]
> ...
>      8017     923   0  c000000043a20000  IN   0.2    22528    16256  sshd-session
>      8025    8017   6  c000000032271880  IN   0.1    22784    11840  sshd-session
> >    8026    8025   0  c000000043a26600  RU   0.1     9664     6208  bash
> ...
>     11645       2   3  c000000032264c80  ID   0.0        0        0  [kworker/u32:2]
>     11738    6188   2  c00000003811b180  IN   0.1    43520     9408  pickup
>     12326       2   0  c00000003226b280  ID   0.0        0        0  [kworker/0:1]
>     13112    6089   2  c00000000c809900  IN   0.0     7232     3456  sleep
>
> Let's take the "pickup" task as an example:
>
> crash> set 11738
>     PID: 11738
> COMMAND: "pickup"
>    TASK: c00000003811b180  [THREAD_INFO: c00000003811b180]
>     CPU: 2
>   STATE: TASK_INTERRUPTIBLE
>
> crash> gdb bt
> #0  0xc0000000a7f876a0 in ?? ()
> gdb: gdb request failed: bt
> crash> set gdb on
> gdb: on
> gdb> bt
> #0  0xc0000000a7f876a0 in ?? ()
> gdb>

There is a bug in crash on ppc64 with newer kernel versions. The code
that determines the address of pt_regs on the stack is outdated; see
the following code from crash:

ppc64.c:get_ppc64_frame():
readmem(sp+STACK_FRAME_OVERHEAD, KVADDR, &regs,
        sizeof(struct ppc64_pt_regs), "PPC64 pt_regs", FAULT_ON_ERROR);

crash expects pt_regs to be located at sp+STACK_FRAME_OVERHEAD, i.e. sp+112.

However, since kernel v6.2, this offset is no longer correct:

linux kernel:arch/powerpc/kernel/process.c:copy_thread():
kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS);
p->thread.ksp = sp;

linux kernel:arch/powerpc/include/asm/ptrace.h:
#ifdef CONFIG_PPC64_ELF_ABI_V2
#define STACK_FRAME_MIN_SIZE 32
#define STACK_SWITCH_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16)


Good finding, Tao.
 
If we apply the following change to crash, i.e. read pt_regs at
sp+STACK_SWITCH_FRAME_REGS (32 + 16 = 0x30):
readmem(sp+0x30, KVADDR, &regs, sizeof(struct ppc64_pt_regs),
        "PPC64 pt_regs", FAULT_ON_ERROR);

then stack unwinding works as expected. You can test it locally
to see whether the above change works for you.

So, to me, this bug isn't related to the gdb stack unwinding support;
it is just a bug related to newer kernel versions.


Agreed. For [PATCH V7 02/15] - [PATCH V7 15/15]: Ack.

I will put them in the merge queue; once the current issue is resolved, we can merge them together. Otherwise, it may not work on the ppc64 arch.

BTW: I observed another issue during testing: the "gdb bt" command does not work well on old kernels, and it cannot display any backtrace at all on 2.6 kernels.

Given that, can we state in the patch log which kernel versions gdb stack unwinding works well on? Or can this be improved as a next step?

case [1]:

crash> bt
PID: 0        TASK: ffffffff8cc10740  CPU: 0    COMMAND: "swapper/0"
 #0 [fffffe0000008a10] machine_kexec at ffffffff8ba5176e
 #1 [fffffe0000008a68] __crash_kexec at ffffffff8bb4776d
 #2 [fffffe0000008b30] panic at ffffffff8baa6aa8
 #3 [fffffe0000008bb8] watchdog_overflow_callback.cold.7 at ffffffff8bb79d94
 #4 [fffffe0000008bc8] __perf_event_overflow at ffffffff8bbdc572
 #5 [fffffe0000008bf8] intel_pmu_handle_irq at ffffffff8ba0cbc6
 #6 [fffffe0000008e40] perf_event_nmi_handler at ffffffff8ba05ffd
 #7 [fffffe0000008e58] nmi_handle at ffffffff8ba20663
 #8 [fffffe0000008eb0] default_do_nmi at ffffffff8ba20a0e
 #9 [fffffe0000008ed0] do_nmi at ffffffff8ba20bd2
#10 [fffffe0000008ef0] end_repeat_nmi at ffffffff8c4014d8
    [exception RIP: cfb_imageblit+1102]
    RIP: ffffffff8be751be  RSP: ffff97587fa036e8  RFLAGS: 00000046
    RAX: 0000000000000000  RBX: ffffa4c120650cf0  RCX: 0000000000000006
    RDX: 000000000000000e  RSI: 0000000000000000  RDI: ffffa4c120650c00
    RBP: 0000000000000002   R8: ffff97587d15fb5b   R9: 00000000ad55ad55
    R10: 0000000000000000  R11: ffffa4c120650c04  R12: 0000000000000800
    R13: 0000000000000003  R14: ffffffff8c899170  R15: ffff97587d15fb45
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#11 [ffff97587fa036e8] cfb_imageblit at ffffffff8be751be
#12 [ffff97587fa03750] drm_fb_helper_cfb_imageblit at ffffffffc04c0e12 [drm_kms_helper]
#13 [ffff97587fa03768] bit_putcs at ffffffff8be6f6f1
#14 [ffff97587fa03870] fbcon_putcs at ffffffff8be6baa9
#15 [ffff97587fa038c8] fbcon_redraw at ffffffff8be6bd1d
#16 [ffff97587fa03928] fbcon_scroll at ffffffff8be6d434
#17 [ffff97587fa03988] con_scroll at ffffffff8befe900
#18 [ffff97587fa039c0] lf at ffffffff8befe9b0
#19 [ffff97587fa039e8] vt_console_print at ffffffff8bf00816
...skipping...
#22 [ffff97587fa03af0] printk at ffffffff8bb09186
#23 [ffff97587fa03b50] irq_work_run_list at ffffffff8bbbca5d
#24 [ffff97587fa03b78] irq_work_run at ffffffff8bbbca94
#25 [ffff97587fa03b80] smp_irq_work_interrupt at ffffffff8c401e02
#26 [ffff97587fa03b90] irq_work_interrupt at ffffffff8c401bbf
#27 [ffff97587fa03ba0] irq_work_interrupt at ffffffff8c401bba
#28 [ffff97587fa03c28] update_group_capacity at ffffffff8bae4a2f
#29 [ffff97587fa03c80] find_busiest_group at ffffffff8bae4c6e
#30 [ffff97587fa03e10] load_balance at ffffffff8bae57eb
#31 [ffff97587fa03f00] rebalance_domains at ffffffff8bae62ca
#32 [ffff97587fa03f68] __softirqentry_text_start at ffffffff8c6000e3
#33 [ffff97587fa03fc8] irq_exit at ffffffff8baacaf5
#34 [ffff97587fa03fd8] smp_apic_timer_interrupt at ffffffff8c40241c
#35 [ffff97587fa03ff0] apic_timer_interrupt at ffffffff8c401a7f
--- <IRQ stack> ---
#36 [ffffffff8cc03d68] apic_timer_interrupt at ffffffff8c401a7f
    [exception RIP: finish_task_switch+125]
    RIP: ffffffff8bad131d  RSP: ffffffff8cc03e10  RFLAGS: 00000246
    RAX: ffff97587eb18000  RBX: ffffffff8cc10740  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff97587fa22940
    RBP: ffffffff8cc03e38   R8: 000001e44b0102cc   R9: 0000000000000002
    R10: 0000000000000040  R11: 0000000000000000  R12: ffff97587fa22940
    R13: ffff97587f3c7380  R14: ffff97587eb18000  R15: 0000000000000402
    ORIG_RAX: ffffffffffffff13  CS: 0010  SS: 0018
#37 [ffffffff8cc03e40] __schedule at ffffffff8c20855b
#38 [ffffffff8cc03ea8] schedule_idle at ffffffff8c208dee
#39 [ffffffff8cc03eb0] do_idle at ffffffff8bad9fc5
#40 [ffffffff8cc03ef0] cpu_startup_entry at ffffffff8bada2af
#41 [ffffffff8cc03f10] start_kernel at ffffffff8d1b2055
#42 [ffffffff8cc03f50] secondary_startup_64 at ffffffff8ba000d5

crash> gdb bt
#0  crash_setup_regs (oldregs=<optimized out>, newregs=<optimized out>) at ./arch/x86/include/asm/processor.h:55
#1  __crash_kexec (regs=0x0) at kernel/kexec_core.c:945
#2  0xffffffff8baa6aa8 in panic (fmt=0xffffffff8ca8b1e1 "%s") at kernel/panic.c:197
#3  0xffffffff8baa6c04 in nmi_panic (regs=<optimized out>, msg=<optimized out>) at kernel/panic.c:121
#4  0xffffffff8bb79d94 in watchdog_overflow_callback (event=<optimized out>, data=<optimized out>, regs=0xfffffe0000008ef8) at kernel/watchdog_hld.c:155
#5  0xffffffff8bbdc572 in __perf_event_overflow (event=0xffff974947c35800, throttle=<optimized out>, data=0xfffffe0000008c40, regs=0xfffffe0000008ef8) at kernel/events/core.c:7763
#6  0xffffffff8bbe7d40 in perf_event_overflow (event=<optimized out>, data=<optimized out>, regs=<optimized out>) at kernel/events/core.c:7777
#7  0xffffffff8ba0cbc6 in intel_pmu_handle_irq (regs=<optimized out>) at arch/x86/events/intel/core.c:2325
#8  0xffffffff8ba05ffd in perf_event_nmi_handler (cmd=<optimized out>, regs=0xfffffe0000008ef8) at arch/x86/events/core.c:1511
#9  0xffffffff8ba20663 in nmi_handle (type=<optimized out>, regs=<optimized out>) at arch/x86/kernel/nmi.c:137
#10 0xffffffff8ba20a0e in default_do_nmi (regs=0xfffffe0000008ef8) at arch/x86/kernel/nmi.c:335
#11 0xffffffff8ba20bd2 in do_nmi (regs=0xfffffe0000008ef8, error_code=<optimized out>) at arch/x86/kernel/nmi.c:521
#12 0xffffffff8c4014d8 in nmi () at arch/x86/entry/entry_64.S:1627
#13 0xffff97587d15fb45 in ?? ()
#14 0xffffffff8c899170 in ?? ()

crash> sys
...
        CPUS: 128
        DATE: Tue Sep 11 09:26:50 EDT 2018
      UPTIME: 00:03:33
LOAD AVERAGE: 83.25, 19.46, 6.46
       TASKS: 2017
    NODENAME: xxxx
     RELEASE: 4.18.0-3.el8.x86_64
     VERSION: #1 SMP Fri Aug 24 11:43:33 UTC 2018
     MACHINE: x86_64  (2260 Mhz)
      MEMORY: 512 GB
       PANIC: "Kernel panic - not syncing: Hard LOCKUP"
crash>

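The backtraces from "bt" and "gdb bt" differ: "gdb bt" stops right after the NMI entry (frames #13 and #14 are unresolved "??" addresses), whereas "bt" continues through the NMI exception stack and the IRQ stack down to secondary_startup_64.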


case [2]:

crash> gdb bt
#0  0xffffffffa04e111c in ?? ()
#1  0xffffc9000654fa70 in ?? ()
crash> sys
...
        CPUS: 2
        DATE: Thu Sep  7 16:44:12 CST 2017
      UPTIME: 01:33:24
LOAD AVERAGE: 6.61, 3.05, 1.80
       TASKS: 222
    NODENAME: xxx
     RELEASE: 4.11.0-22.el7a.x86_64
     VERSION: #1 SMP Fri Aug 4 09:27:17 EDT 2017
     MACHINE: x86_64  (1999 Mhz)
      MEMORY: 4 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000fb9"
crash> bt
PID: 24709    TASK: ffff88013900bfc0  CPU: 1    COMMAND: "mkdir"
 #0 [ffffc9000654f630] machine_kexec at ffffffff8105640b
 #1 [ffffc9000654f690] __crash_kexec at ffffffff8113e0b2
 #2 [ffffc9000654f760] crash_kexec at ffffffff8113e1ac
 #3 [ffffc9000654f780] oops_end at ffffffff8102a461
 #4 [ffffc9000654f7a8] no_context at ffffffff8106321e
 #5 [ffffc9000654f808] __bad_area_nosemaphore at ffffffff8106355e
 #6 [ffffc9000654f858] bad_area at ffffffff810a28c5
 #7 [ffffc9000654f880] __do_page_fault at ffffffff81064073
 #8 [ffffc9000654f8f0] trace_do_page_fault at ffffffff810641e3
 #9 [ffffc9000654f928] do_async_page_fault at ffffffff8105df8a
#10 [ffffc9000654f940] async_page_fault at ffffffff8177eba8
    [exception RIP: SMB2_open+1468]
    RIP: ffffffffa04e111c  RSP: ffffc9000654f9f0  RFLAGS: 00010282
    RAX: ffff88000cd56f01  RBX: 0000000000000fb9  RCX: 00000000001848cd
    RDX: 00000000001848cc  RSI: ffff88000cd57c00  RDI: ffff88013a65e140
    RBP: ffffc9000654faf8   R8: 0000000000021dd0   R9: ffffffff811c8827
    R10: ffff88013fd21dd0  R11: ffffea0000335580  R12: ffff8800a5edf800
    R13: 00000000fffffe00  R14: ffffc9000654fb10  R15: ffffc9000654fb18
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffffc9000654fb00] smb2_query_symlink at ffffffffa04d982d [cifs]
#12 [ffffc9000654fbd0] cifs_get_link at ffffffffa04c5d2b [cifs]
#13 [ffffc9000654fc40] link_path_walk at ffffffff81272354
#14 [ffffc9000654fcb0] path_lookupat at ffffffff8127252d
#15 [ffffc9000654fcd8] filename_lookup at ffffffff812744ff
#16 [ffffc9000654fde8] user_path_at_empty at ffffffff812746b6
#17 [ffffc9000654fe10] vfs_statx at ffffffff812690f7
#18 [ffffc9000654fe70] SYSC_newstat at ffffffff8126965a
#19 [ffffc9000654ff18] sys_newstat at ffffffff81269b9e
#20 [ffffc9000654ff28] do_syscall_64 at ffffffff81003a47
    RIP: 00007fd9efba0105  RSP: 00007fff171dea58  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007fff171e0a1f  RCX: 00007fd9efba0105
    RDX: 00007fff171dea90  RSI: 00007fff171dea90  RDI: 00007fff171e0a7a
    RBP: 00007fff171dec80   R8: 00000000000001ff   R9: 00000000004029f0
    R10: 00007fff171de770  R11: 0000000000000246  R12: 0000000000000011
    R13: 0000000000402c40  R14: 00007fff171decd0  R15: 00000000000001ff
    ORIG_RAX: 0000000000000004  CS: 0033  SS: 002b
crash>


case [3] (2.6 kernel):

crash> bt
PID: 0        TASK: ffffffff8173a0c0  CPU: 0    COMMAND: "swapper"
    [exception RIP: native_safe_halt+11]
    RIP: ffffffff8103baab  RSP: ffffffff81717ec8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff81717fd8  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: ffffffff81a181e8
    RBP: ffffffff81717ec8   R8: 0000000000000000   R9: ffff88000a211e88
    R10: 0000000000000000  R11: 0000000000fffcb4  R12: ffffffff818aa220
    R13: 0000000000000000  R14: ffffffffffffffff  R15: 0000000000093780
    CS: 0010  SS: 0018
 #0 [ffffffff81717ed0] default_idle at ffffffff8101bd4d
 #1 [ffffffff81717ef0] cpu_idle at ffffffff81011e96
crash> gdb bt
crash> 


Thanks
Lianbo


I think we can post a separate patch to deal with this issue. Since
there are plenty of places in crash that use the old
STACK_FRAME_OVERHEAD value, they may all need to be updated.


Please go ahead.

Thanks
Lianbo
 
Thanks,
Tao Liu
>
> Anyway, I ran the same test on x86_64 and aarch64, and it works as expected. Can you help double-check on the ppc64 architecture?
>
> x86_64:
> crash> set 14599
>     PID: 14599
> COMMAND: "pickup"
>    TASK: ffff8f57a0d7c180  [THREAD_INFO: ffff8f57a0d7c180]
>     CPU: 41
>   STATE: TASK_INTERRUPTIBLE
> crash> gdb bt
> #0  0xffffffff8b3efe29 in context_switch (rq=0xffff8f6f1f835900, prev=0xffff8f57a0d7c180, next=0xffff8f5786720000, rf=0xffff9df22fea7b80) at kernel/sched/core.c:5208
> #1  __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549
> #2  0xffffffff8b3f0217 in __schedule_loop (sched_mode=<optimized out>) at kernel/sched/core.c:6626
> #3  schedule () at kernel/sched/core.c:6641
> #4  0xffffffff8b3f6eef in schedule_hrtimeout_range_clock (expires=expires@entry=0xffff9df22fea7cb0, delta=<optimized out>, delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS, clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
> #5  0xffffffff8b3f7003 in schedule_hrtimeout_range (expires=expires@entry=0xffff9df22fea7cb0, delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340
> #6  0xffffffff8aae301c in ep_poll (ep=0xffff8f5790d15d40, events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, timeout=timeout@entry=0xffff9df22fea7d58) at fs/eventpoll.c:2062
> #7  0xffffffff8aae3138 in do_epoll_wait (epfd=epfd@entry=8, events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, to=0xffff9df22fea7d58) at fs/eventpoll.c:2464
> #8  0xffffffff8aae44a1 in __do_sys_epoll_wait (epfd=<optimized out>, events=0x7ffea91b6b90, maxevents=<optimized out>, timeout=<optimized out>) at fs/eventpoll.c:2476
> #9  __se_sys_epoll_wait (epfd=<optimized out>, events=<optimized out>, maxevents=<optimized out>, timeout=<optimized out>) at fs/eventpoll.c:2471
> #10 __x64_sys_epoll_wait (regs=<optimized out>) at fs/eventpoll.c:2471
> #11 0xffffffff8b3e293d in do_syscall_x64 (regs=0xffff9df22fea7f48, nr=232) at arch/x86/entry/common.c:52
> #12 do_syscall_64 (regs=0xffff9df22fea7f48, nr=232) at arch/x86/entry/common.c:83
> #13 0xffffffff8b40012f in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:121
> crash>
>
>
> aarch64:
> crash> set 9338
>     PID: 9338
> COMMAND: "pickup"
>    TASK: ffff0000c7b05400  [THREAD_INFO: ffff0000c7b05400]
>     CPU: 3
>   STATE: TASK_INTERRUPTIBLE
> crash> gdb bt
> #0  __switch_to (prev=<unavailable>, prev@entry=0xffff0000c7b05400, next=next@entry=<unavailable>) at arch/arm64/kernel/process.c:555
> #1  0xffffafc5b5ebd744 in context_switch (rq=0xffff00077bbd0ec0, prev=0xffff0000c7b05400, next=<unavailable>, rf=0xffff80008ac63a60) at kernel/sched/core.c:5208
> #2  __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549
> #3  0xffffafc5b5ebdc2c in __schedule_loop (sched_mode=<optimized out>) at kernel/sched/core.c:6626
> #4  schedule () at kernel/sched/core.c:6641
> #5  0xffffafc5b5ec6030 in schedule_hrtimeout_range_clock (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS, clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
> #6  0xffffafc5b5ec618c in schedule_hrtimeout_range (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340
> #7  0xffffafc5b545d33c in ep_poll (ep=<unavailable>, events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, timeout=timeout@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2062
> #8  0xffffafc5b545d4e4 in do_epoll_wait (epfd=epfd@entry=8, events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, to=to@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2464
> #9  0xffffafc5b545d534 in do_epoll_pwait (epfd=epfd@entry=8, events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, to=to@entry=0xffff80008ac63ce0, sigsetsize=<optimized out>, sigmask=<optimized out>) at fs/eventpoll.c:2498
> #10 0xffffafc5b545e7c8 in do_epoll_pwait (epfd=8, events=0xffffde5c3f68, maxevents=100, to=0xffff80008ac63ce0, sigmask=<optimized out>, sigsetsize=<optimized out>) at fs/eventpoll.c:2495
> #11 __do_sys_epoll_pwait (epfd=8, events=0xffffde5c3f68, maxevents=100, timeout=<optimized out>, sigmask=<optimized out>, sigsetsize=<optimized out>) at fs/eventpoll.c:2511
> #12 __se_sys_epoll_pwait (epfd=8, events=281474412330856, maxevents=100, timeout=<optimized out>, sigmask=<optimized out>, sigsetsize=<optimized out>) at fs/eventpoll.c:2505
> #13 __arm64_sys_epoll_pwait (regs=<optimized out>) at fs/eventpoll.c:2505
> #14 0xffffafc5b4fa99bc in __invoke_syscall (regs=0xffff80008ac63eb0, syscall_fn=<optimized out>) at arch/arm64/kernel/syscall.c:35
> #15 invoke_syscall (regs=regs@entry=0xffff80008ac63eb0, scno=<optimized out>, sc_nr=sc_nr@entry=463, syscall_table=<optimized out>) at arch/arm64/kernel/syscall.c:49
> #16 0xffffafc5b4fa9ac8 in el0_svc_common (sc_nr=463, syscall_table=<optimized out>, regs=0xffff80008ac63eb0, scno=<optimized out>) at arch/arm64/kernel/syscall.c:132
> #17 do_el0_svc (regs=regs@entry=0xffff80008ac63eb0) at arch/arm64/kernel/syscall.c:151
> #18 0xffffafc5b5eb6fa4 in el0_svc (regs=0xffff80008ac63eb0) at arch/arm64/kernel/entry-common.c:712
> #19 0xffffafc5b5eb74c0 in el0t_64_sync_handler (regs=<optimized out>) at arch/arm64/kernel/entry-common.c:730
> #20 0xffffafc5b4f91634 in el0t_64_sync () at arch/arm64/kernel/entry.S:598
> crash>
>
> BTW: the other changes look fine to me.
>
> Thanks
> Lianbo
>
> On Wed, Sep 4, 2024 at 3:54 PM <devel-request@lists.crash-utility.osci.io> wrote:
>>
>> Date: Wed,  4 Sep 2024 19:49:25 +1200
>> From: Tao Liu <ltao@redhat.com>
>> Subject: [Crash-utility] [PATCH v7 00/15] gdb stack unwinding support
>>         for crash utility
>> To: devel@lists.crash-utility.osci.io
>> Cc: Tao Liu <ltao@redhat.com>
>> Message-ID: <20240904074940.21331-1-ltao@redhat.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> This patchset is a rebased/merged version of the following 3 patchsets:
>>
>> 1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
>> 2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
>> 3): Clean up on top of one-thread-v2 [3]
>>
>> A complete description of gdb stack unwinding support for crash can be
>> found in [1].
>>
>> This patchset can be divided into the following 3 parts:
>>
>> 1) part1: preparations for stack unwinding support, including some
>>           bugs/regressions found while drafting this patchset.
>> 2) part2: common code for all CPU archs, mainly in
>>           crash_target.c/gdb_interface.c, to support
>>           different archs.
>> 3) part3: arch-specific stack unwinding support for
>>           ppc64/x86_64/arm64/vmware.
>>
>> === part 3
>> arm64: Add gdb stack unwinding support
>> vmware_guestdump: Various format versions support
>> x86_64: Add gdb stack unwinding support
>> ppc64: correct gdb passthroughs by implementing machdep->get_current_task_reg
>>
>> === part 2
>> Conditionally output gdb stack unwinding stop reasons
>> Stop stack unwinding at non-kernel address
>> Print task pid/command instead of CPU index
>> Rename get_cpu_reg to get_current_task_reg
>> Let crash change gdb context
>> Leave only one gdb thread for crash
>> Remove 'frame' from prohibited commands list
>>
>> === part 1
>> Fix gdb_interface: restore gdb's output streams at end of gdb_interface
>> x86_64: Fix invalid input "=>" for bt command
>> Fix cpumask_t recursive dependence issue
>> Fix the regression of cpumask_t for xen hyper
>> ===
>>
>> v7 -> v6:
>> 1) Reorganised the patchset, re-dividing it into 3 parts instead of the
>>    previous 2 parts.
>> 2) Reworked the cpumask_t part, which addresses comment No.4
>>    pointed out by Lianbo in [4].
>> 3) Added conditional output for gdb stack unwinding failure messages;
>>    see [PATCH 11/15] Conditionally output gdb stack unwinding stop reasons.
>> 4) Redrafted the commit messages and updated some outdated info.
>> 5) Merged "Let crash change gdb context" and "set_context(): check if
>>    context is already current" into one.
>>
>> [4]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01067.html
>>
>> v6 -> v5:
>> 1) Refactored patches 4 & 9, which changed the function signature of
>>    get_cpu_reg/get_current_task_reg, so that each patch compiles without
>>    errors when applied incrementally.
>> 2) Rebased the patchset on top of the latest upstream:
>>    ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash --osrelease' or
>>    crash loading")
>>
>> v5 -> v4:
>> 1) Plenty of code refactoring based on Lianbo's comments on v4.
>> 2) Removed the magic number when dealing with regs bitmap, see [6].
>> 3) Rebased the patchset on top of the latest upstream:
>>    ("1c6da3eaff8207 arm64: Fix bt command show wrong stacktrace on ramdump source")
>>
>> v4 -> v3:
>> Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface: restore gdb's
>> output streams at end of gdb_interface.
>>
>> v3 -> v2:
>> 1) Updated CC list as pointed out in [4]
>> 2) Fixed compiling issues as reported in [5]
>>
>> v2 -> v1:
>> 1) Added the patch: x86_64: Fix invalid input "=>" for bt command,
>>    thanks to Kazu's testing.
>> 2) Modified the patch: x86_64: Add gdb stack unwinding support, adding
>>    pcp_save, spp_save and sp to restore the saved values, matching the
>>    original code logic.
>>
>> [1]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00469.html
>> [2]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00488.html
>> [3]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00554.html
>> [4]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00681.html
>> [5]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00715.html
>> [6]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00819.html
>>
>> Aditya Gupta (3):
>>   Fix gdb_interface: restore gdb's output streams at end of
>>     gdb_interface
>>   Remove 'frame' from prohibited commands list
>>   ppc64: correct gdb passthroughs by implementing
>>     machdep->get_current_task_reg
>>
>> Alexey Makhalov (1):
>>   vmware_guestdump: Various format versions support
>>
>> Tao Liu (11):
>>   Fix the regression of cpumask_t for xen hyper
>>   Fix cpumask_t recursive dependence issue
>>   x86_64: Fix invalid input "=>" for bt command
>>   Leave only one gdb thread for crash
>>   Let crash change gdb context
>>   Rename get_cpu_reg to get_current_task_reg
>>   Print task pid/command instead of CPU index
>>   Stop stack unwinding at non-kernel address
>>   Conditionally output gdb stack unwinding stop reasons
>>   x86_64: Add gdb stack unwinding support
>>   arm64: Add gdb stack unwinding support
>>
>>  arm64.c            | 120 +++++++++++++++--
>>  crash_target.c     |  71 ++++++----
>>  defs.h             | 194 ++++++++++++++++++++++++++-
>>  gdb-10.2.patch     |  96 ++++++++++++++
>>  gdb_interface.c    |  39 ++----
>>  kernel.c           |  63 +++++++--
>>  ppc64.c            | 174 +++++++++++++++++++++++-
>>  symbols.c          |  15 +++
>>  task.c             |  34 +++--
>>  tools.c            |  16 ++-
>>  unwind_x86_64.h    |   4 -
>>  vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++-------------
>>  x86_64.c           | 323 ++++++++++++++++++++++++++++++++++++++++-----
>>  13 files changed, 1247 insertions(+), 223 deletions(-)
>>
>> --
>> 2.40.1