Hi all,
Thanks Lianbo, Tao, Alexey and Daisuke for your reviews on this series.
Feels amazing to finally see this merged!
Thank you Tao for collaborating on this for so many months!
I hope this helps many people. I have been pinged by multiple people in
the dev and support teams saying that this information can help them
identify which subsystem an issue belongs to.
Thanks again,
- Aditya Gupta
On 04/11/24 13:39, lijiang wrote:
> Thank you for working on this feature, Aditya, Tao and Alex. Great job!
>
> For the [PATCH v7 02/15 -15/15], rearranged them with minor changes:
>
> [1] https://github.com/crash-utility/crash/commit/21e0a345f97324b3472d573ed20ef098f0300fac
> [2] https://github.com/crash-utility/crash/commit/c4db469af091edd1ea0897fbce41bc175375314b
>
> [3] https://github.com/crash-utility/crash/commit/7c8a7dddda66b3d1043ba99516d...
> [4] https://github.com/crash-utility/crash/commit/1fd80c623c205443fdd2a29b14c...
> [5] https://github.com/crash-utility/crash/commit/6dfda0d2235574cf80530ea92e0...
> [6] https://github.com/crash-utility/crash/commit/89ff1e45734457eb66905ef6567...
> [7] https://github.com/crash-utility/crash/commit/968debd0d5979dd9ddca3af0766...
>
> BTW: there are still some known issues with this one, but they are not
> critical and can be fixed later.
>
> Reminder: the current patchset has changed some function interfaces,
> which may affect crash extensions.
>
> Thanks
> Lianbo
>
> On Wed, Sep 11, 2024 at 10:25 AM lijiang <lijiang(a)redhat.com> wrote:
>
> Hi, Tao
>
> Thank you for the update.
>
> The following patch is a regression issue, so I tend to discuss it
> as a separate patch.
> [PATCH v7 01/15] Fix the regression of cpumask_t for xen hyper
>
> In addition, I found another issue in my tests (on ppc64le): gdb bt
> can display the back trace for the panic task, but when I switch to
> another task, gdb bt cannot display the back trace:
>
> crash> gdb bt
> #0 0xc0000000002bde04 in crash_setup_regs
> (newregs=0xc00000003264b858, oldregs=0x0) at
> ./arch/powerpc/include/asm/kexec.h:133
> #1 0xc0000000002be4f8 in __crash_kexec (regs=0x0) at
> kernel/crash_core.c:122
> #2 0xc00000000016c254 in panic (fmt=0xc0000000015eef20 "sysrq
> triggered crash\n") at kernel/panic.c:373
> #3 0xc000000000a708b8 in sysrq_handle_crash (key=<optimized out>)
> at drivers/tty/sysrq.c:154
> #4 0xc000000000a713d4 in __handle_sysrq (key=key@entry=99 'c',
> check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
> #5 0xc000000000a71e94 in write_sysrq_trigger (file=<optimized
> out>, buf=<optimized out>, count=2, ppos=<optimized out>) at
> drivers/tty/sysrq.c:1181
> #6 0xc00000000073260c in pde_write (pde=0xc00000000af9cc00,
> file=<optimized out>, buf=<optimized out>, count=<optimized out>,
> ppos=<optimized out>) at fs/proc/inode.c:334
> #7 proc_reg_write (file=<optimized out>, buf=<optimized out>,
> count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346
> #8 0xc00000000063c0e0 in vfs_write (file=0xc0000000092d2900,
> buf=0x10012536f60 <error: Cannot access memory at address
> 0x10012536f60>, count=2, pos=0xc00000003264bd30) at
> fs/read_write.c:588
> #9 vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error:
> Cannot access memory at address 0x10012536f60>, count=<optimized
> out>, pos=0xc00000003264bd30) at fs/read_write.c:570
> #10 0xc00000000063c690 in ksys_write (fd=<optimized out>,
> buf=0x10012536f60 <error: Cannot access memory at address
> 0x10012536f60>, count=2) at fs/read_write.c:643
> #11 0xc000000000031a28 in system_call_exception
> (regs=0xc00000003264be80, r0=<optimized out>) at
> arch/powerpc/kernel/syscall.c:153
> #12 0xc00000000000d05c in system_call_vectored_common () at
> arch/powerpc/kernel/interrupt_64.S:198
>
> crash> ps
> PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 0 0 0 c000000002bda980 RU 0.0 0 0
> [swapper/0]
> > 0 0 1 c000000003864c80 RU 0.0 0 0
> [swapper/1]
> ...
> 8017 923 0 c000000043a20000 IN 0.2 22528 16256
> sshd-session
> 8025 8017 6 c000000032271880 IN 0.1 22784 11840
> sshd-session
> > 8026 8025 0 c000000043a26600 RU 0.1 9664 6208
> bash
> ...
> 11645 2 3 c000000032264c80 ID 0.0 0 0
> [kworker/u32:2]
> 11738 6188 2 c00000003811b180 IN 0.1 43520 9408
> pickup
> 12326 2 0 c00000003226b280 ID 0.0 0 0
> [kworker/0:1]
> 13112 6089 2 c00000000c809900 IN 0.0 7232 3456 sleep
>
> Let's take the "pickup" task as an example:
>
> crash> set 11738
> PID: 11738
> COMMAND: "pickup"
> TASK: c00000003811b180 [THREAD_INFO: c00000003811b180]
> CPU: 2
> STATE: TASK_INTERRUPTIBLE
>
> crash> gdb bt
> #0 0xc0000000a7f876a0 in ?? ()
> gdb: gdb request failed: bt
> crash> set gdb on
> gdb: on
> gdb> bt
> #0 0xc0000000a7f876a0 in ?? ()
> gdb>
>
> Anyway, I did the same test on x86_64 and aarch64, and it works as
> expected. Can you help to double check on the ppc64 architecture?
>
> x86_64:
> crash> set 14599
> PID: 14599
> COMMAND: "pickup"
> TASK: ffff8f57a0d7c180 [THREAD_INFO: ffff8f57a0d7c180]
> CPU: 41
> STATE: TASK_INTERRUPTIBLE
> crash> gdb bt
> #0 0xffffffff8b3efe29 in context_switch (rq=0xffff8f6f1f835900,
> prev=0xffff8f57a0d7c180, next=0xffff8f5786720000,
> rf=0xffff9df22fea7b80) at kernel/sched/core.c:5208
> #1 __schedule (sched_mode=sched_mode@entry=0) at
> kernel/sched/core.c:6549
> #2 0xffffffff8b3f0217 in __schedule_loop (sched_mode=<optimized
> out>) at kernel/sched/core.c:6626
> #3 schedule () at kernel/sched/core.c:6641
> #4 0xffffffff8b3f6eef in schedule_hrtimeout_range_clock
> (expires=expires@entry=0xffff9df22fea7cb0, delta=<optimized out>,
> delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS,
> clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
> #5 0xffffffff8b3f7003 in schedule_hrtimeout_range
> (expires=expires@entry=0xffff9df22fea7cb0,
> delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at
> kernel/time/hrtimer.c:2340
> #6 0xffffffff8aae301c in ep_poll (ep=0xffff8f5790d15d40,
> events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100,
> timeout=timeout@entry=0xffff9df22fea7d58) at fs/eventpoll.c:2062
> #7 0xffffffff8aae3138 in do_epoll_wait (epfd=epfd@entry=8,
> events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100,
> to=0xffff9df22fea7d58) at fs/eventpoll.c:2464
> #8 0xffffffff8aae44a1 in __do_sys_epoll_wait (epfd=<optimized
> out>, events=0x7ffea91b6b90, maxevents=<optimized out>,
> timeout=<optimized out>) at fs/eventpoll.c:2476
> #9 __se_sys_epoll_wait (epfd=<optimized out>, events=<optimized
> out>, maxevents=<optimized out>, timeout=<optimized out>) at
> fs/eventpoll.c:2471
> #10 __x64_sys_epoll_wait (regs=<optimized out>) at fs/eventpoll.c:2471
> #11 0xffffffff8b3e293d in do_syscall_x64 (regs=0xffff9df22fea7f48,
> nr=232) at arch/x86/entry/common.c:52
> #12 do_syscall_64 (regs=0xffff9df22fea7f48, nr=232) at
> arch/x86/entry/common.c:83
> #13 0xffffffff8b40012f in entry_SYSCALL_64 () at
> arch/x86/entry/entry_64.S:121
> crash>
>
>
> aarch64:
> crash> set 9338
> PID: 9338
> COMMAND: "pickup"
> TASK: ffff0000c7b05400 [THREAD_INFO: ffff0000c7b05400]
> CPU: 3
> STATE: TASK_INTERRUPTIBLE
> crash> gdb bt
> #0 __switch_to (prev=<unavailable>,
> prev@entry=0xffff0000c7b05400, next=next@entry=<unavailable>) at
> arch/arm64/kernel/process.c:555
> #1 0xffffafc5b5ebd744 in context_switch (rq=0xffff00077bbd0ec0,
> prev=0xffff0000c7b05400, next=<unavailable>,
> rf=0xffff80008ac63a60) at kernel/sched/core.c:5208
> #2 __schedule (sched_mode=sched_mode@entry=0) at
> kernel/sched/core.c:6549
> #3 0xffffafc5b5ebdc2c in __schedule_loop (sched_mode=<optimized
> out>) at kernel/sched/core.c:6626
> #4 schedule () at kernel/sched/core.c:6641
> #5 0xffffafc5b5ec6030 in schedule_hrtimeout_range_clock
> (expires=expires@entry=0xffff80008ac63be8,
> delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS,
> clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
> #6 0xffffafc5b5ec618c in schedule_hrtimeout_range
> (expires=expires@entry=0xffff80008ac63be8,
> delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at
> kernel/time/hrtimer.c:2340
> #7 0xffffafc5b545d33c in ep_poll (ep=<unavailable>,
> events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100,
> timeout=timeout@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2062
> #8 0xffffafc5b545d4e4 in do_epoll_wait (epfd=epfd@entry=8,
> events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100,
> to=to@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2464
> #9 0xffffafc5b545d534 in do_epoll_pwait (epfd=epfd@entry=8,
> events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100,
> to=to@entry=0xffff80008ac63ce0, sigsetsize=<optimized out>,
> sigmask=<optimized out>) at fs/eventpoll.c:2498
> #10 0xffffafc5b545e7c8 in do_epoll_pwait (epfd=8,
> events=0xffffde5c3f68, maxevents=100, to=0xffff80008ac63ce0,
> sigmask=<optimized out>, sigsetsize=<optimized out>) at
> fs/eventpoll.c:2495
> #11 __do_sys_epoll_pwait (epfd=8, events=0xffffde5c3f68,
> maxevents=100, timeout=<optimized out>, sigmask=<optimized out>,
> sigsetsize=<optimized out>) at fs/eventpoll.c:2511
> #12 __se_sys_epoll_pwait (epfd=8, events=281474412330856,
> maxevents=100, timeout=<optimized out>, sigmask=<optimized out>,
> sigsetsize=<optimized out>) at fs/eventpoll.c:2505
> #13 __arm64_sys_epoll_pwait (regs=<optimized out>) at
> fs/eventpoll.c:2505
> #14 0xffffafc5b4fa99bc in __invoke_syscall
> (regs=0xffff80008ac63eb0, syscall_fn=<optimized out>) at
> arch/arm64/kernel/syscall.c:35
> #15 invoke_syscall (regs=regs@entry=0xffff80008ac63eb0,
> scno=<optimized out>, sc_nr=sc_nr@entry=463,
> syscall_table=<optimized out>) at arch/arm64/kernel/syscall.c:49
> #16 0xffffafc5b4fa9ac8 in el0_svc_common (sc_nr=463,
> syscall_table=<optimized out>, regs=0xffff80008ac63eb0,
> scno=<optimized out>) at arch/arm64/kernel/syscall.c:132
> #17 do_el0_svc (regs=regs@entry=0xffff80008ac63eb0) at
> arch/arm64/kernel/syscall.c:151
> #18 0xffffafc5b5eb6fa4 in el0_svc (regs=0xffff80008ac63eb0) at
> arch/arm64/kernel/entry-common.c:712
> #19 0xffffafc5b5eb74c0 in el0t_64_sync_handler (regs=<optimized
> out>) at arch/arm64/kernel/entry-common.c:730
> #20 0xffffafc5b4f91634 in el0t_64_sync () at
> arch/arm64/kernel/entry.S:598
> crash>
>
> BTW: other changes are fine to me.
>
> Thanks
> Lianbo
>
> On Wed, Sep 4, 2024 at 3:54 PM
> <devel-request(a)lists.crash-utility.osci.io> wrote:
>
> Date: Wed, 4 Sep 2024 19:49:25 +1200
> From: Tao Liu <ltao(a)redhat.com>
> Subject: [Crash-utility] [PATCH v7 00/15] gdb stack unwinding support
> for crash utility
> To: devel(a)lists.crash-utility.osci.io
> Cc: Tao Liu <ltao(a)redhat.com>
> Message-ID: <20240904074940.21331-1-ltao(a)redhat.com>
> Content-Type: text/plain; charset=UTF-8
>
> This patchset is a rebase/merged version of the following 3
> patchsets:
>
> 1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
> 2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
> 3): Clean up on top of one-thread-v2 [3]
>
> A complete description of gdb stack unwinding support for crash can be
> found in [1].
>
> This patchset can be divided into the following 3 parts:
>
> 1) part1: preparations before stack unwinding support, some
> bugs/regressions found when drafting this patchset.
> 2) part2: common part for all CPU archs, mainly dealing with
> crash_target.c/gdb_interface.c files, in order to
> support different archs.
> 3) part3: arch specific, for each ppc64/x86_64/arm64/vmware
> stack unwinding support.
>
> === part 3
> arm64: Add gdb stack unwinding support
> vmware_guestdump: Various format versions support
> x86_64: Add gdb stack unwinding support
> ppc64: correct gdb passthroughs by implementing
> machdep->get_current_task_reg
>
> === part 2
> Conditionally output gdb stack unwinding stop reasons
> Stop stack unwinding at non-kernel address
> Print task pid/command instead of CPU index
> Rename get_cpu_reg to get_current_task_reg
> Let crash change gdb context
> Leave only one gdb thread for crash
> Remove 'frame' from prohibited commands list
>
> === part 1
> Fix gdb_interface: restore gdb's output streams at end of
> gdb_interface
> x86_64: Fix invalid input "=>" for bt command
> Fix cpumask_t recursive dependence issue
> Fix the regression of cpumask_t for xen hyper
> ===
>
> v7 -> v6:
> 1) Reorganise the patchset, re-dividing it into 3 parts instead of the
> previous 2 parts.
> 2) Reworked the cpumask_t part, which resolves comment No.4 pointed out
> by lianbo in [4].
> 3) Add conditional output for the failing message of gdb stack unwinding,
> see [PATCH 11/15] Conditionally output gdb stack unwinding stop reasons.
> 4) Redraft the commit messages, updated some outdated info.
> 5) Merged "Let crash change gdb context" and "set_context(): check if
> context is already current" into one.
>
> [4]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01067.html
>
> v6 -> v5:
> 1) Refactor patch 4 & 9, which changed the function signature
> of struct
> get_cpu_reg/get_current_task_reg, and let each patch
> compile with no
> error when added on.
> 2) Rebased the patchset on top of latest upstream:
> ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash --osrelease' or
> crash loading")
>
> v5 -> v4:
> 1) Plenty of code refactoring based on Lianbo's comments on v4.
> 2) Removed the magic number when dealing with regs bitmap, see
> [6].
> 3) Rebased the patchset on top of latest upstream:
> ("1c6da3eaff8207 arm64: Fix bt command show wrong
> stacktrace on ramdump source")
>
> v4 -> v3:
> Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface:
> restore gdb's
> output streams at end of gdb_interface.
>
> v3 -> v2:
> 1) Updated CC list as pointed out in [4]
> 2) Compiling issues as in [5]
>
> v2 -> v1:
> 1) Added the patch: x86_64: Fix invalid input "=>" for bt command,
> thanks to Kazu's testing.
> 2) Modify the patch: x86_64: Add gdb stack unwinding support, added the
> pcp_save, spp_save and sp, for restoring the value in match of the
> original code logic.
>
> [1]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00469.html
> [2]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00488.html
> [3]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00554.html
> [4]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00681.html
> [5]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00715.html
> [6]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00819.html
>
> Aditya Gupta (3):
> Fix gdb_interface: restore gdb's output streams at end of
> gdb_interface
> Remove 'frame' from prohibited commands list
> ppc64: correct gdb passthroughs by implementing
> machdep->get_current_task_reg
>
> Alexey Makhalov (1):
> vmware_guestdump: Various format versions support
>
> Tao Liu (11):
> Fix the regression of cpumask_t for xen hyper
> Fix cpumask_t recursive dependence issue
> x86_64: Fix invalid input "=>" for bt command
> Leave only one gdb thread for crash
> Let crash change gdb context
> Rename get_cpu_reg to get_current_task_reg
> Print task pid/command instead of CPU index
> Stop stack unwinding at non-kernel address
> Conditionally output gdb stack unwinding stop reasons
> x86_64: Add gdb stack unwinding support
> arm64: Add gdb stack unwinding support
>
> arm64.c | 120 +++++++++++++++--
> crash_target.c | 71 ++++++----
> defs.h | 194 ++++++++++++++++++++++++++-
> gdb-10.2.patch | 96 ++++++++++++++
> gdb_interface.c | 39 ++----
> kernel.c | 63 +++++++--
> ppc64.c | 174 +++++++++++++++++++++++-
> symbols.c | 15 +++
> task.c | 34 +++--
> tools.c | 16 ++-
> unwind_x86_64.h | 4 -
> vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++-------------
> x86_64.c | 323 ++++++++++++++++++++++++++++++++++++++++-----
> 13 files changed, 1247 insertions(+), 223 deletions(-)
>
> --
> 2.40.1
>