On Fri, Feb 02, 2024 at 07:04:14AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
Hi Aditya,
Thank you for the work, it looks nicely done to me, except for the "info
threads" issue and the style of the gdb patch.
Hi Tao Liu,
I saw that you were interested in support for other architectures. I'd
like to test/support this also on x86_64 at the same time as ppc64 if
possible. Do you have any trial patch or plan?
Hi Kazu & Aditya,
Sorry for the delay. You can access my trial patch by
https://github.com/liutgnu/crash-dev. There are some known issue:
1) Some vmcores will stack unwind fail:
$ ./crash /var/crash/127.0.0.1-2023-11-10-18\:27\:30/vmcore ~/vmlinux
KERNEL: /root/vmlinux [TAINTED]
DUMPFILE: /var/crash/127.0.0.1-2023-11-10-18:27:30/vmcore [PARTIAL DUMP]
CPUS: 1
DATE: Fri Nov 10 18:27:26 CST 2023
UPTIME: 00:10:49
LOAD AVERAGE: 0.07, 0.11, 0.08
TASKS: 133
NODENAME:
ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com
RELEASE: 5.14.0-39.el9.x86_64
VERSION: #1 SMP PREEMPT Fri Dec 24 00:07:58 EST 2021
MACHINE: x86_64 (2303 Mhz)
MEMORY: 4 GB
PANIC: "Oops: 0002 [#1] PREEMPT SMP NOPTI" (check log for details)
PID: 22722
COMMAND: "insmod"
TASK: ffff973e88408000 [THREAD_INFO: ffff973e88408000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> set 1
>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800
ip:ffffffffb726a1b3
>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
crash>
gdb bt
#0 0xffffffffb726a1b3 in context_switch (rf=0xffffa5ee40013d00, next=0xffff973e88408000,
prev=0xffff973e801fb280, rq=0xffff973efbc2c800) at kernel/sched/core.c:4972
#1 __schedule (sched_mode=0) at kernel/sched/core.c:6253
#2 0x0004005300000004 in ?? ()
crash> bt
PID: 1 TASK: ffff973e801fb280 CPU: 0 COMMAND: "systemd"
#0 [ffffa5ee40013d30] __schedule at ffffffffb726a1b3
#1 [ffffa5ee40013d78] schedule at ffffffffb726a553
#2 [ffffa5ee40013d90] schedule_hrtimeout_range_clock at ffffffffb726f294
#3 [ffffa5ee40013e10] ep_poll at ffffffffb6bc00c4
#4 [ffffa5ee40013eb0] do_epoll_wait at ffffffffb6bc01db
#5 [ffffa5ee40013ee8] __x64_sys_epoll_wait at ffffffffb6bc09b0
#6 [ffffa5ee40013f38] do_syscall_64 at ffffffffb725ea98
#7 [ffffa5ee40013f50] entry_SYSCALL_64_after_hwframe at ffffffffb740007c
The stack unwinding failed for "gdb bt", it only unwinded the
"__schedule" function. The similarities for the failing is rsp and rbp which got
from stack pointing to different stack frames:
>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800
ip:ffffffffb726a1b3
crash> rd ffffa5ee40013d00 32
ffffa5ee40013d00: ffff973ef8eef4a8 0000000000000000 ....>........... r15, r14
(struct inactive_task_frame)
ffffa5ee40013d10: ffff973e88408000 ffff973e801fb280 ..@.>.......>... r13, r12
ffffa5ee40013d20: ffff973e801fbd98 ffff973efbc2c800 ....>.......>... bx, bp
ffffa5ee40013d30: ffffffffb726a1b3 ffff973e00000001 ..&.........>... ret_addr
ffffa5ee40013d40: 0004005300000004 f915b72f0f356b00 ....S....k5./...
ffffa5ee40013d50: ffff973e801fb280 ffff973e801fb280 ....>.......>...
ffffa5ee40013d60: ffff973e801fb280 ffff973ef8eef4e0 ....>.......>...
ffffa5ee40013d70: 0000000000000000 ffffffffb726a553 ........S.&.....
ffffa5ee40013d80: ffff973ef8eef480 ffffa5ee40013e68 ....>...h>.@....
ffffa5ee40013d90: ffffffffb726f294 ffffffffb6bbf092 ..&.............
ffffa5ee40013da0: 000055d6d4081de0 00000054b6b4c6fd .....U......T...
ffffa5ee40013db0: ffff973ef8eef4d0 ffffa5ee40013db8 ....>....=.@....
ffffa5ee40013dc0: ffffa5ee40013db8 0000000000000000 .=.@............
ffffa5ee40013dd0: ffff973e00000019 f915b72f0f356b00 ....>....k5./...
ffffa5ee40013de0: f915b72f0f356b00 ffff973ef8eef480 .k5./.......>...
ffffa5ee40013df0: ffffa5ee40013e68 ffff973e801fb280 h>.@........>...
And bp is modified as follows:
crash> dis schedule
0xffffffffb726a510 <schedule>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffb726a515 <schedule+5>: push %rbp
0xffffffffb726a516 <schedule+6>: mov %gs:0x16f40,%rbp <<<
0xffffffffb726a51f <schedule+15>: push %rbx
0xffffffffb726a520 <schedule+16>: mov 0x18(%rbp),%eax
0xffffffffb726a523 <schedule+19>: test %eax,%eax
I'm not sure why in this case gdb cannot get the stack unwinded. Other than this case,
the x86_64 stack unwinding works fine according to my test.
2) May break the original ppc64 patch.
This x86_64 patch is based on the original v7 ppc stack unwinding patch. And it modified a
bit of the original ppc64 patch code. I haven't fully tested the code in ppc64 arch.
Thanks,
Tao Liu
Thanks,
Kazu
On 2024/01/05 16:30, Aditya Gupta wrote:
> The Problem:
> ============
>
> Currently crash is unable to show function arguments and local variables, as
> gdb can do. And functionality for moving between frames
('up'/'down') is not
> working in crash.
>
> Crash has 'gdb passthroughs' for things gdb can do, but the gdb
passthroughs
> 'bt', 'frame', 'info locals', 'up', 'down'
are not working either, due to
> gdb not getting the register values from `crash_target::fetch_registers`,
> which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
>
> Proposed Solution:
> ==================
>
> Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for
PPC64.
> This way, "gdb mode in crash" will support this feature for both ELF and
> kdump-compressed vmcore formats, while "gdb" would only have supported
ELF
> format
>
> This way other features of 'gdb', such as seeing
> backtraces/registers/variables/arguments/local variables, moving up and
> down stack frames, can be used with any ppc64 vmcore, irrespective of
> being ELF format or kdump-compressed format.
>
> Note: This doesn't support live debugging on ppc64, since registers are not
> available to be read
>
> Implications on Architectures:
> ====================================
>
> No architecture other than PPC64 has been affected, other than in case of
> 'frame' command
>
> As mentioned in patch #2, since frame will not be prohibited, so it will print:
>
> crash> frame
> #0 <unavailable> in ?? ()
>
> Instead of before prohibited message:
>
> crash> frame
> crash: prohibited gdb command: frame
>
> Major change will be in 'gdb mode' on PPC64, that it will print the frames,
and
> local variables, instead of failing with errors showing no frame, or showing
> that couldn't get PC, it will be able to give all this information.
>
> Testing:
> ========
>
> Git tree with this patch series applied:
>
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v6
>
> To test various gdb passthroughs:
>
> (crash) set
> (crash) set gdb on
> gdb> thread
> gdb> bt
> gdb> info threads
> gdb> info threads
> gdb> info locals
> gdb> info variables irq_rover_lock
> gdb> info args
> gdb> thread 2
> gdb> set gdb off
> (crash) set
> (crash) set -c 6
> (crash) gdb thread
> (crash) bt
> (crash) gdb bt
> (crash) frame
> (crash) up
> (crash) down
> (crash) info locals
>
> Known Issues:
> =============
>
> 1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
> from older kernels. This is a known issue due to register mismatch, and
> its fix has been merged upstream:
>
> This can also cause some 'invalid kernel virtual address' errors during
gdb
> unwinding the stack registers
>
> Commit:
https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
>
> Fixing GDB passthroughs on other architectures
> ==============================================
>
> Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
> thread', 'gdb info locals' etc. has been done by the patches
introducing
> 'machdep->get_cpu_reg' and this series fixing some issues in that.
>
> Other architectures should be able to fix these gdb functionalities by
> simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
>
> The reasoning behind that has been explained with a diagram in commit
> description of patch #1
>
> I will assist with my findings/observations fixing it on ppc64 whenever needed.
>
> Changelog:
> ==========
>
> V6:
> + changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
> to be thread 1
>
> V5:
> + changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
> code
> + changes in patch #3: fixed typo 'ppc64_renum' instead of
'ppc64_regnum',
> remove unneeded if condition
> + changes in patch #5: implement refresh regcache on per thread, instead of all
> threads at once
>
> V4:
> + fix segmentation fault in live debugging (change in patch #1)
> + mention live debugging not supported in cover letter and patch #1
> + fixed some checkpatch warnings (change in patch #5)
>
> V3:
> + default gdb thread will be the crashing thread, instead of being
> thread '0'
> + synchronise crash cpu and gdb thread context
> + fix bug in gdb_interface, that replaced gdb's output stream, losing
> output in some cases, such as info threads and extra output in info
> variables
> + fix 'info threads'
>
> RFC V2:
> - removed patch implementing 'frame', 'up', 'down' in
crash
> - updated the cover letter by removing the mention of those commands other
> than the respective gdb passthrough
>
> Aditya Gupta (5):
> ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
> remove 'frame' from prohibited commands list
> synchronise cpu context changes between crash/gdb
> fix gdb_interface: restore gdb's output streams at end of
> gdb_interface
> fix 'info threads' command
>
> crash_target.c | 44 ++++++++++++++++
> defs.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
> gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
> gdb_interface.c | 2 +-
> kernel.c | 47 +++++++++++++++--
> ppc64.c | 95 +++++++++++++++++++++++++++++++++--
> task.c | 14 ++++++
> tools.c | 2 +-
> 8 files changed, 434 insertions(+), 10 deletions(-)
>