Hi Aditya,
On Fri, Dec 15, 2023 at 4:00 PM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
The Problem:
============
Currently, crash is unable to show function arguments and local variables the
way gdb can, and moving between frames ('up'/'down') does not work in crash.
Crash has 'gdb passthroughs' for things gdb can do, but the passthroughs
'bt', 'frame', 'info locals', 'up' and 'down' do not work either, because
gdb does not get register values from `crash_target::fetch_registers`,
which relies on `machdep->get_cpu_reg`, and that is not implemented for PPC64.
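To make the dependency above concrete, here is a minimal sketch of the register-fetch loop, assuming the hook is shaped like `get_cpu_reg(cpu, regno, name, size, value)` and returns non-zero on success. It is loosely modelled on the chain described above, not copied from crash_target.c: `mock_regcache`, `mock_machdep`, `toy_get_cpu_reg` and the four-register table are all hypothetical. It shows why a missing hook leaves every register, including the PC, unavailable to gdb.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; not the real crash or gdb definitions. */
#define NUM_REGS 4
#define MAX_REG_SIZE 16

struct mock_regcache {
    unsigned char value[NUM_REGS][MAX_REG_SIZE];
    int available[NUM_REGS];      /* 0 => gdb would show "<unavailable>" */
};

/* Assumed hook shape: returns non-zero if the register was filled in. */
typedef int (*get_cpu_reg_fn)(int cpu, int regno, const char *name,
                              int size, void *value);

struct mock_machdep {
    get_cpu_reg_fn get_cpu_reg;   /* NULL when the arch has no hook */
};

static const char *reg_names[NUM_REGS] = { "r0", "r1", "r2", "pc" };

/* Toy hook: every register of every CPU holds 0x1000 + regno. */
static int toy_get_cpu_reg(int cpu, int regno, const char *name,
                           int size, void *value)
{
    unsigned long long v = 0x1000ULL + (unsigned long long)regno;

    (void)cpu;
    (void)name;
    if (size != (int)sizeof(v))
        return 0;
    memcpy(value, &v, sizeof(v));
    return 1;
}

/* Ask the arch hook for each register of the selected CPU and record
 * whether gdb may use it. */
static void fetch_registers(const struct mock_machdep *md, int cpu,
                            struct mock_regcache *rc)
{
    for (int r = 0; r < NUM_REGS; r++) {
        unsigned char buf[MAX_REG_SIZE];
        const int size = 8;       /* 64-bit registers */

        assert(size <= MAX_REG_SIZE);
        memset(buf, 0, sizeof(buf));
        if (md->get_cpu_reg &&
            md->get_cpu_reg(cpu, r, reg_names[r], size, buf)) {
            memcpy(rc->value[r], buf, (size_t)size);
            rc->available[r] = 1;
        } else {
            rc->available[r] = 0; /* no hook or no value: unavailable */
        }
    }
}
```

With `get_cpu_reg` left NULL, which is the PPC64 situation described above, gdb has no PC and cannot build even frame #0.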
Proposed Solution:
==================
Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
With this, "gdb mode in crash" supports this feature for both ELF and
kdump-compressed vmcore formats, whereas plain "gdb" only supports the ELF
format.
It also means other gdb features, such as viewing backtraces, registers,
variables, arguments and local variables, and moving up and down stack
frames, can be used with any ppc64 vmcore, whether it is in ELF or
kdump-compressed format.
Note: This doesn't support live debugging on ppc64, since the registers
cannot be read.
I tried to enable live debugging on the ppc64 arch based on your patch, but
failed. I cannot get the saved register state from the kernel:
task_struct -> thread_struct -> struct pt_regs *regs. The "regs" value
I read is always 0.
I'm not very familiar with the ppc64 kernel. Doesn't the kernel save its
registers into "struct pt_regs *regs"? Or are they saved somewhere
else?
Thanks,
Tao Liu
Implications for Architectures:
===============================
No architecture other than PPC64 is affected, except for the 'frame'
command.
As mentioned in patch #2, since 'frame' is no longer prohibited, it will now print:
crash> frame
#0 <unavailable> in ?? ()
instead of the earlier 'prohibited' message:
crash> frame
crash: prohibited gdb command: frame
The major change is in 'gdb mode' on PPC64: instead of failing with errors
saying there is no frame or that the PC could not be read, it now prints the
frames and local variables.
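The 'frame' change above amounts to dropping one entry from a deny-list that is checked before a command is handed to gdb. Below is a minimal sketch of such a check; the list name `prohibited_cmds`, its entries and `is_prohibited()` are hypothetical and do not reproduce crash's actual list. With "frame" absent, the command reaches gdb, which prints `<unavailable>` when no registers were supplied.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, illustrative deny-list; "frame" is deliberately absent,
 * mirroring what patch #2 does. */
static const char *prohibited_cmds[] = {
    "run", "r", "break", "b", "continue", "c", NULL
};

/* Return 1 if the first word of a gdb passthrough is on the deny-list. */
static int is_prohibited(const char *cmdline)
{
    size_t len;

    assert(cmdline != NULL);
    len = strcspn(cmdline, " \t");   /* length of the first word */
    for (int i = 0; prohibited_cmds[i]; i++) {
        if (strlen(prohibited_cmds[i]) == len &&
            strncmp(cmdline, prohibited_cmds[i], len) == 0)
            return 1;
    }
    return 0;
}
```

Matching whole first words (not prefixes) keeps "c 3" blocked while letting a longer command such as "continuous" through.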
Testing:
========
Git tree with this patch series applied:
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v4
To test various gdb passthroughs:
(crash) set
(crash) set gdb on
gdb> thread
gdb> bt
gdb> info threads
gdb> info locals
gdb> info variables irq_rover_lock
gdb> info args
gdb> thread 2
gdb> set gdb off
(crash) set
(crash) set -c 6
(crash) gdb thread
(crash) bt
(crash) gdb bt
(crash) frame
(crash) up
(crash) down
(crash) info locals
Known Issues:
=============
1. In gdb mode, 'bt' might fail to show a backtrace for a few vmcores
collected from older kernels. This is a known issue caused by a register
mismatch, and it can also cause some 'invalid kernel virtual address'
errors while gdb unwinds the stack registers. Its fix has been merged
upstream in commit:
https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
Fixing GDB passthroughs on other architectures
==============================================
Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
thread', 'gdb info locals' etc. work has been done by the patches that
introduced 'machdep->get_cpu_reg', and by this series, which fixes some
issues in them.
Other architectures should be able to get these gdb features working by
simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
The reasoning behind this is explained, with a diagram, in the commit
description of patch #1.
I am happy to share my findings/observations from fixing it on ppc64
whenever needed.
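For an architecture porting this, the hook is essentially a lookup from a gdb register number into a saved per-CPU register set. The sketch below shows that shape under stated assumptions: `struct saved_regs`, `cpu_regs`, `demo_get_cpu_reg` and especially the register numbering (GPRs 0..31, then PC, MSR, LR) are hypothetical; a real port must follow gdb's own numbering for the target and read the values from the vmcore.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU snapshot, e.g. decoded from the vmcore notes. */
struct saved_regs {
    uint64_t gpr[32];
    uint64_t nip;     /* program counter */
    uint64_t msr;
    uint64_t link;    /* link register */
};

#define NR_CPUS_DEMO 2
static struct saved_regs cpu_regs[NR_CPUS_DEMO];

/* Hypothetical numbering for illustration only; NOT gdb's real one. */
enum { REG_GPR0 = 0, REG_PC = 32, REG_MSR = 33, REG_LR = 34 };

/* Fill 'value' for (cpu, regno); return 1 on success, 0 if unknown. */
static int demo_get_cpu_reg(int cpu, int regno, const char *name,
                            int size, void *value)
{
    const struct saved_regs *regs;
    uint64_t v;

    (void)name;
    assert(value != NULL);
    if (cpu < 0 || cpu >= NR_CPUS_DEMO || size != (int)sizeof(v))
        return 0;
    regs = &cpu_regs[cpu];

    if (regno >= REG_GPR0 && regno < REG_GPR0 + 32)
        v = regs->gpr[regno - REG_GPR0];
    else if (regno == REG_PC)
        v = regs->nip;
    else if (regno == REG_MSR)
        v = regs->msr;
    else if (regno == REG_LR)
        v = regs->link;
    else
        return 0;     /* not saved: gdb treats it as unavailable */

    memcpy(value, &v, sizeof(v));
    return 1;
}
```

Returning 0 for registers the dump does not contain lets gdb mark them unavailable instead of showing made-up values.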
Additional Notes:
=================
Sorry it took so long to send this version. I tried fixing 'info
threads' but wasn't able to at first; I gave it another go and, after
multiple days of debugging, was able to fix it this time.
Some other points from the review of the last version:
* 'info rv' not working:
  This is not supported in gdb; instead, use 'info locals rv' or
  'info variables rv'.
* 'info variables' command hangs, and prints nothing after hanging for a long time:
  It likely hangs because there are a lot of symbols and it tries to
  collect all of gdb's output and page it, so Control+C messes it up. If we
  pass a regex filter to limit the output, e.g. 'info variables rq', it
  doesn't hang and prints the matching variables/symbols.
  Plain gdb, i.e. simply running 'gdb vmlinux vmcore', also hangs because
  of the large number of symbols.
* making the crashing thread the default in gdb:
  This is now implemented, along with synchronising the crash and gdb
  contexts, in patch #3.
* 'info threads' not working:
  This turned out to be due to a bug in gdb_interface. To keep things
  simple, I fixed 'info threads' in 2 patches: the first fixes
  gdb_interface, and the second sets the context correctly in crash.
* other info commands:
  I tested all the info commands in crash with this patch applied.
  Most of those that fail in crash do so because gdb itself does not
  support them with vmcores; the one other failure is the 'info pretty'
  command, which might not be needed in crash anyway.
* live debugging showing only one thread:
  I tried it with crash: crash shows only the current thread, i.e.
  itself, so it has no register information for the other CPUs.
  Similarly, gdb does not support live kernel debugging (without
  connecting to a gdbstub/QEMU etc.).
  If needed, I can make it show the correct current thread id for that
  one thread, but I don't think it would help much with live
  debugging.
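On the 'info variables' point: gdb's `info variables [REGEXP]` lists only names matching the regular expression, which is why a filter such as "rq" keeps the output small. The sketch below only illustrates that narrowing with POSIX `regcomp`/`regexec` over a hypothetical symbol list; it is not gdb's implementation, and `count_matching()` is a made-up helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count symbol names matching an extended regex; -1 on a bad pattern. */
static int count_matching(const char *const *names, size_t n,
                          const char *pattern)
{
    regex_t re;
    int count = 0;

    assert(names != NULL || n == 0);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, names[i], 0, NULL, 0) == 0)
            count++;
    regfree(&re);
    return count;
}
```

An unanchored pattern matches anywhere in a name, so "rq" matches both "irq_rover_lock" and "rq_qos_mutex"; anchoring with "^...$" narrows it to exact names.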
I hope this sets the context. Thanks for the reviews; I replied to and
worked on your suggestions, but got stuck for a while on 'info threads'.
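The gdb_interface fix mentioned above follows a save/redirect/restore pattern: remember the active output stream, point output at a temporary one for the command, then put the original back. Below is a minimal sketch of that pattern, assuming a single hypothetical global `current_out` in place of gdb's real stream objects; `run_with_redirect()` and `demo_cmd()` are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical global standing in for gdb's current output stream. */
static FILE *current_out;

/* Run one command with output sent to 'tmp', then restore whatever
 * stream was active before. */
static void run_with_redirect(FILE *tmp, void (*cmd)(void))
{
    FILE *saved = current_out;   /* remember the caller's stream */

    assert(tmp != NULL && cmd != NULL);
    current_out = tmp;
    cmd();
    current_out = saved;         /* without this, later output goes astray */
}

static void demo_cmd(void)
{
    fputs("demo output\n", current_out);
}
```

Skipping the final restore reproduces the class of bug described in the V3 changelog: subsequent output is written to the wrong stream and appears lost.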
Changelog:
==========
V4:
+ fix segmentation fault in live debugging (change in patch #1)
+ mention that live debugging is not supported, in the cover letter and patch #1
+ fix some checkpatch warnings (change in patch #5)
V3:
+ default gdb thread will be the crashing thread, instead of being
thread '0'
+ synchronise crash cpu and gdb thread context
+ fix bug in gdb_interface that replaced gdb's output stream, causing
lost output in some cases (such as 'info threads') and extra output in
others (such as 'info variables')
+ fix 'info threads'
RFC V2:
- removed patch implementing 'frame', 'up', 'down' in crash
- updated the cover letter to remove mentions of those commands, other
than the respective gdb passthroughs
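The V3 items on defaulting to the crashing thread and synchronising contexts can be pictured as two setters that always update both sides. The sketch below is hypothetical throughout: `crash_cpu`, `gdb_thread` and the helper functions are made-up names, and the mapping "gdb thread N is CPU N - 1" is an assumption for illustration only; the mapping used in patch #3 may differ.

```c
#include <assert.h>

/* Hypothetical shared state: crash's selected CPU, gdb's selected thread. */
static int crash_cpu = -1;
static int gdb_thread = -1;

/* Assumed mapping for illustration only: thread N <-> CPU N - 1. */
static int cpu_to_thread(int cpu)    { return cpu + 1; }
static int thread_to_cpu(int thread) { return thread - 1; }

/* "set -c N" in crash: move gdb along with it. */
static void crash_set_cpu(int cpu)
{
    assert(cpu >= 0);
    crash_cpu = cpu;
    gdb_thread = cpu_to_thread(cpu);
}

/* "gdb thread N": move crash along with it. */
static void gdb_select_thread(int thread)
{
    assert(thread >= 1);
    gdb_thread = thread;
    crash_cpu = thread_to_cpu(thread);
}

/* Start both sides on the panicking CPU rather than the first one. */
static void init_context(int panic_cpu)
{
    crash_set_cpu(panic_cpu);
}
```

Routing every context change through one of the two setters is what keeps 'set -c 6' and 'gdb thread' consistent in the testing steps above.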
Aditya Gupta (5):
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
remove 'frame' from prohibited commands list
synchronise cpu context changes between crash/gdb
fix gdb_interface: restore gdb's output streams at end of
gdb_interface
fix 'info threads' command
crash_target.c | 44 ++++++++++++++++
defs.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
gdb_interface.c | 2 +-
kernel.c | 47 +++++++++++++++--
ppc64.c | 95 +++++++++++++++++++++++++++++++++--
task.c | 14 ++++++
tools.c | 2 +-
8 files changed, 434 insertions(+), 10 deletions(-)
--
2.41.0