Hi Tao,
On Wed, Dec 13, 2023 at 11:09:42PM +0800, Tao Liu wrote:
Hi Aditya,
On Wed, Dec 13, 2023 at 10:28 PM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
>
> Hi Tao,
>
> On Wed, Dec 13, 2023 at 09:03:37PM +0800, Tao Liu wrote:
> > Hi Aditya,
> >
> > I encountered a problem when analyzing the ppc64 vmcore after applying all
> > patches in the patchset:
> >
> > crash> gdb bt
> > #0 0xc000000000279d98 in crash_setup_regs (gdb: invalid kernel
> > virtual address: fffffffffffffffb type: "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffffb type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type:
> > "gdb_readmem callback"
> > oldregs=<optimized out>, newregs=0xc000000012e87968) at
> > ./arch/powerpc/include/asm/kexec.h:69
> > #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:975
> > #2 0xfffffffffffffffb in ?? ()
> > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> > crash> gdb info threads
> > Id Target Id Frame
> > 1 CPU 0 plpar_hcall_norets_notrace () at
> > arch/powerpc/platforms/pseries/hvCall.S:112
> > * 2 CPU 1 0xc000000000279d98 in crash_setup_regs (gdb:
> > invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem
> > callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffffb type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff7 type:
> > "gdb_readmem callback"
> > gdb: invalid kernel virtual address: fffffffffffffff3 type:
> > "gdb_readmem callback"
> > oldregs=<optimized out>, newregs=0xc000000012e87968) at
> > ./arch/powerpc/include/asm/kexec.h:69
> >
> > It seems the crash stack unwinding gave a wrong value to gdb. I tried for
> > some time to find the root cause but had no luck. I hope you can
> > help me out. I can send you the vmcore in another mail so you can
> > analyze this issue. Thanks in advance!
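
A note on the addresses in the output above: read as signed 64-bit values, they are small negative numbers spaced 4 bytes apart (4 bytes being the size of one ppc64 instruction). That is consistent with gdb having been handed a bogus register value rather than a kernel pointer, though this is only one possible reading. The Python snippet below just does the arithmetic; the helper name `as_signed64` is made up for illustration.

```python
# Addresses copied from the "invalid kernel virtual address" messages above.
ADDRESSES = [0xfffffffffffffffb, 0xfffffffffffffff7, 0xfffffffffffffff3]


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


signed = [as_signed64(addr) for addr in ADDRESSES]
# Distance between each address and the next one that gdb tried to read.
steps = [a - b for a, b in zip(signed, signed[1:])]

for addr, value in zip(ADDRESSES, signed):
    print(f"{addr:#018x} -> {value}")
print("spacing:", steps)
```

The values come out as -5, -9 and -13, so none of them can be a valid kernel virtual address on this system.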
>
> I mostly see this kind of error when a symbol or structure has changed in
> the kernel, or when an invalid value was read from some structure.
>
> Thanks for the backtrace, will try this with upstream kernel.
Thanks for your help!
>
> Just to check: I should cause the crash using 'echo c > /proc/sysrq-trigger',
> right? Or was it done some other way?
>
Yes, I got the vmcore just by triggering a kernel crash with
'echo c > /proc/sysrq-trigger', as you mentioned. In addition, the kernel
I used for debugging is kernel-5.14.0-362.15.1.el9_3. I didn't try the
upstream kernel...
Okay, thanks. I will try it.
Thanks,
Aditya Gupta
>
> > >
> > > Currently I have made the x86_64 stack unwinding work based on your
> > > patchset. And I plan to post it upstream once your patchsets get
> > > merged. In addition, is there a plan to support the stack unwinding
> > > for live debugging in ppc64 arch? I think it is a useful feature
> > > too...
> >
> > Wow, great. I will fix this issue in the patch series, along with any other
> > issues, and then I guess our patches will be ready to merge :)
>
> Yeah, looks great, thanks!
>
> Thanks,
> Tao Liu
> >
> > Thanks,
> > Aditya Gupta
> >
> > >
> > > Thanks,
> > > Tao Liu
> > >
> > >
> > >
> > >
> > >
> > > > On Tue, Dec 12, 2023 at 12:51 PM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
> > > >
> > > > On Mon, Dec 11, 2023 at 08:04:50PM +0800, Lianbo Jiang wrote:
> > > > > On 12/9/23 20:45, Aditya Gupta wrote:
> > > > >
> > > > > > Hi, just a ping. Any comments on the series ?
> > > > >
> > > > > Hi, Aditya
> > > > >
> > > > >
> > > > > > Thank you for the update. I will have a look and do the tests this week,
> > > > > > and give some feedback.
> > > >
> > > > Sure. Thanks Lianbo.
> > > >
> > > > - Aditya Gupta
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Lianbo
> > > > >
> > > > > >
> > > > > > > On Mon, Dec 04, 2023 at 08:29:36PM +0530, Aditya Gupta wrote:
> > > > > > > The Problem:
> > > > > > > ============
> > > > > > >
> > > > > > > > Currently crash is unable to show function arguments and local variables, as
> > > > > > > > gdb can do. And functionality for moving between frames ('up'/'down') is not
> > > > > > > > working in crash.
> > > > > > >
> > > > > > > > Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
> > > > > > > > 'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
> > > > > > > > gdb not getting the register values from `crash_target::fetch_registers`,
> > > > > > > > which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
> > > > > > >
> > > > > > > Proposed Solution:
> > > > > > > ==================
> > > > > > >
> > > > > > > > Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
> > > > > > > > This way, "gdb mode in crash" will support this feature for both ELF and
> > > > > > > > kdump-compressed vmcore formats, while "gdb" would only have supported ELF
> > > > > > > > format
> > > > > > >
> > > > > > > > This way other features of 'gdb', such as seeing
> > > > > > > > backtraces/registers/variables/arguments/local variables, moving up and
> > > > > > > > down stack frames, can be used with any ppc64 vmcore, irrespective of
> > > > > > > > being ELF format or kdump-compressed format.
> > > > > > >
> > > > > > > Implications on Architectures:
> > > > > > > ====================================
> > > > > > >
> > > > > > > > No architecture other than PPC64 is affected, except for the 'frame'
> > > > > > > > command.
> > > > > > > >
> > > > > > > > As mentioned in patch #2, since 'frame' will no longer be prohibited, it will print:
> > > > > > >
> > > > > > > crash> frame
> > > > > > > #0 <unavailable> in ?? ()
> > > > > > >
> > > > > > > > Instead of the previous 'prohibited' message:
> > > > > > >
> > > > > > > crash> frame
> > > > > > > crash: prohibited gdb command: frame
> > > > > > >
> > > > > > > > The major change is in 'gdb mode' on PPC64: instead of failing with errors
> > > > > > > > showing no frame, or showing that it couldn't get the PC, it will print the
> > > > > > > > frames and local variables.
> > > > > > >
> > > > > > > Testing:
> > > > > > > ========
> > > > > > >
> > > > > > > Git tree with this patch series applied:
> > > > > > >
> > > > > > > > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-3
> > > > > > >
> > > > > > > To test various gdb passthroughs:
> > > > > > >
> > > > > > > gdb> set
> > > > > > > gdb> set gdb on
> > > > > > > gdb> thread
> > > > > > > gdb> bt
> > > > > > > gdb> info threads
> > > > > > > gdb> info threads
> > > > > > > gdb> info locals
> > > > > > > gdb> info variables irq_rover_lock
> > > > > > > gdb> info args
> > > > > > > gdb> thread 2
> > > > > > > gdb> set gdb off
> > > > > > > gdb> set
> > > > > > > gdb> set -c 6
> > > > > > > gdb> gdb thread
> > > > > > > gdb> bt
> > > > > > > gdb> gdb bt
> > > > > > > gdb> frame
> > > > > > > gdb> up
> > > > > > > gdb> down
> > > > > > > gdb> info locals
> > > > > > >
> > > > > > > Known Issues:
> > > > > > > =============
> > > > > > >
> > > > > > > > 1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
> > > > > > > >    from older kernels. This is a known issue due to register mismatch, and
> > > > > > > >    its fix has been merged upstream:
> > > > > > > >
> > > > > > > >    Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
> > > > > > >
> > > > > > > Fixing GDB passthroughs on other architectures
> > > > > > > ==============================================
> > > > > > >
> > > > > > > > Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
> > > > > > > > thread', 'gdb info locals' etc. has been done by the patches introducing
> > > > > > > > 'machdep->get_cpu_reg' and this series fixing some issues in that.
> > > > > > > >
> > > > > > > > Other architectures should be able to fix these gdb functionalities by
> > > > > > > > simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
> > > > > > >
> > > > > > > > The reasoning behind that has been explained with a diagram in commit
> > > > > > > > description of patch #1
> > > > > > > >
> > > > > > > > I will assist with my findings/observations fixing it on ppc64 whenever needed.
> > > > > > >
> > > > > > > Additional Notes:
> > > > > > > =================
> > > > > > >
> > > > > > > > Sorry, it took a long time to send this version. Tried fixing 'info
> > > > > > > > threads' but wasn't able to. Gave it time again, and was able to fix it
> > > > > > > > this time after multiple days of debugging.
> > > > > > >
> > > > > > > Some other things from last version review:
> > > > > > >
> > > > > > > > * 'info rv' not working:
> > > > > > > >   It's not supported in gdb, instead we need to use 'info locals rv' or
> > > > > > > >   'info variables rv'
> > > > > > >
> > > > > > > > * 'info variables' command hangs... and prints nothing after hanging for long
> > > > > > > >   It likely hangs because there are a lot of symbols, and it's trying to
> > > > > > > >   get all of gdb's output and page it, so Control+C messes it up. But if we
> > > > > > > >   pass a regex filter to limit the output, e.g. 'info variables rq', then it
> > > > > > > >   doesn't hang, and prints the variables/symbols.
> > > > > > > >   Even plain gdb, i.e. simply running 'gdb vmlinux vmcore', also hangs due
> > > > > > > >   to the large number of symbols
> > > > > > >
> > > > > > > > * making crashing thread as default in gdb:
> > > > > > > >   This is implemented now, along with synchronising crash & gdb contexts, in
> > > > > > > >   patch #3
> > > > > > >
> > > > > > > > * 'info threads' not working:
> > > > > > > >   This turned out to be due to a bug in gdb_interface. I fixed 'info
> > > > > > > >   threads' in 2 patches, to simplify it: first for the gdb_interface,
> > > > > > > >   and another patch for setting the context correctly in crash
> > > > > > >
> > > > > > > > * other info commands:
> > > > > > > >   I tested all the info commands, in crash along with this patch.
> > > > > > > >   Most of those that fail in crash are due to gdb itself not supporting
> > > > > > > >   them with vmcores, and other than that is the 'info pretty' command,
> > > > > > > >   which might not be needed in crash anyways
> > > > > > >
> > > > > > > > * live debugging showing only one thread:
> > > > > > > >   I tried it with crash; crash shows only the current thread, i.e.
> > > > > > > >   itself, so it does not have register information for the other
> > > > > > > >   CPUs. Similarly gdb does not support live kernel debugging (without
> > > > > > > >   connecting to a gdbstub/QEMU etc.).
> > > > > > > >   If you need, I can make it show the current thread id correctly for
> > > > > > > >   the one thread, but I don't think it would help much with live
> > > > > > > >   debugging
> > > > > > >
> > > > > > > > I hope that sets the context. Thanks for the reviews; I replied and worked
> > > > > > > > on your suggestions, but got stuck there due to 'info threads'
> > > > > > >
> > > > > > > Changelog:
> > > > > > > ==========
> > > > > > >
> > > > > > > V3:
> > > > > > > >  + default gdb thread will be the crashing thread, instead of being
> > > > > > > >    thread '0'
> > > > > > > >  + synchronise crash cpu and gdb thread context
> > > > > > > >  + fix bug in gdb_interface, that replaced gdb's output stream, losing
> > > > > > > >    output in some cases, such as info threads and extra output in info
> > > > > > > >    variables
> > > > > > > + fix 'info threads'
> > > > > > >
> > > > > > > RFC V2:
> > > > > > > >  - removed patch implementing 'frame', 'up', 'down' in crash
> > > > > > > >  - updated the cover letter by removing the mention of those commands other
> > > > > > > >    than the respective gdb passthrough
> > > > > > >
> > > > > > > Aditya Gupta (5):
> > > > > > > >   ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
> > > > > > > >   remove 'frame' from prohibited commands list
> > > > > > > >   synchronise cpu context changes between crash/gdb
> > > > > > > >   fix gdb_interface: restore gdb's output streams at end of
> > > > > > > >     gdb_interface
> > > > > > > >   fix 'info threads' command
> > > > > > >
> > > > > > > >  crash_target.c  |  44 ++++++++++++++++
> > > > > > > >  defs.h          | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > >  gdb-10.2.patch  | 110 +++++++++++++++++++++++++++++++++++++++-
> > > > > > > >  gdb_interface.c |   2 +-
> > > > > > > >  kernel.c        |  47 +++++++++++++++--
> > > > > > > >  ppc64.c         |  95 +++++++++++++++++++++++++++++++++--
> > > > > > > >  task.c          |  14 ++++++
> > > > > > > >  tools.c         |   2 +-
> > > > > > > >  8 files changed, 434 insertions(+), 10 deletions(-)
> > > > > > >
> > > > > > > --
> > > > > > > 2.41.0
> > > > > > >
> > > > >
> > > > --
> > > > Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
> > > > To unsubscribe send an email to devel-leave(a)lists.crash-utility.osci.io
> > > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> > > > Contribution Guidelines: https://github.com/crash-utility/crash/wiki
> > >
> >
>