Hi Aditya,
On Wed, Dec 13, 2023 at 10:28 PM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
Hi Tao,
On Wed, Dec 13, 2023 at 09:03:37PM +0800, Tao Liu wrote:
> Hi Aditya,
>
> I encountered a problem analyzing the ppc64 vmcore after applying all the
> patches in the patchset:
>
> crash> gdb bt
> #0 0xc000000000279d98 in crash_setup_regs (gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> oldregs=<optimized out>, newregs=0xc000000012e87968) at ./arch/powerpc/include/asm/kexec.h:69
> #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:975
> #2 0xfffffffffffffffb in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> crash> gdb info threads
> Id Target Id Frame
> 1 CPU 0 plpar_hcall_norets_notrace () at arch/powerpc/platforms/pseries/hvCall.S:112
> * 2 CPU 1 0xc000000000279d98 in crash_setup_regs (gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffffb type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff7 type: "gdb_readmem callback"
> gdb: invalid kernel virtual address: fffffffffffffff3 type: "gdb_readmem callback"
> oldregs=<optimized out>, newregs=0xc000000012e87968) at ./arch/powerpc/include/asm/kexec.h:69
>
> It seems the crash stack unwinding gave a wrong value to gdb. I tried for
> some time to find the root cause but had no luck. I hope you can
> help me out. I can send you the vmcore in another mail so you can
> analyze this issue. Thanks in advance!
I mostly see this kind of error when a symbol or structure has changed in the
kernel: maybe something changed in the kernel, or an invalid value was read
from some structure.
Thanks for the backtrace, I will try this with the upstream kernel.
Just to check: I should trigger the crash using 'echo c > /proc/sysrq-trigger',
right? Or was it done some other way?
Yes, I got the vmcore just by triggering a kernel crash with "echo c >
/proc/sysrq-trigger", as you mentioned. In addition, the kernel I
used for debugging is kernel-5.14.0-362.15.1.el9_3. I didn't try the
upstream kernel...
>
> Currently I have made the x86_64 stack unwinding work based on your
> patchset. And I plan to post it upstream once your patchsets get
> merged. In addition, is there a plan to support the stack unwinding
> for live debugging in ppc64 arch? I think it is a useful feature
> too...
Wow, great. I will fix this issue in the patch series, along with any other
issues, and then I guess our patches will be ready to merge :)
Thanks,
Aditya Gupta
>
> Thanks,
> Tao Liu
>
> On Tue, Dec 12, 2023 at 12:51 PM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
> >
> > On Mon, Dec 11, 2023 at 08:04:50PM +0800, Lianbo Jiang wrote:
> > > On 12/9/23 20:45, Aditya Gupta wrote:
> > >
> > > > Hi, just a ping. Any comments on the series ?
> > >
> > > Hi, Aditya
> > >
> > >
> > > > Thank you for the update. I will have a look and do the tests this week,
> > > > and give some feedback.
> >
> > Sure. Thanks Lianbo.
> >
> > - Aditya Gupta
> >
> > >
> > > Thanks.
> > >
> > > Lianbo
> > >
> > > >
> > > > On Mon, Dec 04, 2023 at 08:29:36PM +0530, Aditya Gupta wrote:
> > > > > The Problem:
> > > > > ============
> > > > >
> > > > > > Currently crash is unable to show function arguments and local variables, as
> > > > > > gdb can do. And functionality for moving between frames ('up'/'down') is not
> > > > > > working in crash.
> > > > >
> > > > > > Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
> > > > > > 'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
> > > > > > gdb not getting the register values from `crash_target::fetch_registers`,
> > > > > > which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
> > > > >
> > > > > Proposed Solution:
> > > > > ==================
> > > > >
> > > > > > Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
> > > > > > This way, "gdb mode in crash" will support this feature for both ELF and
> > > > > > kdump-compressed vmcore formats, while "gdb" would only have supported ELF
> > > > > > format
> > > > > >
> > > > > > This way other features of 'gdb', such as seeing
> > > > > > backtraces/registers/variables/arguments/local variables, moving up and
> > > > > > down stack frames, can be used with any ppc64 vmcore, irrespective of
> > > > > > being ELF format or kdump-compressed format.
> > > > >
> > > > > > Implications on Architectures:
> > > > > > ==============================
> > > > > >
> > > > > > No architecture other than PPC64 is affected, except in the case of the
> > > > > > 'frame' command
> > > > > >
> > > > > > As mentioned in patch #2, since 'frame' will no longer be prohibited, it will print:
> > > > >
> > > > > crash> frame
> > > > > #0 <unavailable> in ?? ()
> > > > >
> > > > > > Instead of the previous 'prohibited' message:
> > > > >
> > > > > crash> frame
> > > > > crash: prohibited gdb command: frame
> > > > >
> > > > > > The major change is in 'gdb mode' on PPC64: it will now print the frames
> > > > > > and local variables, instead of failing with errors saying there is no
> > > > > > frame or that it couldn't get the PC.
> > > > >
> > > > > Testing:
> > > > > ========
> > > > >
> > > > > Git tree with this patch series applied:
> > > > >
> > > > > > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-3
> > > > >
> > > > > To test various gdb passthroughs:
> > > > >
> > > > > gdb> set
> > > > > gdb> set gdb on
> > > > > gdb> thread
> > > > > gdb> bt
> > > > > gdb> info threads
> > > > > gdb> info threads
> > > > > gdb> info locals
> > > > > gdb> info variables irq_rover_lock
> > > > > gdb> info args
> > > > > gdb> thread 2
> > > > > gdb> set gdb off
> > > > > gdb> set
> > > > > gdb> set -c 6
> > > > > gdb> gdb thread
> > > > > gdb> bt
> > > > > gdb> gdb bt
> > > > > gdb> frame
> > > > > gdb> up
> > > > > gdb> down
> > > > > gdb> info locals
> > > > >
> > > > > Known Issues:
> > > > > =============
> > > > >
> > > > > > 1. In gdb mode, 'bt' might fail to show the backtrace in a few vmcores collected
> > > > > > from older kernels. This is a known issue due to register mismatch, and
> > > > > > its fix has been merged upstream:
> > > > > >
> > > > > > Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
> > > > >
> > > > > Fixing GDB passthroughs on other architectures
> > > > > ==============================================
> > > > >
> > > > > > Much of the work to make gdb passthroughs like 'gdb bt', 'gdb
> > > > > > thread', 'gdb info locals' etc. work has been done by the patches introducing
> > > > > > 'machdep->get_cpu_reg' and by this series, which fixes some issues in that.
> > > > > >
> > > > > > Other architectures should be able to fix these gdb functionalities by
> > > > > > simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
> > > > > >
> > > > > > The reasoning behind that has been explained with a diagram in the commit
> > > > > > description of patch #1
> > > > > >
> > > > > > I will assist with my findings/observations from fixing it on ppc64 whenever needed.
> > > > >
> > > > > Additional Notes:
> > > > > =================
> > > > >
> > > > > > Sorry, it took a long time to send this version. Tried fixing 'info
> > > > > > threads' but wasn't able to. Gave it time again, and was able to fix it
> > > > > > this time after multiple days of debugging.
> > > > >
> > > > > Some other things from last version review:
> > > > >
> > > > > > * 'info rv' not working:
> > > > > >   It's not supported in gdb, instead we need to use 'info locals rv' or
> > > > > >   'info variables rv'
> > > > > >
> > > > > > * 'info variables' command hangs... and prints nothing after hanging for long
> > > > > >   It likely hangs due to a lot of symbols being there, and it's trying to
> > > > > >   get all gdb's output and page it, so Control+C messes it up, but if we pass
> > > > > >   a regex filter to limit the output, eg. info variables rq, then it doesn't
> > > > > >   hang, and prints the variables/symbols.
> > > > > >   Even plain gdb, ie. simply running 'gdb vmlinux vmcore', also hangs due
> > > > > >   to the large number of symbols
> > > > >
> > > > > > * making crashing thread as default in gdb:
> > > > > >   This is implemented now, along with synchronising crash & gdb contexts, in
> > > > > >   patch #3
> > > > > >
> > > > > > * 'info threads' not working:
> > > > > >   This turned out to be due to a bug in gdb_interface. I fixed 'info
> > > > > >   threads' in 2 patches, to simplify it, first for the gdb_interface,
> > > > > >   and another patch for setting the context correctly in crash
> > > > >
> > > > > > * other info commands:
> > > > > >   I tested all the info commands, in crash along with this patch.
> > > > > >   Most of those that fail in crash are due to gdb itself not supporting
> > > > > >   them with vmcores, and other than that is the 'info pretty' command,
> > > > > >   which might not be needed in crash anyways
> > > > >
> > > > > > * live debugging showing only one thread:
> > > > > >   I tried it with crash; crash shows only the current thread, ie.
> > > > > >   itself, so it does not have information of registers for the other
> > > > > >   CPUs. Similarly gdb does not support live kernel debugging (without
> > > > > >   connecting to a gdbstub/QEMU etc.).
> > > > > >   If you need I can make it show the current thread id correctly for
> > > > > >   the one thread, but I don't think it would help much with live
> > > > > >   debugging
> > > > > >
> > > > > > I hope that sets the context. Thanks for the reviews; I replied and worked
> > > > > > on your suggestions, but got stuck there due to 'info threads'
> > > > >
> > > > > Changelog:
> > > > > ==========
> > > > >
> > > > > V3:
> > > > > > + default gdb thread will be the crashing thread, instead of being
> > > > > > thread '0'
> > > > > > + synchronise crash cpu and gdb thread context
> > > > > > + fix bug in gdb_interface, that replaced gdb's output stream, losing
> > > > > > output in some cases, such as info threads and extra output in info
> > > > > > variables
> > > > > > + fix 'info threads'
> > > > >
> > > > > RFC V2:
> > > > > > - removed patch implementing 'frame', 'up', 'down' in crash
> > > > > > - updated the cover letter by removing the mention of those commands other
> > > > > > than the respective gdb passthrough
> > > > >
> > > > > Aditya Gupta (5):
> > > > > >   ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
> > > > > >   remove 'frame' from prohibited commands list
> > > > > >   synchronise cpu context changes between crash/gdb
> > > > > >   fix gdb_interface: restore gdb's output streams at end of gdb_interface
> > > > > >   fix 'info threads' command
> > > > >
> > > > > crash_target.c | 44 ++++++++++++++++
> > > > > > defs.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
> > > > > gdb_interface.c | 2 +-
> > > > > kernel.c | 47 +++++++++++++++--
> > > > > ppc64.c | 95 +++++++++++++++++++++++++++++++++--
> > > > > task.c | 14 ++++++
> > > > > tools.c | 2 +-
> > > > > 8 files changed, 434 insertions(+), 10 deletions(-)
> > > > >
> > > > > --
> > > > > 2.41.0
> > > > >
> > >
>
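
For readers following the thread: the cover letter quoted above says other architectures should be able to enable these gdb passthroughs by implementing 'machdep->get_cpu_reg (cpu, regno, ...)'. The following is only a rough, self-contained sketch of what such a hook might look like. Everything in it is an assumption made for illustration: the parameters after `regno`, the return convention, the `sketch_*` names, and the in-memory register table are hypothetical and are not taken from crash's defs.h or from the patches.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_CPUS 2   /* hypothetical: number of CPUs in the dump */
#define SKETCH_NR_REGS 4   /* hypothetical: a real arch has many more  */

/* Hypothetical per-CPU register snapshot, standing in for whatever
 * register state the vmcore saved for each CPU. */
struct sketch_regs {
    int valid;                       /* non-zero if registers were captured */
    uint64_t reg[SKETCH_NR_REGS];    /* indexed by gdb register number      */
};

struct sketch_regs sketch_cpu_regs[SKETCH_NR_CPUS];

/* Assumed hook shape: copy register `regno` of `cpu` into `value`.
 * Returns 1 on success and 0 when the value is unavailable, so the
 * caller can report the register as <unavailable> instead of guessing. */
int sketch_get_cpu_reg(int cpu, int regno, const char *name,
                       int size, void *value)
{
    (void)name;  /* a real implementation might use this for diagnostics */

    if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
        return 0;
    if (regno < 0 || regno >= SKETCH_NR_REGS)
        return 0;
    if (!sketch_cpu_regs[cpu].valid)
        return 0;
    if (size != (int)sizeof(uint64_t))
        return 0;

    memcpy(value, &sketch_cpu_regs[cpu].reg[regno], sizeof(uint64_t));
    return 1;
}
```

The point of the sketch is the failure path: when a CPU's registers were not captured or the register number is unknown, the hook reports failure rather than returning a made-up value. Whether this resembles the ppc64 implementation in the series is not something the thread states.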