Re: [Crash-utility] Crash faults when determining panic task

Wednesday, 28 September 2011

Hi Joe,

It's pretty clear it's due to this change in 5.1.5:

         - Implemented the capability of using the NT_PRSTATUS ELF note data
           that is saved in version 4 compressed kdump headers to determine the
           starting stack and instruction pointer hooks for x86 and x86_64
           backtraces when they cannot be determined in the traditional manners.
           (wang.chao(a)cn.fujitsu.com, wency(a)cn.fujitsu.com)

What happens if you run it like so:

  $ crash --no_elf_notes vmlinux vmcore

As far as this message:

  WARNING: sparsemem: invalid section number: 137438888923

That should be outside the realm of Fujitsu's ELF notes patch.  Does this kernel
have some kind of Stratus VM modification?

Dave

----- Original Message -----
...

 Crash faults when determining panic task

 I have a vmcore generated on RHEL6.1 that newer versions of crash
 have trouble analyzing (5.1.1-2.el6 seems to work ok).

 I can provide additional binary files if needed, just let me know
 what convention best suits the list (ftp, private email attachment,
 etc.)

 Crash Version        OS             Result
 crash 5.1.8          Debian wheezy  faults
 crash 5.1.7-1.el6    RHEL6.2 Alpha  faults
 crash 5.1.1-2.el6    RHEL6.1        ok

 Kernel:

 2.6.32-131.0.15.el6.exp10.bz16586.x86_64 ( 2.6.32-131.0.15 + a fix
 for Red Hat bz - 707268)

 Interesting warnings when starting crash:

 WARNING: sparsemem: invalid section number: 137438888923

 WARNING: sparsemem: invalid section number: 137438888923
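 To put the reported section number in perspective, here is a minimal sketch of a sparsemem range check. The constants SECTION_SIZE_BITS = 27 and MAX_PHYSMEM_BITS = 46 are assumptions about a 2.6.32-era x86_64 kernel, and section_nr_valid() is a hypothetical helper, not a crash function. Under those assumptions there are only 2^19 = 524288 possible sections, so 137438888923 is far out of range.

```c
#include <stdint.h>

/* Assumed values for a 2.6.32-era x86_64 kernel; check the kernel in question. */
#define SECTION_SIZE_BITS 27   /* each sparsemem section covers 128 MiB */
#define MAX_PHYSMEM_BITS  46   /* maximum physical address width */

/* Number of sparsemem sections that can exist under the assumptions above. */
static uint64_t nr_mem_sections(void)
{
    return 1ULL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS);
}

/* Hypothetical helper: 1 if the section number is in range, otherwise 0. */
static int section_nr_valid(uint64_t nr)
{
    return nr < nr_mem_sections();
}
```

 A section number this large suggests the value it was computed from was not a real physical address, although the sketch by itself cannot say where the bad value came from.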

 First fault, null pointer dereference:

 please wait... (determining panic task)

 Program received signal SIGSEGV, Segmentation fault.

 x86_64_get_dumpfile_stack_frame (rsp=0x7fffffffcc58, rip=0x7fffffffcc50,
     bt_in=0x7fffffffcce0) at x86_64.c:4183

 4183 ur_rip = ULONG(user_regs +

 (gdb) p user_regs

 $1 = 0x0

 Workaround, check that bt->machdep is not NULL:

 diff -Nupr crash-5.1.8/x86_64.c crash-5.1.8.new/x86_64.c
 --- crash-5.1.8/x86_64.c 2011-09-16 15:01:12.000000000 -0400
 +++ crash-5.1.8.new/x86_64.c 2011-09-28 14:12:45.347188571 -0400
 @@ -4178,7 +4178,7 @@ x86_64_get_dumpfile_stack_frame(struct b
  goto skip_stage;
  }
  }
 - } else if (ELF_NOTES_VALID()) {
 + } else if (ELF_NOTES_VALID() && bt->machdep) {
  user_regs = bt->machdep;
  ur_rip = ULONG(user_regs +
  OFFSET(user_regs_struct_rip));
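
 The guard in this workaround can be shown in isolation. The following is only a sketch: struct bt_sketch is a hypothetical stand-in for crash's struct bt_info, and read_saved_reg() is an illustrative helper rather than anything in the crash sources. It refuses to read a saved register when the register buffer pointer is NULL, which is the condition gdb reported above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for crash's struct bt_info; only one field is modelled. */
struct bt_sketch {
    void *machdep;   /* saved user_regs buffer, or NULL if none was found */
};

/*
 * Illustrative helper: copy one register-sized value from byte offset 'off'
 * in the saved register buffer. Returns 1 on success and 0 if there is no
 * buffer to read from, so the caller can fall back to another method.
 */
static int read_saved_reg(const struct bt_sketch *bt, size_t off,
                          unsigned long *out)
{
    const char *user_regs = bt->machdep;

    if (user_regs == NULL)
        return 0;

    memcpy(out, user_regs + off, sizeof(*out));
    return 1;
}
```

 Returning a status instead of dereferencing unconditionally lets the caller continue with the older stack-search path when no register data is available.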

 Second fault, a curiously large n_descsz in the ELF note header:

 please wait... (determining panic task)

 Program received signal SIGSEGV, Segmentation fault.

 get_regs_from_note (note=0xd26472 "\b", ip=0x7fffffffc4e0, sp=0x7fffffffc4e8)
     at netdump.c:2221

 2221 *sp = ULONG(user_regs + offset_sp);

 (gdb) p *(Elf64_Nhdr *)note

 $1 = {n_namesz = 8, n_descsz = 3438804992, n_type = 8}
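
 The n_descsz value above is 0xCCF80000, about 3.2 GiB, which cannot be a real register note. A bounds check of the following kind would reject such a header before any register is read from it. This is a sketch under stated assumptions: struct note_hdr mirrors the three 32-bit fields of Elf64_Nhdr, and note_fits() and note_align4() are hypothetical helpers, not functions from crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the three 32-bit fields of Elf64_Nhdr from <elf.h>. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* The note name and descriptor are each padded to a 4-byte boundary. */
static uint64_t note_align4(uint64_t n)
{
    return (n + 3) & ~(uint64_t)3;
}

/*
 * Hypothetical helper: return 1 if a note whose header starts at 'offset'
 * lies entirely inside a notes buffer of 'buf_len' bytes, otherwise 0.
 * The sum is computed in 64 bits so large 32-bit sizes cannot wrap.
 */
static int note_fits(const struct note_hdr *nh, size_t offset, size_t buf_len)
{
    uint64_t total;

    if (offset > buf_len || buf_len - offset < sizeof(*nh))
        return 0;

    total = sizeof(*nh) + note_align4(nh->n_namesz)
                        + note_align4(nh->n_descsz);
    return total <= (uint64_t)(buf_len - offset);
}
```

 With the header values printed by gdb (n_namesz = 8, n_descsz = 3438804992), note_fits() returns 0 for any notes buffer of realistic size.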

 Workaround, do not attempt to read registers from the ELF notes (this
 chunk of code was not present in crash 5.1.1):

 diff -Nupr crash-5.1.8/netdump.c crash-5.1.8.new/netdump.c
 --- crash-5.1.8/netdump.c 2011-09-16 15:01:12.000000000 -0400
 +++ crash-5.1.8.new/netdump.c 2011-09-28 14:14:43.687183734 -0400
 @@ -2286,7 +2286,7 @@ get_netdump_regs_x86_64(struct bt_info *
  bt->machdep = (void *)user_regs;
  }
 -
 +#if 0
  if (ELF_NOTES_VALID() &&
  (bt->flags & BT_DUMPFILE_SEARCH) && DISKDUMP_DUMPFILE() &&
  (note = (Elf64_Nhdr *)
 @@ -2305,7 +2305,7 @@ get_netdump_regs_x86_64(struct bt_info *
  bt->machdep = (void *)user_regs;
  }
 -
 +#endif
  machdep->get_stack_frame(bt, ripp, rspp);
  }

 Given the warning messages at the beginning of the process, I'm not
 sure whether I'm dealing with a corrupted or incomplete vmcore image.
 Let me know what additional info would be useful if this seems worth
 debugging further.

 Thanks,

 -- Joe Lawrence
 --
 Crash-utility mailing list
 Crash-utility(a)redhat.com
 https://www.redhat.com/mailman/listinfo/crash-utility
