Re: [Crash-utility] Crash faults when determining panic task

Thursday, 29 September 2011

Dave,

Adding --no_elf_notes to the crash invocation does indeed start crash
without issue.  Do you think that I am dealing with a
corrupted/incomplete vmcore (as evidenced by that extremely large n_descsz
value), or is this a bug that crash could handle more gracefully?
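
On the "more gracefully" question, one option would be to check each note
header against the size of the note buffer before trusting n_descsz.  The
following is only a minimal sketch of that idea, not code from crash:
note_fits() and note_align4() are hypothetical names, and it assumes the
4-byte name/descriptor padding that Linux uses for core notes.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Round a note field size up to the 4-byte padding assumed for ELF notes. */
static uint64_t note_align4(uint64_t n)
{
	return (n + 3) & ~(uint64_t)3;
}

/*
 * Hypothetical helper (not part of crash): return 1 if the note header,
 * its padded name and its padded descriptor all fit inside the `avail`
 * bytes that remain in the note buffer, 0 otherwise.
 */
static int note_fits(const Elf64_Nhdr *note, size_t avail)
{
	uint64_t need;

	if (avail < sizeof(*note))
		return 0;

	/* 64-bit arithmetic, so two 32-bit sizes cannot wrap the sum. */
	need = sizeof(*note) + note_align4(note->n_namesz) +
	       note_align4(note->n_descsz);

	return need <= avail;
}
```

With the header from the second fault (n_namesz = 8, n_descsz = 3438804992),
a check like this would reject the note for any realistically sized buffer
rather than reading registers past its end.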

As far as the kernel is concerned,
2.6.32-131.0.15.el6.exp10.bz16586.x86_64 was a stock RH 2.6.32-131.0.15
with an added patch for handling an MD Raid bug (RHBZ-707268).   Stratus
does load a driver to track dirty VM pages for harvesting purposes, but
does not change general VM behavior.

FWIW, this is the only vmcore in which I've seen ELF note faults or
invalid section numbers.

Thanks,

-- Joe

-----Original Message-----
From: crash-utility-bounces(a)redhat.com
[mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Wednesday, September 28, 2011 5:15 PM
To: Discussion list for crash utility usage,maintenance and development
Subject: Re: [Crash-utility] Crash faults when determining panic task

Hi Joe,

It's pretty clear it's due to this change in 5.1.5:

         - Implemented the capability of using the NT_PRSTATUS ELF note data
           that is saved in version 4 compressed kdump headers to determine the
           starting stack and instruction pointer hooks for x86 and x86_64
           backtraces when they cannot be determined in the traditional manners.
           (wang.chao(a)cn.fujitsu.com, wency(a)cn.fujitsu.com)

What happens if you run it like so:

  $ crash --no_elf_notes vmlinux vmcore

As far as this message:

  WARNING: sparsemem: invalid section number: 137438888923

That should be outside the realm of Fujitsu's ELF notes patch.  Does this
kernel have some kind of Stratus VM modification?
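
For a sense of how far out of range that section number is, here is a small
worked check.  It is a sketch under assumptions: SECTION_SIZE_BITS = 27 and
MAX_PHYSMEM_BITS = 46 are assumed x86_64 values (not confirmed for this
kernel), and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed x86_64 values; check the kernel's sparsemem.h for the real ones. */
#define ASSUMED_SECTION_SIZE_BITS 27
#define ASSUMED_MAX_PHYSMEM_BITS  46

/* Number of sparsemem sections implied by the assumed constants. */
static uint64_t max_sections(void)
{
	return (uint64_t)1 <<
	       (ASSUMED_MAX_PHYSMEM_BITS - ASSUMED_SECTION_SIZE_BITS);
}

/* Return 1 if `nr` could index a section under those assumptions. */
static int section_nr_plausible(uint64_t nr)
{
	return nr < max_sections();
}

/* The address whose upper bits would produce section number `nr`. */
static uint64_t section_nr_to_addr(uint64_t nr)
{
	return nr << ASSUMED_SECTION_SIZE_BITS;
}
```

Under those assumptions there are 2^19 = 524288 possible sections, so
137438888923 is far out of range.  Shifting it back up gives
0xfffff81ed8000000, a value near the top of the 64-bit range rather than a
plausible physical address.  That is only an observation from the
arithmetic, not a diagnosis.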

Dave

----- Original Message -----
...

 Crash faults when determining panic task

 I have a vmcore generated on RHEL6.1 that newer versions of crash
 have trouble analyzing (5.1.1-2.el6 seems to work ok).

 I can provide additional binary files if needed, just let me know
 what convention best suits the list (ftp, private email attachment,
 etc.)

 Crash Version        OS               Result
 crash 5.1.8          Debian wheezy    faults
 crash 5.1.7-1.el6    RHEL6.2 Alpha    faults
 crash 5.1.1-2.el6    RHEL6.1          ok

 Kernel:

 2.6.32-131.0.15.el6.exp10.bz16586.x86_64 (2.6.32-131.0.15 + a fix
 for Red Hat bz 707268)

 Interesting warnings when starting crash:

 WARNING: sparsemem: invalid section number: 137438888923

 WARNING: sparsemem: invalid section number: 137438888923

 First fault, null pointer dereference:

 please wait... (determining panic task)
 Program received signal SIGSEGV, Segmentation fault.
 x86_64_get_dumpfile_stack_frame (rsp=0x7fffffffcc58, rip=0x7fffffffcc50,
     bt_in=0x7fffffffcce0) at x86_64.c:4183
 4183 ur_rip = ULONG(user_regs +
 (gdb) p user_regs
 $1 = 0x0

 Workaround, check that bt->machdep is not NULL:

 diff -Nupr crash-5.1.8/x86_64.c crash-5.1.8.new/x86_64.c
 --- crash-5.1.8/x86_64.c 2011-09-16 15:01:12.000000000 -0400
 +++ crash-5.1.8.new/x86_64.c 2011-09-28 14:12:45.347188571 -0400
 @@ -4178,7 +4178,7 @@ x86_64_get_dumpfile_stack_frame(struct b
                         goto skip_stage;
                 }
         }
 -       } else if (ELF_NOTES_VALID()) {
 +       } else if (ELF_NOTES_VALID() && bt->machdep) {
                 user_regs = bt->machdep;
                 ur_rip = ULONG(user_regs +
                         OFFSET(user_regs_struct_rip));

 Second fault, a curiously large n_descsz in the ELF note header:

 please wait... (determining panic task)
 Program received signal SIGSEGV, Segmentation fault.
 get_regs_from_note (note=0xd26472 "\b", ip=0x7fffffffc4e0,
     sp=0x7fffffffc4e8) at netdump.c:2221
 2221 *sp = ULONG(user_regs + offset_sp);
 (gdb) p *(Elf64_Nhdr *)note
 $1 = {n_namesz = 8, n_descsz = 3438804992, n_type = 8}

 Workaround, do not attempt reading registers from elf notes (this
 chunk of code was not present in crash 5.1.1):

 diff -Nupr crash-5.1.8/netdump.c crash-5.1.8.new/netdump.c
 --- crash-5.1.8/netdump.c 2011-09-16 15:01:12.000000000 -0400
 +++ crash-5.1.8.new/netdump.c 2011-09-28 14:14:43.687183734 -0400
 @@ -2286,7 +2286,7 @@ get_netdump_regs_x86_64(struct bt_info *
                 bt->machdep = (void *)user_regs;
         }
 -
 +#if 0
         if (ELF_NOTES_VALID() &&
             (bt->flags & BT_DUMPFILE_SEARCH) && DISKDUMP_DUMPFILE() &&
             (note = (Elf64_Nhdr *)
 @@ -2305,7 +2305,7 @@ get_netdump_regs_x86_64(struct bt_info *
                 bt->machdep = (void *)user_regs;
         }
 -
 +#endif
         machdep->get_stack_frame(bt, ripp, rspp);
  }

 Given the warning messages at the beginning of the process, I'm not sure
 if I'm dealing with a corrupted or incomplete vmcore image. Let me
 know what additional info could be useful if this seems worth
 debugging further.

 Thanks,

 -- Joe Lawrence
 --
 Crash-utility mailing list
 Crash-utility(a)redhat.com
 https://www.redhat.com/mailman/listinfo/crash-utility

