Hi Joe,
It's pretty clear that it's due to this change in 5.1.5:
- Implemented the capability of using the NT_PRSTATUS ELF note data
that is saved in version 4 compressed kdump headers to determine the
starting stack and instruction pointer hooks for x86 and x86_64
backtraces when they cannot be determined in the traditional manners.
(wang.chao(a)cn.fujitsu.com, wency(a)cn.fujitsu.com)
What happens if you run it like so:
$ crash --no_elf_notes vmlinux vmcore
As for this message:
WARNING: sparsemem: invalid section number: 137438888923
That should be outside the realm of Fujitsu's ELF notes patch. Does this kernel
have some kind of Stratus VM modification?
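To put that number in perspective, here is a rough sketch of the range check behind such a warning. The constants are assumptions for x86_64 (SECTION_SIZE_BITS of 27, MAX_PHYSMEM_BITS of 46; older kernels used 44) and should be checked against the kernel in question. The helper names are made up for illustration and are not crash functions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed x86_64 sparsemem constants; verify against the actual kernel. */
#define SECTION_SIZE_BITS 27   /* 128 MiB of memory per section */
#define MAX_PHYSMEM_BITS  46   /* assumption; older kernels used 44 */

/* Number of memory sections addressable with the constants above. */
static uint64_t nr_mem_sections(void)
{
    return UINT64_C(1) << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS);
}

/* Hypothetical check: a section number must be below the section count. */
static bool section_nr_is_valid(uint64_t nr)
{
    return nr < nr_mem_sections();
}

/* Lowest address that would map to section number nr. */
static uint64_t section_nr_to_addr(uint64_t nr)
{
    return nr << SECTION_SIZE_BITS;
}
```

With these assumed constants there are 524288 sections, so 137438888923 is far out of range. The value sits just below 2^37, which is the largest result a 64-bit address shifted right by 27 bits can produce.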
Dave
----- Original Message -----
Crash faults when determining panic task
I have a vmcore generated on RHEL6.1 that newer versions of crash
have trouble analyzing (5.1.1-2.el6 seems to work ok).
I can provide additional binary files if needed, just let me know
what convention best suits the list (ftp, private email attachment,
etc.)
Crash Version        OS               Result
crash 5.1.8          Debian wheezy    faults
crash 5.1.7-1.el6    RHEL6.2 Alpha    faults
crash 5.1.1-2.el6    RHEL6.1          ok
Kernel:
2.6.32-131.0.15.el6.exp10.bz16586.x86_64 (2.6.32-131.0.15 plus a fix
for Red Hat bz 707268)
Interesting warnings when starting crash:
WARNING: sparsemem: invalid section number: 137438888923
WARNING: sparsemem: invalid section number: 137438888923
First fault, a NULL pointer dereference:
please wait... (determining panic task)
Program received signal SIGSEGV, Segmentation fault.
x86_64_get_dumpfile_stack_frame (rsp=0x7fffffffcc58,
rip=0x7fffffffcc50,
bt_in=0x7fffffffcce0) at x86_64.c:4183
4183 ur_rip = ULONG(user_regs +
(gdb) p user_regs
$1 = 0x0
Workaround, check that bt->machdep is not NULL:
diff -Nupr crash-5.1.8/x86_64.c crash-5.1.8.new/x86_64.c
--- crash-5.1.8/x86_64.c 2011-09-16 15:01:12.000000000 -0400
+++ crash-5.1.8.new/x86_64.c 2011-09-28 14:12:45.347188571 -0400
@@ -4178,7 +4178,7 @@ x86_64_get_dumpfile_stack_frame(struct b
goto skip_stage;
}
}
- } else if (ELF_NOTES_VALID()) {
+ } else if (ELF_NOTES_VALID() && bt->machdep) {
user_regs = bt->machdep;
ur_rip = ULONG(user_regs +
OFFSET(user_regs_struct_rip));
Second fault, a curiously large n_descsz in the ELF note header:
please wait... (determining panic task)
Program received signal SIGSEGV, Segmentation fault.
get_regs_from_note (note=0xd26472 "\b", ip=0x7fffffffc4e0,
sp=0x7fffffffc4e8)
at netdump.c:2221
2221 *sp = ULONG(user_regs + offset_sp);
(gdb) p *(Elf64_Nhdr *)note
$1 = {n_namesz = 8, n_descsz = 3438804992, n_type = 8}
Workaround, do not attempt to read registers from the ELF notes (this
chunk of code was not present in crash 5.1.1):
diff -Nupr crash-5.1.8/netdump.c crash-5.1.8.new/netdump.c
--- crash-5.1.8/netdump.c 2011-09-16 15:01:12.000000000 -0400
+++ crash-5.1.8.new/netdump.c 2011-09-28 14:14:43.687183734 -0400
@@ -2286,7 +2286,7 @@ get_netdump_regs_x86_64(struct bt_info *
bt->machdep = (void *)user_regs;
}
-
+#if 0
if (ELF_NOTES_VALID() &&
(bt->flags & BT_DUMPFILE_SEARCH) && DISKDUMP_DUMPFILE() &&
(note = (Elf64_Nhdr *)
@@ -2305,7 +2305,7 @@ get_netdump_regs_x86_64(struct bt_info *
bt->machdep = (void *)user_regs;
}
-
+#endif
machdep->get_stack_frame(bt, ripp, rspp);
}
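A less drastic alternative to compiling the block out might be to sanity-check the note header before reading registers from it. The following is only a sketch: note_fits() and its avail parameter are hypothetical, not part of crash, and it assumes the usual 4-byte padding of the note name and descriptor.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Same layout as Elf64_Nhdr in <elf.h>; redefined so the sketch stands alone. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Round n up to a multiple of 4 (assumed note padding). */
static uint64_t align4(uint64_t n)
{
    return (n + 3) & ~UINT64_C(3);
}

/*
 * Hypothetical helper: returns true if the header, name and descriptor
 * all fit within the avail bytes that start at the note header.
 * The sum is computed in 64 bits so a huge n_descsz cannot wrap around.
 */
static bool note_fits(const struct note_hdr *nh, size_t avail)
{
    uint64_t need = sizeof(*nh) + align4(nh->n_namesz) + align4(nh->n_descsz);

    return need <= avail;
}
```

With the header shown by gdb above (n_namesz = 8, n_descsz = 3438804992), note_fits() returns false for any realistically sized note buffer, so the register lookup could be skipped instead of faulting.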
Given the warning messages at the beginning of the process, I'm not
sure whether I'm dealing with a corrupted or incomplete vmcore image.
Let me know what additional info could be useful if this seems worth
debugging further.
Thanks,
-- Joe Lawrence