Just for sanity's sake, try this:
$ ./crash --minimal ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic ../dump.201412280256
and see if you can read the linux_banner string successfully. For example, using
my sample 3.13 kernel:
$ crash --minimal 3.13.0-0.rc1.git2.1.fc20_SLAB/vmlinux.gz 3.13.0-0.rc1.git2.1.fc20_SLAB/vmcore_c_d31
crash 7.0.9
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
NOTE: minimal mode commands: log, dis, rd, sym, eval, set, extend and exit
crash> rd -a linux_banner
ffffffff818000c0: Linux version 3.13.0-0.rc1.git2.1.fc20.x86_64 (root@hp-xw455
ffffffff818000fc:
) (gcc version 4.8.1 20130814 (Re
ffffffff81800138: d Hat 4.8.1-6) (GCC) ) #1 SMP Tue Nov 26 14:42:45 EST 2013
crash>
And then try reading other stuff, most notably the __per_cpu_offset[] array,
like this:
crash> rd __per_cpu_offset 256
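What to look for in that rd output is whether any entry is non-zero. A minimal sketch of that check in C, assuming the values have already been copied into a local array. The name offsets_all_zero() is made up for illustration and is not part of crash:

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of crash: returns 1 if every entry
 * of a copied __per_cpu_offset[] array is zero, 0 otherwise.
 */
int offsets_all_zero(const unsigned long *offsets, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (offsets[i] != 0)
			return 0;
	return 1;
}
```

If this returns 1 for the whole array, the per-cpu data in the dumpfile cannot be located.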
Dave
----- Original Message -----
> Hello,
>
> I have a couple dumps generated on Ubuntu Trusty LTS (3.13.0-39-generic
> kernel) which crash fails on.
>
> $ ./crash ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic
> ../dump.201412280256
>
> crash 7.0.9
> Copyright (C) 2002-2014 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for
> details.
>
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> crash: cannot determine thread return address
> please wait... (gathering kmem slab cache data)
> crash: invalid kernel virtual address: 1c type: "kmem_cache
> objsize/object_size"
> crash: failed to read pageflag_names entry
> please wait... (gathering module symbol data)
> WARNING: invalid kernel module size: 0
>
> crash: cannot determine idle task addresses from init_tasks[] or
> runqueues[]
>
> crash: cannot resolve "init_task_union"
>
>
> vmlinux-3.13.0-39-generic was extracted from Ubuntu ddeb:
>
> $ file ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic
> ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic: ELF 64-bit LSB
> executable, x86-64, version 1 (SYSV), statically linked,
> BuildID[sha1]=c4fa631d2cc34a0b2628a5de01a04e81a0667555, not stripped
>
> With -d8 I get:
>
> ...
> <read_diskdump: addr: ffffffffffffffff paddr: 7fffffff cnt: 1>
> read_diskdump: paddr/pfn: 7fffffff/7ffff -> cache physical page: 7ffff000
> crash: invalid kernel virtual address: 0 type: "memory section"
>
> The entire -d8 output is attached.
>
> Bogus "base kernel version" stands out immediately and I'm pretty sure
> I've seen "0.0.0" in there a couple times with exactly the same dump.
> From a quick look, the base kernel version code in kernel.c is not safe
> against kt->utsname.release being all zeroes.
>
> Eddy Gonzalo (CC'ed) can probably provide access to the dumps if
> needed.
>
> Thanks,
> Ilya
The obvious question is: are you sure that the vmlinux matches the dumpfile?
I say that because there are so many strange readings from this dumpfile.
As you noted, yes, the base kernel version is definitely a mismatch. The
header shows:
sysname: Linux
nodename: chqcephnas01
release: 3.13.0-39-generic
version: #66~precise1-Ubuntu SMP Wed Oct 29 09:56:49 UTC 2014
machine: x86_64
but this gets read from the dumpfile:
<readmem: ffffffff81c15284, KVADDR, "init_uts_ns", 390, (ROE), cfa7bc>
<read_diskdump: addr: ffffffff81c15284 paddr: 1c15284 cnt: 390>
read_diskdump: paddr/pfn: 1c15284/1c15 -> cache physical page: 1c15000
base kernel version: 0.13.0
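To illustrate the kind of check that would catch this, here is a minimal sketch in C that refuses to parse a release string unless it starts with a digit. It is not crash's actual kernel.c code, and parse_release() is a made-up name:

```c
#include <stdio.h>

/*
 * Hypothetical helper, not crash's actual code: parse "major.minor.patch"
 * from a utsname release string. Returns 1 on success, 0 if the string
 * is NULL, empty, or does not start with three dot-separated numbers.
 */
int parse_release(const char *release, int *major, int *minor, int *patch)
{
	if (!release || release[0] < '0' || release[0] > '9')
		return 0;
	return sscanf(release, "%d.%d.%d", major, minor, patch) == 3;
}
```

The parsed numbers could then be compared against the release string in the dumpfile header, which is 3.13.0-39-generic here.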
And one of the first sets of items accessed is the contents of the cpu mask
variables:
<readmem: ffffffff8180acf0, KVADDR, "cpu_possible_mask", 8, (FOE), 7fff5ab8b618>
<read_diskdump: addr: ffffffff8180acf0 paddr: 180acf0 cnt: 8>
read_diskdump: paddr/pfn: 180acf0/180a -> cache physical page: 180a000
<readmem: ffffffff8180ace0, KVADDR, "cpu_present_mask", 8, (FOE), 7fff5ab8b618>
<read_diskdump: addr: ffffffff8180ace0 paddr: 180ace0 cnt: 8>
read_diskdump: paddr/pfn: 180ace0/180a -> physical page is cached: 180a000
<readmem: ffffffff8180ace8, KVADDR, "cpu_online_mask", 8, (FOE), 7fff5ab8b618>
<read_diskdump: addr: ffffffff8180ace8 paddr: 180ace8 cnt: 8>
read_diskdump: paddr/pfn: 180ace8/180a -> physical page is cached: 180a000
<readmem: ffffffff8180acd8, KVADDR, "cpu_active_mask", 8, (FOE), 7fff5ab8b618>
<read_diskdump: addr: ffffffff8180acd8 paddr: 180acd8 cnt: 8>
read_diskdump: paddr/pfn: 180acd8/180a -> physical page is cached: 180a000
But they all return NULL pointers. They should return pointers to bitmasks,
which then get read, and their contents displayed. For example, I've got
a 3.13 kernel dumpfile where each mask pointer is read, the bitmask it
points to gets read, and then the contents are dumped:
<readmem: ffffffff8180a870, KVADDR, "cpu_possible_mask", 8, (FOE), 7fff5f116f48>
<read_diskdump: addr: ffffffff8180a870 paddr: 180a870 cnt: 8>
<readmem: ffffffff81d8c780, KVADDR, "possible", 1024, (ROE), f45b80>
<read_diskdump: addr: ffffffff81d8c780 paddr: 1d8c780 cnt: 1024>
cpu_possible_mask: 0 1 2 3
<readmem: ffffffff8180a860, KVADDR, "cpu_present_mask", 8, (FOE), 7fff5f116f48>
<read_diskdump: addr: ffffffff8180a860 paddr: 180a860 cnt: 8>
<readmem: ffffffff81d8bf80, KVADDR, "present", 1024, (ROE), f45b80>
<read_diskdump: addr: ffffffff81d8bf80 paddr: 1d8bf80 cnt: 128>
<read_diskdump: addr: ffffffff81d8c000 paddr: 1d8c000 cnt: 896>
cpu_present_mask: 0 1
<readmem: ffffffff8180a868, KVADDR, "cpu_online_mask", 8, (FOE), 7fff5f116f48>
<read_diskdump: addr: ffffffff8180a868 paddr: 180a868 cnt: 8>
<readmem: ffffffff81d8c380, KVADDR, "online", 1024, (ROE), f45b80>
<read_diskdump: addr: ffffffff81d8c380 paddr: 1d8c380 cnt: 1024>
cpu_online_mask: 0 1
<readmem: ffffffff8180a858, KVADDR, "cpu_active_mask", 8, (FOE), 7fff5f116f48>
<read_diskdump: addr: ffffffff8180a858 paddr: 180a858 cnt: 8>
<readmem: ffffffff81d8bb80, KVADDR, "active", 1024, (ROE), f45b80>
<read_diskdump: addr: ffffffff81d8bb80 paddr: 1d8bb80 cnt: 1024>
cpu_active_mask: 0 1
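For reference, turning a bitmask into a cpu list like the ones above amounts to walking the set bits. A rough sketch in C, assuming the mask words have already been read out of the dumpfile. mask_to_cpus() is a hypothetical name, not a crash function:

```c
#include <stddef.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: walk a cpumask copied out of the dumpfile and
 * store the numbers of the set bits in cpus[]. Returns the number of
 * set bits found; only the first max_cpus of them are stored.
 */
size_t mask_to_cpus(const unsigned long *mask, size_t nwords,
		    int *cpus, size_t max_cpus)
{
	size_t w, b, n = 0;

	for (w = 0; w < nwords; w++)
		for (b = 0; b < BITS_PER_WORD; b++)
			if ((mask[w] >> b) & 1UL) {
				if (n < max_cpus)
					cpus[n] = (int)(w * BITS_PER_WORD + b);
				n++;
			}
	return n;
}
```

A first word of 0xf gives cpus 0 1 2 3, as in the cpu_possible_mask line above. A NULL mask pointer, as in the Ubuntu dumpfile, leaves nothing to walk at all.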
Right from the get-go, the __per_cpu_offset array looks like it's
returning all zeroes, in which case pretty much all is lost and the
dumpfile is useless.
That can be seen in the following readmem failure. The read should
take the kt->__per_cpu_offset[0] value and add it to the (per-cpu)
symbol value of "cpu_number", which presumably is b084 in that
kernel, but kt->__per_cpu_offset[0] is apparently zero. Therefore
this readmem() call:
        if (!readmem(cpu_sp->value + kt->__per_cpu_offset[i],
            KVADDR, &cpunumber, sizeof(int),
            "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
                break;
generated this failure:
<readmem: b084, KVADDR, "cpu number (per_cpu)", 4, (ROE|Q), 7fff5ab9c800>
crash: invalid kernel virtual address: b084 type: "cpu number (per_cpu)"
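The arithmetic behind that failure can be sketched in a few lines of C. Both function names are hypothetical, and the range check is a crude stand-in for crash's own address validation. It assumes x86_64 with 48-bit virtual addresses, where kernel addresses sit in the upper canonical half:

```c
#include <stdint.h>

/*
 * Assumption: x86_64 with 48-bit virtual addresses, where the upper
 * canonical half (used by the kernel) starts at 0xffff800000000000.
 */
#define KERNEL_HALF_START 0xffff800000000000ULL

/* Hypothetical helper: address of a per-cpu variable for one cpu. */
uint64_t per_cpu_addr(uint64_t symbol_value, uint64_t cpu_offset)
{
	return symbol_value + cpu_offset;
}

/* Hypothetical sanity check, much cruder than crash's own validation. */
int looks_like_kvaddr(uint64_t addr)
{
	return addr >= KERNEL_HALF_START;
}
```

With a symbol value of b084 and a zero offset, the result is just b084, which is nowhere near the kernel's address range, hence the "invalid kernel virtual address" message.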
The kt->__per_cpu_offset[] array would have been set up earlier in
kernel_init():
        if (symbol_exists("__per_cpu_offset")) {
                if (LKCD_KERNTYPES())
                        i = get_cpus_possible();
                else
                        i = get_array_length("__per_cpu_offset", NULL, 0);
                get_symbol_data("__per_cpu_offset",
                        sizeof(long)*((i && (i <= NR_CPUS)) ? i : NR_CPUS),
                        &kt->__per_cpu_offset[0]);
                kt->flags |= PER_CPU_OFF;
        }
It looks like it read the array OK, where the Ubuntu kernel looks like
it has 256 cpus configured:
<readmem: ffffffff81d130e0, KVADDR, "__per_cpu_offset", 2048, (FOE), cfa968>
<read_diskdump: addr: ffffffff81d130e0 paddr: 1d130e0 cnt: 2048>
read_diskdump: paddr/pfn: 1d130e0/1d13 -> cache physical page: 1d13000
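The 256 figure follows from the read length: 2048 bytes divided by 8-byte longs is 256 entries. A small C restatement of the length expression in the kernel_init() snippet above, with the inputs turned into parameters. per_cpu_read_len() is a made-up name for illustration:

```c
#include <stddef.h>

/*
 * Hypothetical restatement of the length expression in kernel_init():
 * use the array length if it is non-zero and no larger than nr_cpus,
 * otherwise fall back to nr_cpus entries.
 */
size_t per_cpu_read_len(size_t array_len, size_t nr_cpus, size_t word_size)
{
	size_t entries = (array_len && array_len <= nr_cpus) ? array_len : nr_cpus;

	return word_size * entries;
}
```

Passing an array length of 256 and a word size of 8 gives 2048 for any nr_cpus of 256 or more, matching the cnt in the read_diskdump line.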
But when utilizing the stashed kt->__per_cpu_offset[0] value later on
(for cpu 0), it got a zero offset.
So it looks like the vmlinux and dumpfile don't match, or perhaps the
dumpfile is suspect.
It would be interesting to confirm that the kernel being used
(vmlinux-3.13.0-39-generic) runs OK live on the crashing system.
Dave