Hi, I'd like to ask the developers for help; I'm really confused by a
crash error.
My computer room is running 60 nodes. OS: SUSE 11 SP1 x86_64
Kernel: 2.6.32.12-0.7-default Crash: crash-5.0.1-1.5.5
Recently the servers have often had kernel crashes, and the kdump service
dumps the vmcore to /var/crash/happen-date/vmcore.
I installed kernel-debuginfo to find out why they crash, but I
got this:
linux:~/bb # crash -d8 vmlinux-2.6.32.12-0.7-default vmlinux-2.6.32.12-0.7-default.debug vmcore
crash 5.0.1
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public
License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for
details.
compressed kdump: header->utsname.machine:
diskdump_data:
filename: (null)
flags: 6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED)
dfd: 3
ofp: 0
machine_type: 62 (EM_X86_64)
header: ccdfe0
signature: "KDUMP "
header_version: 1
utsname:
sysname:
nodename:
release:
version:
machine:
domainname:
timestamp:
tv_sec: 0
tv_usec: 0
status: 0 ()
block_size: 4096
sub_hdr_size: 1
bitmap_blocks: 400
max_mapnr: 6553600
total_ram_blocks: 0
device_blocks: 0
written_blocks: 0
current_cpu: 0
nr_cpus: 1
tasks[nr_cpus]: 0
sub_header: 0 (n/a)
sub_header_kdump: cceff0
phys_base: 0
dump_level: 0 (0x0)
data_offset: 192000
block_size: 4096
block_shift: 12
bitmap: 7fcca1e01010
bitmap_len: 1638400
dumpable_bitmap: 7fcca1002010
byte: 0
bit: 0
compressed_page: ce3220
curbufptr: 0
page_cache_hdr[0]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cd3210
pg_hit_count: 0
page_cache_hdr[1]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cd4210
pg_hit_count: 0
page_cache_hdr[2]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cd5210
pg_hit_count: 0
page_cache_hdr[3]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cd6210
pg_hit_count: 0
page_cache_hdr[4]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cd7210
pg_hit_count: 0
page_cache_hdr[5]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cd8210
pg_hit_count: 0
page_cache_hdr[6]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cd9210
pg_hit_count: 0
page_cache_hdr[7]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cda210
pg_hit_count: 0
page_cache_hdr[8]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cdb210
pg_hit_count: 0
page_cache_hdr[9]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cdc210
pg_hit_count: 0
page_cache_hdr[10]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cdd210
pg_hit_count: 0
page_cache_hdr[11]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cde210
pg_hit_count: 0
page_cache_hdr[12]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: cdf210
pg_hit_count: 0
page_cache_hdr[13]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: ce0210
pg_hit_count: 0
page_cache_hdr[14]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: ce1210
pg_hit_count: 0
page_cache_hdr[15]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: ce2210
pg_hit_count: 0
page_cache_buf: cd3210
evict_index: 0
evictions: 0
accesses: 0
cached_reads: 0
valid_pages: cd0000
crash: pv_init_ops exists: ARCH_PVOPS
compressed kdump: phys_base: 0
gdb vmlinux-2.6.32.12-0.7-default.debug
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <
http://gnu.org/licenses/gpl.html >
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show
copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
GETBUF(248 -> 0)
GETBUF(1500 -> 1)
FREEBUF(1)
FREEBUF(0)
<readmem: ffffffff8141ae40, KVADDR, "kernel_config_data", 32768, (ROE), 1ab66b0>
addr: ffffffff8141ae40 paddr: 141ae40 cnt: 448
crash: seek error: kernel virtual address: ffffffff8141ae40 type: "kernel_config_data"
WARNING: cannot read kernel_config_data
GETBUF(248 -> 0)
FREEBUF(0)
GETBUF(512 -> 0)
<readmem: ffffffff81413640, KVADDR, "cpu_possible_mask", 8, (FOE), 7fff84995368>
addr: ffffffff81413640 paddr: 1413640 cnt: 8
crash: seek error: kernel virtual address: ffffffff81413640 type: "cpu_possible_mask"
If I use --minimal I do get a crash> prompt, but it is very limited.
My guess is that crash can't find kernel_config_data and
cpu_possible_mask in the vmcore.
I searched Google and found some emails discussing this, and I read the
whitepaper written by Anderson, but I still can't resolve it.
Any help would be really appreciated. If you need any other information,
please tell me and I'll reply as soon as possible.
Well for starters, crash-5.0.1 is quite old, and crash-5.0.1-1.5.5 is
a SUSE derivative, and typically I prefer to not get involved too
deeply into debugging issues with (1) older, and (2) derivative versions
of crash.
That being said, for some reason, when trying to access the physical pages
at 141ae40 and 1413640, the compressed kdump is returning SEEK_ERRORs,
which means that those two pages could not be found in the dumpfile.
Since they are both static kernel symbols that are below "edata", one
would expect that they would be there, even if the compressed kdump
had filtered out user pages, cache pages, zero-pages, etc.
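The two physical addresses in the readmem output can be reproduced by
hand. The following is only a minimal sketch, not crash's actual code: it
assumes the standard x86_64 kernel text mapping base (__START_KERNEL_map,
0xffffffff80000000) and takes phys_base (0), block_shift (12) and
max_mapnr (6553600) from the debug output above. The function names are
made up for illustration.

```python
# Sketch only: translate x86_64 kernel-text virtual addresses to physical
# addresses and page frame numbers, using values from the -d8 output.

START_KERNEL_MAP = 0xFFFFFFFF80000000  # x86_64 __START_KERNEL_map
PHYS_BASE = 0                          # "compressed kdump: phys_base: 0"
PAGE_SHIFT = 12                        # "block_shift: 12" (4096-byte pages)
MAX_MAPNR = 6553600                    # "max_mapnr: 6553600"


def kernel_text_vaddr_to_paddr(vaddr, phys_base=PHYS_BASE):
    """Hypothetical helper: kernel-text virtual address -> physical address."""
    return vaddr - START_KERNEL_MAP + phys_base


def paddr_to_pfn(paddr):
    """Hypothetical helper: physical address -> page frame number."""
    return paddr >> PAGE_SHIFT


symbols = {
    "kernel_config_data": 0xFFFFFFFF8141AE40,
    "cpu_possible_mask": 0xFFFFFFFF81413640,
}

for name, vaddr in symbols.items():
    paddr = kernel_text_vaddr_to_paddr(vaddr)
    pfn = paddr_to_pfn(paddr)
    # A pfn below max_mapnr lies inside the range the dump header describes.
    in_range = pfn < MAX_MAPNR
    print(f"{name}: paddr={paddr:x} pfn={pfn} within max_mapnr={in_range}")
```

Both page frame numbers fall well below max_mapnr, so the addresses are
inside the range the header describes; the seek error means the pages
themselves were not found in the dumpfile.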
A couple things you can try:
(1) Run "crash --minimal ..." on the vmcore, and enter:
crash> rd linux_banner 30
When I run it on a RHEL6 2.6.32-based system it looks like this:
crash> rd linux_banner 30
ffffffff81600020: 65762078756e694c 2e32206e6f697372 Linux version 2.
ffffffff81600030: 3032322d32332e36 3638782e366c652e 6.32-220.el6.x86
ffffffff81600040: 636f6d282034365f 7840646c6975626b _64 (mockbuild@x
ffffffff81600050: 622e3430302d3638 736f622e646c6975 86-004.build.bos
ffffffff81600060: 2e7461686465722e 63672820296d6f63 .redhat.com) (gc
ffffffff81600070: 6f69737265762063 20352e342e34206e c version 4.4.5
ffffffff81600080: 3431323031313032 6148206465522820 20110214 (Red Ha
ffffffff81600090: 2d352e342e342074 2943434728202936 t 4.4.5-6) (GCC)
ffffffff816000a0: 4d53203123202920 6f4e206465572050 ) #1 SMP Wed No
ffffffff816000b0: 303a383020392076 5453452033313a33 v 9 08:03:13 EST
ffffffff816000c0: 00000a3131303220 0000000000000000 2011...........
ffffffff816000d0: 0000000000000000 0000000000000000 ................
ffffffff816000e0: 6973726576207325 6d28207325206e6f %s version %s (m
ffffffff816000f0: 646c6975626b636f 3430302d36387840 ockbuild@x86-004
ffffffff81600100: 622e646c6975622e 61686465722e736f .build.bos.redha
crash>
The "Linux version..." string should line up as shown above. What happens
when you do that?
(2) Log onto the machine that crashed -- which is still presumably running
the vmlinux-2.6.32.12-0.7-default kernel -- and try to bring up crash
on the live system:
$ crash vmlinux-2.6.32.12-0.7-default vmlinux-2.6.32.12-0.7-default.debug
What happens then?
Dave
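
When comparing "rd linux_banner" output against the sample above, it may
help to know how the ASCII column relates to the hex words: rd prints each
64-bit word with its most significant byte first, while x86_64 memory is
little-endian, so the text is the bytes of each word in reverse order. A
minimal sketch, with a hypothetical helper name, that rebuilds the ASCII
column from the first line of the RHEL6 sample:

```python
# Sketch only: rebuild the ASCII column of "rd" output from its hex words.

def decode_rd_words(words):
    """Hypothetical helper: decode 64-bit hex words as little-endian bytes.

    Non-printable bytes are shown as '.', as in the rd ASCII column.
    """
    raw = b"".join(int(word, 16).to_bytes(8, "little") for word in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# First line of the sample output (address ffffffff81600020).
first_line = ["65762078756e694c", "2e32206e6f697372"]
print(decode_rd_words(first_line))  # prints "Linux version 2."
```

If the words read from the SUSE vmcore do not decode to a "Linux
version" string at linux_banner, the data being read is not what the
symbol address says it should be.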