Re: [Crash-utility] Crash, won't read my vmcore "crash: page excluded: kernel virtual address:"

Tuesday, 11 February 2014

----- Original Message -----
...
 Dave Anderson wrote:

 ----- Original Message -----
 > [root@kvm7 127.0.0.1-2014-02-07-19:17:09]# crash /boot/System.map-2.6.32-220.el6.x86_64.debug /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux vmcore
 > 
 > crash 5.1.8-1.el6
 > Copyright (C) 2002-2011 Red Hat, Inc.
 > Copyright (C) 2004, 2005, 2006 IBM Corporation
 > Copyright (C) 1999-2006 Hewlett-Packard Co
 > Copyright (C) 2005, 2006 Fujitsu Limited
 > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
 > Copyright (C) 2005 NEC Corporation
 > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
 > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
 > This program is free software, covered by the GNU General Public License,
 > and you are welcome to change it and/or distribute copies of it under
 > certain conditions. Enter "help copying" to see the conditions.
 > This program has absolutely no warranty. Enter "help warranty" for details.
 > GNU gdb (GDB) 7.0
 > Copyright (C) 2009 Free Software Foundation, Inc.
 > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 > This is free software: you are free to change and redistribute it.
 > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
 > and "show warranty" for details.
 > This GDB was configured as "x86_64-unknown-linux-gnu"...
 > 
 > crash: page excluded: kernel virtual address: ffffffff81542000 type: "cpu_possible_mask"
 > 
 > I can go into minimal,
 > 
 > 
 > nm -Bn /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux | grep _stext
 > ffffffff81000198 T _stext
 > 
 > cat /proc/kallsyms | grep _stext
 > ffffffff81000198 T _stext
 > 
 > If I use the System.map parameter, I get this warning:
 > 
 > WARNING: kernels compiled by different gcc versions:
 > /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux: 4.4.5
 > vmcore kernel: 4.4.6
 > 
 > 
 > I would really like to understand why this system crashed. I know I'm a bit
 > behind on my kernel versions, but I should still be able to look at this
 > kernel, shouldn't I?
 > 
 > Thanks
 > Tory

 It looks like the vmcore and vmlinux file don't match, like maybe the crashing
 system was running the standard 2.6.32-220.el6.x86_64 kernel, and you're trying
 to debug it using the 2.6.32-220.el6.x86_64.debug kernel variant?
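One quick way to confirm a vmlinux/vmcore mismatch is to compare the kernel banners of the two files, since both the release string and the gcc version appear there. The sketch below is a hypothetical helper, not crash's own code: it assumes banners of the usual `Linux version <release> (...) (gcc version <x.y.z> ...)` form, and the two example banner strings are made up to mirror the warning quoted above.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed banner shape: "Linux version <release> (<builder>) (gcc version <x.y.z> ...) ..."
BANNER_RE = re.compile(r"Linux version (\S+) .*?\(gcc version (\d+(?:\.\d+)*)")


class BannerInfo(NamedTuple):
    release: str
    gcc: str


def parse_banner(banner: str) -> Optional[BannerInfo]:
    """Extract the kernel release and gcc version from a banner string."""
    m = BANNER_RE.search(banner)
    if m is None:
        return None
    return BannerInfo(release=m.group(1), gcc=m.group(2))


def compare_banners(vmlinux_banner: str, vmcore_banner: str) -> List[str]:
    """Return mismatch notes; an empty list means release and gcc agree."""
    a, b = parse_banner(vmlinux_banner), parse_banner(vmcore_banner)
    if a is None or b is None:
        return ["could not parse one of the banners"]
    notes = []
    if a.release != b.release:
        notes.append(f"release differs: {a.release} vs {b.release}")
    if a.gcc != b.gcc:
        notes.append(f"gcc differs: {a.gcc} vs {b.gcc}")
    return notes


# Made-up banners modeled on this thread (.debug vmlinux vs. standard vmcore kernel)
debug_vmlinux = "Linux version 2.6.32-220.el6.x86_64.debug (builder@example) (gcc version 4.4.5 (GCC) ) #1 SMP"
vmcore_kernel = "Linux version 2.6.32-220.el6.x86_64 (builder@example) (gcc version 4.4.6 (GCC) ) #1 SMP"
for note in compare_banners(debug_vmlinux, vmcore_kernel):
    print(note)
```

Either difference on its own is a strong hint that the vmlinux does not belong to the dump.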

 First thing -- *never* use a System.map file unless for some reason you don't
 have the original kernel's vmlinux available *and* you feel that the vmlinux
 file you have is very close to the crashing kernel's vmlinux. But with any
 RHEL standard (unmodified) vmlinux/vmcore setup, the System.map is completely
 useless.

 So the first question is: what kernel generated the vmcore?

 Do this:

 $ strings vmcore | grep '2.6.32'

 Dave
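The `strings vmcore | grep '2.6.32'` check works because the dump carries a VMCOREINFO record with an `OSRELEASE=` line, as the output later in this thread shows. Below is a minimal Python sketch of the same lookup, plus a hypothetical helper that maps the release to the debuginfo vmlinux path; the path layout is taken from the `KERNEL:` line shown later in the thread, and all function names are made up for illustration.

```python
import mmap
import os
import re
from typing import Optional

# VMCOREINFO stores plain-text lines such as "OSRELEASE=2.6.32-220.el6.x86_64"
OSRELEASE_RE = re.compile(rb"OSRELEASE=([\x21-\x7e]+)")


def osrelease_from_buffer(buf) -> Optional[str]:
    """Return the first OSRELEASE= value in a bytes-like buffer, if any."""
    m = OSRELEASE_RE.search(buf)
    return m.group(1).decode("ascii") if m else None


def osrelease_from_dump(path: str) -> Optional[str]:
    """Memory-map the dump so a multi-GB vmcore is not read into RAM at once."""
    if os.path.getsize(path) == 0:
        return None  # mmap cannot map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return osrelease_from_buffer(mm)


def debuginfo_vmlinux(release: str) -> str:
    """Hypothetical helper: the kernel-debuginfo vmlinux path for a release."""
    return f"/usr/lib/debug/lib/modules/{release}/vmlinux"
```

For the dump in this thread, `osrelease_from_dump("vmcore")` would be expected to return `2.6.32-220.el6.x86_64`, which points at the same vmlinux path that eventually worked.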

 --
 Dave, you are right. I thought I had to use the devel kernel, but my system is
 not running that; it crashed with the standard 2.6.32-220.el6.x86_64 kernel.

 [tblue@kvm7 127.0.0.1-2014-02-07-19:17:09]$ sudo strings vmcore | grep '2.6.32'

 2.6.32-220.el6.x86_64
 OSRELEASE=2.6.32-220.el6.x86_64

 But it won't take my vmlinux from /boot:

 crash: /boot/vmlinuz-2.6.32-220.el6.x86_64: not a supported file format

There is no vmlinux file in /boot.  The "vmlinuz" (with a z) file is a
compressed boot image and is not usable. You will always need the vmlinux
from the associated kernel-debuginfo rpm.
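The difference is easy to see from the file header: the debuginfo vmlinux is an ELF object (it starts with `\x7fELF`), whereas /boot/vmlinuz is an x86 bzImage, whose boot-protocol header carries the signature `HdrS` at offset 0x202. A minimal sketch, assuming only those two magic values; passing the ELF test is necessary but not sufficient, since crash also needs the debug symbols that only the kernel-debuginfo vmlinux provides.

```python
ELF_MAGIC = b"\x7fELF"
BZIMAGE_MAGIC = b"HdrS"        # x86 boot protocol header signature
BZIMAGE_MAGIC_OFFSET = 0x202   # offset of that signature in a bzImage


def classify_kernel_image(header: bytes) -> str:
    """Classify a kernel file from its first bytes: 'elf', 'bzimage' or 'unknown'."""
    if header[:4] == ELF_MAGIC:
        return "elf"  # e.g. the vmlinux from kernel-debuginfo
    if header[BZIMAGE_MAGIC_OFFSET:BZIMAGE_MAGIC_OFFSET + 4] == BZIMAGE_MAGIC:
        return "bzimage"  # e.g. /boot/vmlinuz-*, which crash rejects
    return "unknown"


def classify_kernel_file(path: str) -> str:
    """Read just enough of the file to test both signatures."""
    with open(path, "rb") as f:
        return classify_kernel_image(f.read(BZIMAGE_MAGIC_OFFSET + 4))
```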

...
 Yes sir, you were correct; I was using the wrong kernel!

 please wait... (determining panic task)
 WARNING: multiple active tasks have called die

 KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64/vmlinux
 DUMPFILE: /libvirt/crash/127.0.0.1-2014-02-07-19:17:09/vmcore [PARTIAL DUMP]
 CPUS: 32
 DATE: Fri Feb 7 18:16:05 2014
 UPTIME: 226 days, 21:36:13
 LOAD AVERAGE: 2.42, 2.68, 2.69
 TASKS: 816
 NODENAME: kvm7.domain.com
 RELEASE: 2.6.32-220.el6.x86_64
 VERSION: #1 SMP Tue Dec 6 19:48:22 GMT 2011
 MACHINE: x86_64 (2200 Mhz)
 MEMORY: 88 GB
 PANIC: ""
 PID: 0
 COMMAND: "swapper"
 TASK: ffff881665514b40 (1 of 32) [THREAD_INFO: ffff880c6124e000]
 CPU: 19
 STATE: TASK_RUNNING (PANIC)

 Nothing stands out as a bug or a reason to fail.

 divide error: 0000 [#1] SMP
 last sysfs file: /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
 CPU 19
 Modules linked in: ext3 jbd ip6table_filter ip6_tables ebtable_nat ebtables
 ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
 nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables
 sunrpc bridge stp llc bonding ipv6 vhost_net macvtap macvlan tun kvm_intel
 kvm cdc_ether usbnet mii microcode i2c_i801 i2c_core iTCO_wdt
 iTCO_vendor_support shpchp igb ioatdma dca ses enclosure sg ext4 mbcache
 jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror
 dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

 Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1 IBM System x3650 M4 -[7915AC1]-/00J6528
 RIP: 0010:[<ffffffff81054ad5>] [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
 RSP: 0018:ffff880028363c40 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff880028363e64 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff8800282cf540 RDI: ffff8800282d5fc0
 RBP: ffff880028363dd0 R08: ffff8800282cf860 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffff01
 R13: 0000000000015fc0 R14: ffffffffffffffff R15: 0000000000000000
 FS: 0000000000000000(0000) GS:ffff880028360000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
 CR2: 00007f4e5215c000 CR3: 00000011bea54000 CR4: 00000000000426e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process swapper (pid: 0, threadinfo ffff880c6124e000, task ffff881665514b40)
 Stack:
 ffff880028363d70 ffff880028363ce0 ffff880028363ca0 000000000000024d
 <0> ffff8800282cf860 ffff880028363e58 0101881664b121a8 0000000600000000
 <0> 0000000600000000 ffff8800282cf540 0000000123386cc0 0000000000000008
 Call Trace:
 <IRQ>
 [<ffffffffa02e4669>] ? br_handle_frame_finish+0x179/0x2a0 [bridge]
 [<ffffffff8105fc52>] rebalance_domains+0x1a2/0x5b0
 [<ffffffff81060153>] run_rebalance_domains+0xf3/0x160
 [<ffffffff8107c4f0>] ? get_next_timer_interrupt+0x1b0/0x250
 [<ffffffff81072161>] __do_softirq+0xc1/0x1d0
 [<ffffffff81097e0a>] ? sched_clock_idle_wakeup_event+0x1a/0x20
 [<ffffffff8100c24c>] call_softirq+0x1c/0x30
 [<ffffffff8100de85>] do_softirq+0x65/0xa0
 [<ffffffff81071f45>] irq_exit+0x85/0x90
 [<ffffffff8102a255>] smp_call_function_single_interrupt+0x35/0x40
 [<ffffffff8100bdb3>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff812c4a5e>] ? intel_idle+0xde/0x170
 [<ffffffff812c4a41>] ? intel_idle+0xc1/0x170
 [<ffffffff813f9f47>] cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] cpu_idle+0xb6/0x110
 [<ffffffff814e5f23>] start_secondary+0x202/0x245
 Code: d0 b8 01 00 00 00 48 c1 ea 0a 48 85 d2 0f 45 c2 41 89 40 08 66 90 4c 8b
 85 e0 fe ff ff 48 8b 45 a8 31 d2 41 8b 48 08 48 c1 e0 0a <48> f7 f1 48 8b 4d
 b0 48 89 45 a0 31 c0 48 85 c9 74 0c 48 8b 45
 RIP [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
 RSP <ffff880028363c40>

 Is there a forum that could help me figure out what exactly caused this crash?
 It's not the first time this has happened across this series of servers running KVM.

 Thank you sir,

 Tory  
...
From the information above, there was a divide-by-zero fault in
find_busiest_group(). If you ran the "bt" command on the panic task, it
might be a little more obvious, but the "divide error: 0000 [#1] SMP"
string comes from the divide_error() function.
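The faulting instruction can also be read straight out of the oops: in the `Code:` line, the byte in angle brackets is the first byte of the instruction at RIP. Here `<48> f7 f1` decodes to `div rcx` (unsigned divide of RDX:RAX by RCX), and the register dump shows `RCX: 0000000000000000`. The divisor was loaded a few bytes earlier by `41 8b 48 08` (`mov ecx, [r8+0x8]`), so the zero came from memory at R08+8. The sketch below is a deliberately tiny, hypothetical decoder that only understands the 64-bit register forms of DIV/IDIV (opcode F7 /6 and /7); it is an illustration, not a general disassembler.

```python
import re
from typing import Dict, List, Optional, Tuple

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REG64_EXT = ["r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
F7_OPS = {6: "div", 7: "idiv"}  # ModRM.reg selects the operation in the F7 group

# "<48> f7 f1 48": the bracketed byte plus up to three following bytes
CODE_RE = re.compile(r"<([0-9a-f]{2})>((?:\s+[0-9a-f]{2}){1,3})")
# "RCX: 0000000000000000", "R08: ffff8800282cf860", ...
REG_RE = re.compile(r"\b(R[A-Z0-9]{2,3}): ([0-9a-f]{16})\b")


def faulting_bytes(oops: str) -> List[int]:
    m = CODE_RE.search(oops)
    if m is None:
        return []
    return [int(m.group(1), 16)] + [int(t, 16) for t in m.group(2).split()]


def decode_div(code: List[int]) -> Optional[str]:
    """Decode REX.W F7 /6 or /7 with a register operand, e.g. 48 f7 f1 -> 'div rcx'."""
    i, rex = 0, 0
    if code and 0x40 <= code[0] <= 0x4F:
        rex, i = code[0], 1
    if len(code) < i + 2 or code[i] != 0xF7:
        return None
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg not in F7_OPS or not (rex & 0x8):
        return None  # memory operands and 32-bit forms are out of scope here
    regs = REG64_EXT if rex & 0x1 else REG64  # REX.B extends the rm register
    return f"{F7_OPS[reg]} {regs[rm]}"


def register_values(oops: str) -> Dict[str, int]:
    values = {}
    for name, val in REG_RE.findall(oops):
        name = name.lower()
        if re.fullmatch(r"r0\d", name):  # "R08" in the dump is r8
            name = "r" + name[2]
        values[name] = int(val, 16)
    return values


def explain_divide_error(oops: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (instruction, divisor value) for a register-operand divide fault."""
    insn = decode_div(faulting_bytes(oops))
    if insn is None:
        return None
    return insn, register_values(oops).get(insn.split()[1])
```

Fed the oops text above, `explain_divide_error()` should report `("div rcx", 0)`.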

Anyway, you are running 2.6.32-220.el6, and according to the changelog in a more
recent kernel.spec, this issue was fixed in 2.6.32-248.el6:

* Tue Mar 06 2012 Aristeu Rozanski <arozansk(a)redhat.com> [2.6.32-248.el6]
- [netdrv] bnx2: revert firmware load modifications (Neil Horman) [720428]
- [virt] virtio: balloon: leak / fill balloon across S4 (Amit Shah) [798583]
- [scsi] silencing 'killing requests for dead queue' (David Milburn) [798672]
- [scsi] sd_dif: fix setting bio flags (Jeff Moyer) [799075]
- [scsi] megaraid_sas: driver update to version 00.00.06.14-rh1 (Tomas Henzl) [749923]
- [infiniband] srp: fix include ordering issue (Doug Ledford) [791209]
- [sched] Fix Kernel divide by zero panic in find_busiest_group() (Larry Woodman) [785959]
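Whether a given kernel already contains the fix comes down to comparing its release against 2.6.32-248.el6, assuming (as is normal within one RHEL 6 kernel stream) that later builds keep the fix. The sketch below is a simplified, hypothetical comparison that only looks at the leading `major.minor.patch-build` numbers of the release string. It is not rpm's full version comparison, but it does compare the build number numerically, which a plain string comparison would get wrong ("1000" sorts before "248" as text).

```python
import re
from typing import Tuple

# Leading "2.6.32-220" part of a release such as "2.6.32-220.el6.x86_64"
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def release_key(release: str) -> Tuple[int, int, int, int]:
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    major, minor, patch, build = (int(g) for g in m.groups())
    return major, minor, patch, build


def includes_fix(running: str, fixed_in: str) -> bool:
    """True if `running` is at or beyond the build that carries the fix."""
    return release_key(running) >= release_key(fixed_in)


print(includes_fix("2.6.32-220.el6.x86_64", "2.6.32-248.el6"))  # False: upgrade needed
```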

Time to upgrade...

Dave

