Re: [Crash-utility] Crash, won't read my vmcore "crash: page excluded: kernel virtual address:"

Tuesday, 11 February 2014

On Tue, Feb 11, 2014 at 6:53 AM, Dave Anderson <anderson(a)redhat.com> wrote:

...

 ----- Original Message -----
 > Dave Anderson reached out and wrote:
 >
 > ----- Original Message -----
 > > [root kvm7 127.0.0.1-2014-02-07-19:17:09]# crash
 > > /boot/System.map-2.6.32-220.el6.x86_64.debug
 > > /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux vmcore
 > >
 > > crash 5.1.8-1.el6
 > > Copyright (C) 2002-2011 Red Hat, Inc.
 > > Copyright (C) 2004, 2005, 2006 IBM Corporation
 > > Copyright (C) 1999-2006 Hewlett-Packard Co
 > > Copyright (C) 2005, 2006 Fujitsu Limited
 > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
 > > Copyright (C) 2005 NEC Corporation
 > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
 > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
 > > This program is free software, covered by the GNU General Public
 License,
 > > and you are welcome to change it and/or distribute copies of it under
 > > certain conditions. Enter "help copying" to see the conditions.
 > > This program has absolutely no warranty. Enter "help warranty" for
 details.
 > > GNU gdb (GDB) 7.0
 > > Copyright (C) 2009 Free Software Foundation, Inc.
 > > License GPLv3+: GNU GPL version 3 or later <
 > > http://gnu.org/licenses/gpl.html
 > > >
 > > This is free software: you are free to change and redistribute it.
 > > There is NO WARRANTY, to the extent permitted by law. Type "show
 copying"
 > > and "show warranty" for details.
 > > This GDB was configured as "x86_64-unknown-linux-gnu"...
 > >
 > > crash: page excluded: kernel virtual address: ffffffff81542000 type:
 "cpu_possible_mask"
 > >
 > > I can go into minimal,
 > >
 > >
 > > nm -Bn /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux |
 > > grep _stext
 > > ffffffff81000198 T _stext
 > >
 > > cat /proc/kallsyms | grep _stext
 > > ffffffff81000198 T _stext
 > >
 > > If I use the System Map parm I get this warning
 > >
 > > WARNING: kernels compiled by different gcc versions:
 > > /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux: 4.4.5
 > > vmcore kernel: 4.4.6
 > >
 > >
 > > Would really like to understand why this system crashed. I know I'm a
 bit
 > > behind on my kernel versions, but I should still be able to look at
 this
 > > kernel?
 > >
 > > Thanks
 > > Tory
 >
 > It looks like the vmcore and vmlinux file don't match, like maybe the
 crashing
 > system was running the standard 2.6.32-220.el6.x86_64 kernel, and you're
 trying
 > to debug it using the 2.6.32-220.el6.x86_64.debug kernel variant?
 >
 > First thing -- *never* use a System.map file unless for some reason you
 don't
 > have the original kernel's vmlinux available *and* you feel that the
 vmlinux
 > file you have is very close to the crashing kernel's vmlinux. But with
 any
 > RHEL standard (unmodified) vmlinux/vmcore setup, the System.map is
 completely
 > useless.
 >
 > So the first question is: what kernel generated the vmcore?
 >
 > Do this:
 >
 > $ strings vmcore | grep '2.6.32'
 > Dave
 >
 >
 > --
 > Dave you are right, I thought I had to use the devel kernel and in fact
 my
 > system is not running that, so it crashed with the standard
 2.6.32-220.el6.x86_64 kernel.
 >
 > [tblue@kvm7 127.0.0.1-2014-02-07-19:17:09]$ sudo strings vmcore | grep
 '2.6.32'
 >
 > 2.6.32-220.el6.x86_64
 > OSRELEASE=2.6.32-220.el6.x86_64
 >
 > But it won't take my vmlinux from /boot
 >
 > crash: /boot/vmlinuz-2.6.32-220.el6.x86_64: not a supported file format

 There is no vmlinux file in /boot.  The "vmlinuz" (with-a-z) file is not
 usable.
 You will always need the vmlinux from the associated kernel-debuginfo
 rpm.
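
The matching step above can be sketched in shell. This is only an illustration, not part of the crash utility: the helper names vmcore_release and debuginfo_vmlinux are hypothetical, the sketch assumes the OSRELEASE= string is stored uncompressed in the dump (as the strings output in this thread suggests), and it assumes the debuginfo layout /usr/lib/debug/lib/modules/<release>/vmlinux seen in the session output.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel release recorded in a dump file.
# Assumes the text "OSRELEASE=<release>" is readable as plain bytes.
vmcore_release() {
    LC_ALL=C grep -a -o 'OSRELEASE=[[:alnum:]._-]*' "$1" | head -n 1 | cut -d= -f2
}

# Hypothetical helper: map a kernel release to the vmlinux path that the
# kernel-debuginfo rpm is assumed to install.
debuginfo_vmlinux() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Usage, from the directory that holds the vmcore:
#   crash "$(debuginfo_vmlinux "$(vmcore_release vmcore)")" vmcore
```

Deriving the path from the dump itself avoids picking a variant such as the .debug kernel by accident.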

 > Yes sir you were correct, I was using the wrong kernel!
 >
 > please wait... (determining panic task)
 > WARNING: multiple active tasks have called die
 >
 > KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64/vmlinux
 > DUMPFILE: /libvirt/crash/127.0.0.1-2014-02-07-19:17:09/vmcore [PARTIAL
 DUMP]
 > CPUS: 32
 > DATE: Fri Feb 7 18:16:05 2014
 > UPTIME: 226 days, 21:36:13
 > LOAD AVERAGE: 2.42, 2.68, 2.69
 > TASKS: 816
 > NODENAME: kvm7.domain.com
 > RELEASE: 2.6.32-220.el6.x86_64
 > VERSION: #1 SMP Tue Dec 6 19:48:22 GMT 2011
 > MACHINE: x86_64 (2200 Mhz)
 > MEMORY: 88 GB
 > PANIC: ""
 > PID: 0
 > COMMAND: "swapper"
 > TASK: ffff881665514b40 (1 of 32) [THREAD_INFO: ffff880c6124e000]
 > CPU: 19
 > STATE: TASK_RUNNING (PANIC)
 >
 > Nothing stands out as a bug or reason to fail
 >
 > divide error: 0000 [#1] SMP
 > last sysfs file:
 /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
 > CPU 19
 > Modules linked in: ext3 jbd ip6table_filter ip6_tables ebtable_nat
 ebtables
 > ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
 xt_state
 > nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter
 ip_tables
 > sunrpc bridge stp llc bonding ipv6 vhost_net macvtap macvlan tun
 kvm_intel
 > kvm cdc_ether usbnet mii microcode i2c_i801 i2c_core iTCO_wdt
 > iTCO_vendor_support shpchp igb ioatdma dca ses enclosure sg ext4 mbcache
 > jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror
 > dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
 >
 > Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1 IBM System
 x3650
 > M4 -[7915AC1]-/00J6528
 > RIP: 0010:[<ffffffff81054ad5>] [<ffffffff81054ad5>]
 find_busiest_group+0x5c5/0xb20
 > RSP: 0018:ffff880028363c40 EFLAGS: 00010246
 > RAX: 0000000000000000 RBX: ffff880028363e64 RCX: 0000000000000000
 > RDX: 0000000000000000 RSI: ffff8800282cf540 RDI: ffff8800282d5fc0
 > RBP: ffff880028363dd0 R08: ffff8800282cf860 R09: 0000000000000000
 > R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffff01
 > R13: 0000000000015fc0 R14: ffffffffffffffff R15: 0000000000000000
 > FS: 0000000000000000(0000) GS:ffff880028360000(0000)
 knlGS:0000000000000000
 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
 > CR2: 00007f4e5215c000 CR3: 00000011bea54000 CR4: 00000000000426e0
 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 > Process swapper (pid: 0, threadinfo ffff880c6124e000, task
 ffff881665514b40)
 > Stack:
 > ffff880028363d70 ffff880028363ce0 ffff880028363ca0 000000000000024d
 > <0> ffff8800282cf860 ffff880028363e58 0101881664b121a8 0000000600000000
 > <0> 0000000600000000 ffff8800282cf540 0000000123386cc0 0000000000000008
 > Call Trace:
 > <IRQ>
 > [<ffffffffa02e4669>] ? br_handle_frame_finish+0x179/0x2a0 [bridge]
 > [<ffffffff8105fc52>] rebalance_domains+0x1a2/0x5b0
 > [<ffffffff81060153>] run_rebalance_domains+0xf3/0x160
 > [<ffffffff8107c4f0>] ? get_next_timer_interrupt+0x1b0/0x250
 > [<ffffffff81072161>] __do_softirq+0xc1/0x1d0
 > [<ffffffff81097e0a>] ? sched_clock_idle_wakeup_event+0x1a/0x20
 > [<ffffffff8100c24c>] call_softirq+0x1c/0x30
 > [<ffffffff8100de85>] do_softirq+0x65/0xa0
 > [<ffffffff81071f45>] irq_exit+0x85/0x90
 > [<ffffffff8102a255>] smp_call_function_single_interrupt+0x35/0x40
 > [<ffffffff8100bdb3>] call_function_single_interrupt+0x13/0x20
 > <EOI>
 > [<ffffffff812c4a5e>] ? intel_idle+0xde/0x170
 > [<ffffffff812c4a41>] ? intel_idle+0xc1/0x170
 > [<ffffffff813f9f47>] cpuidle_idle_call+0xa7/0x140
 > [<ffffffff81009e06>] cpu_idle+0xb6/0x110
 > [<ffffffff814e5f23>] start_secondary+0x202/0x245
 > Code: d0 b8 01 00 00 00 48 c1 ea 0a 48 85 d2 0f 45 c2 41 89 40 08 66 90
 4c 8b
 > 85 e0 fe ff ff 48 8b 45 a8 31 d2 41 8b 48 08 48 c1 e0 0a <48> f7 f1 48
 8b 4d
 > b0 48 89 45 a0 31 c0 48 85 c9 74 0c 48 8b 45
 > RIP [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
 > RSP <ffff880028363c40>
 >
 > Is there a forum that would help me figure out what exactly caused this
 crash,
 > as it's not the first time across this series of servers running KVM?
 >
 > Thank you sir,
 >
 > Tory

 From the information above, there was a divide-by-zero fault in
 find_busiest_group().
 If you ran the "bt" command on the panic task, it might be a little more
 obvious,
 but the "divide error: 0000 [#1] SMP" string comes from the divide_error()
 function.
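
The oops text itself supports this reading. The Code: line marks the faulting instruction bytes as <48> f7 f1, and the register dump shows RCX as zero. The following is a minimal sketch of decoding those three bytes by hand; decode_group3 is a hypothetical helper that only handles the REX.W (0x48) + F7 register-direct form and ignores any immediate operand, so it is not a general disassembler.

```shell
#!/bin/sh
# Hypothetical helper: decode "48 f7 <modrm>" (REX.W prefix, opcode F7,
# register operand). Other encodings are reported as unsupported.
decode_group3() {
    rex=$((0x$1)); opcode=$((0x$2)); modrm=$((0x$3))
    [ "$rex" -eq 72 ] && [ "$opcode" -eq 247 ] || { echo "unsupported"; return 1; }
    mod=$(( modrm >> 6 ))          # top two bits: addressing mode
    reg=$(( (modrm >> 3) & 7 ))    # middle three bits: operation selector
    rm=$(( modrm & 7 ))            # low three bits: operand register
    [ "$mod" -eq 3 ] || { echo "unsupported"; return 1; }
    case $reg in
        0|1) op=test ;; 2) op=not ;; 3) op=neg ;; 4) op=mul ;;
        5) op=imul ;; 6) op=div ;; 7) op=idiv ;;
    esac
    case $rm in
        0) r=rax ;; 1) r=rcx ;; 2) r=rdx ;; 3) r=rbx ;;
        4) r=rsp ;; 5) r=rbp ;; 6) r=rsi ;; 7) r=rdi ;;
    esac
    echo "$op %$r"
}

decode_group3 48 f7 f1    # prints: div %rcx
```

A div with RCX as the divisor while RCX holds zero is what raises the divide error.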

 Anyway, you are running 2.6.32-220.el6, and from a more recent kernel.spec
 changelog,
 this issue was fixed in 2.6.32-248.el6:

 * Tue Mar 06 2012 Aristeu Rozanski <arozansk(a)redhat.com> [2.6.32-248.el6]
 - [netdrv] bnx2: revert firmware load modifications (Neil Horman) [720428]
 - [virt] virtio: balloon: leak / fill balloon across S4 (Amit Shah)
 [798583]
 - [scsi] silencing 'killing requests for dead queue' (David Milburn)
 [798672]
 - [scsi] sd_dif: fix setting bio flags (Jeff Moyer) [799075]
 - [scsi] megaraid_sas: driver update to version 00.00.06.14-rh1 (Tomas
 Henzl) [749923]
 - [infiniband] srp: fix include ordering issue (Doug Ledford) [791209]
 - [sched] Fix Kernel divide by zero panic in find_busiest_group() (Larry
 Woodman) [785959]

 Time to upgrade...

 Dave
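
To check other hosts in the same fleet against the fixed build, a rough sketch like the one below may help. The function names are hypothetical, and the sketch assumes release strings of the form 2.6.32-<build>.el6.<arch>. It compares only the main build number against 248, so it does not know whether an errata kernel with a lower build number also carries the fix.

```shell
#!/bin/sh
# Hypothetical helper: extract the main build number from a RHEL 6
# release string, e.g. "2.6.32-220.el6.x86_64" -> "220".
el6_build() {
    build=${1#2.6.32-}               # drop the upstream base version
    printf '%s\n' "${build%%.*}"     # keep what precedes the first dot
}

# Hypothetical helper: succeed (exit 0) when the release predates the
# build named in the changelog; the threshold defaults to 248.
predates_fix() {
    [ "$(el6_build "$1")" -lt "${2:-248}" ]
}

# Usage:
#   predates_fix "$(uname -r)" && echo "upgrade needed"
```

For the kernel in this thread, 2.6.32-220.el6.x86_64, the check succeeds because 220 is lower than 248.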

 You are again absolutely right, I appreciate the time and the assistance.
Scheduling the upgrades!

Thanks!
Tory
