----- Original Message -----
Dave Anderson reached out and wrote:
> [root kvm7 127.0.0.1-2014-02-07-19:17:09]# crash
> /boot/System.map-2.6.32-220.el6.x86_64.debug
> /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux vmcore
>
> crash 5.1.8-1.el6
> Copyright (C) 2002-2011 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
> GNU gdb (GDB) 7.0
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> crash: page excluded: kernel virtual address: ffffffff81542000 type: "cpu_possible_mask"
>
> I can go into minimal,
>
>
> nm -Bn /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux |
> grep _stext
> ffffffff81000198 T _stext
>
> cat /proc/kallsyms | grep _stext
> ffffffff81000198 T _stext
>
> If I use the System Map parm I get this warning
>
> WARNING: kernels compiled by different gcc versions:
> /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux: 4.4.5
> vmcore kernel: 4.4.6
>
>
> I would really like to understand why this system crashed. I know I'm a bit
> behind on my kernel versions, but I should still be able to look at this
> kernel?
>
> Thanks
> Tory
It looks like the vmcore and vmlinux files don't match, like maybe the crashing
system was running the standard 2.6.32-220.el6.x86_64 kernel, and you're trying
to debug it using the 2.6.32-220.el6.x86_64.debug kernel variant?
First thing -- *never* use a System.map file unless for some reason you don't
have the original kernel's vmlinux available *and* you feel that the vmlinux
file you have is very close to the crashing kernel's vmlinux. But with any
RHEL standard (unmodified) vmlinux/vmcore setup, the System.map is completely
useless.
So the first question is: what kernel generated the vmcore?
Do this:
$ strings vmcore | grep '2.6.32'
Dave
--
Dave, you are right. I thought I had to use the devel kernel, but my system is
in fact not running that, so it crashed with the standard 2.6.32-220.el6.x86_64
kernel.
[tblue@kvm7 127.0.0.1-2014-02-07-19:17:09]$ sudo strings vmcore | grep '2.6.32'
2.6.32-220.el6.x86_64
OSRELEASE=2.6.32-220.el6.x86_64
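If there are several dumps to sort through, the strings/grep check can be scripted. Below is a minimal Python sketch, not something crash provides: the function name find_osrelease and the chunked scan are my own assumptions, and it only looks for the OSRELEASE= string seen in the output above.

```python
import re

# Printable, non-space ASCII after the "OSRELEASE=" key.
OSRELEASE_RE = re.compile(rb"OSRELEASE=([\x21-\x7e]+)")


def find_osrelease(path, chunk_size=1 << 20, overlap=256):
    """Return the first OSRELEASE= value found in the file, or None.

    The file is read in chunks so a large vmcore is never loaded whole;
    a short tail is carried over so a match split across chunks is found.
    """
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            data = tail + chunk
            m = OSRELEASE_RE.search(data)
            # Accept a match only if it cannot continue into the next chunk.
            if m and (not chunk or m.end() < len(data)):
                return m.group(1).decode("ascii")
            if not chunk:
                return None
            tail = data[-overlap:]
```

Called as find_osrelease("vmcore"), it returns the release string to compare against the vmlinux you intend to pass to crash.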
But it won't take my vmlinux from /boot
crash: /boot/vmlinuz-2.6.32-220.el6.x86_64: not a supported file format
There is no vmlinux file in /boot. The "vmlinuz" (with-a-z) file is not
usable.
You will always need the vmlinux from the associated kernel-debuginfo rpm.
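As a quick sanity check before starting crash, something like the following Python sketch can help. It is only an illustration under my own assumptions: the helper names are hypothetical, the path layout is taken from the debuginfo path used elsewhere in this thread, and the ELF magic test is a rough pre-check rather than what crash itself does. An uncompressed vmlinux is an ELF file, while the x86 vmlinuz boot image does not start with the ELF magic.

```python
import os

ELF_MAGIC = b"\x7fELF"


def debuginfo_vmlinux(release):
    """Build the debuginfo vmlinux path for a kernel release string."""
    return os.path.join("/usr/lib/debug/lib/modules", release, "vmlinux")


def looks_like_elf(path):
    """Return True if the file starts with the 4-byte ELF magic."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC
```

For the release in this thread, debuginfo_vmlinux("2.6.32-220.el6.x86_64") gives the path under /usr/lib/debug, and looks_like_elf() on that path should be true once the matching kernel-debuginfo rpm is installed.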
Yes sir you were correct, I was using the wrong kernel!
please wait... (determining panic task)
WARNING: multiple active tasks have called die
KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64/vmlinux
DUMPFILE: /libvirt/crash/127.0.0.1-2014-02-07-19:17:09/vmcore [PARTIAL DUMP]
CPUS: 32
DATE: Fri Feb 7 18:16:05 2014
UPTIME: 226 days, 21:36:13
LOAD AVERAGE: 2.42, 2.68, 2.69
TASKS: 816
NODENAME: kvm7.domain.com
RELEASE: 2.6.32-220.el6.x86_64
VERSION: #1 SMP Tue Dec 6 19:48:22 GMT 2011
MACHINE: x86_64 (2200 Mhz)
MEMORY: 88 GB
PANIC: ""
PID: 0
COMMAND: "swapper"
TASK: ffff881665514b40 (1 of 32) [THREAD_INFO: ffff880c6124e000]
CPU: 19
STATE: TASK_RUNNING (PANIC)
Nothing stands out as a bug or a reason to fail:
divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
CPU 19
Modules linked in: ext3 jbd ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables
sunrpc bridge stp llc bonding ipv6 vhost_net macvtap macvlan tun kvm_intel
kvm cdc_ether usbnet mii microcode i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support shpchp igb ioatdma dca ses enclosure sg ext4 mbcache
jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1 IBM System x3650 M4 -[7915AC1]-/00J6528
RIP: 0010:[<ffffffff81054ad5>] [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
RSP: 0018:ffff880028363c40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880028363e64 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8800282cf540 RDI: ffff8800282d5fc0
RBP: ffff880028363dd0 R08: ffff8800282cf860 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffff01
R13: 0000000000015fc0 R14: ffffffffffffffff R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880028360000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4e5215c000 CR3: 00000011bea54000 CR4: 00000000000426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880c6124e000, task ffff881665514b40)
Stack:
ffff880028363d70 ffff880028363ce0 ffff880028363ca0 000000000000024d
<0> ffff8800282cf860 ffff880028363e58 0101881664b121a8 0000000600000000
<0> 0000000600000000 ffff8800282cf540 0000000123386cc0 0000000000000008
Call Trace:
<IRQ>
[<ffffffffa02e4669>] ? br_handle_frame_finish+0x179/0x2a0 [bridge]
[<ffffffff8105fc52>] rebalance_domains+0x1a2/0x5b0
[<ffffffff81060153>] run_rebalance_domains+0xf3/0x160
[<ffffffff8107c4f0>] ? get_next_timer_interrupt+0x1b0/0x250
[<ffffffff81072161>] __do_softirq+0xc1/0x1d0
[<ffffffff81097e0a>] ? sched_clock_idle_wakeup_event+0x1a/0x20
[<ffffffff8100c24c>] call_softirq+0x1c/0x30
[<ffffffff8100de85>] do_softirq+0x65/0xa0
[<ffffffff81071f45>] irq_exit+0x85/0x90
[<ffffffff8102a255>] smp_call_function_single_interrupt+0x35/0x40
[<ffffffff8100bdb3>] call_function_single_interrupt+0x13/0x20
<EOI>
[<ffffffff812c4a5e>] ? intel_idle+0xde/0x170
[<ffffffff812c4a41>] ? intel_idle+0xc1/0x170
[<ffffffff813f9f47>] cpuidle_idle_call+0xa7/0x140
[<ffffffff81009e06>] cpu_idle+0xb6/0x110
[<ffffffff814e5f23>] start_secondary+0x202/0x245
Code: d0 b8 01 00 00 00 48 c1 ea 0a 48 85 d2 0f 45 c2 41 89 40 08 66 90 4c 8b
85 e0 fe ff ff 48 8b 45 a8 31 d2 41 8b 48 08 48 c1 e0 0a <48> f7 f1 48 8b 4d
b0 48 89 45 a0 31 c0 48 85 c9 74 0c 48 8b 45
RIP [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
RSP <ffff880028363c40>
Is there a forum that would help me figure out what exactly caused this crash?
It's not the first time it has happened across this series of servers running KVM.
Thank you sir,
Tory
From the information above, there was a divide-by-zero fault in
find_busiest_group(). If you ran the "bt" command on the panic task, it
might be a little more obvious, but the "divide error: 0000 [#1] SMP" string
comes from the divide_error() function.
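The "Code:" line in the oops points the same way. The byte in angle brackets is the one at RIP, and as far as I can tell from decoding it by hand, 48 f7 f1 is "div %rcx" on x86_64, with RCX showing as 0 in the register dump. Here is a small Python sketch for pulling those bytes out of a pasted oops; the function name split_oops_code is hypothetical, and it does no disassembly, it only splits the dump at the marker.

```python
def split_oops_code(code_line):
    """Split an oops "Code:" dump into (bytes before RIP, bytes from RIP on).

    The byte wrapped in <..> is the first byte of the faulting instruction.
    """
    tokens = code_line.replace("Code:", "").split()
    before, after = [], []
    seen_marker = False
    for tok in tokens:
        if tok.startswith("<") and tok.endswith(">"):
            seen_marker = True
            tok = tok[1:-1]
        (after if seen_marker else before).append(int(tok, 16))
    return bytes(before), bytes(after)


# The Code: line from the oops in this thread, with the mail wrapping removed.
CODE = (
    "Code: d0 b8 01 00 00 00 48 c1 ea 0a 48 85 d2 0f 45 c2 41 89 40 08 66 90 "
    "4c 8b 85 e0 fe ff ff 48 8b 45 a8 31 d2 41 8b 48 08 48 c1 e0 0a <48> f7 "
    "f1 48 8b 4d b0 48 89 45 a0 31 c0 48 85 c9 74 0c 48 8b 45"
)

before, after = split_oops_code(CODE)
print(" ".join(f"{b:02x}" for b in after[:3]))  # prints "48 f7 f1"
```

The bytes in `after` can then be fed to a disassembler of your choice, or you can simply run "dis" on the RIP address inside crash.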
Anyway, you are running 2.6.32-220.el6, and according to a more recent
kernel.spec changelog, this issue was fixed in 2.6.32-248.el6:
* Tue Mar 06 2012 Aristeu Rozanski <arozansk(a)redhat.com> [2.6.32-248.el6]
- [netdrv] bnx2: revert firmware load modifications (Neil Horman) [720428]
- [virt] virtio: balloon: leak / fill balloon across S4 (Amit Shah) [798583]
- [scsi] silencing 'killing requests for dead queue' (David Milburn) [798672]
- [scsi] sd_dif: fix setting bio flags (Jeff Moyer) [799075]
- [scsi] megaraid_sas: driver update to version 00.00.06.14-rh1 (Tomas Henzl) [749923]
- [infiniband] srp: fix include ordering issue (Doug Ledford) [791209]
- [sched] Fix Kernel divide by zero panic in find_busiest_group() (Larry Woodman) [785959]
Time to upgrade...
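If you want to check the rest of that series of servers against the fixed build, a rough Python sketch like this may do. The function names are my own, and it is deliberately naive: it compares only the leading build number after the dash (220 vs. 248) and ignores any z-stream suffix, so a kernel from an update stream that carries a backport would need to be checked by hand against its own changelog.

```python
import re


def rhel_build(release):
    """Return (base version, build number) for e.g. '2.6.32-220.el6.x86_64'."""
    m = re.match(r"(\d+\.\d+\.\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return m.group(1), int(m.group(2))


def has_fix(running, fixed_in):
    """True if the running build number is at least the build with the fix."""
    base_running, build_running = rhel_build(running)
    base_fixed, build_fixed = rhel_build(fixed_in)
    if base_running != base_fixed:
        raise ValueError("different base versions; compare by hand")
    return build_running >= build_fixed


print(has_fix("2.6.32-220.el6.x86_64", "2.6.32-248.el6"))  # prints False
```

The running release can be taken from "uname -r" on each host, or from the RELEASE line that crash prints for a vmcore.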
Dave