October 2007 - Crash-utility - Crash Utility List Archives

by Jan-Frode Myklebust

I'm trying to get kdump working on our RHEL5 servers, but am failing to get any dumps. After I trigger a crash, it starts booting the kdump environment, but quickly fails with messages about (from memory):

  Imported volume group "rootvg" OK.
  Failed to find logical volume /dev/mapper/-swaplv
  Failed to find logical volume /dev/mapper/-rootlv
  Failed to find logical volume /dev/mapper/-varlv
  Failed to find logical volume /dev/mapper/-locallv

and dumps me into the shell. I checked the /dev/mapper directory, and found the expected /dev/mapper/rootvg-swaplv, /dev/mapper/rootvg-rootlv, etc. So the problem seems to be that kdump is eating up the "rootvg" part of the device name.

Is this a known problem? Any ideas for how to fix it?

Another thing: does anybody know if it's possible to redirect the kdump output to netconsole? That would be very nice for capturing the reason when it fails.

-jf
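For readers following the symptom above, here is a minimal sketch of how a device-mapper path for an LVM volume is normally composed: the volume group and logical volume names are joined with a single hyphen, and any hyphen inside either name is doubled. The helper names dm_path() and append_escaped() are hypothetical and are not taken from the kdump initrd scripts; the point is only that an empty volume group name yields exactly the "/dev/mapper/-swaplv" form reported in the error messages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append src to dst, doubling each '-' the way device-mapper names
 * for LVM volumes do. Returns 0 on success, -1 if dst is too small. */
static int append_escaped(char *dst, size_t dstsize, const char *src)
{
    size_t n = strlen(dst);

    for (; *src; src++) {
        size_t need = (*src == '-') ? 2 : 1;

        if (n + need + 1 > dstsize)
            return -1;
        dst[n++] = *src;
        if (*src == '-')
            dst[n++] = '-';
    }
    dst[n] = '\0';
    return 0;
}

/* Build "/dev/mapper/<vg>-<lv>" into out.
 * Hypothetical helper for illustration; not kdump or LVM source code. */
int dm_path(char *out, size_t outsize, const char *vg, const char *lv)
{
    size_t n;

    assert(out != NULL && vg != NULL && lv != NULL && outsize > 0);

    if ((size_t)snprintf(out, outsize, "/dev/mapper/") >= outsize)
        return -1;
    if (append_escaped(out, outsize, vg) < 0)
        return -1;

    /* The separator between VG and LV is a single, unescaped hyphen. */
    n = strlen(out);
    if (n + 2 > outsize)
        return -1;
    out[n] = '-';
    out[n + 1] = '\0';

    return append_escaped(out, outsize, lv);
}
```

Calling dm_path() with vg "rootvg" and lv "swaplv" produces the expected /dev/mapper/rootvg-swaplv, while passing an empty string for vg produces /dev/mapper/-swaplv, which suggests (as an assumption, not a confirmed diagnosis) that the volume group name is empty at the point where the kdump environment builds the path.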

crash version 4.0-4.8 is available

by Dave Anderson

- Implemented support for kernels configured with CONFIG_SLUB, which completely replaces the venerable "kernel/slab.c" with the new "kernel/slub.c" kmalloc() slab subsystem. Accordingly, the "kmem -s [address]", "kmem -S [address]", and "kmem <address>" commands will display slab-related information in a similar manner to what they currently do, with additional per-node information. It should be noted that, due to slub.c's design, the verbose "kmem -S" output will be pared down slightly to not display the list of all "full" slabs unless the proper kernel slub debugging has been turned on. However, given an address of an object from a full slab page, or of the full slab page itself, that address will then be traced back to its original slab cache and its data displayed.
  (anderson(a)redhat.com)

- Change for support of LKCD dumpfile version 8 and later to determine the backtrace starting registers from the dumpfile header. Increase (maximum) NR_CPUS for ia64 to 4096.
  (bwalle(a)suse.de)

- The SIAL interpreter extension module has been updated to support the ia64, ppc64, s390 and s390x architectures. Several fixes have been applied, and three new debug commands, sdebug, sclass and sname, have been added.
  (lucchouina(a)yahoo.com)

- Fixed a bug in the CONFIG_SPARSEMEM patch (contributed in 4.0-3.22) in which a static pointer variable was initializing itself with a buffer that was returned from a command-time-only GETBUF() call, instead of using malloc(). It would then continue to use the buffer, trampling on the buffer contents set up by whatever command subsequently allocated the buffer. I only caught this during the CONFIG_SLUB development, so I have no examples (if any) of how this would have ever manifested itself in a crash command error.
  (anderson(a)redhat.com)

- Fixed the "mach" command in CONFIG_SLUB kernels, which would abort with the error message: "mach: cannot resolve cache_cache" when trying to determine the value for the L1 CACHE SIZE display. Since the generic manner of determining the cache size no longer worked correctly anyway, the L1 CACHE SIZE display has been removed.
  (anderson(a)redhat.com)

- Fix for missing NODE header in NUMA "kmem -f" output.
  (anderson(a)redhat.com)

- Fix for the chronology of the contents of the kernel message buffer output by the "log" command.
  (atyson(a)hp.com)

- Display a WARNING message if a PT_LOAD segment in an ELF-style dumpfile advertises a memory segment that would go beyond the end of the dumpfile.
  (bwalle(a)suse.de, anderson(a)redhat.com)

Download from: http://people.redhat.com/anderson
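The PT_LOAD warning in the last changelog item can be illustrated with a small sketch. This is not the code from the crash sources; the function name pt_load_exceeds_file() is hypothetical, and the example assumes a 64-bit ELF program header as declared in the Linux <elf.h> header. It only shows the kind of bounds check involved: a segment is suspicious if its file-backed bytes (p_offset plus p_filesz) would run past the end of the dumpfile.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if a PT_LOAD segment claims file-backed bytes that lie
 * beyond file_size, 0 otherwise. Non-PT_LOAD headers are ignored.
 * Hypothetical helper for illustration; not crash utility source. */
int pt_load_exceeds_file(const Elf64_Phdr *ph, uint64_t file_size)
{
    assert(ph != NULL);

    if (ph->p_type != PT_LOAD)
        return 0;

    /* Compare without computing p_offset + p_filesz, which could
     * overflow for a corrupt header. */
    if (ph->p_offset > file_size)
        return 1;
    return ph->p_filesz > file_size - ph->p_offset;
}
```

A caller would typically obtain file_size from fstat() on the dumpfile and loop over the program headers, printing a warning for each header for which the function returns 1. Truncated dumpfiles are the usual reason such a check fires.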

LKCD patch (was: Re: [Crash-utility] Increase of NR_CPUS on IA64)

by Dave Anderson

> Dave Anderson <anderson redhat com> [2007-10-22 15:32]:
>> Troy Heber wrote:
>>> On 10/19/07 12:23, Dave Anderson wrote:
>>>> So my biggest worry would be if this somehow breaks
>>>> backwards-compatibility, but I'm presuming that you took
>>>> that into account. But anyway, I leave this all up
>>>> to Troy.
>>> I just did a quick sanity check on a couple of old IA64 LKCD dumps and
>>> everything seems to work, so I'm happy.
>>> Troy
>
> Troy, thanks for checking this!
>
>> Bernhard, can you post a cleaned-up patch for queueing?
>
> Here it is (attached). I didn't see any warnings in the crash code
> with 'make warn' now. I have used your own definition of offsetof()
> but moved it into the header file.

My biggest worry came true, so I'm going to have to NAK this patch in its current state.

We have a major customer who uses an older version of LKCD (the dh_version in the header shows version 2). Because of that, I wouldn't have thought your patch would in any way affect them. Anyway, it's the *only* LKCD dumpfile that I test with each new crash release. They run both x86 and x86_64.

With 4.0-4.7, the backtrace of the x86 panic task shows this:

crash> bt
PID: 12727  TASK: c086c000  CPU: 0   COMMAND: "httpd"
 #0 [c086da80] dump_execute at f5728f42
 #1 [c086da84] do_dump at f572928d
 #2 [c086db2c] die at c010798a
 #3 [c086db44] do_invalid_op at c0107c5a
 #4 [c086dc00] error_code (via invalid_op) at c010750e
    EAX: 0000001d  EBX: c0293cd6  ECX: c0330148  EDX: 0011062b  EBP: c086dc4c
    DS:  0018      ESI: c086dc9c  ES:  0018      EDI: c086c000
    CS:  0010      EIP: c011db63  ERR: ffffffff  EFLAGS: 00010002
 #5 [c086dc3c] panic at c011db63
 #6 [c086dc50] XXXXXXX_nmi_check at c010811b   (company name removed...)
 #7 [c086dc64] do_nmi at c0108254
 #8 [c086dc90] nmi at c0107595
    EAX: 000003dc  EBX: 00000000  ECX: 00000064  EDX: c086dcec  EBP: c086dd10
    DS:  0018      ESI: 000000f0  ES:  0018      EDI: 00000001
    CS:  0010      EIP: c0261440  ERR: 000003dc  EFLAGS: 00000286
 #9 [c086dccc] stext_lock (via prune_icache) at c0261440
#10 [c086dd14] shrink_icache_memory at c015f7dd
#11 [c086dd20] do_try_to_free_pages at c013f402
#12 [c086dd4c] try_to_free_pages at c013f8d2
#13 [c086dd64] _wrapped_alloc_pages at c01406bd
#14 [c086dd88] __alloc_pages at c014079d
#15 [c086dda8] __get_free_pages at c014083e
#16 [c086ddb0] kmem_cache_grow at c013a77b
#17 [c086dde8] kmalloc at c013ad8b
#18 [c086de20] skbmem_grow_bucket at f638cdd5
#19 [c086de3c] skbmemalloc at f638cfa0
#20 [c086de58] alloc_skb at c01f5770
#21 [c086de74] sock_alloc_send_skb at c01f4c15
#22 [c086de90] unix_stream_sendmsg at c02395c3
#23 [c086dee0] sock_sendmsg at c01f23c6
#24 [c086df34] sock_write at c01f25d0
#25 [c086df7c] sys_write at c0148d06
#26 [c086dfc0] system_call at c010740c
    EAX: 00000004  EBX: 0000000a  ECX: be1fd8fc  EDX: 00000004
    DS:  002b      ESI: 00000004  ES:  002b      EDI: be1fd8fc
    SS:  002b      ESP: be1fd8a4  EBP: be1fd8d4
    CS:  0023      EIP: 4024f214  ERR: 00000004  EFLAGS: 00000296
crash>

With your patch applied, it shows this:

crash> bt
PID: 12727  TASK: c086c000  CPU: 0   COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
crash>

and in fact, "bt -a" shows the same thing for all active tasks:

crash> bt -a
PID: 12727  TASK: c086c000  CPU: 0   COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace

PID: 0      TASK: cdccc000  CPU: 1   COMMAND: "swapper"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace

PID: 9959   TASK: ce01a000  CPU: 2   COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace

PID: 0      TASK: cdcde000  CPU: 3   COMMAND: "swapper"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace

PID: 16444  TASK: dc4d8000  CPU: 1   COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace

PID: 5874   TASK: d3920000  CPU: 0   COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
crash>

The backtraces of the non-active tasks are OK.

Any ideas on what's wrong, and how to address this?

Dave

[PATCH] Don't share privregs with hvm domain

by Isaku Yamahata

Don't share privregs with the HVM domain, and change the IA64 Xen dump core format slightly.

Xen shares privregs pages with an IA64 HVM domain so that xm dump-core can dump the pages. However, sharing the page allows the HVM guest domain to peek at or destroy the page contents, which might cause Xen to crash. The Xen dump core file doesn't need the privregs page, because for an IA64 HVM domain the cpu context should be obtained from the vcpu context.

Although this patch modifies the Xen dump core format, the current crash utility (at least crash 4.0-4.7) doesn't look into the .xen_ia64_mmapped_regs section, and I don't know of any other tools that understand the Xen dump core file. So this format modification doesn't cause an incompatibility issue.

-- yamahata

Increase of NR_CPUS on IA64

by Bernhard Walle

Hi Dave,

We will increase CONFIG_NR_CPUS in our kernel on Itanium in the future (from the current 1024 to 4096), and this also affects crash. IMO it's a good idea to have that change in mainline crash as well, so that mainline crash can also be used for analysing SLES dumps.

The first attached patch [1] implements this -- it's quite trivial, of course. :) Yes, the number is higher than 4096, but then we don't have to increase the constant again and again in the future.

The drawback is that mainline crash still has the compile-time NR_CPUS == kernel CONFIG_NR_CPUS dependency for LKCD dumps. So, after applying the patch, LKCD dumps may break on IA64.

The solution for that problem is to calculate the number of CPUs for IA64 at runtime. The second patch implements this, and also reads the registers from the LKCD dump header instead of guessing from the stack. This fixes a problem here -- unfortunately, I no longer have that dump to provide further details.

I posted a similar patch in the past (I re-based it to touch fewer files and to address some of the concerns you and Troy had). I think we finally need to merge something in that direction into mainline. I can split the patch so that you only apply the part which calculates the number of CPUs at runtime, which blocks increasing NR_CPUS for kdump. But I'd first like to get your and Troy's opinion on this; I know he's maintaining the LKCD part of crash.

Thanks,
Bernhard

Debugging Xen Hypervisor with 'crash' question...

by Roger Cruz

Sorry if this is an obvious question but I'm new to the 'crash' utility. I read Anderson's white paper on crash and didn't find any references to how to use 'crash' to debug the hypervisor. I have crash running and accessing Domain 0's kernel tasks and other variables, so I am comfortable thinking that I have the right setup.

I start crash with:

#crash xen-syms /dom0/proc/vmcore

And get the following output:

#crash xen-syms /dom0/proc/vmcore

crash 4.0-4.7
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

      KERNEL: xen-syms
    DUMPFILE: /dom0/proc/vmcore
        CPUS: 4
     DOMAINS: 4
      UPTIME: 00:01:30
     MACHINE: Intel(R) Xeon(R) CPU 5140 @ 2.33GHz (2327 Mhz)
      MEMORY: 4 GB
     PCPU-ID: 2
        PCPU: ff1bbfb4
     VCPU-ID: 0
        VCPU: ffbe6080 (VCPU_RUNNING)
   DOMAIN-ID: 0
      DOMAIN: ff238080 (DOMAIN_RUNNING)
       STATE: CRASH

I would like to know what commands there are to examine the memory management system or any other internal data structures. Also, how do I look at a stack trace in the hypervisor for a crash? I tried the 'gdb where' command and it said no stack.

Thanks in advance.

Roger Cruz
Principal SW Engineer
Marathon Technologies Corp.
978-489-1153

Crash on RHEL5 IA64 boxes.

by Pierre Amadio

Hi there.

Can somebody please confirm that crash is supposed to work on a current RHEL5 ia64 machine?

All I have is a directory created with an empty core file in it.

Knowing that somebody has already succeeded in using crash on ia64 would be helpful, as right now I do not know if this is an architecture-dependent bug or a problem in my configuration.

Thanks.

--
Pierre Amadio <pamadio(a)redhat.com>
Technical Account Manager
mobile: +33 685 774 477
Red Hat France SARL, 171 Avenue Georges Clemenceau, 92024 Nanterre Cedex, France.
Siret n° 421 199 464 00056

log command and incorrect wrapping of the buffer

by Alan Tyson

Hi,

I've noticed on some systems that the crash log command shows output from the kernel log buffer which is not wrapped at the correct location. All of the data from the buffer is present, but the display is not from oldest to newest.

Crash uses the pointer log_start to determine the "current" location in the log buffer. This isn't correct. log_start is the next location to be read by the syslog interface. log_end is the pointer to the last byte written by printk (well, last byte written +1) and this is what should be used.

The quick fix is just to replace "log_start" with "log_end" in line 3141. The following tries to optimise the code a little and save a gdb call some of the time.

Thanks,
Alan Tyson, HP.

--- crash-4.0-4.7/kernel.c	2007-09-25 16:01:56.000000000 +0100
+++ crash-4.0-4.7-at/kernel.c	2007-09-26 16:06:20.000000000 +0100
@@ -3111,7 +3111,7 @@ void
 dump_log(int msg_level)
 {
 	int i;
-	ulong log_buf, log_start, logged_chars;
+	ulong log_buf, logged_chars;
 	char *buf;
 	char last;
 	ulong index;
@@ -3138,13 +3138,16 @@ dump_log(int msg_level)
 	buf = GETBUF(log_buf_len);
 	log_wrap = FALSE;
-	get_symbol_data("log_start", sizeof(ulong), &log_start);
 	get_symbol_data("logged_chars", sizeof(ulong), &logged_chars);
 	readmem(log_buf, KVADDR, buf, log_buf_len, "log_buf contents", FAULT_ON_ERROR);
-	log_start &= log_buf_len-1;
-	index = (logged_chars < log_buf_len) ? 0 : log_start;
+	if (logged_chars < log_buf_len) {
+		index = 0;
+	} else {
+		get_symbol_data("log_end", sizeof(ulong), &index);
+		index &= log_buf_len-1;
+	}
 	if ((logged_chars < log_buf_len) && (index == 0) && (buf[index] == '<'))
 		loglevel = TRUE;
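The indexing rule described in this message can be tried out in isolation. The following is a standalone sketch, not crash utility code: linearize_log() is a hypothetical function, and it assumes (as the masking with log_buf_len-1 in the patch implies) that the buffer length is a power of two and that log_end is a free-running count of bytes written. Once the buffer has wrapped, the oldest byte sits at log_end masked by the buffer length, so reading from there yields oldest-to-newest order.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a kernel-style log ring buffer into out in oldest-to-newest
 * order and return the number of bytes copied.
 *
 * buf          ring buffer contents
 * buf_len      size of buf; must be a power of two
 * log_end      free-running index one past the last byte written
 * logged_chars total number of characters ever logged
 * out          destination with room for at least buf_len bytes
 *
 * Hypothetical helper for illustration; not crash utility source. */
size_t linearize_log(const char *buf, size_t buf_len,
                     unsigned long log_end, unsigned long logged_chars,
                     char *out)
{
    size_t start, count, i;

    assert(buf_len != 0 && (buf_len & (buf_len - 1)) == 0);

    if (logged_chars < buf_len) {
        /* Buffer has not wrapped yet: data starts at offset 0. */
        start = 0;
        count = logged_chars;
    } else {
        /* Wrapped: the slot about to be overwritten next holds the
         * oldest byte, and that slot is log_end modulo buf_len. */
        start = log_end & (buf_len - 1);
        count = buf_len;
    }

    for (i = 0; i < count; i++)
        out[i] = buf[(start + i) & (buf_len - 1)];

    return count;
}
```

As a worked case, writing the ten characters "ABCDEFGHIJ" into an 8-byte ring leaves the buffer holding "IJCDEFGH" with log_end equal to 10; masking gives a start offset of 2, and the copy produces "CDEFGHIJ". Using a reader position such as log_start instead would begin the output at whatever point syslog had last consumed, which is the mis-ordering the message describes.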

Re: [Kgdb-bugreport] Problem getting kgdb to read kernel symbols. addresses shifted?

by Dave Anderson

> Pete/Piet Delaney wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Derek Atkins wrote:
>>
>> Dave, I thought you would likely know what's going on here.
>> How about helping out Derek? Sounds like a RedHat'ism and
>> I kinda recall your mentioning it and apologizing for it
>> as an unfortunate RedHat directive.
>>
>> - -piet

http://people.redhat.com/anderson/crash.changelog.html#4_0_4_5

>>
>>> ebiederm(a)xmission.com (Eric W. Biederman) writes:
>>>
>>>> Derek Atkins <warlord(a)MIT.EDU> writes:
>>>>
>>>>> Well, gdb agrees with System.map, so I'm sure that gdb itself is
>>>>> okay. It's certainly possible that that the kgdb stub is weird,
>>>>> but /proc/kallsyms doesn't match System.map, and THAT'S what's
>>>>> confusing me most of all.
>>>>
>>>> Ok. So we must have a relocatable kernel that figures it has been
>>>> relocated. Interesting.
>>>> What is your bootloader?
>>>
>>> GRUB
>>>
>>>> What is your kernel version?
>>>
>>> 2.6.22.5-76_kgdb0.fc7-i686
>>>
>>>> What is your kernel config?
>>>
>>> See the attached .config file.
>>>
>>>> The only time I would expect to see what you are seeing is if
>>>> you are debugging the kdump kernel, which doesn't sound like
>>>> the case.
>>>
>>> Nope. I started with the Fedora 'i686' config and then patched
>>> in the kgdb patches and configuration.
>>>
>>>> If we actually have a truly offset kernel then while things
>>>> may not be perfect this is at least expected. I don't think
>>>> I have heard of anyone handling this case very well.
>>>
>>> :( Like I said before, it SEEMS to work okay by telling GDB
>>> to load in at a different address.
>>>
>>>>> Which was how long ago? ;)
>>>>
>>>> Long enough ago that I don't remember when ;)
>>>
>>> Heh.
>>>
>>>> Eric
>>>
>>> -derek
>>>
>>> ------------------------------------------------------------------------
>>>
>>> ------------------------------------------------------------------------
>>>
>>> -------------------------------------------------------------------------
>>> This SF.net email is sponsored by: Microsoft
>>> Defy all challenges. Microsoft(R) Visual Studio 2005.
>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Kgdb-bugreport mailing list
>>> Kgdb-bugreport(a)lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.7 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFG/Z0uJICwm/rv3hoRAl1uAJ9QoR5DhfUGCccgz9KFpEpHkbvaaACdGG2z
>> 9LxB5RdtsUi9IrTKzbPpB1U=
>> =gQee
>> -----END PGP SIGNATURE-----
