kdump and LVM?
by Jan-Frode Myklebust
I'm trying to get kdump working on our RHEL5 servers,
but am failing to get any dumps. After I trigger a crash,
it starts booting the kdump environment, but quickly
fails with messages like these (from memory):
Imported volume group "rootvg" OK.
Failed to find logical volume /dev/mapper/-swaplv
Failed to find logical volume /dev/mapper/-rootlv
Failed to find logical volume /dev/mapper/-varlv
Failed to find logical volume /dev/mapper/-locallv
and drops me into the shell. I checked the /dev/mapper
directory and found the expected /dev/mapper/rootvg-swaplv,
/dev/mapper/rootvg-rootlv, etc. So the problem seems to be
that kdump is eating the "rootvg" part of the device name.
Is this a known problem? Any ideas on how to fix it?
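For illustration only (this is not kdump's or the initrd's actual code): device-mapper names for LVM volumes have the form "<vg>-<lv>", with any hyphen inside either name doubled. The C sketch below uses a hypothetical helper, dm_name(), to apply that rule. With an empty volume-group string it produces exactly the "/dev/mapper/-swaplv" style of path seen in the messages above, which would be consistent with the VG name getting lost somewhere before the lookup.

```c
#include <stddef.h>
#include <string.h>

/* Append src to dst, doubling every '-' as LVM does for device-mapper
 * names. Returns the new write position, or NULL if dst is too small. */
static char *append_escaped(char *dst, const char *end, const char *src)
{
    for (; *src != '\0'; src++) {
        if (*src == '-') {
            if (dst >= end)
                return NULL;
            *dst++ = '-';
        }
        if (dst >= end)
            return NULL;
        *dst++ = *src;
    }
    return dst;
}

/* Hypothetical helper: build "/dev/mapper/<vg>-<lv>" into out.
 * Returns 0 on success, -1 if out is too small. */
int dm_name(const char *vg, const char *lv, char *out, size_t size)
{
    static const char prefix[] = "/dev/mapper/";
    char *p = out;
    const char *end;

    if (size < sizeof(prefix))
        return -1;
    end = out + size - 1;          /* reserve room for the terminating NUL */
    memcpy(p, prefix, sizeof(prefix) - 1);
    p += sizeof(prefix) - 1;

    if ((p = append_escaped(p, end, vg)) == NULL)
        return -1;
    if (p >= end)
        return -1;
    *p++ = '-';                    /* separator between VG and LV */
    if ((p = append_escaped(p, end, lv)) == NULL)
        return -1;
    *p = '\0';
    return 0;
}
```

Calling dm_name("rootvg", "swaplv", ...) gives the path that actually exists in /dev/mapper, while dm_name("", "swaplv", ...) gives the one the kdump environment reported as missing.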
Another thing: does anybody know if it's possible to redirect
the kdump output to netconsole? That would be very nice
for capturing the reason when it fails.
-jf
17 years, 1 month
crash version 4.0-4.8 is available
by Dave Anderson
- Implemented support for kernels configured with CONFIG_SLUB, which
completely replaces the venerable "kernel/slab.c" with the new
"kernel/slub.c" kmalloc() slab subsystem. Accordingly, the
"kmem -s [address]", "kmem -S [address]", and "kmem <address>"
commands will display slab-related information in a similar manner
to what they currently do, with additional per-node information.
It should be noted that, due to slub.c's design, the verbose
"kmem -S" output will be pared down slightly to not display the
list of all "full" slabs unless the proper kernel slub debugging
has been turned on. However, given an address of an object from a
full slab page, or of the full slab page itself, that address
will then be traced back to its original slab cache and its data
displayed. (anderson(a)redhat.com)
- Change for support of LKCD dumpfile version 8 and later to determine
the backtrace starting registers from the dumpfile header. Increase
(maximum) NR_CPUS for ia64 to 4096. (bwalle(a)suse.de)
- The SIAL interpreter extension module has been updated to support
the ia64, ppc64, s390 and s390x architectures. Several fixes have
been applied, and three new debug commands, sdebug, sclass and sname
have been added. (lucchouina(a)yahoo.com)
- Fixed a bug in the CONFIG_SPARSEMEM patch (contributed in 4.0-3.22)
in which a static pointer variable was initializing itself with a
buffer that was returned from a command-time-only GETBUF() call,
instead of using malloc(). It would then continue to use the buffer,
trampling on the buffer contents set up by whatever command
subsequently allocated the buffer. I only caught this during the
CONFIG_SLUB development, so I have no examples (if any) of how this
would have ever manifested itself in a crash command error.
(anderson(a)redhat.com)
- Fixed the "mach" command in CONFIG_SLUB kernels which would abort
with the error message: "mach: cannot resolve cache_cache" when
trying to determine the value for the L1 CACHE SIZE display. Since
the generic manner of determining the cache size no longer worked
correctly anyway, the L1 CACHE SIZE display has been removed.
(anderson(a)redhat.com)
- Fix for missing NODE header in NUMA "kmem -f" output.
(anderson(a)redhat.com)
- Fix for the chronology of the contents of the kernel message buffer
output by the "log" command. (atyson(a)hp.com)
- Display a WARNING message if a PT_LOAD segment in an ELF-style
dumpfile advertises a memory segment that would go beyond the end
of the dumpfile. (bwalle(a)suse.de, anderson(a)redhat.com)
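The PT_LOAD check in the last item can be pictured with a small C sketch. It is not the code that went into crash: the struct and function names are hypothetical, and only the idea comes from the changelog entry (a segment's file offset plus its file size must not run past the end of the dumpfile).

```c
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD program header:
 * where the segment's data starts in the dumpfile and how many
 * bytes of it the header claims are stored there. */
struct load_segment {
    uint64_t file_offset;
    uint64_t file_size;
};

/* Returns 1 if the segment claims data beyond the end of a dumpfile
 * of dumpfile_size bytes, 0 otherwise. Written without computing
 * file_offset + file_size, so a corrupt header cannot overflow. */
int segment_exceeds_dumpfile(const struct load_segment *seg,
                             uint64_t dumpfile_size)
{
    if (seg->file_offset > dumpfile_size)
        return 1;
    return seg->file_size > dumpfile_size - seg->file_offset;
}
```

A caller would run this over every PT_LOAD entry and print a warning for each segment where it returns 1, which is typically a sign of a truncated dumpfile.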
Download from: http://people.redhat.com/anderson
17 years, 1 month
LKCD patch (was: Re: [Crash-utility] Increase of NR_CPUS on IA64)
by Dave Anderson
> Dave Anderson <anderson redhat com> [2007-10-22 15:32]:
>> Troy Heber wrote:
>>> On 10/19/07 12:23, Dave Anderson wrote:
>>>> So my biggest worry would be if this somehow breaks
>>>> backwards-compatibility, but I'm presuming that you took
>>>> that into account. But anyway, I leave this all up
>>>> to Troy.
>>> I just did a quick sanity check on a couple of old IA64 LKCD dumps and
>>> everything seems to work, so I'm happy.
>>> Troy
>
> Troy, thanks for checking this!
>
>> Bernhard, can you post a cleaned-up patch for queueing?
>
> Here it is (attached). I didn't see any warnings in the crash code
> with 'make warn' now. I have used your own definition of offsetof()
> but moved it into the header file.
My biggest worry came true, so I'm going to have to NAK
this patch in its current state.
We have a major customer who uses an older version
of LKCD (the dh_version in the header shows version 2).
Because of that, I wouldn't have thought your patch
would in any way affect them. Anyway, it's the *only*
LKCD dumpfile that I test with each new crash release.
They run both x86 and x86_64.
With 4.0-4.7, the backtrace of the x86 panic task shows this:
crash> bt
PID: 12727 TASK: c086c000 CPU: 0 COMMAND: "httpd"
#0 [c086da80] dump_execute at f5728f42
#1 [c086da84] do_dump at f572928d
#2 [c086db2c] die at c010798a
#3 [c086db44] do_invalid_op at c0107c5a
#4 [c086dc00] error_code (via invalid_op) at c010750e
EAX: 0000001d EBX: c0293cd6 ECX: c0330148 EDX: 0011062b EBP: c086dc4c
DS: 0018 ESI: c086dc9c ES: 0018 EDI: c086c000
CS: 0010 EIP: c011db63 ERR: ffffffff EFLAGS: 00010002
#5 [c086dc3c] panic at c011db63
#6 [c086dc50] XXXXXXX_nmi_check at c010811b (company name removed...)
#7 [c086dc64] do_nmi at c0108254
#8 [c086dc90] nmi at c0107595
EAX: 000003dc EBX: 00000000 ECX: 00000064 EDX: c086dcec EBP: c086dd10
DS: 0018 ESI: 000000f0 ES: 0018 EDI: 00000001
CS: 0010 EIP: c0261440 ERR: 000003dc EFLAGS: 00000286
#9 [c086dccc] stext_lock (via prune_icache) at c0261440
#10 [c086dd14] shrink_icache_memory at c015f7dd
#11 [c086dd20] do_try_to_free_pages at c013f402
#12 [c086dd4c] try_to_free_pages at c013f8d2
#13 [c086dd64] _wrapped_alloc_pages at c01406bd
#14 [c086dd88] __alloc_pages at c014079d
#15 [c086dda8] __get_free_pages at c014083e
#16 [c086ddb0] kmem_cache_grow at c013a77b
#17 [c086dde8] kmalloc at c013ad8b
#18 [c086de20] skbmem_grow_bucket at f638cdd5
#19 [c086de3c] skbmemalloc at f638cfa0
#20 [c086de58] alloc_skb at c01f5770
#21 [c086de74] sock_alloc_send_skb at c01f4c15
#22 [c086de90] unix_stream_sendmsg at c02395c3
#23 [c086dee0] sock_sendmsg at c01f23c6
#24 [c086df34] sock_write at c01f25d0
#25 [c086df7c] sys_write at c0148d06
#26 [c086dfc0] system_call at c010740c
EAX: 00000004 EBX: 0000000a ECX: be1fd8fc EDX: 00000004
DS: 002b ESI: 00000004 ES: 002b EDI: be1fd8fc
SS: 002b ESP: be1fd8a4 EBP: be1fd8d4
CS: 0023 EIP: 4024f214 ERR: 00000004 EFLAGS: 00000296
crash>
With your patch applied, it shows this:
crash> bt
PID: 12727 TASK: c086c000 CPU: 0 COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
crash>
and in fact, "bt -a" shows the same thing for all
active tasks:
crash> bt -a
PID: 12727 TASK: c086c000 CPU: 0 COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
PID: 0 TASK: cdccc000 CPU: 1 COMMAND: "swapper"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
PID: 9959 TASK: ce01a000 CPU: 2 COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
PID: 0 TASK: cdcde000 CPU: 3 COMMAND: "swapper"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
PID: 16444 TASK: dc4d8000 CPU: 1 COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
PID: 5874 TASK: d3920000 CPU: 0 COMMAND: "httpd"
bt: cannot resolve stack trace:
bt: Task in user space -- no backtrace
crash>
The backtraces of the non-active tasks are OK.
Any ideas on what's wrong, and how to address this?
Dave
17 years, 1 month
[PATCH] Don't share privregs with hvm domain
by Isaku Yamahata
Don't share privregs with HVM domains, and tweak the IA64 Xen dump-core
format slightly.
Xen shares privregs pages with IA64 HVM domains so that xm dump-core can
dump those pages.
However, sharing the page allows the HVM guest domain to peek at or destroy
the page contents, which might crash Xen.
The Xen dump-core file doesn't need the privregs page anyway, because for an
IA64 HVM domain the CPU context should be obtained from the vcpu context.
Although this patch modifies the Xen dump-core format, the current crash
utility (at least crash 4.0-4.7) doesn't look into the
.xen_ia64_mmapped_regs section, and I don't know of any other tools that
understand Xen dump-core files.
So this format modification doesn't cause an incompatibility issue.
--
yamahata
17 years, 1 month
Increase of NR_CPUS on IA64
by Bernhard Walle
Hi Dave,
We will increase CONFIG_NR_CPUS in our kernel on Itanium in the future
(from the current 1024 to 4096), and this also affects crash. IMO it's
a good idea to have that change in mainline crash as well, so that
mainline crash can also be used for analysing SLES dumps.
The first patch [1] attached implements this -- it's quite trivial, of
course. :) Yes, the number is higher than 4096, but that way we don't
have to increase the constant again and again in the future.
The drawback is that the mainline crash still has the compile time
NR_CPUS == kernel CONFIG_NR_CPUS dependency for LKCD dumps. So, after
applying the patch, LKCD dumps may break on IA64.
The solution for that problem is to calculate the number of CPUs for
IA64 at runtime. The 2nd patch implements this, and also reads the
registers from the LKCD dump header instead of guessing from the stack.
This fixes a problem here -- unfortunately, I no longer have that
dump to provide further details.
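As a rough sketch of what "at runtime" could mean here (the attached patch may well do it differently, and the layout below is made up rather than the real LKCD header): if a dump header records its own size and consists of a fixed part followed by data stored once per compiled-in CPU, the CPU limit of the dumping kernel can be recovered by division instead of being hard-coded as NR_CPUS in crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed part of a dump header, not the real LKCD
 * structure. In this made-up layout it is followed by per-CPU data
 * whose total length depends on the NR_CPUS value the dumping
 * kernel was compiled with. */
struct fake_fixed_header {
    uint64_t magic;
    uint32_t version;
    uint32_t header_size;   /* total header size in bytes, as recorded in the dump */
};

/* Bytes stored per CPU in this made-up layout: one saved register
 * block plus one task pointer. */
#define FAKE_REGS_BYTES    256u
#define FAKE_PER_CPU_BYTES (FAKE_REGS_BYTES + sizeof(uint64_t))

/* Returns the number of CPU slots implied by header_size, or -1 if
 * the size is too small or does not hold a whole number of slots. */
long cpus_from_header_size(uint32_t header_size)
{
    size_t fixed = sizeof(struct fake_fixed_header);
    size_t rest;

    if (header_size < fixed)
        return -1;
    rest = header_size - fixed;
    if (rest % FAKE_PER_CPU_BYTES != 0)
        return -1;
    return (long)(rest / FAKE_PER_CPU_BYTES);
}
```

With something along these lines, a dump written by a kernel built for 1024 CPUs and one built for 4096 CPUs would both be readable by the same crash binary, as long as its own compile-time maximum is at least as large.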
I posted a similar patch in the past; I re-based it to touch fewer
files and to address some of the concerns you and Troy had. I think
we finally need to merge something in that direction into mainline.
I can split the patch so that you only apply the part which calculates
the number of CPUs at runtime, which blocks increasing NR_CPUS for
kdump. But first I'd like to get your and Troy's opinion on this; I
know he's maintaining the LKCD part of crash.
Thanks,
Bernhard
17 years, 1 month
Debugging Xen Hypervisor with 'crash' question...
by Roger Cruz
Sorry if this is an obvious question but I'm new to the 'crash' utility.
I read Anderson's white paper on crash and didn't find any references to
how to use 'crash' to debug the hypervisor. I have crash running and
accessing Domain 0's kernel tasks and other variables, so I am
comfortable thinking that I have the right setup. I start crash with:
#crash xen-syms /dom0/proc/vmcore
and get the following output:
#crash xen-syms /dom0/proc/vmcore
crash 4.0-4.7
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
KERNEL: xen-syms
DUMPFILE: /dom0/proc/vmcore
CPUS: 4
DOMAINS: 4
UPTIME: 00:01:30
MACHINE: Intel(R) Xeon(R) CPU 5140 @ 2.33GHz (2327 Mhz)
MEMORY: 4 GB
PCPU-ID: 2
PCPU: ff1bbfb4
VCPU-ID: 0
VCPU: ffbe6080 (VCPU_RUNNING)
DOMAIN-ID: 0
DOMAIN: ff238080 (DOMAIN_RUNNING)
STATE: CRASH
I would like to know what commands there are to examine the memory
management system or any other internal data structures. Also, how do I
look at a stack trace in the hypervisor for a crash? I tried the 'gdb
where' command and it said no stack.
Thanks in advance.
Roger Cruz
Principal SW Engineer
Marathon Technologies Corp.
978-489-1153
17 years, 1 month
Crash on RHEL5 IA64 boxes.
by Pierre Amadio
Hi there.
Can somebody please confirm that crash is supposed to work on a current RHEL5
ia64 machine?
All I have is a directory created with an empty core file in it.
Knowing that somebody has already succeeded in using crash on ia64 would be
helpful, as right now I do not know if this is an architecture-dependent
bug or a problem in my configuration.
Thanks.
--
Pierre Amadio <pamadio(a)redhat.com>
Technical Account Manager mobile: +33 685 774 477
Red Hat France SARL, 171 Avenue Georges Clemenceau, 92024 Nanterre
Cedex, France. Siret n° 421 199 464 00056
17 years, 1 month
log command and incorrect wrapping of the buffer
by Alan Tyson
Hi,
I've noticed on some systems that the crash log command shows output
from the kernel log buffer which is not wrapped at the correct
location. All of the data from the buffer is present, but the display
is not from oldest to newest.
Crash uses the pointer log_start to determine the "current" location in
the log buffer. This isn't correct. log_start is the next location to
be read by the syslog interface. log_end is the pointer to the last
byte written by printk (well, last byte written +1) and this is what
should be used.
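To make the difference concrete, here is a small stand-alone C sketch of reading a power-of-two ring buffer oldest-first. It is not crash code, and the function names are made up for this example. Once the buffer has wrapped, the oldest byte sits at the next write position, i.e. log_end masked by the buffer length, so that is where an oldest-to-newest display has to start.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the oldest byte in a ring buffer of buf_len bytes
 * (buf_len must be a power of two). log_end counts every byte ever
 * written, so log_end & (buf_len - 1) is the next write position,
 * which after a wrap is also where the oldest data lives. */
size_t oldest_index(size_t buf_len, unsigned long log_end,
                    unsigned long logged_chars)
{
    assert(buf_len != 0 && (buf_len & (buf_len - 1)) == 0);
    if (logged_chars < buf_len)
        return 0;                       /* never wrapped: start at the top */
    return (size_t)(log_end & (buf_len - 1));
}

/* Copy the ring contents into out (at least buf_len bytes) in
 * oldest-to-newest order. Returns the number of bytes copied. */
size_t linearize_log(const char *ring, size_t buf_len,
                     unsigned long log_end, unsigned long logged_chars,
                     char *out)
{
    size_t start = oldest_index(buf_len, log_end, logged_chars);
    size_t count = logged_chars < buf_len ? (size_t)logged_chars : buf_len;
    size_t i;

    for (i = 0; i < count; i++)
        out[i] = ring[(start + i) & (buf_len - 1)];
    return count;
}
```

For example, after writing the eleven characters A through K into an 8-byte ring, the buffer holds "IJKDEFGH" and log_end is 11; masking gives index 3, and reading from there yields D through K in order.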
The quick fix is just to replace "log_start" with "log_end" in line
3141. The following tries to optimise the code a little and save
a gdb call some of the time.
Thanks,
Alan Tyson, HP.
--- crash-4.0-4.7/kernel.c 2007-09-25 16:01:56.000000000 +0100
+++ crash-4.0-4.7-at/kernel.c 2007-09-26 16:06:20.000000000 +0100
@@ -3111,7 +3111,7 @@ void
 dump_log(int msg_level)
 {
         int i;
-        ulong log_buf, log_start, logged_chars;
+        ulong log_buf, logged_chars;
         char *buf;
         char last;
         ulong index;
@@ -3138,13 +3138,16 @@ dump_log(int msg_level)
         buf = GETBUF(log_buf_len);
         log_wrap = FALSE;
-        get_symbol_data("log_start", sizeof(ulong), &log_start);
         get_symbol_data("logged_chars", sizeof(ulong), &logged_chars);
         readmem(log_buf, KVADDR, buf,
                 log_buf_len, "log_buf contents", FAULT_ON_ERROR);
-        log_start &= log_buf_len-1;
-        index = (logged_chars < log_buf_len) ? 0 : log_start;
+        if (logged_chars < log_buf_len) {
+                index = 0;
+        } else {
+                get_symbol_data("log_end", sizeof(ulong), &index);
+                index &= log_buf_len-1;
+        }
         if ((logged_chars < log_buf_len) && (index == 0) && (buf[index] == '<'))
                 loglevel = TRUE;
17 years, 1 month
Re: [Kgdb-bugreport] Problem getting kgdb to read kernel symbols. addresses shifted?
by Dave Anderson
> Pete/Piet Delaney wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Derek Atkins wrote:
>>
>> Dave, I thought you would likely know what's going on here.
>> How about helping out Derek? Sounds like a RedHat'ism and
>> I kinda recall your mentioning it and apologizing for it
>> as an unfortunate RedHat directive.
>>
>> - -piet
http://people.redhat.com/anderson/crash.changelog.html#4_0_4_5
>>
>>> ebiederm(a)xmission.com (Eric W. Biederman) writes:
>>>
>>>
>>>> Derek Atkins <warlord(a)MIT.EDU> writes:
>>>>
>>>>
>>>>> Well, gdb agrees with System.map, so I'm sure that gdb itself is
>>>>> okay. It's certainly possible that the kgdb stub is weird,
>>>>> but /proc/kallsyms doesn't match System.map, and THAT'S what's
>>>>> confusing me most of all.
>>>>
>>>>
>>>> Ok. So we must have a relocatable kernel that figures it has been
>>>> relocated. Interesting.
>>>> What is your bootloader?
>>>
>>>
>>> GRUB
>>>
>>>
>>>> What is your kernel version?
>>>
>>>
>>> 2.6.22.5-76_kgdb0.fc7-i686
>>>
>>>
>>>> What is your kernel config?
>>>
>>>
>>> See the attached .config file.
>>>
>>>
>>>> The only time I would expect to see what you are seeing is if
>>>> you are debugging the kdump kernel, which doesn't sound like
>>>> the case.
>>>
>>>
>>> Nope. I started with the Fedora 'i686' config and then patched
>>> in the kgdb patches and configuration.
>>>
>>>
>>>> If we actually have a truly offset kernel then while things
>>>> may not be perfect this is at least expected. I don't think
>>>> I have heard of anyone handling this case very well.
>>>
>>>
>>> :( Like I said before, it SEEMS to work okay by telling GDB
>>> to load in at a different address.
>>>
>>>
>>>>> Which was how long ago? ;)
>>>>
>>>>
>>>> Long enough ago that I don't remember when ;)
>>>
>>>
>>> Heh.
>>>
>>>
>>>> Eric
>>>
>>>
>>> -derek
>>
>>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.7 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFG/Z0uJICwm/rv3hoRAl1uAJ9QoR5DhfUGCccgz9KFpEpHkbvaaACdGG2z
>> 9LxB5RdtsUi9IrTKzbPpB1U=
>> =gQee
>> -----END PGP SIGNATURE-----
>
>
>
17 years, 2 months