crashdc: problem when running crash in kexec environment for SLES11
by Louis Bouchard
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
As I indicated previously, I'm having a problem with running crash from
within the kexec environment of SLES11.
Actually, crash itself runs fine, but its gdb portion seems to be having
problems. Here is a capture of what happens :
> Running /usr/bin/run-crashdc-sles11.sh
> crashexe : /usr/bin/crash
> crashoutput : /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt
> namelist : /root/boot/vmlinux-2.6.27.23-0.1-default
> vmcorefile : /root/var/crash/2009-09-29-17:45/vmcore
> debuginfofile : /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug
>
The portion above is debug output from crashdc.
> crash 4.0-7.6
> Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> NOTE: stdin: not a tty
>
> cannot determine relocation value: not a live system
> gdb /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug
>
>
> dlopen failed on 'libthread_db.so.1' - libthread_db.so.1: cannot open shared object file: No such file or directory
> GDB will not be able to debug pthreads.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu".../usr/bin/crashdc: line 160: 682 Killed $crashexe -d $crashdebug $namelist $debuginfofile $vmcorefile < $crashcmd
> File /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt has been generated
> generated a crash-data file to /root/var/crash/2009-09-29-17:45
> ..done
> Restarting system.
My feeling is that the "dlopen failed on 'libthread_db.so.1'" might be
causing this.
Bernhard Walle might have an idea. Here is what I have in
/etc/sysconfig/kdump :
> KDUMP_REQUIRED_PROGRAMS="/bin/basename /usr/bin/crash /usr/bin/crashdc /usr/bin/run-crashdc-sles11.sh /etc/rc.status /bin/gzip /bin/logger /usr/bin/gdb /lib/libpthread.so.0 /lib/libthread_db.so.1"
This line causes the listed files to be included in the initramfs file
that gets loaded at kexec time. Maybe crash/gdb is looking for the library
in a different location, but right now I'm a bit stuck.
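One way to test that guess is to unpack the generated initramfs and compare it against the list. The following is only a minimal sketch in Python: the function names and the idea of pointing it at an unpacked initramfs directory are assumptions made for illustration, not part of kdump or crashdc.

```python
import os
import shlex


def required_programs(sysconfig_line):
    """Return the paths listed in a KDUMP_REQUIRED_PROGRAMS="..." line."""
    name, _, value = sysconfig_line.strip().partition("=")
    if name != "KDUMP_REQUIRED_PROGRAMS":
        raise ValueError("not a KDUMP_REQUIRED_PROGRAMS line")
    # shlex removes the surrounding quotes; the value is a space-separated list.
    return [path for word in shlex.split(value) for path in word.split()]


def missing_in_root(sysconfig_line, root):
    """List required paths absent under `root` (a hypothetical unpacked initramfs)."""
    missing = []
    for path in required_programs(sysconfig_line):
        candidate = os.path.join(root, path.lstrip("/"))
        if not os.path.exists(candidate):
            missing.append(path)
    return missing
```

An empty result would suggest the files made it into the image, so the library search path inside the kexec environment would be the next thing to look at.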
As a side note to Bernhard, is there a way to have kdump stop its
execution while in the kexec kernel? Just like when it hits an error
and drops to a shell?
TIA,
- --
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center,
Linux Ambassador, HP
HP Services                   1 Ave du Canada
HP France                     Z.A. de Courtaboeuf
louis.bouchard(a)hp.com       91 947 Les Ulis
http://www.hp.com/go/linux    France
http://www.hp.com/fr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkrDc2gACgkQDvqokHrhnCyU6QCdHSlzw6q2u0qoifczMp3AGMob
woAAoMbhGJaDgiQ6GhqB+PNTCvsyELB6
=l3xC
-----END PGP SIGNATURE-----
15 years, 1 month
Re: [Crash-utility] Re: crash 4.0.9 on x86 is crashing
by Dave Anderson
----- "Sumeet Gupta" <meetsumeet(a)gmail.com> wrote:
>
> >
> > On a side-topic - is an ARM port of this utility (ie, a vmcore
> > generated on an ARM system, debugged with crash offline on X86)
> > available, or in the offing?
>
> > None that I'm aware of. It was brought up on this list some time
> > ago, and I gave the requester some initial guidance on the steps
> > to take in order to add support for a new architecture. But I've
> > heard nothing since then.
>
>
> If you could give me a link to your previous mail having the guidance steps, that would be helpful.
https://www.redhat.com/archives/crash-utility/2007-August/msg00005.html
https://www.redhat.com/archives/crash-utility/2007-August/msg00017.html
The sketchy directions I gave were for supporting an architecture as is
done today, i.e., such that the host machine type would run its arch-specific
crash utility binary.
The only "cross-arch" support is the x86_64's capability of running a 32-bit
x86 binary. (and perhaps the same can be done on an ia64). But to run an
x86 binary to view an ARM dumpfile would be yet another level of complexity
for which there's no support.
There is a "cross-crash" utility maintained elsewhere -- I don't know
whether or not it's still active -- which, if I'm not mistaken, supports
running an x86 crash binary to look at a MIPS vmcore. There also was
a group doing the same kind of thing for using an x86 binary to look
at 32-bit ppc vmcores. But again, I have no details on either initiative.
Dave
15 years, 1 month
Re: [Crash-utility] crashdc: problem when running crash in kexec environment for SLES11
by Dave Anderson
----- "Louis Bouchard" <louis.bouchard(a)hp.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> As I indicated previously, I'm having a problem with running crash from
> within the kexec environment of SLES11.
>
> Actually, crash itself runs fine, but its gdb portion seems to be having
> problems. Here is a capture of what happens :
>
> > Running /usr/bin/run-crashdc-sles11.sh
> > crashexe : /usr/bin/crash
> > crashoutput : /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt
> > namelist : /root/boot/vmlinux-2.6.27.23-0.1-default
> > vmcorefile : /root/var/crash/2009-09-29-17:45/vmcore
> > debuginfofile : /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug
> >
> This portion above is debug info from crashdc.
>
> > crash 4.0-7.6
> > Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> >
> > NOTE: stdin: not a tty
> >
> > cannot determine relocation value: not a live system
> > gdb /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug
> >
> >
> > dlopen failed on 'libthread_db.so.1' - libthread_db.so.1: cannot open shared object file: No such file or directory
> > GDB will not be able to debug pthreads.
> >
> > GNU gdb 6.1
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > This GDB was configured as "i686-pc-linux-gnu".../usr/bin/crashdc: line 160: 682 Killed $crashexe -d $crashdebug $namelist $debuginfofile $vmcorefile < $crashcmd
> > File /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt has been generated
> > generated a crash-data file to /root/var/crash/2009-09-29-17:45
> > ..done
> > Restarting system.
>
> My feeling is that the "dlopen failed on 'libthread_db.so.1'" might be
> causing this.
I note that if you move the library away entirely, the embedded gdb in the
crash utility complains as above, but crash/gdb still continues to run.
>
> Bernhard Walle might have an idea. Here is what I have in
> /etc/sysconfig/kdump :
>
> KDUMP_REQUIRED_PROGRAMS="/bin/basename /usr/bin/crash /usr/bin/crashdc /usr/bin/run-crashdc-sles11.sh /etc/rc.status /bin/gzip /bin/logger /usr/bin/gdb /lib/libpthread.so.0 /lib/libthread_db.so.1"
Anyway, this is just a wild guess, and I'm presuming that you're running
on x86_64 above, but on RHEL crash uses the 64-bit version of that library:
crash> vm
PID: 29013 TASK: ffff81003c2eb0c0 CPU: 4 COMMAND: "crash"
MM PGD RSS TOTAL_VM
ffff81003b128040 ffff81001e9cf000 116016k 190620k
VMA START END FLAGS FILE
ffff81000fd416b8 400000 830000 1875 /usr/bin/crash
ffff810008b2b558 a2f000 a51000 101873 /usr/bin/crash
ffff8100133a0558 a51000 bb9000 100073
ffff810009fd2b88 5ce8000 9782000 100073
ffff81003a5e8ef8 3fad600000 3fad61a000 875 /lib64/ld-2.5.so
ffff810025d75298 3fad81a000 3fad81b000 100871 /lib64/ld-2.5.so
ffff8100137be298 3fad81b000 3fad81c000 100873 /lib64/ld-2.5.so
ffff810032b43a28 3fada00000 3fada06000 75 /lib64/libthread_db-1.0.so
ffff810032b431e8 3fada06000 3fadc05000 70 /lib64/libthread_db-1.0.so
ffff810032b43088 3fadc05000 3fadc06000 100071 /lib64/libthread_db-1.0.so
ffff810032b43ce8 3fadc06000 3fadc07000 100073 /lib64/libthread_db-1.0.so
...
Does the KDUMP_REQUIRED_PROGRAMS list need to specify the /lib64 version,
or does SLES have a different 32/64-bit library setup?
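To make the question concrete, the /lib64 counterparts of the listed libraries can be enumerated mechanically. This is a hedged sketch only: whether SLES actually installs these libraries under /lib64 is exactly the open question, and `lib64_candidates` is a hypothetical helper name.

```python
def lib64_candidates(paths):
    """Map each /lib/... entry to its /lib64/... counterpart; other paths are skipped."""
    prefix = "/lib/"
    return {p: "/lib64/" + p[len(prefix):] for p in paths if p.startswith(prefix)}
```

For the two libraries in the list above this yields /lib64/libpthread.so.0 and /lib64/libthread_db.so.1 as the paths to check on the SLES system.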
Dave
15 years, 1 month
Re: [Crash-utility] Re: crash 4.0.9 on x86 is crashing
by Dave Anderson
----- "Sumeet Gupta" <meetsumeet(a)gmail.com> wrote:
> On Tue, Sep 29, 2009 at 7:11 PM, Dave Anderson < anderson(a)redhat.com >
> wrote:
> ----- "Sumeet Gupta" < meetsumeet(a)gmail.com > wrote:
>
> > Hi,
> >
> > It was my mistake.
> >
> > During an earlier debugging, I set verbose=1, misinterpreting it to
> > mean just a few debug prints.
>
> What do you mean by that? Where are you setting "verbose"?
>
> I was talking about the file x86.c, function x86_kvtop, around line
> 2622, where it returns from the function, or not, depending upon
> verbose. If the purpose of verbose in this function is only to have
> more fprintfs, then the behaviour should not change based on this
> argument. To see what's going on in this function, I had set the
> verbose flag in it to 1. This caused the recursion in function calls I
> mentioned.
>
Ah, OK -- now I get it...
Right -- the "verbose" flag in x86_kvtop() is not there just to enable
more debug fprintfs. It is *only* used when the virtual-to-physical
translation is requested by the "vtop" command, which verbosely
walks through and displays the page table data.
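The recursion reported earlier in the thread can be illustrated with a toy model. This is not crash's code: the constants, the `hardcoded_verbose` parameter and the depth guard are all assumptions, chosen only to show why forcing the verbose path inside the translation routine makes every internal page-table read re-enter it.

```python
PAGE_OFFSET = 0xC0000000   # illustrative direct-map base (assumption)
PGD_PAGE = 0xC0101000      # hypothetical address of the pgd page
MAX_DEPTH = 50             # guard so the toy fails fast instead of exhausting the stack


def kvtop(kvaddr, verbose=False, hardcoded_verbose=False, depth=0):
    """Toy virtual-to-physical translation for a direct-mapped address."""
    if depth > MAX_DEPTH:
        raise RecursionError("kvtop re-entered itself %d times" % depth)
    if hardcoded_verbose:
        verbose = True                 # models the local edit "verbose = 1"
    paddr = kvaddr - PAGE_OFFSET
    if not verbose:
        return paddr                   # normal path: no page-table reads needed
    # Verbose path: read the pgd page in order to display it. That read
    # needs a translation of its own, so it calls back into kvtop().
    readmem(PGD_PAGE, hardcoded_verbose, depth + 1)
    return paddr


def readmem(kvaddr, hardcoded_verbose=False, depth=0):
    """Toy readmem(): internal reads always ask for a quiet translation."""
    return kvtop(kvaddr, verbose=False,
                 hardcoded_verbose=hardcoded_verbose, depth=depth)
```

With `verbose=True` passed by the caller, the nested read is translated quietly and the call returns. With `hardcoded_verbose=True`, the nested translation is verbose as well, so the readmem -> kvtop chain never bottoms out.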
Thanks,
Dave
15 years, 1 month
Re: [Crash-utility] Re: Question about fixing another crash annoyance...t
by Dave Anderson
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
>
> Dave,
>
> You are faster than me at fixing crash:-) I was just about to start on
> the part for kmem_cache_len_nodes...
>
> The patch fixes the problem on my example dump which previously said:
> =====
> please wait... (gathering kmem slab cache data)
> crash: page excluded: kernel virtual address: ffff88022457a000 type:
> "kmem_cache_s buffer"
>
> crash: unable to initialize kmem slab cache subsystem
> =====
>
> It now says:
> =====
> please wait... (gathering kmem slab cache data)
> kmem_cache_downsize: SIZE(kmem_cache_s): 872 cache_cache.buffer_size: 384
> kmem_cache_downsize: nr_node_ids: 2
> =====
>
Sorry -- I didn't mean to leave the "CRASHDEBUG(0)" in there, so the
messages above shouldn't normally be displayed. I'll make it
CRASHDEBUG(1) so we'll have a record of it happening if something
else comes up in the future.
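The difference between the two guards can be sketched as a simple threshold test. This assumes that CRASHDEBUG(n) is true whenever the current debug level (the value given with -d) is at least n; that is my reading of the remark above, not a quotation of the crash source.

```python
def crashdebug(threshold, debug_level):
    """True when messages guarded at `threshold` should be shown."""
    return debug_level >= threshold
```

Under that assumption a guard of 0 fires at the default debug level of 0, which is why the messages appeared, while a guard of 1 stays quiet unless debugging is turned on.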
>
> In the meantime, I remembered "--zero_excluded", which makes for
> a slightly dangerous workaround for the problem. It fills in the
> unnecessarily-accessed missing pages with zeros.
>
> The output of "kmem -s" and "kmem -S" on my problem dump is the
> same between your patched version and the old version running
> with --zero_excluded. (I don't normally think of using --zero_excluded
> because it can mask both kernel bugs and makedumpfile bugs...)
Yep --zero_excluded masks this problem quite nicely... ;-)
> Thanks for making that patch. Is there anything left to
> fix in crash ?-)
No doubt...
Thanks,
Dave
15 years, 1 month
Re: [Crash-utility] Re: crash 4.0.9 on x86 is crashing
by Dave Anderson
----- "Sumeet Gupta" <meetsumeet(a)gmail.com> wrote:
> Hi,
>
> It was my mistake.
>
> During an earlier debugging, I set verbose=1, misinterpreting it to
> mean just a few debug prints.
What do you mean by that? Where are you setting "verbose"?
> On restoring the code, I'm able to run crash utility quite well. I'm
> yet to understand the purpose of verbose, though.
Right -- nor do I. Regardless, I don't know why you should have run
into that problem on that particular symbol. If you can make that
vmcore available for me to download, I can take a look at it.
>
> On a side-topic - is an ARM port of this utility (ie, a vmcore
> generated on an ARM system, debugged with crash offline on X86)
> available, or in the offing?
None that I'm aware of. It was brought up on this list some time
ago, and I gave the requester some initial guidance on the steps
to take in order to add support for a new architecture. But I've
heard nothing since then.
Dave
>
> Sumeet
>
>
> On Tue, Sep 29, 2009 at 7:03 AM, Sumeet Gupta < meetsumeet(a)gmail.com >
> wrote:
>
>
>
>
>
> Hi All,
>
> I downloaded and built crash 4.0.9 for an Intel x86 Linux machine.
> I'm trying to debug a kdump-generated vmcore, taken using the "crash
> kernel".
> Kernel: 2.6.27.34
> Main kernel argument: crashkernel=64M@16M
> Main kernel loaded at 2M.
>
> The problem:
> ./crash <vmlinux> <vmcore>
> behaves kinda strange... it gets stuck in the following loop of
> function calls, during reading totalram_pages symbol:
> get_symbol_data("totalram_pages") -> readmem("totalram_pages") ->kvtop
> -> x86_kvtop ("pgd page") ->readmem ("pgd page") ->kvtop ->
> x86_kvtop("pgd page") -> readmem("pgd page")
>
>
> and eventually crashes, probably when recursion reaches stack limit.
>
> Why would such a situation happen...?
>
> With gdb, though, things are different - gdb is able to read the
> vmcore, and give the gdb prompt, at which I can see the init_mm
> structure, and backtrace etc. I also verified that pgd address
> (init_mm.pgd) is the same as crash is trying to read through
> readmem("pgd page").
>
>
> Any inputs will be very useful.
>
> Thanks,
> Sumeet
>
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
15 years, 1 month
Re: [Crash-utility] Re: Question about fixing another crash annoyance...t
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
>
> So the fix would be to first determine the cache_cache.buffer_size value,
> and use that to initialize the size_table.kmem_cache_s value used by the
> "SIZE(kmem_cache_s)" macro. Secondly, "vt->kmem_cache_len_nodes", which
> is also based upon the same MAX_NUMNODES array index value, needs to be
> downsized as well. It looks like if the kernel "nr_node_ids" exists as a
> symbol (instead of a #define), then it should be used.
I'm thinking this patch should work:
http://people.redhat.com/anderson/memory.patch
Dave
15 years, 1 month
Re: Question about fixing another crash annoyance...t
by Dave Anderson
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> Dave,
>
> Please pardon the direct question, I'm attempting to cash in on my "dis
> -l" goodwill :-)
>
> The latest problem I'm working on:
>
> We occasionally get dumps that wake up in crash with:
>
> ...
> please wait... (gathering kmem slab cache data)
> crash-4.0.9-fix: page excluded: kernel virtual address:
> ffff88022457a000
> type: "kmem_cache_s buffer"
>
> crash-4.0.9-fix: unable to initialize kmem slab cache subsystem
> ...
>
> These are partial dumps with only kernel pages included.
>
> This problem comes about because readmem fails to read one
> of the kmem_cache structs in the list, for example:
>
> crash-4.0.9-fix> struct kmem_cache 0xffff880224579cc0
> struct kmem_cache struct: page excluded: kernel virtual address:
> ffff88022457a000 type: "gdb_readmem_callback"
> Cannot access memory at address 0xffff880224579cc0
>
> This struct starts toward the end of a page (0xffff880224579cc0)
> and extends into the next page (0xffff88022457a000) which has
> been excluded from the dump because it isn't a kernel page.
>
> That is pretty scary if I assume some bug in the kernel is
> giving pages back to user land that still hold parts of kernel
> structs. But that's not what's happening.
>
> crash-4.0.9-fix> struct -o kmem_cache
> struct kmem_cache {
> [0x0] struct array_cache *array[32];
> ...
> [0x158] struct list_head next;
> [0x168] struct kmem_list3 *nodelists[64];
> }
> SIZE: 0x368
>
> Crash thinks the struct is 0x368 in length, making the
> apparent end of the struct lie in the next page (...a000
> instead of ...9000)
>
> crash-4.0.9-fix> p/x 0xffff880224579cc0+0x368
> $3 = 0xffff88022457a028
>
> But the clever kernel folks did this in slab.c:
>
> /*
> * We put nodelists[] at the end of kmem_cache, because we want to size
> * this array to nr_node_ids slots instead of MAX_NUMNODES
> * (see kmem_cache_init())
> * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
> * is statically defined, so we reserve the max number of nodes.
> */
> struct kmem_list3 *nodelists[MAX_NUMNODES];
>
> So that means crash needs to curtail the read of kmem_cache
> to the actual size of the nodelists array, instead of the
> declared size.
>
> I still need to determine if the actual size is determined
> once for all instances, or per structure.
>
> This should affect partial dumps with kernels that use slab.c.
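The page arithmetic in the report can be checked with a short sketch. It assumes a 4096-byte page, which matches the ...9000/...a000 boundary quoted above; the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size


def pages_touched(start, size, page_size=PAGE_SIZE):
    """Return the base address of every page overlapped by [start, start + size)."""
    first = start - start % page_size
    last_byte = start + size - 1
    last = last_byte - last_byte % page_size
    return list(range(first, last + 1, page_size))


def bytes_left_in_page(start, page_size=PAGE_SIZE):
    """Number of bytes from `start` to the end of its page."""
    return page_size - start % page_size
```

For the struct at 0xffff880224579cc0, only 0x340 bytes remain in its page, so a read of the declared 0x368 bytes spills into the excluded page at 0xffff88022457a000, while any read of 0x340 bytes or fewer stays within the first page.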
I never noticed that before -- the buffer_size of the global "cache_cache"
kmem_cache structure gets downsized here in kmem_cache_init() in 2.6.22
and later:
/*
* struct kmem_cache size depends on nr_node_ids, which
* can be less than MAX_NUMNODES.
*/
cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
nr_node_ids * sizeof(struct kmem_list3 *);
So the fix would be to first determine the cache_cache.buffer_size value,
and use that to initialize the size_table.kmem_cache_s value used by the
"SIZE(kmem_cache_s)" macro. Secondly, "vt->kmem_cache_len_nodes", which
is also based upon the same MAX_NUMNODES array index value, needs to be
downsized as well. It looks like if the kernel "nr_node_ids" exists as a
symbol (instead of a #define), then it should be used.
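The size computation quoted from kmem_cache_init() can be worked through with the numbers from the "struct -o kmem_cache" output above. This is a sketch under assumptions: 8-byte pointers (consistent with 64 slots filling 0x368 - 0x168 bytes), and a hypothetical function name.

```python
def downsized_kmem_cache_size(nodelists_offset, nr_node_ids, pointer_size=8):
    """offsetof(struct kmem_cache, nodelists) + nr_node_ids * sizeof(pointer)."""
    return nodelists_offset + nr_node_ids * pointer_size
```

With the declared 64 slots this reproduces the 0x368 that crash currently assumes; with nr_node_ids = 2 it gives 376 bytes. Elsewhere in this thread the patched crash reports cache_cache.buffer_size as 384 for nr_node_ids 2, which is 376 rounded up to a multiple of 64. I am assuming that difference comes from alignment applied by the kernel, so the value read from the dump should be preferred over a recomputed one.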
> Any other structs in the kernel like this that crash already
> deals with?
None that I'm aware of...
Dave
15 years, 1 month
crash 4.0.9 on x86 is crashing
by Sumeet Gupta
Hi All,
I downloaded and built crash 4.0.9 for an Intel x86 Linux machine.
I'm trying to debug a kdump-generated vmcore, taken using the "crash kernel".
Kernel: 2.6.27.34
Main kernel argument: crashkernel=64M@16M
Main kernel loaded at 2M.
The problem:
./crash <vmlinux> <vmcore>
behaves kinda strange... it gets stuck in the following loop of function
calls, during reading totalram_pages symbol:
get_symbol_data("totalram_pages") -> readmem("totalram_pages") -> kvtop
-> x86_kvtop("pgd page") -> readmem("pgd page") -> kvtop
-> x86_kvtop("pgd page") -> readmem("pgd page")
and eventually crashes, probably when recursion reaches stack limit.
Why would such a situation happen...?
With gdb, though, things are different - gdb is able to read the vmcore, and
give the gdb prompt, at which I can see the init_mm structure, and backtrace
etc. I also verified that the pgd address (init_mm.pgd) is the same one that crash is
trying to read through readmem("pgd page").
Any input would be very useful.
Thanks,
Sumeet
15 years, 1 month
Re: [Crash-utility] crashdc : automatic data collection for new vmcores in text format
by Louis Bouchard
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
> Is /usr/bin/crashdc itself a shell script?
>
> Dave
Yes, crashdc itself is a shell script, as are the run-crashdc-* scripts,
which are intended to be invoked by the kdump mechanisms (kdump_post in
RHEL, KDUMP_POSTSCRIPT in SLES).
Kind Regards,
- --
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center,
Linux Ambassador, HP
HP Services                   1 Ave du Canada
HP France                     Z.A. de Courtaboeuf
louis.bouchard(a)hp.com       91 947 Les Ulis
http://www.hp.com/go/linux    France
http://www.hp.com/fr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkq80yAACgkQDvqokHrhnCycRQCfbRmrQHpGnbBfZab3KDgROJFD
hnMAoKUTU2gIDs/WWl2golnM6JUoDC89
=ErtS
-----END PGP SIGNATURE-----
15 years, 1 month