September 2009 - Crash-utility - Crash Utility List Archives

crashdc: problem when running crash in kexec environment for SLES11

by Louis Bouchard

Hello,

As I indicated previously, I'm having a problem running crash from within the kexec environment of SLES11.

Actually, crash itself runs fine, but its gdb portion seems to be having problems. Here is a capture of what happens:

> Running /usr/bin/run-crashdc-sles11.sh
> crashexe : /usr/bin/crash
> crashoutput : /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt
> namelist : /root/boot/vmlinux-2.6.27.23-0.1-default
> vmcorefile : /root/var/crash/2009-09-29-17:45/vmcore
> debuginfofile : /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug

The portion above is debug info from crashdc.

> crash 4.0-7.6
> Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> NOTE: stdin: not a tty
>
> cannot determine relocation value: not a live system
> gdb /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug
>
>
> dlopen failed on 'libthread_db.so.1' - libthread_db.so.1: cannot open shared object file: No such file or directory
> GDB will not be able to debug pthreads.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu".../usr/bin/crashdc: line 160: 682 Killed $crashexe -d $crashdebug $namelist $debuginfofile $vmcorefile < $crashcmd
> File /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt has been generated
> generated a crash-data file to /root/var/crash/2009-09-29-17:45
> ..done
> Restarting system.

My feeling is that the "dlopen failed on 'libthread_db.so.1'" might be causing this.

Bernhard Walle might have an idea. Here is what I have in /etc/sysconfig/kdump:

> KDUMP_REQUIRED_PROGRAMS="/bin/basename /usr/bin/crash /usr/bin/crashdc /usr/bin/run-crashdc-sles11.sh /etc/rc.status /bin/gzip /bin/logger /usr/bin/gdb /lib/libpthread.so.0 /lib/libthread_db.so.1"

This line causes the listed files to be included in the initramfs that gets loaded at kexec time. Maybe crash/gdb is looking for the library somewhere other than where it is, but right now I'm a bit stuck.

As a side note to Bernhard: is there a way to have kdump stop its execution while in the kexec kernel, just like when it hits an error and drops to a shell?

TIA,

-- 
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center, Linux Ambassador, HP
HP Services / HP France, 1 Ave du Canada, Z.A. de Courtaboeuf, 91 947 Les Ulis, France
louis.bouchard(a)hp.com
http://www.hp.com/go/linux
http://www.hp.com/fr

Re: [Crash-utility] Re: crash 4.0.9 on x86 is crashing

by Dave Anderson

----- "Sumeet Gupta" <meetsumeet(a)gmail.com> wrote:
> > On a side-topic - is an ARM port of this utility (ie, a vmcore
> > generated on an ARM system, debugged with crash offline on X86)
> > available, or in the offing?
>
> > None that I'm aware of. It was brought up on this list some time
> > ago, and I gave the requester some initial guidance on the steps
> > to take in order to add support for a new architecture. But I've
> > heard nothing since then.
>
> If you could give me a link to your previous mail having the guidance steps, that would be helpful.

https://www.redhat.com/archives/crash-utility/2007-August/msg00005.html
https://www.redhat.com/archives/crash-utility/2007-August/msg00017.html

The sketchy directions I gave were for supporting an architecture as is done today, i.e., such that the host machine type would run its arch-specific crash utility binary. The only "cross-arch" support is the x86_64's capability of running a 32-bit x86 binary (and perhaps the same can be done on an ia64). But running an x86 binary to view an ARM dumpfile would be yet another level of complexity, for which there's no support.

There is a "cross-crash" utility maintained elsewhere -- I don't know whether or not it's still active -- which, if I'm not mistaken, supports running an x86 crash binary to look at a MIPS vmcore. There also was a group doing the same kind of thing, using an x86 binary to look at 32-bit ppc vmcores. But again, I have no details on either initiative.

Dave

Re: [Crash-utility] crashdc: problem when running crash in kexec environment for SLES11

by Dave Anderson

----- "Louis Bouchard" <louis.bouchard(a)hp.com> wrote:
> Hello,
>
> As I indicated previously, I'm having a problem with running crash from
> within the kexec environment of SLES11.
>
> Actually, crash itself runs fine, but its gdb portion seems to be having
> problems. Here is a capture of what happens :
>
> > Running /usr/bin/run-crashdc-sles11.sh
> > crashexe : /usr/bin/crash
> > crashoutput : /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt
> > namelist : /root/boot/vmlinux-2.6.27.23-0.1-default
> > vmcorefile : /root/var/crash/2009-09-29-17:45/vmcore
> > debuginfofile : /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug
>
> This portion above is debug info from crashdc.
>
> > crash 4.0-7.6
> > Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> >
> > NOTE: stdin: not a tty
> >
> > cannot determine relocation value: not a live system
> > gdb /root/usr/lib/debug/boot/vmlinux-2.6.27.23-0.1-default.debug
> >
> >
> > dlopen failed on 'libthread_db.so.1' - libthread_db.so.1: cannot open shared object file: No such file or directory
> > GDB will not be able to debug pthreads.
> >
> > GNU gdb 6.1
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > This GDB was configured as "i686-pc-linux-gnu".../usr/bin/crashdc: line 160: 682 Killed $crashexe -d $crashdebug $namelist $debuginfofile $vmcorefile < $crashcmd
> > File /root/var/crash/2009-09-29-17:45/crash-data-200909291746.txt has been generated
> > generated a crash-data file to /root/var/crash/2009-09-29-17:45
> > ..done
> > Restarting system.
>
> My feeling is that the "dlopen failed on 'libthread_db.so.1'" might be
> causing this.

I note that if you move the library away entirely, the embedded gdb in the crash utility complains as above, but crash/gdb still continues to run.

>
> Bernhard Walle might have an idea. Here is what I have in
> /etc/sysconfig/kdump :
>
> > KDUMP_REQUIRED_PROGRAMS="/bin/basename /usr/bin/crash /usr/bin/crashdc /usr/bin/run-crashdc-sles11.sh /etc/rc.status /bin/gzip /bin/logger /usr/bin/gdb /lib/libpthread.so.0 /lib/libthread_db.so.1"

Anyway, this is just a wild guess, and I'm presuming that you're running with x86_64 above, but on RHEL it uses the 64-bit version of that library:

  crash> vm
  PID: 29013  TASK: ffff81003c2eb0c0  CPU: 4  COMMAND: "crash"
         MM              PGD           RSS    TOTAL_VM
  ffff81003b128040  ffff81001e9cf000  116016k  190620k
        VMA            START       END       FLAGS  FILE
  ffff81000fd416b8      400000     830000    1875  /usr/bin/crash
  ffff810008b2b558      a2f000     a51000  101873  /usr/bin/crash
  ffff8100133a0558      a51000     bb9000  100073
  ffff810009fd2b88     5ce8000    9782000  100073
  ffff81003a5e8ef8  3fad600000 3fad61a000     875  /lib64/ld-2.5.so
  ffff810025d75298  3fad81a000 3fad81b000  100871  /lib64/ld-2.5.so
  ffff8100137be298  3fad81b000 3fad81c000  100873  /lib64/ld-2.5.so
  ffff810032b43a28  3fada00000 3fada06000      75  /lib64/libthread_db-1.0.so
  ffff810032b431e8  3fada06000 3fadc05000      70  /lib64/libthread_db-1.0.so
  ffff810032b43088  3fadc05000 3fadc06000  100071  /lib64/libthread_db-1.0.so
  ffff810032b43ce8  3fadc06000 3fadc07000  100073  /lib64/libthread_db-1.0.so
  ...

Does the KDUMP_REQUIRED_PROGRAMS list need to specify the /lib64 version, or does SLES have a different 32/64-bit library setup?

Dave

Re: [Crash-utility] Re: crash 4.0.9 on x86 is crashing

by Dave Anderson

----- "Sumeet Gupta" <meetsumeet(a)gmail.com> wrote:
> On Tue, Sep 29, 2009 at 7:11 PM, Dave Anderson <anderson(a)redhat.com> wrote:
> > ----- "Sumeet Gupta" <meetsumeet(a)gmail.com> wrote:
> > > Hi,
> > >
> > > It was my mistake.
> > >
> > > During an earlier debugging, I set verbose=1, misinterpreting it to
> > > mean just a few debug prints.
> >
> > What do you mean by that? Where are you setting "verbose"?
>
> I was talking about the file x86.c, function x86_kvtop, around line
> 2622, where it returns from the function, or not, depending upon
> verbose. If the purpose of verbose in this function is only to have
> more fprintfs, then the behaviour should not change based on this
> argument. I had (to see whats going on in this function,) set the
> verbose flag in it to 1. This caused the recursion in function calls I
> mentioned.

Ah, OK -- now I get it...

Yes -- the purpose of the "verbose" flag in x86_kvtop() is not just to have more debug fprintfs; rather, it is *only* used when the virtual-to-physical translation is being requested by the "vtop" command -- which verbosely walks through and displays the page table data.

Thanks,
  Dave

Re: [Crash-utility] Re: Question about fixing another crash annoyance...t

by Dave Anderson

----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
>
> Dave,
>
> You are faster than me at fixing crash:-) I was just about to start on
> the part for kmem_cache_len_nodes...
>
> The patch fixes the problem on my example dump which previously said:
> =====
> please wait... (gathering kmem slab cache data)
> crash: page excluded: kernel virtual address: ffff88022457a000 type:
> "kmem_cache_s buffer"
>
> crash: unable to initialize kmem slab cache subsystem
> =====
>
> It now says:
> =====
> please wait... (gathering kmem slab cache data)
> kmem_cache_downsize: SIZE(kmem_cache_s): 872 cache_cache.buffer_size: 384
> kmem_cache_downsize: nr_node_ids: 2
> =====

Sorry -- I didn't mean to leave the "CRASHDEBUG(0)" in there, so the messages above shouldn't normally be displayed. I'll make it CRASHDEBUG(1) so we'll have a record of it happening if something else comes up in the future.

>
> In the meantime, I remembered "--zero_exclude", which makes for
> a slightly dangerous workaround for the problem. It fills in the
> unnecessarily-accessed missing pages with zeros.
>
> The output of "kmem -s" and "kmem -S" on my problem dump is the
> same between your patched version and the old version running
> with --zero_exclude. (I don't normally think of using zero_exclude
> because it can mask both kernel bugs and makedumpfile bugs...)

Yep, --zero_excluded masks this problem quite nicely... ;-)

> Thanks for making that patch. Is there anything left to
> fix in crash ?-)

No doubt...

Thanks,
  Dave

Re: [Crash-utility] Re: crash 4.0.9 on x86 is crashing

by Dave Anderson

----- "Sumeet Gupta" <meetsumeet(a)gmail.com> wrote:
> Hi,
>
> It was my mistake.
>
> During an earlier debugging, I set verbose=1, misinterpreting it to
> mean just a few debug prints.

What do you mean by that? Where are you setting "verbose"?

> On restoring the code, I'm able to run crash utility quite well. I'm
> yet to understand the purpose of verbose, though.

Right -- nor do I. Regardless, I don't know why you should have run into that problem on that particular symbol. If you can make that vmcore available for me to download, I can take a look at it.

>
> On a side-topic - is an ARM port of this utility (ie, a vmcore
> generated on an ARM system, debugged with crash offline on X86)
> available, or in the offing?

None that I'm aware of. It was brought up on this list some time ago, and I gave the requester some initial guidance on the steps to take in order to add support for a new architecture. But I've heard nothing since then.

Dave

>
> Sumeet
>
> On Tue, Sep 29, 2009 at 7:03 AM, Sumeet Gupta <meetsumeet(a)gmail.com> wrote:
> > Hi All,
> >
> > I downloaded and built crash 4.0.9 for Intel X86 linux machine.
> > I'm trying to debug kdump-generated vmcore, taken using the "crash
> > kernel".
> > Kernel: 2.6.27.34
> > Main kernel argument: crashkernel=64M@16M
> > Main kernel loaded at 2M.
> >
> > The problem:
> > ./crash <vmlinux> <vmcore>
> > behaves kinda strange... it gets stuck in the following loop of
> > function calls, during reading totalram_pages symbol:
> > get_symbol_data("totalram_pages") -> readmem("totalram_pages") -> kvtop
> > -> x86_kvtop("pgd page") -> readmem("pgd page") -> kvtop ->
> > x86_kvtop("pgd page") -> readmem("pgd page")
> >
> > and eventually crashes, probably when recursion reaches stack limit.
> >
> > Why would such a situation happen...?
> >
> > With gdb, though, things are different - gdb is able to read the
> > vmcore, and give the gdb prompt, at which I can see the init_mm
> > structure, and backtrace etc. I also verified that pgd address
> > (init_mm.pgd) is the same as crash is trying to read through
> > readmem("pgd page").
> >
> > Any inputs will be very useful.
> >
> > Thanks,
> > Sumeet

Re: [Crash-utility] Re: Question about fixing another crash annoyance...t

by Dave Anderson

----- "Dave Anderson" <anderson(a)redhat.com> wrote:
>
> So the fix would be to first determine the cache_cache.buffer_size value,
> and use that to initialize the size_table.kmem_cache_s value used by the
> "SIZE(kmem_cache_s)" macro. Secondly, "vt->kmem_cache_len_nodes", which
> is also based upon the same MAX_NUMNODES array index value, needs to be
> downsized as well. It looks like if the kernel "nr_node_ids" exists as
> symbol (instead of a #define), then it should be used.

I'm thinking this patch should work:

  http://people.redhat.com/anderson/memory.patch

Dave

Re: Question about fixing another crash annoyance...t

by Dave Anderson

----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> Dave,
>
> Please pardon the direct question, I'm attempting to cash in on my "dis
> -l" goodwill :-)
>
> The latest problem I'm working on:
>
> We occasionally get dumps that wake up in crash with:
>
> ...
> please wait... (gathering kmem slab cache data)
> crash-4.0.9-fix: page excluded: kernel virtual address:
> ffff88022457a000
> type: "kmem_cache_s buffer"
>
> crash-4.0.9-fix: unable to initialize kmem slab cache subsystem
> ...
>
> These are partial dumps with only kernel pages included.
>
> This problem comes about because readmem fails to read one
> of the kmem_cache structs in the list, for example:
>
> crash-4.0.9-fix> struct kmem_cache 0xffff880224579cc0
> struct kmem_cache struct: page excluded: kernel virtual address:
> ffff88022457a000 type: "gdb_readmem_callback"
> Cannot access memory at address 0xffff880224579cc0
>
> This struct starts toward the end of a page (0xffff880224579cc0)
> and extends into the next page (0xffff88022457a000) which has
> been excluded from the dump because it isn't a kernel page.
>
> That is pretty scary if I assume some bug in the kernel is
> giving pages back to user land that still hold parts of kernel
> structs. But that's not what's happening.
>
> crash-4.0.9-fix> struct -o kmem_cache
> struct kmem_cache {
>     [0x0] struct array_cache *array[32];
> ...
>   [0x158] struct list_head next;
>   [0x168] struct kmem_list3 *nodelists[64];
> }
> SIZE: 0x368
>
> Crash thinks the struct is 0x368 in length, making the
> apparent end of the struct lie in the next page (...a000
> instead of ...9000)
>
> crash-4.0.9-fix> p/x 0xffff880224579cc0+0x368
> $3 = 0xffff88022457a028
>
> But the clever kernel folks did this in slab.c:
>
> /*
>  * We put nodelists[] at the end of kmem_cache, because we want to size
>  * this array to nr_node_ids slots instead of MAX_NUMNODES
>  * (see kmem_cache_init())
>  * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
>  * is statically defined, so we reserve the max number of nodes.
>  */
> struct kmem_list3 *nodelists[MAX_NUMNODES];
>
> So that means crash needs to curtail the read of kmem_cache
> to the actual size of the nodelists array, instead of the
> declared size.
>
> I still need to determine if the actual size is determined
> once for all instances, or per structure.
>
> This should affect partial dumps with kernels that use slab.c.

I never noticed that before -- the buffer_size of the global "cache_cache" kmem_cache structure gets downsized here in kmem_cache_init() in 2.6.22 and later:

        /*
         * struct kmem_cache size depends on nr_node_ids, which
         * can be less than MAX_NUMNODES.
         */
        cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
                                 nr_node_ids * sizeof(struct kmem_list3 *);

So the fix would be to first determine the cache_cache.buffer_size value, and use that to initialize the size_table.kmem_cache_s value used by the "SIZE(kmem_cache_s)" macro. Secondly, "vt->kmem_cache_len_nodes", which is also based upon the same MAX_NUMNODES array index value, needs to be downsized as well. It looks like if the kernel "nr_node_ids" exists as a symbol (instead of a #define), then it should be used.

> Any other structs in the kernel like this that crash already
> deals with?

None that I'm aware of...

Dave

crash 4.0.9 on x86 is crashing

by Sumeet Gupta

Hi All,

I downloaded and built crash 4.0.9 for an Intel x86 Linux machine. I'm trying to debug a kdump-generated vmcore, taken using the "crash kernel".

Kernel: 2.6.27.34
Main kernel argument: crashkernel=64M@16M
Main kernel loaded at 2M.

The problem:

./crash <vmlinux> <vmcore>

behaves rather strangely... it gets stuck in the following loop of function calls while reading the totalram_pages symbol:

get_symbol_data("totalram_pages") -> readmem("totalram_pages") -> kvtop -> x86_kvtop("pgd page") -> readmem("pgd page") -> kvtop -> x86_kvtop("pgd page") -> readmem("pgd page")

and eventually crashes, probably when the recursion reaches the stack limit.

Why would such a situation happen?

With gdb, though, things are different: gdb is able to read the vmcore and give the gdb prompt, at which I can see the init_mm structure, the backtrace, etc. I also verified that the pgd address (init_mm.pgd) is the same one crash is trying to read through readmem("pgd page").

Any input will be very useful.

Thanks,
Sumeet

Re: [Crash-utility] crashdc : automatic data collection for new vmcores in text format

by Louis Bouchard

Hello,

> Is /usr/bin/crashdc itself a shell script?
>
> Dave

Yes, crashdc itself is a shell script, as are the run-crashdc-* scripts, which are intended to be invoked by the kdump mechanisms (kdump_post in RHEL, KDUMP_POSTSCRIPT in SLES).

Kind Regards,

-- 
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center, Linux Ambassador, HP
HP Services / HP France, 1 Ave du Canada, Z.A. de Courtaboeuf, 91 947 Les Ulis, France
louis.bouchard(a)hp.com
http://www.hp.com/go/linux
http://www.hp.com/fr
