November 2011 - Crash-utility - Crash Utility List Archives

[RFC] makedumpfile, crash: LZO compression support

by HATAYAMA Daisuke

Hello,

This is an RFC patch set that adds LZO compression support to makedumpfile and the crash utility. LZO compresses to about the same size as ZLIB but is much faster, which reduces down time during crash dump generation and refiltering.

How to build:

1. Get the LZO library, which is provided as the lzo-devel package on recent Linux distributions and is also available on the author's website: http://www.oberhumer.com/opensource/lzo/.
2. Apply the patch set to makedumpfile v1.4.0 and crash v6.0.0.
3. Build both using make. For crash, for now, do the following:

    $ make CFLAGS="-llzo2"

How to use:

I've newly used the -l option for LZO compression in this patch. For example:

    $ makedumpfile -l vmcore dumpfile
    $ crash vmlinux dumpfile

Request for a configure-like feature in the crash utility:

I would like a configure-like feature in the crash utility so that users can select at build time whether to include the LZO feature, that is:

    ./configure --enable-lzo

or

    ./configure --disable-lzo

The reason is that support staff often download and install the latest version of the crash utility on machines where the LZO library is not provided.

Looking at the source code, it looks to me that crash does some kind of configuration processing in a local manner, around configure.c, and I guess it's difficult to use autoconf tools directly. Or is there another, better way?

Performance Comparison:

Sample Data

Ideally, I should have measured the performance on enough vmcores generated from machines that were actually running, but I don't have enough sample vmcores right now, so I couldn't. This comparison therefore doesn't answer the question of I/O time improvement. This is a TODO for now.

Instead, I chose the worst and best cases regarding compression ratio and speed only. Specifically, the former is /dev/urandom and the latter is /dev/zero.

I generated sample data of 10MB, 100MB and 1GB like this:

    $ dd bs=4096 count=$((1024*1024*1024/4096)) if=/dev/urandom of=urandom.1GB

How to measure

Then I compressed each 4096-byte block and measured the total compression time and output size. See the attached mycompress.c.

Result

See the attached file result.txt.

Discussion

For both kinds of data, lzo's compression was considerably quicker than zlib's. LZO's compression time relative to zlib's is about 37% for urandom data and about 8.5% for zero data. The actual contents of physical memory would fall between the two cases, so I guess the average compression time ratio is between 8.5% and 37%.

Although beyond the topic of this patch set, we can estimate the worst-case compression time for larger data sizes, since compression is performed block by block and the compression time increases linearly. The estimated worst-case time for 2TB of memory is about 15 hours for lzo and about 40 hours for zlib. In this case the compressed data is larger than the original, so it is not actually used and the compression time is completely wasted. I think compression must be done in parallel, and I'll post such a patch later.

Diffstat

* makedumpfile

 diskdump_mod.h |  3 +-
 makedumpfile.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++------
 makedumpfile.h | 12 +++++++
 3 files changed, 101 insertions(+), 12 deletions(-)

* crash

 defs.h     |  1 +
 diskdump.c | 20 +++++++++++++++++++-
 diskdump.h |  3 ++-
 3 files changed, 22 insertions(+), 2 deletions(-)

TODO

* evaluation including I/O time using actual vmcores

Thanks.
HATAYAMA, Daisuke
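The worst-case estimate in the discussion relies on compression time growing linearly with input size, because every 4096-byte block is compressed independently. A minimal sketch of that arithmetic follows; the function names are hypothetical and are not taken from the attached mycompress.c, and the inputs are whatever timings a measurement produces.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale a measured compression time linearly to a
 * larger input. This is only reasonable because each block is compressed
 * independently, so total time is proportional to the number of blocks.
 */
static double estimate_seconds(double measured_seconds,
                               double measured_bytes,
                               double target_bytes)
{
    assert(measured_bytes > 0.0);
    return measured_seconds * (target_bytes / measured_bytes);
}

/* Ratio of LZO compression time to zlib compression time for the same input. */
static double time_ratio(double lzo_seconds, double zlib_seconds)
{
    assert(zlib_seconds > 0.0);
    return lzo_seconds / zlib_seconds;
}
```

Going from a 1GB sample to 2TB multiplies the measured time by 2048. The quoted worst-case estimates of about 15 hours and about 40 hours give a ratio of 0.375, which agrees with the figure of about 37% reported for urandom data.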

1 year, 8 months


Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format

by tachibana＠mxm.nes.nec.co.jp

Hi Hatayama-san,

On 2011/06/29 12:12:18 +0900, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
> Date: Tue, 28 Jun 2011 08:57:42 -0400 (EDT)
>
> > ----- Original Message -----
> >> Fujitsu has stand-alone dump mechanism based on firmware level
> >> functionality, which we call SADUMP, in short.
> >>
> >> We've maintained utility tools internally but now we're thinking that
> >> the best is crash utility and makedumpfile supports the sadump format
> >> for the viewpoint of both portability and maintainability.
> >>
> >> We'll be of course responsible for its maintenance in a continuous
> >> manner. The sadump dump format is very similar to diskdump format and
> >> so kdump (compressed) format, so we estimate patch set would be a
> >> relatively small size.
> >>
> >> Could you tell me whether crash utility and makedumpfile can support
> >> the sadump format? If OK, we'll start to make patchset.

I think it's not bad to support sadump by makedumpfile. However I have several questions.
- Do you want to use makedumpfile to make an existing file that sadump has dumped smaller?
- It isn't possible to support the same form as kdump-compressed format now, is it?
- When the information that makedumpfile reads from a note of /proc/vmcore (or a header of kdump-compressed format) is added by an extension of makedumpfile, do you need to modify sadump?

Thanks
tachibana

> > Sure, yes, the crash utility can always support another dumpfile format.
> >
>
> Thanks. It helps a lot.
>
> > It's unclear to me how similar SADUMP is to diskdump/compressed-kdump.
> > Does your internal version patch diskdump.c, or do you maintain your
> > own "sadump.c"? I ask because if your patchset is at all intrusive,
> > I'd prefer it be kept in its own file, primarily for maintainability,
> > but also because SADUMP is essentially a black-box to anybody outside
> > Fujitsu.
>
> What I meant when I used ``similar'' is both literally and
> logically. The format consists of diskdump header-like header, two
> kinds of bitmaps used for the same purpose as those in diskdump format,
> and memory data. They can be handled in common with the existing data
> structure, diskdump_data, non-intrusively, so I hope they are placed
> in diskdump.c.
>
> On the other hand, there's a code to be placed at such specific
> area. sadump is triggered depending on kdump's progress and so
> register values to be contained in vmcore varies according to the
> progress: If crash_notes has been initialized when sadump is
> triggered, sadump packs the register values in crash_notes; if not
> yet, packs registers gathered by firmware. This is sadump specific
> processing, so I think putting it in specific sadump.c file is a
> natural and reasonable choice.
>
> Anyway, I have not made any patch set for this. I'll post a patch set
> when I complete.
>
> Again, thanks a lot for the positive answer.
>
> Thanks.
> HATAYAMA, Daisuke
>
> _______________________________________________
> kexec mailing list
> kexec(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

1 year, 8 months


[ANNOUNCE] crash version 6.0.1 is available

by Dave Anderson

Download from: http://people.redhat.com/anderson

- Several fixes/updates for the 32-bit PPC architecture:
  (1) Delete "__func__.<number>" symbols from the symbol list.
  (2) Update manner of determining the processor speed displayed by the initial system banner and the "sys" command.
  (3) Use the kernel's online cpus mask for determining the cpu count.
  (4) Enable the "bt" command to follow traces that start in a per-cpu IRQ stack.
  (5) Fix for the "bt" command to better prevent runaway stack traces.
  (6) Fix for the "bt" command to recognize/display 2.6 kernel exception frames.
  (7) Update "bt" command's exception frame register display.
  (8) Implement "bt -f" option.
  (nakayama.ts(a)ncos.nec.co.jp)

- Fix for the X86 kernel module line-number capability on some kernels. It is unclear why only some kernel versions exhibit this problem, but the newly-embedded gdb version 7.3.1 has changed behaviour such that the addrmap arrays of module text address blocks may contain the module text offset values instead of their loaded vmalloc addresses, and so without the patch, there is no "match" for the vmalloc address when searching for its line number information. It is fixed by doing a preliminary symbol search before accessing the line-number access routine.
  (anderson(a)redhat.com)

- Fix for the X86_64 kernel module line-number capability on kernels that have functions preceded by the __vsyscall_fn macro, which puts the kernel text function in the vsyscall page that starts at virtual address 0xffffffffff600000. This results in a text address block that starts at a normal kernel text address but ends with a vsyscall address, which inadvertently contains the whole vmalloc address range. Without the patch, line number requests for module vmalloc text addresses would be mistakenly issued the first text section that ended with a vsyscall address, but then cannot find line number information in that section.
  (anderson(a)redhat.com)

- Fix for the inadvertent patching of the symbols of the 32-bit Xen hypervisor binary. Without the patch, during initialization the minimal_symbols are "patched" with their original values, so they remain unchanged, and the message "WARNING: kernel relocated [0MB]: patching 3434 gdb minimal_symbol values" is displayed.
  (anderson(a)redhat.com)

- If the "--mod <directory-tree>" command line option, or the setting of the CRASH_MODULE_PATH environment variable, or the "mod -S <directory-tree>" point to a tree that contains only the separate debuginfo "<module>.ko.debug" files, then those debuginfo files will be used as the internal "add-symbol-file" arguments to the embedded gdb module. Without the patch, it was only acceptable to point to a directory tree that contained the base "<module>.ko" files, and the separate debuginfo files were found automatically based upon the directory path to the base module file. This will allow an alternate module-debuginfo directory tree to be set up like so:

    # cd <directory>
    # rpm2cpio kernel-debuginfo-<release>.rpm | cpio -idv

  Having done that, the <directory> may be used with the "--mod" command line argument, or as the CRASH_MODULE_PATH environment variable, or as the "mod -S <directory>" argument.
  (anderson(a)redhat.com)

- Make the suspension of the verbose/time-consuming "sym -l" output immediate upon the killing of the output pipe, or the entry of the first CTRL-c. Without the patch, it would typically take several seconds, or multiple CTRL-c entries, for the "crash>" prompt to be re-displayed.
  (anderson(a)redhat.com)

- Fix for the handling of piped commands if the command receiving the crash output is non-existent or invalid. Without the patch, the crash command would wait indefinitely unless multiple CTRL-c entries were entered.
  (anderson(a)redhat.com)

- Fix for the s390x "bt" command's floating point register display header. Without the patch, the header indicates that only registers 0, 2, 4 and 6 are printed, a relic of the s390 architecture, whereas on the s390x all floating point registers are displayed.
  (holzheu(a)linux.vnet.ibm.com)

- Fix for the error message displayed when an untrusted .gdbinit file exists in the current directory. Without the patch, the error message "WARNING: not using untrusted file: " would be followed by garbage ASCII data instead of the full pathname of the .gdbinit file.
  (anderson(a)redhat.com)

- Fix for the "kmem -p" and "kmem -i" commands in 3.1 and later kernels where the page structure's "_count" member was moved into an embedded anonymous structure. Without the patch, the commands fail with the error message "kmem: invalid structure member offset: page_count FILE: memory.c LINE: 4610 FUNCTION: dump_mem_map_SPARSEMEM()".
  (anderson(a)redhat.com)

- Allow the user to append data to the CFLAGS and LDFLAGS variables in the top-level Makefile. The extra data should be put in files named "CFLAGS.extra" and "LDFLAGS.extra" in the top-level directory; if either or both files exist, the extra data within them will be appended to the relevant variable. Typically the LDFLAGS.extra file will contain "-l<library>" strings, and the CFLAGS.extra file will contain "-D<value>" strings. This will allow the crash utility to be built with optional libraries, and the code that references them to be encapsulated with associated "#ifdef <value>" sections. The extra CFLAGS data will also be passed to extension modules that are built within the local "crash-<version>/extensions" subdirectory.
  (anderson(a)redhat.com)

- The LDFLAGS setting in the Makefile can no longer be modified by hand. It will be automatically configured by the "configure -b" option, based upon the contents of the optional "LDFLAGS.extra" file.
  (anderson(a)redhat.com)

- Fix for the "runq" command to display the runnable tasks that are contained within a cgroup's task-group scheduling entity. Without the patch, only scheduling entities that are individual tasks get displayed, and runnable tasks in task-group scheduling entities get skipped.
  (d.hatayama(a)jp.fujitsu.com, anderson(a)redhat.com)

- Fix for the SIAL extension module when repeatedly loading and unloading a sial script when a full pathname is specified for the script. Without the patch, the 4th unload attempt generates a segmentation violation.
  (lmcilroy(a)redhat.com)

- Fix for the SIAL extension module to register the help and usage functions for a command only when loading a script.
  (lchouinard(a)s2sys.com)
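The CFLAGS.extra/LDFLAGS.extra mechanism described above pairs a "-D<value>" define with "#ifdef <value>" sections around the code that needs the optional library. The following is a rough sketch of that pattern only, not code from crash: the macro name LZO, the function names lzo_supported() and decompress_block(), and the choice of lzo1x_decompress_safe() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: "-DLZO" would be placed in CFLAGS.extra and "-llzo2" in
 * LDFLAGS.extra. The macro name is illustrative only.
 */
#ifdef LZO
#include <lzo/lzo1x.h>
#endif

/* Returns 1 when the optional code path was compiled in, 0 otherwise. */
static int lzo_supported(void)
{
#ifdef LZO
    return 1;
#else
    return 0;
#endif
}

/*
 * Hypothetical wrapper: decompress one block. Returns 0 on success and -1
 * when LZO support was not built in or the data is invalid. A real caller
 * would also need to call lzo_init() once before the first use.
 */
static int decompress_block(const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t *out_len)
{
    assert(in != NULL && out != NULL && out_len != NULL);
#ifdef LZO
    lzo_uint n = *out_len;

    if (lzo1x_decompress_safe(in, in_len, out, &n, NULL) != LZO_E_OK)
        return -1;
    *out_len = n;
    return 0;
#else
    (void)in_len;       /* unused when the library is absent */
    return -1;
#endif
}
```

With neither extra file present the same source still builds, and callers can test lzo_supported() to report a clear error instead of failing at link time.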

13 years, 8 months


Re: [Crash-utility] Fix use after free in sial variable lists

by Dave Anderson

----- Original Message -----
>
> Lachlan - thanks for chasing this down.
> Your fix will work and is safe. The real problem is that we call the
> help and usage functions for the associated command even if we are
> doing an unload. We should really only call these during a load.
>
> Can you try the patch below against your test case?

Lachlan is in Melbourne and most likely will not be responding until tomorrow. But anyway, I'll also queue this patch for crash-6.0.1 -- which I'd hoped to get out today, but I'll defer it until tomorrow.

Thanks,
  Dave

>
> --- crash-6.0.0/extensions/sial.c 2011-10-25 10:58:15.000000000 -0700
> +++ crash-6.0.0.new/extensions/sial.c 2011-11-29 05:59:54.552190994 -0800
> @@ -919,25 +919,26 @@
>      if(!help) return;
>      snprintf(fname, sizeof(fname), "%s_help", name);
>      if(sial_chkfname(fname, 0)) {
> -        help_str=sial_strdup((char*)(unsigned long)sial_exefunc(fname, 0));
>          snprintf(fname, sizeof(fname), "%s_usage", name);
>          if(sial_chkfname(fname, 0)) {
>              if(load) {
>                  opt_str=sial_strdup((char*)(unsigned long)sial_exefunc(fname, 0));
> +                snprintf(fname, sizeof(fname), "%s_help", name);
> +                help_str=sial_strdup((char*)(unsigned long)sial_exefunc(fname, 0));
>                  help[0]=sial_strdup(name);
>                  help[1]="";
>                  help[2]=sial_strdup(opt_str);
>                  help[3]=sial_strdup(help_str);
>                  help[4]=0;
>                  add_sial_cmd(name, run_callback, help, 0);
> +                sial_free(help_str);
> +                sial_free(opt_str);
>                  return;
>              }
>              else rm_sial_cmd(name);
>          }
> -        sial_free(help_str);
>      }
>      free(help);
> -    return;
>  }
>
>  /*
>
> -----Original Message-----
> From: Dave Anderson [ mailto:anderson@redhat.com ]
> Sent: Tue 11/29/2011 8:55 AM
> To: Lachlan McIlroy; Discussion list for crash utility usage, maintenance and development
> Cc: Luc Chouinard
> Subject: Re: [Crash-utility] Fix use after free in sial variable lists
>
> ----- Original Message -----
> >
> > I encountered a segfault in the sial module when repeatedly loading
> > and unloading a sial script. The bug is repeatable and it always
> > segfaults on the 4th unload. The bug only triggers if a pathname is
> > specified for the sial script too (just doing 'load test.sial,
> > unload test.sial, ...' doesn't trigger the problem).
> >
> > The cause of the problem is that sial_freefile() will free the
> > memory used by the static and global variable lists but leaves the
> > stale pointer in the fdata object. This stale pointer is then
> > assumed to be allocated later by sial_inivars().
> >
> > I've included a patch below to NULL out the static and global
> > variable list pointers after they are deallocated and this fixes
> > the problem for me although I'm not totally sure it's the best way
> > to fix this.
>
> Makes perfect sense to me -- unless Luc objects or has a better idea,
> consider it queued for crash-6.0.1.
>
> Thanks,
>   Dave
>
> >
> > Lachlan
> >
> > ...
> > crash> extend /home/lmcilroy/bin/sial.so
> > Core LINUX_RELEASE == '2.6.18-238.12.1.el5'
> > < Sial interpreter version 3.0 >
> > Loading sial commands from /usr/share/sial/crash:/home/lmcilroy/.sial .... Done.
> > /home/lmcilroy/bin/sial.so: shared object loaded
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > Segmentation fault
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > sial_inivars (sv=0x2cf4488) at sial_var.c:549
> > 549         if(!v->ini && v->dv && v->dv->init) {
> > Missing separate debuginfos, use: debuginfo-install crash-4.0.9-2.fc12.x86_64
> > (gdb) bt
> > #0  sial_inivars (sv=0x2cf4488) at sial_var.c:549
> > #1  0x00007fffed6b1d24 in sial_addsvs (type=1, sv=<value optimized out>) at sial_var.c:821
> > #2  0x00007fffed69f07d in sial_execmcfunc (f=<value optimized out>, vp=<value optimized out>) at sial_func.c:900
> > #3  0x00007fffed6a026b in sial_exefunc (fname=0x7fffffffdc90 "cciss_help", vp=0x0) at sial_func.c:968
> > #4  0x00007fffed69cac2 in reg_callback (name=0x2c30538 "cciss", load=0) at sial.c:922
> > #5  0x00007fffed69f72f in sial_docallback (fd=0x2f2ed58) at sial_func.c:264
> > #6  sial_freefile (fd=0x2f2ed58) at sial_func.c:288
> > #7  0x00007fffed69fe2a in sial_deletefile (name=0x7885a28 "very_long_directory_name/test.sial") at sial_func.c:314
> > #8  0x00007fffed6a5ce6 in sial_loadunload (load=0, name=<value optimized out>, silent=0) at sial_api.c:1289
> > #9  0x00007fffed69c77d in unload_cmd () at sial.c:775
> > #10 0x000000000045d334 in exec_command () at main.c:740
> > #11 0x000000000045d57a in main_loop () at main.c:699
> > #12 0x0000000000554d19 in captured_command_loop (data=<value optimized out>) at ./main.c:228
> > #13 0x0000000000552feb in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:531
> > #14 0x0000000000554a26 in captured_main (data=<value optimized out>) at ./main.c:958
> > #15 0x0000000000552feb in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:531
> > #16 0x0000000000553be4 in gdb_main (args=0x2cf4488) at ./main.c:973
> > #17 0x0000000000553c1e in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at ./main.c:993
> > #18 0x000000000045e0df in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:603
> > (gdb) p sv
> > $1 = (var_t *) 0x2cf4488
> > (gdb) p sv->next
> > $2 = (struct var_s *) 0x7463657269645f67
> >
> > 0x7463657269645f67 = "g-direct" which is part of the sial pathname
> > (with the underscore converted) so the memory has been reallocated.
> >
> >
> > diff -up crash-6.0.0/extensions/libsial/sial_func.c.orig crash-6.0.0/extensions/libsial/sial_func.c
> > --- crash-6.0.0/extensions/libsial/sial_func.c.orig 2011-11-29 13:09:43.985631958 +1100
> > +++ crash-6.0.0/extensions/libsial/sial_func.c 2011-11-29 13:10:31.930603477 +1100
> > @@ -280,8 +280,14 @@ sial_freefile(fdata *fd)
> >      }
> >
> >      /* free the associated static and global variables */
> > -    if(fd->fsvs) sial_freesvs(fd->fsvs);
> > -    if(fd->fgvs) sial_freesvs(fd->fgvs);
> > +    if(fd->fsvs) {
> > +        sial_freesvs(fd->fsvs);
> > +        fd->fsvs = NULL;
> > +    }
> > +    if(fd->fgvs) {
> > +        sial_freesvs(fd->fgvs);
> > +        fd->fgvs = NULL;
> > +    }
> >
> >      /* free all function nodes */
> >      // let debugger know ...
> >
> > --
> > Crash-utility mailing list
> > Crash-utility(a)redhat.com
> > https://www.redhat.com/mailman/listinfo/crash-utility
> >

13 years, 8 months


Fix use after free in sial variable lists

by Lachlan McIlroy

I encountered a segfault in the sial module when repeatedly loading and unloading a sial script. The bug is repeatable and it always segfaults on the 4th unload. The bug only triggers if a pathname is specified for the sial script too (just doing 'load test.sial, unload test.sial, ...' doesn't trigger the problem).

The cause of the problem is that sial_freefile() will free the memory used by the static and global variable lists but leaves the stale pointer in the fdata object. This stale pointer is then assumed to be allocated later by sial_inivars().

I've included a patch below to NULL out the static and global variable list pointers after they are deallocated and this fixes the problem for me although I'm not totally sure it's the best way to fix this.

Lachlan

...
crash> extend /home/lmcilroy/bin/sial.so
Core LINUX_RELEASE == '2.6.18-238.12.1.el5'
< Sial interpreter version 3.0 >
Loading sial commands from /usr/share/sial/crash:/home/lmcilroy/.sial .... Done.
/home/lmcilroy/bin/sial.so: shared object loaded
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
Segmentation fault

Program received signal SIGSEGV, Segmentation fault.
sial_inivars (sv=0x2cf4488) at sial_var.c:549
549         if(!v->ini && v->dv && v->dv->init) {
Missing separate debuginfos, use: debuginfo-install crash-4.0.9-2.fc12.x86_64
(gdb) bt
#0  sial_inivars (sv=0x2cf4488) at sial_var.c:549
#1  0x00007fffed6b1d24 in sial_addsvs (type=1, sv=<value optimized out>) at sial_var.c:821
#2  0x00007fffed69f07d in sial_execmcfunc (f=<value optimized out>, vp=<value optimized out>) at sial_func.c:900
#3  0x00007fffed6a026b in sial_exefunc (fname=0x7fffffffdc90 "cciss_help", vp=0x0) at sial_func.c:968
#4  0x00007fffed69cac2 in reg_callback (name=0x2c30538 "cciss", load=0) at sial.c:922
#5  0x00007fffed69f72f in sial_docallback (fd=0x2f2ed58) at sial_func.c:264
#6  sial_freefile (fd=0x2f2ed58) at sial_func.c:288
#7  0x00007fffed69fe2a in sial_deletefile (name=0x7885a28 "very_long_directory_name/test.sial") at sial_func.c:314
#8  0x00007fffed6a5ce6 in sial_loadunload (load=0, name=<value optimized out>, silent=0) at sial_api.c:1289
#9  0x00007fffed69c77d in unload_cmd () at sial.c:775
#10 0x000000000045d334 in exec_command () at main.c:740
#11 0x000000000045d57a in main_loop () at main.c:699
#12 0x0000000000554d19 in captured_command_loop (data=<value optimized out>) at ./main.c:228
#13 0x0000000000552feb in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:531
#14 0x0000000000554a26 in captured_main (data=<value optimized out>) at ./main.c:958
#15 0x0000000000552feb in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:531
#16 0x0000000000553be4 in gdb_main (args=0x2cf4488) at ./main.c:973
#17 0x0000000000553c1e in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at ./main.c:993
#18 0x000000000045e0df in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:603
(gdb) p sv
$1 = (var_t *) 0x2cf4488
(gdb) p sv->next
$2 = (struct var_s *) 0x7463657269645f67

0x7463657269645f67 = "g-direct" which is part of the sial pathname (with the underscore converted) so the memory has been reallocated.

diff -up crash-6.0.0/extensions/libsial/sial_func.c.orig crash-6.0.0/extensions/libsial/sial_func.c
--- crash-6.0.0/extensions/libsial/sial_func.c.orig 2011-11-29 13:09:43.985631958 +1100
+++ crash-6.0.0/extensions/libsial/sial_func.c 2011-11-29 13:10:31.930603477 +1100
@@ -280,8 +280,14 @@ sial_freefile(fdata *fd)
     }

     /* free the associated static and global variables */
-    if(fd->fsvs) sial_freesvs(fd->fsvs);
-    if(fd->fgvs) sial_freesvs(fd->fgvs);
+    if(fd->fsvs) {
+        sial_freesvs(fd->fsvs);
+        fd->fsvs = NULL;
+    }
+    if(fd->fgvs) {
+        sial_freesvs(fd->fgvs);
+        fd->fgvs = NULL;
+    }

     /* free all function nodes */
     // let debugger know ...
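The idea behind the patch, clearing the owner's pointer right after the list it refers to is freed, can be shown in isolation. This is a minimal sketch with hypothetical stand-in types (struct var, struct file_data) and helper names; it is not the libsial code, and only the field names fsvs and fgvs are borrowed from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the sial variable list and the fdata object. */
struct var {
    struct var *next;
};

struct file_data {
    struct var *fsvs;   /* static variables */
    struct var *fgvs;   /* global variables */
};

/* Prepend a freshly allocated node and return the new list head. */
static struct var *var_push(struct var *head)
{
    struct var *v = malloc(sizeof *v);

    assert(v != NULL);
    v->next = head;
    return v;
}

/* Free every node of a list; a NULL list is a no-op. */
static void free_vars(struct var *v)
{
    while (v) {
        struct var *next = v->next;

        free(v);
        v = next;
    }
}

/*
 * Free both lists and clear the owner's pointers, so that any later code
 * walking fd->fsvs or fd->fgvs sees an empty list instead of freed memory.
 */
static void release_vars(struct file_data *fd)
{
    free_vars(fd->fsvs);
    fd->fsvs = NULL;
    free_vars(fd->fgvs);
    fd->fgvs = NULL;
}

/* Count the nodes in a list. */
static int count_vars(const struct var *v)
{
    int n = 0;

    for (; v; v = v->next)
        n++;
    return n;
}
```

Because the pointers are reset, calling release_vars() a second time is harmless, which mirrors why the repeated load/unload sequence stops crashing once the stale pointers are cleared.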

13 years, 8 months


[PATCH v1 0/4] Kdump core analysis support for PPC440x

by Suzuki K. Poulose

The following series implements the kdump core analysis support for PPC32.

I have posted the KDUMP kernel support patches for PPC440x here:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2011-November/093933.html

You need an upstream git snapshot of kexec-tools for kdump support on PPC440x.

Patches are based on crash-6.0.0

---

Suzuki K. Poulose (4):
      [ppc] Enable stack frame display for KDUMP
      [ppc][netdump] Read registers from ELF note
      [ppc] Function to print the PPC register set
      Support PPC32 Core analysis on PPC64 Host

 configure.c |  11 ++++
 netdump.c   |  77 +++++++++++++++++++++++++++++
 ppc.c       | 156 +++++++++++++++++++++++++++++++++++++++++++++++------------
 3 files changed, 211 insertions(+), 33 deletions(-)

--
Suzuki

13 years, 8 months


crash sometimes doesn't terminate, loops forever looking for a process that doesn't exist

by Adrien Kunysz

Dear crash-utility,

In our vmcore analysis infrastructure we stumbled on a case where crash doesn't terminate. When examining the state of the process with gdb, it seems to be looping forever over /proc/$pid/stat in an attempt to determine the PID of a process that doesn't exist any more.

The backtrace:

(gdb) bt
#0  0x00007fb9814fea57 in munmap () from /lib/libc.so.6
#1  0x00007fb9814a30aa in _IO_setb () from /lib/libc.so.6
#2  0x00007fb9814a1d18 in _IO_file_close_it () from /lib/libc.so.6
#3  0x00007fb981495a48 in fclose () from /lib/libc.so.6
#4  0x00000000004fe75b in output_command_to_pids () at cmdline.c:775
#5  0x00000000004fed7c in setup_redirect (origin=1) at cmdline.c:519
#6  0x00000000005012bb in process_command_line () at cmdline.c:149
#7  0x000000000045f575 in main_loop () at main.c:610
#8  0x0000000000541af9 in captured_command_loop (data=0x7fb982282000) at ./main.c:226
#9  0x000000000053fd8b in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:520
#10 0x00000000005415b6 in captured_main (data=<value optimized out>) at ./main.c:924
#11 0x000000000053fd8b in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:520
#12 0x0000000000540994 in gdb_main (args=0x1000) at ./main.c:939
#13 0x00000000005409ce in gdb_main_entry (argc=<value optimized out>, argv=0x1000) at ./main.c:959
#14 0x000000000046025a in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:525

The problematic code:

720 /*
721  * Determine the pids of the current popen'd shell and output command.
722  * This is all done using /proc; the ps kludge at the bottom of this
723  * routine is legacy, and should only get executed if /proc doesn't exist.
724  */
725 static int
726 output_command_to_pids(void)
727 {
...
738     int retries;
739
740     retries = 0;
741     pc->pipe_pid = pc->pipe_shell_pid = 0;
742     sprintf(lookfor, "(%s)", pc->pipe_command);
743     stall(1000);
744 retry:
745     if (is_directory("/proc") && (dirp = opendir("/proc"))) {
746         for (dp = readdir(dirp); dp && !pc->pipe_pid;
747              dp = readdir(dirp)) {
748             if (!decimal(dp->d_name, 0))
749                 continue;
750             sprintf(buf1, "/proc/%s/stat", dp->d_name);
751             if (file_exists(buf1, NULL) &&
752                 (stp = fopen(buf1, "r"))) {
753                 if (fgets(buf2, BUFSIZE, stp)) {
754                     pid = strtok(buf2, " ");
755                     name = strtok(NULL, " ");
756                     status = strtok(NULL, " ");
757                     p_pid = strtok(NULL, " ");
758                     pgrp = strtok(NULL, " ");
759                     if (STREQ(name, "(sh)") &&
760                         (atoi(p_pid) == getpid()))
761                         pc->pipe_shell_pid = atoi(pid);
762                     if (STREQ(name, lookfor) &&
763                         ((atoi(p_pid) == getpid()) ||
764                          (atoi(p_pid) == pc->pipe_shell_pid)
765                          || (atoi(pgrp) == getpid()))) {
766                         pc->pipe_pid = atoi(pid);
767                         console(
768                 "FOUND[%d] (%d->%d->%d) %s %s p_pid: %s pgrp: %s\n",
769                             retries, getpid(),
770                             pc->pipe_shell_pid,
771                             pc->pipe_pid,
772                             name, status, p_pid, pgrp);
773                     }
774                 }
775                 fclose(stp);
776             }
777         }
778         closedir(dirp);
779     }
780
781     if (!pc->pipe_pid && ((retries++ < 10) || pc->pipe_shell_pid)) {
782         stall(1000);
783         goto retry;
784     }

Looking at how many times it has been looping over /proc:

(gdb) p retries
$19 = 138056108

It found the PID of the shell but not of the command:

(gdb) p pc->pipe_shell_pid
$20 = 9306
(gdb) p pc->pipe_pid
$21 = 0

For completeness, the command that was being run looked like this:

(gdb) p pc->orig_line
$26 = "log | grep -A1 'some string' >> /some/file", '\000' [...]

So it seems something like this happened:

+>popen(grep)
+--> fork(); execve(sh)
+---> fork(); execve(grep)
+----> grep exit()s for some reason
+>crash(8) finds sh in /proc
+---> sh exit
+>crash(8) keeps looking for grep in /proc

I have a second core showing a similar situation if that's of any use, but for now we just work around the problem by wrapping crash(8) within timeout(1).

We could try and fix that function to bail out when the shell exits but it really doesn't look like a nice way to do it to me. So I looked at the reasons we want the PID of that command and it seems there are two:

* determining whether the process is still alive
  This can be done by checking whether the intervening shell is still alive. Obtaining only the PID of the shell seems less problematic than trying to get the PID of the grandchildren. At worst, reimplementing popen() to store the PID of sh is not exactly hard.

* terminating the process forcibly (SIGKILL)
  This is done in close_output(), which is only called from within restart() when its argument is not SIGSEGV, SIGPIPE, SIGINT or 0. I cannot find that function being set as a signal handler for anything else or being called with an argument different from 0. As far as I can tell this is dead code.

Before I write/test/submit a patch, am I missing something or would it make sense to get rid of that pipe_pid?

Thanks,
Adrien
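The retry condition on line 781 never becomes false once pipe_shell_pid is set and the command is never found. One of the options mentioned above, bailing out when the shell is gone, could be sketched as follows. This is an illustration only and not a proposed patch: process_alive() and should_retry() are hypothetical names, and the check relies on the POSIX behaviour of kill() with signal 0.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_RETRIES 10  /* same bound as the quoted code */

/*
 * Returns 1 if a process with this pid exists. kill() with signal 0 sends
 * nothing and only performs the existence and permission checks; EPERM
 * means the process exists but belongs to someone else.
 * Caveat: an exited but not yet reaped (zombie) process still counts as
 * existing, so a real fix would also have to consider reaping the shell.
 */
static int process_alive(pid_t pid)
{
    assert(pid > 0);
    return kill(pid, 0) == 0 || errno == EPERM;
}

/*
 * Hypothetical replacement for the retry condition: stop when the command
 * was found, keep scanning only while the shell still exists, and fall
 * back to the bounded retry count when no shell has been seen yet.
 */
static int should_retry(pid_t pipe_pid, pid_t shell_pid, int retries)
{
    if (pipe_pid != 0)
        return 0;
    if (shell_pid != 0)
        return process_alive(shell_pid);
    return retries < MAX_RETRIES;
}
```

In the scenario described above, should_retry(0, 9306, retries) would return 0 as soon as the shell with PID 9306 no longer exists, instead of looping for millions of iterations.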

13 years, 9 months


[PATCH] s390: Fix heading for s390x floating point registers

by Michael Holzheu

Hello Dave,

The heading for the floating point registers that are printed for active tasks is wrong. It says that only registers 0, 2, 4, and 6 are printed. This is a relic from s390 (31 bit) times. On s390x (64 bit) we have all floating point registers. With this patch the correct heading is printed.

Signed-off-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
---
 s390x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/s390x.c
+++ b/s390x.c
@@ -1196,7 +1196,7 @@ s390x_print_lowcore(char* lc, struct bt_
     fprintf(fp," %#018lx %#018lx\n", tmp[2],tmp[3]);
     ptr = lc + MEMBER_OFFSET("_lowcore","floating_pt_save_area");
-    fprintf(fp," -floating point registers 0,2,4,6:\n");
+    fprintf(fp," -floating point registers:\n");
     tmp[0]=ULONG(ptr);
     tmp[1]=ULONG(ptr + S390X_WORD_SIZE);
     tmp[2]=ULONG(ptr + 2 * S390X_WORD_SIZE);

13 years, 9 months


Fix buildrequires in RPM spec file to build sial extension.

by Wade Mealing

Gday,

I noticed that the documentation on the extension page ( http://people.redhat.com/anderson/extensions.html ) mentions that the sial scripts should be built by default when crash is built. In the current release (version 6.0.0) the sial scripts are not being built because of missing BuildRequires dependencies in the spec file, in my case specifically flex and bison.

Attached is a modification to the crash.spec file to include the correct BuildRequires.

Thanks,
Wade Mealing

13 years, 9 months


Loading debuginfo symbols

by Lachlan McIlroy

Hi,

I would like to be able to load symbols from a specific debuginfo directory using 'mod -S <dir>'. This works if you unpack the corresponding kernel rpm too, but I'd like to be able to save on the disk space by just using the debuginfo rpm. This currently doesn't work because crash doesn't explicitly search for modules ending in .ko.debug. The simple patch below adds this support.

Is this a sensible change to make or is there a better way to do this?

Lachlan

diff -up crash-5.1.9/kernel.c.orig crash-5.1.9/kernel.c
--- crash-5.1.9/kernel.c.orig 2011-11-08 14:15:51.467399576 +1100
+++ crash-5.1.9/kernel.c 2011-11-09 14:26:23.264253341 +1100
@@ -3588,7 +3588,10 @@ module_objfile_search(char *modref, char
             {
             case KMOD_V2:
                 sprintf(file, "%s.ko", modref);
-                retbuf = search_directory_tree(tree, file, 1);
+                if (!(retbuf = search_directory_tree(tree, file, 1))) {
+                    sprintf(file, "%s.ko.debug", modref);
+                    retbuf = search_directory_tree(tree, file, 1);
+                }
             }
         }
         return retbuf;
@@ -3605,7 +3608,10 @@ module_objfile_search(char *modref, char
             {
             case KMOD_V2:
                 sprintf(file, "%s.ko", modref);
-                retbuf = search_directory_tree(dir, file, 0);
+                if (!(retbuf = search_directory_tree(dir, file, 0))) {
+                    sprintf(file, "%s.ko.debug", modref);
+                    retbuf = search_directory_tree(dir, file, 0);
+                }
             }
         }
     }
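The patch applies the same fallback in two places: look for "<module>.ko" first and only then for "<module>.ko.debug". The fallback can be sketched on its own as below. This is an assumption-laden illustration, not crash code: find_module_file(), the lookup_fn callback and debuginfo_only_tree() are hypothetical, the callback stands in for search_directory_tree(), and snprintf() is used here in place of the sprintf() calls in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup callback: returns 1 if the file exists in the tree. */
typedef int (*lookup_fn)(const char *file);

/*
 * Try "<modref>.ko" first, then "<modref>.ko.debug". On success the
 * matching file name is left in out and 1 is returned; otherwise out is
 * set to the empty string and 0 is returned.
 */
static int find_module_file(const char *modref, lookup_fn lookup,
                            char *out, size_t out_len)
{
    static const char *const suffixes[] = { ".ko", ".ko.debug" };
    size_t i;

    assert(modref != NULL && lookup != NULL && out != NULL && out_len > 0);

    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        int n = snprintf(out, out_len, "%s%s", modref, suffixes[i]);

        if (n < 0 || (size_t)n >= out_len)
            continue;           /* name did not fit; skip this candidate */
        if (lookup(out))
            return 1;
    }
    out[0] = '\0';
    return 0;
}

/* Example lookup that simulates a tree unpacked from a debuginfo rpm only. */
static int debuginfo_only_tree(const char *file)
{
    return strcmp(file, "ext3.ko.debug") == 0;
}
```

Keeping the suffix list in one table means both search paths in a function like module_objfile_search() could share the same ordering, rather than repeating the fallback by hand.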

13 years, 9 months

