[RFC] makedumpfile, crash: LZO compression support
by HATAYAMA Daisuke
Hello,
This is an RFC patch set that adds LZO compression support to
makedumpfile and the crash utility. LZO compresses to about the same
size as ZLIB but is far faster, which reduces down time during
crash dump generation and refiltering.
How to build:
1. Get the LZO library, which is provided as the lzo-devel package on recent
Linux distributions and is also available on the author's website:
http://www.oberhumer.com/opensource/lzo/.
2. Apply the patch set to makedumpfile v1.4.0 and crash v6.0.0.
3. Build both using make. For crash, for now build as follows:
$ make CFLAGS="-llzo2"
How to use:
This patch adds a new -l option that selects LZO compression. For
example:
$ makedumpfile -l vmcore dumpfile
$ crash vmlinux dumpfile
Request for a configure-like feature in the crash utility:
I would like a configure-like feature in the crash utility that lets
users select at build time whether to include LZO support,
that is: ./configure --enable-lzo or ./configure --disable-lzo.
The reason is that support staff often download and install the
latest version of the crash utility on machines where the lzo library
is not provided.
Looking at the source code, it looks to me as if crash does some kind
of configuration processing in its own way, around configure.c,
and I guess it's difficult to use the autoconf tools directly.
Or is there a better way?
Performance Comparison:
Sample Data
Ideally, I would have measured performance on a sufficiently large set
of vmcores generated from machines that were actually running, but I
don't have enough sample vmcores at the moment. So this
comparison doesn't answer the question of I/O time improvement. This
is a TODO for now.
Instead, I chose the worst and best cases with regard to compression
ratio and speed only. Specifically, the former is /dev/urandom and
the latter is /dev/zero.
I generated sample data of 10MB, 100MB and 1GB like this:
$ dd bs=4096 count=$((1024*1024*1024/4096)) if=/dev/urandom of=urandom.1GB
How to measure
Then I compressed the data one 4096-byte block at a time and
measured the total compression time and output size. See the attached
mycompress.c.
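The attached mycompress.c is not included here. As a rough sketch of the measurement described above (not the attached program), the loop below feeds fixed 4096-byte blocks to a compressor through a function pointer and accumulates output size and CPU time. The names bench_blocks, compress_fn and rle_compress are hypothetical, and the run-length encoder is only a dependency-free stand-in for a real zlib or LZO call.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define BLOCK_SIZE 4096

/* A block compressor: returns the number of bytes written to out, or 0 if
 * the result does not fit in out_cap bytes. */
typedef size_t (*compress_fn)(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_cap);

/* Hypothetical stand-in compressor (byte-wise run-length encoding). A real
 * measurement would wrap the zlib or LZO compression call here instead. */
size_t rle_compress(const unsigned char *in, size_t in_len,
                    unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        unsigned char byte = in[i];
        size_t run = 1;

        while (i + run < in_len && in[i + run] == byte && run < 255)
            run++;
        if (o + 2 > out_cap)
            return 0;
        out[o++] = (unsigned char)run;  /* run length, 1..255 */
        out[o++] = byte;                /* repeated byte */
        i += run;
    }
    return o;
}

struct bench_result {
    size_t in_bytes;   /* total input processed */
    size_t out_bytes;  /* total output that would be stored */
    double seconds;    /* CPU time spent in the loop */
};

/* Compress data block by block and accumulate size and time. */
struct bench_result bench_blocks(const unsigned char *data, size_t len,
                                 compress_fn fn)
{
    unsigned char out[2 * BLOCK_SIZE];
    struct bench_result r = { 0, 0, 0.0 };
    clock_t start;
    size_t off;

    assert(fn != NULL);
    assert(len % BLOCK_SIZE == 0);

    start = clock();
    for (off = 0; off < len; off += BLOCK_SIZE) {
        size_t n = fn(data + off, BLOCK_SIZE, out, sizeof(out));

        /* Assumption: a block that does not shrink is counted at its
         * original size, since larger compressed data is not used. */
        r.out_bytes += (n == 0 || n >= BLOCK_SIZE) ? BLOCK_SIZE : n;
        r.in_bytes += BLOCK_SIZE;
    }
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return r;
}
```

Passing a different compress_fn for each library keeps the loop and the timing identical for both, so only the compressor itself differs between runs.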
Result
See attached file result.txt.
Discussion
For both kinds of data, lzo's compression was considerably quicker
than zlib's. The compression time ratio (lzo relative to zlib) is
about 37% for urandom data and about 8.5% for zero data. The actual
contents of physical memory would fall between the two cases, so I
guess the average compression time ratio is between 8.5% and 37%.
Although beyond the topic of this patch set, we can estimate the
worst-case compression time for larger data sizes, since compression
is performed block by block and the compression time increases
linearly. The estimated worst-case time for 2TB of memory is about
15 hours for lzo and about 40 hours for zlib. In this case the
compressed data is larger than the original, so it is not actually
used and the compression time is entirely wasted. I think compression
must be done in parallel, and I'll post such a patch later.
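The linear scaling used for that estimate can be written down in a few lines. This is only a sketch: extrapolate_seconds is a hypothetical helper, and 2TB is assumed to mean 2048 GiB. The measured 1GB timings are in the attached result.txt and are not repeated here, so the usage note works backwards from the quoted 15 and 40 hour figures.

```c
#include <assert.h>

/* Scale a compression time linearly from one input size to another. This is
 * valid only under the assumption stated in the text: blocks are compressed
 * independently, so time grows in proportion to the number of blocks. */
double extrapolate_seconds(double measured_seconds, double measured_bytes,
                           double target_bytes)
{
    assert(measured_bytes > 0.0);
    return measured_seconds * (target_bytes / measured_bytes);
}
```

Run in reverse, 15 hours for 2048 GiB corresponds to roughly 26 seconds per GiB for lzo, and 40 hours to roughly 70 seconds per GiB for zlib. Their quotient, 15/40, is the 37% figure quoted for urandom data.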
Diffstat
* makedumpfile
diskdump_mod.h | 3 +-
makedumpfile.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++------
makedumpfile.h | 12 +++++++
3 files changed, 101 insertions(+), 12 deletions(-)
* crash
defs.h | 1 +
diskdump.c | 20 +++++++++++++++++++-
diskdump.h | 3 ++-
3 files changed, 22 insertions(+), 2 deletions(-)
TODO
* evaluation including I/O time using actual vmcores
Thanks.
HATAYAMA, Daisuke
1 year, 2 months
Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
by tachibana@mxm.nes.nec.co.jp
Hi Hatayama-san,
On 2011/06/29 12:12:18 +0900, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
> Date: Tue, 28 Jun 2011 08:57:42 -0400 (EDT)
>
> >
> >
> > ----- Original Message -----
> >> Fujitsu has stand-alone dump mechanism based on firmware level
> >> functionality, which we call SADUMP, in short.
> >>
> >> We've maintained utility tools internally but now we're thinking that
> >> the best is crash utility and makedumpfile supports the sadump format
> >> for the viewpoint of both portability and maintainability.
> >>
> >> We'll be of course responsible for its maintenance in a continuous
> >> manner. The sadump dump format is very similar to diskdump format and
> >> so kdump (compressed) format, so we estimate patch set would be a
> >> relatively small size.
> >>
> >> Could you tell me whether crash utility and makedumpfile can support
> >> the sadump format? If OK, we'll start to make patchset.
I think it's not bad to support sadump by makedumpfile. However, I have
several questions.
- Do you want to use makedumpfile to shrink an existing file that sadump
has dumped?
- It isn't possible to support the same format as the kdump-compressed
format now, is it?
- When an extension of makedumpfile adds to the information that
makedumpfile reads from a note in /proc/vmcore (or from a header of the
kdump-compressed format), will you need to modify sadump?
Thanks
tachibana
> >
> > Sure, yes, the crash utility can always support another dumpfile format.
> >
>
> Thanks. It helps a lot.
>
> > It's unclear to me how similar SADUMP is to diskdump/compressed-kdump.
> > Does your internal version patch diskdump.c, or do you maintain your
> > own "sadump.c"? I ask because if your patchset is at all intrusive,
> > I'd prefer it be kept in its own file, primarily for maintainability,
> > but also because SADUMP is essentially a black-box to anybody outside
> > Fujitsu.
>
> What I meant when I used ``similar'' is both literally and
> logically. The format consists of diskdump header-like header, two
> kinds of bitmaps used for the same purpose as those in diskdump format,
> and memory data. They can be handled in common with the existing data
> structure, diskdump_data, non-intrusively, so I hope they are placed
> in diskdump.c.
>
> On the other hand, there's a code to be placed at such specific
> area. sadump is triggered depending on kdump's progress and so
> register values to be contained in vmcore varies according to the
> progress: If crash_notes has been initialized when sadump is
> triggered, sadump packs the register values in crash_notes; if not
> yet, packs registers gathered by firmware. This is sadump specific
> processing, so I think putting it in specific sadump.c file is a
> natural and reasonable choice.
>
> Anyway, I have not made any patch set for this. I'll post a patch set
> when I complete.
>
> Again, thanks a lot for the positive answer.
>
> Thanks.
> HATAYAMA, Daisuke
>
>
> _______________________________________________
> kexec mailing list
> kexec(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
1 year, 2 months
[ANNOUNCE] crash version 6.0.1 is available
by Dave Anderson
Download from: http://people.redhat.com/anderson
- Several fixes/updates for the 32-bit PPC architecture:
(1) Delete "__func__.<number>" symbols from the symbol list.
(2) Update manner of determining the processor speed displayed
by the initial system banner and the "sys" command.
(3) Use the kernel's online cpus mask for determining the cpu count.
(4) Enable the "bt" command to follow traces that start in a per-cpu
IRQ stack.
(5) Fix for the "bt" command to better prevent runaway stack traces.
(6) Fix for the "bt" command to recognize/display 2.6 kernel
exception frames.
(7) Update "bt" command's exception frame register display.
(8) Implement "bt -f" option.
(nakayama.ts(a)ncos.nec.co.jp)
- Fix for the X86 kernel module line-number capability on some kernels.
It is unclear why only some kernel versions exhibit this problem,
but the newly-embedded gdb version 7.3.1 has changed behaviour such
that the addrmap arrays of module text address blocks may contain
the module text offset values instead of their loaded vmalloc
addresses, and so without the patch, there is no "match" for the
vmalloc address when searching for its line number information.
It is fixed by doing a preliminary symbol search before accessing
the line-number access routine.
(anderson(a)redhat.com)
- Fix for the X86_64 kernel module line-number capability on kernels
that have functions preceded by the __vsyscall_fn macro, which
puts the kernel text function in the vsyscall page that starts
at virtual address 0xffffffffff600000. This results in a text
address block that starts at a normal kernel text address but
ends with a vsyscall address, which inadvertently contains the
whole vmalloc address range. Without the patch, line number
requests for module vmalloc text addresses would be mistakenly
issued the first text section that ended with a vsyscall address,
but then cannot find line number information in that section.
(anderson(a)redhat.com)
- Fix for the inadvertent patching of the symbols of the 32-bit Xen
hypervisor binary. Without the patch, during initialization the
minimal_symbols are "patched" with their original values, so they
remain unchanged, and the message "WARNING: kernel relocated [0MB]:
patching 3434 gdb minimal_symbol values" is displayed.
(anderson(a)redhat.com)
- If the "--mod <directory-tree>" command line option, or the
setting of the CRASH_MODULE_PATH environment variable, or the
"mod -S <directory-tree>" point to a tree that contains only the
separate debuginfo "<module>.ko.debug" files, then those
debuginfo files will be used as the internal "add-symbol-file"
arguments to the embedded gdb module. Without the patch, it was
only acceptable to point to a directory tree that contained the
base "<module>.ko" files, and the separate debuginfo files
were found automatically based upon the directory path to the
base module file. This will allow an alternate module-debuginfo
directory tree to be set up like so:
# cd <directory>
# rpm2cpio kernel-debuginfo-<release>.rpm | cpio -idv
Having done that, the <directory> may be used with the "--mod"
command line argument, or as the CRASH_MODULE_PATH environment
variable, or as the "mod -S <directory>" argument.
(anderson(a)redhat.com)
- Make the suspension of the verbose/time-consuming "sym -l" output
immediate upon the killing of the output pipe, or the entry of the
first CTRL-c. Without the patch, it would typically take several
seconds, or multiple CTRL-c entries, for the "crash>" prompt to be
re-displayed.
(anderson(a)redhat.com)
- Fix for the handling of piped commands if the command receiving
the crash output is non-existent or invalid. Without the patch,
the crash command would wait indefinitely unless multiple CTRL-c
entries were entered.
(anderson(a)redhat.com)
- Fix for the s390x "bt" command's floating point register display
header. Without the patch, the header indicates that only registers
0, 2, 4 and 6 are printed, a relic of the s390 architecture, whereas
on the s390x all floating point registers are displayed.
(holzheu(a)linux.vnet.ibm.com)
- Fix for the error message displayed when an untrusted .gdbinit file
exists in the current directory. Without the patch, the error
message "WARNING: not using untrusted file: " would be followed by
garbage ASCII data instead of the full pathname of the .gdbinit file.
(anderson(a)redhat.com)
- Fix for the "kmem -p" and "kmem -i" commands in 3.1 and later kernels
where the page structure's "_count" member was moved into an embedded
anonymous structure. Without the patch, the commands fail with the
error message "kmem: invalid structure member offset: page_count
FILE: memory.c LINE: 4610 FUNCTION: dump_mem_map_SPARSEMEM()".
(anderson(a)redhat.com)
- Allow the user to append data to the CFLAGS and LDFLAGS variables in
the top-level Makefile. The extra data should be put in files named
"CFLAGS.extra" and "LDFLAGS.extra" in the top-level directory; if
either or both files exist, the extra data within them will be
appended to the relevant variable. Typically the LDFLAGS.extra file
will contain "-l<library>" strings, and the CFLAGS.extra file will
contain "-D<value>" strings. This will allow the crash utility to
be built with optional libraries, and the code that references them
to be encapsulated with associated "#ifdef <value>" sections. The
extra CFLAGS data will also be passed to extension modules that are
built within the local "crash-<version>/extensions" subdirectory.
(anderson(a)redhat.com)
- The LDFLAGS setting in the Makefile can no longer be modified by
hand. It will be automatically configured by the "configure -b"
option, based upon the contents of the optional "LDFLAGS.extra" file.
(anderson(a)redhat.com)
- Fix for the "runq" command to display the runnable tasks that
are contained within a cgroup's task-group scheduling entity.
Without the patch, only scheduling entities that are individual
tasks get displayed, and runnable tasks in task-group scheduling
entities get skipped.
(d.hatayama(a)jp.fujitsu.com, anderson(a)redhat.com)
- Fix for the SIAL extension module when repeatedly loading and
unloading a sial script when a full pathname is specified for the
script. Without the patch, the 4th unload attempt generates a
segmentation violation.
(lmcilroy(a)redhat.com)
- Fix for the SIAL extension module to register the help and usage
functions for a command only when loading a script.
(lchouinard(a)s2sys.com)
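As an illustration of the CFLAGS.extra and LDFLAGS.extra item above, code that depends on an optional library could be guarded roughly as follows. This is a sketch under assumptions: the macro name LZO, the function decompress_lzo_page and the status codes are hypothetical and not taken from the crash sources. The only real library call is LZO's lzo1x_decompress_safe().

```c
#include <assert.h>
#include <stddef.h>

/* LZO is a hypothetical macro name; it would come from a "-DLZO" string in
 * CFLAGS.extra, with the matching "-llzo2" string in LDFLAGS.extra. */
#ifdef LZO
#include <lzo/lzo1x.h>
#endif

enum { PAGE_OK = 0, PAGE_UNSUPPORTED = -1, PAGE_ERROR = -2 };

/* Decompress one LZO-compressed page into dst. On entry *dst_len is the
 * capacity of dst; on success it is updated to the decompressed length.
 * A real caller would also run lzo_init() once at startup. */
int decompress_lzo_page(const unsigned char *src, size_t src_len,
                        unsigned char *dst, size_t *dst_len)
{
    assert(src != NULL && dst != NULL && dst_len != NULL);
#ifdef LZO
    {
        lzo_uint out_len = (lzo_uint)*dst_len;

        if (lzo1x_decompress_safe(src, (lzo_uint)src_len, dst,
                                  &out_len, NULL) != LZO_E_OK)
            return PAGE_ERROR;
        *dst_len = (size_t)out_len;
        return PAGE_OK;
    }
#else
    /* Built without the library: report that instead of failing to link. */
    (void)src_len;
    return PAGE_UNSUPPORTED;
#endif
}
```

Built without the two extra files, the function compiles to the PAGE_UNSUPPORTED branch and needs no LZO headers, which is the situation on a machine where the library is not installed.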
13 years, 1 month
Re: [Crash-utility] Fix use after free in sial variable lists
by Dave Anderson
----- Original Message -----
>
>
> Lachlan - thanks for chasing this down.
> Your fix will work and is safe. The real problem is that we call the
> help and usage functions for the associated command even if we are
> doing an unload. We should really only call these during a load.
>
> Can you try the patch below against your test case?
Lachlan is in Melbourne and most likely will not be responding until
tomorrow.
But anyway, I'll also queue this patch for crash-6.0.1 -- which I'd
hoped to get out today, but I'll defer it until tomorrow.
Thanks,
Dave
>
> --- crash-6.0.0/extensions/sial.c 2011-10-25 10:58:15.000000000 -0700
> +++ crash-6.0.0.new/extensions/sial.c 2011-11-29 05:59:54.552190994 -0800
> @@ -919,25 +919,26 @@
> if(!help) return;
> snprintf(fname, sizeof(fname), "%s_help", name);
> if(sial_chkfname(fname, 0)) {
> - help_str=sial_strdup((char*)(unsigned long)sial_exefunc(fname, 0));
> snprintf(fname, sizeof(fname), "%s_usage", name);
> if(sial_chkfname(fname, 0)) {
> if(load) {
> opt_str=sial_strdup((char*)(unsigned long)sial_exefunc(fname, 0));
> + snprintf(fname, sizeof(fname), "%s_help", name);
> + help_str=sial_strdup((char*)(unsigned long)sial_exefunc(fname, 0));
> help[0]=sial_strdup(name);
> help[1]="";
> help[2]=sial_strdup(opt_str);
> help[3]=sial_strdup(help_str);
> help[4]=0;
> add_sial_cmd(name, run_callback, help, 0);
> + sial_free(help_str);
> + sial_free(opt_str);
> return;
> }
> else rm_sial_cmd(name);
> }
> - sial_free(help_str);
> }
> free(help);
> - return;
> }
>
> /*
>
> -----Original Message-----
> From: Dave Anderson [ mailto:anderson@redhat.com ]
> Sent: Tue 11/29/2011 8:55 AM
> To: Lachlan McIlroy; Discussion list for crash utility usage,
> maintenance and development
> Cc: Luc Chouinard
> Subject: Re: [Crash-utility] Fix use after free in sial variable
> lists
>
>
>
> ----- Original Message -----
> >
> > I encountered a segfault in the sial module when repeatedly loading
> > and unloading
> > a sial script. The bug is repeatable and it always segfaults on the
> > 4th unload.
> > The bug only triggers if a pathname is specified for the sial
> > script too (just
> > doing 'load test.sial, unload test.sial, ...' doesn't trigger the
> > problem).
> >
> > The cause of the problem is that sial_freefile() will free the
> > memory used by the
> > static and global variable lists but leaves the stale pointer in
> > the fdata object.
> > This stale pointer is then assumed to be allocated later by
> > sial_inivars().
> >
> > I've included a patch below to NULL out the static and global
> > variable list
> > pointers after they are deallocated and this fixes the problem for
> > me although
> > I'm not totally sure it's the best way to fix this.
>
> Makes perfect sense to me -- unless Luc objects or has a better idea,
> consider it queued for crash-6.0.1.
>
> Thanks,
> Dave
>
> >
> > Lachlan
> >
> > ...
> > crash> extend /home/lmcilroy/bin/sial.so
> > Core LINUX_RELEASE == '2.6.18-238.12.1.el5'
> > < Sial interpreter version 3.0 >
> > Loading sial commands from
> > /usr/share/sial/crash:/home/lmcilroy/.sial .... Done.
> > /home/lmcilroy/bin/sial.so: shared object loaded
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > crash> load very_long_directory_name/test.sial
> > crash> unload very_long_directory_name/test.sial
> > Segmentation fault
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > sial_inivars (sv=0x2cf4488) at sial_var.c:549
> > 549 if(!v->ini && v->dv && v->dv->init) {
> > Missing separate debuginfos, use: debuginfo-install
> > crash-4.0.9-2.fc12.x86_64
> > (gdb) bt
> > #0 sial_inivars (sv=0x2cf4488) at sial_var.c:549
> > #1 0x00007fffed6b1d24 in sial_addsvs (type=1, sv=<value optimized
> > out>) at sial_var.c:821
> > #2 0x00007fffed69f07d in sial_execmcfunc (f=<value optimized out>,
> > vp=<value optimized out>) at sial_func.c:900
> > #3 0x00007fffed6a026b in sial_exefunc (fname=0x7fffffffdc90
> > "cciss_help", vp=0x0) at sial_func.c:968
> > #4 0x00007fffed69cac2 in reg_callback (name=0x2c30538 "cciss",
> > load=0) at sial.c:922
> > #5 0x00007fffed69f72f in sial_docallback (fd=0x2f2ed58) at
> > sial_func.c:264
> > #6 sial_freefile (fd=0x2f2ed58) at sial_func.c:288
> > #7 0x00007fffed69fe2a in sial_deletefile (name=0x7885a28
> > "very_long_directory_name/test.sial") at sial_func.c:314
> > #8 0x00007fffed6a5ce6 in sial_loadunload (load=0, name=<value
> > optimized out>, silent=0) at sial_api.c:1289
> > #9 0x00007fffed69c77d in unload_cmd () at sial.c:775
> > #10 0x000000000045d334 in exec_command () at main.c:740
> > #11 0x000000000045d57a in main_loop () at main.c:699
> > #12 0x0000000000554d19 in captured_command_loop (data=<value
> > optimized out>) at ./main.c:228
> > #13 0x0000000000552feb in catch_errors (func=<value optimized out>,
> > func_args=<value optimized out>, errstring=<value optimized out>,
> > mask=<value optimized out>) at exceptions.c:531
> > #14 0x0000000000554a26 in captured_main (data=<value optimized
> > out>)
> > at ./main.c:958
> > #15 0x0000000000552feb in catch_errors (func=<value optimized out>,
> > func_args=<value optimized out>, errstring=<value optimized out>,
> > mask=<value optimized out>) at exceptions.c:531
> > #16 0x0000000000553be4 in gdb_main (args=0x2cf4488) at ./main.c:973
> > #17 0x0000000000553c1e in gdb_main_entry (argc=<value optimized
> > out>,
> > argv=<value optimized out>) at ./main.c:993
> > #18 0x000000000045e0df in main (argc=<value optimized out>,
> > argv=<value optimized out>) at main.c:603
> > (gdb) p sv
> > $1 = (var_t *) 0x2cf4488
> > (gdb) p sv->next
> > $2 = (struct var_s *) 0x7463657269645f67
> >
> > 0x7463657269645f67 = "g-direct" which is part of the sial pathname
> > (with the
> > underscore converted) so the memory has been reallocated.
> >
> >
> > diff -up crash-6.0.0/extensions/libsial/sial_func.c.orig
> > crash-6.0.0/extensions/libsial/sial_func.c
> > --- crash-6.0.0/extensions/libsial/sial_func.c.orig 2011-11-29
> > 13:09:43.985631958 +1100
> > +++ crash-6.0.0/extensions/libsial/sial_func.c 2011-11-29
> > 13:10:31.930603477 +1100
> > @@ -280,8 +280,14 @@ sial_freefile(fdata *fd)
> > }
> >
> > /* free the associated static and global variables */
> > - if(fd->fsvs) sial_freesvs(fd->fsvs);
> > - if(fd->fgvs) sial_freesvs(fd->fgvs);
> > + if(fd->fsvs) {
> > + sial_freesvs(fd->fsvs);
> > + fd->fsvs = NULL;
> > + }
> > + if(fd->fgvs) {
> > + sial_freesvs(fd->fgvs);
> > + fd->fgvs = NULL;
> > + }
> >
> > /* free all function nodes */
> > // let debugger know ...
> >
> > --
> > Crash-utility mailing list
> > Crash-utility(a)redhat.com
> > https://www.redhat.com/mailman/listinfo/crash-utility
> >
>
>
>
>
13 years, 1 month
Fix use after free in sial variable lists
by Lachlan McIlroy
I encountered a segfault in the sial module when repeatedly loading and unloading
a sial script. The bug is repeatable and it always segfaults on the 4th unload.
The bug only triggers if a pathname is specified for the sial script too (just
doing 'load test.sial, unload test.sial, ...' doesn't trigger the problem).
The cause of the problem is that sial_freefile() will free the memory used by the
static and global variable lists but leaves the stale pointer in the fdata object.
This stale pointer is then assumed to be allocated later by sial_inivars().
I've included a patch below to NULL out the static and global variable list
pointers after they are deallocated and this fixes the problem for me although
I'm not totally sure it's the best way to fix this.
Lachlan
...
crash> extend /home/lmcilroy/bin/sial.so
Core LINUX_RELEASE == '2.6.18-238.12.1.el5'
< Sial interpreter version 3.0 >
Loading sial commands from /usr/share/sial/crash:/home/lmcilroy/.sial .... Done.
/home/lmcilroy/bin/sial.so: shared object loaded
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
crash> load very_long_directory_name/test.sial
crash> unload very_long_directory_name/test.sial
Segmentation fault
Program received signal SIGSEGV, Segmentation fault.
sial_inivars (sv=0x2cf4488) at sial_var.c:549
549 if(!v->ini && v->dv && v->dv->init) {
Missing separate debuginfos, use: debuginfo-install crash-4.0.9-2.fc12.x86_64
(gdb) bt
#0 sial_inivars (sv=0x2cf4488) at sial_var.c:549
#1 0x00007fffed6b1d24 in sial_addsvs (type=1, sv=<value optimized out>) at sial_var.c:821
#2 0x00007fffed69f07d in sial_execmcfunc (f=<value optimized out>, vp=<value optimized out>) at sial_func.c:900
#3 0x00007fffed6a026b in sial_exefunc (fname=0x7fffffffdc90 "cciss_help", vp=0x0) at sial_func.c:968
#4 0x00007fffed69cac2 in reg_callback (name=0x2c30538 "cciss", load=0) at sial.c:922
#5 0x00007fffed69f72f in sial_docallback (fd=0x2f2ed58) at sial_func.c:264
#6 sial_freefile (fd=0x2f2ed58) at sial_func.c:288
#7 0x00007fffed69fe2a in sial_deletefile (name=0x7885a28 "very_long_directory_name/test.sial") at sial_func.c:314
#8 0x00007fffed6a5ce6 in sial_loadunload (load=0, name=<value optimized out>, silent=0) at sial_api.c:1289
#9 0x00007fffed69c77d in unload_cmd () at sial.c:775
#10 0x000000000045d334 in exec_command () at main.c:740
#11 0x000000000045d57a in main_loop () at main.c:699
#12 0x0000000000554d19 in captured_command_loop (data=<value optimized out>) at ./main.c:228
#13 0x0000000000552feb in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:531
#14 0x0000000000554a26 in captured_main (data=<value optimized out>) at ./main.c:958
#15 0x0000000000552feb in catch_errors (func=<value optimized out>, func_args=<value optimized out>, errstring=<value optimized out>, mask=<value optimized out>) at exceptions.c:531
#16 0x0000000000553be4 in gdb_main (args=0x2cf4488) at ./main.c:973
#17 0x0000000000553c1e in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at ./main.c:993
#18 0x000000000045e0df in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:603
(gdb) p sv
$1 = (var_t *) 0x2cf4488
(gdb) p sv->next
$2 = (struct var_s *) 0x7463657269645f67
0x7463657269645f67 = "g-direct" which is part of the sial pathname (with the
underscore converted) so the memory has been reallocated.
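Reading a suspicious pointer value as text is a quick way to spot memory that has been reused for string data. The helper below is a minimal sketch (pointer_as_ascii is a hypothetical name) that lists the eight bytes of a 64-bit value in the order they sit in memory on a little-endian machine such as x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the 8 bytes of value, least significant byte first, into out as a
 * NUL-terminated string. Non-printable bytes are shown as '.'.
 * out must have room for at least 9 characters. */
void pointer_as_ascii(uint64_t value, char *out)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (8 * i));

        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

For 0x7463657269645f67 the bytes come out as g, _, d, i, r, e, c, t (0x5f is the ASCII underscore), which is a slice of the directory name used in the load and unload commands.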
diff -up crash-6.0.0/extensions/libsial/sial_func.c.orig crash-6.0.0/extensions/libsial/sial_func.c
--- crash-6.0.0/extensions/libsial/sial_func.c.orig 2011-11-29 13:09:43.985631958 +1100
+++ crash-6.0.0/extensions/libsial/sial_func.c 2011-11-29 13:10:31.930603477 +1100
@@ -280,8 +280,14 @@ sial_freefile(fdata *fd)
}
/* free the associated static and global variables */
- if(fd->fsvs) sial_freesvs(fd->fsvs);
- if(fd->fgvs) sial_freesvs(fd->fgvs);
+ if(fd->fsvs) {
+ sial_freesvs(fd->fsvs);
+ fd->fsvs = NULL;
+ }
+ if(fd->fgvs) {
+ sial_freesvs(fd->fgvs);
+ fd->fgvs = NULL;
+ }
/* free all function nodes */
// let debugger know ...
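The idea behind the patch, clearing a pointer as soon as its target is freed, can be shown in isolation. The types and function names below are hypothetical stand-ins, not the sial definitions; only the shape of the fix is the same.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the sial variable list and file object. */
struct var_list {
    struct var_list *next;
    int value;
};

struct file_data {
    struct var_list *statics;  /* plays the role of fd->fsvs */
    struct var_list *globals;  /* plays the role of fd->fgvs */
};

/* Free every node of a singly linked list. */
void free_list(struct var_list *head)
{
    while (head) {
        struct var_list *next = head->next;

        free(head);
        head = next;
    }
}

/* Free a list through the pointer that owns it, then clear that pointer so
 * later code sees "no list" rather than a stale address. */
void release_list(struct var_list **slot)
{
    assert(slot != NULL);
    free_list(*slot);
    *slot = NULL;
}

void release_file_lists(struct file_data *fd)
{
    assert(fd != NULL);
    release_list(&fd->statics);
    release_list(&fd->globals);
}
```

Routing every free through release_list() means a second release, or a later check such as `if (fd->statics)`, behaves predictably instead of touching memory that may already belong to something else.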
13 years, 1 month
[PATCH v1 0/4] Kdump core analysis support for PPC440x
by Suzuki K. Poulose
The following series implements the kdump core analysis support
for PPC32. I have posted the KDUMP kernel support patches for PPC440x
here :
http://lists.ozlabs.org/pipermail/linuxppc-dev/2011-November/093933.html
You need upstream git snapshot of kexec-tools for kdump support on PPC440x.
Patches are based on crash-6.0.0
---
Suzuki K. Poulose (4):
[ppc] Enable stack frame display for KDUMP
[ppc][netdump] Read registers from ELF note
[ppc] Function to print the PPC register set
Support PPC32 Core analysis on PPC64 Host
configure.c | 11 ++++
netdump.c | 77 +++++++++++++++++++++++++++++
ppc.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++------------
3 files changed, 211 insertions(+), 33 deletions(-)
--
Suzuki
13 years, 2 months
crash sometimes doesn't terminate, loops forever looking for a process that doesn't exist
by Adrien Kunysz
Dear crash-utility,
In our vmcore analysis infrastructure we stumbled on a case where crash
doesn't terminate. When examining the state of the process with gdb
it seems to be looping forever over /proc/$pid/stat in an attempt to determine
the PID of a process that doesn't exist any more.
The backtrace:
(gdb) bt
#0 0x00007fb9814fea57 in munmap () from /lib/libc.so.6
#1 0x00007fb9814a30aa in _IO_setb () from /lib/libc.so.6
#2 0x00007fb9814a1d18 in _IO_file_close_it () from /lib/libc.so.6
#3 0x00007fb981495a48 in fclose () from /lib/libc.so.6
#4 0x00000000004fe75b in output_command_to_pids () at cmdline.c:775
#5 0x00000000004fed7c in setup_redirect (origin=1) at cmdline.c:519
#6 0x00000000005012bb in process_command_line () at cmdline.c:149
#7 0x000000000045f575 in main_loop () at main.c:610
#8 0x0000000000541af9 in captured_command_loop (data=0x7fb982282000)
at ./main.c:226
#9 0x000000000053fd8b in catch_errors (func=<value optimized out>,
func_args=<value optimized out>, errstring=<value optimized out>,
mask=<value optimized out>) at exceptions.c:520
#10 0x00000000005415b6 in captured_main (data=<value optimized out>)
at ./main.c:924
#11 0x000000000053fd8b in catch_errors (func=<value optimized out>,
func_args=<value optimized out>, errstring=<value optimized out>,
mask=<value optimized out>) at exceptions.c:520
#12 0x0000000000540994 in gdb_main (args=0x1000) at ./main.c:939
#13 0x00000000005409ce in gdb_main_entry (argc=<value optimized out>,
argv=0x1000) at ./main.c:959
#14 0x000000000046025a in main (argc=<value optimized out>,
argv=<value optimized out>) at main.c:525
The problematic code:
720 /*
721  * Determine the pids of the current popen'd shell and output command.
722  * This is all done using /proc; the ps kludge at the bottom of this
723  * routine is legacy, and should only get executed if /proc doesn't exist.
724  */
725 static int
726 output_command_to_pids(void)
727 {
...
738     int retries;
739
740     retries = 0;
741     pc->pipe_pid = pc->pipe_shell_pid = 0;
742     sprintf(lookfor, "(%s)", pc->pipe_command);
743     stall(1000);
744 retry:
745     if (is_directory("/proc") && (dirp = opendir("/proc"))) {
746         for (dp = readdir(dirp); dp && !pc->pipe_pid;
747              dp = readdir(dirp)) {
748             if (!decimal(dp->d_name, 0))
749                 continue;
750             sprintf(buf1, "/proc/%s/stat", dp->d_name);
751             if (file_exists(buf1, NULL) &&
752                 (stp = fopen(buf1, "r"))) {
753                 if (fgets(buf2, BUFSIZE, stp)) {
754                     pid = strtok(buf2, " ");
755                     name = strtok(NULL, " ");
756                     status = strtok(NULL, " ");
757                     p_pid = strtok(NULL, " ");
758                     pgrp = strtok(NULL, " ");
759                     if (STREQ(name, "(sh)") &&
760                         (atoi(p_pid) == getpid()))
761                         pc->pipe_shell_pid = atoi(pid);
762                     if (STREQ(name, lookfor) &&
763                         ((atoi(p_pid) == getpid()) ||
764                          (atoi(p_pid) == pc->pipe_shell_pid)
765                          || (atoi(pgrp) == getpid()))) {
766                         pc->pipe_pid = atoi(pid);
767                         console(
768                             "FOUND[%d] (%d->%d->%d) %s %s p_pid: %s pgrp: %s\n",
769                             retries, getpid(),
770                             pc->pipe_shell_pid,
771                             pc->pipe_pid,
772                             name, status, p_pid, pgrp);
773                     }
774                 }
775                 fclose(stp);
776             }
777         }
778         closedir(dirp);
779     }
780
781     if (!pc->pipe_pid && ((retries++ < 10) || pc->pipe_shell_pid)) {
782         stall(1000);
783         goto retry;
784     }
Looking at how many times it has been looping over /proc:
(gdb) p retries
$19 = 138056108
It found the PID of the shell but not of the command:
(gdb) p pc->pipe_shell_pid
$20 = 9306
(gdb) p pc->pipe_pid
$21 = 0
For completeness, the command being run looked like this:
(gdb) p pc->orig_line
$26 = "log | grep -A1 'some string' >> /some/file", '\000' [...]
So it seems something like this happened:
+>popen(grep)
+--> fork(); execve(sh)
+---> fork(); execve(grep)
+----> grep exit()s for some reason
+>crash(8) finds sh in /proc
+---> sh exit
+>crash(8) keeps looking for grep in /proc
I have a second core showing a similar situation if that's of any use but
now we just work around the problem by wrapping crash(8) within timeout(1).
We could try and fix that function to bail out when the shell exits
but it really doesn't look like a nice way to do it to me. So I looked
at the reasons we want the PID of that command and it seems there are
two:
* determining whether the process is still alive
This can be done by checking whether the intervening shell is still alive.
Obtaining only the PID of the shell seems less problematic than trying to
get the PID of the grandchildren. At worst reimplementing popen()
to store the PID of sh is not exactly hard.
* terminating the process forcibly (SIGKILL)
This is done in close_output() which is only called from within restart() when
its argument is not SIGSEGV, SIGPIPE, SIGINT or 0. I cannot find that function
being set as a signal handler for anything else or being called with an
argument different from 0. As far as I can tell this is dead code.
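To make the first point concrete, here is a minimal Linux-only sketch of checking whether the intervening shell is still running, plus a retry test that stops once it is not. It is not a proposed patch: proc_state, pid_running and should_rescan are hypothetical names. The state letter is read from /proc/<pid>/stat rather than probed with kill(pid, 0), because a popen'd shell that has exited stays around as a zombie until pclose() reaps it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>  /* getpid(), for callers checking their own process */

/* Return the state letter from /proc/<pid>/stat ('R', 'S', 'Z', ...), or 0
 * if the file cannot be read, typically because the process is gone. */
char proc_state(pid_t pid)
{
    char path[64], line[512], *p;
    char state = 0;
    FILE *fp;

    if (pid <= 0)
        return 0;
    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (!fp)
        return 0;
    if (fgets(line, sizeof(line), fp)) {
        /* The command name is parenthesised and may contain spaces, so
         * find the last ')' and take the field that follows it. */
        p = strrchr(line, ')');
        if (p && p[1] == ' ' && p[2] != '\0')
            state = p[2];
    }
    fclose(fp);
    return state;
}

/* A process counts as running unless it is missing, a zombie or dead. */
int pid_running(pid_t pid)
{
    char state = proc_state(pid);

    return state != 0 && state != 'Z' && state != 'X';
}

/* Decide whether to scan /proc again: stop once the command has been found,
 * allow the first ten attempts unconditionally, and after that continue
 * only while the shell that should have started the command still runs. */
int should_rescan(pid_t pipe_pid, pid_t shell_pid, int retries)
{
    if (pipe_pid != 0)
        return 0;
    if (retries < 10)
        return 1;
    return pid_running(shell_pid);
}
```

For example, pid_running(getpid()) is 1 for the calling process. With the values from the gdb session (pipe_pid 0, shell pid 9306, a very large retry count), should_rescan() would return 0 as soon as the shell has exited, instead of looping for as long as pipe_shell_pid is non-zero.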
Before I write/test/submit a patch, am I missing something or would it make
sense to get rid of that pipe_pid?
Thanks,
Adrien
13 years, 2 months
[PATCH] s390: Fix heading for s390x floating point registers
by Michael Holzheu
Hello Dave,
The heading for floating point registers that are printed for active
tasks is wrong. It says that only registers 0, 2, 4 and 6 are printed.
This is a relic from s390 (31 bit) times. On s390x (64 bit)
we have all floating point registers. With this patch the correct
heading is printed.
Signed-off-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
---
s390x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/s390x.c
+++ b/s390x.c
@@ -1196,7 +1196,7 @@ s390x_print_lowcore(char* lc, struct bt_
fprintf(fp," %#018lx %#018lx\n", tmp[2],tmp[3]);
ptr = lc + MEMBER_OFFSET("_lowcore","floating_pt_save_area");
- fprintf(fp," -floating point registers 0,2,4,6:\n");
+ fprintf(fp," -floating point registers:\n");
tmp[0]=ULONG(ptr);
tmp[1]=ULONG(ptr + S390X_WORD_SIZE);
tmp[2]=ULONG(ptr + 2 * S390X_WORD_SIZE);
13 years, 2 months
Fix buildrequires in RPM spec file to build sial extension.
by Wade Mealing
Gday,
I noticed that the documentation on the extensions page ( http://people.redhat.com/anderson/extensions.html ) mentions that the sial scripts should be built by default when crash is built. In the current release (version 6.0.0) the sial scripts are not being built due to missing BuildRequires dependencies in the spec file, in my case specifically flex and bison.
Attached is a modification to the crash.spec file to include the correct BuildRequires.
Thanks,
Wade Mealing
13 years, 2 months
Loading debuginfo symbols
by Lachlan McIlroy
Hi,
I would like to be able to load symbols from a specific debuginfo directory using
'mod -S <dir>'. This works if you unpack the corresponding kernel rpm too but I'd
like to be able to save on the disk space by just using the debuginfo rpm. This
currently doesn't work because crash doesn't explicitly search for modules ending
in .ko.debug. The simple patch below adds this support. Is this a sensible change
to make or is there a better way to do this?
Lachlan
diff -up crash-5.1.9/kernel.c.orig crash-5.1.9/kernel.c
--- crash-5.1.9/kernel.c.orig 2011-11-08 14:15:51.467399576 +1100
+++ crash-5.1.9/kernel.c 2011-11-09 14:26:23.264253341 +1100
@@ -3588,7 +3588,10 @@ module_objfile_search(char *modref, char
{
case KMOD_V2:
sprintf(file, "%s.ko", modref);
- retbuf = search_directory_tree(tree, file, 1);
+ if (!(retbuf = search_directory_tree(tree, file, 1))) {
+ sprintf(file, "%s.ko.debug", modref);
+ retbuf = search_directory_tree(tree, file, 1);
+ }
}
}
return retbuf;
@@ -3605,7 +3608,10 @@ module_objfile_search(char *modref, char
{
case KMOD_V2:
sprintf(file, "%s.ko", modref);
- retbuf = search_directory_tree(dir, file, 0);
+ if (!(retbuf = search_directory_tree(dir, file, 0))) {
+ sprintf(file, "%s.ko.debug", modref);
+ retbuf = search_directory_tree(dir, file, 0);
+ }
}
}
}
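The lookup order the patch introduces, "<module>.ko" first and "<module>.ko.debug" as a fallback, can be sketched on its own. This is a simplified illustration: find_module_object and its return codes are hypothetical, and it checks a single directory with access() instead of walking a tree the way search_directory_tree() does.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

enum { MODULE_NOT_FOUND = -1, MODULE_KO = 0, MODULE_DEBUGINFO = 1 };

/* Look in dir for "<module>.ko", then for "<module>.ko.debug". On success
 * the matching path is left in path and the kind of file is returned;
 * otherwise path is emptied and MODULE_NOT_FOUND is returned. */
int find_module_object(const char *dir, const char *module,
                       char *path, size_t path_len)
{
    static const char *const suffixes[] = { ".ko", ".ko.debug" };
    size_t i;

    assert(dir != NULL && module != NULL && path != NULL);
    for (i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
        int n = snprintf(path, path_len, "%s/%s%s", dir, module, suffixes[i]);

        if (n < 0 || (size_t)n >= path_len)
            continue;  /* candidate path would be truncated, skip it */
        if (access(path, R_OK) == 0)
            return (int)i;  /* index matches the enum values above */
    }
    if (path_len > 0)
        path[0] = '\0';
    return MODULE_NOT_FOUND;
}
```

Because the plain ".ko" name is tried first, a tree that holds both files behaves as before, and only a debuginfo-only tree reaches the fallback.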
13 years, 2 months