[RFC] makedumpfile, crash: LZO compression support
by HATAYAMA Daisuke
Hello,
This is an RFC patch set that adds LZO compression support to
makedumpfile and the crash utility. LZO produces output of comparable
size to ZLIB but compresses far faster, which reduces downtime during
crash dump generation and refiltering.
How to build:
1. Get the LZO library, which is provided as the lzo-devel package on recent
Linux distributions and is also available from the author's website:
http://www.oberhumer.com/opensource/lzo/.
2. Apply the patch set to makedumpfile v1.4.0 and crash v6.0.0.
3. Build both using make. For crash, for now, build as follows:
$ make CFLAGS="-llzo2"
How to use:
This patch set introduces a new -l option for LZO compression. For
example:
$ makedumpfile -l vmcore dumpfile
$ crash vmlinux dumpfile
Request for a configure-like feature in the crash utility:
I would like the crash utility to have a configure-like feature that lets
users select at build time whether to include LZO support, that is:
./configure --enable-lzo or ./configure --disable-lzo.
The reason is that support staff often download and install the
latest version of the crash utility on machines where the lzo library is
not available.
Looking at the source code, it appears that crash does its own kind
of configuration processing, around configure.c,
and I guess it would be difficult to use the autoconf tools directly.
Or is there a better way?
Performance Comparison:
Sample Data
Ideally, I would have measured performance on a sufficient number of
vmcores generated from machines that were actually running, but I
don't have enough sample vmcores at the moment. So this
comparison doesn't answer the question of I/O time improvement. This
is a TODO for now.
Instead, I chose the worst and best cases with regard to compression
ratio and speed only. Specifically, the former is /dev/urandom and
the latter is /dev/zero.
I generated sample data of 10MB, 100MB and 1GB like this:
$ dd bs=4096 count=$((1024*1024*1024/4096)) if=/dev/urandom of=urandom.1GB
How to measure
Then I compressed the data one 4096-byte block at a time and
measured the total compression time and output size. See attached
mycompress.c.
Result
See attached file result.txt.
Discussion
For both kinds of data, lzo's compression was considerably quicker
than zlib's. lzo's compression time is about 37% of zlib's for
urandom data, and about 8.5% for zero data. The actual contents of
physical memory would fall between the two cases, so I guess the
average compression time ratio is between 8.5% and 37%.
Although beyond the scope of this patch set, we can estimate the
worst-case compression time for larger data sizes, since compression
is performed block by block and the compression time increases
linearly. The estimated worst-case time for 2TB of memory is about
15 hours for lzo and about 40 hours for zlib. In this case the
compressed data is larger than the original, so it is not actually
used and the compression time is entirely wasted. I think compression
must be done in parallel, and I'll post such a patch later.
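The linear estimate in the discussion above can be sketched as follows. This is only an illustration: estimate_seconds and its parameters are hypothetical names, and the sample measurement is meant to come from a file such as result.txt, which is not reproduced here.

```c
#include <stdint.h>

/*
 * Linear extrapolation of compression time. Assumes, as the discussion
 * does, that cost grows in proportion to the number of fixed-size
 * blocks. sample_seconds and sample_bytes are placeholders for one
 * measurement; they are not values taken from the post.
 */
static double estimate_seconds(double sample_seconds,
                               uint64_t sample_bytes,
                               uint64_t target_bytes)
{
    if (sample_bytes == 0)
        return 0.0;             /* no measurement to scale from */
    return sample_seconds * ((double)target_bytes / (double)sample_bytes);
}
```

Read the other way round, and taking 2TB as 2048GB, the quoted figures of 15 and 40 hours correspond to roughly 26 seconds per GB for lzo and roughly 70 seconds per GB for zlib on urandom data.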
Diffstat
* makedumpfile
diskdump_mod.h | 3 +-
makedumpfile.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++------
makedumpfile.h | 12 +++++++
3 files changed, 101 insertions(+), 12 deletions(-)
* crash
defs.h | 1 +
diskdump.c | 20 +++++++++++++++++++-
diskdump.h | 3 ++-
3 files changed, 22 insertions(+), 2 deletions(-)
TODO
* evaluation including I/O time using actual vmcores
Thanks.
HATAYAMA, Daisuke
1 year, 1 month
Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
by tachibana@mxm.nes.nec.co.jp
Hi Hatayama-san,
On 2011/06/29 12:12:18 +0900, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
> Date: Tue, 28 Jun 2011 08:57:42 -0400 (EDT)
>
> >
> >
> > ----- Original Message -----
> >> Fujitsu has stand-alone dump mechanism based on firmware level
> >> functionality, which we call SADUMP, in short.
> >>
> >> We've maintained utility tools internally but now we're thinking that
> >> the best approach is for crash utility and makedumpfile to support the
> >> sadump format, from the viewpoint of both portability and maintainability.
> >>
> >> We will of course be responsible for its ongoing maintenance.
> >> The sadump dump format is very similar to the diskdump format, and
> >> therefore to the kdump (compressed) format, so we estimate the patch set
> >> would be relatively small.
> >>
> >> Could you tell me whether crash utility and makedumpfile can support
> >> the sadump format? If OK, we'll start to make patchset.
I think supporting sadump in makedumpfile is reasonable. However, I have
several questions.
- Do you want to use makedumpfile to shrink an existing file that sadump
has dumped?
- It isn't possible to support the same form as kdump-compressed format
now, is it?
- When the information that makedumpfile reads from a note of /proc/vmcore
(or a header of kdump-compressed format) is added by an extension of
makedumpfile, do you need to modify sadump?
Thanks
tachibana
> >
> > Sure, yes, the crash utility can always support another dumpfile format.
> >
>
> Thanks. It helps a lot.
>
> > It's unclear to me how similar SADUMP is to diskdump/compressed-kdump.
> > Does your internal version patch diskdump.c, or do you maintain your
> > own "sadump.c"? I ask because if your patchset is at all intrusive,
> > I'd prefer it be kept in its own file, primarily for maintainability,
> > but also because SADUMP is essentially a black-box to anybody outside
> > Fujitsu.
>
> By ``similar'' I meant both literally and
> logically. The format consists of a diskdump-like header, two
> kinds of bitmaps used for the same purpose as those in the diskdump
> format, and memory data. They can be handled in common with the existing
> data structure, diskdump_data, non-intrusively, so I hope they can be
> placed in diskdump.c.
>
> On the other hand, there is also code that belongs in a
> sadump-specific place. sadump is triggered depending on kdump's
> progress, and so the register values contained in the vmcore vary
> according to that progress: if crash_notes has been initialized when
> sadump is triggered, sadump packs the register values in crash_notes;
> if not yet, it packs registers gathered by firmware. This is
> sadump-specific processing, so I think putting it in a separate
> sadump.c file is a natural and reasonable choice.
>
> Anyway, I have not made any patch set for this yet. I'll post a patch
> set when I complete it.
>
> Again, thanks a lot for the positive answer.
>
> Thanks.
> HATAYAMA, Daisuke
>
>
> _______________________________________________
> kexec mailing list
> kexec(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
1 year, 1 month
[ANNOUNCE] crash version 6.0.2 is available
by Dave Anderson
Download from: http://people.redhat.com/anderson
Changelog:
- Implementation of a new "arguments-input-file" feature, where an input
file containing crash command arguments may be iteratively fed to
a crash command. For each line of arguments in an input file, the
selected crash command will be executed. Taking a simple example,
consider a file named "input" which contains several task_struct
addresses:
crash> cat input
ffff88022bdc2080
ffff88012ae78ac0
ffff88012c334b00
ffff88012c335540
crash>
Each line in the input file may be passed to a crash command by
entering the redirection character followed by the filename:
crash> ps < input
PID PPID CPU TASK ST %MEM VSZ RSS COMM
5752 624 5 ffff88022bdc2080 IN 0.0 12340 2584 udevd
PID PPID CPU TASK ST %MEM VSZ RSS COMM
5779 4927 1 ffff88012ae78ac0 IN 0.0 97820 3916 sshd
PID PPID CPU TASK ST %MEM VSZ RSS COMM
5956 1 3 ffff88012c334b00 IN 0.0 27712 868 auditd
PID PPID CPU TASK ST %MEM VSZ RSS COMM
5784 5779 2 ffff88012c335540 IN 0.0 108392 1856 bash
crash> struct task_struct.pid,mm < input
pid = 5752
mm = 0xffff88022ab65100
pid = 5779
mm = 0xffff88012c272180
pid = 5956
mm = 0xffff88012b00f7c0
pid = 5784
mm = 0xffff88012ae30800
crash>
The input file may contain data containing anything that can be
inserted into a given crash command line. There is no restriction
on the number of arguments in each line; essentially the data in
each input file line will be inserted into the command line starting
where the "<" character is located, and any intervening whitespace
and the filename will be removed. However, because pipes and output
redirection are set up prior to the insertion of input file data,
pipe or redirection should not be put on input file lines. If that
is attempted, the arguments will just be passed to the command, with
unpredictable results. However, output can be piped or redirected
the same way as can be done with normal commands:
crash> set < input | grep -e COMMAND -e CPU
COMMAND: "udevd"
CPU: 5
COMMAND: "sshd"
CPU: 1
COMMAND: "auditd"
CPU: 3
COMMAND: "bash"
CPU: 2
crash>
Many thanks to Josef Bacik for proposing this feature.
(anderson(a)redhat.com)
- Fix for the "runq" command for kernels configured with
CONFIG_FAIR_GROUP_SCHED. Without the patch, it is possible
that a task may be listed twice in a cpu's CFS runqueue.
(d.hatayama(a)jp.fujitsu.com)
- Fix for the internal parse_line() function to properly handle the
case where the first argument in a line is a string argument that is
encapsulated with quotation marks.
(anderson(a)redhat.com)
- Fix for the usage of gzip'd vmlinux file that was compressed with
"gzip -n" or "gzip --no-name" without using "-f" on the command line.
Without the patch, the crash session fails with an error message that
indicates "crash: <string-containing-garbage>: compressed file name
does not start with vmlinux". With the patch, if such a file is used
without "-f", it will be accepted with a message that indicates that
the original filename is unknown, and a suggestion that "-f" be used
to prevent the message.
(anderson(a)redhat.com)
- Added a new "mod -g" option that enhances the symbol display for
kernel modules. After loading a module's debuginfo data, the module
object's section addresses will be shown as pseudo-symbols, like this
simple example using the crash memory driver module:
crash> mod -g -s crash
... [cut] ...
crash> sym -m crash
ffffffff88edb000 MODULE START: crash
ffffffff88edb000 [.text]: section start
ffffffff88edb000 (t) crash_llseek
ffffffff88edb01d (t) crash_read
ffffffff88edb18c [.text]: section end
ffffffff88edb18c [.exit.text]: section start
ffffffff88edb18c (T) cleanup_module
ffffffff88edb18c (t) crash_cleanup_module
ffffffff88edb198 [.exit.text]: section end
ffffffff88edb2a0 [__versions]: section start
ffffffff88edb2a0 (r) ____versions
ffffffff88edb2a0 (r) __versions
ffffffff88edb4a0 [__versions]: section end
ffffffff88edb920 [.data]: section start
ffffffff88edb920 (d) crash_dev
ffffffff88edb960 (d) crash_fops
ffffffff88edba48 [.data]: section end
ffffffff88edba80 [.gnu.linkonce.this_module]: section start
ffffffff88edba80 (D) __this_module
ffffffff88ee3c80 [.gnu.linkonce.this_module]: section end
ffffffff88ee3cc1 MODULE END: crash
crash>
The option may also be used in conjunction with "mod -S".
(nakayama.ts(a)ncos.nec.co.jp)
- Fix for the "gdb" command to prevent the option handling of command
lines. Without the patch, a gdb command string that contained a
"-<character>" pair preceded by whitespace, would fail with the
error message "gdb: gdb: invalid option -- <character>".
(anderson(a)redhat.com)
- Fix for the panic-task determination if a dumpfile is taken on a
system that actually has a cpu count that is equal to its per-arch
NR_CPUS value. Without the patch, the task running on the cpu
whose number is equal to NR_CPUS-1 would be selected.
(d.hatayama(a)jp.fujitsu.com)
- Fix for the x86_64 "bt" command to handle a recursive entry into
the NMI exception stack. While this should normally never happen,
it is possible if, for example, a kprobe is inserted into a function
that gets executed during NMI handling, and a second NMI is received
after the initial one, corrupting the original exception frame at
the top of the NMI stack. Without the patch, the NMI stack backtrace
and exception frame would be displayed repeatedly; with the patch,
the backtrace and exception frame are followed by the warning message
"NMI exception stack recursion: prior stack location overwritten".
(anderson(a)redhat.com)
- Support dumpfiles that are created by the PPC64 Firmware Assisted
Dump facility, also known as "fadump" or "FAD". Without the patch,
the panic task cannot be determined from a fadump vmcore which was
subsequently compressed with makedumpfile, and therefore a proper
backtrace of the panic task cannot be generated.
(mahesh(a)linux.vnet.ibm.com)
- Preparation for new s390x kernels that will increase MAX_PHYSMEM_BITS
from 42 to 46.
(mahesh(a)linux.vnet.ibm.com, holzheu(a)linux.vnet.ibm.com,
anderson(a)redhat.com)
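As an illustration of the "arguments-input-file" entry in the changelog above, the substitution it describes (each input file line replacing the "<" character and the filename) might be sketched like this. This is not crash's actual code: expand_input_line is a hypothetical helper, and it assumes any pipe or output redirection has already been handled and removed from the command line, as the changelog says happens before insertion.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not crash's implementation: build the command
 * line that results from substituting one input-file line for
 * "< filename". Everything before '<' is kept (trailing whitespace
 * trimmed), then the input line is appended after a single space.
 * Returns 0 on success, -1 if there is no '<' or the buffer is too small.
 */
static int expand_input_line(const char *cmd, const char *line,
                             char *out, size_t outsz)
{
    const char *lt = strchr(cmd, '<');
    size_t prefix;
    int n;

    if (lt == NULL)
        return -1;              /* no input redirection present */

    prefix = (size_t)(lt - cmd);
    while (prefix > 0 && (cmd[prefix - 1] == ' ' || cmd[prefix - 1] == '\t'))
        prefix--;               /* drop whitespace before '<' */

    n = snprintf(out, outsz, "%.*s %s", (int)prefix, cmd, line);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

With the command "ps < input" and the first line of the sample file, the helper produces "ps ffff88022bdc2080", which matches the first command that the changelog example effectively runs.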
12 years, 12 months
[PATCH v2 0/3] Kdump core analysis support for PPC32
by Suzuki K. Poulose
The following series implements the kdump core analysis support
for PPC32. I have posted the KDUMP kernel support patches for PPC440x
here :
http://lists.ozlabs.org/pipermail/linuxppc-dev/2011-December/094994.html
You need upstream git snapshot of kexec-tools for kdump support on PPC440x.
These patches are based on crash-6.0.2
---
Suzuki K. Poulose (3):
[ppc] Enable stack trace display for KDUMP cores
[ppc][netdump] Read register set from ELF Note
[ppc] Support PPC32 Core analysis on PPC64 host
configure.c | 14 ++++++++
netdump.c | 77 +++++++++++++++++++++++++++++++++++++++++++++
ppc.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
3 files changed, 178 insertions(+), 14 deletions(-)
--
Suzuki Poulose
13 years
[PATCH] Fix wrong memset size parameter in gdb-7.3.1/bfd/bfdio.c
by Ismail Dönmez
Hi;
clang caught this. Please apply.
P.S: Please CC me on your replies, I am not subscribed to the list.
Regards.
--
İsmail Dönmez - openSUSE Booster
SUSE LINUX Products GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
13 years
[PATCH] Crash-Utility: s390x: Auto-detect the correct MAX_PHYSMEM_BITS used in vmcore being analyzed.
by Mahesh J Salgaonkar
From: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
So far the s390x kernel has used 42 bits for MAX_PHYSMEM_BITS, which
supports a maximum of 4TB of memory. In order to support bigger systems,
newer s390x kernels will use 46 bits for MAX_PHYSMEM_BITS to support
a maximum of 64TB of memory.
This patch enhances the crash utility to auto-detect the correct value to
use for MAX_PHYSMEM_BITS by examining the size of the mem_section array in
the vmcore being analyzed.
Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
---
defs.h | 3 ++-
s390x.c | 26 +++++++++++++++++++++++++-
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/defs.h b/defs.h
index 381e8c2..eb992d1 100755
--- a/defs.h
+++ b/defs.h
@@ -2954,7 +2954,8 @@ struct efi_memory_desc_t {
#define TIF_SIGPENDING (2)
#define _SECTION_SIZE_BITS 28
-#define _MAX_PHYSMEM_BITS 42
+#define _MAX_PHYSMEM_BITS_OLD 42
+#define _MAX_PHYSMEM_BITS_NEW 46
#endif /* S390X */
diff --git a/s390x.c b/s390x.c
index 22e29a9..53bf272 100755
--- a/s390x.c
+++ b/s390x.c
@@ -282,6 +282,29 @@ static void s390x_process_elf_notes(void *note_ptr, unsigned long size_note)
}
}
+static int
+set_s390x_max_physmem_bits(void)
+{
+ int array_len = get_array_length("mem_section", NULL, 0);
+ /*
+ * The older s390x kernels uses _MAX_PHYSMEM_BITS as 42 and the
+ * newer kernels uses 46 bits.
+ */
+
+ STRUCT_SIZE_INIT(mem_section, "mem_section");
+ machdep->max_physmem_bits = _MAX_PHYSMEM_BITS_OLD;
+ if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
+ || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
+ return TRUE;
+
+ machdep->max_physmem_bits = _MAX_PHYSMEM_BITS_NEW;
+ if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
+ || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
+ return TRUE;
+
+ return FALSE;
+}
+
/*
* Do all necessary machine-specific setup here. This is called several
* times during initialization.
@@ -350,7 +373,8 @@ s390x_init(int when)
if (!machdep->hz)
machdep->hz = HZ;
machdep->section_size_bits = _SECTION_SIZE_BITS;
- machdep->max_physmem_bits = _MAX_PHYSMEM_BITS;
+ if (!set_s390x_max_physmem_bits())
+ error(FATAL, "Can't detect max_physmem_bits.");
s390x_offsets_init();
break;
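The arithmetic behind the 42-bit and 46-bit values in this patch can be checked with a short sketch. The helper names below are hypothetical, and the section count assumes the usual sparsemem relation of 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) sections, which is an assumption about what NR_MEM_SECTIONS() evaluates to rather than something shown in the patch.

```c
#include <stdint.h>

#define SECTION_SIZE_BITS 28    /* value of _SECTION_SIZE_BITS in the patch */

/* Maximum addressable physical memory, in bytes, for a given bit width. */
static uint64_t max_physmem_bytes(unsigned int max_physmem_bits)
{
    return (uint64_t)1 << max_physmem_bits;
}

/*
 * Number of memory sections, assuming
 * sections = 2^(max_physmem_bits - SECTION_SIZE_BITS).
 */
static uint64_t nr_mem_sections(unsigned int max_physmem_bits)
{
    return (uint64_t)1 << (max_physmem_bits - SECTION_SIZE_BITS);
}
```

Under that assumption, 42 bits gives 4TB and 16384 sections, while 46 bits gives 64TB and 262144 sections. The factor of 16 between the two section counts is what lets the length of the mem_section array distinguish old kernels from new ones.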
13 years
submission about PaX linux support
by Toshikazu Nakayama
Hello Dave,
I would like to submit a proposed patch set that adds support for
PaX Linux, introduced at http://grsecurity.net/, to the crash utility.
In the previous thread, you said that it is important not to increase the
maintenance burden of the current implementation.
With that in mind, I tried to keep the modifications to the current code
as small as possible in my merge work.
In reality, however, this work has several undesirable impacts.
So could you please review this patch set and make a decision?
(Details of the modifications are written in each patch file.)
Thanks,
Toshi
--------
Toshikazu Nakayama (9):
add PaX linux staff from linux-2.6.27.
setup PaX module structure members and pseudos
manufacture module's dumping symbol data
use IN_MODULE macros for ec->st_value
define new namespace command to sort by per module order
vefiry PaX module RW area, also fix leak
catch apt module symbol
sharpen vague module data with found out section
RW for lowest or highest module virtual address
defs.h | 42 +++++++++++-
kernel.c | 58 ++++++++++++++++-
symbols.c | 221 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------
3 files changed, 291 insertions(+), 30 deletions(-)
13 years
s390x MAX_PHYSMEM_BITS
by Dave Anderson
Hi Mahesh and Michael,
Re: this post from the kexec mailing list:
> Subject: [PATCH] makedumpfile: s390x: Auto-detect the correct
> MAX_PHYSMEM_BITS used in vmcore being analyzed.
> From: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
>
> So far the s390x kernel has used 42 bits for MAX_PHYSMEM_BITS, which
> supports a maximum of 4TB of memory. In order to support bigger systems,
> newer s390x kernels will use 46 bits for MAX_PHYSMEM_BITS to support
> a maximum of 64TB of memory.
>
> This patch auto-detects the correct value to use for MAX_PHYSMEM_BITS by
> examining the mem_section array size from the vmcore being analyzed.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
> ---
Can you guys post an s390x crash utility patch to properly select the
correct MAX_PHYSMEM_BITS value to store in machdep->max_physmem_bits?
Thanks,
Dave
13 years
[PATCH] Fix junk values when run crash on a .gz file
by Aruna Balakrishnaiah
Signed-off-by: Aruna Balakrishnaiah <aruna(a)linux.vnet.ibm.com>
---
symbols.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/symbols.c b/symbols.c
index 0cd3a01..acd8ad5 100755
--- a/symbols.c
+++ b/symbols.c
@@ -2995,10 +2995,10 @@ is_compressed_kernel(char *file, char **tmp)
type = 0;
if ((header[0] == 0x1f) && (header[1] == 0x8b) && (header[2] == 8)) {
- if (!STRNEQ((char *)&header[10], "vmlinux") &&
+ if (!STRNEQ(basename(file), "vmlinux") &&
!(st->flags & FORCE_DEBUGINFO)) {
error(INFO, "%s: compressed file name does not "
- "start with \"vmlinux\"\n", &header[10]);
+ "start with \"vmlinux\"\n", file);
error(CONT,
"Use \"-f %s\" on command line to override.\n\n",
file);
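For background on why the bytes at header[10] can be junk: in the gzip format (RFC 1952) the original file name is stored only when the FNAME flag is set in the FLG byte, and "gzip -n" omits it, so offset 10 then holds other data. The sketch below shows one way to read the stored name safely. gzip_stored_name is a hypothetical helper for illustration and is not what the patch does; the patch checks the on-disk file name with basename(file) instead.

```c
#include <stddef.h>

#define GZ_FEXTRA 0x04  /* FLG bit 2: extra field present */
#define GZ_FNAME  0x08  /* FLG bit 3: original file name present */

/*
 * Return a pointer to the original file name stored in a gzip header,
 * or NULL if the buffer is not gzip/deflate, no name was stored
 * (e.g. "gzip -n"), or the name is not NUL-terminated within the buffer.
 */
static const char *gzip_stored_name(const unsigned char *hdr, size_t len)
{
    size_t off = 10;            /* size of the fixed part of the header */
    size_t i;

    if (len < 10 || hdr[0] != 0x1f || hdr[1] != 0x8b || hdr[2] != 8)
        return NULL;
    if (!(hdr[3] & GZ_FNAME))
        return NULL;            /* bytes at offset 10 are not a name */

    if (hdr[3] & GZ_FEXTRA) {   /* skip XLEN (little endian) + extra data */
        if (len < off + 2)
            return NULL;
        off += 2 + ((size_t)hdr[off] | ((size_t)hdr[off + 1] << 8));
    }
    for (i = off; i < len; i++)
        if (hdr[i] == '\0')
            return (const char *)&hdr[off];
    return NULL;
}
```

A caller that gets NULL back knows the header carries no usable name and can fall back to the file's on-disk name.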
13 years
[PATCH] fadump: Add support for compressed firmware-assisted dump.
by Mahesh J Salgaonkar
Hi Dave,
The firmware-assisted dump (fadump) patches are still under discussion. When
the fadump patches get into the upstream kernel, this change will need to go
into crash. For now, I am posting this patch for review.
Reference: http://lists.ozlabs.org/pipermail/linuxppc-dev/2011-December/094859.html
Thanks,
-Mahesh.
From: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
With firmware-assisted dump (fadump) support added for PowerPC, the
crash tool also needs to be modified to be able to read a compressed/filtered
firmware-assisted dump (diskdump format). The crash tool is already able to
read an ELF-formatted dump generated by the firmware-assisted dump mechanism
and identify its panic task. But when a fadump is filtered/compressed using
makedumpfile, the crash tool fails to identify the panic task and the
backtrace associated with it. This patch enables the crash tool to identify
the panic task for dumps generated by firmware-assisted dump on Power
platforms.
Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
---
ppc64.c | 1 +
task.c | 21 ++++++++++++++++++++-
2 files changed, 21 insertions(+), 1 deletions(-)
diff --git a/ppc64.c b/ppc64.c
index eee7359..196c417 100755
--- a/ppc64.c
+++ b/ppc64.c
@@ -1907,6 +1907,7 @@ retry:
STREQ(sym, ".netpoll_start_netdump") ||
STREQ(sym, ".start_disk_dump") ||
STREQ(sym, ".crash_kexec") ||
+ STREQ(sym, ".crash_fadump") ||
STREQ(sym, ".disk_dump")) {
*nip = *up;
*ksp = bt->stackbase +
diff --git a/task.c b/task.c
index 1ce48ea..ec1984e 100755
--- a/task.c
+++ b/task.c
@@ -6443,6 +6443,13 @@ clear_active_set(void)
crash_kexec_task); \
return crash_kexec_task; \
} \
+ if (crash_fadump_task) { \
+ if (CRASHDEBUG(1)) \
+ error(INFO, \
+ "get_active_set_panic_task: %lx (crash_fadump)\n", \
+ crash_fadump_task); \
+ return crash_fadump_task; \
+ } \
if ((panic_task > (NO_TASK+1)) && !die_task) { \
if (CRASHDEBUG(1)) \
fprintf(fp, \
@@ -6508,6 +6515,10 @@ clear_active_set(void)
strstr(buf, " .crash_kexec+")) { \
crash_kexec_task = task; \
} \
+ if (strstr(buf, " crash_fadump+") || \
+ strstr(buf, " .crash_fadump+")) { \
+ crash_fadump_task = task; \
+ } \
if (strstr(buf, " machine_kexec+") || \
strstr(buf, " .machine_kexec+")) { \
crash_kexec_task = task; \
@@ -6531,12 +6542,13 @@ get_active_set_panic_task()
int i, j, found;
ulong task;
char buf[BUFSIZE];
- ulong panic_task, die_task, crash_kexec_task;
+ ulong panic_task, die_task, crash_kexec_task, crash_fadump_task;
ulong xen_panic_task;
ulong xen_sysrq_task;
panic_task = die_task = crash_kexec_task = xen_panic_task = NO_TASK;
xen_sysrq_task = NO_TASK;
+ crash_fadump_task = NO_TASK;
for (i = 0; i < NR_CPUS; i++) {
if (!(task = tt->active_set[i]) || !task_exists(task))
@@ -6616,6 +6628,13 @@ get_active_set_panic_task()
crash_kexec_task);
return crash_kexec_task;
}
+ if (crash_fadump_task) {
+ if (CRASHDEBUG(1))
+ error(INFO,
+ "get_active_set_panic_task: %lx (crash_fadump)\n",
+ crash_fadump_task);
+ return crash_fadump_task;
+ }
if (xen_sysrq_task) {
if (CRASHDEBUG(1))
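The string matching that this patch adds to the panic-task search can be isolated in a small sketch. line_has_crash_fadump is a hypothetical name, not a function in crash; the two patterns are the ones the patch passes to strstr(), including the dot-prefixed form that ppc64.c also matches.

```c
#include <string.h>

/*
 * Return 1 if a backtrace line shows a frame inside crash_fadump(),
 * 0 otherwise. The leading space and trailing '+' keep the match from
 * firing on longer symbol names that merely contain "crash_fadump".
 */
static int line_has_crash_fadump(const char *line)
{
    return strstr(line, " crash_fadump+") != NULL ||
           strstr(line, " .crash_fadump+") != NULL;
}
```

In the patch, a task whose backtrace contains such a line is recorded as crash_fadump_task and later returned as the panic task, after the crash_kexec_task check.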
13 years