Re: [Crash-utility] crash-5.0: Segmentation fault with x86_64_get_active_set
by Dave Anderson
----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> crash-utility-bounces(a)redhat.com wrote on 14.01.2010 16:08:41:
>
> > From: Dave Anderson <anderson(a)redhat.com>
> >
> > ----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> >
> > > Hello,
> > >
> > > I get a segmentation fault from crash with our 64-bit kernel dump.
> > > The crash was triggered by "echo c > /proc/sysrq-trigger".
> > > The reason seems to be that x86_64_cpu_pda_init() is
> > > never called; at least gdb does not break there.
> > >
> > > Here is a little patch that fixes it. Everything seems to
> > > work correctly. I'll provide more info if needed.
> > >
> > >
> > > --- crash-5.0.0/x86_64.c 2010-01-06 21:38:27.000000000 +0200
> > > +++ crash-5.0.0-64bit/x86_64.c 2010-01-14 08:24:13.679603706 +0200
> > > @@ -6325,6 +6325,12 @@ x86_64_get_active_set(void)
> > >
> > > ms = machdep->machspec;
> > >
> > > + if (!ms->current) {
> > > + error(INFO, "%s: Cannot get active set, ms->current is NULL\n",
> > > + __func__);
> > > + return;
> > > + }
> > > +
> >
> > That patch just masks the real problem.
> >
> > What kernel version is it?
> >
> > If it's 2.6.30 or later, then x86_64_per_cpu_init() should
> > be called, otherwise x86_64_cpu_pda_init() is called. And
> > whichever one that gets called should allocate the array.
> >
> > 2.6.30 or later kernels should show:
> >
> > crash> struct x8664_pda
> > struct: invalid data structure reference: x8664_pda
> > crash>
> >
> > and they will use x86_64_per_cpu_init().
> >
> > Kernels prior to 2.6.30 should show:
> >
> > crash> struct x8664_pda
> > struct x8664_pda {
> > struct task_struct *pcurrent;
> > long unsigned int data_offset;
> > long unsigned int kernelstack;
> > long unsigned int oldrsp;
> > long unsigned int debugstack;
> > int irqcount;
> > int cpunumber;
> > char *irqstackptr;
> > int nodenumber;
> > unsigned int __softirq_pending;
> > unsigned int __nmi_count;
> > int mmu_state;
> > struct mm_struct *active_mm;
> > unsigned int apic_timer_irqs;
> > }
> > SIZE: 128
> > crash>
> >
> > and they will use x86_64_cpu_pda_init().
> >
> > If you're having trouble with gdb, can you put some fprintf(fp, ...)
> > calls in the relevant function and find out why it isn't doing
> > the calloc() call?
>
>
> Yes, I thought so. This is a customized 2.6.31.7 kernel.org
> kernel with a UP configuration, i.e. CONFIG_SMP is n.
> I think the problem is that PER_CPU_OFF is not set.
Ahah -- that would do it. UP x86_64 kernels are so rare
that apparently nobody ever noticed, and I don't have a UP
x86_64 vmcore to even test with. (RHEL5 doesn't even ship
a UP x86_64 kernel).
Anyway, that change went into 4.0-8.11. And as far as I
can tell, x86_64_per_cpu_init() should still populate the
single "ms->current[0]" task from the "per_cpu__current_task"
symbol in UP kernels -- which doesn't need the PER_CPU_OFF
translation mechanism. In other words, I think you should
be able to do this on your UP kernel:
crash> px per_cpu__current_task
and it should show the panic task address that comes up as the
current task upon invocation. Is that right?
> Btw, the "struct" command caused another segmentation fault.
> Here is gdb bt:
>
> (gdb) bt
> #0  0x00007f74b3524a92 in strcmp () from /lib/libc.so.6
> #1  0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:276
> #2  0x00000000005344ed in lookup_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:228
> #3  0x000000000060019d in c_lex () at c-exp.y:2149
> #4  0x00000000006008f5 in c_parse_internal () at c-exp.c.tmp:1468
> #5  0x00000000006022dd in c_parse () at c-exp.y:2225
> #6  0x000000000055f614 in parse_exp_in_context (stringptr=0x7fffbc2f2260, block=<value optimized out>, comma=<value optimized out>, void_context_p=0, out_subexp=0x0) at parse.c:1094
> #7  0x000000000055f924 in parse_expression (string=0x7fffbc2f2950 "x8664_pda") at parse.c:1144
> #8  0x000000000053291b in gdb_command_funnel (req=0xca2c00) at symtab.c:4992
> #9  0x00000000004c1740 in gdb_interface (req=0xca2c00) at gdb_interface.c:407
> #10 0x00000000004e9dca in datatype_info (name=0xb618a7 "x8664_pda", member=0x0, dm=0x7fffbc2f3620) at symbols.c:4146
> #11 0x00000000004eb1ee in arg_to_datatype (s=0xb618a7 "x8664_pda", dm=0x7fffbc2f3620, flags=524290) at symbols.c:4867
> #12 0x00000000004efa1b in cmd_datatype_common (flags=2048) at symbols.c:4664
> #13 0x000000000045efd9 in exec_command () at main.c:644
> #14 0x000000000045f1fa in main_loop () at main.c:603
> #15 0x00000000005452a9 in captured_command_loop (data=0x120e3c0) at ./main.c:226
> #16 0x00000000005434e4 in catch_errors (func=0x5452a0 <captured_command_loop>, func_args=0x0, errstring=0x7f9d7c "", mask=<value optimized out>) at exceptions.c:520
> #17 0x0000000000544d36 in captured_main (data=<value optimized out>) at ./main.c:924
> #18 0x00000000005434e4 in catch_errors (func=0x544340 <captured_main>, func_args=0x7fffbc2f38b0, errstring=0x7f9d7c "", mask=<value optimized out>) at exceptions.c:520
> #19 0x000000000054412f in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at ./main.c:939
> #20 0x000000000045fece in main (argc=3, argv=0x7fffbc2f3a08) at main.c:517
> (gdb) frame 1
> #1  0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:276
> 276             if (FILENAME_CMP (name, pst->filename) == 0)
> (gdb) p name
> $4 = 0x120e3c0 "x8664_pda"
> (gdb) p pst
> $5 = (struct partial_symtab *) 0x14d6040
> (gdb) p pst->filename
> $6 = 0x0
> (gdb) p *pst
> $7 = {next = 0x0, filename = 0x0, fullname = 0x0, dirname = 0x0, objfile = 0x0, section_offsets = 0x0, textlow = 0, texthigh = 0, dependencies = 0x0, number_of_dependencies = 0, globals_offset = 0, n_global_syms = 0, statics_offset = 0, n_static_syms = 0, symtab = 0x0, read_symtab = 0, read_symtab_private = 0x0, readin = 0 '\0'}
> (gdb)
>
>
> I fixed it with the patch below:
> --- crash-5.0.0/gdb-7.0/gdb/symtab.c	2010-01-15 10:41:00.919973440 +0200
> +++ crash-5.0.0-64bit/gdb-7.0/gdb/symtab.c	2010-01-15 10:19:21.436128740 +0200
> @@ -256,7 +256,7 @@ got_symtab:
> struct partial_symtab *
> lookup_partial_symtab (const char *name)
> {
> - struct partial_symtab *pst;
> + struct partial_symtab *pst = NULL;
> struct objfile *objfile;
> char *full_path = NULL;
> char *real_path = NULL;
> @@ -273,7 +273,7 @@ lookup_partial_symtab (const char *name)
>
> ALL_PSYMTABS (objfile, pst)
> {
> - if (FILENAME_CMP (name, pst->filename) == 0)
> + if (pst->filename && FILENAME_CMP (name, pst->filename) == 0)
> {
> return (pst);
> }
> @@ -311,7 +311,7 @@ lookup_partial_symtab (const char *name)
> if (lbasename (name) == name)
> ALL_PSYMTABS (objfile, pst)
> {
> - if (FILENAME_CMP (lbasename (pst->filename), name) == 0)
> + if (pst->filename && FILENAME_CMP (lbasename (pst->filename), name) == 0)
> return (pst);
> }
Weird -- so you're apparently able to do that when running any
"struct <non-existent>" command from the crash command line?
But I can't reproduce that -- this is what should happen:
crash> struct this_is_junk
struct: invalid data structure reference: this_is_junk
crash>
and I don't understand what could be different with your
custom kernel?
> >
> > Either that, or if you can make the vmlinux/vmcore pair available
> > for me to download, I can look at it.
>
> I'll arrange this if the above information is not enough.
Yes please -- can you put the vmlinux/vmcore pair somewhere
where I can download it? You can send me the particulars
off-line to anderson(a)redhat.com.
Thanks,
Dave
14 years, 10 months
Re: [Crash-utility] crash-5.0: zero-size memory-allocation
by Dave Anderson
----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> That would be useful -- just warn that some major corruption seems to
> have happened. It is always good to get at least some crash info out,
> for example dmesg and bt. I'll gladly test patches, if needed.
Patch attached...
> Also one question. Is there some hidden option that will show all the
> hidden crash command line options, e.g. --no_kmem_cache and alike?
No, for the most part they are there for debugging crash itself,
or were put in place as a result of specific odd-ball vmcores,
or short-time kernels that were missing a key ingredient, etc.
So, for example, with the attached patch, --no_kmem_cache should
not be needed, even with your horrifically corrupted vmcore...
Dave
Re: [Crash-utility] crash-5.0: Segmentation fault with x86_64_get_active_set
by Dave Anderson
----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> Hello,
>
> I get a segmentation fault from crash with our 64-bit kernel dump.
> The crash was triggered by "echo c > /proc/sysrq-trigger".
> The reason seems to be that x86_64_cpu_pda_init() is
> never called; at least gdb does not break there.
>
> Here is a little patch that fixes it. Everything seems to
> work correctly. I'll provide more info if needed.
>
>
> --- crash-5.0.0/x86_64.c 2010-01-06 21:38:27.000000000 +0200
> +++ crash-5.0.0-64bit/x86_64.c 2010-01-14 08:24:13.679603706 +0200
> @@ -6325,6 +6325,12 @@ x86_64_get_active_set(void)
>
> ms = machdep->machspec;
>
> + if (!ms->current) {
> + error(INFO, "%s: Cannot get active set, ms->current is NULL\n",
> + __func__);
> + return;
> + }
> +
That patch just masks the real problem.
What kernel version is it?
If it's 2.6.30 or later, then x86_64_per_cpu_init() should
be called, otherwise x86_64_cpu_pda_init() is called. And
whichever one that gets called should allocate the array.
2.6.30 or later kernels should show:
crash> struct x8664_pda
struct: invalid data structure reference: x8664_pda
crash>
and they will use x86_64_per_cpu_init().
Kernels prior to 2.6.30 should show:
crash> struct x8664_pda
struct x8664_pda {
struct task_struct *pcurrent;
long unsigned int data_offset;
long unsigned int kernelstack;
long unsigned int oldrsp;
long unsigned int debugstack;
int irqcount;
int cpunumber;
char *irqstackptr;
int nodenumber;
unsigned int __softirq_pending;
unsigned int __nmi_count;
int mmu_state;
struct mm_struct *active_mm;
unsigned int apic_timer_irqs;
}
SIZE: 128
crash>
and they will use x86_64_cpu_pda_init().
If you're having trouble with gdb, can you put some fprintf(fp, ...)
calls in the relevant function and find out why it isn't doing
the calloc() call?
Either that, or if you can make the vmlinux/vmcore pair available
for me to download, I can look at it.
Dave
crash-5.0: Segmentation fault with x86_64_get_active_set
by ville.mattila@stonesoft.com
Hello,
I get a segmentation fault from crash with our 64-bit kernel dump.
The crash was triggered by "echo c > /proc/sysrq-trigger".
The reason seems to be that x86_64_cpu_pda_init() is
never called; at least gdb does not break there.
Here is a little patch that fixes it. Everything seems to
work correctly. I'll provide more info if needed.
--- crash-5.0.0/x86_64.c 2010-01-06 21:38:27.000000000 +0200
+++ crash-5.0.0-64bit/x86_64.c 2010-01-14 08:24:13.679603706 +0200
@@ -6325,6 +6325,12 @@ x86_64_get_active_set(void)
ms = machdep->machspec;
+ if (!ms->current) {
+ error(INFO, "%s: Cannot get active set, ms->current is NULL\n",
+ __func__);
+ return;
+ }
+
Re: [Crash-utility] crash-5.0: zero-size memory-allocation
by Dave Anderson
----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> > From:
> >
> > Dave Anderson <anderson(a)redhat.com>
> >
> ...
> > But your kernel shows cache_cache.buffer_size set to zero -- and the
> > ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
> > size from 204 to zero. Later on, that size was used to allocate a
> > kmem_cache buffer, which failed when a GETBUF() was called with a zero-size.
> >
> > I guess a check could be made above for a zero cache_cache.buffer_size,
> > but why would that ever be?
> >
> > Try this:
> >
> > # crash --no_kmem_cache vmlinux vmcore
> >
> > which will allow you to get past the kmem_cache initialization.
> >
> > Then enter:
> >
> > crash> p cache_cache
> >
> > Does the "buffer_size" member really show zero?
>
> Yes it seems so!
> initialize_task_state: using old defaults
> <readmem: 8067a300, KVADDR, "fill_task_struct", 868, (ROE), 86e3f78>
> addr: 8067a300 paddr: 67a300 cnt: 868
> STATE: TASK_RUNNING (PANIC)
>
> crash> p cache_cache
> cache_cache = GETBUF(128 -> 0)
> <readmem: 8067f1c0, KVADDR, "gdb_readmem_callback", 204, (ROE), 8ac00d8>
> addr: 8067f1c0 paddr: 67f1c0 cnt: 204
> $3 = {
> array = {0x0, 0x8067f1c4, 0x8067f1c4, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0xf7813e00, 0xf7849400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
> batchcount = 0,
> limit = 0,
> shared = 0,
> buffer_size = 0,
> reciprocal_buffer_size = 0,
> flags = 0,
> num = 0,
> gfporder = 0,
> gfpflags = 60,
> colour = 120,
> colour_off = 8,
> slabp_cache = 0x100,
> slab_size = 16777216,
> dflags = 0,
> ctor = 0xf,
> name = 0x0,
> next = {
> next = 0x0,
> prev = 0x2
> },
> nodelists = {0x40}
> }
> FREEBUF(0)
That's some serious corruption!
> >
> > BTW, you can work around the problem by commenting out the call
> > to kmem_cache_downsize() in vm_init().
>
> This workaround works ok.
But even then, if you comment out the call to kmem_cache_downsize(),
the kmem_cache_init() function could not have done anything useful
because the "cache_cache.next.next" pointer is corrupted with a NULL,
which points to the first of the chain of kmem_cache slab cache headers.
I'm surprised it managed to continue without running into another
roadblock -- did it display the "crash: unable to initialize kmem
slab cache subsystem" error message?
> > (And if you're using makedumpfile with excluded pages, hope that
> > the problem I described above doesn't occur...)
> >
> We are not excluding pages, so this is not a big issue. Also,
> --no_kmem_cache lets me open the dump and do quite a few things
> already.
Like I mentioned before, I could put a check in kmem_cache_downsize()
to check for a zero buffer_size, but the odds of that happening are
absurdly small. I suppose I could check whether the value is less
than the kmem_cache.nodelists structure offset.
Dave
Re: [Crash-utility] crash-5.0: zero-size memory-allocation
by Dave Anderson
----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> Hello,
>
> We have a custom kernel based on 2.6.27.39. This kernel
> has a 2/2 memory split. Now we have one crash dump that can be
> successfully opened with crash 4.0-8.8 but not with crash 5.0.
> The crash happened because of a double free of a memory block, so there
> might be some memory corruption in the cache data area.
>
> Unfortunately I cannot pinpoint the exact version where this
> starts to happen because I could not find older crash releases.
>
> Here is some debug info.
>
> The tail of crash -d 10 output
> ...
> NOTE: page_hash_table does not exist in this kernel
> please wait... (gathering kmem slab cache data)<readmem: 8075801c,
> KVADDR,
> "cache_chain", 4, (FOE), ffb944f8>
> addr: 8075801c paddr: 75801c cnt: 4
> GETBUF(128 -> 0)
> FREEBUF(0)
> GETBUF(204 -> 0)
> <readmem: 8067f1c0, KVADDR, "kmem_cache buffer", 204, (FOE), 8520f00>
> addr: 8067f1c0 paddr: 67f1c0 cnt: 204
> GETBUF(128 -> 1)
> FREEBUF(1)
> GETBUF(128 -> 1)
> FREEBUF(1)
>
> kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
> kmem_cache_downsize: nr_node_ids: 1
> FREEBUF(0)
>
> crash: zero-size memory allocation! (called from 80b7b7b)
> >
> addr2line -e crash 80b7b7b
> /workarea/build/packages/crash/crash-5.0.0-32bit/memory.c:7439
>
> I'm happy to test patches.
Nice bug report!
Here's what's happening:
It's related to this patch that went into 4.1.0:
- Fix for a potential failure to initialize the kmem slab cache
subsystem on 2.6.22 and later CONFIG_SLAB kernels if the dumpfile
has pages excluded by the makedumpfile facility. Without the patch,
the following error message would be displayed during initialization:
"crash: page excluded: kernel virtual address: <address> type:
kmem_cache_s buffer", followed by "crash: unable to initialize kmem
slab cache subsystem".
(anderson(a)redhat.com)
The patch was put in place due to this definition of the kmem_cache data structure:
struct kmem_cache {
/* 1) per-cpu data, touched during every alloc/free */
struct array_cache *array[NR_CPUS];
/* 2) Cache tunables. Protected by cache_chain_mutex */
unsigned int batchcount;
unsigned int limit;
... [ snip ] ...
* We put nodelists[] at the end of kmem_cache, because we want to size
* this array to nr_node_ids slots instead of MAX_NUMNODES
* (see kmem_cache_init())
* We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
* is statically defined, so we reserve the max number of nodes.
*/
struct kmem_list3 *nodelists[MAX_NUMNODES];
/*
* Do not add fields after nodelists[]
*/
};
where every kernel instance of the kmem_cache data structure *except* the
head "cache_cache" structure has its nodelists[] array downsized to whatever "nr_node_ids"
is initialized to. The actual size of all of the downsized kmem_cache data
structures can be found in the head "cache_cache.buffer_size" field.
But when the crash utility queries gdb for the size of a kmem_cache
structure it gets the "full" size as declared in the vmlinux debuginfo
data. And so whenever a kmem_cache structure was read by crash, it
was using the "full" size instead of the downsized size. Doing that
type of over-sized read could potentially extend into the next page,
and there was a reported case where doing that happened to extend into
a page that was excluded by makedumpfile. Hence the kmem_cache_downsize()
function added to memory.c.
Anyway, given that your debug output shows:
kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
kmem_cache_downsize: nr_node_ids: 1
In vm_init() there was an initial STRUCT_SIZE_INIT(kmem_cache_s, ...)
that set the size to 204 bytes. But then kmem_cache_downsize() was
called to downsize to whatever cache_cache.buffer_size contains:
...
buffer_size = UINT(cache_buf +
MEMBER_OFFSET("kmem_cache", "buffer_size"));
if (buffer_size < SIZE(kmem_cache_s)) {
ASSIGN_SIZE(kmem_cache_s) = buffer_size;
if (kernel_symbol_exists("nr_node_ids")) {
get_symbol_data("nr_node_ids", sizeof(int),
&nr_node_ids);
vt->kmem_cache_len_nodes = nr_node_ids;
} else
vt->kmem_cache_len_nodes = 1;
if (CRASHDEBUG(1)) {
fprintf(fp,
"\nkmem_cache_downsize: SIZE(kmem_cache_s): %ld "
"cache_cache.buffer_size: %d\n",
STRUCT_SIZE("kmem_cache"), buffer_size);
fprintf(fp,
"kmem_cache_downsize: nr_node_ids: %ld\n",
vt->kmem_cache_len_nodes);
}
}
But your kernel shows cache_cache.buffer_size set to zero -- and the
ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
size from 204 to zero. Later on, that size was used to allocate a
kmem_cache buffer, which failed when a GETBUF() was called with a zero-size.
I guess a check could be made above for a zero cache_cache.buffer_size,
but why would that ever be?
Try this:
# crash --no_kmem_cache vmlinux vmcore
which will allow you to get past the kmem_cache initialization.
Then enter:
crash> p cache_cache
Does the "buffer_size" member really show zero?
BTW, you can work around the problem by commenting out the call
to kmem_cache_downsize() in vm_init(). (And if you're using
makedumpfile with excluded pages, hope that the problem I described
above doesn't occur...)
Dave
[PATCH] Display "irqaction mask" only if available
by Bernhard Walle
Display "irqaction mask" only if available
The member "mask" has been removed from "struct irqaction" in the kernel per
commit ef79f8e191722dbc1fc33bdfc448f572266c37e9
Author: Rusty Russell <rusty(a)rustcorp.com.au>
Date: Thu Sep 24 09:34:37 2009 -0600
cpumask: remove unused mask field from struct irqaction.
Up until 1.1.83, the primitive human tribes used struct sigaction for
interrupts. The sa_mask field was overloaded to hold a pointer to the
name.
When someone created the new "struct irqaction" they carried across
the "mask" field as a kind of ancestor worship: the fact that it was
unused makes clear its spiritual significance.
Signed-off-by: Rusty Russell <rusty(a)rustcorp.com.au>
This patch only displays the "irqaction mask" in the "irq" command if the
member is present. It fixes the following error (kernel was 2.6.33):
crash> irq
irq: invalid structure member offset: irqaction_mask
FILE: kernel.c LINE: 5001 FUNCTION: generic_dump_irq()
[./crash.orig] error trace: 8097e44 => 8109541 => 810c0ec => 8156299
Signed-off-by: Bernhard Walle <bernhard(a)bwalle.de>
crash-5.0: zero-size memory-allocation
by ville.mattila@stonesoft.com
Hello,
We have a custom kernel based on 2.6.27.39. This kernel
has a 2/2 memory split. Now we have one crash dump that can be
successfully opened with crash 4.0-8.8 but not with crash 5.0.
The crash happened because of a double free of a memory block, so there
might be some memory corruption in the cache data area.
Unfortunately I cannot pinpoint the exact version where this
starts to happen because I could not find older crash releases.
Here is some debug info.
The tail of crash -d 10 output
...
NOTE: page_hash_table does not exist in this kernel
please wait... (gathering kmem slab cache data)<readmem: 8075801c, KVADDR,
"cache_chain", 4, (FOE), ffb944f8>
addr: 8075801c paddr: 75801c cnt: 4
GETBUF(128 -> 0)
FREEBUF(0)
GETBUF(204 -> 0)
<readmem: 8067f1c0, KVADDR, "kmem_cache buffer", 204, (FOE), 8520f00>
addr: 8067f1c0 paddr: 67f1c0 cnt: 204
GETBUF(128 -> 1)
FREEBUF(1)
GETBUF(128 -> 1)
FREEBUF(1)
kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
kmem_cache_downsize: nr_node_ids: 1
FREEBUF(0)
crash: zero-size memory allocation! (called from 80b7b7b)
>
addr2line -e crash 80b7b7b
/workarea/build/packages/crash/crash-5.0.0-32bit/memory.c:7439
I'm happy to test patches.
Re: [Crash-utility] Degradation with crash 5.0.0 on x86 -- [PATCH]
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Shahar Luxenberg" <shahar(a)checkpoint.com> wrote:
>
> > Hi,
> >
> >
> >
> > Environment: Red Hat Enterprise Linux Server release 5.2 (Tikanga),
> > x86, 2.6.18-92.el5
> >
> > I’ve installed crash 5.0.0 and noticed lots of error messages during
> > startup of the form:
> >
> > ‘crash: input string too large: "804328c4:" (9 vs 8)’
> >
> > This doesn’t happen with crash 4.1.2
> >
> >
> >
> > While debugging it a little, I’ve noticed that BUG_x86 is calling gdb
> > with the x/i command:
> >
> > sprintf(buf1, "x/%ldi 0x%lx", spn->value - sp->value, sp->value);
> >
> > The return buffer (buf2) is: 0x80430800: push %ebp
> >
> > On 4.1.2, the return buffer (buf2) is: 0x80430800 <do_exit>: push %ebp
> >
> > This explains the problem since parse_line will parse the line
> > differently returning ‘0x80430800:’ on arglist[0] and nothing on
> > arglist[2] (crash 5.0.0) while returning 0x80430800 on arglist[0] and
> > ‘push’ on arglist[2].
> >
> > Have you noticed this kind of problem?
>
> I see it now, at least on 2.6.18-era kernels. It doesn't seem to happen
> with earlier RHEL4 (2.6.9-era) vmlinux files for some reason. And on anything
> later than 2.6.20, the code in question isn't run. Anyway, as you tracked
> it down, the x86 code disassembly output is different, but should be trivial
> to fix.
>
> Thanks for the report,
> Dave
Patch attached, and queued for the next release.
Dave
Re: [Crash-utility] Degradation with crash 5.0.0 on x86
by Dave Anderson
----- "Shahar Luxenberg" <shahar(a)checkpoint.com> wrote:
> Hi,
>
>
>
> Environment: Red Hat Enterprise Linux Server release 5.2 (Tikanga),
> x86, 2.6.18-92.el5
>
> I’ve installed crash 5.0.0 and noticed lots of error messages during
> startup of the form:
>
> ‘crash: input string too large: "804328c4:" (9 vs 8)’
>
> This doesn’t happen with crash 4.1.2
>
>
>
> While debugging it a little, I’ve noticed that BUG_x86 is calling gdb
> with the x/i command:
>
> sprintf(buf1, "x/%ldi 0x%lx", spn->value - sp->value, sp->value);
>
> The return buffer (buf2) is: 0x80430800: push %ebp
>
> On 4.1.2, the return buffer (buf2) is: 0x80430800 <do_exit>: push %ebp
>
> This explains the problem since parse_line will parse the line
> differently returning ‘0x80430800:’ on arglist[0] and nothing on
> arglist[2] (crash 5.0.0) while returning 0x80430800 on arglist[0] and
> ‘push’ on arglist[2].
>
> Have you noticed this kind of problem?
I see it now, at least on 2.6.18-era kernels. It doesn't seem to happen
with earlier RHEL4 (2.6.9-era) vmlinux files for some reason. And on anything
later than 2.6.20, the code in question isn't run. Anyway, as you tracked
it down, the x86 code disassembly output is different, but should be trivial
to fix.
Thanks for the report,
Dave