> Dave Anderson <anderson@redhat.com>
> To: "Discussion list for crash utility usage, maintenance and development" <crash-utility@redhat.com>
> Date: 13.01.2010 16:14
> Subject: Re: [Crash-utility] crash-5.0: zero-size memory-allocation
> Sent by: crash-utility-bounces@redhat.com
>
> ----- "ville mattila" <ville.mattila@stonesoft.com> wrote:
>
> > > From:
> > >
> > > Dave Anderson <anderson@redhat.com>
> > >
> > > ...
> > > But your kernel shows cache_cache.buffer_size set to zero -- and the
> > > ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
> > > size from 204 to zero. Later on, that size was used to allocate a
> > > kmem_cache buffer, which failed when a GETBUF() was called with
> > > a zero-size.
> > >
> > > I guess a check could be made above for a zero cache_cache.buffer_size,
> > > but why would that ever be?
> > >
> > > Try this:
> > >
> > >   # crash --no_kmem_cache vmlinux vmcore
> > >
> > > which will allow you to get past the kmem_cache initialization.
> > >
> > > Then enter:
> > >
> > >   crash> p cache_cache
> > >
> > > Does the "buffer_size" member really show zero?
> >
> > Yes it seems so!
> > initialize_task_state: using old defaults
> > <readmem: 8067a300, KVADDR, "fill_task_struct", 868, (ROE), 86e3f78>
> > addr: 8067a300 paddr: 67a300 cnt: 868
> > STATE: TASK_RUNNING (PANIC)
> >
> > crash> p cache_cache
> > cache_cache = GETBUF(128 -> 0)
> > <readmem: 8067f1c0, KVADDR, "gdb_readmem_callback", 204, (ROE), 8ac00d8>
> > addr: 8067f1c0 paddr: 67f1c0 cnt: 204
> > $3 = {
> >   array = {0x0, 0x8067f1c4, 0x8067f1c4, 0x0, 0x0, 0x0, 0x0, 0x0,
> >     0xf7813e00, 0xf7849400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> >     0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
> >   batchcount = 0,
> >   limit = 0,
> >   shared = 0,
> >   buffer_size = 0,
> >   reciprocal_buffer_size = 0,
> >   flags = 0,
> >   num = 0,
> >   gfporder = 0,
> >   gfpflags = 60,
> >   colour = 120,
> >   colour_off = 8,
> >   slabp_cache = 0x100,
> >   slab_size = 16777216,
> >   dflags = 0,
> >   ctor = 0xf,
> >   name = 0x0,
> >   next = {
> >     next = 0x0,
> >     prev = 0x2
> >   },
> >   nodelists = {0x40}
> > }
> > FREEBUF(0)
>
> That's some serious corruption!
>
Yes, this double free caused a lot of head scratching!

> > > BTW, you can work around the problem by commenting out the call
> > > to kmem_cache_downsize() in vm_init().
> >
> > This workaround works ok.
>
> But even then, if you comment out the call to kmem_cache_downsize(),
> the kmem_cache_init() function could not have done anything useful
> because the "cache_cache.next.next" pointer is corrupted with a NULL,
> which points to the first of the chain of kmem_cache slab cache headers.
> I'm surprised it managed to continue without running into another
> roadblock -- did it display the "crash: unable to initialize kmem
> slab cache subsystem" error message?
>
No, there are no other error messages.

> > > (And if you're using makedumpfile with excluded pages, hope that
> > > the problem I described above doesn't occur...)
> > >
> > We are not excluding files so this is not a big issue. Also
> > the --no_kmem_cache lets me open dump and let me do quite many things
> > already.
>
> Like I mentioned before, I could put a check in kmem_cache_downsize()
> to check for a zero buffer_size, but the odds of that happening are
> absurdly small. I suppose I could check whether the value is less
> than the kmem_cache.nodelists structure offset.
>
That would be useful: just warn that some major corruption seems to have
happened. It is always good to get at least some crash info out, for example
dmesg and bt. I'll gladly test patches if needed.
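
To make the idea concrete, here is a rough sketch of the kind of check
discussed above. It is not crash's actual code: checked_cache_size() and
its parameters are hypothetical names, and the caller is assumed to have
already read buffer_size and the nodelists offset from the dump.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of crash: pick the size to use for the
 * kmem_cache structure when downsizing.
 *
 *   full_size        - structure size from debuginfo (204 in this thread)
 *   buffer_size      - cache_cache.buffer_size as read from the dump
 *   nodelists_offset - offset of the nodelists member in the structure
 */
static size_t checked_cache_size(size_t full_size, size_t buffer_size,
                                 size_t nodelists_offset)
{
    /* Treat zero, or anything that ends before nodelists, as a sign of
     * corruption: warn and keep the full size instead of downsizing. */
    if (buffer_size == 0 || buffer_size < nodelists_offset) {
        fprintf(stderr,
                "WARNING: cache_cache.buffer_size (%zu) looks corrupt; "
                "keeping kmem_cache size %zu\n", buffer_size, full_size);
        return full_size;
    }
    return buffer_size;
}
```

With the values from this dump, checked_cache_size(204, 0, off) warns and
returns 204 for any offset, so a zero size would never reach GETBUF().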

Also, one question: is there some option that will show all the hidden
crash command line options, e.g. --no_kmem_cache and the like?

- Ville