On Thu, Oct 16, 2008 at 2:22 PM, Dave Anderson <anderson(a)redhat.com> wrote:
----- "Mike Snitzer" <snitzer(a)gmail.com> wrote:
> On Thu, Oct 16, 2008 at 1:16 PM, Dave Anderson <anderson(a)redhat.com>
> wrote:
> >
> > ----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> >> Ok, then I can't see off-hand why it would segfault. Prior to
> this
> >> routine running, si->cpudata[0...i] all get allocated buffers
> equal
> >> to the size that's being BZERO'd.
> >>
> >> Is si->cpudata[i] NULL or something?
>
> (gdb) p si->cpudata
> $1 = {0xa56400, 0xa56800, 0xa56c00, 0xa57000, 0x0 <repeats 252
> times>}
> (gdb) p si->cpudata[0]
> $4 = (ulong *) 0xa56400
OK, so if "i" is 0 at the time, then I don't understand how the
BZERO/memset can segfault while zeroing out memory starting at
address 0xa56400.
BZERO(si->cpudata[i], sizeof(ulong) * vt->kmem_max_limit);
Even if it overran the 0x400 bytes that have been allocated to
si->cpudata[0], it would still harmlessly run into the buffer
that was allocated for si->cpudata[1].  What's the bad address
it's faulting on?
Frame 0 of crash's core shows:
(gdb) bt
#0 0x0000003b708773e0 in memset () from /lib64/libc.so.6
I'm not sure how to get the faulting address, though.  Is it just
0x0000003b708773e0?
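Maybe I can pull the actual faulting address out of crash's core with
something like this (assuming the gdb is new enough to expose $_siginfo,
and that the destination pointer is still sitting in rdi at frame 0 on
x86_64):

(gdb) p $_siginfo._sifields._sigfault.si_addr
(gdb) frame 0
(gdb) info registers rdi rsi rdx

dmesg might also have a "segfault at <addr> ip ... sp ..." line with the
bad address, if exception tracing is enabled.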
And for sanity's sake, what is the crash utility's
vm_table.kmem_max_limit
equal to, and what architecture are you running on?
Architecture is x86_64.
kmem_max_limit=128, sizeof(ulong)=8; so the memset() should in fact be
zeroing all 1024 (0x400) bytes that were allocated.
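Just to convince myself of the shape of it, here's a minimal standalone
sketch of that pattern (not crash's actual code, just the same sizes):

#include <stdlib.h>
#include <string.h>

typedef unsigned long ulong;

#define BZERO(addr, len)  memset((addr), '\0', (len))

int main(void)
{
        ulong kmem_max_limit = 128;     /* the vm_table value above */
        ulong *cpudata[4];
        int i;

        /* each per-cpu buffer gets sizeof(ulong) * kmem_max_limit = 0x400 bytes */
        for (i = 0; i < 4; i++)
                cpudata[i] = malloc(sizeof(ulong) * kmem_max_limit);

        /* the BZERO in question only touches those same 0x400 bytes */
        for (i = 0; i < 4; i++)
                BZERO(cpudata[i], sizeof(ulong) * kmem_max_limit);

        return 0;
}

That obviously runs fine here, so I still don't see how the real thing
manages to fault.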
> > Also, can you confirm that you are always using the exact vmlinux
> > that is associated with each vmcore/live-system?  I mean you're
> > not using a System.map command line argument, right?
>
> Yes, I'm using the exact vmlinux. Not using any arguments for live
> crash; I am for the vmcore runs but that seems needed given crash's
> [mapfile] [namelist] [dumpfile] argument parsing.
>
> I use a redhat-style kernel rpm build process (with a more advanced
> kernel .spec file); so I have debuginfo packages to match all my
> kernels.
OK cool -- so you know what you're doing. ;-)
So the thing is: now when I run live crash on the 2.6.25.17 devel
kernel I no longer get a segfault!?  It still isn't happy, but it's at
least not segfaulting... very odd.
I've not rebooted the system at all either... now when I run 'kmem -s'
in live crash I see:
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
...
kmem: nfs_direct_cache: full list: slab: ffff810073503000 bad inuse counter: 5
kmem: nfs_direct_cache: full list: slab: ffff810073503000 bad inuse counter: 5
kmem: nfs_direct_cache: partial list: bad slab pointer: 88
kmem: nfs_direct_cache: full list: bad slab pointer: 98
kmem: nfs_direct_cache: free list: bad slab pointer: a8
kmem: nfs_direct_cache: partial list: bad slab pointer: 9f911029d74e35b
kmem: nfs_direct_cache: full list: bad slab pointer: 6b6b6b6b6b6b6b6b
kmem: nfs_direct_cache: free list: bad slab pointer: 6b6b6b6b6b6b6b6b
kmem: nfs_direct_cache: partial list: bad slab pointer: 100000001
kmem: nfs_direct_cache: full list: bad slab pointer: 100000011
kmem: nfs_direct_cache: free list: bad slab pointer: 100000021
ffff810073501600 nfs_direct_cache 192 2 40 2 4k
...
kmem: nfs_write_data: partial list: bad slab pointer: 65676e61725f32
kmem: nfs_write_data: full list: bad slab pointer: 65676e61725f42
kmem: nfs_write_data: free list: bad slab pointer: 65676e61725f52
kmem: nfs_write_data: partial list: bad slab pointer: 74736f705f73666e
kmem: nfs_write_data: full list: bad slab pointer: 74736f705f73667e
kmem: nfs_write_data: free list: bad slab pointer: 74736f705f73668e
ffff81007350a5c0 nfs_write_data 760 36 40 8 4k
...
etc.
But if I run crash against the vmcore I do get the segfault...
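Roughly, this is the sequence I'm using to reproduce it and capture
crash's core, in case that's useful (paths/names are just examples):

$ ulimit -c unlimited
$ crash System.map vmlinux vmcore      # segfaults here
$ gdb /usr/bin/crash core.<pid>        # core name depends on core_pattern
(gdb) bt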
BTW, if need be, would you be able to make the vmlinux/vmcore pair
available for download somewhere? (You can contact me off-list with
the particulars...)
I can work to make that happen if needed...
Mike