On Thu, Oct 16, 2008 at 3:54 PM, Dave Anderson <anderson(a)redhat.com> wrote:
----- "Mike Snitzer" <snitzer(a)gmail.com> wrote:
> Frame 0 of crash's core shows:
> (gdb) bt
> #0 0x0000003b708773e0 in memset () from /lib64/libc.so.6
>
> I'm not sure how to get the faulting address though? Is it just
> 0x0000003b708773e0?
No, that's the text address in memset(). If you "disass memset",
I believe that you'll see that the address above is dereferencing
the rcx register/pointer. So then, if you enter "info registers",
you'll get a register dump, and rcx would be the failing address.
OK.
0x0000003b708773e0 <memset+192>: movnti %r8,(%rcx)
(gdb) info registers
...
rcx 0xa7b000 10989568
(gdb) x/x 0xa7b000
0xa7b000: Cannot access memory at address 0xa7b000
> I've not rebooted the system at all either... now when I run 'kmem -s'
> in live crash I see:
>
> CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
> ...
> kmem: nfs_direct_cache: full list: slab: ffff810073503000 bad inuse counter: 5
> kmem: nfs_direct_cache: full list: slab: ffff810073503000 bad inuse counter: 5
> kmem: nfs_direct_cache: partial list: bad slab pointer: 88
> kmem: nfs_direct_cache: full list: bad slab pointer: 98
> kmem: nfs_direct_cache: free list: bad slab pointer: a8
> kmem: nfs_direct_cache: partial list: bad slab pointer: 9f911029d74e35b
> kmem: nfs_direct_cache: full list: bad slab pointer: 6b6b6b6b6b6b6b6b
> kmem: nfs_direct_cache: free list: bad slab pointer: 6b6b6b6b6b6b6b6b
> kmem: nfs_direct_cache: partial list: bad slab pointer: 100000001
> kmem: nfs_direct_cache: full list: bad slab pointer: 100000011
> kmem: nfs_direct_cache: free list: bad slab pointer: 100000021
> ffff810073501600 nfs_direct_cache 192 2 40 2 4k
> ...
Are those warnings happening on *every* slab type? When you run on a
live system, the "shifting sands" of the kernel underneath the crash
utility can cause errors like the above. But at least some/most of
the other slabs' infrastructure should remain stable while the command
runs.
Ah, that makes sense; yes, many of them do remain stable:
kmem: request_sock_TCPv6: full list: bad slab pointer: 79730070756b6f7f
kmem: request_sock_TCPv6: free list: bad slab pointer: 79730070756b6f8f
ffff810079199240 request_sock_TCPv6 160 0 0 0 4k
ffff81007919a200 TCPv6 1896 3 4 2 4k
ffff81007dcb41c0 dm_mpath_io 64 0 0 0 4k
...
ffff81007d9ce580 sgpool-8 280 2 42 3 4k
ffff81007d9cf540 scsi_bidi_sdb 48 0 0 0 4k
ffff81007d98b500 scsi_io_context 136 0 0 0 4k
ffff81007d95e4c0 ext3_inode_cache 992 38553 38712 9678 4k
ffff81007d960480 ext3_xattr 112 68 102 3 4k
etc
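Incidentally, a lot of the stray values on the live run are 6b6b6b6b6b6b6b6b, which looks like the 0x6b poison that slab debugging writes over freed memory, so the "shifting sands" picture fits. Here's a toy model of that race, just to convince myself (the struct layout and names are made up for illustration, nothing to do with crash's real code):

/*
 * Toy model of the race on a live system: crash samples a slab that
 * looks sane, the kernel frees and poisons it underneath, and the next
 * read through the same pointer sees garbage.  The struct layout and
 * names are made up for illustration; this is not crash's code.
 */
#include <stdio.h>
#include <string.h>

struct fake_slab {
        struct fake_slab *next;         /* list linkage */
        unsigned int inuse;             /* objects in use on this slab */
};

#define POISON_BYTE 0x6b                /* slab-debug poison for freed memory */

int main(void)
{
        struct fake_slab slab = { .next = &slab, .inuse = 5 };
        struct fake_slab *cur = &slab;

        /* time T0: the snapshot looks consistent */
        printf("T0: inuse=%u next=%p\n", cur->inuse, (void *)cur->next);

        /* ...meanwhile the kernel frees the slab and poisons the memory... */
        memset(&slab, POISON_BYTE, sizeof(slab));

        /* time T1: the same pointer now yields a "bad inuse counter"
         * and a 6b6b6b6b6b6b6b6b "bad slab pointer" */
        printf("T1: inuse=%u next=%p\n", cur->inuse, (void *)cur->next);
        return 0;
}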
> But if I run crash against the vmcore I do get the segfault...
>
When you run it on the vmcore, do you get the segfault immediately?
Or do some slabs display their stats OK, but then when it deals with
one particular slab it generates the segfault?
I mean that it's possible that the target slab was in transition
at the time of the crash, in which case you might see some error
messages like you see on the live system. But it is difficult to
explain why it's dying specifically where it is, even if the slab
was in transition.
That all being said, even if the slab was in transition, obviously
the crash utility should be able to handle it more gracefully...
None of the slabs display their stats OK, crash segfaults immediately.
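And agreed that even with a slab in transition it ought to warn and move on rather than segfault. Something as blunt as refusing to follow anything that can't be a kernel virtual address would do it; here's a standalone sketch of the idea, where KVBASE is just my guess at the 2.6.18-era x86_64 direct-map base and none of this is crash's actual code:

/*
 * Sketch of the sort of sanity check that would let the slab walk warn
 * and bail instead of segfaulting: refuse to follow anything that can't
 * be a kernel virtual address.  KVBASE is my guess at the 2.6.18-era
 * x86_64 direct-map base; none of this is crash's actual code.
 */
#include <stdio.h>
#include <stdint.h>

#define KVBASE 0xffff810000000000ULL

static int looks_like_kvaddr(uint64_t addr)
{
        return addr >= KVBASE;
}

int main(void)
{
        /* a few of the candidate "slab" pointers from the output above */
        uint64_t candidates[] = {
                0xffff810073503000ULL,  /* plausible */
                0x88ULL,                /* junk */
                0x6b6b6b6b6b6b6b6bULL,  /* poison */
                0x100000001ULL,         /* junk */
        };
        size_t i;

        for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
                if (looks_like_kvaddr(candidates[i]))
                        printf("%016llx: looks like a kernel address, read it\n",
                               (unsigned long long)candidates[i]);
                else
                        printf("%016llx: bad slab pointer, skip this list\n",
                               (unsigned long long)candidates[i]);
        }
        return 0;
}

crash clearly already does checks along these lines on the live run (that's where the "bad slab pointer" messages come from), so presumably the vmcore path is tripping over something those checks don't cover.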
> > BTW, if need be, would you be able to make the vmlinux/vmcore pair
> > available for download somewhere? (You can contact me off-list with
> > the particulars...)
>
> I can work to make that happen if needed...
FYI, I did try our RHEL5 "debug" kernel (2.6.18 + hellofalotofpatches),
which has both CONFIG_DEBUG_SLAB and CONFIG_DEBUG_SLAB_LEAK turned on,
but I don't see the problem. So unless something obvious can be
determined, that may be the only way I can help.
Interesting. OK, I'll work to upload them somewhere and I'll send you
a pointer off-list.
Thanks!
Mike