----- Original Message -----
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp> writes:
OK. A simpler proof: is the following enough to convince you?
No, I'm a believer. I could pretty much verify your count by looking
at the task_struct slab cache.
Interestingly, though, I was looking at a SLUB corruption vmcore with your patch
applied, where a kmem_cache_cpu.freelist pointer got corrupted because of a
use-after-free bug that overwrote the next-free pointer in a freed kmalloc-32
object. When that corrupted object was later allocated, its corrupted next-free
pointer was transferred to kmem_cache_cpu.freelist. It gets reported
like so:
crash> kmem -s kmalloc-32
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
kmem: kmalloc-32: slab: 0 invalid freepointer: ffff001090e33f80
ffff880333001c00 kmalloc-32                32     122658    125440    980     4k
crash>
And here are the per-cpu kmem_cache_cpu structures, where the corrupted one
is from cpu 3:
crash> kmem_cache.cpu_slab ffff880333001c00
cpu_slab = 0x163c0
crash> kmem_cache_cpu 0x163c0:a
[0]: ffff88033fc163c0
struct kmem_cache_cpu {
  freelist = 0xffff88031c028fa0,
  tid = 31034440,
  page = 0xffffea000c700a00,
  partial = 0xffffea000ca5d380
}
[1]: ffff88033fc963c0
struct kmem_cache_cpu {
  freelist = 0xffff8802d44c91c0,
  tid = 28218351,
  page = 0xffffea000b513240,
  partial = 0x0
}
[2]: ffff88033fd163c0
struct kmem_cache_cpu {
  freelist = 0xffff8802d442ba80,
  tid = 25768102,
  page = 0xffffea000b510ac0,
  partial = 0xffffea000c9bce40
}
[3]: ffff88033fd963c0
struct kmem_cache_cpu {
  freelist = 0xffff001090e33f80,    <== corrupted pointer
  tid = 26298247,
  page = 0xffffea0006438cc0,
  partial = 0xffffea0002ec8b80
}
crash>
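For anyone less familiar with SLUB internals, here is a toy C model of the corruption path described at the top of this message. It is a sketch, not SLUB code: it assumes the free pointer sits at offset 0 of each object (real kernels use a per-cache offset and may obfuscate the pointer), and every name in it (`slab`, `cpu_freelist`, `alloc_obj`, `free_obj`, `in_slab`, `replay_use_after_free`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJSIZE 32   /* like kmalloc-32 */
#define NOBJS   4

/* Toy slab page; assumption: the free pointer lives at offset 0. */
static unsigned char slab[NOBJS][OBJSIZE];

/* Stand-in for one cpu's kmem_cache_cpu.freelist. */
static void *cpu_freelist;

/* Reads the next-free pointer stored inside a free object. */
static void *get_next_free(const void *obj)
{
    void *next;
    memcpy(&next, obj, sizeof(next));
    return next;
}

/* Pushes obj onto the cpu freelist, storing the old head inside it. */
static void free_obj(void *obj)
{
    assert(obj != NULL);
    memcpy(obj, &cpu_freelist, sizeof(cpu_freelist));
    cpu_freelist = obj;
}

/* Fast-path allocation: pop the head and adopt its stored next-free
 * pointer as the new freelist without validating it. */
static void *alloc_obj(void)
{
    void *obj = cpu_freelist;
    if (obj != NULL)
        cpu_freelist = get_next_free(obj);
    return obj;
}

/* Returns nonzero if p points at the start of an object in the toy slab. */
static int in_slab(const void *p)
{
    for (size_t i = 0; i < NOBJS; i++)
        if (p == (const void *)slab[i])
            return 1;
    return 0;
}

/* Replays the sequence: allocate, free, stale write, reallocate.
 * Returns whatever the cpu freelist holds afterwards. */
static void *replay_use_after_free(void)
{
    cpu_freelist = NULL;
    for (int i = NOBJS - 1; i >= 0; i--)
        free_obj(slab[i]);              /* slab[0] -> slab[1] -> ... -> NULL */

    void *victim = alloc_obj();         /* a user gets slab[0] ...           */
    free_obj(victim);                   /* ... frees it ...                  */
    memset(victim, 0xab, OBJSIZE);      /* ... then keeps writing to it      */

    alloc_obj();                        /* victim handed out again, and its  */
    return cpu_freelist;                /* clobbered link is now the head    */
}
```

After `replay_use_after_free()`, the cpu freelist holds a pointer built from the stray writer's 0xab bytes instead of an address inside `slab`, which is the same shape of failure as cpu 3's `freelist` above.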
But going back to the error report, the "slab: 0" is kind of confusing:
crash> kmem -s kmalloc-32
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
kmem: kmalloc-32: slab: 0 invalid freepointer: ffff001090e33f80
ffff880333001c00 kmalloc-32                32     122658    125440    980     4k
crash>
Unlike do_slab_slub(), when get_kmem_cache_slub_data() calls count_free_objects(),
si->slab is not set:
switch (cmd)
{
case GET_SLUB_OBJECTS:
    if (!readmem(cpu_slab_ptr + OFFSET(page_inuse),
            KVADDR, &inuse, sizeof(short),
            "page inuse", RETURN_ON_ERROR))
        return FALSE;
    objects = slub_page_objects(si, cpu_slab_ptr);
    if (!objects)
        return FALSE;
    free_objects += objects - inuse;
    free_objects += count_free_objects(si, cpu_freelist);
    free_objects += count_cpu_partial(si, i);
    if (!node_total_avail)
        total_objects += inuse;
    total_slabs++;
    break;
And then count_free_objects() calls get_freepointer(), which reports the bad
pointer along with the never-set si->slab, hence the confusing "slab: 0".
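Roughly, a freelist walk like count_free_objects() has to follow each stored next-free pointer, check that it lands on an object boundary inside the slab page, and stop at the first one that doesn't. The sketch below is hypothetical (`struct toy_slab`, `valid_freeptr()` and `count_free()` are made up for illustration and are not crash's code). In the vmcore above the bad value is the cpu freelist head itself, so a walk like this fails on its very first step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy description of one slab page: 'nobjs' objects of 'size' bytes at
 * 'base'; next[i] holds the free pointer stored in object i (0 = end). */
struct toy_slab {
    uint64_t base;
    uint64_t size;
    size_t nobjs;
    const uint64_t *next;
};

/* Hypothetical validity check: the pointer must land exactly on an
 * object boundary inside the page. */
static int valid_freeptr(const struct toy_slab *s, uint64_t p)
{
    if (p < s->base || p >= s->base + s->size * s->nobjs)
        return 0;
    return (p - s->base) % s->size == 0;
}

/* Counts the free objects reachable from 'head'. On an invalid pointer,
 * or a chain longer than the page could hold (a loop), stores the
 * offending pointer in *bad and returns -1. */
static long count_free(const struct toy_slab *s, uint64_t head, uint64_t *bad)
{
    long count = 0;

    assert(s != NULL && s->size != 0 && bad != NULL);
    for (uint64_t p = head; p != 0; p = s->next[(p - s->base) / s->size]) {
        if (!valid_freeptr(s, p) || (size_t)count >= s->nobjs) {
            *bad = p;
            return -1;
        }
        count++;
    }
    return count;
}
```

Checking the pointer before dereferencing it is what lets the walk report the bad value instead of chasing it; the open question in this thread is only what context to print alongside it.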
I'm thinking we should clarify that error message, perhaps by storing the cpu
number in si->cpu, and displaying it when si->slab is NULL?
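One possible shape for that message, sketched with a hypothetical `struct toy_meminfo` carrying only a cache name, a slab address, and a `cpu` field standing in for the suggested si->cpu. It is not crash's `struct meminfo` or its error() plumbing, and the "kmem: " prefix seen in the report is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of crash's meminfo: only the fields the message uses. */
struct toy_meminfo {
    const char *curname;   /* cache name, e.g. "kmalloc-32"            */
    uint64_t slab;         /* 0 when walking a cpu freelist             */
    int cpu;               /* cpu whose kmem_cache_cpu is being read    */
};

/* Builds the "invalid freepointer" message, naming the cpu instead of
 * printing "slab: 0" when no slab address is known. */
static int format_invalid_freeptr(char *buf, size_t len,
                                  const struct toy_meminfo *si, uint64_t ptr)
{
    assert(buf != NULL && si != NULL);
    if (si->slab)
        return snprintf(buf, len, "%s: slab: %llx invalid freepointer: %llx",
                        si->curname, (unsigned long long)si->slab,
                        (unsigned long long)ptr);
    return snprintf(buf, len, "%s: cpu: %d invalid freepointer: %llx",
                    si->curname, si->cpu, (unsigned long long)ptr);
}
```

With slab == 0 and cpu == 3, this yields "kmalloc-32: cpu: 3 invalid freepointer: ffff001090e33f80", which points straight at the corrupted kmem_cache_cpu.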
Dave