Hello Atsushi,
I've committed a SLAB/SLUB kmem_cache-specific fix for this issue:
https://github.com/crash-utility/crash/commit/c0b7a74fc13121203810d06d163...
which is queued for crash-7.0.6.
Thanks,
Dave
----- Original Message -----
> Hello,
>
> I've finally found the cause of the issue I mentioned, as quoted below,
> when makedumpfile v1.5.5 was released:
>
> > 2. At first, the supported kernel will be updated to 3.12, but I
> > found an issue while testing for v1.5.5, which seems that the page
> > filtering works wrongly on kernel 3.12. I couldn't investigate this
> > yet and it will take some time to finish it.
> > Therefore, the latest supported kernel version is 3.11 in v1.5.5.
>
> This is neither a kernel issue nor a makedumpfile issue; it is a crash bug.
> It can happen when a slab cache is stored near the end of a page.
>
> == Description ==
>
> At the beginning, I found the error message below when I used crash for
> a dumpfile generated by makedumpfile -d2:
>
> please wait... (gathering kmem slab cache data)
> crash: page excluded: kernel virtual address: f4e87000 type: "kmem_cache buffer"
>
> crash: unable to initialize kmem slab cache subsystem
>
> This message indicated that crash failed to read a slab cache during
> kmem_cache_init(), and according to the output below, crash failed to
> read the slab cache stored at f4e86f40:
>
> crash> p kmem_cache
> kmem_cache = $1 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
> crash>
> crash> list kmem_cache.list -s kmem_cache.name -h 0xc0b1cbc0
> ...
> f4d37840
> name = 0xf4edf540 "uid_cache"
> f4e86f40
> list: page excluded: kernel virtual address: f4e87000 type: "gdb_readmem_callback"
>
> It seems that the slab cache spanned two pages, [f4e86000-f4e87000] and
> [f4e87000-f4e88000]. Well, let's confirm its *real* size.
>
> Since slab caches except kmem_cache_boot are allocated as slab objects,
> we can confirm the size like below:
>
> crash> p kmem_cache
> kmem_cache = $2 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
> crash> struct kmem_cache.object_size 0xc0b1cbc0
> object_size = 104
> crash>
>
> In my environment, the size was 104 bytes. Therefore, the slab cache
> stored at f4e86f40 fits in a single page ([f4e86000-f4e87000]), and
> the excluded page ([f4e87000-f4e88000]) is unrelated to it.
>
> On the other hand, crash gets the size from vmlinux by using gdb,
> and that size was 216 bytes:
>
> crash> struct kmem_cache
> struct kmem_cache {
> unsigned int batchcount;
> unsigned int limit;
> ...
> struct kmem_cache_node **node;
> struct array_cache *array[33];
> }
> SIZE: 216
> crash>
>
> So crash mistakenly treated the slab cache as spanning both
> [f4e86000-f4e87000] and [f4e87000-f4e88000], even though the latter
> is an unrelated page.
>
> This discrepancy arises because the size of struct kmem_cache is variable.
>
> struct kmem_cache {
> ...
> struct kmem_cache_node **node;
> struct array_cache *array[NR_CPUS + MAX_NUMNODES];
> /*
> * Do not add fields after array[]
> */
> };
>
> The size of "array" is the variable part of kmem_cache.
> When vmlinux is built, the size of kmem_cache is calculated from
> NR_CPUS and MAX_NUMNODES and recorded in vmlinux as debug information.
> (Sorry, I don't know gcc well, so I may have misunderstood this.)
> However, the actual size is smaller than the defined size because
> it is determined by the actual number of CPUs and nodes.
>
> void __init kmem_cache_init(void)::
> ...
> /*
> * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
> */
> create_boot_cache(kmem_cache, "kmem_cache",
> offsetof(struct kmem_cache, array[nr_cpu_ids]) +
> nr_node_ids * sizeof(struct kmem_cache_node *), // object_size
> SLAB_HWCACHE_ALIGN);
> list_add(&kmem_cache->list, &slab_caches);
>
>
> As for kmem_cache, we can get its actual size from kmem_cache_boot,
> but I suppose that kmem_cache is not the only struct in the kernel whose
> size is variable. So I think we should discuss how to address issues like
> this.
>
> By the way, I described the *SLAB* case in this mail,
> but SLUB seems to have the same issue.
>
>
> Thanks
> Atsushi Kumagai
This is a "known" issue that has been discussed on the crash-utility list
in the past, at least with respect to the kmem_cache data structure. But
for any random data structure that has such a construct, I'm not sure what
can be done.
In the case of the CONFIG_SLAB kmem_cache data structure, there is a
function that is supposed to "downsize" the size value of the kmem_cache
data structure that is returned by gdb. It is called here in
kmem_cache_init(), just prior to cycling through all of the kmem_cache
structures, where the page excluded error shown above occurred:
    if (!(pc->flags & RUNTIME))
        kmem_cache_downsize();

    cache_buf = GETBUF(SIZE(kmem_cache_s));
    hq_open();

    do {
        cache_count++;

        if (!readmem(cache, KVADDR, cache_buf, SIZE(kmem_cache_s),
            "kmem_cache buffer", RETURN_ON_ERROR)) {
            FREEBUF(cache_buf);
            vt->flags |= KMEM_CACHE_UNAVAIL;
            error(INFO,
                "%sunable to initialize kmem slab cache subsystem\n\n",
                DUMPFILE() ? "\n" : "");
            hq_close();
            return;
        }
The SIZE(kmem_cache_s) value should have been downsized by that function,
but presumably it did not work. If CRASHDEBUG(1) had been turned on during
initialization, you would have seen one of these two messages from
kmem_cache_downsize():
    if (CRASHDEBUG(1))
        fprintf(fp, "kmem_cache_downsize: %ld to %ld\n",
            STRUCT_SIZE("kmem_cache"),
            SIZE(kmem_cache_s));
or:
    if (CRASHDEBUG(1)) {
        fprintf(fp,
            "\nkmem_cache_downsize: SIZE(kmem_cache_s): %ld "
            "cache_cache.buffer_size: %d\n",
            STRUCT_SIZE("kmem_cache"), buffer_size);
        fprintf(fp,
            "kmem_cache_downsize: nr_node_ids: %ld\n",
            vt->kmem_cache_len_nodes);
    }
The function probably failed due to some kernel change. In fact,
I just checked a 3.13 CONFIG_SLAB kernel, and I see that
kmem_cache_downsize() no longer works for that kernel.
I see that kmem_cache_boot would be a good alternative for determining
the size on CONFIG_SLAB kernels, at least on 3.7 and later kernels where
it was introduced. And for CONFIG_SLUB, which doesn't currently have a
"downsize" function, it looks like its "kmem_cache" cache also has size
fields that could be used.
By any chance can you make the 32-bit vmlinux/vmcore pair available for
me to download? Reply to me off-list if you can.
Thanks,
Dave