----- Original Message -----
More testing revealed a machine in our stable that either failed to
initialize kmem:
please wait... (gathering kmem slab cache data)
crash-6.0.3: page excluded: kernel virtual address: ffff8801263d6000
type: "kmem_cache buffer"
crash-6.0.3: unable to initialize kmem slab cache subsystem
Or it succeeded in initializing and then failed on a kmem -s command:
crash-6.0.3> kmem -s
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
Segmentation fault
The problem is that the array member at the end of struct kmem_cache is still
declared with 32 elements, but in every dynamically allocated copy it is
actually trimmed down to nr_cpu_ids elements.
crash-6.0.3.best> struct kmem_cache
struct kmem_cache {
unsigned int batchcount;
...
struct list_head next;
struct kmem_list3 **nodelists;
struct array_cache *array[32];
}
SIZE: 368
On my normal play machine, nr_cpu_ids = 32 and the actual cpu count is 16.
On the failing machine, nr_cpu_ids and the actual cpu count are both 2.
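A minimal C sketch of the size arithmetic, assuming 8-byte pointers and, as the dump above suggests, that array is the last member and the allocation ends right after array[nr_cpu_ids - 1]. kmem_cache_real_size() is a hypothetical helper for illustration, not crash's actual code. With SIZE 368 and 32 declared pointers, the array starts at offset 112, so nr_cpu_ids = 32 gives the full 368 bytes while nr_cpu_ids = 2 gives only 128.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: size of a kmem_cache whose trailing
 * "struct array_cache *array[]" was trimmed to nr_cpu_ids entries.
 * Assumes array is the last member and nothing follows the trimmed array.
 */
static size_t kmem_cache_real_size(size_t declared_size, size_t declared_len,
                                   size_t ptr_size, size_t nr_cpu_ids)
{
    /* The declared struct must contain the whole declared array. */
    assert(declared_size >= declared_len * ptr_size);

    size_t array_offset = declared_size - declared_len * ptr_size;

    /* Never report more than the declared size. */
    if (nr_cpu_ids > declared_len)
        nr_cpu_ids = declared_len;

    return array_offset + nr_cpu_ids * ptr_size;
}
```

If a given kernel places more data after the trimmed array, that would have to be added on top of this figure.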
Two problems occur:
1) max_cpudata_limit traverses the array until it finds a 0x0 entry or
reaches the declared size (32), not the real size. On the 2-cpu system,
the "third" element in the array belonged to something else, was non-zero,
and pointed to data that caused the apparent limit to be
0xffffffffffff8801, which didn't work well as a length in a memcpy.
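The failure mode can be sketched in C as follows. cpudata_entries_visited() is hypothetical and only counts the entries a max_cpudata_limit-style loop would touch; the real loop would readmem() each entry's limit. If the 32-slot buffer read for a 2-cpu kmem_cache holds two valid pointers followed by unrelated non-zero data, bounding the scan by the declared length visits that stray third entry, while bounding it by nr_cpu_ids stops after two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of per-cpu array_cache pointers a
 * max_cpudata_limit-style loop visits.  It stops at the first NULL
 * entry or after 'bound' entries, whichever comes first.
 */
static size_t cpudata_entries_visited(const uint64_t *cpudata, size_t bound)
{
    size_t i;

    assert(cpudata != NULL || bound == 0);
    for (i = 0; i < bound && cpudata[i] != 0; i++)
        ;   /* the real loop would readmem() cpudata[i]'s limit here */
    return i;
}
```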
But your patch does this:
@@ -8117,8 +8135,9 @@ kmem_cache_s_array_nodes:
"array cache array", RETURN_ON_ERROR))
goto bail_out;
- for (i = max_limit = 0; (i < ARRAY_LENGTH(kmem_cache_s_array)) &&
- cpudata[i]; i++) {
+ for (i = max_limit = 0; (i < kmem_cache_nr_cpu)
+ && (i < ARRAY_LENGTH(kmem_cache_s_array))
+ && cpudata[i]; i++) {
if (!readmem(cpudata[i]+OFFSET(array_cache_limit),
KVADDR, &limit, sizeof(int),
"array cache limit", RETURN_ON_ERROR))
On "old" slab systems, your new "kmem_cache_nr_cpu" variable remains at
its initialized value of zero, and the loop never gets entered. So I don't
think you wanted to keep the (i < kmem_cache_nr_cpu) there, right?
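One way to keep a per-cpu bound without skipping the loop on old slab kernels, sketched here with a hypothetical helper, is to treat a count of zero as "unknown" and fall back to the declared array length. This assumes kmem_cache_nr_cpu stays 0 whenever it was never set.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many cpudata slots to scan.  nr_cpu == 0 means
 * the per-cpu count is unknown (e.g. never set on an old slab kernel), so
 * fall back to the declared length instead of scanning nothing.
 */
static size_t cpudata_scan_bound(size_t declared_len, size_t nr_cpu)
{
    assert(declared_len > 0);
    if (nr_cpu == 0 || nr_cpu > declared_len)
        return declared_len;
    return nr_cpu;
}
```

The loop condition would then read something like i < cpudata_scan_bound(ARRAY_LENGTH(kmem_cache_s_array), kmem_cache_nr_cpu) && cpudata[i]; that is an assumption about how the pieces could fit, not the committed fix.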
2) kmem_cache structs can be allocated near enough to the edge of a page
that the old, incorrect length crosses the page boundary, even though the
real, smaller structure fits in the page. That caused a readmem of the
structure to cross into a page that happened to be missing from the dump.
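A minimal check for the page-crossing case, assuming 4 KB pages. crosses_page() is hypothetical, and the start address 0xffff8801263d5f00 is made up, chosen just below the excluded page ffff8801263d6000 from the error message. A 368-byte read from there spills into the next page, while a 128-byte read (112 + 2 * 8, the trimmed size on a 2-cpu system with 8-byte pointers) stays within its page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does [addr, addr + size) span more than one page? */
static bool crosses_page(uint64_t addr, uint64_t size, uint64_t page_size)
{
    /* page_size must be a non-zero power of two for the mask to work. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (size == 0)
        return false;

    uint64_t first_page = addr & ~(page_size - 1);
    uint64_t last_page = (addr + size - 1) & ~(page_size - 1);
    return first_page != last_page;
}
```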
Right -- that was the genesis of the kmem_cache_downsize() function.
This patch fixes both of those (after wrestling ARRAY_LENGTH to the
ground), but *does not* fix the similar page crossing problem when I try
to use a "struct kmem_cache" command on the particular structure at the
end of the page.
Yeah, damn, I don't know what can be done for that, aside from some
horrific kludge to gdb_readmem_callback() to return successfully even
if the readmem() failed.
Dave