Sharyathi Nagesh wrote:
> This is the patch after making the following changes:
> o Created a new function gather_cpudata_list_v2_nodes().
> o gather_cpudata_list_v2_nodes() will be called for each node, and
>   it will update avail with the corresponding value of shared memory.
> o gather_cpudata_list_v2_nodes() is called inside
>   do_slab_chain_percpu_v2_nodes during SLAB_WALKTHROUGH, instead of
>   outside as earlier.
> o Removed the commented-out section of SLAB_WALKTHROUGH (specified slab).
> o Updated with FREEBUF at possible exit points.
> o Updated dump_vm_table() to dump vt->kmem_cache_len_nodes.
> o The si->found field was not getting set for the dump I analysed,
>   so the if (si->found) part of the code was not getting executed.
>   Needs to be checked for this case.
> Please go through the patch and let me know of your opinions.
> Thanks
> Sharyathi Nagesh
Hi Sharyathi,

OK -- it's getting there, but there are a couple of problems with
this second patch.
The purpose of gather_cpudata_list_v2_nodes() is to fill
several arrays of cached objects:

(1) the list of per-cpu objects attached to the kmem_cache,
    referenced by the "array[NR_CPUS]" member:

        struct kmem_cache {
            ...
            struct array_cache *array[NR_CPUS];
            ...
    So, for each cpu, its cached per-cpu objects from the
    above get stored in crash in each dynamically-allocated,
    per-cpu "cpudata" array:

        struct meminfo {
            ...
            ulong *cpudata[NR_CPUS];
    Now that gather_cpudata_list_v2_nodes() is being called
    multiple times instead of just once, the loading of each
    of the cpudata[NR_CPUS] arrays is being done redundantly.
    This is harmless, although you could probably just look
    at the "index" argument you now pass in, and if it's
    non-zero, skip the redundant reading of the cached
    per-cpu objects above.
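    For instance, the top of the function could be guarded
    something like this -- just a skeleton, since I'm guessing
    at the function's declaration and at the bound the existing
    per-cpu loop uses:

        static void
        gather_cpudata_list_v2_nodes(struct meminfo *si, int index)
        {
                int i;

                if (index == 0) {
                        /*
                         *  First call only: the per-cpu lists
                         *  are the same for every kmem_list3
                         *  index, so re-reading them on later
                         *  calls is wasted work.
                         */
                        for (i = 0; i < vt->kmem_max_cpus; i++) {
                                /* ... read kmem_cache->array[i]
                                 *     into si->cpudata[i] ... */
                        }
                }

                /* gather this index's shared list on every call */
                ...
        }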
(2) The second function of the current gather_cpudata_list_v2()
    is to gather the "shared" list from the singular kmem_list3
    structure:

        struct kmem_list3 {
            ...
            struct array_cache *shared;
    Since this has (until now) been a single list per kmem_cache,
    crash has been storing the list of shared objects in a singular
    dynamically-allocated array:

        struct meminfo {
            ...
            ulong *shared_array_cache;

    and that's the problem with your new gather_cpudata_list_v2_nodes().
    It's using the singular "shared_array_cache" array multiple times,
    once for each kmem_list3 "index". So it ends up over-writing the
    array each time, and si->shared_array_cache ends up containing
    the list of objects from the *last* "index" only.
> o The si->found field was not getting set for the dump I analysed,
>   so the if (si->found) part of the code was not getting executed.
Given the above, this makes sense, since later on, when the
shared_array_cache list is searched for a particular object, it
most likely won't find it unless it happened to be contained in
the last "index'd" set of objects.
Now this can be solved in a couple of ways:

The dynamic allocation of the singular shared_array_cache array
can be increased by multiplying its currently predetermined
max-size *times* vt->kmem_cache_len_nodes. And then each time
gather_cpudata_list_v2_nodes() is called, it could read the
next set of objects into the location in the array where it
left off the last time, i.e., at the first zero (unused) entry.
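Concretely, the allocation and the append would look something
like this (an untested sketch -- and note the BZERO, so that the
unused tail stays zero and the first zero entry still terminates
the list):

        si->shared_array_cache = (ulong *)
                GETBUF(vt->kmem_max_limit *
                    vt->kmem_cache_len_nodes * sizeof(ulong));
        BZERO(si->shared_array_cache, vt->kmem_max_limit *
                vt->kmem_cache_len_nodes * sizeof(ulong));

and then in gather_cpudata_list_v2_nodes():

        ulong *start;

        /*
         *  Skip the entries stored by previous calls, i.e.,
         *  append at the first zero (unused) slot.
         */
        for (start = si->shared_array_cache; *start; start++)
                ;
        /* ... read this index's shared objects into start[0],
         *     start[1], etc. ... */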
Alternatively, a different array could be used, although
it couldn't use NR_CPUS as a delimiter, but would have
to be aware of a maximum-number-of-numa-nodes, which
would be kind of ugly. And if you use a new array, then
the check_shared_list() function would have to be updated
to check all of the lists instead of the current single
list.
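Purely for illustration -- the new member name and the use of
the kernel's MAX_NUMNODES as a bound are made up -- it would
look something along these lines:

        struct meminfo {
                ...
                /* hypothetical per-node replacement for the
                 * singular shared_array_cache: */
                ulong *shared_array_cache_nodes[MAX_NUMNODES];

        static int
        check_shared_list(struct meminfo *si, ulong obj)
        {
                int n, i;

                if (INVALID_MEMBER(kmem_list3_shared))
                        return FALSE;

                /* check every node's list, not just one */
                for (n = 0; n < vt->kmem_cache_len_nodes; n++) {
                        if (!si->shared_array_cache_nodes[n])
                                continue;
                        for (i = 0; si->shared_array_cache_nodes[n][i]; i++) {
                                if (si->shared_array_cache_nodes[n][i] == obj)
                                        return TRUE;
                        }
                }

                return FALSE;
        }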
BTW, there's a bug in check_shared_list() that I just
noticed now -- it's harmless, but dumb:

        static int
        check_shared_list(struct meminfo *si, ulong obj)
        {
                int i;

                if (INVALID_MEMBER(kmem_list3_shared) ||
                    !si->shared_array_cache)
                        return FALSE;

                for (i = 0; i < si->shared_array_cache[i]; i++) {
                        if (si->shared_array_cache[i] == obj)
                                return TRUE;
                }

                return FALSE;
        }
The for loop should be:

        for (i = 0; si->shared_array_cache[i]; i++) {

It works now by dumb luck, stopping at the first zero entry:
since any stored object address is always greater than the loop
index i, the bogus "i < si->shared_array_cache[i]" test keeps
succeeding until a zero entry is reached -- which is what it's
supposed to do.
Finally, there's one other "gotcha" with this scheme.
During initialization, the value of vt->kmem_max_limit
is determined in max_cpudata_limit(), and later on it's
used to allocate the object array:

        si->shared_array_cache = (ulong *)
                GETBUF(vt->kmem_max_limit * sizeof(ulong));

vt->kmem_max_limit is calculated during initialization by
determining the maximum cached object list size amongst both
the kmem_cache->array[NR_CPUS] lists and the singular
kmem_list3->shared list. Therefore, the max_cpudata_limit()
function is only checking the first one:
        /*
         *  If the shared list can be accessed,
         *  check its size as well.
         */
        if (VALID_MEMBER(kmem_list3_shared) &&
            VALID_MEMBER(kmem_cache_s_lists) &&
            readmem(cache+OFFSET(kmem_cache_s_lists)+
            OFFSET(kmem_list3_shared), KVADDR, &shared,
            sizeof(void *), "kmem_list3 shared",
            RETURN_ON_ERROR|QUIET) &&
            readmem(shared+OFFSET(array_cache_limit),
            KVADDR, &limit, sizeof(int),
            "shared array_cache limit",
            RETURN_ON_ERROR|QUIET)) {
                if (limit > max_limit)
                        max_limit = limit;
        }
Like your other new functions, if there's more than one list,
they should probably all be checked for the largest one.
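In other words, something like the following loop in
max_cpudata_limit() -- this assumes the per-node kmem_list3
structures hang off an array of pointers at
OFFSET(kmem_cache_s_lists), which is my guess at how your
patch reaches them:

        ulong list3, shared;
        int i, limit;

        for (i = 0; i < vt->kmem_cache_len_nodes; i++) {
                if (!readmem(cache+OFFSET(kmem_cache_s_lists)+
                    (i * sizeof(void *)), KVADDR, &list3,
                    sizeof(void *), "kmem_list3 node pointer",
                    RETURN_ON_ERROR|QUIET) || !list3)
                        continue;
                if (readmem(list3+OFFSET(kmem_list3_shared),
                    KVADDR, &shared, sizeof(void *),
                    "kmem_list3 shared", RETURN_ON_ERROR|QUIET) &&
                    shared &&
                    readmem(shared+OFFSET(array_cache_limit),
                    KVADDR, &limit, sizeof(int),
                    "shared array_cache limit",
                    RETURN_ON_ERROR|QUIET)) {
                        if (limit > max_limit)
                                max_limit = limit;
                }
        }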
Clear as mud?
Dave