----- Original Message -----
Hi Dave,
It was noticed that when SPLIT_RSS_ACCOUNTING is enabled in kernel, the RSS
values shown by ps command is huge (negative value represented as ulong)
for certain tasks. Shown below is a ps output of such kind.
1416 1 3 db506ac0 IN 36235.9 936 4194128 debuggerd
1417 1 3 db500040 IN 0.0 16512 36 rild
1418 1 1 db500580 IN 0.1 47160 488 surfaceflinger
1419 1 0 db74d040 IN 1.1 458544 5148 zygote
* 1420 1 1 db5d3ac0 IN 36235.9 13868 4194280 drmserver*
1421 1 1 db5d3580 IN 0.0 54944 20 mediaserver
1422 1 0 db7c9580 IN 0.0 900 20 installd
* 1423 1 0 db7c9040 IN 36235.8 1828 4194112 keystore
1424 1 3 db5d8580 IN 36235.9 1152 4194276 akmdfs
1426 1 3 db7bd040 IN 36235.9 10880 4194212 glgps
1428 1 0 db61c040 IN 36235.9 2084 4194152 usb_portd
1429 1 0 db753ac0 IN 36235.9 868 4194148 atxd_proxy
1431 1 1 da6fbac0 UN 36235.9 6100 4194280 bkmgrd*
*
*
When SPLIT_RSS_ACCOUNTING is enabled, the rss values are stored in
task_struct of each thread, and is synced to the mm_struct.rss_stat only at
certain events. So during a crash it is likely that mm.rss_stat contains
old values which can even be negative.
I have prepared a patch (attached) to sync the task_struct.rss_stat with
mm.rss_stat, in get_task_mem_usage, when SPLIT_RSS_ACCOUNTING is enabled.
Please share your thoughts on this.
Hi Vinayak,
Good catch -- it certainly should be fixed. The devil is in the details of how
best to accomplish it...
On the face of it, it would seem that the task group list gathering via do_list()
would be the best way. The problem is that it could potentially cause the "ps"
command to fail on live systems if the list becomes invalid, or is invalid with
respect to the pre-gathered set of tasks that get captured prior to execution
of the command.
Prior to each crash command whose command_table_entry structure contains the
REFRESH_TASK_TABLE flag (as does "ps"), the tt->refresh_task_table()
function
is called to scan the kernel's pid_hash[] for a stable set of tasks. And only
those tasks are accessed or used by that crash command instance.
By calling do_list(), it's conceivable (although fairly unlikely) that the command
could go either go off into the weeds following a list that's being modified, or
if the list has changed since the "stable" list was gathered prior to the
command.
So preferably, it would be best if the new functionality could be constrained to
the stable set of pre-gathered tasks -- again if for no other reason than the fact
that the crash utility only uses the pre-gathered task-set whenever it parses tasks.
There's also another potential problem in having get_task_mem_usage() use
open_tmpfile(). Since it's a function that gets called by other functions
or macros (IN_TASK_VMA() for example), it's possible that the caller may
also be operating inside an open_tmpfile() area, and open_tmpfile() is not
recursive. In those internal cases, open_tmpfile2() could be used, but again,
it's probably best to avoid the do_list() call.
So I tinkered with an optional manner of checking the thread group list, i.e.,
like this:
if (VALID_MEMBER(task_struct_rss_stat)) {
int i, cnt;
int sync_rss;
ulong tgid;
struct task_context *tc1;
tgid = task_tgid(task);
tc1 = FIRST_CONTEXT();
for (i = cnt = 0; i < RUNNING_TASKS(); i++, tc1++) {
if (task_tgid(tc1->task) != tgid)
continue;
/* count 0 -> filepages */
if (!readmem(tc1->task +
OFFSET(task_struct_rss_stat) +
OFFSET(task_rss_stat_count), KVADDR,
&sync_rss,
sizeof(int), "task_struct rss_stat 0",
RETURN_ON_ERROR))
continue;
rss += sync_rss;
/* count 1 -> anonpages */
if (!readmem(tc1->task +
OFFSET(task_struct_rss_stat) +
OFFSET(task_rss_stat_count) + sizeof(int),
KVADDR, &sync_rss,
sizeof(int), "task_struct rss_stat 0",
RETURN_ON_ERROR))
continue;
rss += sync_rss;
}
}
It's inefficient in comparison to using do_list(), but it does guarantee
safety on live systems.
A couple other minor issues -- the two readmem() type strings above both
indicate "rss_stat 0", so they should be differentiated with strings indicating
"task_struct rss_stat MM_FILEPAGES" and "task_struct rss_stat
MM_ANONPAGES".
And lastly, since MEMBER_EXISTS() and MEMBER_OFFSET_INIT() do essentially
the same thing, it simpler in this case to just call MEMBER_OFFSET_INIT()
the 3 times without any MEMBER_EXISTS() conditionalization:
+ if (MEMBER_EXISTS("task_struct", "thread_group"))
+ MEMBER_OFFSET_INIT(task_struct_thread_group, "task_struct",
+ "thread_group");
+
+ if (MEMBER_EXISTS("task_struct", "rss_stat")) {
+ MEMBER_OFFSET_INIT(task_struct_rss_stat, "task_struct",
+ "rss_stat");
+ MEMBER_OFFSET_INIT(task_rss_stat_count, "task_rss_stat",
+ "count");
+ }
+
If the members don't exist, they will just get set to INVALID_OFFSET (-1).
What do you think?
Dave