----- Original Message -----
On 09/20/2014 03:15 AM, Dave Anderson wrote:
----- Original Message -----
Hello Pan,
I've updated the patch I attached yesterday with a change that
caches the most-recent tgid search result. From ~70% to ~90% of
the time, either the last tgid entry or the very next one in the
tgid_array is the one being searched for, so it's not necessary
to call bsearch() every time. "help -t" will show the cache-hit
statistics.
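A minimal C sketch of the caching idea described above. It assumes a hypothetical tgid_array of {tgid, task} pairs sorted by tgid. The type names, fields, and the hit/miss counters (standing in for the statistics "help -t" reports) are illustrative only, not the actual crash code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry: one per task, array sorted by tgid. */
struct tgid_entry {
    unsigned long tgid;
    unsigned long task;   /* task_struct address */
};

/* Hypothetical sorted table with a one-entry cache of the last match. */
struct tgid_table {
    struct tgid_entry *array;
    size_t count;
    size_t last;          /* index of the most recent match */
    int have_last;        /* nonzero once `last` is valid */
    unsigned long hits;   /* lookups satisfied by the cache */
    unsigned long misses; /* lookups that fell through to bsearch() */
};

/* bsearch() passes the key first and the array element second. */
static int compare_tgid(const void *key, const void *elem)
{
    unsigned long k = *(const unsigned long *)key;
    unsigned long t = ((const struct tgid_entry *)elem)->tgid;
    return (k > t) - (k < t);
}

/* Return an entry with a matching tgid, or NULL if there is none. */
struct tgid_entry *tgid_lookup(struct tgid_table *tt, unsigned long tgid)
{
    struct tgid_entry *found;

    assert(tt != NULL);

    if (tt->have_last) {
        /* Cheap checks first: the last match, then the entry after it. */
        if (tt->array[tt->last].tgid == tgid) {
            tt->hits++;
            return &tt->array[tt->last];
        }
        if (tt->last + 1 < tt->count &&
            tt->array[tt->last + 1].tgid == tgid) {
            tt->hits++;
            tt->last++;
            return &tt->array[tt->last];
        }
    }

    /* Cache miss: fall back to a binary search and refresh the cache. */
    tt->misses++;
    found = bsearch(&tgid, tt->array, tt->count,
                    sizeof(struct tgid_entry), compare_tgid);
    if (found) {
        tt->last = (size_t)(found - tt->array);
        tt->have_last = 1;
    }
    return found;
}
```

Repeated lookups of the same tgid, or of the next one in sorted order (the ~70% to ~90% case mentioned above), never reach bsearch(); any other lookup pays for one binary search and updates the cache.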
Thanks,
Dave
Hello Pan,
This patch as written needs to be made less restrictive for use
on a live system.
When running on a live system that has many tasks constantly
forking/exec'ing, the "ps" command may occasionally fail like so:
crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      0      0   0  ffffffff81c13440  RU   0.0       0      0  [swapper/0]
      0      0   1  ffff88021282d330  RU   0.0       0      0  [swapper/1]
>     0      0   2  ffff88021282dac0  RU   0.0       0      0  [swapper/2]
      0      0   3  ffff88021282e250  RU   0.0       0      0  [swapper/3]
      1      0   1  ffff880212828000  IN   0.0   50140   3120  systemd
      2      0   3  ffff880212828790  IN   0.0       0      0  [kthreadd]
... [ cut ] ...
   7578  27670   0  ffff8801f45e3c80  DE   0.0       0      0  cc
   7622  27668   1  ffff880210ee3c80  ZO   0.0       0      0  info
   7629  27667   1  ffff8801075bd330  DE   0.0       0      0  rev
   7631  27680   0  ffff8801075bf170  ZO   0.0       0      0  printenv
   7635  27685   3  ffff880108bbe9e0  ZO   0.0       0      0  ypwhich
ps: bsearch for tgid failed: task: ffff880210ee6250 tgid: 7654
crash>
Without this patch, the search for the matching tgid would not generate
an error at all, but just quietly continue.
The problem is that the task.tgid may change on a live system, or, more
likely, that the task itself may have been re-used.
I would like to fix it by simply ignoring tgid bsearch failures on live
systems and just using the RSS stats stored in the per-tgid mm_struct.
Does that work for you?
Dave
OK!
But I don't understand what you mean by ignoring tgid bsearch failures on
live systems and just using the RSS stats stored in the per-tgid mm_struct.
If the tgid may change, the tgid_array is useless on live systems.
Well, in this case, it may be true for a particular task if the task struct
had been re-used between the time the arrays were created and the time
that the "ps" command gets around to reading and displaying its various
statistics. In that case, the command may read invalid data with respect to
that task.
But let's be clear -- that kind of behavior is, and always has been, an
unavoidable circumstance when running the crash utility on live systems, or
when looking at a "live" dump.
It's not just the "ps" command, but any command that displays data that
is subject to the "shifting sands" syndrome, where the kernel data is
constantly being modified while the crash command is running.
So the idea is to not cancel the whole command with an error(FATAL...)
just because such an anomaly occurs on a live system.
And what is the "RSS stats stored in the per-tgid mm_struct" used for?
Sorry, I meant to quietly skip checking the other tasks in the task
group, and simply use whatever is stored in the mm_struct pointed to by
the original task. Without your patch, if the tgid was not found, the
command would just continue. With your patch applied, it would be OK
to do the error(FATAL) in the case of a static dumpfile. But in the case
of a live system (or live dump), it's not worth killing the command at
that point.
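A minimal C sketch of that policy, assuming hypothetical inputs: whether the tgid search succeeded, the value computed after checking the other tasks in the group, the RSS read from the original task's own mm_struct, and a flag for live systems or live dumps. None of these names come from crash itself; where crash would call error(FATAL, ...), this sketch just returns a status so the caller can decide:

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of choosing which RSS value to display for one task. */
enum rss_status {
    RSS_OK,        /* tgid search succeeded: task-group value used */
    RSS_FALLBACK,  /* live system, search failed: own mm_struct value used */
    RSS_FATAL      /* static dumpfile, search failed: caller should abort */
};

/*
 * Hypothetical helper, not part of crash.
 *   found       nonzero if the tgid search succeeded
 *   group_rss   value computed after checking the other tasks in the group
 *   own_mm_rss  RSS read from the mm_struct of the original task
 *   live        nonzero on a live system or live dump
 *   rss_out     receives the value to display (untouched on RSS_FATAL)
 */
enum rss_status pick_rss(int found, unsigned long group_rss,
                         unsigned long own_mm_rss, int live,
                         unsigned long *rss_out)
{
    assert(rss_out != NULL);

    if (found) {
        *rss_out = group_rss;
        return RSS_OK;
    }
    if (live) {
        /* The task may have been re-used since the array was built:
           quietly skip the task-group check and use its own mm_struct. */
        *rss_out = own_mm_rss;
        return RSS_FALLBACK;
    }
    /* A static dumpfile cannot change underneath us, so a failed search
       indicates a real inconsistency worth reporting as fatal. */
    return RSS_FATAL;
}
```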
Clear?
Dave
More clearly, please.
thanks,
Pan