----- Original Message -----
On 09/20/2014 03:15 AM, Dave Anderson wrote:
>
> ----- Original Message -----
>> Hello Pan,
>>
>> I've updated the patch I attached yesterday with a change that
>> caches the most-recent tgid search result. From ~70% to ~90% of
>> the time, either the last tgid entry or the very next one in the
>> tgid_array is the one being searched for, so it's not necessary
>> to call bsearch() every time. "help -t" will show the cache-hit
>> statistics.
>>
>> Thanks,
>> Dave
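
For readers following the thread, the caching idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names struct tgid_entry, struct tgid_cache and tgid_lookup() are hypothetical and are not taken from the patch, and the array is assumed to be sorted by tgid.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry layout; the structures in the actual patch may differ. */
struct tgid_entry {
    unsigned long tgid;
    unsigned long task;
};

/* Hypothetical cache state wrapped around an array sorted by tgid. */
struct tgid_cache {
    struct tgid_entry *array;
    size_t count;
    size_t last;             /* index of the most recent successful lookup */
    int have_last;           /* non-zero once 'last' is valid */
    unsigned long hits;      /* lookups satisfied without bsearch() */
    unsigned long searches;  /* total lookups */
};

static int cmp_tgid(const void *a, const void *b)
{
    const struct tgid_entry *x = a, *y = b;

    if (x->tgid < y->tgid)
        return -1;
    if (x->tgid > y->tgid)
        return 1;
    return 0;
}

/* Try the last entry and the one after it before falling back to bsearch(). */
static struct tgid_entry *tgid_lookup(struct tgid_cache *c, unsigned long tgid)
{
    struct tgid_entry key = { tgid, 0 };
    struct tgid_entry *found;

    c->searches++;
    if (c->have_last) {
        if (c->array[c->last].tgid == tgid) {
            c->hits++;
            return &c->array[c->last];
        }
        if (c->last + 1 < c->count && c->array[c->last + 1].tgid == tgid) {
            c->last++;
            c->hits++;
            return &c->array[c->last];
        }
    }

    found = bsearch(&key, c->array, c->count, sizeof(*c->array), cmp_tgid);
    if (found) {
        c->last = (size_t)(found - c->array);
        c->have_last = 1;
    }
    return found;   /* NULL when the tgid is not in the array */
}
```

The hits and searches counters stand in for the kind of cache-hit statistics that "help -t" is said to report; how the patch actually tracks them is not shown in this thread.
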
> Hello Pan,
>
> This patch as written needs to be made less restrictive for use
> on a live system.
>
> When running on a live system that has many tasks constantly
> forking/exec'ing, the "ps" command may occasionally fail like so:
>
> crash> ps
>     PID   PPID CPU        TASK        ST  %MEM    VSZ    RSS  COMM
>       0      0   0  ffffffff81c13440  RU   0.0      0      0  [swapper/0]
>       0      0   1  ffff88021282d330  RU   0.0      0      0  [swapper/1]
> >     0      0   2  ffff88021282dac0  RU   0.0      0      0  [swapper/2]
>       0      0   3  ffff88021282e250  RU   0.0      0      0  [swapper/3]
>       1      0   1  ffff880212828000  IN   0.0  50140   3120  systemd
>       2      0   3  ffff880212828790  IN   0.0      0      0  [kthreadd]
> ... [ cut ] ...
>    7578  27670   0  ffff8801f45e3c80  DE   0.0      0      0  cc
>    7622  27668   1  ffff880210ee3c80  ZO   0.0      0      0  info
>    7629  27667   1  ffff8801075bd330  DE   0.0      0      0  rev
>    7631  27680   0  ffff8801075bf170  ZO   0.0      0      0  printenv
>    7635  27685   3  ffff880108bbe9e0  ZO   0.0      0      0  ypwhich
> ps: bsearch for tgid failed: task: ffff880210ee6250 tgid: 7654
> crash>
>
> Without this patch, the search for the matching tgid would not generate
> an error at all, but just quietly continue.
>
> The problem is that task.tgid may change on a live system or, more
> likely, the task itself may have been re-used.
>
> I would like to fix it by simply ignoring tgid bsearch failures on live
> systems, and just use the RSS stats stored in the per-tgid mm_struct.
>
> Does that work for you?
>
> Dave
>
ok!
But I don't understand the meaning of "fix it simply ignoring tgid bsearch
failures on live systems, and just use the RSS stats stored in the per-tgid
mm_struct". If the tgid may be changed, the tgid_array is useless on live
systems.

Well, in this case, it may be true for a particular task if the task struct
had been re-used in between the time the arrays were created and the time
that the "ps" command gets around to reading and displaying its various
statistics. And so the command may read invalid data with respect to that
task.

But let's be clear -- that kind of behavior is, and always has been, an
unavoidable circumstance when running the crash utility on live systems, or
when looking at a "live" dump.

It's not just the "ps" command, but any command that displays data that
is subject to the "shifting sands" syndrome, where the kernel data is
constantly being modified while the crash command is running.

So the idea is to not just cancel the whole command with an error(FATAL...)
if such an anomaly occurs on a live system.

And what is the "RSS stats stored in the per-tgid mm_struct" used for?

Sorry -- I meant to quietly skip the checking of the other tasks in the
task group, and simply use whatever is stored in the mm_struct pointed to
by the original task. Without your patch, if the tgid was not found, the
command would just continue. With your patch applied, it would be OK to
do the error(FATAL) in the case of a static dumpfile. But in the case of
a live system (or live dump), it's not worth killing the command at that
point.

Clear?

Dave

More clearly, please.
thanks,
Pan
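
To make the proposal above more concrete, here is a rough sketch of the failure handling Dave describes: treat a failed tgid lookup as fatal only for a static dumpfile, and on a live system fall back to the value held in the original task's mm_struct. Everything named here (enum tgid_action, struct rss_result, compute_rss(), the found and is_live flags, and the per-task peer counters) is a hypothetical stand-in for illustration, not the crash utility's actual code.

```c
#include <stddef.h>

/* Hypothetical outcomes of the tgid lookup for one task. */
enum tgid_action {
    TGID_USE_GROUP,         /* tgid found: include the other tasks in the group */
    TGID_USE_TASK_MM_ONLY,  /* live system, tgid not found: use mm_struct value only */
    TGID_FATAL              /* static dumpfile, tgid not found: caller should error(FATAL) */
};

struct rss_result {
    enum tgid_action action;
    long rss;
};

/*
 * mm_rss   - RSS value taken from the mm_struct of the original task
 * peer_rss - assumed per-task contributions from the other tasks in the group
 * found    - non-zero if the tgid search succeeded
 * is_live  - non-zero on a live system or live dump
 */
static struct rss_result compute_rss(long mm_rss, const long *peer_rss,
                                     size_t npeers, int found, int is_live)
{
    struct rss_result r = { TGID_USE_GROUP, mm_rss };
    size_t i;

    if (!found) {
        /* Quietly skip the task group on a live system; fail otherwise. */
        r.action = is_live ? TGID_USE_TASK_MM_ONLY : TGID_FATAL;
        return r;
    }

    for (i = 0; i < npeers; i++)
        r.rss += peer_rss[i];
    return r;
}
```

In this sketch the caller decides what to do with TGID_FATAL, which keeps the policy (live versus static dumpfile) separate from the error reporting itself.
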