On Mon, Apr 9, 2018 at 11:32 AM Dave Anderson <anderson(a)redhat.com> wrote:
----- Original Message -----
> task_exists() scans known tasks for a given task address.
>
> refresh_radix_tree_task_table() looks like:
> hq_open()
> for cpus {
> hq_enter(per_cpu_idle_thread)
> }
> for ... {
> hq_enter(task)
> }
> hq_close()
>
> tt->running_tasks = 0
> for task in retrieve_list(hq_entries) {
> if (task_exists()) {
> duplicate task address
> continue
> }
> add_context()
> }
>
> Because retrieve_list returns unique tasks, the above task_exists()
> check will never find a duplicate task. So remove the check. This
> converts the above loop from O(N^2) to O(N) which saves considerable
> startup time when there are a large number of tasks. On a 1M task dump
> this patch reduces startup time: 44m => 1m.
>
> I was not able to test the other pid table routines, but I suspect they
> may also benefit from similar removal.
Hi Greg,
I'm thinking that the task_exists() removal may be the most
beneficial
piece of the patch set, right?
This patch (3/4) is the most beneficial part for startup time. But after
startup, 'ps' is still slow with a lot of tasks. We can speed up 'ps'
with
the final patch (4/4) ("index task_context by task"), which depends on 1/4
("refactor store_context => add_context") and 2/4 ("refactor
task_to_pid").
In my experiments on a 1M task dump:
* patch 3/4 saves ~45m of startup time
* patches 1,2,4/4 saves ~45m of ps time
Together patches 1-4 save ~90m for startup+ps.
I have a set of ~250 vmcore files gathered over the years, but only 2
of
them are from kernels using the idr radix tree scheme. I timed how long
it took to simply bring up all 250 vmcores to their crash prompt and then
quit immediately. These are the total times running with and without your
patch:
without the patch:
real 46m44.023s
user 17m29.956s
sys 0m58.940s
with your patch applied:
real 46m39.752s
user 17m36.496s
sys 0m58.269s
But as you've noted, given that all of the various refresh_xxx
functions
use hq_enter(), the task_exists() call is pretty much useless.
I'm going to remove the task_exists() call from the remaining
refresh_xxx
functions, rerun the test, and see how it affects the numbers. I suspect
it will help a bit, although the task count in the ~250 vmcores range from
22 to 4178, with only ~20 dumpfiles having more than 1000 tasks.
Thanks. I also suspect the other refresh_xxx routines will benefit.
Though I don't have such dumps handy to test.