Currently, the runq sub-command does not take into account that the CFS
runqueue's current task has been removed from the CFS runqueue. As a
result, the remaining CFS runqueues that follow the current task's
runqueue are not displayed. This patch fixes this by making the runq
sub-command search the current task's runqueue explicitly.
Note that a CFS runqueue exists for each task group, and so does each
CFS runqueue's current task, so the above search needs to be done
recursively.
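The per-group recursion can be sketched with a toy model. The struct
layouts and the count_tasks_in_cfs_rq() helper below are illustrative
stand-ins, not crash's actual readmem-based code; they only show why
the current entity's group runqueue must be searched explicitly:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the scheduler structures: an entity with a
 * non-NULL my_q is a task group owning its own CFS runqueue; an
 * entity with my_q == NULL is a plain task. */
struct cfs_rq;

struct sched_entity {
        struct cfs_rq *my_q;        /* group's own runqueue, NULL for a task */
        struct sched_entity *next;  /* next queued entity */
};

struct cfs_rq {
        struct sched_entity *curr;   /* current entity, removed from the tree */
        struct sched_entity *queued; /* remaining queued entities */
};

/* Count all tasks reachable from a CFS runqueue.  Because curr has
 * been removed from the rb-tree, its group runqueue is only found by
 * checking curr->my_q explicitly -- omitting that check is the bug
 * the patch fixes.  (curr itself, when it is a task, is reported
 * separately as CURRENT, so it is not counted here.) */
static int count_tasks_in_cfs_rq(struct cfs_rq *rq)
{
        int total = 0;

        if (rq->curr && rq->curr->my_q)
                total += count_tasks_in_cfs_rq(rq->curr->my_q);

        for (struct sched_entity *se = rq->queued; se; se = se->next) {
                if (se->my_q)
                        total += count_tasks_in_cfs_rq(se->my_q);
                else
                        total++;
        }
        return total;
}
```

Without the rq->curr->my_q branch, every task queued under the current
entity's group hierarchy would be missed, which is exactly the symptom
shown in [before] below.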
Test
====
On the vmcore, I made 7 task groups:

  root group --- A --- AA --- AAA
                 |       +--- AAB
                 |
                 +---- AB --- ABA
                         +--- ABB
and then, for each task group including the root group, I ran three
CPU-bound tasks, each of which is exactly

  int main(void) { for (;;) continue; return 0; }

so 24 tasks in total. For readability, I annotated each task name with
the name of the group it belongs to. For example, loop.ABA belongs to
task group ABA.
Look at the CPU 0 column below. [before] lacks 8 tasks, while [after]
successfully shows all tasks on the runqueue, which is identical to
the result of [sched debug], which is expected to output the correct
result.
I'll send this vmcore later.
[before]
crash> runq | cat
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
<cut>
[after]
crash_fix> runq
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
<cut>
[sched debug]
crash> runq -d
CPU 0
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
<cut>
Diff stat
=========
defs.h | 1 +
task.c | 37 +++++++++++++++++--------------------
2 files changed, 18 insertions(+), 20 deletions(-)
Thanks.
HATAYAMA, Daisuke
Hello Daisuke,
Good catch! Plus your re-worked patch cleans things up nicely.
And "runq -d" paid off quickly, didn't it? ;-)
One minor problem, while testing your patch on a variety of kernels,
several "runq" commands failed because the test kernels were
not configured with CONFIG_FAIR_GROUP_SCHED:
struct sched_entity {
        struct load_weight      load;           /* for load-balancing */
        struct rb_node          run_node;
        struct list_head        group_node;
        unsigned int            on_rq;

        u64                     exec_start;
        u64                     sum_exec_runtime;
        u64                     vruntime;
        u64                     prev_sum_exec_runtime;

        u64                     nr_migrations;

#ifdef CONFIG_SCHEDSTATS
        struct sched_statistics statistics;
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
        struct sched_entity     *parent;
        /* rq on which this entity is (to be) queued: */
        struct cfs_rq           *cfs_rq;
        /* rq "owned" by this entity/group: */
        struct cfs_rq           *my_q;
#endif
};
so they failed like so:
CPU 0 RUNQUEUE: ffffffff825f7520
CURRENT: PID: 3790 TASK: ffff88000c8f2cf0 COMMAND: "bash"
RT PRIO_ARRAY: ffffffff825f75e8
[no tasks queued]
CFS RB_ROOT: ffffffff825f75a0
runq: invalid structure member offset: sched_entity_my_q
FILE: task.c LINE: 7035 FUNCTION: dump_tasks_in_cfs_rq()
where line 7035 is where the first possible recursion is done:
7021 static int
7022 dump_tasks_in_cfs_rq(ulong cfs_rq)
7023 {
7024         struct task_context *tc;
7025         struct rb_root *root;
7026         struct rb_node *node;
7027         ulong my_q, leftmost, curr, curr_my_q;
7028         int total;
7029
7030         total = 0;
7031
7032         readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr, sizeof(ulong),
7033             "curr", FAULT_ON_ERROR);
7034         if (curr) {
7035                 readmem(curr + OFFSET(sched_entity_my_q), KVADDR, &curr_my_q,
7036                     sizeof(ulong), "curr->my_q", FAULT_ON_ERROR);
7037                 if (curr_my_q)
7038                         total += dump_tasks_in_cfs_rq(curr_my_q);
7039         }
7040
7041         readmem(cfs_rq + OFFSET(cfs_rq_rb_leftmost), KVADDR, &leftmost,
7042             sizeof(ulong), "rb_leftmost", FAULT_ON_ERROR);
7043         root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
7044
7045         for (node = rb_first(root); leftmost && node; node = rb_next(node)) {
7046                 if (VALID_MEMBER(sched_entity_my_q)) {
7047                         readmem((ulong)node - OFFSET(sched_entity_run_node)
7048                             + OFFSET(sched_entity_my_q), KVADDR, &my_q,
7049                             sizeof(ulong), "my_q", FAULT_ON_ERROR);
7050                         if (my_q) {
7051                                 total += dump_tasks_in_cfs_rq(my_q);
7052                                 continue;
7053                         }
7054                 }
I fixed it by imposing a VALID_MEMBER(sched_entity_my_q) check, similar
to what is done at the second recursive call at line 7046 above:
        if (VALID_MEMBER(sched_entity_my_q)) {
                readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr,
                    sizeof(ulong), "curr", FAULT_ON_ERROR);
                if (curr) {
                        readmem(curr + OFFSET(sched_entity_my_q), KVADDR,
                            &curr_my_q, sizeof(ulong), "curr->my_q",
                            FAULT_ON_ERROR);
                        if (curr_my_q)
                                total += dump_tasks_in_cfs_rq(curr_my_q);
                }
        }
and that worked OK.
I also added "sched_entity_my_q" to dump_offset_table() for "help -o".
If you are OK with the changes above, the patch is queued for crash-6.0.3.
Thanks,
Dave