Currently, the runq sub-command does not take into account that the CFS
runqueue's current task has been removed from the CFS runqueue. As a
result, the remaining CFS runqueues that follow the current task's
runqueue are not displayed. This patch fixes this by making the runq
sub-command search the current task's runqueue explicitly.
Note that a CFS runqueue exists for each task group, and so does each
CFS runqueue's current task, so the above search needs to be done
recursively.
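The per-group recursion can be sketched with a toy model. The struct
layouts and the count_tasks_in_cfs_rq() helper below are illustrative
stand-ins, not crash's actual readmem-based code; they only show why
the current entity's group runqueue must be searched explicitly:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the scheduler structures: an entity with a
 * non-NULL my_q is a task group owning its own CFS runqueue; an
 * entity with my_q == NULL is a plain task. */
struct cfs_rq;

struct sched_entity {
        struct cfs_rq *my_q;        /* group's own runqueue, NULL for a task */
        struct sched_entity *next;  /* next queued entity */
};

struct cfs_rq {
        struct sched_entity *curr;   /* current entity, removed from the tree */
        struct sched_entity *queued; /* remaining queued entities */
};

/* Count all tasks reachable from a CFS runqueue.  Because curr has
 * been removed from the rb-tree, its group runqueue is only found by
 * checking curr->my_q explicitly -- omitting that check is the bug
 * the patch fixes.  (curr itself, when it is a task, is reported
 * separately as CURRENT, so it is not counted here.) */
static int count_tasks_in_cfs_rq(struct cfs_rq *rq)
{
        int total = 0;

        if (rq->curr && rq->curr->my_q)
                total += count_tasks_in_cfs_rq(rq->curr->my_q);

        for (struct sched_entity *se = rq->queued; se; se = se->next) {
                if (se->my_q)
                        total += count_tasks_in_cfs_rq(se->my_q);
                else
                        total++;
        }
        return total;
}
```

Without the rq->curr->my_q branch, every task queued under the current
entity's group hierarchy would be missed, which is exactly the symptom
shown in [before] below.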
Test
====
On the vmcore, I made 7 task groups:

  root group --- A --- AA --- AAA
                 |       +--- AAB
                 |
                 +---- AB --- ABA
                         +--- ABB
and then, for each task group including the root group, I ran three
CPU-bound tasks, each of which is exactly

  int main(void) { for (;;) continue; return 0; }

so 24 tasks in total. For readability, I annotated each task name with
the name of the group it belongs to. For example, loop.ABA belongs to
task group ABA.
Look at the CPU 0 column below. [before] lacks 8 tasks, while [after]
successfully shows all tasks on the runqueue, which is identical to
the result of [sched debug], which is expected to output the correct
result.
I'll send this vmcore later.
[before]
crash> runq | cat
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
<cut>
[after]
crash_fix> runq
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
<cut>
[sched debug]
crash> runq -d
CPU 0
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
<cut>
Diff stat
=========
defs.h | 1 +
task.c | 37 +++++++++++++++++--------------------
2 files changed, 18 insertions(+), 20 deletions(-)
Thanks.
HATAYAMA, Daisuke
Hello Daisuke,
Good catch! Plus your re-worked patch cleans things up nicely.
And "runq -d" paid off quickly, didn't it? ;-)
One minor problem, while testing your patch on a variety of kernels,
several "runq" commands failed because the test kernels were
not configured with CONFIG_FAIR_GROUP_SCHED:
struct sched_entity {
        struct load_weight      load;           /* for load-balancing */
        struct rb_node          run_node;
        struct list_head        group_node;
        unsigned int            on_rq;

        u64                     exec_start;
        u64                     sum_exec_runtime;
        u64                     vruntime;
        u64                     prev_sum_exec_runtime;

        u64                     nr_migrations;

#ifdef CONFIG_SCHEDSTATS
        struct sched_statistics statistics;
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
        struct sched_entity     *parent;
        /* rq on which this entity is (to be) queued: */
        struct cfs_rq           *cfs_rq;
        /* rq "owned" by this entity/group: */
        struct cfs_rq           *my_q;
#endif
};
so they failed like so:
CPU 0 RUNQUEUE: ffffffff825f7520
CURRENT: PID: 3790 TASK: ffff88000c8f2cf0 COMMAND: "bash"
RT PRIO_ARRAY: ffffffff825f75e8
[no tasks queued]
CFS RB_ROOT: ffffffff825f75a0
runq: invalid structure member offset: sched_entity_my_q
FILE: task.c LINE: 7035 FUNCTION: dump_tasks_in_cfs_rq()
where line 7035 is where the first possible recursion is done:
7021 static int
7022 dump_tasks_in_cfs_rq(ulong cfs_rq)
7023 {
7024         struct task_context *tc;
7025         struct rb_root *root;
7026         struct rb_node *node;
7027         ulong my_q, leftmost, curr, curr_my_q;
7028         int total;
7029
7030         total = 0;
7031
7032         readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr, sizeof(ulong),
7033             "curr", FAULT_ON_ERROR);
7034         if (curr) {
7035                 readmem(curr + OFFSET(sched_entity_my_q), KVADDR, &curr_my_q,
7036                     sizeof(ulong), "curr->my_q", FAULT_ON_ERROR);
7037                 if (curr_my_q)
7038                         total += dump_tasks_in_cfs_rq(curr_my_q);
7039         }
7040
7041         readmem(cfs_rq + OFFSET(cfs_rq_rb_leftmost), KVADDR, &leftmost,
7042             sizeof(ulong), "rb_leftmost", FAULT_ON_ERROR);
7043         root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
7044
7045         for (node = rb_first(root); leftmost && node; node = rb_next(node)) {
7046                 if (VALID_MEMBER(sched_entity_my_q)) {
7047                         readmem((ulong)node - OFFSET(sched_entity_run_node)
7048                             + OFFSET(sched_entity_my_q), KVADDR, &my_q,
7049                             sizeof(ulong), "my_q", FAULT_ON_ERROR);
7050                         if (my_q) {
7051                                 total += dump_tasks_in_cfs_rq(my_q);
7052                                 continue;
7053                         }
7054                 }
I fixed it by imposing a VALID_MEMBER(sched_entity_my_q) check, similar
to what is done at the second recursive call at line 7046 above:
        if (VALID_MEMBER(sched_entity_my_q)) {
                readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr,
                    sizeof(ulong), "curr", FAULT_ON_ERROR);
                if (curr) {
                        readmem(curr + OFFSET(sched_entity_my_q), KVADDR,
                            &curr_my_q, sizeof(ulong), "curr->my_q",
                            FAULT_ON_ERROR);
                        if (curr_my_q)
                                total += dump_tasks_in_cfs_rq(curr_my_q);
                }
        }
and that worked OK.
I also added "sched_entity_my_q" to dump_offset_table() for "help -o".
If you are OK with the changes above, the patch is queued for crash-6.0.3.
Thanks,
Dave