On 11/08/2012 03:15, Dave Anderson wrote:
----- Original Message -----
>
> ok. I rewrote the patch and it tested ok in my box.
>
> Thanks
> Zhang
My tests weren't so successful this time, and I also have some questions
about the runq -g output.
I tested your latest patches on a sample set of 70 dumpfiles whose
kernels all use CFS runqueues. In 7 of the 70 "runq -g" tests,
the command caused the crash session to fail like so:
<snip>
In a quick debugging session of your free_task_group_info_array()
I printed out the addresses being FREEBUF()'d, and I noted that
there were numerous instances of the same address being freed twice:
static void
free_task_group_info_array(void)
{
        int i;

        for (i = 0; i < tgi_p; i++) {
                if (tgi_array[i]->name)
                        FREEBUF(tgi_array[i]->name);
                FREEBUF(tgi_array[i]);
        }
        tgi_p = 0;
        FREEBUF(tgi_array);
}
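In other words, duplicates like that would follow directly from two tgi_array
entries ending up with the same name pointer, in which case the loop above
calls FREEBUF() twice on one buffer and the second call is what kills the
session. A hypothetical illustration of that aliasing (not the actual code
path, just the failure mode):

/* one GETBUF() allocation referenced by two entries */
char *shared = GETBUF(16);

tgi_array[0]->name = shared;
tgi_array[1]->name = shared;    /* same pointer, not a copy */

/* free_task_group_info_array() then does: */
FREEBUF(tgi_array[0]->name);    /* frees the buffer */
FREEBUF(tgi_array[1]->name);    /* same address freed a second time */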
I put one of the failing vmlinux/vmcore pairs here for you
to debug:
http://people.redhat.com/anderson/zhangyanfei
This is so weird. In my test on the vmcore you provided, 'runq -g' ran fine
the first time but caused the crash session to fail the next time.
From the debug information above and from my own tests, I noticed that it
always failed at the same place, when FREEBUF()'ing a name. So I checked the
function get_task_group_name and changed the way it returns the name buffer.
Now the command works well on the vmcore.
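To be concrete, the change is along these lines -- a minimal sketch, not the
actual patch: get_task_group_name now copies each group's name into its own
GETBUF() buffer, so every tgi_array[i]->name is a distinct allocation and is
handed to FREEBUF() exactly once. GETBUF() and BUFSIZE are crash's usual
declarations from defs.h; the helper read_group_name_from_dump() is
hypothetical and just stands in for however the group's name is really read
from the dumpfile:

static char *
get_task_group_name(ulong group)
{
        char tmp[BUFSIZE];
        char *name;

        /* placeholder for the real name lookup in the dumpfile */
        read_group_name_from_dump(group, tmp, sizeof(tmp));

        /* return a private copy, never a pointer shared between entries */
        name = GETBUF(strlen(tmp) + 1);
        strcpy(name, tmp);

        return name;
}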
Secondly, another question I have is about the meaning of the command's output.
First, consider this "runq" output:
crash> runq
CPU 0 RUNQUEUE: ffff8800090436c0
  CURRENT: PID: 588 TASK: ffff88007e4877a0 COMMAND: "udevd"
  RT PRIO_ARRAY: ffff8800090437c8
     [no tasks queued]
  CFS RB_ROOT: ffff880009043740
     [118] PID: 2110 TASK: ffff88007d470860 COMMAND: "check-cdrom.sh"
     [118] PID: 2109 TASK: ffff88007f1247a0 COMMAND: "check-cdrom.sh"
     [118] PID: 2114 TASK: ffff88007f20e080 COMMAND: "udevd"
CPU 1 RUNQUEUE: ffff88000905b6c0
  CURRENT: PID: 2113 TASK: ffff88007e8ac140 COMMAND: "udevd"
  RT PRIO_ARRAY: ffff88000905b7c8
     [no tasks queued]
  CFS RB_ROOT: ffff88000905b740
     [118] PID: 2092 TASK: ffff88007d7a4760 COMMAND: "MAKEDEV"
     [118] PID: 1983 TASK: ffff88007e59f140 COMMAND: "udevd"
     [118] PID: 2064 TASK: ffff88007e40f7a0 COMMAND: "udevd"
     [115] PID: 2111 TASK: ffff88007e4278a0 COMMAND: "kthreadd"
crash>
In the above case, the per-cpu "rq" structure addresses are shown as:
CPU 0 RUNQUEUE: ffff8800090436c0
CPU 1 RUNQUEUE: ffff88000905b6c0
And embedded in each of the rq structures above are these two rb_root
structures:
CFS RB_ROOT: ffff880009043740 (embedded in rq @ffff8800090436c0)
CFS RB_ROOT: ffff88000905b740 (embedded in rq @ffff88000905b6c0)
And starting at those rb_root structures, the trees of tasks are dumped.
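That is, the rb_root is not a separately allocated object -- its address is
simply the per-cpu rq address plus the offsets of rq.cfs and
cfs_rq.tasks_timeline, which is where the 0x80-byte difference between
ffff8800090436c0 and ffff880009043740 comes from. A minimal sketch of that
arithmetic, assuming those member names and crash's MEMBER_OFFSET() macro:

/* sketch only: the CFS RB_ROOT shown by "runq" is embedded in the rq, at
 * rq.cfs (a struct cfs_rq) + cfs_rq.tasks_timeline (a struct rb_root) */
ulong
cfs_rb_root_addr(ulong rq_addr)
{
        return rq_addr +
            MEMBER_OFFSET("rq", "cfs") +
            MEMBER_OFFSET("cfs_rq", "tasks_timeline");
}

So cfs_rb_root_addr(0xffff8800090436c0) would return the ffff880009043740
shown for CPU 0 above.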
Now, your "runq -q" option doesn't show any "starting point"
structure
address, but rather they just show "CPU 0" and "CPU 1":
crash> runq -g
CPU 0
  CURRENT: PID: 588 TASK: ffff88007e4877a0 COMMAND: "udevd"
  RT PRIO_ARRAY: ffff8800090437c8
     [no tasks queued]
  CFS RB_ROOT: ffff880009093548
     [118] PID: 2110 TASK: ffff88007d470860 COMMAND: "check-cdrom.sh"
     [118] PID: 2109 TASK: ffff88007f1247a0 COMMAND: "check-cdrom.sh"
     [118] PID: 2114 TASK: ffff88007f20e080 COMMAND: "udevd"
CPU 1
  CURRENT: PID: 2113 TASK: ffff88007e8ac140 COMMAND: "udevd"
  RT PRIO_ARRAY: ffff88000905b7c8
     [no tasks queued]
  CFS RB_ROOT: ffff880009093548
     [118] PID: 2092 TASK: ffff88007d7a4760 COMMAND: "MAKEDEV"
     [118] PID: 1983 TASK: ffff88007e59f140 COMMAND: "udevd"
     [118] PID: 2064 TASK: ffff88007e40f7a0 COMMAND: "udevd"
     [115] PID: 2111 TASK: ffff88007e4278a0 COMMAND: "kthreadd"
crash>
I would think that there might be a useful address of a per-cpu
structure that could be shown there as well?
OK, this is added.
And secondly, I'm confused as to why the "CFS RB_ROOT" address for
all cpus is the same address -- for example, above they are both at
ffff880009093548. How can the two rb trees have the same rb_root?
My neglect, sorry. fixed.
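For the record, the per-cpu value has to be picked up through the task
group's cfs_rq pointer array, one entry per cpu; reusing a single cached
pointer for every cpu is exactly the kind of slip that makes both cpus report
the same rb_root. A rough sketch of the intended per-cpu read, with the
member names assumed and readmem()/MEMBER_OFFSET() used as in the rest of
crash:

/* sketch: each task_group carries an array of per-cpu cfs_rq pointers, so
 * the rb_root reported for CPU n must come from cfs_rq[n], never from a
 * value shared across cpus */
static ulong
task_group_cfs_rq(ulong task_group, int cpu)
{
        ulong array, cfs_rq;

        readmem(task_group + MEMBER_OFFSET("task_group", "cfs_rq"), KVADDR,
                &array, sizeof(ulong), "task_group cfs_rq", FAULT_ON_ERROR);
        readmem(array + cpu * sizeof(ulong), KVADDR,
                &cfs_rq, sizeof(ulong), "task_group cfs_rq[cpu]",
                FAULT_ON_ERROR);

        return cfs_rq;
}

The per-cpu CFS RB_ROOT is then cfs_rq + MEMBER_OFFSET("cfs_rq",
"tasks_timeline").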
Thanks
Zhang