----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> Hi Dave,
>
> I got an s390x dump of a Linux 2.6.36 system, where a task (kmcheck, pid=44) is
> missing in the ps output. I debugged the problem and I think that I found the
> reason:
>
> It looks like crash does not walk the linked list of the pid hash table
> to the end if it finds a NULL pointer in the pid.tasks[PIDTYPE_PID=0]
> array. Unfortunately, this condition is true for the struct pid that
> precedes our lost task in the linked list. Therefore crash does not find
> our task.
That sounds similar to the fix Bob Montgomery made in 5.0.7:
 - Fix for the potential to miss one or more tasks in 2.6.23 and earlier
   kernels, presumably due to catching an entry in the kernel's pid_hash[]
   chain in transition. Without the patch, the task will simply not be
   seen in the gathered task list.
   (bob.montgomery(a)hp.com)
where this was his patch posting -- which fixed refresh_hlist_task_table_v2():
[Crash-utility] Missing PID 1 is crash problem with losing tasks
https://www.redhat.com/archives/crash-utility/2010-August/msg00049.html
and where your patch fixes refresh_hlist_task_table_v3().
I'll give it a test run...
Thanks,
Dave
Hi Michael,
Works well -- it's a rare occurrence, but the patch uncovered a total of
seven missing tasks in a test run on a sample set of 50 "v3" dumpfiles.
Queued for next release.
Thanks,
Dave
> The attached patch seems to fix this problem.
>
> Here is my crash debug log with the 2.6.36 dump:
> ---------------------------------------------
> Task "kmcheck" is in hash slot 2941 in the linked list at position 2:
>
> crash> print pid_hash[2941]
> $4 = {
> first = 0x3f5fb7f8
> }
>
> crash> upid
> struct upid {
> int nr;
> struct pid_namespace *ns;
> struct hlist_node pid_chain;
> }
> SIZE: 32
>
> crash> upid.pid_chain
> struct upid {
> [16] struct hlist_node pid_chain;
> }
>
> crash> eval 0x3f5fb7f8 - 16
> hexadecimal: 3f5fb7e8
>
> crash> upid 3f5fb7e8 <<<<---- the first upid in the list
> struct upid {
> nr = 565,
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x3edea2b0,
> pprev = 0x96554e8
> }
> }
>
> crash> pid
> struct pid {
> atomic_t count;
> unsigned int level;
> struct hlist_head tasks[3];
> struct rcu_head rcu;
> struct upid numbers[1];
> }
> SIZE: 80
>
> crash> pid.numbers
> struct pid {
> [48] struct upid numbers[1];
> }
>
> crash> eval 3f5fb7e8 - 48
> hexadecimal: 3f5fb7b8
>
> crash> pid 3f5fb7b8
> struct pid {
> count = {
> counter = 1
> },
> level = 0,
> tasks = {{
> first = 0x0 <<<----------- tasks[0] is NULL
> }, {
> first = 0x3d488620
> }, {
> first = 0x0
> }},
> rcu = {
> next = 0x5a5a5a5a5a5a5a5a,
> func = 0x5a5a5a5a5a5a5a5a
> },
> numbers = {{
> nr = 565,
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x3edea2b0, <<<--------- Pointer to second element in list
> pprev = 0x96554e8
> }
> }}
> }
>
> crash> eval 0x3edea2b0 - 16
> hexadecimal: 3edea2a0 <<<-- The second upid in the list
>
> crash> upid 0x3edea2a0
> struct upid {
> nr = 44, <<<--- Our missing pid=44 (kmcheck)
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x0,
> pprev = 0x3f5fb7f8
> }
> }
>
> crash> eval 0x3edea2a0 - 48
> hexadecimal: 3edea270
>
> crash> pid 3edea270
> struct pid {
> count = {
> counter = 5
> },
> level = 0,
> tasks = {{
> first = 0x3e799908 <<<--- Pointer to our task_struct.pids
> }, {
> first = 0x0
> }, {
> first = 0x0
> }},
> rcu = {
> next = 0x5a5a5a5a5a5a5a5a,
> func = 0x5a5a5a5a5a5a5a5a
> },
> numbers = {{
> nr = 44,
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x0,
> pprev = 0x3f5fb7f8
> }
> }}
> }
>
> crash> task_struct.pids
> struct task_struct {
> [712] struct pid_link pids[3];
> }
>
> crash> eval 0x3e799908 - 712
> hexadecimal: 3e799640
>
> crash> task_struct 3e799640 | grep comm
> comm = "kmcheck\000\000\000\000\000\000\000\000", <<<--- here it is
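
The address arithmetic in the debug log above can be written down as a small C sketch. The three offsets are the ones crash printed for this particular dump (2.6.36, s390x); the helper names are hypothetical and are not part of crash or the kernel. Each helper steps from the address of an embedded member back to the start of the structure that contains it, which is what the "eval ... - N" commands do by hand.

```c
#include <stdint.h>

/* Member offsets as printed by crash for this particular dump
 * (2.6.36, s390x). They differ between kernel versions and configs. */
#define UPID_PID_CHAIN_OFFSET    16u   /* upid.pid_chain   */
#define PID_NUMBERS_OFFSET       48u   /* pid.numbers      */
#define TASK_STRUCT_PIDS_OFFSET 712u   /* task_struct.pids */

/* Hypothetical helper: hlist_node address in a pid_hash[] chain
 * -> address of the struct upid that embeds it. */
static uint64_t upid_from_chain_node(uint64_t node)
{
    return node - UPID_PID_CHAIN_OFFSET;
}

/* Hypothetical helper: struct upid address -> enclosing struct pid.
 * Only valid for numbers[0], as in the log above. */
static uint64_t pid_from_upid(uint64_t upid)
{
    return upid - PID_NUMBERS_OFFSET;
}

/* Hypothetical helper: pid.tasks[0].first -> enclosing task_struct.
 * Only valid for pids[0], as in the log above. */
static uint64_t task_from_pid_link(uint64_t link)
{
    return link - TASK_STRUCT_PIDS_OFFSET;
}
```

With the values from the log, upid_from_chain_node(0x3edea2b0) gives 0x3edea2a0, pid_from_upid(0x3edea2a0) gives 0x3edea270, and task_from_pid_link(0x3e799908) gives 0x3e799640, the task_struct whose comm is "kmcheck".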
> ---
> task.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/task.c
> +++ b/task.c
> @@ -2006,7 +2006,7 @@ do_chained:
> }
>
> if (pid_tasks_0 == 0)
> - continue;
> + goto chain_next;
>
> next = pid_tasks_0 - OFFSET(task_struct_pids);
>
> @@ -2042,7 +2042,7 @@ do_chained:
> }
>
> cnt++;
> -
> +chain_next:
> if (pnext) {
> kpp = pnext;
> upid = pnext - OFFSET(upid_pid_chain);
>
>
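
The effect of the patch above can be illustrated with a simplified standalone model. This is only a sketch: struct chain_entry and both walk functions are hypothetical stand-ins and are not the code in task.c. It assumes, following the description at the top of this message, that the original "continue" abandons the rest of the current hash chain, whereas "goto chain_next" skips only the entry whose tasks[0] is empty and still follows pid_chain.next.

```c
#include <stddef.h>

/* Hypothetical, simplified model of one entry in a pid_hash[] chain. */
struct chain_entry {
    int nr;                          /* upid.nr */
    unsigned long pid_tasks_0;       /* pid.tasks[0].first, 0 if empty */
    const struct chain_entry *next;  /* upid.pid_chain.next */
};

/* Models the pre-patch behaviour: an entry with an empty tasks[0]
 * ends the walk of this chain, so later entries are never seen. */
static size_t walk_chain_before_patch(const struct chain_entry *e,
                                      int *found, size_t max)
{
    size_t cnt = 0;

    while (e && cnt < max) {
        if (e->pid_tasks_0 == 0)
            break;              /* rest of the chain is dropped */
        found[cnt++] = e->nr;
        e = e->next;
    }
    return cnt;
}

/* Models the patched behaviour: an entry with an empty tasks[0] is
 * skipped, but the walk continues with the next chain element. */
static size_t walk_chain_after_patch(const struct chain_entry *e,
                                     int *found, size_t max)
{
    size_t cnt = 0;

    for (; e && cnt < max; e = e->next) {
        if (e->pid_tasks_0 == 0)
            continue;           /* skip this entry only */
        found[cnt++] = e->nr;
    }
    return cnt;
}
```

For the chain in hash slot 2941 (nr 565 with tasks[0] == NULL, followed by nr 44), the first function finds no task and the second finds pid 44.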
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility