Ah, it must be because of the common user-kernel virtual address space on s390x? I can accept a patch if it's s390-only.

Dave





-------- Original message --------
From: Dave Anderson <anderson@redhat.com>
Date: 08/03/2015 11:18 AM (GMT-05:00)
To: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>, "Discussion list for crash utility usage, maintenance and development" <crash-utility@redhat.com>
Subject: Re: [Crash-utility] [PATCH] crash: Do not use bt -t flag in panic_search()



----- Original Message -----
> Hi Dave,
>
> I got a dump where a process "gmain" was incorrectly marked as running:
>
> crash> ps | grep gmain
> >   217      1   5      8bec23420     IN   0.0  463276  18240  gmain
>
> The reason was that the "brute force" way parsing the "bt -t -o"
> output in panic_search() found the symbol "panic" on the stack:
>
> crash> bt -t -o 8bec23420
> PID: 217    TASK: 8bec23420         CPU: 5   COMMAND: "gmain"
>               START: __schedule at 83f650
>   [       8b662b900] (null) at 0
>   [       8b662b950] (null) at 0
>   [       8b662b978] __schedule at 83f650
>   [       8b662b990] (null) at 0
> ...
>   [       8b662bb18] (null) at 0
>   [       8b662bb40] panic at 83679a  <<<<<--------------
>   [       8b662bb58] _ehead at 280da


I guess the obvious question is why "panic" was on the stack?

>
> The real stack trace was as follows:
>
> crash> bt  8bec23420
> Detaching after fork from child process 15508.
> PID: 217    TASK: 8bec23420         CPU: 5   COMMAND: "gmain"
>  #0 [8b662b8f0] __schedule at 83f650
>  #1 [8b662b958] schedule at 83fade
>  #2 [8b662b970] schedule_hrtimeout_range_clock at 842fc8
>  #3 [8b662ba10] poll_schedule_timeout at 2c6e8a
>  #4 [8b662ba30] do_sys_poll at 2c8604
>  #5 [8b662be40] sys_poll at 2c8852
>  #6 [8b662bea8] system_call at 843a66
>
> IMHO the "-t" method is quite risky (at least on s390). What about using
> the "normal" stack backtrace without the "-t" bt option?

That really worries me -- introducing the usage of normal backtrace on all tasks
instead of simply walking the stack memory looking for text addresses is a huge
change.
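
For readers following the thread, here is a minimal sketch in C of the difference between the two approaches. It is not crash's actual code: the text range, the two-word frame layout (next-frame index, return address), and the names scan_hits() and walk_hits() are all hypothetical and exist only for illustration. It shows how a scan of raw stack memory can report a stale "panic" address that a frame walk never visits. Whether that is what happened in this dump is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel text range; a real tool takes this from the symbol table. */
#define TEXT_START 0x100000UL
#define TEXT_END   0x900000UL

/* Marks the end of the (toy) frame chain. */
#define NO_FRAME ((uint64_t)-1)

static int is_text(uint64_t v)
{
    return v >= TEXT_START && v < TEXT_END;
}

/*
 * Text-address scan: count every stack word that is a text address equal
 * to `target`, whether or not the word belongs to a live frame.
 */
size_t scan_hits(const uint64_t *stack, size_t nwords, uint64_t target)
{
    size_t hits = 0;

    assert(stack != NULL);
    for (size_t i = 0; i < nwords; i++)
        if (is_text(stack[i]) && stack[i] == target)
            hits++;
    return hits;
}

/*
 * Frame walk over a toy layout: each frame is two words,
 * stack[idx] = index of the next frame, stack[idx + 1] = return address.
 * Only return addresses of linked frames are examined.
 */
size_t walk_hits(const uint64_t *stack, size_t nwords, size_t first,
                 uint64_t target)
{
    size_t hits = 0;
    uint64_t idx = first;

    assert(stack != NULL);
    while (idx != NO_FRAME && idx + 1 < nwords) {
        uint64_t next = stack[idx];

        if (stack[idx + 1] == target)
            hits++;
        if (next != NO_FRAME && next <= idx)
            break;              /* chain must move upward; avoid loops */
        idx = next;
    }
    return hits;
}
```

With a toy stack of { 3, 0x83f650, 0x83679a, NO_FRAME, 0x83fade }, the word at index 2 is a leftover value that no frame links to. scan_hits() counts it when asked for 0x83679a, while walk_hits() starting at frame 0 does not.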

Dave


> ---
>  task.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/task.c
> +++ b/task.c
> @@ -6633,7 +6633,7 @@ panic_search(void)
>          fd = &foreach_data;
>          fd->keys = 1;
>          fd->keyword_array[0] = FOREACH_BT;
> -        fd->flags |= (FOREACH_t_FLAG|FOREACH_o_FLAG);
> +        fd->flags |= FOREACH_o_FLAG;
>
>          dietask = lasttask = NO_TASK;
>

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility