On Mon, 10 Aug 2015 10:32:12 -0400 (EDT)
Dave Anderson <anderson(a)redhat.com> wrote:
----- Original Message -----
>
> On Thu, 6 Aug 2015 11:25:29 -0400 (EDT)
> Dave Anderson <anderson(a)redhat.com> wrote:
>
> > Re: your dumpfile where the erroneous "panic" address in a random user
> > task's exception frame register set gets picked up by mistake.
> >
> > Your original patch request modified the "bt" command used for the
> > kernel stack searches in panic_search(). But that piece of code
> > is the last-ditch effort for finding a panic task, which follows
> > this path:
> >
> > get_panic_context()
> > panic_search()
> > get_dumpfile_panic_task()
> > get_kdump_panic_task() (requires kdump "crashing_cpu" symbol)
> > get_diskdump_panic_task() (requires kdump "crashing_cpu" symbol)
>
> On s390 we don't have the "crashing_cpu" symbol in the kernel.
>
> > get_active_set_panic_task() (bt -r raw stack dump of active cpus)
> > ...
> >
> > Only if all of the above fail, does panic_search() initiate the
> > exhaustive walkthrough of all kernel stacks for evidence.
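The fallback order described above can be modeled as a chain of progressively more expensive lookups. This is a hypothetical, self-contained model, not crash's actual code: `struct dump_model`, its fields, and the `chain_*` functions are invented for illustration, tasks are plain addresses rather than crash's task context structures, and NO_TASK is assumed to be 0.

```c
#define NO_TASK 0UL   /* assumption: "no task found" is modeled as 0 */

/* Hypothetical model of the evidence each lookup would consult;
 * none of these fields exist in crash itself. */
struct dump_model {
	int has_crashing_cpu;            /* kernel exports "crashing_cpu" */
	unsigned long crashing_cpu_task; /* task running on that cpu */
	unsigned long active_set_task;   /* found by "bt -r" on active cpus */
	unsigned long stack_hit_task;    /* found by walking every kernel stack */
};

/* Stand-in for get_kdump_panic_task()/get_diskdump_panic_task(). */
static unsigned long chain_crashing_cpu_task(const struct dump_model *d)
{
	return d->has_crashing_cpu ? d->crashing_cpu_task : NO_TASK;
}

/* Stand-in for get_dumpfile_panic_task(): cheap lookups, in order. */
static unsigned long chain_dumpfile_panic_task(const struct dump_model *d)
{
	unsigned long task = chain_crashing_cpu_task(d);

	if (task != NO_TASK)
		return task;
	return d->active_set_task;   /* stand-in for get_active_set_panic_task() */
}

/* Stand-in for panic_search(): the exhaustive stack walk runs only
 * after every cheaper lookup has come back empty. */
static unsigned long chain_panic_search(const struct dump_model *d,
					int *walked_stacks)
{
	unsigned long task = chain_dumpfile_panic_task(d);

	*walked_stacks = 0;
	if (task != NO_TASK)
		return task;
	*walked_stacks = 1;          /* last-ditch search of all stacks */
	return d->stack_hit_task;
}
```

With no "crashing_cpu" symbol and nothing found on the active cpus (the s390 situation), `walked_stacks` comes back as 1, meaning the expensive walk ran.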
> >
> > Since you have gotten that far, I'm wondering whether your
> > target dumpfile with the faulty "panic" address is from an
> > s390x "live dump"? In that case, there can never be any task
> > with any such evidence, making the backtrace search a waste of
> > time to begin with.
>
> The "problem" dump is a s390 stand-alone dump of a hanging system.
> All CPUs have been in "psw_idle" when the dump was generated:
>
> PID: 0 TASK: c50f38 CPU: 0 COMMAND: "swapper/0"
> LOWCORE INFO:
> -psw : 0x0706c00180000000 0x000000000084410e
> -function : psw_idle at 84410e
>
> [snip]
>
> #0 [00c1fe70] arch_cpu_idle at 104d4a
> #1 [00c1fe90] cpu_startup_entry at 180430
> #2 [00c1fee8] start_kernel at d1fb10
> #3 [00c1ff60] _stext at 100020
>
>
> >
> > And if so, I'm thinking that since s390x will have the LIVE_DUMP
> > flag set, if get_dumpfile_panic_task() returns NO_TASK, then
> > panic_search() should just return a NULL to get_panic_context()
> > if it's a live dump, which will just default to the idle task on
> > cpu 0.
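A minimal sketch of the proposed short-circuit: when no panic task is found and LIVE_DUMP is set, the search gives up and the caller falls back to the idle task on cpu 0. All `live_*` names, the LIVE_DUMP bit value, and the stack-walk counter are hypothetical; tasks are modeled as addresses, and the cpu 0 idle task address is borrowed from the swapper/0 output quoted in this thread.

```c
#define NO_TASK   0UL          /* assumption: "no task" modeled as 0 */
#define LIVE_DUMP (1UL << 0)   /* hypothetical bit; crash defines its own value */

/* Hypothetical: idle task of cpu 0, borrowed from the swapper/0 output. */
static const unsigned long idle_task_cpu0 = 0xc50f38UL;

/* Counts how often the expensive exhaustive search actually ran. */
static int stack_walks;

/* Stand-in for the walk over every kernel stack. */
static unsigned long live_search_all_stacks(void)
{
	stack_walks++;
	return NO_TASK;    /* model: no panic evidence found anywhere */
}

/* Sketch of the proposed panic_search(): give up early on live dumps. */
static unsigned long live_panic_search(unsigned long flags2,
				       unsigned long dumpfile_task)
{
	if (dumpfile_task != NO_TASK)  /* get_dumpfile_panic_task() succeeded */
		return dumpfile_task;
	if (flags2 & LIVE_DUMP)
		return NO_TASK;        /* live dump: skip the stack walk */
	return live_search_all_stacks();
}

/* Sketch of get_panic_context(): default to the idle task on cpu 0. */
static unsigned long live_panic_context(unsigned long flags2,
					unsigned long dumpfile_task)
{
	unsigned long task = live_panic_search(flags2, dumpfile_task);

	return task != NO_TASK ? task : idle_task_cpu0;
}
```

Both a live dump and a non-live dump without evidence end up on the cpu 0 idle task; the difference is that only the non-live case pays for the stack walk.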
>
> Although it does not solve the above problem, it makes sense for
> live dumps. What about the following patch?
> ---
> crash: do not search panic tasks for live dumps
>
> Always return "NO_TASK" if the "LIVE_DUMP" flag is set because
> live dumps cannot have a panic task.
>
> Signed-off-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
> ---
> task.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> --- a/task.c
> +++ b/task.c
> @@ -6726,7 +6726,10 @@ get_dumpfile_panic_task(void)
>  {
>  	ulong task;
>  
> -	if (NETDUMP_DUMPFILE()) {
> +	if (pc->flags2 & LIVE_DUMP) {
> +		/* No panic task because system itself created the dump */
> +		return NO_TASK;
> +	} else if (NETDUMP_DUMPFILE()) {
>  		task = pc->flags & REM_NETDUMP ?
>  			tt->panic_task : get_netdump_panic_task();
>  		if (task)
>
That makes sense, but I'm going to move the LIVE_DUMP check farther down
in get_dumpfile_panic_task() to just before the get_active_set() call.
Makes sense. That was also my first idea.
The reason for that is that another type of "LIVE_DUMP" comes from the
snap.so extension module, and in that case, get_kdump_panic_task() finds
and returns the "crash" task that was running the snap command on the
live system.
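A sketch of why the placement matters, assuming a simplified get_dumpfile_panic_task() in which format-specific lookups (netdump, kdump, diskdump) run before the active-set scan. `struct dumpfile_model`, `enum dump_format`, `moved_dumpfile_panic_task()`, and the LIVE_DUMP bit value are hypothetical names for illustration only, not crash's code.

```c
#define NO_TASK   0UL          /* assumption: "no task" modeled as 0 */
#define LIVE_DUMP (1UL << 0)   /* hypothetical bit; crash defines its own value */

/* Hypothetical dump formats; FMT_OTHER means no format-specific lookup. */
enum dump_format { FMT_NETDUMP, FMT_KDUMP, FMT_DISKDUMP, FMT_OTHER };

struct dumpfile_model {
	enum dump_format format;
	unsigned long flags2;
	unsigned long format_task;     /* what the format-specific lookup finds */
	unsigned long active_set_task; /* what the "bt -r" active-set scan finds */
};

/* Sketch of get_dumpfile_panic_task() with the LIVE_DUMP check moved
 * down to just before the active-set scan. */
static unsigned long moved_dumpfile_panic_task(const struct dumpfile_model *d)
{
	/* Format-specific lookups run first, so a snap.so live dump can
	 * still report the "crash" task that ran the snap command. */
	if (d->format != FMT_OTHER && d->format_task != NO_TASK)
		return d->format_task;

	/* Any other live dump has no panic task: skip the scan entirely. */
	if (d->flags2 & LIVE_DUMP)
		return NO_TASK;

	return d->active_set_task;   /* stand-in for get_active_set_panic_task() */
}
```

With the check at the top of the function, as in the patch quoted above, the snap.so case would return NO_TASK instead of the "crash" task.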
Clarify something else for me: are there actually two types of live dumps
that can be taken by an s390x? There is the "zgetdump" facility, but is
there also another type that is taken by the firmware and/or the hypervisor?
With the zgetdump tool, we create live dumps from /dev/mem or /dev/crash.
These dumps get the LIVE_DUMP flag, indicating that the data is not consistent.
Besides this, we have two other non-disruptive live dump features:
- VMDUMP for z/VM guests
- Virsh dump for KVM guests
In contrast to the zgetdump method, here the guest system is stopped
to get consistent snapshots. Therefore I think it is fine *not* to set
the LIVE_DUMP flag.
Besides those live dump mechanisms (and kdump), we have our stand-alone dump
tools for DASD and SCSI. These dump methods are also "Linux independent" and
therefore can produce dumps without panic tasks.
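One way to summarize the methods just listed is a small table that a tool could consult when deciding how hard to search for a panic task. This is a hypothetical sketch: the struct, field names, and helper functions are invented for illustration, and the assumption that stand-alone dumps, like VMDUMP and virsh dump, do not carry LIVE_DUMP is not stated in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the s390 dump methods described above. */
struct s390_dump_method {
	const char *name;
	int sets_live_dump;     /* data may be inconsistent: LIVE_DUMP set */
	int linux_independent;  /* dump is taken from outside Linux */
};

static const struct s390_dump_method s390_methods[] = {
	{ "zgetdump",    1, 0 },  /* from /dev/mem or /dev/crash */
	{ "VMDUMP",      0, 1 },  /* z/VM guest, guest stopped */
	{ "virsh dump",  0, 1 },  /* KVM guest, guest stopped */
	{ "stand-alone", 0, 1 },  /* DASD and SCSI dump tools (assumed no LIVE_DUMP) */
};

/* Look up a method by name; returns NULL if it is not in the table. */
static const struct s390_dump_method *find_method(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(s390_methods) / sizeof(s390_methods[0]); i++)
		if (strcmp(s390_methods[i].name, name) == 0)
			return &s390_methods[i];
	return NULL;
}

/* Live dumps can never hold panic evidence: skip the stack search. */
static int skip_panic_search(const struct s390_dump_method *m)
{
	return m->sets_live_dump;
}

/* Either kind of method may produce a dump with no panic task at all. */
static int may_lack_panic_task(const struct s390_dump_method *m)
{
	return m->sets_live_dump || m->linux_independent;
}
```

The split mirrors the distinction drawn above: only the inconsistent zgetdump dumps justify skipping the search outright, while the Linux-independent methods merely make "no panic task" a possible outcome.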
You can read more on s390 dumps in the documents below:
* http://www.vm.ibm.com/education/lvc/LVC1219.pdf
* http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_dt.h...
Michael