----- Original Message -----
> >
> > With the zgetdump tool we create live dumps from /dev/mem or /dev/crash.
> > These dumps get the LIVE_DUMP flag indicating that data is not
> > consistent.
> >
> > Besides this, we have two other non-disruptive live dump features:
> >
> > - VMDUMP for z/VM guests
> > - Virsh dump for KVM guests
> >
> > In contrast to the zgetdump method, here the guest system is stopped
> > to get consistent snapshots. Therefore I think it is fine to *not* set
> > the LIVE_DUMP flag.
> >
> > Besides those live dump mechanisms (and kdump) we have our stand-alone dump
> > tools for DASD and SCSI. These dump methods are also "Linux independent" and
> > therefore can produce dumps without panic tasks.
> >
> > You can read more on s390 dump in the documents below:
> >
> > * http://www.vm.ibm.com/education/lvc/LVC1219.pdf
> > * http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_dt.h...
> >
> > Michael
>
> OK, so from what I understand, there still can be s390x dumpfiles which have no
> indication of the panic task or cpu (if there is one) in their headers, and
> therefore may try the "bt -r" type search of the active tasks via
> raw_stack_dump() in get_active_set_panic_task(), and if that fails, fall back
> to the "bt -t" search of all tasks in panic_search().
>
> In those cases, I suppose you could:
>
> (1) restrict the raw_stack_dump() parameters in get_active_set_panic_task()
>     to exclude the user register dump at the top of the stack, and
> (2) plug in a MACHDEP_BT_TEXT handler for the s390x instead of using the
>     generic version, and in that case, could prevent the search from entering
>     the user-space register dump at the top of the stack, or
> (2a) replace "bt -t" with just "bt" in panic_search() for s390x as you did in
>      the original patch.
>
> But (1) and (2) are not fool-proof, because even the kernel-only part of the
> stack could simply contain "numbers" that by dumb luck fall into the zero-based
> virtual address range of panic, crash_kexec, etc., and return a false positive.
> So I don't know how that can be made absolutely reliable.
I still would prefer 2a. See patch below.
OK, that's fine with me.
>
> But at least with dumpfiles that have the live dump magic number (and I'm still
> not clear which of the 4 types do so),
Only the zgetdump live dump gets the live dump magic number.
OK, thanks for the clarification -- I'll update the changelog to indicate that.
Queued for crash-7.1.3:
https://github.com/crash-utility/crash/commit/3c2fc5f2a027fe192327101cdc6...
Thanks,
Dave
> the simple LIVE_DUMP-check patch covers
> them. I'm not sure whether it's worth doing anything beyond that.
---
crash: Do not use bt -t flag in panic_search()
On s390 we got a dump where a process "gmain" was incorrectly marked as
the running panic task:
crash> ps | grep gmain
> 217 1 5 8bec23420 IN 0.0 463276 18240 gmain
The reason was that the "brute force" parsing of the "bt -t -o"
output in panic_search() found the symbol "panic" on the stack:
crash> bt -t -o 8bec23420
PID: 217 TASK: 8bec23420 CPU: 5 COMMAND: "gmain"
START: __schedule at 83f650
[ 8b662b900] (null) at 0
[ 8b662b978] __schedule at 83f650
...
[ 8b662bb18] (null) at 0
[ 8b662bb40] panic at 83679a <<<<<--------------
The real stack trace was as follows:
crash> bt 8bec23420
Detaching after fork from child process 15508.
PID: 217 TASK: 8bec23420 CPU: 5 COMMAND: "gmain"
#0 [8b662b8f0] __schedule at 83f650
#1 [8b662b958] schedule at 83fade
#2 [8b662b970] schedule_hrtimeout_range_clock at 842fc8
#3 [8b662ba10] poll_schedule_timeout at 2c6e8a
#4 [8b662ba30] do_sys_poll at 2c8604
#5 [8b662be40] sys_poll at 2c8852
#6 [8b662bea8] system_call at 843a66
The value 0x83679a (panic at 83679a) was a local variable on the stack
and was interpreted incorrectly as a function call to "panic".
For s390 in particular there are dump methods, e.g. VMDUMP or stand-alone
dump, where the "bt -t -o" method will be used to find the panic task.
Because of that, and because the "-t" method is quite risky, we use the
"normal" stack backtrace without the "-t" bt option for s390.
Signed-off-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
---
task.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/task.c
+++ b/task.c
@@ -6633,7 +6633,11 @@ panic_search(void)
fd = &foreach_data;
fd->keys = 1;
fd->keyword_array[0] = FOREACH_BT;
+#ifdef S390X
+ fd->flags |= FOREACH_o_FLAG;
+#else
fd->flags |= (FOREACH_t_FLAG|FOREACH_o_FLAG);
+#endif
dietask = lasttask = NO_TASK;