----- Original Message -----
 > On Fri, Nov 20, 2015 at 03:18:55PM -0500, Dave Anderson wrote:
 > > 
 > > 
 > > ----- Original Message -----
 > > > 
 > > > 
 > > > ----- Original Message -----
 > > > > QEMU can generate both non-makedumpfile (just elf) and makedumpfile
 > > > > formatted kdumps. In neither case will crash_notes have prstatus, as
 > > > > crash_kexec doesn't run in the kernel; however, the elf notes will
 > > > > contain the prstatus, and we can dig them out of there.
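For illustration only, digging a per-cpu NT_PRSTATUS note out of the ELF notes could look roughly like the C sketch below. It is hypothetical and is not crash's netdump_get_prstatus_percpu(): find_prstatus_note() and NOTE_ALIGN4 are invented names. It assumes the caller already holds the raw PT_NOTE segment in a 4-byte-aligned buffer, and that the Nth NT_PRSTATUS note belongs to cpu N. Each note is an Elf32_Nhdr followed by a name and a descriptor, each padded to 4 bytes, so the walk advances by sizeof(Elf32_Nhdr) + roundup(n_namesz, 4) + roundup(n_descsz, 4).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* ELF note name and descriptor fields are padded to 4-byte boundaries. */
#define NOTE_ALIGN4(x) (((size_t)(x) + 3) & ~(size_t)3)

/*
 * Hypothetical helper: return the cpu'th NT_PRSTATUS note in buf, which holds
 * the raw contents of an ELF32 PT_NOTE segment (4-byte aligned), or NULL if
 * there is no such note. Assumes the notes appear in cpu order.
 */
static const Elf32_Nhdr *find_prstatus_note(const unsigned char *buf, size_t len, int cpu)
{
	size_t off = 0;
	int seen = 0;

	assert(buf != NULL || len == 0);
	while (off + sizeof(Elf32_Nhdr) <= len) {
		const Elf32_Nhdr *note = (const Elf32_Nhdr *)(buf + off);
		size_t total = sizeof(Elf32_Nhdr) + NOTE_ALIGN4(note->n_namesz) +
			       NOTE_ALIGN4(note->n_descsz);

		if (total > len - off)
			break;	/* truncated note: stop rather than read past the end */
		/* seen only advances for NT_PRSTATUS notes (short-circuit &&) */
		if (note->n_type == NT_PRSTATUS && seen++ == cpu)
			return note;
		off += total;
	}
	return NULL;	/* no note for this cpu: the caller must handle NULL */
}
```

Returning NULL for a missing cpu, rather than a bogus pointer, leaves it to the caller to decide whether to skip that cpu or give up.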
 > > > 
 > > > I don't have a lot of ARM and ARM64 dumpfiles, but just doing a
 > > > quick sanity test of your patch, I came across this ARM dumpfile,
 > > > which I believe may be a QEMU-generated ELF vmcore.  I'm not sure,
 > > > but it only has 1 NT_PRSTATUS note for the 1 online cpu (of 5 cpus).
 > 
 > If it's more than a day old then it won't be from qemu. I just posted
 > the patches for that yesterday morning :-)
 
 Even for 32-bit ARM?

Yup. The qemu patches I posted are for both arm and aarch64. Before those
patches it wasn't possible to generate a dump with qemu for either.

 I honestly can't remember where/who it came from, although it's in my
 notes somewhere.  Because the non-panicking cpus are offline, it's
 possible that it's a kdump-generated dumpfile, and the other cpus' notes
 were not captured.  I don't know, but regardless, it turns out to be a
 damn good test case!
 
 > 
 > > > 
 > > > But anyway, note that as expected, it cannot find the registers in the
 > > > kernel's uninitialized crash_notes -- here without your patch:
 > 
 > So it looks like there may be other dump types (besides qemu generated)
 > that can result in missing crash_notes. I don't know what those are, other
 > than just corruption? In any case, I think this dump is still a good test
 > case for the reason you found below.
 > 
 > > > 
 > > >   $ crash vmcore.pae  vmlinux.pae.gz
 > > > 
 > > >   crash 7.1.4rc15
 > > >   Copyright (C) 2002-2014  Red Hat, Inc.
 > > >   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
 > > >   Copyright (C) 1999-2006  Hewlett-Packard Co
 > > >   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
 > > >   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 > > >   Copyright (C) 2005, 2011  NEC Corporation
 > > >   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 > > >   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
 > > >   This program is free software, covered by the GNU General Public License,
 > > >   and you are welcome to change it and/or distribute copies of it under
 > > >   certain conditions.  Enter "help copying" to see the conditions.
 > > >   This program has absolutely no warranty.  Enter "help warranty" for details.
 > > >  
 > > >   GNU gdb (GDB) 7.6
 > > >   Copyright (C) 2013 Free Software Foundation, Inc.
 > > >   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 > > >   This is free software: you are free to change and redistribute it.
 > > >   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 > > >   and "show warranty" for details.
 > > >   This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=arm-elf-linux"...
 > > > 
 > > >   WARNING: invalid note (n_type != NT_PRSTATUS)
 > > >   WARNING: cannot retrieve registers for active tasks
 > > > 
 > > >         KERNEL: vmlinux.pae.gz
 > > >       DUMPFILE: vmcore.pae
 > > >           CPUS: 5 [OFFLINE: 4]
 > > >           DATE: Sun Jun  8 18:27:39 2014
 > > >         UPTIME: 00:03:22
 > > >   LOAD AVERAGE: 0.16, 0.16, 0.07
 > > >          TASKS: 51
 > > >       NODENAME: buildroot
 > > >        RELEASE: 3.13.5
 > > >        VERSION: #3 SMP Mon Jun 9 05:58:39 CST 2014
 > > >        MACHINE: armv7l  (unknown Mhz)
 > > >         MEMORY: 256 MB
 > > >          PANIC: "SysRq : Trigger a crash"
 > > >            PID: 732
 > > >        COMMAND: "sh"
 > > >           TASK: 8bcead00  [THREAD_INFO: 8ad32000]
 > > >            CPU: 0
 > > >          STATE: TASK_RUNNING (SYSRQ)
 > > > 
 > > >   crash> bt -a
 > > >   PID: 732    TASK: 8bcead00  CPU: 0   COMMAND: "sh"
 > > >   bt: WARNING: cannot determine starting stack frame for task 8bcead00
 > > > 
 > > >   PID: 0      TASK: 8bc561c0  CPU: 1   COMMAND: "swapper/1"
 > > >   bt: WARNING: cannot determine starting stack frame for task 8bc561c0
 > > > 
 > > >   PID: 0      TASK: 8bc56580  CPU: 2   COMMAND: "swapper/2"
 > > >   bt: WARNING: cannot determine starting stack frame for task 8bc56580
 > > > 
 > > >   PID: 0      TASK: 8bc56940  CPU: 3   COMMAND: "swapper/3"
 > > >   bt: WARNING: cannot determine starting stack frame for task 8bc56940
 > > > 
 > > >   PID: 0      TASK: 8bc56d00  CPU: 4   COMMAND: "swapper/4"
 > > >   bt: WARNING: cannot determine starting stack frame for task 8bc56d00
 > > >   crash>
 > > > 
 > > > 
 > > > With your patch applied, it generates a SIGSEGV in arm_get_crash_notes():
 > > > 
 > > >   $ ./crash /usr/dumps/ARM/vmcore.pae  /usr/dumps/ARM/vmlinux.pae.gz
 > > > 
 > > >   crash 7.1.4rc15
 > > >   Copyright (C) 2002-2014  Red Hat, Inc.
 > > >   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
 > > >   Copyright (C) 1999-2006  Hewlett-Packard Co
 > > >   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
 > > >   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 > > >   Copyright (C) 2005, 2011  NEC Corporation
 > > >   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 > > >   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
 > > >   This program is free software, covered by the GNU General Public License,
 > > >   and you are welcome to change it and/or distribute copies of it under
 > > >   certain conditions.  Enter "help copying" to see the conditions.
 > > >   This program has absolutely no warranty.  Enter "help warranty" for details.
 > > >  
 > > >   GNU gdb (GDB) 7.6
 > > >   Copyright (C) 2013 Free Software Foundation, Inc.
 > > >   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 > > >   This is free software: you are free to change and redistribute it.
 > > >   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 > > >   and "show warranty" for details.
 > > >   This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=arm-elf-linux"...
 > > > 
 > > >   Segmentation fault (core dumped)
 > > >   $
 > > > 
 > > > I haven't debugged it other than determining that it looks to have
 > > > found the single note OK.  On the next pass through the loop, the
 > > > "note" pointer is valid at line 597, your function then sets it back
 > > > to NULL, and therefore it craps out at line 622:
 > > > 
 > > >     597                 note = (Elf32_Nhdr *)buf;
 > > >     598                 p = buf + sizeof(Elf32_Nhdr);
 > > >     599
 > > >     600                 /*
 > > >     601                  * dumpfiles created with qemu won't have crash_notes, but there will
 > > >     602                  * be elf notes.
 > > >     603                  */
 > > >     604                 if (note->n_namesz == 0 && (DISKDUMP_DUMPFILE() || KDUMP_DUMPFILE())) {
 > > >     605                         if (DISKDUMP_DUMPFILE())
 > > >     606                                 note = diskdump_get_prstatus_percpu(i);
 > > >     607                         else if (KDUMP_DUMPFILE())
 > > >     608                                 note = netdump_get_prstatus_percpu(i);
 > > >     609                         if (note) {
 > > >     610                                 /*
 > > >     611                                  * SIZE(note_buf) accounts for a "final note", which is a
 > > >     612                                  * trailing empty elf note header.
 > > >     613                                  */
 > > >     614                                 long notesz = SIZE(note_buf) - sizeof(Elf32_Nhdr);
 > > >     615
 > > >     616                                 if (sizeof(Elf32_Nhdr) + roundup(note->n_namesz, 4) +
 > > >     617                                     note->n_descsz == notesz)
 > > >     618                                         BCOPY((char *)note, buf, notesz);
 > > >     619                         }
 > > >     620                 }
 > > >     621
 > > >     622                 if (note->n_type != NT_PRSTATUS) {
 > > >     623                         error(WARNING, "invalid note (n_type != NT_PRSTATUS)\n");
 > > >     624                         goto fail;
 > > >     625                 }
 > > > 
 > > > Not sure how you want to handle that, probably just bail out the same way
 > > > if note becomes NULL?
 > > 
 > > If I add this to arm_get_crash_notes(), just after your new function:
 > > 
 > >       if (!note) {
 > >               error(WARNING, "cannot find NT_PRSTATUS note for cpu: %d\n", i);
 > >               continue;
 > >       }
 > > 
 > > I get this:
 > >   
 > >   $ ./crash /usr/dumps/ARM/vmcore.pae*  /usr/dumps/ARM/vmlinux.pae.gz
 > >   
 > >   crash 7.1.4rc15
 > >   Copyright (C) 2002-2014  Red Hat, Inc.
 > >   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
 > >   Copyright (C) 1999-2006  Hewlett-Packard Co
 > >   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
 > >   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 > >   Copyright (C) 2005, 2011  NEC Corporation
 > >   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 > >   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
 > >   This program is free software, covered by the GNU General Public License,
 > >   and you are welcome to change it and/or distribute copies of it under
 > >   certain conditions.  Enter "help copying" to see the conditions.
 > >   This program has absolutely no warranty.  Enter "help warranty" for details.
 > >    
 > >   GNU gdb (GDB) 7.6
 > >   Copyright (C) 2013 Free Software Foundation, Inc.
 > >   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 > >   This is free software: you are free to change and redistribute it.
 > >   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 > >   and "show warranty" for details.
 > >   This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=arm-elf-linux"...
 > >   
 > >   WARNING: cannot find NT_PRSTATUS note for cpu: 1
 > >   WARNING: cannot find NT_PRSTATUS note for cpu: 2
 > >   WARNING: cannot find NT_PRSTATUS note for cpu: 3
 > >   WARNING: cannot find NT_PRSTATUS note for cpu: 4
 > >         KERNEL: /usr/dumps/ARM/vmlinux.pae.gz
 > >       DUMPFILE: /usr/dumps/ARM/vmcore.pae
 > >           CPUS: 5 [OFFLINE: 4]
 > >           DATE: Sun Jun  8 18:27:39 2014
 > >         UPTIME: 00:03:22
 > >   LOAD AVERAGE: 0.16, 0.16, 0.07
 > >          TASKS: 51
 > >       NODENAME: buildroot
 > >        RELEASE: 3.13.5
 > >        VERSION: #3 SMP Mon Jun 9 05:58:39 CST 2014
 > >        MACHINE: armv7l  (unknown Mhz)
 > >         MEMORY: 256 MB
 > >          PANIC: "SysRq : Trigger a crash"
 > >            PID: 732
 > >        COMMAND: "sh"
 > >           TASK: 8bcead00  [THREAD_INFO: 8ad32000]
 > >            CPU: 0
 > >          STATE: TASK_RUNNING (SYSRQ)
 > >   
 > >   crash> bt -a
 > >   PID: 732    TASK: 8bcead00  CPU: 0   COMMAND: "sh"
 > >    #0 [<80265064>] (sysrq_handle_crash) from [<80265810>]
 > >    #1 [<80265810>] (__handle_sysrq) from [<80265928>]
 > >    #2 [<80265928>] (write_sysrq_trigger) from [<80112120>]
 > >    #3 [<80112120>] (proc_reg_write) from [<800c9840>]
 > >    #4 [<800c9840>] (vfs_write) from [<800c9be4>]
 > >    #5 [<800c9be4>] (sys_write) from [<8000e3e0>]
 > >       pc : [<76e9cfdc>]    lr : [<0000f998>]    psr: 600d0010
 > >       sp : 7eab862c  ip : 00000000  fp : 000a82a4
 > >       r10: 00000020  r9 : 000a8294  r8 : 00000001
 > >       r7 : 00000004  r6 : 000a9bf0  r5 : 00000001  r4 : 000a7d88
 > >       r3 : 00000000  r2 : 00000002  r1 : 000a9bf0  r0 : 00000001
 > >       Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM
 > >   
 > >   PID: 0      TASK: 8bc561c0  CPU: 1   COMMAND: "swapper/1"
 > >   bt: WARNING: cannot determine starting stack frame for task 8bc561c0
 > >   
 > >   PID: 0      TASK: 8bc56580  CPU: 2   COMMAND: "swapper/2"
 > >   bt: WARNING: cannot determine starting stack frame for task 8bc56580
 > >   
 > >   PID: 0      TASK: 8bc56940  CPU: 3   COMMAND: "swapper/3"
 > >   bt: WARNING: cannot determine starting stack frame for task 8bc56940
 > >   
 > >   PID: 0      TASK: 8bc56d00  CPU: 4   COMMAND: "swapper/4"
 > >   bt: WARNING: cannot determine starting stack frame for task 8bc56d00
 > >   crash>
 > >   
 > > Note that if I did "goto fail" instead of "continue", I would lose the
 > > good cpu 0 backtrace from the NT_PRSTATUS note that your patch found, so
 > > doing it this way is the best of both worlds.
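A hypothetical, self-contained sketch of why "continue" is the better choice here. It is not crash's arm_get_crash_notes(): struct cpu_regs and collect_cpu_regs() are invented names, and a NULL entry in notes[] stands in for a cpu with no NT_PRSTATUS note. A missing note only costs that one cpu, while the cpus that do have notes keep their registers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical placeholder for the per-cpu register set carried in NT_PRSTATUS. */
struct cpu_regs {
	unsigned long pc;
	unsigned long sp;
};

/*
 * notes[i] points at cpu i's registers, or is NULL when the dump has no
 * NT_PRSTATUS note for that cpu. A miss is reported and skipped with
 * "continue", so it does not discard the cpus that were found.
 * Returns the number of cpus copied into out[]; skipped entries are untouched.
 */
static int collect_cpu_regs(const struct cpu_regs *const notes[],
			    struct cpu_regs *out, int ncpus)
{
	int found = 0;

	assert(ncpus >= 0);
	for (int i = 0; i < ncpus; i++) {
		if (!notes[i]) {
			fprintf(stderr, "WARNING: cannot find NT_PRSTATUS note for cpu: %d\n", i);
			continue;
		}
		out[i] = *notes[i];
		found++;
	}
	return found;
}
```

By contrast, jumping to a common failure path that throws out[] away on the first miss is what would cost the good cpu 0 registers.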
 > 
 > I agree that the 'if (!note) continue' that you added is a good idea to
 > try and salvage this type of dump. It shouldn't happen with qemu generated
 > dumps, but anything's possible when a kernel panics...
 > 
 > Would you like me to spin a v4 with this condition added? Or, since it
 > actually seems to be addressing a non-qemu-generated dump issue, then
 > maybe you just want to submit it as a new patch on top of the qemu
 > patch?
 
 No, I'm cleaning it up now -- there are a couple of other gotchas with
 respect to backtracing, since we're now going to keep the
 ms->panic_task_regs[cpus] array intact, even if there's no note for some
 of the cpus.  I'm going to make sure it's zero-filled in the not-found
 case, and I have to make sure that the zero-filled case is recognized on
 both architectures.  I'll post the updated additions to arm.c and arm64.c
 so that you can do a quick test on a legitimate QEMU dump.
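One way a zero-filled per-cpu array could be set up and recognized, as a hypothetical sketch only: struct saved_regs, alloc_panic_regs() and cpu_regs_present() are invented names rather than crash's ms->panic_task_regs code. The sketch assumes a real saved register set never has both pc and sp equal to zero. calloc() returns zero-filled memory, so any cpu whose note is never found stays all-zero, and the backtrace side only has to test for that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical placeholder for the per-cpu saved register set. */
struct saved_regs {
	unsigned long pc;
	unsigned long sp;
	unsigned long lr;
};

/* calloc() zero-fills, so cpus whose note is never found stay all-zero. */
static struct saved_regs *alloc_panic_regs(int ncpus)
{
	assert(ncpus > 0);
	return calloc((size_t)ncpus, sizeof(struct saved_regs));
}

/*
 * Assumption: a real register set never has both pc and sp equal to zero,
 * so an entry with both zero means "no NT_PRSTATUS note for this cpu".
 */
static bool cpu_regs_present(const struct saved_regs *regs, int cpu)
{
	assert(cpu >= 0);
	return regs[cpu].pc != 0 || regs[cpu].sp != 0;
}
```

A backtrace routine could call cpu_regs_present() first and fall back to the "cannot determine starting stack frame" warning when it returns false.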
 
 Might be Monday, I'm not sure...