On Fri, Nov 20, 2015 at 03:18:55PM -0500, Dave Anderson wrote:
>
>
> ----- Original Message -----
> >
> >
> > ----- Original Message -----
> > > QEMU can generate both non-makedumpfile (just elf) and makedumpfile
> > > formatted kdumps. In neither case will crash_notes have prstatus, as
> > > crash_kexec doesn't run in the kernel, however the elf notes will
> > > contain the prstatus, and we can dig them out of there.
> >
> > I don't have a lot of ARM and ARM64 dumpfiles, but just doing a
> > quick sanity test of your patch, I came across this ARM dumpfile,
> > which I believe may be a QEMU-generated ELF vmcore. I'm not sure,
> > but it only has 1 NT_PRSTATUS note for the 1 online cpu (of 5 cpus).
If it's more than a day old then it won't be from qemu. I just posted
the patches for that yesterday morning :-)

Even for 32-bit ARM? I honestly can't remember where/who it came from,
although it's in my notes somewhere. Because the non-panicking cpus
are offline, it's possible that it's a kdump-generated dumpfile, and
the other cpus' notes were not captured. I don't know, but regardless,
it turns out to be a damn good test case!
> >
> > But anyway, note that as expected, it cannot find the registers in the
> > kernel's uninitialized crash_notes -- here without your patch:
So it looks like there may be other dump types (besides qemu generated)
that can result in missing crash_notes. I don't know what those are, other
than just corruption? In any case, I think this dump is still a good test
case for the reason you found below.
> >
> > $ crash vmcore.pae vmlinux.pae.gz
> >
> > crash 7.1.4rc15
> > Copyright (C) 2002-2014 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> >
> > GNU gdb (GDB) 7.6
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "--host=x86_64-unknown-linux-gnu
> > --target=arm-elf-linux"...
> >
> > WARNING: invalid note (n_type != NT_PRSTATUS)
> > WARNING: cannot retrieve registers for active tasks
> >
> > KERNEL: vmlinux.pae.gz
> > DUMPFILE: vmcore.pae
> > CPUS: 5 [OFFLINE: 4]
> > DATE: Sun Jun 8 18:27:39 2014
> > UPTIME: 00:03:22
> > LOAD AVERAGE: 0.16, 0.16, 0.07
> > TASKS: 51
> > NODENAME: buildroot
> > RELEASE: 3.13.5
> > VERSION: #3 SMP Mon Jun 9 05:58:39 CST 2014
> > MACHINE: armv7l (unknown Mhz)
> > MEMORY: 256 MB
> > PANIC: "SysRq : Trigger a crash"
> > PID: 732
> > COMMAND: "sh"
> > TASK: 8bcead00 [THREAD_INFO: 8ad32000]
> > CPU: 0
> > STATE: TASK_RUNNING (SYSRQ)
> >
> > crash> bt -a
> > PID: 732 TASK: 8bcead00 CPU: 0 COMMAND: "sh"
> > bt: WARNING: cannot determine starting stack frame for task 8bcead00
> >
> > PID: 0 TASK: 8bc561c0 CPU: 1 COMMAND: "swapper/1"
> > bt: WARNING: cannot determine starting stack frame for task 8bc561c0
> >
> > PID: 0 TASK: 8bc56580 CPU: 2 COMMAND: "swapper/2"
> > bt: WARNING: cannot determine starting stack frame for task 8bc56580
> >
> > PID: 0 TASK: 8bc56940 CPU: 3 COMMAND: "swapper/3"
> > bt: WARNING: cannot determine starting stack frame for task 8bc56940
> >
> > PID: 0 TASK: 8bc56d00 CPU: 4 COMMAND: "swapper/4"
> > bt: WARNING: cannot determine starting stack frame for task 8bc56d00
> > crash>
> >
> >
> > With your patch applied, it generates a SIGSEGV in arm_get_crash_notes():
> >
> > $ ./crash /usr/dumps/ARM/vmcore.pae /usr/dumps/ARM/vmlinux.pae.gz
> >
> > crash 7.1.4rc15
> > Copyright (C) 2002-2014 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> >
> > GNU gdb (GDB) 7.6
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "--host=x86_64-unknown-linux-gnu
> > --target=arm-elf-linux"...
> >
> > Segmentation fault (core dumped)
> > $
> >
> > I haven't debugged it other than determining that it looks to have
> > found the single note OK, but then upon continuation the next time
> > through the loop, the "note" pointer is valid at line 597, but your
> > function sets it back to NULL, and therefore it craps out at line 622:
> >
> >  597          note = (Elf32_Nhdr *)buf;
> >  598          p = buf + sizeof(Elf32_Nhdr);
> >  599
> >  600          /*
> >  601           * dumpfiles created with qemu won't have crash_notes, but there will
> >  602           * be elf notes.
> >  603           */
> >  604          if (note->n_namesz == 0 && (DISKDUMP_DUMPFILE() || KDUMP_DUMPFILE())) {
> >  605                  if (DISKDUMP_DUMPFILE())
> >  606                          note = diskdump_get_prstatus_percpu(i);
> >  607                  else if (KDUMP_DUMPFILE())
> >  608                          note = netdump_get_prstatus_percpu(i);
> >  609                  if (note) {
> >  610                          /*
> >  611                           * SIZE(note_buf) accounts for a "final note", which is a
> >  612                           * trailing empty elf note header.
> >  613                           */
> >  614                          long notesz = SIZE(note_buf) - sizeof(Elf32_Nhdr);
> >  615
> >  616                          if (sizeof(Elf32_Nhdr) + roundup(note->n_namesz, 4) +
> >  617                              note->n_descsz == notesz)
> >  618                                  BCOPY((char *)note, buf, notesz);
> >  619                  }
> >  620          }
> >  621
> >  622          if (note->n_type != NT_PRSTATUS) {
> >  623                  error(WARNING, "invalid note (n_type != NT_PRSTATUS)\n");
> >  624                  goto fail;
> >  625          }
> >
> > Not sure how you want to handle that, probably just bail out the same way
> > if note becomes NULL?
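
For reference, the size arithmetic at lines 616-617 and the NULL case
can be sketched in isolation. This is only an illustration, not code
from crash: struct nhdr32 is a hypothetical stand-in with the same three
fields as Elf32_Nhdr, and roundup4(), note_size() and note_usable() are
made-up helper names. Like the quoted check, it pads the name to 4 bytes
but takes the descriptor size as-is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same three fields as Elf32_Nhdr. */
struct nhdr32 {
    uint32_t n_namesz;  /* length of the note name, including its NUL */
    uint32_t n_descsz;  /* length of the descriptor (the payload) */
    uint32_t n_type;    /* note type */
};

/* NT_PRSTATUS has the value 1 in <elf.h>; renamed here to avoid clashes. */
#define SKETCH_NT_PRSTATUS 1u

/* Round x up to the next multiple of 4. */
static size_t roundup4(size_t x)
{
    return (x + 3u) & ~(size_t)3u;
}

/* Header + name padded to 4 bytes + descriptor, as in the quoted check. */
static size_t note_size(const struct nhdr32 *n)
{
    return sizeof(*n) + roundup4(n->n_namesz) + n->n_descsz;
}

/*
 * Returns 1 only when a note was found, has the expected size, and is
 * an NT_PRSTATUS note. The NULL test comes first, so a failed lookup
 * is never dereferenced.
 */
static int note_usable(const struct nhdr32 *n, size_t expected)
{
    if (n == NULL)
        return 0;
    if (note_size(n) != expected)
        return 0;
    return n->n_type == SKETCH_NT_PRSTATUS;
}
```

A caller would test note_usable() before touching n_type, which is the
ordering that avoids the fault at line 622.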
>
> If I add this to arm_get_crash_notes(), just after your new function:
>
>                 if (!note) {
>                         error(WARNING, "cannot find NT_PRSTATUS note for cpu: %d\n", i);
>                         continue;
>                 }
>
> I get this:
>
> $ ./crash /usr/dumps/ARM/vmcore.pae* /usr/dumps/ARM/vmlinux.pae.gz
>
> crash 7.1.4rc15
> Copyright (C) 2002-2014 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for
> details.
>
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "--host=x86_64-unknown-linux-gnu
> --target=arm-elf-linux"...
>
> WARNING: cannot find NT_PRSTATUS note for cpu: 1
> WARNING: cannot find NT_PRSTATUS note for cpu: 2
> WARNING: cannot find NT_PRSTATUS note for cpu: 3
> WARNING: cannot find NT_PRSTATUS note for cpu: 4
> KERNEL: /usr/dumps/ARM/vmlinux.pae.gz
> DUMPFILE: /usr/dumps/ARM/vmcore.pae
> CPUS: 5 [OFFLINE: 4]
> DATE: Sun Jun 8 18:27:39 2014
> UPTIME: 00:03:22
> LOAD AVERAGE: 0.16, 0.16, 0.07
> TASKS: 51
> NODENAME: buildroot
> RELEASE: 3.13.5
> VERSION: #3 SMP Mon Jun 9 05:58:39 CST 2014
> MACHINE: armv7l (unknown Mhz)
> MEMORY: 256 MB
> PANIC: "SysRq : Trigger a crash"
> PID: 732
> COMMAND: "sh"
> TASK: 8bcead00 [THREAD_INFO: 8ad32000]
> CPU: 0
> STATE: TASK_RUNNING (SYSRQ)
>
> crash> bt -a
> PID: 732 TASK: 8bcead00 CPU: 0 COMMAND: "sh"
> #0 [<80265064>] (sysrq_handle_crash) from [<80265810>]
> #1 [<80265810>] (__handle_sysrq) from [<80265928>]
> #2 [<80265928>] (write_sysrq_trigger) from [<80112120>]
> #3 [<80112120>] (proc_reg_write) from [<800c9840>]
> #4 [<800c9840>] (vfs_write) from [<800c9be4>]
> #5 [<800c9be4>] (sys_write) from [<8000e3e0>]
> pc : [<76e9cfdc>] lr : [<0000f998>] psr: 600d0010
> sp : 7eab862c ip : 00000000 fp : 000a82a4
> r10: 00000020 r9 : 000a8294 r8 : 00000001
> r7 : 00000004 r6 : 000a9bf0 r5 : 00000001 r4 : 000a7d88
> r3 : 00000000 r2 : 00000002 r1 : 000a9bf0 r0 : 00000001
> Flags: nZCv IRQs on FIQs on Mode USER_32 ISA ARM
>
> PID: 0 TASK: 8bc561c0 CPU: 1 COMMAND: "swapper/1"
> bt: WARNING: cannot determine starting stack frame for task 8bc561c0
>
> PID: 0 TASK: 8bc56580 CPU: 2 COMMAND: "swapper/2"
> bt: WARNING: cannot determine starting stack frame for task 8bc56580
>
> PID: 0 TASK: 8bc56940 CPU: 3 COMMAND: "swapper/3"
> bt: WARNING: cannot determine starting stack frame for task 8bc56940
>
> PID: 0 TASK: 8bc56d00 CPU: 4 COMMAND: "swapper/4"
> bt: WARNING: cannot determine starting stack frame for task 8bc56d00
> crash>
>
> Note that if I did "goto fail" instead of "continue", I would lose the
> good cpu 0 backtrace from the NT_PRSTATUS that your patch found, so doing
> it this way is the best of both worlds.
I agree that the 'if (!note) continue' that you added is a good idea to
try and salvage this type of dump. It shouldn't happen with qemu-generated
dumps, but anything's possible when a kernel panics...

Would you like me to spin a v4 with this condition added? Or, since it
actually seems to be addressing a non-qemu-generated dump issue,
maybe you just want to submit it as a new patch on top of the qemu
patch?

No, I'm cleaning it up now -- there are a couple of other gotchas with
respect to backtracing, since we're now going to keep the
ms->panic_task_regs[cpus] array intact, even if there's no note in some
of them. I'm going to make sure it's zero-filled in the not-found case,
and I have to make sure that it's recognized on both architectures. I'll
post the updated additions to arm.c and arm64.c so that you can do a
quick test on a legitimate QEMU dump.

Might be Monday, I'm not sure...
Thanks,
Dave
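
The skip-and-zero-fill handling discussed above can be sketched roughly
as follows. This is a standalone illustration under stated assumptions,
not the actual arm.c/arm64.c change: struct cpu_regs, note_lookup_fn,
collect_regs() and only_cpu0() are all hypothetical names, and the
register values are placeholders.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_CPUS 5

/* Hypothetical register snapshot; the real layout is per-architecture. */
struct cpu_regs {
    unsigned long pc;
    unsigned long sp;
};

/* Hypothetical lookup: returns NULL when a cpu has no NT_PRSTATUS note. */
typedef const struct cpu_regs *(*note_lookup_fn)(int cpu);

/*
 * Fill regs[] for every cpu that has a note and return how many were
 * found. The array is zeroed first, so a cpu without a note keeps an
 * all-zero slot that later code can recognize as "not found".
 */
static int collect_regs(struct cpu_regs regs[], int ncpus,
                        note_lookup_fn lookup)
{
    int found = 0;

    memset(regs, 0, sizeof(regs[0]) * (size_t)ncpus);

    for (int cpu = 0; cpu < ncpus; cpu++) {
        const struct cpu_regs *r = lookup(cpu);

        if (r == NULL)
            continue;   /* skip this cpu, but keep the others */

        regs[cpu] = *r;
        found++;
    }
    return found;
}

/* Example lookup: only cpu 0 has a note (placeholder register values). */
static const struct cpu_regs *only_cpu0(int cpu)
{
    static const struct cpu_regs cpu0 = { 0x1000UL, 0x2000UL };

    return cpu == 0 ? &cpu0 : NULL;
}
```

Calling collect_regs(regs, SKETCH_NR_CPUS, only_cpu0) keeps the cpu 0
registers and leaves slots 1 through 4 zeroed, whereas bailing out of
the whole loop on the first missing note would discard the usable entry
as well.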