Re: [Crash-utility] Throw read error on vmcore produced by ARM soc.

Friday, 29 March 2013

----- Original Message -----
...
 2013/3/28 Dave Anderson <anderson(a)redhat.com>:
 >
 >
 > ----- Original Message -----
 >> 2013/3/27 Dave Anderson <anderson(a)redhat.com>:
 >> >
 >> >
 >> > ----- Original Message -----
 >> >> 2013/3/26 Dave Anderson <anderson(a)redhat.com>:
 >> >> >
 >> >> >
 >> >> > ----- Original Message -----
 >> >> >> Hi, list.
 >> >> >>
 >> >> >> I use crash-utility to analyse a crash dump core from an ARM SoC.
 >> >> >> When I execute the command below, I get the error "crash: read
 >> >> >> error: kernel virtual address: c0c1e040  type: "first vmap_area
 >> >> >> va_start"". I also tested it with gdb, which works fine. The
 >> >> >> Linux kernel version is v3.0.8.
 >> >> >>
 >> >> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux Vmcore
 >> >> >>
 >> >> >> crash 6.1.4
 >> >> >> Copyright (C) 2002-2013  Red Hat, Inc.
 >> >> >> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
 >> >> >> Copyright (C) 1999-2006  Hewlett-Packard Co
 >> >> >> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
 >> >> >> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 >> >> >> Copyright (C) 2005, 2011  NEC Corporation
 >> >> >> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 >> >> >> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical
 >> >> >> Linux,
 >> >> >> Inc.
 >> >> >> This program is free software, covered by the GNU General
 >> >> >> Public License,
 >> >> >> and you are welcome to change it and/or distribute copies of
 >> >> >> it under
 >> >> >> certain conditions.  Enter "help copying" to see the
 >> >> >> conditions.
 >> >> >> This program has absolutely no warranty.  Enter "help
 >> >> >> warranty" for
 >> >> >> details.
 >> >> >>
 >> >> >> GNU gdb (GDB) 7.3.1
 >> >> >> Copyright (C) 2011 Free Software Foundation, Inc.
 >> >> >> License GPLv3+: GNU GPL version 3 or later
 >> >> >> <http://gnu.org/licenses/gpl.html>
 >> >> >> This is free software: you are free to change and
 >> >> >> redistribute it.
 >> >> >> There is NO WARRANTY, to the extent permitted by law.  Type
 >> >> >> "show copying"
 >> >> >> and "show warranty" for details.
 >> >> >> This GDB was configured as "--host=i686-pc-linux-gnu
 >> >> >> --target=arm-elf-linux"...
 >> >> >>
 >> >> >> crash: read error: kernel virtual address: c0c1e040  type:
 >> >> >> "first vmap_area va_start"
 >> >> >>
 >> >> >> Errors like the one above typically occur when the kernel
 >> >> >> and memory source
 >> >> >> do not match.  These are the files being used:
 >> >> >>
 >> >> >>       KERNEL: vmlinux
 >> >> >>     DUMPFILE: Vmcore
 >> >> >
 >> >> > You've answered your own question -- you should always see
 >> >> > errors if the vmlinux kernel does not match the kernel of the
 >> >> > crashed system.
 >> >> >
 >> >> > If you cannot find/access the original vmlinux file that was
 >> >> > being run
 >> >> > by the crashed kernel, then get the /boot/System.map file of
 >> >> > the crashed
 >> >> > kernel, and enter it on the command line:
 >> >> Thanks for your reply.
 >> >>
 >> >> The vmlinux, which includes debug information, and the crash
 >> >> kernel were cross-compiled and produced together. I don't
 >> >> understand why crash throws the warning "kernel and memory
 >> >> source do not match".
 >> >>
 >> >> >
 >> >> >  $ crash vmlinux Vmcore System.map
 >> >> >
 >> >> > The crash utility will replace all of the invalid symbol
 >> >> > values from the
 >> >> > "wrong" vmlinux file with their correct values from the
 >> >> > System.map file.
 >> >>
 >> >>
 >> >> A moment ago I rebuilt the ARM kernel source again, and used the
 >> >> "echo c > /proc/sysrq-trigger" command to trigger a system panic.
 >> >> The status is listed below.
 >> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux0327 Vmcore0327
 >> >>
 >> >> crash 6.1.4
 >> >> Copyright (C) 2002-2013  Red Hat, Inc.
 >> >> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
 >> >> Copyright (C) 1999-2006  Hewlett-Packard Co
 >> >> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
 >> >> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 >> >> Copyright (C) 2005, 2011  NEC Corporation
 >> >> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 >> >> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux,
 >> >> Inc.
 >> >> This program is free software, covered by the GNU General
 >> >> Public License,
 >> >> and you are welcome to change it and/or distribute copies of it
 >> >> under
 >> >> certain conditions.  Enter "help copying" to see the
 >> >> conditions.
 >> >> This program has absolutely no warranty.  Enter "help warranty"
 >> >> for details.
 >> >>
 >> >> GNU gdb (GDB) 7.3.1
 >> >> Copyright (C) 2011 Free Software Foundation, Inc.
 >> >> License GPLv3+: GNU GPL version 3 or later
 >> >> <http://gnu.org/licenses/gpl.html>
 >> >> This is free software: you are free to change and redistribute
 >> >> it.
 >> >> There is NO WARRANTY, to the extent permitted by law.  Type
 >> >> "show copying"
 >> >> and "show warranty" for details.
 >> >> This GDB was configured as "--host=i686-pc-linux-gnu
 >> >> --target=arm-elf-linux"...
 >> >>
 >> >> please wait... (gathering kmem slab cache data)
 >> >> crash: read error: kernel virtual address: c0c91840  type:
 >> >> "kmem_cache buffer"
 >> >>
 >> >> crash: unable to initialize kmem slab cache subsystem
 >> >>
 >> >>
 >> >> WARNING: invalid note (n_type != NT_PRSTATUS)
 >> >>
 >> >> WARNING: could not retrieve crash_notes
 >> >> please wait... (gathering task table data)
 >> >> crash: cannot read pid_hash upid
 >> >>
 >> >> crash: cannot read pid_hash upid
 >> >> please wait... (determining panic task)
 >> >> WARNING: cannot get stackframe for task
 >> >>       KERNEL: vmlinux0327
 >> >>     DUMPFILE: Vmcore0327
 >> >>         CPUS: 1
 >> >>         DATE: Thu Jan  1 08:00:00 1970
 >> >>       UPTIME: 00:00:00
 >> >> LOAD AVERAGE: 0.00, 0.00, 0.00
 >> >>        TASKS: 1
 >> >>     NODENAME: 10.38.50.241
 >> >>      RELEASE: 3.0.8-00010-gb7f16a3-dirty
 >> >>      VERSION: #339 Wed Mar 27 10:39:43 CST 2013
 >> >>      MACHINE: armv7l  (unknown Mhz)
 >> >>       MEMORY: 19 MB
 >> >>        PANIC: ""
 >> >>          PID: 0
 >> >>      COMMAND: "swapper"
 >> >>         TASK: c02e0620  [THREAD_INFO: c02dc000]
 >> >>          CPU: 0
 >> >>        STATE: TASK_RUNNING (ACTIVE)
 >> >>      WARNING: panic task not found
 >> >>
 >> >> crash>
 >> >>
 >> >>
 >> >> This also didn't work well. I then appended System.map, and the
 >> >> output is the same.
 >> >
 >> > OK, so then it's not clear to me why you're seeing those errors.
 >> >
 >> > Was the dumpfile created using kdump?  It almost looks like the
 >> > dump
 >> > was taken while the system was still running?  Have you *ever*
 >> > created
 >> > a dumpfile that resulted in an error-free crash session?
 >>
 >> Yes, the dumpfile is created by kdump. The dump was taken by
 >> "echo c > /proc/sysrq-trigger".
 >>
 >> I will try another case by inserting a panic module tomorrow.
 >> >
 >> > Perhaps the ARM users on this list have seen this kind of thing?
 >> >
 >> > If you enter "crash -d8 ..." on the command line, you may get a
 >> > better
 >> > picture of what leads up to the errors shown above, and of most
 >> > interest, the readmem() calls that generate the errors.  If you
 >> > see a "crash: read error: ...", then that means that the
 >> > dumpfile
 >> > doesn't contain the physical page associated with the virtual
 >> > address shown.  But it's not clear whether the address itself
 >> > is legitimate, i.e., was it gathered from the wrong location.
 >>
 >> Sounds reasonable.
 >>
 >> >
 >> >>
 >> >> I try GDB to test it.
 >> >> hfli@pc1935:~/work/crash-utility$ ./gdb-7.5/gdb/gdb vmlinux0327
 >> >> Vmcore0327
 >> >> GNU gdb (GDB) 7.5
 >> >> Copyright (C) 2012 Free Software Foundation, Inc.
 >> >> License GPLv3+: GNU GPL version 3 or later
 >> >> <http://gnu.org/licenses/gpl.html>
 >> >> This is free software: you are free to change and redistribute
 >> >> it.
 >> >> There is NO WARRANTY, to the extent permitted by law.  Type
 >> >> "show copying"
 >> >> and "show warranty" for details.
 >> >> This GDB was configured as "--host=x86
 >> >> --target=arm-linux-gnueabi".
 >> >> For bug reporting instructions, please see:
 >> >> <http://www.gnu.org/software/gdb/bugs/>...
 >> >> Reading symbols from
 >> >> /home/hfli/work/crash-utility/vmlinux0327...done.
 >> >>
 >> >> warning: exec file is newer than core file.
 >> >
 >> > Again, this bothers me -- why is it "newer" than the core file?
 >> > Are you sure that they are *exactly* the same?
 >>
 >> I am sure they are *exactly* the same. :-)
 >>
 >> I'm not clear on the internals of how the exec file and the core
 >> file are compared.
 >
 > gdb is warning that it appears that you must have compiled the
 > vmlinux0327
 > after the Vmcore0327 dumpfile was created?  Perhaps it's because
 > you copied
 > the two files to the host system where you're running gdb from in
 > the
 > "wrong" order.
 >
 > What I was trying to confirm is that when you rebuilt the vmlinux
 > file
 > with debuginfo data, that you also *installed* that rebuilt kernel
 > onto
 > the target system prior to crashing it.
 >
 >>
 >> >
 >> >> [New LWP 278]
 >> >> #0  0xc0155f7c in sysrq_handle_crash (key=99) at
 >> >> drivers/tty/sysrq.c:134
 >> >> 134             *killer = 1;
 >> >> (gdb) list
 >> >> 129     {
 >> >> 130             char *killer = NULL;
 >> >> 131
 >> >> 132             panic_on_oops = 1;      /* force panic */
 >> >> 133             wmb();
 >> >> 134             *killer = 1;
 >> >> 135     }
 >> >> 136     static struct sysrq_key_op sysrq_crash_op = {
 >> >> 137             .handler        = sysrq_handle_crash,
 >> >> 138             .help_msg       = "Crash",
 >> >> (gdb)
 >> >>
 >> >> gdb also works fine.
 >> >>
 >> >
 >> > It works fine for gdb in the very limited case above.  The crash
 >> > utility
 >> > is also "working fine" for a much more expansive access of the
 >> > dumpfile.
 >> > But if you tried to access the same locations in the dumpfile
 >> > that the
 >> > crash utility is doing during its initialization, then gdb would
 >> > also
 >> > fail.
 >> >
 >> > Let's take a simple example -- in your first email, you saw this
 >> > error:
 >> >
 >> >  crash: read error: kernel virtual address: c0c1e040  type:
 >> >  "first vmap_area va_start"
 >> >
 >> > which came from here:
 >> >
 >> >         if (vt->flags & USE_VMAP_AREA) {
 >> >                 get_symbol_data("vmap_area_list", sizeof(void *),
 >> >                     &vmap_area);
 >> >                 if (!vmap_area)
 >> >                         return 0;
 >> >                 if (!readmem(vmap_area - OFFSET(vmap_area_list) +
 >> >                     OFFSET(vmap_area_va_start), KVADDR,
 >> >                     &vmalloc_start,
 >> >                     sizeof(void *), "first vmap_area va_start",
 >> >                     RETURN_ON_ERROR))
 >> >                         non_matching_kernel();
 >> >
 >> > If I look at a sample ARM dumpfile I have, I see this:
 >> >
 >> >  crash> p vmap_area_list
 >> >  vmap_area_list = $8 = {
 >> >    next = 0xc30d4d78,
 >> >    prev = 0xc06702b8
 >> >  }
 >> >
 >> > where the "next" pointer of 0xc30d4d78 above points to the
 >> > "list" member
 >> > of a vmap_area structure:
 >> >
 >> >  crash> struct vmap_area
 >> >  struct vmap_area {
 >> >      long unsigned int va_start;
 >> >      long unsigned int va_end;
 >> >      long unsigned int flags;
 >> >      struct rb_node rb_node;
 >> >      struct list_head list;         <== "next" points here
 >> >      struct list_head purge_list;
 >> >      void *private;
 >> >      struct rcu_head rcu_head;
 >> >  }
 >> >  SIZE: 52
 >> >  crash>
 >> >
 >> > And I can dump that vmap_area structure like this:
 >> >
 >> >  crash> struct -x vmap_area -l vmap_area.list 0xc30d4d78
 >> >  struct vmap_area {
 >> >    va_start = 0xbf000000,
 >> >    va_end = 0xbf005000,
 >> >    flags = 0x4,
 >> >    rb_node = {
 >> >      rb_parent_color = 0xc2ca076d,
 >> >      rb_right = 0x0,
 >> >      rb_left = 0x0
 >> >    },
 >> >    list = {
 >> >      next = 0xc2ca0778,
 >> >      prev = 0xc0411ed4
 >> >    },
 >> >    purge_list = {
 >> >      next = 0x0,
 >> >      prev = 0x0
 >> >    },
 >> >    private = 0xc3396860,
 >> >    rcu_head = {
 >> >      next = 0x0,
 >> >      func = 0
 >> >    }
 >> >  }
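
The address arithmetic behind the "-l vmap_area.list" option above can be
sketched in Python. The member sizes are assumptions derived from the struct
listing for a 32-bit ARM kernel (4-byte longs and pointers, a 12-byte
rb_node); they add up to the reported SIZE of 52. The helper names are
hypothetical and are not part of crash.

```python
# Assumed 32-bit ARM layout of struct vmap_area, taken from the listing
# above: three 4-byte longs, a 12-byte rb_node, two 8-byte list_heads,
# a 4-byte pointer and an 8-byte rcu_head.
MEMBER_SIZES = [
    ("va_start", 4), ("va_end", 4), ("flags", 4), ("rb_node", 12),
    ("list", 8), ("purge_list", 8), ("private", 4), ("rcu_head", 8),
]

def member_offsets(members):
    """Return ({name: offset}, total_size) for a list of (name, size)."""
    offsets, pos = {}, 0
    for name, size in members:
        offsets[name] = pos
        pos += size
    return offsets, pos

OFFSETS, STRUCT_SIZE = member_offsets(MEMBER_SIZES)

def vmap_area_base(list_ptr):
    """Hypothetical helper: turn a pointer to vmap_area.list into the
    address of the enclosing structure (the container_of() idea)."""
    return list_ptr - OFFSETS["list"]

base = vmap_area_base(0xc30d4d78)            # "next" from vmap_area_list
va_start_addr = base + OFFSETS["va_start"]   # address of the va_start member
print(hex(base), hex(va_start_addr), STRUCT_SIZE)
```

This mirrors the expression in the crash source quoted earlier,
"vmap_area - OFFSET(vmap_area_list) + OFFSET(vmap_area_va_start)": the
address handed to readmem() is the list pointer minus the offset of the
"list" member, not the list pointer itself.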
 >> >
 >> > But your kernel found a "vmap_area_list.next" pointer of
 >> > c0c1e040,
 >> > but it was not accessible from the dumpfile.
 >> >
 >> > So either:
 >> >
 >> >  (1) the "vmap_area_list" symbol value was not correct, or
 >> >  (2) the page containing the first vmap_area structure was
 >> >      not included in the dumpfile.
 >> >
 >> > Problem (1) can happen if your crashed kernel doesn't match the
 >> > vmlinux file, i.e., the symbol values don't match.  But if the
 >> > "vmap_area_list" symbol was correct, then (2) must have
 >> > occurred,
 >> > and that should never happen unless the dumpfile was corrupted
 >> > or
 >> > was created incorrectly.
 >> >
 >>
 >> Agree.
 >>
 >> Thanks for your patience again.
 >>
 >> In my case, the crashkernel parameter on the kernel command line is
 >> crashkernel=20M@10M. When the capture kernel launches with
 >> elfcorehdr=0x1d00000, the initialization of /proc/vmcore fails,
 >> with WARN_ON(pfn_valid(pfn)) triggering.
 >>
 >> The call path is
 >>
 >> vmcore_init->parse_crash_elf_headers->read_from_oldmem->
 >> copy_oldmem_page->ioremap->__arm_ioremap->arch_ioremap_caller->
 >> __arm_ioremap_caller->__arm_ioremap_pfn_caller->WARN_ON(pfn_valid(pfn)).
 >>
 >> My temporary solution is to comment out the WARN_ON() to make
 >> /proc/vmcore work.
 >>
 >> Could commenting it out corrupt the vmcore?
 >
 > Does the crash session come up cleanly?
 >
 > I don't know about the arm_ioremap issue -- that's for the ARM guys
 > to answer.
 >
 > I'm not familiar with the specifics on how the kernel's vmcore
 > creation works,
 > but do you see differences in the contents of the PT_LOAD segments
 > after applying
 > your temporary solution?  In other words, if you do this with an
 > old vmcore
 > vs. a new vmcore:
 >
 > $ readelf -a vmcore
 > ELF Header:
 >   Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
 >   Class:                             ELF32
 >   Data:                              2's complement, little endian
 >   Version:                           1 (current)
 >   OS/ABI:                            UNIX - System V
 >   ABI Version:                       0
 >   Type:                              CORE (Core file)
 >   Machine:                           ARM
 >   Version:                           0x1
 >   Entry point address:               0x0
 >   Start of program headers:          52 (bytes into file)
 >   Start of section headers:          0 (bytes into file)
 >   Flags:                             0x0
 >   Size of this header:               52 (bytes)
 >   Size of program headers:           32 (bytes)
 >   Number of program headers:         3
 >   Size of section headers:           0 (bytes)
 >   Number of section headers:         0
 >   Section header string table index: 0
 >
 > There are no sections in this file.
 >
 > There are no sections to group in this file.
 >
 > Program Headers:
 >   Type           Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
 >   NOTE           0x000094  0x00000000 0x00000000 0x00514   0x00514       0
 >   LOAD           0x0005a8  0xc0000000 0xc0000000 0x2000000 0x2000000 RWE 0
 >   LOAD           0x20005a8 0xc2800000 0xc2800000 0x1800000 0x1800000 RWE 0
 >
 > There is no dynamic section in this file.
 >
 > There are no relocations in this file.
 >
 > No version information found in this file.
 >
 > Notes at offset 0x00000094 with length 0x00000514:
 >   Owner                 Data size       Description
 >   CORE                 0x00000094       NT_PRSTATUS (prstatus structure)
 >   VMCOREINFO           0x00000452       Unknown note type: (0x00000000)
 > $
 >
 > Are the LOAD sections different?
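
The comparison of LOAD segments asked for above can also be scripted. The
following is a minimal Python sketch, assuming a little-endian ELF32 vmcore
like the ones shown in this thread (52-byte ELF header, 32-byte program
headers). The function names are hypothetical, and readelf remains the
authoritative tool.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIIIIIHHHHHH")   # Elf32_Ehdr, 52 bytes
PHDR = struct.Struct("<IIIIIIII")           # Elf32_Phdr, 32 bytes

def load_segments(data):
    """Return [(vaddr, paddr, filesz)] for each PT_LOAD program header
    in a little-endian ELF32 image held in `data` (bytes)."""
    fields = EHDR.unpack_from(data, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    # ident[4] is the ELF class (1 = 32-bit), ident[5] the byte order (1 = LSB)
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    segments = []
    for i in range(phnum):
        p_type, _offset, vaddr, paddr, filesz, _memsz, _flags, _align = \
            PHDR.unpack_from(data, phoff + i * phentsize)
        if p_type == PT_LOAD:
            segments.append((vaddr, paddr, filesz))
    return segments

def read_header(path, nbytes=4096):
    """Read only the first bytes of a (possibly very large) vmcore."""
    with open(path, "rb") as f:
        return f.read(nbytes)

# Usage with two hypothetical file names:
#   load_segments(read_header("vmcore.old")) == load_segments(read_header("vmcore.new"))
```

Only the start of each file is read, since the program headers sit directly
after the ELF header in these dumps ("Start of program headers: 52").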

 hfli@msh-pc1935:~/work/crash-utility$ readelf -a Vmcore308
 ELF Header:
   Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
   Class:                             ELF32
   Data:                              2's complement, little endian
   Version:                           1 (current)
   OS/ABI:                            UNIX - System V
   ABI Version:                       0
   Type:                              CORE (Core file)
   Machine:                           ARM
   Version:                           0x1
   Entry point address:               0x0
   Start of program headers:          52 (bytes into file)
   Start of section headers:          0 (bytes into file)
   Flags:                             0x0
   Size of this header:               52 (bytes)
   Size of program headers:           32 (bytes)
   Number of program headers:         3
   Size of section headers:           0 (bytes)
   Number of section headers:         0
   Section header string table index: 0

 There are no sections in this file.

 There are no sections to group in this file.

 Program Headers:
   Type           Offset   VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
   NOTE           0x000094 0x00000000 0x00000000 0x000a8   0x000a8       0
   LOAD           0x00013c 0xc0000000 0x00000000 0xa00000  0xa00000  RWE 0
   LOAD           0xa0013c 0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0

 There is no dynamic section in this file.

 There are no relocations in this file.

 No version information found in this file.

 Notes at offset 0x00000094 with length 0x000000a8:
   Owner                 Data size       Description
   CORE                 0x00000094       NT_PRSTATUS (prstatus structure)

 ---
 I notice that the Notes section has no VMCOREINFO note.

 The following are my steps for using kdump and the crash utility.

 1. Build the Linux kernel source.
 2. Put arch/arm/boot/uImage on the tftp server;
     put arch/arm/boot/uImage on the NFS server (the kernel mounts its
     rootfs over NFS).
 3. Boot uImage with "crashkernel=20M@10M".
 4. Load the uImage of the capture kernel.
     $./sbin/kexec -p --atags --append="console=ttyAM0,38400n8
 root=/dev/nfs rw nfsroot=10.38.50.248:/nfs/nfs ip=10.38.50.241
 loglevel=15 rdinit=/rdinit" /uImagetahoe308
 5. Insert the panic module to trigger a panic.
    $insmod module.ko
 6. The capture kernel boots up. (While booting, the capture kernel
 initializes /proc/vmcore. If the initialization of vmcore fails,
 /proc/vmcore won't exist.)
 7. Use the cp tool to dump the vmcore.
   $cp /proc/vmcore /Vmcore308
 8. Copy vmlinux & Vmcore308 to the crash working directory and use the
 crash utility to analyse Vmcore308.

 hfli@pc1935:~/work/crash-utility$ ./crash vmlinux308 Vmcore308

 crash 6.1.4
 Copyright (C) 2002-2013  Red Hat, Inc.
 Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
 Copyright (C) 1999-2006  Hewlett-Packard Co
 Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
 Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 Copyright (C) 2005, 2011  NEC Corporation
 Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
 This program is free software, covered by the GNU General Public
 License,
 and you are welcome to change it and/or distribute copies of it under
 certain conditions.  Enter "help copying" to see the conditions.
 This program has absolutely no warranty.  Enter "help warranty" for
 details.

 GNU gdb (GDB) 7.3.1
 Copyright (C) 2011 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later
 <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type "show
 copying"
 and "show warranty" for details.
 This GDB was configured as "--host=i686-pc-linux-gnu
 --target=arm-elf-linux"...

 crash: read error: kernel virtual address: c0c1e040  type: "first vmap_area
 va_start"

 Errors like the one above typically occur when the kernel and memory
 source
 do not match.  These are the files being used:

       KERNEL: vmlinux308
     DUMPFILE: Vmcore308

 --
 Unfortunately, crash again hits a read error and concludes that the
 kernel and memory source don't match.

 The vmcore initialization looks fine, and copying the dump file
 from /proc/vmcore also works fine.

 I can't tell whether, or why, the vmcore is corrupt.

I don't know either, but in the case above, kernel virtual address c0c1e040
doesn't fit in the virtual address ranges declared in the vmcore header:

...
 Program Headers:
   Type           Offset   VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
   NOTE           0x000094 0x00000000 0x00000000 0x000a8   0x000a8       0
   LOAD           0x00013c 0xc0000000 0x00000000 0xa00000  0xa00000  RWE 0
   LOAD           0xa0013c 0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0

If you go through the exercise I showed a few messages back, i.e., look at the
kernel's vmap_area_list list_head structure by entering "p vmap_area_list", you
should see its "next" pointer containing the c0c1e040 address.  But the vmcore
shows a hole between c0a00000 and c1e00000.
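
The hole can be checked mechanically. The Python sketch below hard-codes the
two PT_LOAD entries from the Vmcore308 header and the "crashkernel=20M@10M"
argument quoted earlier in the thread. The helper names are hypothetical, and
parse_crashkernel() only handles the "M" suffix used here.

```python
# PT_LOAD segments copied from the Vmcore308 program headers:
# (virtual address, physical address, size in bytes).
LOAD_SEGMENTS = [
    (0xc0000000, 0x00000000, 0x00a00000),
    (0xc1e00000, 0x01e00000, 0x06200000),
]

def find_segment(vaddr, segments):
    """Return the segment containing vaddr, or None if it falls in a hole."""
    for virt, phys, size in segments:
        if virt <= vaddr < virt + size:
            return (virt, phys, size)
    return None

def parse_crashkernel(arg):
    """Parse 'SIZE@OFFSET' with M suffixes, e.g. '20M@10M', into a
    (start, end) physical address range."""
    size, offset = (int(part.rstrip("M")) << 20 for part in arg.split("@"))
    return offset, offset + size

failing = 0xc0c1e040
print(find_segment(failing, LOAD_SEGMENTS))   # None: no PT_LOAD covers it
start, end = parse_crashkernel("20M@10M")
print(hex(start), hex(end))                   # 0xa00000 0x1e00000
```

Both segments map virtual to physical with the same 0xc0000000 offset, so the
virtual hole c0a00000-c1e00000 corresponds to physical 0xa00000-0x1e00000,
which is the same range as the crashkernel=20M@10M reservation. Whether that
is the reason the page is missing is not established in this thread.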

Dave

...

 Thanks.
 >
 > Anyway, if the crash session comes up cleanly when you apply your
 > temporary
 > solution, then clearly you've identified the problem at hand.
 >
 > Dave
 >
 >
 > --
 > Crash-utility mailing list
 > Crash-utility(a)redhat.com
 > https://www.redhat.com/mailman/listinfo/crash-utility

