Re: [Crash-utility] Throw read error on vmcore produced by ARM soc.

Thursday, 28 March 2013

2013/3/27 Dave Anderson <anderson(a)redhat.com>:
...

 ----- Original Message -----
> 2013/3/26 Dave Anderson <anderson(a)redhat.com>:
> >
> >
> > ----- Original Message -----
> >> Hi, list.
> >>
> >> I use crash-utility to analyse a crash dump core from an ARM SoC. When I
> >> execute the command below, I get the error "crash: read error: kernel
> >> virtual address: c0c1e040  type: "first vmap_area va_start"". I also
> >> tested it with gdb, and it works fine. The Linux kernel version is
> >> v3.0.8.
> >>
> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux Vmcore
> >>
> >> crash 6.1.4
> >> Copyright (C) 2002-2013  Red Hat, Inc.
> >> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> >> Copyright (C) 1999-2006  Hewlett-Packard Co
> >> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> >> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> >> Copyright (C) 2005, 2011  NEC Corporation
> >> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> >> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> >> This program is free software, covered by the GNU General Public License,
> >> and you are welcome to change it and/or distribute copies of it under
> >> certain conditions.  Enter "help copying" to see the conditions.
> >> This program has absolutely no warranty.  Enter "help warranty" for
> >> details.
> >>
> >> GNU gdb (GDB) 7.3.1
> >> Copyright (C) 2011 Free Software Foundation, Inc.
> >> License GPLv3+: GNU GPL version 3 or later
> >> <http://gnu.org/licenses/gpl.html>
> >> This is free software: you are free to change and redistribute it.
> >> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> >> and "show warranty" for details.
> >> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
> >>
> >> crash: read error: kernel virtual address: c0c1e040  type: "first vmap_area va_start"
> >>
> >> Errors like the one above typically occur when the kernel and memory source
> >> do not match.  These are the files being used:
> >>
> >>       KERNEL: vmlinux
> >>     DUMPFILE: Vmcore
> >
> > You've answered your own question -- you should always see errors if the
> > vmlinux kernel does not match the kernel of the crashed system.
> >
> > If you cannot find/access the original vmlinux file that was being run
> > by the crashed kernel, then get the /boot/System.map file of the crashed
> > kernel, and enter it on the command line:
> Thanks for your reply.
>
> The vmlinux (which includes debug information) and the crash kernel were
> cross-compiled and produced together. I can't understand why crash
> throws the warning that the kernel and memory source do not match.
>
> >
> >  $ crash vmlinux Vmcore System.map
> >
> > The crash utility will replace all of the invalid symbol values from the
> > "wrong" vmlinux file with their correct values from the System.map file.
>
>
> A moment ago I rebuilt the ARM kernel source again and used the
> "echo c > /proc/sysrq-trigger" command to trigger a system panic.
> The result is listed below.
> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux0327 Vmcore0327
>
> crash 6.1.4
> Copyright (C) 2002-2013  Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> Copyright (C) 1999-2006  Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011  NEC Corporation
> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions.  Enter "help copying" to see the conditions.
> This program has absolutely no warranty.  Enter "help warranty" for
> details.
>
> GNU gdb (GDB) 7.3.1
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
>
> please wait... (gathering kmem slab cache data)
> crash: read error: kernel virtual address: c0c91840  type: "kmem_cache buffer"
>
> crash: unable to initialize kmem slab cache subsystem
>
>
> WARNING: invalid note (n_type != NT_PRSTATUS)
>
> WARNING: could not retrieve crash_notes
> please wait... (gathering task table data)
> crash: cannot read pid_hash upid
>
> crash: cannot read pid_hash upid
> please wait... (determining panic task)
> WARNING: cannot get stackframe for task
>       KERNEL: vmlinux0327
>     DUMPFILE: Vmcore0327
>         CPUS: 1
>         DATE: Thu Jan  1 08:00:00 1970
>       UPTIME: 00:00:00
> LOAD AVERAGE: 0.00, 0.00, 0.00
>        TASKS: 1
>     NODENAME: 10.38.50.241
>      RELEASE: 3.0.8-00010-gb7f16a3-dirty
>      VERSION: #339 Wed Mar 27 10:39:43 CST 2013
>      MACHINE: armv7l  (unknown Mhz)
>       MEMORY: 19 MB
>        PANIC: ""
>          PID: 0
>      COMMAND: "swapper"
>         TASK: c02e0620  [THREAD_INFO: c02dc000]
>          CPU: 0
>        STATE: TASK_RUNNING (ACTIVE)
>      WARNING: panic task not found
>
> crash>
>
>
> That did not work well either. I then appended the System.map, and the
> output was the same.

 OK, so then it's not clear to me why you're seeing those errors.

 Was the dumpfile created using kdump?  It almost looks like the dump
 was taken while the system was still running?  Have you *ever* created
 a dumpfile that resulted in an error-free crash session?

Yes, the dumpfile was created by kdump. The dump was triggered with
"echo c > /proc/sysrq-trigger".

I will try another case by inserting a panic module tomorrow.
...

 Perhaps the ARM users on this list have seen this kind of thing?

 If you enter "crash -d8 ..." on the command line, you may get a better
 picture of what leads up to the errors shown above, and of most
 interest, the readmem() calls that generate the errors.  If you
 see a "crash: read error: ...", then that means that the dumpfile
 doesn't contain the physical page associated with the virtual
 address shown.  But it's not clear whether the address itself
 is legitimate, i.e., was it gathered from the wrong location. 
Sounds reasonable.

...

>
> I tried testing it with GDB.
> hfli@pc1935:~/work/crash-utility$ ./gdb-7.5/gdb/gdb vmlinux0327 Vmcore0327
> GNU gdb (GDB) 7.5
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "--host=x86 --target=arm-linux-gnueabi".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/hfli/work/crash-utility/vmlinux0327...done.
>
> warning: exec file is newer than core file.

 Again, this bothers me -- why is it "newer" than the core file?
 Are you sure that they are *exactly* the same? 
I am sure they are *exactly* the same. :-)

I'm not clear on how gdb internally compares the exec file and the core file.

...

> [New LWP 278]
> #0  0xc0155f7c in sysrq_handle_crash (key=99) at drivers/tty/sysrq.c:134
> 134             *killer = 1;
> (gdb) list
> 129     {
> 130             char *killer = NULL;
> 131
> 132             panic_on_oops = 1;      /* force panic */
> 133             wmb();
> 134             *killer = 1;
> 135     }
> 136     static struct sysrq_key_op sysrq_crash_op = {
> 137             .handler        = sysrq_handle_crash,
> 138             .help_msg       = "Crash",
> (gdb)
>
> gdb also works fine.
>

 It works fine for gdb in the very limited case above.  The crash utility
 is also "working fine" for a much more expansive access of the dumpfile.
 But if you tried to access the same locations in the dumpfile that the
 crash utility is doing during its initialization, then gdb would also
 fail.

 Let's take a simple example -- in your first email, you saw this error:

  crash: read error: kernel virtual address: c0c1e040  type: "first vmap_area va_start"

 which came from here:

         if (vt->flags & USE_VMAP_AREA) {
                 get_symbol_data("vmap_area_list", sizeof(void *), &vmap_area);
                 if (!vmap_area)
                         return 0;
                 if (!readmem(vmap_area - OFFSET(vmap_area_list) +
                     OFFSET(vmap_area_va_start), KVADDR, &vmalloc_start,
                     sizeof(void *), "first vmap_area va_start", RETURN_ON_ERROR))
                         non_matching_kernel();

 If I look at a sample ARM dumpfile I have, I see this:

  crash> p vmap_area_list
  vmap_area_list = $8 = {
    next = 0xc30d4d78,
    prev = 0xc06702b8
  }

 where the "next" pointer of 0xc30d4d78 above points to the "list" member
 of a vmap_area structure:

  crash> struct vmap_area
  struct vmap_area {
      long unsigned int va_start;
      long unsigned int va_end;
      long unsigned int flags;
      struct rb_node rb_node;
      struct list_head list;         <== "next" points here
      struct list_head purge_list;
      void *private;
      struct rcu_head rcu_head;
  }
  SIZE: 52
  crash>

 And I can dump that vmap_area structure like this:

  crash> struct -x vmap_area -l vmap_area.list 0xc30d4d78
  struct vmap_area {
    va_start = 0xbf000000,
    va_end = 0xbf005000,
    flags = 0x4,
    rb_node = {
      rb_parent_color = 0xc2ca076d,
      rb_right = 0x0,
      rb_left = 0x0
    },
    list = {
      next = 0xc2ca0778,
      prev = 0xc0411ed4
    },
    purge_list = {
      next = 0x0,
      prev = 0x0
    },
    private = 0xc3396860,
    rcu_head = {
      next = 0x0,
      func = 0
    }
  }

 But your kernel found a "vmap_area_list.next" pointer of c0c1e040,
 but it was not accessible from the dumpfile.

 So either:

  (1) the "vmap_area_list" symbol value was not correct, or
  (2) the page containing the first vmap_area structure was
      not included in the dumpfile.

 Problem (1) can happen if your crashed kernel doesn't match the
 vmlinux file, i.e., the symbol values don't match.  But if the
 "vmap_area_list" symbol was correct, then (2) must have occurred,
 and that should never happen unless the dumpfile was corrupted or
 was created incorrectly.

Agree.

Thanks for your patience again.

In my case, the crashkernel parameter on the kernel command line is
crashkernel=20M@10M. When the capture kernel launches with
elfcorehdr=0x1d00000, the initialization of /proc/vmcore fails
because WARN_ON(pfn_valid(pfn)) fires.

The call chain is
vmcore_init->parse_crash_elf_headers->read_from_oldmem->copy_oldmem_page->ioremap->__arm_ioremap->arch_ioremap_caller->__arm_ioremap_caller->__arm_ioremap_pfn_caller->WARN_ON(pfn_valid(pfn)).

My temporary workaround is to comment out the WARN_ON() so that
/proc/vmcore works.

Could commenting it out corrupt the vmcore?

Thanks.

...
 Dave

 --
 Crash-utility mailing list
 Crash-utility(a)redhat.com
 https://www.redhat.com/mailman/listinfo/crash-utility 
