2013/3/28 Dave Anderson <anderson(a)redhat.com>:
>
>
> ----- Original Message -----
>> 2013/3/27 Dave Anderson <anderson(a)redhat.com>:
>> >
>> >
>> > ----- Original Message -----
>> >> 2013/3/26 Dave Anderson <anderson(a)redhat.com>:
>> >> >
>> >> >
>> >> > ----- Original Message -----
>> >> >> Hi, list.
>> >> >>
>> >> >> I use the crash utility to analyse a crash dump core from an ARM SoC.
>> >> >> When I execute the command below, I get the error "crash: read error:
>> >> >> kernel virtual address: c0c1e040 type: "first vmap_area va_start"".
>> >> >> I also tested it with gdb, which works fine. The Linux kernel version
>> >> >> is v3.0.8.
>> >> >>
>> >> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux Vmcore
>> >> >>
>> >> >> crash 6.1.4
>> >> >> Copyright (C) 2002-2013 Red Hat, Inc.
>> >> >> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> >> >> Copyright (C) 1999-2006 Hewlett-Packard Co
>> >> >> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> >> >> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> >> >> Copyright (C) 2005, 2011 NEC Corporation
>> >> >> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> >> >> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical
>> >> >> Linux,
>> >> >> Inc.
>> >> >> This program is free software, covered by the GNU General
>> >> >> Public License,
>> >> >> and you are welcome to change it and/or distribute copies of
>> >> >> it under
>> >> >> certain conditions. Enter "help copying" to see the conditions.
>> >> >> This program has absolutely no warranty. Enter "help
>> >> >> warranty" for
>> >> >> details.
>> >> >>
>> >> >> GNU gdb (GDB) 7.3.1
>> >> >> Copyright (C) 2011 Free Software Foundation, Inc.
>> >> >> License GPLv3+: GNU GPL version 3 or later
>> >> >> <http://gnu.org/licenses/gpl.html>
>> >> >> This is free software: you are free to change and
>> >> >> redistribute it.
>> >> >> There is NO WARRANTY, to the extent permitted by law. Type
>> >> >> "show copying"
>> >> >> and "show warranty" for details.
>> >> >> This GDB was configured as "--host=i686-pc-linux-gnu
>> >> >> --target=arm-elf-linux"...
>> >> >>
>> >> >> crash: read error: kernel virtual address: c0c1e040 type:
>> >> >> "first vmap_area va_start"
>> >> >>
>> >> >> Errors like the one above typically occur when the kernel
>> >> >> and memory source
>> >> >> do not match. These are the files being used:
>> >> >>
>> >> >> KERNEL: vmlinux
>> >> >> DUMPFILE: Vmcore
>> >> >
>> >> > You've answered your own question -- you should always see
>> >> > errors if the vmlinux kernel does not match the kernel of the
>> >> > crashed system.
>> >> >
>> >> > If you cannot find/access the original vmlinux file that was
>> >> > being run
>> >> > by the crashed kernel, then get the /boot/System.map file of
>> >> > the crashed
>> >> > kernel, and enter it on the command line:
>> >> Thanks for your reply.
>> >>
>> >> The vmlinux, which includes debug information, and the crashed
>> >> kernel were cross-compiled and produced together in the same build.
>> >> I can't understand why crash throws the warning that the kernel and
>> >> memory source do not match.
>> >>
>> >> >
>> >> > $ crash vmlinux Vmcore System.map
>> >> >
>> >> > The crash utility will replace all of the invalid symbol
>> >> > values from the
>> >> > "wrong" vmlinux file with their correct values from the
>> >> > System.map file.
>> >>
>> >>
>> >> A moment ago I rebuilt the ARM kernel source again and used the
>> >> "echo c > /proc/sysrq-trigger" command to trigger a system panic.
>> >> The resulting status is listed below.
>> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux0327
>> >> Vmcore0327
>> >>
>> >> crash 6.1.4
>> >> Copyright (C) 2002-2013 Red Hat, Inc.
>> >> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> >> Copyright (C) 1999-2006 Hewlett-Packard Co
>> >> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> >> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> >> Copyright (C) 2005, 2011 NEC Corporation
>> >> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> >> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux,
>> >> Inc.
>> >> This program is free software, covered by the GNU General
>> >> Public License,
>> >> and you are welcome to change it and/or distribute copies of it
>> >> under
>> >> certain conditions. Enter "help copying" to see the conditions.
>> >> This program has absolutely no warranty. Enter "help warranty"
>> >> for details.
>> >>
>> >> GNU gdb (GDB) 7.3.1
>> >> Copyright (C) 2011 Free Software Foundation, Inc.
>> >> License GPLv3+: GNU GPL version 3 or later
>> >> <http://gnu.org/licenses/gpl.html>
>> >> This is free software: you are free to change and redistribute
>> >> it.
>> >> There is NO WARRANTY, to the extent permitted by law. Type
>> >> "show copying"
>> >> and "show warranty" for details.
>> >> This GDB was configured as "--host=i686-pc-linux-gnu
>> >> --target=arm-elf-linux"...
>> >>
>> >> please wait... (gathering kmem slab cache data)
>> >> crash: read error: kernel virtual address: c0c91840 type:
>> >> "kmem_cache buffer"
>> >>
>> >> crash: unable to initialize kmem slab cache subsystem
>> >>
>> >>
>> >> WARNING: invalid note (n_type != NT_PRSTATUS)
>> >>
>> >> WARNING: could not retrieve crash_notes
>> >> please wait... (gathering task table data)
>> >> crash: cannot read pid_hash upid
>> >>
>> >> crash: cannot read pid_hash upid
>> >> please wait... (determining panic task)
>> >> WARNING: cannot get stackframe for task
>> >> KERNEL: vmlinux0327
>> >> DUMPFILE: Vmcore0327
>> >> CPUS: 1
>> >> DATE: Thu Jan 1 08:00:00 1970
>> >> UPTIME: 00:00:00
>> >> LOAD AVERAGE: 0.00, 0.00, 0.00
>> >> TASKS: 1
>> >> NODENAME: 10.38.50.241
>> >> RELEASE: 3.0.8-00010-gb7f16a3-dirty
>> >> VERSION: #339 Wed Mar 27 10:39:43 CST 2013
>> >> MACHINE: armv7l (unknown Mhz)
>> >> MEMORY: 19 MB
>> >> PANIC: ""
>> >> PID: 0
>> >> COMMAND: "swapper"
>> >> TASK: c02e0620 [THREAD_INFO: c02dc000]
>> >> CPU: 0
>> >> STATE: TASK_RUNNING (ACTIVE)
>> >> WARNING: panic task not found
>> >>
>> >> crash>
>> >>
>> >>
>> >> It also didn't work properly. I then appended System.map, and the
>> >> output was the same.
>> >
>> > OK, so then it's not clear to me why you're seeing those errors.
>> >
>> > Was the dumpfile created using kdump? It almost looks like the
>> > dump
>> > was taken while the system was still running? Have you *ever*
>> > created
>> > a dumpfile that resulted in an error-free crash session?
>>
>> Yes, the dumpfile was created by kdump. The dump was taken with
>> "echo c > /proc/sysrq-trigger".
>>
>> I will try another case by inserting a panic module tomorrow.
>> >
>> > Perhaps the ARM users on this list have seen this kind of thing?
>> >
>> > If you enter "crash -d8 ..." on the command line, you may get a
>> > better
>> > picture of what leads up to the errors shown above, and of most
>> > interest, the readmem() calls that generate the errors. If you
>> > see a "crash: read error: ...", then that means that the
>> > dumpfile
>> > doesn't contain the physical page associated with the virtual
>> > address shown. But it's not clear whether the address itself
>> > is legitimate, i.e., was it gathered from the wrong location.
>>
>> Sounds reasonable.
>>
>> >
>> >>
>> >> I try GDB to test it.
>> >> hfli@pc1935:~/work/crash-utility$ ./gdb-7.5/gdb/gdb vmlinux0327
>> >> Vmcore0327
>> >> GNU gdb (GDB) 7.5
>> >> Copyright (C) 2012 Free Software Foundation, Inc.
>> >> License GPLv3+: GNU GPL version 3 or later
>> >> <http://gnu.org/licenses/gpl.html>
>> >> This is free software: you are free to change and redistribute
>> >> it.
>> >> There is NO WARRANTY, to the extent permitted by law. Type
>> >> "show copying"
>> >> and "show warranty" for details.
>> >> This GDB was configured as "--host=x86
>> >> --target=arm-linux-gnueabi".
>> >> For bug reporting instructions, please see:
>> >> <http://www.gnu.org/software/gdb/bugs/>...
>> >> Reading symbols from
>> >> /home/hfli/work/crash-utility/vmlinux0327...done.
>> >>
>> >> warning: exec file is newer than core file.
>> >
>> > Again, this bothers me -- why is it "newer" than the core file?
>> > Are you sure that they are *exactly* the same?
>>
>> I am sure they are *exactly* the same. :-)
>>
>> I'm not clear on the internals of how gdb compares the exec file
>> and the core file.
>
> gdb is warning that it appears that you must have compiled the
> vmlinux0327
> after the Vmcore0327 dumpfile was created? Perhaps it's because
> you copied
> the two files to the host system where you're running gdb from in
> the
> "wrong" order.
>
> What I was trying to confirm is that when you rebuilt the vmlinux
> file
> with debuginfo data, that you also *installed* that rebuilt kernel
> onto
> the target system prior to crashing it.
>
>>
>> >
>> >> [New LWP 278]
>> >> #0 0xc0155f7c in sysrq_handle_crash (key=99) at
>> >> drivers/tty/sysrq.c:134
>> >> 134 *killer = 1;
>> >> (gdb) list
>> >> 129 {
>> >> 130 char *killer = NULL;
>> >> 131
>> >> 132 panic_on_oops = 1; /* force panic */
>> >> 133 wmb();
>> >> 134 *killer = 1;
>> >> 135 }
>> >> 136 static struct sysrq_key_op sysrq_crash_op = {
>> >> 137 .handler = sysrq_handle_crash,
>> >> 138 .help_msg = "Crash",
>> >> (gdb)
>> >>
>> >> gdb also works fine.
>> >>
>> >
>> > It works fine for gdb in the very limited case above. The crash
>> > utility
>> > is also "working fine" for a much more expansive access of the
>> > dumpfile.
>> > But if you tried to access the same locations in the dumpfile
>> > that the
>> > crash utility is doing during its initialization, then gdb would
>> > also
>> > fail.
>> >
>> > Let's take a simple example -- in your first email, you saw this
>> > error:
>> >
>> >   crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area va_start"
>> >
>> > which came from here:
>> >
>> >   if (vt->flags & USE_VMAP_AREA) {
>> >           get_symbol_data("vmap_area_list", sizeof(void *), &vmap_area);
>> >           if (!vmap_area)
>> >                   return 0;
>> >           if (!readmem(vmap_area - OFFSET(vmap_area_list) +
>> >               OFFSET(vmap_area_va_start), KVADDR, &vmalloc_start,
>> >               sizeof(void *), "first vmap_area va_start", RETURN_ON_ERROR))
>> >                   non_matching_kernel();
>> >
>> > If I look at a sample ARM dumpfile I have, I see this:
>> >
>> > crash> p vmap_area_list
>> > vmap_area_list = $8 = {
>> > next = 0xc30d4d78,
>> > prev = 0xc06702b8
>> > }
>> >
>> > where the "next" pointer of 0xc30d4d78 above points to the
>> > "list" member
>> > of a vmap_area structure:
>> >
>> > crash> struct vmap_area
>> > struct vmap_area {
>> > long unsigned int va_start;
>> > long unsigned int va_end;
>> > long unsigned int flags;
>> > struct rb_node rb_node;
>> > struct list_head list; <== "next" points here
>> > struct list_head purge_list;
>> > void *private;
>> > struct rcu_head rcu_head;
>> > }
>> > SIZE: 52
>> > crash>
>> >
>> > And I can dump that vmap_area structure like this:
>> >
>> > crash> struct -x vmap_area -l vmap_area.list 0xc30d4d78
>> > struct vmap_area {
>> > va_start = 0xbf000000,
>> > va_end = 0xbf005000,
>> > flags = 0x4,
>> > rb_node = {
>> > rb_parent_color = 0xc2ca076d,
>> > rb_right = 0x0,
>> > rb_left = 0x0
>> > },
>> > list = {
>> > next = 0xc2ca0778,
>> > prev = 0xc0411ed4
>> > },
>> > purge_list = {
>> > next = 0x0,
>> > prev = 0x0
>> > },
>> > private = 0xc3396860,
>> > rcu_head = {
>> > next = 0x0,
>> > func = 0
>> > }
>> > }
>> >
>> > But your kernel found a "vmap_area_list.next" pointer of
>> > c0c1e040,
>> > but it was not accessible from the dumpfile.
>> >
>> > So either:
>> >
>> > (1) the "vmap_area_list" symbol value was not correct, or
>> > (2) the page containing the first vmap_area structure was
>> > not included in the dumpfile.
>> >
>> > Problem (1) can happen if your crashed kernel doesn't match the
>> > vmlinux file, i.e., the symbol values don't match. But if the
>> > "vmap_area_list" symbol was correct, then (2) must have
>> > occurred,
>> > and that should never happen unless the dumpfile was corrupted
>> > or
>> > was created incorrectly.
>> >
>>
>> Agree.
>>
>> Thanks for your patience again.
>>
>> In my case, the kernel command line includes crashkernel=20M@10M.
>> When the capture kernel launches with elfcorehdr=0x1d00000, the
>> initialization of /proc/vmcore fails with WARN_ON(pfn_valid(pfn))
>> triggering.
>>
>> The call path is:
>>
>> vmcore_init -> parse_crash_elf_headers -> read_from_oldmem ->
>> copy_oldmem_page -> ioremap -> __arm_ioremap -> arch_ioremap_caller ->
>> __arm_ioremap_caller -> __arm_ioremap_pfn_caller ->
>> WARN_ON(pfn_valid(pfn)).
>>
>> My temporary solution is to comment out the WARN_ON() so that
>> /proc/vmcore works.
>>
>> Could commenting it out corrupt the vmcore?
>
> Does the crash session come up cleanly?
>
> I don't know about the arm_ioremap issue -- that's for the ARM guys
> to answer.
>
> I'm not familiar with the specifics on how the kernel's vmcore
> creation works,
> but do you see differences in the contents of the PT_LOAD segments
> after applying
> your temporary solution? In other words, if you do this with an
> old vmcore
> vs. a new vmcore:
>
> $ readelf -a vmcore
> ELF Header:
> Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
> Class: ELF32
> Data: 2's complement, little endian
> Version: 1 (current)
> OS/ABI: UNIX - System V
> ABI Version: 0
> Type: CORE (Core file)
> Machine: ARM
> Version: 0x1
> Entry point address: 0x0
> Start of program headers: 52 (bytes into file)
> Start of section headers: 0 (bytes into file)
> Flags: 0x0
> Size of this header: 52 (bytes)
> Size of program headers: 32 (bytes)
> Number of program headers: 3
> Size of section headers: 0 (bytes)
> Number of section headers: 0
> Section header string table index: 0
>
> There are no sections in this file.
>
> There are no sections to group in this file.
>
> Program Headers:
>   Type           Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
>   NOTE           0x000094  0x00000000 0x00000000 0x00514   0x00514       0
>   LOAD           0x0005a8  0xc0000000 0xc0000000 0x2000000 0x2000000 RWE 0
>   LOAD           0x20005a8 0xc2800000 0xc2800000 0x1800000 0x1800000 RWE 0
>
> There is no dynamic section in this file.
>
> There are no relocations in this file.
>
> No version information found in this file.
>
> Notes at offset 0x00000094 with length 0x00000514:
> Owner Data size Description
>   CORE          0x00000094      NT_PRSTATUS (prstatus structure)
>   VMCOREINFO    0x00000452      Unknown note type: (0x00000000)
> $
>
> Are the LOAD sections different?
hfli@msh-pc1935:~/work/crash-utility$ readelf -a Vmcore308
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: CORE (Core file)
Machine: ARM
Version: 0x1
Entry point address: 0x0
Start of program headers: 52 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 0 (bytes)
Number of section headers: 0
Section header string table index: 0
There are no sections in this file.
There are no sections to group in this file.
Program Headers:
  Type           Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
  NOTE           0x000094  0x00000000 0x00000000 0x000a8   0x000a8       0
  LOAD           0x00013c  0xc0000000 0x00000000 0xa00000  0xa00000  RWE 0
  LOAD           0xa0013c  0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0
There is no dynamic section in this file.
There are no relocations in this file.
No version information found in this file.
Notes at offset 0x00000094 with length 0x000000a8:
Owner Data size Description
  CORE          0x00000094      NT_PRSTATUS (prstatus structure)
---
I notice that the Notes section has no VMCOREINFO note.
The following are my steps for using kdump and the crash utility.
1. Build the Linux kernel source.
2. Put arch/arm/boot/uImage on the tftp server;
   put arch/arm/boot/uImage on the NFS server. (The kernel mounts its
   rootfs over NFS.)
3. Boot the uImage with "crashkernel=20M@10M".
4. Load the uImage of the capture kernel.
$./sbin/kexec -p --atags --append="console=ttyAM0,38400n8
root=/dev/nfs rw nfsroot=10.38.50.248:/nfs/nfs ip=10.38.50.241
loglevel=15 rdinit=/rdinit" /uImagetahoe308
5. Insert a panic module to trigger a panic.
$insmod module.ko
6. The capture kernel boots up. (While booting, the capture kernel
   initializes /proc/vmcore. If that initialization fails, /proc/vmcore
   won't exist.)
7. Use the cp tool to dump the vmcore.
$cp /proc/vmcore /Vmcore308
8. Copy vmlinux and Vmcore308 to the crash working directory and use the
   crash utility to analyse Vmcore308.
hfli@pc1935:~/work/crash-utility$ ./crash vmlinux308 Vmcore308
crash 6.1.4
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public
License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for
details.
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show
copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu
--target=arm-elf-linux"...
crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area
va_start"
Errors like the one above typically occur when the kernel and memory
source
do not match. These are the files being used:
KERNEL: vmlinux308
DUMPFILE: Vmcore308
--
Unfortunately, crash again hits a read error and concludes that the kernel
and memory source don't match.
The vmcore initialization looks fine, and copying the dump file
from /proc/vmcore also works fine.
I can't tell whether the vmcore is corrupt, or why.
I don't know either, but in the case above, kernel virtual address c0c1e040
doesn't fit in the virtual address ranges declared in the vmcore header:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
NOTE 0x000094 0x00000000 0x00000000 0x000a8 0x000a8 0
LOAD 0x00013c 0xc0000000 0x00000000 0xa00000 0xa00000 RWE 0
LOAD 0xa0013c 0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0
If you go through the exercise I showed a few messages back, i.e., look at the
kernel's vmap_area_list list_head structure by entering "p vmap_area_list", you
should see its "next" pointer containing the c0c1e040 address. But the vmcore
shows a hole between c0a00000 and c1e00000.
Dave