Re: [Crash-utility] how to use crash utility to parse the binary memory dump

Monday, 28 July 2014

----- Original Message -----
...
 > I tested your latest patch on the sample ARM and ARM64 RAM dumps
 > you sent me.
 >
 > As far as the patch itself is concerned, I ran into a problem
 > where if crash is invoked in a directory where it does not have
 > write permission, the session hangs trying to write to a bad file
 > descriptor -- because of this:
 >
 >         fd2 = open(out_elf, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
 >         if (!fd2) {
 >                 error(INFO, "%s open error\n", out_elf);
 >                 goto end1;
 >         }
 >
 > It should be "if (fd2 < 0)".
 >
 Thanks. The corrected patch is attached.

 > I should have been more clear w/respect to "a temporary file".
 > What I was suggesting was that you do something like using
 > mkstemp(3) to create a temporary file in /var/tmp, and then
 > unlink() it immediately so it would only exist until the crash
 > session ends.
 >
 Done.
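
For reference, the mkstemp(3)/unlink() pattern described above can be
sketched as follows. This is only a minimal illustration, not the code
from the attached patch: the function name and the caller-supplied
template are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create an anonymous temporary file. mkstemp() creates and opens the
 * file, replacing the trailing XXXXXX in tmpl with a unique suffix;
 * unlink() then removes the name, and the data stays reachable through
 * the returned descriptor until it is closed.
 * Returns the descriptor, or -1 on failure.
 * The function name is hypothetical.
 */
int open_unlinked_tmpfile(char *tmpl)
{
        int fd;

        fd = mkstemp(tmpl);
        if (fd < 0) {           /* 0 is a valid descriptor; failure is -1 */
                perror("mkstemp");
                return -1;
        }
        if (unlink(tmpl) < 0) {
                perror("unlink");
                close(fd);
                return -1;
        }
        return fd;
}
```

The template has to be a writable array, for example
char path[] = "/var/tmp/ramdump_elfXXXXXX", because mkstemp() modifies
it in place.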

 > So I'm guessing that this dumpfile was taken before the "init" task
 > was even created, and the kernel data structures were not fully
 > initialized?
 >
 That's possible. IIRC, because of some issue with the setup, I had to
 stop execution and get the ramdump before the ramdisk was mounted.

 > Maybe you can try taking a RAM dump on an ARM64 machine after
 > it is up and running?
 >
 Unfortunately I don't have access to any arm64 machines. I am not sure
 when I can get one.

Testing with just the 32-bit ARM RAM dumpfile you gave me
yielded some interesting, albeit somewhat strange, results.

Running exactly as you specified looks like this:

  $ crash vmlinux AP__0x81e00000.hex@0x81e00000
  ... [ cut ] ...
      DUMPFILE: /var/tmp/ramdump_elf2z0YjA
  ... [ cut ] ...
  crash> !readelf -a /var/tmp/ramdump_elf2z0YjA
  ELF Header:
    Magic:   7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00 
    Class:                             ELF32
    Data:                              2's complement, little endian
    Version:                           1 (current)
    OS/ABI:                            UNIX - GNU
    ABI Version:                       0
    Type:                              CORE (Core file)
    Machine:                           ARM
    Version:                           0x1
    Entry point address:               0x0
    Start of program headers:          52 (bytes into file)
    Start of section headers:          0 (bytes into file)
    Flags:                             0x0
    Size of this header:               52 (bytes)
    Size of program headers:           32 (bytes)
    Number of program headers:         2
    Size of section headers:           0 (bytes)
    Number of section headers:         0
    Section header string table index: 0

  There are no sections in this file.

  There are no sections to group in this file.

  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    NOTE           0x000074 0x00000074 0x00000000 0x00000 0x00000     0
    LOAD           0x000074 0x00000000 0x81e00000 0x1e200000 0x1e200000 RWE 0

  There is no dynamic section in this file.

  There are no relocations in this file.

  No version information found in this file.
  crash> help -m | grep phys_base
            phys_base: 81e00000
  crash>

Everything looks good, and both the "PhysAddr" field in the PT_LOAD
segment and the machdep->phys_base address correctly reflect the
offset value.

However, throughout the crash utility, wherever a hexadecimal address
can be entered unambiguously, you can enter the number without the
"0x" prepended.  And elsewhere, it just requires a hexadecimal
value -- which should be the case with these dumpfile offsets.

So I tried using "AP__0x81e00000.hex@81e00000" on the crash command
line -- and although it seemed "to work", check out the PT_LOAD PhysAddr
and the phys_base value:

  $ crash vmlinux AP__0x81e00000.hex@81e00000
  ... [ cut ] ...
  crash> !readelf -a /var/tmp/ramdump_elfuDqyZq
  ... [ cut ] ...
  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    NOTE           0x000074 0x00000074 0x00000000 0x00000 0x00000     0
    LOAD           0x000074 0x00000000 0x00000051 0x1e200000 0x1e200000 RWE 0
  ...
  crash> help -m | grep phys_base
            phys_base: 51
  crash> 

That's because you're using strtoul(), which presumed that 81e00000
was a decimal string, stopped parsing at the "e", used just the "81",
and came up with the 0x51 value (decimal 81).
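
The behavior is easy to reproduce in isolation. The sketch below assumes
the offset is parsed with a base of 0, which is an inference from the
output above (the "0x" form works, the bare form is read as decimal)
and not a quote from the patch; the function name is hypothetical.

```c
#include <stdlib.h>

/*
 * Parse an offset with strtoul() and base 0. With base 0 the base is
 * taken from the prefix: "0x" selects hexadecimal, a leading "0"
 * selects octal, and anything else is decimal. Parsing stops at the
 * first character that is not a digit in that base, and *end is set
 * to point at it.
 */
unsigned long parse_offset_base0(const char *s, char **end)
{
        return strtoul(s, end, 0);
}
```

With "81e00000" this returns 81 (0x51) and leaves *end pointing at
"e00000"; with "0x81e00000" it returns 0x81e00000 and consumes the
whole string.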

And as it turns out, you can put just about anything as the physical offset:

  $ crash vmlinux AP__0x81e00000.hex@0xdeadbeef
  ... [ cut ] ...
  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    NOTE           0x000074 0x00000074 0x00000000 0x00000 0x00000     0
    LOAD           0x000074 0x00000000 0xdeadbeef 0x1e200000 0x1e200000 RWE 0
  ...  
  crash> help -m | grep phys_base
            phys_base: deadbeef
  crash> 

Including 0:

  $ crash vmlinux AP__0x81e00000.hex@0
  ... [ cut  ] ...
  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    NOTE           0x000074 0x00000074 0x00000000 0x00000 0x00000     0
    LOAD           0x000074 0x00000000 0x00000000 0x1e200000 0x1e200000 RWE 0
  ...
  crash> help -m | grep phys_base
            phys_base: 0
  crash> 

However, although it *appears* to work correctly, it fails to translate/read
user-space or vmalloc addresses, because the PTEs contain references to
physical addresses that are based at 0x81e00000.  If your dumpfile sample
had been a kernel that was running with any kernel modules, you would have
seen an error during initialization when the linked list of kernel modules
gets read.

In any case, if you use "htol()" instead of strtoul(), the user won't
have to enter the "0x", because it will presume it's a hexadecimal
address.  
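
For illustration only, here is a standalone stand-in for a hex-only
parser. It does not use crash's htol(); it relies on strtoull() with
base 16, which accepts the digits with or without a leading "0x". The
name parse_hex_offset() and its return convention are assumptions.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse s as a hexadecimal number, with or without a leading "0x".
 * Returns 0 on success and stores the value in *out. Returns -1 for
 * a NULL or empty string, a string that does not start with a hex
 * digit (this also rejects leading whitespace and signs), trailing
 * non-hex characters, or a value that overflows.
 */
int parse_hex_offset(const char *s, unsigned long long *out)
{
        char *end;
        unsigned long long val;

        if (s == NULL || !isxdigit((unsigned char)*s))
                return -1;
        errno = 0;
        val = strtoull(s, &end, 16);
        if (errno == ERANGE || end == s || *end != '\0')
                return -1;
        *out = val;
        return 0;
}
```

Both "81e00000" and "0x81e00000" yield 0x81e00000 here, and an empty
string is reported as an error instead of being passed on.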

However, you cannot put *nothing* as the offset value:

  $ crash vmlinux AP__0x81e00000.hex@

  crash 7.0.8rc12
  Copyright (C) 2002-2014  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.

  Segmentation fault (core dumped)
  $ 

So that also needs fixing.
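
The cause of the segmentation fault is not identified above. One
defensive approach, sketched here under the assumption that the
argument is split at the "@" before the offset is converted, is to
reject a missing offset up front. The function name and calling
convention are hypothetical.

```c
#include <string.h>

/*
 * Split "file@offset" at the last '@'. On success the '@' is replaced
 * by a NUL so that spec holds just the file name, *offset points at
 * the text after it, and 0 is returned.
 * Returns -1 if spec is NULL, has no '@', has no file name before the
 * '@', or has nothing after it. spec is left unmodified on failure.
 */
int split_ramdump_spec(char *spec, char **offset)
{
        char *at;

        if (spec == NULL || (at = strrchr(spec, '@')) == NULL)
                return -1;
        if (at == spec || at[1] == '\0')   /* "@0x..." or "file@" */
                return -1;
        *at = '\0';
        *offset = at + 1;
        return 0;
}
```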

A few other questions/comments...

When 32-bit x86 kdumps are created, they typically default to using
the 64-bit ELF format.  That is required in case there is physical memory
above the 4GB threshold, which cannot be described in a 32-bit ELF
header.  Since 32-bit ARM can also be PAE, would it make more sense
to *always* use 64-bit ELF headers for both ARM and ARM64?  I don't
see why not, and it should simplify ramdump.c quite a bit.
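
To make the size difference concrete: in <elf.h>, Elf32_Phdr holds
p_paddr in a 32-bit field, while Elf64_Phdr uses 64-bit fields for
p_offset, p_paddr, p_filesz and p_memsz. A minimal sketch of filling a
64-bit PT_LOAD header for one RAM image follows; the function name is
hypothetical and this is not code from ramdump.c.

```c
#include <elf.h>
#include <string.h>

/*
 * Fill a 64-bit PT_LOAD program header describing one RAM image that
 * starts at file offset file_off and at physical address paddr.
 * All address and size fields of Elf64_Phdr are 64 bits wide, so a
 * physical address above 4GB is stored without truncation.
 */
void fill_load_phdr64(Elf64_Phdr *ph, unsigned long long file_off,
                      unsigned long long paddr, unsigned long long size)
{
        memset(ph, 0, sizeof(*ph));
        ph->p_type = PT_LOAD;
        ph->p_flags = PF_R | PF_W | PF_X;
        ph->p_offset = file_off;
        ph->p_paddr = paddr;
        ph->p_filesz = size;
        ph->p_memsz = size;
}
```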

And thinking this through a bit more, to me it seems really wasteful
to have to create a whole new dumpfile.  Even if it's only a temporary
file, it's still seemingly an unnecessary duplication of disk space.

Here's an idea, not fully thought through, but seems like it could
work when the "temporary" dumpfile option is used:

 (1) Create a temporary file that *only* consists of the ELF header. 
 (2) Set a new RAMDUMP flag in pc->flags2.
 (3) Pass that temporary ELF header file to is_kdump() as you do now.
 (4) is_kdump() passes it to is_netdump(), and I believe that is_netdump()
     should parse just the ELF header and accept it as a KDUMP_ELF32 or
     KDUMP_ELF64 file without even being aware that there's no physical
     memory data attached.
 (5) When kdump_init() is called, it passes through to netdump_init()
     which calls check_dumpfile_size() -- which would need to look
     at the RAMDUMP flag to do the right thing.
 (6) And instead of using read_kdump(), copy it to a new read_ramdump()
     function that reads from the original RAM dump image(s), figuring
     out the file offsets accordingly.  Keep the new read_ramdump()
     function in netdump.c so it can continue to use the "nd" vmcore_data
     structure that is statically defined there.  read_ramdump() may have
     to interact with function(s) in ramdump.c, or perhaps the ramdump_def
     structure can be shared with netdump.c somehow.
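
The offset arithmetic in step (6) could look roughly like the sketch
below. The struct layout and function name are assumptions made for
illustration; they are not the ramdump_def structure or any existing
crash function.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one raw RAM image given as file@offset. */
struct ramdump_seg {
        int fd;                         /* open descriptor of the raw image */
        unsigned long long start_paddr; /* physical address of its first byte */
        unsigned long long size;        /* image size in bytes */
};

/*
 * Read cnt bytes at physical address paddr from the first image that
 * covers the whole range. The file offset is simply the distance of
 * paddr from the image's starting physical address.
 * Returns 0 on success, -1 if no image covers the range or the read
 * fails or comes up short.
 */
int ramdump_read_paddr(const struct ramdump_seg *segs, int nsegs,
                       unsigned long long paddr, void *buf, size_t cnt)
{
        int i;

        for (i = 0; i < nsegs; i++) {
                const struct ramdump_seg *s = &segs[i];
                unsigned long long off;

                if (paddr < s->start_paddr)
                        continue;
                off = paddr - s->start_paddr;
                if (off >= s->size || cnt > s->size - off)
                        continue;
                if (pread(s->fd, buf, cnt, (off_t)off) != (ssize_t)cnt)
                        return -1;
                return 0;
        }
        return -1;
}
```

Since no memory data is copied into a second file, the only disk space
used beyond the original images is the small ELF header file.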

There will presumably be a few other glitches that will require checking
the new RAMDUMP flag, but I don't think that there would be anything that
couldn't be overcome.  That really would be the ideal way to handle these
files.

If the "-o elf_file" option is used, then you could proceed as you
do now -- create the new full ELF file, and pass it to is_kdump() without
having set the new RAMDUMP flag.

What do you think?

Dave
