Re: [Crash-utility] questions about crash utility

Thursday, 17 January 2013

 Hi Dave,

 Thank you very much for your detailed answer; it is really helpful. Please
 see my inline comments. Thanks.

 > Date: Thu, 17 Jan 2013 14:17:36 -0500
...
 From: anderson(a)redhat.com
 To: crash-utility(a)redhat.com
 Subject: Re: [Crash-utility] questions about crash utility

...
 The fact that crash gets as far as it does at least means that the
 ELF header you've created was deemed acceptable as an ARM vmcore.
 However, the error messages re: "cpu_present_mask indicates..." and
 "cannot determine base kernel version" indicate that the data
 that was read from the vmcore was clearly not the correct data.

 The "cpu_present_mask" value that it read contained too
 many bits -- presuming that the 32-bit ARM processor is
 still limited to only 4 cpus.  (looks like upstream that
 CONFIG_NR_CPUS is still 2 in the arch/arm/configs files.)

 But more indicative of the wrong data being read is the second
 "cannot determine base kernel version" message, which was generated
 after it read the kernel's "init_uts_ns" uts_namespace structure.
 After reading it, it sees that the "release" string contains
 non-ASCII data, whereas it should contain the kernel version:

 crash> p init_uts_ns
 init_uts_ns = $3 = {
   kref = {
     refcount = {
       counter = 2
     }
   }, 
   name = {
     sysname =
"Linux\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

     nodename =
"phenom-01.lab.bos.redhat.com\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

     release =
"2.6.32-313.el6.x86_64\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

     version = "#1 SMP Thu Sep 27 16:25:19 EDT
2012\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

     machine =
"x86_64\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

     domainname =
"(none)\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
   }
 }
 crash>
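 As a rough illustration of the sanity check described above, here is a minimal sketch that tests whether a buffer read from the dumpfile holds a non-empty, printable ASCII string. The helper name is_printable_ascii is hypothetical and this is not crash's actual code; it only shows why the 4 bytes "03e59130" read from the bad vmcore would be rejected as a version string.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if buf holds a non-empty string made only
 * of printable ASCII characters (0x20..0x7e) before the first NUL or before
 * len bytes have been examined, and 0 otherwise.
 */
static int is_printable_ascii(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len && buf[i] != '\0'; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c < 0x20 || c > 0x7e)
            return 0;
    }
    return i > 0;
}
```

 A release field such as "2.6.32-313.el6.x86_64" passes this test, while bytes like 30 91 e5 03 do not.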

 So it appears that you're reading data from the wrong
 locations in the dumpfile.  You should be able to verify 
 that by bringing up the crash session with the --minimal
 flag like this:

   $ crash --minimal vmlinux vmcore

 That will bypass most of the initialization, including all
 readmem() calls of the vmcore.  Then do this:

  crash> rd linux_banner 20
  ffffffff818000a0:  65762078756e694c 2e33206e6f697372   Linux version 3.
  ffffffff818000b0:  63662e312d312e35 365f3638782e3731   5.1-1.fc17.x86_6
  ffffffff818000c0:  626b636f6d282034 69756240646c6975   4 (mockbuild@bui
  ffffffff818000d0:  2e33322d6d76646c 6465662e32786870   ldvm-23.phx2.fed
  ffffffff818000e0:  656a6f727061726f 202967726f2e7463   oraproject.org) 
  ffffffff818000f0:  7265762063636728 372e34206e6f6973   (gcc version 4.7
  ffffffff81800100:  303231303220302e 6465522820373035   .0 20120507 (Red
  ffffffff81800110:  372e342074614820 47282029352d302e    Hat 4.7.0-5) (G
  ffffffff81800120:  3123202920294343 75685420504d5320   CC) ) #1 SMP Thu
  ffffffff81800130:  3120392067754120 2033343a30353a37    Aug 9 17:50:43 
  crash> rd -a linux_banner
  ffffffff818000a0:  Linux version 3.5.1-1.fc17.x86_64 (mockbuild@buildvm-23.phx2
  ffffffff818000dc:  .fedoraproject.org) (gcc version 4.7.0 20120507 (Red Hat 4.7
  ffffffff81800118:  .0-5) (GCC) ) #1 SMP Thu Aug 9 17:50:43 UTC 2012
  crash>

 I'm guessing that you will not see a string starting with "Linux version"
 with your dumpfile as shown above.

 If that's the case, then it's clear that the readmem() function is ultimately
 reading from the wrong vmcore file offset.  

 Here's what you can try doing.  Taking the linux_banner example above, 
 you can check where in the dumpfile it's reading from by setting the debug
 flag, before doing a simple read -- like this example on an ARM dumpfile:

  crash> set debug 8
  debug: 8
  crash> rd linux_banner
  <addr: c033ea10 count: 1 flag: 488 (KVADDR)>
  <readmem: c033ea10, KVADDR, "32-bit KVADDR", 4, (FOE), ff94f048>
  <read_kdump: addr: c033ea10 paddr: 33ea10 cnt: 4>
  read_netdump: addr: c033ea10 paddr: 33ea10 cnt: 4 offset: 33f088
  c033ea10:  756e694c                              Linu
  crash>

 The linux_banner is at virtual address c033ea10 (addr).  First it gets translated
 into physical address 33ea10 (paddr).  Then that paddr is translated into the
 vmcore file offset of 33f088.  It lseeks to vmcore file offset 33f088 and
 reads 4 bytes, which contain "756e694c", or the first 4 bytes of the
 "Linux version ..." string.

 Note that if I subtract the physical address from vmcore file offset
 I get this:

  crash> eval 33f088 - 33ea10
  hexadecimal: 678  
      decimal: 1656  
        octal: 3170
       binary: 00000000000000000000011001111000
  crash>

 which would put physical address 0 at a vmcore file offset of 0x678,
 therefore implying that the ELF header comprises the first 0x678 bytes.
 And looking at the vmcore, that can be verified:
Yes, you are right. Here is the result I get:

  crash> set debug 8
  debug: 8
  crash> rd linux_banner
  <addr: c065a071 count: 1 flag: 488 (KVADDR)>
  <readmem: c065a071, KVADDR, "32-bit KVADDR", 4, (FOE), ffdf297c>
  <read_kdump: addr: c065a071 paddr: 85a071 cnt: 4>
  read_netdump: addr: c065a071 paddr: 85a071 cnt: 4 offset: 65a0e5
  c065a071:  03e59130                              0...

The virtual address is 0xc065a071, the physical address is 0x85a071, and the
file offset is 0x65a0e5. My ELF header is 116 bytes long, and
0x65a0e5 - 116 = 0x65a071, which leaves a gap of 0x00200000 from the physical
address 0x85a071.
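The arithmetic above can be sketched in a few lines of C. The sizes 52 and 32 are the standard ELF32 file-header and program-header sizes. The count of two program headers (one NOTE, one LOAD) is an assumption that happens to give the 116 bytes mentioned, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Standard ELF32 sizes: Elf32_Ehdr is 52 bytes, Elf32_Phdr is 32 bytes. */
enum { ELF32_EHDR_SIZE = 52, ELF32_PHDR_SIZE = 32 };

/* Bytes taken by the ELF header plus phnum program headers (no notes). */
static uint32_t elf32_headers_size(uint32_t phnum)
{
    return ELF32_EHDR_SIZE + phnum * ELF32_PHDR_SIZE;
}

/*
 * Hypothetical helper: given one (paddr, file offset) pair from the
 * read_netdump debug line, return the physical address that the first
 * byte after the headers appears to represent.
 */
static uint32_t implied_gap(uint32_t paddr, uint32_t file_offset,
                            uint32_t hdr_size)
{
    return paddr - (file_offset - hdr_size);
}
```

With my numbers, implied_gap(0x85a071, 0x65a0e5, 116) gives 0x00200000, while the values from Dave's ARM example (paddr 0x33ea10, offset 0x33f088, headers 0x678) give 0.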
...
  $ readelf -a vmcore
  ELF Header:
    Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
    Class:                             ELF32
    Data:                              2's complement, little endian
    Version:                           1 (current)
    OS/ABI:                            UNIX - System V
    ABI Version:                       0
    Type:                              CORE (Core file)
    Machine:                           ARM
    Version:                           0x1
    Entry point address:               0x0
    Start of program headers:          52 (bytes into file)
    Start of section headers:          0 (bytes into file)
    Flags:                             0x0
    Size of this header:               52 (bytes)
    Size of program headers:           32 (bytes)
    Number of program headers:         3
    Size of section headers:           0 (bytes)
    Number of section headers:         0
    Section header string table index: 0

  There are no sections in this file.

  There are no sections to group in this file.

  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    NOTE           0x000094 0x00000000 0x004e345c 0x005e4 0x005e4     0
    LOAD           0x000678 0xc0000000 0x00000000 0x5600000 0x5600000 RWE 0
    LOAD           0x5600678 0xc5700000 0x05700000 0x100000 0x100000 RWE 0
  ...

 Note that the "Offset" value of the first PT_LOAD segment has a file offset
 value of 0x678.  
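
 To make the paddr-to-file-offset step concrete, here is a minimal sketch of a lookup over the PT_LOAD entries shown in the readelf output above. The struct pt_load type and the paddr_to_offset function are hypothetical names for illustration and are not taken from the crash sources.

```c
#include <stddef.h>
#include <stdint.h>

/* One PT_LOAD entry, reduced to the fields needed here (hypothetical type). */
struct pt_load {
    uint64_t p_offset;  /* file offset of the segment's first byte */
    uint64_t p_paddr;   /* physical address of the segment's first byte */
    uint64_t p_filesz;  /* number of bytes stored in the file */
};

/*
 * Find the segment that contains paddr and compute its file offset.
 * Returns 0 and stores the offset on success, -1 if no segment covers paddr.
 */
static int paddr_to_offset(const struct pt_load *segs, size_t nsegs,
                           uint64_t paddr, uint64_t *offset)
{
    size_t i;

    for (i = 0; i < nsegs; i++) {
        const struct pt_load *s = &segs[i];

        if (paddr >= s->p_paddr && paddr - s->p_paddr < s->p_filesz) {
            *offset = s->p_offset + (paddr - s->p_paddr);
            return 0;
        }
    }
    return -1;
}
```

 Fed the two LOAD segments above, paddr 0x33ea10 resolves to file offset 0x33f088, which matches the read_netdump debug line. A physical address that falls between the two segments is reported as not found.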
Here is the result I get:

  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    NOTE           0x000000 0x00000000 0x00000000 0x00000 0x00000     0
    LOAD           0x000074 0xc0000000 0x00200000 0x2fe00000 0x2fe00000 RWE 0

So the problem is that I did not understand the meaning of the ELF header
fields accurately. If I modify the code as below, everything works for me:

  offset += sizeof(struct elf_phdr);
  phdr->p_offset = offset + 0x00200000;
  phdr->p_vaddr = 0xc0000000;
  phdr->p_paddr = 0x00200000;
  phdr->p_filesz = phdr->p_memsz = MEMSIZE - 0x00200000;

Although my modification makes the crash utility work well, I want to know
whether I am doing the right thing. Some background:

  1. Our platform's DDR starts at physical address 0x0.
  2. When compiling the Linux kernel, our platform sets this in the .config
     file: CONFIG_PHYS_OFFSET=0x00200000
  3. When the kernel crashes, all DDR content is dumped, from address 0x0 up
     to 768MB, but the kernel data actually starts at 0x00200000.

My questions are:

  1. Is my setting of the ELF header correct this time (the offset, p_paddr,
     and p_memsz)?
  2. How does the crash utility translate a virtual address to a physical
     address before and after it gets the kernel page table? Before it gets
     the kernel page table, does it calculate
     (virtual_addr - p_vaddr + p_paddr)? After it gets the kernel page table,
     does it walk the page table to find the real physical address?
  3. My real purpose is to get the ftrace content from the dump file with the
     crash utility, but the trace command does not seem to be for this case.
     Do I need to compile the "trace" extension of the crash utility? Is
     there a guide to follow?
...
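For questions 1 and 2, a small consistency check can be sketched as follows. It assumes the raw dump starts at physical address 0 immediately after a 0x74-byte header area, that MEMSIZE is 768MB (0x30000000), and that kernel virtual addresses in this range follow a simple linear mapping. The last assumption is consistent with the read_kdump debug output (c065a071 to 85a071), but it is not a statement of how crash translates addresses internally. All type and function names here are hypothetical.

```c
#include <stdint.h>

/* The PT_LOAD fields set by the modified code (hypothetical type). */
struct load_seg {
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
};

/* Mirror of the modified header code: skip the first phys_offset bytes. */
static struct load_seg make_load_seg(uint32_t hdr_size, uint32_t phys_offset,
                                     uint32_t page_offset, uint32_t dump_size)
{
    struct load_seg s;

    s.p_offset = hdr_size + phys_offset;
    s.p_vaddr  = page_offset;
    s.p_paddr  = phys_offset;
    s.p_filesz = dump_size - phys_offset;
    return s;
}

/* Assumed linear mapping: virtual_addr - p_vaddr + p_paddr. */
static uint32_t linear_vtop(uint32_t vaddr, uint32_t page_offset,
                            uint32_t phys_offset)
{
    return vaddr - page_offset + phys_offset;
}

/* File offset of paddr within a segment that is known to contain it. */
static uint32_t seg_file_offset(const struct load_seg *s, uint32_t paddr)
{
    return paddr - s->p_paddr + s->p_offset;
}
```

With hdr_size 0x74 and phys_offset 0x00200000, the segment gets p_offset 0x200074 and p_filesz 0x2fe00000. Virtual address 0xc065a071 then maps to paddr 0x85a071 and to file offset 0x74 + 0x85a071, which is where a dump that begins at physical 0 would hold that byte.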
 Another thing to do is to verify that your phys_base of 0x20000000
 is being properly seen.  In the --minimal session, you can verify that
 by doing this:

  crash> help -m | grep phys_base

 Trying the above should yield some clues into the problem you're encountering.

 Dave

 --
 Crash-utility mailing list
 Crash-utility(a)redhat.com
 https://www.redhat.com/mailman/listinfo/crash-utility
