Re: [Crash-utility] [patch] Crash can't process xen dump core files larger than 4GB.

Thursday, 4 February 2010

----- "Dave Anderson" <anderson(a)redhat.com> wrote:

...
 ----- "xiaowei hu" <xiaowei.hu(a)oracle.com> wrote:

 > Hi all,
 >
 > There is a bug when using crash to process xen domU dump core files
 > larger than 4GB (it was found while processing a 10GB guest core dump
 > file). crash reports these errors:
 > crash: cannot find mfn 8392757 (0x801035) in page index
 >
 >
 > crash: cannot read/find cr3 page
 >
 > This is caused by a variable overflow in the structure:
 > typedef struct xc_core_header {
 >      unsigned int xch_magic;
 >      unsigned int xch_nr_vcpus;
 >      unsigned int xch_nr_pages;
 >      unsigned int xch_ctxt_offset;
 >      unsigned int xch_index_offset;
 >      unsigned int xch_pages_offset;
 > } xc_core_header_t;
 >
 > The xch_ctxt_offset, xch_index_offset and xch_pages_offset fields are
 > offsets into the core dump file. Because they are defined as unsigned
 > int, they cannot address anything beyond 4GB, so processing core dump
 > files larger than 4GB may fail (I encountered the overflow with that
 > 10GB file). Changing those offset fields to unsigned long makes sure
 > crash can seek to the right position.
 > BTW, please reply directly to me; I am not on the mailing list.
 >
 >
 > Signed-off-by: Xiaowei Hu <xiaowei.hu(a)oracle.com>
 >
 >
 > diff -up crash-5.0.0/xendump.h.org crash-5.0.0/xendump.h
 > --- crash-5.0.0/xendump.h.org	2010-02-04 03:48:04.000000000 +0800
 > +++ crash-5.0.0/xendump.h	2010-02-04 05:41:27.000000000 +0800
 > @@ -28,9 +28,9 @@ typedef struct xc_core_header {
 >      unsigned int xch_magic;
 >      unsigned int xch_nr_vcpus;
 >      unsigned int xch_nr_pages;
 > -    unsigned int xch_ctxt_offset;
 > -    unsigned int xch_index_offset;
 > -    unsigned int xch_pages_offset;
 > +    unsigned long xch_ctxt_offset;
 > +    unsigned long xch_index_offset;
 > +    unsigned long xch_pages_offset;
 >  } xc_core_header_t;
 >
 >  struct pfn_offset_cache {

 First question -- are you saying that the change above works for you?

 And second -- in your dumpfile, even with 10GB of memory, wouldn't
 the base offset value of all three indexes still fit well below
 the 4GB mark?

 The xc_core_header in crash is a copy of the one found in "tools/libxc/xenctrl.h",
 and is presumably the beginning/header of the dumpfile.  So making the
 wholesale change above breaks all earlier (?) versions.

 But what is confusing is that the latest/final version of "xenctrl.h" used in
 RHEL5 (3.0.3 vintage), as well as the current version in Fedora (3.4.0-2.fc12),
 still use unsigned int offsets, and I just checked with one of our xen masters,
 and the Xensource git tree also still has unsigned int values in the header data
 structure:

 typedef struct xc_core_header {
     unsigned int xch_magic;
     unsigned int xch_nr_vcpus;
     unsigned int xch_nr_pages;
     unsigned int xch_ctxt_offset;
     unsigned int xch_index_offset;
     unsigned int xch_pages_offset;
 } xc_core_header_t;

 #define XC_CORE_MAGIC     0xF00FEBED
 #define XC_CORE_MAGIC_HVM 0xF00FEBEE

 Are your xen userspace tools an Oracle hybrid?  
Ah -- it's becoming clearer now...

The evolution of the various xendump formats is the cause for confusion
and the issue at hand.

In the beginning, the "xm dump-core" facility used its own unique dumpfile
format, where the xc_core_header shown above was at the beginning
of the dumpfile and served as its primary header.

Much later, "xm dump-core" started using an ELF format, where it
carried forward 3 of the old xc_core_header fields above into either
this ELF note: 

  struct xen_dumpcore_elfnote_header_desc {
    uint64_t    xch_magic;
    uint64_t    xch_nr_vcpus;
    uint64_t    xch_nr_pages;
    uint64_t    xch_page_size;
  };

or into one of several ELF section headers.  The remaining 3 "offset" fields
are stored like so:

   xch_ctxt_offset: in the ".xen_prstatus" ELF section header
  xch_index_offset: in the ".xen_pfn" or ".xen_p2m" ELF section header,
                    depending on whether the guest is fully-virtualized
                    or paravirtualized
  xch_pages_offset: in the ".xen_pages" ELF section header

The offsets in the ELF section headers are stored in the "sh_offset" field
of the Elf64_Shdr (or Elf32_Shdr if ELFCLASS32):

  typedef struct
  {
    Elf64_Word    sh_name;                /* Section name (string tbl index) */
    Elf64_Word    sh_type;                /* Section type */
    Elf64_Xword   sh_flags;               /* Section flags */
    Elf64_Addr    sh_addr;                /* Section virtual addr at execution */
    Elf64_Off     sh_offset;              /* Section file offset */
    Elf64_Xword   sh_size;                /* Section size in bytes */
    Elf64_Word    sh_link;                /* Link to another section */
    Elf64_Word    sh_info;                /* Additional section information */
    Elf64_Xword   sh_addralign;           /* Section alignment */
    Elf64_Xword   sh_entsize;             /* Entry size if section holds table */
  } Elf64_Shdr;

FWIW, I don't know (or recall) whether ELFCLASS32 is ever used, even with 32-bit
xen hosts/guests.  If it were, offsets would be limited to 4GB, because the
"sh_offset" in the Elf32_Shdr is of type Elf32_Off, which is 32 bits wide:

  /* Type of file offsets.  */
  typedef uint32_t Elf32_Off;
  typedef uint64_t Elf64_Off;

Anyway, the problem is that the crash utility started out using the old xc_core_header
data structure back when it was the only header.  When "xm dump-core" switched to ELF
format dumpfiles, the sh_offset values from the ELF section headers were copied into
the old xc_core_header data structure in the crash utility so that the old code
base could still be used.  But if any of the sh_offset values overflowed into
the upper 32 bits, they would be truncated when the copy was made.

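In any case, getting back to the crash utility issue, the patch that you
proposed cannot be used alone because it will break backward compatibility:
old-style dumpfiles still begin with the original xc_core_header layout, whose
offsets are 32-bit unsigned ints.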

What could be done is to have the xc_core_verify() initialization code read
the dumpfile header into an "original" xc_core_header structure type, verify it
as one of the "old-style" dumpfiles, but then store the offsets into your
updated xc_core_header structure.

Dave

