----- "xiaowei hu" <xiaowei.hu(a)oracle.com> wrote:
Hi all,
There is a bug when using crash to process a xen domU core dump that is
larger than 4GB (it was found while processing a 10GB guest core dump file).
crash reports these errors:
crash: cannot find mfn 8392757 (0x801035) in page index
crash: cannot read/find cr3 page
This is caused by a variable overflow in the structure:
typedef struct xc_core_header {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    unsigned int xch_ctxt_offset;
    unsigned int xch_index_offset;
    unsigned int xch_pages_offset;
} xc_core_header_t;
The xch_ctxt_offset, xch_index_offset and xch_pages_offset fields are
offsets into the core dump file. When they are defined as unsigned
int, no offset beyond 4GB can be represented, so processing a core dump
file larger than 4GB may fail (I hit the overflow on that 10GB file).
Changing those offset fields to unsigned long makes sure crash can
seek to the right position.
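To illustrate the overflow described above, here is a minimal C sketch. The helper names are hypothetical and are not part of crash or libxc; fixed-width types stand in for the header fields, assuming a 32-bit unsigned int. The example value is the .xen_p2m offset reported for the 10GB dump in this thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of crash: keep only what a 32-bit
 * header field can hold. The high bits of the offset are lost. */
uint32_t store_in_32bit_field(uint64_t file_offset)
{
    return (uint32_t)file_offset;
}

/* Hypothetical helper: 1 if the offset is representable in 32 bits. */
int fits_in_32bit_field(uint64_t file_offset)
{
    return file_offset <= UINT32_MAX;
}

/* Print the position a reader of the 32-bit field would seek to. */
void show_truncation(uint64_t file_offset)
{
    printf("real offset 0x%" PRIx64 " -> stored 0x%" PRIx32 "%s\n",
           file_offset, store_in_32bit_field(file_offset),
           fits_in_32bit_field(file_offset) ? "" : " (truncated)");
}
```

Calling show_truncation(0x280005000ULL) prints "real offset 0x280005000 -> stored 0x80005000 (truncated)", so a seek based on the stored value would land about 8GB short of the real segment.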
btw, please reply directly to me, I am not on the mailing list.
Signed-off-by: Xiaowei Hu <xiaowei.hu(a)oracle.com>
diff -up crash-5.0.0/xendump.h.org crash-5.0.0/xendump.h
--- crash-5.0.0/xendump.h.org 2010-02-04 03:48:04.000000000 +0800
+++ crash-5.0.0/xendump.h 2010-02-04 05:41:27.000000000 +0800
@@ -28,9 +28,9 @@ typedef struct xc_core_header {
     unsigned int xch_magic;
     unsigned int xch_nr_vcpus;
     unsigned int xch_nr_pages;
-    unsigned int xch_ctxt_offset;
-    unsigned int xch_index_offset;
-    unsigned int xch_pages_offset;
+    unsigned long xch_ctxt_offset;
+    unsigned long xch_index_offset;
+    unsigned long xch_pages_offset;
 } xc_core_header_t;
 
 struct pfn_offset_cache {
First question -- are you saying that the change above works for you?
Yes, this change works for me on a 10GB core dump file whose .xen_p2m segment
is at offset 0x280005000; this offset can't be stored in an unsigned int variable.
And second -- in your dumpfile, even with 10GB of memory, wouldn't
the base offset value of all three indexes still fit well below
the 4GB mark?
Actually, according to the xen-dump-core document, the .xen_p2m segment should be
located before the .xen_pages segment; in that order there should be no problem.
But according to the segment table read by readelf, the core dump file has the
.xen_p2m segment located at offset 0x2800025000, after the .xen_pages segment and
beyond the 4GB mark.
The xc_core_header in crash is a copy of that found in "tools/libxc/xenctrl.h",
and is presumptively the beginning/header of the dumpfile. And so making the
wholesale change above breaks all earlier (?) versions.
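As a rough illustration of why widening the fields changes how the start of the dumpfile is interpreted, the sketch below compares two hypothetical layouts. The struct and function names are made up for this example, and fixed-width types are used as stand-ins (assuming a 32-bit unsigned int and a 64-bit unsigned long, as on x86_64).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the header with 32-bit offsets (hypothetical name). */
struct header_32 {
    uint32_t magic;
    uint32_t nr_vcpus;
    uint32_t nr_pages;
    uint32_t ctxt_offset;
    uint32_t index_offset;
    uint32_t pages_offset;
};

/* Stand-in for the header after widening the offsets (hypothetical name). */
struct header_64 {
    uint32_t magic;
    uint32_t nr_vcpus;
    uint32_t nr_pages;
    uint64_t ctxt_offset;
    uint64_t index_offset;
    uint64_t pages_offset;
};

/* Print size and field positions; exact values for header_64 depend on
 * the platform's alignment rules for 64-bit integers. */
void print_layouts(void)
{
    printf("header_32: size %zu, index_offset at byte %zu\n",
           sizeof(struct header_32), offsetof(struct header_32, index_offset));
    printf("header_64: size %zu, index_offset at byte %zu\n",
           sizeof(struct header_64), offsetof(struct header_64, index_offset));
}
```

Because the sizes and field positions differ, a header written with one layout and read with the other would yield wrong offset values, which is the compatibility concern raised above.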
But what is confusing is that the latest/final version of "xenctrl.h" used in RHEL5
(3.0.3 vintage), as well as the current version in Fedora (3.4.0-2.fc12) still use
unsigned int offsets, and I just checked with one of our xen masters, and the Xensource
git tree also still has unsigned int values in the header data structure:
typedef struct xc_core_header {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    unsigned int xch_ctxt_offset;
    unsigned int xch_index_offset;
    unsigned int xch_pages_offset;
} xc_core_header_t;
#define XC_CORE_MAGIC 0xF00FEBED
#define XC_CORE_MAGIC_HVM 0xF00FEBEE
Are your xen userspace tools an Oracle hybrid?
Yes, the core dump file was generated on an Oracle virtualization server. But I did not
check the OVM source code for changes to this header data structure. I will check it and
reply again tomorrow.
Dave
thanks
xiaowei