Hi,
> >> Another theoretical question -- since the /proc/iomem
exports the
> >> xen_phys_start variable to user space, is there any way it could
> >> be handled entirely in user space by kexec-tools?
> >
> > Theoretically yes. But it is necessary to add another ElfNote section.
> > I thought it is the smallest modication necessary to add
'xen_phys_start'
> > in the crashinfo.
It may be necessary to modify xen hypervisor and/or linux kernel too.
Let me consider for a while.
Anyway, I will bring up this discussion on the kexec list.
Thanks.
Itsuro Oda
On Tue, 22 Apr 2008 07:21:35 +0900
Itsuro ODA <oda(a)valinux.co.jp> wrote:
> Hi Dave,
>
> On Mon, 21 Apr 2008 09:09:59 -0400
> Dave Anderson <anderson(a)redhat.com> wrote:
>
> > Itsuro ODA wrote:
> > > Hi Dave,
> > >
> > > I thought it is better to be small value from the view point
> > > to save 'xen_phys_start' value in the crashinfo.
> > >
> > > From the view point of users you are right. It should be
> > > '--xen_phys_start=<address>'.
> >
> > Hi Itsuro,
> >
> > But even from the kernel viewpoint, I'm not sure why it would be better?
> > The xen_phys_start symbol is an unsigned long, and its container in the
> > crash_xen_info_t is an unsigned long. So I don't see any benefit or reason
> > to make it smaller.
>
> Yes, I was too anxious.
>
> > >
> >> Another theoretical question -- since the /proc/iomem
exports the
> >> xen_phys_start variable to user space, is there any way it could
> >> be handled entirely in user space by kexec-tools?
> >
> > Theoretically yes. But it is necessary to add another ElfNote section.
> > I thought it is the smallest modication necessary to add
'xen_phys_start'
> > in the crashinfo.
> > > Do you think which is better ?
> >
> > That's a good question. My concern would be to have an easier way for
customers
> > to deal with the problem without having to upgrade the kernel package; it's
always
> > easier to upgrade a user-package.
>
> I see.
>
> > However, whatever gets fixed goes beyond the scope of the crash utility.
> > For now, the crash utility will need the --xen-phys_start=xxxx workaround
> > because the RHEL5.2 kernel will be released without a fix. And that
> > decision will have to made by the kernel kexec/kdump and kexec-tools
> > maintainers. Can you bring up this discussion on the kexec list?
>
> Yes, sure.
>
> Thanks.
> Itsuro Oda
>
> >
> > Thanks,
> > Dave
> >
> > >
> > > Thanks.
> > > Itsuro Oda
> > >
> > > On Fri, 18 Apr 2008 10:14:04 -0400
> > > Dave Anderson <anderson(a)redhat.com> wrote:
> > >
> > >> Itsuro ODA wrote:
> > >>> Hi,
> > >>>
> > >>> The attached is some fixes which are necessary for xencrash to use
> > >>> for the newer version of xen (after 3.1.0, ex. 3.1.2, 3.2.0).
> > >>>
> > >>> * x86
> > >>> - add some symbols which are used to know the boundary
> > >>> condition at the tracing in 'bt'.
> > >>> - fix for the symbol 'idle_pg_table_l3' going away.
> > >>>
> > >>> * x86_64
> > >>> - fix for the virtual address area of xen code/data is added.
> > >>> - fix for the symbol 'idle_pg_table_l4' going away.
> > >>> - add the option '--xen_phys_start_mfn='
> > >>> newer version of xen (ex. RHEL5.2, 3.2.0) moves the
> > >>> physical(machine) address of xen code/data area after
> > >>> the system started up.
> > >>> The relocated physical(machine) address of xen code/data
> > >>> area is necessary to use xencrash.
> > >>> The value can be got by 'cat /proc/iomem' under dom0.
> > >>> -------------------------------------------------------
> > >>> # cat /proc/iomem
> > >>> ...
> > >>> 7e600000-7f5fffff : Hypervisor code and data *** this line
> > >>> ...
> > >>> -------------------------------------------------------
> > >>> ex. to use xencrash:
> > >>> # crash --xen_phys_start_mfn=0x7e600 xen-syms vmcore
> > >>> - extract 'xen_phys_start_mfn' from the
XEN_ELFNOTE_CRASH_INFO
> > >>> ElfNote.
> > >>> This is experimental code because it is necessary to modify
> > >>> the xen to add the xen_phys_start_mfn value in the ElfNote
> > >>> at crash time. (the patch is also attached in this mail.)
> > >>> A sample dump is availavle in the following URL:
> > >>>
http://people.valinux.co.jp/~oda/x86_64_dump_080417.tar.gz
> > >>>
> > >>> This patch is for crash-4.0-6.2.
> > >>>
> > >>> Thanks.
> > >>> Itsuro Oda
> > >> Hi Itsuro,
> > >>
> > >> Thanks very much for the quick response.
> > >>
> > >> I have one relatively minor issue. I understand that the previously
existing
> > >> "--p2m_mfn" option was expressed as an mfn, and that made
sense because its value
> > >> exists in the xen hypervisor code as an mfn value (actually a
xen_pfn_t).
> > >>
> > >> But it seems unnecessary to bother going through the translation of
the
> > >> hypservisors's "xen_phys_start" physical address into an
mfn? And then
> > >> only to have the crash utility have to turn it back into a physical
address
> > >> when it uses it? Why not just leave it as a physical address in both
the
> > >> kernel patch, and when passing it to the crash utility?
> > >>
> > >> From a user's standpoint, when it's necessary to pre-determine
the argument from
> > >> a hypervisor/dom0 vmcore that doesn't contain it, the
xen_phys_start can be determined
> > >> by either reading /proc/iomem on the live system, where as you've
shown in
> > >> the example above, it's 7e600000:
> > >>
> > >> > # cat /proc/iomem
> > >> > ...
> > >> > 7e600000-7f5fffff : Hypervisor code and data *** this line
> > >> > ...
> > >>
> > >> Alternatively, the same dom0/HV kdump vmcore can be run in a crash
session using
> > >> the dom0 vmlinux, where in this example vmcore, it's 0x3f000000:
> > >>
> > >> crash> px xen_hypervisor_res
> > >> xen_hypervisor_res = $3 = {
> > >> start = 0x3f000000,
> > >> end = 0x3fffffff,
> > >> name = 0xffffffff8049ab72 "Hypervisor code and data",
> > >> flags = 0x80000200,
> > >> parent = 0xffff880000001180,
> > >> sibling = 0x0,
> > >> child = 0xffff8800000000a8
> > >> }
> > >>
> > >> For convenience sake, I'd prefer to just leave it as that, without
> > >> needlessly forcing the user to translate the above into an mfn, so
> > >> that they can simply enter: "crash --xen_phys_start=0x3f000000
..."
> > >>
> > >> The same thing applies to the kernel patch -- just leave it in its
> > >> natural state as a physical (machine) address.
> > >>
> > >> Unfortunately it's already going to be an annoyance to have to
pre-determine
> > >> the xen_phys_start value -- so why bother making it even more cryptic
by
> > >> forcing the user to turn it into an mfn?
> > >>
> > >> Another theoretical question -- since the /proc/iomem exports the
> > >> xen_phys_start variable to user space, is there any way it could
> > >> be handled entirely in user space by kexec-tools?
> > >>
> > >> Dave
> > >>
> > >>
> > >>> === for x86 ===
> > >>> ---
lkcd_x86_trace.c.org 2008-04-17 10:13:05.000000000 +0900
> > >>> +++ lkcd_x86_trace.c 2008-04-17 10:13:17.000000000 +0900
> > >>> @@ -1423,6 +1423,7 @@
> > >>> if (XEN_HYPER_MODE()) {
> > >>> func_name = kl_funcname(pc);
> > >>> if (STREQ(func_name, "idle_loop") || STREQ(func_name,
"hypercall")
> > >>> + || STREQ(func_name, "tracing_off")
> > >>> || STREQ(func_name, "handle_exception")) {
> > >>> UPDATE_FRAME(func_name, pc, 0, sp, bp, asp, 0, 0, bp - sp,
0);
> > >>> return(trace->nframes);
> > >>> @@ -1682,6 +1683,7 @@
> > >>> if (func_name && XEN_HYPER_MODE()) {
> > >>> if (STREQ(func_name, "continue_nmi") ||
> > >>> STREQ(func_name, "vmx_asm_vmexit_handler") ||
> > >>> + STREQ(func_name, "handle_nmi_mce") ||
> > >>> STREQ(func_name, "deferred_nmi")) {
> > >>> /* Interrupt frame */
> > >>> sp = curframe->fp + 4;
> > >>> ---
x86.c.org 2008-04-17 10:29:21.000000000 +0900
> > >>> +++ x86.c 2008-04-17 10:31:51.000000000 +0900
> > >>> @@ -2846,7 +2846,10 @@
> > >>> *paddr = kvaddr - DIRECTMAP_VIRT_START;
> > >>> return TRUE;
> > >>> }
> > >>> - pgd = (ulonglong *)symbol_value("idle_pg_table_l3");
> > >>> + if (symbol_exists("idle_pg_table_l3"))
> > >>> + pgd = (ulonglong *)symbol_value("idle_pg_table_l3");
> > >>> + else
> > >>> + pgd = (ulonglong *)symbol_value("idle_pg_table");
> > >>> } else {
> > >>> if (!vt->vmalloc_start) {
> > >>> *paddr = VTOP(kvaddr);
> > >>> @@ -4965,7 +4968,8 @@
> > >>> break;
> > >>>
> > >>> case PRE_GDB:
> > >>> - if (symbol_exists("idle_pg_table_l3")) {
> > >>> + if (symbol_exists("create_pae_xen_mappings") ||
> > >>> + symbol_exists("idle_pg_table_l3")) {
> > >>> machdep->flags |= PAE;
> > >>> PGDIR_SHIFT = PGDIR_SHIFT_3LEVEL;
> > >>> PTRS_PER_PTE = PTRS_PER_PTE_3LEVEL;
> > >>>
> > >>> === for x86_64 ===
> > >>> ---
defs.h.org 2008-04-17 15:03:14.000000000 +0900
> > >>> +++ defs.h 2008-04-17 15:41:10.000000000 +0900
> > >>> @@ -2162,10 +2162,13 @@
> > >>>
> > >>> #define FILL_PML4_HYPER() { \
> > >>> if (!machdep->machspec->last_pml4_read) { \
> > >>> - readmem(symbol_value("idle_pg_table_4"), KVADDR, \
> > >>> - machdep->machspec->pml4, PAGESIZE(),
"idle_pg_table_4", \
> > >>> + unsigned long idle_pg_table = \
> > >>> + symbol_exists("idle_pg_table_4") ?
symbol_value("idle_pg_table_4") : \
> > >>> + symbol_value("idle_pg_table"); \
> > >>> + readmem(idle_pg_table, KVADDR, \
> > >>> + machdep->machspec->pml4, PAGESIZE(),
"idle_pg_table", \
> > >>> FAULT_ON_ERROR); \
> > >>> - machdep->machspec->last_pml4_read =
symbol_value("idle_pg_table_4"); \
> > >>> + machdep->machspec->last_pml4_read = idle_pg_table; \
> > >>> }\
> > >>> }
> > >>>
> > >>> @@ -4081,6 +4084,8 @@
> > >>> void get_kdump_regs(struct bt_info *, ulong *, ulong *);
> > >>> void xen_kdump_p2m_mfn(char *);
> > >>> int is_sadump_xen(void);
> > >>> +void set_xen_phys_start_mfn(char *);
> > >>> +ulong xen_phys_start_mfn(void);
> > >>>
> > >>> /*
> > >>> * diskdump.c
> > >>> ---
main.c.org 2008-04-17 15:26:37.000000000 +0900
> > >>> +++ main.c 2008-04-17 15:45:18.000000000 +0900
> > >>> @@ -48,6 +48,7 @@
> > >>> {"no_ikconfig", 0, 0, 0},
> > >>> {"hyper", 0, 0, 0},
> > >>> {"p2m_mfn", required_argument, 0, 0},
> > >>> + {"xen_phys_start_mfn", required_argument, 0, 0},
> > >>> {"zero_excluded", 0, 0, 0},
> > >>> {"no_panic", 0, 0, 0},
> > >>> {"more", 0, 0, 0},
> > >>> @@ -155,6 +156,9 @@
> > >>> else if (STREQ(long_options[option_index].name,
"p2m_mfn"))
> > >>> xen_kdump_p2m_mfn(optarg);
> > >>>
> > >>> + else if (STREQ(long_options[option_index].name,
"xen_phys_start_mfn"))
> > >>> + set_xen_phys_start_mfn(optarg);
> > >>> +
> > >>> else if (STREQ(long_options[option_index].name,
"zero_excluded"))
> > >>> *diskdump_flags |= ZERO_EXCLUDED;
> > >>>
> > >>> ---
netdump.c.org 2008-04-17 15:09:06.000000000 +0900
> > >>> +++ netdump.c 2008-04-17 15:48:44.000000000 +0900
> > >>> @@ -1521,6 +1521,8 @@
> > >>> */
> > >>> if (!nd->xen_kdump_data->p2m_mfn)
> > >>> nd->xen_kdump_data->p2m_mfn = *(uptr+(words-1));
> > >>> + if (words > 9 &&
!nd->xen_kdump_data->xen_phys_start_mfn)
> > >>> + nd->xen_kdump_data->xen_phys_start_mfn =
*(uptr+(words-2));
> > >>> }
> > >>> }
> > >>> break;
> > >>> @@ -1724,6 +1726,8 @@
> > >>> */
> > >>> if (!nd->xen_kdump_data->p2m_mfn)
> > >>> nd->xen_kdump_data->p2m_mfn =
*(up+(words-1));
> > >>> + if (words > 9 &&
!nd->xen_kdump_data->xen_phys_start_mfn)
> > >>> + nd->xen_kdump_data->xen_phys_start_mfn =
*(up+(words-2));
> > >>> }
> > >>> }
> > >>> break;
> > >>> @@ -2312,3 +2316,22 @@
> > >>>
> > >>> return FALSE;
> > >>> }
> > >>> +
> > >>> +void
> > >>> +set_xen_phys_start_mfn(char *arg)
> > >>> +{
> > >>> + ulong value;
> > >>> + int errflag = 0;
> > >>> +
> > >>> + value = htol(arg, RETURN_ON_ERROR|QUIET, &errflag);
> > >>> + if (!errflag)
> > >>> + xen_kdump_data.xen_phys_start_mfn = value;
> > >>> + else
> > >>> + error(WARNING, "invalid xen_phys_start_mfn argument:
%s\n", arg);
> > >>> +}
> > >>> +
> > >>> +ulong
> > >>> +xen_phys_start_mfn(void)
> > >>> +{
> > >>> + return nd->xen_kdump_data->xen_phys_start_mfn;
> > >>> +}
> > >>> ---
netdump.h.org 2008-04-17 15:38:59.000000000 +0900
> > >>> +++ netdump.h 2008-04-17 15:39:38.000000000 +0900
> > >>> @@ -109,6 +109,7 @@
> > >>> ulong accesses;
> > >>> int p2m_frames;
> > >>> ulong *p2m_mfn_frame_list;
> > >>> + ulong xen_phys_start_mfn;
> > >>> };
> > >>>
> > >>> #define KDUMP_P2M_INIT (0x1)
> > >>> ---
x86_64.c.org 2008-04-17 15:34:32.000000000 +0900
> > >>> +++ x86_64.c 2008-04-17 15:36:53.000000000 +0900
> > >>> @@ -1401,6 +1401,10 @@
> > >>> return FALSE;
> > >>>
> > >>> if (XEN_HYPER_MODE()) {
> > >>> + if (XENTEXT_VIRT_ADDR(kvaddr)) {
> > >>> + *paddr = kvaddr - __XEN_VIRT_START + (xen_phys_start_mfn()
<< PAGESHIFT());
> > >>> + return TRUE;
> > >>> + }
> > >>> if (DIRECTMAP_VIRT_ADDR(kvaddr)) {
> > >>> *paddr = kvaddr - DIRECTMAP_VIRT_START;
> > >>> return TRUE;
> > >>> ---
xen_hyper_defs.h.org 2008-04-17 15:31:23.000000000 +0900
> > >>> +++ xen_hyper_defs.h 2008-04-17 15:34:15.000000000 +0900
> > >>> @@ -64,6 +64,9 @@
> > >>> #define DIRECTMAP_VIRT_START (0xffff830000000000)
> > >>> #define DIRECTMAP_VIRT_END (0xffff840000000000)
> > >>> #define PAGE_OFFSET_XEN_HYPER DIRECTMAP_VIRT_START
> > >>> +#define __XEN_VIRT_START (0xFFFF828C80000000)
> > >>> +#define XENTEXT_VIRT_ADDR(vaddr) \
> > >>> + (((vaddr) >= __XEN_VIRT_START) && ((vaddr) <
DIRECTMAP_VIRT_START))
> > >>> #endif
> > >>>
> > >>> #ifdef IA64
> > >>>
> > >>> === Xen (explanation purpose) ===
> > >>> ---
include/xen/elfcore.h.org 2008-04-17 14:11:41.000000000 +0900
> > >>> +++ include/xen/elfcore.h 2008-04-17 14:11:57.000000000 +0900
> > >>> @@ -66,6 +66,7 @@
> > >>> unsigned long xen_compile_time;
> > >>> unsigned long tainted;
> > >>> #ifdef CONFIG_X86
> > >>> + unsigned long xen_phys_start_mfn;
> > >>> unsigned long dom0_pfn_to_mfn_frame_list_list;
> > >>> #endif
> > >>> } crash_xen_info_t;
> > >>> ---
arch/x86/crash.c.org 2008-04-17 14:12:51.000000000 +0900
> > >>> +++ arch/x86/crash.c 2008-04-17 14:13:13.000000000 +0900
> > >>> @@ -102,6 +102,7 @@
> > >>> hvm_disable();
> > >>>
> > >>> info = kexec_crash_save_info();
> > >>> + info->xen_phys_start_mfn = xen_phys_start >>
PAGE_SHIFT;
> > >>> info->dom0_pfn_to_mfn_frame_list_list =
> > >>> arch_get_pfn_to_mfn_frame_list_list(dom0);
> > >>> }
> > >
> >
>
> --
> Itsuro ODA <oda(a)valinux.co.jp>
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
>
https://www.redhat.com/mailman/listinfo/crash-utility
--
Itsuro ODA <oda(a)valinux.co.jp>