Hi Dave,
On Mon, 21 Apr 2008 09:09:59 -0400
Dave Anderson <anderson(a)redhat.com> wrote:
 Itsuro ODA wrote:
 > Hi Dave,
 > 
 > I thought it is better to be small value from the view point
 > to save 'xen_phys_start' value in the crashinfo. 
 > 
 > From the view point of users you are right. It should be
 > '--xen_phys_start=<address>'.
 
 Hi Itsuro,
 
 But even from the kernel viewpoint, I'm not sure why it would be better?
 The xen_phys_start symbol is an unsigned long, and its container in the
 crash_xen_info_t is an unsigned long.  So I don't see any benefit or reason
 to make it smaller. 
 > 
 >> Another theoretical question -- since the /proc/iomem exports the
 >> xen_phys_start variable to user space, is there any way it could
 >> be handled entirely in user space by kexec-tools?
 > 
 > Theoretically yes. But it is necessary to add another ElfNote section.
 > I thought it is the smallest modication necessary to add 'xen_phys_start'
 > in the crashinfo.
 > Do you think which is better ?
 
 That's a good question.  My concern would be to have an easier way for customers
 to deal with the problem without having to upgrade the kernel package; it's always
 easier to upgrade a user-package. 
 However, whatever gets fixed goes beyond the scope of the crash
utility.
 For now, the crash utility will need the --xen-phys_start=xxxx workaround
 because the RHEL5.2 kernel will be released without a fix.  And that
 decision will have to made by the kernel kexec/kdump and kexec-tools
 maintainers.  Can you bring up this discussion on the kexec list? 
 
 Thanks,
    Dave
 
 > 
 > Thanks.
 > Itsuro Oda
 > 
 > On Fri, 18 Apr 2008 10:14:04 -0400
 > Dave Anderson <anderson(a)redhat.com> wrote:
 > 
 >> Itsuro ODA wrote:
 >>> Hi,
 >>>
 >>> The attached is some fixes which are necessary for xencrash to use
 >>> for the newer version of xen (after 3.1.0, ex. 3.1.2, 3.2.0).
 >>>
 >>> * x86
 >>>   - add some symbols which are used to know the boundary
 >>>     condition at the tracing in 'bt'.
 >>>   - fix for the symbol 'idle_pg_table_l3' going away.
 >>>
 >>> * x86_64
 >>>   - fix for the virtual address area of xen code/data is added.
 >>>   - fix for the symbol 'idle_pg_table_l4' going away.
 >>>   - add the option '--xen_phys_start_mfn='
 >>>     newer version of xen (ex. RHEL5.2, 3.2.0) moves the
 >>>     physical(machine) address of xen code/data area after 
 >>>     the system started up.
 >>>     The relocated physical(machine) address of xen code/data
 >>>     area is necessary to use xencrash.
 >>>     The value can be got by 'cat /proc/iomem' under dom0.
 >>>     -------------------------------------------------------
 >>>     # cat /proc/iomem
 >>>     ...
 >>>       7e600000-7f5fffff : Hypervisor code and data  *** this line
 >>>     ...
 >>>     -------------------------------------------------------
 >>>     ex. to use xencrash:
 >>>     # crash --xen_phys_start_mfn=0x7e600 xen-syms vmcore
 >>>   - extract 'xen_phys_start_mfn' from the XEN_ELFNOTE_CRASH_INFO
 >>>     ElfNote.
 >>>     This is experimental code because it is necessary to modify
 >>>     the xen to add the xen_phys_start_mfn value in the ElfNote
 >>>     at crash time. (the patch is also attached in this mail.)
 >>>     A sample dump is availavle in the following URL:
 >>>     
http://people.valinux.co.jp/~oda/x86_64_dump_080417.tar.gz
 >>>
 >>> This patch is for crash-4.0-6.2.
 >>>
 >>> Thanks.
 >>> Itsuro Oda
 >> Hi Itsuro,
 >>
 >> Thanks very much for the quick response.
 >>
 >> I have one relatively minor issue.  I understand that the previously existing
 >> "--p2m_mfn" option was expressed as an mfn, and that made sense
because its value
 >> exists in the xen hypervisor code as an mfn value (actually a xen_pfn_t).
 >>
 >> But it seems unnecessary to bother going through the translation of the
 >> hypservisors's "xen_phys_start" physical address into an mfn?  And
then
 >> only to have the crash utility have to turn it back into a physical address
 >> when it uses it?  Why not just leave it as a physical address in both the
 >> kernel patch, and when passing it to the crash utility?
 >>
 >>  From a user's standpoint, when it's necessary to pre-determine the
argument from
 >> a hypervisor/dom0 vmcore that doesn't contain it, the xen_phys_start can be
determined
 >> by either reading /proc/iomem on the live system, where as you've shown in
 >> the example above, it's 7e600000:
 >>
 >>  >  # cat /proc/iomem
 >>  >  ...
 >>  >    7e600000-7f5fffff : Hypervisor code and data  *** this line
 >>  >  ...
 >>
 >> Alternatively, the same dom0/HV kdump vmcore can be run in a crash session
using
 >> the dom0 vmlinux, where in this example vmcore, it's 0x3f000000:
 >>
 >> crash> px xen_hypervisor_res
 >>    xen_hypervisor_res = $3 = {
 >>      start = 0x3f000000,
 >>      end = 0x3fffffff,
 >>      name = 0xffffffff8049ab72 "Hypervisor code and data",
 >>      flags = 0x80000200,
 >>      parent = 0xffff880000001180,
 >>      sibling = 0x0,
 >>      child = 0xffff8800000000a8
 >>    }
 >>
 >> For convenience sake, I'd prefer to just leave it as that, without
 >> needlessly forcing the user to translate the above into an mfn, so
 >> that they can simply enter: "crash --xen_phys_start=0x3f000000 ..."
 >>
 >> The same thing applies to the kernel patch -- just leave it in its
 >> natural state as a physical (machine) address.
 >>
 >> Unfortunately it's already going to be an annoyance to have to
pre-determine
 >> the xen_phys_start value -- so why bother making it even more cryptic by
 >> forcing the user to turn it into an mfn?
 >>
 >> Another theoretical question -- since the /proc/iomem exports the
 >> xen_phys_start variable to user space, is there any way it could
 >> be handled entirely in user space by kexec-tools?
 >>
 >> Dave
 >>
 >>
 >>> === for x86 ===
 >>> --- 
lkcd_x86_trace.c.org	2008-04-17 10:13:05.000000000 +0900
 >>> +++ lkcd_x86_trace.c	2008-04-17 10:13:17.000000000 +0900
 >>> @@ -1423,6 +1423,7 @@
 >>>  		if (XEN_HYPER_MODE()) {
 >>>  			func_name = kl_funcname(pc);
 >>>  			if (STREQ(func_name, "idle_loop") || STREQ(func_name,
"hypercall")
 >>> +				|| STREQ(func_name, "tracing_off")
 >>>  				|| STREQ(func_name, "handle_exception")) {
 >>>  				UPDATE_FRAME(func_name, pc, 0, sp, bp, asp, 0, 0, bp - sp, 0);
 >>>  				return(trace->nframes);
 >>> @@ -1682,6 +1683,7 @@
 >>>  		if (func_name && XEN_HYPER_MODE()) {
 >>>  			if (STREQ(func_name, "continue_nmi") ||
 >>>  			    STREQ(func_name, "vmx_asm_vmexit_handler") ||
 >>> +			    STREQ(func_name, "handle_nmi_mce") ||
 >>>  			    STREQ(func_name, "deferred_nmi")) {
 >>>  				/* Interrupt frame */
 >>>  				sp = curframe->fp + 4;
 >>> --- 
x86.c.org	2008-04-17 10:29:21.000000000 +0900
 >>> +++ x86.c	2008-04-17 10:31:51.000000000 +0900
 >>> @@ -2846,7 +2846,10 @@
 >>>  			*paddr = kvaddr - DIRECTMAP_VIRT_START;
 >>>  			return TRUE;
 >>>  		}
 >>> -		pgd = (ulonglong *)symbol_value("idle_pg_table_l3");
 >>> +		if (symbol_exists("idle_pg_table_l3"))
 >>> +			pgd = (ulonglong *)symbol_value("idle_pg_table_l3");
 >>> +		else
 >>> +			pgd = (ulonglong *)symbol_value("idle_pg_table");
 >>>  	} else {
 >>>  		if (!vt->vmalloc_start) {
 >>>  			*paddr = VTOP(kvaddr);
 >>> @@ -4965,7 +4968,8 @@
 >>>  		break;
 >>>  
 >>>  	case PRE_GDB:
 >>> -		if (symbol_exists("idle_pg_table_l3")) {
 >>> +		if (symbol_exists("create_pae_xen_mappings") ||
 >>> +		    symbol_exists("idle_pg_table_l3")) {
 >>>                  	machdep->flags |= PAE;
 >>>  			PGDIR_SHIFT = PGDIR_SHIFT_3LEVEL;
 >>>  			PTRS_PER_PTE = PTRS_PER_PTE_3LEVEL;
 >>>
 >>> === for x86_64 ===
 >>> --- 
defs.h.org	2008-04-17 15:03:14.000000000 +0900
 >>> +++ defs.h	2008-04-17 15:41:10.000000000 +0900
 >>> @@ -2162,10 +2162,13 @@
 >>>  
 >>>  #define FILL_PML4_HYPER() { \
 >>>  	if (!machdep->machspec->last_pml4_read) { \
 >>> -		readmem(symbol_value("idle_pg_table_4"), KVADDR, \
 >>> -			machdep->machspec->pml4, PAGESIZE(), "idle_pg_table_4",
\
 >>> +		unsigned long idle_pg_table = \
 >>> +		    symbol_exists("idle_pg_table_4") ?
symbol_value("idle_pg_table_4") : \
 >>> +			symbol_value("idle_pg_table"); \
 >>> +		readmem(idle_pg_table, KVADDR, \
 >>> +			machdep->machspec->pml4, PAGESIZE(), "idle_pg_table", \
 >>>  			FAULT_ON_ERROR); \
 >>> -		machdep->machspec->last_pml4_read =
symbol_value("idle_pg_table_4"); \
 >>> +		machdep->machspec->last_pml4_read = idle_pg_table; \
 >>>  	}\
 >>>  }
 >>>  
 >>> @@ -4081,6 +4084,8 @@
 >>>  void get_kdump_regs(struct bt_info *, ulong *, ulong *);
 >>>  void xen_kdump_p2m_mfn(char *);
 >>>  int is_sadump_xen(void);
 >>> +void set_xen_phys_start_mfn(char *);
 >>> +ulong xen_phys_start_mfn(void);
 >>>  
 >>>  /*
 >>>   *  diskdump.c
 >>> --- 
main.c.org	2008-04-17 15:26:37.000000000 +0900
 >>> +++ main.c	2008-04-17 15:45:18.000000000 +0900
 >>> @@ -48,6 +48,7 @@
 >>>          {"no_ikconfig", 0, 0, 0},
 >>>          {"hyper", 0, 0, 0},
 >>>  	{"p2m_mfn", required_argument, 0, 0},
 >>> +	{"xen_phys_start_mfn", required_argument, 0, 0},
 >>>  	{"zero_excluded", 0, 0, 0},
 >>>  	{"no_panic", 0, 0, 0},
 >>>          {"more", 0, 0, 0},
 >>> @@ -155,6 +156,9 @@
 >>>  		        else if (STREQ(long_options[option_index].name,
"p2m_mfn")) 
 >>>  				xen_kdump_p2m_mfn(optarg);
 >>>  
 >>> +		        else if (STREQ(long_options[option_index].name,
"xen_phys_start_mfn")) 
 >>> +				set_xen_phys_start_mfn(optarg);
 >>> +
 >>>  		        else if (STREQ(long_options[option_index].name,
"zero_excluded")) 
 >>>  				*diskdump_flags |= ZERO_EXCLUDED;
 >>>  
 >>> --- 
netdump.c.org	2008-04-17 15:09:06.000000000 +0900
 >>> +++ netdump.c	2008-04-17 15:48:44.000000000 +0900
 >>> @@ -1521,6 +1521,8 @@
 >>>  				 */
 >>>  				if (!nd->xen_kdump_data->p2m_mfn)
 >>>  					nd->xen_kdump_data->p2m_mfn = *(uptr+(words-1));
 >>> +				if (words > 9 &&
!nd->xen_kdump_data->xen_phys_start_mfn)
 >>> +					nd->xen_kdump_data->xen_phys_start_mfn = *(uptr+(words-2));
 >>>  			}
 >>>  		}
 >>>  		break;
 >>> @@ -1724,6 +1726,8 @@
 >>>  				 */
 >>>  	                        if (!nd->xen_kdump_data->p2m_mfn)
 >>>  	                        	nd->xen_kdump_data->p2m_mfn =
*(up+(words-1));
 >>> +				if (words > 9 &&
!nd->xen_kdump_data->xen_phys_start_mfn)
 >>> +					nd->xen_kdump_data->xen_phys_start_mfn = *(up+(words-2));
 >>>  			}
 >>>  		}
 >>>                  break;
 >>> @@ -2312,3 +2316,22 @@
 >>>  
 >>>  	return FALSE;
 >>>  }
 >>> +
 >>> +void
 >>> +set_xen_phys_start_mfn(char *arg)
 >>> +{
 >>> +	ulong value;
 >>> +	int errflag = 0;
 >>> +
 >>> +	value = htol(arg, RETURN_ON_ERROR|QUIET, &errflag);
 >>> +	if (!errflag)
 >>> +		xen_kdump_data.xen_phys_start_mfn = value;
 >>> +	else 
 >>> +		error(WARNING, "invalid xen_phys_start_mfn argument: %s\n",
arg);
 >>> +}
 >>> +
 >>> +ulong
 >>> +xen_phys_start_mfn(void)
 >>> +{
 >>> +	return nd->xen_kdump_data->xen_phys_start_mfn;
 >>> +}
 >>> --- 
netdump.h.org	2008-04-17 15:38:59.000000000 +0900
 >>> +++ netdump.h	2008-04-17 15:39:38.000000000 +0900
 >>> @@ -109,6 +109,7 @@
 >>>  	ulong accesses;
 >>>  	int p2m_frames;
 >>>          ulong *p2m_mfn_frame_list;
 >>> +	ulong xen_phys_start_mfn;
 >>>  };
 >>>  
 >>>  #define KDUMP_P2M_INIT  (0x1)
 >>> --- 
x86_64.c.org	2008-04-17 15:34:32.000000000 +0900
 >>> +++ x86_64.c	2008-04-17 15:36:53.000000000 +0900
 >>> @@ -1401,6 +1401,10 @@
 >>>                  return FALSE;
 >>>  
 >>>  	if (XEN_HYPER_MODE()) {
 >>> +		if (XENTEXT_VIRT_ADDR(kvaddr)) {
 >>> +			*paddr = kvaddr - __XEN_VIRT_START + (xen_phys_start_mfn() <<
PAGESHIFT());
 >>> +			return TRUE;
 >>> +		}
 >>>  		if (DIRECTMAP_VIRT_ADDR(kvaddr)) {
 >>>  			*paddr = kvaddr - DIRECTMAP_VIRT_START;
 >>>  			return TRUE;
 >>> --- 
xen_hyper_defs.h.org	2008-04-17 15:31:23.000000000 +0900
 >>> +++ xen_hyper_defs.h	2008-04-17 15:34:15.000000000 +0900
 >>> @@ -64,6 +64,9 @@
 >>>  #define DIRECTMAP_VIRT_START  (0xffff830000000000)
 >>>  #define DIRECTMAP_VIRT_END    (0xffff840000000000)
 >>>  #define PAGE_OFFSET_XEN_HYPER DIRECTMAP_VIRT_START
 >>> +#define __XEN_VIRT_START        (0xFFFF828C80000000)
 >>> +#define XENTEXT_VIRT_ADDR(vaddr) \
 >>> +    (((vaddr) >= __XEN_VIRT_START) && ((vaddr) <
DIRECTMAP_VIRT_START))
 >>>  #endif
 >>>  
 >>>  #ifdef IA64
 >>>
 >>> === Xen (explanation purpose) ===
 >>> --- 
include/xen/elfcore.h.org	2008-04-17 14:11:41.000000000 +0900
 >>> +++ include/xen/elfcore.h	2008-04-17 14:11:57.000000000 +0900
 >>> @@ -66,6 +66,7 @@
 >>>      unsigned long xen_compile_time;
 >>>      unsigned long tainted;
 >>>  #ifdef CONFIG_X86
 >>> +    unsigned long xen_phys_start_mfn;
 >>>      unsigned long dom0_pfn_to_mfn_frame_list_list;
 >>>  #endif
 >>>  } crash_xen_info_t;
 >>> --- 
arch/x86/crash.c.org	2008-04-17 14:12:51.000000000 +0900
 >>> +++ arch/x86/crash.c	2008-04-17 14:13:13.000000000 +0900
 >>> @@ -102,6 +102,7 @@
 >>>      hvm_disable();
 >>>  
 >>>      info = kexec_crash_save_info();
 >>> +    info->xen_phys_start_mfn = xen_phys_start >> PAGE_SHIFT;
 >>>      info->dom0_pfn_to_mfn_frame_list_list =
 >>>          arch_get_pfn_to_mfn_frame_list_list(dom0);
 >>>  }
 >