Magnus Damm wrote:
Hi Dave,

Thanks for your comments!

On Wed, 2006-10-18 at 10:49 -0400, Dave Anderson wrote:
> Kazuo Moriwaka wrote:
>
> > Hello,
> >
> > From: Dave Anderson <anderson@redhat.com>
> > Subject: Re: [Crash-utility] kdump format may be updated
> > Date: Tue, 17 Oct 2006 09:01:32 -0400
> >
> > > As we discussed before, it would have been preferable in my
> > > opinion to have the starting-point mfn value for all the domains,
> > > thereby making the dumpfile usable for all domains instead of
> > > just dom0.  But I will be happy with at least getting this change
> > > in place so that crash can be used directly on the xen dumpfile
> > > for dom0 analysis without having to run it through some other
> > > utility.
> >
> > Yes, I remember the discussion, and I think it is possible to make such headers.
> > Magnus is now cleaning up the patch.  He and I discussed it, but I was not
> > able to convince him that the dom0 information is needed for crash.
>
> Just to clarify this discussion.  Magnus's patch *does* include the
> dom0 cr3 information for x86, and I am quite happy with that.  With that
> single, simple, dom0 cr3 value, the crash utility can use the common
> xen/dom0 vmcore file unmodified.

Yes, the patch includes that information today. But say that this value
wouldn't be provided as a crash note - isn't it possible for software to
locate the value anyhow?

Believe me -- if you could tell me how, I would do it, and
we could avoid this discussion...
 
Or maybe the CR3 value isn't saved at all? Shouldn't it instead be saved
along with all the other registers?

And why do we choose to save CR3 and not CR4? I know what CR3 is used
for, but I kind of dislike the ad-hoc approach of how these things are
added. Also, maybe more registers are needed under Xen compared to the
Linux kernel.

I'm especially thinking about segment registers and descriptor tables.

A bit of history -- and I don't mean to "dumb-down" the discussion...

If you just give me a dump of memory, I can make the crash utility work
with it, as long as I can take a physical address and turn it into a
file offset into the dumpfile.
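As a rough illustration of that last step, here is a minimal C sketch that turns a physical address into a file offset by scanning the PT_LOAD program headers of an ELF vmcore. It assumes ELF64 program headers as declared in glibc's <elf.h> (a vmcore with ELF32 headers would use Elf32_Phdr the same way); phys_to_file_offset is a hypothetical name, not crash's actual code.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Map a physical address (on Xen: a machine address) to an offset in the
 * vmcore file by scanning its PT_LOAD program headers. Returns -1 if no
 * segment covers the address, e.g. a memory hole that was not dumped.
 */
long long phys_to_file_offset(const Elf64_Phdr *phdrs, size_t nphdrs,
                              uint64_t paddr)
{
    for (size_t i = 0; i < nphdrs; i++) {
        const Elf64_Phdr *ph = &phdrs[i];

        if (ph->p_type != PT_LOAD)
            continue;
        /* Only the file-backed part of the segment (p_filesz) is readable. */
        if (paddr >= ph->p_paddr && paddr - ph->p_paddr < ph->p_filesz)
            return (long long)(ph->p_offset + (paddr - ph->p_paddr));
    }
    return -1;
}
```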

The first physical memory reads that crash performs take the
unity-mapped virtual addresses of key kernel data, strip off the
unity-mapping offset, and pass the resulting physical address to the
read function of the particular memory source, be it live memory,
netdumps, diskdumps, LKCD dumps, mcore dumps, kdumps, xen dumps, or xen
"xm save" files.
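A minimal sketch of that "strip off" step for x86_32, assuming the default 3G/1G split (a PAGE_OFFSET of 0xC0000000) and that the caller already knows the kernel's high_memory value. unity_vtop and SKETCH_PAGE_OFFSET are hypothetical names. On a dom0 kernel the result is a pseudo-physical address, which is exactly the problem with the xen vmcore.

```c
#include <stdint.h>

/* Assumption: default x86_32 3G/1G split. */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/*
 * Strip the unity-mapping offset from a kernel virtual address.
 * high_memory is the first virtual address past the unity-mapped region.
 * Returns 0 on success, -1 if vaddr is not unity-mapped (user, vmalloc
 * and other kernel mappings need a page table walk instead).
 */
int unity_vtop(uint32_t vaddr, uint32_t high_memory, uint32_t *paddr)
{
    if (vaddr < SKETCH_PAGE_OFFSET || vaddr >= high_memory)
        return -1;
    *paddr = vaddr - SKETCH_PAGE_OFFSET;
    return 0;
}
```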

The issue at hand with the xen kdump vmcore is that its ELF header
describes physical memory in machine memory terms.  When that vmcore
is used for dom0 instead of the xen binary, each physical address
derived from a dom0 kernel virtual address is a pseudo-physical
address, which has no correlation with the physical (machine) memory
descriptors in the ELF header.

So, given that the dom0 pseudo-physical address needs to be
translated into a machine address, I need to be able to find my way
to the phys_to_machine_mapping array.  From that point on, it
becomes a matter of searching the array for the desired pseudo-physical
address, getting the associated machine address, and then using
the PT_LOAD segments of the ELF header to find the memory.
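A minimal sketch of that translation, assuming the phys_to_machine_mapping array has already been copied out of the dump as 32-bit entries (x86_32) indexed by pfn, and that an entry of ~0 marks an unmapped pfn, as with INVALID_P2M_ENTRY in the Xen Linux port. pseudo_to_machine is a hypothetical helper; its output would then go through the PT_LOAD lookup.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_MASK   ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)
#define SKETCH_INVALID_MFN 0xFFFFFFFFu   /* assumed INVALID_P2M_ENTRY */

/*
 * Translate a dom0 pseudo-physical address into a machine address using
 * a copy of dom0's p2m array (p2m[pfn] == mfn). Returns 0 on success,
 * -1 if the pfn is out of range or has no machine frame.
 */
int pseudo_to_machine(const uint32_t *p2m, size_t nr_pfns,
                      uint64_t pseudo_paddr, uint64_t *maddr)
{
    uint64_t pfn = pseudo_paddr >> SKETCH_PAGE_SHIFT;

    if (pfn >= nr_pfns || p2m[pfn] == SKETCH_INVALID_MFN)
        return -1;
    /* Same offset within the page, different frame number. */
    *maddr = ((uint64_t)p2m[pfn] << SKETCH_PAGE_SHIFT)
             | (pseudo_paddr & SKETCH_PAGE_MASK);
    return 0;
}
```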

To find the phys_to_machine_mapping array, there are two keys
to Pandora's box:

(1) the dom0 cr3 value -- which, in a writable page table kernel,
    will contain an mfn value.  With that starting point, a page
    table walk can be initiated for the "phys_to_machine_mapping"
    virtual address.

(2) alternatively, given the dom0 pfn_to_mfn_frame_list_list mfn,
    I also have a starting point in order to reconstruct the
    phys_to_machine_mapping array.
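For key #2, the reconstruction can be sketched roughly as follows. It assumes the x86_32 layout in which the frame_list_list page holds the mfns of frame-list pages, each frame-list page holds the mfns of p2m pages, and each p2m page holds 1024 four-byte mfns. The caller is assumed to know the number of pfns (for instance from dom0's max_pfn). rebuild_p2m, read_machine_fn and the flat_read backend (which treats a buffer as machine memory starting at address 0) are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MFN_SHIFT    12
#define P2M_PER_PAGE (4096 / sizeof(uint32_t))   /* 1024 entries on x86_32 */

/* Hypothetical reader: copy len bytes at machine address maddr into buf. */
typedef int (*read_machine_fn)(uint64_t maddr, void *buf, size_t len, void *ctx);

/* Hypothetical backend: a buffer whose offset 0 is machine address 0. */
struct flat_mem { const uint8_t *base; size_t size; };

int flat_read(uint64_t maddr, void *buf, size_t len, void *ctx)
{
    const struct flat_mem *m = ctx;

    if (maddr > m->size || len > m->size - maddr)
        return -1;
    memcpy(buf, m->base + maddr, len);
    return 0;
}

/* Rebuild the first nr_pfns entries of dom0's p2m array into out[]. */
int rebuild_p2m(uint32_t fll_mfn, size_t nr_pfns, uint32_t *out,
                read_machine_fn read_machine, void *ctx)
{
    uint32_t fll[P2M_PER_PAGE];   /* mfns of frame-list pages */
    uint32_t fl[P2M_PER_PAGE];    /* mfns of p2m pages */
    size_t done = 0;

    if (read_machine((uint64_t)fll_mfn << MFN_SHIFT, fll, sizeof fll, ctx) < 0)
        return -1;
    for (size_t i = 0; i < P2M_PER_PAGE && done < nr_pfns; i++) {
        if (read_machine((uint64_t)fll[i] << MFN_SHIFT, fl, sizeof fl, ctx) < 0)
            return -1;
        for (size_t j = 0; j < P2M_PER_PAGE && done < nr_pfns; j++) {
            size_t n = nr_pfns - done;

            if (n > P2M_PER_PAGE)
                n = P2M_PER_PAGE;
            /* Copy (part of) one p2m page straight into the output. */
            if (read_machine((uint64_t)fl[j] << MFN_SHIFT, out + done,
                             n * sizeof(uint32_t), ctx) < 0)
                return -1;
            done += n;
        }
    }
    return done == nr_pfns ? 0 : -1;
}
```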

Either one works.  I preferred #2 because it would presumably work
for both writable and shadow page table kernels.  But, I've never
done any work with shadow page table kernels (Red Hat is going with
writable...), so I don't know what the ramifications are for those
kernels.
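For key #1, the page table walk itself might look like this minimal sketch. It assumes a non-PAE, two-level x86_32 layout (a PAE kernel adds a third level with 64-bit entries), that the saved cr3 is a machine address (a bare mfn would first be shifted left by 12), that entries in a writable page table kernel hold machine frames, and that machine memory is available as a flat little-endian buffer. walk_2level and its helpers are hypothetical names, and large (PSE) pages are simply rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT 0x001u
#define PDE_PSE     0x080u
#define FRAME_MASK  0xFFFFF000u

/* Read one 32-bit entry at machine address maddr (little-endian host assumed). */
static int read_u32(const uint8_t *mem, size_t size, uint32_t maddr, uint32_t *val)
{
    if ((size_t)maddr + 4 > size)
        return -1;
    memcpy(val, mem + maddr, 4);
    return 0;
}

/* Translate vaddr to a machine address by walking from cr3. */
int walk_2level(const uint8_t *mem, size_t size, uint32_t cr3,
                uint32_t vaddr, uint32_t *maddr)
{
    uint32_t pde, pte;

    /* Bits 31..22 index the page directory. */
    if (read_u32(mem, size, (cr3 & FRAME_MASK) + ((vaddr >> 22) << 2), &pde) < 0
        || !(pde & PTE_PRESENT) || (pde & PDE_PSE))
        return -1;
    /* Bits 21..12 index the page table. */
    if (read_u32(mem, size, (pde & FRAME_MASK) + (((vaddr >> 12) & 0x3FFu) << 2), &pte) < 0
        || !(pte & PTE_PRESENT))
        return -1;
    *maddr = (pte & FRAME_MASK) | (vaddr & 0xFFFu);
    return 0;
}
```

Walking to the "phys_to_machine_mapping" symbol's virtual address this way yields the machine address of the array itself.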

In fact, my original suggestion was for an ELF note with an array
of cr3's or pfn_to_mfn_frame_list_list mfn's, i.e., for dom0 and
all the other domUs running on the system.  That would be an awesome
capability -- a single vmcore that could be used against the xen
binary using gdb, or with the crash utility for analyzing dom0
or any of the other domU sessions.

But, there apparently was an issue with the idea of having
an ELF note with an undetermined size.
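To make the size issue concrete, here is a minimal sketch of building such a note with the standard ELF note layout (header, then name and descriptor, each padded to 4 bytes). NT_XEN_DOMAIN_CR3S, struct dom_cr3 and both functions are hypothetical. The point it shows is that the note's size depends on the number of domains at crash time, which is presumably what makes a pre-allocated notes buffer awkward.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical note type and payload; not part of any Xen release. */
#define NT_XEN_DOMAIN_CR3S 0x1000001
struct dom_cr3 { uint32_t domid; uint32_t cr3; };

static size_t pad4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Total bytes needed: grows with the number of domains. */
size_t note_size(const char *name, size_t ndoms)
{
    return sizeof(Elf32_Nhdr) + pad4(strlen(name) + 1)
         + pad4(ndoms * sizeof(struct dom_cr3));
}

/* Write the note into buf (at least note_size() bytes); return the end. */
unsigned char *write_dom_cr3_note(unsigned char *buf, const char *name,
                                  const struct dom_cr3 *doms, size_t ndoms)
{
    Elf32_Nhdr nh;

    nh.n_namesz = strlen(name) + 1;
    nh.n_descsz = ndoms * sizeof(struct dom_cr3);
    nh.n_type = NT_XEN_DOMAIN_CR3S;

    memset(buf, 0, note_size(name, ndoms));   /* zero the padding bytes */
    memcpy(buf, &nh, sizeof nh);
    buf += sizeof nh;
    memcpy(buf, name, nh.n_namesz);
    buf += pad4(nh.n_namesz);
    memcpy(buf, doms, nh.n_descsz);
    return buf + pad4(nh.n_descsz);
}
```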

Anyway, like I mentioned to Kazuo, I'd be happy with just the
dom0 "key"...

Also, the ELF NT_PRSTATUS note does not store the cr3 value,
since its register set is a processor-specific user_regs_struct,
which has no cr3 field.
So if you were to append it there, I could use that instead.
Your initial patch seems to have put it both there and
in the new NT_XEN_DOM0_CR3 note:

+/* The cr3 for dom0 on each of its vcpus
+ * It is added as ELF_Prstatus prstatus.pr_reg[ELF_NGREG-1], where
+ * prstatus is the data of the elf note, and ELF_NGREG was extended
+ * by one to allow extra space.
+ * This code runs after all cpus except the crashing one have
+ * been shut down so as to avoid having to hold domlist_lock,
+ * as locking after a crash is playing with fire */

+               buf = append_elf_note(buf, "Xen Domanin-0 CR3",
+                       NT_XEN_DOM0_CR3, &cr3, 4);

But, again, only for x86.
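On the reading side, a tool could locate such a note by walking the contents of the vmcore's PT_NOTE segment. This minimal sketch returns the descriptor of the first note with a given type. The numeric value of NT_XEN_DOM0_CR3 is not given here, so it is passed in as a parameter, and find_note is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Walk a buffer of ELF notes and return a pointer to the descriptor of
 * the first note whose type matches, storing its size in *descsz.
 * Returns NULL if no such note exists or the buffer is truncated.
 */
const void *find_note(const unsigned char *buf, size_t len, uint32_t type,
                      uint32_t *descsz)
{
    size_t off = 0;

    while (off + sizeof(Elf32_Nhdr) <= len) {
        Elf32_Nhdr nh;

        memcpy(&nh, buf + off, sizeof nh);
        /* Name and descriptor are each padded to a 4-byte boundary. */
        size_t desc_off = off + sizeof nh + ((nh.n_namesz + 3) & ~3u);
        size_t next = desc_off + ((nh.n_descsz + 3) & ~3u);

        if (next > len)
            break;
        if (nh.n_type == type) {
            *descsz = nh.n_descsz;
            return buf + desc_off;
        }
        off = next;
    }
    return NULL;
}
```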

> What I don't understand is whether the same thing is going to be done
> for x86_64?
>
> NT_XEN_DOM0_CR3 is #define'd in xen/include/xen/elfcore.h in
> this patch:
>
>   [Xen-devel] [PATCH 02/04] Kexec / Kdump: Code shared between x86_32 and x86_64
>
> NT_XEN_DOM0_CR3 is used by the find_dom0_cr3() function in
> xen/arch/x86/crash.c, in this patch:
>
>   [Xen-devel] [PATCH 03/04] Kexec / Kdump: x86_32 specific code
>
> But there is no analogous x86_64 usage in this patch:
>
>   [Xen-devel] [PATCH 04/04] Kexec / Kdump: x86_64 specific code
>
> Is NT_XEN_DOM0_CR3 not being used by x86_64 by mistake,
> or on purpose?  Or perhaps you're saying that it's going to be
> pulled out completely?

Yes and maybe. It was a mistake to put it into that patch, but I think
the patch ends up in the common code anyway. The next version will be
improved.

Ripping it out? Well, I'm tempted. But nothing is decided. Feel free to
argue. =)

The main reason behind it is that we need to make kexec-tools xen-aware.
And this awareness may mean that we don't need any crash notes at all
in the Xen case, mostly because kexec-tools only knows about dom0, while
the crash dump covers the entire system.

Today we have some code that stores the elf notes in the hypervisor in
the same format as the kernel, and to pass these notes we need to hook
in hypercalls in the kernel so that the user-space tool can find the
address of the crash notes.

I think it would make sense to let dom0 save its registers within dom0
using the good old crash note format, but to use a simpler register
format for the hypervisor, and then let the tools locate the registers
by looking up global symbols.
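A minimal sketch of the symbol lookup step, assuming the tool has a System.map-style listing of the hypervisor's symbols (one "address type name" entry per line, as printed by nm). lookup_symbol is a hypothetical name; the tool would still have to translate the returned virtual address and read whatever register layout Xen settles on.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Find sym in System.map-style text and store its address in *addr.
 * Returns 0 on success, -1 if the symbol is not listed.
 */
int lookup_symbol(const char *map, const char *sym, uint64_t *addr)
{
    const char *line = map;

    while (line && *line) {
        uint64_t a;
        char type;
        char name[256];

        if (sscanf(line, "%" SCNx64 " %c %255s", &a, &type, name) == 3
            && strcmp(name, sym) == 0) {
            *addr = a;
            return 0;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}
```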

> > I know you don't want to handle the xen binary with crash, but I'm
> > not clear on why.  Please discuss with him directly to settle the xen
> > kdump file format.  The patch will be merged into xen-3.0.4, and
> > I hope we can find a solution before the merge.
> >
>
> The crash utility is wholly based upon the internal structure
> of the Linux kernel.

So why can't you just require that Xen dumps need to be cut out with
dom0 cut?

Well, to answer your question with a question:

 Why should it be required if it could be so easily avoided?

As Henry David Thoreau said, "Simplicity, simplicity, simplicity..."

My two cents,
  Dave