Re: [Crash-utility] kdump format may be updated

Friday, 27 October 2006

On Wed, 2006-10-25 at 10:02 -0400, Dave Anderson wrote:
...
 Magnus Damm wrote: 
 > On Tue, 2006-10-24 at 09:45 -0400, Dave Anderson wrote: 
 > > Magnus Damm wrote: 
 > > > The idea is that the crash_notes contents in the Xen hypervisor space
 > > > contain registers indexed by physical cpu number.
 > > >
 > > > It is possible to locate the crashing physical cpu by looking up a
 > > > global variable in the hypervisor symbols, and from there it should be
 > > > possible to backtrack and find the domain pseudo-phys to virt mapping
 > > > table. I say "should" because it is probably pretty hairy.
 > > > 
 > > 
 > > Actually, given that the crash utility is only interested in the 
 > > specifics of the dom0 kernel, it has no interest in physical 
 > > cpus.  If you're specifically interested in debugging a crash 
 > > that occurred while operating in the xen binary, you're going 
 > > to want to use gdb on the vmcore file with xen-syms-xxx 
 > > namelist file.  You can still run crash on the same vmcore to 
 > > find out what was going on in the dom0 kernel, but there's 
 > > no awareness of the xen hypervisor underpinnings; you'll 
 > > just get the state of the dom0 kernel at the time of the crash. 
 > 
 > Exactly. But for gdb to work we need to provide crash notes to the xen
 > vmcore file - and with physical cpu crash notes in this case. I just
 > hacked up some code to do this and it seems like the default number of
 > cpus in xen-unstable is set to 32. That's 32 crash notes.
 >  
 > 
 That's fine -- the crash utility will ignore the registers in the 
 NT_PRSTATUS notes if it can determine that the vmcore is 
 a hypervisor/dom0/kdump-type dumpfile.  In my prototype, it 
 does just that when it sees the unique NT_XEN_DOM0_CR3 note. 
 (crash doesn't really need the NT_PRSTATUS register contents,
 because it can find them elsewhere if need be.)  
Excellent!

...
 > > But I would guess-timate that the majority of the crashes are
 > > going to have occurred in the dom0 kernel, and not while 
 > > running in the hypervisor. 
 > 
 > Yeah, given the number of lines of code... 
 > 
 > > Now, given that the crash_notes context contains registers
 > > that are indexed by the physical cpu number, well, that's not 
 > > helpful to crash's needs with respect to dom0.  That's why you 
 > > guys must have created the additional NT_XEN_DOM0_CR3 
 > > ELF note. 
 > 
 > I'm not sure exactly why we created it - I thought it was because you
 > wanted it - but I'm pretty sure Moriwaka-san's tool can locate things
 > without it.
 >  
 > 
 That's exactly right -- I requested dom0's cr3 or
 pfn_to_mfn_frame_list_list
 value, and Moriwaka-san provided me with a prototype vmcore that 
 contained the NT_XEN_DOM0_CR3 ELF note. 

 I presume that the i386 vmcore he sent me was also usable with 
 gdb and xen-syms.  Here's what the header looks like: 

 $ readelf -a vmcore 
 ELF Header: 
   Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
   Class:                             ELF32 
   Data:                              2's complement, little endian 
   Version:                           1 (current) 
   OS/ABI:                            UNIX - System V 
   ABI Version:                       0 
   Type:                              CORE (Core file) 
   Machine:                           Intel 80386 
   Version:                           0x1 
   Entry point address:               0x0 
   Start of program headers:          52 (bytes into file) 
   Start of section headers:          0 (bytes into file) 
   Flags:                             0x0 
   Size of this header:               52 (bytes) 
   Size of program headers:           32 (bytes) 
   Number of program headers:         5 
   Size of section headers:           0 (bytes) 
   Number of section headers:         0 
   Section header string table index: 0 

 There are no sections in this file. 

 Program Headers: 
   Type           Offset     VirtAddr   PhysAddr   FileSiz    MemSiz     Flg Align
   NOTE           0x0000d4   0x00000000 0x00000000 0x00190    0x00190        0
   LOAD           0x000264   0xc0000000 0x00000000 0xa0000    0xa0000    RWE 0
   LOAD           0x0a0264   0xc0100000 0x00100000 0x3f00000  0x3f00000  RWE 0
   LOAD           0x3fa0264  0xc8000000 0x08000000 0x30000000 0x30000000 RWE 0
   LOAD           0x33fa0264 0xffffffff 0x38000000 0x7ffb000  0x7ffb000  RWE 0

 There is no dynamic segment in this file. 

 There are no relocations in this file. 

 There are no unwind sections in this file. 

 No version information found in this file. 

 Notes at offset 0x000000d4 with length 0x00000190: 
   Owner                 Data size       Description
   CORE                  0x00000090      NT_PRSTATUS (prstatus structure)
   Xen Domanin-0 CR3     0x00000004      Unknown note type: (0x10000001)
   CORE                  0x00000090      NT_PRSTATUS (prstatus structure)
   Xen Domanin-0 CR3     0x00000004      Unknown note type: (0x10000001)
 $  
I think it looks ok, but the values in the program headers will change.
This dump seems to export the virtual address for the kernel which is
wrong from the system perspective. We will also change the name of "Xen
Domanin-0 CR3" to something better.
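
For reference, here is a rough sketch of how a tool could walk the note segment shown above and pick out the dom0 CR3 note. It is not the code in crash or in the prototype; it only assumes the layout visible in the readelf output (32-bit little-endian notes, type 0x10000001, a 4-byte descriptor). The constant name NT_XEN_DOM0_CR3 is taken from this thread, and make_note() is a hypothetical helper that exists only to fabricate sample input. Since the note name and probably the format are going to change, treat the values as placeholders.

```python
import struct

NT_PRSTATUS = 1
# Value printed as "Unknown note type: (0x10000001)" by readelf above.
# Name and value come from the prototype vmcore and may change.
NT_XEN_DOM0_CR3 = 0x10000001


def align4(n):
    """Round n up to a multiple of 4, as ELF32 note fields are padded."""
    return (n + 3) & ~3


def iter_notes(buf):
    """Yield (owner, type, desc) for each note in a little-endian note segment."""
    pos = 0
    while pos + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, pos)
        pos += 12
        owner = buf[pos:pos + namesz].rstrip(b"\0").decode("ascii", "replace")
        pos += align4(namesz)
        desc = buf[pos:pos + descsz]
        pos += align4(descsz)
        yield owner, ntype, desc


def find_dom0_cr3(buf):
    """Return the value of the first dom0 CR3 note, or None if there is none."""
    for _owner, ntype, desc in iter_notes(buf):
        if ntype == NT_XEN_DOM0_CR3 and len(desc) >= 4:
            return struct.unpack_from("<I", desc)[0]
    return None


def make_note(owner, ntype, desc):
    """Hypothetical helper: build one note record for testing."""
    name = owner.encode("ascii") + b"\0"
    return (struct.pack("<III", len(name), len(desc), ntype)
            + name.ljust(align4(len(name)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))
```

A tool that gets a non-None result from find_dom0_cr3() could take that as the hint that the file is a hypervisor/dom0 dump and skip the register contents of the NT_PRSTATUS notes.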

...
 > > Now we're getting complex -- I'm pretty sure I don't know what
 > > you're talking about here...  Or how it can possibly lead to a dom0
 > > cr3 or pfn_to_mfn_frame_list_list value?
 > 
 > Let's put it like this: Your tool, does it use xen-syms today? How
 > difficult would it be to look up a symbol in hypervisor space? And
 > how do you feel about requiring xen-syms to be able to parse dom0?
 >  
 > 
 Definitely not.  In the interest of simplicity, the idea is to keep 
 things the way they have always been, i.e., "crash vmlinux vmcore", 
 and having to drag out the relevant xen-syms binary defeats that 
 concept.  And that's not to mention having to read it, figure out 
 how to translate hypervisor virtual addresses, then navigate around 
 the vmcore to find the location of a symbol, having to deal with the 
 following of data structures that may change over time -- to eventually
 find what's needed...
Right, I thought that would be the answer. =)

...
 >   
 > I'm not talking about your tool walking arch-dependent cross linked data
 > structures in the xen hypervisor, just basic symbol lookup.
 >  
 But I'm not sure what good that would do.  The "domain_list" symbol
 is only the beginning of the structure chain...
I will try to fix something up that works without symbol lookup.

...
 > I will continue working on cleaning up the code. Some of the changes
 > that have been made or are going to happen are:
 > 
 > - Separate dom0 notes from hypervisor notes.
 > 
 Looks like a reasonable spot to stuff cr3 (or pfn_to_mfn_frame_list_list)?
 (i.e. stuffing the dom0 cr3 at the end of the dom0 NT_PRSTATUS 
 register dump(s), like your current patch talks about)  
Yeah, I will either go with expanding NT_PRSTATUS or adding a new note.
Probably a new note.

...
 >   
 > - Xen vmcore files should have per physical cpu hypervisor crash_notes.
 > - These hypervisor notes should be pointed out by the program header.
 > - Xen vmcore files should have the hypervisor in a PT_LOAD segment.
 > - Xen vmcore files should have PT_LOAD headers for RAM, but with VA = 0.
 With respect to the PT_LOAD segments, the crash utility pretty much 
 ignores the p_vaddr values -- it's particularly interested in the 
 p_paddr values.  And once able to translate a unity-mapped kernel 
 virtual address into a physical address, it can find that physical 
 address by checking the p_paddr/p_filesz values in each PT_LOAD 
 segment.  (Again -- that only works with xen kernels if the pseudo 
 physical address can be translated into a machine address, and then 
 a vmcore file location -- hence the prerequisite of
 either the dom0 cr3 or pfn_to_mfn_frame_list_list value.)
Yes, I am with you.
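
To make sure we mean the same thing, here is a minimal sketch of that lookup using the four PT_LOAD entries from the readelf output earlier in this message. It is only an illustration, not crash's code. The function names are made up, and PAGE_OFFSET = 0xc0000000 is an assumption (the usual i386 split, which matches the 0xc0100000 / 0x00100000 pair in the dump). The pseudo-physical to machine step that a xen kernel needs in between is deliberately left out.

```python
# (file offset, p_paddr, p_filesz) for the PT_LOAD entries in the
# readelf output quoted above.
PT_LOAD_SEGMENTS = [
    (0x00000264, 0x00000000, 0x000a0000),
    (0x000a0264, 0x00100000, 0x03f00000),
    (0x03fa0264, 0x08000000, 0x30000000),
    (0x33fa0264, 0x38000000, 0x07ffb000),
]

# Assumption: standard i386 3G/1G split.
PAGE_OFFSET = 0xC0000000


def unity_vtop(vaddr, page_offset=PAGE_OFFSET):
    """Translate a unity-mapped kernel virtual address to a physical address."""
    return vaddr - page_offset


def paddr_to_offset(paddr, segments=PT_LOAD_SEGMENTS):
    """Return the vmcore file offset holding paddr, or None if no segment has it.

    Only p_paddr and p_filesz are consulted; p_vaddr is ignored.
    """
    for offset, p_paddr, p_filesz in segments:
        if p_paddr <= paddr < p_paddr + p_filesz:
            return offset + (paddr - p_paddr)
    return None
```

With these values paddr_to_offset(unity_vtop(0xc0100000)) lands at file offset 0xa0264, the start of the second LOAD segment, while an address in the hole between 0xa0000 and 0x100000 gives None.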

...
 The one exception to the "ignore-the-p_vaddr" rule is for the new
 relocatable x86_64 kernels, where the PT_LOAD mapping for the 
 kernel's __START_KERNEL_map region needs to be looked at in 
 order to figure out where the kernel was physically loaded, 
 because in relocatable kernels, unity-mapping can no longer be 
 done by simply stripping off the PAGE_OFFSET value.  Relocatable 
 kernels are being introduced so that there is no need for separate 
 "first-kernel" and the secondary kexec'd "kdump-kernel".  
That or change the kexec-tool to fill in proper addresses for the
unity-mapped kernel area.
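
As I understand the relocatable case, the lookup would be something like the sketch below. This is my reading of the description above and not necessarily what crash does. __START_KERNEL_map is the x86_64 value 0xffffffff80000000; KERNEL_MAP_SIZE, the function names and the sample addresses in the usage note are all assumptions for illustration.

```python
START_KERNEL_MAP = 0xFFFFFFFF80000000  # x86_64 __START_KERNEL_map
# Assumption: upper bound of the kernel text mapping window (512MB here).
KERNEL_MAP_SIZE = 0x20000000


def phys_base_from_segments(segments, kernel_map=START_KERNEL_MAP,
                            map_size=KERNEL_MAP_SIZE):
    """Derive the physical load offset from the kernel-text PT_LOAD entry.

    segments is an iterable of (p_vaddr, p_paddr, p_filesz). Returns None
    if no segment maps the __START_KERNEL_map region.
    """
    for p_vaddr, p_paddr, _p_filesz in segments:
        if kernel_map <= p_vaddr < kernel_map + map_size:
            return p_paddr - (p_vaddr - kernel_map)
    return None


def kernel_text_vtop(vaddr, phys_base, kernel_map=START_KERNEL_MAP):
    """Translate a kernel-text virtual address using the derived phys_base."""
    return vaddr - kernel_map + phys_base
```

For example, a made-up segment with p_vaddr 0xffffffff80200000 and p_paddr 0x1200000 gives a phys_base of 0x1000000, and a kernel that was not relocated gives 0.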

...
 But AFAIK, the new kernel relocation scheme shouldn't infect xen
 x86_64 kernels, so I don't believe that would ever be an issue.
 In other words, xen kernels should remain unity-mapped in the 
 traditional manner, albeit with pseudo-physical addresses instead 
 of machine addresses.  Again, I *believe* that to be true, since 
 there's no reason to relocate them.  
I think so too.

...
 >   
 > - Xen vmcore files should provide crash with something like CR3.
 Or this would be good...   ;-) 
 >    
Hehe, yeah. =)

...
 > - kexec-tools will be made xen aware. 
 > 
 > I'm sorry if we've been going over and over about this, but I'm a bit
 > confused by the current status, what we want to have and if that is
 > going to work. The points above show the direction. Please shout if you
 > think they sound bad.
 > 
 Nothing sounds bad to me!  What would be really helpful is if, 
 during your development, you could provide me with sample 
 vmlinux/vmcore pairs that I could work with?  Just so I don't 
 have to scramble at the end of it all, and only then find out 
 that there's a "gotcha" that we're not considering.  
I will contact you when I've updated and tested my code. It will take a
while but I guess within two weeks.

One final question: Is it ok with you that we only use 64-bit ELF file
format for vmcore under Xen?

Thanks!

/ magnus
