crash versions 4.0-4.9 and later will not work with SLES9 IA64 dumps
by Alan Tyson
Hello and Happy New Year,
Changes that were made in 4.0-4.9 to get information out of the LKCD
header of the dump file now prevent crash from opening SLES9 IA64 LKCD
dumps.
The reason is that the LKCD header as defined in crash does not match
the one used in SLES9. I'm not familiar with any other distros that
use LKCD v9, so I don't know whether this problem is unique to SLES9 or
whether it affects others.
On SLES9, the element dha_kernel_addr of struct _dump_header_asm_s is
not at the end of the structure. It's in between dha_header_size and
dha_pt_regs so lkcd_dump_init_v8_arch() ends up with a zero for the load
address instead of 0x04000000 (in the case I'm looking at).
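
To illustrate the effect, here is a minimal sketch. The two structs are
simplified, hypothetical stand-ins (the real _dump_header_asm_s has more
members, and the size of the register area is a placeholder); only the
relative position of dha_kernel_addr follows the description above. It
shows how parsing a header with the wrong layout reads the load address
from the wrong offset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder size for the saved register area (assumption). */
#define SKETCH_REGS_BYTES 64

/* Simplified layout with dha_kernel_addr at the end of the structure. */
struct sketch_header_tail {
    uint64_t dha_magic_number;
    uint32_t dha_version;
    uint32_t dha_header_size;
    unsigned char dha_pt_regs[SKETCH_REGS_BYTES];
    uint64_t dha_kernel_addr;
};

/* Simplified layout with dha_kernel_addr between dha_header_size and
 * dha_pt_regs, as described for SLES9. */
struct sketch_header_mid {
    uint64_t dha_magic_number;
    uint32_t dha_version;
    uint32_t dha_header_size;
    uint64_t dha_kernel_addr;
    unsigned char dha_pt_regs[SKETCH_REGS_BYTES];
};

/* Build a "mid" header holding real_addr, then read the address back
 * from its raw bytes at the given offset. */
uint64_t sketch_read_kernel_addr(uint64_t real_addr, size_t offset)
{
    struct sketch_header_mid hdr;
    unsigned char raw[sizeof(hdr)];
    uint64_t addr;

    assert(offset + sizeof(addr) <= sizeof(raw));

    memset(&hdr, 0, sizeof(hdr));      /* register area is zero-filled here */
    hdr.dha_kernel_addr = real_addr;
    memcpy(raw, &hdr, sizeof(hdr));

    memcpy(&addr, raw + offset, sizeof(addr));
    return addr;
}
```

Reading at offsetof(struct sketch_header_mid, dha_kernel_addr) returns
the stored address, while reading at offsetof(struct sketch_header_tail,
dha_kernel_addr) lands inside the zero-filled register area and returns
0. In a real dump the wrong offset would yield whatever bytes happen to
be there.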
Bernard, do you know if this is the case with all LKCD v9 distros? If
so I don't mind creating lkcd*v9* functions. Otherwise I'd be open to
suggestions as to how we get round this. Perhaps I just need to keep a
different version of crash for IA64 SLES9....
Thanks,
Alan Tyson, HP.
16 years, 12 months
Question for LKCD maintainers
by Dave Anderson
Long after I stopped tinkering with the LKCD code in crash,
changes were contributed to support physical memory zones
in the LKCD dumpfile format. Specifically there is this
piece of save_offset() in lkcd_common.c:
    /* find the zone */
    for (ii=0; ii < lkcd->num_zones; ii++) {
        if (lkcd->zones[ii].start == zone) {
            if (lkcd->zones[ii].pages[page].offset != 0) {
                if (lkcd->zones[ii].pages[page].offset != off) {
                    error(INFO, "conflicting page: zone %lld, "
                        "page %lld: %lld, %lld != %lld\n",
                        (unsigned long long)zone,
                        (unsigned long long)page,
                        (unsigned long long)paddr,
                        (unsigned long long)off,
                        (unsigned long long) \
                        lkcd->zones[ii].pages[page].offset);
                    abort();
                }
                ret = 0;
            } else {
                lkcd->zones[ii].pages[page].offset = off;
                ret = 1;
            }
            break;
        }
    }
The call to abort() above kills the crash session, which is both
annoying and unnecessary.
I am seeing it in a dumpfile from a customer who has their own dumping
scheme that is based upon LKCD version 7. I understand that this may be
a problem with their LKCD port, but nonetheless, it's the only place in
the crash utility that doesn't recover gracefully from dumpfile access
errors.
Anyway, I would like to either:
1. change the error(INFO...) to error(FATAL...) so that run-time
commands encountering this error will just fail, and the session
will return to the crash> prompt, or
2. return 0, so that a "seek error" can be subsequently displayed
by the readmem() command.
Number 2 is preferable, because it yields more clues as to where the
readmem() came from, but since I don't know much about the LKCD
physical memory zones stuff, is there any reason that shouldn't
be done?
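
As a rough sketch of option 2, the following stand-alone model replaces
the abort() with a warning and a return value of 0. The struct and
function names are hypothetical simplifications, not the real
lkcd_common.c types; fprintf() stands in for error(INFO, ...), and
returning 0 when no zone matches is an assumption made for the sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the LKCD zone bookkeeping. */
#define SKETCH_PAGES_PER_ZONE 4

struct sketch_page  { uint64_t offset; };
struct sketch_zone  { uint64_t start;
                      struct sketch_page pages[SKETCH_PAGES_PER_ZONE]; };
struct sketch_table { int num_zones; struct sketch_zone *zones; };

/*
 * Record the file offset of a page.  Returns 1 if the offset was newly
 * stored, and 0 otherwise: already stored, conflicting, out of range,
 * or no matching zone.  A conflict is reported but no longer aborts.
 */
int sketch_save_offset(struct sketch_table *t, uint64_t zone,
                       uint64_t page, uint64_t off)
{
    int ii;

    assert(t != NULL);
    if (page >= SKETCH_PAGES_PER_ZONE)
        return 0;

    for (ii = 0; ii < t->num_zones; ii++) {
        struct sketch_page *p;

        if (t->zones[ii].start != zone)
            continue;

        p = &t->zones[ii].pages[page];
        if (p->offset == 0) {          /* 0 means "not recorded yet" */
            p->offset = off;
            return 1;
        }
        if (p->offset != off)          /* conflict: warn, keep old value */
            fprintf(stderr, "conflicting page: zone %llu, page %llu: "
                    "%llu != %llu\n",
                    (unsigned long long)zone, (unsigned long long)page,
                    (unsigned long long)off,
                    (unsigned long long)p->offset);
        return 0;
    }
    return 0;
}
```

With this shape the caller sees an ordinary failure and can report it
in its own context, instead of the whole session being killed.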
Thanks,
Dave
17 years
Re: [Crash-utility] [PATCH] Improve error handling when architecture doesn't match
by Dave Anderson
> Bernhard Walle wrote:
> > Dave Anderson wrote:
> >
> > Actually this patch has just turned up different issues
> > that have to be handled, because the e_type and e_phnum
> > get deferred until after the e_machine and endianness
> > are checked.
> >
> > Among them the fact that an i386 xen guest core file
> > taken by an x86_64 host has the e_machine type set
> > to x86_64 (don't ask me why they did that...), and has
> > an e_phnum of 0. Anyway, that requires the e_phnum
> > to be checked *before* the machine type and endianness.
> >
> > And another, since the e_type doesn't get checked
> > until *after* the machine type and endianness,
> > it allows the vmlinux file (ET_EXEC) to get passed
> > through, which can generate a bogus error message
> > about the vmlinux file!
> >
> > And there's probably others...
> >
> > There was a method to my madness in the way it's written
> > now. I'm going to have to spend some more time with
> > this because I don't want to introduce false alarms
> > or print error messages that don't make any sense...
>
> *Arrrg*, sorry for not taking all this into account. I only tested
> with a few Kdump dumps from different architectures, but not with Xen
> dumps.
>
> You're right, and I'll send a new patch that tries to handle all this.
> But probably next year ...
No need -- I've got it all in place. Prior to the generic "not a supported
file format" fatal error message, the following mismatches will be
explicitly reported:
1. Machine type mismatches in netdump, kdump, diskdump and
xendump ELF dumpfiles.
2. Machine type mismatches in compressed diskdump and
compressed kdump (via makedumpfile) dumpfiles.
3. Machine type mismatches in vmlinux files.
4. Endian mismatches in netdump, kdump, diskdump and
xendump ELF core dumpfiles.
5. Endian mismatches in vmlinux files.
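
For readers unfamiliar with the ELF fields involved, here is a minimal
sketch of this kind of check. It is not the code that went into crash:
the function and enum names are hypothetical, it only looks at a 64-bit
ELF header, and it does not handle the Xen or vmlinux special cases
discussed above. It uses the standard definitions from <elf.h> on Linux:

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>

enum sketch_mismatch {
    SKETCH_MATCH,
    SKETCH_ENDIAN_MISMATCH,
    SKETCH_MACHINE_MISMATCH
};

/*
 * Compare the byte order and machine type in an ELF header against the
 * values expected by the running program.  Byte order is checked first
 * in this sketch because e_machine cannot be interpreted reliably when
 * the file's byte order differs from the host's.
 */
enum sketch_mismatch sketch_check_ehdr(const Elf64_Ehdr *eh,
                                       unsigned char host_data,
                                       Elf64_Half host_machine)
{
    assert(eh != NULL);

    if (eh->e_ident[EI_DATA] != host_data) {
        fprintf(stderr, "endian mismatch: file EI_DATA %u, host %u\n",
                (unsigned)eh->e_ident[EI_DATA], (unsigned)host_data);
        return SKETCH_ENDIAN_MISMATCH;
    }
    if (eh->e_machine != host_machine) {
        fprintf(stderr, "machine type mismatch: file e_machine %u, "
                "host %u\n",
                (unsigned)eh->e_machine, (unsigned)host_machine);
        return SKETCH_MACHINE_MISMATCH;
    }
    return SKETCH_MATCH;
}
```

A caller built for x86_64 would pass ELFDATA2LSB and EM_X86_64, and
could print a specific message for each result before falling back to a
generic "not a supported file format" error.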
This was long overdue -- thanks a lot for getting the ball rolling.
Dave
17 years