On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
Andrew Hecox wrote:
> hello,
>
> I'm looking at a customer issue where diskdumpmsg is unable to read a
> vmcore file. It is not clear if this is a problem with the vmcore file or
> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
> it, can see no problems. However, I'm new to the tool so that doesn't
> give me a lot of confidence.
>
> Does anyone have any suggestions on how or if I can use crash to help
> determine if there's corruption in the vmcore file? Or any other way of
> approaching the problem?
>
> Thanks much,
>
> Andrew
>
I'm not sure what you expect the crash utility to do -- if it comes
up to a prompt with no error or warning messages, it means that the
ELF header contains what appears to be valid usable information,
and that the minimum kernel memory contents required to set up the
crash utility's notion of the running system are all in place. That's
not to say that there is no chance that the vmcore contains some
corruption that was not recognized.
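
For illustration only, the weakest form of such a header check can be
sketched in C as below. It assumes the vmcore is an ELF-format file and
looks only at the four ELF magic bytes. has_elf_magic is a hypothetical
helper, not part of crash or diskdumpmsg, and passing it says nothing
about the memory contents that crash also relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the first bytes of a file image
 * carry the ELF magic number (0x7f 'E' 'L' 'F'). This is only the first
 * sanity check a reader could make; it does not validate the rest of
 * the header or any program headers.
 */
static bool has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

A caller would read the first few bytes of the file into a buffer and
pass the buffer and the number of bytes actually read.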
Thanks. Any other suggestions on how to determine if a vmcore is "valid"
or is that not even a reasonable question to try and ask? The problem
I'm trying to solve is described better below:
With respect to diskdumpmsg, as I understand it, it was fairly recently
changed from a Perl script to a C program so that it could be run
earlier in time so as to be able to use the swap partition. Looking
at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
error types and associated error messages. What do you mean when you
say that "diskdumpmsg is unable to read a vmcore file"?
Specifically:
- user reported a floating point exception from diskdump on startup
- the result was reproducible locally but only with their vmcore file
- FPE occurred in get_logbuf:

      log_end %= log_buf_len;

- log_buf_len had been set to 0 in read_buffer, which zero-fills the
  buffer when the page is not marked dumpable:

      if (!page_is_dumpable(pfn, dump->device)) {
              memset(buf, 0, copy_len);
      } else {

- I don't know enough to say if the page really wasn't dumpable.
  The test is:

      static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
      {
              return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
      }
- I wrote a patch with one way to avoid the FPE (attached) and sent it
to SEG.
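
The two code paths quoted above can be sketched in isolation. This is a
minimal sketch and not the attached patch: bit_is_set repeats the bit
test from page_is_dumpable with an added bounds check, and wrap_log_end
is one hypothetical way to refuse the modulo when log_buf_len is 0. Both
helper names, the bitmap_bytes parameter, and the unsigned long types
are assumptions, not names taken from diskdumpmsg.c. On Linux/x86 an
integer modulo by zero is delivered as SIGFPE, which is what gets
reported as a floating point exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same bit test as page_is_dumpable(): page number nr lives in byte
 * nr >> 3, at bit position nr & 7. The bounds check against
 * bitmap_bytes is an assumption added for this sketch.
 */
static bool bit_is_set(const uint8_t *bitmap, size_t bitmap_bytes,
                       unsigned int nr)
{
    assert(bitmap != NULL);
    if ((size_t)(nr >> 3) >= bitmap_bytes)
        return false;               /* out of range: treat as not set */
    return (bitmap[nr >> 3] & (1u << (nr & 7))) != 0;
}

/*
 * Hypothetical guard around "log_end %= log_buf_len". Returns false and
 * leaves *log_end untouched when the length is 0, so the caller can
 * report an error instead of dividing by zero.
 */
static bool wrap_log_end(unsigned long *log_end, unsigned long log_buf_len)
{
    if (log_buf_len == 0)
        return false;
    *log_end %= log_buf_len;
    return true;
}
```

A caller would treat a false return from wrap_log_end as "log buffer
length could not be read" and print an error. That avoids the crash but
does not answer whether the page holding log_buf_len should have been
marked dumpable in the first place.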
Now I'm trying to determine if the vmcore file should be readable by
diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash
or a problem with the vmcore file prior to it getting to diskdumpmsg?
Unfortunately, I don't understand the problem domain very well at all,
hence the probably naive questions :)
Any suggestions are appreciated.
-Andrew
Dave