Andrew Hecox wrote:
On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
> Andrew Hecox wrote:
>> hello,
>>
>> I'm looking at a customer issue where diskdumpmsg is unable to read a
>> vmcore file. It is not clear if this is a problem with the vmcore file or
>> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
>> it, can see no problems. However, I'm new to the tool so that doesn't
>> give me a lot of confidence.
>>
>> Does anyone have any suggestions on how or if I can use crash to help
>> determine if there's corruption in the vmcore file? Or any other way of
>> approaching the problem?
>>
>> Thanks much,
>>
>> Andrew
>>
> I'm not sure what you expect the crash utility to do -- if it comes
> up to a prompt with no error or warning messages, it means that the
> ELF header contains what appears to be valid usable information,
> and that the minimum kernel memory contents required to set up the
> crash utility's notion of the running system are all in place. That's
> not to say that there is no chance that the vmcore contains some
> corruption that was not recognized.
>
Thanks. Any other suggestions on how to determine if a vmcore is "valid"
or is that not even a reasonable question to try and ask? The problem
I'm trying to solve is described better below:
> With respect to diskdumpmsg, as I understand it, it was fairly recently
> changed from a perl script to a C file so that it could be run
> earlier in time so as to be able to use the swap partition. Looking
> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
> error types and associated error messages. What do you mean when you
> say that "diskdumpmsg is unable to read a vmcore file"?
Specifically:
- user reported a floating point exception from diskdump on startup
- the result was reproducible locally but only with their vmcore file
- fpe occurred in get_logbuf:
        log_end %= log_buf_len;
- log_buf_len had been set to 0 in read_buffer
        if (!page_is_dumpable(pfn, dump->device)) {
                memset(buf, 0, copy_len);
        } else {
- I don't know enough to say if the page really wasn't dumpable.
        static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
        {
                return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
        }
- I wrote a patch with one way to avoid the FPE (attached) and sent it
to SEG.
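
To make the chain above concrete, here is a minimal sketch of how a page
missing from the dumpable bitmap can turn into a zero log_buf_len, and how
the modulo can be guarded. It is not the diskdumpmsg code: DemoDevice,
demo_page_is_dumpable, demo_read_int and demo_wrap are hypothetical names
made up for illustration, and the 32-page bitmap is an assumption. Only the
bit test (nr>>3, nr & 7) and the memset-to-zero branch are taken from the
snippets quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_PAGES 32u

/* Hypothetical stand-in for DumpDevice: only the bitmap, one bit per page. */
typedef struct {
    unsigned char dumpable_bitmap[DEMO_PAGES / 8];
} DemoDevice;

/* Same bit test as the quoted page_is_dumpable(): byte nr>>3, bit nr&7. */
static bool demo_page_is_dumpable(unsigned int nr, const DemoDevice *device)
{
    assert(nr < DEMO_PAGES);
    return (device->dumpable_bitmap[nr >> 3] & (1u << (nr & 7))) != 0;
}

/*
 * Hypothetical reader: if the page is not in the dump, the caller gets
 * zeros back (mirroring the memset branch) rather than an error, so a
 * value such as log_buf_len silently reads as 0.
 */
static void demo_read_int(unsigned int pfn, const DemoDevice *device,
                          int stored, int *out)
{
    if (!demo_page_is_dumpable(pfn, device))
        memset(out, 0, sizeof(*out));
    else
        *out = stored;
}

/*
 * Guarded wrap of log_end. Integer modulo by zero is undefined in C and
 * typically raises SIGFPE on x86, so the length is checked first.
 * Returns 0 on success, -1 if the length is unusable (log_end untouched).
 */
static int demo_wrap(unsigned long *log_end, int log_buf_len)
{
    if (log_buf_len <= 0)
        return -1;
    *log_end %= (unsigned long)log_buf_len;
    return 0;
}
```

With bit 2 of byte 1 set (page 10), demo_read_int() on page 10 hands back
the stored length, while the same call on page 11 hands back 0, and
demo_wrap() then returns -1 instead of dividing. Whether the real page
ought to have been dumpable is a separate question that this sketch does
not answer.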
Now I'm trying to determine if the vmcore file should be readable by
diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash,
or a problem with the vmcore file before it ever gets to diskdumpmsg?
Unfortunately, I don't understand the problem domain very well at all,
hence the probably naive questions :)
Any suggestions are appreciated.
-Andrew

So it appears that the page containing the log_buf_len symbol is not
readable or contained in the dumpfile. BTW, is this a compressed
dumpfile or an ELF formatted dumpfile? And what "dump_level" did
they configure?
Anyway, back to the log_buf_len symbol read, what happens when you
enter the "log" command while in a crash session? It attempts to
read that symbol immediately.
Dave
>
> ------------------------------------------------------------------------
>
> diff -rupN diskdumputils-1.4.1.orig/diskdumpmsg.c diskdumputils-1.4.1/diskdumpmsg.c
> --- diskdumputils-1.4.1.orig/diskdumpmsg.c 2008-02-06 14:32:41.000000000 -0500
> +++ diskdumputils-1.4.1/diskdumpmsg.c 2008-02-06 15:56:22.000000000 -0500
> @@ -208,6 +208,10 @@ static int get_logbuf(DumpFile *dump, ch
>
> len = log_end;
> } else {
> + if (!log_buf_len) {
> + ret = READ_ERROR_IN_DUMP_FILE;
> + goto err;
> + }
> log_end %= log_buf_len;
>
> ret = read_buffer(dump, log_buf + log_end,