Re: [Crash-utility] determining a "valid" vmcore

Thursday, 7 February 2008

Andrew Hecox wrote:
...
 On Thu, 2008-02-07 at 11:27 -0500, Dave Anderson wrote:
> Andrew Hecox wrote:
>> On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
>>> Andrew Hecox wrote:
>>>> hello,
>>>>
>>>> I'm looking at a customer issue where diskdumpmsg is unable to read a
>>>> vmcore file. It is not clear if this is a problem with the vmcore file or
>>>> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
>>>> it, can see no problems. However, I'm new to the tool so that doesn't
>>>> give me a lot of confidence.
>>>>
>>>> Does anyone have any suggestions on how or if I can use crash to help
>>>> determine if there's corruption in the vmcore file? Or any other way of
>>>> approaching the problem?
>>>>
>>>> Thanks much,
>>>>
>>>> Andrew
>>>>
>>> I'm not sure what you expect the crash utility to do -- if it comes
>>> up to a prompt with no error or warning messages, it means that the
>>> ELF header contains what appears to be valid usable information,
>>> and that the minimum kernel memory contents required to set up the
>>> crash utility's notion of the running system are all in place. That's
>>> not to say that there is no chance that the vmcore contains some
>>> corruption that was not recognized.
>>>
>> Thanks. Any other suggestions on how to determine if a vmcore is "valid"
>> or is that not even a reasonable question to try and ask? The problem
>> I'm trying to solve is described better below:
>>
>>> With respect to diskdumpmsg, as I understand it, it was fairly recently
>>> changed from a perl script to a C file so that it could be run
>>> earlier in time so as to be able to use the swap partition.  Looking
>>> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
>>> error types and associated error messages.  What do you mean when you
>>> say that "diskdumpmsg is unable to read a vmcore file"?
>> Specifically: 
>>
>>  - user reported a floating point exception from diskdump on startup
>>  - the result was reproducible locally but only with their vmcore file
>>  - fpe occurred in get_logbuf:
>>                 log_end %= log_buf_len;
>>  - log_buf_len had been set to 0 in read_buffer
>>           if (!page_is_dumpable(pfn, dump->device)) {
>>               memset(buf, 0, copy_len);
>>           } else {
>>  - I don't know enough to say if the page really wasn't dumpable. 
>> static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
>> {
>>   return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
>> }
>>  - I wrote a patch with one way to avoid the FPE (attached) and sent it
>> to SEG.
>>
>> Now I'm trying to determine if the vmcore file should be readable by
>> diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash
>> or a problem with the vmcore file prior to it getting to diskdumpmsg.
>> Unfortunately, I don't understand the problem domain very well at all,
>> hence the probably naive questions :)
>>
>> Any suggestions are appreciated.
>>
>> -Andrew
> So it appears that the page containing the log_buf_len symbol is not
> readable or contained in the dumpfile.  BTW, is this a compressed
> dumpfile or an ELF formatted dumpfile?  And what "dump_level" did
> they configure?
>

 compressed, level is 19.

> Anyway, back to the log_buf_len symbol read, what happens when you
> enter the "log" command while in a crash session?  It attempts to
> read that symbol immediately.
>

 I get what appears to be a full and valid dump of the kernel message
 buffer. 

The crash utility has the same page_is_dumpable() function, which I presume
looks at precisely the same bitmap data from the dumpfile.  And that
must be working, given that the "log" command works as expected.
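
The bitmap test quoted above can be illustrated with a minimal sketch. The names used here (dump_device_sketch, page_is_dumpable_sketch) are hypothetical stand-ins, not the definitions from diskdumpmsg or crash, and the bounds check is an added assumption rather than something either tool is known to do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DumpDevice; only the bitmap is modelled. */
struct dump_device_sketch {
    const uint8_t *dumpable_bitmap; /* one bit per page frame */
    size_t bitmap_len;              /* size of the bitmap in bytes */
};

/*
 * Return true if the bit for page frame `pfn` is set.
 * Page frames beyond the end of the bitmap are treated as not dumpable
 * (an assumption made for this sketch).
 */
static bool page_is_dumpable_sketch(uint64_t pfn,
                                    const struct dump_device_sketch *dev)
{
    uint64_t byte = pfn >> 3;          /* eight page frames per byte */

    if (byte >= dev->bitmap_len)
        return false;
    return (dev->dumpable_bitmap[byte] >> (pfn & 7)) & 1u;
}
```

With a bitmap byte of 0x05, page frames 0 and 2 test as dumpable and page frame 1 does not. If both tools apply the same test to the same bitmap, they should agree on any given page frame, which would suggest looking at how each tool arrives at the address it reads.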

One difference is that diskdumpmsg uses /boot/System.map-<release> for
the symbol values, whereas crash uses the vmlinux file.  It might be
of interest to determine whether the value of "log_buf_len" used by
diskdumpmsg is the same symbol value as used by crash.
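
One way to check this is to pull the "log_buf_len" address out of the System.map file directly. The following is only a sketch: lookup_symbol_sketch is a hypothetical helper, not code from diskdumpmsg, and it assumes the usual System.map line layout of a hex address, a one-character type, and a symbol name.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan System.map-style lines ("<hex address> <type> <name>") for `symbol`.
 * On a match, store the address in *addr_out and return true.
 * Lines that do not parse into three fields are skipped.
 */
static bool lookup_symbol_sketch(FILE *map, const char *symbol,
                                 uint64_t *addr_out)
{
    char line[512];

    while (fgets(line, sizeof line, map) != NULL) {
        uint64_t addr;
        char type;
        char name[256];

        if (sscanf(line, "%" SCNx64 " %c %255s", &addr, &type, name) != 3)
            continue;
        if (strcmp(name, symbol) == 0) {
            *addr_out = addr;
            return true;
        }
    }
    return false;
}
```

The address found this way can then be compared with what "sym log_buf_len" reports in a crash session started with the matching vmlinux file. A mismatch would suggest that the System.map file does not belong to the kernel that produced the dump.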

Dave
