Re: [Crash-utility] determining a "valid" vmcore

Thursday, 7 February 2008

On Thu, 2008-02-07 at 11:27 -0500, Dave Anderson wrote:
...
 Andrew Hecox wrote:
 > On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
 >> Andrew Hecox wrote:
 >>> hello,
 >>>
 >>> I'm looking at a customer issue where diskdumpmsg is unable to read a
 >>> vmcore file. It is not clear if this is a problem with the vmcore file or
 >>> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
 >>> it, can see no problems. However, I'm new to the tool so that doesn't
 >>> give me a lot of confidence.
 >>>
 >>> Does anyone have any suggestions on how or if I can use crash to help
 >>> determine if there's corruption in the vmcore file? Or any other way of
 >>> approaching the problem? 
 >>>
 >>> Thanks much,
 >>>
 >>> Andrew
 >>>
 >> I'm not sure what you expect the crash utility to do -- if it comes
 >> up to a prompt with no error or warning messages, it means that the
 >> ELF header contains what appears to be valid usable information,
 >> and that the minimum kernel memory contents required to set up the
 >> crash utility's notion of the running system are all in place.  That's
 >> not to say that there is no chance that the vmcore contains some
 >> corruption that was not recognized.
 >>
 > 
 > Thanks. Any other suggestions on how to determine if a vmcore is "valid"
 > or is that not even a reasonable question to try and ask? The problem
 > I'm trying to solve is described better below:
 > 
 >> With respect to diskdumpmsg, as I understand it, it was fairly recently
 >> changed from a perl script to a C file so that it could be run
 >> earlier in time so as to be able to use the swap partition.  Looking
 >> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
 >> error types and associated error messages.  What do you mean when you
 >> say that "diskdumpmsg is unable to read a vmcore file"?
 > 
 > Specifically: 
 > 
 >  - user reported a floating point exception from diskdump on startup
 >  - the result was reproducible locally but only with their vmcore file
 >  - fpe occurred in get_logbuf:
 >                 log_end %= log_buf_len;
 >  - log_buf_len had been set to 0 in read_buffer
 >           if (!page_is_dumpable(pfn, dump->device)) {
 >               memset(buf, 0, copy_len);
 >           } else {
 >  - I don't know enough to say if the page really wasn't dumpable. 
 > static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
 > {
 >   return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
 > }
 >  - I wrote a patch with one way to avoid the FPE (attached) and sent it
 > to SEG.
 > 
 > Now I'm trying to determine if the vmcore file should be readable by
 > diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash
 > or a problem with the vmcore file prior to it getting to diskdumpmsg.
 > Unfortunately, I don't understand the problem domain very well at all,
 > hence the probably naive questions :)
 > 
 > Any suggestions are appreciated.
 > 
 > -Andrew
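
The chain described in the list above (bitmap says "not dumpable", read_buffer
hands back zeros, log_buf_len ends up 0) can be sketched roughly as follows.
This is only an illustration, not the diskdumpmsg source: SketchDevice,
sketch_page_is_dumpable and sketch_read_page are hypothetical names, and only
the bit test and the memset branch are taken from the snippets quoted in the
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for DumpDevice; only the bitmap field that the
 * quoted page_is_dumpable() uses is modelled here. */
typedef struct {
    const unsigned char *dumpable_bitmap; /* one bit per page frame */
} SketchDevice;

/* Same bit test as the quoted helper: byte nr/8, bit nr%8 within it. */
bool sketch_page_is_dumpable(unsigned int nr, const SketchDevice *dev)
{
    assert(dev != NULL && dev->dumpable_bitmap != NULL);
    return (dev->dumpable_bitmap[nr >> 3] & (1u << (nr & 7))) != 0;
}

/* Mirrors the quoted branch of read_buffer(): a page that is not marked
 * dumpable is returned as zeros, so any variable stored on that page
 * (log_buf_len in this thread) silently reads back as 0. */
void sketch_read_page(unsigned int pfn, const SketchDevice *dev,
                      const void *page_data, void *buf, size_t len)
{
    if (!sketch_page_is_dumpable(pfn, dev))
        memset(buf, 0, len);
    else
        memcpy(buf, page_data, len);
}
```

Under that assumption the zero is indistinguishable from a genuine value of 0,
which is why the caller has to check it before using it as a divisor.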

 So it appears that the page containing the log_buf_len symbol is not
 readable or contained in the dumpfile.  BTW, is this a compressed
 dumpfile or an ELF formatted dumpfile?  And what "dump_level" did
 they configure?

compressed, level is 19.

...
 Anyway, back to the log_buf_len symbol read, what happens when you
 enter the "log" command while in a crash session?  It attempts to
 read that symbol immediately.

I get what appears to be a full and valid dump of the kernel message
buffer. 

-Andrew

...

 Dave

 >>
 >> ------------------------------------------------------------------------
 >>
 >> diff -rupN diskdumputils-1.4.1.orig/diskdumpmsg.c diskdumputils-1.4.1/diskdumpmsg.c
 >> --- diskdumputils-1.4.1.orig/diskdumpmsg.c	2008-02-06 14:32:41.000000000 -0500
 >> +++ diskdumputils-1.4.1/diskdumpmsg.c	2008-02-06 15:56:22.000000000 -0500
 >> @@ -208,6 +208,10 @@ static int get_logbuf(DumpFile *dump, ch
 >>  
 >>  		len = log_end;
 >>  	} else {
 >> +		if (!log_buf_len) { 
 >> +			ret = READ_ERROR_IN_DUMP_FILE;
 >> +			goto err;
 >> +		}
 >>  		log_end %= log_buf_len;
 >>  
 >>  		ret = read_buffer(dump, log_buf + log_end,
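
For readers unfamiliar with why a zero log_buf_len shows up as a "floating
point exception": in C, integer division or modulo by zero is undefined
behavior, and on x86 Linux it is normally delivered as SIGFPE. The guard in the
patch above amounts to something like the standalone sketch below. The function
name and the SKETCH_* return codes are hypothetical stand-ins, not identifiers
from diskdumputils.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes standing in for the ones in diskdumpmsg.c. */
enum { SKETCH_OK = 0, SKETCH_READ_ERROR = -1 };

/* Reduce log_end to an offset inside the ring buffer. A zero length is
 * reported as an error instead of being used as a divisor. */
int sketch_wrap_log_end(unsigned long log_end, unsigned long log_buf_len,
                        unsigned long *offset)
{
    assert(offset != NULL);
    if (log_buf_len == 0)
        return SKETCH_READ_ERROR;   /* would otherwise trap on '%' */
    *offset = log_end % log_buf_len;
    return SKETCH_OK;
}
```

With log_end = 5000 and log_buf_len = 4096 this yields offset 904; with
log_buf_len = 0 it returns the error code and leaves *offset untouched.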
