Re: [Crash-utility] Question for LKCD maintainers - How about adding a debug flag to crash and only calling abort() if crash is started with '-d' flag provided?

Wednesday, 26 December 2007

Dave Anderson wrote:
...

 Long after I stopped tinkering with the LKCD code in crash,
 changes were contributed to support physical memory zones
 in the LKCD dumpfile format.

Hi Dave:

That could easily have been me. I added zone support to the
LKCD kernel and lcrash code and then updated your crash code
to support zones. I kinda recall LKCD not dumping pages in monotonically
increasing order, and my modifying your crash code to live with this
new feature of the LKCD dumps. I was trying to get the LKCD folks to
support crash in addition to lcrash, but failed to get any support from
Tom Morano or Matt Robinson. I didn't realize that I had broken crash
with the zone changes, and I felt responsible for fixing crash to deal with
the change I had made. I also prefer the crash interface to the lcrash
interface. I proposed to Tom using the ELF format, as KEXEC does, but
he didn't go for it. I don't see why we can't hide additional crash info
in ELF files and maintain as much compatibility as possible.

...
   Specifically there is this
 piece of save_offset() in lkcd_common.c:

         /* find the zone */
         for (ii=0; ii < lkcd->num_zones; ii++) {
                 if (lkcd->zones[ii].start == zone) {
                         if (lkcd->zones[ii].pages[page].offset != 0) {
                            if (lkcd->zones[ii].pages[page].offset != off) {
                                 error(INFO, "conflicting page: zone %lld, "
                                         "page %lld: %lld, %lld != %lld\n",
                                         (unsigned long long)zone,
                                         (unsigned long long)page,
                                         (unsigned long long)paddr,
                                         (unsigned long long)off,
                                         (unsigned long long)
                                         lkcd->zones[ii].pages[page].offset);
                                 abort();
                            }
                            ret = 0;
                         } else {
                            lkcd->zones[ii].pages[page].offset = off;
                            ret = 1;
                         }
                         break;
                 }
         }

The printf looks a bit like my coding style, though I don't know
why I (?) decided to abort() in this case. I suppose the idea was
to look at the situation with gdb on the resulting core file.

...

 The call to abort() above kills the crash session, which is both
 annoying and unnecessary.

Isn't it worthwhile to look at the core file to understand why
abort() was called?

...

 I am seeing it in a dumpfile from a customer who has their own dumping scheme
 that is based upon LKCD version 7.  I understand that this may be a
 problem with their LKCD port, but nonetheless, it's the only place in
 the crash utility that doesn't recover gracefully from dumpfile access
 errors.

 Anyway, I would like to either:

  1. change the error(INFO...) to error(FATAL...) so that run-time
     commands encountering this error will just fail, and the session
     will return to the crash> prompt, or
  2. return 0, so that a "seek error" can be subsequently displayed
     by the readmem() command.

 Number 2 is preferable, because it yields more clues as to where the
 readmem() came from, but since I don't know much about the LKCD
 physical memory zones stuff, is there any reason that shouldn't
 be done?

How about adding a debug flag to crash and calling abort() only if the
debug flag is set? You could mention in the error message that the
user can force a core dump by adding a '-d' flag when invoking
crash, and then send you the core file.

While I've got your attention: I'm upgrading our 2.6.12-stable kernel to
2.6.16-stable and want to start supporting core dumps. Ideally I'd like to
have core dumps that are compatible with both gdb and crash. Can crash
handle the ELF core files generated by KEXEC/KCORE? The last time I looked
at this, I recall there being incompatibilities, and things getting worse
with kernels compiled to be relocatable, since kgdb had a problem
because it wasn't aware of the relocation.

When you have some free time, could you let me know your current opinion
of the state of these compatibility issues and where things are going?

-piet

...

 Thanks,
   Dave

 -- 
 Crash-utility mailing list
 Crash-utility(a)redhat.com
 https://www.redhat.com/mailman/listinfo/crash-utility 
