Dave Anderson wrote:
Long after I stopped tinkering with the LKCD code in crash,
changes were contributed to support physical memory zones
in the LKCD dumpfile format.
Hi Dave:
That could easily have been me. I added zone support to the
LKCD kernel and lcrash code and then updated your crash code
to support zones. I kinda recall LKCD not dumping in monotonically
increasing order and my modifying your crash code to live with this
new feature in the LKCD dumps. I was trying to get the LKCD folks into
supporting crash in addition to lcrash but failed to get any support from
Tom Morano or Matt Robinson. I didn't realize that I had broken crash
with the zone changes and felt responsible to fix crash to deal with this
change that I had made. I also like the crash interface over the lcrash
interface. I proposed to Tom using the ELF format like KEXEC uses but
he didn't go for it. I don't know why we can't hide additional crash info
in ELF files and maintain as much compatibility as possible.
Specifically, there is this piece of save_offset() in lkcd_common.c:

        /* find the zone */
        for (ii=0; ii < lkcd->num_zones; ii++) {
            if (lkcd->zones[ii].start == zone) {
                if (lkcd->zones[ii].pages[page].offset != 0) {
                    if (lkcd->zones[ii].pages[page].offset != off) {
                        error(INFO, "conflicting page: zone %lld, "
                            "page %lld: %lld, %lld != %lld\n",
                            (unsigned long long)zone,
                            (unsigned long long)page,
                            (unsigned long long)paddr,
                            (unsigned long long)off,
                            (unsigned long long)lkcd->zones[ii].pages[page].offset);
                        abort();
                    }
                    ret = 0;
                } else {
                    lkcd->zones[ii].pages[page].offset = off;
                    ret = 1;
                }
                break;
            }
        }

The printf looks a bit like my coding style, though I don't know
why (I ?) decided to abort() in this case. I suppose the idea is
to look at the situation with gdb on the resulting core file.
The call to abort() above kills the crash session, which is both
annoying and unnecessary.
Isn't it worthwhile to look at the core file to understand why
abort() was called?
I am seeing it in a dumpfile from a customer who has their own dumping
scheme based upon LKCD version 7. I understand that this may be a
problem with their LKCD port, but nonetheless, it's the only place in
the crash utility that doesn't recover gracefully from dumpfile access
errors.
Anyway, I would like to either:
1. change the error(INFO...) to error(FATAL...) so that run-time
commands encountering this error will just fail, and the session
will return to the crash> prompt, or
2. return 0, so that a "seek error" can be subsequently displayed
by the readmem() command.
Number 2 is preferable, because it yields more clues as to where the
readmem() came from, but since I don't know much about the LKCD
physical memory zones stuff, is there any reason that shouldn't
be done?
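
A minimal sketch of what option 2 could look like, written against
cut-down stand-in types rather than the real lkcd structures. The
names sketch_zone, sketch_page and sketch_save_offset are hypothetical,
the -1 return for an unknown zone is an assumption, and fprintf()
stands in for crash's error(INFO, ...). On a conflict it reports the
mismatch, keeps the offset that was recorded first, and returns 0
instead of calling abort():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-ins for the lkcd zone tables. */
struct sketch_page {
    unsigned long long offset;      /* 0 means "not recorded yet" */
};

struct sketch_zone {
    unsigned long long start;       /* zone identifier */
    size_t num_pages;
    struct sketch_page *pages;
};

/*
 * Record the dumpfile offset of one page.
 * Returns 1 if the offset was newly recorded, 0 if the page already
 * had an offset (matching or conflicting), and -1 if no zone matches
 * (the -1 case is an assumption of this sketch).
 */
static int sketch_save_offset(struct sketch_zone *zones, size_t num_zones,
                              unsigned long long zone, size_t page,
                              unsigned long long off)
{
    size_t ii;

    for (ii = 0; ii < num_zones; ii++) {
        struct sketch_page *p;

        if (zones[ii].start != zone)
            continue;

        assert(page < zones[ii].num_pages);
        p = &zones[ii].pages[page];

        if (p->offset == 0) {
            p->offset = off;        /* first sighting of this page */
            return 1;
        }

        if (p->offset != off) {
            /* Report the conflict, but let the session continue. */
            fprintf(stderr,
                    "conflicting page: zone %llu, page %zu: %llu != %llu\n",
                    zone, page, off, p->offset);
        }
        return 0;
    }
    return -1;
}
```

Whether returning 0 here really leads to the "seek error" from
readmem() depends on how the callers of save_offset() treat the
return value, which this sketch does not model.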
How about having a crash debug flag and only calling abort() if the
debug flag is set? You might print in the error message that the
user can force a core dump by adding a '-d' flag on invocation of
crash, and can then send you the core file.
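
The debug-flag idea could be sketched roughly as follows. The function
name sketch_report_conflict and its parameters are hypothetical, and
how the debug value would be wired to the '-d' option inside crash is
an assumption that is not shown here:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Report a conflicting page. With debug set, abort() so that a core
 * file is left behind for inspection with gdb; otherwise tell the
 * user how to get one and return 0 so the caller can carry on.
 */
static int sketch_report_conflict(FILE *out, int debug,
                                  unsigned long long zone,
                                  unsigned long long page)
{
    assert(out != NULL);

    fprintf(out, "conflicting page: zone %llu, page %llu\n", zone, page);

    if (debug)
        abort();                    /* deliberate core dump */

    fprintf(out, "re-run crash with -d to force a core dump "
                 "for this error\n");
    return 0;
}
```

In normal use the session survives the error; only a user who asks
for debugging gets the abort() and the resulting core file.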
While I've got your attention: I'm upgrading our 2.6.12-stable kernel to
2.6.16-stable and want to start supporting core dumps. Ideally I'd like to
have core dumps that are compatible with gdb and crash. Can crash
handle the ELF core files generated by KEXEC/KCORE? Last time I thought
about this, I recall there being incompatibilities, and it getting worse
with kernels being compiled to be relocatable and kgdb having a problem
because it wasn't aware of the relocation.
When you have some free time, could you let me know your current opinion
of the state of these compatibility issues and where it's going?
-piet
Thanks,
Dave