Castor Fu wrote:
On Wed, 9 Nov 2005, Dave Anderson wrote:
> Castor Fu wrote:
>
>> Hi guys:
>>
>> We're trying to bring in a newer version of crash, and have
>> some changes which we've made since going to version 4.0-2.10
>> which I'll outline below, and have attached a patch:
>>
>> 1. Makefile, help.c, extensions.c, new file kw.c
>> Add a keyword expansion mechanism to the 'extend' command.
>> We did this primarily to let us load different extensions
>> based on the dump file itself or the version of crash.
>>
>> +"\n The named <shared-object> is subjected to a simple keyword
expansion",
>> +" scheme with the following rules:",
>> +" $(crash_version) version of crash",
>> +" $(corestr-<varname>) string named <varname> taken
from the core file",
>> +" $(builddir) build directory for the kernel",
>
> I still have no idea what this does? Can you expand upon how this works,
> what does it do, what does it looks like, can it be ignored, etc? Couldn't
> this kind of thing can be self-contained in each extension module?
> It just seems to be way too "implementation-specific" for lack of a
> better term?
Consider a set of extension commands which rely on some data structure
which tends to change from version to version of say, a device driver.
The extension code could, with a great deal more work, be written
to extract structure offsets, etc. from the debugging information,
but it's much easier to just '#include' the header file.
This way, one could have something like
extend /usr/lib/crash/fc$(corestr-fc_version).o
in one's .crashrc, and the appropriate extensions could be loaded without
the person analyzing the crash having to figure out which version it came from
right away.
Wow. You guys have gone way beyond whatever I could have
imagined for extensions!
I'm going to have to let this one stew for a while...
[ Along these lines, I'm trying to look at a RHEL diskdump and I must
be missing the point about how one pulls together the modules to
look at the dump. I have the rpm kernel-debuginfo-2.6.9-22.EL loaded, and all the
kernel modules have .debug appended on the filename so 'mod -S' doesn't work.
]
Not sure why it doesn't work for you. The module.ko files come with the base
kernel package. The debuginfo package just supplements them with debug data.
One could also implement this with an expect wrapper which would give one much
more generality, but then everyone is going to end up with their own little
wrapper....
...
>> 6. task.c
>> Be more defensive about the value of tc->processor
>>
>
> Why is it necessary to do a verify_task() again -- it had to
> have gone through verify_task() in order to get here, right?
> What exactly did you see that required this?
We've seen dumps where the task structures get corrupted by
other bad pointers, and crash tended to just give up the ghost
because of dereferencing tc->processor which would get
set to some ridiculous value.
I think I've seen errors at both these spots, but one or the
other may have been from searching through the code for tc->processor.
Ok -- I figured that it was due to some unnatural dumpfile.
Anyway, it's harmless, and if it works in those cases, that's fine.
>
> --- crash-4.0-2.10/task.c 2005-11-07 07:44:06 -08:00
> +++ crash-4.0-2.10-new/task.c 2005-11-07 18:57:19 -08:00
> @@ -2942,7 +2942,7 @@
> fprintf(fp, "(SYSRQ)");
> else if (machdep->flags & INIT)
> fprintf(fp, "(INIT)");
> - else if (kt->cpu_flags[tc->processor] & NMI)
> + else if (verify_task(tc, 2) &&
kt->cpu_flags[tc->processor] & NMI)
> fprintf(fp, "(NMI)");
> else if (tc->task == tt->panic_task)
> fprintf(fp, "(PANIC)");
>
> This I'm presuming is related to the above, but again, what did you
> see that requires this?:
>
> @@ -4337,7 +4340,8 @@
>
> tc = FIRST_CONTEXT();
> for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
> - if (task_has_cpu(tc->task, NULL)) {
> + if (task_has_cpu(tc->task, NULL)
> + && tc->processor >= 0 && tc->processor
< NR_CPUS) {
> tt->panic_threads[tc->processor] = tc->task;
> found++;
> }
>
>
>> Also, try to set the panic string correctly in when decoding dump.
>
> This is LKCD specific anyway, right?
I don't think so. in get_panicmsg() it looks like there's code which
is trying to get the panicmsg from either tt_panicmsg or LKCD.
This string will get overwritten by looking through the msgbuf...
Yeah -- I should have mentioned that the get_panicmsg() code is only
satisfied by ancient MCLX-style dumpfiles. I'm pretty sure nobody
still has any of those...
So it is LKCD only, which is fine.
>> 7. x86.c
>>
>> Be a little more liberal deciding that the frame pointer has been
included.
>> gcc 3.x can re-schedule code so that instead of
>>
>> push bp
>> mov esp, bp
>>
>> we sometimes see
>>
>> push bp
>> xor eax,eax
>> move esp, bp
>>
>> so detect both sequences.
>
> This looks reasonable. Didn't think anybody built kernels that way anymore...
You mean nobody builds with frame pointers anymore? I guess we do because it's
easier
to debug. Having 'bt' pull out the arguments on the stack is nice. If we can
get
similar functionality without frame pointers, we'd be happy to chuck them...
-castor
Thanks,
Dave