----- "Sharyathi Nagesh" <sharyath(a)in.ibm.com> wrote:
Dave
Attaching the patches that we have developed till now.
We have tried to accommodate your suggestions regarding Makefiles in
these patches. We have tried to provide unwinding support on the ppc64, x86_64
and x86 architectures. The feature is tested on ppc64 with dumps taken through
panic and echo c > /proc/sysrq-trigger. We have observed that many times
the variable information is optimized out and accessing variable
information is not possible (volatile variables are shown correctly).
IMHO this behavior is similar to that shown by gdb; please share your thoughts
on this. This implementation works reasonably on ppc64: we were able to unwind
the stack and get variable information. But we are still facing problems with the x86
and x86_64 architectures. Our generic implementation for stack unwinding using dwarf
is not succeeding, since the frame pointer is optimized out by default. IMHO stack
unwinding as described in the ABIs is not valid for the same reason. But if you
enable CONFIG_FRAME_POINTER in the kernel, then unwinding using bp will be possible.
Please let me know your thoughts in general.
Well, it's not like I didn't warn you several months ago when you decided
to undertake this task...
My general thoughts are still the same:
- Testing with just panic() and "echo c > /proc/sysrq-trigger" only covers pretty
much exceptional ways of the kernel being shut down. Typically a "real"
crash is due to a BUG() or segmentation violation, or some scenario where
an exception frame is laid down on the stack -- and perhaps a switch is
made to another kernel stack entirely. Coming from panic() or "echo c",
the trace will stay on the kernel stack and no exception will be raised.
(Unless you're using a RHEL4 or earlier kernel that called BUG() if either
netdump or diskdump was in play.) In any case, without an exception, you're
making it "easy" for the unwinder. Your code is going to have to deal with
exception frames on the stack and with potential stack-switches.
- These days you'd be hard-pressed to find a distribution kernel that is
compiled with CONFIG_FRAME_POINTER. And if it is, the handling can
be folded into the currently-existing backtracer. (In fact, the original
x86 backtrace code does have a separate code path for kernels that
were built without -fomit-frame-pointer.) But by the time x86_64 came
around, -fomit-frame-pointer was pretty much the default, and I don't think
I've ever seen an x86_64 dumpfile with frame-pointers, certainly not
from a distributor. So, yes it's nice you've got something that works
with that configuration, but it's pretty unrealistic.
- The problems you're running into getting local variables are not surprising.
And without a rock-solid backtrace as a prerequisite, it doesn't make any
sense to even try getting local variables.
- Your dependency on getting the starting register set from the ELF header
presumes just that -- i.e., that you've got them and that they are meaningful.
I'm not sure how/if you've worked around the "hand-carved" pt_regs created
when no exception frame has been laid down (i.e. when panic() or "echo c"
are used.) But the bigger picture is that there are multiple types of
dumpfiles, and while kdump ELF vmcores are prevalent, they are often only
the starting point. With the huge cumbersome memory sizes of modern machines,
it's more likely that the ELF vmcore seen by the secondary kdump kernel will
be run through "makedumpfile -c ..." into the compressed kdump format
prior to the dump ever being seen by whoever analyzes it. At Red Hat, the
support organization pretty much makes all customers use "makedumpfile -c ..."
by default. And of course with the compressed kdump format, there is no
register set as your starting point.
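
To make the register-set dependency above concrete, here is a minimal sketch of
looking for a per-cpu NT_PRSTATUS note in the raw bytes of an ELF PT_NOTE
segment. It is not the code in netdump.c. The names find_prstatus, note_hdr and
NOTE_TYPE_PRSTATUS are made up for illustration, and the sketch assumes the
usual note layout (header, then name and descriptor, each padded to 4 bytes)
with header fields in host byte order and the "CORE" owner name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_TYPE_PRSTATUS 1u  /* same value as NT_PRSTATUS in <elf.h> */

/* Hypothetical mirror of the ELF note header: three 32-bit words. */
struct note_hdr {
    uint32_t namesz;  /* length of the owner name, including the NUL */
    uint32_t descsz;  /* length of the descriptor (the payload) */
    uint32_t type;    /* note type, e.g. NT_PRSTATUS */
};

/* Round n up to the next multiple of 4 (note padding). */
size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Scan a PT_NOTE buffer for the index-th "CORE"/NT_PRSTATUS note.
 * Returns a pointer to its descriptor and stores the descriptor size in
 * *descsz, or returns NULL when there is no such note, in which case the
 * caller has no register set to start from.
 */
const uint8_t *find_prstatus(const uint8_t *buf, size_t len,
                             unsigned index, size_t *descsz)
{
    size_t off = 0;
    unsigned seen = 0;

    assert(buf != NULL && descsz != NULL);
    while (off + sizeof(struct note_hdr) <= len) {
        struct note_hdr h;
        size_t name_off, desc_off;

        memcpy(&h, buf + off, sizeof h);
        if (h.namesz > len || h.descsz > len)
            break;                          /* corrupt header */
        name_off = off + sizeof h;
        desc_off = name_off + align4(h.namesz);
        if (desc_off + h.descsz > len)
            break;                          /* truncated note */

        if (h.type == NOTE_TYPE_PRSTATUS && h.namesz == 5 &&
            memcmp(buf + name_off, "CORE", 5) == 0) {
            if (seen == index) {
                *descsz = h.descsz;
                return buf + desc_off;
            }
            seen++;
        }
        off = desc_off + align4(h.descsz);  /* step to the next note */
    }
    return NULL;
}
```

The NULL return is the case that matters for this discussion: a caller has to
be prepared to start the backtrace without any saved registers at all.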
That all being said, here's what I can do for you. I will take your new
functions that you've added to netdump.c, and their protos in defs.h,
and apply them to the next crash utility release. Since they are not being
used by anybody but your module at the moment, it's harmless to add them,
and then you will not have to patch the crash sources at all in your
subsequent module patch/posts.
Dave
Here I am attaching 3 patches
1. Display-local-variables-and-function-parameters.patch
Provides feature to print local variables and arguments in the current
stack frame using dwarf information.
2. Provide-stack-unwinding-feature.patch
Provides option to unwind the stack using dwarf information; works
on ppc64
3. unwind_x86_64.patch
Provides unwinding feature in x86_64 without using dwarf information,
but requires CONFIG_FRAME_POINTER to be enabled
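
For readers who have not seen bp-based unwinding, the idea behind the third
patch can be sketched roughly as below. This is not the code in
unwind_x86_64.patch. The stack_image structure and the read_word and fp_unwind
functions are hypothetical, and the sketch assumes the usual x86_64 layout on a
CONFIG_FRAME_POINTER kernel: the word at bp holds the caller's saved bp and the
word at bp + 8 holds the return address. It walks a single stack only and does
not handle exception frames or stack switches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one kernel stack copied out of a dumpfile. */
struct stack_image {
    uint64_t base;          /* virtual address of words[0] (lowest address) */
    size_t nwords;          /* number of 8-byte words in the image */
    const uint64_t *words;  /* stack contents */
};

/* Read the 8-byte word at addr; 0 on success, -1 if out of range or misaligned. */
int read_word(const struct stack_image *s, uint64_t addr, uint64_t *val)
{
    uint64_t idx;

    if (addr < s->base || (addr - s->base) % 8 != 0)
        return -1;
    idx = (addr - s->base) / 8;
    if (idx >= s->nwords)
        return -1;
    *val = s->words[idx];
    return 0;
}

/*
 * Follow the saved-bp chain starting at bp and store up to max return
 * addresses in ret[]. Returns the number of frames found.
 */
size_t fp_unwind(const struct stack_image *s, uint64_t bp,
                 uint64_t *ret, size_t max)
{
    size_t n = 0;

    assert(s != NULL && ret != NULL);
    while (n < max) {
        uint64_t next_bp, ra;

        if (read_word(s, bp, &next_bp) != 0 ||
            read_word(s, bp + 8, &ra) != 0)
            break;              /* bp left the stack: stop */
        ret[n++] = ra;
        if (next_bp <= bp)      /* callers live at higher addresses */
            break;
        bp = next_bp;
    }
    return n;
}
```

The check that each saved bp is strictly higher than the previous one is a
cheap guard against looping forever on a corrupted chain.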