----- "Sharyathi Nagesh" <sharyath(a)in.ibm.com> wrote:
Dave
Excuse me for overlooking this part of the code. I am attaching a fix
for it; I hope this fixes the issue.
That one looks better...
Dave, I have a few observations regarding the points you raised:
Regarding makedumpfile -c stripping the ELF notes for pt_regs:
makedumpfile does not save much by stripping the pt_regs ELF notes.
They amount to roughly ~256 bytes * number of CPUs, which is negligible.
We will discuss with the makedumpfile developers the possibility
of retaining this ELF note information.
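To put the estimate above in numbers, here is a trivial C helper. The 256-byte per-CPU figure is only the rough estimate quoted in this message, not a measured note size. On a 64-CPU system it comes to 16384 bytes (16 KiB), which is small next to the memory pages that make up the dump itself.

```c
#include <assert.h>
#include <stddef.h>

/* Rough per-CPU size of the pt_regs ELF note, taken from the "~256 bytes"
 * estimate in the discussion; an assumption, not a measured value. */
#define APPROX_NOTE_BYTES_PER_CPU 256u

/* Bytes that would be saved by stripping the per-CPU pt_regs notes. */
static size_t note_overhead_bytes(unsigned int ncpus)
{
    assert(ncpus > 0);  /* a dump always covers at least one CPU */
    return (size_t)APPROX_NOTE_BYTES_PER_CPU * ncpus;
}
```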
Regarding CONFIG_FRAME_POINTER:
We understand this is disabled by default so as to free one more
register, bp, for general-purpose use.
Ideally this information should be saved in the DWARF sections, so
in theory we should be able to unwind an x86/x86_64 dump even
without CONFIG_FRAME_POINTER. However, stack unwinding there is
not as direct as it is on ppc64, so we are re-examining this
implementation.
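Here is a heavily simplified sketch of what unwinding without frame pointers involves on x86_64. Rather than following a saved bp chain, the unwinder does three things for each frame:

1. Look up the current PC in the call-frame information.
2. Compute the canonical frame address (CFA) from SP.
3. Read the return address at CFA - 8.

The `cfi_row` table, the `fake_stack` memory model and all other names are hypothetical. Real .eh_frame/.debug_frame data is a per-function program of DWARF CFA instructions, not a flat table with one offset per function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DWARF call-frame information: for PCs in
 * [pc_lo, pc_hi), CFA = SP + cfa_offset and the return address is stored
 * at CFA - 8 (the x86_64 convention after a CALL). */
struct cfi_row {
    uint64_t pc_lo, pc_hi;
    uint64_t cfa_offset;
};

/* Simulated stack memory: word i lives at address base + 8*i. */
struct fake_stack {
    const uint64_t *words;
    size_t nwords;
    uint64_t base;
};

/* Read one 64-bit word; returns -1 if addr is unaligned or off the stack. */
static int read_word(const struct fake_stack *s, uint64_t addr, uint64_t *out)
{
    if (addr < s->base || (addr - s->base) % 8 != 0 ||
        (addr - s->base) / 8 >= s->nwords)
        return -1;
    *out = s->words[(addr - s->base) / 8];
    return 0;
}

/* Linear lookup of the CFI row covering pc, or NULL if there is none. */
static const struct cfi_row *find_row(const struct cfi_row *rows, size_t nrows,
                                      uint64_t pc)
{
    for (size_t i = 0; i < nrows; i++)
        if (pc >= rows[i].pc_lo && pc < rows[i].pc_hi)
            return &rows[i];
    return NULL;
}

/* Unwind from (pc, sp) using only CFI, no frame pointer; records up to
 * max PCs and returns how many frames were found. */
static size_t cfi_unwind(const struct cfi_row *rows, size_t nrows,
                         const struct fake_stack *s, uint64_t pc, uint64_t sp,
                         uint64_t *pcs, size_t max)
{
    size_t n = 0;

    assert(max == 0 || pcs != NULL);
    while (n < max) {
        const struct cfi_row *r;
        uint64_t cfa, ra;

        pcs[n++] = pc;
        r = find_row(rows, nrows, pc);
        if (r == NULL)
            break;                  /* no unwind info for this PC: give up */
        cfa = sp + r->cfa_offset;   /* caller's SP just before its CALL */
        if (read_word(s, cfa - 8, &ra) != 0)
            break;                  /* return-address slot not in the dump */
        pc = ra;
        sp = cfa;                   /* caller's SP once the callee returns */
    }
    return n;
}
```

The walk simply stops at a PC with no unwind information. Hand-written assembly without CFI annotations is one place where that can happen, which is one reason the x86/x86_64 case is less direct than a frame-pointer walk.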
Regarding an exception frame at the top of the stack:
As we understand it, if we have the pt_regs for the topmost stack frame,
we should be able to unwind to the next stack frame even if the topmost
frame is an exception frame, at least on ppc64. We are not sure
about x86 and x86_64, but we can look into that too.
I understand that with the topmost pt_regs you can then start the
backtrace OK. That's not what I'm referring to.
What I'm talking about is bumping into another exception frame
while unwinding from the topmost pt_regs, or what happens
when the crash occurs while running on an alternate kernel
stack.
Just take a simple example: what happens when you actually enter
"alt-sysrq-c" on an x86_64, generating a keyboard interrupt and therefore
a transition to the per-cpu IRQ stack? By the time crash_kexec()
is called, you're already operating on the IRQ stack. Your
code will work its way back to the top of the per-cpu IRQ stack,
but then what does it do? Or suppose the task takes a page fault,
lays down an exception frame, and then later BUG()s out while attempting
to handle the fault. Your code will start from the most recent
exception frame, but will bump into the page fault exception frame
during the unwind operation. Does your code properly recognize the new
exception frame (and the passage through assembly-language code
when that happens)?
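To make the two scenarios above concrete, here is a sketch of a frame-pointer walk that also crosses exception frames and the transition from the IRQ stack back to the process stack. Frame pointers are assumed only for brevity. Everything in the sketch is hypothetical:

- `struct mini_regs` stands in for the much larger pt_regs.
- The exception frame is assumed to sit right above a return address into entry code.
- The interrupted stack pointer is assumed to be saved in the topmost word of the per-CPU IRQ stack.

The point is only that an unwinder has to recognize both events explicitly. A plain bp-chain walk would stop or wander off at either one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified exception frame (the real pt_regs has many more fields). */
struct mini_regs {
    uint64_t ip, sp, bp;
};

/* Flat simulated dump memory: word i lives at address base + 8*i. */
struct mem {
    const uint64_t *w;
    size_t n;
    uint64_t base;
};

/* Hypothetical layout facts the unwinder is assumed to know. */
struct layout {
    uint64_t entry_lo, entry_hi; /* asm entry code that lays down exception frames */
    uint64_t irq_lo, irq_hi;     /* per-CPU IRQ stack, [irq_lo, irq_hi) */
};

/* Read one 64-bit word; returns -1 if addr is unaligned or outside the dump. */
static int rd(const struct mem *m, uint64_t addr, uint64_t *out)
{
    if (addr < m->base || (addr - m->base) % 8 != 0 || (addr - m->base) / 8 >= m->n)
        return -1;
    *out = m->w[(addr - m->base) / 8];
    return 0;
}

/* Load a mini_regs exception frame stored at addr. */
static int load_regs(const struct mem *m, uint64_t addr, struct mini_regs *r)
{
    if (rd(m, addr, &r->ip) || rd(m, addr + 8, &r->sp) || rd(m, addr + 16, &r->bp))
        return -1;
    return 0;
}

/* Walk a frame-pointer chain ([saved bp][return address] at bp) starting from
 * the crash-time registers, recording up to max PCs. Returns frames found. */
static size_t unwind(const struct mem *m, const struct layout *L,
                     struct mini_regs start, uint64_t *pcs, size_t max)
{
    uint64_t pc = start.ip, bp = start.bp;
    size_t n = 0;

    assert(max == 0 || pcs != NULL);
    while (n < max) {
        uint64_t saved_bp, ret;

        pcs[n++] = pc;
        if (rd(m, bp, &saved_bp) || rd(m, bp + 8, &ret))
            break;                      /* bp points outside the dump */

        if (ret >= L->entry_lo && ret < L->entry_hi) {
            /* Returned into entry code: an exception frame must be decoded. */
            uint64_t regs_addr = bp + 16;   /* assumed: frame sits right above */
            struct mini_regs r;

            if (bp >= L->irq_lo && bp < L->irq_hi) {
                /* On the IRQ stack: the interrupted sp is saved in its top word. */
                if (rd(m, L->irq_hi - 8, &regs_addr))
                    break;
            }
            if (load_regs(m, regs_addr, &r))
                break;
            pc = r.ip;                  /* resume in the interrupted context */
            bp = r.bp;
        } else if (ret == 0) {
            break;                      /* end of the chain */
        } else {
            pc = ret;
            bp = saved_bp;
        }
    }
    return n;
}
```

In the alt-sysrq-c case the walk starts on the IRQ stack, reaches the frame that returns into entry code, hops to the interrupted process stack through the saved stack pointer, and continues from the exception frame found there. The nested page-fault case takes the other branch, where the exception frame is found directly above the return address.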
All I'm saying is that basing your test results only on cases where
panic() is called or "echo c ..." was entered covers the most trivial type
of kernel crash, because no kernel exception frame is
laid down until crash_kexec() gets called. That's a fairly rare
occurrence with respect to typical kernel crashes.
Dave