This problem has been reported before, but the discussion drifted off
track and I don't think that anyone really found the root cause.
The problem is that the x86 backtrace functionality in crash depends
upon the struct pt_regs taken from <asm/ptrace.h> at compile time, and
struct pt_regs changed in 2.6.20. As a result, if crash is compiled on
2.6.20 or later and subsequently used to look at a 2.6.19 or earlier
dump, exception frames are displayed incorrectly and backtraces stop at
them.
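To illustrate why a compile-time struct pt_regs causes this, here is a
minimal sketch. The two layouts are simplified stand-ins and not copies
of the kernel headers: I am assuming for illustration that the change
amounts to one extra 32-bit slot in front of orig_eax. The struct names,
the extra_seg field and the helper functions are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the older i386 register frame (assumption). */
struct regs_old {
    int32_t ebx, ecx, edx, esi, edi, ebp, eax;
    int32_t xds, xes;
    int32_t orig_eax, eip, xcs, eflags, esp, xss;
};

/* Same frame with one extra slot before orig_eax (assumption). */
struct regs_new {
    int32_t ebx, ecx, edx, esi, edi, ebp, eax;
    int32_t xds, xes;
    int32_t extra_seg;          /* hypothetical name for the added field */
    int32_t orig_eax, eip, xcs, eflags, esp, xss;
};

/* Bytes by which EIP moves between the two layouts. */
static long eip_shift(void)
{
    return (long)offsetof(struct regs_new, eip) -
           (long)offsetof(struct regs_old, eip);
}

/* Difference in total frame size between the two layouts. */
static long frame_size_delta(void)
{
    return (long)sizeof(struct regs_new) - (long)sizeof(struct regs_old);
}

/* Fields in front of the inserted slot keep their offsets. */
static void check_shared_prefix(void)
{
    assert(offsetof(struct regs_old, ebp) == offsetof(struct regs_new, ebp));
    assert(offsetof(struct regs_old, xes) == offsetof(struct regs_new, xes));
}
```

A tool built against one layout and pointed at a frame saved with the
other would read every field after the inserted slot from the wrong
address, and would also misjudge where the frame begins or ends.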
Here is an example of a 2.6.22-compiled crash displaying a trace from a
RHEL5 (2.6.18) dump:
crash> bt
PID: 3490 TASK: f7f5a000 CPU: 0 COMMAND: "insmod"
#0 [f664ddd0] crash_kexec at c0441c78
#1 [f664de14] die at c04064a4
#2 [f664de44] do_page_fault at c0605eea
#3 [f664de94] error_code (via page_fault) at c0405a6f
EAX: 00000000 EBX: f8dd3400 ECX: 00200082 EDX: 00200000
DS: 007b ESI: f7bbeab0 ES: 007b EDI: f7bbe800
SS: ffffe800 ESP: 00000000 EBP: f7bbead8
CS: 0060 EIP: f8dd300d ERR: ffffffff EFLAGS: 00210296
crash>
Note that in the above, crash treats the exception frame as a user-mode
frame rather than a kernel-mode one (hence the SS and ESP fields).
If crash is instead compiled on RHEL5 (2.6.18), the trace looks like this:
crash> bt
PID: 3490 TASK: f7f5a000 CPU: 0 COMMAND: "insmod"
#0 [f664ddd0] crash_kexec at c0441c78
#1 [f664de14] die at c04064a4
#2 [f664de44] do_page_fault at c0605eea
#3 [f664de94] error_code (via page_fault) at c0405a6f
EAX: 00000000 EBX: f8dd3400 ECX: 00200082 EDX: 00200000 EBP: f7bbead8
DS: 007b ESI: f7bbeab0 ES: 007b EDI: f7bbe800
CS: 0060 EIP: f8dd300d ERR: ffffffff EFLAGS: 00210296
#4 [f664dec8] function2 at f8dd300d
#5 [f664dee0] sys_init_module at c043e717
#6 [f664dfb8] system_call at c0404ef8
EAX: ffffffda EBX: 0861a028 ECX: 00010144 EDX: 0861a018
DS: 007b ESI: 00000000 ES: 007b EDI: 00307ff4
SS: 007b ESP: bfe5695c EBP: bfe569a8
CS: 0073 EIP: 00d37402 ERR: 00000080 EFLAGS: 00200206
crash>
A similar problem happens if crash is compiled on a pre-2.6.20 kernel and
then used to analyse a 2.6.20 or later dump.
Dave, I have attached a patch to this e-mail which removes the
dependence upon <asm/ptrace.h> from lkcd_x86_trace.c (which, by the way,
is used for non-LKCD dumps as well as LKCD dumps). I notice that
eframe_init() in x86.c initialises several variables which correspond to
the members of struct pt_regs, so I've had to make these external for
lkcd_x86_trace.c's use. I have no problem with this being reworked if
you feel that these symbols really should be in defs.h (or with any
other rework that you think fit, for that matter).
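As a rough illustration of the idea (this is not the attached patch),
the sketch below reads registers from a raw exception frame using
offsets supplied at run time instead of a compiled-in struct pt_regs.
The struct eframe_offsets table and both function names are
hypothetical. The kernel/user test is simplified to the privilege bits
of CS and ignores vm86 mode, and the code assumes the host has the same
byte order as the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table of byte offsets, filled in at run time from the
 * dump's debug information rather than from <asm/ptrace.h>. */
struct eframe_offsets {
    size_t eip;     /* offset of the saved EIP */
    size_t xcs;     /* offset of the saved CS */
    size_t size;    /* total size of the saved frame */
};

/* Read a 32-bit value from a raw frame at a run-time offset. */
static uint32_t eframe_read32(const unsigned char *frame, size_t offset)
{
    uint32_t value;
    memcpy(&value, frame + offset, sizeof value);  /* safe for any alignment */
    return value;
}

/* Treat the frame as kernel mode when the low two bits of CS are zero. */
static int eframe_is_kernel(const unsigned char *frame,
                            const struct eframe_offsets *off)
{
    assert(off->xcs + sizeof(uint32_t) <= off->size);
    return (eframe_read32(frame, off->xcs) & 3) == 0;
}
```

With the values from the traces above, a CS of 0060 is classified as a
kernel frame and a CS of 0073 as a user frame, whichever kernel crash
itself was built on, provided the offsets describe the dumped kernel.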
Regards,
Alan Tyson, HP.