Hi, I've found a problem in crash 4.0-2.
On an x86_64 system, crash gets a segmentation fault
when "bt -f" is executed on a dumpfile created by NMI.
My system is as follows.
CPU : AMD Opteron(tm) Processor 252
arch : x86_64
memory : 16GB
kernel : 2.6.9-22.EL (RHEL4-U2)
crash : 4.0-2 (RHEL4-U2)
diskdumputils: 1.1.9-4 (RHEL4-U2)
The reproduction steps are as follows.
1. Boot the x86_64 kernel.
2. Start the diskdump service.
3. Trigger diskdump by pushing the NMI button.
4. Reboot the x86_64 kernel.
5. Get the dumpfile by starting the diskdump service.
6. Start crash and execute "bt -f" on the dumpfile.
7. crash gets a segmentation fault after printing the exception stack.
After printing the NMI exception frame, x86_64_low_budget_back_trace_cmd
calculates the next bt->frameptr without changing RSP. This makes the
condition
bt->frameptr > rsp
true at x86_64.c:1097 in x86_64_display_full_frame, so the loop that
follows keeps running until it ends with a segmentation fault.
The attached patch adds a sanity check (bt->frameptr < rsp) to
x86_64_display_full_frame.
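
As a rough illustration of the idea behind the check (this is not the attached patch itself), here is a minimal sketch. The helper name words_to_dump and the way the loop bound is computed are assumptions made for this example; only the comparison between bt->frameptr and rsp comes from the description above.

```c
#include <stddef.h>

typedef unsigned long ulong;

/*
 * Hypothetical helper, not taken from x86_64.c.
 * Returns the number of stack words from frameptr up to and including
 * rsp. When frameptr is above rsp the frame is inconsistent, so 0 is
 * returned and a caller looping "for (i = 0; i < words; i++)" prints
 * nothing instead of walking past the end of the stack buffer.
 */
static ulong words_to_dump(ulong frameptr, ulong rsp)
{
    if (frameptr > rsp)     /* sanity check: refuse an inverted range */
        return 0;
    return (rsp - frameptr) / sizeof(ulong) + 1;
}
```

With such a guard, an inverted range like the one produced after the NMI frame yields no words to print rather than an out-of-bounds walk.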
The following example shows how the problem arises when an NMI occurs
within "default_idle".
@x86_64_low_budget_back_trace_cmd (x86_64.c:1367)
1. About the exception stack (x86_64.c:1416)
  a. Print the exception stack.
  b. Print the register info (RIP, RSP) from the exception stack as the
     function that was running before the NMI exception.
     RIP points to text in "default_idle".
     But the area pointed to by RSP holds an address in the text of
     "cpu_idle", because RSP does not change while "default_idle" is
     running.
  c. Set bt->frameptr = RSP + sizeof(ulong).
2. About the process stack (x86_64.c:1655)
  a. Try to print the stack of "cpu_idle" in x86_64_display_full_frame.
  b. bt->frameptr > RSP because of step 1.c.
  c. This causes the segmentation fault.
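
The arithmetic behind steps 1.c and 2.b can be sketched as follows. This is a simplified model, not code from crash: the helper names are hypothetical, the RSP value used in the note below is made up, and the form of the unchecked loop bound is an assumption about how the number of words to print is derived.

```c
#include <limits.h>

typedef unsigned long ulong;

/* Step 1.c: frame pointer derived from the RSP saved in the NMI frame. */
static ulong next_frameptr(ulong rsp)
{
    return rsp + sizeof(ulong);
}

/*
 * Assumed form of an unchecked loop bound. Because the operands are
 * unsigned, rsp - frameptr wraps around to a huge value whenever
 * frameptr > rsp, instead of becoming negative.
 */
static ulong unchecked_words(ulong frameptr, ulong rsp)
{
    return (rsp - frameptr) / sizeof(ulong) + 1;
}
```

For an illustrative rsp of 0x1000, next_frameptr() returns a value one word above rsp, and unchecked_words() then evaluates to ULONG_MAX / sizeof(ulong) + 1 words. A loop with that bound would run far past the stack buffer, which matches the segmentation fault seen in step 2.c.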
Ken'ichi Ohmichi