Richard, the SS is "bogus" because it is NOT saved by the processor unless there
is a privilege level change with the exception, and in this case there was no privilege
change. I think you'll find SS is valid when a fault occurs in user land, resulting in
a priv change as we enter the kernel.
And NO.. I don't want to see a different format for priv-level-change vs
non-priv-level-change exceptions, as this makes it harder to post-process with perl,
etc.
As for the error code, I don't know why it would be replaced with -1.
- jim
On Tue, 25 Sep 2007 22:58:44 +0100
Richard J Moore <richardj_moore(a)uk.ibm.com> wrote:
I've been puzzling over why the regs formatted with a backtrace on an IA32
dump are invalid. Here's what I mean:
PID: 2692 TASK: f4656630 CPU: 0 COMMAND: "rmmod"
#0 [f463ce54] crash_kexec at c044a1f7
#1 [f463ce9c] die at c040651a
#2 [f463ced4] do_page_fault at c0603107
#3 [f463cf14] error_code (via page_fault) at c060190a
EAX: 00000018 EBX: f8b43400 ECX: f8b4304f EDX: 00200000
DS: 007b ESI: 00000000 ES: 007b EDI: 00000000
SS: 304f ESP: f8b4302b EBP: f463c000
CS: 0060 EIP: f8b43004 ERR: ffffffff EFLAGS: 00210286
They are supposed to represent a valid set of regs that are presented to
do_page_fault, which I presume are meant to be valid at the time the
exception occurred.
Of course they can never be a set of valid regs, for the simple reason that
the CPL is 0 (CS=60) and the RPL of SS is 3, which is an automatic GPF.
Since I manufactured the exception that caused this dump, by causing an
unrecoverable page fault in ring 0, I know that CS is correct but SS is
bogus.
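To make the selector arithmetic concrete, here is a small illustrative sketch in Python. The helper names are hypothetical (they come from neither crash nor the kernel). It splits a selector into its fields and applies the rule that the RPL of SS has to match the CPL, taking the CPL from the low two bits of CS:

```python
def decode_selector(sel):
    """Split a 16-bit x86 segment selector into its three fields."""
    return {
        "index": (sel >> 3) & 0x1FFF,  # descriptor table index
        "ti": (sel >> 2) & 1,          # table indicator: 0 = GDT, 1 = LDT
        "rpl": sel & 3,                # requested privilege level
    }


def ss_consistent_with_cs(cs, ss):
    """Return True if SS.RPL equals the CPL implied by CS (its low two bits)."""
    return decode_selector(ss)["rpl"] == decode_selector(cs)["rpl"]


# Values taken from the register dump above.
print(decode_selector(0x0060))
print(decode_selector(0x304F))
print(ss_consistent_with_cs(0x0060, 0x304F))
```

For the dump above, CS=0060 decodes to RPL 0 and SS=304f to RPL 3, so the consistency check fails.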
Furthermore the error code (ERR), which is stored by the processor as
part of the exception stack frame, uses only bits 0-2 for page faults and
at most bits 0-15 for other exceptions; the unused bit positions are zero.
So ERR is also bogus.
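The same reasoning can be written as a quick plausibility check. This is only a sketch under the assumption stated above, namely that a hardware-supplied error code never has bits 16-31 set; the function names are made up for illustration:

```python
def decode_page_fault_error(err):
    """Decode bits 0-2 of an IA32 page-fault error code."""
    return {
        "protection_violation": bool(err & 0x1),  # bit 0: 0 = page not present
        "write": bool(err & 0x2),                 # bit 1: 0 = read access
        "user_mode": bool(err & 0x4),             # bit 2: 0 = supervisor mode
    }


def looks_like_hardware_error_code(err):
    """Assumption from the text: a hardware error code leaves bits 16-31 zero."""
    return (err & 0xFFFF0000) == 0


# ERR value taken from the register dump above.
print(looks_like_hardware_error_code(0xFFFFFFFF))
print(decode_page_fault_error(0x2))
```

With ERR: ffffffff from the dump the check returns False, which is what you would expect if the slot has been overwritten with -1.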
On looking at the code in entry.S at page_fault and the other exception
entry points I see no attempt to save regs to create a pt_regs struct. The
fact that do_page_fault takes pt_regs as the first arg is a hack to get at
CS:EIP and SS:ESP at the time of exception. Furthermore error_code loads
the exception error code into edx then wipes it out from the stack by
storing -1 into this location. I can't actually see a good reason for
wiping out the error code. By convention exceptions and interrupts have a
-ve integer stored at the error-code location to distinguish them from
system calls, but I don't think this is used. signal.c seems to be the
only place to look for an error code >=0, but I don't see how an exception
affects signal.c.
Can anyone confirm whether setting the error code to -1 is essential? If
it isn't then I think we should consider leaving the original error code in
place.
The long and short of it is: the only things that have any meaning are CS,
EIP and EFLAGS, all of which are saved by the processor. SS and ESP are
only saved when the exception occurred at a privilege level >0, but these
can never generate a panic.
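As a rough sketch of that rule (protected mode only, ignoring virtual-8086 mode; the function name is hypothetical), the set of registers the processor itself pushes can be derived from the saved CS:

```python
def hardware_saved_regs(cs):
    """List the registers the CPU pushes for an exception, given the saved CS.

    The CPL at the time of the exception is taken from the low two bits of CS.
    SS and ESP are only pushed when the exception causes a privilege change,
    i.e. when the interrupted code was not already running in ring 0.
    """
    cpl = cs & 3
    regs = ["EIP", "CS", "EFLAGS"]
    if cpl != 0:
        regs += ["ESP", "SS"]
    return regs


print(hardware_saved_regs(0x0060))  # kernel-mode fault, as in the dump above
print(hardware_saved_regs(0x0073))  # an example selector with RPL 3
```

For CS=0060 this yields only EIP, CS and EFLAGS; for a selector with RPL 3 it adds ESP and SS.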
I'd recommend that we change the bt output to format only the three valid
regs (plus possibly SS and ESP, if the CPL at the time of the exception was
>0). Is there any reason why this shouldn't be changed?
Richard