I've been puzzling over why the regs displayed in a backtrace of an IA32 dump are invalid. Here's what I mean:

PID: 2692   TASK: f4656630  CPU: 0   COMMAND: "rmmod"
 #0 [f463ce54] crash_kexec at c044a1f7
 #1 [f463ce9c] die at c040651a
 #2 [f463ced4] do_page_fault at c0603107
 #3 [f463cf14] error_code (via page_fault) at c060190a
    EAX: 00000018  EBX: f8b43400  ECX: f8b4304f  EDX: 00200000
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000000
    SS:  304f      ESP: f8b4302b  EBP: f463c000
    CS:  0060      EIP: f8b43004  ERR: ffffffff  EFLAGS: 00210286


They are supposed to represent the set of regs presented to do_page_fault, which I presume are meant to be valid at the time the exception occurred.
Yet they can never be a valid set of regs, for the simple reason that the CPL is 0 (CS=0060) while the RPL of SS (304f) is 3, which would cause an automatic GPF.
Since I manufactured the exception that caused this dump, by causing an unrecoverable page fault in ring 0, I know that CS is correct but SS is bogus.
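To make the selector argument concrete, here is a small C sketch that decodes the selectors from the dump using the standard IA-32 selector layout (RPL in bits 0-1, TI in bit 2, index in bits 3-15). The helper names are my own invention, not anything in crash or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* IA-32 segment selector fields:
 * bits 0-1 = RPL, bit 2 = TI (0 = GDT, 1 = LDT), bits 3-15 = index. */
static unsigned sel_rpl(uint16_t sel)   { return sel & 0x3u; }
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 0x1u; }
static unsigned sel_index(uint16_t sel) { return sel >> 3; }

/* The CPL of the interrupted code is the RPL of its saved CS.
 * Loading SS with a selector whose RPL differs from the CPL raises #GP,
 * so a (CS, SS) pair failing this test cannot have been live when the
 * exception was taken. */
static int ss_consistent_with_cs(uint16_t cs, uint16_t ss)
{
    return sel_rpl(ss) == sel_rpl(cs);
}
```

For the dump above, CS=0060 decodes to GDT index 12 with RPL 0, while SS=304f decodes to RPL 3 with TI=1 (an LDT selector), so ss_consistent_with_cs(0x0060, 0x304f) returns 0.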
Furthermore, the error code (ERR), which the processor stores as part of the exception stack frame, uses only bits 0-2 for page faults and at most bits 0-15 for other exceptions; the unused bit positions are zero. So ERR (ffffffff) is also bogus.
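A plausibility check on a saved error code might look like the sketch below, which takes the bit ranges in the paragraph above at face value. err_plausible() and PF_VECTOR are hypothetical names, and later processors define additional #PF error-code bits (RSVD in bit 3, for example), so a real check would need a mask matched to the CPU.

```c
#include <assert.h>
#include <stdint.h>

#define PF_VECTOR 14u  /* #PF exception vector */

/* Returns 1 if err could have been pushed by the processor for the given
 * exception vector, using the layout described above: #PF uses only
 * bits 0-2 (P, W/R, U/S), other exceptions use at most bits 0-15. */
static int err_plausible(unsigned vector, uint32_t err)
{
    assert(vector < 32);  /* architecturally defined exception vectors */
    uint32_t allowed = (vector == PF_VECTOR) ? 0x7u : 0xffffu;
    return (err & ~allowed) == 0;
}
```

err_plausible(14, 0xffffffff), the ERR value in the dump, returns 0.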

Looking at the code in entry.S at page_fault and the other exception entry points, I see no attempt to save the regs to create a pt_regs struct. The fact that do_page_fault takes pt_regs as its first arg is a hack to get at CS:EIP and SS:ESP at the time of the exception. Furthermore, error_code loads the exception error code into edx, then wipes it out on the stack by storing -1 into that location. I can't see a good reason for wiping out the error code. By convention, exceptions and interrupts have a negative integer stored at the error-code location to distinguish them from system calls, but I don't think this is used. signal.c seems to be the only place that looks for an error code >= 0, but I don't see how an exception affects signal.c.

Can anyone confirm whether setting the error code to -1 is essential? If it isn't, then I think we should consider leaving it in place.


The long and short of it is that the only registers with any meaning are CS, EIP and EFLAGS, all of which are saved by the processor. SS and ESP are saved only when the exception occurred at a privilege level > 0, but such exceptions can never generate a panic.
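For reference, the frame being described can be summarized in a small C sketch of what the processor pushes when an exception is delivered through an interrupt or trap gate to a ring-0 handler in protected mode (virtual-8086 mode, which pushes extra segment registers, is ignored). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Dwords pushed by the processor on exception entry to a ring-0 handler:
 *   interrupted CPL == 0:  EFLAGS, CS, EIP            (no stack switch)
 *   interrupted CPL  > 0:  SS, ESP, EFLAGS, CS, EIP   (stack switch)
 * plus one dword for exceptions that define an error code (e.g. #PF). */
static unsigned hw_frame_dwords(unsigned interrupted_cpl, int has_error_code)
{
    assert(interrupted_cpl <= 3);
    unsigned n = 3;              /* EIP, CS, EFLAGS */
    if (interrupted_cpl > 0)
        n += 2;                  /* outer SS and ESP */
    if (has_error_code)
        n += 1;
    return n;
}

/* SS:ESP are part of the hardware frame only if the saved CS had RPL > 0. */
static int hw_saved_ss_esp(uint32_t saved_cs)
{
    return (saved_cs & 0x3u) != 0;
}
```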

I'd recommend that we change the bt output to format only the three valid regs (plus SS and ESP if the CPL at the time of the exception was > 0). Is there any reason why this shouldn't be changed?
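A sketch of what the proposed bt change might look like. struct hw_exception_regs and format_exception_regs() are hypothetical and do not correspond to crash's actual code; the layout only imitates the bt output above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Only the values the processor itself saves in the exception frame. */
struct hw_exception_regs {
    uint32_t eip, cs, eflags;  /* always pushed */
    uint32_t esp, ss;          /* pushed only on a privilege change */
};

/* Writes the registers into buf, printing SS and ESP only when the saved
 * CS shows the exception came from CPL > 0. Returns snprintf's result. */
static int format_exception_regs(char *buf, size_t len,
                                 const struct hw_exception_regs *r)
{
    assert(buf != NULL && r != NULL);
    if ((r->cs & 0x3u) == 0)  /* kernel-mode exception: no SS:ESP in frame */
        return snprintf(buf, len,
                        "    CS:  %04" PRIx32 "      EIP: %08" PRIx32
                        "  EFLAGS: %08" PRIx32 "\n",
                        r->cs, r->eip, r->eflags);
    return snprintf(buf, len,
                    "    CS:  %04" PRIx32 "      EIP: %08" PRIx32
                    "  EFLAGS: %08" PRIx32 "\n"
                    "    SS:  %04" PRIx32 "      ESP: %08" PRIx32 "\n",
                    r->cs, r->eip, r->eflags, r->ss, r->esp);
}
```

For the dump above this would print just the CS/EIP/EFLAGS line.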

Richard







Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU