On Mon, Oct 03, 2011 at 01:02:03PM +0530, K.Prasad wrote:
There are certain types of crashes induced by faulty hardware in which
capturing the crashing kernel's memory (through kdump) makes no sense (or is
sometimes dangerous).
A case in point is unrecoverable memory errors (resulting in fatal machine
check exceptions), in which reading from the faulty memory location from the
kexec'ed kernel will cause a double fault and system reset (leaving no
information for the user).
This patch introduces a framework called 'slimdump' enabled through a new
elf-note NT_NOCOREDUMP. Any error whose cause cannot be attributed to a
software error and cannot be detected by analysing the kernel memory may
decide to add this elf-note to the vmcore and indicate the futility of
such an exercise. Tools such as 'kexec', 'makedumpfile' and 'crash' are
also modified in tandem to recognise this new elf-note and capture
'slimdump'.
The physical address and size of the NT_NOCOREDUMP are made available to the
user-space through a "/sys/kernel/nt_nocoredump" sysfs file (just like other
kexec related files).
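For readers unfamiliar with the layout being discussed: an ELF note is a
header of three 32-bit words (name size, descriptor size, type) followed by
the name and the descriptor, each padded to a 4-byte boundary. The following
is a minimal user-space sketch of appending such a note to a word buffer. It
is not the patch's code: append_note(), struct note_hdr and EXAMPLE_NOTE_TYPE
are hypothetical names, and the type value is a placeholder, since the actual
NT_NOCOREDUMP number is not shown in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder only; the real NT_NOCOREDUMP value is not shown in this mail. */
#define EXAMPLE_NOTE_TYPE 0xffu

/* Same three-word shape as the standard ELF note header. */
struct note_hdr {
	uint32_t n_namesz;	/* length of name, including the trailing NUL */
	uint32_t n_descsz;	/* length of descriptor in bytes */
	uint32_t n_type;	/* note type */
};

/* Number of 32-bit words needed to hold 'bytes' bytes. */
#define NOTE_WORDS(bytes) (((bytes) + 3u) / 4u)

/*
 * Append one note at 'buf' and return the position just past it.
 * The caller must make sure the buffer is large enough.
 */
uint32_t *append_note(uint32_t *buf, const char *name, uint32_t type,
		      const void *desc, size_t desc_len)
{
	struct note_hdr hdr;
	size_t name_len;

	assert(buf != NULL && name != NULL);
	name_len = strlen(name) + 1;

	hdr.n_namesz = (uint32_t)name_len;
	hdr.n_descsz = (uint32_t)desc_len;
	hdr.n_type = type;
	memcpy(buf, &hdr, sizeof(hdr));
	buf += NOTE_WORDS(sizeof(hdr));

	/* Name, zero-padded to a 4-byte boundary. */
	memset(buf, 0, NOTE_WORDS(name_len) * 4);
	memcpy(buf, name, name_len);
	buf += NOTE_WORDS(name_len);

	/* Descriptor, zero-padded to a 4-byte boundary. */
	if (desc_len) {
		memset(buf, 0, NOTE_WORDS(desc_len) * 4);
		memcpy(buf, desc, desc_len);
		buf += NOTE_WORDS(desc_len);
	}
	return buf;
}
```

A consumer walks the buffer the same way: read the three header words, then
skip the padded name and the padded descriptor to reach the next note.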
Even if the kernel has to signal the reason for the crash to user space, why
not add this info to the existing vmcoreinfo note? Something like another
field, e.g. PANIC_MCE=1.
Secondly, the note name NT_NOCOREDUMP itself sounds binding. The kernel can
export the reason for the panic, and then it is up to user space to decide
what it wants to do with it.
So to me,
Signed-off-by: K.Prasad <prasad(a)linux.vnet.ibm.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 28 ++++++++++++++++++++++++++++
include/linux/elf.h | 18 ++++++++++++++++++
include/linux/kexec.h | 1 +
kernel/kexec.c | 11 +++++++++++
kernel/ksysfs.c | 10 ++++++++++
5 files changed, 68 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 08363b0..483b2fc 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -238,6 +238,34 @@ static atomic_t mce_paniced;
static int fake_panic;
static atomic_t mce_fake_paniced;
+void arch_add_nocoredump_note(u32 *buf)
+{
+ struct elf_note note;
+ const char note_name[] = "PANIC_MCE";
+ const char desc_msg[] = "Crash induced due to a fatal machine "
+ "check error";
+
Again, note_name and desc_msg seem to be the only two exports. Frankly, the
desc string seems pretty obvious and we should be able to ignore it. So just
export PANIC_MCE=true or something like that in case of MCE.
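To illustrate the user-space side of that suggestion, here is a rough sketch
of how a dump tool might look up such a field. It assumes the vmcoreinfo
contents are already in memory as newline-separated KEY=VALUE text and that
the field is spelled PANIC_MCE=1. Both function names are hypothetical, and
the policy of skipping the full dump is only one choice user space could make.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up 'key' in a buffer of newline-separated KEY=VALUE lines.
 * On a match, copy the value (truncated to fit, NUL-terminated) into
 * 'out' and return 1; return 0 if the key is not present.
 */
int vmcoreinfo_lookup(const char *info, size_t len, const char *key,
		      char *out, size_t outlen)
{
	const char *p, *end;
	size_t keylen;

	assert(info != NULL && key != NULL && out != NULL && outlen > 0);

	keylen = strlen(key);
	p = info;
	end = info + len;

	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		const char *line_end = nl ? nl : end;
		size_t line_len = (size_t)(line_end - p);

		/* The line must be exactly "<key>=<value>". */
		if (line_len > keylen && p[keylen] == '=' &&
		    memcmp(p, key, keylen) == 0) {
			size_t vlen = line_len - keylen - 1;

			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, p + keylen + 1, vlen);
			out[vlen] = '\0';
			return 1;
		}
		p = nl ? nl + 1 : end;
	}
	return 0;
}

/* Hypothetical policy: skip the full dump only if PANIC_MCE=1 is present. */
int should_skip_full_dump(const char *info, size_t len)
{
	char val[16];

	return vmcoreinfo_lookup(info, len, "PANIC_MCE", val, sizeof(val)) &&
	       strcmp(val, "1") == 0;
}
```

With this shape the kernel only reports the reason for the panic, and the
decision about what to capture stays entirely in the tool.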
Thanks
Vivek