Re: [Crash-utility] [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

Wednesday, 5 October 2011

On Wed, Oct 05, 2011 at 03:17:27PM +0530, K.Prasad wrote:
...
 We don't want to capture memory dump when the machine crashes due to
 faulty cache, because the end-user derives no benefit by receiving a
 bulky vmcore and running crash analysis tools over them. Instead a
 'slimdump' that contains a meaningful message about the origin of crash
 (and which can be understood by his analysis tools) would be better, or
 so I thought. 
Ok, this makes sense, a meaningful message along with the MCE decoded
properly in user-friendly language so that one can understand why the
system has not captured vmcore.

...
 There are possibly several hardware errors which cause system crash and
 the kdump would capture full vmcore, although it doesn't make sense (I
 wouldn't have cared about the second example, you cited, if they did not
 generate MCE, but a different exception). In an ideal situation, each of
 these error paths would 'subscribe' to slimdump and add a meaningful
 message in the NT_NOCOREDUMP note instead of letting the user-space copy
 the old kernel memory. 
Yep, I see.

...
 Fine with me. I see that the various IA32_MCi_Status registers will hold
 information about the error and use that to classify MCEs.

 I think the best way to go about is to retain NT_NOCOREDUMP for non-DRAM
 errors also, but use the note-name field in the elf-note and distinguish the
 various types of errors...say, by using names such as "PANIC_MCE_DRAM",
 "PANIC_MCE_CACHE", etc (similar to the error codes described in the Intel
 manual). The upstream tools like 'makedumpfile' and 'crash' will have to
 be taught to parse the elf-note name and act accordingly. 
Right, so Valdis had the right question in the other mail, let me
generalize it here: does it ever make sense to save vmcore on a hardware
error?
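
To make the note-name idea quoted above concrete, here is a rough
sketch of how a user-space tool might walk a buffer of ELF notes and
pick out the slimdump one by type, then use the note name to tell the
error class. It is only an illustration: the numeric value of
NT_NOCOREDUMP below is a placeholder (the real one comes from the
patch), the helper names are made up, and the description is assumed to
be a plain text message. The layout (three 32-bit words, then name and
descriptor each padded to 4 bytes) is the usual ELF note layout.

```python
import struct

# Placeholder value, NOT the one from the patch; used only for illustration.
NT_NOCOREDUMP = 0xFF


def _pad4(n):
    """Round n up to the next multiple of 4 (ELF note alignment)."""
    return (n + 3) & ~3


def pack_note(name, ntype, desc):
    """Build one ELF note: namesz, descsz, type, padded name, padded desc."""
    name_b = name.encode() + b"\0"
    header = struct.pack("<III", len(name_b), len(desc), ntype)
    return (header
            + name_b.ljust(_pad4(len(name_b)), b"\0")
            + desc.ljust(_pad4(len(desc)), b"\0"))


def parse_notes(buf):
    """Yield (name, type, desc) for each note in a little-endian buffer."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, off)
        off += 12
        name = buf[off:off + namesz].rstrip(b"\0").decode()
        off += _pad4(namesz)
        desc = buf[off:off + descsz]
        off += _pad4(descsz)
        yield name, ntype, desc


def slimdump_reason(buf):
    """Return (note name, message) if a NT_NOCOREDUMP note exists, else None."""
    for name, ntype, desc in parse_notes(buf):
        if ntype == NT_NOCOREDUMP:
            return name, desc.rstrip(b"\0").decode(errors="replace")
    return None


notes = (pack_note("CORE", 1, b"\0" * 8)
         + pack_note("PANIC_MCE_CACHE", NT_NOCOREDUMP, b"cache error"))
print(slimdump_reason(notes))
```

A tool that gets a non-None result here could print the message and
skip copying the old kernel's memory; what it should do per note name
is exactly the open question in this thread.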

With DRAM errors, you probably could use the additional info coming with
the MCE to decode the physical address and map it back to the DIMM and
swap it. Any other use cases?
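
For what it's worth, a very rough sketch of the classification step,
assuming the compound MCA error code encodings from the Intel SDM
(memory controller errors match 0000 0000 1MMM CCCC, cache hierarchy
errors match 0000 0001 RRRR TTLL, with bit 12 being the filtering bit).
Mapping "memory controller" to the PANIC_MCE_DRAM name is an assumption
made for this example, PANIC_MCE_OTHER is a made-up catch-all, and the
function names are hypothetical. Whether the reported address is usable
as a physical address depends on the bank, so treat that part as a
starting point only.

```python
# Bit positions in IA32_MCi_STATUS (Intel SDM, machine-check architecture).
MCI_STATUS_VAL = 1 << 63    # register holds a valid error
MCI_STATUS_UC = 1 << 61     # error was not corrected
MCI_STATUS_ADDRV = 1 << 58  # IA32_MCi_ADDR holds a valid address
MCA_FILTER_BIT = 1 << 12    # "F" bit inside the MCA error code field


def classify_mce(status):
    """Map an IA32_MCi_STATUS value to a note name, or None if not valid."""
    if not (status & MCI_STATUS_VAL):
        return None
    # The low 16 bits are the MCA error code; ignore the filtering bit.
    code = status & 0xFFFF & ~MCA_FILTER_BIT
    if (code & 0xFF80) == 0x0080:   # 0000 0000 1MMM CCCC
        return "PANIC_MCE_DRAM"     # assumed mapping for memory controller errors
    if (code & 0xFF00) == 0x0100:   # 0000 0001 RRRR TTLL
        return "PANIC_MCE_CACHE"
    return "PANIC_MCE_OTHER"        # hypothetical name, not from the thread


def reported_address(status, addr):
    """Return the IA32_MCi_ADDR value only when the status marks it valid."""
    if (status & MCI_STATUS_VAL) and (status & MCI_STATUS_ADDRV):
        return addr
    return None


example = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_ADDRV | 0x00C0
print(classify_mce(example), reported_address(example, 0x12345000))
```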

Thanks.

-- 
Regards/Gruss,
Boris.
