On Wed, Oct 05, 2011 at 09:33:13AM +0200, Borislav Petkov wrote:
On Wed, Oct 05, 2011 at 12:48:44PM +0530, K.Prasad wrote:
> On Tue, Oct 04, 2011 at 10:04:37AM -0400, Vivek Goyal wrote:
> > On Mon, Oct 03, 2011 at 01:02:03PM +0530, K.Prasad wrote:
> > > There are certain types of crashes induced by faulty hardware in which
> > > capturing crashing kernel's memory (through kdump) makes no sense (or
> > > sometimes dangerous).
> > >
> > > A case in point, is unrecoverable memory errors (resulting in fatal
> > > machine check exceptions) in which reading from the faulty memory
> > > location from the kexec'ed kernel will cause double fault and system
> > > reset (leaving no information for the user).
> >
> > Prasad,
> >
> > I am just trying to remember what was wrong with Andi's approach of
> > disabling MCE while copying the dump?
> >
>
> Hi Vivek,
> The behaviour upon a read operation on an UC memory location is
> undefined and so we want to avoid it (previously discussed here:
> http://article.gmane.org/gmane.linux.kernel/1146799). When we disable
> MCE and copy the dump, we will invariably read the faulty memory
> location.

Right, from the message above:

"- To disable MCE exceptions as done by the patches cited above. However
the result of a read operation on corrupted memory is unknown and the
system behaviour is undefined. We're unsure if this is a safe thing to
do."

Can you elaborate more on that? Are we talking poisoned memory here or
undetected and uncorrectable memory errors?

It refers to uncorrected memory errors that have not been consumed and
whose corresponding 'struct page's are marked PG_hwpoison. These are
typically the SRAO type errors that are handled in mm/memory-failure.c.

If MCE is enabled during a kdump, we will deliberately trigger a read
operation on the poisoned memory and make the UCE fatal. It is not
clear what would happen if MCE is disabled in the above case.
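
For illustration only, here is a minimal user-space sketch of the idea of
never reading a page that is known to be poisoned while copying a dump.
It is not taken from any of the patches discussed in this thread. The
names sketch_page, sketch_copy_dump and SKETCH_PAGE_SIZE are hypothetical,
and the sketch assumes that the per-page poison state is somehow available
to the code doing the copy, which is itself an open question here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Hypothetical stand-in for 'struct page' with PG_hwpoison set or clear. */
struct sketch_page {
	bool hwpoison;
};

/*
 * Copy npages pages from src to dst without ever reading a page that is
 * flagged as poisoned. Poisoned pages are zero-filled in dst instead.
 * Returns the number of pages that were skipped.
 */
size_t sketch_copy_dump(unsigned char *dst, const unsigned char *src,
			const struct sketch_page *pages, size_t npages)
{
	size_t skipped = 0;

	for (size_t i = 0; i < npages; i++) {
		unsigned char *d = dst + i * SKETCH_PAGE_SIZE;

		if (pages[i].hwpoison) {
			/* Do not touch src here: the read is what we avoid. */
			memset(d, 0, SKETCH_PAGE_SIZE);
			skipped++;
			continue;
		}
		memcpy(d, src + i * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	}
	return skipped;
}
```

The point of the sketch is only that the faulty location is never
dereferenced, so the question of what a read returns with MCE disabled
does not arise for pages already marked as poisoned.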
Thanks,
K.Prasad