On Wed, Oct 05, 2011 at 09:31:11AM +0200, Borislav Petkov wrote:
On Wed, Oct 05, 2011 at 12:37:28PM +0530, K.Prasad wrote:
> > Well, there are MCE types for which we need to panic but we don't
> > necessarily corrupt memory. Your approach is to unconditionally avoid
> > dumping core whenever we panic while you should look at the MCE
> > signature and decide then whether to capture crashed kernel memory or
> > not.
> >
> > For example, if the MCE signature says UC DRAM error, then you can
> > be pretty sure that there is a landmine somewhere in the DRAM region
> > mapping the crashed kernel. If it is, say, a UC when doing data fills
> > from L2 to L1, that doesn't necessarily mean that DRAM is corrupted. But
> > even in the first case, you can evaluate the MCi_ADDR reported with the
> > UC DRAM error and simply skip that particular cacheline when dumping the
> > core instead of not capturing anything at all.
> >
>
> True. As I stated earlier, capturing a memory dump in such cases has
> two possible outcomes: it is either dangerous or it doesn't make
> sense.

Why, in the second example the only corruption is to the L2 cache so
your memory image is intact. Why wouldn't you want to capture a memory
dump then? It is business as usual in that case.

We don't want to capture a memory dump when the machine crashes due to a
faulty cache, because the end-user derives no benefit from receiving a
bulky vmcore and running crash analysis tools over it. Instead, a
'slimdump' that contains a meaningful message about the origin of the
crash (and which can be understood by the user's analysis tools) would be
better, or so I thought.

There are possibly several hardware errors which cause a system crash,
and for which kdump would capture a full vmcore although it doesn't make
sense (I wouldn't have cared about the second example you cited if it had
generated a different exception rather than an MCE). In an ideal
situation, each of these error paths would 'subscribe' to slimdump and
add a meaningful message in the NT_NOCOREDUMP note instead of letting
user-space copy the old kernel memory.
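
To make the note idea concrete, here is a rough user-space sketch of how
such a note could be laid out, following the generic ELF note layout
(namesz, descsz, type, then name and desc, each padded to 4 bytes). The
numeric value of NT_NOCOREDUMP, the helper names and the message text are
assumptions made up for illustration; nothing here is taken from the patch.

```python
import struct

# Assumption: NT_NOCOREDUMP is only a proposal in this thread; the value
# below is a placeholder, not an assigned ELF note type.
NT_NOCOREDUMP = 0xFF


def _pad4(data: bytes) -> bytes:
    """Pad data with NUL bytes up to the next 4-byte boundary."""
    return data + b"\0" * (-len(data) % 4)


def build_note(name: str, note_type: int, desc: bytes) -> bytes:
    """Serialize one ELF note (little-endian, as on x86)."""
    name_bytes = name.encode("ascii") + b"\0"  # namesz counts the NUL
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    return header + _pad4(name_bytes) + _pad4(desc)


# Hypothetical note an MCE panic path might leave for the capture kernel.
note = build_note(
    "PANIC_MCE_CACHE",
    NT_NOCOREDUMP,
    b"MCE: uncorrected cache error, vmcore not captured",
)
```

The desc field is free-form here; a real implementation would have to
agree on its contents with the tools that read it.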
> It is best to avoid a normal kdump in both cases,
> although the elf-note doesn't distinguish between the two.
>
> NT_NOCOREDUMP, in my opinion, is just the first step towards introducing
> a framework where different code paths that lead to panic() can
> 'opt-out' from kdump by adding an elf-note.
>
> We can modify this to add more fine-grained messages using different elf-note
> types (or use the elf-note name under the NT_NOCOREDUMP type) to
> indicate the cause/type of crash.
>
> I'd like to hear further from you and the rest of the community to see if
> there's a need felt for such a change.

I'd make this conditional on whether you have had memory corruption or
not by evaluating MCE signatures and acting accordingly.

Fine with me. I see that the various IA32_MCi_STATUS registers hold
information about the error; I will use that to classify MCEs.
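
As a rough sketch of that classification, assuming the Intel compound MCA
error-code layout (memory controller errors as 000F 0000 1MMM CCCC, cache
hierarchy errors as 000F 0001 RRRR TTLL) and a 64-byte cache line. The
function names and returned labels are made up for illustration, and
other vendors encode their error codes differently:

```python
# Bit positions in IA32_MCi_STATUS
MCI_STATUS_VAL = 1 << 63    # register holds a valid error
MCI_STATUS_UC = 1 << 61     # error was not corrected
MCI_STATUS_ADDRV = 1 << 58  # IA32_MCi_ADDR holds a valid address

CACHE_LINE = 64  # assumption: 64-byte cache lines


def classify_mce(status: int) -> str:
    """Return a coarse label for one IA32_MCi_STATUS value."""
    if not status & MCI_STATUS_VAL:
        return "NONE"
    if not status & MCI_STATUS_UC:
        return "CORRECTED"
    code = status & 0xFFFF  # MCA error code, bits 15:0
    if code & 0xEF80 == 0x0080:  # 000F 0000 1MMM CCCC
        return "DRAM"
    if code & 0xEF00 == 0x0100:  # 000F 0001 RRRR TTLL
        return "CACHE"
    return "OTHER"


def skip_range(status: int, addr: int):
    """For an uncorrected DRAM error with a valid address, return the
    (start, end) of the cache line a dump tool could leave out."""
    if classify_mce(status) != "DRAM" or not status & MCI_STATUS_ADDRV:
        return None
    start = addr & ~(CACHE_LINE - 1)
    return start, start + CACHE_LINE
```

Note that the usable low-order bits of MCi_ADDR depend on the bank and on
MCi_MISC, so rounding to a single cache line is a simplification.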

I think the best way to go about it is to retain NT_NOCOREDUMP for
non-DRAM errors as well, but to use the note-name field in the elf-note
to distinguish the various types of errors, say, by using names such as
"PANIC_MCE_DRAM", "PANIC_MCE_CACHE", etc. (similar to the error codes
described in the Intel manual). Upstream tools like 'makedumpfile' and
'crash' will have to be taught to parse the elf-note name and act
accordingly.
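
On the tool side, the parsing could look roughly like the sketch below.
It walks a buffer of ELF notes and maps the note name to an action. The
NT_NOCOREDUMP value, the POLICY table and the action strings are
placeholders for illustration only; they do not describe existing
behaviour of makedumpfile or crash.

```python
import struct

# Assumption: placeholder value, NT_NOCOREDUMP has no assigned number here.
NT_NOCOREDUMP = 0xFF

# Hypothetical mapping from note name to what the dump tool should do.
POLICY = {
    "PANIC_MCE_DRAM": "skip-dump",
    "PANIC_MCE_CACHE": "slimdump",
}


def iter_notes(buf: bytes):
    """Yield (name, type, desc) for each ELF note in buf (little-endian)."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, off)
        off += 12
        name = buf[off:off + namesz].rstrip(b"\0").decode("ascii")
        off += (namesz + 3) & ~3  # name is padded to 4 bytes
        desc = buf[off:off + descsz]
        off += (descsz + 3) & ~3  # desc is padded to 4 bytes
        yield name, ntype, desc


def dump_action(buf: bytes):
    """Return (action, message) based on the first NT_NOCOREDUMP note."""
    for name, ntype, desc in iter_notes(buf):
        if ntype == NT_NOCOREDUMP:
            message = desc.rstrip(b"\0").decode("ascii", "replace")
            # Unknown names still opt out of a full dump.
            return POLICY.get(name, "slimdump"), message
    return "full-dump", ""
```

Falling back to "slimdump" for unknown names keeps older tools safe when
new note names are added later; that default is a design guess, not
something agreed on in this thread.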
Thanks for your comments and review.
-- K.Prasad