On Wed, Oct 05, 2011 at 06:16:26PM +0200, Borislav Petkov wrote:
> On Wed, Oct 05, 2011 at 12:00:38PM -0400, Valdis.Kletnieks(a)vt.edu wrote:
> > On Wed, 05 Oct 2011 11:52:17 EDT, Vivek Goyal said:
> >
> > > I am assuming that basic MCE error messages are available in kernel log.
> >
> > They're not in the kernel log if it's an MCE that causes the kernel to
> > declare a panic. There's some MCE's that you can retry the operation and
> > continue, and some that you can get away with poisoning a page, killing the
> > process, and rest of the system is OK. But some you really need to roll over
> > and die because you can't guarantee kernel integrity anymore.
> >
> > And at that point, those messages are never gonna make it to syslogd and onto
> > disk.
>
> AFAICT, and for the sake of getting the MCE info, arguably one could
> look for log_buf in the vmcore of the old kernel and try to find the
> last lines in the kernel log ringbuffer. They should be the MCE error
> information from the do_machine_check MCE handler.
>
> This all assuming of course we've managed to dump vmcore successfully by
> sidestepping the landmines :-).

We don't have to dump the whole vmcore. "makedumpfile" can extract just
the kernel log buffer. The following is an excerpt from the makedumpfile
man page.
***************************************************************************
--dump-dmesg
    This option overrides the normal behavior of makedumpfile.
    Instead of compressing and filtering a VMCORE to make it
    smaller, it simply extracts the dmesg log from a VMCORE and
    writes it to the specified LOGFILE.
***************************************************************************
So we can automate the whole thing: first extract only the dmesg and look
for MCE messages. If there are none in the last few lines, save the whole
vmcore; otherwise save just the dmesg.
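
To make the decision above concrete, here is a rough sketch in Python. It
is only an illustration, not what any kdump script does today. The
function names, the file paths, the 50-line tail and the two match
strings ("[Hardware Error]" and "Machine check") are all my assumptions;
the makedumpfile invocations use --dump-dmesg from the excerpt above and
the usual -c -d 31 for the full dump.

```python
import re
import subprocess

# Assumed markers: the x86 MCE code prefixes its messages with
# "[Hardware Error]" and mentions "Machine check"; adjust for your kernel.
MCE_PATTERN = re.compile(r"\[Hardware Error\]|machine check", re.IGNORECASE)


def tail_has_mce(dmesg_text, tail_lines=50):
    """Return True if any of the last `tail_lines` lines looks like MCE output."""
    tail = dmesg_text.splitlines()[-tail_lines:]
    return any(MCE_PATTERN.search(line) for line in tail)


def save_dump(vmcore="/proc/vmcore", logfile="/var/crash/dmesg.txt",
              dumpfile="/var/crash/vmcore", run=subprocess.check_call):
    """Extract dmesg first; save the full vmcore only if no MCE is logged."""
    # Hypothetical paths; `run` is injectable so the logic can be tested
    # without a real crash kernel.
    run(["makedumpfile", "--dump-dmesg", vmcore, logfile])
    with open(logfile, errors="replace") as f:
        if tail_has_mce(f.read()):
            return "dmesg-only"
    run(["makedumpfile", "-c", "-d", "31", vmcore, dumpfile])
    return "full-vmcore"
```

Only the tail is checked so that an old, already-handled MCE further up in
the ring buffer does not stop us from saving a vmcore for an unrelated
panic.
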
Or break the dump-saving process into two parts. First extract and save
the dmesg to a file, then try to save the vmcore. If an MCE happens during
the second step, we will reboot anyway, but the dmesg is already saved.
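
The two-part variant could look roughly like the sketch below. Again,
the function name and the idea of calling fsync between the two steps
are my assumptions, not existing kdump behavior; the point is only that
the dmesg should be on disk before we start reading the rest of the old
kernel's memory.

```python
import os
import subprocess


def save_in_two_phases(vmcore, logfile, dumpfile, run=subprocess.check_call):
    """Save the dmesg durably, then attempt the full vmcore."""
    # Phase 1: extract only the kernel log buffer.
    run(["makedumpfile", "--dump-dmesg", vmcore, logfile])

    # Flush the log file to disk before touching the rest of memory.
    fd = os.open(logfile, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Phase 2: this may trip another MCE. If it does, the machine reboots
    # and the dmesg file from phase 1 is still there.
    run(["makedumpfile", "-c", "-d", "31", vmcore, dumpfile])
```
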
Exporting an ELF note would just make the MCE information more structured,
so that we don't have to scan through dmesg for it.
Thanks
Vivek