----- Original Message -----
On Tuesday, February 3, 2015 12:53 PM, Dave Anderson <anderson(a)redhat.com>
wrote:
> I'll move the hardware error check to the bottom, and only use it if there
> are no other relevant strings found, and then re-test that configuration.
How about matching the string "Machine Check Exception:"? Or can I use
pattern matching? This is a memory failure at bank 12, which usually
indicates that we need to replace the memory in bank 12.
Internally, we have a tool that depends on crash to analyze kernel crashes
and find their causes and solutions, so I hope the
"[Hardware Error]: CPU 14: Machine Check Exception: 5 Bank 12:
fe00014b001000c3" line can be matched, instead of the less useful
"Kernel panic - not syncing: Fatal machine check on current CPU" line that
is currently selected.
Why? The kernel panic "Fatal machine check" message above gives you the
actual reason for the panic. And even if you were to display the first
pr_emerg(HW_ERR) string instead, that error message continues with 5
subsequent pr_cont() or pr_emerg(HW_ERR) lines, so you would still have to
check the log for the details.
I have added several more search strings to your original patch to pretty
much prevent any more generic "Oops ..." messages. But if the sequence gets
as far as checking for the pr_emerg("Kernel panic - ...") string, I don't
want to override it. It *is* the "panic" string that's supposed to be
displayed there.
Dave
<0>[Hardware Error]: CPU 14: Machine Check Exception: 5 Bank 12: fe00014b001000c3
<0>[Hardware Error]: RIP !INEXACT! 10:<ffffffff810ace8a> {tick_check_idle+0xca/0xe0}
<0>[Hardware Error]: TSC 52e41ed579869d ADDR 53a92b000 MISC 908424000803e8c
<0>[Hardware Error]: PROCESSOR 0:306e4 TIME 1423045186 SOCKET 0 APIC 5
<0>[Hardware Error]: Some CPUs didn't answer in synchronization
<0>[Hardware Error]: Machine check: Invalid
<0>Kernel panic - not syncing: Fatal machine check on current CPU
<4>Pid: 0, comm: swapper Tainted: G M --------------- 2.6.32-431.23.3.el6.YAHOO.20140804.x86_64 #1
<4>Call Trace:
<4> <#MC> [<ffffffff8152866c>] ? panic+0xa7/0x16f
<4> [<ffffffff8102880f>] ? mce_panic+0x20f/0x230
<4> [<ffffffff81029c87>] ? do_machine_check+0x7a7/0xaf0
<4> [<ffffffff810ace8a>] ? tick_check_idle+0xca/0xe0
<4> [<ffffffff8152bc9c>] ? machine_check+0x1c/0x30
<4> [<ffffffff810ace8a>] ? tick_check_idle+0xca/0xe0
<4> <<EOE>> <IRQ> [<ffffffff8107a51c>] ? irq_enter+0x6c/0x80
<4> [<ffffffff8102b1d3>] ? smp_threshold_interrupt+0x13/0x40
<4> [<ffffffff8100bd13>] ? threshold_interrupt+0x13/0x20
<4> <EOI> [<ffffffff812e14ee>] ? intel_idle+0xde/0x170
<4> [<ffffffff812e14d1>] ? intel_idle+0xc1/0x170
<4> [<ffffffff814274a7>] ? cpuidle_idle_call+0xa7/0x140
<4> [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
<4> [<ffffffff8152219c>] ? start_secondary+0x2ac/0x2ef
Thanks,
- Derek