----- Original Message -----
Too many kinds of panic are categorized under the same "Oops: xxxx"
message, which makes this field ambiguous and not very useful:
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
This patch separates out 3 kinds of panicmsg, the most frequent cases
among the machines I manage; the match strings are copied exactly
from the kernel source code. After applying it, I get panicmsg like:
include/linux/kernel.h:#define HW_ERR
panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 11: f200003f000100b2"
drivers/char/sysrq.c:__handle_sysrq
panicmsg: "SysRq : Trigger a crash"
arch/x86/kernel/traps.c:do_general_protection
panicmsg: "general protection fault: 8800 [#1] SMP"
arch/x86/mm/fault.c:show_fault_oops
panicmsg: "BUG: unable to handle kernel paging request at 00001248a68eb328"
We need to move the SysRq matching lines ahead of the "Oops" match,
because a log with a SysRq crash line usually also has an Oops line,
and SysRq needs to take precedence.
Signed-off-by: Derek Che <drc(a)yahoo-inc.com>
Hi Derek,
As I mentioned earlier, in addition to checking for the general
protection faults, in my testing I found several other instances
where the "Oops" message could be replaced with the more meaningful
messages that preceded it, such as double faults, divide errors,
stack segment faults, "Kernel BUG" (with a capital K), "Unable to
handle kernel ..." (with a capital U), etc. I also added a few
break statements so that the search stops once a searched-for message
is found, instead of continuing to parse the kernel log.
However, the machine check string search still follows the
"kernel panic - " check, which I understand is the opposite of what
you would prefer. The fatal error strings being searched for come from
die() calls, or from other message sources that are part of the kernel
crash sequence. On the other hand, the machine check messages are
generated from a stream of pr_emerg(HW_ERR) calls, and are not
necessarily (although likely) precursors to a crash. But since the
kernel panic message does contain the "Fatal machine check" message,
the reason behind the crash is readily evident.
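
To make the ordering discussed above concrete, here is a minimal
sketch of a priority-ordered search that stops at the first hit. It is
not the code from the commit: find_panicmsg(), the panic_patterns table
and its exact strings and order are hypothetical, chosen only so that
SysRq wins over "Oops" and the kernel panic line wins over the machine
check lines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical priority table (an assumption for illustration):
 * earlier entries take precedence over later ones.
 */
static const char *const panic_patterns[] = {
    "SysRq : Trigger a crash",
    "Kernel panic - ",
    "[Hardware Error]: ",
    "general protection fault: ",
    "BUG: unable to handle kernel ",
    "Oops: ",
};

/*
 * Return the first log line containing the highest-priority pattern,
 * or NULL if no pattern matches any line.
 */
const char *find_panicmsg(const char *const *lines, size_t nlines)
{
    size_t npat = sizeof(panic_patterns) / sizeof(panic_patterns[0]);

    for (size_t p = 0; p < npat; p++) {
        for (size_t i = 0; i < nlines; i++) {
            /* Stop scanning as soon as a match is found. */
            if (strstr(lines[i], panic_patterns[p]) != NULL)
                return lines[i];
        }
    }
    return NULL;
}
```

With a table like this, a log holding both a SysRq line and an "Oops"
line reports the SysRq line, and a machine check that ends in a kernel
panic reports the panic line.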
I appreciate your getting the ball rolling here, as it was certainly
due for an update/improvement.
Queued for crash-7.1.0:
https://github.com/crash-utility/crash/commit/c3840016bf1770b6b1cf571202f...
Thanks,
Dave