[PATCH] take Hardware Error & kernel pointer bug as separate panicmsg
by drc@yahoo-inc.com
Too many kinds of panics are categorized under the same "Oops: xxxx"
message, which makes this field ambiguous and not very useful:
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
This patch separates out 3 kinds of panicmsg, the most frequent cases
among the machines I manage. The match strings are copied exactly
from the kernel source code. After applying it, I got panicmsg output like:
include/linux/kernel.h:#define HW_ERR
panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 11: f200003f000100b2"
drivers/char/sysrq.c:__handle_sysrq
panicmsg: "SysRq : Trigger a crash"
arch/x86/mm/fault.c:show_fault_oops
panicmsg: "BUG: unable to handle kernel paging request at 00001248a68eb328"
Signed-off-by: Derek Che <drc(a)yahoo-inc.com>
---
task.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/task.c b/task.c
index 4214d7f..74e6028 100644
--- a/task.c
+++ b/task.c
@@ -5509,8 +5509,19 @@ get_panicmsg(char *buf)
}
rewind(pc->tmpfile);
while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
- if (strstr(buf, "Oops: ") ||
- strstr(buf, "kernel BUG at"))
+ if (strstr(buf, "[Hardware Error]: "))
+ msg_found = TRUE;
+ }
+ rewind(pc->tmpfile);
+ while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+ if (strstr(buf, "SysRq : "))
+ msg_found = TRUE;
+ }
+ rewind(pc->tmpfile);
+ while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+ if (strstr(buf, "Oops: ") ||
+ strstr(buf, "kernel BUG at") ||
+ strstr(buf, "BUG: unable to handle kernel "))
msg_found = TRUE;
}
rewind(pc->tmpfile);
9 years, 10 months
[PATCH] take Hardware Error & kernel pointer bug as separate panicmsg
by drc@yahoo-inc.com
Too many kinds of panics are categorized under the same "Oops: xxxx"
message, which is ambiguous and makes this field not very useful:
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
This patch separates out two kinds, the two most frequent cases
among the machines I manage. The match strings are copied exactly
from the kernel source code. After applying it, I got panicmsg output like:
include/linux/kernel.h:#define HW_ERR
panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 11: f200003f000100b2"
arch/x86/mm/fault.c:show_fault_oops
panicmsg: "BUG: unable to handle kernel paging request at 00001248a68eb328"
Signed-off-by: Derek Che <drc(a)yahoo-inc.com>
---
task.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/task.c b/task.c
index 4214d7f..26b6728 100644
--- a/task.c
+++ b/task.c
@@ -5509,8 +5509,14 @@ get_panicmsg(char *buf)
}
rewind(pc->tmpfile);
while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
- if (strstr(buf, "Oops: ") ||
- strstr(buf, "kernel BUG at"))
+ if (strstr(buf, "[Hardware Error]: "))
+ msg_found = TRUE;
+ }
+ rewind(pc->tmpfile);
+ while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+ if (strstr(buf, "Oops: ") ||
+ strstr(buf, "kernel BUG at") ||
+ strstr(buf, "BUG: unable to handle kernel "))
msg_found = TRUE;
}
rewind(pc->tmpfile);
Re: [Crash-utility] crash-7.0.9 vs. 7.0.8 on ARM - crashing
by gmane@reliableembeddedsystems.com
Hi,
On 2015-01-30 08:26, Dave Anderson wrote:
>
> The pc->read_vmcoreinfo method is only initialized for ELF kdumps and
> compressed kdumps. So either a dummy function should be put in there
> that returns a NULL or arm_init() should check for its existence.
>
> I appreciate the bug report -- I'll post something today for
> crash-7.1.0.
OK cool. Just let me know if you have something to test for me.
I also compile crash on my embedded ARM target systems as a test of how
stable the whole system is. It can take hours to compile ;)
>
> Thanks,
> Dave
>
Thank you,
Robert
Re: [Crash-utility] crash-7.0.9 vs. 7.0.8 on ARM - crashing
by Dave Anderson
----- Original Message -----
> Hi,
>
> It looks like crash-7.0.9 is broken on ARM, while 7.0.8 works without
> any problems.
>
> I compiled both versions on my ARM target board exactly the same way,
> but 7.0.9 throws a core dump when invoked.[1]
>
> # CONFIG_ARM_LPAE is not set
>
> [1] http://pastebin.com/HpHeHBAF
>
> Please advise
>
> Regards,
>
> Robert
Good catch. Apparently the ARM users and maintainers on this list don't
ever run "live" as you are doing. And since I don't have any ARM hardware,
I can only test it on supplied dumpfiles with an x86 binary built with
"make target=ARM".
The problem is this patch that went into crash-7.0.9:
Improve the method for determining whether a 32-bit ARM vmlinux is
an LPAE enabled kernel by first checking whether CONFIG_ARM_LPAE
exists in the vmcoreinfo data, and if it does not, by then checking
whether the next higher symbol above "swapper_pg_dir" is 0x5000 bytes
higher in value.
(sdu.liu(a)huawei.com)
diff --git a/arm.c b/arm.c
index cb7d841..e7d3dbc 100644
--- a/arm.c
+++ b/arm.c
@@ -190,6 +190,8 @@ void
arm_init(int when)
{
ulong vaddr;
+ char *string;
+ struct syment *sp;
#if defined(__i386__) || defined(__x86_64__)
if (ACTIVE())
@@ -229,8 +231,13 @@ arm_init(int when)
* LPAE requires an additional page for the PGD,
* so PG_DIR_SIZE = 0x5000 for LPAE
*/
- if ((symbol_value("_text") - symbol_value("swapper_pg_dir")) == 0x5000)
+ if ((string = pc->read_vmcoreinfo("CONFIG_ARM_LPAE"))) {
machdep->flags |= PAE;
+ free(string);
+ } else if ((sp = next_symbol("swapper_pg_dir", NULL)) &&
+ (sp->value - symbol_value("swapper_pg_dir")) == 0x5000)
+ machdep->flags |= PAE;
+
machdep->kvbase = symbol_value("_stext") & ~KVBASE_MASK;
machdep->identity_map_base = machdep->kvbase;
machdep->is_kvaddr = arm_is_kvaddr;
The pc->read_vmcoreinfo method is only initialized for ELF kdumps and
compressed kdumps. So either a dummy function should be put in there
that returns a NULL or arm_init() should check for its existence.
I appreciate the bug report -- I'll post something today for crash-7.1.0.
Thanks,
Dave
> ..."One of my most productive days was throwing away 1000 lines of
> code." - Ken Thompson.
>
> My public pgp key is available at:
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1
>
>
Fwd: crash-7.0.9 vs. 7.0.8 on ARM - crashing
by Dave Anderson
----- Forwarded Message -----
From: "Robert Berger" <gmane(a)reliableembeddedsystems.com>
Cc: "Robert Berger" <robert.berger(a)reliableembeddedsystems.com>, anderson(a)redhat.com
Sent: Friday, January 30, 2015 4:02:01 AM
Subject: crash-7.0.9 vs. 7.0.8 on ARM - crashing
Hi,
It looks like crash-7.0.9 is broken on ARM, while 7.0.8 works without
any problems.
I compiled both versions on my ARM target board exactly the same way,
but 7.0.9 throws a core dump when invoked.[1]
# CONFIG_ARM_LPAE is not set
[1] http://pastebin.com/HpHeHBAF
Please advise
Regards,
Robert
..."One of my most productive days was throwing away 1000 lines of
code." - Ken Thompson.
My public pgp key is available at:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1
unwind not working on x86?
by Jan Willeke
Hello
I am trying to use both the unwind function on x86 in crash and the fp
extension. Neither is working for me.
I did the following tests: (crash 7.0.9)
crash/crash vmcore.201412161409 linux-3.2.64/vmlinux
GNU gdb (GDB) 7.6
KERNEL: linux-3.2.64/vmlinux
DUMPFILE: vmcore.201412161409
CPUS: 1
DATE: Tue Dec 16 15:09:46 2014
UPTIME: 00:00:46
LOAD AVERAGE: 0.05, 0.01, 0.01
TASKS: 55
NODENAME: debian
RELEASE: 3.2.64
VERSION: #2 SMP Tue Dec 16 15:08:10 CET 2014
MACHINE: x86_64 (2392 Mhz)
MEMORY: 383.5 MB
PANIC: "[ 46.736164] Oops: 0002 [#1] SMP " (check log for details)
PID: 1962
COMMAND: "tee"
TASK: ffff88000e5e0000 [THREAD_INFO: ffff88000c8e8000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 1962 TASK: ffff88000e5e0000 CPU: 0 COMMAND: "tee"
#0 [ffff88000c8e99f0] machine_kexec at ffffffff81038e0a
#1 [ffff88000c8e9a60] crash_kexec at ffffffff810b3a92
#2 [ffff88000c8e9b30] oops_end at ffffffff816427e8
#3 [ffff88000c8e9b60] no_context at ffffffff81635d6f
#4 [ffff88000c8e9bc0] __bad_area_nosemaphore at ffffffff81635f49
#5 [ffff88000c8e9c20] bad_area at ffffffff81635fc2
#6 [ffff88000c8e9c50] do_page_fault at ffffffff81645454
#7 [ffff88000c8e9d60] do_async_page_fault at ffffffff81644b75
#8 [ffff88000c8e9d80] async_page_fault at ffffffff81641de5
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff813cad46 RSP: ffff88000c8e9e38 RFLAGS: 00010092
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000063
RBP: ffff88000c8e9e38 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c725e0
R13: 0000000000000286 R14: 0000000000000007 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88000c8e9e40] __handle_sysrq at ffffffff813cb461
#10 [ffff88000c8e9e80] write_sysrq_trigger at ffffffff813cb51a
#11 [ffff88000c8e9eb0] proc_reg_write at ffffffff811d60e2
#12 [ffff88000c8e9f00] vfs_write at ffffffff81176053
#13 [ffff88000c8e9f30] sys_write at ffffffff8117637a
#14 [ffff88000c8e9f80] sysenter_dispatch at ffffffff8164bc70
RIP: 00000000f775e430 RSP: 00000000ffe6a720 RFLAGS: 00000296
RAX: 0000000000000004 RBX: ffffffff8164bc70 RCX: 00000000ffe6a81c
RDX: 0000000000000002 RSI: 0000000000000002 RDI: 00000000ffe6a81c
RBP: 00000000ffe6a758 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
ORIG_RAX: 0000000000000004 CS: 0023 SS: 002b
crash> set unwind on
unwind: on
crash> bt
PID: 1962 TASK: ffff88000e5e0000 CPU: 0 COMMAND: "tee"
#0 [ffff88000c8e99f0] machine_kexec at ffffffff81038e0a
RIP: 00000000f775e430 RSP: 00000000ffe6a720 RFLAGS: 00000296
RAX: 0000000000000004 RBX: ffffffff8164bc70 RCX: 00000000ffe6a81c
RDX: 0000000000000002 RSI: 0000000000000002 RDI: 00000000ffe6a81c
RBP: 00000000ffe6a758 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
ORIG_RAX: 0000000000000004 CS: 0023 SS: 002b
crash>
-> no functions, no parameters
-------------------------------------------------------------------------------------------------------------
crash> extend crash/extensions/fp.so
./crash/extensions/fp.so: shared object loaded
crash> fp
.................
-> no functions, no parameters
Did I do anything wrong?
Best Regards,
Jan Willeke
[PATCH] fix missing RT PRIO_ARRAY table with CONFIG_RT_GROUP_SCHED=n
by Mitsuya Shibata
On a kernel built with CONFIG_RT_GROUP_SCHED=n, the "RT PRIO_ARRAY" table of the runq
command is always empty, even though "rt_sched_class" tasks exist.
This is because the offset of "task_struct.rt" within task_struct is subtracted only when
the my_q member exists (i.e. CONFIG_RT_GROUP_SCHED=y). Otherwise, dump_RT_prio_array()
passes the address of the rt member of task_struct, rather than the address of the
task_struct itself, to task_to_context().
This patch ensures that the address of the task_struct is always passed to task_to_context().
---
task.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/task.c b/task.c
index 147ff5c..50c82c8 100644
--- a/task.c
+++ b/task.c
@@ -8688,9 +8688,9 @@ dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
&rt_rq_buf[OFFSET(rt_rq_active)]);
FREEBUF(rt_rq_buf);
continue;
- } else
- task_addr -= OFFSET(task_struct_rt);
+ }
}
+ task_addr -= OFFSET(task_struct_rt);
if (!(tc = task_to_context(task_addr)))
continue;
--
1.9.1
What is the unit for 'last_arrival' in 'task_struct.sched_info.last_arrival'.
by Saravanan Palanisamy
Hi,
What is the unit for 'last_arrival' in
'task_struct.sched_info.last_arrival' ?
I see that this value is used by the crash-utility 'ps -l' command.
>>
-l display the task last_run or timestamp value, whichever applies,
of selected, or all, tasks; the list is sorted with the most
recently-run task (largest last_run/timestamp) shown first.
>>
I see that this value (16 decimal digits) is much larger than the jiffies
value (10 decimal digits) in my crash dumps.
This value (an unsigned long long) seems to be equal to the 'jiffies'
value (an unsigned long) at the time the task was scheduled.
crash> p jiffies
jiffies = $9 = 5310085968
crash>
crash> ps -l
..
[4058835599089874] PID: 4136 TASK: ffff8801309ce640 CPU: 4 COMMAND: "kcapwdt"
...
System info:
-----------------
MACHINE: x86_64 (2533 Mhz)
Linux Kernel Version : 3.2.30
Thanks,
Saravanan
Fwd: [PATCH] crash: use %lu for counters
by Dave Anderson
----- Forwarded Message -----
From: "Alexey Dobriyan" <adobriyan(a)gmail.com>
To: "Dave Anderson" <anderson(a)redhat.com>
Sent: Thursday, January 22, 2015 5:56:11 AM
Subject: [PATCH] crash: use %lu for counters
These counters are "unsigned long" in the kernel and can never be negative.
Seen while debugging an OOM apocalypse event.