January 2015 - Crash-utility - Crash Utility List Archives

[PATCH] take Hardware Error & kernel pointer bug as separate panicmsg

by drc＠yahoo-inc.com

Too many kinds of panic are categorized under the same "Oops: xxxx" message, which makes this field ambiguous and not very useful:

    PANIC: "Oops: 0000 [#1] SMP " (check log for details)

This patch separates out three kinds of panicmsg, the most frequent cases among the machines I manage. The match strings are copied exactly from the kernel source code. With the patch applied, I get panicmsg values like:

include/linux/kernel.h:#define HW_ERR
    panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 11: f200003f000100b2"

drivers/char/sysrq.c:__handle_sysrq
    panicmsg: "SysRq : Trigger a crash"

arch/x86/mm/fault.c:show_fault_oops
    panicmsg: "BUG: unable to handle kernel paging request at 00001248a68eb328"

Signed-off-by: Derek Che <drc(a)yahoo-inc.com>
---
 task.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/task.c b/task.c
index 4214d7f..74e6028 100644
--- a/task.c
+++ b/task.c
@@ -5509,8 +5509,19 @@ get_panicmsg(char *buf)
 	}
 	rewind(pc->tmpfile);
 	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
-		if (strstr(buf, "Oops: ") ||
-		    strstr(buf, "kernel BUG at"))
+		if (strstr(buf, "[Hardware Error]: "))
+			msg_found = TRUE;
+	}
+	rewind(pc->tmpfile);
+	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+		if (strstr(buf, "SysRq : "))
+			msg_found = TRUE;
+	}
+	rewind(pc->tmpfile);
+	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+		if (strstr(buf, "Oops: ") ||
+		    strstr(buf, "kernel BUG at") ||
+		    strstr(buf, "BUG: unable to handle kernel "))
 			msg_found = TRUE;
 	}
 	rewind(pc->tmpfile);
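The ordering in the patch matters: each pattern group gets its own full pass over the log, so a more specific message wins even when a generic "Oops: " line appears earlier. The following is a minimal sketch of that idea only, not crash's get_panicmsg(). The function name find_panic_line() and the in-memory array of log lines are assumptions made for illustration; crash itself reads from pc->tmpfile.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of crash): scan the log once per pattern
 * group, most specific group first, and return the first matching line.
 * Returns NULL when no group matches any line.
 */
const char *find_panic_line(const char *const *lines, size_t nlines)
{
	/* Groups are tried in order; an earlier group beats a later one. */
	static const char *const groups[][4] = {
		{ "[Hardware Error]: ", NULL },
		{ "SysRq : ", NULL },
		{ "Oops: ", "kernel BUG at", "BUG: unable to handle kernel ", NULL },
	};
	size_t g, i, p;

	assert(lines != NULL || nlines == 0);

	for (g = 0; g < sizeof(groups) / sizeof(groups[0]); g++)
		for (i = 0; i < nlines; i++)
			for (p = 0; groups[g][p] != NULL; p++)
				if (strstr(lines[i], groups[g][p]))
					return lines[i];
	return NULL;
}
```

Given a log where an "Oops: " line precedes a "SysRq : Trigger a crash" line, this sketch returns the SysRq line, which mirrors the behaviour the patch description asks for.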

10 years, 5 months


[PATCH] take Hardware Error & kernel pointer bug as separate panicmsg

by drc＠yahoo-inc.com

Too many kinds of panic are categorized under the same "Oops: xxxx" message, which is ambiguous and makes the field not very useful:

    PANIC: "Oops: 0000 [#1] SMP " (check log for details)

This patch separates out two kinds, the two most frequent cases among the machines I manage. The match strings are copied exactly from the kernel source code. With the patch applied, I get panicmsg values like:

include/linux/kernel.h:#define HW_ERR
    panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 11: f200003f000100b2"

arch/x86/mm/fault.c:show_fault_oops
    panicmsg: "BUG: unable to handle kernel paging request at 00001248a68eb328"

Signed-off-by: Derek Che <drc(a)yahoo-inc.com>
---
 task.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/task.c b/task.c
index 4214d7f..26b6728 100644
--- a/task.c
+++ b/task.c
@@ -5509,8 +5509,14 @@ get_panicmsg(char *buf)
 	}
 	rewind(pc->tmpfile);
 	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
-		if (strstr(buf, "Oops: ") ||
-		    strstr(buf, "kernel BUG at"))
+		if (strstr(buf, "[Hardware Error]: "))
+			msg_found = TRUE;
+	}
+	rewind(pc->tmpfile);
+	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+		if (strstr(buf, "Oops: ") ||
+		    strstr(buf, "kernel BUG at") ||
+		    strstr(buf, "BUG: unable to handle kernel "))
 			msg_found = TRUE;
 	}
 	rewind(pc->tmpfile);

10 years, 5 months


Re: [Crash-utility] crash-7.0.9 vs. 7.0.8 on ARM - crashing

by gmane＠reliableembeddedsystems.com

Hi,

On 2015-01-30 08:26, Dave Anderson wrote:
>
> The pc->read_vmcoreinfo method is only initialized for ELF kdumps and
> compressed kdumps. So either a dummy function should be put in there
> that returns a NULL or arm_init() should check for its existence.
>
> I appreciate the bug report -- I'll post something today for
> crash-7.1.0.

OK, cool. Just let me know if you have something for me to test. I also compile crash on my embedded ARM target systems as a test of how stable the whole system is. It might take hours to compile ;)

>
> Thanks,
> Dave
>

Thank you,

Robert

10 years, 5 months


Re: [Crash-utility] crash-7.0.9 vs. 7.0.8 on ARM - crashing

by Dave Anderson

----- Original Message -----
> Hi,
>
> It looks like crash-7.0.9 is broken on ARM, while 7.0.8 works without
> any problems.
>
> I compiled both versions on my ARM target board exactly the same way,
> but 7.0.9 throws a core dump when invoked.[1]
>
> # CONFIG_ARM_LPAE is not set
>
> [1] http://pastebin.com/HpHeHBAF
>
> Please advise
>
> Regards,
>
> Robert

Good catch. Apparently the ARM users and maintainers on this list don't ever run "live" as you are doing. And since I don't have any ARM hardware, I can only test it on supplied dumpfiles with an x86 binary built with "make target=ARM".

The problem is this patch that went into crash-7.0.9:

  Improve the method for determining whether a 32-bit ARM vmlinux is
  an LPAE enabled kernel by first checking whether CONFIG_ARM_LPAE
  exists in the vmcoreinfo data, and if it does not, by then checking
  whether the next higher symbol above "swapper_pg_dir" is 0x5000 bytes
  higher in value.
  (sdu.liu(a)huawei.com)

diff --git a/arm.c b/arm.c
index cb7d841..e7d3dbc 100644
--- a/arm.c
+++ b/arm.c
@@ -190,6 +190,8 @@ void
 arm_init(int when)
 {
 	ulong vaddr;
+	char *string;
+	struct syment *sp;
 
 #if defined(__i386__) || defined(__x86_64__)
 	if (ACTIVE())
@@ -229,8 +231,13 @@ arm_init(int when)
 		 * LPAE requires an additional page for the PGD,
 		 * so PG_DIR_SIZE = 0x5000 for LPAE
 		 */
-		if ((symbol_value("_text") - symbol_value("swapper_pg_dir")) == 0x5000)
+		if ((string = pc->read_vmcoreinfo("CONFIG_ARM_LPAE"))) {
 			machdep->flags |= PAE;
+			free(string);
+		} else if ((sp = next_symbol("swapper_pg_dir", NULL)) &&
+		    (sp->value - symbol_value("swapper_pg_dir")) == 0x5000)
+			machdep->flags |= PAE;
+
 		machdep->kvbase = symbol_value("_stext") & ~KVBASE_MASK;
 		machdep->identity_map_base = machdep->kvbase;
 		machdep->is_kvaddr = arm_is_kvaddr;

The pc->read_vmcoreinfo method is only initialized for ELF kdumps and compressed kdumps. So either a dummy function should be put in there that returns a NULL or arm_init() should check for its existence.

I appreciate the bug report -- I'll post something today for crash-7.1.0.

Thanks,
  Dave

> ..."One of my most productive days was throwing away 1000 lines of
> code." - Ken Thompson.
>
> My public pgp key is available,at:
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1
>
>
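The two remedies named in the message (a dummy method that returns NULL, or an existence check in the caller) can be sketched as follows. This is an illustration under assumptions, not the fix that went into crash-7.1.0: struct sketch_context, no_vmcoreinfo(), fake_vmcoreinfo() and has_lpae_config() are hypothetical names, and only the read_vmcoreinfo function pointer is modelled on the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for crash's program context; one field only. */
struct sketch_context {
	char *(*read_vmcoreinfo)(const char *key);
};

/* Option 1: a dummy method that always reports "key not found". */
char *no_vmcoreinfo(const char *key)
{
	(void)key;
	return NULL;
}

/* Illustrative reader that knows a single key and returns a malloc'd copy. */
char *fake_vmcoreinfo(const char *key)
{
	char *value;

	if (strcmp(key, "CONFIG_ARM_LPAE") != 0)
		return NULL;
	value = malloc(2);
	if (value != NULL)
		strcpy(value, "y");
	return value;
}

/* Option 2: the caller checks that the method exists before calling it. */
int has_lpae_config(const struct sketch_context *pc)
{
	char *string;

	assert(pc != NULL);
	if (pc->read_vmcoreinfo == NULL)	/* e.g. a live system */
		return 0;
	if ((string = pc->read_vmcoreinfo("CONFIG_ARM_LPAE")) == NULL)
		return 0;
	free(string);
	return 1;
}
```

Either option alone avoids calling through a NULL pointer. Installing the dummy keeps every call site unchanged, while the explicit check keeps the distinction between "no vmcoreinfo available" and "key absent" visible at the call site.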

10 years, 5 months


Fwd: crash-7.0.9 vs. 7.0.8 on ARM - crashing

by Dave Anderson

----- Forwarded Message -----
From: "Robert Berger" <gmane(a)reliableembeddedsystems.com>
Cc: "Robert Berger" <robert.berger(a)reliableembeddedsystems.com>, anderson(a)redhat.com
Sent: Friday, January 30, 2015 4:02:01 AM
Subject: crash-7.0.9 vs. 7.0.8 on ARM - crashing

Hi,

It looks like crash-7.0.9 is broken on ARM, while 7.0.8 works without any problems.

I compiled both versions on my ARM target board exactly the same way, but 7.0.9 throws a core dump when invoked.[1]

# CONFIG_ARM_LPAE is not set

[1] http://pastebin.com/HpHeHBAF

Please advise

Regards,

Robert

..."One of my most productive days was throwing away 1000 lines of code." - Ken Thompson.

My public pgp key is available,at:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1

10 years, 5 months


unwind not working on x86?

by Jan Willeke

Hello

I am trying to use both the unwind function on x86 in crash and the fp extension. Neither is working for me. I did the following tests (crash 7.0.9):

crash/crash vmcore.201412161409 linux-3.2.64/vmlinux
GNU gdb (GDB) 7.6

      KERNEL: linux-3.2.64/vmlinux
    DUMPFILE: vmcore.201412161409
        CPUS: 1
        DATE: Tue Dec 16 15:09:46 2014
      UPTIME: 00:00:46
LOAD AVERAGE: 0.05, 0.01, 0.01
       TASKS: 55
    NODENAME: debian
     RELEASE: 3.2.64
     VERSION: #2 SMP Tue Dec 16 15:08:10 CET 2014
     MACHINE: x86_64 (2392 Mhz)
      MEMORY: 383.5 MB
       PANIC: "[ 46.736164] Oops: 0002 [#1] SMP " (check log for details)
         PID: 1962
     COMMAND: "tee"
        TASK: ffff88000e5e0000 [THREAD_INFO: ffff88000c8e8000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 1962 TASK: ffff88000e5e0000 CPU: 0 COMMAND: "tee"
 #0 [ffff88000c8e99f0] machine_kexec at ffffffff81038e0a
 #1 [ffff88000c8e9a60] crash_kexec at ffffffff810b3a92
 #2 [ffff88000c8e9b30] oops_end at ffffffff816427e8
 #3 [ffff88000c8e9b60] no_context at ffffffff81635d6f
 #4 [ffff88000c8e9bc0] __bad_area_nosemaphore at ffffffff81635f49
 #5 [ffff88000c8e9c20] bad_area at ffffffff81635fc2
 #6 [ffff88000c8e9c50] do_page_fault at ffffffff81645454
 #7 [ffff88000c8e9d60] do_async_page_fault at ffffffff81644b75
 #8 [ffff88000c8e9d80] async_page_fault at ffffffff81641de5
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff813cad46 RSP: ffff88000c8e9e38 RFLAGS: 00010092
    RAX: 0000000000000010 RBX: 0000000000000063 RCX: 00000000ffffffff
    RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000063
    RBP: ffff88000c8e9e38 R8: 0000000000000000 R9: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c725e0
    R13: 0000000000000286 R14: 0000000000000007 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #9 [ffff88000c8e9e40] __handle_sysrq at ffffffff813cb461
#10 [ffff88000c8e9e80] write_sysrq_trigger at ffffffff813cb51a
#11 [ffff88000c8e9eb0] proc_reg_write at ffffffff811d60e2
#12 [ffff88000c8e9f00] vfs_write at ffffffff81176053
#13 [ffff88000c8e9f30] sys_write at ffffffff8117637a
#14 [ffff88000c8e9f80] sysenter_dispatch at ffffffff8164bc70
    RIP: 00000000f775e430 RSP: 00000000ffe6a720 RFLAGS: 00000296
    RAX: 0000000000000004 RBX: ffffffff8164bc70 RCX: 00000000ffe6a81c
    RDX: 0000000000000002 RSI: 0000000000000002 RDI: 00000000ffe6a81c
    RBP: 00000000ffe6a758 R8: 0000000000000000 R9: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
    ORIG_RAX: 0000000000000004 CS: 0023 SS: 002b

crash> set unwind on
unwind: on
crash> bt
PID: 1962 TASK: ffff88000e5e0000 CPU: 0 COMMAND: "tee"
 #0 [ffff88000c8e99f0] machine_kexec at ffffffff81038e0a
    RIP: 00000000f775e430 RSP: 00000000ffe6a720 RFLAGS: 00000296
    RAX: 0000000000000004 RBX: ffffffff8164bc70 RCX: 00000000ffe6a81c
    RDX: 0000000000000002 RSI: 0000000000000002 RDI: 00000000ffe6a81c
    RBP: 00000000ffe6a758 R8: 0000000000000000 R9: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
    ORIG_RAX: 0000000000000004 CS: 0023 SS: 002b
crash>

-> no functions, no parameters

-------------------------------------------------------------------------------------------------------------

crash> extend crash/extensions/fp.so
./crash/extensions/fp.so: shared object loaded
crash> fp
.................

-> no functions, no parameters

Did I do anything wrong?

Best Regards,
Jan Willeke

10 years, 5 months


[PATCH] fix missing RT PRIO_ARRAY table with CONFIG_RT_GROUP_SCHED=n

by Mitsuya Shibata

On a kernel with CONFIG_RT_GROUP_SCHED=n, the "RT PRIO_ARRAY" table of the runq command is always empty, even though "rt_sched_class" tasks exist.

This is because the offset "task_struct->rt - task_struct" is subtracted only if there is a my_q member (i.e. CONFIG_RT_GROUP_SCHED=y). As a result, dump_RT_prio_array() passes the address of the "rt" member of task_struct to task_to_context().

This patch ensures that the address of the task_struct itself is passed to task_to_context().
---
 task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/task.c b/task.c
index 147ff5c..50c82c8 100644
--- a/task.c
+++ b/task.c
@@ -8688,9 +8688,9 @@ dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
 					&rt_rq_buf[OFFSET(rt_rq_active)]);
 				FREEBUF(rt_rq_buf);
 				continue;
-			} else
-				task_addr -= OFFSET(task_struct_rt);
+			}
 		}
+		task_addr -= OFFSET(task_struct_rt);
 		if (!(tc = task_to_context(task_addr)))
 			continue;
 
-- 
1.9.1
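The subtraction that the patch makes unconditional is the usual "address of embedded member minus member offset" step for recovering the containing structure. A minimal sketch of that step, using made-up miniature structures (task_sketch, rt_entity_sketch and task_of_rt() are illustrative names, not kernel or crash definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the layout; real kernel structs are far larger. */
struct rt_entity_sketch {
	int run_list;
};

struct task_sketch {
	long state;
	int prio;
	struct rt_entity_sketch rt;	/* embedded member, at a non-zero offset */
};

/*
 * Recover the address of the containing task from the address of its
 * embedded "rt" member by subtracting the member's offset.
 */
struct task_sketch *task_of_rt(struct rt_entity_sketch *rt_se)
{
	assert(rt_se != NULL);
	return (struct task_sketch *)
		((char *)rt_se - offsetof(struct task_sketch, rt));
}
```

Skipping the subtraction leaves a pointer into the middle of the structure, which is why a lookup keyed on the task address finds nothing in the case the patch description covers.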

10 years, 5 months


What is the unit for 'last_arrival' in 'task_struct.sched_info.last_arrival'.

by Saravanan Palanisamy

Hi,

What is the unit for 'last_arrival' in 'task_struct.sched_info.last_arrival'? I see that this value is used by the 'ps -l' crash-utility command.

>>
-l  display the task last_run or timestamp value, whichever applies, of selected, or all, tasks; the list is sorted with the most recently-run task (largest last_run/timestamp) shown first.
>>

I see that this value (16 decimal digits) is much higher than the jiffies value (10 decimal digits) in my crash dumps. This value (unsigned long long) seems to be equal to the 'jiffies' (unsigned long) value when the task was scheduled.

crash> p jiffies
jiffies = $9 = 5310085968
crash>
crash> ps -l
..
[4058835599089874] PID: 4136 TASK: ffff8801309ce640 CPU: 4 COMMAND: "kcapwdt"
...

System info:
-----------------
MACHINE: x86_64 (2533 Mhz)
Linux Kernel Version : 3.2.30

Thanks,
Saravanan

10 years, 5 months


Fwd: [PATCH] crash: use %lu for counters

by Dave Anderson

----- Forwarded Message -----
From: "Dave Anderson" <anderson(a)redhat.com>
To: "Alexey Dobriyan" <adobriyan(a)gmail.com>
Sent: Thursday, January 22, 2015 1:52:16 PM
Subject: Re: [PATCH] crash: use %lu for counters

----- Original Message -----
> These counters are "unsigned long" in kernel and positive in principle.
> Seen during debugging OOM apocalypse event.
>

Queued for crash-7.1.0:

  https://github.com/crash-utility/crash/commit/2aa609f7d947941055a0e95db61...

The "PAGES_SCANNED" segment of your patch was reported/fixed/committed a couple days ago:

  https://github.com/crash-utility/crash/commit/a58a34e95cf32cd2f8609a71c36...

Thanks,
  Dave

10 years, 6 months


Fwd: [PATCH] crash: use %lu for counters

by Dave Anderson

----- Forwarded Message -----
From: "Alexey Dobriyan" <adobriyan(a)gmail.com>
To: "Dave Anderson" <anderson(a)redhat.com>
Sent: Thursday, January 22, 2015 5:56:11 AM
Subject: [PATCH] crash: use %lu for counters

These counters are "unsigned long" in kernel and positive in principle.
Seen during debugging OOM apocalypse event.

10 years, 6 months

