[PATCH] take Hardware Error & kernel pointer bug as separate panicmsg
by drc@yahoo-inc.com

There are too many kinds of panics categorized under the same "Oops: xxxx"
message, which makes this field ambiguous and not very useful:
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
This patch separates out three kinds of panicmsg, the most frequent cases
among the machines I manage; the match strings are copied exactly
from the kernel source code. After applying it, I get panicmsg values like:
 include/linux/kernel.h:#define HW_ERR
          panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 11: f200003f000100b2"
 drivers/char/sysrq.c:__handle_sysrq
          panicmsg: "SysRq : Trigger a crash"
 arch/x86/mm/fault.c:show_fault_oops
          panicmsg: "BUG: unable to handle kernel paging request at 00001248a68eb328"
Signed-off-by: Derek Che <drc(a)yahoo-inc.com>
---
 task.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/task.c b/task.c
index 4214d7f..74e6028 100644
--- a/task.c
+++ b/task.c
@@ -5509,8 +5509,19 @@ get_panicmsg(char *buf)
 	}
 	rewind(pc->tmpfile);
 	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
-	        if (strstr(buf, "Oops: ") || 
-		    strstr(buf, "kernel BUG at")) 
+		if (strstr(buf, "[Hardware Error]: "))
+			msg_found = TRUE;
+	}
+	rewind(pc->tmpfile);
+	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+		if (strstr(buf, "SysRq : "))
+			msg_found = TRUE;
+	}
+	rewind(pc->tmpfile);
+	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+	        if (strstr(buf, "Oops: ") ||
+		    strstr(buf, "kernel BUG at") ||
+		    strstr(buf, "BUG: unable to handle kernel "))
 	        	msg_found = TRUE;
 	}
         rewind(pc->tmpfile);
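The search order introduced by the patch above can be sketched in isolation. This is a minimal, stand-alone C sketch, not crash code: find_panicmsg() and the in-memory array of log lines are hypothetical stand-ins for get_panicmsg() reading pc->tmpfile. Each pattern class gets its own full pass over the log, mirroring the rewind() between loops, so a "[Hardware Error]: " line wins over a "SysRq : " line, which in turn wins over the generic Oops/BUG strings, regardless of where they appear in the log.

```c
#include <stddef.h>
#include <string.h>

/* Pattern classes in priority order, copied from the patch. */
static const char *hw_patterns[]    = { "[Hardware Error]: ", NULL };
static const char *sysrq_patterns[] = { "SysRq : ", NULL };
static const char *oops_patterns[]  = { "Oops: ", "kernel BUG at",
                                        "BUG: unable to handle kernel ", NULL };

/* Return the first log line containing any pattern of one class, or NULL. */
static const char *scan(const char **lines, size_t n, const char **patterns)
{
    for (size_t i = 0; i < n; i++)
        for (const char **p = patterns; *p; p++)
            if (strstr(lines[i], *p))
                return lines[i];
    return NULL;
}

/*
 * Hypothetical helper: one full pass per pattern class (like the rewind()
 * before each loop), so a higher-priority class always wins.
 */
const char *find_panicmsg(const char **lines, size_t n)
{
    const char **passes[] = { hw_patterns, sysrq_patterns, oops_patterns };

    for (size_t i = 0; i < sizeof(passes) / sizeof(passes[0]); i++) {
        const char *hit = scan(lines, n, passes[i]);
        if (hit)
            return hit;
    }
    return NULL;
}
```

Within the last pass the earliest matching line wins, so a "BUG: unable to handle kernel ..." line that precedes the "Oops: ..." line in the log is the one reported.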
                                10 years, 9 months
[PATCH] take Hardware Error & kernel pointer bug as separate panicmsg
by drc@yahoo-inc.com

There are too many kinds of panics categorized under the same "Oops: xxxx"
message; this is really ambiguous and makes the field not very useful:
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
This patch separates out two kinds, the two most frequent cases
among the machines I manage; the match strings are copied exactly
from the kernel source code. After applying it, I get panicmsg values like:
 include/linux/kernel.h:#define HW_ERR
          panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 11: f200003f000100b2"
 arch/x86/mm/fault.c:show_fault_oops
          panicmsg: "BUG: unable to handle kernel paging request at 00001248a68eb328"
Signed-off-by: Derek Che <drc(a)yahoo-inc.com>
---
 task.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/task.c b/task.c
index 4214d7f..26b6728 100644
--- a/task.c
+++ b/task.c
@@ -5509,8 +5509,14 @@ get_panicmsg(char *buf)
 	}
 	rewind(pc->tmpfile);
 	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
-	        if (strstr(buf, "Oops: ") || 
-		    strstr(buf, "kernel BUG at")) 
+		if (strstr(buf, "[Hardware Error]: "))
+			msg_found = TRUE;
+	}
+	rewind(pc->tmpfile);
+	while (!msg_found && fgets(buf, BUFSIZE, pc->tmpfile)) {
+	        if (strstr(buf, "Oops: ") ||
+		    strstr(buf, "kernel BUG at") ||
+		    strstr(buf, "BUG: unable to handle kernel "))
 	        	msg_found = TRUE;
 	}
         rewind(pc->tmpfile);
Re: [Crash-utility] crash-7.0.9 vs. 7.0.8 on ARM - crashing
by gmane@reliableembeddedsystems.com

Hi,
On 2015-01-30 08:26, Dave Anderson wrote:
> 
> The pc->read_vmcoreinfo method is only initialized for ELF kdumps and
> compressed kdumps.  So either a dummy function should be put in there
> that returns a NULL or arm_init() should check for its existence.
> 
> I appreciate the bug report -- I'll post something today for 
> crash-7.1.0.
OK, cool. Just let me know if you have something for me to test.
I also compile crash on my embedded ARM target systems as a test of how
stable the whole system is. It might take hours to compile ;)
> 
> Thanks,
>   Dave
> 
Thank you,
Robert
Re: [Crash-utility] crash-7.0.9 vs. 7.0.8 on ARM - crashing
by Dave Anderson

----- Original Message -----
> Hi,
>
> It looks like crash-7.0.9 is broken on ARM, while 7.0.8 works without
> any problems.
>
> I compiled both versions on my ARM target board exactly the same way,
> but 7.0.9 throws a core dump when invoked.[1]
>
> # CONFIG_ARM_LPAE is not set
>
> [1] http://pastebin.com/HpHeHBAF
>
> Please advise
>
> Regards,
>
> Robert
Good catch.  Apparently the ARM users and maintainers on this list don't
ever run "live" as you are doing.  And since I don't have any ARM hardware,
I can only test it on supplied dumpfiles with an x86 binary built with
"make target=ARM".
The problem is this patch that went into crash-7.0.9:
  Improve the method for determining whether a 32-bit ARM vmlinux is
  an LPAE enabled kernel by first checking whether CONFIG_ARM_LPAE
  exists in the vmcoreinfo data, and if it does not, by then checking
  whether the next higher symbol above "swapper_pg_dir" is 0x5000 bytes
  higher in value.
  (sdu.liu(a)huawei.com)
diff --git a/arm.c b/arm.c
index cb7d841..e7d3dbc 100644
--- a/arm.c
+++ b/arm.c
@@ -190,6 +190,8 @@ void
 arm_init(int when)
 {
        ulong vaddr;
+       char *string;
+       struct syment *sp;
 
 #if defined(__i386__) || defined(__x86_64__)
        if (ACTIVE())
@@ -229,8 +231,13 @@ arm_init(int when)
                 * LPAE requires an additional page for the PGD,
                 * so PG_DIR_SIZE = 0x5000 for LPAE
                 */
-               if ((symbol_value("_text") - symbol_value("swapper_pg_dir")) == 0x5000)
+               if ((string = pc->read_vmcoreinfo("CONFIG_ARM_LPAE"))) {
                        machdep->flags |= PAE;
+                       free(string);
+               } else if ((sp = next_symbol("swapper_pg_dir", NULL)) &&
+                        (sp->value - symbol_value("swapper_pg_dir")) == 0x5000)
+                         machdep->flags |= PAE;
+
                machdep->kvbase = symbol_value("_stext") & ~KVBASE_MASK;
                machdep->identity_map_base = machdep->kvbase;
                machdep->is_kvaddr = arm_is_kvaddr;
The pc->read_vmcoreinfo method is only initialized for ELF kdumps and
compressed kdumps.  So either a dummy function should be put in there
that returns a NULL or arm_init() should check for its existence.
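Dave's two options can be sketched as follows. This is a hedged illustration, not the fix that went into crash-7.1.0: struct pc_sketch, the PAE value, no_vmcoreinfo() and lpae_flag_from_vmcoreinfo() are hypothetical names, and only the read_vmcoreinfo member mirrors the pc field described above. Installing a dummy reader that returns NULL, or testing the pointer before calling it, keeps a live session (where the method is never initialized) from calling through a NULL function pointer; both paths then fall back to the swapper_pg_dir heuristic.

```c
#include <stdlib.h>

#define PAE 0x1UL  /* hypothetical flag value for this sketch */

/* Hypothetical stand-in for crash's program context. */
struct pc_sketch {
    /* Set only for ELF and compressed kdumps; NULL otherwise (e.g. live). */
    char *(*read_vmcoreinfo)(const char *key);
};

/* Option 1: a dummy reader that always reports "not present". */
static char *no_vmcoreinfo(const char *key)
{
    (void)key;
    return NULL;
}

/* Option 2: check that the method exists before calling it. */
static unsigned long lpae_flag_from_vmcoreinfo(const struct pc_sketch *pc)
{
    char *string;

    if (pc->read_vmcoreinfo &&
        (string = pc->read_vmcoreinfo("CONFIG_ARM_LPAE")) != NULL) {
        free(string);  /* as in the patch, the caller frees the result */
        return PAE;
    }
    /* Not found, or no vmcoreinfo at all: the caller falls back to the
     * next-symbol-above-swapper_pg_dir == 0x5000 check. */
    return 0;
}
```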
I appreciate the bug report -- I'll post something today for crash-7.1.0.
Thanks,
  Dave
> ..."One of my most productive days was throwing away 1000 lines of
> code." - Ken Thompson.
>
> My public pgp key is available,at:
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1
>
> 
Fwd: crash-7.0.9 vs. 7.0.8 on ARM - crashing
by Dave Anderson

----- Forwarded Message -----
From: "Robert Berger" <gmane(a)reliableembeddedsystems.com>
Cc: "Robert Berger" <robert.berger(a)reliableembeddedsystems.com>, anderson(a)redhat.com
Sent: Friday, January 30, 2015 4:02:01 AM
Subject: crash-7.0.9 vs. 7.0.8 on ARM - crashing
Hi,
It looks like crash-7.0.9 is broken on ARM, while 7.0.8 works without
any problems.
I compiled both versions on my ARM target board exactly the same way,
but 7.0.9 throws a core dump when invoked.[1]
# CONFIG_ARM_LPAE is not set
[1] http://pastebin.com/HpHeHBAF
Please advise
Regards,
Robert
..."One of my most productive days was throwing away 1000 lines of
code." - Ken Thompson.
My public pgp key is available at:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x90320BF1
unwind not working on x86?
by Jan Willeke

Hello
I am trying to use both the unwind function on x86 in crash and the fp
extension. Neither is working for me.
I ran the following tests (crash 7.0.9):
crash/crash vmcore.201412161409 linux-3.2.64/vmlinux
GNU gdb (GDB) 7.6
      KERNEL: linux-3.2.64/vmlinux
    DUMPFILE: vmcore.201412161409
        CPUS: 1
        DATE: Tue Dec 16 15:09:46 2014
      UPTIME: 00:00:46
LOAD AVERAGE: 0.05, 0.01, 0.01
       TASKS: 55
    NODENAME: debian
     RELEASE: 3.2.64
     VERSION: #2 SMP Tue Dec 16 15:08:10 CET 2014
     MACHINE: x86_64  (2392 Mhz)
      MEMORY: 383.5 MB
       PANIC: "[   46.736164] Oops: 0002 [#1] SMP " (check log for details)
         PID: 1962
     COMMAND: "tee"
        TASK: ffff88000e5e0000  [THREAD_INFO: ffff88000c8e8000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 1962   TASK: ffff88000e5e0000  CPU: 0   COMMAND: "tee"
 #0 [ffff88000c8e99f0] machine_kexec at ffffffff81038e0a
 #1 [ffff88000c8e9a60] crash_kexec at ffffffff810b3a92
 #2 [ffff88000c8e9b30] oops_end at ffffffff816427e8
 #3 [ffff88000c8e9b60] no_context at ffffffff81635d6f
 #4 [ffff88000c8e9bc0] __bad_area_nosemaphore at ffffffff81635f49
 #5 [ffff88000c8e9c20] bad_area at ffffffff81635fc2
 #6 [ffff88000c8e9c50] do_page_fault at ffffffff81645454
 #7 [ffff88000c8e9d60] do_async_page_fault at ffffffff81644b75
 #8 [ffff88000c8e9d80] async_page_fault at ffffffff81641de5
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff813cad46  RSP: ffff88000c8e9e38  RFLAGS: 00010092
    RAX: 0000000000000010  RBX: 0000000000000063  RCX: 00000000ffffffff
    RDX: 0000000000000000  RSI: 0000000000000082  RDI: 0000000000000063
    RBP: ffff88000c8e9e38   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff81c725e0
    R13: 0000000000000286  R14: 0000000000000007  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88000c8e9e40] __handle_sysrq at ffffffff813cb461
#10 [ffff88000c8e9e80] write_sysrq_trigger at ffffffff813cb51a
#11 [ffff88000c8e9eb0] proc_reg_write at ffffffff811d60e2
#12 [ffff88000c8e9f00] vfs_write at ffffffff81176053
#13 [ffff88000c8e9f30] sys_write at ffffffff8117637a
#14 [ffff88000c8e9f80] sysenter_dispatch at ffffffff8164bc70
    RIP: 00000000f775e430  RSP: 00000000ffe6a720  RFLAGS: 00000296
    RAX: 0000000000000004  RBX: ffffffff8164bc70  RCX: 00000000ffe6a81c
    RDX: 0000000000000002  RSI: 0000000000000002  RDI: 00000000ffe6a81c
    RBP: 00000000ffe6a758   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: 0000000000000004  CS: 0023  SS: 002b
crash> set unwind on
unwind: on
crash> bt
PID: 1962   TASK: ffff88000e5e0000  CPU: 0   COMMAND: "tee"
 #0 [ffff88000c8e99f0] machine_kexec at ffffffff81038e0a
    RIP: 00000000f775e430  RSP: 00000000ffe6a720  RFLAGS: 00000296
    RAX: 0000000000000004  RBX: ffffffff8164bc70  RCX: 00000000ffe6a81c
    RDX: 0000000000000002  RSI: 0000000000000002  RDI: 00000000ffe6a81c
    RBP: 00000000ffe6a758   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000003  R15: 0000000000000000
    ORIG_RAX: 0000000000000004  CS: 0023  SS: 002b
crash>
-> no functions no parameters
-------------------------------------------------------------------------------------------------------------
crash> extend crash/extensions/fp.so
./crash/extensions/fp.so: shared object loaded
crash> fp
.................
-> no functions no parameters
Did I do anything wrong?
Best Regards,
Jan Willeke
[PATCH] fix missing RT PRIO_ARRAY table with CONFIG_RT_GROUP_SCHED=n
by Mitsuya Shibata

On a kernel with CONFIG_RT_GROUP_SCHED=n, the "RT PRIO_ARRAY" table of the runq
command is always empty, even though "rt_sched_class" tasks exist.
This is because the offset "task_struct->rt - task_struct" is subtracted only
when the my_q member exists (i.e. CONFIG_RT_GROUP_SCHED=y). Therefore, with
CONFIG_RT_GROUP_SCHED=n, dump_RT_prio_array() passes the address of the rt
member of task_struct, rather than of the task_struct itself, to task_to_context().
This patch ensures that the address of the task_struct is passed to task_to_context().
---
 task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/task.c b/task.c
index 147ff5c..50c82c8 100644
--- a/task.c
+++ b/task.c
@@ -8688,9 +8688,9 @@ dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
 						&rt_rq_buf[OFFSET(rt_rq_active)]);
 					FREEBUF(rt_rq_buf);
 					continue;
-				} else
-					task_addr -= OFFSET(task_struct_rt);
+				}
 			}
+			task_addr -= OFFSET(task_struct_rt);
 			if (!(tc = task_to_context(task_addr)))
 				continue;
 
-- 
1.9.1
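The address arithmetic the patch relies on can be sketched with offsetof(). This is a reduced illustration, not crash code: struct task_struct_sketch, struct sched_rt_entity_sketch and rt_entity_to_task() are hypothetical, and the member layout is invented. It shows why the subtraction must happen whether or not my_q exists: the value at hand is the address of the embedded rt member, and only after subtracting that member's offset does it point at the task structure that task_to_context() expects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct sched_rt_entity_sketch {
    void *run_list_next;
};

struct task_struct_sketch {
    int pid;
    struct sched_rt_entity_sketch rt;  /* embedded, linked into the RT queue */
};

/*
 * Convert the address of the embedded rt member back to the address of
 * its enclosing task structure; the equivalent of
 * "task_addr -= OFFSET(task_struct_rt)". crash uses ulong for addresses,
 * this sketch uses uintptr_t.
 */
static uintptr_t rt_entity_to_task(uintptr_t rt_addr)
{
    return rt_addr - offsetof(struct task_struct_sketch, rt);
}
```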
What is the unit for 'last_arrival' in 'task_struct.sched_info.last_arrival'?
by Saravanan Palanisamy

Hi,
    What is the unit for 'last_arrival' in
'task_struct.sched_info.last_arrival'?
    I see that this value is used by the 'ps -l' crash-utility command:
>>
       -l  display the task last_run or timestamp value, whichever applies,
           of selected, or all, tasks; the list is sorted with the most
           recently-run task (largest last_run/timestamp) shown first.
>>
     I see that this value (16 decimal digits) is much larger than the jiffies
value (10 decimal digits) in my crash dumps.
    I would expect this value (unsigned long long) to equal the 'jiffies'
(unsigned long) value at the time the task was scheduled.
crash> p jiffies
jiffies = $9 = 5310085968
crash>
crash> ps -l
..
[4058835599089874]  PID: 4136   TASK: ffff8801309ce640  CPU: 4   COMMAND:
"kcapwdt"
...
System info:
-----------------
     MACHINE: x86_64  (2533 Mhz)
     Linux Kernel Version : 3.2.30
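If, as an assumption not confirmed anywhere in this thread, last_arrival is a scheduler-clock timestamp in nanoseconds while jiffies counts ticks at CONFIG_HZ starting from INITIAL_JIFFIES (defined in the kernel as (unsigned long)(unsigned int)(-300*HZ)), the two values can be put on a common scale by converting both to seconds. The helpers below are hypothetical and only sketch that conversion; HZ is a parameter because the post does not state CONFIG_HZ.

```c
#include <stdint.h>

/* Assumption: the timestamp is in nanoseconds. */
static double ns_to_seconds(uint64_t ns)
{
    return (double)ns / 1e9;
}

/*
 * Assumption: jiffies started at INITIAL_JIFFIES, i.e.
 * (unsigned long)(unsigned int)(-300 * HZ), and ticks HZ times per second.
 */
static double jiffies_to_uptime_seconds(unsigned long jiffies, unsigned int hz)
{
    unsigned long initial = (unsigned long)(unsigned int)(-300 * (int)hz);

    return (double)(jiffies - initial) / hz;
}
```

For instance, with the values shown above on a 64-bit kernel and an assumed HZ of 250, jiffies corresponds to roughly 4,060,775 seconds of uptime and the last_arrival value, read as nanoseconds, to roughly 4,058,836 seconds. Values of the same magnitude are consistent with, but do not prove, the nanosecond reading.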
Thanks,
Saravanan
Fwd: [PATCH] crash: use %lu for counters
by Dave Anderson

----- Forwarded Message -----
From: "Alexey Dobriyan" <adobriyan(a)gmail.com>
To: "Dave Anderson" <anderson(a)redhat.com>
Sent: Thursday, January 22, 2015 5:56:11 AM
Subject: [PATCH] crash: use %lu for counters
These counters are "unsigned long" in the kernel and can never be negative.
I noticed this while debugging an OOM apocalypse event.
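The reasoning behind the patch can be sketched in a few lines of C. This is a hypothetical helper, not code from the patch (whose hunks are not shown here), and the counter name used with it is made up: an unsigned long counter is printed with %lu, the conversion that matches its type. Printing it with %ld is a mismatched conversion that, on common platforms, shows values above LONG_MAX as negative numbers.

```c
#include <stddef.h>
#include <stdio.h>

/* Format "name: value" for an unsigned long counter using %lu. */
static int format_counter(char *buf, size_t len, const char *name,
                          unsigned long value)
{
    return snprintf(buf, len, "%s: %lu", name, value);
}
```

For example, format_counter(buf, sizeof buf, "nr_pages", 3000000000UL) writes "nr_pages: 3000000000" and returns 20.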