Re: [Crash-utility] Question on online/present/possible CPUS
by Hagen, Jeffrey

Hi Petr and Dave,
I have a couple of comments on Petr's email regarding CPU count.

When the dump is the result of an NMI (NMI switch pressed) due to a hung
system, one often needs to analyze the state and backtrace for all the
CPUs.  Since the kernel halts all but CPU0, the crash utility cannot
see the other "offline" CPUs.

This behavior has changed for the x86 architecture somewhere between
2.6.16 (SLES10) and 2.6.32 (SLES11) due to the removal of the x8664_pda
structure.

The function x86_64_init (in x86_64.c) now calls x86_64_per_cpu_init,
which doesn't count the offline CPUs when calculating the number of
CPUs.  Previously, x86_64_cpu_pda_init (called if x8664_pda exists)
didn't check for online/offline status.

Regarding #3 in Petr's email: it appears that the set command rejects
only values >= kt_cpus (the number of CPUs).  It doesn't check whether
the CPU is offline or not.

Thanks,
Jeff Hagen
>
> Hi all,
>
> before making a larger cleanup, I want to ask here for your opinion. It
> seems that there is quite a bit of confusion about the meaning of CPU
> count printed out by the crash utility.
>
> 1. Number of CPUs
>
> Some people think that crash should always output the number of CPUs in
> the system (ie. a quad-core server should always output 'CPUS: 4'),
> while other people think that only online CPUs should be counted.
>
> 2. CPU numbering
>
> For example, if there are 4 CPUs in the system, but some of them are
> taken offline (e.g. CPU 1 and CPU 3), _and_ crash output the number of
> online CPUs, it would print out 'CPUS: 2'. It's not easy to find out
> that valid CPU numbers are 0 and 2 in this case.
Hi Petr,
For all but ppc64, the number shown by the initial banner and the
"sys" command is essentially "the-highest-cpu-number-plus-one".
For ppc64 (as requested and implemented by the IBM/ppc64 maintainers),
it shows the number of online cpus.  There are reasons for doing it
either way, but I'm on vacation now, so you can research the list
archives for the arguments for and against each.  Check changelog.html
for when it was changed for ppc64, and then cross-reference the
revision date with the list archives.
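
The two conventions described above give different answers as soon as a CPU in the middle of the range is offline. A minimal sketch, assuming the set of online CPUs is available as a plain 64-bit mask and that "highest" means the highest online CPU; the function names are illustrative and are not crash functions:

```c
#include <stdint.h>

/* Illustrative only: bit N of online_mask set means CPU N is online. */

/* "Highest CPU number plus one", the convention used outside ppc64. */
int cpus_highest_plus_one(uint64_t online_mask)
{
    int count = 0;

    for (int cpu = 0; cpu < 64; cpu++)
        if (online_mask & (UINT64_C(1) << cpu))
            count = cpu + 1;    /* remember the highest set bit seen so far */
    return count;
}

/* Number of online CPUs, the convention described for ppc64. */
int cpus_online(uint64_t online_mask)
{
    int count = 0;

    for (int cpu = 0; cpu < 64; cpu++)
        if (online_mask & (UINT64_C(1) << cpu))
            count++;
    return count;
}
```

With CPUs 0 and 2 online (mask 0x5, as in Petr's example), the first function returns 3 and the second returns 2. Neither number tells the user on its own which CPU numbers are valid.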
> 3. Examining offline CPU
>
> Sometimes, it may be useful to examine the state of an offline CPU. Now,
> I know that the saved state is most likely stale, but it can be useful
> in some cases (e.g. a crash after dropping to kdb). The crash utility
> currently refuses to select an offline CPU with 'set -c #'. Are there
> any concerns about allowing it?
I tend to agree with you, but the only thing that's useful and
available from an offline cpu is the swapper task for that cpu
and the runqueue for that cpu.  And both of those entities are
readily accessible if you really need them.  Although I don't know
anything about kdb status, so maybe there's something of per-cpu
interest, but I don't know why it would be necessary to "set"
that cpu?
In any case, like I said before, I'm just temporarily online while
on vacation, and will be back to work on the 9th.
Thanks,
  Dave

Re: [Crash-utility] Problem on getting kernel backtrace with a `virsh dump' dumped kvm dumpfile
by Dave Anderson

----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> If the backtrace above occurs with the user-space-endless-loop
> dumpfile, then it's not necessary to send them to me.  But if the
> bt errors still occur with the dumpfile containing the user-space
> endless-loop (and with no System.map), then yes, I would like to
> see the vmlinux/dumpfile pair.  (You can send the download details
> off-list...)
Hu Tao,
I reproduced this by using the "correct" System.map file for a KVM
guest dumpfile -- which I presume you also did in your test.
Even though it is not recommended to use a System.map file as 
a command line argument -- *unless* the vmlinux file is different
from the kernel that caused the crash -- I was surprised that
doing so resulted in the "bt" errors when using the "correct" 
System.map, because the symbols that get back-patched during
initialization would be the same values.
As it turns out, it is a bug.  However, it will only be seen if you
use a System.map file.  Nonetheless, it should not happen when
the System.map file matches the crashed kernel's vmlinux.
The bug is in the is_kernel_text() function, which is incorrectly
returning TRUE on non-kernel text addresses in kernels where
the __per_cpu_start value is no longer a large absolute value well
beyond _etext, but changed to a low offset value.  For example, in
older kernels, it used to be an absolute (A) value like this:
  ffffffff80419000 (A) __per_cpu_start
But in newer kernels it is a zero-based (D) value:
  0 (D) __per_cpu_start
And that bug is what's causing the "bt" command to fail.
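
To illustrate the class of bug described above, here is a hypothetical sketch. It is not the crash source, and the real is_kernel_text() may fail through a different path. It assumes a text check whose lower bound comes from the lowest-valued symbol in the System.map; once __per_cpu_start is 0, that bound collapses and small bogus values pass. The struct, the function names, and the _stext/_etext values are made up for the example.

```c
typedef unsigned long long kaddr_t;

/* Hypothetical bounds gathered from a symbol list (not crash's data model). */
struct text_bounds {
    kaddr_t lowest_symbol;  /* lowest symbol value; 0 when per-cpu symbols are zero-based */
    kaddr_t stext;          /* assumed start of kernel text */
    kaddr_t etext;          /* assumed end of kernel text */
};

/* Naive check: everything from the lowest symbol up to _etext counts as text. */
int naive_is_text(const struct text_bounds *b, kaddr_t addr)
{
    return addr >= b->lowest_symbol && addr < b->etext;
}

/* Stricter check: only the range between _stext and _etext counts as text. */
int bounded_is_text(const struct text_bounds *b, kaddr_t addr)
{
    return addr >= b->stext && addr < b->etext;
}
```

With lowest_symbol set to 0 and made-up text bounds of 0xffffffff81000000 to 0xffffffff81400000, the naive check accepts 0x41 (one of the bogus values in the failing "bt" output) while the bounded check rejects it and still accepts ffffffff8100ad1e.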
In any case, I'll fix it in the next release.
But -- as is always the case -- do *not* use a System.map file
as an argument unless it is absolutely necessary!
Thanks,
  Dave

Re: [Crash-utility] Problem on getting kernel backtrace with a `virsh dump' dumped kvm dumpfile
by Dave Anderson

----- hutao(a)cn.fujitsu.com wrote:
> Hi,
> 
> I encountered a problem on getting backtrace with a `virsh dump' dumped
> kvm dumpfile, the bt command did not get kernel backtrace properly.
> 
> guest kernel: 2.6.32
> crash: 5.0.6 patched with qemu_ram_version_4.patch(attached)
> 
> steps to get dumpfile:
> 
>   1. virsh start vm
>   2. connect to vm, say by vnc
>   3. On guest, build and run the code:
> 
> int main(void)
> {
> 	while (1);
> 
> 	return 0;
> }
> 
>   4. On host, run `virsh dump vm /mnt/data/kernel-2.6.32.dump3-userspace-endless-loop'
> 
> Then run crash:
> 
>   crash /mnt/data/kernel/linux-2.6.32/System.map /mnt/data/kernel/linux-2.6.32/vmlinux /mnt/data/kernel-2.6.32.dump3-userspace-endless-loop
> 
> got the result:
> 
> crash 5.0.6
> Copyright (C) 2002-2010  Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006  IBM Corporation
> Copyright (C) 1999-2006  Hewlett-Packard Co
> Copyright (C) 2005, 2006  Fujitsu Limited
> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> Copyright (C) 2005  NEC Corporation
> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions.  Enter "help copying" to see the conditions.
> This program has absolutely no warranty.  Enter "help warranty" for details.
>  
> GNU gdb (GDB) 7.0                               
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
> 
>   SYSTEM MAP: /mnt/data/kernel/linux-2.6.32/System.map                
> DEBUG KERNEL: /mnt/data/kernel/linux-2.6.32/vmlinux (2.6.32)
>     DUMPFILE: /mnt/data/kernel-2.6.32.dump3-userspace-endless-loop
>         CPUS: 1
>         DATE: Fri Aug 27 05:18:12 2010
>       UPTIME: 00:00:51
> LOAD AVERAGE: 0.44, 0.11, 0.03
>        TASKS: 67
>     NODENAME: localhost.localdomain
>      RELEASE: 2.6.32
>      VERSION: #2 SMP PREEMPT Wed Aug 25 15:26:48 CST 2010
>      MACHINE: x86_64  (2925 Mhz)
>       MEMORY: 511.6 MB
>        PANIC: "Oops: 0003 [#1] PREEMPT SMP " (check log for details)
>          PID: 0
>      COMMAND: "swapper"
>         TASK: ffffffff8158df70  [THREAD_INFO: ffffffff8154e000]
>          CPU: 0
>        STATE: TASK_RUNNING 
>      WARNING: panic task not found
> 
> crash> bt
> PID: 0      TASK: ffffffff8158df70  CPU: 0   COMMAND: "swapper"
>  #0 [ffffffff8154fe28] schedule at ffffffff8138baa3
> bt: invalid kernel virtual address: 41  type: "call byte"
> bt: invalid kernel virtual address: 44e6835ad  type: "call byte"
> bt: load_memfile_offset: read: Success
> bt: read error: kernel virtual address: fffffffffffffffc  type: "call byte"
> bt: invalid kernel virtual address: e7ab  type: "call byte"
> bt: invalid kernel virtual address: e273  type: "call byte"
> bt: invalid kernel virtual address: 13a7b  type: "call byte"
> bt: invalid kernel virtual address: 935cb  type: "call byte"
> bt: load_memfile_offset: read: Success
> bt: read error: kernel virtual address: fffffffffffffffb  type: "call byte"
> bt: invalid kernel virtual address: 935cb  type: "call byte"
>  #1 [ffffffff8154fef0] cpu_idle at ffffffff8100ad1e
> crash> 
> 
> 
> Note the output of the `bt' command. Without running that endless-loop
> code, `bt' got:
> 
> 
> crash> bt
> PID: 0      TASK: ffffffff8158df70  CPU: 0   COMMAND: "swapper"
>  #0 [ffffffff8154fe28] schedule at ffffffff8138baa3
>  #1 [ffffffff8154fe48] apic_timer_interrupt at ffffffff8100c65e
>  #2 [ffffffff8154fed0] need_resched at ffffffff810125a8
>  #3 [ffffffff8154fee0] default_idle at ffffffff81012e03
>  #4 [ffffffff8154fef0] cpu_idle at ffffffff8100acd6
> crash> 
> 
> 
> Any suggestions on how to solve the problem?
Not really.
If there's no kernel crash, then the selection of the current
context defaults to the cpu 0 swapper task.  I don't know
what was happening to the "swapper" task at the time that the
guest was paused.
If you want to make the vmlinux/dumpfile available for me
to download, I can take a look.  (I don't know why you're
using a System.map).
Dave
 
> 
> Regards,
> Hu Tao

[ANNOUNCE] crash version 5.0.7 is available
by Dave Anderson

 - Introduction of ARM processor support for the crash utility.  This is
   the result of collaborative effort between Nokia and Sony Ericsson.
   The crash utility can be built as a native ARM binary to analyze ARM
   dumpfiles or run live on an ARM host, or alternatively it can be 
   built as an x86 binary to analyze ARM dumpfiles.  To build crash as 
   an ARM binary on an ARM host, enter "make" alone.  To build crash as 
   an x86 binary on an x86 host, enter "make target=ARM".  By extension, 
   the x86 binary can also be run on an x86_64 host.  It supports the kdump
   and diskdump formats, and live analysis using /dev/mem on ARM hosts.  Stack
   unwinding support uses both frame pointers and ARM unwind tables.
   (ext-mika.1.westerberg(a)nokia.com, Jan.Karlsson(a)sonyericsson.com,
    Thomas.Fange(a)sonyericsson.com)
 - Fix to support KVM dumpfiles that have "ram" device header sections 
   with a version_id of 4.  Without the patch, the crash session fails
   with the error message "crash: qemu-load.c:267: ram_init_load: 
   Assertion `version_id == 3' failed".
   (pbonzini(a)redhat.com, anderson(a)redhat.com)
 - Fix for KVM dumpfiles from guests that were provisioned with more
   than 3.5GB of RAM.  KVM virtual systems contain an I/O hole in the
   physical memory region from 0xe0000000 to 0x100000000 (3.5GB to 4GB). 
   If a guest is provisioned with more than 3.5GB of RAM, then the
   memory above 3.5GB is "pushed up" to start at 0x100000000 (4GB).  
   But the "ram" device headers in the KVM dumpfiles do not reflect 
   that, and so without the patch, all kinds of error messages would be
   displayed during invocation, and in all probability, the session 
   would fail.
   (anderson(a)redhat.com)
 - Minor fix to memory.c to address a compiler warning when building 
   with "make warn", or a compiler failure when using "make Warn".
   (anderson(a)redhat.com)
 - Fix for a segmentation violation caused by the "mount" command in the
   rare circumstance where the "init" task (pid 1) does not exist.
   (bob.montgomery(a)hp.com)
 - CONFIG_PREEMPT_RT x86_64 realtime kernels allocate only 3 exception 
   stacks to handle the 5 possible exception types, and therefore the 
   same per-cpu stack may be used for different exception types.  This 
   could cause "bt" output that contained exception stack name strings 
   to be incorrect.  With the patch, all exception stack name strings
   in RT kernels show "RT", as in "--- <RT exception stack> ---".
   (anderson(a)redhat.com)
 - Fix for the potential to miss one or more tasks in 2.6.23 and earlier
   kernels, presumably due to catching an entry in the kernel's pid_hash[] 
   chain in transition.  Without the patch, the task will simply not be
   seen in the gathered task list.
   (bob.montgomery(a)hp.com)
 - Fix to correct a presumption that the kernel's task_state_array[] 
   is NULL terminated.
   (holzheu(a)linux.vnet.ibm.com)
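
The I/O hole described in the KVM fix above amounts to a simple address adjustment. A minimal sketch, assuming the dumpfile stores guest RAM as one contiguous run of offsets; the function name and the macros are hypothetical and this is not the code from the actual patch:

```c
#include <stdint.h>

/* Hole boundaries as stated in the changelog entry: 3.5GB to 4GB. */
#define KVM_IO_HOLE_START UINT64_C(0xe0000000)
#define KVM_IO_HOLE_END   UINT64_C(0x100000000)

/*
 * Translate an offset into the guest's contiguous RAM into a guest
 * physical address.  RAM at or beyond 3.5GB is "pushed up" past the hole.
 */
uint64_t ram_offset_to_phys(uint64_t ram_offset)
{
    if (ram_offset >= KVM_IO_HOLE_START)
        return ram_offset + (KVM_IO_HOLE_END - KVM_IO_HOLE_START);
    return ram_offset;
}
```

Under that assumption, offset 0xdfffffff maps to itself, while offset 0xe0000000 maps to physical address 0x100000000.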
 Download from: http://people.redhat.com/anderson

Re: [Crash-utility] [PATCH] Fix reading of "task_state_array"
by Dave Anderson

----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> Hi Dave,
> 
> Crash seems to assume that the "task_state_array" is NULL terminated.
> This is not the case:
> 
> static const char *task_state_array[] = {
>         "R (running)",          /*  0 */
>         "S (sleeping)",         /*  1 */
> ...
>         "X (dead)"              /* 32 */
> };
> 
> I have a dump where this leads to a crash crash.
> 
> I think, when reading the array, we should use the array size as
> loop exit criteria instead of checking for NULL termination.
Agreed -- I'll just change your patch to call get_array_length()
one time, and stash the result for use by the loop.
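
The shape of that change, bounding the loop by a length computed once instead of relying on a NULL terminator, can be sketched in standalone C. This is not the crash code: the sample array holds only the three entries visible in the thread, and collect_state_chars() and ARRAY_LEN are names invented for the example.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Sample data: only the entries shown in the thread, with no NULL terminator. */
const char *const state_names[] = {
    "R (running)",
    "S (sleeping)",
    "X (dead)"
};

/*
 * Copy the leading state character of each entry into out[].
 * The caller computes len once and passes it in, so the loop never
 * reads past the end of the array looking for a terminator.
 */
size_t collect_state_chars(const char *const *names, size_t len, char *out)
{
    size_t i;

    for (i = 0; i < len; i++)
        out[i] = names[i][0];
    return i;
}
```

Calling collect_state_chars(state_names, ARRAY_LEN(state_names), out) with a 3-byte buffer fills it with 'R', 'S', 'X' and returns 3.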
Queued for the next release.
Thanks Mike,
  Dave
> 
> Michael
> ---
> diff -Naurp crash-5.0.6/task.c crash-5.0.6-task_state_array-fix//task.c
> --- crash-5.0.6/task.c	2010-07-19 21:21:33.000000000 +0200
> +++ crash-5.0.6-task_state_array-fix//task.c	2010-08-27 15:22:16.000000000 +0200
> @@ -4296,6 +4296,7 @@ initialize_task_state(void)
>  	ulong bitpos;
>  	ulong str, task_state_array;
>  	char buf[BUFSIZE];
> +	int i;
>  
>  	if (!symbol_exists("task_state_array") ||
>  	    !readmem(task_state_array = symbol_value("task_state_array"),
> @@ -4313,7 +4314,7 @@ old_defaults:
>  	}
>  		
>  	bitpos = 0;
> -	while (str) {
> +	for (i = 0; i < get_array_length("task_state_array", NULL, 0); i++) {
>  		if (!read_string(str, buf, BUFSIZE-1))
>  			break;

[PATCH] Fix reading of "task_state_array"
by Michael Holzheu

Hi Dave,
Crash seems to assume that the "task_state_array" is NULL terminated. This is
not the case:
static const char *task_state_array[] = {
        "R (running)",          /*  0 */
        "S (sleeping)",         /*  1 */
...
        "X (dead)"              /* 32 */
};
I have a dump where this leads to a crash crash.
I think, when reading the array, we should use the array size as
loop exit criteria instead of checking for NULL termination.
Michael
---
diff -Naurp crash-5.0.6/task.c crash-5.0.6-task_state_array-fix//task.c
--- crash-5.0.6/task.c	2010-07-19 21:21:33.000000000 +0200
+++ crash-5.0.6-task_state_array-fix//task.c	2010-08-27 15:22:16.000000000 +0200
@@ -4296,6 +4296,7 @@ initialize_task_state(void)
 	ulong bitpos;
 	ulong str, task_state_array;
 	char buf[BUFSIZE];
+	int i;
 
 	if (!symbol_exists("task_state_array") ||
 	    !readmem(task_state_array = symbol_value("task_state_array"),
@@ -4313,7 +4314,7 @@ old_defaults:
 	}
 		
 	bitpos = 0;
-	while (str) {
+	for (i = 0; i < get_array_length("task_state_array", NULL, 0); i++) {
 		if (!read_string(str, buf, BUFSIZE-1))
 			break;
 

Problem on getting kernel backtrace with a `virsh dump' dumped kvm dumpfile
by hutao@cn.fujitsu.com

Hi,
I encountered a problem on getting backtrace with a `virsh dump' dumped
kvm dumpfile, the bt command did not get kernel backtrace properly.
guest kernel: 2.6.32
crash: 5.0.6 patched with qemu_ram_version_4.patch(attached)
steps to get dumpfile:
  1. virsh start vm
  2. connect to vm, say by vnc
  3. On guest, build and run the code:
int main(void)
{
	while (1);
	return 0;
}
  4. On host, run `virsh dump vm /mnt/data/kernel-2.6.32.dump3-userspace-endless-loop'
Then run crash:
  crash /mnt/data/kernel/linux-2.6.32/System.map /mnt/data/kernel/linux-2.6.32/vmlinux /mnt/data/kernel-2.6.32.dump3-userspace-endless-loop
got the result:
crash 5.0.6
Copyright (C) 2002-2010  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.0                               
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
  SYSTEM MAP: /mnt/data/kernel/linux-2.6.32/System.map                 
DEBUG KERNEL: /mnt/data/kernel/linux-2.6.32/vmlinux (2.6.32)
    DUMPFILE: /mnt/data/kernel-2.6.32.dump3-userspace-endless-loop
        CPUS: 1
        DATE: Fri Aug 27 05:18:12 2010
      UPTIME: 00:00:51
LOAD AVERAGE: 0.44, 0.11, 0.03
       TASKS: 67
    NODENAME: localhost.localdomain
     RELEASE: 2.6.32
     VERSION: #2 SMP PREEMPT Wed Aug 25 15:26:48 CST 2010
     MACHINE: x86_64  (2925 Mhz)
      MEMORY: 511.6 MB
       PANIC: "Oops: 0003 [#1] PREEMPT SMP " (check log for details)
         PID: 0
     COMMAND: "swapper"
        TASK: ffffffff8158df70  [THREAD_INFO: ffffffff8154e000]
         CPU: 0
       STATE: TASK_RUNNING 
     WARNING: panic task not found
crash> bt
PID: 0      TASK: ffffffff8158df70  CPU: 0   COMMAND: "swapper"
 #0 [ffffffff8154fe28] schedule at ffffffff8138baa3
bt: invalid kernel virtual address: 41  type: "call byte"
bt: invalid kernel virtual address: 44e6835ad  type: "call byte"
bt: load_memfile_offset: read: Success
bt: read error: kernel virtual address: fffffffffffffffc  type: "call byte"
bt: invalid kernel virtual address: e7ab  type: "call byte"
bt: invalid kernel virtual address: e273  type: "call byte"
bt: invalid kernel virtual address: 13a7b  type: "call byte"
bt: invalid kernel virtual address: 935cb  type: "call byte"
bt: load_memfile_offset: read: Success
bt: read error: kernel virtual address: fffffffffffffffb  type: "call byte"
bt: invalid kernel virtual address: 935cb  type: "call byte"
 #1 [ffffffff8154fef0] cpu_idle at ffffffff8100ad1e
crash> 
Note the output of the `bt' command. Without running that endless-loop code,
`bt' got:
crash> bt
PID: 0      TASK: ffffffff8158df70  CPU: 0   COMMAND: "swapper"
 #0 [ffffffff8154fe28] schedule at ffffffff8138baa3
 #1 [ffffffff8154fe48] apic_timer_interrupt at ffffffff8100c65e
 #2 [ffffffff8154fed0] need_resched at ffffffff810125a8
 #3 [ffffffff8154fee0] default_idle at ffffffff81012e03
 #4 [ffffffff8154fef0] cpu_idle at ffffffff8100acd6
crash> 
Any suggestions on how to solve the problem?
Regards,
Hu Tao

Re: [Crash-utility] [PATCH v2 0/6] crash utility - add ARM support
by Dave Anderson

----- "Mika Westerberg" <ext-mika.1.westerberg(a)nokia.com> wrote:
> Hi Dave,
> 
> This series brings ARM support for the crash utility. This is the result of
> collaboration work with Nokia and SonyEricsson. Basically we combined our
> versions of the code. Previous version of the patches can be found here:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-June/019188.html
> 
> We tried to keep any ARM specific changes isolated with #ifdefs and similar so
> that it should not cause any problems with other archs.
> 
> In this series:
> 	o crash can be built as native ARM binary or "cross" version running on
> 	  x86 host (make target=ARM to build "cross" version)
> 	o supports kdump, diskdump and /dev/mem (live system)
> 	o stack unwinding with both framepointers and ARM unwind tables
> 	o most of the arch specific code is implemented
> 
> The patches apply on top of crash 5.0.6 sources.
Let me first thank you guys for making the integration of this patch-set
so simple, and for making the changes so non-intrusive.
I did make a few minor changes/additions:
In the spirit of avoiding "#ifdef <arch>" usage if at all possible,
I renamed kdump_phys_base() to arm_kdump_phys_base() and removed
the #ifdef ARM around it.  I also removed the #ifdef ARM around the
new entries in the offset_table and size_table, as there are already
several arch-specific entries in there.  And I added those
new offset_table and size_table entries to the dump_offset_table()
output so that their values can be seen with "help -o".
For minimal documentation, I added arm references to the README file.
For building with the src.rpm, I added "arm" to the ExclusiveArch list
in the crash.spec file.
I fixed these warnings generated by "make warn":
  arm.c: In function ‘arm_dump_backtrace_entry’:
  arm.c:1160: warning: format ‘%d’ expects type ‘int’, but argument 6 has type ‘ulong’
  arm.c:1166: warning: format ‘%d’ expects type ‘int’, but argument 7 has type ‘ulong’
  arm.c: In function ‘arm_dump_irq’:
  arm.c:1424: warning: suggest parentheses around comparison in operand of ‘&’
  arm.c:1490: warning: too many arguments for format
  arm.c:1357: warning: unused variable ‘tmp2’
  arm.c: In function ‘arm_parse_cmdline_args’:
  arm.c:409: warning: ‘value’ may be used uninitialized in this function
I modified arm_init() to capture any attempt to run an x86 binary built
for ARM on a live x86 or x86_64 system to display a fatal error message
indicating: "crash: compiled for the ARM architecture".  As it was, it
would fail with a nebulous "cannot resolve _stext" error.
The only other suggestion I can make is to put something in either the
top-level Makefile or in configure.c to catch/prevent a subsequent "make" command
being entered after having first done the initial build with "make target=ARM".
I found myself doing that constantly.  Or vice-versa, for that matter.
And if you really want to make the "other type" of binary, then there
should be a message that kills the build attempt, and indicates that you'd
have to do a "make clean" as well as removing the gdb subdirectory tree
entirely.  But that all can wait until after this first patch-set is released.
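
The check suggested above could take roughly the following form. This is only a sketch: it assumes the build records its target name somewhere (for example in a stamp file, not shown here), and build_target_allowed() is a hypothetical name, not something in the Makefile or configure.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * recorded:  target name saved by an earlier build, or NULL if there is
 *            none (a fresh tree, or one cleaned out completely).
 * requested: target of the current build, e.g. "ARM".
 *
 * Returns 1 if the build may proceed, 0 if it should be stopped with a
 * message telling the user to clean the tree first.
 */
int build_target_allowed(const char *recorded, const char *requested)
{
    if (recorded == NULL)
        return 1;                       /* nothing built yet */
    return strcmp(recorded, requested) == 0;
}
```

A tree first built with "make target=ARM" would record "ARM", so a later build for a different target would be refused until the tree is cleaned.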
So -- with the minor changes above -- consider it queued for the next release.
And thanks again for making it so easy...
Dave
> 
> Best regards,
> MW
> 
> Jan Karlsson (1):
>   crash: update IRQ flags
> 
> Mika Westerberg (5):
>   configure/Makefile: add support for ARM targets
>   crash: add support for ARM kernel image
>   crash/diskdump: add ARM support
>   crash/kdump: add ARM support
>   crash: add ARM crashdump support
> 
>  Makefile            |   30 +-
>  arm.c               | 1741 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  configure.c         |   32 +-
>  defs.h              |  245 +++++++-
>  diskdump.c          |   42 ++-
>  kernel.c            |    3 +-
>  lkcd_vmdump_v2_v3.h |    4 +-
>  netdump.c           |  115 ++++
>  symbols.c           |   15 +-
>  unwind_arm.c        |  697 +++++++++++++++++++++
>  10 files changed, 2902 insertions(+), 22 deletions(-)
>  create mode 100644 arm.c
>  create mode 100644 unwind_arm.c

Re: [Crash-utility] Missing PID 1 is crash problem with losing tasks
by Dave Anderson

----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> Well, I've been picking at this some more.  PID 1 is in the system, but
> crash misses it when it's building its table of tasks in
> refresh_hlist_task_table_v2().  In fact, on my particular dump, it loses
> track of at least 3 processes. 
> 
> The attached patch changes that behavior.  It has to do with collisions
> on the pid_hash table where an early item on the chain has a NULL task
> pointer which causes the code to ignore subsequent items on that
> collision chain.  I'm not sure what it means when the tasks[0].first
> pointer in the struct pid is NULL, but that's what triggers the problem
> and keeps crash from following the pid_chain pointer to the next struct
> pid.  I am not confident that this whole area is correct yet, just
> closer to correct than it was. 
> 
> These now appear in the ps output:
> 
> crash-5.0.6-fix2> ps 1 8144 998
>    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>       1      0   1  ffff81012bd3c780  IN   0.0    6124    688  init
>    8144   6257   0  ffff81011996e140  RU   0.7  108876  35016  mirrorclient
>     998     11   0  ffff81012a9cd780  IN   0.0       0      0  [fc_dl_1]
> 
> where before:
> 
> crash-5.0.6-fix> ps 1 8144 998
> ps: invalid task or pid value: 1
> 
> ps: invalid task or pid value: 8144
> 
> ps: invalid task or pid value: 998
> 
> This might have been some transition behavior of the pid hash design in
> the kernel, because I've got two dumps based on 2.6.18 kernels that show
> missing processes (this one had 3 out of 532, the other had 1 out of
> 146), but my new patched crash doesn't reveal any missing processes in
> 2.6.29 and newer dumps (I checked 4 dumps, with process counts ranging
> from 362 to 926).  Only my recent 2.6.18 dump was lucky enough to be
> missing PID 1, with me being lucky enough to try crash's mount command,
> or we'd still not know about it :-)
Yeah, I agree that it must be catching a kernel transition.
And it's probably not being seen in your 2.6.29-and-newer dumps because
2.6.24-and-later kernels use refresh_hlist_task_table_v3().
 
> The patch is simple, but has lots of lines because I moved the indent.
The patch looks reasonable and safe.  I'll run it against my stable of
sample dumpfiles to see if I can find one...
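For readers following along, the collision-chain behavior Bob describes can be sketched as below. This is a simplified model, not the code in refresh_hlist_task_table_v2() and not Bob's patch: struct pid_entry, struct task, and both function names are hypothetical stand-ins, with first_task modeling tasks[0].first and chain_next modeling the pid_chain link.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real kernel layouts. */
struct task {
    int pid_nr;
};

struct pid_entry {
    struct task *first_task;       /* models tasks[0].first; may be NULL */
    struct pid_entry *chain_next;  /* models the pid_chain link */
};

/* Models the old behavior: a NULL task pointer ends the walk, so
 * every later entry on the same collision chain is lost. */
int count_tasks_stop_on_null(const struct pid_entry *head)
{
    int found = 0;
    const struct pid_entry *p;

    for (p = head; p != NULL; p = p->chain_next) {
        if (p->first_task == NULL)
            break;
        found++;
    }
    return found;
}

/* Models the patched behavior: skip an entry whose task pointer is
 * NULL, but keep following the chain to the next entry. */
int count_tasks_skip_null(const struct pid_entry *head)
{
    int found = 0;
    const struct pid_entry *p;

    for (p = head; p != NULL; p = p->chain_next) {
        if (p->first_task == NULL)
            continue;
        found++;
    }
    return found;
}
```

With a three-entry chain whose first entry has a NULL task pointer, the first function finds no tasks while the second finds the two that follow, which matches the symptom of tasks such as PID 1 going missing.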
Anyway, nice catch Bob -- and thanks again for tracking down yet another
gnarly issue,
  Dave
                                
                         
                        
                                
15 years, 2 months

[PATCH v2 0/6] crash utility - add ARM support
by Mika Westerberg

Hi Dave,
This series adds ARM support to the crash utility. It is the result of
collaborative work with Nokia and SonyEricsson; basically, we combined our
versions of the code. The previous version of the patches can be found here:
	http://lists.infradead.org/pipermail/linux-arm-kernel/2010-June/019188.html
We tried to keep any ARM-specific changes isolated with #ifdefs and similar so
that they should not cause any problems with other archs.
In this series:
	o crash can be built as a native ARM binary or as a "cross" version running
	  on an x86 host (make target=ARM to build the "cross" version)
	o supports kdump, diskdump and /dev/mem (live system)
	o stack unwinding with both frame pointers and ARM unwind tables
	o most of the arch-specific code is implemented
The patches apply on top of crash 5.0.6 sources.
Best regards,
MW
Jan Karlsson (1):
  crash: update IRQ flags
Mika Westerberg (5):
  configure/Makefile: add support for ARM targets
  crash: add support for ARM kernel image
  crash/diskdump: add ARM support
  crash/kdump: add ARM support
  crash: add ARM crashdump support
 Makefile            |   30 +-
 arm.c               | 1741 +++++++++++++++++++++++++++++++++++++++++++++++++++
 configure.c         |   32 +-
 defs.h              |  245 +++++++-
 diskdump.c          |   42 ++-
 kernel.c            |    3 +-
 lkcd_vmdump_v2_v3.h |    4 +-
 netdump.c           |  115 ++++
 symbols.c           |   15 +-
 unwind_arm.c        |  697 +++++++++++++++++++++
 10 files changed, 2902 insertions(+), 22 deletions(-)
 create mode 100644 arm.c
 create mode 100644 unwind_arm.c
                                
                         
                        
                                
                                15 years, 2 months