----- Original Message -----
Hello Crash Utility Community,
I am hoping that someone in the Crash Analysis community can provide some
assistance with a problem that I am having analyzing vmcore files gathered
from our 32-bit machines. I am working to add kexec to our systems so that
we can run the crash utility (version 7.0.1) on our appliances and I am
having trouble with our 32-bit systems. Fortunately my 64-bit systems are
working fine, so I know that I can make the technology work. I believe that
the crash analysis tool does not like the System.map file and I am trying to
get to the root cause of this problem.
If the vmlinux file that you're using matches the vmcore, then please don't
use any -S or System.map argument -- just enter: "crash vmlinux vmcore"
System.map files are only required if the symbol values in the vmlinux
file are different from those in the running kernel. It doesn't sound
like that's the case in your environment.
Secondly, if the session doesn't start that way, please provide the
debug output generated by entering:
$ crash -d8 vmlinux vmcore
My problem originally manifested itself when I tried to decode the vmcore file.
After intentionally creating an oops panic event I upload the vmcore file to
my build machine and run crash on that system. While the vmcore file is
generated on an appliance, I run the crash analysis program on the build system
that produced the Linux kernel, since the appliances are meant to be deployed
into the field and will not be accessible for running crash analysis.
build# crash -S System.map vmlinux vmcore
crash 7.0.1
...
crash: read error: kernel virtual address: c1363c5c type: "cpu_possible_mask"
So I then tried to find what this symbol is within the map:
build# crash --minimal -S System.map vmlinux vmcore
...
crash> sym cpu_possible_mask
c1363c5c (R) cpu_possible_mask
crash>
When you were in the "minimal" session, were you able to "rd" the
cpu_possible_mask address? i.e.
crash> rd c1363c5c
or what did this show:
crash> rd linux_banner 10
From this I can only see that the addresses match up. So I then decided to
run the crash utility on the appliance itself to see what happens. I copied
the crash utility to the appliance and the uncompressed kernel image to the
appliance as well. The appliance boots from a "bzImage" file and the crash
utility can't use the bzImage file for processing so I needed to manually
copy the uncompressed kernel image to the box.
Right, crash is only interested in the vmlinux ELF file from which the
bzImage file was generated.
I then ran the following commands on our appliance for data gathering
purposes:
root@appliance:/var/crash# crash -S /boot/System.map vmlinux-2.6.32.24-sf.pentM-37
...
WARNING: cannot read linux_banner string
crash: /boot/System.map and /dev/mem do not match!
root@appliance:/var/crash# ls -l /boot/System.map
lrwxrwxrwx 1 root root 32 Aug 28 22:32 /boot/System.map -> System.map-2.6.32.24-sf.pentM-37
root@appliance:/var/crash#
root@appliance:/var/crash# cat /proc/version
Linux version 2.6.32.24sf.pentM-37 (build@ajax) (gcc version 4.7.1 (GCC) ) #1 PREEMPT Mon Aug 26 22:26:34 UTC 2013
root@appliance:/var/crash#
So from everything I can see the Linux kernel and the System.map file are in
version agreement but the crash utility disagrees with me. The crash utility
is the judge so something is wrong. My goal is to find out how I can get the
information that is needed to determine the problem.
OK, while running on the appliance itself, again, try running without
the System.map argument. It will presumably still fail as shown above.
On that appliance, what is the output from these commands:
$ cat /proc/kallsyms | grep cpu_possible_mask
$ nm -Bn /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux | grep cpu_possible_mask
$ grep cpu_possible_mask /boot/System.map
If they are not the same, it is possible you may need to use the
"--reloc <size>" command line argument. That is required for 32-bit x86
kernels that are configured
as described here:
http://people.redhat.com/anderson/crash.changelog.html#4_0_4_5
Dave
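
The three-way comparison above can also be scripted. The following is only a minimal sketch, not part of crash itself: the helper names sym_addr and addr_delta are hypothetical, and it assumes each input uses the "address type name" layout shared by System.map, /proc/kallsyms, and "nm -Bn" output. A non-zero delta between the vmlinux address and the running kernel's address hints that relocation is involved. Whether that delta is the right value for "--reloc" is an assumption to check against the changelog entry linked above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing one symbol's address across sources.

# Print the address of an exact symbol name ($1) found in file ($2).
# Assumes "address type name" columns, as in System.map, /proc/kallsyms,
# and `nm -Bn` output.
sym_addr() {
    awk -v s="$1" '$3 == s { print $1; exit }' "$2"
}

# Print (second address - first address) in hex.
# Both arguments are hex strings without a leading "0x".
addr_delta() {
    printf '0x%x\n' $(( 0x$2 - 0x$1 ))
}

# Example use (file names are placeholders, not taken from a real session):
#   nm -Bn vmlinux > /tmp/nm.out
#   addr_delta "$(sym_addr cpu_possible_mask /tmp/nm.out)" \
#              "$(sym_addr cpu_possible_mask /proc/kallsyms)"
```

If all three sources report the same address, the delta is 0x0 and the mismatch reported by crash most likely has a different cause.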