Hello Crash Utility Community,
I am hoping that someone in the crash analysis community can help with a
problem I am having analyzing vmcore files gathered from our 32-bit
machines. I am working to add kexec to our systems so that we can run the
crash utility (version 7.0.1) on our appliances, and I am having trouble
with our 32-bit systems. Fortunately my 64-bit systems are working fine, so
I know that I can make the technology work. I believe that the crash
utility does not like the System.map file, and I am trying to get to the
root cause of this problem.
The problem first shows up when I try to decode the vmcore file. After
intentionally triggering an oops/panic event, I upload the vmcore file to
my build machine and run crash there. The vmcore file is generated on an
appliance, but I run the crash utility on the build system that produced
the Linux kernel, because the appliances are meant to be deployed in the
field and will not be accessible for crash analysis.
build# crash -S System.map vmlinux vmcore
crash 7.0.1
...
crash: read error: kernel virtual address: c1363c5c type: "cpu_possible_mask"
So I then tried to find what this symbol is within the map:
build# crash --minimal -S System.map vmlinux vmcore
...
crash> sym cpu_possible_mask
c1363c5c (R) cpu_possible_mask
crash>
From this I can only see that the addresses match up. So I then decided to
run the crash utility on the appliance itself to see what happens. The
appliance boots from a "bzImage" file, which the crash utility cannot use
for processing, so I manually copied the uncompressed kernel image to the
appliance along with the crash utility.
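
For reference, a bzImage carries the kernel as a compressed payload behind
a small boot stub, and newer kernel source trees ship a
scripts/extract-vmlinux helper that recovers it by scanning for compression
magic numbers. The following is only a minimal sketch of that scanning
idea, assuming the payload is gzip-compressed (it could equally be bzip2 or
LZMA, which this sketch does not handle); the function name
extract_gzip_payload is made up for illustration.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"   # gzip header: magic bytes + "deflate" method
ELF_MAGIC = b"\x7fELF"

def extract_gzip_payload(image: bytes):
    """Return the first gzip stream in `image` that inflates to an ELF file.

    Hypothetical helper: assumes a gzip-compressed kernel payload.
    Returns None if no such stream is found.
    """
    pos = image.find(GZIP_MAGIC)
    while pos != -1:
        try:
            # wbits=31 makes zlib expect a gzip header; bytes that follow
            # the end of the stream are left unread rather than rejected.
            data = zlib.decompressobj(wbits=31).decompress(image[pos:])
        except zlib.error:
            data = b""          # false magic match, keep scanning
        if data.startswith(ELF_MAGIC):
            return data
        pos = image.find(GZIP_MAGIC, pos + 1)
    return None

# Example use (paths are placeholders):
#   payload = extract_gzip_payload(open("bzImage", "rb").read())
#   if payload is not None:
#       open("vmlinux.extracted", "wb").write(payload)
```

Even when this works, the embedded image is typically stripped of debug
information, so it is more useful for comparing against the build-tree
vmlinux than as a replacement for it when running crash.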
I then ran the following commands on the appliance to gather data:
root@appliance:/var/crash# crash -S /boot/System.map vmlinux-2.6.32.24-sf.pentM-37
...
WARNING: cannot read linux_banner string
crash: /boot/System.map and /dev/mem do not match!
root@appliance:/var/crash# ls -l /boot/System.map
lrwxrwxrwx 1 root root 32 Aug 28 22:32 /boot/System.map -> System.map-2.6.32.24-sf.pentM-37
root@appliance:/var/crash#
root@appliance:/var/crash# cat /proc/version
Linux version 2.6.32.24sf.pentM-37 (build@ajax) (gcc version 4.7.1 (GCC) ) #1 PREEMPT Mon Aug 26 22:26:34 UTC 2013
root@appliance:/var/crash#
So from everything I can see, the Linux kernel and the System.map file are
in version agreement, but the crash utility disagrees with me. The crash
utility is the judge, so something is wrong. My goal is to find out how to
gather the information needed to pin down the problem.
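
One piece of information that can be gathered independently of crash is the
banner string itself. The kernel image embeds a "Linux version ..." string
(the linux_banner that the warning above refers to), and /proc/version
reports the same text for the running kernel. The sketch below simply
searches a kernel image for that string and compares it with /proc/version.
It assumes the first "Linux version " match in the file is the banner; the
function names find_banner and banners_match are made up for illustration.

```python
def find_banner(image: bytes):
    """Return the first 'Linux version ...' string found in `image`, or None.

    Assumption: the first match is the kernel banner. The string is read
    up to the terminating newline or NUL byte.
    """
    start = image.find(b"Linux version ")
    if start == -1:
        return None
    end = start
    while end < len(image) and image[end] not in (0x0A, 0x00):
        end += 1
    return image[start:end].decode("ascii", errors="replace")

def banners_match(image: bytes, proc_version: str) -> bool:
    """True if the banner embedded in `image` equals the /proc/version text."""
    banner = find_banner(image)
    return banner is not None and banner.strip() == proc_version.strip()

# Example use on the appliance (file name taken from the session above):
#   image = open("vmlinux-2.6.32.24-sf.pentM-37", "rb").read()
#   print(find_banner(image))
#   print(banners_match(image, open("/proc/version").read()))
```

In the output above, /proc/version reports 2.6.32.24sf.pentM-37 while the
file names use 2.6.32.24-sf.pentM-37. Whether that is a real difference or
only a naming convention is not clear from the output alone, but printing
the embedded banner would show which string the vmlinux actually carries.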
We have build machines that produce Linux kernels for multiple appliances
based on various processors. Some are 32-bit and some are 64-bit. I have
made identical changes to the Linux kernel for all of our systems, and the
64-bit versions work well while the 32-bit versions do not. All of our
appliances are able to capture the vmcore file after an oops event; the
analysis is the only problem.
Is there a way to determine whether a System.map file is correct for a
given system? Maybe there is some other utility that I can run to check
whether a System.map file agrees with the running kernel? My system boots
from a compressed bzImage file, and I have tried to extract the
uncompressed Linux kernel from that bzImage without success. For my testing
I therefore copy the uncompressed vmlinux file from the build machine,
where it is the artifact used to create the final bzImage file.
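
On the System.map question, one possible cross-check is to compare the file
against /proc/kallsyms on the running appliance, since both use the same
"address type name" line layout. This is a rough sketch, assuming the
kernel was built with CONFIG_KALLSYMS (data symbols such as
cpu_possible_mask appear there only with CONFIG_KALLSYMS_ALL); the function
names parse_symbols and compare_maps are made up for illustration.

```python
def parse_symbols(text: str) -> dict:
    """Map symbol name -> address for 'addr type name' lines.

    Works for System.map and /proc/kallsyms text. Lines with a fourth
    "[module]" field (module symbols in kallsyms) are skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        addr, _sym_type, name = fields
        # Static symbols can share a name; keep the first occurrence.
        table.setdefault(name, int(addr, 16))
    return table

def compare_maps(system_map: str, kallsyms: str):
    """Return (number of shared symbols, {name: (map_addr, running_addr)})."""
    built = parse_symbols(system_map)
    running = parse_symbols(kallsyms)
    common = built.keys() & running.keys()
    mismatches = {
        name: (built[name], running[name])
        for name in common
        if built[name] != running[name]
    }
    return len(common), mismatches

# Example use on the appliance:
#   checked, bad = compare_maps(open("/boot/System.map").read(),
#                               open("/proc/kallsyms").read())
#   print(checked, "symbols compared,", len(bad), "differ")
```

If every shared symbol has the same address, the System.map file at least
matches the symbol table of the running kernel, which would point the
search away from the map file and toward how the memory is being read.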
Any ideas for gathering better information would be greatly appreciated.
Thank you,
Patrick