----- "Pavan Naregundi" <pavan(a)linux.vnet.ibm.com> wrote:
> On Tue, 2010-04-20 at 09:14 -0400, Dave Anderson wrote:
> > ----- "Pavan Naregundi" <pavan(a)linux.vnet.ibm.com> wrote:
> >
> > The cause for seek errors depends upon the type
> > of dumpfile.
> >
> > You didn't mention which type of dumpfile the vmcore
> > is, so I'll presume that it's either an ELF-format
> > kdump or a compressed kdump created by makedumpfile.
> >
> > So presuming that it's a compressed kdump, the seek error
> > most likely comes from here in read_diskdump() in diskdump.c:
> >
> >     if ((pfn >= dd->header->max_mapnr) || !page_is_ram(pfn))
> >             return SEEK_ERROR;
> >
> > where the requested physical address pfn values are larger
> > than the max_mapnr value advertised in the header.
> >
> > When you do any "crash -d# ...", the dumpfile header will
> > be dumped first. What does that show?
> >
> > Dave
>
>
> Dave,
>
> Dumpfile is compressed kdump created by makedumpfile.
>
> header shows the following values:
> max_mapnr: 32768
> block_shift: 16
>
> Yes. Adding some debug printf's shows me that the (pfn >=
> dd->header->max_mapnr) check is the one that triggers the error.
>
> For example: in the first seek error,
> crash: seek error: kernel virtual address: c0000000af715480 type: "kmem_cache buffer"
>
> paddr: af715480 => pfn=44913
>
> crash -d8 log:
> http://pastebin.com/qrCvyPfR
>
> Thanks..Pavan

OK, so the compressed dumpfile has exactly 32768 pages of physical
memory. With a block_shift of 16, each page is 64KB, so that is
exactly 2GB. That being the case, the crash utility will fail all
readmem attempts at or above that value, and obviously there is
critical data above the artificial 2GB threshold.
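
The arithmetic behind the failing read can be sketched in C as follows.
This is only an illustration using the header values reported in this
thread. The function names and macros are hypothetical, not the crash
utility's own, and only the range half of the read_diskdump() test is
modelled (the page_is_ram() bitmap lookup is left out).

```c
#include <stdbool.h>
#include <stdint.h>

/* Header values as reported by "crash -d8" in this thread. */
#define MAX_MAPNR   32768ULL   /* max_mapnr from the dumpfile header */
#define BLOCK_SHIFT 16         /* block_shift: 2^16 = 64KB pages */

/* Convert a physical address to a page frame number. */
uint64_t paddr_to_pfn(uint64_t paddr, unsigned int shift)
{
    return paddr >> shift;
}

/* Range half of the quoted test: true means the read would be refused. */
bool pfn_out_of_range(uint64_t pfn, uint64_t max_mapnr)
{
    return pfn >= max_mapnr;
}

/* Number of bytes of physical memory the header claims to cover. */
uint64_t covered_bytes(uint64_t max_mapnr, unsigned int shift)
{
    return max_mapnr << shift;
}
```

For the first seek error, paddr_to_pfn(0xaf715480, BLOCK_SHIFT) gives
44913, which is above the max_mapnr of 32768, so the read is rejected.
covered_bytes(MAX_MAPNR, BLOCK_SHIFT) gives 2147483648 bytes, i.e. 2GB.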

The question at hand is why kdump is creating a truncated dumpfile
with a max_mapnr of 32768:

 (1) makedumpfile determines the "max_mapnr" value based upon the
     highest physical address found in any of the PT_LOAD segments
     of the /proc/vmcore file on the secondary kernel.
 (2) the /proc/vmcore PT_LOAD segments were pre-calculated during
     the primary kernel's kdump initialization phase, based upon
     the values found in the set of "/proc/device-tree/memory@xxx/reg"
     files existing in the primary kernel, where the "xxx" is the
     starting physical address of the memory region, and the "reg"
     file in that directory contains the size of the memory region.

For whatever reason, those files showed a maximum of 2GB of
physical memory. (If you do not use makedumpfile, and then do
a "readelf -a" of the resultant vmcore file, you will see
the PT_LOAD segment values.)
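
Step (1) above can be sketched roughly as follows. This is not
makedumpfile's actual code: struct load_segment and
max_mapnr_from_segments() are hypothetical names, and the struct is a
simplified stand-in for the p_paddr and p_memsz fields of an ELF
PT_LOAD program header. It only shows how the highest physical address
across the segments turns into a page count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD segment. */
struct load_segment {
    uint64_t paddr;   /* starting physical address (p_paddr) */
    uint64_t memsz;   /* size of the region in bytes (p_memsz) */
};

/*
 * Find the highest end address across all segments and convert it
 * to a number of pages of size 2^page_shift.
 */
uint64_t max_mapnr_from_segments(const struct load_segment *segs,
                                 size_t nsegs, unsigned int page_shift)
{
    uint64_t max_paddr = 0;
    uint64_t page_size = 1ULL << page_shift;

    for (size_t i = 0; i < nsegs; i++) {
        uint64_t end = segs[i].paddr + segs[i].memsz;
        if (end > max_paddr)
            max_paddr = end;
    }

    /* Round up so that a partial trailing page is still counted. */
    return (max_paddr + page_size - 1) >> page_shift;
}
```

With a single segment covering 0 to 2GB and a page_shift of 16, this
yields 32768, matching the header above. Any memory that is not
described by a PT_LOAD segment simply never enters the calculation.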

Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel
contain this patch?:

  http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.gi...

I ask because we also have an outstanding bugzilla that exhibits similar
behavior, where an abnormally small ppc64 vmcore file gets created
because there was only a single /proc/device-tree/memory@0 directory,
and it showed just a small subset of the total physical memory.
Typically there are many of those "memory@xxx" directories, but in
the failing scenario, there was only one /proc/device-tree/memory@0
directory.

Anyway, there's (unproven) speculation that the kernel patch above
is related to the problem.

In any case, unfortunately, there's nothing that can be done from the
crash utility's perspective.

Dave

Thank you Dave.

Our SLES11 kernel does not have the patch you mentioned, but at the
same time the system is not AMS-enabled, and CONFIG_CMM is not set in
the config file.

This system also has only the /proc/device-tree/memory@0 directory.

Regards..Pavan