Hi,
On Mon, Mar 18, 2013 at 8:29 AM, Dave Anderson <anderson(a)redhat.com> wrote:
By classification, do you mean which bit in the filtering option of makedumpfile?
Exactly.
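
For reference, makedumpfile's -d dump level is a bitmask of page classes
to exclude. Here is a minimal sketch of the bits as described in
makedumpfile(8); the enum names are illustrative, modeled on (but not
copied from) the DL_EXCLUDE_* macros in makedumpfile's source:

    /* Page classes that makedumpfile's -d option can exclude. */
    enum dump_level_bit {
        EXCLUDE_ZERO      = 0x01, /* pages filled with zeros           */
        EXCLUDE_CACHE     = 0x02, /* cache pages without private pages */
        EXCLUDE_CACHE_PRI = 0x04, /* cache pages with private pages    */
        EXCLUDE_USER_DATA = 0x08, /* user process data pages           */
        EXCLUDE_FREE      = 0x10, /* free pages                        */
    };
    /* -d 31 excludes all five classes; -d 0 keeps everything. */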
> Per your request:
>
> > crash> struct page 0xffffea001cdad420
> > struct page {
> > flags = 0x200000000000000,
[...]
OK, looks like a page struct (most likely)...
I was already pretty sure. Confirmed.
> > crash> kmem -p | tail
>
> OK, here's mine, along with the closest page numbers:
>
> >       PAGE          PHYSICAL    MAPPING  INDEX  CNT  FLAGS
> > [...]
> > ffffea64e939b6f0  1cc4b7fff000     0       0     0     0
> <<fin>>
Wow, that system has physical memory installed at an unusually high
physical address location, i.e., 1cc4b7fff000 is up around 28 terabytes?
That seems large to me too, by about a factor of 10.
It _is_ a largish system.
I'd be interested in seeing a dump of "kmem -n". In your case the output
is probably huge, but the top part would reflect the physical memory layout:
NODE  SIZE     PGLIST_DATA       BOOTMEM_DATA  NODE_ZONES
  0   8912880  ffff88087fffb000  ----          ffff88087fffb000
                                               ffff88087fffb980
                                               ffff88087fffc300
                                               ffff88087fffcc80
    MEM_MAP           START_PADDR  START_MAPNR
    ffffea0000000380  10000        16

ZONE  NAME     SIZE     MEM_MAP           START_PADDR  START_MAPNR
  0   DMA      4080     ffffea0000000380  10000        16
  1   DMA32    1044480  ffffea0000038000  1000000      4096
  2   Normal   7864320  ffffea0003800000  100000000    1048576
  3   Movable  0        0                 0            0
-------------------------------------------------------------------
NODE  SIZE     PGLIST_DATA       BOOTMEM_DATA  NODE_ZONES
  1   8388608  ffff88107fffa040  ----          ffff88107fffa040
                                               ffff88107fffa9c0
                                               ffff88107fffb340
                                               ffff88107fffbcc0
    MEM_MAP           START_PADDR  START_MAPNR
    ffffffffffffffff  880000000    8912896

ZONE  NAME     SIZE     MEM_MAP           START_PADDR  START_MAPNR
  0   DMA      0        0                 0            0
  1   DMA32    0        0                 0            0
  2   Normal   8388608  0                 880000000    8912896
  3   Movable  0        0                 0            0
    NR  SECTION           CODED_MEM_MAP     MEM_MAP           PFN
     0  ffff88087fffa000  ffffea0000000000  ffffea0000000000  0
     1  ffff88087fffa020  ffffea0000000000  ffffea00001c0000  32768
     2  ffff88087fffa040  ffffea0000000000  ffffea0000380000  65536
[...]
   130  ffff88107fff9040  ffffea0000000000  ffffea000e380000  4259840
   131  ffff88107fff9060  ffffea0000000000  ffffea000e540000  4292608
132096  ffff880838574558  ffff881038105798  ffff8848a8105798  4328521728
132098  ffff880838574598  ffff880837ed2c00  ffff8840a8252c00  4328587264
[...]
237504  ffff8810369d2f40  ffff8810369d2f40  ffff8875af9d2f40  7782531072
237505  ffff8810369d2f60  1a48b64           657ac08b64        7782563840
237506  ffff8810369d2f80  3686dc30          65afbedc30        7782596608
237507  ffff8810369d2fa0  ffff881033219740  ffff8875ac759740  7782629376
kmem: page excluded: kernel virtual address: ffff8810369d3000  type: "memory section"
So your target page structure should "fit" into one of the sections
above, where the starting MEM_MAP address of each section points to a
contiguous array of page structs that reference the array of physical
pages starting at the "PFN" value. Those MEM_MAP addresses are
typically increasing in value with each section, but I believe I have
seen cases where they are not. And they don't have to be: each section
has its own base vmemmap address for some number of PFNs/physical pages.
OK. That's a bit confusing for me.
Anyway, it does look like a page structure, and the page structure
pointer itself is translatable. The problem at hand is that the
physical address that the page structure refers to is not being
determined, because the page structure address itself is not being
recognized by is_page_ptr() as part of the sparsemem infrastructure.
The "if IS_SPARSEMEM()" section at the top of is_page_ptr() is
returning FALSE.
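
For what it's worth, here is a minimal sketch of the check that is
failing, loosely modeled on what an is_page_ptr()-style sparsemem scan
has to do; the struct and names are illustrative, not crash's actual
code:

    #include <stdint.h>

    #define PAGES_PER_SECTION 32768ULL  /* 128MB sections on x86_64,
                                           matching the PFN deltas in
                                           the "kmem -n" output above */

    struct section_info {
        uint64_t mem_map;    /* MEM_MAP column from "kmem -n" */
        uint64_t start_pfn;  /* PFN column from "kmem -n"     */
    };

    /* Return the PFN backing a page-struct address, or -1 if no
     * section claims it -- the case where an is_page_ptr()-style
     * check ends up returning FALSE. */
    int64_t page_addr_to_pfn(uint64_t page, const struct section_info *s,
                             int nr_sections, uint64_t sizeof_page)
    {
        for (int i = 0; i < nr_sections; i++) {
            uint64_t end = s[i].mem_map + PAGES_PER_SECTION * sizeof_page;
            if (page >= s[i].mem_map && page < end)
                return s[i].start_pfn + (page - s[i].mem_map) / sizeof_page;
        }
        return -1;
    }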
That being said, from your target page structure address and the
"kmem -n" output, you could presumably calculate the associated
physical address.
If the kmem -n output didn't seem to skip over the address of interest....
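
A back-of-the-envelope version of that calculation, using the section-0
MEM_MAP base from the "kmem -n" output above. Two assumptions to flag:
that the target page lies in a section whose mem_map follows the
contiguous vmemmap layout shown for sections 0-131, and that
sizeof(struct page) is 56 bytes on this kernel (crash's "struct page"
command prints the SIZE, so that is easy to verify):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t page    = 0xffffea001cdad420ULL; /* target page struct   */
        uint64_t vmemmap = 0xffffea0000000000ULL; /* MEM_MAP of section 0 */
        uint64_t psize   = 56;                    /* assumed sizeof(struct page) */

        uint64_t pfn  = (page - vmemmap) / psize;
        uint64_t phys = pfn << 12;                /* 4KB pages */

        printf("pfn 0x%llx -> phys 0x%llx\n",
               (unsigned long long)pfn, (unsigned long long)phys);
        return 0;
    }

If the 56-byte guess is right, the division comes out exact: pfn
0x83e85c, physical address 0x83e85c000, which would land in node 0's
Normal zone.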
> The memory in question is probably not in the dump, but I don't know
> how to specify that it be added to the dump without knowing how the
> memory is characterized.
Whether the actual physical page that is referenced by your target page
structure is in the dumpfile should not affect the is_page_ptr()
function. That should work regardless.
I think it is a good guess that the data I really want are not in the dump:
# strings cdump-0c0s6n3 | grep -E 'Process (entered|leaving)'
# strings cdump-0c2s6n3 | grep -E 'Process (entered|leaving)'
# strings ../mrp752/sp1-fulldbg/dk.data | \
    grep -E 'Process (entered|leaving)' | sort | uniq -c
# strings ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 | \
    grep -E 'Process (entered|leaving)' | sort | uniq -c
 311804 Process entered
      1 Process enteredgot mutex:
      2 Process enteredpage@
 129991 Process leaving
[...]
The "cdump-0c0s6n3" and "cdump-0c2s6n3" files are from the release at
issue,
and the ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 dump is from the SLES-11 SP1
release. As you can see, there should be many thousands of matching strings
in the dump files. Since there is not, ...
So: what physical pages are missing, and why are they missing?
With those two questions resolved, we can fix the dump specification
to include the missing pages.
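
If, for example, the missing strings turn out to live in user-process
pages, the fix would presumably be to clear that bit from the dump
level passed to makedumpfile (see the bit sketch near the top of this
mail) -- that is an assumption about the classification, which is
exactly what remains to be pinned down:

    /* hypothetical: -d 31 excludes all five classes; clearing the
     * user-data bit gives 31 & ~0x08 = 23, so "makedumpfile -d 23"
     * would retain user process data pages. */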
Thank you again. - Bruce