----- Original Message -----
Hi,
On Mon, Mar 18, 2013 at 8:29 AM, Dave Anderson <anderson(a)redhat.com>
wrote:
> By classification, do you mean which bit in the filtering option
> of makedumpfile?
Exactly.
>> Per your request:
>>
>> > crash> struct page 0xffffea001cdad420
>> > struct page {
>> > flags = 0x200000000000000,
[...]
> OK, looks like a page struct (most likely)...
I was already pretty sure. Confirmed.
>> > crash> kmem -p | tail
>>
>> OK, here's mine, along with the closest page numbers:
>>
>> > PAGE PHYSICAL MAPPING INDEX CNT FLAGS
>> > [...]
>> > ffffea64e939b6f0 1cc4b7fff000 0 0 0 0
>> <<fin>>
>
> Wow, that system has physical memory installed at an unusually high
> physical address location, i.e., where 1cc4b7fff000 is up around
> 28 terabytes?
That seems large to me too, by about a factor of 10.
It _is_ a largish system.
What does the initial system banner (or the "sys" command) show?
> I'd be interested in seeing a dump of "kmem -n". In your case the
> output is probably huge, but the top part would reflect the physical
> memory layout,
NODE SIZE PGLIST_DATA BOOTMEM_DATA NODE_ZONES
0 8912880 ffff88087fffb000 ---- ffff88087fffb000
ffff88087fffb980
ffff88087fffc300
ffff88087fffcc80
MEM_MAP START_PADDR START_MAPNR
ffffea0000000380 10000 16
ZONE NAME SIZE MEM_MAP START_PADDR START_MAPNR
0 DMA 4080 ffffea0000000380 10000 16
1 DMA32 1044480 ffffea0000038000 1000000 4096
2 Normal 7864320 ffffea0003800000 100000000 1048576
3 Movable 0 0 0 0
-------------------------------------------------------------------
NODE SIZE PGLIST_DATA BOOTMEM_DATA NODE_ZONES
1 8388608 ffff88107fffa040 ---- ffff88107fffa040
ffff88107fffa9c0
ffff88107fffb340
ffff88107fffbcc0
MEM_MAP START_PADDR START_MAPNR
ffffffffffffffff 880000000 8912896
ZONE NAME SIZE MEM_MAP START_PADDR START_MAPNR
0 DMA 0 0 0 0
1 DMA32 0 0 0 0
2 Normal 8388608 0 880000000 8912896
3 Movable 0 0 0 0
At first I didn't understand how there could be a MEM_MAP of "0" for
the NODE 1 physical memory section starting at 34GB (880000000). It
indicates that there are 8388608 pages (32GB) starting at 880000000.
So the highest physical address would be 0x1080000000 (66GB), which
would be a max_pfn value of 0x1080000000 / 4k, or 17301504 decimal.
But after section 131, the PFN values start at 4328521728 -- which
is 16512GB (~16 TB). So clearly the section data is being misinterpreted,
and because of that phys_to_page() fails to find a MEM_MAP address for
a physical address of 880000000 (i.e., a pfn of 8912896), since the
section data skips from a PFN of 4292608 to the bizarre 4328521728:
NR SECTION CODED_MEM_MAP MEM_MAP PFN
0 ffff88087fffa000 ffffea0000000000 ffffea0000000000 0
1 ffff88087fffa020 ffffea0000000000 ffffea00001c0000 32768
2 ffff88087fffa040 ffffea0000000000 ffffea0000380000 65536
[...]
130 ffff88107fff9040 ffffea0000000000 ffffea000e380000 4259840
131 ffff88107fff9060 ffffea0000000000 ffffea000e540000 4292608
132096 ffff880838574558 ffff881038105798 ffff8848a8105798 4328521728
132098 ffff880838574598 ffff880837ed2c00 ffff8840a8252c00 4328587264
[...]
237504 ffff8810369d2f40 ffff8810369d2f40 ffff8875af9d2f40 7782531072
237505 ffff8810369d2f60 1a48b64 657ac08b64 7782563840
237506 ffff8810369d2f80 3686dc30 65afbedc30 7782596608
237507 ffff8810369d2fa0 ffff881033219740 ffff8875ac759740 7782629376
kmem: page excluded: kernel virtual address: ffff8810369d3000 type: "memory section"
> So your target page structure should "fit" into one of the
> sections above, where the starting MEM_MAP address of each
> section should have a contiguous array of page structs that
> reference the array of physical pages starting at the "PFN"
> value. Those MEM_MAP addresses are typically increasing in
> value with each section, but I believe that I have seen cases
> where they are not. And they shouldn't have to be; each section
> has a base vmemmap address for some number of PFN/physical-pages.
OK. That's a bit confusing for me.
So again, the output with the full kmem -n display contains
bizarre values after section 131, causing it to go off into
the weeds:
...
127 ffff88087fffafe0 ffffea0000000000 ffffea000de40000 4161536 (ok)
128 ffff88107fff9000 ffffea0000000000 ffffea000e000000 4194304 (ok)
129 ffff88107fff9020 ffffea0000000000 ffffea000e1c0000 4227072 (ok)
130 ffff88107fff9040 ffffea0000000000 ffffea000e380000 4259840 (ok)
131 ffff88107fff9060 ffffea0000000000 ffffea000e540000 4292608 (ok)
132096 ffff880838574558 ffff881038105798 ffff8848a8105798 4328521728 (bogus from here onward...)
132098 ffff880838574598 ffff880837ed2c00 ffff8840a8252c00 4328587264
132099 ffff8808385745b8 ffff880835850400 ffff8840a5d90400 4328620032
132100 ffff8808385745d8 ffff8810342e1c00 ffff8848a49e1c00 4328652800
132101 ffff8808385745f8 ffff8810342e2c00 ffff8848a4ba2c00 4328685568
132102 ffff880838574618 ffff880833a52000 ffff8840a44d2000 4328718336
132103 ffff880838574638 ffff8808354c0c00 ffff8840a6100c00 4328751104
132104 ffff880838574658 ffff8810342e3c00 ffff8848a50e3c00 4328783872
132105 ffff880838574678 ffff8810342e4c00 ffff8848a52a4c00 4328816640
132110 ffff880838574718 20 3871880020 4328980480
132112 ffff880838574758 ffff881037fa3718 ffff8848a9ba3718 4329046016
132114 ffff880838574798 ffff880833a13c00 ffff8840a5993c00 4329111552
132115 ffff8808385747b8 ffff8808386a0800 ffff8840aa7e0800 4329144320
...
So clearly crash is mishandling the memory setup being presented to it.
But I have *no* idea what the problem is.
> Anyway, it does look like a page structure, and the page structure pointer
> itself is translatable. The problem at hand is that the physical address
> that the page structure refers to is not being determined because the page
> structure address itself is not being recognized by is_page_ptr() as being
> part of the sparsemem infrastructure. The "if IS_SPARSEMEM()" section at
> the top of is_page_ptr() is returning FALSE.
>
> That being said, from your target page structure address and the
> "kmem -n" output, you could presumably calculate the associated
> physical address.
If the kmem -n output didn't seem to skip over the address of
interest....
Right, it would walk through all of the sections in the obviously
misinterpreted section data above and would not find your target page.
After section 131, the MEM_MAP addresses shown are not even in the
vmemmap virtual range, which starts at ffffea0000000000.
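
If the section data were trustworthy, that per-section calculation would
look roughly like the sketch below -- a simplified version of the
containment test is_page_ptr() performs, driven by the MEM_MAP and PFN
columns from "kmem -n", and again assuming a 56-byte struct page and
4KB pages:

#include <stdio.h>

#define PAGE_SHIFT         12
#define PAGES_PER_SECTION  32768UL    /* assumed x86_64 value */
#define STRUCT_PAGE_SIZE   56UL       /* assumed, as above */

/*
 * Given one section's MEM_MAP and starting PFN as reported by "kmem -n",
 * test whether a page-struct address falls inside that section (roughly
 * what is_page_ptr() checks per section) and, if so, compute the physical
 * address that the page struct refers to.
 */
static int page_to_phys_in_section(unsigned long page, unsigned long mem_map,
                                   unsigned long start_pfn, unsigned long *phys)
{
        unsigned long end = mem_map + PAGES_PER_SECTION * STRUCT_PAGE_SIZE;

        if (page < mem_map || page >= end)
                return 0;               /* not covered by this section */

        *phys = (start_pfn + (page - mem_map) / STRUCT_PAGE_SIZE) << PAGE_SHIFT;
        return 1;
}

int main(void)
{
        unsigned long phys;

        /* example: node 0's DMA zone page at ffffea0000000380 against
           section 0 (MEM_MAP ffffea0000000000, PFN 0) -> paddr 0x10000,
           which matches the START_PADDR shown by "kmem -n" above */
        if (page_to_phys_in_section(0xffffea0000000380UL,
                                    0xffffea0000000000UL, 0UL, &phys))
                printf("physical address: %#lx\n", phys);
        return 0;
}

For what it's worth, with that 56-byte assumption the target page at
ffffea001cdad420 would land in roughly section 263, i.e., squarely inside
the range of sections that the broken listing skips.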
>> The memory in question is probably not in the dump, but I don't know how
>> to specify that it be added to the dump without knowing how the memory
>> is characterized.
>
> Whether the actual physical page that is referenced by your target page
> structure is in the dumpfile should not affect the is_page_ptr() function.
> That should work regardless.
I think it is a good guess that the data I really want are not in the dump:
# strings cdump-0c0s6n3 |grep -E 'Process (entered|leaving)'
# strings cdump-0c2s6n3 |grep -E 'Process (entered|leaving)'
# strings ../mrp752/sp1-fulldbg/dk.data | \
# strings ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 | \
> grep -E 'Process (entered|leaving)'|sort |uniq -c
311804 Process entered
1 Process enteredgot mutex:
2 Process enteredpage@
129991 Process leaving
[...]
The "cdump-0c0s6n3" and "cdump-0c2s6n3" files are from the release at
issue,
and the ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 dump is from the SLES-11 SP1
release. As you can see, there should be many thousands of matching strings
in the dump files. Since there is not, ...
So: what physical pages are missing and why are they missing?
With those two questions resolved, we can fix the dump specification
to include the missing pages.
I don't know how SUSE sets up their dumping operation. I presume that they
use makedumpfile to post-process/filter /proc/vmcore into the dumpfile, and
therefore you would need to find out how it got invoked. On RHEL systems,
there is an /etc/kdump.conf file which specifies a "core_collector", and if
it specifies "makedumpfile", it also shows the exact command line used to
invoke it when running against /proc/vmcore in the second kernel.
For example, by default we use:
core_collector makedumpfile -c --message-level 1 -d 31
The makedumpfile(8) man page (or "makedumpfile --help") indicates which
types of memory will be filtered based upon the "-d <dump_level>"
argument. A dump_level of 31 is the most aggressive:
 dump  | zero | cache | cache   | user | free
 level | page | page  | private | data | page
-------+------+-------+---------+------+------
    0  |      |       |         |      |
    1  |  X   |       |         |      |
    2  |      |   X   |         |      |
    3  |  X   |   X   |         |      |
    4  |      |   X   |    X    |      |
    5  |  X   |   X   |    X    |      |
    6  |      |   X   |    X    |      |
    7  |  X   |   X   |    X    |      |
    8  |      |       |         |  X   |
    9  |  X   |       |         |  X   |
   10  |      |   X   |         |  X   |
   11  |  X   |   X   |         |  X   |
   12  |      |   X   |    X    |  X   |
   13  |  X   |   X   |    X    |  X   |
   14  |      |   X   |    X    |  X   |
   15  |  X   |   X   |    X    |  X   |
   16  |      |       |         |      |  X
   17  |  X   |       |         |      |  X
   18  |      |   X   |         |      |  X
   19  |  X   |   X   |         |      |  X
   20  |      |   X   |    X    |      |  X
   21  |  X   |   X   |    X    |      |  X
   22  |      |   X   |    X    |      |  X
   23  |  X   |   X   |    X    |      |  X
   24  |      |       |         |  X   |  X
   25  |  X   |       |         |  X   |  X
   26  |      |   X   |         |  X   |  X
   27  |  X   |   X   |         |  X   |  X
   28  |      |   X   |    X    |  X   |  X
   29  |  X   |   X   |    X    |  X   |  X
   30  |      |   X   |    X    |  X   |  X
   31  |  X   |   X   |    X    |  X   |  X
You might want to just filter zero-filled-pages and free-pages,
which would be a dump-level of 17.
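
Put differently, the dump level is just a bitmask of the five columns in
the table above: zero page = 1, cache page = 2, cache private = 4,
user data = 8, free page = 16, so 17 is zero pages plus free pages.
A trivial illustration (the macro names are my own):

#include <stdio.h>

/* makedumpfile -d bit values, one per column of the table above */
#define DL_ZERO_PAGE       1
#define DL_CACHE_PAGE      2
#define DL_CACHE_PRIVATE   4
#define DL_USER_DATA       8
#define DL_FREE_PAGE      16

int main(void)
{
        printf("zero + free pages  -> -d %d\n", DL_ZERO_PAGE | DL_FREE_PAGE);    /* 17 */
        printf("filter everything  -> -d %d\n", DL_ZERO_PAGE | DL_CACHE_PAGE |
               DL_CACHE_PRIVATE | DL_USER_DATA | DL_FREE_PAGE);                  /* 31 */
        return 0;
}

On a RHEL-style setup that would just mean changing the kdump.conf line to
something like "core_collector makedumpfile -c --message-level 1 -d 17";
how SUSE wires this up would still need to be confirmed on their end.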
Dave