Hi Dave,
On 03/15/13 07:07, Dave Anderson wrote:
> extension working again. It used to work, but does no more.
> It first calls is_page_ptr(kvaddr, &kpaddr) to convert a virtual
> address into a physical address, and then calls:
>
>> readmem(kpaddr, PHYSADDR, buf, used,
>> "trace page data", RETURN_ON_ERROR)
>
> to fetch the bytes. Updating the release to SLES-11 SP2 causes
> this to now fail.
> So are you saying that it works with an earlier kernel version?
Yep. My first guess is that this memory now has some different
classification, and that the new classification is not being selected
for inclusion in the crash dump.
> Help, please? Thank you!
It is translating the vmemmap'ed kernel address to a physical address
by walking the page tables, and finding it in a 2MB big-page.
> If you skip the is_page_ptr() qualifier, does this work, and
> if so, does it look like a legitimate page structure?
It is both a qualifier and a translator to a physical page address.
I'll have to do some research on how to invoke readmem() with the virtual
address instead of the physical address. Eventually, they must all fold back
into crash's memory.c readmem() function.
Per your request:
crash> struct page 0xffffea001cdad420
struct page {
flags = 0x200000000000000,
_count = { counter = 0x1 },
{
_mapcount = { counter = 0xffffffff },
{ inuse = 0xffff, objects = 0xffff }
},
{
{ private = 0x0, mapping = 0x0 },
ptl = {
{ rlock = { raw_lock = { slock = 0x0 } } }
},
slab = 0x0,
first_page = 0x0
},
{
index = 0xffff88067b39a400,
freelist = 0xffff88067b39a400,
pfmemalloc = 0x0
},
lru = {
next = 0xdead000000100100,
prev = 0xdead000000200200
}
}
> But the sparsemem stuff doesn't seem to be accepting it as a
> vmemmap page struct address. Does "kmem -p" include physical
> address 0x87afad420? For example, on my system, the last
> physical page mapped in the vmemmap is 21ffff000:
>
> crash> kmem -p | tail
OK, here's mine, along with the closest page numbers:
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
[...]
ffffea000e6ffee8 41fffb000 0 c5600 0 200000000000000
ffffea000e6fff20 41fffc000 0 c5600 0 200000000000000
ffffea000e6fff58 41fffd000 0 c5600 0 200000000000000
ffffea000e6fff90 41fffe000 0 c5600 0 200000000000000
ffffea000e6fffc8 41ffff000 0 c5600 0 200000000000000
<<no 0xffffea001cdad420 entry; the next line is:>>
ffffea56189f2488 189120000000 0 0 0 0
ffffea56189f24c0 189120001000 0 0 0 0
ffffea56189f24f8 189120002000 0 0 0 0
ffffea56189f2530 189120003000 0 0 0 0
[...]
ffffea64e939b648 1cc4b7ffc000 0 0 0 0
ffffea64e939b680 1cc4b7ffd000 0 0 0 0
ffffea64e939b6b8 1cc4b7ffe000 0 0 0 0
ffffea64e939b6f0 1cc4b7fff000 0 0 0 0
<<fin>>
> Anyway, the first thing that needs to be done is to verify that
> the SECTION_SIZE_BITS and MAX_PHYSMEM_BITS are being set up
> correctly. The upstream kernel currently has:
>
> # define SECTION_SIZE_BITS 27 /* matt - 128 is convenient right now */
> # define MAX_PHYSADDR_BITS 44
> # define MAX_PHYSMEM_BITS 46
That is what linux-3.0.13-0.27 has for x86-64, too.
crash> help -m | grep -e section -e physmem
section_size_bits: 27
max_physmem_bits: 46
sections_per_root: 128
crash>
Matches my output. Is there a way to coerce readelf into telling me anything
about the crash dump? If you are curious to look at the actual dump, I can
tell you how to get it via ftp (offline). The extension is on GitHub:

    git clone git://github.com/brkorb/lustre-crash-tools.git

and cr-ext/lustre-ext.c is the one.
The memory in question is probably not in the dump, but I don't know how
to specify that it be added to the dump without knowing how the memory
is characterized.
Thank you for your help! Regards, Bruce