----- Original Message -----
Hi Dave, et al.,
I have a small problem. I am trying to get a Lustre file system
extension working again. It used to work, but no longer does.
It first calls is_page_ptr(kvaddr, &kpaddr) to convert a virtual
address into a physical address, and then calls:
> readmem(kpaddr, PHYSADDR, buf, used,
> "trace page data", RETURN_ON_ERROR)
to fetch the bytes. Updating the release to SLES-11 SP2 causes
this to now fail.
So are you saying that it works with an earlier kernel version?
In my debugging of crash/gdb, this:
> is_page_ptr (addr=18446719884937843744, phys=0x7fffffffd370) at memory.c:11448
> 11448 if (IS_SPARSEMEM()) {
> (gdb) p/x addr
> $8 = 0xffffea001cdad420
is about to fail. However, this:
> crash> gdb x/4xg 0xffffea001cdad420
works just fine. I've stepped through x_command until it gets to
x86_64_kvtop() where I'm finding the logic a little twisty.
But it pretty clearly does not rely on section_mem_map_addr() stuff.
So, here's my point: this is confusing. What should I look for
to determine why "is_page_ptr()" is saying 0xffffea001cdad420
is invalid while "x86_64_kvtop()" is saying that it is and its
physical address is 0x87afad420?
> 878 return(readmem(addr, memtype, buf, len,
> (gdb) s
> readmem (addr=0xffffea001cdad420, memtype=0x1, buffer=0x5d85d10,
> size=0x8,
> type=0x945f0a "gdb_readmem_callback", error_handle=0x2) at
> memory.c:1991
>
> 0xffffea001cdad420: PML4 DIRECTORY: ffffffff81623000
> PAGE DIRECTORY: 87fff7067
> PUD: 87fff7000 => 87fff6067
> PMD: 87fff6730 => 800000087ae001e3
> PAGE: 87ae00000 (2MB)
> PTE PHYSICAL FLAGS
> 800000087ae001e3 87ae00000
> (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX)
> (gdb) p physpage
> $34 = 0x87afad420
0xffffea001cdad420: 0x0200000000000000 0xffffffff00000001
0xffffea001cdad430: 0x0000000000000000 0x0000000000000000
Help, please? Thank you!
It is translating the vmemmap'ed kernel address to a physical address
by walking the page tables, and finding it in a 2MB big-page.
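
As a rough sketch of the arithmetic behind that 2MB translation (this is
not crash's x86_64_kvtop() code; the helper name and macro names below are
made up for illustration): the PMD entry supplies the 2MB-aligned page
base, and the low 21 bits of the virtual address supply the offset.

```c
#include <stdint.h>

/* A 2MB page spans 2^21 bytes, so the low 21 bits of the virtual
 * address are the byte offset into the page. */
#define LARGE_PAGE_SHIFT   21
#define LARGE_PAGE_OFFSET  ((UINT64_C(1) << LARGE_PAGE_SHIFT) - 1)

/* Bits 51:21 of an x86_64 PMD entry that maps a 2MB page hold the
 * physical base; this drops the flag bits and the NX bit (bit 63). */
#define LARGE_PAGE_BASE    UINT64_C(0x000fffffffe00000)

/* Hypothetical helper, not part of crash: combine a 2MB PMD entry
 * with the virtual address to get the physical address. */
static uint64_t large_page_paddr(uint64_t pmd_entry, uint64_t vaddr)
{
    return (pmd_entry & LARGE_PAGE_BASE) | (vaddr & LARGE_PAGE_OFFSET);
}
```

With the PMD entry 800000087ae001e3 and the address 0xffffea001cdad420
from the output above, this gives 0x87afad420, the same value gdb shows
for physpage.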
If you skip the is_page_ptr() qualifier, does this work, and
if so, does it look like a legitimate page structure?:
crash> struct page ffffea001cdad420
But the sparsemem stuff doesn't seem to be accepting it as a vmemmap
page struct address. Does "kmem -p" include physical address 0x87afad420?
For example, on my system, the last physical page mapped in the
vmemmap is 21ffff000:
crash> kmem -p | tail
ffffea00087ffd80 21fff6000 0 0 0 0
ffffea00087ffdc0 21fff7000 0 0 0 0
ffffea00087ffe00 21fff8000 0 0 0 0
ffffea00087ffe40 21fff9000 0 0 0 0
ffffea00087ffe80 21fffa000 0 0 0 0
ffffea00087ffec0 21fffb000 0 0 0 0
ffffea00087fff00 21fffc000 0 0 0 0
ffffea00087fff40 21fffd000 0 0 0 0
ffffea00087fff80 21fffe000 0 0 0 0
ffffea00087fffc0 21ffff000 0 0 0 0
crash>
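
The relationship between the two columns in that listing can be sketched
as below. This is only an illustration, not what is_page_ptr() does; the
helper name is made up, and it assumes a vmemmap base of
0xffffea0000000000 and 4KB pages. The size of struct page is passed in
because it has to come from the kernel being analyzed (the listing above
steps by 0x40 per page on my system).

```c
#include <stdint.h>

#define VMEMMAP_BASE  UINT64_C(0xffffea0000000000)  /* assumed base address */
#define PAGE_SHIFT    12                            /* assumed 4KB pages */

/* Hypothetical helper, not part of crash: convert a vmemmap'ed
 * struct page address into the physical address of the page it
 * describes.  Returns 1 and fills *paddr on success; returns 0 if
 * the address is below the vmemmap base or is not aligned on a
 * struct page boundary. */
static int vmemmap_to_paddr(uint64_t page_addr, uint64_t page_struct_size,
                            uint64_t *paddr)
{
    uint64_t offset;

    if (page_struct_size == 0 || page_addr < VMEMMAP_BASE)
        return 0;

    offset = page_addr - VMEMMAP_BASE;
    if (offset % page_struct_size != 0)
        return 0;

    /* The index into the struct page array is the page frame number. */
    *paddr = (offset / page_struct_size) << PAGE_SHIFT;
    return 1;
}
```

Called with ffffea00087fffc0 and a size of 0x40, it yields 21ffff000,
which matches the last line of the listing. An address that does not
land on a struct page boundary for the given size is rejected, so the
size used has to match the kernel in the dump.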
Anyway, the first thing that needs to be done is to verify that
SECTION_SIZE_BITS and MAX_PHYSMEM_BITS are being set up
correctly. The upstream kernel currently has:
# define SECTION_SIZE_BITS 27 /* matt - 128 is convenient right now */
# define MAX_PHYSADDR_BITS 44
# define MAX_PHYSMEM_BITS 46
And crash has these, where SECTION_SIZE_BITS is stable, but MAX_PHYSMEM_BITS
can be any one of three values, depending upon the kernel version:
#define _SECTION_SIZE_BITS 27
#define _MAX_PHYSMEM_BITS 40
#define _MAX_PHYSMEM_BITS_2_6_26 44
#define _MAX_PHYSMEM_BITS_2_6_31 46
And in x86_64_init() there is a segment that tries to pick the correct value.
So for example, on my 3.7.9 kernel, I see:
crash> help -m | grep -e section -e physmem
section_size_bits: 27
max_physmem_bits: 46
sections_per_root: 128
crash>
Take a look at your SLES-11 SP2 kernel sources and determine what
values are being used, and compare them to what crash set them up
to be.
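
To show why a wrong max_physmem_bits matters, here is a minimal sketch
of the sparsemem arithmetic. The helper names are made up and this is
not crash source; it only uses the 27-bit section size and the
sections_per_root value of 128 shown above. A physical address maps to
a section number, and that number is only valid if it is below the
section count implied by MAX_PHYSMEM_BITS.

```c
#include <stdint.h>

#define SECTION_SIZE_BITS  27    /* 128MB sections, as listed above */
#define SECTIONS_PER_ROOT  128   /* from the "help -m" output above */

/* Hypothetical helpers, not part of crash. */

/* Section number that a physical address falls into. */
static uint64_t paddr_to_section_nr(uint64_t paddr)
{
    return paddr >> SECTION_SIZE_BITS;
}

/* Total number of sections for a given MAX_PHYSMEM_BITS value. */
static uint64_t nr_mem_sections(unsigned int max_physmem_bits)
{
    return UINT64_C(1) << (max_physmem_bits - SECTION_SIZE_BITS);
}

/* Nonzero if the address is representable with this MAX_PHYSMEM_BITS. */
static int paddr_in_sparsemem_range(uint64_t paddr,
                                    unsigned int max_physmem_bits)
{
    return paddr_to_section_nr(paddr) < nr_mem_sections(max_physmem_bits);
}

/* Position of a section within the two-level mem_section table. */
static uint64_t section_root(uint64_t nr)  { return nr / SECTIONS_PER_ROOT; }
static uint64_t section_index(uint64_t nr) { return nr % SECTIONS_PER_ROOT; }
```

For instance, a physical address of 1 << 41 is in range with 44 or 46
bits, but out of range with 40, so a wrong choice among the three
values can make a valid address look invalid.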
Dave