Hi Dave,
On Tue, 01 May 2012 15:24:17 -0400 (EDT)
Dave Anderson <anderson(a)redhat.com> wrote:
With respect to the 3rd "vm -p" bug, I did some cursory
debugging, and
here's what I found.
In all cases, the readmem() failure occurs in
_kl_pg_table_deref_s390x() as a result of transitioning from one page
of PTEs to the next, because the pointer to the "next" page of PTES
contains 0x20, which looks to be _SEGMENT_ENTRY_INV or
_REGION_ENTRY_INV? (not sure of the s390x nomenclature...)
So you'll see something like this in the page table that points
to the pages of PTEs:
...
c6386e0: 0000000000000020
0000000000000020 ....... ....... c6386f0: 000000001608c800
0000000000000020 ............... c638700: 0000000000000020
0000000000000020 ....... ....... ...
The vaddr's in the page of PTEs pointed to by c6386f0 (at
000000001608c800) all resolve as expected, but when the virtual
address bumps it to c6386f8, it reads the 0x20, and passes it to
_kl_pg_table_deref_s390x(). The user vaddr(s) that resolve to that
next page of PTEs are legitimate, given that they are in the virtual
region defined by the vm_area_struct. But they certainly may not be
mapped.
Anyway, it seems that there should be something that catches the
invalid entry in s390x_vtop() -- prior to calling
_kl_pg_table_deref_s390x()-- and return FALSE at that point.
So if I make this kludge:
...
/* Check if this is a large page. */
if (entry & 0x400ULL) {
/* Add the 1MB page offset and return the final
value. */ *phys_addr = table + (vaddr & 0xfffffULL);
return TRUE;
}
======> if (entry == 0x20) return FALSE;
/* Get the page table entry */
entry = _kl_pg_table_deref_s390x(vaddr, entry & ~0x7ffULL);
if (!entry)
return FALSE;
/* Isolate the page origin from the page table entry. */
paddr = entry & ~0xfffULL;
/* Add the page offset and return the final value. */
*phys_addr = paddr + (vaddr & 0xfffULL);
return TRUE;
}
then everything seems to work OK.
So unless the calculation of the next page of PTEs is incorrect,
which seems unlikely, it seems that the 0x20 is legitimate, and
should be recognized? What do you think?
0x20 means that the invalid bit of a region table entry is set (bit 58).
The problem is a wrong check in _kl_rsg_table_deref_s390x() where we
check for 0x40 (instead of 0x20).
The following is the correct fix for the problem:
---
s390x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/s390x.c
+++ b/s390x.c
@@ -568,7 +568,7 @@ static ulong _kl_rsg_table_deref_s390x(u
if ((entry & 0xcULL) != (level << 2))
return 0;
/* Check if the region table entry has the invalid bit set. */
- if (entry & 0x40ULL)
+ if (entry & 0x20ULL)
return 0;
/* Region table entry is valid and well formed. */
return entry;