Hi Michael,
With respect to the 3rd "vm -p" bug, I did some cursory debugging, and
here's what I found.
In all cases, the readmem() failure occurs in _kl_pg_table_deref_s390x()
as a result of transitioning from one page of PTEs to the next, because
the pointer to the "next" page of PTEs contains 0x20, which looks to be
_SEGMENT_ENTRY_INV or _REGION_ENTRY_INV (I'm not sure of the s390x nomenclature).
So you'll see something like this in the page table that points
to the pages of PTEs:
...
c6386e0: 0000000000000020 0000000000000020 ....... .......
c6386f0: 000000001608c800 0000000000000020 ...............
c638700: 0000000000000020 0000000000000020 ....... .......
...
The vaddrs in the page of PTEs pointed to by c6386f0 (at 000000001608c800)
all resolve as expected, but when the virtual address advances the lookup
to the next entry at c6386f8, the code reads the 0x20 and passes it to
_kl_pg_table_deref_s390x(). The user vaddr(s) that resolve to that next
page of PTEs are legitimate, given that they are in the virtual region
defined by the vm_area_struct. But they may simply not be mapped.
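For what it's worth, the boundary arithmetic can be sketched as below. This
is only an illustration, assuming the usual s390x layout of 4KB pages, 1MB
segments, 256 PTEs per page of PTEs and 8-byte segment-table entries; the
macro and function names are made up for the example and are not from
s390x.c. Using the failing address 8df00000 from the quoted report, and
assuming the table dumped above belongs to that same task (which I have not
verified), 8deff000 is the last PTE slot of entry 0xde (offset 0x6f0) and
8df00000 is the first slot of entry 0xdf (offset 0x6f8):

```c
#include <stdint.h>

/* Assumed s390x 64-bit layout: 4KB pages, 1MB segments. */
#define TOY_PAGE_SHIFT     12   /* hypothetical name */
#define TOY_SEGMENT_SHIFT  20   /* hypothetical name */

/* Index of the segment-table entry covering vaddr (11-bit index). */
static unsigned int segment_index(uint64_t vaddr)
{
    return (unsigned int)((vaddr >> TOY_SEGMENT_SHIFT) & 0x7ffULL);
}

/* Index of the PTE for vaddr within its page of PTEs (8-bit index). */
static unsigned int pte_index(uint64_t vaddr)
{
    return (unsigned int)((vaddr >> TOY_PAGE_SHIFT) & 0xffULL);
}

/* Byte offset of that segment-table entry from the table origin. */
static uint64_t segment_entry_offset(uint64_t vaddr)
{
    return (uint64_t)segment_index(vaddr) * sizeof(uint64_t);
}
```

So the step from 8deff000 to 8df00000 is exactly where the walk has to move
from one segment-table entry to the next.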
Anyway, it seems that there should be something that catches the invalid entry
in s390x_vtop() -- prior to calling _kl_pg_table_deref_s390x() -- and returns
FALSE at that point.
So if I make this kludge:
...
/* Check if this is a large page. */
if (entry & 0x400ULL) {
/* Add the 1MB page offset and return the final value. */
*phys_addr = table + (vaddr & 0xfffffULL);
return TRUE;
}
======> if (entry == 0x20) return FALSE;
/* Get the page table entry */
entry = _kl_pg_table_deref_s390x(vaddr, entry & ~0x7ffULL);
if (!entry)
return FALSE;
/* Isolate the page origin from the page table entry. */
paddr = entry & ~0xfffULL;
/* Add the page offset and return the final value. */
*phys_addr = paddr + (vaddr & 0xfffULL);
return TRUE;
}
then everything seems to work OK.
So unless the calculation of the next page of PTEs is incorrect, which
seems unlikely, it seems that the 0x20 is legitimate, and should be
recognized? What do you think?
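A slightly less kludgy form of that check might test the bit instead of
comparing against the exact value 0x20. The sketch below is self-contained
and is not the s390x.c code: the names are hypothetical, and treating 0x20
as an "invalid" bit is an assumption based only on the dump above. The
0x400 large-page test and the ~0x7ff page-table-origin mask are taken from
the existing s390x_vtop() code, and the order of the tests follows the
kludge:

```c
#include <stdint.h>

#define TOY_SEG_ENTRY_INVALID 0x20ULL   /* assumed invalid bit (hypothetical name) */
#define TOY_SEG_ENTRY_LARGE   0x400ULL  /* large-page bit tested in s390x_vtop() */

enum toy_seg_kind { TOY_SEG_INVALID, TOY_SEG_LARGE_PAGE, TOY_SEG_PAGE_TABLE };

/* Decide what to do with an entry before dereferencing it. */
static enum toy_seg_kind classify_segment_entry(uint64_t entry)
{
    if (entry & TOY_SEG_ENTRY_LARGE)
        return TOY_SEG_LARGE_PAGE;      /* 1MB page, no page of PTEs */
    if (entry & TOY_SEG_ENTRY_INVALID)
        return TOY_SEG_INVALID;         /* caller should return FALSE */
    return TOY_SEG_PAGE_TABLE;          /* safe to read the page of PTEs */
}

/* Origin of the page of PTEs, using the same mask as the code above. */
static uint64_t page_table_origin(uint64_t entry)
{
    return entry & ~0x7ffULL;
}
```

With the values from the dump, 0x20 classifies as invalid and 1608c800
classifies as a pointer to a page of PTEs.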
Dave
----- Original Message -----
Mistakenly cc'd to "crash-utility-owner(a)redhat.com" instead of this
list...
----- Forwarded Message -----
From: "Dave Anderson" <anderson(a)redhat.com>
To: "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com>
Cc: crash-utility-owner(a)redhat.com
Sent: Monday, April 30, 2012 4:53:46 PM
Subject: s390x fixes
Hi Michael,
I've got a couple of simple bug fixes for s390x that I want to
run by you, plus a third one that I don't have a fix for.
First the easy ones:
(1) "bt -t" and "bt -T" fail on the active task on a live system:
crash> bt -t
PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
bt: invalid/stale stack pointer for this task: 0
crash> bt -T
PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
bt: invalid/stale stack pointer for this task: 0
crash>
That can be fixed by adding a !LIVE() check to s390x_get_stack_frame()
so that it will use (bt->task + OFFSET(task_struct_thread_ksp)):
/* get the stack pointer */
if(esp){
- if(s390x_has_cpu(bt)){
+ if (!LIVE() && s390x_has_cpu(bt)) {
ksp = ULONG(lowcore +
MEMBER_OFFSET("_lowcore",
"gpregs_save_area") + (15 *
S390X_WORD_SIZE));
} else {
readmem(bt->task +
OFFSET(task_struct_thread_ksp),
KVADDR, &ksp, sizeof(void *),
"thread_struct ksp", FAULT_ON_ERROR);
}
*esp = ksp;
} else {
(2) "vm -p" can show bogus data when a page is not mapped, like this
example:
crash> vm -p 1
PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
MM PGD RSS TOTAL_VM
14f48400 14f4c000 344k 3116k
VMA START END FLAGS FILE
14b88c80 2aab283b000 2aab2862000 8001875 /sbin/init
VIRTUAL PHYSICAL
2aab283b000 SWAP: (unknown swap location) OFFSET: 0
2aab283c000 SWAP: (unknown swap location) OFFSET: 0
2aab283d000 SWAP: (unknown swap location) OFFSET: 0
2aab283e000 SWAP: (unknown swap location) OFFSET: 0
2aab283f000 SWAP: (unknown swap location) OFFSET: 0
2aab2840000 SWAP: (unknown swap location) OFFSET: 0
2aab2841000 SWAP: (unknown swap location) OFFSET: 0
...
And that's because when a "machdep->uvtop()" operation is done on a user
page that is not resident, the machine-dependent function should return
FALSE -- but it should return the PTE value in the paddr pointer field
so that it can be translated by vm_area_page_dump(). The s390x_uvtop()
function does not return the PTE, so the failed output can vary, because
it's using an uninitialized "paddr" stack variable. But this is another
easy fix, in this case to s390x_vtop():
/* lookup virtual address in page tables */
int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr, int verbose)
{
ulong entry, paddr;
int level, len;
+ *phys_addr = 0;
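To illustrate the convention, here is a toy, self-contained sketch; it is
not the crash code. The type, the function name and the "invalid" bit value
are all hypothetical. It assumes the caller only needs two things on
failure: a FALSE return, and a defined value in the output field (0, or the
raw PTE when there is one to hand back). Handing back the raw PTE goes one
step further than the one-line fix above, which only zeroes the field:

```c
#include <stdint.h>

#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

typedef uint64_t toy_physaddr_t;    /* stand-in for crash's physaddr_t */

#define TOY_PTE_INVALID 0x400ULL    /* hypothetical "not resident" bit */

/* Translate using an already-fetched PTE; never leaves *phys_addr unset. */
static int toy_vtop(uint64_t pte, uint64_t vaddr, toy_physaddr_t *phys_addr)
{
    *phys_addr = 0;                 /* defined value on every failure path */

    if (pte == 0)
        return FALSE;               /* nothing to report back */

    if (pte & TOY_PTE_INVALID) {
        *phys_addr = pte;           /* hand the raw PTE back to the caller */
        return FALSE;
    }

    /* Page origin plus the offset within the 4KB page. */
    *phys_addr = (pte & ~0xfffULL) + (vaddr & 0xfffULL);
    return TRUE;
}
```

The point is only that the caller never sees leftover stack contents,
whichever failure path is taken.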
(3) Even with fix (2) applied, however, "vm -p" can fail to translate
user addresses in another situation. If you try this, you'll
see a number of failures like this:
crash> foreach user vm -p | grep PID
PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
PID: 599 TASK: 14fbc140 CPU: 1 COMMAND: "udevd"
PID: 955 TASK: 14343620 CPU: 0 COMMAND: "udevd"
PID: 961 TASK: 13f19220 CPU: 1 COMMAND: "udevd"
PID: 1246 TASK: 14cc0ab0 CPU: 0 COMMAND: "auditd"
PID: 1247 TASK: 14f88240 CPU: 0 COMMAND: "auditd"
PID: 1271 TASK: 140a3320 CPU: 0 COMMAND: "rsyslogd"
vm: read error: kernel virtual address: 0 type: "entry"
PID: 1272 TASK: 14b11520 CPU: 0 COMMAND: "rs:main Q:Reg"
vm: read error: kernel virtual address: 0 type: "entry"
PID: 1273 TASK: 16a32440 CPU: 1 COMMAND: "rsyslogd"
vm: read error: kernel virtual address: 0 type: "entry"
PID: 1274 TASK: 14c3cbb0 CPU: 0 COMMAND: "rsyslogd"
vm: read error: kernel virtual address: 0 type: "entry"
...
And if I take a particular case:
crash> vm -p
PID: 5088 TASK: 14399420 CPU: 1 COMMAND: "mingetty"
MM PGD RSS TOTAL_VM
14e49c00 147f8000 116k 2180k
... [ cut ] ...
VMA START END FLAGS FILE
14c49bc0 8dee1000 8df02000 100073
VIRTUAL PHYSICAL
8dee1000 ef03000
8dee2000 (not mapped)
8dee3000 (not mapped)
8dee4000 (not mapped)
8dee5000 (not mapped)
8dee6000 (not mapped)
8dee7000 (not mapped)
8dee8000 (not mapped)
8dee9000 (not mapped)
8deea000 (not mapped)
8deeb000 (not mapped)
8deec000 (not mapped)
8deed000 (not mapped)
8deee000 (not mapped)
8deef000 (not mapped)
8def0000 (not mapped)
8def1000 (not mapped)
8def2000 (not mapped)
8def3000 (not mapped)
8def4000 (not mapped)
8def5000 (not mapped)
8def6000 (not mapped)
8def7000 (not mapped)
8def8000 (not mapped)
8def9000 (not mapped)
8defa000 (not mapped)
8defb000 (not mapped)
8defc000 (not mapped)
8defd000 (not mapped)
8defe000 (not mapped)
8deff000 (not mapped)
vm: read error: kernel virtual address: 0 type: "entry"
crash>
So in this example, the page that's failing is 8df00000, which is
located in the VMA's range from 8dee1000 to 8df02000. But the
machdep->uvtop() operation fails unexpectedly:
crash> vtop -u 8df00000 -u
VIRTUAL PHYSICAL
vtop: read error: kernel virtual address: 0 type: "entry"
crash>
And that "entry" readmem() is in s390x.c code that I don't wish
to screw around with...
Hoping you can help,
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility