This is a request for some help from the IBM'ers on the list...
Starting somewhere in the 3.10 timeframe (I believe), the
virtual-to-physical translation of kernel modules no longer
works for ppc64. User-space vtop also seems to be broken.
Looks like it's associated with at least these commits:
$ git log -p 419df06eea5bfa815e3a78e0aad6cfb320c1654f
commit 419df06eea5bfa815e3a78e0aad6cfb320c1654f
Author: Aneesh Kumar K.V <aneesh.kumar(a)linux.vnet.ibm.com>
Date: Sun Apr 28 09:37:31 2013 +0000
powerpc: Reduce the PTE_INDEX_SIZE
This make one PMD cover 16MB range. That helps in easier implementation of THP
on power. THP core code make use of one pmd entry to track the hugepage and
the range mapped by a single pmd entry should be equal to the hugepage size
supported by the hardware.
This also switch PGD to cover 16GB. That is needed so that we can simplify the
hugetlb page walking code so that we have same pte format for explicit hugepage
and THP hugepage.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus(a)samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
index be4e287..45142d6 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-64k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
@@ -4,10 +4,10 @@
#include <asm-generic/pgtable-nopud.h>
-#define PTE_INDEX_SIZE 12
-#define PMD_INDEX_SIZE 12
+#define PTE_INDEX_SIZE 8
+#define PMD_INDEX_SIZE 10
#define PUD_INDEX_SIZE 0
-#define PGD_INDEX_SIZE 6
+#define PGD_INDEX_SIZE 12
#ifndef __ASSEMBLY__
#define PTE_TABLE_SIZE (sizeof(real_pte_t) << PTE_INDEX_SIZE)
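To make the numbers in the commit message concrete, here is a rough sketch of the arithmetic. It assumes 64K base pages (a PAGE_SHIFT of 16, inferred from the pgtable-ppc64-64k.h filename rather than stated in the commit), and the `coverage` helper is made up for illustration; it is not kernel or crash code.

```python
PAGE_SHIFT = 16  # assumption: 64K base pages, per the pgtable-ppc64-64k.h filename


def coverage(pte_bits, pmd_bits, pgd_bits):
    """Return (bytes per PMD entry, bytes per PGD entry, total bytes mapped)."""
    pmd_entry = 1 << (PAGE_SHIFT + pte_bits)
    pgd_entry = 1 << (PAGE_SHIFT + pte_bits + pmd_bits)
    total = 1 << (PAGE_SHIFT + pte_bits + pmd_bits + pgd_bits)
    return pmd_entry, pgd_entry, total


old = coverage(12, 12, 6)   # index sizes removed by the diff
new = coverage(8, 10, 12)   # index sizes added by the diff

MB, GB = 1 << 20, 1 << 30
print("old: PMD entry %d MB, PGD entry %d GB" % (old[0] // MB, old[1] // GB))
print("new: PMD entry %d MB, PGD entry %d GB" % (new[0] // MB, new[1] // GB))
print("address bits covered: old %d, new %d"
      % (old[2].bit_length() - 1, new[2].bit_length() - 1))
```

Under that assumption the new sizes give the 16MB-per-PMD and 16GB-per-PGD ranges the commit message describes, and both layouts span the same 46 address bits. Only the split between the levels changes.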
$ git log -p 0e5f35d0e4a8179cdfac115023f418126419e659
commit 0e5f35d0e4a8179cdfac115023f418126419e659
Author: Aneesh Kumar K.V <aneesh.kumar(a)linux.vnet.ibm.com>
Date: Sun Apr 28 09:37:28 2013 +0000
powerpc: Don't truncate pgd_index wrongly
With PGD_INDEX_SIZE set to 12 the existing macro doesn't work. Fix it to
use PTRS_PER_PGD
The idea originally was to have one more bit in the result of
pgd_index() than PGD_INDEX_SIZE, so that if one had an address
corresponding to the last PGD entry, and then incremented that address
by PGD_SIZE, and took pgd_index() of that, you wouldn't end up with
zero. The commit that introduced that dates back to 2002, and the
code that was sensitive to that edge case has long since been
refactored (several times), so there is no need for it these days.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus(a)samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 0182c20..e3d55f6f 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -167,8 +167,7 @@
* Find an entry in a page-table-directory. We combine the address region
* (the high order N bits) and the pgd portion of the address.
*/
-/* to avoid overflow in free_pgtables we don't use PTRS_PER_PGD here */
-#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & 0x1ff)
+#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_PGD - 1))
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
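The effect of the mask change can be sketched in a few lines. This is only an illustration of the two macro bodies quoted in the diff: PGDIR_SHIFT is assumed to be 34 (64K pages plus the new PTE and PMD index sizes), and the sample address is made up so that its PGD index needs more than 9 bits.

```python
PGD_INDEX_SIZE = 12
PTRS_PER_PGD = 1 << PGD_INDEX_SIZE
# Assumption: PAGE_SHIFT (16) + PTE_INDEX_SIZE (8) + PMD_INDEX_SIZE (10)
PGDIR_SHIFT = 16 + 8 + 10


def pgd_index_old(address):
    # Old macro body: hard-coded 9-bit mask
    return (address >> PGDIR_SHIFT) & 0x1ff


def pgd_index_new(address):
    # New macro body: mask derived from PTRS_PER_PGD
    return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)


addr = 0xabc << PGDIR_SHIFT  # hypothetical address with PGD index 0xabc
print(hex(pgd_index_old(addr)), hex(pgd_index_new(addr)))
```

With a 12-bit PGD index, the old 0x1ff mask drops the top three index bits, which is the truncation the commit message refers to.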
$ git describe --contains 419df06eea5bfa815e3a78e0aad6cfb320c1654f
v3.10-rc1~121^2~14
$ git describe --contains cc3665a60a4ff072f5b5b18312bdf9b6612c5814
v3.10-rc1~121^2~18
$
Dave
For example, here's a session on a 3.10.0-0.rc4.59.el7.ppc64
kernel. It shows the "WARNING: cannot access vmalloc'd module
memory" message during initialization, followed by the results
of a "vtop" on the first and last module addresses in the kernel's
"modules" list:
# crash
crash 7.0.1
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for
details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
WARNING: cannot access vmalloc'd module memory
KERNEL: /usr/lib/debug/lib/modules/3.10.0-0.rc4.59.el7.ppc64/vmlinux
DUMPFILE: /dev/crash
CPUS: 28
DATE: Mon Jun 24 13:35:45 2013
UPTIME: 01:54:23
LOAD AVERAGE: 0.00, 0.17, 0.23
TASKS: 318
NODENAME: ibm-p730-04-lp4.rhts.eng.bos.redhat.com
RELEASE: 3.10.0-0.rc4.59.el7.ppc64
VERSION: #1 SMP Mon Jun 3 14:42:22 EDT 2013
MACHINE: ppc64 (3550 Mhz)
MEMORY: 8 GB
PID: 30000
COMMAND: "crash"
TASK: c000000192860000 [THREAD_INFO: c0000001c6c00000]
CPU: 5
STATE: TASK_RUNNING (ACTIVE)
crash> p modules
modules = $1 = {
next = 0xd00000000d019080,
prev = 0xd000000001016f10
}
crash> vtop 0xd00000000d019080
VIRTUAL PHYSICAL
d00000000d019080 (not mapped)
PAGE DIRECTORY: c0000000011d0000
L4: c0000000011d0000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba70000
PMD: c0000001fba80000 => fba76808
PTE: fba76808 => 0
crash> vtop 0xd000000001016f10
VIRTUAL PHYSICAL
d000000001016f10 (not mapped)
PAGE DIRECTORY: c0000000011d0000
L4: c0000000011d0000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba70000
PMD: c0000001fba80000 => fba70808
PTE: fba70808 => 0
crash>
I'm not at all familiar with ppc64 page table walk-throughs,
and what little debugging I've tried has yielded nothing other
than module address translations that end up at PTEs that
contain zero, like the above.
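One way to check which layout produced the walk above is to split the two module addresses by hand. The sketch below is not crash's code: it assumes 64K pages and 8-byte table entries, and the `split` helper and the OLD/NEW tuples are illustrative names built from the index sizes in the 419df06e diff.

```python
PAGE_SHIFT = 16   # assumption: 64K base pages
ENTRY_SIZE = 8    # assumption: 8-byte page table entries

OLD = (12, 12, 6)   # PTE, PMD, PGD index sizes before 419df06e
NEW = (8, 10, 12)   # PTE, PMD, PGD index sizes after 419df06e


def split(vaddr, pte_bits, pmd_bits, pgd_bits):
    """Return the (pgd, pmd, pte) table indices for vaddr."""
    pte = (vaddr >> PAGE_SHIFT) & ((1 << pte_bits) - 1)
    pmd = (vaddr >> (PAGE_SHIFT + pte_bits)) & ((1 << pmd_bits) - 1)
    pgd = (vaddr >> (PAGE_SHIFT + pte_bits + pmd_bits)) & ((1 << pgd_bits) - 1)
    return pgd, pmd, pte


for vaddr in (0xd00000000d019080, 0xd000000001016f10):
    for name, layout in (("old", OLD), ("new", NEW)):
        pgd, pmd, pte = split(vaddr, *layout)
        print("%x %s: pgd=%#x pmd=%#x pte=%#x (pte offset %#x)"
              % (vaddr, name, pgd, pmd, pte, pte * ENTRY_SIZE))
```

Under these assumptions the old layout gives PTE offsets of 0x6808 and 0x808, which match the low bits of the fba76808 and fba70808 PTE addresses in the output above. That suggests the walk is still using the pre-3.10 index sizes, while the new sizes would select different PMD and PTE slots.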
The vtop of user-space addresses also seems to be completely
dysfunctional. On the live system above, if I take the user-space
address of the page-table page buffer used by the crash utility itself
and try to do a vtop on it, it yields this obviously bogus result:
crash> help -m | grep ptbl:
ptbl: 100133d6f10
crash> vtop 100133d6f10
VIRTUAL PHYSICAL
100133d6f10 (not mapped)
PAGE DIRECTORY: c0000001e0940000
L4: c0000001e0940008 => 0
VMA START END FLAGS FILE
c00000001b7d0000 10012fe0000 100167b0000 100073
crash>
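The same arithmetic can be applied to the PGD lookup for this user-space address. Again this is only a sketch under assumptions (64K pages, 8-byte entries, index sizes taken from the 419df06e diff); `pgd_slot` is a hypothetical helper, and the "new" result is a calculation, not something confirmed on the live system.

```python
PAGE_SHIFT = 16   # assumption: 64K base pages
ENTRY_SIZE = 8    # assumption: 8-byte PGD entries


def pgd_slot(pgd_base, vaddr, pte_bits, pmd_bits, pgd_bits):
    """Return the address of the PGD entry that would be read for vaddr."""
    shift = PAGE_SHIFT + pte_bits + pmd_bits
    index = (vaddr >> shift) & ((1 << pgd_bits) - 1)
    return pgd_base + index * ENTRY_SIZE


pgd_base = 0xc0000001e0940000   # PAGE DIRECTORY from the vtop output
vaddr = 0x100133d6f10           # ptbl address from "help -m"

old_slot = pgd_slot(pgd_base, vaddr, 12, 12, 6)
new_slot = pgd_slot(pgd_base, vaddr, 8, 10, 12)
print("old layout reads %x, new layout would read %x" % (old_slot, new_slot))
```

The old-layout slot comes out as c0000001e0940008, the same entry shown as empty in the L4 line above, whereas the new index sizes would point at a different PGD entry for the same address.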
The breakage appears to have been introduced recently: here's
a 3.9.0-0.rc8.54.el7.ppc64 kernel, which works just fine:
# crash
crash 7.0.1
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for
details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/3.9.0-0.rc8.54.el7.ppc64/vmlinux
DUMPFILE: /dev/crash
CPUS: 28
DATE: Mon Jun 24 13:50:38 2013
UPTIME: 00:12:20
LOAD AVERAGE: 0.22, 0.17, 0.16
TASKS: 316
NODENAME: ibm-p730-04-lp4.rhts.eng.bos.redhat.com
RELEASE: 3.9.0-0.rc8.54.el7.ppc64
VERSION: #1 SMP Mon Apr 22 18:30:40 EDT 2013
MACHINE: ppc64 (3550 Mhz)
MEMORY: 8 GB
PID: 7035
COMMAND: "crash"
TASK: c0000001ddf00000 [THREAD_INFO: c0000001bb780000]
CPU: 4
STATE: TASK_RUNNING (ACTIVE)
crash> p modules
modules = $1 = {
next = 0xd00000000c729100,
prev = 0xd000000000fe6ff8
}
crash> vtop 0xd00000000c729100
VIRTUAL PHYSICAL
d00000000c729100 1cfb29100
PAGE DIRECTORY: c000000001190000
L4: c000000001190000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba60000
PMD: c0000001fba80000 => fba66390
PTE: fba66390 => 73ec88000395
PAGE: 1cfb20000
PTE PHYSICAL FLAGS
73ec88000395 1cfb20000 (PRESENT|RW|COHERENT|DIRTY|ACCESSED)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c0000000042c26f0 1cfb20000 0 0 1 73c0400000000
crash> vtop 0xd000000000fe6ff8
VIRTUAL PHYSICAL
d000000000fe6ff8 1eba16ff8
PAGE DIRECTORY: c000000001190000
L4: c000000001190000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba60000
PMD: c0000001fba80000 => fba607f0
PTE: fba607f0 => 7ae848000395
PAGE: 1eba10000
PTE PHYSICAL FLAGS
7ae848000395 1eba10000 (PRESENT|RW|COHERENT|DIRTY|ACCESSED)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c000000004482338 1eba10000 0 0 1 7ac0400000000
crash>
And the user-space vtop example works as expected:
crash> help -m | grep ptbl:
ptbl: 1001d208c80
crash> vtop 1001d208c80
VIRTUAL PHYSICAL
1001d208c80 1a1b18c80
PAGE DIRECTORY: c0000001ec8a9c00
L4: c0000001ec8a9c08 => c0000001f0a50000
PMD: c0000001f0a50008 => c0000001c7f20000
PMD: c0000001f0a50008 => c7f26900
PTE: c7f26900 => 686c48000393
PAGE: 1a1b10000
PTE PHYSICAL FLAGS
686c48000393 1a1b10000 (PRESENT|USER|COHERENT|DIRTY|ACCESSED)
VMA START END FLAGS FILE
c0000001eb523af0 1001ce20000 100206f0000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c000000003fe26b8 1a1b10000 c0000001e93f4c41 1001d20 1 6840400080068
crash>
Any ideas?
Thanks,
Dave