----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> Somewhere between the RHEL5 (2.6.18-based) and RHEL6 timeframe,
> the ppc64 architecture has started using a virtual memmap scheme
> for the arrays of page structures used to describe/handle
> each physical page of memory.
> ... [ snip ] ...
> So my speculation (guess?) is that the ppc64.c ppc64_vtop()
> function needs updating to properly translate these addresses.
> Since the ppc64 stuff in the crash utility was written by, and
> has been maintained by IBM (and since I am ppc64-challenged),
> can you guys take a look at what needs to be done?
[ sound of crickets... ]
Well that request apparently fell on deaf ears...
Here's my understanding of the situation.
In 2.6.26 the ppc64 architecture started using a new kernel virtual
memory region to map the kernel's page structure array(s), so that
now there are three kernel virtual memory regions:
KERNEL 0xc000000000000000
VMALLOC 0xd000000000000000
VMEMMAP 0xf000000000000000
The KERNEL region is the unity-mapped region, where the underlying
physical address can be determined by manipulating the virtual address
itself.
The VMALLOC region requires a page-table walk-through to find
the underlying physical address in a PTE.
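To make the region split concrete, here is a minimal C sketch (not crash
utility source; the macro, enum and function names are made up for
illustration). It assumes the three region bases listed above, so that a
kernel virtual address can be classified by its top four bits, and it
assumes the KERNEL region maps physical address 0 at 0xc000000000000000:

```c
#include <assert.h>
#include <stdint.h>

/* Region bases as listed above; the names are hypothetical. */
#define KERNEL_BASE   0xc000000000000000ULL
#define REGION_SHIFT  60

enum region { REGION_KERNEL, REGION_VMALLOC, REGION_VMEMMAP, REGION_OTHER };

/* Classify a kernel virtual address by its top nibble. */
static enum region classify_region(uint64_t vaddr)
{
        switch (vaddr >> REGION_SHIFT) {
        case 0xc: return REGION_KERNEL;
        case 0xd: return REGION_VMALLOC;
        case 0xf: return REGION_VMEMMAP;
        default:  return REGION_OTHER;
        }
}

/*
 * Unity-mapped translation: the physical address is simply the
 * offset of the virtual address from the KERNEL region base.
 */
static uint64_t unity_vtop(uint64_t vaddr)
{
        assert(classify_region(vaddr) == REGION_KERNEL);
        return vaddr - KERNEL_BASE;
}
```

With these assumptions, unity_vtop(0xc000000003000000ULL) yields 0x3000000.
A VMALLOC address would instead need the page-table walk, which is not
sketched here.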
The new VMEMMAP region is mapped in ppc64 firmware, where a
physical address range of a given size is mapped to a VMEMMAP virtual
address. So for example, the page structure for physical page 0
is at VMEMMAP address 0xf000000000000000, the page structure for physical
page 1 is at 0xf000000000000068, and so on. Once mapped in the
firmware TLB (?) the virtual-to-physical translation is done
automatically while running in kernel mode.
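The address arithmetic in that example can be sketched as follows. This
is only an illustration: the names are hypothetical, and the page
structure size of 0x68 bytes is inferred from the two example addresses
above, so it would differ on a kernel with a different struct page
layout.

```c
#include <assert.h>
#include <stdint.h>

#define VMEMMAP_BASE      0xf000000000000000ULL
/* Assumption: sizeof(struct page), inferred from the example addresses. */
#define PAGE_STRUCT_SIZE  0x68ULL

/* VMEMMAP virtual address of the page structure for a physical page number. */
static uint64_t pfn_to_vmemmap(uint64_t pfn)
{
        return VMEMMAP_BASE + pfn * PAGE_STRUCT_SIZE;
}

/*
 * Physical page number whose page structure contains the given
 * VMEMMAP virtual address (the address may point inside the structure).
 */
static uint64_t vmemmap_to_pfn(uint64_t vaddr)
{
        assert(vaddr >= VMEMMAP_BASE);
        return (vaddr - VMEMMAP_BASE) / PAGE_STRUCT_SIZE;
}
```

Note that this only yields the virtual address of a page structure.
Reading its contents from a dumpfile still requires knowing which
physical memory backs that virtual address.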
The problem is that the physical-to-vmemmap address/size mapping
information is not stored in the kernel proper, so there is
no way for the crash utility to make the translation. That
being the case, any crash command that needs to read the contents
of any page structure will fail.
The kernel mapping is performed here in 2.6.26 through 2.6.31:
int __meminit vmemmap_populate(struct page *start_page,
                               unsigned long nr_pages, int node)
{
        unsigned long start = (unsigned long)start_page;
        unsigned long end = (unsigned long)(start_page + nr_pages);
        unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;

        /* Align to the page size of the linear mapping. */
        start = _ALIGN_DOWN(start, page_size);

        for (; start < end; start += page_size) {
                int mapped;
                void *p;

                if (vmemmap_populated(start, page_size))
                        continue;

                p = vmemmap_alloc_block(page_size, node);
                if (!p)
                        return -ENOMEM;

                pr_debug("vmemmap %08lx allocated at %p, physical %08lx.\n",
                         start, p, __pa(p));

                mapped = htab_bolt_mapping(start, start + page_size, __pa(p),
                                           pgprot_val(PAGE_KERNEL),
                                           mmu_vmemmap_psize, mmu_kernel_ssize);
                BUG_ON(mapped < 0);
        }

        return 0;
}
So if the pr_debug() statement is turned on, it shows on my test system:
vmemmap f000000000000000 allocated at c000000003000000, physical 03000000
This would make for an extremely simple virtual-to-physical translation
for the crash utility, but note that neither the unity-mapped virtual address
of 0xc000000003000000 nor its associated physical address of 0x3000000 is
stored anywhere, since "p" is a stack variable. The htab_bolt_mapping()
function does not store the mapping information in the kernel either; it
just uses temporary stack variables before calling the ppc_md.hpte_insert()
function, which eventually leads to a machine-dependent (directly to firmware)
function.
So unless I'm missing something, nowhere along the vmemmap call-chain are the
VTOP address/size particulars stored -- say for example, in a
/proc/iomem-like "resource" data structure.
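For illustration only, this is roughly the kind of record that would be
needed, together with the lookup the crash utility could then perform.
Nothing like this exists in the kernels discussed here; the structure,
its field names and the function are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one entry per vmemmap_populate() mapping. */
struct vmemmap_record {
        uint64_t virt;  /* VMEMMAP virtual start address */
        uint64_t phys;  /* backing physical start address */
        uint64_t size;  /* size of the mapping in bytes */
};

/*
 * Translate a VMEMMAP virtual address using a table of records.
 * Returns 1 and fills *paddr on success, 0 if no record covers vaddr.
 */
static int vmemmap_vtop(const struct vmemmap_record *recs, size_t nrecs,
                        uint64_t vaddr, uint64_t *paddr)
{
        size_t i;

        assert(paddr != NULL);

        for (i = 0; i < nrecs; i++) {
                if (vaddr >= recs[i].virt &&
                    vaddr - recs[i].virt < recs[i].size) {
                        *paddr = recs[i].phys + (vaddr - recs[i].virt);
                        return 1;
                }
        }
        return 0;
}
```

Given a record built from the pr_debug() output above (virt
0xf000000000000000, phys 0x3000000) and whatever page_size the kernel
used for the mapping, the translation reduces to an offset calculation
within the matching record.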
(FWIW, I note that in 2.6.32, CONFIG_PPC_BOOK3E arches still use the normal page
tables to map the memmap array(s). I don't know whether BOOK3E arch is the
most common or not...)
In any case, not being able to read the page structure contents has a
significant effect on the crash utility. About the only thing that can
be done for these kernels is to print a warning during initialization,
after which any command that attempts to read a page structure will fail:
# crash vmlinux vmcore
crash 4.1.2p1
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
WARNING: cannot translate vmemmap kernel virtual addresses:
commands requiring page structure contents will fail
KERNEL: vmlinux
DUMPFILE: vmcore
CPUS: 2
DATE: Thu Dec 10 05:40:35 2009
UPTIME: 21:44:59
LOAD AVERAGE: 0.11, 0.03, 0.01
TASKS: 196
NODENAME: ibm-js20-04.lab.bos.redhat.com
RELEASE: 2.6.31-38.el6.ppc64
VERSION: #1 SMP Sun Nov 22 08:15:30 EST 2009
MACHINE: ppc64 (unknown Mhz)
MEMORY: 2 GB
PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
PID: 10656
COMMAND: "runtest.sh"
TASK: c000000072156420 [THREAD_INFO: c000000072058000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> kmem -i
kmem: cannot translate vmemmap address: f000000000000000
crash> kmem -p
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
kmem: cannot translate vmemmap address: f000000000000000
crash> kmem -s
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
kmem: cannot translate vmemmap address: f00000000030db44
crash>
Can any of the IBM engineers on this list (or any ppc64 user)
confirm my findings? Maybe I'm missing something, but I don't
see it.
And if you agree, perhaps you can work on an upstream solution to
store the vmemmap-to-physical mapping information?
Dave