Richard W.M. Jones wrote:
On Tue, Aug 19, 2008 at 04:32:35PM -0400, Dave Anderson wrote:
> > Out of curiosity, any reason why a libvirt interface couldn't be
> > created that accesses guest pseudo-physical addresses? And does
> > the existing interface accept vmalloc addresses, or only unity-mapped
> > kernel virtual addresses?
> I'm afraid I didn't fully understand your previous comment about
> vmalloc & unity-mapped addresses. I'm not sure what a "unity-mapped"
> address is, and I thought vmalloc just used ordinary kernel addresses
> above PAGE_OFFSET.
It depends upon the architecture, but all of them have at least one
kernel virtual address range where the PAGE_OFFSET value can be stripped off,
yielding the physical address. So "unity-mapping" just means a one-to-one
address translation, since the physical address is right "there", encoded in the
virtual address value.
For bare-metal kernels, the physical address is then read directly from the
dumpfile or /dev/mem -- or, since Red Hat restricts /dev/mem, from the /dev/crash
driver, which is loaded as a replacement for /dev/mem.
For xen kernels specifically, the stripped address is a pseudo-physical
address, but the xendump dumpfile formats (the "old" and the newer ELF-style
formats) require it to be translated to a machine address.
So the p2m table needs to be consulted to turn the pseudo-physical address
into a machine address, and that machine address is what is then located
and read from the dumpfile.
On the other hand, vmalloc() addresses are a range of kernel virtual addresses,
typically above PAGE_OFFSET (but not necessarily on all architectures), that
are mapped to some unknown physical address. Therefore, they *do* require a
full page-table walkthrough, because, like user virtual addresses, there's no
indication in the vmalloc address value as to what the underlying physical address
is. And, FWIW, the page-table walkthrough is far more involved in our xen
writeable-page-table kernels than on bare-metal kernels, because the physical
address encoded into each page-table entry is a *machine* address -- that must
be translated back into a pseudo-physical address in order to find the next page-table level.
BTW, for xen unity-mapped address translations, a full page-table walk could
be done as well, but it's unnecessary. But I digress...
So in any case, the underlying libvirt guest kernel virtual memory access
function sounds like it translates them regardless of whether they're
unity-mapped or vmalloc-mapped addresses, so I think you've answered my
question.
The issue at hand is that the crash utility is "founded" upon the
access of physical-addresses. So all readmem() requests are translated
to physical addresses before the resultant value is passed on to
/dev/mem or /dev/crash (live systems), or to the myriad of dumpfile
formats that crash supports (about a dozen of them).
Getting back to the libvirt adaptation into crash, yes, the requests for
unity-mapped -- and if I understand it correctly -- for vmalloc addresses,
could be passed to your interface directly without a translation to
physical. And for crash readmem(PHYSADDR, ...) direct requests, they
could presumably be turned into unity-mapped addresses by applying
PAGE_OFFSET, and then passing that value to libvirt -- with one significant
caveat: on 32-bit systems, only the first 896MB of
physical memory can be accessed via unity-mapped kernel virtual addresses.
The remaining 128MB of kernel virtual address space is given to vmalloc and a
handful of other hardwired mappings at the top end. So, for example, you
wouldn't be able to read user virtual addresses on larger 32-bit systems,
because their page tables and resultant physical pages are biased toward
highmem, i.e., above the 896MB limit. That would
also be a problem for vmalloc addresses *if* the libvirt
interface only handled unity-mapped addresses, which was the
genesis of my question.
And that's also why I asked about the issue of creating a libvirt
interface that accepted pseudo-physical addresses. If that were
in place, it would simply mimic /dev/mem (/dev/crash), and it
should "just work". There are a couple of other minor gotchas that
would also have to be handled for live guest access, such as the
access of /proc/version.
> Anyway, in libvirt we rely on what the underlying hypervisor can do.
> In the case of QEMU/KVM, the QEMU monitor supports a simple "memsave"
> command. This command takes three parameters: start, size and a
> filename, and it saves the memory from start to start+size-1 into the
> file. Along the way it translates these virtual addresses through CR3
> / the page tables (or the equivalent on non-x86 architectures).
> We could offer a way to get at physical addresses, but it would
> require getting a patch accepted into QEMU & KVM (separate but loosely
> synchronized codebases), and then a corresponding change in libvirt.
> Then there's a long wait while everyone updates to the newest versions
> of everything and finally physical memory peeking would be possible
> through libvirt.
Yep, understood...
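For reference, the monitor interaction described above would look something like this (address, size, and filename invented for illustration):

```
(qemu) memsave 0xc0100000 4096 guest-page.bin
```

The start address here is a guest *virtual* address; QEMU walks the guest's page tables (via CR3 on x86) to resolve it before writing the bytes to the file.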
> For the Xen driver's virDomainMemoryPeek call -- which isn't
> implemented in libvirt yet -- it's actually a lot easier to use
> physical addresses, because you request from the hypervisor that pages
> from another domain be mapped into your process using an ioctl which
> takes physical addresses. In order to provide compatibility with the
> existing software using virDomainMemoryPeek we were planning on
> implementing the page table lookups ourselves within libvirt.
Right -- and what I'm suggesting is letting the crash utility do
all the dirty work for you by just giving crash access to pseudo-physical
addresses of the target guest. By doing that, for all practical
purposes, crash wouldn't even know that it was dealing with a
"remote" system.
Dave