Hi:
I don't know if you're aware of this, but libvirt[1] recently added a
call which allows you to snoop on the live memory of guests,
virDomainMemoryPeek[2]:
  int virDomainMemoryPeek (virDomainPtr dom,
                           unsigned long long start, /* start address */
                           size_t size,              /* size (bytes) */
                           void *buffer,             /* return buffer */
                           unsigned int flags);
In theory, this would allow crash to debug running guests. I had a
look at the crash code, and it doesn't look like it would be too hard
to add this.
We [the libvirt team] only support this for QEMU & KVM guests at the
moment, but we plan to support this call for Xen in the near future.
Also, the call only works on virtual memory addresses (in other words,
the address is translated through the guest's page tables), but in
practice that isn't too bad because the common configuration for Linux
is to map all of physical memory at some address, e.g. 0xc0000000 on
i386. The peek operation is also read-only.
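
To make the virtual-address point concrete, here is a minimal sketch of
reading guest physical memory through a peek that only accepts virtual
addresses. It assumes an i386 guest whose direct map starts at
0xc0000000. The names peek_virt_fn, phys_to_virt, peek_phys, fake_guest
and fake_peek are hypothetical and not part of libvirt or crash; the
callback merely has the same general shape as the call above, and the
fake backend stands in for a real hypervisor connection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: i386 guest whose direct map of physical memory starts
 * at this virtual address (the example address from the text). */
#define GUEST_PAGE_OFFSET 0xc0000000ULL

/* Hypothetical callback shaped like a virtual-address peek: copy `size`
 * bytes from guest virtual address `vaddr` into `buf`.
 * Returns 0 on success, -1 on failure. */
typedef int (*peek_virt_fn)(void *opaque, unsigned long long vaddr,
                            size_t size, void *buf);

/* Convert a guest physical address to its direct-map virtual address. */
static unsigned long long phys_to_virt(unsigned long long paddr)
{
    return paddr + GUEST_PAGE_OFFSET;
}

/* Read guest physical memory using only a virtual-address peek. */
static int peek_phys(peek_virt_fn peek, void *opaque,
                     unsigned long long paddr, size_t size, void *buf)
{
    assert(peek != NULL && buf != NULL);
    return peek(opaque, phys_to_virt(paddr), size, buf);
}

/* Stand-in backend for experimenting without a hypervisor. */
struct fake_guest {
    const unsigned char *ram;   /* contents of guest physical memory */
    size_t ram_size;            /* number of bytes in `ram` */
};

static int fake_peek(void *opaque, unsigned long long vaddr,
                     size_t size, void *buf)
{
    const struct fake_guest *g = opaque;

    /* Only the direct-map region is backed in this fake guest. */
    if (vaddr < GUEST_PAGE_OFFSET)
        return -1;

    unsigned long long off = vaddr - GUEST_PAGE_OFFSET;
    if (off > g->ram_size || size > g->ram_size - off)
        return -1;              /* request runs past the end of RAM */

    memcpy(buf, g->ram + off, size);
    return 0;
}
```

In a real tool the callback would wrap the libvirt call instead of
fake_peek; keeping the translation in one place means the base address
can be changed per guest architecture.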
So if you are interested, let me know, and I will attempt a patch.
- - -
Now, the bigger picture ...
For some months now we've been attempting to write system
administrator tools to mimic common sysadmin commands, except that
they work on guests. For example 'virt-ps <guest>' lists out the
process table in <guest>. It runs from the host and works by snooping
guest memory using virDomainMemoryPeek.
We have had some success, although it's been quite a lot harder than
we imagined it would be. At the moment we have 'virt-dmesg',
'virt-uname', 'virt-ifconfig' and 'virt-ps', plus a handful of custom
commands, working to a greater or lesser extent.
However, I wasn't previously aware that crash could already do this
(particularly the 'log', 'ps', 'mount' and 'net' commands), and in fact
crash has much more complete support for these commands than we do.
So it makes sense to use crash to do this, instead of continuing with
our separate implementation, if we can make it work.
I think there are two things that we'd need to add to crash in order
to get this working:
(i) Scripting. I'm aware that there are two scripting projects for
crash out there already, but they looked fairly immature and/or
unsupported. However, it shouldn't be too hard to pull these projects
up to standard, add some scripting support ourselves, or use expect.
(ii) Getting the debug symbols.
Item (ii) is the big deal for us. Our current virt-* tools can work
with a wide range of kernels.
What we do is to download the kernel-debuginfo packages beforehand,
extract only the tiny amount of debug info we actually need from
vmlinux, and build a 'kernel database'. (We're using dwarves to get
the layout of the dozen or so structures that we care about). It
turns out that it's quite easy to heuristically determine the version
of a running kernel, and from that we can look up the structures in
the kernel database at runtime.
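
As a rough illustration of the lookup described above, here is a minimal
sketch. It is not our actual implementation: the structure
kernel_layout, the functions find_release and lookup_layout, the release
strings and every offset are made up for the example. The detection
heuristic shown (scanning peeked memory for the "Linux version " banner
text) is one possible approach, assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout facts for one kernel build. Field names are illustrative and
 * the numbers below are placeholders, not taken from any real kernel. */
struct kernel_layout {
    const char *release;        /* kernel release string */
    size_t task_struct_size;    /* sizeof(struct task_struct) */
    size_t task_comm_offset;    /* offset of the `comm` field */
    size_t task_pid_offset;     /* offset of the `pid` field */
};

/* Hypothetical in-memory "kernel database". */
static const struct kernel_layout kernel_db[] = {
    { "example-1", 1024, 400, 200 },
    { "example-2", 1040, 408, 208 },
};

/* Scan a chunk of peeked guest memory for "Linux version " and copy the
 * token that follows it (the release) into `out`, truncating if needed.
 * Returns 0 if a release was found, -1 otherwise. */
static int find_release(const unsigned char *buf, size_t len,
                        char *out, size_t out_size)
{
    static const char marker[] = "Linux version ";
    const size_t mlen = sizeof marker - 1;

    assert(buf != NULL && out != NULL && out_size > 0);

    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) != 0)
            continue;

        /* Copy up to the next space or NUL, leaving room for '\0'. */
        size_t j = i + mlen, n = 0;
        while (j < len && buf[j] != ' ' && buf[j] != '\0' &&
               n + 1 < out_size)
            out[n++] = (char)buf[j++];

        if (n == 0)
            continue;           /* marker with nothing after it */
        out[n] = '\0';
        return 0;
    }
    return -1;
}

/* Look up the structure layout for a release; NULL if unknown. */
static const struct kernel_layout *lookup_layout(const char *release)
{
    for (size_t i = 0; i < sizeof kernel_db / sizeof kernel_db[0]; i++)
        if (strcmp(kernel_db[i].release, release) == 0)
            return &kernel_db[i];
    return NULL;
}
```

A NULL result from lookup_layout corresponds to the "kernel we've not
seen before" case, which is where the debug info has to be fetched.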
The upshot is that we currently support ~350 kernels with a database
of a modest 1 MB, which could probably be made smaller with very
little effort.
The problem I haven't yet resolved with using crash is that we need a
matching, identical vmlinux image (i.e. 50-100 MB) per guest kernel
version. In the case where we see a kernel version we've not seen
before, we may have to download this and store it somewhere.
The alternative seems to involve some really deep hacking inside gdb,
perhaps so it can be persuaded to use only partial debug info?
I don't know if you have any thoughts about (ii).
Rich.
[1] http://libvirt.org/
[2] http://libvirt.org/html/libvirt-libvirt.html#virDomainMemoryPeek
--
Richard Jones, Emerging Technologies, Red Hat
http://et.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/