On Mon, Oct 18, 2010 at 09:56:41AM -0400, Dave Anderson wrote:
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Hu Tao" <hutao(a)cn.fujitsu.com> wrote:
>
> > Hi Dave,
> >
> > There is a bug on get_be_long() that causes high 32 bits truncated.
> > As a result, we get wrong registers values from dump file. Patch 1
> > fixes this.
>
> Good catch!
>
> > Once we can get right cpu registers values, it's better to use the
> > sp/ip for backtracing the active task. This can show a more accurate
> > backtrace, not including those invalid frames beyond sp. pathes 2 and
> > 3 do this on kvmdump case(virsh dump).
> >
> > To verify: run that km_probe.c test module on a x86_64 system, then
> > `echo q > /proc/sysrq-trigger' to trigger the kprobe which does
> > looping in post_handler. Then vrish dump then crash.
>
> However, I'm wondering whether you actually tested this with a
> *crashed* system's dumpfile, and not just with a *live* system dump with
> a contrived set of circumstances. And if so, what differences, if any,
> did you see with the backtraces of the task that either oops'd or called
> panic(), as well as those of the other active tasks that were brought down
> by IP interrupt?
>
> Anyway, I'll give this a thorough testing with a set of sample
> dumpfiles that I have on hand.
Actually, the patch fails miserably on SMP dumpfiles with segmentation
during initialization.
And now looking at your patch, I'm wondering whether you even tested this
with an SMP system?
My fault. I will do more tests and improve the patch.
* test with a SMP system
* test with a live dump
* test with system panic/oops
* test with x86/x86_64
The change to qemu_load() here:
@@ -904,6 +906,9 @@ qemu_load (const struct qemu_device_loader *devices, uint32_t
required_features,
if (feof (fp) || ferror (fp))
break;
+ if (STREQ(d->vtbl->name, "cpu"))
+ result->dx86 = d;
+
if (sec == QEMU_VM_SECTION_END || sec == QEMU_VM_SECTION_FULL)
result->features |= features;
}
That function cycles through the "cpu" devices for *each* cpu
in the system, so this patch will store the device of the last cpu
device it encounters. So in an SMP machine, it will store the
device for the highest cpu only, right?
And then there's this change to get_kvmdump_regs():
@@ -310,7 +311,11 @@ kvmdump_memory_dump(FILE *ofp)
void
get_kvmdump_regs(struct bt_info *bt, ulong *pc, ulong *sp)
{
- machdep->get_stack_frame(bt, pc, sp);
+ if (is_task_active(bt->task)) {
+ *sp = device_list->dx86->regs[R_ESP];
+ *pc = device_list->dx86->eip;
+ } else
+ machdep->get_stack_frame(bt, pc, sp);
}
is_task_active() returns TRUE for all active tasks in an SMP system,
not just the panic task. So it would seem you're going to pass
back the same registers for all active tasks?
And what's the point of this change to kernel.c?
diff --git a/kernel.c b/kernel.c
index e399099..2627020 100755
--- a/kernel.c
+++ b/kernel.c
@@ -16,6 +16,7 @@
*/
#include "defs.h"
+#include "qemu-load.h"
#include "xen_hyper_defs.h"
#include <elf.h>
Also, the change to main.c is unnecessary -- there are dozens of malloc'd
memory areas in the program -- so why go to the bother of free'ing
just this one prior to exiting?
Anyway, instead of saving the device list, I suggest you do something
like storing the per-cpu IP/SP values in a separate data structure that
can possibly be used as an alternative source for register values for
"live dumps" -- and possibly for crashing systems if usable starting
hooks cannot be determined in the traditional manner. I had thought
of doing something like that in the past, but when I looked at the
register values, I must have run into the get_be_long() issue?
Sounds reasonable. Thanks for suggestions.
--
Thanks,
Hu Tao