----- hutao(a)cn.fujitsu.com wrote:
On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao(a)cn.fujitsu.com wrote:
> Hi,
>
> I ran into a problem where crash seems to produce a bad backtrace.
> It occurs under the following conditions: on a qemu guest, load a
> module that gets stuck in its init function (say, by calling a
> function that loops forever), dump the guest with
> `virsh dump vm dumpfile', and then run crash on the dumpfile.
>
> The module is:
>
> ---
> #include <linux/module.h>
>
> int endless_loop(void)
> {
>     printk("endless loop\n");
>     while (1);
>
>     return 0;
> }
>
> int __init endless_init(void)
> {
>     endless_loop();
>
>     return 0;
> }
> module_init(endless_init);
>
> MODULE_LICENSE("GPL");
> ---
>
> crash bt command got:
>
> crash> bt -a
> PID: 0      TASK: ffffffff81648020  CPU: 0   COMMAND: "swapper"
>  #0 [ffffffff81601e08] schedule at ffffffff813e8a49
>  #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
>  #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
>  #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
>  #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
>
> PID: 1088   TASK: ffff88001dda2d60  CPU: 1   COMMAND: "insmod"
>  #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
>  #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
>  #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
>  #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
>  #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
>  #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
>  #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
>     RIP: 00007f761bb58b7a  RSP: 00007fff67a43120  RFLAGS: 00010206
>     RAX: 00000000000000af  RBX: ffffffff81002a82  RCX: 0000000000020010
>     RDX: 0000000000b96010  RSI: 00000000000163da  RDI: 0000000000b96030
>     RBP: 0000000000b96010   R8: 0000000000010011   R9: 0000000000080000
>     R10: 00007f761bb4b140  R11: 0000000000000202  R12: 00000000000163da
>     R13: 00007fff67a44985  R14: 00000000000163da  R15: 0000000000b96010
>     ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b
>
> Does it lose some function calls between do_one_initcall and retint_kernel?
> (endless_loop <- endless_init)
>
In addition, if the module does not get stuck in the init function (there is
still an endless loop somewhere in the module, but it is triggered by, say,
reading a /proc file), then the backtrace output by crash is correct.

When you say "correct", I presume that you see your module functions as frames.
But if you also see the backtrace starting with "schedule", then it's just luck
that the backtrace bumped into your module functions. It just so happened that
when walking back from schedule(), it "mistakenly" stumbled upon your module's
functions.

In the example above, I presume that when trying to backtrace from retint_kernel(),
it stepped over your module's "loop" functions that were called via
do_one_initcall().

That's why I suggest that you should probably see them on the kernel stack in
between ffff88001e751e90 and ffff88001e751f20 if you use "bt -t". That is what
"bt -t" is for -- the "bt" command is never guaranteed to be correct.

Dave
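
For illustration only, here is a rough user-space sketch of the idea behind
"bt -t" as described above: instead of unwinding frame by frame, walk every
word in a stack range and report each one that falls inside a known text
symbol. This is not crash's actual implementation. The struct text_sym type,
the lookup_text() and scan_stack_for_text() helpers, and any addresses passed
to them are hypothetical names and values made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical symbol entry: [start, start + size) is one function's text. */
struct text_sym {
    uint64_t start;
    uint64_t size;
    const char *name;
};

/* Return the symbol whose text range contains addr, or NULL if none does. */
static const struct text_sym *lookup_text(const struct text_sym *syms,
                                          size_t nsyms, uint64_t addr)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (addr >= syms[i].start && addr - syms[i].start < syms[i].size)
            return &syms[i];
    }
    return NULL;
}

/*
 * Scan a copy of a stack region word by word. stack_base is the address
 * that words[0] had in the dumped system. Every word that points into a
 * known text symbol is printed as "[stack address] symbol+offset".
 * Returns the number of such words. Some hits may be stale values left
 * over from earlier calls, so the result is a list of candidates and not
 * a verified call chain.
 */
static size_t scan_stack_for_text(const uint64_t *words, size_t nwords,
                                  uint64_t stack_base,
                                  const struct text_sym *syms, size_t nsyms)
{
    size_t hits = 0;

    for (size_t i = 0; i < nwords; i++) {
        const struct text_sym *s = lookup_text(syms, nsyms, words[i]);

        if (!s)
            continue;
        printf("[%016llx] %s+0x%llx\n",
               (unsigned long long)(stack_base + i * sizeof(uint64_t)),
               s->name,
               (unsigned long long)(words[i] - s->start));
        hits++;
    }
    return hits;
}
```

Given the stack words between two frames and a symbol table that includes
the module's functions, a scan like this would list any word that looks like
a return address into endless_loop or endless_init, even when the normal
unwinder has stepped over them.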