On Thu, Sep 02, 2010 at 08:55:38AM -0400, Dave Anderson wrote:
----- hutao(a)cn.fujitsu.com wrote:
> On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao(a)cn.fujitsu.com wrote:
> > Hi,
> >
> > I got a problem where it seemed crash got a bad backtrace.
> > The problem occurred under the following conditions:
> > On a qemu guest system loading a module that stuck at
> > the init function(say, call a function that did deadlooping),
> > then dumped the guest by `virsh dump vm dumpfile', and run
> > crash on the dumpfile.
> >
> > The module is:
> >
> > ---
> > #include <linux/module.h>
> >
> > int endless_loop(void)
> > {
> > printk("endless loop\n");
> > while (1);
> >
> > return 0;
> > }
> >
> > int __init endless_init(void)
> > {
> > endless_loop();
> >
> > return 0;
> > }
> > module_init(endless_init);
> >
> > MODULE_LICENSE("GPL");
> > ---
> >
> > crash bt command got:
> >
> > crash> bt -a
> > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
> > #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> >
> > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
> > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
> > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
> > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
> > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
> > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
> > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
> > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> >
> > Does it lose some function calls between do_one_initcall and retint_kernel?
> > (endless_loop <- endless_init)
> >
>
> In addition, if we don't stick in the init function (there is still a deadloop
> somewhere in module but triggered by, say, reading a /proc file) then the backtrace
> outputed by crash is correct.
When you say "correct", I presume that you see your module functions as
frames.
But if you also see the backtrace starting with "schedule", then it's just
luck
that the backtrace bumped into your module functions. It just so happened that
when walking back from schedule(), it "mistakenly" stumbled upon your
module's
functions.
In the example above, I presume that when trying to backtrace from retint_kernel(),
it stepped over your module's "loop" functions that were called via
do_one_initcall().
That's why I suggest that you should probably see them on the kernel stack in
between ffff88001e751e90 and ffff88001e751f20 if you use "bt -t". That is
what
"bt -t" is for -- the "bt" command is never guaranteed to be correct.
Nothing new in between ffff88001e751e90 and ffff88001e751f20 with `bt -at':
crash> bt -at
PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper"
START: schedule at ffffffff813e8a49
[ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
[ffffffff81601e60] __atomic_notifier_call_chain at ffffffff813ed799
[ffffffff81601ea0] need_resched at ffffffff8100970c
[ffffffff81601eb0] default_idle at ffffffff81009f6b
[ffffffff81601ec0] cpu_idle at ffffffff81001bf5
[ffffffff81601f10] rest_init at ffffffff813d72ec
[ffffffff81601f30] start_kernel at ffffffff816e1d77
[ffffffff81601f40] command_line at ffffffff81718e90
[ffffffff81601f70] x86_64_start_reservations at ffffffff816e12ac
[ffffffff81601f90] x86_64_start_kernel at ffffffff816e13a8
PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod"
START: schedule at ffffffff813e8a49
[ffff88001e751dd0] schedule at ffffffff813e8aec
[ffff88001e751e40] get_parent_ip at ffffffff8102f193
[ffff88001e751e60] sub_preempt_count at ffffffff813ed62c
[ffff88001e751e70] need_resched at ffffffff8102a975
[ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
[ffff88001e751e90] retint_kernel at ffffffff813eab86
[ffff88001e751f20] do_one_initcall at ffffffff81000210
[ffff88001e751f50] sys_init_module at ffffffff8106b7ca
[ffff88001e751f80] system_call_fastpath at ffffffff81002a82
RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206
RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010
RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030
RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000
R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da
R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010
ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
crash>
This is the related part of output of `bt -ar' in case you're interested:
ffff88001e751e60: sub_preempt_count+146 ffff88001e751e78
ffff88001e751e70: need_resched+30 ffff88001e751e88
ffff88001e751e80: preempt_schedule_irq+82 ffff88001e751f18
ffff88001e751e90: retint_kernel+38 ffff88001e751dd8
ffff88001e751ea0: 0000000000000000 0000000000000004
ffff88001e751eb0: 0000000000000000 ffff88001e751fd8
ffff88001e751ec0: 0000000000000000 0000000000000001
ffff88001e751ed0: 0000000000000000 ffffffffa00f9000
ffff88001e751ee0: ffffffffffffff10 ffffffffa00f9004
ffff88001e751ef0: 0000000000000010 0000000000000246
ffff88001e751f00: ffff88001e751f18 0000000000000018
ffff88001e751f10: ffff8800ffffffff ffff88001e751f48
ffff88001e751f20: do_one_initcall+122 __this_module
ffff88001e751f30: 0000000000000000 0000000000020000
ffff88001e751f40: 0000000000b96030 ffff88001e751f78
ffff88001e751f50: sys_init_module+196 0000000000b96010
ffff88001e751f60: 00000000000163da 00007fff67a44985
ffff88001e751f70: 00000000000163da 0000000000b96010
ffff88001e751f80: system_call_fastpath+22 0000000000000202
ffff88001e751f90: 00007f761bb4b140 0000000000080000
ffff88001e751fa0: 0000000000010011 00000000000000af
ffff88001e751fb0: 0000000000020010 0000000000b96010
ffff88001e751fc0: 00000000000163da 0000000000b96030
ffff88001e751fd0: 00000000000000af 00007f761bb58b7a
ffff88001e751fe0: 0000000000000033 0000000000010206
ffff88001e751ff0: 00007fff67a43120 000000000000002b
and `sym -m endless':
crash> sym -m endless
ffffffffa00f6000 MODULE START: endless
ffffffffa00f6000 (?) endless_loop
ffffffffa00f6030 (r) __ksymtab_endless_loop
ffffffffa00f6040 (r) __kstrtab_endless_loop
ffffffffa00f6050 (d) __this_module
ffffffffa00f6345 MODULE END: endless
crash>
--
Thanks,
Hu Tao