On Thu, Sep 02, 2010 at 08:55:38AM -0400, Dave Anderson wrote:
 
 ----- hutao(a)cn.fujitsu.com wrote:
 
 > On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao(a)cn.fujitsu.com wrote:
 > > Hi,
 > > 
 > > I got a problem where it seemed crash got a bad backtrace.
 > > The problem occurred under the following conditions:
 > > On a qemu guest system loading a module that stuck at
 > > the init function(say, call a function that did deadlooping),
 > > then dumped the guest by `virsh dump vm dumpfile', and run
 > > crash on the dumpfile.
 > > 
 > > The module is:
 > > 
 > > ---
 > > #include <linux/module.h>
 > > 
 > > int endless_loop(void)
 > > {
 > > 	printk("endless loop\n");
 > > 	while (1);
 > > 
 > > 	return 0;
 > > }
 > > 
 > > int __init endless_init(void)
 > > {
 > > 	endless_loop();
 > > 
 > > 	return 0;
 > > }
 > > module_init(endless_init);
 > > 
 > > MODULE_LICENSE("GPL");
 > > ---
 > > 
 > > crash bt command got:
 > > 
 > > crash> bt -a
 > > PID: 0      TASK: ffffffff81648020  CPU: 0   COMMAND: "swapper"
 > >  #0 [ffffffff81601e08] schedule at ffffffff813e8a49
 > >  #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
 > >  #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
 > >  #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
 > >  #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
 > > 
 > > PID: 1088   TASK: ffff88001dda2d60  CPU: 1   COMMAND: "insmod"
 > >  #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
 > >  #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
 > >  #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
 > >  #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
 > >  #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
 > >  #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
 > >  #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
 > >     RIP: 00007f761bb58b7a  RSP: 00007fff67a43120  RFLAGS: 00010206
 > >     RAX: 00000000000000af  RBX: ffffffff81002a82  RCX: 0000000000020010
 > >     RDX: 0000000000b96010  RSI: 00000000000163da  RDI: 0000000000b96030
 > >     RBP: 0000000000b96010   R8: 0000000000010011   R9: 0000000000080000
 > >     R10: 00007f761bb4b140  R11: 0000000000000202  R12: 00000000000163da
 > >     R13: 00007fff67a44985  R14: 00000000000163da  R15: 0000000000b96010
 > >     ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b
 > > 
 > > Does it lose some function calls between do_one_initcall and retint_kernel?
 > > (endless_loop <- endless_init)
 > > 
 > 
 > In addition, if we don't stick in the init function (there is still a deadloop
 > somewhere in module but triggered by, say, reading a /proc file) then the backtrace
 > outputed by crash is correct.
 
 When you say "correct", I presume that you see your module functions as
frames.
 But if you also see the backtrace starting with "schedule", then it's just
luck
 that the backtrace bumped into your module functions.  It just so happened that
 when walking back from schedule(), it "mistakenly" stumbled upon your
module's
 functions.  
 
 In the example above, I presume that when trying to backtrace from retint_kernel(),
 it stepped over your module's "loop" functions that were called via
do_one_initcall().
 That's why I suggest that you should probably see them on the kernel stack in 
 between ffff88001e751e90 and ffff88001e751f20 if you use "bt -t".  That is
what
 "bt -t" is for -- the "bt" command is never guaranteed to be correct.
Nothing new in between ffff88001e751e90 and ffff88001e751f20 with `bt -at':
crash> bt -at
PID: 0      TASK: ffffffff81648020  CPU: 0   COMMAND: "swapper"
              START: schedule at ffffffff813e8a49
  [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
  [ffffffff81601e60] __atomic_notifier_call_chain at ffffffff813ed799
  [ffffffff81601ea0] need_resched at ffffffff8100970c
  [ffffffff81601eb0] default_idle at ffffffff81009f6b
  [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
  [ffffffff81601f10] rest_init at ffffffff813d72ec
  [ffffffff81601f30] start_kernel at ffffffff816e1d77
  [ffffffff81601f40] command_line at ffffffff81718e90
  [ffffffff81601f70] x86_64_start_reservations at ffffffff816e12ac
  [ffffffff81601f90] x86_64_start_kernel at ffffffff816e13a8
PID: 1088   TASK: ffff88001dda2d60  CPU: 1   COMMAND: "insmod"
              START: schedule at ffffffff813e8a49
  [ffff88001e751dd0] schedule at ffffffff813e8aec
  [ffff88001e751e40] get_parent_ip at ffffffff8102f193
  [ffff88001e751e60] sub_preempt_count at ffffffff813ed62c
  [ffff88001e751e70] need_resched at ffffffff8102a975
  [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
  [ffff88001e751e90] retint_kernel at ffffffff813eab86
  [ffff88001e751f20] do_one_initcall at ffffffff81000210
  [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
  [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
    RIP: 00007f761bb58b7a  RSP: 00007fff67a43120  RFLAGS: 00010206
    RAX: 00000000000000af  RBX: ffffffff81002a82  RCX: 0000000000020010
    RDX: 0000000000b96010  RSI: 00000000000163da  RDI: 0000000000b96030
    RBP: 0000000000b96010   R8: 0000000000010011   R9: 0000000000080000
    R10: 00007f761bb4b140  R11: 0000000000000202  R12: 00000000000163da
    R13: 00007fff67a44985  R14: 00000000000163da  R15: 0000000000b96010
    ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b
crash> 
This is the related part of output of `bt -ar' in case you're interested:
ffff88001e751e60:  sub_preempt_count+146 ffff88001e751e78 
ffff88001e751e70:  need_resched+30  ffff88001e751e88 
ffff88001e751e80:  preempt_schedule_irq+82 ffff88001e751f18 
ffff88001e751e90:  retint_kernel+38 ffff88001e751dd8 
ffff88001e751ea0:  0000000000000000 0000000000000004 
ffff88001e751eb0:  0000000000000000 ffff88001e751fd8 
ffff88001e751ec0:  0000000000000000 0000000000000001 
ffff88001e751ed0:  0000000000000000 ffffffffa00f9000 
ffff88001e751ee0:  ffffffffffffff10 ffffffffa00f9004 
ffff88001e751ef0:  0000000000000010 0000000000000246 
ffff88001e751f00:  ffff88001e751f18 0000000000000018 
ffff88001e751f10:  ffff8800ffffffff ffff88001e751f48 
ffff88001e751f20:  do_one_initcall+122 __this_module    
ffff88001e751f30:  0000000000000000 0000000000020000 
ffff88001e751f40:  0000000000b96030 ffff88001e751f78 
ffff88001e751f50:  sys_init_module+196 0000000000b96010 
ffff88001e751f60:  00000000000163da 00007fff67a44985 
ffff88001e751f70:  00000000000163da 0000000000b96010 
ffff88001e751f80:  system_call_fastpath+22 0000000000000202 
ffff88001e751f90:  00007f761bb4b140 0000000000080000 
ffff88001e751fa0:  0000000000010011 00000000000000af 
ffff88001e751fb0:  0000000000020010 0000000000b96010 
ffff88001e751fc0:  00000000000163da 0000000000b96030 
ffff88001e751fd0:  00000000000000af 00007f761bb58b7a 
ffff88001e751fe0:  0000000000000033 0000000000010206 
ffff88001e751ff0:  00007fff67a43120 000000000000002b 
and `sym -m endless':
crash> sym -m endless
ffffffffa00f6000 MODULE START: endless
ffffffffa00f6000 (?) endless_loop  
ffffffffa00f6030 (r) __ksymtab_endless_loop  
ffffffffa00f6040 (r) __kstrtab_endless_loop  
ffffffffa00f6050 (d) __this_module  
ffffffffa00f6345 MODULE END: endless
crash> 
-- 
Thanks,
Hu Tao