Re: [Crash-utility] crash does not get proper backtrace?

Tuesday, 7 September 2010

----- "KAMEZAWA Hiroyuki" <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

...
 On Thu, 2 Sep 2010 08:44:12 -0400 (EDT)
 Dave Anderson <anderson(a)redhat.com> wrote:

 > 
 > ----- hutao(a)cn.fujitsu.com wrote:
 > 
 > > Hi,
 > > 
 > > I hit a problem where crash seems to produce a bad backtrace.
 > > It occurred under the following conditions: on a qemu guest, I
 > > loaded a module that got stuck in its init function (it calls a
 > > function that loops forever), then dumped the guest with
 > > `virsh dump vm dumpfile', and ran crash on the dumpfile.
 > > 
 > > The module is:
 > > 
 > > ---
 > > #include <linux/module.h>
 > > 
 > > int endless_loop(void)
 > > {
 > > 	printk("endless loop\n");
 > > 	while (1);
 > > 
 > > 	return 0;
 > > }
 > > 
 > > int __init endless_init(void)
 > > {
 > > 	endless_loop();
 > > 
 > > 	return 0;
 > > }
 > > module_init(endless_init);
 > > 
 > > MODULE_LICENSE("GPL");
 > > ---
 > > 
 > > The crash bt command gave:
 > > 
 > > crash> bt -a
 > > PID: 0      TASK: ffffffff81648020  CPU: 0   COMMAND: "swapper"
 > >  #0 [ffffffff81601e08] schedule at ffffffff813e8a49
 > >  #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
 > >  #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
 > >  #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
 > >  #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
 > > 
 > > PID: 1088   TASK: ffff88001dda2d60  CPU: 1   COMMAND: "insmod"
 > >  #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
 > >  #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
 > >  #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
 > >  #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
 > >  #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
 > >  #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
 > >  #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
 > >     RIP: 00007f761bb58b7a  RSP: 00007fff67a43120  RFLAGS: 00010206
 > >     RAX: 00000000000000af  RBX: ffffffff81002a82  RCX: 0000000000020010
 > >     RDX: 0000000000b96010  RSI: 00000000000163da  RDI: 0000000000b96030
 > >     RBP: 0000000000b96010   R8: 0000000000010011   R9: 0000000000080000
 > >     R10: 00007f761bb4b140  R11: 0000000000000202  R12: 00000000000163da
 > >     R13: 00007fff67a44985  R14: 00000000000163da  R15: 0000000000b96010
 > >     ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b
 > > 
 > > Does it lose some function calls between do_one_initcall and retint_kernel?
 > > (endless_loop <- endless_init)
 > 
 > Your best bet is to use "bt -t" in a case such as that.
 > 
 > If there are no "starting hooks" for the backtrace code to use, then
 > it simply defaults to the RSP value left in the task->thread_struct->rsp,
 > and the RIP of the instruction following "__switch_to". 
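
The fallback described above can be sketched in C roughly as follows. This is only an illustration, not crash's actual code: struct bt_start, struct task_info, has_hook and choose_bt_start() are hypothetical names invented for the example. The point it shows is that, without a starting hook, the unwinder begins from values saved at the task's last context switch, which for a task that is still running on a CPU may be stale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration; these are not crash's real structures. */
struct bt_start {
	uint64_t rip;
	uint64_t rsp;
};

struct task_info {
	bool has_hook;          /* true if a "starting hook" was found for this task */
	struct bt_start hook;   /* only meaningful when has_hook is true */
	uint64_t saved_rsp;     /* RSP left in the task's thread_struct */
};

/*
 * Pick the point where the backtrace starts: the hook if there is one,
 * otherwise the saved RSP paired with the RIP of the instruction
 * following "__switch_to".
 */
static struct bt_start choose_bt_start(const struct task_info *t,
				       uint64_t rip_after_switch_to)
{
	if (t->has_hook)
		return t->hook;

	struct bt_start s = { .rip = rip_after_switch_to, .rsp = t->saved_rsp };
	return s;
}
```

With has_hook false, the result is always the pair (rip_after_switch_to, saved_rsp), whatever the task was really executing when the dump was taken.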

 Then virsh dump doesn't save the RIP/RSP of the vcpu in a format that crash can
 understand, right? (Tao-san, did you check whether the stack in the memory image
 contains what you expected?)

 If so, what we (Fujitsu) have to do is...
   1. Confirm whether "virsh dump" saves the registers.
   2. If they are saved, investigate the format.
   3. Add crash support (or write a format-conversion program).

 I'm sorry if I don't understand correctly.

"virsh dump" does save registers, but getting them reliably from the dumpfile
is a problem, and even if you do get them, the values do not necessarily work
for the crash utility's purposes.

You can put some debug code in the cpu_load() function in the crash
utility's qemu-load.c file.  But unfortunately, the functionality in that
area of the code keeps changing, and I simply take the latest code from the
QEMU developers and plug it in.

Anyway, when I have tried doing so in the past, I've found register contents
that were either invalid or contained eip/esp values that could not be used
as starting points for backtracing the particular task.
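
One plausible sanity check for such register values can be sketched as below. It assumes that "usable" means the instruction pointer lies in kernel text and the stack pointer lies inside the task's kernel stack; that definition, the struct addr_range type and the usable_start() helper are assumptions made for this example, not something taken from crash or QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end); hypothetical helper type. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

static bool in_range(const struct addr_range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}

/*
 * Assumed criterion: a (rip, rsp) pair can only start a kernel backtrace
 * if rip points into kernel text and rsp points into the task's stack.
 */
static bool usable_start(uint64_t rip, uint64_t rsp,
			 const struct addr_range *kernel_text,
			 const struct addr_range *task_stack)
{
	return in_range(kernel_text, rip) && in_range(task_stack, rsp);
}
```

For example, the user-space RIP/RSP pair shown at the bottom of the insmod backtrace (00007f761bb58b7a / 00007fff67a43120) would be rejected by this check for any kernel text range and kernel stack range.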

Dave
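
As a rough illustration of why "bt -t" (suggested earlier in the thread) can help when the normal backtrace has no good starting point: instead of unwinding frame by frame, it reports the text symbols found on the stack. The sketch below mimics that idea with a hypothetical helper, scan_text_refs(), and a single assumed text range; crash itself resolves addresses against its symbol tables, including module symbols, so this is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of crash: copy into out[] every stack
 * word that falls inside [text_start, text_end).  Returns the number of
 * candidates found; at most max_out of them are stored.
 */
static size_t scan_text_refs(const uint64_t *stack, size_t nwords,
			     uint64_t text_start, uint64_t text_end,
			     uint64_t *out, size_t max_out)
{
	size_t found = 0;

	for (size_t i = 0; i < nwords; i++) {
		if (stack[i] >= text_start && stack[i] < text_end) {
			if (found < max_out)
				out[found] = stack[i];
			found++;
		}
	}
	return found;
}
```

Every hit is only a candidate return address: stale values left over from earlier calls are reported too, so the output has to be read with the source in hand.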
