Hi Daisuke,
On Tue, Oct 23, 2012 at 4:49 PM, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> From: Lei Wen <adrian.wenl(a)gmail.com>
> Subject: Re: GCORE: add directly show backtrace function in crash
> Date: Tue, 23 Oct 2012 15:06:30 +0800
>
> > Hi Daisuke,
> >
> > On Tue, Oct 23, 2012 at 2:17 PM, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> >
> >> From: Lei Wen <adrian.wenl(a)gmail.com>
> >> Subject: Re: GCORE: add directly show backtrace function in crash
> >> Date: Mon, 22 Oct 2012 15:36:47 +0800
> >>
> >> > Hi Daisuke,
> >> >
> >> > On Mon, Oct 22, 2012 at 3:29 PM, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> >> >
> >> >> From: Lei Wen <adrian.wenl(a)gmail.com>
> >> >> Subject: GCORE: add directly show backtrace function in crash
> >> >> Date: Mon, 22 Oct 2012 12:21:49 +0800
> >> >>
> >> >> > Hi Daisuke,
> >> >> >
> >> >> > I created a new option "-tT" for gcore, so that it can display the
> >> >> > backtrace (bt) for user space directly inside crash itself, without
> >> >> > needing to dump a separate core file image and analyze it in a
> >> >> > different gdb environment.
> >> >> >
> >> >> > The attached patch is based directly on the patch below, and has been
> >> >> > verified on the ARM platform.
> >> >> >
> >> >> > http://osdir.com/ml/general/2012-10/msg32677.html
> >> >> >
> >> >> > Before using the corresponding gcore command, we need to set up the
> >> >> > environment in crash with:
> >> >> >
> >> >> > crash>> gdb set solib-search-path [system lib path]
> >> >> >
> >> >> > crash>> gdb file [user space program symbol file]
> >> >> >
> >> >> > crash>> gcore -t [user space thread id]
> >> >> >
> >> >> > Thanks,
> >> >> > Lei
> >> >>
> >> >> Hello Lei,
> >> >>
> >> >> That must be a useful feature, but I think it's quite orthogonal to the
> >> >> gcore command...
> >> >>
> >> >> Why not release it as your own extension module, separate from gcore?
> >> >>
> >> >
> >> > I put this function in gcore because gcore already provides the function
> >> > to get the general registers for a user-space thread. If I added another
> >> > module, that part of the functionality would be somewhat duplicated...
> >> >
> >>
> >> I now understand why you want to add it in gcore.
> >>
> >> > Also, giving gcore the capability to either dump into a core file
> >> > or display the backtrace directly gives users more choice. :)
> >> >
> >>
> >> But users then need to do more work, such as loading the application's
> >> symbol files. There doesn't seem to be much of a difference.
> >>
> >> gdb has only one symbol space. If you load an application's symbols, they
> >> can override kernel symbols. Then gcore might behave abnormally. Can
> >> users undo the loading of the application's symbols with any feature of
> >> gdb?
> >>
> >
> > Good question!
> > However, I am not a gdb expert... I hope someone here can offer a solution...
>
> If there's no such feature, users are forced to restart the crash
> utility. The symbol space is dirty, and the later behaviour of crash and
> gcore can no longer be trusted. Not all users have a sufficiently powerful
> machine. Restarting should be considered costly for such users, in
> particular with large dump files.
>
> I can easily create a situation where gcore doesn't work well. I
> wrote the program below, in which a task_struct structure is defined.
>
> #include <stdio.h>
>
> struct task_struct
> {
>     int x;
> };
>
> struct task_struct ts;
>
> int main(void)
> {
>     printf("%p\n", &ts);
>
>     return 0;
> }
>
> After loading this binary, built with -g, into crash, I loaded the gcore
> module. I then saw the following gcore failure.
>
> crash> gcore 1904
>
> gcore: invalid structure member offset: mm_struct_map_count
> FILE: libgcore/gcore_coredump.c LINE: 75 FUNCTION: gcore_coredump()
>
> [/pub/repos/crash-6.1.0/crash] error trace: 7fe1cffe7149 => 7fe1cffe740a => 7fe1cffdd3a7 => 54bc43
>
> 54bc43: OFFSET_verify+164
>
> gcore: invalid structure member offset: mm_struct_map_count
> FILE: libgcore/gcore_coredump.c LINE: 75 FUNCTION: gcore_coredump()
>
> Failed.
>
You are right, with this test case, I saw the same issue...
> Also, a silly question: since the kernel runs fine with user symbols, why
> can't gdb live with the chaos?
The premise would be wrong. crash can work incorrectly if user symbols are
loaded. crash and gcore memoize the symbols they frequently refer to
in memory for performance. This is done in the early start-up phase, before
reaching crash's prompt. Such memoized symbols are not affected. But
there are symbols that are not memoized in crash and gcore. They are, of
course, affected.
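
To illustrate the distinction between memoized and non-memoized lookups, here is a minimal toy model in C. It is only a sketch under stated assumptions: the table layout, the function names (load_type, lookup_offset, startup_memoize, memoized_pid_offset) and any offsets passed in are hypothetical, and this is not how gdb, crash or gcore actually store symbols. It only shows the effect described above: a single symbol space in which a later definition shadows an earlier one, next to a value cached before the shadowing happened.

```c
#include <assert.h>
#include <string.h>

#define MAX_MEMBERS 4
#define MAX_TYPES   8

/* Hypothetical representation of one structure member. */
struct member {
    const char *name;
    long offset;
};

/* Hypothetical representation of one structure definition. */
struct type_def {
    const char *name;
    struct member members[MAX_MEMBERS];
    int nmembers;
};

/* A single symbol space shared by everything that gets loaded. */
static struct type_def symtab[MAX_TYPES];
static int ntypes;

/* Loading only appends; nothing separates kernel from user definitions. */
void load_type(const struct type_def *def)
{
    assert(ntypes < MAX_TYPES);
    symtab[ntypes++] = *def;
}

/*
 * Look up a member offset. The table is scanned newest-first, so a
 * definition loaded later shadows an earlier one with the same name.
 * Returns -1 if the type or the member is not found.
 */
long lookup_offset(const char *type, const char *member)
{
    for (int i = ntypes - 1; i >= 0; i--) {
        if (strcmp(symtab[i].name, type) != 0)
            continue;
        for (int j = 0; j < symtab[i].nmembers; j++) {
            if (strcmp(symtab[i].members[j].name, member) == 0)
                return symtab[i].members[j].offset;
        }
        return -1; /* newest definition lacks the member */
    }
    return -1; /* type not known at all */
}

/* Offset memoized once at start-up, before any user symbols are loaded. */
static long cached_pid_offset = -1;

void startup_memoize(void)
{
    cached_pid_offset = lookup_offset("task_struct", "pid");
}

/* Later users read the cached value and never consult the table again. */
long memoized_pid_offset(void)
{
    return cached_pid_offset;
}
```

In this model, if a kernel-like task_struct is loaded and startup_memoize() runs, and a user program's task_struct containing only x is loaded afterwards, memoized_pid_offset() still returns the value captured at start-up, while a fresh lookup_offset("task_struct", "mm") returns -1, which corresponds to the "invalid structure member offset" class of failure.
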
Now I fully understand your concern: a symbol with the same name would
destroy the kernel's original cached one... Is there any way to let crash
use only the symbols from the kernel, while gcore uses those from user
space when it tries to do the backtrace?
Thanks,
Lei