Hello Dave,
Thanks for your observations.
From: Dave Anderson <anderson(a)redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Mon, 24 Jan 2011 14:27:39 -0500 (EST)
----- Original Message -----
> The gcore extension module provides a means to create an ELF core dump
> for a user-mode process contained within a crash kernel dump. I designed
> it to behave like the kernel's ELF core dumper.
>
> For previous discussion, see:
>
https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html
A few observations...
I'll fix unwind_x86_64.h to prevent this build warning:
# make extensions
...
gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o libgcore/gcore_x86.o
libgcore/gcore_x86.c
In file included from libgcore/gcore_x86.c:19:
../unwind_x86_64.h:61:1: warning: "offsetof" redefined
In file included from libgcore/gcore_x86.c:17:
../defs.h:60:1: warning: this is the location of the previous definition
...
The warning is caused by IO_BITMAP_OFFSET, which is defined but unused
in gcore_x86.c. So it seems to me that the part to be fixed is
gcore_x86.c, not unwind_x86_64.h.
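Whichever file ends up being changed, the warning itself comes from `offsetof` being defined twice with different bodies. As a generic illustration (not the actual contents of unwind_x86_64.h or defs.h), a header can guard its fallback definition so that it never redefines the one from `<stddef.h>`. `struct example_frame` and `example_ip_offset()` are hypothetical names used only to exercise the macro.

```c
#include <stddef.h> /* standard offsetof */

/*
 * Guarded fallback: define offsetof only if no earlier header
 * (e.g. <stddef.h> or defs.h) already did.  Redefining it with a
 * different body is what makes gcc warn "offsetof redefined".
 */
#ifndef offsetof
#define offsetof(type, member) ((size_t) &((type *)0)->member)
#endif

/* Hypothetical structure, used only to exercise the macro. */
struct example_frame {
	unsigned long bp;	/* saved base pointer */
	unsigned long ip;	/* return address */
};

size_t example_ip_offset(void)
{
	return offsetof(struct example_frame, ip);
}
```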
But the gcore.mk file should gracefully fail to build on non-supported
architectures. It ends up spewing ~200 lines of error messages when
attempted, for example, on a ppc64 machine:
Yes, I know it behaves like this when we try to build it on unsupported
architectures. I had understood this to be implicitly acceptable, having
seen a similar build error with sial. But if that's wrong, I'll make
it buildable on unsupported architectures.
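A minimal sketch of one way to keep the build working on other architectures, assuming the architecture-specific note collection can be isolated behind a single hook. The x86 version is selected at compile time through the same X86/X86_64 macros that the -DX86_64 flag in the build line above defines, and every other architecture gets a stub that collects no notes. `gcore_arch_collect_notes()`, `gcore_collect_notes()` and `GCORE_ARCH_SUPPORTED` are hypothetical names, not existing gcore functions.

```c
#include <stdio.h>

/*
 * Hypothetical per-architecture hook: returns the number of
 * architecture-specific notes (register sets etc.) collected.
 */
#if defined(X86) || defined(X86_64)
static int gcore_arch_collect_notes(int pid)
{
	/* Real x86 code would gather prstatus, fpregs, etc. here. */
	(void)pid;
	return 1;
}
#define GCORE_ARCH_SUPPORTED 1
#else
static int gcore_arch_collect_notes(int pid)
{
	/* Unsupported architecture: still builds, but emits no notes. */
	(void)pid;
	return 0;
}
#define GCORE_ARCH_SUPPORTED 0
#endif

/* Architecture-independent entry point used by the common code. */
int gcore_collect_notes(int pid)
{
	if (!GCORE_ARCH_SUPPORTED)
		fprintf(stderr, "gcore: note information is not supported "
			"on this architecture\n");
	return gcore_arch_collect_notes(pid);
}
```

The design choice is that unsupported architectures fail softly at run time with a message instead of failing at build time.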
gcore includes parts that can be shared among different architectures:
essentially everything except the code that collects the kinds of note
information that are inherently architecture-specific. I'll fix this so
that, on unsupported architectures, gcore produces an ELF core containing
only PT_LOAD segments.
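For illustration, a sketch of what a PT_LOAD-only core could look like at the header level, using the standard `<elf.h>` types: an ET_CORE ELF header followed by one PT_LOAD program header per memory area, with no PT_NOTE. It assumes a 64-bit little-endian target, a 4096-byte page size and page-aligned areas. `struct vma_desc` and `build_load_only_core()` are hypothetical, and a real dumper would also set `e_machine` and write the segment contents.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_PAGE_SIZE ((Elf64_Off)4096)	/* assumed page size */

/* Hypothetical description of one page-aligned user memory area. */
struct vma_desc {
	Elf64_Addr start;
	Elf64_Xword size;
	Elf64_Word flags;	/* PF_R | PF_W | PF_X */
};

/*
 * Fill an ET_CORE header and one PT_LOAD header per area; no PT_NOTE.
 * Segment data starts at the first page boundary after the program
 * header table and follows in area order.
 */
void build_load_only_core(Elf64_Ehdr *eh, Elf64_Phdr *ph,
			  const struct vma_desc *vmas, size_t nvmas)
{
	Elf64_Off offset;
	size_t i;

	memset(eh, 0, sizeof(*eh));
	memcpy(eh->e_ident, ELFMAG, SELFMAG);
	eh->e_ident[EI_CLASS] = ELFCLASS64;
	eh->e_ident[EI_DATA] = ELFDATA2LSB;	/* assumes little-endian */
	eh->e_ident[EI_VERSION] = EV_CURRENT;
	eh->e_type = ET_CORE;
	eh->e_machine = EM_NONE;	/* placeholder for the real arch */
	eh->e_version = EV_CURRENT;
	eh->e_phoff = sizeof(Elf64_Ehdr);
	eh->e_ehsize = sizeof(Elf64_Ehdr);
	eh->e_phentsize = sizeof(Elf64_Phdr);
	eh->e_phnum = (Elf64_Half)nvmas;

	/* Page-align the data so p_offset and p_vaddr agree mod p_align. */
	offset = eh->e_phoff + nvmas * sizeof(Elf64_Phdr);
	offset = (offset + EXAMPLE_PAGE_SIZE - 1) & ~(EXAMPLE_PAGE_SIZE - 1);

	for (i = 0; i < nvmas; i++) {
		memset(&ph[i], 0, sizeof(ph[i]));
		ph[i].p_type = PT_LOAD;
		ph[i].p_flags = vmas[i].flags;
		ph[i].p_offset = offset;
		ph[i].p_vaddr = vmas[i].start;
		ph[i].p_filesz = vmas[i].size;
		ph[i].p_memsz = vmas[i].size;
		ph[i].p_align = EXAMPLE_PAGE_SIZE;
		offset += vmas[i].size;
	}
}
```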
Your documentation implies that the command would only work on
certain kernel versions:
> Compared with the previous version, this release:
> - supports more kernel versions, and
> - collects register values more accurately (but still not perfect).
>
> Support Range
> =============
>
> |----------------+----------------------------------------------|
> | ARCH | X86, X86_64 |
> |----------------+----------------------------------------------|
> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
> |----------------+----------------------------------------------|
But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
it seems to work OK on some tasks, but on others it doesn't work so well.
Here, the "less" command can be dumped OK:
crash> sys | grep RELEASE
RELEASE: 2.6.34-2.fc14.x86_64
crash> ps
... [ cut ] ...
> 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
crash> gcore -v0 2090
Saved core.2090.less
crash>
But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
dumping the crash utility itself, and just hangs:
crash> swap
FILENAME TYPE SIZE USED PCT PRIORITY
/dev/dm-1 PARTITION 18579452k 0k 0% -1
crash> ps
... [ cut ] ...
> 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
crash> gcore -v1 2080
gcore: Restoring the thread group ...
gcore: done.
gcore: Retrieving note information ...
< hangs forever >
...
I would have thought that it would either work-for-all or work-for-none
with respect to a particular kernel version?
Sorry, I'm not sure what you mean by ``work-for-all or
work-for-none''.
``Supported kernel versions'' means ``kernel versions on which I have
tested the gcore extension module''. gcore may well work on other kernel
versions too, as long as there is no relevant incompatibility between
those versions and the tested ones.
In any case, if it's going to fail, perhaps there should be some mechanism
in place that would prevent it from hanging, and instead print a message
that the kernel version is not supported? Or if a particular data structure
is different than the "supported" versions, it should fail immediately?
Just a thought...
I agree with the former idea. I believe gcore has a good chance of
working well on unsupported kernels.
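A minimal sketch of the ``print a message instead of failing'' approach, assuming the release string is available (as shown by `sys | grep RELEASE` above): compare it against the tested versions and warn, but continue. The prefix list and the function names are hypothetical and only illustrate the idea; they are not a support matrix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical release-string prefixes of tested kernels. */
static const char *const tested_prefixes[] = {
	"2.6.9-",	/* RHEL4 family */
	"2.6.18-",	/* RHEL5 family */
	"2.6.32-",	/* RHEL6 family */
	"2.6.36",	/* vanilla */
};

/* Returns 1 if the release starts with a tested prefix, 0 otherwise. */
int kernel_release_tested(const char *release)
{
	size_t i;

	for (i = 0; i < sizeof(tested_prefixes) / sizeof(tested_prefixes[0]); i++)
		if (strncmp(release, tested_prefixes[i],
			    strlen(tested_prefixes[i])) == 0)
			return 1;
	return 0;
}

/* Warn, but do not refuse: gcore may still work on untested kernels. */
void gcore_check_release(const char *release)
{
	if (!kernel_release_tested(release))
		fprintf(stderr, "gcore: warning: kernel %s has not been "
			"tested; the result may be incomplete\n", release);
}
```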
The part that hangs is likely restore_frame_pointer(), which runs
only when the analyzed kernel is built with CONFIG_FRAME_POINTER=y, and
which tries to recover the user-space frame pointer by following the
chain of saved base pointers in order.
If the kernel stack frames are corrupted, the unwinding performed by
this function can behave in unexpected ways.
I'll fix this by limiting the number of frames traced to a finite
bound. The kernel stack size should be enough to determine that bound.
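A sketch of a bounded frame-pointer walk, assuming each frame begins with the saved base pointer followed by the return address, so a stack of THREAD_SIZE bytes can hold at most THREAD_SIZE / (2 * sizeof(long)) frames. Besides that cap, it stops when the next pointer leaves the stack copy or fails to move toward the stack top, which also catches self-referencing frames. `walk_frames_bounded()` and the 8 KiB `EXAMPLE_THREAD_SIZE` are assumptions, not restore_frame_pointer() itself.

```c
#include <stddef.h>

#define EXAMPLE_THREAD_SIZE 8192UL	/* assumed kernel stack size */

/*
 * stack:      copy of the kernel stack, indexed in words
 * stack_base: address corresponding to stack[0]
 * bp:         starting frame pointer
 * Returns the outermost frame address found inside the stack (0 if
 * bp is already out of range) and stores the frame count in *nframes.
 */
unsigned long walk_frames_bounded(const unsigned long *stack,
				  unsigned long stack_base, unsigned long bp,
				  size_t *nframes)
{
	const size_t w = sizeof(unsigned long);
	const size_t max_frames = EXAMPLE_THREAD_SIZE / (2 * w);
	unsigned long last = 0;
	size_t n = 0;

	/* Overflow-safe range check: the whole frame must fit in the stack. */
	while (n < max_frames && bp >= stack_base &&
	       bp - stack_base <= EXAMPLE_THREAD_SIZE - 2 * w &&
	       (bp - stack_base) % w == 0) {
		unsigned long next = stack[(bp - stack_base) / w];

		last = bp;
		n++;
		if (next <= bp)	/* chain must move toward the stack top */
			break;
		bp = next;
	}
	if (nframes)
		*nframes = n;
	return last;
}
```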
Also I note that "gcore -v7" fails -- shouldn't it be accepted as an
argument?
crash> gcore -v7 2080
gcore: invalid vlevel: 7.
crash>
Oh, sorry. This is simply a bug that my unit testing should have caught. Thanks.
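One plausible form of such a bug is an off-by-one in the range check. The sketch below assumes levels 0 through 7 are all meant to be valid (an assumption based on the question above) and parses the argument with strtol(), rejecting anything outside that inclusive range. `parse_vlevel()` and `EXAMPLE_VLEVEL_MAX` are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

#define EXAMPLE_VLEVEL_MAX 7	/* assumed highest valid level */

/*
 * Parse a -v argument into *vlevel.  Returns 0 on success, -1 if the
 * string is not a plain integer in [0, EXAMPLE_VLEVEL_MAX].  The upper
 * bound is inclusive: ">= EXAMPLE_VLEVEL_MAX" would wrongly reject 7.
 */
int parse_vlevel(const char *arg, int *vlevel)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;	/* not a number, or trailing garbage */
	if (val < 0 || val > EXAMPLE_VLEVEL_MAX)
		return -1;	/* out of range */
	*vlevel = (int)val;
	return 0;
}
```

With this check, "7" parses successfully while "8" and "3x" are rejected.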
I'll post a fixed version soon. Please wait a little while.
Thanks.
HATAYAMA Daisuke