----- Original Message -----
Hello Dave,
Thanks for your observations.
> I'll fix unwind_x86_64.h to prevent this build warning:
>
> # make extensions
> ...
> gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o
> libgcore/gcore_x86.o libgcore/gcore_x86.c
> In file included from libgcore/gcore_x86.c:19:
> ../unwind_x86_64.h:61:1: warning: "offsetof" redefined
> In file included from libgcore/gcore_x86.c:17:
> ../defs.h:60:1: warning: this is the location of the previous
> definition
> ...
>
The warning is caused by IO_BITMAP_OFFSET, which is defined but never used
in gcore_x86.c. So it seems to me that the part to fix is
gcore_x86.c, not unwind_x86_64.h.
Maybe, but it should also be fixed in unwind_x86_64.h like this:
--- unwind_x86_64.h 30 Nov 2010 19:40:30 -0000 1.4
+++ unwind_x86_64.h 24 Jan 2011 20:54:25 -0000 1.5
@@ -58,7 +58,9 @@
extern void init_unwind_table(void);
extern void free_unwind_table(void);
+#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+#endif
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
Your module is the first C source file that #includes defs.h and then
unwind_x86_64.h. The change above just makes unwind_x86_64.h do the same
thing that defs.h already does.
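For illustration, here is a minimal, self-contained sketch of the guard from the patch above. <stddef.h> stands in for defs.h as the header that defines offsetof first; struct frame_record and return_addr_offset() are hypothetical names used only for this example.

```c
#include <stddef.h>  /* stands in for defs.h: defines offsetof first */

/* Guard from the patched unwind_x86_64.h: skip the fallback definition
 * when an earlier header has already provided offsetof, so gcc has no
 * redefinition to warn about. */
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

/* Hypothetical struct, only to exercise the macro. */
struct frame_record {
    unsigned long saved_fp;
    unsigned long return_addr;
};

size_t return_addr_offset(void)
{
    return offsetof(struct frame_record, return_addr);
}
```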
> But the gcore.mk file should gracefully fail to build on non-supported
> architectures. It ends up spewing ~200 lines of error messages when
> attempted, for example, on a ppc64 machine:
Yes, I know it behaves like this when built on unsupported
architectures. Judging from the similar build errors that sial produces,
I had understood this to be implicitly acceptable. But if that's wrong,
I'll make it buildable on unsupported architectures.
Or you could just catch it in gcore.mk by doing something like this:
ARCH=UNSUPPORTED
ifeq ($(shell arch), x86_64)
ARCH=SUPPORTED
endif
ifeq ($(shell arch), i686)
ARCH=SUPPORTED
endif

all: gcore.so

gcore.so: gcore.c
	@if [ ${ARCH} = "UNSUPPORTED" ]; then \
	echo "gcore: architecture not supported"; else \
	echo "do build here..."; fi;

Most of gcore is code that can be shared among architectures: essentially
everything except the collection of note information, which is inherently
architecture specific.
I'll fix this so that, on unsupported architectures, gcore produces an
ELF core file containing only PT_LOAD segments.
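A rough sketch of that fallback, using the 64-bit ELF types from <elf.h>: a PT_NOTE entry is emitted only when note collection is available, followed by one PT_LOAD entry per memory region. The GCORE_HAVE_NOTES macro, struct region, build_phdrs() and the 4096-byte alignment are all assumptions for illustration, not gcore's actual code; the X86/X86_64 test mirrors the -DX86_64 flag in the quoted build line.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: note collection is implemented only for these targets. */
#if defined(X86) || defined(X86_64)
#define GCORE_HAVE_NOTES 1
#else
#define GCORE_HAVE_NOTES 0
#endif

/* Hypothetical description of one user-space memory region. */
struct region {
    uint64_t vaddr;
    uint64_t size;
    uint32_t flags;   /* PF_R | PF_W | PF_X */
};

/* Fill phdrs[]: one PT_NOTE first if with_notes, then one PT_LOAD per
 * region. File offsets are left at 0; a real writer assigns them while
 * laying out the file. Returns the entry count, or -1 if phdrs[] is
 * too small. */
int build_phdrs(Elf64_Phdr *phdrs, size_t max, const struct region *r,
                size_t nregions, int with_notes)
{
    size_t n = 0;

    if ((size_t)(with_notes ? 1 : 0) + nregions > max)
        return -1;
    if (with_notes)
        phdrs[n++] = (Elf64_Phdr){ .p_type = PT_NOTE };
    for (size_t i = 0; i < nregions; i++) {
        phdrs[n++] = (Elf64_Phdr){
            .p_type   = PT_LOAD,
            .p_flags  = r[i].flags,
            .p_vaddr  = r[i].vaddr,
            .p_filesz = r[i].size,
            .p_memsz  = r[i].size,
            .p_align  = 4096,   /* assumed page size */
        };
    }
    return (int)n;
}
```

On an unsupported architecture the caller would pass GCORE_HAVE_NOTES (0) as with_notes and get PT_LOAD entries only.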
>
> Your documentation implies that the command would only work on
> certain kernel versions:
>
>> Compared with the previous version, this release:
>> - supports more kernel versions, and
>> - collects register values more accurately (but still not perfect).
>>
>> Support Range
>> =============
>>
>> |----------------+----------------------------------------------|
>> | ARCH | X86, X86_64 |
>> |----------------+----------------------------------------------|
>> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
>> |----------------+----------------------------------------------|
>
>
> But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
> it seems to work OK on some tasks, but on others it doesn't work so well.
> Here, the "less" command can be dumped OK:
>
>
> crash> sys | grep RELEASE
> RELEASE: 2.6.34-2.fc14.x86_64
> crash> ps
> ... [ cut ] ...
> > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> crash> gcore -v0 2090
> Saved core.2090.less
> crash>
>
> But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
> dumping the crash utility itself, and just hangs:
>
> crash> swap
> FILENAME TYPE SIZE USED PCT PRIORITY
> /dev/dm-1 PARTITION 18579452k 0k 0% -1
> crash> ps
> ... [ cut ] ...
> > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> crash> gcore -v1 2080
> gcore: Restoring the thread group ...
> gcore: done.
> gcore: Retrieving note information ...
>
> < hangs forever >
>
> ...
>
> I would have thought that it would either work-for-all or work-for-none
> with respect to a particular kernel version?
Sorry, I don't understand what you mean by ``work-for-all or work-for-none''.
``Supported kernel versions'' means ``I tested the gcore
extension module on these kernels''. gcore may well work on other
kernel versions too, as long as they have no incompatibilities that
affect it.
But the "less" and "crash" command examples were from the same
dumpfile, so why would gcore work for one command but not for the
other, given that both come from the same kernel version?
>
> In any case, if it's going to fail, perhaps there should be some mechanism
> in place that would prevent it from hanging, and instead print a message
> that the kernel version is not supported? Or if a particular data structure
> is different than the "supported" versions, it should fail immediately?
> Just a thought...
I agree with the former idea. I believe gcore has a good chance of
working well even on unsupported kernels.
The hang is most likely in restore_frame_pointer(), which runs only
when the analyzed kernel was built with CONFIG_FRAME_POINTER=y. It
recovers the user-space frame pointer by following the chain of saved
base pointers one frame at a time.
If the kernel stack frames are corrupted, the unwinding done by this
function can behave in unexpected ways, including never terminating.
I'll fix this by limiting the number of frames it traces to a finite
bound. A bound derived from the kernel stack size should be enough,
since a valid chain cannot contain more frames than fit in the stack.
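A minimal sketch of such a bound. It assumes the kernel stack has been copied into a buffer of 64-bit words, that each frame record starts with the saved frame pointer followed by the return address (the usual x86_64 layout with frame pointers), and that a saved frame pointer of 0 ends the chain. walk_frame_chain() is a hypothetical name, not gcore's restore_frame_pointer(); the limit is simply the number of two-word frame records that fit in the stack, so a corrupted or cyclic chain ends with an error instead of a hang.

```c
#include <stddef.h>
#include <stdint.h>

/* Each frame record: [saved frame pointer][return address]. */
#define FRAME_WORDS 2

/* stack: copy of the stack; stack_base: virtual address of stack[0];
 * nwords: words in the copy; fp: starting frame pointer.
 * Returns the number of frames visited, or -1 on a bad chain. */
int walk_frame_chain(const uint64_t *stack, uint64_t stack_base,
                     size_t nwords, uint64_t fp)
{
    /* No valid chain can hold more records than fit in the stack. */
    size_t max_frames = nwords / FRAME_WORDS;
    size_t frames = 0;

    while (fp != 0) {
        uint64_t off;

        /* The record must be word-aligned and lie inside the copy. */
        if (fp < stack_base || (fp - stack_base) % sizeof(uint64_t) != 0)
            return -1;
        off = (fp - stack_base) / sizeof(uint64_t);
        if (off + FRAME_WORDS > nwords)
            return -1;

        /* The bound: give up instead of looping on a corrupted chain. */
        if (frames >= max_frames)
            return -1;

        frames++;
        fp = stack[off];   /* follow the saved frame pointer */
    }
    return (int)frames;
}
```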
>
> Also I note that "gcore -v7" fails -- shouldn't it be accepted as an
> argument?
>
> crash> gcore -v7 2080
> gcore: invalid vlevel: 7.
> crash>
Oh, sorry. That's simply a bug that my unit testing should have caught.
Thanks.
I'll post a fixed version soon. Please wait a little while.
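For reference, a hypothetical sketch of an inclusive range check for the -v argument. It assumes the valid levels are 0 through 7, as the report above implies; VLEVEL_MAX and parse_vlevel() are illustrative names, and the off-by-one comparison mentioned in the comment is only a guess at the kind of bug involved, not its confirmed cause.

```c
#include <errno.h>
#include <stdlib.h>

#define VLEVEL_MAX 7   /* hypothetical: highest accepted -v level */

/* Returns 0 and stores the level in *out on success, -1 on bad input. */
int parse_vlevel(const char *arg, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;              /* not a plain decimal number */

    /* Inclusive upper bound; writing "v >= VLEVEL_MAX" here would
     * wrongly reject the top level. */
    if (v < 0 || v > VLEVEL_MAX)
        return -1;

    *out = (int)v;
    return 0;
}
```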
OK thanks,
Dave