Hello Iguchi-san,
Thanks for your comments.
From: "S.Iguchi" <iguchi.sg(a)ncos.nec.co.jp>
Subject: Re: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
Date: Tue, 03 Aug 2010 13:10:09 +0900 (JST)
Hi, Hatayama-san
I have an extension that serves mostly the same purpose as your patch.
But your patch is great, because it supports the latest kernel and
also dump filter masking.
My current extension file is attached.
Yes, my code is quite buggy and ugly, and it handles the latest kernel
less well than yours.
(Sigh... I did not know about fill_vma_cache(), so I run "vm -p" every
time before dumping.)
BTW, I have some comments.
I'd like to add the features below to yours,
or if you do them yourself, I would be happy. :)
- support i386
- support elf32 binary on x86-64
- support old kernel (before 2.6.17)
As Dave said, if your patch is committed as an extension,
I could submit some patches for it.
How about this?
As I wrote in my first post, I plan to support RHEL4, RHEL5 and
RHEL6 on i386, x86_64 and IA64, as well as the latest upstream
kernel. The following table shows the corresponding upstream kernel
versions.
RHEL4    RHEL5    RHEL6    upstream
-----------------------------------
2.6.9    2.6.18   2.6.32   2.6.35
So, that would probably cover your first and third requests.
On the other hand, I have not planned to support ia32 emulation
(your second request) on either x86_64 or ia64.
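
To make the version coverage concrete, here is a minimal sketch in C
that encodes the table above and checks whether the plan reaches
kernels older than 2.6.17. The KVER() macro uses the same packing as
KERNEL_VERSION() in <linux/version.h>; the struct, the table and the
function names are hypothetical, chosen only for illustration, and are
not taken from gcore.patch.

```c
#include <stddef.h>

/* Same packing as KERNEL_VERSION() in <linux/version.h>. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical type for illustration; not part of gcore.patch. */
struct target_kernel {
    const char *name;
    unsigned int version;
};

/* The planned targets, taken from the table above. */
static const struct target_kernel targets[] = {
    { "RHEL4",    KVER(2, 6, 9)  },
    { "RHEL5",    KVER(2, 6, 18) },
    { "RHEL6",    KVER(2, 6, 32) },
    { "upstream", KVER(2, 6, 35) },
};

#define NR_TARGETS (sizeof(targets) / sizeof(targets[0]))

/* Return the oldest kernel version among the planned targets. */
static unsigned int oldest_planned(void)
{
    unsigned int oldest = targets[0].version;
    size_t i;

    for (i = 1; i < NR_TARGETS; i++)
        if (targets[i].version < oldest)
            oldest = targets[i].version;
    return oldest;
}

/* Return nonzero if the plan includes a kernel older than 'version'. */
static int plan_reaches_before(unsigned int version)
{
    return oldest_planned() < version;
}
```

With this table, plan_reaches_before(KVER(2, 6, 17)) is nonzero,
because RHEL4's 2.6.9 is older than 2.6.17.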
Best regards,
Seigo Iguchi
From: HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com>
Subject: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
Date: Mon, 02 Aug 2010 18:00:02 +0900 (Tokyo Standard Time)
> Hello,
>
> For some weeks I've been developing a gcore subcommand for the crash
> utility. It provides a process coredump feature for kernel crash
> dumps, which is strongly demanded by users who want to investigate
> user-space applications contained in a kernel crash dump.
>
> I've now finished a prototype version of gcore and identified the
> issues that need to be addressed. Could you give me any comments
> and suggestions on this work?
>
>
> Motivation
> ==========
>
> It's a relatively common technique in a cluster system for a
> running node to trigger the kernel crash dump mechanism when it
> detects a critical error, so that the server that detected the error
> stops as soon as possible. Consequently, the resulting kernel crash
> dump contains a process image of the erroneous user application. In
> that case, developers are interested in user space rather than
> kernel space.
>
> Another merit of gcore is that it allows us to use several
> userland debugging tools, such as GDB and binutils, to analyze
> user-space memory.
>
>
> Current Status
> ==============
>
> I have confirmed that the prototype version runs on the following
> configuration:
>
> Linux Kernel Version: 2.6.34
> Supported Architecture: x86_64
> Crash Version: 5.0.5
> Dump Format: ELF
>
> I'm planning to widen the range of support as follows:
>
> Linux Kernel Version: Any
> Supported Architecture: i386, x86_64 and IA64
> Dump Format: Any
>
>
> Issues
> ======
>
> Currently, I have the issues below.
>
> 1) Retrieval of appropriate register values
>
> The prototype version retrieves register values from the _wrong_
> location: the top of the kernel stack, where register values are
> saved at any preemption context switch. The register values that
> should be included here are instead the ones saved at the
> user-to-kernel context switch on any interrupt event.
>
> I've yet to implement this. Specifically, I need to do the following
> tasks.
>
> (1) list all entry paths from user space into kernel space.
>
> (2) divide the entry paths according to where and how the register
> values from the user-space context are saved.
>
> (3) write a program that retrieves the saved register values from
> the appropriate locations identified by means of (1) and (2).
>
> Ideally, I think it would be best if the crash library provided a
> means of retrieving this kind of register value, that is, ones saved
> on various stack frames. Is there any plan to do so?
>
>
> 2) Getting the signal number for a task that was in the middle of a
> core dump when the kernel crashed
>
> If a target task is partway through a core dump, it's better to know
> the signal number in order to know why the task was about to be core
> dumped.
>
> Unfortunately, I have no choice but to backtrace the kernel stack to
> retrieve the signal number saved there as an argument of, for
> example, do_coredump().
>
>
> 3) Kernel version compatibility
>
> crash's policy is for the latest crash package to support all kernel
> versions. On the other hand, the prototype is based on kernel 2.6.34.
> This means more kernel versions need to be supported.
>
> Well, the question is: which versions do I really need to test in
> addition to the latest upstream kernel? I think it's enough in
> practice to support RHEL4, RHEL5 and RHEL6.
>
>
> Build Instructions
> ==================
>
> $ tar xf crash-5.0.5.tar.gz
> $ cd crash-5.0.5/
> $ patch -p 1 < gcore.patch
> $ make
>
>
> Usage
> =====
>
> Use the help subcommand of the crash utility, as in ``help gcore''.
>
>
> Attached File
> =============
>
> * gcore.patch
>
> A patch implementing the gcore subcommand for crash-5.0.5.
>
> The diffstat output is as follows.
>
> $ diffstat gcore.patch
>  Makefile      |   10 +-
>  defs.h        |   15 +
>  gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcore.h       |  639 ++++++++++++++++++++
>  global_data.c |    3 +
>  help.c        |   28 +
>  netdump.c     |   27 +
>  tools.c       |   37 ++
>  8 files changed, 2615 insertions(+), 2 deletions(-)
>
> --
> HATAYAMA Daisuke
> d.hatayama(a)jp.fujitsu.com