Hi,
From: HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com>
Subject: Re: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
Date: Tue, 03 Aug 2010 15:17:00 +0900 (Tokyo Standard Time)
 Hello Iguchi-san,
 
 Thanks for your comments.
 
 From: "S.Iguchi" <iguchi.sg(a)ncos.nec.co.jp>
 Subject: Re: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
 Date: Tue, 03 Aug 2010 13:10:09 +0900 (JST)
 
 > Hi, Hatayama-san
 > 
 > I have an extension with mostly the same purpose as your patch.
 > But your patch is great, because it supports the latest kernel and
 > also dump filter masking.
 > 
 > My current extension file is attached.
 > Yes, my code is quite buggy and ugly, and it handles the latest kernel
 > less well than yours.
 > (sigh ... I did not know about fill_vma_cache(), so I do "vm -p" every
 > time before dumping.)
 > 
 > BTW, I have some comments.
 > I'd like to add the features below to yours,
 > or if you will do so, I would be happy. :)
 > 
 > - support i386 
 > - support elf32 binary on x86-64 
 > - support old kernel (before 2.6.17)
 > 
 > As Dave said, if your patch is committed as an extension,
 > I could submit some patches to it.
 > 
 > How about this?
 
 As I wrote in the first post, I plan to support RHEL4, RHEL5 and
 RHEL6 on i386, x86_64 and IA64, and the latest upstream kernel, too.
 The following table shows the corresponding upstream kernel versions.
 
    RHEL4  RHEL5   RHEL6   upstream
   ---------------------------------
    2.6.9  2.6.18  2.6.32  2.6.35
 
 So, that would probably be enough for your first and third requests.
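 
 As a rough illustration of why the table above covers the request for
 kernels before 2.6.17, the versions can be compared as packed integers.
 This is only a sketch: KVER(), struct kernel_target, is_pre_2_6_17() and
 count_old_targets() are hypothetical names made up for this example and
 are not part of crash or of the gcore patch.

```c
#include <stddef.h>

/* Hypothetical helper (not crash's own macro): pack major.minor.patch
 * into one integer so that versions compare with ordinary < and >=. */
#define KVER(major, minor, patch) \
    (((unsigned int)(major) << 16) | ((unsigned int)(minor) << 8) | \
     (unsigned int)(patch))

/* One row of the distribution/kernel table. */
struct kernel_target {
    const char *name;
    unsigned int version;
};

/* Versions copied from the table in this mail. */
const struct kernel_target kernel_targets[] = {
    { "RHEL4",    KVER(2, 6, 9)  },
    { "RHEL5",    KVER(2, 6, 18) },
    { "RHEL6",    KVER(2, 6, 32) },
    { "upstream", KVER(2, 6, 35) },
};

const size_t kernel_target_count =
    sizeof(kernel_targets) / sizeof(kernel_targets[0]);

/* Returns 1 if the version predates 2.6.17, otherwise 0. */
int is_pre_2_6_17(unsigned int version)
{
    return version < KVER(2, 6, 17);
}

/* Counts how many of the planned targets predate 2.6.17. */
size_t count_old_targets(void)
{
    size_t i, count = 0;

    for (i = 0; i < kernel_target_count; i++)
        if (is_pre_2_6_17(kernel_targets[i].version))
            count++;
    return count;
}
```

 Of the four targets, only RHEL4 (2.6.9) falls before 2.6.17, so
 supporting RHEL4 is what exercises the old-kernel case.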
  
Ugh, I didn't check RHEL4 ... sorry.
Thank you for your explanation.
 On the other hand, I have no plan to support ia32 emulation on
 either x86_64 or ia64.
  
OK.
Supporting ia32 emulation on x86-64 is enough for me ...
If your extension is applied, I'll think about it.
Thanks.
Regards,
Seigo Iguchi
 > 
 > Best regards,
 > Seigo Iguchi
 > 
 > 
 > From: HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com>
 > Subject: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
 > Date: Mon, 02 Aug 2010 18:00:02 +0900 (Tokyo Standard Time)
 > 
 >> Hello,
 >> 
 >> For some weeks I've been developing a gcore subcommand for the crash
 >> utility. It provides a process coredump feature for kernel crash
 >> dumps, which is strongly demanded by users who want to investigate
 >> user-space applications contained in a kernel crash dump.
 >> 
 >> I've now finished a prototype version of gcore and identified the
 >> issues that most need to be addressed. Could you give me any
 >> comments and suggestions on this work?
 >> 
 >> 
 >> Motivation
 >> ==========
 >> 
 >> It's a relatively familiar technique in cluster systems that a
 >> running node triggers the kernel crash dump mechanism when it detects
 >> a critical error, so that the node that detected the error stops as
 >> soon as possible. Consequently, the resulting kernel crash dump
 >> contains a process image of the erroneous user application. In that
 >> case, developers are interested in user space rather than kernel
 >> space.
 >> 
 >> Another merit of gcore is that it lets us use several userland
 >> debugging tools, such as GDB and binutils, to analyze user-space
 >> memory.
 >> 
 >> 
 >> Current Status
 >> ==============
 >> 
 >> I have confirmed that the prototype version runs on the following
 >> configuration:
 >> 
 >>   Linux Kernel Version: 2.6.34
 >>   Supported Architecture: x86_64
 >>   Crash Version: 5.0.5
 >>   Dump Format: ELF
 >> 
 >> I'm planning to widen the range of support as follows:
 >> 
 >>   Linux Kernel Version: Any
 >>   Supported Architecture: i386, x86_64 and IA64
 >>   Dump Format: Any
 >> 
 >> 
 >> Issues
 >> ======
 >> 
 >> Currently, I have the issues below.
 >> 
 >> 1) Retrieval of appropriate register values
 >> 
 >> The prototype version retrieves register values from the _wrong_
 >> location: the top of the kernel stack, where register values are
 >> saved at every preemption context switch. The register values that
 >> should be included here are instead the ones saved at the
 >> user-to-kernel context switch on any interrupt event.
 >> 
 >> I've yet to implement this. Specifically, I still need to do the
 >> following tasks.
 >> 
 >>   (1) list all entry paths from user-space to kernel-space execution.
 >> 
 >>   (2) divide the entry paths according to where and how the register
 >>   values from the user-space context are saved.
 >> 
 >>   (3) write a program that retrieves the saved register values from
 >>   the appropriate locations identified by means of (1) and (2).
 >> 
 >> Ideally, I think it would be best if the crash library provided a
 >> means of retrieving register values of this kind, that is, ones saved
 >> on various stack frames. Is there any plan to do so?
 >> 
 >> 
 >> 2) Getting the signal number for a task that was in the middle of a
 >> core dump when the kernel crashed
 >> 
 >> If a target task is halfway through a core dump, it's better to know
 >> the signal number in order to know why the task was about to be core
 >> dumped.
 >> 
 >> Unfortunately, I have no choice but to backtrace the kernel stack to
 >> retrieve the signal number saved there as an argument of, for example,
 >> do_coredump().
 >> 
 >> 
 >> 3) Kernel version compatibility
 >> 
 >> crash's policy is for the latest crash package to support all kernel
 >> versions. On the other hand, the prototype is based on kernel 2.6.34.
 >> This means more kernel versions need to be supported.
 >> 
 >> Well, the question is: which versions do I really need to test in
 >> addition to the latest upstream kernel? I think supporting RHEL4,
 >> RHEL5 and RHEL6 is enough in practice.
 >> 
 >> 
 >> Build Instruction
 >> =================
 >> 
 >>   $ tar xf crash-5.0.5.tar.gz
 >>   $ cd crash-5.0.5/
 >>   $ patch -p 1 < gcore.patch
 >>   $ make
 >> 
 >> 
 >> Usage
 >> =====
 >> 
 >> Use the help subcommand of the crash utility: ``help gcore''.
 >> 
 >> 
 >> Attached File
 >> =============
 >> 
 >>   * gcore.patch
 >> 
 >>     A patch implementing the gcore subcommand for crash-5.0.5.
 >> 
 >>     The diffstat output is as follows.
 >> 
 >> $ diffstat gcore.patch
 >>  Makefile      |   10 +-
 >>  defs.h        |   15 +
 >>  gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 >>  gcore.h       |  639 ++++++++++++++++++++
 >>  global_data.c |    3 +
 >>  help.c        |   28 +
 >>  netdump.c     |   27 +
 >>  tools.c       |   37 ++
 >>  8 files changed, 2615 insertions(+), 2 deletions(-)
 >> 
 >> --
 >> HATAYAMA Daisuke
 >> d.hatayama(a)jp.fujitsu.com