Dave,

Thanks for your explanation.

The reason behind my questions is that we have an application running at a customer site, and the application consumes around 60GB of system memory.
When this process receives SIGSEGV or SIGABRT, the kernel starts writing the process core dump. Here is the problem: the kernel takes at least an hour (60 minutes) to finish the core dump, and during this time the system is unresponsive (hung). I believe this is because the system starts thrashing due to the huge memory usage of the process. This long downtime is not acceptable to the customer.

So I started looking for better ways of tackling the problem.

1>The first thing we considered was changing the system page size from 4KB to 8KB. However, this cannot be done on our x86_64 architecture, since x86_64 does not support a selectable base page size.

2>We wrote a handler using the libbfd APIs and built it into our application. Whenever SIGSEGV or SIGABRT is received by the process, it logs the stack trace of all the threads within that process. This approach is not as effective or flexible as a full process core dump. (A minimal sketch of the idea appears after this list.)

3>Lastly, we thought of using kcore/vmcore to analyze the cause of the SIGSEGV or SIGABRT.

4>I have one more thought: making the "elf_core_dump()" function SMP-parallel, so the dump is written by more than one CPU. This function is responsible for dumping the core, and it is found in "/usr/src/linux/fs/binfmt_elf.c".
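
To make 2> concrete, here is a minimal sketch of the idea. It is not our actual code: the function names are placeholders, and it uses glibc's backtrace()/backtrace_symbols_fd() from <execinfo.h> instead of the libbfd lookup we implemented. It also logs only the faulting thread; logging all the threads, as our module does, requires signalling each thread (e.g. with pthread_kill()), which is part of why this approach is less flexible than a real core dump.

  #include <execinfo.h>
  #include <signal.h>
  #include <string.h>
  #include <unistd.h>

  /* On SIGSEGV/SIGABRT, write the faulting thread's backtrace to
   * stderr.  Caveat: backtrace() is not strictly async-signal-safe
   * (its first call may allocate), so it is usually called once at
   * startup to pre-load its support library. */
  static void crash_handler(int sig)
  {
          void *frames[64];
          int depth;

          depth = backtrace(frames, 64);
          /* backtrace_symbols_fd() writes straight to the fd and
           * does not call malloc(), unlike backtrace_symbols(). */
          backtrace_symbols_fd(frames, depth, STDERR_FILENO);

          /* SA_RESETHAND restored SIG_DFL on handler entry, so the
           * re-raised signal is delivered with the default action
           * once this handler returns -- the process still dies
           * with the original signal (and still dumps core unless
           * RLIMIT_CORE is set to 0). */
          raise(sig);
  }

  static void install_crash_handler(void)
  {
          struct sigaction sa;

          memset(&sa, 0, sizeof(sa));
          sa.sa_handler = crash_handler;
          sigemptyset(&sa.sa_mask);
          sa.sa_flags = SA_RESETHAND;     /* one-shot handler */

          sigaction(SIGSEGV, &sa, NULL);
          sigaction(SIGABRT, &sa, NULL);
  }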


Any comments/ideas are welcome.

--Regards,
rajesh
 

>
>Rajesh,
>
>Castor's patch/suggestion is the best/only option you have
>for this kind of thing.  I've not tried it, but since the
>crash utility's "vm -p" option delineates where each
>instantiated page of a given task is located, it's potentially
>possible to recreate an ELF core file of the specified
>task.  (Any swapped-out pages won't be in the vmcore...)
>
>The embedded gdb module inside of crash is invoked internally
>as "gdb vmlinux", and has no clue about any other user-space
>program.
>
>That being said, you can execute the gdb "add-symbol-file"
>command to load the debuginfo data from a user space
>program, and then examine user-space data from the context
>of that program.
>
>For example, when you run the crash utility on a live system,
>the default context is that of the "crash" utility itself:
>
>  $ ./crash
>
>  crash 4.0-4.6
>  Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
>  Copyright (C) 2004, 2005, 2006  IBM Corporation
>  Copyright (C) 1999-2006  Hewlett-Packard Co
>  Copyright (C) 2005, 2006  Fujitsu Limited
>  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>  Copyright (C) 2005  NEC Corporation
>  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>  This program is free software, covered by the GNU General Public License,
>  and you are welcome to change it and/or distribute copies of it under
>  certain conditions.  Enter "help copying" to see the conditions.
>  This program has absolutely no warranty.  Enter "help warranty" for details.
>
>  GNU gdb 6.1
>  Copyright 2004 Free Software Foundation, Inc.
>  GDB is free software, covered by the GNU General Public License, and you are
>  welcome to change it and/or distribute copies of it under certain conditions.
>  Type "show copying" to see the conditions.
>  There is absolutely no warranty for GDB.  Type "show warranty" for details.
>  This GDB was configured as "i686-pc-linux-gnu"...
>
>        KERNEL: /boot/vmlinux-2.4.21-37.ELsmp
>     DEBUGINFO: /usr/lib/debug/boot/vmlinux-2.4.21-37.ELsmp.debug
>      DUMPFILE: /dev/mem
>          CPUS: 2
>          DATE: Tue Sep  4 16:36:53 2007
>        UPTIME: 15 days, 08:15:06
>  LOAD AVERAGE: 0.14, 0.06, 0.01
>         TASKS: 87
>      NODENAME: crash.boston.redhat.com
>       RELEASE: 2.4.21-37.ELsmp
>       VERSION: #1 SMP Wed Sep 7 13:28:55 EDT 2005
>       MACHINE: i686  (1993 Mhz)
>        MEMORY: 511.5 MB
>           PID: 9381
>       COMMAND: "crash"
>          TASK: dd63c000
>           CPU: 1
>         STATE: TASK_RUNNING (ACTIVE)
>  crash>
>
>Verify the current context:
>
>  crash> set
>      PID: 9381
>  COMMAND: "crash"
>     TASK: dd63c000
>      CPU: 0
>    STATE: TASK_RUNNING (ACTIVE)
>  crash>
>
>So, for example, the crash utility has a program_context
>data structure that starts like this:
>
>  struct program_context {
>          char *program_name;           /* this program's name */
>          char *program_path;           /* unadulterated argv[0] */
>          char *program_version;        /* this program's version */
>          char *gdb_version;            /* embedded gdb version */
>          char *prompt;                 /* this program's prompt */
>          unsigned long long flags;     /* flags from above */
>          char *namelist;               /* linux namelist */
>          ...
>
>And it declares a data variable with the same name:
>
>  struct program_context program_context = { 0 };
>
>If I want to see a gdb-style dump of its contents, I can
>do this:
>
>  crash> add-symbol-file ./crash
>  add symbol table from file "./crash" at
>  Reading symbols from ./crash...done.
>  crash>
>
>Now the embedded gdb has the debuginfo data from the crash
>object file (which was compiled with -g), and it knows where
>the program_context structure is located in user space:
>
>  crash> p &program_context
>  $1 = (struct program_context *) 0x8391ea0
>  crash>
>
>Since 0x8391ea0 is not a kernel address, the "p" command cannot
>be used to display the data structure.  However, the crash
>utility's "struct" command has a little-used "-u" option, which
>indicates that the address that follows is a user-space address
>from the current context:
>
>  crash> struct program_context -u 0x8391ea0
>  struct program_context {
>    program_name = 0xbffff9b0 "crash",
>    program_path = 0xbffff9ae "./crash",
>    program_version = 0x82e9c12 "4.0-4.6",
>    gdb_version = 0x834ecdf "6.1",
>    prompt = 0x8400438 "crash> ",
>    flags = 844424965983303,
>    namelist = 0x83f5940 "/boot/vmlinux-2.4.21-37.ELsmp",
>    ...
>
>That all being said, this capability cannot be used to generate
>any kind of user-space backtrace.  You can do raw reads of the
>user-space stack, say from the point at which it entered kernel
>space, but whether that's of any help depends upon what you're
>looking for.
>
>Dave
>


