New subject: User Stack back trace of the process

Wednesday, 5 September 2007

Dave,

Thanks for your explanation.

The reason behind my questions is that we have an application running at a customer site
that consumes around 60GB of system memory.
When this process receives a segmentation fault or an abort signal, the kernel starts
writing the process core dump, and here is the problem: the kernel takes at least one hour
to finish the core dump. During this time the system is unresponsive
(hung), and I suspect it is because the system starts thrashing due to the huge memory
usage of the process. This long downtime is not acceptable to the customer.

So I started looking for a better way of tackling the problem.

1>The first thing we thought of was changing the system page size from 4KB to 8KB. This
change could not be made on our x86_64 architecture, since x86_64 doesn't
support a multi-page-size option.

2>We wrote a program using the libbfd APIs and used it within our application. Whenever the
process receives SIGSEGV or SIGABRT, it logs the stack trace of all the
threads within that process. This feature is not as effective or flexible as a
process core dump.

3>Lastly, we thought of using kcore/vmcore to analyze the cause of the SIGSEGV or SIGABRT.

4>I have one more thought: making the elf_core_dump() function SMP-capable. This function is
responsible for dumping the core, and it is located in
/usr/src/linux/fs/binfmt_elf.c

Any comments/ideas are welcome.

--Regards,
rajesh

...

Rajesh,

Castor's patch/suggestion is the best/only option you have
for this kind of thing.  I've not tried it, but since the
crash utility's "vm -p" option delineates where each
instantiated page of a given task is located, it's potentially
possible to recreate an ELF core file of the specified
task.  (Any swapped-out pages won't be in the vmcore...)

The embedded gdb module inside of crash is invoked internally
as "gdb vmlinux", and has no clue about any other user-space
program.

That being said, you can execute the gdb "add-symbol-file"
command to load the debuginfo data from a user space
program, and then examine user-space data from the context
of that program.

For example, when you run the crash utility on a live system,
the default context is that of the "crash" utility itself:

   $ ./crash

   crash 4.0-4.6
   Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
   Copyright (C) 2004, 2005, 2006  IBM Corporation
   Copyright (C) 1999-2006  Hewlett-Packard Co
   Copyright (C) 2005, 2006  Fujitsu Limited
   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
   Copyright (C) 2005  NEC Corporation
   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
   This program is free software, covered by the GNU General Public License,
   and you are welcome to change it and/or distribute copies of it under
   certain conditions.  Enter "help copying" to see the conditions.
   This program has absolutely no warranty.  Enter "help warranty" for details.

   GNU gdb 6.1
   Copyright 2004 Free Software Foundation, Inc.
   GDB is free software, covered by the GNU General Public License, and you are
   welcome to change it and/or distribute copies of it under certain conditions.
   Type "show copying" to see the conditions.
   There is absolutely no warranty for GDB.  Type "show warranty" for details.
   This GDB was configured as "i686-pc-linux-gnu"...

         KERNEL: /boot/vmlinux-2.4.21-37.ELsmp
      DEBUGINFO: /usr/lib/debug/boot/vmlinux-2.4.21-37.ELsmp.debug
       DUMPFILE: /dev/mem
           CPUS: 2
           DATE: Tue Sep  4 16:36:53 2007
         UPTIME: 15 days, 08:15:06
   LOAD AVERAGE: 0.14, 0.06, 0.01
          TASKS: 87
       NODENAME: crash.boston.redhat.com
        RELEASE: 2.4.21-37.ELsmp
        VERSION: #1 SMP Wed Sep 7 13:28:55 EDT 2005
        MACHINE: i686  (1993 Mhz)
         MEMORY: 511.5 MB
            PID: 9381
        COMMAND: "crash"
           TASK: dd63c000
            CPU: 1
          STATE: TASK_RUNNING (ACTIVE)
   crash>

Verify the current context:

   crash> set
       PID: 9381
   COMMAND: "crash"
      TASK: dd63c000
       CPU: 0
     STATE: TASK_RUNNING (ACTIVE)
   crash>

So, for example, the crash utility has a program_context
data structure that starts like this:

   struct program_context {
           char *program_name;             /* this program's name */
           char *program_path;             /* unadulterated argv[0] */
           char *program_version;          /* this program's version */
           char *gdb_version;              /* embedded gdb version */
           char *prompt;                   /* this program's prompt */
           unsigned long long flags;       /* flags from above */
           char *namelist;                 /* linux namelist */
           ...

And it declares a data variable with the same name:

   struct program_context program_context = { 0 };

If I want to see a gdb-style dump of its contents, I can
do this:

   crash> add-symbol-file ./crash
   add symbol table from file "./crash" at
   Reading symbols from ./crash...done.
   crash>

Now the embedded gdb has the debuginfo data from the crash
object file (which was compiled with -g), and it knows where
the program_context structure is located in user space:

   crash> p &program_context
   $1 = (struct program_context *) 0x8391ea0
   crash>

Since 0x8391ea0 is not a kernel address, the "p" command cannot
be used to display the data structure.  However, the crash
utility's "struct" command has a little-used "-u" option, which
indicates that the address that follows is a user-space address
from the current context:

   crash> struct program_context -u 0x8391ea0
   struct program_context {
     program_name = 0xbffff9b0 "crash",
     program_path = 0xbffff9ae "./crash",
     program_version = 0x82e9c12 "4.0-4.6",
     gdb_version = 0x834ecdf "6.1",
     prompt = 0x8400438 "crash> ",
     flags = 844424965983303,
     namelist = 0x83f5940 "/boot/vmlinux-2.4.21-37.ELsmp",
     ...

That all being said, this capability cannot be used to generate
any kind of user-space backtrace.  You can do raw reads of the
user-space stack, say from the point at which it entered kernel
space, but whether that's of any help depends upon what you're
looking for.

Dave
