fix_lkcd_address problem
by Alan Tyson
Hi,
I believe that there is an incorrect comparison in fix_lkcd_address:
    ulonglong
    fix_lkcd_address(ulonglong addr)
    {
        int i;
        ulong offset;

        for (i = 0; i < lkcd->fix_addr_num; i++) {
            if ( (addr >=lkcd->fix_addr[i].task) &&
                (addr <= lkcd->fix_addr[i].task + STACKSIZE())){
                      ^^- here
On Itanium, fix_addr[i].task + STACKSIZE() may be the address of an adjacent
task structure. As it stands, both parts of the comparison pass if addr is
the address in the fix_addr[i].task field or if it is the address of the task
structure which follows that one. The result is that it is not possible to read
the task structure of the task that follows a task which is in this fixup list,
and zeroes are returned instead.
Regards,
Alan Tyson, HP.
--- lkcd_common.c.orig 2007-08-27 16:51:11.000000000 +0100
+++ lkcd_common.c 2007-09-19 16:46:07.000000000 +0100
@@ -64,7 +64,7 @@ fix_lkcd_address(ulonglong addr)
for (i = 0; i < lkcd->fix_addr_num; i++) {
if ( (addr >=lkcd->fix_addr[i].task) &&
- (addr <= lkcd->fix_addr[i].task + STACKSIZE())){
+ (addr < lkcd->fix_addr[i].task + STACKSIZE())){
offset = addr - lkcd->fix_addr[i].task;
addr = lkcd->fix_addr[i].saddr + offset;
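The boundary case described above can be checked in isolation. The following is a minimal sketch, not the crash source: DEMO_STACKSIZE, struct demo_fix_addr, and the helper functions are hypothetical stand-ins for STACKSIZE() and lkcd->fix_addr[], chosen only to show why the upper bound needs to be exclusive when the next task structure sits exactly STACKSIZE() bytes further on.

```c
#include <assert.h>

typedef unsigned long long ulonglong;

/* Hypothetical stack size; the real value comes from STACKSIZE(). */
#define DEMO_STACKSIZE 0x8000ULL

/* Hypothetical stand-in for one entry of lkcd->fix_addr[]. */
struct demo_fix_addr {
    ulonglong task;   /* address of the task structure to fix up */
    ulonglong saddr;  /* address the data is actually read from */
};

/* Original test: upper bound inclusive, so task + STACKSIZE also matches. */
static int in_fixup_inclusive(const struct demo_fix_addr *f, ulonglong addr)
{
    return addr >= f->task && addr <= f->task + DEMO_STACKSIZE;
}

/* Patched test: half-open range [task, task + STACKSIZE). */
static int in_fixup_exclusive(const struct demo_fix_addr *f, ulonglong addr)
{
    return addr >= f->task && addr < f->task + DEMO_STACKSIZE;
}

/* Self-check covering the first byte, the last byte, and the neighbour. */
static void demo_check(void)
{
    struct demo_fix_addr f = { 0x100000ULL, 0x900000ULL };
    ulonglong neighbour = f.task + DEMO_STACKSIZE; /* adjacent task */

    assert(in_fixup_inclusive(&f, neighbour));   /* the reported bug */
    assert(!in_fixup_exclusive(&f, neighbour));  /* behaviour after the patch */
    assert(in_fixup_exclusive(&f, f.task));
    assert(in_fixup_exclusive(&f, neighbour - 1));
}
```

Calling demo_check() from a small main() exercises all four cases; with the inclusive test the neighbouring task's address is wrongly treated as part of the fixup range.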
crash 4.0-3.14 and SLES 10
by reagen_jeff@emc.com
I am trying to look at a live SLES 10 system using crash. Crash fails to
start successfully. It returns:
crash: /boot/vmlinux: no debugging data available
The vmlinux file in question is not stripped.
Has anyone else been able to get this to work? If so what did you do?
Jeff
Re: [PATCH 0/2] vmcoreinfo support for dump filtering #2
by Vivek Goyal
On Mon, Sep 10, 2007 at 11:35:21AM -0700, Randy Dunlap wrote:
> On Fri, 7 Sep 2007 17:57:46 +0900 Ken'ichi Ohmichi wrote:
>
> > Hi,
>
> > I released a new makedumpfile (version 1.2.0) with vmcoreinfo support.
> > I updated the patches for linux and kexec-tools.
> >
> > PATCH SET:
> > [1/2] [linux-2.6.22] Add vmcoreinfo
> > The patch is for linux-2.6.22.
> > The patch adds the vmcoreinfo data. Its address and size are output
> > to /sys/kernel/vmcoreinfo.
> >
> > [2/2] [kexec-tools] Pass vmcoreinfo's address and size
> > The patch is for kexec-tools-testing-20070330.
> > (http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/)
> > kexec command gets the address and size of the vmcoreinfo data from
> > /sys/kernel/vmcoreinfo, and passes them to the second kernel through
> > ELF header of /proc/vmcore. When the second kernel is booting, the
> > kernel gets them from the ELF header and creates vmcoreinfo's PT_NOTE
> > segment into /proc/vmcore.
>
> Hi,
> When using the vmcoreinfo patches, what tool(s) are available for
> analyzing the vmcore (dump) file? E.g., lkcd or crash or just gdb?
>
> gdb works for me, but I tried to use crash (4.0-4.6 from
> http://people.redhat.com/anderson/) and crash complained:
>
> crash: invalid kernel virtual address: 0 type: "cpu_pda entry"
>
> Should crash work, or does it need to be modified?
>
Hi Randy,
Crash should just work. It might be broken on the latest kernel. Copying this
to the crash-utility mailing list. Dave will be able to tell us better.
> This is on a 2.6.23-rc3 kernel with vmcoreinfo patches and a dump file
> with -l 31 (dump level 31, omitting all possible pages).
>
Thanks
Vivek
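The patch description quoted above says the address and size of the vmcoreinfo data are output to /sys/kernel/vmcoreinfo. A minimal sketch of reading that pair follows. The text format assumed here (two whitespace-separated hexadecimal numbers, address first) is an assumption for illustration and is not stated in the thread; parse_vmcoreinfo_line and read_vmcoreinfo are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse "<address> <size>" from one line of text.
 * ASSUMPTION: both values are hexadecimal (a 0x prefix is optional).
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_vmcoreinfo_line(const char *line,
                                 unsigned long long *addr,
                                 unsigned long long *size)
{
    assert(line != NULL && addr != NULL && size != NULL);
    if (sscanf(line, "%llx %llx", addr, size) != 2)
        return -1;
    return 0;
}

/* Read the first line of the given file and parse it as above. */
static int read_vmcoreinfo(const char *path,
                           unsigned long long *addr,
                           unsigned long long *size)
{
    char buf[128];
    FILE *fp = fopen(path, "r");
    int rc = -1;

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) != NULL)
        rc = parse_vmcoreinfo_line(buf, addr, size);
    fclose(fp);
    return rc;
}
```

On a kernel carrying the patch, read_vmcoreinfo("/sys/kernel/vmcoreinfo", &addr, &size) would be the intended call; on any other system it simply returns -1.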
Re: Re: Re: crash and sles 9 dumps (Dave Anderson)
by Daniel Li
Hey Dave,
When you said this was something you had never seen before, did you mean you
had never tried to use crash on a dump of a SLES9 guest with the nonstandard
ELF format, or that this scenario was working for you, so you had not
seen this type of error message?
If the answer happens to be the first one, do you have any plans to
support SLES guest dumps (with the new ELF format you incorporated in
the first half of this year to get crash working with Red Hat guest dumps)?
Later,
Daniel
Virtual Iron Software, Inc
www.virtualiron.com
crash-utility-request(a)redhat.com wrote:
> Today's Topics:
>
> 1. Re: Re: crash and sles 9 dumps (Dave Anderson)
> 2. Re: crash and sles 9 GUEST dumps (Dave Anderson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 20 Aug 2007 15:49:47 -0400
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] Re: crash and sles 9 dumps
> To: holzheu(a)linux.vnet.ibm.com, "Discussion list for crash utility
> usage, maintenance and development" <crash-utility(a)redhat.com>
> Message-ID: <46C9F05B.2090800(a)redhat.com>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Michael Holzheu wrote:
>
>> Hi Cliff
>>
>> On Mon, 2007-08-13 at 11:33 -0500, Cliff Wickman wrote:
>>
>>
>>> On Fri, Aug 10, 2007 at 05:19:10PM +0200, Bernhard Walle wrote:
>>>
>>
>>> The kerntypes file that crash can use is built by the LKCD dwarfextract
>>> command. Types are extracted from a -g kernel and modules. And dwarfextract
>>> writes a magic ELF e_version that crash uses to distinguish a kerntypes from
>>> a vmlinux. So only such a kerntypes file will work.
>>>
>> Also the standard -g compiled lkcd Kerntypes file seems to work, if you
>> set the KERNTYPES flag. This can be useful, if you don't want to build a
>> full -g compiled vmlinux.
>>
>> I used the following simple patch which adds the "-k" option to force
>> crash to use the kerntypes code path.
>>
>> diff -Naurp crash-4.0-4.5/main.c crash-4.0-4.5-kerntypes/main.c
>> --- crash-4.0-4.5/main.c 2007-08-13 15:07:20.000000000 +0200
>> +++ crash-4.0-4.5-kerntypes/main.c 2007-08-13 15:06:51.000000000 +0200
>> @@ -70,7 +70,7 @@ main(int argc, char **argv)
>> */
>> opterr = 0;
>> optind = 0;
>> - while((c = getopt_long(argc, argv, "Lgh::e:i:sSvc:d:tfp:m:",
>> + while((c = getopt_long(argc, argv, "Lkgh::e:i:sSvc:d:tfp:m:",
>> long_options, &option_index)) != -1) {
>> switch (c)
>> {
>> @@ -222,6 +222,9 @@ main(int argc, char **argv)
>> else
>> program_usage(LONG_FORM);
>> clean_exit(0);
>> + case 'k':
>> + pc->flags |= KERNTYPES;
>> + break;
>>
>> case 'e':
>> if (STREQ(optarg, "vi"))
>>
>>
>>
>
> This simple "-k" fix looks fine to me, presuming that there's
> nothing else obvious in the lkcd kerntypes file that distinguishes
> it -- i.e., like the unique ELF e_version that dwarfextract uses.
> (EV_DWARFEXTRACT 101010101)
>
> So unless anybody objects, or has a better idea, I'll put this -k
> option in the next release.
>
> Thanks,
> Dave
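The e_version distinction mentioned above can be sketched as follows. This is an illustration only, not crash's code: it assumes a 64-bit ELF header in host byte order, and is_dwarfextract_kerntypes is a hypothetical helper name. The constant 101010101 is the value quoted in the message.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Value quoted in the thread for dwarfextract's ELF e_version marker. */
#define EV_DWARFEXTRACT 101010101

/*
 * Return 1 if buf starts with an ELF header whose e_version carries the
 * dwarfextract marker, 0 otherwise.
 * ASSUMPTIONS: 64-bit ELF, same byte order as the host.
 */
static int is_dwarfextract_kerntypes(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;

    assert(buf != NULL);
    if (len < sizeof(eh))
        return 0;
    memcpy(&eh, buf, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                       /* not an ELF file */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;                       /* sketch handles 64-bit only */
    return eh.e_version == EV_DWARFEXTRACT;
}
```

A standard -g compiled Kerntypes file would normally carry the ordinary EV_CURRENT version, so a check like this would return 0 for it, which is why an explicit "-k" option is needed in that case.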
>
>
>
>> I attached the kerntypes file, which works for s390:
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> /*
>> * kerntypes.c
>> *
>> * Dummy module that includes headers for all kernel types of interest.
>> * The kernel type information is used by the lcrash utility when
>> * analyzing system crash dumps or the live system. Using the type
>> * information for the running system, rather than kernel header files,
>> * makes for a more flexible and robust analysis tool.
>> *
>> * This source code is released under the GNU GPL.
>> */
>>
>> /* generate version for this file */
>> typedef char *COMPILE_VERSION;
>>
>> /* General linux types */
>>
>> #include <linux/autoconf.h>
>> #include <linux/compile.h>
>> #include <linux/utsname.h>
>> #include <linux/module.h>
>> #include <linux/sched.h>
>> #include <linux/mm.h>
>> #include <linux/slab_def.h>
>> #include <linux/slab.h>
>> #include <linux/bio.h>
>> #include <linux/bitmap.h>
>> #include <linux/bitops.h>
>> #include <linux/bitrev.h>
>> #include <linux/blkdev.h>
>> #include <linux/blkpg.h>
>> #include <linux/bootmem.h>
>> #include <linux/buffer_head.h>
>> #include <linux/cache.h>
>> #include <linux/cdev.h>
>> #include <linux/cpu.h>
>> #include <linux/cpumask.h>
>> #include <linux/cpuset.h>
>> #include <linux/dcache.h>
>> #include <linux/debugfs.h>
>> #include <linux/elevator.h>
>> #include <linux/fd.h>
>> #include <linux/file.h>
>> #include <linux/fs.h>
>> #include <linux/futex.h>
>> #include <linux/genhd.h>
>> #include <linux/highmem.h>
>> #include <linux/if.h>
>> #include <linux/if_addr.h>
>> #include <linux/if_arp.h>
>> #include <linux/if_bonding.h>
>> #include <linux/if_ether.h>
>> #include <linux/if_tr.h>
>> #include <linux/if_tun.h>
>> #include <linux/if_vlan.h>
>> #include <linux/in.h>
>> #include <linux/in6.h>
>> #include <linux/in_route.h>
>> #include <linux/inet.h>
>> #include <linux/inet_diag.h>
>> #include <linux/inetdevice.h>
>> #include <linux/init.h>
>> #include <linux/initrd.h>
>> #include <linux/inotify.h>
>> #include <linux/interrupt.h>
>> #include <linux/ioctl.h>
>> #include <linux/ip.h>
>> #include <linux/ipsec.h>
>> #include <linux/ipv6.h>
>> #include <linux/ipv6_route.h>
>> #include <linux/irq.h>
>> #include <linux/irqflags.h>
>> #include <linux/irqreturn.h>
>> #include <linux/jbd.h>
>> #include <linux/jbd2.h>
>> #include <linux/jffs2.h>
>> #include <linux/jhash.h>
>> #include <linux/jiffies.h>
>> #include <linux/kallsyms.h>
>> #include <linux/kernel.h>
>> #include <linux/kernel_stat.h>
>> #include <linux/kexec.h>
>> #include <linux/kobject.h>
>> #include <linux/kthread.h>
>> #include <linux/ktime.h>
>> #include <linux/list.h>
>> #include <linux/memory.h>
>> #include <linux/miscdevice.h>
>> #include <linux/mm.h>
>> #include <linux/mm_inline.h>
>> #include <linux/mm_types.h>
>> #include <linux/mman.h>
>> #include <linux/mmtimer.h>
>> #include <linux/mmzone.h>
>> #include <linux/mnt_namespace.h>
>> #include <linux/module.h>
>> #include <linux/moduleloader.h>
>> #include <linux/moduleparam.h>
>> #include <linux/mount.h>
>> #include <linux/mpage.h>
>> #include <linux/mqueue.h>
>> #include <linux/mtio.h>
>> #include <linux/mutex.h>
>> #include <linux/namei.h>
>> #include <linux/neighbour.h>
>> #include <linux/net.h>
>> #include <linux/netdevice.h>
>> #include <linux/netfilter.h>
>> #include <linux/netfilter_arp.h>
>> #include <linux/netfilter_bridge.h>
>> #include <linux/netfilter_decnet.h>
>> #include <linux/netfilter_ipv4.h>
>> #include <linux/netfilter_ipv6.h>
>> #include <linux/netlink.h>
>> #include <linux/netpoll.h>
>> #include <linux/pagemap.h>
>> #include <linux/param.h>
>> #include <linux/percpu.h>
>> #include <linux/percpu_counter.h>
>> #include <linux/pfn.h>
>> #include <linux/pid.h>
>> #include <linux/pid_namespace.h>
>> #include <linux/poll.h>
>> #include <linux/posix-timers.h>
>> #include <linux/posix_acl.h>
>> #include <linux/posix_acl_xattr.h>
>> #include <linux/posix_types.h>
>> #include <linux/preempt.h>
>> #include <linux/prio_tree.h>
>> #include <linux/proc_fs.h>
>> #include <linux/profile.h>
>> #include <linux/ptrace.h>
>> #include <linux/radix-tree.h>
>> #include <linux/ramfs.h>
>> #include <linux/raw.h>
>> #include <linux/rbtree.h>
>> #include <linux/rcupdate.h>
>> #include <linux/reboot.h>
>> #include <linux/relay.h>
>> #include <linux/resource.h>
>> #include <linux/romfs_fs.h>
>> #include <linux/root_dev.h>
>> #include <linux/route.h>
>> #include <linux/rwsem.h>
>> #include <linux/sched.h>
>> #include <linux/sem.h>
>> #include <linux/seq_file.h>
>> #include <linux/seqlock.h>
>> #include <linux/shm.h>
>> #include <linux/shmem_fs.h>
>> #include <linux/signal.h>
>> #include <linux/signalfd.h>
>> #include <linux/skbuff.h>
>> #include <linux/smp.h>
>> #include <linux/smp_lock.h>
>> #include <linux/socket.h>
>> #include <linux/sockios.h>
>> #include <linux/spinlock.h>
>> #include <linux/stat.h>
>> #include <linux/statfs.h>
>> #include <linux/stddef.h>
>> #include <linux/swap.h>
>> #include <linux/swapops.h>
>> #include <linux/sys.h>
>> #include <linux/syscalls.h>
>> #include <linux/sysctl.h>
>> #include <linux/sysdev.h>
>> #include <linux/sysfs.h>
>> #include <linux/sysrq.h>
>> #include <linux/tc.h>
>> #include <linux/tcp.h>
>> #include <linux/thread_info.h>
>> #include <linux/threads.h>
>> #include <linux/tick.h>
>> #include <linux/time.h>
>> #include <linux/timer.h>
>> #include <linux/timerfd.h>
>> #include <linux/times.h>
>> #include <linux/timex.h>
>> #include <linux/topology.h>
>> #include <linux/transport_class.h>
>> #include <linux/tty.h>
>> #include <linux/tty_driver.h>
>> #include <linux/tty_flip.h>
>> #include <linux/tty_ldisc.h>
>> #include <linux/types.h>
>> #include <linux/uaccess.h>
>> #include <linux/unistd.h>
>> #include <linux/utime.h>
>> #include <linux/uts.h>
>> #include <linux/utsname.h>
>> #include <linux/utsrelease.h>
>> #include <linux/version.h>
>> #include <linux/vfs.h>
>> #include <linux/vmalloc.h>
>> #include <linux/vmstat.h>
>> #include <linux/wait.h>
>> #include <linux/watchdog.h>
>> #include <linux/workqueue.h>
>> #include <linux/zconf.h>
>> #include <linux/zlib.h>
>>
>> /*
>> * s390 specific includes
>> */
>>
>> #include <asm/lowcore.h>
>> #include <asm/debug.h>
>> #include <asm/ccwdev.h>
>> #include <asm/ccwgroup.h>
>> #include <asm/qdio.h>
>> #include <asm/zcrypt.h>
>> #include <asm/etr.h>
>> #include <asm/ipl.h>
>> #include <asm/setup.h>
>>
>> /* channel subsystem driver */
>> #include "drivers/s390/cio/cio.h"
>> #include "drivers/s390/cio/chsc.h"
>> #include "drivers/s390/cio/css.h"
>> #include "drivers/s390/cio/device.h"
>> #include "drivers/s390/cio/qdio.h"
>>
>> /* dasd device driver */
>> #include "drivers/s390/block/dasd_int.h"
>> #include "drivers/s390/block/dasd_diag.h"
>> #include "drivers/s390/block/dasd_eckd.h"
>> #include "drivers/s390/block/dasd_fba.h"
>>
>> /* networking drivers */
>> #include "drivers/s390/net/fsm.h"
>> #include "include/net/iucv/iucv.h"
>> #include "drivers/s390/net/lcs.h"
>> #include "drivers/s390/net/qeth.h"
>>
>> /* zfcp device driver */
>> #include "drivers/s390/scsi/zfcp_def.h"
>> #include "drivers/s390/scsi/zfcp_fsf.h"
>>
>> /* crypto device driver */
>> #include "drivers/s390/crypto/ap_bus.h"
>> #include "drivers/s390/crypto/zcrypt_api.h"
>> #include "drivers/s390/crypto/zcrypt_cca_key.h"
>> #include "drivers/s390/crypto/zcrypt_pcica.h"
>> #include "drivers/s390/crypto/zcrypt_pcicc.h"
>> #include "drivers/s390/crypto/zcrypt_pcixcc.h"
>> #include "drivers/s390/crypto/zcrypt_cex2a.h"
>>
>> /* sclp device driver */
>> #include "drivers/s390/char/sclp.h"
>> #include "drivers/s390/char/sclp_rw.h"
>> #include "drivers/s390/char/sclp_tty.h"
>>
>> /* vmur device driver */
>> #include "drivers/s390/char/vmur.h"
>>
>> /*
>> * include sched.c for types:
>> * - struct prio_array
>> * - struct runqueue
>> */
>> #include "kernel/sched.c"
>> /*
>> * include slab.c for struct kmem_cache
>> */
>> #include "mm/slab.c"
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 20 Aug 2007 16:25:01 -0400
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] crash and sles 9 GUEST dumps
> To: "Discussion list for crash utility usage, maintenance and
> development" <crash-utility(a)redhat.com>
> Message-ID: <46C9F89D.1000301(a)redhat.com>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Daniel Li wrote:
>
>> After finding out how to get crash working with native sles 9 LKCD
>> format dumps -- namely, build and use a debug vmlinux with appropriate
>> flags to feed to crash -- I started looking into using crash on kernel
>> dumps created for sles 9 guest domains.
>>
>> As compared to the LKCD format of native sles 9 dumps, those dumps are
>> created using the new non-standard ELF format with section headers
>> instead of program headers, which is the case with the xenctrl library
>> now. Such formats are working for RHAS4U4 64bit guests, while I had to
>> make minor modification to make it work for RHAS4U4 32bit guests as
>> well. However, when it comes to sles 9 guests, crash seems to be having
>> problems locating the stacks for each thread, with the exception of the
>> CURRENT thread. (see below)
>>
>> It may well be that the stack pointers were not saved properly for sles
>> 9 guests by the Xen library in the dump. I'll take a look into the dump
>> and the xen library code to see if that is the case... Or is this the
>> case of crash not looking in the right places for those stack pointers?
>>
>>
>
> Looking at the data below, it is hard to decipher what's going on.
>
> The "ps" list, except for the current task at ffffffff803d2800, shows
> seemingly legitimate tasks because the COMM ("task_struct.comm[16]")
> strings look OK.
>
> > crash> ps
> > PID PPID CPU TASK ST %MEM VSZ RSS COMM
> > > 0 0 0 ffffffff803d2800 RU 0.0 4399648058624 4389578663200 [<80>^L]
> > 0 0 0 ffffffff803d2808 ?? 0.0 0 0 [swapper]
> > 1 0 0 10017e1f2c8 ?? 0.1 640 304 init
> > 2 -1 0 10017e1e9a8 ?? 0.0 0 0 [migration/0]
> > 3 -1 0 10017e1e088 ?? 0.0 0 0 [ksoftirqd/0]
> ...
>
> But the state (ST) field and the PPID values above are bogus.
>
> And that's all confirmed when you ran the "task 10015180208" command,
> which simply has gdb print the task_struct at that address:
>
> > crash> bt 10015180208
> > PID: 3696 TASK: 10015180208 CPU: 0 COMMAND: "klogd"
> > *bt: invalid kernel virtual address: 12 type: "stack contents"*
> > bt: read of stack at 12 failed
> > crash> task 10015180208
> > PID: 3696 TASK: 10015180208 CPU: 0 COMMAND: "klogd"
> > struct task_struct {
> > *state = 1099873050624,*
> > * thread_info = 0x12,*
> > usage = {
> > counter = 320
> > },
> > flags = 0,
> ...
> > comm = "klogd\000roc\000\000\000\000\000\000",
> ...
>
> The "state" and "thread_info" (i.e., the stack page pointer) fields
> make no sense, while the "comm" field, and many of the others (upon
> a quick examination) do seem correct.
>
> It's interesting that all of the task_struct addresses end in "8",
> though. If you were to enter "task_struct 10015180200", do those
> two fields look right, and perhaps due to structure padding (?),
> you'd still see the "klogd" string in the correct place?
>
> I'm sure this is something I've never seen before, so I'm afraid I
> can't offer any answers or suggestions...
>
> Dave
>
>
>
>
>> Thanks,
>> Daniel
>>
>> /dumps/new/sles/64bit$ /home/dli/bin/crash vmlinux-2.6.5-7.244-smp vmlinux.dbg DUMP10.1.230.112
>>
>> crash 4.0-4.5
>> Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 Red Hat, Inc.
>> Copyright (C) 2004, 2005, 2006 IBM Corporation
>> Copyright (C) 1999-2006 Hewlett-Packard Co
>> Copyright (C) 2005, 2006 Fujitsu Limited
>> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> Copyright (C) 2005 NEC Corporation
>> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>> This program is free software, covered by the GNU General Public License,
>> and you are welcome to change it and/or distribute copies of it under
>> certain conditions. Enter "help copying" to see the conditions.
>> This program has absolutely no warranty. Enter "help warranty" for details.
>>
>> GNU gdb 6.1
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "x86_64-unknown-linux-gnu"...
>>
>> WARNING: could not find MAGIC_START!
>> please wait... (gathering task table data)
>> crash: invalid kernel virtual address: 13 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 6 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: c type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 13 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: c type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 18 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 13 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 13 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 12 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 11 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 15 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 12 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 6e type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 22 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 13 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: c type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: c type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 13 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 11 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 10 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: c type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 14 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 13 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: 18 type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: e type: "fill_thread_info"
>>
>> crash: invalid kernel virtual address: f type: "fill_thread_info"
>> please wait... (determining panic task)
>> bt: invalid kernel virtual address: 13 type: "stack contents"
>>
>> bt: read of stack at 13 failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>>
>> bt: invalid kernel virtual address: 6 type: "stack contents"
>>
>> bt: read of stack at 6 failed
>>
>>
>> bt: invalid kernel virtual address: c type: "stack contents"
>>
>> bt: read of stack at c failed
>>
>>
>> bt: invalid kernel virtual address: 13 type: "stack contents"
>>
>> bt: read of stack at 13 failed
>>
>>
>> bt: invalid kernel virtual address: c type: "stack contents"
>>
>> bt: read of stack at c failed
>>
>>
>> bt: invalid kernel virtual address: 18 type: "stack contents"
>>
>> bt: read of stack at 18 failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>>
>> bt: invalid kernel virtual address: 13 type: "stack contents"
>>
>> bt: read of stack at 13 failed
>>
>>
>> bt: invalid kernel virtual address: 13 type: "stack contents"
>>
>> bt: read of stack at 13 failed
>>
>>
>> bt: invalid kernel virtual address: 12 type: "stack contents"
>>
>> bt: read of stack at 12 failed
>>
>>
>> bt: invalid kernel virtual address: 11 type: "stack contents"
>>
>> bt: read of stack at 11 failed
>>
>>
>> bt: invalid kernel virtual address: 15 type: "stack contents"
>>
>> bt: read of stack at 15 failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>>
>> bt: invalid kernel virtual address: 12 type: "stack contents"
>>
>> bt: read of stack at 12 failed
>>
>>
>> bt: invalid kernel virtual address: 6e type: "stack contents"
>>
>> bt: read of stack at 6e failed
>>
>>
>> bt: invalid kernel virtual address: 22 type: "stack contents"
>>
>> bt: read of stack at 22 failed
>>
>>
>> bt: invalid kernel virtual address: 13 type: "stack contents"
>>
>> bt: read of stack at 13 failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>>
>> bt: invalid kernel virtual address: c type: "stack contents"
>>
>> bt: read of stack at c failed
>>
>>
>> bt: invalid kernel virtual address: c type: "stack contents"
>>
>> bt: read of stack at c failed
>>
>>
>> bt: invalid kernel virtual address: 13 type: "stack contents"
>>
>> bt: read of stack at 13 failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>>
>> bt: invalid kernel virtual address: 11 type: "stack contents"
>>
>> bt: read of stack at 11 failed
>>
>>
>> bt: invalid kernel virtual address: 10 type: "stack contents"
>>
>> bt: read of stack at 10 failed
>>
>>
>> bt: invalid kernel virtual address: c type: "stack contents"
>>
>> bt: read of stack at c failed
>>
>>
>> bt: invalid kernel virtual address: 14 type: "stack contents"
>>
>> bt: read of stack at 14 failed
>>
>>
>> bt: invalid kernel virtual address: 13 type: "stack contents"
>>
>> bt: read of stack at 13 failed
>>
>>
>> bt: invalid kernel virtual address: 18 type: "stack contents"
>>
>> bt: read of stack at 18 failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>>
>> bt: invalid kernel virtual address: e type: "stack contents"
>>
>> bt: read of stack at e failed
>>
>>
>> bt: invalid kernel virtual address: f type: "stack contents"
>>
>> bt: read of stack at f failed
>>
>> KERNEL: vmlinux-2.6.5-7.244-smp
>> DEBUG KERNEL: vmlinux.dbg (2.6.5-7.244-default)
>> DUMPFILE: DUMP10.1.230.112
>> CPUS: 1
>> DATE: Thu Jul 26 14:34:46 2007
>> UPTIME: 213503982284 days, 21:34:00
>> LOAD AVERAGE: 0.01, 0.12, 0.07
>> TASKS: 34
>> NODENAME: linux
>> RELEASE: 2.6.5-7.244-smp
>> VERSION: #1 SMP Mon Dec 12 18:32:25 UTC 2005
>> MACHINE: x86_64 (2793 Mhz)
>> MEMORY: 1015808 GB
>> PANIC: ""
>> PID: 0
>> COMMAND: "
>> "
>> TASK: ffffffff803d2800 (1 of 2) [THREAD_INFO: ffffffff80590000]
>> CPU: 0
>> STATE: TASK_RUNNING (ACTIVE)
>> WARNING: panic task not found
>>
>> crash> bt
>> PID: 0 TASK: ffffffff803d2800 CPU: 0 COMMAND: "<80>^L"
>> #0 [ffffffff80591ef0] schedule at ffffffff801394e4
>> #1 [ffffffff80591f98] default_idle at ffffffff8010f1c0
>> #2 [ffffffff80591fc8] cpu_idle at ffffffff8010f65a
>> crash> ps
>> PID PPID CPU TASK ST %MEM VSZ RSS COMM
>> > 0 0 0 ffffffff803d2800 RU 0.0 4399648058624 4389578663200 [<80>^L]
>> 0 0 0 ffffffff803d2808 ?? 0.0 0 0 [swapper]
>> 1 0 0 10017e1f2c8 ?? 0.1 640 304 init
>> 2 -1 0 10017e1e9a8 ?? 0.0 0 0 [migration/0]
>> 3 -1 0 10017e1e088 ?? 0.0 0 0 [ksoftirqd/0]
>> 4 -1 0 10001b712d8 ?? 0.0 0 0 [events/0]
>> 5 -1 0 10001b709b8 ?? 0.0 0 0 [khelper]
>> 6 -1 0 10001b70098 ?? 0.0 0 0 [kacpid]
>> 25 -1 0 10017dd72e8 ?? 0.0 0 0 [kblockd/0]
>> 47 -1 0 10017dd69c8 ?? 0.0 0 0 [pdflush]
>> 48 -1 0 10017dd60a8 ?? 0.0 0 0 [pdflush]
>> 49 -1 0 100178272f8 ?? 0.0 0 0 [kswapd0]
>> 50 -1 0 100178269d8 ?? 0.0 0 0 [aio/0]
>> 1295 -1 0 100178260b8 ?? 0.0 0 0 [kseriod]
>> 2077 -1 0 10017897308 ?? 0.0 0 0 [reiserfs/0]
>> 2744 -1 0 10014de9488 ?? 0.0 0 0 [khubd]
>> 3077 -1 0 10015aa13c8 ?? 0.2 2560 608 hwscand
>> 3693 -1 0 100164e1348 ?? 0.2 3568 816 syslogd
>> 3696 -1 0 10015180208 ?? 0.3 2744 1112 klogd
>> 3721 -1 0 10015b0eab8 ?? 0.2 3536 628 resmgrd
>> 3722 -1 0 10015e6e1c8 ?? 0.2 4564 640 portmap
>> 3803 -1 0 10015d49368 ?? 0.6 20036 2340 master
>> 3814 -1 0 10015daea58 ?? 0.6 20100 2312 pickup
>> 3815 -1 0 10016c5a0d8 ?? 0.6 20144 2364 qmgr
>> 3861 -1 0 10016ca2a08 ?? 0.7 26800 2932 sshd
>> 4022 -1 0 10014c42b48 ?? 0.2 6804 924 cron
>> 4057 -1 0 100178960c8 ?? 0.2 2484 612 agetty
>> 4058 -1 0 10016c5b318 ?? 0.5 21864 1772 login
>> 4059 -1 0 10016ca3328 ?? 0.2 7012 936 mingetty
>> 4060 -1 0 10015fb5398 ?? 0.2 7012 936 mingetty
>> 4061 -1 0 10014cc6238 ?? 0.2 7012 936 mingetty
>> 4062 -1 0 10015b0f3d8 ?? 0.2 7012 936 mingetty
>> 4063 -1 0 100151e7458 ?? 0.2 7012 936 mingetty
>> 4152 -1 0 10016a180f8 ?? 0.8 12716 2992 bash
>> crash> bt 10015180208
>> PID: 3696 TASK: 10015180208 CPU: 0 COMMAND: "klogd"
>> *bt: invalid kernel virtual address: 12 type: "stack contents"*
>> bt: read of stack at 12 failed
>> crash> task 10015180208
>> PID: 3696 TASK: 10015180208 CPU: 0 COMMAND: "klogd"
>> struct task_struct {
>> *state = 1099873050624,*
>> * thread_info = 0x12,*
>> usage = {
>> counter = 320
>> },
>> flags = 0,
>> ptrace = 502511173631,
>> lock_depth = 120,
>> prio = 0,
>> static_prio = 1048832,
>> run_list = {
>> next = 0x200200,
>> prev = 0x0
>> },
>> array = 0x50fe72e6,
>> sleep_avg = 1,
>> interactive_credit = 67616128664,
>> timestamp = 67616128664,
>> last_ran = 0,
>> activated = 0,
>> policy = 18446744073709551615,
>> cpus_allowed = 18446744073709551615,
>> time_slice = 150,
>> first_time_slice = 0,
>> tasks = {
>> next = 0x10015b0eb48,
>> prev = 0x100164e13d8
>> },
>> ptrace_children = {
>> next = 0x100151802a8,
>> prev = 0x100151802a8
>> },
>> ptrace_list = {
>> next = 0x100151802b8,
>> prev = 0x100151802b8
>> },
>> mm = 0x1001546c500,
>> active_mm = 0x1001546c500,
>> binfmt = 0xffffffff803e70c0,
>> exit_state = 0,
>> exit_code = 0,
>> exit_signal = 17,
>> pdeath_signal = 0,
>> personality = 0,
>> did_exec = 0,
>> pid = 3696,
>> tgid = 3696,
>> real_parent = 0x10017e1f2c0,
>> parent = 0x10017e1f2c0,
>> children = {
>> next = 0x10015180320,
>> prev = 0x10015180320
>> },
>> sibling = {
>> next = 0x10015b0ebe0,
>> prev = 0x100164e1470
>> },
>> group_leader = 0x10015180200,
>> pids = {{
>> pid_chain = {
>> next = 0x10015180370,
>> prev = 0x10015180370
>> },
>> pidptr = 0x10015180360,
>> pid = {
>> nr = 3696,
>> count = {
>> counter = 1
>> },
>> task = 0x10015180200,
>> task_list = {
>> next = 0x10015180348,
>> prev = 0x10015180348
>> },
>> hash_chain = {
>> next = 0x10017827470,
>> prev = 0x10016ca2b80
>> }
>> }
>> }, {
>> pid_chain = {
>> next = 0x100151803b8,
>> prev = 0x100151803b8
>> },
>> pidptr = 0x100151803a8,
>> pid = {
>> nr = 3696,
>> count = {
>> counter = 1
>> },
>> task = 0x10015180200,
>> task_list = {
>> next = 0x10015180390,
>> prev = 0x10015180390
>> },
>> hash_chain = {
>> next = 0x100178274b8,
>> prev = 0x10016ca2bc8
>> }
>> }
>> }, {
>> pid_chain = {
>> next = 0x10015180400,
>> prev = 0x10015180400
>> },
>> pidptr = 0x100151803f0,
>> pid = {
>> nr = 3696,
>> count = {
>> counter = 1
>> },
>> task = 0x10015180200,
>> task_list = {
>> next = 0x100151803d8,
>> prev = 0x100151803d8
>> },
>> hash_chain = {
>> next = 0x10001949240,
>> prev = 0x10016ca2c10
>> }
>> }
>> }, {
>> pid_chain = {
>> next = 0x10015180448,
>> prev = 0x10015180448
>> },
>> pidptr = 0x10015180438,
>> pid = {
>> nr = 3696,
>> count = {
>> counter = 1
>> },
>> task = 0x10015180200,
>> task_list = {
>> next = 0x10015180420,
>> prev = 0x10015180420
>> },
>> hash_chain = {
>> next = 0x10001949340,
>> prev = 0x10016ca2c58
>> }
>> }
>> }},
>> wait_chldexit = {
>> lock = {
>> lock = 1
>> },
>> task_list = {
>> next = 0x10015180470,
>> prev = 0x10015180470
>> }
>> },
>> vfork_done = 0x0,
>> set_child_tid = 0x2a95894b90,
>> clear_child_tid = 0x2a95894b90,
>> rt_priority = 0,
>> it_real_value = 0,
>> it_prof_value = 0,
>> it_virt_value = 0,
>> it_real_incr = 0,
>> it_prof_incr = 0,
>> it_virt_incr = 0,
>> real_timer = {
>> entry = {
>> next = 0x100100,
>> prev = 0x200200
>> },
>> expires = 29143,
>> lock = {
>> lock = 1
>> },
>> magic = 1267182958,
>> function = 0xffffffff80141b50 <it_real_fn>,
>> data = 1099865522688,
>> base = 0x0
>> },
>> utime = 0,
>> stime = 4,
>> cutime = 0,
>> cstime = 0,
>> nvcsw = 13,
>> nivcsw = 2,
>> cnvcsw = 0,
>> cnivcsw = 0,
>> start_time = 53888910424,
>> min_flt = 105,
>> maj_flt = 0,
>> cmin_flt = 0,
>> cmaj_flt = 0,
>> uid = 0,
>> euid = 0,
>> suid = 0,
>> fsuid = 0,
>> gid = 0,
>> egid = 0,
>> sgid = 0,
>> fsgid = 0,
>> group_info = 0xffffffff803e2a00,
>> cap_effective = 4294967039,
>> cap_inheritable = 0,
>> cap_permitted = 4294967039,
>> keep_capabilities = 0,
>> user = 0xffffffff803e29a0,
>> rlim = {{
>> rlim_cur = 18446744073709551615,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 18446744073709551615,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 18446744073709551615,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 8388608,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 0,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 18446744073709551615,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 3071,
>> rlim_max = 3071
>> }, {
>> rlim_cur = 1024,
>> rlim_max = 1024
>> }, {
>> rlim_cur = 18446744073709551615,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 18446744073709551615,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 18446744073709551615,
>> rlim_max = 18446744073709551615
>> }, {
>> rlim_cur = 1024,
>> rlim_max = 1024
>> }, {
>> rlim_cur = 819200,
>> rlim_max = 819200
>> }},
>> used_math = 0,
>> rcvd_sigterm = 0,
>> oomkilladj = 0,
>> comm = "klogd\000roc\000\000\000\000\000\000",
>> link_count = 0,
>> total_link_count = 0,
>> sysvsem = {
>> undo_list = 0x0
>> },
>> thread = {
>> rsp0 = 1099873058120,
>> rsp = 548682070920,
>> userrsp = 182897429248,
>> fs = 0,
>> gs = 0,
>> es = 0,
>> ds = 0,
>> fsindex = 0,
>> gsindex = 0,
>> debugreg0 = 0,
>> debugreg1 = 0,
>> debugreg2 = 0,
>> debugreg3 = 0,
>> debugreg6 = 0,
>> debugreg7 = 0,
>> cr2 = 0,
>> trap_no = 0,
>> error_code = 0,
>> i387 = {
>> fxsave = {
>> cwd = 0,
>> swd = 0,
>> twd = 0,
>> fop = 0,
>> rip = 0,
>> rdp = 281470681751424,
>> mxcsr = 0,
>> mxcsr_mask = 0,
>> st_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0},
>> xmm_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
>> , 0, 0, 0},
>> padding = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0, 0}
>> }
>> },
>> ioperm = 0,
>> io_bitmap_ptr = 0x0,
>> tls_array = {0, 0, 0}
>> },
>> fs = 0x10014c7a180,
>> files = 0x10001a114c0,
>> namespace = 0x100154fb900,
>> signal = 0x10015184600,
>> sighand = 0x0,
>> blocked = {
>> sig = {0}
>> },
>> real_blocked = {
>> sig = {1099865524632}
>> },
>> pending = {
>> list = {
>> next = 0x10015180998,
>> prev = 0x0
>> },
>> signal = {
>> sig = {0}
>> }
>> },
>> sas_ss_sp = 0,
>> sas_ss_size = 0,
>> notifier = 0,
>> notifier_data = 0x0,
>> notifier_mask = 0x0,
>> security = 0x600000005,
>> parent_exec_id = 1,
>> self_exec_id = 1,
>> alloc_lock = {
>> lock = 1
>> },
>> proc_lock = {
>> lock = 0
>> },
>> switch_lock = {
>> lock = 0
>> },
>> journal_info = 0x0,
>> reclaim_state = 0x10015469180,
>> proc_dentry = 0x0,
>> backing_dev_info = 0x10015b40940,
>> io_context = 0x0,
>> ptrace_message = 0,
>> last_siginfo = 0x0,
>> io_wait = 0xac9,
>> rchar = 2292,
>> wchar = 3,
>> syscr = 32,
>> syscw = 475,
>> acct_rss_mem1 = 2743,
>> acct_vm_mem1 = 4,
>> acct_stimexpd = 4294967297,
>> ckrm_tsklock = {
>> lock = 0
>> },
>> ckrm_celock = {
>> lock = 0
>> },
>> ce_data = 0xffffffff804f3f20,
>> taskclass = 0x100164e1bc8,
>> taskclass_link = {
>> next = 0x10015b0f338,
>> prev = 0xffffffff80537940
>> },
>> cpu_class = 0x0,
>> demand_stat = {
>> run = 0,
>> total = 61218488692,
>> last_sleep = 32000000,
>> recalc_interval = 0,
>> cpu_demand = 105133020
>> },
>> delays = {
>> waitcpu_total = 3647587,
>> runcpu_total = 23603870,
>> iowait_total = 0,
>> mem_iowait_total = 4294967311,
>> runs = 0,
>> num_iowaits = 0,
>> num_memwaits = 0,
>> splpar_total = 1431654400
>> },
>> map_base = 0,
>> mempolicy = 0x0,
>> il_next = 0,
>> audit = 0x10
>> }
>>
>> crash>
>>
>>
>> --
>> Crash-utility mailing list
>> Crash-utility(a)redhat.com
>> https://www.redhat.com/mailman/listinfo/crash-utility
>>
>
>
>
>
> ------------------------------
>
>
>
> End of Crash-utility Digest, Vol 23, Issue 8
> ********************************************
>
>
17 years, 2 months
Re: Re: [Crash-utility] User Stack back trace of the process
by Rajesh
I may be posting to the wrong mailing list; if so, kindly guide me.
I have modified the ELF core dump functionality to dump only the text, data and stack segments. I'm not interested in the dynamically allocated memory of the process.
Below is the modification I have made in the "binfmt_elf.c" file.
In the maydump() function I check whether the VMA is mapped to dynamic memory of the process or not.
-------------------------------------------------------
if ((vma->vm_file == NULL) &&
(!((current->mm->start_stack) < vma->vm_end)))
return 0;
-------------------------------------------------------
It works fine for single-threaded processes, but when I take the core dump of a multi-threaded process, I only get the core dump of the process I kill, and in gdb I'm not able to switch between the threads.
Please let me know whether these modifications are correct or not.
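As a rough way to reason about the check above, here is a small user-space model of the same predicate. It is only a sketch: struct vma_model, would_dump() and every address mentioned below are hypothetical stand-ins, not kernel code. One thing it suggests is that any anonymous mapping lying entirely below start_stack is skipped, which would include thread stacks if they are anonymous mmap() regions, as they typically are with the pthread library.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two vm_area_struct fields the check reads. */
struct vma_model {
    bool file_backed;            /* models vma->vm_file != NULL */
    unsigned long long vm_end;   /* models vma->vm_end */
};

/*
 * Mirrors the posted condition:
 *   if ((vma->vm_file == NULL) && !(start_stack < vma->vm_end)) return 0;
 * Returns false where the posted check would return 0 (VMA skipped).
 * Returns true where the check lets the VMA through to the remaining
 * maydump() tests, which are not modelled here.
 */
static bool would_dump(const struct vma_model *vma,
                       unsigned long long start_stack)
{
    if (!vma->file_backed && !(start_stack < vma->vm_end))
        return false;
    return true;
}
```

With a hypothetical start_stack of 0x7fffffffe000, an anonymous region ending at 0x2a95895000 is skipped, while an anonymous region ending at 0x7ffffffff000 (the main stack) is kept. If that matches the multi-threaded case, the stacks of the other threads never reach the core file, which may be part of why gdb cannot show them.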
--Regards,
rajesh
On Wed, 05 Sep 2007 Dave Anderson wrote :
>Rajesh wrote:
>>Sorry in my previous e-mail I mistyped.
>>
>>I want to dump only code and stack segments of a process.
>>
>>--Regards,
>>rajesh
>
>stack segments would have: (vma->vm_flags & VM_GROWSDOWN)
>
>
>>
>>
>>On Wed, 05 Sep 2007 Rajesh wrote :
>> >Hi,
>> >
>> >Is there any way to find using kernel data structure, the VMA of a process belongs to stack or heap. It is easy to distinguish the VMA belongs to code segment or not from vm_area_struct structure, using "vm_flags" variable.
>> >
>> >In "elf_core_dump()" function I'm planning to dump only code and data segments.
>> >
>> >Can any body please guide me...
>> >
>> >--Regards,
>> >rajesh
>> >
>> >On Wed, 05 Sep 2007 Dave Anderson wrote :
>> > >Rajesh wrote:
>> > >>Dave,
>> > >>
>> > >>Thanks for your explanation.
>> > >>
>> > >>Well the reason behind my questions is, we have an application running on customer site and the application consumes around 60GB of system memory.
>> > >>When this process receives the segmentation fault or signal abort, the kernel will start to take the process core dump. Here is the problem. Kernel takes at least 1hr (60-minutes) to come out from core dump. During this time the system is unresponsive (hung), and I feel it is because the system is entering into thrashing due to huge memory usage by the process. This long down time is not acceptable by the customer.
>> > >>
>> > >>So I started to find the better way or tackling the problem.
>> > >>
>> > >>1>First thing we thought is changing the system page size from 4KB to 8KB. Since this change could not be done on our x86_64 architecture, since x86_64 architecture doesnt support multi-page size option.
>> > >>
>> > >>2>We wrote a program using libbfd APIs and used with in our application. Whenever the SIGSEGV or SIGABRT is received by the process it will log the stack trace of all the threads within that process. This feature is not so effective or flexible as compared to process core dump.
>> > >>
>> > >>3>Last we thought of using kcore/vmcore to analyze the cause for SIGSEGV or SIGABRT.
>> > >>
>> > >>4>I have one more thought, making the elf_core_dump() function SMP. This function is responsible for dumping the core, and the function is present in /usr/src/linux/fs/binfmt_elf.c
>> > >>
>> > >>
>> > >>Any comments/ideas are welcome.
>> > >>
>> > >>--Regards,
>> > >>rajesh
>> > >
>> > >Maybe tinker with maydump()?
>> > >
>> > >If you know that the core dump contains the VMA's that are
>> > >not necessary to dump, such as large shared memory segments,
>> > >and you can identify them from the VMA, you can prevent
>> > >them from being copied to the core dump. There's this
>> > >patch floating around, which may have been updated:
>> > >
>> > > http://lkml.org/lkml/2007/2/16/149
>> > >
>> > >Dave
>> > >
>> > >
>> > >
>> > >
>>
>>
>>
>>
>
>
17 years, 2 months
crash doesnt give complete backtrace
by Adhiraj Joshi
Hi All,
I get a kernel panic on my system and I see a long backtrace on the console
before the system freezes. I have kdump set up and analyse the generated
vmcore using crash, but the backtrace from crash is too short and doesn't
give any relevant information. I want the backtrace that appeared on the
console before the freeze.
Any ideas on this?
Regards,
Adhiraj.
17 years, 2 months
[PATCH] fix loop index
by Daisuke Nishimura
Hi.
In x86_xen_kdump_p2m_create(), the same variable (i) is
used as the index of two for-loops, one of which is nested
inside the other.
As a result, if the debug level is 7 or higher, the outer
for-loop runs only once.
This patch fixes this bug.
Thanks.
Daisuke Nishimura.
diff -uprN crash-4.0-4.6.org/x86.c crash-4.0-4.6/x86.c
--- crash-4.0-4.6.org/x86.c 2007-08-28 00:51:11.000000000 +0900
+++ crash-4.0-4.6/x86.c 2007-09-06 10:13:37.000000000 +0900
@@ -4141,9 +4141,9 @@ x86_xen_kdump_p2m_create(struct xen_kdum
if (CRASHDEBUG(7)) {
up = (ulong *)xkd->page;
- for (i = 0; i < 256; i++) {
+ for (j = 0; j < 256; j++) {
fprintf(fp, "%08lx: %08lx %08lx %08lx %08lx\n",
- (ulong)((i * 4) * sizeof(ulong)),
+ (ulong)((j * 4) * sizeof(ulong)),
*up, *(up+1), *(up+2), *(up+3));
up += 4;
}
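The effect described in this patch can be reproduced outside crash with a small sketch. The function names and the outer bound of 4 are hypothetical; only the inner bound of 256 comes from the diff. When the debug loop reuses i, the outer loop sees i == 256 after the first pass and stops.

```c
/* Hypothetical outer bound, for illustration only. */
#define OUTER_FRAMES 4
/* Inner bound taken from the debug loop in the patch. */
#define INNER_ROWS 256

/* Buggy shape: the nested debug loop reuses the outer index i. */
static int outer_passes_shared_index(int debug)
{
    int i, passes = 0;

    for (i = 0; i < OUTER_FRAMES; i++) {
        passes++;
        if (debug) {
            for (i = 0; i < INNER_ROWS; i++)
                ;   /* stands in for the fprintf() of one row */
        }
    }
    return passes;
}

/* Fixed shape: the nested debug loop has its own index j. */
static int outer_passes_separate_index(int debug)
{
    int i, j, passes = 0;

    for (i = 0; i < OUTER_FRAMES; i++) {
        passes++;
        if (debug) {
            for (j = 0; j < INNER_ROWS; j++)
                ;   /* stands in for the fprintf() of one row */
        }
    }
    return passes;
}
```

With debugging on, the shared-index version makes a single outer pass, while the separate-index version makes all four; with debugging off, both make four.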
17 years, 2 months
Re: Re: Re: [Crash-utility] User Stack back trace of the process
by Rajesh
Sorry, I mistyped in my previous e-mail.
I want to dump only the code and stack segments of a process.
--Regards,
rajesh
On Wed, 05 Sep 2007 Rajesh wrote :
>Hi,
>
>Is there any way to find using kernel data structure, the VMA of a process belongs to stack or heap. It is easy to distinguish the VMA belongs to code segment or not from vm_area_struct structure, using "vm_flags" variable.
>
>In "elf_core_dump()" function I'm planning to dump only code and data segments.
>
>Can any body please guide me...
>
>--Regards,
>rajesh
>
>On Wed, 05 Sep 2007 Dave Anderson wrote :
> >Rajesh wrote:
> >>Dave,
> >>
> >>Thanks for your explanation.
> >>
> >>Well the reason behind my questions is, we have an application running on customer site and the application consumes around 60GB of system memory.
> >>When this process receives the segmentation fault or signal abort, the kernel will start to take the process core dump. Here is the problem. Kernel takes at least 1hr (60-minutes) to come out from core dump. During this time the system is unresponsive (hung), and I feel it is because the system is entering into thrashing due to huge memory usage by the process. This long down time is not acceptable by the customer.
> >>
> >>So I started to find the better way or tackling the problem.
> >>
> >>1>First thing we thought is changing the system page size from 4KB to 8KB. Since this change could not be done on our x86_64 architecture, since x86_64 architecture doesnt support multi-page size option.
> >>
> >>2>We wrote a program using libbfd APIs and used with in our application. Whenever the SIGSEGV or SIGABRT is received by the process it will log the stack trace of all the threads within that process. This feature is not so effective or flexible as compared to process core dump.
> >>
> >>3>Last we thought of using kcore/vmcore to analyze the cause for SIGSEGV or SIGABRT.
> >>
> >>4>I have one more thought, making the elf_core_dump() function SMP. This function is responsible for dumping the core, and the function is present in /usr/src/linux/fs/binfmt_elf.c
> >>
> >>
> >>Any comments/ideas are welcome.
> >>
> >>--Regards,
> >>rajesh
> >
> >Maybe tinker with maydump()?
> >
> >If you know that the core dump contains the VMA's that are
> >not necessary to dump, such as large shared memory segments,
> >and you can identify them from the VMA, you can prevent
> >them from being copied to the core dump. There's this
> >patch floating around, which may have been updated:
> >
> > http://lkml.org/lkml/2007/2/16/149
> >
> >Dave
> >
> >
> >
> >
17 years, 2 months
Re: Re: [Crash-utility] User Stack back trace of the process
by Rajesh
Hi,
Is there any way to find out, using kernel data structures, whether a VMA of a process belongs to the stack or the heap? It is easy to tell whether a VMA belongs to the code segment from the vm_area_struct structure, using the "vm_flags" variable.
In the "elf_core_dump()" function I'm planning to dump only code and data segments.
Can anybody please guide me?
--Regards,
rajesh
On Wed, 05 Sep 2007 Dave Anderson wrote :
>Rajesh wrote:
>>Dave,
>>
>>Thanks for your explanation.
>>
>>Well the reason behind my questions is, we have an application running on customer site and the application consumes around 60GB of system memory.
>>When this process receives the segmentation fault or signal abort, the kernel will start to take the process core dump. Here is the problem. Kernel takes at least 1hr (60-minutes) to come out from core dump. During this time the system is unresponsive (hung), and I feel it is because the system is entering into thrashing due to huge memory usage by the process. This long down time is not acceptable by the customer.
>>
>>So I started to find the better way or tackling the problem.
>>
>>1>First thing we thought is changing the system page size from 4KB to 8KB. Since this change could not be done on our x86_64 architecture, since x86_64 architecture doesnt support multi-page size option.
>>
>>2>We wrote a program using libbfd APIs and used with in our application. Whenever the SIGSEGV or SIGABRT is received by the process it will log the stack trace of all the threads within that process. This feature is not so effective or flexible as compared to process core dump.
>>
>>3>Last we thought of using kcore/vmcore to analyze the cause for SIGSEGV or SIGABRT.
>>
>>4>I have one more thought, making the elf_core_dump() function SMP. This function is responsible for dumping the core, and the function is present in /usr/src/linux/fs/binfmt_elf.c
>>
>>
>>Any comments/ideas are welcome.
>>
>>--Regards,
>>rajesh
>
>Maybe tinker with maydump()?
>
>If you know that the core dump contains the VMA's that are
>not necessary to dump, such as large shared memory segments,
>and you can identify them from the VMA, you can prevent
>them from being copied to the core dump. There's this
>patch floating around, which may have been updated:
>
> http://lkml.org/lkml/2007/2/16/149
>
>Dave
>
>
>
>
17 years, 2 months
Re: Re: [Crash-utility] User Stack back trace of the process
by Rajesh
Dave,
Thanks for your explanation.
Well, the reason behind my questions is that we have an application running at a customer site, and the application consumes around 60GB of system memory.
When this process receives a segmentation fault or abort signal, the kernel starts to take the process core dump. Here is the problem: the kernel takes at least 1 hour (60 minutes) to finish the core dump. During this time the system is unresponsive (hung), and I feel this is because the system starts thrashing due to the huge memory usage of the process. This long downtime is not acceptable to the customer.
So I started looking for a better way of tackling the problem.
1>The first thing we thought of was changing the system page size from 4KB to 8KB. This change could not be done on our x86_64 architecture, since x86_64 doesn't support a multiple page size option.
2>We wrote a program using the libbfd APIs and used it within our application. Whenever SIGSEGV or SIGABRT is received by the process, it logs the stack trace of all the threads within that process. This feature is not as effective or flexible as a process core dump.
3>Last, we thought of using kcore/vmcore to analyze the cause of the SIGSEGV or SIGABRT.
4>I have one more thought: making the elf_core_dump() function SMP. This function is responsible for dumping the core, and it is present in /usr/src/linux/fs/binfmt_elf.c
Any comments/ideas are welcome.
--Regards,
rajesh
>
>Rajesh,
>
>Castor's patch/suggestion is the best/only option you have
>for this kind of thing. I've not tried it, but since the
>crash utility's "vm -p" option delineates where each
>instantiated page of a given task is located, it's potentially
>possible to recreate an ELF core file of the specified
>task. (Any swapped-out pages won't be in the vmcore...)
>
>The embedded gdb module inside of crash is invoked internally
>as "gdb vmlinux", and has no clue about any other user-space
>program.
>
>That being said, you can execute the gdb "add-symbol-file"
>command to load the debuginfo data from a user space
>program, and then examine user-space data from the context
>of that program.
>
>For example, when you run the crash utility on a live system,
>the default context is that of the "crash" utility itself:
>
> $ ./crash
>
> crash 4.0-4.6
> Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...
>
> KERNEL: /boot/vmlinux-2.4.21-37.ELsmp
> DEBUGINFO: /usr/lib/debug/boot/vmlinux-2.4.21-37.ELsmp.debug
> DUMPFILE: /dev/mem
> CPUS: 2
> DATE: Tue Sep 4 16:36:53 2007
> UPTIME: 15 days, 08:15:06
> LOAD AVERAGE: 0.14, 0.06, 0.01
> TASKS: 87
> NODENAME: crash.boston.redhat.com
> RELEASE: 2.4.21-37.ELsmp
> VERSION: #1 SMP Wed Sep 7 13:28:55 EDT 2005
> MACHINE: i686 (1993 Mhz)
> MEMORY: 511.5 MB
> PID: 9381
> COMMAND: "crash"
> TASK: dd63c000
> CPU: 1
> STATE: TASK_RUNNING (ACTIVE)
> crash>
>
>Verify the current context:
>
> crash> set
> PID: 9381
> COMMAND: "crash"
> TASK: dd63c000
> CPU: 0
> STATE: TASK_RUNNING (ACTIVE)
> crash>
>
>So, for example, the crash utility has a program_context
>data structure that starts like this:
>
> struct program_context {
> char *program_name; /* this program's name */
> char *program_path; /* unadulterated argv[0] */
> char *program_version; /* this program's version */
> char *gdb_version; /* embedded gdb version */
> char *prompt; /* this program's prompt */
> unsigned long long flags; /* flags from above */
> char *namelist; /* linux namelist */
> ...
>
>And it declares a data variable with the same name:
>
> struct program_context program_context = { 0 };
>
>If I wanted to see a gdb-style dump of its contents, I can
>do this:
>
> crash> add-symbol-file ./crash
> add symbol table from file "./crash" at
> Reading symbols from ./crash...done.
> crash>
>
>Now the embedded gdb has the debuginfo data from the crash
>object file (which was compiled with -g), and it knows where
>the program_context structure is located in user space:
>
> crash> p &program_context
> $1 = (struct program_context *) 0x8391ea0
> crash>
>
>Since 0x8391ea0 is not a kernel address, the "p" command cannot
>be used to display the data structure. However, the crash
>utility's "struct" command has a little-used "-u" option, which
>indicates that the address that follows is a user-space address
> from the current context:
>
> crash> struct program_context -u 0x8391ea0
> struct program_context {
> program_name = 0xbffff9b0 "crash",
> program_path = 0xbffff9ae "./crash",
> program_version = 0x82e9c12 "4.0-4.6",
> gdb_version = 0x834ecdf "6.1",
> prompt = 0x8400438 "crash> ",
> flags = 844424965983303,
> namelist = 0x83f5940 "/boot/vmlinux-2.4.21-37.ELsmp",
> ...
>
>That all being said, this capability cannot be used to generate
>any kind of user-space backtrace. You can do raw reads of the
>user-space stack, say from the point at which it entered kernel
>space, but whether that's of any help depends upon what you're
>looking for.
>
>Dave
>
17 years, 2 months