February 2012 - Crash-utility - Crash Utility List Archives

by Bruce Korb

In exec_input_file():

	if (!(pc->flags & SILENT)) {
		fprintf(fp, "%s%s", pc->prompt, buf);

This "fp" variable needs to be "stdout". The prompting and echoing of input commands need to go there, not wherever "fp" is currently pointing (crash command output).

$ diff -u *~ cmdline.c
--- cmdline.c~ 2012-02-03 11:22:33.000000000 -0800
+++ cmdline.c 2012-02-15 16:51:07.209524248 -0800
@@ -1372,10 +1372,8 @@
 		if (!(argcnt = parse_line(pc->command_line, args)))
 			continue;
 
-		if (!(pc->flags & SILENT)) {
-			fprintf(fp, "%s%s", pc->prompt, buf);
-			fflush(fp);
-		}
+		if (!(pc->flags & SILENT))
+			printf("%s%s", pc->prompt, buf);
 
 		exec_command();
 	}
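
The separation the patch asks for can be sketched on its own. This is only an illustration under stated assumptions, not crash's code: echo_input_line() and run_one_command() are hypothetical names, and the two FILE pointers stand in for stdout and the redirectable "fp".

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of crash): echo a command that was read
 * from an input file. The echo always goes to echo_stream (stdout in
 * the patch above), never to the redirectable command-output stream.
 * Returns the number of characters written, or 0 in silent mode.
 */
static int
echo_input_line(FILE *echo_stream, int silent, const char *prompt,
		const char *line)
{
	if (silent)
		return 0;
	return fprintf(echo_stream, "%s%s", prompt, line);
}

/*
 * Hypothetical driver: the prompt echo and the command's own output are
 * written to different streams, so redirecting cmd_out to a file or a
 * pipe never captures the prompt.
 */
static void
run_one_command(FILE *echo_stream, FILE *cmd_out, const char *prompt,
		const char *line)
{
	echo_input_line(echo_stream, 0, prompt, line);
	fprintf(cmd_out, "output of: %s", line);
}
```

Calling run_one_command(stdout, some_file, "crash> ", "bt\n") would show the echoed command on the terminal while only the command output lands in some_file.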

13 years, 4 months

[PATCH V3] Add -C option for search

by zhangyanfei

Hello Dave, The new patch is attached. I simplified the display_with_pre_and_post() function by calling the currently-existing display_memory() function, and made the output readable according to your advice. Thanks Zhang Yanfei

13 years, 4 months

[PATCH] add arm support for libgcore

by Lei Wen

Hi,

Currently the crash utility supports extracting a core dump image from the original kdump file:
http://people.redhat.com/anderson/extensions/gcore_help_gcore.html

But it only supports x86 and x86_64 now. I have added one patch that ports it to ARM and another that fixes a minor bug in the original implementation.

Best regards,
Lei

13 years, 4 months

Re: [Crash-utility] bt: cannot determine starting stack pointer

by Dave Anderson

----- Original Message -----
> On 02/15/12 06:36, Dave Anderson wrote:
>
> I'm not too surprised. In the world of back-end clustered storage systems,
> updating systems is a massive security/stability concern. Consequently,
> new fangled stuff from less than a decade ago get incorporated slowly. :)
>
> Analysis tools, however, can be (and are!!) updated.
>
> > That being said, it's news to me that backtraces cannot be generated
> > for the active tasks from LKCD dumpfiles, unless it's some kind of
> > "live dump" or something? Was there a panic or oops? What's the
> > last thing shown by the "log" command?
>
> Yes, it is a live dump, if that's what you mean by a crash dump.

OK, yes that's what I meant. And that's unfortunate...

> Figuring out why ptlrpc_invalidate_import() is struggling is what I signed up for
> learning how to do. Coercing crash into giving me stack traces for live/onproc
> processes is what I was hoping you would please be kind enough to help me figure out.
> My solution is the script (attached) that requires me to type four commands:
>
> > crash> ! bash live-bt.sh
> > crash> < c-cmd
> > crash> < c-cmd
> > crash> < c-cmd

That's about the best you can do. The task->stack pointer holds a reference to the last time the task blocked in schedule(), but the active tasks are either in user-space, or have re-entered the kernel for another purpose. If you can find something useful in their stacks, then go for it -- and good luck!

Dave

13 years, 4 months

bt: cannot determine starting stack pointer

by Bruce Korb

Hi,

I need the stack traces of the tasks that are on-proc as well as the tasks that are not. "bt" fails for the on-proc tasks, even though there is a backup mechanism for finding the stack: the "stack" field of the task structure. Even if it is a bit out-of-date, it is better than an "I dunno" message. Perhaps augment the stack trace with a "this might be slightly out-of-date because the task was running when the kernel crashed" message.

Example:

crash> foreach bt
[...]
PID: 20311  TASK: ffff8803ff654140  CPU: 9   COMMAND: "xtnhc"
bt: cannot determine starting stack pointer
[...]
crash> ps | egrep '^>'
>      0      0   4  ffff880205f6b0c0  RU   0.0       0      0  [swapper]
>      0      0   5  ffff880205f77870  RU   0.0       0      0  [swapper]
>      0      0   7  ffff880205d557f0  RU   0.0       0      0  [swapper]
>      0      0  10  ffff880205d5c080  RU   0.0       0      0  [swapper]
>   2982      2  11  ffff8801fd3b07f0  RU   0.0       0      0  [ldlm_cb_00]
>   2983      2   8  ffff880205548080  RU   0.0       0      0  [ldlm_cb_01]
>  20250  20245   1  ffff880202deb0c0  RU   0.0   82388   2372  fcntl17
>  20251  20245   2  ffff88020537b7b0  RU   0.0   82388   2396  fcntl17
>  20252  20245   3  ffff8801fd3b4770  RU   0.0   82388   2376  fcntl17
>  20264  20249   0  ffff8801fd444830  RU   0.0       0      0  fcntl17
>  20290      1   6  ffff8803fe86f7b0  RU   0.0   14044    516  xtnhc
>  20311  20305   9  ffff8803ff654140  RU   0.0   14044    516  xtnhc
crash> set ffff8803ff654140
    PID: 20311
COMMAND: "xtnhc"
   TASK: ffff8803ff654140  [THREAD_INFO: ffff8803fd85a000]
    CPU: 9
  STATE: TASK_RUNNING (ACTIVE)
crash> p task->stack
p: gdb request failed: p task->stack
crash> task
PID: 20311  TASK: ffff8803ff654140  CPU: 9   COMMAND: "xtnhc"
struct task_struct {
  state = 0,
  stack = 0xffff8803fd85a000,
[...]
crash> bt -S 0xffff8803fd85a000
PID: 20311  TASK: ffff8803ff654140  CPU: 9   COMMAND: "xtnhc"
 #0 [ffff8803fd85a000] schedule at ffffffff81297bc5
 #1 [ffff8803fd85b830] ldlm_resource_get at ffffffffa0269380 [ptlrpc]
 #2 [ffff8803fd85b900] ldlm_lock_match at ffffffffa0267359 [ptlrpc]
 #3 [ffff8803fd85ba10] mdc_revalidate_lock at ffffffffa0423a8e [mdc]
 #4 [ffff8803fd85bac0] mdc_intent_lock at ffffffffa042723f [mdc]
 #5 [ffff8803fd85bbc0] __ll_inode_revalidate_it at ffffffffa04a79c2 [lustre]
 #6 [ffff8803fd85bcf0] ll_inode_permission at ffffffffa04a8266 [lustre]
 #7 [ffff8803fd85bd90] inode_permission at ffffffff810f0a09
 #8 [ffff8803fd85bda0] may_open at ffffffff810f14d7
 #9 [ffff8803fd85bdd0] do_filp_open at ffffffff810f5294
#10 [ffff8803fd85bf20] do_sys_open at ffffffff810e5850
#11 [ffff8803fd85bf70] sys_open at ffffffff810e596b
#12 [ffff8803fd85bf80] system_call_fastpath at ffffffff81002eab
    RIP: 00007ffff78f2f80  RSP: 00007fffffffd818  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffffffff81002eab  RCX: 00000000006130f0
    RDX: 00000000000001b6  RSI: 0000000000000000  RDI: 000000000060f960
    RBP: 0000000000000008   R8: 0000000000000008   R9: 0000000000000001
    R10: 000000000040a261  R11: 0000000000000246  R12: ffffffff810e596b
    R13: ffff8803fd85bf78  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b
crash>

13 years, 4 months

[PATCH] ARM: fix unwinding on recent kernels

by Rabin Vincent

Unwinding doesn't work on recent ARM kernels since after the following commit the kernel doesn't perform the prel31_to_addr() conversion of the offsets in the index table. This leads to crash not finding the correct unwind instructions.

http://git.kernel.org/linus/de66a979012dbc66b1ec0125795a3f79ee667b8a

The patch below makes crash do the conversion itself if necessary.

Rabin

diff --git a/unwind_arm.c b/unwind_arm.c
index d86ec63..e804cfb 100644
--- a/unwind_arm.c
+++ b/unwind_arm.c
@@ -71,6 +71,8 @@ struct unwind_table {
 static struct unwind_table *kernel_unwind_table;
 static struct unwind_table *module_unwind_tables;
 
+static int index_in_prel31;
+
 struct unwind_ctrl_block {
 	ulong vrs[16];
 	ulong insn;
@@ -104,6 +106,7 @@ static int is_core_kernel_text(ulong);
 static struct unwind_table *search_table(ulong);
 static struct unwind_idx *search_index(const struct unwind_table *, ulong);
 static ulong prel31_to_addr(ulong, ulong);
+static void index_prel31_to_addr(struct unwind_table *);
 static int unwind_frame(struct stackframe *, ulong);
 
 /*
@@ -187,6 +190,8 @@ init_kernel_unwind_table(void)
 		goto fail;
 	}
 
+	index_in_prel31 = !is_kernel_text(kernel_unwind_table->idx[0].addr);
+
 	kernel_unwind_table->start = kernel_unwind_table->idx;
 	kernel_unwind_table->end = (struct unwind_idx *)
 		((char *)kernel_unwind_table->idx + idx_size);
@@ -194,6 +199,9 @@ init_kernel_unwind_table(void)
 	kernel_unwind_table->end_addr = (kernel_unwind_table->end - 1)->addr;
 	kernel_unwind_table->kv_base = idx_start;
 
+	if (index_in_prel31)
+		index_prel31_to_addr(kernel_unwind_table);
+
 	if (CRASHDEBUG(1)) {
 		fprintf(fp, "UNWIND: master kernel table start\n");
 		fprintf(fp, "UNWIND: size : %ld\n", idx_size);
@@ -260,6 +268,9 @@ read_module_unwind_table(struct unwind_table *tbl, ulong addr)
 	tbl->end_addr = TABLE_VALUE(buf, unwind_table_end_addr);
 	tbl->kv_base = idx_start;
 
+	if (index_in_prel31)
+		index_prel31_to_addr(tbl);
+
 	if (CRASHDEBUG(1)) {
 		fprintf(fp, "UNWIND: module table start\n");
 		fprintf(fp, "UNWIND: start : %p\n", tbl->start);
@@ -571,6 +582,16 @@ prel31_to_addr(ulong addr, ulong insn)
 	return addr + offset;
 }
 
+static void
+index_prel31_to_addr(struct unwind_table *tbl)
+{
+	struct unwind_idx *idx = tbl->start;
+	ulong kvaddr = tbl->kv_base;
+
+	for (; idx < tbl->end; idx++, kvaddr += sizeof(struct unwind_idx))
+		idx->addr = prel31_to_addr(kvaddr, idx->addr);
+}
+
 static int
 unwind_frame(struct stackframe *frame, ulong stacktop)
 {
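
For readers unfamiliar with the prel31 encoding, the conversion can be sketched in isolation. This is a stand-alone illustration, not the code in unwind_arm.c: prel31_decode(), struct idx_entry and index_decode_all() are hypothetical names, and fixed 32-bit types are assumed because the target is 32-bit ARM. A prel31 word holds a signed 31-bit offset relative to the address of the word itself.

```c
#include <stdint.h>

/*
 * Decode one prel31 word (illustrative stand-in for prel31_to_addr()).
 * word_addr is the kernel virtual address where the word is stored;
 * word is its content. Bits 0..30 are a signed offset, bit 31 is ignored.
 */
static uint32_t
prel31_decode(uint32_t word_addr, uint32_t word)
{
	uint32_t offset = word & 0x7fffffffu;

	/* Sign-extend from bit 30 into bit 31. */
	if (offset & 0x40000000u)
		offset |= 0x80000000u;

	/* Unsigned wrap-around gives the same result as signed addition. */
	return word_addr + offset;
}

/* Hypothetical stand-in for struct unwind_idx: two 32-bit words. */
struct idx_entry {
	uint32_t addr;
	uint32_t insn;
};

/*
 * Convert every addr field of an index table in place. kv_base is the
 * kernel virtual address of the first entry, so each entry is decoded
 * relative to its own location, as in index_prel31_to_addr() above.
 */
static void
index_decode_all(struct idx_entry *start, struct idx_entry *end,
		 uint32_t kv_base)
{
	struct idx_entry *idx;
	uint32_t kvaddr = kv_base;

	for (idx = start; idx < end; idx++, kvaddr += sizeof(*idx))
		idx->addr = prel31_decode(kvaddr, idx->addr);
}
```

The patch decides whether this conversion is needed by checking whether the first index entry already looks like a kernel text address; if it does not, the table is assumed to still be in prel31 form.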

13 years, 5 months

[PATCH] s390dbf: Print only ASCII characters in hex_ascii view

by Michael Holzheu

Hi Dave,

Currently the hex_ascii view also displays non-ASCII characters.

Example:

$ s390dbf test hex_ascii
00 01328703733:110640 1 - 01 0000000000114288 fb 63 ff fb fc | �c��

To make the output more readable we should only print ASCII characters.

Signed-off-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
---
 s390dbf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/s390dbf.c
+++ b/s390dbf.c
@@ -419,10 +419,10 @@ hex_ascii_format_fn(debug_info_t * id, d
 	rc += sprintf(out_buf + rc, "| ");
 	for (i = 0; i < id->buf_size; i++) {
 		unsigned char c = in_buf[i];
-		if (!isprint(c))
-			rc += sprintf(out_buf + rc, ".");
-		else
+		if (isascii(c) && isprint(c))
 			rc += sprintf(out_buf + rc, "%c", c);
+		else
+			rc += sprintf(out_buf + rc, ".");
 	}
 	rc += sprintf(out_buf + rc, "\n");
 out:
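
The filtering rule in the patch can be tried outside of s390dbf.c with a small sketch. The helper names ascii_or_dot() and render_ascii() are made up for this illustration, and the explicit range check is an assumption standing in for isascii(), which is a POSIX function rather than ISO C.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Return the byte itself if it is printable 7-bit ASCII, otherwise '.'.
 * The "c < 0x80" test stands in for isascii(); without it, isprint()
 * may accept bytes above 0x7f in some locales.
 */
static char
ascii_or_dot(unsigned char c)
{
	return (c < 0x80 && isprint(c)) ? (char)c : '.';
}

/*
 * Render len raw bytes into out as a NUL-terminated string.
 * out must have room for len + 1 bytes. Returns out.
 */
static char *
render_ascii(const unsigned char *in, size_t len, char *out)
{
	size_t i;

	for (i = 0; i < len; i++)
		out[i] = ascii_or_dot(in[i]);
	out[len] = '\0';
	return out;
}
```

For the five bytes in the example above (fb 63 ff fb fc), this renders ".c...", since only 0x63 ('c') is printable ASCII.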

13 years, 5 months

crash tool not working

by Adil Mujeeb

Hi List,

I am new to the crash / kdump tools and am facing the problem described below. I am referring to the Linux Kernel Crash Book (http://www.dedoimedo.com/) and the URL http://www.dedoimedo.com/computers/crash.html.

I am building a modified kernel source (2.6.32 based) and have added my own modules (for study purposes). Building, installing and booting the kernel is successful. I have enabled the options for kdump as mentioned in the book:

Enable Kexec system call: CONFIG_KEXEC=y
Enable kernel crash dumps: CONFIG_CRASH_DUMP=y
Optional: Disable Symmetric Multi-Processing (SMP) support: CONFIG_SMP=y
Enable sysfs file system support: CONFIG_SYSFS=y
Enable /proc/vmcore support: CONFIG_PROC_VMCORE=y
Configure the kernel with debug info: CONFIG_DEBUG_INFO=y
Configure the start section for reserved RAM for the crash kernel: CONFIG_PHYSICAL_START=0x200000 (2MB)
Configure kdump kernel so it can be identified: CONFIG_LOCALVERSION="-crash"

Kdump configuration /etc/sysconfig/kdump:

KDUMP_KERNELVER=""
KDUMP_COMMANDLINE=""
KDUMP_COMMANDLINE_APPEND="maxcpus=1 "
KEXEC_OPTIONS=""
KDUMP_IMMEDIATE_REBOOT="yes"
KDUMP_TRANSFER=""
KDUMP_SAVEDIR="file:///var/crash"
KDUMP_KEEP_OLD_DUMPS="5"
KDUMP_FREE_DISK_SIZE="64"
KDUMP_VERBOSE="3"
KDUMP_DUMPLEVEL="0"
KDUMP_DUMPFORMAT="compressed"

There is no KDUMP_DUMPDEV option.
There is no KDUMP_RUNLEVEL option.

I booted successfully with this kernel and tried to crash it with a module. After rebooting, I found that a vmcore had been generated under /var/crash/, but I am not able to analyze it with the crash command.

linux:/home/adil # cat /proc/cmdline
root=/dev/disk/by-id/ata-WDC_WD800BD-22LRA1_WD-WMAM9ZS19445-part1 resume=/dev/disk/by-id/ata-WDC_WD800BD-22LRA1_WD-WMAM9ZS19445-part2 splash=silent crashkernel=256M-:128M vga=0x31a
linux:/home/adil #
linux:/home/adil # crash /boot/System.map-2.6.32.12-crash-crash /boot/vmlinuz-2.6.32.12-crash-crash

crash 5.0.1
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

crash: /boot/vmlinuz-2.6.32.12-crash-crash: not a supported file format

Usage:
  crash [-h [opt]][-v][-s][-i file][-d num] [-S] [mapfile] [namelist] [dumpfile]

Enter "crash -h" for details.
linux:/home/adil #

The URL http://www.dedoimedo.com/computers/crash.html mentions that "The newer versions of Kdump can work with compressed kernel images. Furthermore, they copy the System map file and the kernel image into the crash directory, making the use of crash utility somewhat simpler."

linux:/home/adil # ls -al /var/crash/2012-01-30-18\:08/
total 1335536
drwxr-xr-x 2 root root       4096 2012-01-30 18:13 .
drwxr-xr-x 8 root root       4096 2012-01-31 12:17 ..
-rw-r--r-- 1 root root        187 2012-01-30 18:13 README.txt
-rw-r--r-- 1 root root    1716605 2012-01-30 18:13 System.map-2.6.32.12-0.7-default
-rw------- 1 root root 1360732590 2012-01-30 18:13 vmcore
-rw-r--r-- 1 root root    3774506 2012-01-30 18:13 vmlinux-2.6.32.12-0.7-default.gz
linux:/home/adil # ls -al /var/crash/2012-01-31-12\:17/
total 1343860
drwxr-xr-x 2 root root       4096 2012-01-31 12:24 .
drwxr-xr-x 8 root root       4096 2012-01-31 12:17 ..
-rw-r--r-- 1 root root        187 2012-01-31 12:24 README.txt
-rw------- 1 root root 1374748735 2012-01-31 12:24 vmcore
linux:/home/adil #
linux:/home/adil # ls /boot/
backup_mbr
boot
boot.readme
config-2.6.32.12-0.7-default
config-2.6.32.12-0.7-xen
grub
initrd
initrd-2.6.32.12-0.7-default
initrd-2.6.32.12-0.7-default-kdump
initrd-2.6.32.12-0.7-xen
initrd-2.6.32.12-crash-crash
initrd-2.6.32.12-crash-crash-kdump
initrd-xen
message
symsets-2.6.32.12-0.7-default.tar.gz
symtypes-2.6.32.12-0.7-default.gz
symvers-2.6.32.12-0.7-default.gz
symvers-2.6.32.12-0.7-xen.gz
System.map-2.6.32.12-0.7-default
System.map-2.6.32.12-0.7-xen
System.map-2.6.32.12-crash-crash
vmlinux-2.6.32.12-0.7-xen.gz
vmlinuz
vmlinuz-2.6.32.12-0.7-default
vmlinuz-2.6.32.12-0.7-xen
vmlinuz-2.6.32.12-crash-crash
vmlinuz-xen
vmlinux-2.6.32.12-0.7-default.gz
linux:/home/adil #

Another observation: boot.kdump seems to be on, but starting it manually gives me an error:

linux:/home/adil # chkconfig boot.kdump
boot.kdump on
linux:/home/adil #
linux:/home/adil # /etc/init.d/boot.kdump start
Loading kdump
Regenerating kdump initrd ...
Can't find kernel text map area from kcore
Cannot load /boot/vmlinuz-2.6.32.12-crash-crash
failed
linux:/home/adil #

Other query: the following passage from the book, under "section 11.2 Crash (capture) kernel", is not clear to me:

--------------
This means that while your production kernels will most likely be named vmlinuz, the Kdump crash kernels need to be uncompressed, hence named vmlinux, or rather vmlinux-kdump.
---------------

Please help me set up and use crash correctly on my machine.

Thank you,
Adil

13 years, 5 months

RFE: feedback loops

by Bruce Korb

A relatively easy implementation would be to fiddle shell commands:

	if (LASTCHAR(p) == '|')
		error(FATAL_RESTART, "pipe to nowhere?\n");

to interpret pipes to nowhere to, instead, be redirecting to a mkstemp file that gets read back in via an internal:

	sprintf(pc->command_line, "< %s", tmp_file);

after the command completes, and

	unlink(tmp_file);

after *those* commands complete.

What you already have works. It's just that this would be slick :). It would make shell scripting into a crash extension language.
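
The temp-file round trip proposed here could look roughly like the sketch below. It is only an outline under stated assumptions, not a patch against crash: prepare_feedback() and finish_feedback() are hypothetical helpers, the /tmp/crash_fb_XXXXXX template is an arbitrary choice, and wiring the descriptor into crash's pipe handling is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary file that would receive the
 * output of a command ending in "|", and build the "< file" command
 * line that reads that output back in as crash input.
 * Returns the open descriptor, or -1 on failure.
 */
static int
prepare_feedback(char *tmp_file, size_t tmp_len,
		 char *command_line, size_t cmd_len)
{
	int fd, n;

	n = snprintf(tmp_file, tmp_len, "/tmp/crash_fb_XXXXXX");
	if (n < 0 || (size_t)n >= tmp_len)
		return -1;

	/* mkstemp() replaces the XXXXXX suffix and opens the file. */
	fd = mkstemp(tmp_file);
	if (fd < 0)
		return -1;

	n = snprintf(command_line, cmd_len, "< %s", tmp_file);
	if (n < 0 || (size_t)n >= cmd_len) {
		close(fd);
		unlink(tmp_file);
		return -1;
	}
	return fd;
}

/* Hypothetical cleanup, run after the fed-back commands complete. */
static void
finish_feedback(int fd, const char *tmp_file)
{
	close(fd);
	unlink(tmp_file);
}
```

The shell pipeline's output would be written to the returned descriptor, the generated "< file" line would then be executed as an ordinary input-file command, and finish_feedback() would remove the file afterwards.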

13 years, 5 months

More fixes for kmem on slabs

by Bob Montgomery

More testing revealed a machine in our stable that either failed to initialize kmem:

please wait... (gathering kmem slab cache data)
crash-6.0.3: page excluded: kernel virtual address: ffff8801263d6000 type: "kmem_cache buffer"
crash-6.0.3: unable to initialize kmem slab cache subsystem

Or succeeded on initialization and then failed on a kmem -s command:

crash-6.0.3> kmem -s
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
Segmentation fault

The problem is that the array struct at the end of kmem_cache remains declared as 32 elements, but for all dynamically allocated copies, is actually trimmed down to nr_cpu_ids in length.

crash-6.0.3.best> struct kmem_cache
struct kmem_cache {
    unsigned int batchcount;
    ...
    struct list_head next;
    struct kmem_list3 **nodelists;
    struct array_cache *array[32];
}
SIZE: 368

On my normal play machine, nr_cpu_ids = 32 and actual cpus = 16. On the failing machine, nr_cpu_ids and actual cpus are both 2.

Two problems occur:

1) max_cpudata_limit traverses the array until it finds a 0x0 or reaches the real size. On the 2-cpu system, the "third" element in the array belonged elsewhere, was non-zero, and pointed to data that caused the apparent limit to be 0xffffffffffff8801, which didn't work well as a length in a memcopy.

2) kmem_cache structs can be allocated near enough to the edge of a page that the old incorrect length crosses the page boundary, even though the real smaller structure fits in the page. That caused a readmem of the structure to cross into a coincidentally missing page in the dump.

This patch fixes both of those (after wrestling ARRAY_LENGTH to the ground), but *does not* fix the similar page crossing problem when I try to use a "struct kmem_cache" command on the particular structure at the end of the page.

Reference this unfortunate comment in include/linux/slab_def.h:

/* 6) per-cpu/per-node data, touched during every alloc/free */
	/*
	 * We put array[] at the end of kmem_cache, because we want to size
	 * this array to nr_cpu_ids slots instead of NR_CPUS
	 * (see kmem_cache_init())
	 * We still use [NR_CPUS] and not [1] or [0] because cache_cache
	 * is statically defined, so we reserve the max number of cpus.
	 */
	struct kmem_list3 **nodelists;
	struct array_cache *array[NR_CPUS];
	/*
	 * Do not add fields after array[]
	 */
};

Bob Montgomery
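
The arithmetic behind both problems can be sketched as follows. This is not the attached patch: trimmed_struct_size() and count_cpu_slots() are hypothetical helpers, and the sketch assumes array[] is the last member with no trailing padding and that pointers are 8 bytes, as on x86_64.

```c
#include <stddef.h>

/*
 * Size of a dynamically allocated kmem_cache whose trailing array[] was
 * trimmed from declared_slots to nr_cpu_ids entries. Assumes array[] is
 * the final member, so its offset is declared_size minus the array size.
 */
static size_t
trimmed_struct_size(size_t declared_size, size_t declared_slots,
		    size_t slot_size, size_t nr_cpu_ids)
{
	size_t array_offset = declared_size - declared_slots * slot_size;

	if (nr_cpu_ids > declared_slots)
		nr_cpu_ids = declared_slots;
	return array_offset + nr_cpu_ids * slot_size;
}

/*
 * Count the leading non-zero per-cpu pointers without ever reading past
 * nr_cpu_ids slots, so memory that belongs to a neighbouring object is
 * not mistaken for an array_cache pointer.
 */
static size_t
count_cpu_slots(const unsigned long *array, size_t nr_cpu_ids)
{
	size_t i;

	for (i = 0; i < nr_cpu_ids && array[i] != 0; i++)
		;
	return i;
}
```

With the numbers from the message (SIZE: 368, 32 declared slots) and 8-byte pointers, a 2-cpu system would only need 368 - 32*8 + 2*8 = 128 bytes per structure, which is why reading the full declared size can run off the end of the page.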

13 years, 5 months
