bug in cmdline.c
by Bruce Korb
In exec_input_file():
if (!(pc->flags & SILENT)) {
    fprintf(fp, "%s%s", pc->prompt, buf);
This "fp" variable needs to be "stdout".
The prompting and echoing of input commands needs to go there,
not wherever "fp" is currently pointing (crash command output).
$ diff -u *~ cmdline.c
--- cmdline.c~ 2012-02-03 11:22:33.000000000 -0800
+++ cmdline.c 2012-02-15 16:51:07.209524248 -0800
@@ -1372,10 +1372,8 @@
if (!(argcnt = parse_line(pc->command_line, args)))
continue;
- if (!(pc->flags & SILENT)) {
- fprintf(fp, "%s%s", pc->prompt, buf);
- fflush(fp);
- }
+ if (!(pc->flags & SILENT))
+ printf("%s%s", pc->prompt, buf);
exec_command();
}
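To illustrate the distinction drawn in this report, here is a minimal sketch; it is not code from crash itself. The helper name echo_input_line and its parameters are hypothetical. The only point is that the prompt echo is written to a stream chosen by the caller (stdout in the proposed fix) rather than to whatever stream currently receives command output.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of crash: echo a command read from an
 * input file to `echo_stream`. In the proposed fix that stream is always
 * stdout, even when command output (crash's "fp") has been redirected.
 * Returns the number of characters written, or 0 when silent.
 */
int echo_input_line(FILE *echo_stream, const char *prompt,
                    const char *line, int silent)
{
    int n;

    if (silent)
        return 0;

    n = fprintf(echo_stream, "%s%s", prompt, line);
    fflush(echo_stream);
    return n;
}
```

A caller would pass stdout as the first argument, so the echoed command stays on the terminal while the command's own output follows any redirection.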
12 years, 9 months
[PATCH V3] Add -C option for search
by zhangyanfei
Hello Dave,
The new patch is attached.
I simplified the display_with_pre_and_post() function by calling the
currently-existing display_memory() function, and made the output
readable according to your advice.
Thanks
Zhang Yanfei
12 years, 9 months
Re: [Crash-utility] bt: cannot determine starting stack pointer
by Dave Anderson
----- Original Message -----
> On 02/15/12 06:36, Dave Anderson wrote:
>
> I'm not too surprised. In the world of back-end clustered storage systems,
> updating systems is a massive security/stability concern. Consequently,
> newfangled stuff from less than a decade ago gets incorporated slowly. :)
>
> Analysis tools, however, can be (and are!!) updated.
>
> > That being said, it's news to me that backtraces cannot be generated
> > for the active tasks from LKCD dumpfiles, unless it's some kind of
> > "live dump" or something? Was there a panic or oops? What's the
> > last thing shown by the "log" command?
>
> Yes, it is a live dump, if that's what you mean by a crash dump.
OK, yes that's what I meant. And that's unfortunate...
> Figuring out why ptlrpc_invalidate_import() is struggling is what I signed up for
> learning how to do. Coercing crash into giving me stack traces for live/onproc
> processes is what I was hoping you would please be kind enough to help me figure out.
> My solution is the script (attached) that requires me to type four commands:
>
> > crash> ! bash live-bt.sh
> > crash> < c-cmd
> > crash> < c-cmd
> > crash> < c-cmd
That's about the best you can do. The task->stack pointer holds a
reference to the last time the task blocked in schedule(), but
the active tasks are either in user-space, or have re-entered the
kernel for another purpose. If you can find something useful in
their stacks, then go for it -- and good luck!
Dave
12 years, 9 months
bt: cannot determine starting stack pointer
by Bruce Korb
Hi,
I need the stack traces of the tasks that are on-proc as well as the
tasks that are not. "bt" fails for the on-proc tasks, even though there
is a backup mechanism for finding the stack: the "stack" field of the
task structure. Even if it is a bit out-of-date, it is better than an
"I dunno" message. Perhaps augment the stack trace with a "this
might be slightly out-of-date because the task was running when
the kernel crashed" message.
Example:
crash> foreach bt
[...]
PID: 20311 TASK: ffff8803ff654140 CPU: 9 COMMAND: "xtnhc"
bt: cannot determine starting stack pointer
[...]
crash> ps | egrep '^>'
> 0 0 4 ffff880205f6b0c0 RU 0.0 0 0 [swapper]
> 0 0 5 ffff880205f77870 RU 0.0 0 0 [swapper]
> 0 0 7 ffff880205d557f0 RU 0.0 0 0 [swapper]
> 0 0 10 ffff880205d5c080 RU 0.0 0 0 [swapper]
> 2982 2 11 ffff8801fd3b07f0 RU 0.0 0 0 [ldlm_cb_00]
> 2983 2 8 ffff880205548080 RU 0.0 0 0 [ldlm_cb_01]
> 20250 20245 1 ffff880202deb0c0 RU 0.0 82388 2372 fcntl17
> 20251 20245 2 ffff88020537b7b0 RU 0.0 82388 2396 fcntl17
> 20252 20245 3 ffff8801fd3b4770 RU 0.0 82388 2376 fcntl17
> 20264 20249 0 ffff8801fd444830 RU 0.0 0 0 fcntl17
> 20290 1 6 ffff8803fe86f7b0 RU 0.0 14044 516 xtnhc
> 20311 20305 9 ffff8803ff654140 RU 0.0 14044 516 xtnhc
crash> set ffff8803ff654140
PID: 20311
COMMAND: "xtnhc"
TASK: ffff8803ff654140 [THREAD_INFO: ffff8803fd85a000]
CPU: 9
STATE: TASK_RUNNING (ACTIVE)
crash> p task->stack
p: gdb request failed: p task->stack
crash> task
PID: 20311 TASK: ffff8803ff654140 CPU: 9 COMMAND: "xtnhc"
struct task_struct {
state = 0,
stack = 0xffff8803fd85a000,
[...]
crash> bt -S 0xffff8803fd85a000
PID: 20311 TASK: ffff8803ff654140 CPU: 9 COMMAND: "xtnhc"
#0 [ffff8803fd85a000] schedule at ffffffff81297bc5
#1 [ffff8803fd85b830] ldlm_resource_get at ffffffffa0269380 [ptlrpc]
#2 [ffff8803fd85b900] ldlm_lock_match at ffffffffa0267359 [ptlrpc]
#3 [ffff8803fd85ba10] mdc_revalidate_lock at ffffffffa0423a8e [mdc]
#4 [ffff8803fd85bac0] mdc_intent_lock at ffffffffa042723f [mdc]
#5 [ffff8803fd85bbc0] __ll_inode_revalidate_it at ffffffffa04a79c2 [lustre]
#6 [ffff8803fd85bcf0] ll_inode_permission at ffffffffa04a8266 [lustre]
#7 [ffff8803fd85bd90] inode_permission at ffffffff810f0a09
#8 [ffff8803fd85bda0] may_open at ffffffff810f14d7
#9 [ffff8803fd85bdd0] do_filp_open at ffffffff810f5294
#10 [ffff8803fd85bf20] do_sys_open at ffffffff810e5850
#11 [ffff8803fd85bf70] sys_open at ffffffff810e596b
#12 [ffff8803fd85bf80] system_call_fastpath at ffffffff81002eab
RIP: 00007ffff78f2f80 RSP: 00007fffffffd818 RFLAGS: 00010202
RAX: 0000000000000002 RBX: ffffffff81002eab RCX: 00000000006130f0
RDX: 00000000000001b6 RSI: 0000000000000000 RDI: 000000000060f960
RBP: 0000000000000008 R8: 0000000000000008 R9: 0000000000000001
R10: 000000000040a261 R11: 0000000000000246 R12: ffffffff810e596b
R13: ffff8803fd85bf78 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
crash>
12 years, 9 months
[PATCH] ARM: fix unwinding on recent kernels
by Rabin Vincent
Unwinding doesn't work on recent ARM kernels since after the following
commit the kernel doesn't perform the prel31_to_addr() conversion of the
offsets in the index table. This leads to crash not finding the correct
unwind instructions.
http://git.kernel.org/linus/de66a979012dbc66b1ec0125795a3f79ee667b8a
The patch below makes crash do the conversion itself if necessary.
Rabin
diff --git a/unwind_arm.c b/unwind_arm.c
index d86ec63..e804cfb 100644
--- a/unwind_arm.c
+++ b/unwind_arm.c
@@ -71,6 +71,8 @@ struct unwind_table {
static struct unwind_table *kernel_unwind_table;
static struct unwind_table *module_unwind_tables;
+static int index_in_prel31;
+
struct unwind_ctrl_block {
ulong vrs[16];
ulong insn;
@@ -104,6 +106,7 @@ static int is_core_kernel_text(ulong);
static struct unwind_table *search_table(ulong);
static struct unwind_idx *search_index(const struct unwind_table *, ulong);
static ulong prel31_to_addr(ulong, ulong);
+static void index_prel31_to_addr(struct unwind_table *);
static int unwind_frame(struct stackframe *, ulong);
/*
@@ -187,6 +190,8 @@ init_kernel_unwind_table(void)
goto fail;
}
+ index_in_prel31 = !is_kernel_text(kernel_unwind_table->idx[0].addr);
+
kernel_unwind_table->start = kernel_unwind_table->idx;
kernel_unwind_table->end = (struct unwind_idx *)
((char *)kernel_unwind_table->idx + idx_size);
@@ -194,6 +199,9 @@ init_kernel_unwind_table(void)
kernel_unwind_table->end_addr = (kernel_unwind_table->end - 1)->addr;
kernel_unwind_table->kv_base = idx_start;
+ if (index_in_prel31)
+ index_prel31_to_addr(kernel_unwind_table);
+
if (CRASHDEBUG(1)) {
fprintf(fp, "UNWIND: master kernel table start\n");
fprintf(fp, "UNWIND: size : %ld\n", idx_size);
@@ -260,6 +268,9 @@ read_module_unwind_table(struct unwind_table *tbl, ulong addr)
tbl->end_addr = TABLE_VALUE(buf, unwind_table_end_addr);
tbl->kv_base = idx_start;
+ if (index_in_prel31)
+ index_prel31_to_addr(tbl);
+
if (CRASHDEBUG(1)) {
fprintf(fp, "UNWIND: module table start\n");
fprintf(fp, "UNWIND: start : %p\n", tbl->start);
@@ -571,6 +582,16 @@ prel31_to_addr(ulong addr, ulong insn)
return addr + offset;
}
+static void
+index_prel31_to_addr(struct unwind_table *tbl)
+{
+ struct unwind_idx *idx = tbl->start;
+ ulong kvaddr = tbl->kv_base;
+
+ for (; idx < tbl->end; idx++, kvaddr += sizeof(struct unwind_idx))
+ idx->addr = prel31_to_addr(kvaddr, idx->addr);
+}
+
static int
unwind_frame(struct stackframe *frame, ulong stacktop)
{
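For readers unfamiliar with the prel31 encoding, the conversion applied to each index entry can be sketched as follows. This is a standalone model under stated assumptions, not the prel31_to_addr() from unwind_arm.c: the name prel31_decode is hypothetical, and 32-bit fixed-width types stand in for crash's ulong. A prel31 word stores a signed 31-bit offset relative to the address of the word itself, which is why the patch passes both the entry's kernel virtual address and the stored value.

```c
#include <stdint.h>

/*
 * Decode a prel31 value: bits 0..30 hold a signed offset relative to the
 * address of the word that contains it; bit 31 is not part of the offset.
 * `place` is the 32-bit virtual address of the word, `word` its contents.
 */
uint32_t prel31_decode(uint32_t place, uint32_t word)
{
    /* Keep the low 31 bits, then sign-extend from bit 30. */
    int64_t offset = (int64_t)(word & 0x7fffffffu);

    if (word & 0x40000000u)
        offset -= (int64_t)1 << 31;

    /* ARM addresses are 32 bits wide, so wrap the sum to 32 bits. */
    return (uint32_t)((int64_t)place + offset);
}
```

For instance, a word holding 0x7ffffff0 at address 0xc0400000 encodes an offset of -16 and so refers to 0xc03ffff0.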
12 years, 9 months
[PATCH] s390dbf: Print only ASCII characters in hex_ascii view
by Michael Holzheu
Hi Dave,
Currently the hex_ascii view also displays non-ASCII characters. Example:
$ s390dbf test hex_ascii
00 01328703733:110640 1 - 01 0000000000114288 fb 63 ff fb fc | �c���
To make the output more readable, we should print only ASCII characters.
Signed-off-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
---
s390dbf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/s390dbf.c
+++ b/s390dbf.c
@@ -419,10 +419,10 @@ hex_ascii_format_fn(debug_info_t * id, d
rc += sprintf(out_buf + rc, "| ");
for (i = 0; i < id->buf_size; i++) {
unsigned char c = in_buf[i];
- if (!isprint(c))
- rc += sprintf(out_buf + rc, ".");
- else
+ if (isascii(c) && isprint(c))
rc += sprintf(out_buf + rc, "%c", c);
+ else
+ rc += sprintf(out_buf + rc, ".");
}
rc += sprintf(out_buf + rc, "\n");
out:
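The filtering rule in the patch can be tried in isolation with a small sketch like the one below. The helper name ascii_only is hypothetical and is not part of s390dbf.c. It uses an explicit range check (c < 0x80) in place of isascii(), which should be equivalent for unsigned char input.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical standalone helper: copy `len` bytes from `in` to `out`,
 * replacing every byte that is not printable 7-bit ASCII with '.'.
 * `out` must have room for len + 1 bytes (the result is NUL-terminated).
 */
void ascii_only(const unsigned char *in, size_t len, char *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        /* c < 0x80 plays the role of isascii(c) in the patch. */
        out[i] = (c < 0x80 && isprint(c)) ? (char)c : '.';
    }
    out[len] = '\0';
}
```

Applied to the five bytes from the example above (fb 63 ff fb fc), this yields ".c...".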
12 years, 9 months
crash tool not working
by Adil Mujeeb
Hi List,
I am new to the crash / kdump tools and am facing the problem described below.
I am following the Linux Kernel Crash Book (http://www.dedoimedo.com/)
and the page http://www.dedoimedo.com/computers/crash.html.
I am building a modified kernel source tree (2.6.32 based) with my own
modules added (for study purposes). Building, installing and booting the
kernel is successful. I have enabled the kdump options mentioned in
the book:
Enable Kexec system call:
CONFIG_KEXEC=y
Enable kernel crash dumps:
CONFIG_CRASH_DUMP=y
Optional: Disable Symmetric Multi-Processing (SMP) support
CONFIG_SMP=y
Enable sysfs file system support:
CONFIG_SYSFS=y
Enable /proc/vmcore support:
CONFIG_PROC_VMCORE=y
Configure the kernel with debug info:
CONFIG_DEBUG_INFO=y
Configure the start section for reserved RAM for the crash kernel:
CONFIG_PHYSICAL_START=0x200000 (2MB)
Configure kdump kernel so it can be identified:
CONFIG_LOCALVERSION="-crash"
Kdump configuration /etc/sysconfig/kdump:
KDUMP_KERNELVER=""
KDUMP_COMMANDLINE=""
KDUMP_COMMANDLINE_APPEND="maxcpus=1 "
KEXEC_OPTIONS=""
KDUMP_IMMEDIATE_REBOOT="yes"
KDUMP_TRANSFER=""
KDUMP_SAVEDIR="file:///var/crash"
KDUMP_KEEP_OLD_DUMPS="5"
KDUMP_FREE_DISK_SIZE="64"
KDUMP_VERBOSE="3"
KDUMP_DUMPLEVEL="0"
KDUMP_DUMPFORMAT="compressed"
There is no KDUMP_DUMPDEV option.
There is no KDUMP_RUNLEVEL option.
I booted successfully with this kernel and crashed it deliberately from
a module. After rebooting, I found that a vmcore had been generated under
/var/crash/, but I am not able to analyze it with the crash command.
linux:/home/adil # cat /proc/cmdline
root=/dev/disk/by-id/ata-WDC_WD800BD-22LRA1_WD-WMAM9ZS19445-part1 resume=/dev/disk/by-id/ata-WDC_WD800BD-22LRA1_WD-WMAM9ZS19445-part2 splash=silent crashkernel=256M-:128M vga=0x31a
linux:/home/adil #
linux:/home/adil # crash /boot/System.map-2.6.32.12-crash-crash /boot/vmlinuz-2.6.32.12-crash-crash
crash 5.0.1
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
crash: /boot/vmlinuz-2.6.32.12-crash-crash: not a supported file format
Usage:
crash [-h [opt]][-v][-s][-i file][-d num] [-S] [mapfile] [namelist] [dumpfile]
Enter "crash -h" for details.
linux:/home/adil #
The URL http://www.dedoimedo.com/computers/crash.html mentioned that
"The newer versions of Kdump can work with compressed kernel images.
Furthermore, they copy the System map file and the kernel image into
the crash directory, making the use of crash utility somewhat
simpler."
linux:/home/adil # ls -al /var/crash/2012-01-30-18\:08/
total 1335536
drwxr-xr-x 2 root root 4096 2012-01-30 18:13 .
drwxr-xr-x 8 root root 4096 2012-01-31 12:17 ..
-rw-r--r-- 1 root root 187 2012-01-30 18:13 README.txt
-rw-r--r-- 1 root root 1716605 2012-01-30 18:13 System.map-2.6.32.12-0.7-default
-rw------- 1 root root 1360732590 2012-01-30 18:13 vmcore
-rw-r--r-- 1 root root 3774506 2012-01-30 18:13 vmlinux-2.6.32.12-0.7-default.gz
linux:/home/adil # ls -al /var/crash/2012-01-31-12\:17/
total 1343860
drwxr-xr-x 2 root root 4096 2012-01-31 12:24 .
drwxr-xr-x 8 root root 4096 2012-01-31 12:17 ..
-rw-r--r-- 1 root root 187 2012-01-31 12:24 README.txt
-rw------- 1 root root 1374748735 2012-01-31 12:24 vmcore
linux:/home/adil #
linux:/home/adil # ls /boot/
backup_mbr
boot
boot.readme
config-2.6.32.12-0.7-default
config-2.6.32.12-0.7-xen
grub
initrd
initrd-2.6.32.12-0.7-default
initrd-2.6.32.12-0.7-default-kdump
initrd-2.6.32.12-0.7-xen
initrd-2.6.32.12-crash-crash
initrd-2.6.32.12-crash-crash-kdump
initrd-xen
message
symsets-2.6.32.12-0.7-default.tar.gz
symtypes-2.6.32.12-0.7-default.gz
symvers-2.6.32.12-0.7-default.gz
symvers-2.6.32.12-0.7-xen.gz
System.map-2.6.32.12-0.7-default
System.map-2.6.32.12-0.7-xen
System.map-2.6.32.12-crash-crash
vmlinux-2.6.32.12-0.7-xen.gz
vmlinuz
vmlinuz-2.6.32.12-0.7-default
vmlinuz-2.6.32.12-0.7-xen
vmlinuz-2.6.32.12-crash-crash
vmlinuz-xen
vmlinux-2.6.32.12-0.7-default.gz
linux:/home/adil #
Another observation: boot.kdump appears to be enabled, but starting it
manually gives an error:
linux:/home/adil # chkconfig boot.kdump
boot.kdump on
linux:/home/adil #
linux:/home/adil # /etc/init.d/boot.kdump start
Loading kdump
Regenerating kdump initrd ...
Can't find kernel text map area from kcore
Cannot load /boot/vmlinuz-2.6.32.12-crash-crash
failed
linux:/home/adil #
Another query: the following statement in the book, under
"section 11.2 Crash (capture) kernel", is not clear to me:
--------------
This means that while your production kernels will most likely be
named vmlinuz, the Kdump crash kernels need to be uncompressed,
hence named vmlinux, or rather vmlinux-kdump.
---------------
Please help me set up and use crash correctly on my machine.
Thank you,
Adil
12 years, 9 months
RFE: feedback loops
by Bruce Korb
A relatively easy implementation would be to fiddle shell commands:
if (LASTCHAR(p) == '|')
error(FATAL_RESTART, "pipe to nowhere?\n");
to treat a pipe to nowhere as, instead, a redirect to a mkstemp file
that gets read back in via an internal:
sprintf(pc->command_line, "< %s", tmp_file);
after the command completes, and
unlink(tmp_file);
after *those* commands complete.
What you already have works. It's that this would be slick :).
It would make shell scripting into a crash extension language.
12 years, 9 months
More fixes for kmem on slabs
by Bob Montgomery
More testing revealed a machine in our stable that either failed to
initialize kmem:
please wait... (gathering kmem slab cache data)
crash-6.0.3: page excluded: kernel virtual address: ffff8801263d6000 type: "kmem_cache buffer"
crash-6.0.3: unable to initialize kmem slab cache subsystem
Or succeeded in initializing and then failed on a kmem -s command:
crash-6.0.3> kmem -s
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
Segmentation fault
The problem is that the array at the end of struct kmem_cache remains declared
with 32 elements, but in all dynamically allocated copies it is actually trimmed
down to nr_cpu_ids elements.
crash-6.0.3.best> struct kmem_cache
struct kmem_cache {
unsigned int batchcount;
...
struct list_head next;
struct kmem_list3 **nodelists;
struct array_cache *array[32];
}
SIZE: 368
On my normal play machine, nr_cpu_ids = 32 and actual cpus = 16.
On the failing machine, nr_cpu_ids and actual cpus are both 2.
Two problems occur:
1) max_cpudata_limit traverses the array until it finds a 0x0 or
reaches the real size. On the 2-cpu system, the "third" element in the
array belonged elsewhere, was non-zero, and pointed to data that caused
the apparent limit to be 0xffffffffffff8801, which didn't work well as
a length in a memcopy.
2) kmem_cache structs can be allocated near enough to the edge of a page
that the old incorrect length crosses the page boundary, even though the
real smaller structure fits in the page. That caused a readmem of the
structure to cross into a coincidentally missing page in the dump.
This patch fixes both of those (after wrestling ARRAY_LENGTH to the
ground), but *does not* fix the similar page crossing problem when I try
to use a "struct kmem_cache" command on the particular structure at the
end of the page.
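The idea behind both fixes, bounding the traversal and the read by nr_cpu_ids rather than by the declared array length, can be sketched as below. This is an illustration only and not the patch itself: struct kmem_cache_model is a simplified stand-in for the kernel structure, and the helper names are hypothetical.

```c
#include <stddef.h>

struct array_cache;                     /* opaque for this sketch */

/* Simplified stand-in for struct kmem_cache, declared with NR_CPUS = 32. */
struct kmem_cache_model {
    unsigned int batchcount;
    void *nodelists;
    struct array_cache *array[32];      /* must stay the last member */
};

/*
 * Size of a dynamically allocated copy whose trailing array was trimmed
 * to nr_cpu_ids slots. Reading only this many bytes avoids running past
 * the end of the page that holds the real, smaller structure.
 */
size_t trimmed_kmem_cache_size(unsigned int nr_cpu_ids)
{
    return offsetof(struct kmem_cache_model, array)
         + nr_cpu_ids * sizeof(struct array_cache *);
}

/*
 * Count the per-cpu entries, stopping at the first NULL or at nr_cpu_ids,
 * whichever comes first, so that slots belonging to a neighboring object
 * are never interpreted as array_cache pointers.
 */
unsigned int count_cpudata(struct array_cache *const *array,
                           unsigned int nr_cpu_ids)
{
    unsigned int i;

    for (i = 0; i < nr_cpu_ids && array[i] != NULL; i++)
        ;
    return i;
}
```

With nr_cpu_ids = 2, as on the failing machine, the traversal never looks at the "third" element, and the computed size is smaller than the declared sizeof.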
Reference this unfortunate comment in include/linux/slab_def.h:
/* 6) per-cpu/per-node data, touched during every alloc/free */
/*
* We put array[] at the end of kmem_cache, because we want to size
* this array to nr_cpu_ids slots instead of NR_CPUS
* (see kmem_cache_init())
* We still use [NR_CPUS] and not [1] or [0] because cache_cache
* is statically defined, so we reserve the max number of cpus.
*/
struct kmem_list3 **nodelists;
struct array_cache *array[NR_CPUS];
/*
* Do not add fields after array[]
*/
};
Bob Montgomery
12 years, 9 months