[ANNOUNCE] crash gcore command, version 1.3.2 is released
by HATAYAMA Daisuke
This is the release of crash gcore command, version 1.3.2.
This release includes a fix for the issue reported by Eric Ewanco and
fixes for some bugs found on the 4.8 kernel.
ChangeLog:
- Fix a segmentation fault caused by a NULL pointer dereference. The
symbol old_rsp was renamed to rsp_scratch in commit
ac9af4983e77765a642b5a21086bc1fdc55418c4, a rename prompted by commit
263042e4630a85e856b4a8cd72f28dab33ef4741, which changed where the
user stack pointer is saved in the syscall path from
thread_struct::usersp to pt_regs at the bottom of the kernel stack.
(Eric.Ewanco(a)genband.com, d.hatayama(a)jp.fujitsu.com)
- Fix a runtime error reported as "invalid structure member
offset: thread_struct_fs", caused by the renaming of the fs/gs members
of thread_struct on x86 to fsbase/gsbase. Without this fix, gcore
exits abnormally without producing a core file.
(d.hatayama(a)jp.fujitsu.com)
- Fix a segmentation fault caused by a NULL pointer dereference that
resulted from a buffer overrun while copying floating-point register
values into a buffer allocated on the stack. The overrun occurred when
the detected size of the floating-point register values was larger
than the prepared buffer. This fix makes the copy more fail-safe, so
that at minimum a wrongly detected data structure size no longer
makes the gcore process terminate abnormally.
(d.hatayama(a)jp.fujitsu.com)
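
The last ChangeLog entry describes a copy that overran a fixed-size stack buffer because the detected source size was too large. A minimal sketch of one fail-safe approach is to clamp the copy length to the destination capacity. This is an illustration only, not the code in the 1.3.2 release; copy_regs_bounded and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from crash-gcore-command.
 * Copies at most dst_size bytes, even if the detected source size is
 * larger, and returns the number of bytes actually copied.
 */
static size_t copy_regs_bounded(void *dst, size_t dst_size,
                                const void *src, size_t src_size)
{
    size_t n;

    assert(dst != NULL && src != NULL);

    /* Clamp to the destination capacity so the buffer cannot overrun. */
    n = src_size < dst_size ? src_size : dst_size;
    memcpy(dst, src, n);
    return n;
}
```

A caller can compare the return value with the detected size and print a warning when they differ, instead of letting the process crash.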
MD5 CheckSum:
$ md5sum ./crash-gcore-command-1.3.2.tar.gz
41c33802ed5bf7efe1058982ed973e16 ./crash-gcore-command-1.3.2.tar.gz
--
Thanks.
HATAYAMA, Daisuke
[PATCH 1/2] mips: fix missing note check
by Rabin Vincent
From: Rabin Vincent <rabinv(a)axis.com>
Add a missing continue after we check if note is NULL. Otherwise we
proceed and dereference the NULL pointer and segfault after printing the
"cannot find NT_PRSTATUS note for cpu" warning.
---
mips.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mips.c b/mips.c
index 5f74b7c..30e6255 100644
--- a/mips.c
+++ b/mips.c
@@ -939,9 +939,11 @@ static int mips_get_elf_notes(void)
else if (KDUMP_DUMPFILE())
note = netdump_get_prstatus_percpu(i);
- if (!note)
+ if (!note) {
error(WARNING,
"cannot find NT_PRSTATUS note for cpu: %d\n", i);
+ continue;
+ }
len = sizeof(Elf32_Nhdr);
len = roundup(len + note->n_namesz, 4);
--
2.1.4
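
The pattern this patch restores can be sketched in isolation: a loop that warns about a missing per-CPU note has to skip the rest of the iteration, or the code that follows dereferences NULL. The sketch below is a simplified stand-in; struct note and sum_note_namesz are hypothetical and are not the crash utility's types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ELF note header; not crash's own type. */
struct note {
    int namesz;
};

/*
 * Sum the name sizes of all per-CPU notes, skipping CPUs whose note is
 * missing. The number of skipped CPUs is returned through *skipped.
 */
static int sum_note_namesz(struct note *const notes[], int ncpus, int *skipped)
{
    int total = 0;
    int i;

    assert(skipped != NULL);
    *skipped = 0;

    for (i = 0; i < ncpus; i++) {
        const struct note *note = notes[i];

        if (!note) {
            /* Real code would print a warning here. */
            (*skipped)++;
            continue;   /* without this, note->namesz below would fault */
        }
        total += note->namesz;
    }
    return total;
}
```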
[PATCH] Fix module init for initcall crash
by Rabin Vincent
From: Rabin Vincent <rabinv(a)axis.com>
If the kernel crashed while running a module's initcall, then
mod->init_size is not zero, and in this case crash fails while gathering
module symbol data with:
crash: store_module_symbols_v2: total: 7 mcnt: 8
This seems to be because store_module_symbols_v2 will add pseudo-symbols
for MODULE_INIT_START and MODULE_INIT_END, while the "total" calculation
in module_init() doesn't account for this.
For reference, a log with -d8:
please wait... (gathering module symbol data)module: c00fc5c0
<readmem: c00fc5c0, KVADDR, "module struct", 384, (ROE|Q), 8701800>
<readmem: 80540000, KVADDR, "pgd page", 16384, (FOE), a0c3ec8>
<read_ramdump: addr: 80540000 paddr: 540000 cnt: 16384>
read_ramdump: addr: 80540000 paddr: 540000 cnt: 16384 offset: 540000
<readmem: 72b0000, PHYSADDR, "page table", 16384, (FOE), a0c7ed0>
<read_ramdump: addr: 72b0000 paddr: 72b0000 cnt: 16384>
read_ramdump: addr: 0 paddr: 72b0000 cnt: 16384 offset: 72b0000
<read_ramdump: addr: c00fc5c0 paddr: 722c5c0 cnt: 384>
read_ramdump: addr: c00fc5c0 paddr: 722c5c0 cnt: 384 offset: 722c5c0
FREEBUF(0)
GETBUF(384 -> 0)
<readmem: c00fc5c0, KVADDR, "module buffer", 384, (FOE), 8701800>
<read_ramdump: addr: c00fc5c0 paddr: 722c5c0 cnt: 384>
read_ramdump: addr: c00fc5c0 paddr: 722c5c0 cnt: 384 offset: 722c5c0
c00fc5c0 (c00fc000): null_blk syms: 0 gplsyms: 0 ksyms: 5
GETBUF(2031 -> 1)
<readmem: c00fc000, KVADDR, "module (kallsyms)", 2031, (ROE|Q), 8704000>
<read_ramdump: addr: c00fc000 paddr: 722c000 cnt: 2031>
read_ramdump: addr: c00fc000 paddr: 722c000 cnt: 2031 offset: 722c000
GETBUF(4140 -> 2)
<readmem: c0104000, KVADDR, "module init (kallsyms)", 4140, (ROE|Q), 870e000>
<read_ramdump: addr: c0104000 paddr: 7154000 cnt: 4140>
read_ramdump: addr: c0104000 paddr: 7154000 cnt: 4140 offset: 7154000
null_set_queue_mode: st_name: 1 st_value: c00fc000 st_shndx: 2 st_info: t
null_set_irqmode: st_name: 21 st_value: c00fc048 st_shndx: 2 st_info: t
null_exit: st_name: 38 st_value: c00fc090 st_shndx: 6 st_info: t
cleanup_module: st_name: 48 st_value: c00fc090 st_shndx: 6 st_info: t
FREEBUF(2)
FREEBUF(1)
FREEBUF(0)
crash: store_module_symbols_v2: total: 7 mcnt: 8
---
kernel.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel.c b/kernel.c
index 9019cf5..bdd0d05 100644
--- a/kernel.c
+++ b/kernel.c
@@ -3475,6 +3475,7 @@ module_init(void)
total += nsyms;
total += 2; /* store the module's start/ending addresses */
+ total += 2; /* and the init start/ending addresses */
/*
* If the module has kallsyms, set up to grab them as well.
--
2.1.4
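
The accounting mismatch described above can be sketched as two counters that must agree: the slots reserved up front and the slots consumed when symbols are stored. This is a reading of the description, not the crash source; slots_reserved and slots_stored are hypothetical helpers, and the assumption that the init pseudo-symbols are stored only when init_size is non-zero comes from the patch description.

```c
#include <assert.h>

/* Slots reserved up front for one module (hypothetical helper). */
static int slots_reserved(int nsyms)
{
    assert(nsyms >= 0);
    return nsyms
        + 2   /* module start/end pseudo-symbols */
        + 2;  /* init start/end pseudo-symbols, reserved unconditionally */
}

/*
 * Slots consumed when the symbols are stored (hypothetical helper).
 * Assumption: the init pseudo-symbols appear only when init_size != 0.
 */
static int slots_stored(int nsyms, unsigned int init_size)
{
    assert(nsyms >= 0);
    return nsyms + 2 + (init_size != 0 ? 2 : 0);
}
```

With the extra two slots reserved, the stored count can never exceed the reservation, whether or not the module was still in its init phase.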
gcore: Segmentation fault due to renaming of old_rsp symbol in kernel
by Eric Ewanco
I am trying to use gcore to generate a user application core from a kernel dump file. I compiled the latest crash-7.1.6 and crash-gcore-command-1.3.1 from https://people.redhat.com/anderson/. I installed a debug kernel (vmlinux-4.1.34-33-debug.gz from openSUSE Leap 42.1) and did a controlled (sysrq-trigger) crash. When I attempt to use gcore on the process in question, after reading <https://people.redhat.com/anderson/extensions/gcore_help_gcore.html>, I get a segmentation fault:
eje-code:~ # crash /boot/vmlinux-4.1.34-33-debug.gz /var/crash/2016-10-31-17\:01//vmcore
crash 7.1.6
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /boot/vmlinux-4.1.34-33-debug.gz
DUMPFILE: /var/crash/2016-10-31-17:01//vmcore
CPUS: 4
DATE: Mon Oct 31 13:01:36 2016
UPTIME: 02:12:08
LOAD AVERAGE: 0.00, 0.00, 0.00
TASKS: 204
NODENAME: eje-code
RELEASE: 4.1.34-33-debug
VERSION: #1 SMP Thu Oct 20 08:03:29 UTC 2016 (fe18aba)
MACHINE: x86_64 (2094 Mhz)
MEMORY: 4 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 3260
COMMAND: "crashtest"
TASK: ffff88011a020550 [THREAD_INFO: ffff8800bcd98000]
CPU: 3
STATE: TASK_RUNNING (SYSRQ)
crash> extend /usr/lib64/crash/extensions/gcore.so
/usr/lib64/crash/extensions/gcore.so: shared object loaded
crash> gcore -f 0 -v 7 3260
gcore: Opening file core.3260.crashtest ...
gcore: done.
gcore: Writing ELF header ...
gcore: done.
gcore: Retrieving and writing note information ...
Segmentation fault
Sixty-four bytes of core get written before the segmentation fault (I'm guessing that's the ELF header). I can gcore some other processes (although I get many "gcore: WARNING: page fault at 7ffca6a5d000" errors). I tried this both with an echo from bash from the command line and a custom test program that just does a controlled crash in a function nested four deep. The segmentation fault sometimes causes a hang (which I can end with Ctrl-C).
It does the same thing if I specify the task address (in this case, "gcore ffff88011a020550"). I've tried it without any options, too, and with different combinations.
I obtained a core dump of gcore and this is my debugging session:
eje-code:~ # gdb /usr/lib64/crash/extensions/gcore.so /var/core/core.eje-code-crash-3074
GNU gdb (GDB; openSUSE Leap 42.1) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.opensuse.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib64/crash/extensions/gcore.so...done.
warning: core file may not match specified executable file. [Not sure why ...]
[New LWP 3074]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `crash /boot/vmlinux-4.1.34-33-debug.gz /var/crash/2016-10-31-17:01//vmcore'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.19-17.4.x86_64 liblzma5-debuginfo-5.0.5-3.5.x86_64 libncurses5-debuginfo-5.9-53.4.x86_64 libz1-debuginfo-1.2.8-6.4.x86_64
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007f1235eed4e4 in restore_regs_syscall_context (target=0x6939df8, regs=0xf6f280, active_regs=0x7ffefa968880)
at libgcore/gcore_x86.c:1656
#2 0x00007f1235eedcb6 in genregs_get (target=0x6939df8, regset=0x7f12360f6460 <x86_64_regsets>, size=216,
buf=0xf6f280) at libgcore/gcore_x86.c:1795
#3 0x00007f1235ee6438 in fill_write_thread_core_info (fp=0x59efb10, tc=0x6939df8, dump_tc=0x6939df8, info=0xf6ee80,
view=0x7f12360f5d80 <x86_64_regset_view>, offset=0x7ffefa968ab0, total=0xf6ee98) at libgcore/gcore_coredump.c:469
#4 0x00007f1235ee682c in fill_write_note_info (fp=0x59efb10, info=0xf6ee80, phnum=20, offset=0x7ffefa968ab0)
at libgcore/gcore_coredump.c:566
#5 0x00007f1235ee4dd1 in gcore_coredump () at libgcore/gcore_coredump.c:112
#6 0x00007f1235eeeb8b in do_gcore (arg=0x0) at gcore.c:317
#7 0x00007f1235eee926 in cmd_gcore () at gcore.c:253
#8 0x0000000000472b8c in ?? ()
#9 0x0000000000000000 in ?? ()
(gdb) up
#1 0x00007f1235eed4e4 in restore_regs_syscall_context (target=0x6939df8, regs=0xf6f280, active_regs=0x7ffefa968880)
at libgcore/gcore_x86.c:1656
1656 regs->sp = gxt->get_old_rsp(target->processor);
(gdb) print gxt
$1 = (struct gcore_x86_table *) 0x215ea0 <gcore_x86_table>
(gdb) print *target
$2 = {task = 18446612137045525840, thread_info = 18446612135482589184, pid = 3260, comm = "crashtest\000@XI\215u H",
processor = 3, ptask = 18446612137046565648, mm_struct = 18446612137048351232, tc_next = 0x0}
(gdb) print *regs
$3 = {r15 = 0, r14 = 2, r13 = 2, r12 = 34324496, bp = 2, bx = 4196186, r11 = 582, r10 = 140728806957456,
r9 = 140048302249728, r8 = 34324720, ax = 18446744073709551578, cx = 140048297135408, dx = 2, si = 140048302292992,
di = 3, orig_ax = 1, ip = 140048297135408, cs = 51, flags = 582, sp = 140728806957864, ss = 43, fs_base = 0,
gs_base = 0, ds = 0, es = 0, fs = 0, gs = 0}
(gdb) print *gxt
$4 = {get_old_rsp = 0x0, get_thread_struct_fpu = 0x0, get_thread_struct_fpu_size = 0x0, is_special_syscall = 0x0,
is_special_ia32_syscall = 0x0, tsk_used_math = 0x0}
=============================
So not only is get_old_rsp zero, all the fields in gxt are zero.
Looks like a kernel support issue. This field is filled in by gcore_x86_table_register_get_old_rsp() which looks up four symbols in various forms, none of which exist in my kernel:
eje-code:~ # fgrep old_rsp /proc/kallsyms
eje-code:~ # fgrep cpu_pda /proc/kallsyms
eje-code:~ #
old_rsp did exist in openSUSE 12.1 and 13.1 (3.11.10-29 for the latter).
According to http://lists.openwall.net/linux-kernel/2015/03/17/766 old_rsp was renamed rsp_scratch. I don't know if the semantics changed -- it doesn't appear so -- but I added code to accept this symbol as an alternative and the core dump generates and works (I can see a correct backtrace). I do not warrant the work though. :-) Someone may want to review my work, and check the other functions and see if they are supposed to be zero. Since they haven't been invoked I don't know if they are supposed to be non-zero or not.
Here is the diff:
--- gcore_x86.c~ 2014-11-06 04:58:47.000000000 -0500
+++ gcore_x86.c 2016-10-31 16:01:00.989025841 -0400
@@ -1351,6 +1351,26 @@ static ulong gcore_x86_64_get_old_rsp(in
}
/**
+ * gcore_x86_64_get_rsp_scratch() - get rsp at per-cpu area
+ *
+ * @cpu target CPU's CPU id
+ *
+ * Given a CPU id, returns a RSP value saved at per-cpu area for the
+ * CPU whose id is the given CPU id.
+ */
+static ulong gcore_x86_64_get_rsp_scratch(int cpu)
+{
+ ulong old_rsp;
+
+ readmem(symbol_value("rsp_scratch") + kt->__per_cpu_offset[cpu],
+ KVADDR, &old_rsp, sizeof(old_rsp),
+ "gcore_x86_64_get_rsp_scratch: rsp_scratch",
+ gcore_verbose_error_handle());
+
+ return old_rsp;
+}
+
+/**
* gcore_x86_64_get_per_cpu__old_rsp() - get rsp at per-cpu area
*
* @cpu target CPU's CPU id
@@ -1834,6 +1854,11 @@ static void gcore_x86_table_register_get
else if (symbol_exists("_cpu_pda"))
gxt->get_old_rsp = gcore_x86_64_get_cpu__pda_oldrsp;
+
+ else if (symbol_exists("rsp_scratch"))
+ gxt->get_old_rsp = gcore_x86_64_get_rsp_scratch;
+
+ if (!gxt->get_old_rsp) printf ("Warning: NO gxt->get_old_rsp\n");
}
#endif
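
The failure mode in this report, a handler table whose function pointer stays NULL because no known symbol matched, can be sketched generically. The code below is a simplified model, not gcore's implementation: the handler table, has_symbol, pick_handler and read_saved_rsp are hypothetical names, and the returned values are dummies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long (*get_rsp_fn)(int cpu);

/* Dummy readers standing in for the per-symbol implementations. */
static unsigned long get_old_rsp(int cpu)     { return 0x1000UL + (unsigned long)cpu; }
static unsigned long get_rsp_scratch(int cpu) { return 0x2000UL + (unsigned long)cpu; }

struct handler {
    const char *symbol;
    get_rsp_fn fn;
};

/* Candidate symbols, tried in order (hypothetical table). */
static const struct handler handlers[] = {
    { "old_rsp",     get_old_rsp },
    { "rsp_scratch", get_rsp_scratch },
};

/* Hypothetical stand-in for a kernel symbol lookup. */
static int has_symbol(const char *const syms[], size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(syms[i], name) == 0)
            return 1;
    return 0;
}

/* Return the first handler whose symbol exists, or NULL if none does. */
static get_rsp_fn pick_handler(const char *const syms[], size_t n)
{
    size_t i;

    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
        if (has_symbol(syms, n, handlers[i].symbol))
            return handlers[i].fn;
    return NULL;
}

/* Check the pointer before calling it instead of jumping to address 0. */
static int read_saved_rsp(get_rsp_fn fn, int cpu, unsigned long *out)
{
    assert(out != NULL);
    if (!fn)
        return -1;
    *out = fn(cpu);
    return 0;
}
```

Checking the pointer at the call site turns an unsupported kernel into a reportable error rather than a segmentation fault.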
kernel module parsing failure - mips
by Sagar Borikar
Hi Dave,
With 7.1.7, crash works for MIPS when all drivers are built into
the kernel.
When I make the driver loadable and panic the kernel, crash doesn't
locate some symbols correctly.
please wait... (gathering module symbol data)
crash: invalid size request: 0 type: "pgd page"
I debugged further and found that PGD_ORDER yields an incorrect value,
which makes the PGD_SIZE macro evaluate to 0.
Just for fun, I replaced PGD_ORDER with 0 (I know it's incorrect) and
crash got further, but the "mod" command did not run successfully; it
threw the following error:
crash> mod
mod: cannot access vmalloc'd module memory
Any idea?
Thanks
Sagar
[PATCH] Add support for SLAB OBJFREELIST_SLAB.
by Thomas Garnier
In this mode, the freelist can be an object, and if the slab is full
there is no freelist. On the next free, an object is recycled to be used
as the freelist but is not cleaned up. This change walks only the
known freed objects to prevent errors from wrong or corrupt freelist entries.
Related to the Linux kernel commit: b03a017bebc403d40aa53a092e79b3020786537d.
---
memory.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/memory.c b/memory.c
index 4eac413..774d090 100644
--- a/memory.c
+++ b/memory.c
@@ -9880,6 +9880,7 @@ ignore_cache(struct meminfo *si, char *name)
#define SLAB_MAGIC_DESTROYED 0xB2F23C5AUL /* slab has been destroyed */
#define SLAB_CFLGS_BUFCTL 0x020000UL /* bufctls in own cache */
+#define SLAB_CFLGS_OBJFREELIST 0x40000000UL /* Freelist as an object */
#define KMEM_SLAB_ADDR (1)
#define KMEM_BUFCTL_ADDR (2)
@@ -12439,11 +12440,13 @@ gather_slab_free_list_percpu(struct meminfo *si)
static void
gather_slab_free_list_slab_overload_page(struct meminfo *si)
{
- int i, active;
+ int i, active, start_offset;
ulong obj, objnr, cnt, freelist;
unsigned char *ucharptr;
unsigned short *ushortptr;
unsigned int *uintptr;
+ unsigned int cache_flags, overload_active;
+ ulong slab_overload_page;
if (CRASHDEBUG(1))
fprintf(fp, "slab page: %lx active: %ld si->c_num: %ld\n",
@@ -12452,12 +12455,19 @@ gather_slab_free_list_slab_overload_page(struct meminfo *si)
if (si->s_inuse == si->c_num )
return;
- readmem(si->slab - OFFSET(page_lru) + OFFSET(page_freelist),
+ slab_overload_page = si->slab - OFFSET(page_lru);
+ readmem(slab_overload_page + OFFSET(page_freelist),
KVADDR, &freelist, sizeof(void *), "page freelist",
FAULT_ON_ERROR);
readmem(freelist, KVADDR, si->freelist,
si->freelist_index_size * si->c_num,
"freelist array", FAULT_ON_ERROR);
+ readmem(si->cache+OFFSET(kmem_cache_s_flags),
+ KVADDR, &cache_flags, sizeof(uint),
+ "kmem_cache_s flags", FAULT_ON_ERROR);
+ readmem(slab_overload_page + OFFSET(page_active),
+ KVADDR, &overload_active, sizeof(uint),
+ "active", FAULT_ON_ERROR);
BNEG(si->addrlist, sizeof(ulong) * (si->c_num+1));
cnt = objnr = 0;
@@ -12466,14 +12476,22 @@ gather_slab_free_list_slab_overload_page(struct meminfo *si)
uintptr = NULL;
active = si->s_inuse;
+ /*
+ * On an OBJFREELIST slab, the object might have been recycled
+ * and everything before the active count can be random data.
+ */
+ start_offset = 0;
+ if (cache_flags & SLAB_CFLGS_OBJFREELIST)
+ start_offset = overload_active;
+
switch (si->freelist_index_size)
{
- case 1: ucharptr = (unsigned char *)si->freelist; break;
- case 2: ushortptr = (unsigned short *)si->freelist; break;
- case 4: uintptr = (unsigned int *)si->freelist; break;
+ case 1: ucharptr = (unsigned char *)si->freelist + start_offset; break;
+ case 2: ushortptr = (unsigned short *)si->freelist + start_offset; break;
+ case 4: uintptr = (unsigned int *)si->freelist + start_offset; break;
}
- for (i = 0; i < si->c_num; i++) {
+ for (i = start_offset; i < si->c_num; i++) {
switch (si->freelist_index_size)
{
case 1: objnr = (ulong)*ucharptr++; break;
--
2.8.0.rc3.226.g39d4020
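
The idea behind the patch can be sketched on a plain index array: when the cache uses OBJFREELIST, entries before the active count may be stale data, so the walk starts at the active count. This is a simplified model with hypothetical names, not the code in memory.c; the range check on each entry is an assumption added to show why stale entries would otherwise raise errors.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper. Walks a one-byte-per-entry freelist index array
 * and counts the entries visited. Returns -1 if an entry is out of range.
 * With objfreelist set, entries before 'active' are skipped because they
 * may hold recycled, uncleaned data.
 */
static long count_valid_entries(const unsigned char *freelist, size_t c_num,
                                size_t active, int objfreelist)
{
    size_t start = objfreelist ? active : 0;
    long cnt = 0;
    size_t i;

    assert(freelist != NULL && active <= c_num);

    for (i = start; i < c_num; i++) {
        if (freelist[i] >= c_num)
            return -1;      /* corrupt or stale index */
        cnt++;
    }
    return cnt;
}
```

On an array whose first two entries are garbage and whose active count is 2, the OBJFREELIST walk succeeds while a walk from index 0 reports an error.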