crash version 4.0-3.21 is available
by Dave Anderson
- Introduced support for upstream xensource ELF format dumpfiles,
which will replace the current xendump format in xen 3.0.5. The
new xen format uses ELF in a non-standard manner such that memory
contents are defined in section headers instead of the traditional
manner of using program headers. Testing has been done on paravirtualized
x86, x86 PAE, x86_64 and ia64 dumpfiles. Fully-virtualized dumpfiles
have not been tested. (anderson(a)redhat.com)
- A number of "xencrash" (where the session is run against a xen-syms
binary) fixes have been applied:
1) "bt" did not switch from the ia64 MCA stack to the vcpu stack.
2) "bt" caused an infinite loop if ar_bspstore contained an illegal
value.
3) "bt" showed an unnecessary unwind warning message. (ia64)
4) "man log" caused crash to fail with a segmentation violation.
5) "man log" did not have an example.
(oda(a)valinux.co.jp)
- Fix for "vtop" on x86 PAE kernels, which could abort upon reaching
the PTE translation section, showing the error message: "vtop:
cannot determine the swap location". (anderson(a)redhat.com)
- Fix for "vm -p" or "vtop" on 2.6 x86 PAE kernels, which could show
incorrect swap offsets, because the swap type/offset encoding was
moved to the high word of the 64-bit PTE. (anderson(a)redhat.com)
- Fix for "vm -p" on x86_64 kernels: when a PTE referenced a swap
location, it would show "(not mapped)" instead of the swap location.
(anderson(a)redhat.com)
- On current 2.6 kernels, crash can now detect whether a ppc processor
is BOOKE, which remains the default assumption in crash. If the
processor is confirmed not to be BOOKE, page table translation is done
differently. (antipov(a)ru.mvista.com)
- Fix for live system analysis of Ubuntu kernels, which failed due to a
mismatch between /proc/version and the linux_banner string. The mismatch
is caused by an appendage to the linux_banner string in Ubuntu kernels.
(asid(a)hp.com)
- Fix for 2.6.21 kernels that fail during initialization with the
message: "crash: invalid (optional) structure member offsets:
zone_struct_free_pages or zone_free_pages". This was due to the
removal of the zone struct's "free_pages" member; instead the
zone's "vm_stat[NR_FREE_PAGES]" value is used. (anderson(a)redhat.com)
Download from: http://people.redhat.com/anderson
17 years, 7 months
Re: [Crash-utility] handling missing kdump pages in diskdump format
by Ken'ichi Ohmichi
Hi Vivek,
2007/03/06 12:02:46 +0530, Vivek Goyal <vgoyal(a)in.ibm.com> wrote:
>On Thu, Mar 01, 2007 at 09:15:05PM +0900, Ken'ichi Ohmichi wrote:
>>
>> (added Vivek Goyal)
>> (added redhat-kexec-kdump-ml)
>>
>> Hi everyone,
>>
>> If Bob's proposal (the end of this mail) is merged into makedumpfile,
>> the analysis behavior of ELF dumpfile is different from kdump-compressed
>> dumpfile's as follows.
>> When reading ELF dumpfile, the crash utility treats the excluded pages
>> as zero-filled pages. But, as to kdump-compressed dumpfile, the crash
>> utility will display the warning message to mean "These pages are
>> excluded by partial dump" when it accesses the excluded pages.
>>
>
>I quickly went through this thread. I don't know much about the diskdump
>format but from a layman's perspective, instead of keeping the zero pages
>and compressing them, why not extend diskdump format and maintain another
>bitmap which signifies the valid zero pages but they are not physically
>part of the core file? I think overall it might reduce dump size.
Thank you for the comment.
I think that it is unnecessary to extend the diskdump format.
Instead of storing every compressed zero-page, it is enough for the
page descriptors of all zero-pages to point to the same zero-page.
With this implementation, the size of the dumpfile increases by only one
page (4K, 16K, etc.). The attached patch implements this.
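The idea of pointing all zero-page descriptors at one stored page can be sketched roughly as follows. This is only an illustration, not the attached patch: the descriptor struct, the function names and the fixed 4K page size are assumptions made for the example, and compression is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4K pages */

/* Hypothetical page descriptor: where a page's data lives in the dumpfile. */
struct sketch_page_desc {
    uint64_t offset;            /* file offset of the stored page data */
};

/* Return 1 if the page holds only zero bytes, 0 otherwise. */
static int sketch_is_zero_page(const unsigned char *page)
{
    static const unsigned char zeros[SKETCH_PAGE_SIZE];
    return memcmp(page, zeros, SKETCH_PAGE_SIZE) == 0;
}

/*
 * Assign a data offset to each of npages pages. The first zero page seen
 * gets a slot; every later zero page reuses that same offset. Every
 * non-zero page gets its own slot. Returns the number of page slots the
 * data area needs.
 */
static size_t sketch_layout_pages(const unsigned char *pages, size_t npages,
                                  uint64_t data_start,
                                  struct sketch_page_desc *desc)
{
    size_t slots = 0;
    int have_zero_slot = 0;
    uint64_t zero_offset = 0;
    size_t i;

    assert(pages != NULL && desc != NULL);

    for (i = 0; i < npages; i++) {
        const unsigned char *page = pages + i * SKETCH_PAGE_SIZE;

        if (sketch_is_zero_page(page)) {
            if (!have_zero_slot) {
                zero_offset = data_start + (uint64_t)slots * SKETCH_PAGE_SIZE;
                slots++;
                have_zero_slot = 1;
            }
            desc[i].offset = zero_offset;   /* shared zero page */
        } else {
            desc[i].offset = data_start + (uint64_t)slots * SKETCH_PAGE_SIZE;
            slots++;
        }
    }
    return slots;
}
```

With this layout, any number of zero pages costs one page slot in the data area, which matches the "only one page" growth described above.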
>Secondly, for ELF format core files, probably we can disable the behavior of
>excluding zero filled pages from dumpfile and then "crash" behavior can
>be consistent. (Return zero filled page only for valid data otherwise crib).
I want to keep the feature of excluding zero-pages, because our systems
(x86_64) have many zero-pages immediately after booting.
Bob is investigating the behavior of crash on ELF format dumpfiles.
I would like to wait for his report.
Thanks
Ken'ichi Ohmichi
17 years, 7 months
how to analyze a 32bit dump with a 64bit crash
by Ming Zhang
Hi All
My laptop runs 64bit fc6 with the crash utility. One of the development
machines runs 32bit RHEL4. Whenever I try to open a dump generated by
that box, I get an unknown format error. I can analyze the dump with the
crash from that RHEL4 box, and my laptop can also analyze a dump from
another 64bit RHEL4 box.
So my question is: how can I get crash on a 64bit box to open a dump
from a 32bit box?
Thanks!
Ming
17 years, 8 months
[PATCH] PPC BookE/non-BookE support
by Dmitry Antipov
Hello,
this patch introduces detection of the PPC32 CPU type in an attempt to
determine the correct kvtop()/uvtop() address translation method. It also
assumes that you have a fairly recent 2.6 kernel, btw.
Dmitry
diff -ur .orig-crash-4.0-3.20/defs.h crash-4.0-3.20/defs.h
--- .orig-crash-4.0-3.20/defs.h 2007-02-21 23:52:01.000000000 +0300
+++ crash-4.0-3.20/defs.h 2007-02-22 16:16:41.000000000 +0300
@@ -3747,6 +3747,8 @@
#define display_idt_table() \
error(FATAL, "-d option is not applicable to PowerPC architecture\n")
#define KSYMS_START (0x1)
+/* This should match PPC_FEATURE_BOOKE from include/asm-powerpc/cputable.h */
+#define CPU_BOOKE (0x00008000)
#endif
/*
diff -ur .orig-crash-4.0-3.20/ppc.c crash-4.0-3.20/ppc.c
--- .orig-crash-4.0-3.20/ppc.c 2007-02-21 23:52:01.000000000 +0300
+++ crash-4.0-3.20/ppc.c 2007-02-22 16:21:32.000000000 +0300
@@ -51,6 +51,9 @@
void
ppc_init(int when)
{
+ target_uint cpu_features;
+ target_ptr cur_cpu_spec;
+
switch (when)
{
case PRE_SYMTAB:
@@ -140,6 +143,13 @@
if (THIS_KERNEL_VERSION >= LINUX(2,6,0))
machdep->hz = 1000;
}
+ if (symbol_exists("cur_cpu_spec")) {
+ get_symbol_ptr("cur_cpu_spec", &cur_cpu_spec);
+ readmem_uint(cur_cpu_spec + MEMBER_OFFSET("cpu_spec", "cpu_user_features"),
+ KVADDR, &cpu_features, "cpu user features", FAULT_ON_ERROR);
+ if (cpu_features & CPU_BOOKE)
+ machdep->flags |= CPU_BOOKE;
+ }
machdep->section_size_bits = _SECTION_SIZE_BITS;
machdep->max_physmem_bits = _MAX_PHYSMEM_BITS;
break;
@@ -285,7 +295,11 @@
page_middle = (ulong *)pgd_pte;
- page_table = page_middle + (BTOP(vaddr) & (PTRS_PER_PTE - 1));
+ if (machdep->flags & CPU_BOOKE)
+ page_table = page_middle + (BTOP(vaddr) & (PTRS_PER_PTE - 1));
+ else
+ page_table = ((page_middle & machdep->pagemask) + machdep->kvbase) +
+ (BTOP(vaddr) & (PTRS_PER_PTE-1));
if (verbose)
fprintf(fp, " PMD: %lx => %lx\n",(ulong)page_middle,
@@ -369,7 +383,11 @@
page_middle = (ulong *)pgd_pte;
- page_table = page_middle + (BTOP(kvaddr) & (PTRS_PER_PTE-1));
+ if (machdep->flags & CPU_BOOKE)
+ page_table = page_middle + (BTOP(kvaddr) & (PTRS_PER_PTE - 1));
+ else
+ page_table = ((page_middle & machdep->pagemask) + machdep->kvbase) +
+ (BTOP(kvaddr) & (PTRS_PER_PTE-1));
if (verbose)
fprintf(fp, " PMD: %lx => %lx\n", (ulong)page_middle,
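The two address computations in the hunks above can be sketched in isolation. This is a reading of the patch, not code from crash itself: the function name, the plain 32-bit integer types, the 4K page size, the 1024-entry table and the 4-byte PTE size are all assumptions for the example. On that reading, the BookE branch uses the directory entry directly as the page table address, while the non-BookE branch masks it to a page boundary and adds the kernel virtual base.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CPU_BOOKE    0x00008000u /* same value as CPU_BOOKE in the patch */
#define SKETCH_PAGE_SHIFT   12          /* assumption: 4K pages */
#define SKETCH_PAGE_SIZE    (1u << SKETCH_PAGE_SHIFT)
#define SKETCH_PTRS_PER_PTE 1024u       /* assumption: 1024 PTEs per table */
#define SKETCH_PTE_BYTES    4u          /* assumption: 32-bit PTEs */

/*
 * Compute the address of the PTE for vaddr, given the page directory
 * entry. pagemask and kvbase stand in for machdep->pagemask and
 * machdep->kvbase.
 */
static uint32_t sketch_pte_address(uint32_t pgd_entry, uint32_t vaddr,
                                   uint32_t flags, uint32_t pagemask,
                                   uint32_t kvbase)
{
    uint32_t index = (vaddr >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
    uint32_t table_base;

    /* The mask is expected to clear the in-page offset bits. */
    assert((pagemask & (SKETCH_PAGE_SIZE - 1)) == 0);

    if (flags & SKETCH_CPU_BOOKE)
        table_base = pgd_entry;                       /* used as-is */
    else
        table_base = (pgd_entry & pagemask) + kvbase; /* mask, then rebase */

    return table_base + index * SKETCH_PTE_BYTES;
}
```

Doing the arithmetic on integers and converting to a pointer only at the end avoids applying a bitwise mask to a pointer value.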
17 years, 8 months
xencrash some bug fix
by Itsuro ODA
Hi,
The attached patch fixes the following xencrash bugs:
* "bt" does not switch from the mca stack to the vcpu stack (ia64.c)
* "bt" causes an infinite loop if ar_bspstore holds an illegal value
(unwind.c line 1747)
(An illegal ar_bspstore value probably means something is wrong, so
stopping with an error is OK. The fix is not limited to XEN_HYPER_MODE;
I think it makes the code more robust.)
* "bt" shows an unnecessary warning message (unwind.c line 1674)
* "man log" causes SIGSEGV (xen_hyper_global_data.c)
* "man log" has no example (xen_hyper_global_data.c)
The patch is for crash-4.0-3.20.
Thanks.
--
Itsuro ODA <oda(a)valinux.co.jp>
--
Index: ia64.c
===================================================================
RCS file: /cvsroot/xen_ia64/people/xencrash/src/crash/ia64.c,v
retrieving revision 1.3
retrieving revision 1.3.2.1
diff -u -r1.3 -r1.3.2.1
--- ia64.c 21 Feb 2007 22:58:33 -0000 1.3
+++ ia64.c 9 Mar 2007 06:40:22 -0000 1.3.2.1
@@ -4009,7 +4009,7 @@
if (symbol_exists("unw_init_frame_info")) {
machdep->flags |= NEW_UNWIND;
if (MEMBER_EXISTS("unw_frame_info", "pt")) {
- if (MEMBER_EXISTS("pt_regs", "ar_csd")) {
+ if (MEMBER_EXISTS("cpu_user_regs", "ar_csd")) {
machdep->flags |= NEW_UNW_V3;
ms->unwind_init = unwind_init_v3;
ms->unwind = unwind_v3;
Index: unwind.c
===================================================================
RCS file: /cvsroot/xen_ia64/people/xencrash/src/crash/unwind.c,v
retrieving revision 1.2
retrieving revision 1.2.2.2
diff -u -r1.2 -r1.2.2.2
--- unwind.c 21 Feb 2007 22:58:33 -0000 1.2
+++ unwind.c 14 Mar 2007 07:33:21 -0000 1.2.2.2
@@ -1674,8 +1674,13 @@
unw_get_sp(info, &sp);
unw_get_bsp(info, &bsp);
- if (ip < GATE_ADDR + PAGE_SIZE)
- break;
+ if (XEN_HYPER_MODE()) {
+ if (!IS_KVADDR(ip))
+ break;
+ } else {
+ if (ip < GATE_ADDR + PAGE_SIZE)
+ break;
+ }
if ((sm = value_search(ip, NULL)))
name = sm->name;
@@ -1747,7 +1752,8 @@
if (unw_switch_from_osinit_v3(info, bt, "INIT") == FALSE)
break;
} else {
- unw_switch_from_osinit_v2(info, bt);
+ if (unw_switch_from_osinit_v2(info, bt) == FALSE)
+ break;
frame++;
goto restart;
}
Index: xen_hyper_global_data.c
===================================================================
RCS file: /cvsroot/xen_ia64/people/xencrash/src/crash/xen_hyper_global_data.c,v
retrieving revision 1.2
retrieving revision 1.2.2.3
diff -u -r1.2 -r1.2.2.3
--- xen_hyper_global_data.c 21 Feb 2007 22:58:33 -0000 1.2
+++ xen_hyper_global_data.c 14 Mar 2007 07:28:21 -0000 1.2.2.3
@@ -169,7 +169,41 @@
char *xen_hyper_help_log[] = {
"log",
"dump system message buffer",
+" ",
" This command dumps the xen conring contents in chronological order." ,
+" ",
+"EXAMPLES",
+" Dump the Xen message buffer:\n",
+" %s> log",
+" __ __ _____ ___ _ _ _",
+" \\ \\/ /___ _ __ |___ / / _ \\ _ _ _ __ ___| |_ __ _| |__ | | ___",
+" \\ // _ \\ '_ \\ |_ \\| | | |__| | | | '_ \\/ __| __/ _` | '_ \\| |/ _ \\",
+" / \\ __/ | | | ___) | |_| |__| |_| | | | \\__ \\ || (_| | |_) | | __/",
+" /_/\\_\\___|_| |_| |____(_)___/ \\__,_|_| |_|___/\\__\\__,_|_.__/|_|\\___|",
+" ",
+" http://www.cl.cam.ac.uk/netos/xen",
+" University of Cambridge Computer Laboratory",
+" ",
+" Xen version 3.0-unstable (damm@) (gcc version 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0,",
+" pie-8.7.9)) Wed Dec 6 17:34:32 JST 2006",
+" Latest ChangeSet: unavailable",
+" ",
+" (XEN) Console output is synchronous.",
+" (XEN) Command line: 12733-i386-pae/xen.gz console=com1 sync_console conswitch=bb com1",
+" =115200,8n1,0x3f8 dom0_mem=480000 crashkernel=64M@32M",
+" (XEN) Physical RAM map:",
+" (XEN) 0000000000000000 - 0000000000098000 (usable)",
+" (XEN) 0000000000098000 - 00000000000a0000 (reserved)",
+" (XEN) 00000000000f0000 - 0000000000100000 (reserved)",
+" (XEN) 0000000000100000 - 000000003f7f0000 (usable)",
+" (XEN) 000000003f7f0000 - 000000003f7f3000 (ACPI NVS)",
+" (XEN) 000000003f7f3000 - 000000003f800000 (ACPI data)",
+" (XEN) 00000000e0000000 - 00000000f0000000 (reserved)",
+" (XEN) 00000000fec00000 - 0000000100000000 (reserved)",
+" (XEN) Kdump: 64MB (65536kB) at 0x2000000",
+" (XEN) System RAM: 1015MB (1039904kB)",
+" (XEN) ACPI: RSDP (v000 XPC ) @ 0x000f9250",
+" ...",
NULL
};
--
17 years, 8 months
linux_banner problem in 2.6.20/Ubuntu
by Alex Sidorenko
Hi Dave,
trying the latest crash-4.0-3.20 on the latest Ubuntu/feisty kernel
2.6.20-9-generic I have found that live access does not work because of the
following mismatch:
/proc/version:
Linux version 2.6.20-9-generic (root@rothera) (gcc version 4.1.2 (Ubuntu
4.1.2-0ubuntu3)) #2 SMP Mon Feb 26 03:01:44 UTC 2007
linux_banner:
Linux version 2.6.20-9-generic (root@rothera) (gcc version 4.1.2 (Ubuntu
4.1.2-0ubuntu3)) #2 SMP Mon Feb 26 03:01:44 UTC 2007 (Ubuntu
2.6.20-9.16-generic)
Looking at 2.6.20 sources (as provided in Ubuntu package) we see:
----------------------------------------------------------------------------------------
const char linux_banner[] =
"Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION
#ifdef CONFIG_VERSION_SIGNATURE
" (" CONFIG_VERSION_SIGNATURE ")"
#endif
"\n";
const char linux_proc_banner[] =
"%s version %s"
" (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")"
" (" LINUX_COMPILER ") %s\n";
----------------------------------------------------------------------------------------
As a result, these two strings will differ if CONFIG_VERSION_SIGNATURE
is defined as a non-empty string.
The comparison crash uses is (kernel.c):
if (strlen(kt->proc_version) && !STREQ(buf, kt->proc_version)) {
if (CRASHDEBUG(1)) {
fprintf(fp, "/proc/version:\n%s",
kt->proc_version);
fprintf(fp, "linux_banner:\n%s\n", buf);
}
goto bad_match;
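I tried to fix the problem by replacing STREQ with STRNEQ. Unfortunately, this
does not work because both strings are LF-terminated: the newline that ends
/proc/version is compared against the space before the appended signature in
linux_banner. So I had to use: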
--- kernel.c.orig 2007-02-21 15:52:01.000000000 -0500
+++ kernel.c 2007-03-11 08:20:38.468024104 -0400
@@ -498,7 +498,8 @@
error(WARNING, "cannot read linux_banner string\n");
if (ACTIVE()) {
- if (strlen(kt->proc_version) && !STREQ(buf, kt->proc_version)) {
+ int cmplen = strlen(kt->proc_version)-1;
+ if (cmplen>0 && strncmp(buf, kt->proc_version, cmplen) != 0) {
if (CRASHDEBUG(1)) {
fprintf(fp, "/proc/version:\n%s",
kt->proc_version);
Please note that this seems to be an Ubuntu-specific problem - the generic
2.6.20.2 has a linux_banner definition without CONFIG_VERSION_SIGNATURE.
Searching Google for "linux CONFIG_VERSION_SIGNATURE" I can see only results
related to Ubuntu.
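The comparison can also be written as a small standalone function, shown here only as a sketch of the same idea. The function name is hypothetical and this is not the code in kernel.c. Unlike the patch above, it drops the last character of /proc/version only when that character really is a newline.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if linux_banner begins with the text of /proc/version,
 * ignoring a trailing newline in proc_version, so that a banner with an
 * appended signature still matches. An empty proc_version is treated as
 * a match, mirroring the strlen() guard in the original check.
 */
static int sketch_banner_matches(const char *linux_banner,
                                 const char *proc_version)
{
    size_t cmplen;

    assert(linux_banner != NULL && proc_version != NULL);

    cmplen = strlen(proc_version);
    if (cmplen > 0 && proc_version[cmplen - 1] == '\n')
        cmplen--;                /* do not compare the terminating LF */
    if (cmplen == 0)
        return 1;                /* nothing to compare against */

    return strncmp(linux_banner, proc_version, cmplen) == 0;
}
```

Because only a prefix is compared, anything a distribution appends after the version text is accepted.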
Regards,
Alex
17 years, 8 months
64 GiB hugemem vmcore can't be analyzed
by Kurtis D. Rader
The following vmcore was manually induced ([sysrq-d]) from a freshly
booted hugemem kernel on a system with 64 GiB of memory:
-rw-r--r-- 1 krader krader 68719480832 2007-03-09 12:18:44 /vmware/vmcore
Running crash on it
crash -d255 System.map-2.4.21-47.0.1.ELhugemem /vmware/vmcore \
./usr/lib/debug/boot/vmlinux-2.4.21-47.ELhugemem.debug
Fails with
crash: ./usr/lib/debug/boot/vmlinux-2.4.21-47.ELhugemem.debug:
no text and data contents
Crash -d255 output is below. Is there any hope for getting this to
work? We've got a customer whose system is exhibiting both hangs and
panics. Until now the system hasn't been configured for crash dumps. This
dump was obtained under controlled conditions to validate that we can get
a complete dump. The dump process appeared to complete without error.
The size of the vmcore looks reasonable. From the vmcore dmesg buffer:
<6>disk_dump: Maximum block size: 16384
<6>disk_dump: total blocks required: 16515376 (header 3 + bitmap 512 + memory 16514861)
<6>SysRq : Crashing the kernel by request
crash 4.0-3.17
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
crash: diskdump: dump does not have panic dump header
vmcore_data:
flags: 5 (NETDUMP_LOCAL|NETDUMP_ELF32)
ndfd: 4
ofp: 48a3a4c0
header_size: 4096
num_pt_load_segments: 1
pt_load_segment[0]:
file_offset: 1000
phys_start: 0
phys_end: 0
zero_fill: 0
elf_header: 8405700
elf32: 8405700
notes32: 8405734
load32: 8405754
elf64: 0
notes64: 0
load64: 0
nt_prstatus: 8405774
nt_prpsinfo: 8405814
nt_taskstruct: 84058a0
task_struct: 75ff6000
page_size: 4096
switch_stack: 0
xen_kdump_data: (unused)
num_prstatus_notes: 1
nt_prstatus_percpu: 08405774
Elf32_Ehdr:
e_ident: \177ELF
e_ident[EI_CLASS]: 1 (ELFCLASS32)
e_ident[EI_DATA]: 1 (ELFDATA2LSB)
e_ident[EI_VERSION]: 1 (EV_CURRENT)
e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
e_ident[EI_ABIVERSION]: 0
e_type: 4 (ET_CORE)
e_machine: 3 (EM_386)
e_version: 1 (EV_CURRENT)
e_entry: 0
e_phoff: 34
e_shoff: 0
e_flags: 0
e_ehsize: 34
e_phentsize: 20
e_phnum: 2
e_shentsize: 0
e_shnum: 0
e_shstrndx: 0
Elf32_Phdr:
p_type: 4 (PT_NOTE)
p_offset: 116 (74)
p_vaddr: 0
p_paddr: 0
p_filesz: 344 (158)
p_memsz: 0 (0)
p_flags: 0 ()
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 4096 (1000)
p_vaddr: c0000000
p_paddr: 0
p_filesz: 0 (0)
p_memsz: 0 (0)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 4096
Elf32_Nhdr:
n_namesz: 4 ("CORE")
n_descsz: 144
n_type: 1 (NT_PRSTATUS)
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 023b3ee0 00000001
00000001 00000063 00000006 00000063
75ff7f7c 00000068 00000068 00000000
00000000 ffffffff 021d2010 00000060
00010002 75ff7eac 00000068 00000000
Elf32_Nhdr:
n_namesz: 4 ("CORE")
n_descsz: 124
n_type: 3 (NT_PRPSINFO)
00005200 00000000 00000000 00000000
00000000 00000000 00000000 696c6d76
0078756e 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000
Elf32_Nhdr:
n_namesz: 4 ("CORE")
n_descsz: 8
n_type: 4 (NT_TASKSTRUCT)
75ff6000 00000000
Elf32_Nhdr:
n_namesz: 4 ("CORE")
n_descsz: 4
n_type: 70000001 (NT_DISKDUMP)
00000000
crash: ./usr/lib/debug/boot/vmlinux-2.4.21-47.ELhugemem.debug: no text and data contents
crash: the use of a System.map file requires that the accompanying namelist
argument is a kernel file built with the -g CFLAG. The namelist argument
supplied in this case is a debuginfo file, which must be accompanied by the
kernel file from which it was derived.
--
Kurtis D. Rader, Linux level 3 support email: krader(a)us.ibm.com
IBM Integrated Technology Services DID: +1 503-578-3714
15350 SW Koll Pkwy, MS DES1-01 service: 800-IBM-SERV
Beaverton, OR 97006-6063 http://www.ibm.com
17 years, 8 months
handling missing kdump pages in diskdump format
by Bob Montgomery
I've been experimenting with the makedumpfile utility for kdump on ia64.
One of my experiments was to verify that a page that should have been
missing indeed was missing. I used crash 4.0-3.8 to look for a user
page that should have been omitted from the dump.
crash> x/xg 0xe0000040fc00c000
0xe0000040fc00c000: 0x0000000000000000
On a full dump from makedumpfile as well as on a straight copy of
vmcore, crash reports this:
crash> x/xg 0xe0000040fc00c000
0xe0000040fc00c000: 0x00010102464c457f
The dumpfiles created by makedumpfile appear to crash to be diskdump files,
and crash appears to excuse missing pages and report 0x0 contents here:
diskdump.c:read_diskdump, line 454:
if (!page_is_dumpable(pfn)) {
memset(bufptr, 0, cnt);
return cnt;
Shouldn't there be some indication that a requested page is missing as
opposed to being legitimately full of zeros?
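One way to make the difference visible is for the read routine to return a distinct status for an excluded page instead of zero-filling the buffer. The following is only a sketch of that idea, not crash's read_diskdump(): the status names, the bitmap layout (one bit per pfn, least significant bit first) and the in-memory dump_pages array are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for the sketch. */
enum sketch_read_status {
    SKETCH_READ_OK = 0,
    SKETCH_PAGE_EXCLUDED = -1
};

/* Return 1 if the bit for pfn is set in the dumpable bitmap. */
static int sketch_page_is_dumpable(const unsigned char *bitmap,
                                   unsigned long pfn)
{
    return (bitmap[pfn >> 3] >> (pfn & 7)) & 1;
}

/*
 * Copy one page into buf. If the page was excluded from the dump, leave
 * buf untouched and report that, so the caller can print a warning
 * rather than show zeros.
 */
static int sketch_read_page(const unsigned char *bitmap,
                            const unsigned char *dump_pages,
                            unsigned long pfn, size_t page_size, void *buf)
{
    assert(bitmap != NULL && dump_pages != NULL && buf != NULL);

    if (!sketch_page_is_dumpable(bitmap, pfn))
        return SKETCH_PAGE_EXCLUDED;

    memcpy(buf, dump_pages + (size_t)pfn * page_size, page_size);
    return SKETCH_READ_OK;
}
```

A page that is present and really full of zeros still returns SKETCH_READ_OK, so the two cases no longer look the same to the caller.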
Bob Montgomery
17 years, 9 months