March 2007 - Crash-utility - Crash Utility List Archives

by Dave Anderson

- Introduced support for upstream xensource ELF format dumpfiles, which will replace the current xendump format in xen 3.0.5. The new xen format uses ELF in a non-standard manner such that memory contents are defined in section headers instead of the traditional manner of using program headers. Testing has been on paravirtualized x86, x86 PAE, x86_64 and ia64 dumpfiles. Fully-virtualized dumpfiles have not been tested. (anderson(a)redhat.com)

- A number of "xencrash" (where the session is run against a xen-syms binary) fixes have been applied:
  1) "bt" did not switch from the ia64 MCA stack to the vcpu stack.
  2) "bt" caused an infinite loop if ar_bspstore contained an illegal value.
  3) "bt" showed an unnecessary unwind warning message. (ia64)
  4) "man log" caused crash to fail with a segmentation violation.
  5) "man log" did not have an example.
  (oda(a)valinux.co.jp)

- Fix for "vtop" on x86 PAE kernels, which could abort upon reaching the PTE translation section, showing the error message: "vtop: cannot determine the swap location". (anderson(a)redhat.com)

- Fix for "vm -p" or "vtop" on 2.6 x86 PAE kernels, which could show incorrect swap offsets, because the swap type/offset encoding was moved to the high word of the 64-bit PTE. (anderson(a)redhat.com)

- Fix for "vm -p" on x86_64 kernels: when a PTE referenced a swap location, it would show "(not mapped)" instead of the swap location. (anderson(a)redhat.com)

- In current 2.6 kernels, it is now possible to recognize ppc BOOKE processors, which is the current default in crash. If the processor is confirmed to not be BOOKE, then page table translation is done differently. (antipov(a)ru.mvista.com)

- Fix for live system analysis of Ubuntu kernels due to a mismatch between /proc/version and the linux_banner string. This was due to an appendage to the linux_banner string in Ubuntu kernels. (asid(a)hp.com)

- Fix for 2.6.21 kernels that fail during initialization with the message: "crash: invalid (optional) structure member offsets: zone_struct_free_pages or zone_free_pages". This was due to the removal of the zone struct's "free_pages" member; instead the zone's "vm_stat[NR_FREE_PAGES]" value is used. (anderson(a)redhat.com)

Download from: http://people.redhat.com/anderson


Re: [Crash-utility] handling missing kdump pages in diskdump format

by Ken'ichi Ohmichi

Hi Vivek,

2007/03/06 12:02:46 +0530, Vivek Goyal <vgoyal(a)in.ibm.com> wrote:
>On Thu, Mar 01, 2007 at 09:15:05PM +0900, Ken'ichi Ohmichi wrote:
>>
>> (added Vivek Goyal)
>> (added redhat-kexec-kdump-ml)
>>
>> Hi everyone,
>>
>> If Bob's proposal (the end of this mail) is merged into makedumpfile,
>> the analysis behavior of ELF dumpfile is different from kdump-compressed
>> dumpfile's as follows.
>> When reading ELF dumpfile, the crash utility treats the excluded pages
>> as zero-filled pages. But, as to kdump-compressed dumpfile, the crash
>> utility will display the warning message to mean "These pages are
>> excluded by partial dump" when it accesses the excluded pages.
>>
>
>I quickly went through this thread. I don't know much about the diskdump
>format but from a layman's perspective, instead of keeping the zero pages
>and compressing them, why not extend diskdump format and maintain another
>bitmap which signifies the valid zero pages but they are not physically
>part of the core file? I think overall it might reduce dump size.

Thank you for the comment.
I think that it is unnecessary to extend the diskdump format.
Instead of storing all the compressed zero-pages, it is enough that the
page descriptors of all zero-pages point to the same zero-page. With this
implementation, the size of the dumpfile increases by only one page
(4K, 16K, etc.). The attached patch is for it.

>Secondly, for ELF format core files, probably we can disable the behavior of
>excluding zero filled pages from dumpfile and then "crash" behavior can
>be consistent. (Return zero filled page only for valid data otherwise crib).

I want to use the feature of excluding zero-pages, because our systems
(x86_64) have many zero-pages immediately after system booting.
Bob is researching the behavior of crash on ELF format dumpfiles.
I would like to wait for his report.

Thanks
Ken'ichi Ohmichi
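
The shared zero-page idea in the message above can be sketched as follows. This is only an illustration, not the attached patch and not the real makedumpfile or diskdump layout: the `sketch_page_desc` structure, the fixed 4 KiB page size, and the function names are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical, simplified page descriptor: where a page's data lives. */
struct sketch_page_desc {
    uint64_t offset;  /* file offset of the stored page data */
    uint32_t size;    /* stored size in bytes */
};

/* Return 1 if every byte of the page is zero. */
static int sketch_is_zero_page(const unsigned char *page)
{
    static const unsigned char zero[SKETCH_PAGE_SIZE];
    return memcmp(page, zero, SKETCH_PAGE_SIZE) == 0;
}

/*
 * Assign file offsets to npages pages of memory. The first zero page
 * gets real storage; every later zero page reuses that same offset.
 * Returns the number of bytes of page data the file would need.
 */
static uint64_t sketch_layout_pages(const unsigned char *mem, size_t npages,
                                    struct sketch_page_desc *descs,
                                    uint64_t data_start)
{
    uint64_t next = data_start;
    uint64_t zero_offset = 0;
    int have_zero = 0;

    for (size_t i = 0; i < npages; i++) {
        const unsigned char *page = mem + i * SKETCH_PAGE_SIZE;

        descs[i].size = SKETCH_PAGE_SIZE;
        if (sketch_is_zero_page(page)) {
            if (!have_zero) {
                /* Store one zero page and remember where it went. */
                zero_offset = next;
                next += SKETCH_PAGE_SIZE;
                have_zero = 1;
            }
            descs[i].offset = zero_offset;
        } else {
            descs[i].offset = next;
            next += SKETCH_PAGE_SIZE;
        }
    }
    return next - data_start;
}
```

In this sketch, any number of zero pages costs one page of file space in total, which matches the "increases by only one page" claim in the message.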


how to analyze a 32bit dump with a 64bit crash

by Ming Zhang

Hi All,

My laptop runs 64-bit FC6 with the crash utility. One of our development machines runs 32-bit RHEL4. Whenever I try to open a dump generated by that box, I get an "unknown format" error. I can analyze the dump with the crash from that RHEL4 machine, and my laptop can analyze a dump from another 64-bit RHEL4 machine.

So my question is: how can I get crash on a 64-bit box to open a dump from a 32-bit box?

Thanks!
Ming


[PATCH] PPC BookE/non-BookE support

by Dmitry Antipov

Hello,

this patch introduces a kind of detection of PPC32 CPU type in attempt to determine the valid kvtop()/uvtop() addresses translation method. It also assumes that you have a quite recent 2.6 kernel, btw.

Dmitry

diff -ur .orig-crash-4.0-3.20/defs.h crash-4.0-3.20/defs.h
--- .orig-crash-4.0-3.20/defs.h 2007-02-21 23:52:01.000000000 +0300
+++ crash-4.0-3.20/defs.h 2007-02-22 16:16:41.000000000 +0300
@@ -3747,6 +3747,8 @@
 #define display_idt_table() \
         error(FATAL, "-d option is not applicable to PowerPC architecture\n")
 #define KSYMS_START (0x1)
+/* This should match PPC_FEATURE_BOOKE from include/asm-powerpc/cputable.h */
+#define CPU_BOOKE (0x00008000)
 #endif

 /*
diff -ur .orig-crash-4.0-3.20/ppc.c crash-4.0-3.20/ppc.c
--- .orig-crash-4.0-3.20/ppc.c 2007-02-21 23:52:01.000000000 +0300
+++ crash-4.0-3.20/ppc.c 2007-02-22 16:21:32.000000000 +0300
@@ -51,6 +51,9 @@
 void
 ppc_init(int when)
 {
+        target_uint cpu_features;
+        target_ptr cur_cpu_spec;
+
         switch (when)
         {
         case PRE_SYMTAB:
@@ -140,6 +143,13 @@
                         if (THIS_KERNEL_VERSION >= LINUX(2,6,0))
                                 machdep->hz = 1000;
                 }
+                if (symbol_exists("cur_cpu_spec")) {
+                        get_symbol_ptr("cur_cpu_spec", &cur_cpu_spec);
+                        readmem_uint(cur_cpu_spec + MEMBER_OFFSET("cpu_spec", "cpu_user_features"),
+                                     KVADDR, &cpu_features, "cpu user features", FAULT_ON_ERROR);
+                        if (cpu_features & CPU_BOOKE)
+                                machdep->flags |= CPU_BOOKE;
+                }
                 machdep->section_size_bits = _SECTION_SIZE_BITS;
                 machdep->max_physmem_bits = _MAX_PHYSMEM_BITS;
                 break;
@@ -285,7 +295,11 @@

         page_middle = (ulong *)pgd_pte;

-        page_table = page_middle + (BTOP(vaddr) & (PTRS_PER_PTE - 1));
+        if (machdep->flags & CPU_BOOKE)
+                page_table = page_middle + (BTOP(vaddr) & (PTRS_PER_PTE - 1));
+        else
+                page_table = ((page_middle & machdep->pagemask) + machdep->kvbase) + (BTOP(vaddr) & (PTRS_PER_PTE-1));

         if (verbose)
                 fprintf(fp, " PMD: %lx => %lx\n",(ulong)page_middle,
@@ -369,7 +383,11 @@

         page_middle = (ulong *)pgd_pte;

-        page_table = page_middle + (BTOP(kvaddr) & (PTRS_PER_PTE-1));
+        if (machdep->flags & CPU_BOOKE)
+                page_table = page_middle + (BTOP(kvaddr) & (PTRS_PER_PTE - 1));
+        else
+                page_table = ((page_middle & machdep->pagemask) + machdep->kvbase) + (BTOP(kvaddr) & (PTRS_PER_PTE-1));

         if (verbose)
                 fprintf(fp, " PMD: %lx => %lx\n", (ulong)page_middle,
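
The address arithmetic in the two ppc.c hunks can be illustrated with a small standalone sketch. It is not crash code: `sketch_machdep`, `sketch_pte_address`, the 4 KiB page size, the 1024 entries per page table and the 4-byte PTE size are assumptions chosen for the example, and only the flag value 0x00008000 is taken from the patch. The sketch reads the non-BookE branch as "mask the PMD entry down to a page-aligned address, then add the kernel virtual base", which is how the patch appears to treat it.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12u          /* assumed 4 KiB pages */
#define SKETCH_PTRS_PER_PTE 1024u        /* assumed entries per page table */
#define SKETCH_CPU_BOOKE    0x00008000u  /* value of CPU_BOOKE in the patch */

/* Hypothetical subset of the machine-dependent state used here. */
struct sketch_machdep {
    uint32_t flags;
    uint32_t pagemask;  /* e.g. 0xfffff000 */
    uint32_t kvbase;    /* e.g. 0xc0000000 */
};

/*
 * Compute the kernel virtual address of the PTE for vaddr, given the
 * value read from the PMD slot. On BookE the value is used as-is; on
 * other CPUs it is masked to a page boundary and offset by kvbase.
 */
static uint32_t sketch_pte_address(const struct sketch_machdep *md,
                                   uint32_t pmd_entry, uint32_t vaddr)
{
    uint32_t index = (vaddr >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
    uint32_t base;

    if (md->flags & SKETCH_CPU_BOOKE)
        base = pmd_entry;
    else
        base = (pmd_entry & md->pagemask) + md->kvbase;

    /* Each PTE is assumed to be 4 bytes wide in this sketch. */
    return base + index * (uint32_t)sizeof(uint32_t);
}
```

With these assumed constants, a non-BookE PMD entry of 0x01234567 and a BookE entry of 0xc1234000 both resolve to the same page table page, so the two branches differ only in how the base address is recovered.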


xencrash some bug fix

by Itsuro ODA

Hi,

The attached patch fixes the following xencrash bugs:

* "bt" not switch mca stack to vcpu stack (ia64.c)
* "bt" causes infinite loop if ar_bspstore is illegal value
  (unwind.c line 1747)
  (illegal ar_bspstore value may be something wrong, error stop is OK.
  the fix is not limited for XEN_HYPER_MODE, I think the fix is more
  robust.)
* "bt" shows unnecessary warning message (unwind.c line 1674)
* "man log" causes SIGSEGV (xen_hyper_global_data.c)
* "man log" is no example (xen_hyper_global_data.c)

The patch is for crash-4.0-3.20.

Thanks.
--
Itsuro ODA <oda(a)valinux.co.jp>
--
Index: ia64.c
===================================================================
RCS file: /cvsroot/xen_ia64/people/xencrash/src/crash/ia64.c,v
retrieving revision 1.3
retrieving revision 1.3.2.1
diff -u -r1.3 -r1.3.2.1
--- ia64.c 21 Feb 2007 22:58:33 -0000 1.3
+++ ia64.c 9 Mar 2007 06:40:22 -0000 1.3.2.1
@@ -4009,7 +4009,7 @@
         if (symbol_exists("unw_init_frame_info")) {
                 machdep->flags |= NEW_UNWIND;
                 if (MEMBER_EXISTS("unw_frame_info", "pt")) {
-                        if (MEMBER_EXISTS("pt_regs", "ar_csd")) {
+                        if (MEMBER_EXISTS("cpu_user_regs", "ar_csd")) {
                                 machdep->flags |= NEW_UNW_V3;
                                 ms->unwind_init = unwind_init_v3;
                                 ms->unwind = unwind_v3;
Index: unwind.c
===================================================================
RCS file: /cvsroot/xen_ia64/people/xencrash/src/crash/unwind.c,v
retrieving revision 1.2
retrieving revision 1.2.2.2
diff -u -r1.2 -r1.2.2.2
--- unwind.c 21 Feb 2007 22:58:33 -0000 1.2
+++ unwind.c 14 Mar 2007 07:33:21 -0000 1.2.2.2
@@ -1674,8 +1674,13 @@
                 unw_get_sp(info, &sp);
                 unw_get_bsp(info, &bsp);

-                if (ip < GATE_ADDR + PAGE_SIZE)
-                        break;
+                if (XEN_HYPER_MODE()) {
+                        if (!IS_KVADDR(ip))
+                                break;
+                } else {
+                        if (ip < GATE_ADDR + PAGE_SIZE)
+                                break;
+                }

                 if ((sm = value_search(ip, NULL)))
                         name = sm->name;
@@ -1747,7 +1752,8 @@
                                 if (unw_switch_from_osinit_v3(info, bt, "INIT") == FALSE)
                                         break;
                         } else {
-                                unw_switch_from_osinit_v2(info, bt);
+                                if (unw_switch_from_osinit_v2(info, bt) == FALSE)
+                                        break;
                                 frame++;
                                 goto restart;
                         }
Index: xen_hyper_global_data.c
===================================================================
RCS file: /cvsroot/xen_ia64/people/xencrash/src/crash/xen_hyper_global_data.c,v
retrieving revision 1.2
retrieving revision 1.2.2.3
diff -u -r1.2 -r1.2.2.3
--- xen_hyper_global_data.c 21 Feb 2007 22:58:33 -0000 1.2
+++ xen_hyper_global_data.c 14 Mar 2007 07:28:21 -0000 1.2.2.3
@@ -169,7 +169,41 @@
 char *xen_hyper_help_log[] = {
 "log",
 "dump system message buffer",
+" ",
 " This command dumps the xen conring contents in chronological order." ,
+" ",
+"EXAMPLES",
+" Dump the Xen message buffer:\n",
+" %s> log",
+" __ __ _____ ___ _ _ _",
+" \\ \\/ /___ _ __ |___ / / _ \\ _ _ _ __ ___| |_ __ _| |__ | | ___",
+" \\ // _ \\ '_ \\ |_ \\| | | |__| | | | '_ \\/ __| __/ _` | '_ \\| |/ _ \\",
+" / \\ __/ | | | ___) | |_| |__| |_| | | | \\__ \\ || (_| | |_) | | __/",
+" /_/\\_\\___|_| |_| |____(_)___/ \\__,_|_| |_|___/\\__\\__,_|_.__/|_|\\___|",
+" ",
+" http://www.cl.cam.ac.uk/netos/xen",
+" University of Cambridge Computer Laboratory",
+" ",
+" Xen version 3.0-unstable (damm@) (gcc version 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0,",
+" pie-8.7.9)) Wed Dec 6 17:34:32 JST 2006",
+" Latest ChangeSet: unavailable",
+" ",
+" (XEN) Console output is synchronous.",
+" (XEN) Command line: 12733-i386-pae/xen.gz console=com1 sync_console conswitch=bb com1",
+" =115200,8n1,0x3f8 dom0_mem=480000 crashkernel=64M@32M",
+" (XEN) Physical RAM map:",
+" (XEN) 0000000000000000 - 0000000000098000 (usable)",
+" (XEN) 0000000000098000 - 00000000000a0000 (reserved)",
+" (XEN) 00000000000f0000 - 0000000000100000 (reserved)",
+" (XEN) 0000000000100000 - 000000003f7f0000 (usable)",
+" (XEN) 000000003f7f0000 - 000000003f7f3000 (ACPI NVS)",
+" (XEN) 000000003f7f3000 - 000000003f800000 (ACPI data)",
+" (XEN) 00000000e0000000 - 00000000f0000000 (reserved)",
+" (XEN) 00000000fec00000 - 0000000100000000 (reserved)",
+" (XEN) Kdump: 64MB (65536kB) at 0x2000000",
+" (XEN) System RAM: 1015MB (1039904kB)",
+" (XEN) ACPI: RSDP (v000 XPC ) @ 0x000f9250",
+" ...",
 NULL
 };

--


linux_banner problem in 2.6.20/Ubuntu

by Alex Sidorenko

Hi Dave,

trying the latest crash-4.0-3.20 on the latest Ubuntu/feisty kernel 2.6.20-9-generic I have found that live access does not work because of the following mismatch:

/proc/version:
Linux version 2.6.20-9-generic (root@rothera) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu3)) #2 SMP Mon Feb 26 03:01:44 UTC 2007
linux_banner:
Linux version 2.6.20-9-generic (root@rothera) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu3)) #2 SMP Mon Feb 26 03:01:44 UTC 2007 (Ubuntu 2.6.20-9.16-generic)

Looking at the 2.6.20 sources (as provided in the Ubuntu package) we see:

----------------------------------------------------------------------------------------
const char linux_banner[] =
        "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
        LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION
#ifdef CONFIG_VERSION_SIGNATURE
        " (" CONFIG_VERSION_SIGNATURE ")"
#endif
        "\n";

const char linux_proc_banner[] =
        "%s version %s"
        " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")"
        " (" LINUX_COMPILER ") %s\n";
----------------------------------------------------------------------------------------

As a result, these two strings will be different if CONFIG_VERSION_SIGNATURE is defined as a non-empty string. The comparison crash uses is (kernel.c):

        if (strlen(kt->proc_version) && !STREQ(buf, kt->proc_version)) {
                if (CRASHDEBUG(1)) {
                        fprintf(fp, "/proc/version:\n%s", kt->proc_version);
                        fprintf(fp, "linux_banner:\n%s\n", buf);
                }
                goto bad_match;

I tried to fix the problem by replacing STREQ with STRNEQ. Unfortunately, this does not work as both strings are LF-terminated. So I had to use

--- kernel.c.orig 2007-02-21 15:52:01.000000000 -0500
+++ kernel.c 2007-03-11 08:20:38.468024104 -0400
@@ -498,7 +498,8 @@
                 error(WARNING, "cannot read linux_banner string\n");

         if (ACTIVE()) {
-                if (strlen(kt->proc_version) && !STREQ(buf, kt->proc_version)) {
+                int cmplen = strlen(kt->proc_version)-1;
+                if (cmplen>0 && strncmp(buf, kt->proc_version, cmplen) != 0) {
                         if (CRASHDEBUG(1)) {
                                 fprintf(fp, "/proc/version:\n%s",
                                         kt->proc_version);

Please note that this seems to be an Ubuntu-specific problem - the generic 2.6.20.2 has a linux_banner definition without CONFIG_VERSION_SIGNATURE. Searching Google for "linux CONFIG_VERSION_SIGNATURE" I can see only results related to Ubuntu.

Regards,
Alex
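
The comparison proposed in the message above can be tried outside crash with a few lines of C. This is a minimal sketch rather than the code in kernel.c: `sketch_banner_matches` is a hypothetical helper, and unlike the patch it strips the final character only when that character really is a newline.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if banner begins with proc_version, ignoring a trailing
 * newline on proc_version; return 0 otherwise. An empty proc_version
 * is treated as a match, mirroring the "skip the check" behaviour of
 * the original condition.
 */
static int sketch_banner_matches(const char *banner, const char *proc_version)
{
    size_t len = strlen(proc_version);

    /* Drop the line feed so text appended to the banner is tolerated. */
    if (len > 0 && proc_version[len - 1] == '\n')
        len--;
    if (len == 0)
        return 1;

    return strncmp(banner, proc_version, len) == 0;
}
```

A banner with an extra " (Ubuntu ...)" suffix before its newline still matches the shorter /proc/version text here, while a banner from a different kernel version does not.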


64 GiB hugemem vmcore can't be analyzed

by Kurtis D. Rader

The following vmcore was manually induced ([sysrq-d]) from a freshly booted hugemem kernel on a system with 64 GiB of memory:

-rw-r--r-- 1 krader krader 68719480832 2007-03-09 12:18:44 /vmware/vmcore

Running crash on it

crash -d255 System.map-2.4.21-47.0.1.ELhugemem /vmware/vmcore \
    ./usr/lib/debug/boot/vmlinux-2.4.21-47.ELhugemem.debug

Fails with

crash: ./usr/lib/debug/boot/vmlinux-2.4.21-47.ELhugemem.debug: no text and data contents

Crash -d255 output is below. Is there any hope for getting this to work? We've got a customer whose system is exhibiting both hangs and panics. Until now the system hasn't been configured for crash dumps. This dump was obtained under controlled conditions to validate that we can get a complete dump.

The dump process appeared to complete without error. The size of the vmcore looks reasonable. From the vmcore dmesg buffer:

<6>disk_dump: Maximum block size: 16384
<6>disk_dump: total blocks required: 16515376 (header 3 + bitmap 512 + memory 16514861)
<6>SysRq : Crashing the kernel by request

crash 4.0-3.17
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

crash: diskdump: dump does not have panic dump header

vmcore_data:
  flags: 5 (NETDUMP_LOCAL|NETDUMP_ELF32)
  ndfd: 4
  ofp: 48a3a4c0
  header_size: 4096
  num_pt_load_segments: 1
  pt_load_segment[0]:
    file_offset: 1000
    phys_start: 0
    phys_end: 0
    zero_fill: 0
  elf_header: 8405700
  elf32: 8405700
  notes32: 8405734
  load32: 8405754
  elf64: 0
  notes64: 0
  load64: 0
  nt_prstatus: 8405774
  nt_prpsinfo: 8405814
  nt_taskstruct: 84058a0
  task_struct: 75ff6000
  page_size: 4096
  switch_stack: 0
  xen_kdump_data: (unused)
  num_prstatus_notes: 1
  nt_prstatus_percpu: 08405774

Elf32_Ehdr:
  e_ident: \177ELF
  e_ident[EI_CLASS]: 1 (ELFCLASS32)
  e_ident[EI_DATA]: 1 (ELFDATA2LSB)
  e_ident[EI_VERSION]: 1 (EV_CURRENT)
  e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
  e_ident[EI_ABIVERSION]: 0
  e_type: 4 (ET_CORE)
  e_machine: 3 (EM_386)
  e_version: 1 (EV_CURRENT)
  e_entry: 0
  e_phoff: 34
  e_shoff: 0
  e_flags: 0
  e_ehsize: 34
  e_phentsize: 20
  e_phnum: 2
  e_shentsize: 0
  e_shnum: 0
  e_shstrndx: 0
Elf32_Phdr:
  p_type: 4 (PT_NOTE)
  p_offset: 116 (74)
  p_vaddr: 0
  p_paddr: 0
  p_filesz: 344 (158)
  p_memsz: 0 (0)
  p_flags: 0 ()
  p_align: 0
Elf32_Phdr:
  p_type: 1 (PT_LOAD)
  p_offset: 4096 (1000)
  p_vaddr: c0000000
  p_paddr: 0
  p_filesz: 0 (0)
  p_memsz: 0 (0)
  p_flags: 7 (PF_X|PF_W|PF_R)
  p_align: 4096
Elf32_Nhdr:
  n_namesz: 4 ("CORE")
  n_descsz: 144
  n_type: 1 (NT_PRSTATUS)
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 023b3ee0 00000001 00000001 00000063 00000006 00000063
    75ff7f7c 00000068 00000068 00000000 00000000 ffffffff 021d2010 00000060
    00010002 75ff7eac 00000068 00000000
Elf32_Nhdr:
  n_namesz: 4 ("CORE")
  n_descsz: 124
  n_type: 3 (NT_PRPSINFO)
    00005200 00000000 00000000 00000000 00000000 00000000 00000000 696c6d76
    0078756e 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 00000000 00000000 00000000
Elf32_Nhdr:
  n_namesz: 4 ("CORE")
  n_descsz: 8
  n_type: 4 (NT_TASKSTRUCT)
    75ff6000 00000000
Elf32_Nhdr:
  n_namesz: 4 ("CORE")
  n_descsz: 4
  n_type: 70000001 (NT_DISKDUMP)
    00000000

crash: ./usr/lib/debug/boot/vmlinux-2.4.21-47.ELhugemem.debug: no text and data contents

crash: the use of a System.map file requires that the accompanying namelist
argument is a kernel file built with the -g CFLAG. The namelist argument
supplied in this case is a debuginfo file, which must be accompanied by the
kernel file from which it was derived.

--
Kurtis D. Rader, Linux level 3 support  email: krader(a)us.ibm.com
IBM Integrated Technology Services      DID: +1 503-578-3714
15350 SW Koll Pkwy, MS DES1-01          service: 800-IBM-SERV
Beaverton, OR 97006-6063                http://www.ibm.com


handling missing kdump pages in diskdump format

by Bob Montgomery

I've been experimenting with the makedumpfile utility for kdump on ia64. One of my experiments was to verify that a page that should have been missing indeed was missing. I used crash 4.0-3.8 to look for a user page that should have been omitted from the dump.

crash> x/xg 0xe0000040fc00c000
0xe0000040fc00c000: 0x0000000000000000

On a full dump from makedumpfile as well as on a straight copy of vmcore, crash reports this:

crash> x/xg 0xe0000040fc00c000
0xe0000040fc00c000: 0x00010102464c457f

The dumpfiles created by makedumpfile appear to crash as diskdump files, and crash appears to excuse missing pages and report 0x0 contents here:

diskdump.c:read_diskdump, line 454:
        if (!page_is_dumpable(pfn)) {
                memset(bufptr, 0, cnt);
                return cnt;

Shouldn't there be some indication that a requested page is missing as opposed to being legitimately full of zeros?

Bob Montgomery
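
One way to make the distinction asked about above is for the read routine to report an excluded page with a separate status instead of zero-filling the buffer. The sketch below only illustrates that idea and is not a proposed change to diskdump.c: the `sketch_dump` structure, the one-bit-per-pfn bitmap, the status names and the simplification that stored pages are indexed directly by pfn are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

enum sketch_read_status {
    SKETCH_READ_OK = 0,
    SKETCH_PAGE_EXCLUDED = -1  /* page was left out of the dump */
};

/* Hypothetical in-memory view of a dump file. */
struct sketch_dump {
    const unsigned char *bitmap;  /* one bit per pfn, 1 = page is present */
    const unsigned char *pages;   /* page data, indexed by pfn for simplicity */
    uint64_t max_pfn;             /* number of pfns covered by the bitmap */
};

/* Return 1 if the bitmap says the page was written to the dump. */
static int sketch_page_is_dumpable(const struct sketch_dump *d, uint64_t pfn)
{
    if (pfn >= d->max_pfn)
        return 0;
    return (d->bitmap[pfn >> 3] >> (pfn & 7)) & 1;
}

/*
 * Copy one page into buf. An excluded page yields a distinct status
 * and leaves buf untouched, so the caller can warn the user rather
 * than display zeros that were never in memory.
 */
static int sketch_read_page(const struct sketch_dump *d, uint64_t pfn,
                            unsigned char *buf)
{
    if (!sketch_page_is_dumpable(d, pfn))
        return SKETCH_PAGE_EXCLUDED;

    memcpy(buf, d->pages + pfn * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
    return SKETCH_READ_OK;
}
```

A present page that really is all zeros returns SKETCH_READ_OK with a zeroed buffer, while an excluded page returns SKETCH_PAGE_EXCLUDED, so the two cases can no longer be confused by the caller.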
