January 2013 - Crash-utility - Crash Utility List Archives

by Bouchard Louis

Hi Dave,

While finishing my work on the makedumpfile --dump-dmesg fix for kernels higher than 3.5, I may have found a minor bug in your 'log -m' command. I was trying to compare my output from makedumpfile --dump-dmesg with your log -m output. Or else I misunderstood your 'dump_log_entry' command and how the message log level should look.

When running in -d1 on an F18 kernel, I get the following:

> crash> log -m | tail
> log 45368dc -> msg: 45368ec ts_nsec: 83443019994 level: 21026 text_len: 54 dict_len: 0
> [ 83.443019] <21026>RIP [<ffffffff81393986>] sysrq_handle_crash+0x16/0x20
>
> log 4536924 -> msg: 4536934 ts_nsec: 83443019994 level: 8322 text_len: 23 dict_len: 0
> [ 83.443019] <8322> RSP <ffff8800363dde38>
>
> log 453694c -> msg: 453695c ts_nsec: 83443019994 level: 17286 text_len: 21 dict_len: 0
> [ 83.443019] <17286>CR2: 0000000000000000

The message log level displayed in brackets seems dubious to me. Since I didn't have access to the log.level offset in the vmcoreinfo, I had to hack and hard-code the structure offset in makedumpfile. While doing that, I used a 3-bit mask to get the level:3 element, but it looks like what you report is the full log.level value.

So I ported back what I had done in makedumpfile into your dump_log_entry (patch attached) and tested on crash 6.1.2. I got:

> crash> log -m | tail
> log 458df8c -> msg: 458df9c ts_nsec: 83443019994 flags/level: 22 text_len: 54 dict_len: 0
> [ 83.443019] <2>RIP [<ffffffff81393986>] sysrq_handle_crash+0x16/0x20
>
> log 458dfd4 -> msg: 458dfe4 ts_nsec: 83443019994 flags/level: 82 text_len: 23 dict_len: 0
> [ 83.443019] <2> RSP <ffff8800363dde38>
>
> log 458dffc -> msg: 458e00c ts_nsec: 83443019994 flags/level: 86 text_len: 21 dict_len: 0
> [ 83.443019] <6>CR2: 0000000000000000

Does the message log level look correct to you with that modification?

Kind regards,

...Louis

--
Louis Bouchard
Backline Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com
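The two outputs quoted in this message can be related with plain bit arithmetic. The sketch below is only an illustration: it assumes, as the patched output suggests, that the level is the low 3 bits of the raw value and that the "flags/level" field is the low byte printed in hex. The function names are hypothetical and are not crash or makedumpfile APIs.

```python
LEVEL_MASK = 0x7  # 3-bit mask, as described in the message


def level_from_raw(raw: int) -> int:
    """Return the low 3 bits of the raw value (assumed to be the log level)."""
    return raw & LEVEL_MASK


def flags_level_byte(raw: int) -> str:
    """Return the low byte of the raw value as hex (assumed flags/level field)."""
    return format(raw & 0xFF, "x")


# Raw "level" values taken from the unpatched 'log -m' output in the message.
for raw in (21026, 8322, 17286):
    print(raw, flags_level_byte(raw), level_from_raw(raw))
```

Under these assumptions the three raw values 21026, 8322 and 17286 give the byte values 22, 82 and 86 and the levels 2, 2 and 6, which matches the patched output.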

12 years, 5 months


Does Crash support kernel 3.4.20 on PPC64?

by 李佳豪

crash cannot exit from the readmem() recursion until it is killed because of OOM or a segmentation fault on a PPC64 CPU:

readmem-->kvtop-->ppc64_vtop_level4-->readmem-->kvtop....

root@localhost:~# uname -a
Linux localhost 3.4.20 #2 SMP PREEMPT Thu Jan 24 14:43:59 CST 2013 ppc64 GNU/Linux
root@localhost:~#
root@localhost:~# crash

crash 6.1.0
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-wrs-linux"...

Segmentation fault (core dumped)

root@localhost:~# gdb /usr/bin/crash core
GNU gdb (Linux Sourcery CodeBench 4.6a-98) 7.4.50.20120716-cvs
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-wrs-linux-gnu".
For bug reporting instructions, please see:
<support(a)codesource.com>...
Reading symbols from /usr/bin/crash...Reading symbols from /usr/bin/.debug/crash...done.
done.
[New LWP 1166]
warning: Could not load shared library symbols for linux-vdso64.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `crash '.
Program terminated with signal 11, Segmentation fault.
#0 readmem (addr=13835058055299842048, memtype=1, buffer=0x10cbc460, size=4096, type=0x106eafe8 "level4 page", error_handle=1) at memory.c:1959
1959 {
(gdb) bt
#0 readmem (addr=13835058055299842048, memtype=1, buffer=0x10cbc460, size=4096, type=0x106eafe8 "level4 page", error_handle=1) at memory.c:1959
#1 0x000000001011a2e4 in ppc64_vtop_level4 (vaddr=13835058055299842048, level4=0xc0000000010dc000, paddr=0xfffe73ed1d8, verbose=<optimized out>) at ppc64.c:561
#2 0x0000000010097264 in kvtop (tc=<optimized out>, kvaddr=<error reading variable: value has been optimized out>, paddr=<optimized out>, verbose=<error reading variable: value has been optimized out>) at memory.c:2765
#3 0x000000001009895c in readmem (addr=13835058055299842048, memtype=<optimized out>, buffer=<optimized out>, size=<optimized out>, type=0x106eafe8 "level4 page", error_handle=1) at memory.c:2032
#4 0x000000001011a2e4 in ppc64_vtop_level4 (vaddr=13835058055299842048, level4=0xc0000000010dc000, paddr=0xfffe73ed418, verbose=<optimized out>) at ppc64.c:561
#5 0x0000000010097264 in kvtop (tc=<optimized out>, kvaddr=<error reading variable: value has been optimized out>, paddr=<optimized out>, verbose=<error reading variable: value has been optimized out>) at memory.c:2765
#6 0x000000001009895c in readmem (addr=13835058055299842048, memtype=<optimized out>, buffer=<optimized out>, size=<optimized out>,
---Type <return> to continue, or q <return> to quit---quit
type=Quit
(gdb) quit

12 years, 5 months


[PATCH]: symbol filtering

by Per Fransson

Hi,

For x86, crash avoids storing any '__crc_*' symbols. It should do the same for ARM, right? Credit goes to Rabin Vincent for this patch, unless you don't like it, in which case you can blame me.

Another thing. The ARM kernel potentially includes a symbol 'PRRR' with a value of 0xff0a81a8, defined in arch/arm/mm/proc-v7-2level.S. The problem with this is that it is the symbol which ends up as st->symtable[st->symcnt-1] instead of '_end', which means a lot of values will pass this check in in_ksymbol_range():

    if ((value >= st->symtable[0].value) &&
        (value <= st->symtable[st->symcnt-1].value)) {
            if ((st->flags & PERCPU_SYMS) && (value < st->first_ksymbol))
                    return FALSE;
            else
                    return TRUE;
    }

How would you prefer dealing with this? How about excluding any symbols with values > '_end'? A KSYMS_END flag could be added to the machdep->flags. Or just unsetting KSYM_START when '_end' is encountered in verify_symbol().

Regards,
Per
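To make the effect of the stray 'PRRR' symbol concrete, here is a minimal sketch of the range check and of the filtering proposed in this message (drop '__crc_*' symbols and anything above '_end'). It is not crash's implementation: the Sym type, the helper names and every address except the PRRR value 0xff0a81a8 are assumptions made up for illustration.

```python
from typing import List, NamedTuple


class Sym(NamedTuple):
    name: str
    value: int


def in_symbol_range(symtable: List[Sym], value: int) -> bool:
    """Simplified range test: is value between the first and last symbol?"""
    return symtable[0].value <= value <= symtable[-1].value


def filter_symbols(symbols: List[Sym]) -> List[Sym]:
    """Drop '__crc_*' symbols and any symbol whose value is above '_end'."""
    end = next((s.value for s in symbols if s.name == "_end"), None)
    kept = [
        s for s in symbols
        if not s.name.startswith("__crc_") and (end is None or s.value <= end)
    ]
    return sorted(kept, key=lambda s: s.value)


# Hypothetical symbol values; only PRRR's value comes from the message.
symbols = [
    Sym("_text", 0xC0008000),
    Sym("_end", 0xC0A00000),
    Sym("PRRR", 0xFF0A81A8),
    Sym("__crc_printk", 0x12345678),
]

unfiltered = sorted(symbols, key=lambda s: s.value)
filtered = filter_symbols(symbols)

stray = 0xE0000000  # an address well above '_end'
print(in_symbol_range(unfiltered, stray))  # range is stretched by PRRR
print(in_symbol_range(filtered, stray))    # range now ends at '_end'
```

With the unfiltered table the stray address passes the check because PRRR is the last entry; after filtering, the table ends at '_end' and the same address is rejected.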

12 years, 5 months


Interpreting bt

by Ahmed Al-Mehdi

Hello,

I am using crash version 6.0.4-2.el6 on CentOS 6.3 (kernel 2.6.32-279.el6.x86_64). I apologize for my newbie questions, but googling did not help much. When analyzing a kernel dump, I am getting the following bt.

crash> bt
PID: 12663  TASK: ffff88036304f500  CPU: 0  COMMAND: "bash"
 #0 [ffff88035b949570] machine_kexec at ffffffff8103281b
 #1 [ffff88035b9495d0] crash_kexec at ffffffff810ba662
 #2 [ffff88035b9496a0] oops_end at ffffffff81501290
 #3 [ffff88035b9496d0] no_context at ffffffff81043bab
 #4 [ffff88035b949720] __bad_area_nosemaphore at ffffffff81043e35
 #5 [ffff88035b949770] bad_area at ffffffff81043f5e
 #6 [ffff88035b9497a0] __do_page_fault at ffffffff81044710
 #7 [ffff88035b9498c0] do_page_fault at ffffffff8150326e
 #8 [ffff88035b9498f0] page_fault at ffffffff81500625
    [exception RIP: ahaann+47]
    RIP: ffffffffa06ce48f  RSP: ffff88035b9499a8  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff88035daef4e0
    RBP: ffff88035b9499b8   R8: 0000000004a47daf   R9: ffffffffa06dae99
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000007
    R13: 00007fc82f4b8000  R14: 000000000000000a  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88035b9499c0] ahaecho at ffffffffa06d2899 [ahadrv]
#10 [ffff88035b949a00] writectl at ffffffffa06c366e [ahadrv]
#11 [ffff88035b949e40] writeaha at ffffffffa06d3e7b [ahadrv]
#12 [ffff88035b949e60] proc_file_write at ffffffff811e6e44
#13 [ffff88035b949ea0] proc_reg_write at ffffffff811e0abe
#14 [ffff88035b949ef0] vfs_write at ffffffff8117b068
#15 [ffff88035b949f30] sys_write at ffffffff8117ba81
#16 [ffff88035b949f80] system_call_fastpath at ffffffff8100b0f2
    RIP: 0000003a29ada3c0  RSP: 00007ffffaec6830  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff8100b0f2  RCX: 0000000000000065
    RDX: 000000000000000a  RSI: 00007fc82f4b8000  RDI: 0000000000000001
    RBP: 00007fc82f4b8000   R8: 000000000000000a   R9: 00007fc82f4aa700
    R10: 00000000fffffff7  R11: 0000000000000246  R12: 000000000000000a
    R13: 0000003a29d8c780  R14: 000000000000000a  R15: 0000000001e18460
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash>

1. Are the hex addresses in [] right before the function name the stack frame pointer for that function?

2. I am assuming the panic occurred in function ahaann() (and not in ahaecho()). Is that right?

3. What is puzzling me is why there is no frame associated with the call to ahaann(). Or is frame #8 associated with ahaann()? From the display it seems frame #8 is associated with page_fault(), since 0xffffffff81500625 is an address in page_fault(). Or am I totally misinterpreting the call stack?

crash> dis ffffffff81500625
0xffffffff81500625 <page_fault+37>: jmpq 0xffffffff81500830

4. I can understand the value of the register dump for frame #8, due to the panic. What is the significance of the register dump for frame #16?

Appreciate any help.

Thank you,
Ahmed.

12 years, 5 months


[PATCH]: minimal mode extensions

by Per Fransson

Hi all, How do you feel about allowing minimal mode in extensions? See attached patch. Regards, Per

12 years, 5 months


questions about crash utility

by 卜弋天

Hello:

I am using crash utility 6.0.8 to parse the dump file of kernel 3.4. My platform generates ebi.bin after a crash; this binary file dumps DDR from address 0x0 to 0x20000000, a total of 512MB of RAM.

After I get this binary file, I prefix an ELF header to it. The function that generates the ELF header is below:

    static size_t mkelfheader(void *buf)
    {
        struct elf_phdr *nhdr, *phdr;
        struct elfhdr *elf;
        size_t offset = 0;
        void *bufp = buf;

        /* ELF file header */
        elf = (Elf32_Ehdr *) bufp;
        bufp += sizeof(Elf32_Ehdr);
        offset += sizeof(struct elfhdr);
        memcpy(elf->e_ident, ELFMAG, SELFMAG);
        elf->e_ident[EI_CLASS] = ELFCLASS32;
        elf->e_ident[EI_DATA] = ELFDATA2LSB;
        elf->e_ident[EI_VERSION]= EV_CURRENT;
        elf->e_ident[EI_OSABI] = ELFOSABI_NONE;
        memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
        elf->e_type = ET_CORE;
        elf->e_machine = EM_ARM;
        elf->e_version = EV_CURRENT;
        elf->e_entry = 0;
        elf->e_phoff = sizeof(struct elfhdr);
        elf->e_shoff = 0;
        elf->e_flags = 0;
        elf->e_ehsize = sizeof(struct elfhdr);
        elf->e_phentsize= sizeof(struct elf_phdr);
        elf->e_phnum = 2;
        elf->e_shentsize= 0;
        elf->e_shnum = 0;
        elf->e_shstrndx = 0;

        /* first program header: an empty PT_NOTE segment */
        nhdr = (struct elf_phdr *) bufp;
        bufp += sizeof(struct elf_phdr);
        offset += sizeof(struct elf_phdr);
        nhdr->p_type = PT_NOTE;
        nhdr->p_offset = 0;
        nhdr->p_vaddr = 0;
        nhdr->p_paddr = 0;
        nhdr->p_filesz = 0;
        nhdr->p_memsz = 0;
        nhdr->p_flags = 0;
        nhdr->p_align = 0;

        /* second program header: one PT_LOAD segment for the RAM image */
        phdr = (struct elf_phdr *) bufp;
        bufp += sizeof(struct elf_phdr);
        offset += sizeof(struct elf_phdr);
        phdr->p_type = PT_LOAD;
        phdr->p_flags = PF_R|PF_W|PF_X;
        phdr->p_offset = offset;
        phdr->p_vaddr = 0xc0000000;
        phdr->p_paddr = 0x00200000;
        phdr->p_filesz = phdr->p_memsz = MEMSIZE;
        phdr->p_align = 0;

        return offset;
    }

The result is a cdump.elf which contains the generated ELF header followed by ebi.bin. I then use the crash utility to load this cdump.elf together with the vmlinux. It reports the errors below:

WARNING: could not find MAGIC_START!
WARNING: cpu_present_mask indicates more than 4 (NR_CPUS) cpus
crash: cannot determine base kernel version
crash: vmlinux and cdump.elf do not match!

Our platform sets CONFIG_PHYS_OFFSET=0x00200000 in the kernel .config file, which means that the virtual address 0xc0000000 maps to physical address 0x00200000. For this reason, I set phdr->p_paddr = 0x00200000 when generating the ELF header.

Please help me find out what is wrong. Thanks very much.

Best Regards
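One thing that may be worth checking is how a PT_LOAD entry relates a physical address to a byte in the file. The sketch below uses the usual ELF convention (file offset = p_offset + (paddr - p_paddr)) and the standard 32-bit header sizes; whether this is the actual cause of the errors above is an assumption, not something the message confirms. The helper name file_offset is hypothetical.

```python
ELF32_EHDR_SIZE = 52  # sizeof(Elf32_Ehdr)
ELF32_PHDR_SIZE = 32  # sizeof(Elf32_Phdr)

MEMSIZE = 0x20000000      # 512MB RAM image, from the message
PHYS_OFFSET = 0x00200000  # CONFIG_PHYS_OFFSET, from the message


def file_offset(paddr, p_paddr, p_offset, p_filesz):
    """Map a physical address to a file offset inside one PT_LOAD segment.

    Returns None when the address is not covered by the segment.
    """
    if not (p_paddr <= paddr < p_paddr + p_filesz):
        return None
    return p_offset + (paddr - p_paddr)


# One ELF header plus two program headers precede the raw image.
header_len = ELF32_EHDR_SIZE + 2 * ELF32_PHDR_SIZE

# Header as written in the message: p_paddr = 0x00200000.
off_a = file_offset(PHYS_OFFSET, PHYS_OFFSET, header_len, MEMSIZE)
# Alternative for comparison: describe the image as starting at physical 0x0.
off_b = file_offset(PHYS_OFFSET, 0x0, header_len, MEMSIZE)

# Index into ebi.bin, whose first byte is physical address 0x0.
print(hex(off_a - header_len))
print(hex(off_b - header_len))
```

Under these assumptions, the header as posted resolves physical 0x00200000 to the first byte of ebi.bin, which by the description holds physical address 0x0; with p_paddr = 0 the same address resolves to byte 0x200000 of the image.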

12 years, 5 months


dmesg content from crash without debug symbols

by Bouchard Louis

Hello,

Is it possible to extract the content of the kernel buffer (result of the 'log' command) from a kernel dump without access to the kernel debug symbols?

The intent is to be able to retrieve a minimal set of information from the dump on a system that doesn't have the possibility to install the namelist with debug symbols.

TIA,

Kind regards,

...Louis

--
Louis Bouchard
Backline Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com

12 years, 5 months


[ANNOUNCE] crash version 6.1.2 is available

by Dave Anderson

Download from: http://people.redhat.com/anderson

Changelog:

- Enhancement of the "task" command to display both the task_struct and the thread_info structures of a task. The -R option accepts members of either/both structure types.
  (anderson(a)redhat.com)

- Fix for the X86_64 "search" and "rd" commands due to this commit:
  http://git.kernel.org/linus/027ef6c87853b0a9df53175063028edb4950d476
  Upon any attempt to read a page within the RAM region reserved for AMD GART on a live system, the Linux 3.7rc1 commit above causes /dev/mem, /proc/kcore and the /dev/crash drivers to spin forever, leading to a kernel soft lockup. The RAM pages reserved for GART consist of 2MB large pages whose _PAGE_PRESENT bits are turned off. Prior to the above commit, a read() attempt on GART RAM would cause an unresolvable page fault, and would harmlessly return an EFAULT. The commit above has changed the pmd_large() function such that it now returns TRUE if only the _PAGE_PSE bit is set in the PTE, whereas before it required both _PAGE_PSE and _PAGE_PRESENT. So instead of just failing the read() system call with an EFAULT, the page fault handling code now considers it a spurious TLB fault, and the instruction is retried indefinitely. The crash utility patch stores the GART physical memory range, and disallows any attempts to read from it.
  (anderson(a)redhat.com)

- If an EPPIC_GIT_URL environment variable is defined, then the URL that it points to is used as an alternative to the code.google.com git source repository for the eppic.so extension module. However, the alternative site is only accessed if code.google.com can first be pinged; this patch removes that restriction.
  (per.fransson.ml(a)gmail.com)

- Fix for the "files" command PATH display on kernels configured with CONFIG_DEVTMPFS, when the vfsmount pointer in a file structure's "f_path" member does not point to the root vfsmount required for reconstructing the full file pathname. Without the patch, open files in the /dev directory may be truncated and not show the "/dev" filename component.
  (anderson(a)redhat.com)

- Enhancement to the "kmem -v" option on 2.6.28 and later kernels that utilize the "vmap_area_list" list of mapped kernel virtual memory regions, replacing the usage of the to-be-obsoleted "vmlist" list. In those kernels, the output of the command will also show each vmap_area structure address, in addition to its vm_struct address, memory range, and size.
  (anderson(a)redhat.com)

- Update to the exported do_rbtree() and do_rdtree() functions such that they will return the number of items found in the targeted tree, similar in nature to the do_list() function. The two functions have also been fixed such that the VERBOSE flag is actually recognized, so that external callers are able to gather the entries in a tree without having them displayed. The calls to either function may be enclosed with hq_open() and hq_close() so that the tree entries may be subsequently gathered by retrieve_list() into a supplied buffer, as well as to recognize a corrupted list with duplicate entries.
  (anderson(a)redhat.com)

- Fix for the "extend -u" option to prevent the usage of a member of a free()'d extension_table structure. No command failure occurs, but rather an inadvertent coding error.
  (Jan.Karlsson(a)sonymobile.com)

- Fix to allow error() to be called during an open_tmpfile() sequence prior to close_tmpfile() being called. There are no crash functions that call error() during an open_tmpfile() sequence, but there's no reason why it cannot be done. Without the patch, the error message gets displayed on stdout (as expected), but the error message will also overwrite/corrupt the tmpfile() data while it is being parsed.
  (anderson(a)redhat.com)

- Fix to properly determine whether X86_64 kernels were configured with CONFIG_FRAME_POINTER, due to this ftrace-related commit:
  http://git.kernel.org/linus/d57c5d51a30152f3175d2344cb6395f08bf8ee0c
  Without the patch, the crash utility fails to determine whether the kernel was built with CONFIG_FRAME_POINTER, and therefore the "bt" command cannot take advantage of it for more reliable backtraces.
  (anderson(a)redhat.com)

- Fix to properly determine whether 2.6.31 and earlier X86_64 kernels were configured with CONFIG_FRAME_POINTER. Without the patch, the crash utility may fail to determine whether the kernel was built with CONFIG_FRAME_POINTER. In those kernel versions -- which may be dependent upon the compiler version used -- one of the sample functions tested may have its "push %rbp, mov %rsp,%rbp" function preamble separated by other instruction(s), resulting in a false negative that precludes the "bt" command from taking advantage of framepointers.
  (anderson(a)redhat.com)

- Fix for the file and line-number string that is displayed by the "sym <kernel-text>" option. Without the patch, the "/usr/src/" part of the string is stripped, and the filename string itself could have two corrupted characters in the pathname, for example, showing "k3.nel-3.6.fc17" instead of "kernel-3.6.fc17". This is dependent upon the compiler version, or perhaps the string library that is linked into the crash binary, because it has only been seen on crash binaries built with gcc-4.7. The fix now displays the full pathname, no longer dropping the "/usr/src" from the beginning.
  (anderson(a)redhat.com)

- Restricted the X86_64 "line_number_hook" to kernels earlier than 2.6.24, i.e., kernels prior to the x86/x86_64 merge. Without the patch, the manufactured filename information for assembly-language files was incorrect for 2.6.24 and later kernels. Also, the kernel debuginfo data now has file/line-number data for assembly-language files as well, obviating the need for the hook.
  (anderson(a)redhat.com)

- Fix for the extensions/trace.c extension module to prevent a double free exception that would occur if a calloc() call fails during module initialization.
  (per.fransson.ml(a)gmail.com)

- Fix for the "p -u" option if a 32-bit kernel symbol is incorrectly passed as an argument. Without the patch, the command fails, but the next command requiring the services of the embedded gdb module will generate an error message of the sort "*** glibc detected *** crash: free(): invalid pointer: <address> ***", or "*** glibc detected *** crash: munmap_chunk(): invalid pointer: <address> ***", followed by a backtrace, and an abort of the crash session.
  (anderson(a)redhat.com)

- Fix for the embedded gdb module to correctly handle kernel modules whose ELF header contains "__ksymtab" and "__ksymtab_gpl" sections with non-zero (nonsensical) "Address" values, such as those shown in this example snippet:

    $ readelf -a edac_core.so
    ...
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 8] __ksymtab         PROGBITS         0000000000000060  0000ad90
           0000000000000010  0000000000000000   A       0     0     16
    ...
      [10] __ksymtab_gpl     PROGBITS         0000000000000070  0000add0
           00000000000001a0  0000000000000000   A       0     0     16
    ...

  Without the patch, if one of the odd sections above is encountered, the "Offset" values of the remaining sections are not processed; and if the module's .data section is ignored, gdb incorrectly calculates the address of all symbols in the module's .data section, leading to incorrect output if, for example, data is printed with the gdb "p" command. This invalid ELF section format was introduced in Linux 3.0 by the kernel's "scripts/module-common.lds" file.
  (jan.kratochvil(a)redhat.com)

- Fix for the "runq -g" option if the kernel contains more than 200 task groups. Without the patch, the command generates a segmentation violation.
  (anderson(a)redhat.com)

12 years, 5 months


[PATCH] Add a new option -r to cgget

by Zhang Xiaohe

Hello Dave,

This patch allows cgget to display one or several parameters of a cgroup controller, and fixes some bugs. It is more efficient when one wants to focus on just a few parameters instead of the whole controller.

Here are examples:

1. crash> cgget -r cpuset.mems /
   /:
   cpuset.mems: 0

2. crash> cgget -r memory.usage /
   /:
   memory.memsw.max_usage_in_bytes: 0
   memory.memsw.usage_in_bytes: 1368694784
   memory.max_usage_in_bytes: 0
   memory.usage_in_bytes: 1368694784

To apply this patch, enter the crash-<version> directory and run the following commands:

$ cp cgget.patch ./
$ patch -p0 -i cgget.patch

For more information, please refer to the attachment.

Thanks.

--
Zhang Xiaohe
Regards
--------------------------------------------------
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
TEL: +86+25-86630566-8552
FAX: +86+25-83317685
MAIL: zhangxh(a)cn.fujitsu.com
--------------------------------------------------

12 years, 6 months

