log -m reports wrong log level value
by Bouchard Louis
Hi Dave,
While finishing my work on the makedumpfile --dump-dmesg fix for kernels
higher than 3.5, I may have found a minor bug in your 'log -m' command.
Or else I have misunderstood your dump_log_entry() function and how the
message log level should look. I noticed it while comparing my output
from makedumpfile --dump-dmesg with your 'log -m' output.
When running in -d1 on an F18 kernel, I get the following :
> crash> log -m | tail
> log 45368dc -> msg: 45368ec ts_nsec: 83443019994 level: 21026 text_len: 54 dict_len: 0
> [ 83.443019] <21026>RIP [<ffffffff81393986>] sysrq_handle_crash+0x16/0x20
>
> log 4536924 -> msg: 4536934 ts_nsec: 83443019994 level: 8322 text_len: 23 dict_len: 0
> [ 83.443019] <8322> RSP <ffff8800363dde38>
>
> log 453694c -> msg: 453695c ts_nsec: 83443019994 level: 17286 text_len: 21 dict_len: 0
> [ 83.443019] <17286>CR2: 0000000000000000
The message log level in brackets that is displayed seems dubious to me.
Since I didn't have access to the log.level offset in the vmcoreinfo, I
had to hard-code the structure offset in makedumpfile. While doing that,
I used a 3-bit mask to get the level:3 element, but it looks like what
you report is the full log.level value.
So I ported what I had done in makedumpfile back into your dump_log_entry
(patch attached) and tested it on crash 6.1.2. I got:
> crash> log -m | tail
> log 458df8c -> msg: 458df9c ts_nsec: 83443019994 flags/level: 22 text_len: 54 dict_len: 0
> [ 83.443019] <2>RIP [<ffffffff81393986>] sysrq_handle_crash+0x16/0x20
>
> log 458dfd4 -> msg: 458dfe4 ts_nsec: 83443019994 flags/level: 82 text_len: 23 dict_len: 0
> [ 83.443019] <2> RSP <ffff8800363dde38>
>
> log 458dffc -> msg: 458e00c ts_nsec: 83443019994 flags/level: 86 text_len: 21 dict_len: 0
> [ 83.443019] <6>CR2: 0000000000000000
Does the message log level look correct to you with that modification?
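The masking described above can be sketched as follows. This is only an illustration, not the attached patch: the helper names are hypothetical, and it assumes that the byte shown as "flags/level" is the low byte of the 16-bit value printed before the change (the numbers in the two outputs are consistent with that: 21026 is 0x5222, 8322 is 0x2082, 17286 is 0x4386). Which bits of that byte hold level:3 depends on how the compiler lays out the bit-fields, so both variants are shown.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the byte holding flags:5 and level:3. */
static uint8_t flags_level_byte(uint16_t raw)
{
    return (uint8_t)(raw & 0xff);
}

/* Variant used in the post: level taken from the low 3 bits. */
static unsigned level_from_low_bits(uint8_t flags_level)
{
    return flags_level & 0x07;
}

/* Alternative, if the compiler places level:3 in the high 3 bits. */
static unsigned level_from_high_bits(uint8_t flags_level)
{
    return flags_level >> 5;
}
```

With the low-bit variant, the bytes 0x22, 0x82 and 0x86 give levels 2, 2 and 6, which matches the patched output above.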
Kind regards,
...Louis
--
Louis Bouchard
Backline Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com
11 years, 10 months
Does Crash support kernel 3.4.20 on PPC64?
by 李佳豪
crash cannot get out of a readmem() recursion on a PPC64 CPU; it keeps
recursing until it is killed, either by the OOM killer or by a
segmentation fault.
readmem-->kvtop-->ppc64_vtop_level4-->readmem-->kvtop....
root@localhost:~# uname -a
Linux localhost 3.4.20 #2 SMP PREEMPT Thu Jan 24 14:43:59 CST 2013 ppc64 GNU/Linux
root@localhost:~#
root@localhost:~# crash
crash 6.1.0
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-wrs-linux"...
Segmentation fault (core dumped)
root@localhost:~# gdb /usr/bin/crash core
GNU gdb (Linux Sourcery CodeBench 4.6a-98) 7.4.50.20120716-cvs
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-wrs-linux-gnu".
For bug reporting instructions, please see:
<support(a)codesource.com>...
Reading symbols from /usr/bin/crash...Reading symbols from /usr/bin/.debug/crash...done.
done.
[New LWP 1166]
warning: Could not load shared library symbols for linux-vdso64.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `crash '.
Program terminated with signal 11, Segmentation fault.
#0 readmem (addr=13835058055299842048, memtype=1, buffer=0x10cbc460,
size=4096, type=0x106eafe8 "level4 page", error_handle=1) at memory.c:1959
1959 {
(gdb) bt
#0 readmem (addr=13835058055299842048, memtype=1, buffer=0x10cbc460,
size=4096, type=0x106eafe8 "level4 page", error_handle=1) at memory.c:1959
#1 0x000000001011a2e4 in ppc64_vtop_level4 (vaddr=13835058055299842048,
level4=0xc0000000010dc000, paddr=0xfffe73ed1d8, verbose=<optimized out>)
at ppc64.c:561
#2 0x0000000010097264 in kvtop (tc=<optimized out>,
kvaddr=<error reading variable: value has been optimized out>,
paddr=<optimized out>,
verbose=<error reading variable: value has been optimized out>)
at memory.c:2765
#3 0x000000001009895c in readmem (addr=13835058055299842048,
memtype=<optimized out>, buffer=<optimized out>, size=<optimized out>,
type=0x106eafe8 "level4 page", error_handle=1) at memory.c:2032
#4 0x000000001011a2e4 in ppc64_vtop_level4 (vaddr=13835058055299842048,
level4=0xc0000000010dc000, paddr=0xfffe73ed418, verbose=<optimized out>)
at ppc64.c:561
#5 0x0000000010097264 in kvtop (tc=<optimized out>,
kvaddr=<error reading variable: value has been optimized out>,
paddr=<optimized out>,
verbose=<error reading variable: value has been optimized out>)
at memory.c:2765
#6 0x000000001009895c in readmem (addr=13835058055299842048,
memtype=<optimized out>, buffer=<optimized out>, size=<optimized out>,
---Type <return> to continue, or q <return> to quit---quit
type=Quit
(gdb) quit
11 years, 10 months
[PATCH]: symbol filtering
by Per Fransson
Hi,
For x86, crash avoids storing any '__crc_*' symbols. It should do the
same for ARM, right? Credit goes to Rabin Vincent for this patch,
unless you don't like it, in which case you can blame me.
Another thing. The ARM kernel potentially includes a symbol 'PRRR'
with a value of 0xff0a81a8, defined in arch/arm/mm/proc-v7-2level.S.
The problem with this is that it is this symbol, instead of '_end', which
ends up as st->symtable[st->symcnt-1], which means a lot of values will
pass this check in in_ksymbol_range():
    if ((value >= st->symtable[0].value) &&
        (value <= st->symtable[st->symcnt-1].value)) {
        if ((st->flags & PERCPU_SYMS) && (value < st->first_ksymbol))
            return FALSE;
        else
            return TRUE;
    }
How would you prefer dealing with this? How about excluding any
symbols with values > '_end'? A KSYMS_END flag could be added to the
machdep->flags. Or just unsetting KSYM_START when '_end' is
encountered in verify_symbol().
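To make the two filtering ideas concrete, here is a minimal sketch. It is not crash's code: struct sym and keep_symbol() are hypothetical stand-ins, and the value of '_end' is assumed to be known before the filter runs.

```c
#include <string.h>

/* Hypothetical stand-in for a symbol table entry (not crash's struct syment). */
struct sym {
    const char *name;
    unsigned long value;
};

/* Return 1 if the symbol should be stored, 0 if it should be skipped. */
static int keep_symbol(const struct sym *s, unsigned long end_value)
{
    /* Skip '__crc_*' symbols, as described for x86. */
    if (strncmp(s->name, "__crc_", 6) == 0)
        return 0;

    /* Skip anything above '_end', such as 'PRRR' at 0xff0a81a8. */
    if (s->value > end_value)
        return 0;

    return 1;
}
```

With such a filter, '_end' would again be the last entry in the table, so the upper bound tested by in_ksymbol_range() would be the real end of the kernel image.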
Regards,
Per
11 years, 10 months
Interpreting bt
by Ahmed Al-Mehdi
Hello,
I am using crash version: 6.0.4-2.el6 on CentOS 6.3 (kernel
2.6.32-279.el6.x86_64). I apologize for my newbie questions, but googling
did not help much.
When analyzing a kernel dump, I am getting the following bt.
crash> bt
PID: 12663 TASK: ffff88036304f500 CPU: 0 COMMAND: "bash"
#0 [ffff88035b949570] machine_kexec at ffffffff8103281b
#1 [ffff88035b9495d0] crash_kexec at ffffffff810ba662
#2 [ffff88035b9496a0] oops_end at ffffffff81501290
#3 [ffff88035b9496d0] no_context at ffffffff81043bab
#4 [ffff88035b949720] __bad_area_nosemaphore at ffffffff81043e35
#5 [ffff88035b949770] bad_area at ffffffff81043f5e
#6 [ffff88035b9497a0] __do_page_fault at ffffffff81044710
#7 [ffff88035b9498c0] do_page_fault at ffffffff8150326e
#8 [ffff88035b9498f0] page_fault at ffffffff81500625
[exception RIP: ahaann+47]
RIP: ffffffffa06ce48f RSP: ffff88035b9499a8 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88035daef4e0
RBP: ffff88035b9499b8 R8: 0000000004a47daf R9: ffffffffa06dae99
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000007
R13: 00007fc82f4b8000 R14: 000000000000000a R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88035b9499c0] ahaecho at ffffffffa06d2899 [ahadrv]
#10 [ffff88035b949a00] writectl at ffffffffa06c366e [ahadrv]
#11 [ffff88035b949e40] writeaha at ffffffffa06d3e7b [ahadrv]
#12 [ffff88035b949e60] proc_file_write at ffffffff811e6e44
#13 [ffff88035b949ea0] proc_reg_write at ffffffff811e0abe
#14 [ffff88035b949ef0] vfs_write at ffffffff8117b068
#15 [ffff88035b949f30] sys_write at ffffffff8117ba81
#16 [ffff88035b949f80] system_call_fastpath at ffffffff8100b0f2
RIP: 0000003a29ada3c0 RSP: 00007ffffaec6830 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffffff8100b0f2 RCX: 0000000000000065
RDX: 000000000000000a RSI: 00007fc82f4b8000 RDI: 0000000000000001
RBP: 00007fc82f4b8000 R8: 000000000000000a R9: 00007fc82f4aa700
R10: 00000000fffffff7 R11: 0000000000000246 R12: 000000000000000a
R13: 0000003a29d8c780 R14: 000000000000000a R15: 0000000001e18460
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
1. Is the hex address in [] right before each function name the stack
frame pointer for that function?
2. I am assuming the panic occurred in function ahaann() (and not in
ahaecho() ). Is that right?
3. What is puzzling me is why there is no frame associated with the call
to ahaann(). Or is frame #8 associated with ahaann()? From the display it
seems frame #8 is associated with page_fault(), since 0xffffffff81500625
is an address in page_fault(). Or am I totally misinterpreting the call
stack?
crash> dis ffffffff81500625
0xffffffff81500625 <page_fault+37>: jmpq 0xffffffff81500830
4. I can understand the register dump for frame #8, which is due to the
panic. What is the significance of the register dump for frame #16?
Appreciate any help.
Thank you,
Ahmed.
11 years, 11 months
questions about crash utility
by 卜弋天
Hello,

I am using crash utility 6.0.8 to parse the dump file of a 3.4 kernel.
After a crash, my platform generates ebi.bin. This binary file dumps DDR
from address 0x0 to 0x20000000, 512MB of RAM in total. After I get this
binary file, I prefix an ELF header to it. The function that generates
the ELF header is below:

static size_t mkelfheader(void *buf)
{
    struct elf_phdr *nhdr, *phdr;
    struct elfhdr *elf;
    size_t offset = 0;
    void *bufp = buf;

    elf = (Elf32_Ehdr *) bufp;
    bufp += sizeof(Elf32_Ehdr);
    offset += sizeof(struct elfhdr);

    memcpy(elf->e_ident, ELFMAG, SELFMAG);
    elf->e_ident[EI_CLASS] = ELFCLASS32;
    elf->e_ident[EI_DATA] = ELFDATA2LSB;
    elf->e_ident[EI_VERSION]= EV_CURRENT;
    elf->e_ident[EI_OSABI] = ELFOSABI_NONE;
    memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
    elf->e_type = ET_CORE;
    elf->e_machine = EM_ARM;
    elf->e_version = EV_CURRENT;
    elf->e_entry = 0;
    elf->e_phoff = sizeof(struct elfhdr);
    elf->e_shoff = 0;
    elf->e_flags = 0;
    elf->e_ehsize = sizeof(struct elfhdr);
    elf->e_phentsize= sizeof(struct elf_phdr);
    elf->e_phnum = 2;
    elf->e_shentsize= 0;
    elf->e_shnum = 0;
    elf->e_shstrndx = 0;

    nhdr = (struct elf_phdr *) bufp;
    bufp += sizeof(struct elf_phdr);
    offset += sizeof(struct elf_phdr);
    nhdr->p_type = PT_NOTE;
    nhdr->p_offset = 0;
    nhdr->p_vaddr = 0;
    nhdr->p_paddr = 0;
    nhdr->p_filesz = 0;
    nhdr->p_memsz = 0;
    nhdr->p_flags = 0;
    nhdr->p_align = 0;

    phdr = (struct elf_phdr *) bufp;
    bufp += sizeof(struct elf_phdr);
    offset += sizeof(struct elf_phdr);

    phdr->p_type = PT_LOAD;
    phdr->p_flags = PF_R|PF_W|PF_X;
    phdr->p_offset = offset;
    phdr->p_vaddr = 0xc0000000;
    phdr->p_paddr = 0x00200000;
    phdr->p_filesz = phdr->p_memsz = MEMSIZE;
    phdr->p_align = 0;

    return offset;
}

The result is a cdump.elf which contains the generated ELF header,
followed by ebi.bin. I then use the crash utility to load this cdump.elf
together with the vmlinux. It fails with the errors below:

WARNING: could not find MAGIC_START!
WARNING: cpu_present_mask indicates more than 4 (NR_CPUS) cpus
crash: cannot determine base kernel version
crash: vmlinux and cdump.elf do not match!

Our platform sets CONFIG_PHYS_OFFSET=0x00200000 in the kernel .config
file, which means that the virtual address 0xc0000000 maps to physical
address 0x00200000. For this reason, I set phdr->p_paddr = 0x00200000
when generating the ELF header.

Please help me find out what is wrong. Thanks very much.

Best Regards
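
The address arithmetic implied by the description above can be written out as a small sketch. It is an illustration only: the helper names are hypothetical, the mapping is assumed to be the plain linear one (virtual 0xc0000000 at physical 0x00200000), and ebi.bin is assumed to start at physical address 0x0 as stated in the post.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u /* kernel virtual base (p_vaddr in the post) */
#define PHYS_OFFSET 0x00200000u /* CONFIG_PHYS_OFFSET from the post */
#define DUMP_BASE   0x00000000u /* first physical address held in ebi.bin */

/* Linear kernel mapping: virtual address to physical address. */
static uint32_t virt_to_phys_linear(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET + PHYS_OFFSET;
}

/* File offset in cdump.elf of a physical address, given the header size. */
static uint32_t file_offset_of_phys(uint32_t paddr, uint32_t header_size)
{
    return header_size + (paddr - DUMP_BASE);
}
```

With a 52-byte Elf32_Ehdr and two 32-byte Elf32_Phdr entries the header is 116 bytes, so under these assumptions the byte for physical 0x00200000 sits at file offset 116 + 0x200000, while the PT_LOAD segment above declares p_offset = 116 for p_paddr = 0x00200000. Whether that difference is related to the reported errors is not established here, but it may be worth checking.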
11 years, 11 months
dmesg content from crash without debug symbols
by Bouchard Louis
Hello,
Is it possible to extract the content of the kernel buffer (result of
the 'log' command) from a kernel dump without access to the kernel debug
symbols ?
The intent is to be able to retrieve a minimal set of information from
the dump on a system where it is not possible to install the namelist
with debug symbols.
TIA,
Kind regards,
...Louis
--
Louis Bouchard
Backline Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com
11 years, 11 months
[ANNOUNCE] crash version 6.1.2 is available
by Dave Anderson
Download from: http://people.redhat.com/anderson
Changelog:
- Enhancement of the "task" command to display both the task_struct
and the thread_info structures of a task. The -R option accepts
members of either/both structure types.
(anderson(a)redhat.com)
- Fix for the X86_64 "search" and "rd" commands due to this commit:
http://git.kernel.org/linus/027ef6c87853b0a9df53175063028edb4950d476
Upon any attempt to read a page within the RAM region reserved for
AMD GART on a live system, the Linux 3.7rc1 commit above causes
/dev/mem, /proc/kcore and the /dev/crash drivers to spin
forever, leading to a kernel soft lockup. The RAM pages reserved for
GART consist of 2MB large pages whose _PAGE_PRESENT bits are turned
off. Prior to the above commit, a read() attempt on GART RAM would
cause an unresolvable page fault, and would harmlessly return an
EFAULT. The commit above has changed the pmd_large() function such that
it now returns TRUE if only the _PAGE_PSE bit is set in the PTE, whereas
before it required both _PAGE_PSE and _PAGE_PRESENT. So instead of
just failing the read() system call with an EFAULT, the page fault
handling code now considers it a spurious TLB fault, and the
instruction is retried indefinitely. The crash utility patch stores
the GART physical memory range, and disallows any attempts to read
from it.
(anderson(a)redhat.com)
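
The change in the pmd_large() test described in the entry above can be illustrated with a small sketch. This is neither the kernel source nor the crash patch: the function names are hypothetical, the bit values are the standard x86 ones (_PAGE_PRESENT is bit 0, _PAGE_PSE is bit 7), and the range check only stands in for "store the GART range and refuse reads inside it".

```c
#include <stdint.h>

#define PAGE_PRESENT 0x001ULL /* x86 _PAGE_PRESENT, bit 0 */
#define PAGE_PSE     0x080ULL /* x86 _PAGE_PSE, bit 7 */

/* Before the commit: both PSE and PRESENT had to be set. */
static int pmd_large_old(uint64_t pmd)
{
    return (pmd & (PAGE_PSE | PAGE_PRESENT)) == (PAGE_PSE | PAGE_PRESENT);
}

/* After the commit: PSE alone is enough. */
static int pmd_large_new(uint64_t pmd)
{
    return (pmd & PAGE_PSE) != 0;
}

/* Hypothetical guard: is paddr inside a stored [start, end) range? */
static int in_reserved_range(uint64_t paddr, uint64_t start, uint64_t end)
{
    return paddr >= start && paddr < end;
}
```

A GART entry with PSE set and PRESENT clear (0x080) is not large under the old test but is large under the new one, which is the case the entry describes.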
- If an EPPIC_GIT_URL environment variable is defined, then the URL
that it points to is used as an alternative to the code.google.com
git source repository for the eppic.so extension module. However,
the alternative site is only accessed if code.google.com can first
be pinged; this patch removes that restriction.
(per.fransson.ml(a)gmail.com)
- Fix for the "files" command PATH display on kernels configured with
CONFIG_DEVTMPFS, when the vfsmount pointer in a file structure's
"f_path" member does not point to the root vfsmount required for
reconstructing the full file pathname. Without the patch, the pathnames
of open files in the /dev directory may be truncated and not show the
"/dev" filename component.
(anderson(a)redhat.com)
- Enhancement to the "kmem -v" option on 2.6.28 and later kernels that
utilize the "vmap_area_list" list of mapped kernel virtual memory
regions, replacing the usage of the to-be-obsoleted "vmlist" list.
In those kernels, the output of the command will also show each
vmap_area structure address, in addition to its vm_struct address,
memory range, and size.
(anderson(a)redhat.com)
- Update to the exported do_rbtree() and do_rdtree() functions such
that they will return the number of items found in the targeted tree,
similar in nature to the do_list() function. The two functions have
also been fixed such that the VERBOSE flag is actually recognized,
so that external callers are able to gather the entries in a tree
without having them displayed. The calls to either function may be
enclosed with hq_open() and hq_close() so that the tree entries may
be subsequently gathered by retrieve_list() into a supplied buffer,
as well as to recognize a corrupted list with duplicate entries.
(anderson(a)redhat.com)
- Fix for the "extend -u" option to prevent the usage of a member of
a free()'d extension_table structure. No command failure occurs;
the patch simply corrects an inadvertent coding error.
(Jan.Karlsson(a)sonymobile.com)
- Fix to allow error() to be called during an open_tmpfile() sequence
prior to close_tmpfile() being called. There are no crash functions
that call error() during an open_tmpfile() sequence, but there's no
reason why it cannot be done. Without the patch, the error message
gets displayed on stdout (as expected), but the error message will
also overwrite/corrupt the tmpfile() data while it is being parsed.
(anderson(a)redhat.com)
- Fix to properly determine whether X86_64 kernels were configured
with CONFIG_FRAME_POINTER, due to this ftrace-related commit:
http://git.kernel.org/linus/d57c5d51a30152f3175d2344cb6395f08bf8ee0c
Without the patch, the crash utility fails to determine whether the
kernel was built with CONFIG_FRAME_POINTER, and therefore the "bt"
command cannot take advantage of it for more reliable backtraces.
(anderson(a)redhat.com)
- Fix to properly determine whether 2.6.31 and earlier X86_64 kernels
were configured with CONFIG_FRAME_POINTER. Without the patch, the
crash utility may fail to determine whether the kernel was built with
CONFIG_FRAME_POINTER. In those kernel versions -- which may be
dependent upon the compiler version used -- one of the sample
functions tested may have its "push %rbp, mov %rsp,%rbp" function
preamble separated by other instruction(s), resulting in a false
negative that precludes the "bt" command from taking advantage of
framepointers.
(anderson(a)redhat.com)
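
As an illustration of the false negative described in the entry above, here is a sketch of a preamble check that tolerates intervening instructions. It is not crash's implementation: the function name, the string-based matching, and the "window" parameter are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: return 1 if a "push %rbp" is followed by a
 * "mov %rsp,%rbp" within the next `window` disassembled instructions.
 * A window of 1 accepts only a strictly adjacent pair.
 */
static int has_frame_pointer_preamble(const char *const insns[], size_t n,
                                      size_t window)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (!strstr(insns[i], "push") || !strstr(insns[i], "%rbp"))
            continue;
        for (j = i + 1; j < n && j <= i + window; j++) {
            if (strstr(insns[j], "mov") && strstr(insns[j], "%rsp,%rbp"))
                return 1;
        }
    }
    return 0;
}
```

With a window of 1, a function whose two preamble instructions are separated by another instruction is reported as having no frame pointer; a larger window accepts it.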
- Fix for the file and line-number string that is displayed by the
"sym <kernel-text>" option. Without the patch, the "/usr/src/"
part of the string is stripped, and the filename string itself
could have two corrupted characters in the pathname, for example,
showing "k3.nel-3.6.fc17" instead of "kernel-3.6.fc17". This is
dependent upon the compiler version, or perhaps the string library
that is linked into the crash binary, because it only has been seen
on crash binaries built with gcc-4.7. The fix now displays the full
pathname, no longer dropping the "/usr/src" from the beginning.
(anderson(a)redhat.com)
- Restricted the X86_64 "line_number_hook" to kernels earlier than
2.6.24, i.e., kernels prior to the x86/x86_64 merge. Without the
patch, the manufactured filename information for assembly-language
files was incorrect for 2.6.24 and later kernels. Also, the kernel
debuginfo data now has file/line-number data for assembly-language
files as well, obviating the need for the hook.
(anderson(a)redhat.com)
- Fix for the extensions/trace.c extension module to prevent a double
free exception that would occur if a calloc() call fails during
module initialization.
(per.fransson.ml(a)gmail.com)
- Fix for the "p -u" option if a 32-bit kernel symbol is incorrectly
passed as an argument. Without the patch, the command fails, but
the next command requiring the services of the embedded gdb module
will generate an error message of the sort "*** glibc detected ***
crash: free(): invalid pointer: <address> ***", or "*** glibc
detected *** crash: munmap_chunk(): invalid pointer: <address> ***",
followed by a backtrace, and an abort of the crash session.
(anderson(a)redhat.com)
- Fix for the embedded gdb module to correctly handle kernel modules
whose ELF header contains "__ksymtab" and "__ksymtab_gpl" sections
with non-zero (nonsensical) "Address" values, such as those shown
in this example snippet:
$ readelf -a edac_core.so
...
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[ 8] __ksymtab PROGBITS 0000000000000060 0000ad90
0000000000000010 0000000000000000 A 0 0 16
...
[10] __ksymtab_gpl PROGBITS 0000000000000070 0000add0
00000000000001a0 0000000000000000 A 0 0 16
...
Without the patch, if one of the odd sections above is encountered,
the "Offset" values of the remaining sections are not processed; and
if the module's .data section is ignored, gdb incorrectly calculates
the address of all symbols in the module's .data section, leading to
incorrect output if, for example, data is printed with the gdb "p"
command. This invalid ELF section format was introduced in Linux 3.0
by the kernel's "scripts/module-common.lds" file.
(jan.kratochvil(a)redhat.com)
- Fix for the "runq -g" option if the kernel contains more than 200
task groups. Without the patch, the command generates a segmentation
violation.
(anderson(a)redhat.com)
11 years, 11 months
[PATCH] Add a new option -r to cgget
by Zhang Xiaohe
Hello Dave,
This patch allows cgget to display one or more parameters of a
cgroup controller, and fixes some bugs.
It is more efficient when one is only interested in some
parameters instead of the whole controller.
Here are examples:
1.
crash> cgget -r cpuset.mems /
/:
cpuset.mems: 0
2.
crash> cgget -r memory.usage /
/:
memory.memsw.max_usage_in_bytes: 0
memory.memsw.usage_in_bytes: 1368694784
memory.max_usage_in_bytes: 0
memory.usage_in_bytes: 1368694784
To apply this patch, change to the crash-<version> directory and run
the following commands:
$ cp cgget.patch ./
$ patch -p0 -i cgget.patch
For more information, please refer to the attachment.
Thanks.
--
Zhang Xiaohe
Regards
--------------------------------------------------
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
TEL: +86+25-86630566-8552
FAX: +86+25-83317685
MAIL: zhangxh(a)cn.fujitsu.com
--------------------------------------------------
11 years, 11 months