February 2008 - Crash-utility - Crash Utility List Archives

Anyone seen this problem or what did i miss?

by Jay Lan

I tested in a rhel5.1 root with:

  2.6.24 kernel
  kexec-tools-testing-20080227
  crash-4.0-5.1

Crash failed to initialize:

  crash: read error: kernel virtual address: a0000001007f0868  type: "kernel_config_data"
  WARNING: cannot read kernel_config_data
  crash: read error: kernel virtual address: a000000100f370b0  type: "xtime"

Has anyone else seen this problem?

Thanks,
- jay


crash version 4.0-6.1 is available

by Dave Anderson

- Support for 2.6.25 x86_64 kernels with the x86/x86_64 merger patch.
  Without the patch, attempting a crash session would fail during
  initialization with the error message: "crash: invalid structure
  member offset: tss_struct_ist".
  (anderson(a)redhat.com)

- Support for 2.6.25 x86 kernels with the x86/x86_64 merger patch.
  Without the patch, attempting a crash session on a dumpfile would
  fail during initialization with the error message: "crash: invalid
  structure size: user_regs_struct".
  (anderson(a)redhat.com)

- Fix for "bt" command when running on a live 2.6.25 x86 kernel with
  the x86/x86_64 merger patch. Without the patch, "bt" would fail with
  the error message: "bt: invalid structure member offset:
  task_struct_thread_eip".
  (anderson(a)redhat.com)

- Fix for the "timer" command in 2.6.25 kernels. Without the patch the
  command would fail with the error message: "timer: zero-size memory
  allocation! (called from <user address>)".
  (anderson(a)redhat.com)

- Cosmetic change to the x86 "bt" command to recognize the entry point
  name change from "sysenter_entry" to "ia32_sysenter_target". Without
  the patch, the entry point would indicate the "sysenter_past_esp"
  assembly code label.
  (anderson(a)redhat.com)

Download from: http://people.redhat.com/anderson


Heads up re: the 2.6.25 x86/x86_64 merge

by Dave Anderson

This is likely going to be a painful process getting crash to work right with these kernels again...

The tss_struct hack I suggested to Solofo earlier will actually get an x86_64 session started:

--- crash-4.0-4.13/x86_64.c.orig
+++ crash-4.0-4.13/x86_64.c
@@ -255,7 +255,7 @@ x86_64_init(int when)
         MEMBER_OFFSET_INIT(thread_struct_rsp, "thread_struct", "rsp");
         MEMBER_OFFSET_INIT(thread_struct_rsp0, "thread_struct", "rsp0");
         STRUCT_SIZE_INIT(tss_struct, "tss_struct");
-        MEMBER_OFFSET_INIT(tss_struct_ist, "tss_struct", "ist");
+        MEMBER_OFFSET_INIT(tss_struct_ist, "x86_hw_tss", "ist");
         MEMBER_OFFSET_INIT(user_regs_struct_rip, "user_regs_struct", "rip");
         MEMBER_OFFSET_INIT(user_regs_struct_rsp,

but there will be numerous obstacles to overcome due to other structure related changes. Fundamental data structures that are crucial to crash have changed. Many are just name changes, but will cause "invalid structure member offset" failures like so:

  crash> bt 1
  bt: invalid structure member offset: thread_struct_rsp
      FILE: x86_64.c  LINE: 3645  FUNCTION: x86_64_get_sp()

  [./crash] error trace: 514b52 => 4d2ce8 => 4d3c3e => 4fff86

  PID: 1  TASK: ffff81012fa9e7e0  CPU: 1  COMMAND: "init"
    4fff86: OFFSET_verify+159
    4d3c3e: x86_64_get_sp+110
    4d2ce8: x86_64_get_stack_frame+107
    514b52: get_netdump_regs_x86_64+798

  WARNING: Because this kernel was compiled with gcc version 4.2.3, certain
           commands or command options may fail unless crash is invoked with
           the "--readnow" command line option.

  bt: invalid structure member offset: thread_struct_rsp
      FILE: x86_64.c  LINE: 3645  FUNCTION: x86_64_get_sp()
  crash>

which happens because the x86_64 thread_struct.rsp member has been renamed thread_struct.sp.

Another example is the venerable pt_regs structure, which has changed many of its member names, from:

  struct pt_regs {
      long unsigned int r15;
      long unsigned int r14;
      long unsigned int r13;
      long unsigned int r12;
      long unsigned int rbp;
      long unsigned int rbx;
      long unsigned int r11;
      long unsigned int r10;
      long unsigned int r9;
      long unsigned int r8;
      long unsigned int rax;
      long unsigned int rcx;
      long unsigned int rdx;
      long unsigned int rsi;
      long unsigned int rdi;
      long unsigned int orig_rax;
      long unsigned int rip;
      long unsigned int cs;
      long unsigned int eflags;
      long unsigned int rsp;
      long unsigned int ss;
  }

to:

  struct pt_regs {
      long unsigned int r15;
      long unsigned int r14;
      long unsigned int r13;
      long unsigned int r12;
      long unsigned int bp;
      long unsigned int bx;
      long unsigned int r11;
      long unsigned int r10;
      long unsigned int r9;
      long unsigned int r8;
      long unsigned int ax;
      long unsigned int cx;
      long unsigned int dx;
      long unsigned int si;
      long unsigned int di;
      long unsigned int orig_ax;
      long unsigned int ip;
      long unsigned int cs;
      long unsigned int flags;
      long unsigned int sp;
      long unsigned int ss;
  }

Anyway, the same types of issues will plague x86 as well. What a pain in the ass...

BTW, if anybody happens to have an x86 vmlinux/vmcore pair from a 2.6.25 kernel that they could make available to me, I'd appreciate it. Many thanks to Solofo for making the x86_64 dumpfile available to work with.

Back to the drawing board...

Dave
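One common way to cope with renames like rsp -> sp is to try the old member name first and fall back to the new one. The following is only a minimal sketch of that idea, not crash's actual code: the table toy_debuginfo and the helpers lookup_offset() and lookup_offset_any() are hypothetical stand-ins for a real debuginfo query. The thread_struct offsets are made up; the pt_regs.ip offset of 128 just follows from its position in the listing above with 8-byte longs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for debuginfo: one row per structure member. */
struct member_info {
    const char *struct_name;
    const char *member_name;
    long offset;
};

/*
 * Toy table describing a post-merge kernel, where thread_struct.rsp
 * has become thread_struct.sp.  The thread_struct offsets are invented
 * for illustration only.
 */
const struct member_info toy_debuginfo[] = {
    { "thread_struct", "sp0", 0   },
    { "thread_struct", "sp",  8   },
    { "pt_regs",       "ip",  128 },
};

/* Return the offset of struct_name.member_name, or -1 if it is absent. */
long lookup_offset(const char *struct_name, const char *member_name)
{
    size_t n = sizeof(toy_debuginfo) / sizeof(toy_debuginfo[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_debuginfo[i].struct_name, struct_name) == 0 &&
            strcmp(toy_debuginfo[i].member_name, member_name) == 0)
            return toy_debuginfo[i].offset;
    }
    return -1;
}

/*
 * Try each name in the NULL-terminated candidates list in order and
 * return the first offset found, or -1 if none of the names exist.
 */
long lookup_offset_any(const char *struct_name, const char *const *candidates)
{
    assert(candidates != NULL);

    for (; *candidates != NULL; candidates++) {
        long off = lookup_offset(struct_name, *candidates);
        if (off >= 0)
            return off;
    }
    return -1;
}
```

With a candidate list of { "rsp", "sp", NULL }, the lookup against this toy table misses on "rsp" and succeeds on "sp", so the same caller would work on both the old and the new layout.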


2.6.25-rc2-git1 test

by Solofo Ramangalahy

Hello,

Testing crash with 2.6.25-rc2-git1 leads to:

..............................................................................
crash 4.0-4.13
[ ... snipped ... ]
please wait... (patching 33611 gdb minimal_symbol values)

crash: invalid structure member offset: tss_struct_ist
       FILE: x86_64.c  LINE: 682  FUNCTION: x86_64_ist_init()

[.../crash-4.0.4-13/bin/crash] error trace: 4522bb => 4cc1cb => 4ce2fb => 5029bd

  5029bd: OFFSET_verify+159
  4ce2fb: x86_64_ist_init+501
  4cc1cb: x86_64_init+2903
  4522bb: main_loop+115

WARNING: Because this kernel was compiled with gcc version 4.2.3, certain
         commands or command options may fail unless crash is invoked with
         the "--readnow" command line option.
..............................................................................

. This is probably also the case with less recent kernel versions
  (cannot confirm as of now).
. The vmcore was the one produced by "echo c > /proc/sysrq-trigger".
. Following the advice to use --readnow does not suppress the error.
. Using the --no_data_debug option allows crash to get further.

Regards,
--
solofo


crash version 4.0-5.1 is available

by Dave Anderson

Note that this is essentially two releases; the 4.0-5.0 release was an interim snapshot used for Red Hat errata releases. It should also be noted that another release should be forthcoming to address the data structure changes introduced by the x86/x86_64 merger work done in the 2.6.25 kernel.

4.0-5.1:

- Update "ps -l" to use task_struct.sched_info.last_arrival value on
  2.6.23 and later kernels that don't have a task_struct.last_ran
  member. Without the patch, the option would fail with the error
  message: "ps: neither task_struct.last_run nor task_struct.timestamp
  exist in this kernel".
  (anderson(a)redhat.com)

- Fix for potential initialization-time failure when running against
  2.4-era x86 netdump dumpfiles if the ebp and esp contents in the ELF
  header's NT_PRSTATUS register dump do not contain a vestige of the
  panic task's kernel stack address. Without the patch, there may be
  one or more warning messages complaining about tasks not being in
  the PID hash, followed by a fatal error message: "crash: invalid
  kernel virtual address: <bad-address> type: 32-bit KVADDR", where
  the <bad-address> can be any bogus kernel virtual address.
  (anderson(a)redhat.com)

- Fix to make the unused do_radix_tree() function work as advertised.
  (atyson(a)hp.com)

- Added zlib-devel to the crash-devel package-dependency Requires line
  in the crash.spec file.
  (anderson(a)redhat.com)

4.0-5.0:

- Tentatively scheduled as the baseline version for RHEL4.7 and
  RHEL5.2 crash utility errata releases; also built in Fedora Rawhide:

    4.0-5.0.0 - RHEL4.7 errata version
    4.0-5.0.2 - RHEL5.2 errata version
    4.0-5.0.3 - Fedora Rawhide (devel branch)

- Fix for a potential segmentation violation during crash session
  initialization if a task's kernel stack has been completely overrun,
  corrupting its thread_info structure at the bottom of the stack.
  This could occur running against kernels from 2.6.8 through 2.6.18.
  With the patch, the suspect task will be reported during the task
  initialization sequence.
  (anderson(a)redhat.com)

- Fix for the "bt" command when run on xen x86 dom0 dumpfiles, which
  may potentially show empty backtraces for one or more active tasks.
  (oomichi(a)mxs.nes.nec.co.jp)

- Initial support for OpenVZ kernels.
  (kshileev(a)sw.ru)

Download from: http://people.redhat.com/anderson


determining a "valid" vmcore

by Andrew Hecox

Hello,

I'm looking at a customer issue where diskdumpmsg is unable to read a vmcore file. It is not clear whether this is a problem with the vmcore file or with diskdumpmsg. I can load the vmcore with crash and, in my naive usage of it, can see no problems. However, I'm new to the tool, so that doesn't give me a lot of confidence.

Does anyone have any suggestions on how, or whether, I can use crash to help determine if there's corruption in the vmcore file? Or any other way of approaching the problem?

Thanks much,

Andrew


[PATCH] SIAL ps.c: Fix wrong access to .counter on non-SMP kernels

by Bernhard Walle

This patch fixes the following SIAL error when loading the ps.c SIAL sample script on a non-SMP system (kernel):

  File /usr/share/sial/crash/ps.c, line 130, Error: Expression \
      for member 'counter' is not a struct/union

The underlying problem is that mm_counter_t is defined as 'unsigned long' on systems which have NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS (in practice, for distribution kernels, those are only UMP kernels; I don't know how to test for that condition in SIAL).

Signed-off-by: Bernhard Walle <bwalle(a)suse.de>

---
 ps.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/ps.c
+++ b/ps.c
@@ -127,7 +127,16 @@ int getasattr(task_t *t, int f)
                 return t->mm->rss*4;
             } else {
                 struct mm_struct *mm=t->mm?t->mm:t->active_mm;
-                return (mm->_file_rss.counter+mm->_anon_rss.counter)*4;
+
+                /*
+                 * on a SMP kernel (with a reasonable amount of NR_CPUS),
+                 * the _anon_rss and _file_rss is a atomic_t, on a UMP kernel
+                 * it's a normal integer
+                 */
+                if (exists("smp_num_cpus") || exists("__per_cpu_offset"))
+                    return (mm->_file_rss.counter+mm->_anon_rss.counter)*4;
+                else
+                    return (mm->_file_rss+mm->_anon_rss)*4;
             }
         case 2:
             return t->mm->total_vm*4;


[PATCH] SIAL {files, ps}.c: typedef task_t only for new kernels

by Bernhard Walle

The SIAL interpreter is confused by typedef'ing task_t in kernel versions that already have that typedef in the kernel. The typedef was removed with kernel 2.6.18, so adding the typedef only when LINUX_RELEASE is greater than 2.6.17 fixes the problem.

Signed-off-by: Bernhard Walle <bwalle(a)suse.de>

---
 files.c |    2 ++
 ps.c    |    3 +++
 2 files changed, 5 insertions(+)

--- a/files.c
+++ b/files.c
@@ -129,7 +129,9 @@ sfiles_help()
 " DENTRY INODE SUPERBLK TYPE PATH\n"+
 " f745fd60 f7284640 f73a3e00 REG /var/spool/lpd/lpd.lock\n";
 }
+#if LINUX_RELEASE > 0x020611
 typedef struct task_struct task_t;
+#endif
 
 void print_task_header(unsigned long tval, int newline)
 {
--- a/ps.c
+++ b/ps.c
@@ -62,7 +62,10 @@ main()
     return 1;
 }
 
+#if LINUX_RELEASE > 0x020611
 typedef struct task_struct task_t;
+#endif
+
 struct mm_struct *x;
 
 void walk_tasks(string callback)
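The constant 0x020611 in the patch reads as 2.6.17 if each of major, minor and patch level occupies one byte (0x02, 0x06, 0x11 = 17). The sketch below spells out that arithmetic in plain C. It is an illustration under that assumption only: how SIAL actually defines LINUX_RELEASE is not shown in the patch, and RELEASE_CODE(), decode_release() and needs_task_t_typedef() are hypothetical helper names.

```c
#include <assert.h>

/* Assumed encoding: one byte each for major, minor and patch level. */
#define RELEASE_CODE(major, minor, patch) \
    (((major) << 16) | ((minor) << 8) | (patch))

struct release {
    int major;
    int minor;
    int patch;
};

/* Split a packed release code back into its three components. */
struct release decode_release(unsigned int code)
{
    struct release r;

    assert((code >> 24) == 0);  /* only three bytes are meaningful */
    r.major = (int)((code >> 16) & 0xff);
    r.minor = (int)((code >> 8) & 0xff);
    r.patch = (int)(code & 0xff);
    return r;
}

/*
 * Mirror of the patch's preprocessor test: the typedef is only added
 * for kernels newer than 2.6.17, i.e. 2.6.18 and later, where the
 * kernel no longer provides task_t itself.
 */
int needs_task_t_typedef(unsigned int linux_release)
{
    return linux_release > 0x020611;
}
```

Under this encoding, RELEASE_CODE(2, 6, 17) equals 0x020611, so a 2.6.17 kernel keeps the kernel's own typedef while 2.6.18 gets the one from the script.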


enhance print command

by Ming Zhang

Hi All,

I wonder whether this is doable, or already there. Currently, if we use "print" to check a function pointer array, for example, it shows only the raw contents. Then, for each value, we have to run "sym <value>" to see which symbol it is. Is it possible for the "print" command to do this automatically?

Thanks.

--
Ming Zhang

@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
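To make the request concrete, the lookup being asked for amounts to mapping each address to the nearest preceding symbol plus an offset, in the same "name+offset" style the crash error traces use. The following is a rough sketch of that mapping only, not of how crash's "print" or "sym" commands are implemented; the symbol table toy_symbols, its addresses and names, and the helpers nearest_symbol() and format_symbolic() are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct symbol {
    unsigned long addr;
    const char *name;
};

/* Invented symbol table, sorted by ascending start address. */
const struct symbol toy_symbols[] = {
    { 0x1000UL, "toy_open"  },
    { 0x2000UL, "toy_read"  },
    { 0x3000UL, "toy_write" },
};

/* Return the last symbol starting at or below addr, or NULL if none. */
const struct symbol *nearest_symbol(unsigned long addr)
{
    const struct symbol *best = NULL;
    size_t n = sizeof(toy_symbols) / sizeof(toy_symbols[0]);

    for (size_t i = 0; i < n && toy_symbols[i].addr <= addr; i++)
        best = &toy_symbols[i];
    return best;
}

/*
 * Write "name", "name+offset" (decimal offset) or, when no symbol
 * covers addr, the bare hex address into buf.  Returns what
 * snprintf() returns: the length of the formatted text.
 */
int format_symbolic(unsigned long addr, char *buf, size_t len)
{
    const struct symbol *sym = nearest_symbol(addr);

    assert(buf != NULL && len > 0);
    if (sym == NULL)
        return snprintf(buf, len, "%lx", addr);
    if (sym->addr == addr)
        return snprintf(buf, len, "%s", sym->name);
    return snprintf(buf, len, "%s+%lu", sym->name, addr - sym->addr);
}
```

Applied to each element of a function pointer array, format_symbolic() would turn a raw value such as 0x2010 into toy_read+16 with this table, which is the per-value step currently done by hand with "sym".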

