Anyone seen this problem or what did i miss?
by Jay Lan
I tested in a rhel5.1 root with:
2.6.24 kernel
kexec-tools-testing-20080227
crash-4.0-5.1
Crash failed to initialize:
crash: read error: kernel virtual address: a0000001007f0868 type: "kernel_config_data"
WARNING: cannot read kernel_config_data
crash: read error: kernel virtual address: a000000100f370b0 type: "xtime"
Has anyone else seen this problem?
Thanks,
- jay
16 years, 9 months
crash version 4.0-6.1 is available
by Dave Anderson
- Support for 2.6.25 x86_64 kernels with the x86/x86_64 merger patch.
Without the patch, attempting a crash session would fail during
initialization with the error message: "crash: invalid structure
member offset: tss_struct_ist". (anderson(a)redhat.com)
- Support for 2.6.25 x86 kernels with the x86/x86_64 merger patch.
Without the patch, attempting a crash session on a dumpfile would
fail during initialization with the error message: "crash: invalid
structure size: user_regs_struct". (anderson(a)redhat.com)
- Fix for "bt" command when running on a live 2.6.25 x86 kernel with
the x86/x86_64 merger patch. Without the patch, "bt" would fail
with the error message: "bt: invalid structure member offset:
task_struct_thread_eip". (anderson(a)redhat.com)
- Fix for the "timer" command in 2.6.25 kernels. Without the patch
the command would fail with the error message: "timer: zero-size
memory allocation! (called from <user address>)".
(anderson(a)redhat.com)
- Cosmetic change to the x86 "bt" command to recognize the entry point
name change from "sysenter_entry" to "ia32_sysenter_target". Without
the patch, the entry point would indicate the "sysenter_past_esp"
assembly code label. (anderson(a)redhat.com)
Download from: http://people.redhat.com/anderson
16 years, 9 months
Heads up re: the 2.6.25 x86/x86_64 merge
by Dave Anderson
This is likely going to be a painful process getting crash to
work right with these kernels again...
The tss_struct hack I suggested to Solofo earlier will actually get
an x86_64 session started:
--- crash-4.0-4.13/x86_64.c.orig
+++ crash-4.0-4.13/x86_64.c
@@ -255,7 +255,7 @@ x86_64_init(int when)
         MEMBER_OFFSET_INIT(thread_struct_rsp, "thread_struct", "rsp");
         MEMBER_OFFSET_INIT(thread_struct_rsp0, "thread_struct", "rsp0");
         STRUCT_SIZE_INIT(tss_struct, "tss_struct");
-        MEMBER_OFFSET_INIT(tss_struct_ist, "tss_struct", "ist");
+        MEMBER_OFFSET_INIT(tss_struct_ist, "x86_hw_tss", "ist");
         MEMBER_OFFSET_INIT(user_regs_struct_rip,
                 "user_regs_struct", "rip");
         MEMBER_OFFSET_INIT(user_regs_struct_rsp,
but there will be numerous obstacles to overcome due to other
structure-related changes. Fundamental data structures that are crucial
to crash have changed. Many are just name changes, but they will cause
"invalid structure member offset" failures like so:
crash> bt 1
bt: invalid structure member offset: thread_struct_rsp
FILE: x86_64.c LINE: 3645 FUNCTION: x86_64_get_sp()
[./crash] error trace: 514b52 => 4d2ce8 => 4d3c3e => 4fff86
PID: 1 TASK: ffff81012fa9e7e0 CPU: 1 COMMAND: "init"
4fff86: OFFSET_verify+159
4d3c3e: x86_64_get_sp+110
4d2ce8: x86_64_get_stack_frame+107
514b52: get_netdump_regs_x86_64+798
WARNING: Because this kernel was compiled with gcc version 4.2.3, certain
commands or command options may fail unless crash is invoked with
the "--readnow" command line option.
bt: invalid structure member offset: thread_struct_rsp
FILE: x86_64.c LINE: 3645 FUNCTION: x86_64_get_sp()
crash>
which happens because the x86_64 thread_struct.rsp member has been
renamed thread_struct.sp. Another example is the venerable pt_regs
structure, which has changed many of its member names, from:
struct pt_regs {
    long unsigned int r15;
    long unsigned int r14;
    long unsigned int r13;
    long unsigned int r12;
    long unsigned int rbp;
    long unsigned int rbx;
    long unsigned int r11;
    long unsigned int r10;
    long unsigned int r9;
    long unsigned int r8;
    long unsigned int rax;
    long unsigned int rcx;
    long unsigned int rdx;
    long unsigned int rsi;
    long unsigned int rdi;
    long unsigned int orig_rax;
    long unsigned int rip;
    long unsigned int cs;
    long unsigned int eflags;
    long unsigned int rsp;
    long unsigned int ss;
}
to:
struct pt_regs {
    long unsigned int r15;
    long unsigned int r14;
    long unsigned int r13;
    long unsigned int r12;
    long unsigned int bp;
    long unsigned int bx;
    long unsigned int r11;
    long unsigned int r10;
    long unsigned int r9;
    long unsigned int r8;
    long unsigned int ax;
    long unsigned int cx;
    long unsigned int dx;
    long unsigned int si;
    long unsigned int di;
    long unsigned int orig_ax;
    long unsigned int ip;
    long unsigned int cs;
    long unsigned int flags;
    long unsigned int sp;
    long unsigned int ss;
}
Anyway, the same types of issues will plague x86 as well.
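One way a tool could cope with renames like rsp -> sp or rip -> ip is to
look a member up under its old name first and fall back to the new name
if that fails. The following is only a standalone sketch of that idea,
not code from crash: the member_info tables, member_offset() and
member_offset_either() are hypothetical names invented for illustration,
the thread_struct offset is made up, and the pt_regs offset assumes
8-byte members laid out as in the listings above.

```c
#include <stddef.h>
#include <string.h>

#define INVALID_OFFSET (-1L)

/* Hypothetical stand-in for one kernel's debuginfo: struct, member, offset. */
struct member_info {
    const char *struct_name;
    const char *member_name;
    long offset;
};

/* Illustrative pre-2.6.25 names (thread_struct offset is made up). */
const struct member_info old_kernel[] = {
    { "thread_struct", "rsp", 16 },
    { "pt_regs", "rip", 128 },      /* 17th 8-byte member: 16 * 8 */
};
#define OLD_KERNEL_COUNT (sizeof(old_kernel) / sizeof(old_kernel[0]))

/* Illustrative 2.6.25 names after the x86/x86_64 merge. */
const struct member_info new_kernel[] = {
    { "thread_struct", "sp", 16 },
    { "pt_regs", "ip", 128 },
};
#define NEW_KERNEL_COUNT (sizeof(new_kernel) / sizeof(new_kernel[0]))

/* Return the offset of struct_name.member_name, or INVALID_OFFSET. */
long member_offset(const struct member_info *tbl, size_t n,
                   const char *struct_name, const char *member_name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].struct_name, struct_name) == 0 &&
            strcmp(tbl[i].member_name, member_name) == 0)
            return tbl[i].offset;
    }
    return INVALID_OFFSET;
}

/* Try the old member name first, then fall back to the renamed one. */
long member_offset_either(const struct member_info *tbl, size_t n,
                          const char *struct_name,
                          const char *old_name, const char *new_name)
{
    long off = member_offset(tbl, n, struct_name, old_name);

    if (off == INVALID_OFFSET)
        off = member_offset(tbl, n, struct_name, new_name);
    return off;
}
```

With these tables, looking up thread_struct "rsp"/"sp" resolves to the
same offset against either kernel, and a member that exists under
neither name still comes back as INVALID_OFFSET so the caller can report
it.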
What a pain in the ass...
BTW, if anybody happens to have an x86 vmlinux/vmcore pair
from a 2.6.25 kernel that they could make available to me,
I'd appreciate it.
Many thanks to Solofo for making the x86_64 dumpfile available
to work with.
Back to the drawing board...
Dave
16 years, 9 months
2.6.25-rc2-git1 test
by Solofo Ramangalahy
Hello,
Testing crash with 2.6.25-rc2-git1 leads to:
..............................................................................
crash 4.0-4.13
[ ... snipped ... ]
please wait... (patching 33611 gdb minimal_symbol values)
crash: invalid structure member offset: tss_struct_ist
FILE: x86_64.c LINE: 682 FUNCTION: x86_64_ist_init()
[.../crash-4.0.4-13/bin/crash] error trace: 4522bb => 4cc1cb => 4ce2fb => 5029bd
5029bd: OFFSET_verify+159
4ce2fb: x86_64_ist_init+501
4cc1cb: x86_64_init+2903
4522bb: main_loop+115
WARNING: Because this kernel was compiled with gcc version 4.2.3, certain
commands or command options may fail unless crash is invoked with
the "--readnow" command line option.
..............................................................................
. This is probably also the case with less recent kernel versions (cannot
confirm as of now).
. The vmcore was the one produced by "echo c > /proc/sysrq-trigger"
. Following the advice to use --readnow does not suppress the error.
. Using the --no_data_debug option allows crash to get further.
Regards,
--
solofo
16 years, 9 months
crash version 4.0-5.1 is available
by Dave Anderson
Note that this is essentially two releases; the 4.0-5.0 release
was an interim snapshot used for Red Hat errata releases.
It should also be noted that another release should be forthcoming to
address the data structure changes introduced by the x86/x86_64
merger work done in the 2.6.25 kernel.
4.0-5.1:
- Update "ps -l" to use task_struct.sched_info.last_arrival value
on 2.6.23 and later kernels that don't have a task_struct.last_ran
member. Without the patch, the option would fail with the error
message: "ps: neither task_struct.last_run nor task_struct.timestamp
exist in this kernel". (anderson(a)redhat.com)
- Fix for potential initialization-time failure when running against
2.4-era x86 netdump dumpfiles if the ebp and esp contents in the
ELF header's NT_PRSTATUS register dump do not contain a vestige of
the panic task's kernel stack address. Without the patch, there may
be one or more warning messages complaining about tasks not being in
the PID hash, followed by a fatal error message: "crash: invalid
kernel virtual address: <bad-address> type: 32-bit KVADDR", where
the <bad-address> can be any bogus kernel virtual address.
(anderson(a)redhat.com)
- Fix to make the unused do_radix_tree() function work as advertised.
(atyson(a)hp.com)
- Added zlib-devel to the crash-devel package-dependency Requires line
in the crash.spec file. (anderson(a)redhat.com)
4.0-5.0:
- Tentatively scheduled as the baseline version for RHEL4.7 and RHEL5.2
crash utility errata releases; also built in Fedora Rawhide:
4.0-5.0.0 - RHEL4.7 errata version
4.0-5.0.2 - RHEL5.2 errata version
4.0-5.0.3 - Fedora Rawhide (devel branch)
- Fix for a potential segmentation violation during crash session
initialization if a task's kernel stack has been completely overrun,
corrupting its thread_info structure at the bottom of the stack.
This could occur running against kernels from 2.6.8 through 2.6.18.
With the patch, the suspect task will be reported during the task
initialization sequence. (anderson(a)redhat.com)
- Fix for the "bt" command when run on xen x86 dom0 dumpfiles, which
may potentially show empty backtraces for one or more active tasks.
(oomichi(a)mxs.nes.nec.co.jp)
- Initial support for OpenVZ kernels. (kshileev(a)sw.ru)
Download from: http://people.redhat.com/anderson
16 years, 9 months
determining a "valid" vmcore
by Andrew Hecox
hello,
I'm looking at a customer issue where diskdumpmsg is unable to read a
vmcore file. It is not clear if this is a problem with the vmcore file or
diskdumpmsg. I can load the vmcore with crash and in my naive usage of
it, can see no problems. However, I'm new to the tool so that doesn't
give me a lot of confidence.
Does anyone have any suggestions on how or if I can use crash to help
determine if there's corruption in the vmcore file? Or any other way of
approaching the problem?
Thanks much,
Andrew
16 years, 9 months
[PATCH] SIAL ps.c: Fix wrong access to .counter on non-SMP kernels
by Bernhard Walle
This patch fixes the following SIAL error when loading the ps.c SIAL sample
script on a non-SMP system (kernel):
File /usr/share/sial/crash/ps.c, line 130, Error: Expression \
for member 'counter' is not a struct/union
The underlying problem is that mm_counter_t is defined as 'unsigned long' on
systems which have NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS (in practice, for
distribution kernels, these are only uniprocessor (UMP) kernels; I don't know
how to test for that condition in SIAL).
Signed-off-by: Bernhard Walle <bwalle(a)suse.de>
---
ps.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/ps.c
+++ b/ps.c
@@ -127,7 +127,16 @@ int getasattr(task_t *t, int f)
return t->mm->rss*4;
} else {
struct mm_struct *mm=t->mm?t->mm:t->active_mm;
- return (mm->_file_rss.counter+mm->_anon_rss.counter)*4;
+
+ /*
+ * on a SMP kernel (with a reasonable amount of NR_CPUS),
+ * the _anon_rss and _file_rss is a atomic_t, on a UMP kernel
+ * it's a normal integer
+ */
+ if (exists("smp_num_cpus") || exists("__per_cpu_offset"))
+ return (mm->_file_rss.counter+mm->_anon_rss.counter)*4;
+ else
+ return (mm->_file_rss+mm->_anon_rss)*4;
}
case 2:
return t->mm->total_vm*4;
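The type difference behind the patch above can be sketched in plain C.
This is only an illustration under assumptions, not kernel or SIAL code:
the NR_CPUS and CONFIG_SPLIT_PTLOCK_CPUS values are arbitrary defaults,
the struct wrapper merely mimics a type with a "counter" member, and
MM_COUNTER_READ, struct mm_counters and rss_kb() are hypothetical names.
The "* 4" mirrors the script and presumably converts 4 KB pages to KB.

```c
/* Assumed configuration values, for illustration only. */
#ifndef NR_CPUS
#define NR_CPUS 1
#endif
#ifndef CONFIG_SPLIT_PTLOCK_CPUS
#define CONFIG_SPLIT_PTLOCK_CPUS 4
#endif

#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
/* SMP-style: the value sits inside a struct, so ".counter" is valid. */
typedef struct { long counter; } mm_counter_t;
#define MM_COUNTER_READ(c) ((unsigned long)(c).counter)
#else
/* Uniprocessor-style: a plain integer; ".counter" would be an error. */
typedef unsigned long mm_counter_t;
#define MM_COUNTER_READ(c) (c)
#endif

/* Hypothetical cut-down stand-in for the two mm_struct fields used. */
struct mm_counters {
    mm_counter_t _file_rss;
    mm_counter_t _anon_rss;
};

/* Sum both counters and scale by 4, as the ps.c script does. */
unsigned long rss_kb(const struct mm_counters *mm)
{
    return (MM_COUNTER_READ(mm->_file_rss) +
            MM_COUNTER_READ(mm->_anon_rss)) * 4;
}
```

In C the choice is made at compile time by the preprocessor; the SIAL
script has no such information, which is why the patch probes for the
"smp_num_cpus" and "__per_cpu_offset" symbols at run time instead.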
16 years, 9 months
[PATCH] SIAL {files, ps}.c: typedef task_t only for new kernels
by Bernhard Walle
The SIAL interpreter is confused by typedef'ing task_t in kernel versions that
already have that typedef in the kernel. The typedef was removed with kernel
2.6.18, so adding the typedef only when LINUX_RELEASE is greater than 2.6.17
fixes the problem.
Signed-off-by: Bernhard Walle <bwalle(a)suse.de>
---
files.c | 2 ++
ps.c | 3 +++
2 files changed, 5 insertions(+)
--- a/files.c
+++ b/files.c
@@ -129,7 +129,9 @@ sfiles_help()
" DENTRY INODE SUPERBLK TYPE PATH\n"+
" f745fd60 f7284640 f73a3e00 REG /var/spool/lpd/lpd.lock\n";
}
+#if LINUX_RELEASE > 0x020611
typedef struct task_struct task_t;
+#endif
void print_task_header(unsigned long tval, int newline)
{
--- a/ps.c
+++ b/ps.c
@@ -62,7 +62,10 @@ main()
return 1;
}
+#if LINUX_RELEASE > 0x020611
typedef struct task_struct task_t;
+#endif
+
struct mm_struct *x;
void
walk_tasks(string callback)
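The constant 0x020611 in the guards above reads as 2.6.17 when each
version component occupies one byte (0x11 is 17 decimal). A minimal
sketch of that encoding follows, assuming LINUX_RELEASE is packed this
way, which is inferred only from the constant used in the patch;
RELEASE_CODE and needs_task_t_typedef() are hypothetical names.

```c
/* Hypothetical helper: pack a.b.c into one integer, one byte each. */
#define RELEASE_CODE(a, b, c) (((a) << 16) | ((b) << 8) | (c))

/*
 * Return 1 when the guard from the patch would enable the typedef,
 * i.e. for releases later than 2.6.17, and 0 otherwise.
 */
int needs_task_t_typedef(unsigned int linux_release)
{
    return linux_release > 0x020611;
}
```

Under this packing 2.6.17 itself does not enable the typedef, while
2.6.18 and later do, which matches the description that the kernel's own
typedef was removed in 2.6.18.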
16 years, 9 months