CA SEOS module causes heartburn for crash(1)
by Kurtis D. Rader
I was asked to look at an s390 SUSE dump which had the Computer Associates
SEOS product modules loaded (seos and ksymadd). They exported symbols
with addresses well below the address space for modules. For example,
"dynamic_Seos_syscall_num" with an address of 0x4f90dc0 which is well
below the base of the first module at 0x7880d000. This results in crash
trying to mmap an anonymous 1.9 GB region, which naturally fails on an
s390 system, where the address space is only 2 GiB in size.
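To make the arithmetic concrete, here is a minimal sketch of where the 1.9 GB figure comes from. It assumes the mapped region spans from the stray symbol up to the first module base; the post does not show how crash actually sizes the region, so treat that as an assumption. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Addresses quoted in the report. */
#define SEOS_SYMBOL_ADDR   UINT64_C(0x04f90dc0)  /* dynamic_Seos_syscall_num */
#define FIRST_MODULE_BASE  UINT64_C(0x7880d000)  /* base of the first module */
#define S390_ADDR_SPACE    (UINT64_C(1) << 31)   /* 2 GiB, as stated in the report */

/* Bytes between the stray symbol and the first module base
 * (assumed here to be the size of the region crash tries to map). */
static uint64_t gap_bytes(uint64_t low_symbol, uint64_t module_base)
{
    return module_base - low_symbol;
}

/* Whole-number percentage of the address space that the gap would occupy. */
static unsigned gap_percent(uint64_t gap, uint64_t address_space)
{
    return (unsigned)(gap * 100 / address_space);
}
```

With the quoted addresses the gap is 0x7387c240 bytes (about 1.94 x 10^9), roughly 90% of a 2 GiB address space, so a single anonymous mapping of that size has little chance of succeeding.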
Anyone else run across this? Should crash be able to deal with this or
should I simply tell the customer to stop using CA's SEOS product if they
want us to look at their crash dumps?
--
Kurtis D. Rader, Level 3 Linux Support
ABC Service Center, Linux Change Team
T/L 775-3714, DID +1 503-578-3714
19 years
crash version 4.0-2.13 is available
by Dave Anderson
- Adapted Takao Indoh of Fujitsu's patch for determining proper size
of the ia64_init_stack; fixes empty ia64 "bt -a" output for cpu 8 and
above for diskdumps generated via OS_INIT.
- Applied a patch from Vladimir Kuznetsov to address the "net -s" error
"net: invalid structure member offset: inet_opt_daddr", caused by
the inet_opt structure being dropped between 2.6.10 and 2.6.11.
- Made the initialization-time rule such that if "bt -O" is contained
in any or all of the 3 possible initialization-time input files
($HOME/.crashrc, ./.crashrc, or "-i inputfile" files), the setting
will remain idempotent. Fixed the redundant running of $HOME/.crashrc
and ./.crashrc files if they are the same file.
- Added a gdb work-around/hack for ia64 initialization-time warning
"WARNING: cannot determine unw.tables offset" on rebuilt RHEL3 ia64
kernels that would prevent "bt" from working.
- Backed out 4.0-2.11 x86_64 pseudo-backtrace patch to show in-kernel
exception frame RIP and RSP values as a unique frame following the
register dump; instead, the exception RIP address is translated
and displayed prior to the register dump.
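As an illustration of the .crashrc item above, one common way to tell whether two paths name the same file is to compare the device and inode numbers returned by stat(2). This is only a sketch of that general technique; the changelog does not say how crash performs the check, and same_file() is a made-up name.

```c
#include <sys/stat.h>

/* Returns 1 if both paths resolve to the same file (same device and
 * inode), 0 otherwise, including when either path cannot be stat'ed. */
static int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A caller could skip ./.crashrc whenever same_file() reports that it is the same file as $HOME/.crashrc, for example when crash is started from the home directory.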
19 years, 1 month
bt command does not show stack traces of some CPUs.
by Takao Indoh
Hi, I found a problem in crash-4.0-2.12.
Summary:
bt command does not show stack traces of some CPUs.
Condition:
This problem happens only on ia64 machines.
There are two conditions to reproduce this problem.
1) Diskdump is executed via OS_INIT.
2) The machine has more than 8 CPUs.
Details:
When I executed the bt command on a vmcore which was created
on a 32-CPU machine, bt didn't show stack traces for some CPUs.
Please see the attached file (bt_failed.txt). Stack traces from CPU0 to
CPU7 are shown normally, but stack traces from CPU8 to CPU31 are not.
(Please don't worry about a message "unwind: bsp (xxxxxxxxx) out of
range". This is a problem of our platform.)
Cause:
I found a bug in ia64.c.
ms->ia64_init_stack_size = get_array_length("ia64_init_stack",
        NULL, 0);
get_array_length() gets the length of the OS_INIT stack, and the
length is stored in ms->ia64_init_stack_size. However, the value
which get_array_length() returns is the number of array elements, which
differs from the actual stack length in bytes because "ia64_init_stack"
is declared like this:
u64 ia64_init_stack[NR_CPUS*KERNEL_STACK_SIZE/8];
Therefore, the correct length of the stack is:
get_array_length("ia64_init_stack", NULL, 0) * sizeof(u64)
I don't know the best way to fix this, but the attached patch
(ia64.c.patch) seems to correct the problem.
Another attached patch (test.patch) also seems to fix the problem,
but I don't know which is better.
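The element-count versus byte-count mix-up can be sketched in a few lines. The helper names below are hypothetical, and the example values (NR_CPUS = 64, KERNEL_STACK_SIZE = 32768) are assumptions chosen for illustration; neither value is given in the report.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Byte size of a u64 array, given its element count. */
static size_t init_stack_bytes(size_t element_count)
{
    return element_count * sizeof(u64);
}

/* Number of per-CPU stacks that fit in a region of the given byte size. */
static size_t cpus_covered(size_t region_bytes, size_t kernel_stack_size)
{
    return region_bytes / kernel_stack_size;
}
```

With the assumed values the array has 64*32768/8 = 262144 elements. Treating that count as a byte size covers only 262144/32768 = 8 stacks, while the corrected size of 262144*8 bytes covers all 64. In general the uncorrected value covers NR_CPUS/8 stacks, which would be consistent with only CPU0 to CPU7 being shown if NR_CPUS were 64.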
Regards,
Takao Indoh
19 years, 1 month
crash-4.0.2.12 PPC64 changes to make it understand 64k pagesize
by Badari Pulavarty
Hi Dave & Haren,
Here are the changes I made to "crash" so that it functions with the PPC64
64k pagesize. Instead of adding a whole set of indexes, shifts,
masks, macros and new vtop() routines, I generalized the 4-level
pagetable support to set and compute the indexes, shifts and masks
correctly for 4K and 64K.
Tested with 4K pagesize and 64K pagesize kernels on PPC64.
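The general idea of deriving shifts and masks from a table of per-level index widths, rather than hard-coding one set of macros per page size, might look like the sketch below. The struct and function names are hypothetical and nothing here is taken from the patch itself.

```c
#include <stdint.h>

/* Per-level index widths in bits, supplied by the caller. */
struct pt_layout {
    unsigned page_shift;  /* log2(page size) */
    unsigned bits[4];     /* [0] = lowest level (PTE) ... [3] = top level (PGD) */
};

/* Shift for a level: page_shift plus the widths of all lower levels. */
static unsigned level_shift(const struct pt_layout *l, int level)
{
    unsigned shift = l->page_shift;

    for (int i = 0; i < level; i++)
        shift += l->bits[i];
    return shift;
}

/* Index into the table at the given level for a virtual address.
 * A level with width 0 (a folded level) always yields index 0. */
static uint64_t level_index(const struct pt_layout *l, int level, uint64_t vaddr)
{
    uint64_t mask = (UINT64_C(1) << l->bits[level]) - 1;

    return (vaddr >> level_shift(l, level)) & mask;
}
```

Supporting another page size then means filling in a different struct pt_layout, for example { 12, { 9, 7, 7, 9 } } for 4K pages or { 16, { 12, 12, 0, 4 } } for 64K pages. Those widths are illustrative assumptions and should be checked against the kernel headers before being relied on.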
Please review.
Thanks,
Badari
19 years, 1 month
Re: [Crash-utility] Miscellaneous fixes/enhancements to crash 4.0-2.10 (fwd)
by Castor Fu
On Thu, 10 Nov 2005, Dave Anderson wrote:
>> I also did not feel comfortable accepting the extension-keyword
>> stuff. While I do recognize that it would be useful to be
>> able to dynamically determine what extension modules to load,
>> I don't feel the extend command should be encumbered with the
>> job, but rather such an implementation-specific chore should
>> be handled by an instance of extension library code.
>>
>> For that reason I exported both the load_extension() and
>> unload_extension() functions so that extension library code
>> could use them to in turn load other extension libraries.
>>
>> The call to load_extension() could be made from either the
>> _init() function or from an extension command. At those
>> points in time, the "first" extension library will have all
>> the information (kernel version, crash version, dumpfile,
>> etc.) at its disposal, and then can make the decision as to
>> what additional libraries to load.
Implementing this, I realized that for this scheme to work,
if I want to access things like the crash version, the 'pc'
structure will have to remain the same. Perhaps a small
set of these functions can be exported to reduce the dependencies
of such a loader on defs.h?
The minimal expectation would be something which would export
pc->program_version
If pc->curcmd were exported, that would also remove dependencies
on calling cmd_usage.
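A rough sketch of the accessor idea follows. A loader extension that only calls small functions does not need to know the layout of the 'pc' structure. Every name here is hypothetical: the struct is a stand-in for the real one in defs.h, and the library file names are invented.

```c
#include <string.h>

/* Hypothetical stand-in for crash's program context. */
struct program_context_sketch {
    const char *program_version;
    const char *curcmd;
};

static struct program_context_sketch sketch_pc = { "4.0-2.12", "extend" };

/* Accessors a loader extension could call instead of dereferencing pc. */
const char *sketch_program_version(void) { return sketch_pc.program_version; }
const char *sketch_current_command(void) { return sketch_pc.curcmd; }

/* Example policy: pick a (hypothetical) helper library from the version. */
const char *sketch_pick_library(void)
{
    return strncmp(sketch_program_version(), "4.0", 3) == 0
        ? "helper-4.0.so" : "helper-generic.so";
}
```

If the structure layout changes in a later crash release, only the accessors need to be rebuilt, and the loader keeps working unmodified.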
-castor
19 years, 1 month
One more change for crash
by Badari Pulavarty
Hi Dave,
I am playing with "crash" on my machine, and it fails with
CONFIG_SPARSE_MEM. It looks like "node_mem_map" doesn't
exist for SPARSE_MEM :(
crash: invalid structure member offset: pglist_data_node_mem_map
FILE: memory.c LINE: 10053 FUNCTION: dump_memory_nodes()
[./crash] error trace: 100afe98 => 100d12f4 => 100d01cc => 10142f64
10142f64: .OFFSET_verify+140
100d01cc: .dump_memory_nodes+520
100d12f4: .node_table_init+492
100afe98: .vm_init+7948
Thanks,
Badari
typedef struct pglist_data {
struct zone node_zones[MAX_NR_ZONES];
struct zonelist node_zonelists[GFP_ZONETYPES];
int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP
struct page *node_mem_map;
#endif
struct bootmem_data *bdata;
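One way a tool can cope with a member that exists only under some kernel configurations is to record an "invalid" offset when the member is missing and test for it before use. The sketch below shows that pattern with made-up names. It is not crash's actual offset table or API.

```c
#define INVALID_OFFSET (-1L)

/* Hypothetical offset table; -1 means the kernel being examined
 * has no such member (e.g. node_mem_map without a flat mem_map). */
struct offsets_sketch {
    long pglist_data_node_mem_map;
};

static int member_is_valid(long offset)
{
    return offset != INVALID_OFFSET;
}

/* Address of the node_mem_map field inside a pglist_data at 'pgdat',
 * or 0 when the member does not exist in this kernel. */
static unsigned long node_mem_map_field(unsigned long pgdat,
                                        const struct offsets_sketch *ot)
{
    if (!member_is_valid(ot->pglist_data_node_mem_map))
        return 0;  /* caller skips this step instead of aborting */
    return pgdat + (unsigned long)ot->pglist_data_node_mem_map;
}
```

With a guard of this kind, code like dump_memory_nodes() could skip the mem_map step on such kernels rather than fail with an invalid-offset error during initialization.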
19 years, 1 month
crash version 4.0-2.12 is available
by Dave Anderson
4.0-2.12 changelog:
Update to diskdump page_desc struct, required for ongoing support
of the diskdump facility's compression feature, currently under
development.
Applied patch from Ken'ichi Ohmichi of NEC to prevent a segmentation
violation during a "bt -f" on an x86_64 task that had taken an NMI
during cpu_idle().
Adapted Badari Pulavarty's patch for recognition of recent 2.6.14
kernel structure/member name changes: mm_struct._rss to _file_rss,
and the kmem_cache_s structure's renaming to kmem_cache. Without
the patch, crash sessions would fail during initialization with a
"crash: invalid structure member offset: kmem_cache_s_num" error,
and the "ps" command would fail with a "ps: invalid structure member
offset: mm_struct_rss" error.
19 years, 1 month
crash causes segmentation fault on x86_64 system.
by 大道憲一
Hi, I've found a problem in crash 4.0-2.
On an x86_64 system, crash causes a segmentation fault
when executing "bt -f" on a dumpfile created by NMI.
My System is as follows.
CPU : AMD Opteron(tm) Processor 252
arch : x86_64
memory : 16GB
kernel : 2.6.9-22.EL (RHEL4-U2)
crash : 4.0-2 (RHEL4-U2)
diskdumputils: 1.1.9-4 (RHEL4-U2)
The reproduction steps are as follows.
1. Boot the x86_64 kernel.
2. Start the diskdump service.
3. Execute diskdump by pushing the NMI button.
4. Reboot the x86_64 kernel.
5. Get the dumpfile by starting the diskdump service.
6. Start crash and execute "bt -f" on the dumpfile.
7. A segmentation fault occurs after the exception stack is printed.
After printing the NMI exception frame, x86_64_low_budget_back_trace_cmd
calculates the next bt->frameptr without changing RSP. This will cause
the condition
bt->frameptr > rsp
in line x86_64.c:1097 at x86_64_display_full_frame,
causing the following loop to run continuously until it stops with
a segmentation fault.
The attached patch adds the sanity check (bt->frameptr < rsp) in
x86_64_display_full_frame.
The following example describes this problem when NMI occurs within
"default_idle".
@x86_64_low_budget_back_trace_cmd (x86_64.c:1367)
1. About the exception stack (x86_64.c:1416)
a. Print the exception stack.
b. Print the register info (RIP, RSP) from the exception stack as the function running before the NMI exception.
RIP points to text in "default_idle".
But the area pointed to by RSP holds the address of text in "cpu_idle",
because RSP doesn't change while "default_idle" is running.
c. bt->frameptr = RSP + sizeof(ulong).
2. About the process stack (x86_64.c:1655)
a. Try to print the stack of "cpu_idle" in x86_64_display_full_frame.
b. bt->frameptr > RSP because of step 1.c.
c. This causes the segmentation fault.
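The failure mode described above can be sketched as an unsigned wrap-around. This assumes the display loop derives its word count from (rsp - frameptr) using unsigned arithmetic; the functions below are illustrative and are not copied from x86_64.c.

```c
#include <stddef.h>

/* Number of stack words to display from frameptr up to and including
 * rsp. Returns 0 when frameptr lies above rsp (the sanity check). */
static size_t words_to_display(unsigned long frameptr, unsigned long rsp)
{
    if (rsp < frameptr)
        return 0;  /* nothing sensible to print */
    return (rsp - frameptr) / sizeof(unsigned long) + 1;
}

/* The same computation without the check, to show the wrap-around:
 * rsp - frameptr underflows and yields an enormous word count. */
static size_t words_unchecked(unsigned long frameptr, unsigned long rsp)
{
    return (rsp - frameptr) / sizeof(unsigned long) + 1;
}
```

When frameptr is one word above rsp, as in step 1.c, the unchecked version asks the loop to walk a huge number of words, far beyond the stack buffer, until the process faults. The checked version simply returns early.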
Ken'ichi Ohmichi
19 years, 1 month
crash version 4.0-2.11 is available
by Dave Anderson
Adapted a number of proposed patches:
- Badari Pulavarty of IBM's implementation of support for 2.6.14
ppc64 kernel's use of 4-level page tables.
- Added a new "extensions" sub-directory for collecting crash
command extension libraries; initially populated with the sample
"echo.c" from the extend help page, along with a device-mapper
related "dminfo.c" module from NEC.
- Castor Fu of 3PAR's implementation of support for LKCD version 10,
as well as the handling of single-bit errors in LKCD compressed
pages by trying out all possible single-bit errors. Also his
fixes for better recognizing -fomit-frame-pointer kernel builds,
a stronger defense against potential bogus processor numbers
associated with tasks in dumpfiles, and a fix to re-allow crash
builds for gcc 2.x compilers.
Fix for potential "vmcore: initialization failed" fatal error during
initialization when using more than just the vmlinux and vmcore command
line arguments.
Fix for diskdump.c compile failures using gcc 2.96.
Update to the x86_64 pseudo-backtrace code to show as a frame the
RSP, RIP and name of the function causing a kernel-mode exception
frame.
Fix for the x86_64 pseudo-backtrace code to not neglect to show the
user-mode exception frame when that task subsequently took a
kernel-mode exception.
Exported the load_extension() and unload_extension() functions so
that they can be called from an extension library.
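The single-bit error handling mentioned in the LKCD item above ("trying out all possible single-bit errors") can be sketched as a brute-force loop: flip each bit in turn and keep the first flip that makes the page validate. The validator below is a trivial XOR check standing in for "the page decompresses cleanly". All names are hypothetical and this is not the code that went into crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Caller-supplied check, e.g. "does this buffer decompress cleanly?" */
typedef int (*page_ok_fn)(const uint8_t *buf, size_t len);

/* Stand-in validator for this sketch: XOR of all bytes must be zero. */
static int xor_is_zero(const uint8_t *buf, size_t len)
{
    uint8_t x = 0;

    for (size_t i = 0; i < len; i++)
        x ^= buf[i];
    return x == 0;
}

/* Call only after validation has failed. Flips each bit in turn and
 * keeps the first flip that validates. Returns the index of the bit
 * that was flipped, or -1 (buffer unchanged) if no single flip works. */
static long repair_single_bit(uint8_t *buf, size_t len, page_ok_fn ok)
{
    for (size_t bit = 0; bit < len * 8; bit++) {
        uint8_t mask = (uint8_t)(1u << (bit % 8));

        buf[bit / 8] ^= mask;
        if (ok(buf, len))
            return (long)bit;
        buf[bit / 8] ^= mask;  /* undo and try the next bit */
    }
    return -1;
}
```

Note that a weak check such as the XOR stand-in can accept a flip at a different position than the one that was actually corrupted; a real decompression check is much stricter. The cost is one validation attempt per bit in the page.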
19 years, 1 month