October 2005 - Crash-utility - Crash Utility List Archives

CA SEOS module causes heartburn for crash(1)

by Kurtis D. Rader

I was asked to look at an s390 SUSE dump which had the Computer Associates SEOS product modules loaded (seos and ksymadd). They exported symbols with addresses well below the address space for modules. For example, "dynamic_Seos_syscall_num" has an address of 0x4f90dc0, which is well below the base of the first module at 0x7880d000. This results in crash trying to mmap an anonymous 1.9 GB region, which naturally fails on an s390 system where the address space is only 2 GiB in size. Has anyone else run across this? Should crash be able to deal with this, or should I simply tell the customer to stop using CA's SEOS product if they want us to look at their crash dumps?

-- 
Kurtis D. Rader, Level 3 Linux Support
ABC Service Center, Linux Change Team
T/L 775-3714, DID +1 503-578-3714
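
To put numbers on the report: the gap between the SEOS symbol at 0x4f90dc0 and the first module base at 0x7880d000 is 0x7387c240 bytes (1,938,276,928 bytes), which matches the roughly 1.9 GB anonymous mapping crash tried to create. The minimal C sketch below reproduces that arithmetic and shows one possible guard, assuming a symbol loader could simply ignore module-exported symbols that fall below the module area before sizing its buffer. The function names are hypothetical; this is not how crash itself handles module symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bytes needed to cover every address from lowest_sym up to module_base.
 * With the values from the report (0x4f90dc0 and 0x7880d000) this is
 * 0x7387c240 = 1938276928 bytes, i.e. roughly 1.9 GB.
 */
uint64_t module_span(uint64_t lowest_sym, uint64_t module_base)
{
    return module_base > lowest_sym ? module_base - lowest_sym : 0;
}

/*
 * Hypothetical filter: accept a module-exported symbol only if it lies
 * at or above the start of the module area (here assumed to be known,
 * e.g. the lowest module base address).
 */
bool symbol_in_module_area(uint64_t sym_addr, uint64_t module_floor)
{
    return sym_addr >= module_floor;
}
```

For example, symbol_in_module_area(0x4f90dc0, 0x7880d000) returns false, so a loader applying this filter would leave dynamic_Seos_syscall_num out of the span calculation instead of stretching the region down to it.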

Re: [Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)

by Dave Anderson

> There is no simple way to add #if KERNEL_VERSION > 2.6.10
> in the header file and leave the hardcoded values there?
>

THIS_KERNEL_VERSION is based upon crash internal data variables in the kernel_table data structure that get initialized in kernel_init(PRE_GDB) based upon the contents of the kernel's "system_utsname" data structure read from memory or the dumpfile.

I was mistaken in using the value of "_stext" as the qualifier, though, since the __START_KERNEL_map value of 0xffffffff80000000 is still the same. But there must be *some* difference in the symbol list that can be used to determine which set of address values to use. It could even be just the *existence* of some new kernel variable introduced as part of the change to the new scheme. Doing an "nm -Bn" on the old and new vmlinux files should yield something obvious.

> bt -t seems to be better.
>
> crash> bt 3144
> PID: 3144 TASK: ffff81011dd1e100 CPU: 0 COMMAND: "mingetty"
> #0 [ffff81011d6b9c68] schedule at ffffffff803b12b3
> RIP: 000000377c7b85b2 RSP: 00007fffff87a110 RFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffffff8010dc26 RCX: 00007fffff87a7b0
> RDX: 0000000000000001 RSI: 00007fffff87a8c7 RDI: 0000000000000000
> RBP: 00007fffff87aca0 R8: 00002aaaaaac9b00 R9: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000246 R12: 00007fffff87a900
> R13: 0000000000502d20 R14: 0000000000000000 R15: 000000007c92d8c0
> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> crash> bt -t 3144
> PID: 3144 TASK: ffff81011dd1e100 CPU: 0 COMMAND: "mingetty"
> START: thread_return (schedule) at ffffffff803b12b3
> [ffff81011d6b9d10] do_con_write at ffffffff802689da
> [ffff81011d6b9d80] schedule_timeout at ffffffff803b1e4e
> [ffff81011d6b9db0] _spin_lock_irqsave at ffffffff803b28ce
> [ffff81011d6b9dc0] add_wait_queue at ffffffff8014cf5c
> [ffff81011d6b9de0] read_chan at ffffffff8025d1f7
> [ffff81011d6b9e48] default_wake_function at ffffffff80130c90
> [ffff81011d6b9e78] default_wake_function at ffffffff80130c90
> [ffff81011d6b9e90] tty_ldisc_deref at ffffffff802571c4
> [ffff81011d6b9ed0] tty_read at ffffffff802575ee
> [ffff81011d6b9f10] vfs_read at ffffffff80183a46
> [ffff81011d6b9f40] sys_read at ffffffff80183e03
> [ffff81011d6b9f80] system_call at ffffffff8010dc26
> RIP: 000000377c7b85b2 RSP: 00007fffff87a110 RFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffffff8010dc26 RCX: 00007fffff87a7b0
> RDX: 0000000000000001 RSI: 00007fffff87a8c7 RDI: 0000000000000000
> RBP: 00007fffff87aca0 R8: 00002aaaaaac9b00 R9: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000246 R12: 00007fffff87a900
> R13: 0000000000502d20 R14: 0000000000000000 R15: 000000007c92d8c0
> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> crash>

I still don't understand what happens in x86_64_low_budget_back_trace_cmd() that causes the "bt" command to skip from the starting point in schedule() to the end, where it dumps the user-mode entry exception frame, unless the rsp has been bumped too high by the time it gets to this point:

        /*
         * Walk the process stack.
         */
        for (i = (rsp - bt->stackbase)/sizeof(ulong);
             !done && (rsp < bt->stacktop); i++, rsp += sizeof(ulong)) {

...and that conceivably may have something to do with the exception stack problem. It's hard to say without being there...

Thanks,
Dave
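
The quoted loop starts scanning at the stack slot that corresponds to the current rsp, so any slots below that rsp are never examined. The C sketch below models that with a toy stack copy: it uses uint64_t in place of crash's ulong, drops the `done` flag, and relies on a hypothetical is_text() predicate with an assumed text range in place of crash's real kernel-text check. Starting the same scan from a higher rsp finds fewer candidate return addresses, which is consistent with Dave's guess that a too-high rsp would make bt jump straight to the end of the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel text range, chosen only so that it covers the
 * text addresses that appear in the bt -t output above. */
#define TEXT_START 0xffffffff80100000ULL
#define TEXT_END   0xffffffff80400000ULL

static int is_text(uint64_t value)
{
    return value >= TEXT_START && value < TEXT_END;
}

/*
 * Scan a copy of a process stack whose first slot corresponds to
 * stackbase, beginning at the slot addressed by rsp and stopping at
 * stacktop, in the same shape as the quoted loop.  Returns how many
 * slots hold values that look like kernel text addresses.
 */
int count_text_slots(const uint64_t *stack, uint64_t stackbase,
                     uint64_t stacktop, uint64_t rsp)
{
    int found = 0;
    size_t i;

    for (i = (rsp - stackbase) / sizeof(uint64_t);
         rsp < stacktop; i++, rsp += sizeof(uint64_t)) {
        if (is_text(stack[i]))
            found++;
    }
    return found;
}
```

With three text addresses spread over an eight-slot stack, starting at stackbase finds all three, while starting four slots higher finds only the one that lies above the new starting point.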

Re: Re: [Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)

by Dave Anderson

This message was bounced due to the size of its attachment; I've since bumped up the maximum allowable message size:

Subject: Re: [Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)
Date: Wed, 26 Oct 2005 09:15:47 -0700
From: Badari Pulavarty <pbadari(a)us.ibm.com>
To: <crash-utility(a)redhat.com>
References: 1, 2, 3, 4

On Wed, 2005-10-26 at 11:51 -0400, Dave Anderson wrote:
> >
> > crash: read error: kernel virtual address: ffff8100050eb084 type:
> > "tss_struct ist array"
> >
>
> I see that the 2.6.13 kernel defines its init_tss
> array like so:
>
> DEFINE_PER_CPU(struct tss_struct, init_tss)
> ____cacheline_maxaligned_in_smp;
>
> whereas, the earlier 2.6 kernels do it like this:
>
> DECLARE_PER_CPU(struct tss_struct,init_tss);
>
> If this change modifies the way that per-cpu variable addresses
> are laid out, then I can't tell you what to do without significant
> further investigation. But until proven otherwise, let's presume
> that the calculations of the per-cpu data are done the same way.
>
> There are two places where that error message comes from, both
> in x86_64_ist_init(), but given that the above per-cpu declarations
> are functionally equivalent, there would be the following
> kernel symbol in your vmlinux, verifiable like so:
>
> $ nm -Bn vmlinux | grep per_cpu__init_tss
> ffffffff80502100 D per_cpu__init_tss
> $
>
> If it's not there, crash is hosed, and significant work needs
> to be done to find it. But if the symbol is still intact in
> the 2.6.14 kernel, the failure should have come from an incorrect
> calculation of the vaddr of the init_tss below:

None of the above stuff changed, so we are fine.

> static void
> x86_64_ist_init(void)
> {
> ...
> } else if (symbol_exists("per_cpu__init_tss")) {
>         for (c = 0; c < NR_CPUS; c++) {
>                 if ((kt->flags & SMP) && (kt->flags & PER_CPU_OFF)) {
>                         if (kt->__per_cpu_offset[c] == 0)
>                                 break;
>                         vaddr = symbol_value("per_cpu__init_tss") +
>                                 kt->__per_cpu_offset[c];
>                 } else
>                         vaddr = symbol_value("per_cpu__init_tss");
>
>                 vaddr += OFFSET(tss_struct_ist);
>
>                 readmem(vaddr, KVADDR, &ms->stkinfo.ebase[c][0],
>                         sizeof(ulong) * 7, "tss_struct ist array",
>                         FAULT_ON_ERROR);

Yes. I realized that the problem is due to a messed-up kt->__per_cpu_offset[c] value. These should be offsets into the array, so they should be small values, but I see huge numbers:

per-cpu offset: 84afdf60

I also realized that this gets set at the lines I touched earlier :( I can't seem to find out what I screwed up. We are just reading a value from the kernel structure and setting it.

>                 if (ms->stkinfo.ebase[c][0] == 0)
>                         break;
>         }
> }
>
> I'm also presuming your test kernel is SMP. But I'm wondering whether
> the SMP and PER_CPU_OFF flags are set?

Yes.

> The SMP flag should have been pre-set in kernel_init(), but the
> PER_CPU_OFF flag gets set in x86_64_cpu_pda_init(), which you
> have modified.
>
> You can display the kt->flags contents with a printk in x86_64_ist_init().
> If PER_CPU_OFF is not set, then that's probably the issue here.
>
> Can you show your new versions of x86_64_cpu_pda_init() and
> x86_64_get_smp_cpus()?

Here are the new x86-64 versions for your review.

Thanks,
Badari
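
The address arithmetic in the quoted x86_64_ist_init() excerpt can be reduced to a small C function: each CPU's copy of init_tss sits at the per_cpu__init_tss symbol value plus that CPU's __per_cpu_offset entry, and the ist array sits OFFSET(tss_struct_ist) bytes further in. This is a sketch, not crash's code. The symbol value ffffffff80502100 comes from the nm output above, but the per-CPU offsets in the usage example are arbitrary values chosen to make the arithmetic easy to check, 0x24 is an assumed value for the ist member offset, and NR_CPUS_SKETCH is a hypothetical stand-in for NR_CPUS. It does show why a wrong __per_cpu_offset entry feeds straight into the address that readmem() is asked to read.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4   /* hypothetical stand-in for NR_CPUS */

/*
 * Fill vaddrs[] with the address of each CPU's tss_struct ist array.
 * Mirrors the quoted loop: with per-cpu offsets in use, a zero offset
 * ends the scan; without them, every CPU uses the bare symbol value.
 * (The quoted code also stops when the ist value read from memory is
 * zero; that check is omitted because nothing is read here.)
 * Returns the number of addresses computed.
 */
int ist_vaddrs(uint64_t init_tss_sym, const uint64_t *per_cpu_offset,
               int use_per_cpu_off, uint64_t ist_member_offset,
               uint64_t *vaddrs)
{
    int c;

    for (c = 0; c < NR_CPUS_SKETCH; c++) {
        uint64_t vaddr;

        if (use_per_cpu_off) {
            if (per_cpu_offset[c] == 0)
                break;
            vaddr = init_tss_sym + per_cpu_offset[c];
        } else {
            vaddr = init_tss_sym;
        }
        vaddrs[c] = vaddr + ist_member_offset;
    }
    return c;
}
```

With offsets {0x1000, 0x2000, 0, 0}, two addresses are produced (ffffffff80503124 and ffffffff80504124) and the scan stops at the first zero offset.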

Crash white paper.

by Troy Heber

Hi Dave,

I would like to include the Crash white paper as part of the documentation we ship to customers. However, there is no license listed, which means we have no right to distribute it. So I'm hoping for it to be released under the GPL or, if that's not possible, for explicit permission to redistribute an unmodified copy of the white paper located here: http://people.redhat.com/anderson/.crash_whitepaper

Troy

crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)

by Badari Pulavarty

Hi,

I am getting the following failures from "crash" when I try running it on 2.6.14-rc5 on an EM64T machine. Is this a known problem?

Thanks,
Badari

[root@localhost crash-4.0-2.8]# crash --readnow

crash 4.0-2.8
Copyright (C) 2002, 2003, 2004, 2005 Red Hat, Inc.
Copyright (C) 2004, 2005 IBM Corporation
Copyright (C) 1999-2005 Hewlett-Packard Co
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

crash: invalid structure member offset: x8664_pda_level4_pgt
FILE: x86_64.c LINE: 332 FUNCTION: x86_64_cpu_pda_init()

[/usr/bin/crash] error trace: 4456f3 => 4a6c8e => 4a7c0c => 4d0c1e

4d0c1e: OFFSET_verify+117
4a7c0c: x86_64_cpu_pda_init+771
4a6c8e: x86_64_init+1522
4456f3: main_loop+50
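
The failure above is reported from OFFSET_verify(), meaning crash could not resolve the offset of the x8664_pda_level4_pgt member from the kernel's debug information before x86_64_cpu_pda_init() needed it. The C sketch below illustrates the general pattern of an offset table whose entries start out as an invalid marker and are checked before use. The names offset_table_sketch, INVALID_OFFSET_SKETCH, offset_table_init(), and offset_checked() are hypothetical and do not reproduce crash's actual implementation.

```c
#include <stdio.h>

#define INVALID_OFFSET_SKETCH (-1L)

/* Hypothetical subset of an offset table filled in from debuginfo. */
struct offset_table_sketch {
    long x8664_pda_level4_pgt;
    long tss_struct_ist;
};

/* Mark every member invalid until debuginfo supplies a real offset. */
void offset_table_init(struct offset_table_sketch *t)
{
    t->x8664_pda_level4_pgt = INVALID_OFFSET_SKETCH;
    t->tss_struct_ist = INVALID_OFFSET_SKETCH;
}

/*
 * On success store the offset in *out and return 0.  If the member was
 * never resolved, report which member was missing and where it was
 * needed, leave *out untouched, and return -1.
 */
int offset_checked(long value, const char *member, const char *func,
                   long *out)
{
    if (value == INVALID_OFFSET_SKETCH) {
        fprintf(stderr, "invalid structure member offset: %s (in %s)\n",
                member, func);
        return -1;
    }
    *out = value;
    return 0;
}
```

A kernel whose struct layout changed would leave the corresponding entry at the invalid marker, and the first use of it would produce a diagnostic naming the member, much like the message in the report.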

crash version 4.0-2.8 is available

by Dave Anderson

Thanks to Jun'ichi Nomura of NEC for addressing an annoying issue with the "mod" command when a registered kernel module name string does not directly relate to its module object file. For example, the "dm_mod" module comes from the "dm-mod.ko" object file, so the "mod -S" or "mod -s dm_mod" commands would fail to load the debug data for that module:

crash> mod -s dm_mod
mod: cannot find or load object file for dm_mod module
crash>

Since this inconsistency seems to be always due to there being underscores in the kernel name string and dashes in the object filename, Jun'ichi's patch retries all module debug data load failures after replacing the underscores with dashes, and voila, it finds the object file:

crash> mod -s dm_mod
MODULE NAME SIZE OBJECT FILE
ffffffffa01a5380 dm_mod 66433 lib/modules/2.6.9-22.2.ELsmp/kernel/drivers/md/dm-mod.ko
crash>

And, as expected, "mod -S" now finds and loads them all.

That is the only change in 4.0-2.8.

Thanks,
Dave
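
The retry amounts to a simple name transformation on the module name. A minimal C sketch of it follows; module_object_name() is a hypothetical helper, and the actual patch may locate and build object-file paths differently.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<modname>.ko" into buf, optionally
 * replacing every '_' with '-' as in the retry described above.
 * Returns 0 on success, -1 if buf is too small.
 */
int module_object_name(const char *modname, int dashes,
                       char *buf, size_t len)
{
    size_t i;
    int n = snprintf(buf, len, "%s.ko", modname);

    if (n < 0 || (size_t)n >= len)
        return -1;
    if (dashes) {
        for (i = 0; buf[i] != '\0'; i++)
            if (buf[i] == '_')
                buf[i] = '-';
    }
    return 0;
}
```

A caller would first look for the file produced with dashes == 0 ("dm_mod.ko") and, only if that lookup fails, retry with dashes == 1 ("dm-mod.ko"), so modules whose names already match their object files are unaffected.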

crash and the "hugemem" kernel

by Kurtis D. Rader

Should crash be able to read a RHEL 3 hugemem dump? I've got an x86 14 GiB netdump vmcore taken under controlled conditions (e.g., the system was booted and a crash dump was manually invoked) that the crash(1) command doesn't like:

vmcore: initialization failed
netdump_data:
  flags: 5 (NETDUMP_LOCAL|NETDUMP_ELF32)
  ndfd: 3
  ofp: 4b2b95e0
  header_size: 4096
  num_pt_load_segments: 1
  pt_load_segment[0]:
    file_offset: 1000
    phys_start: 0
    phys_end: 90000000
  netdump_header: 838a700
  elf32: 838a700
  notes32: 838a734
  load32: 838a754
  elf64: 0
  notes64: 0
  load64: 0
  nt_prstatus: 838a774
  nt_prpsinfo: 838a814
  nt_taskstruct: 838a8a0
  task_struct: 0
  switch_stack: 0

Elf32_Ehdr:
  e_ident: \177ELF
  e_ident[EI_CLASS]: 1 (ELFCLASS32)
  e_ident[EI_DATA]: 1 (ELFDATA2LSB)
  e_ident[EI_VERSION]: 1 (EV_CURRENT)
  e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
  e_ident[EI_ABIVERSION]: 0
  e_type: 4 (ET_CORE)
  e_machine: 3 (EM_386)
  e_version: 1 (EV_CURRENT)
  e_entry: 0
  e_phoff: 34
  e_shoff: 0
  e_flags: 0
  e_ehsize: 34
  e_phentsize: 20
  e_phnum: 2
  e_shentsize: 0
  e_shnum: 0
  e_shstrndx: 0
Elf32_Phdr:
  p_type: 4 (PT_NOTE)
  p_offset: 116 (74)
  p_vaddr: 0
  p_paddr: 0
  p_filesz: 396 (18c)
  p_memsz: 0 (0)
  p_flags: 0 ()
  p_align: 0
Elf32_Phdr:
  p_type: 1 (PT_LOAD)
  p_offset: 4096 (1000)
  p_vaddr: c0000000
  p_paddr: 0
  p_filesz: 2415919104 (90000000)
  p_memsz: 2415919104 (90000000)
  p_flags: 7 (PF_X|PF_W|PF_R)
  p_align: 4096
Elf32_Nhdr:
  n_namesz: 4 ("CORE")
  n_descsz: 144
  n_type: 1 (NT_PRSTATUS)
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 023b0dc0 00000001 00000001 00000063 00000006 00000063
  00000000 00000068 00000068 66d80000 73d80033 ffffffff 021d0950 00000060
  00010286 96c15f3c 00000068 00000000
Elf32_Nhdr:
  n_namesz: 4 ("CORE")
  n_descsz: 124
  n_type: 3 (NT_PRPSINFO)
  00005200 00000000 00000000 00000000 00000000 00000000 00000000 696c6d76
  0078756e 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000
Elf32_Nhdr:
  n_namesz: 4 ("CORE")
  n_descsz: 80
  n_type: 4 (NT_TASKSTRUCT)
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000

-- 
Kurtis D. Rader, Level 3 Linux Support
ABC Service Center, Linux Change Team
T/L 775-3714, DID +1 503-578-3714
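
One detail worth checking in the header above: the only PT_LOAD segment has p_filesz 0x90000000, which is 2,415,919,104 bytes or 2.25 GiB, far less than the 14 GiB of memory in the machine (the netdump_data phys_end of 90000000 agrees). The C sketch below sums PT_LOAD sizes from an array of Elf32_Phdr entries using the standard <elf.h> types, which is one way to confirm how much physical memory an ELF vmcore header actually describes. The function name is hypothetical, and the sketch makes no attempt to explain why the header is limited this way.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Total bytes present in the file for all PT_LOAD segments of an
 * ELF32 program header table. */
uint64_t elf32_load_bytes(const Elf32_Phdr *phdr, size_t phnum)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_LOAD)
            total += phdr[i].p_filesz;
    return total;
}
```

Fed the two program headers shown above (one PT_NOTE, one PT_LOAD), it returns 0x90000000 bytes, exactly 9/4 GiB.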

crash version 4.0-2.7 is available

by Dave Anderson

4.0-2.7 changelog entry: Fixed x86_64 backtrace code to recognize 32-bit user code kernel entry exception frames (code segment selectors of 0x23) without issuing a "bt: WARNING: possibly bogus exception frame" message. Fixed x86_64 backtrace code to recognize in-kernel exception frames generated from module text in situations where the module data was not included in the dumpfile, such as in a netdump which resulted in a vmcore-incomplete file. (10/19/05)
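
The 0x23 selector in this changelog entry is the 32-bit user code segment on x86_64; 2.6-era x86_64 kernels use 0x33 for 64-bit user code (as in the "CS: 0033" lines of the bt output earlier on this page) and 0x10 for kernel code. Below is a minimal C sketch of classifying an exception frame by its saved CS value. The enum, macro, and function names are hypothetical and are not taken from crash's x86_64.c.

```c
/* Kinds of saved code segment selector in an x86_64 exception frame. */
enum frame_cs_kind { CS_KERNEL, CS_USER64, CS_USER32, CS_UNKNOWN };

/* Selector values as defined by 2.6-era x86_64 kernels. */
#define KERNEL_CS_SEL 0x10
#define USER32_CS_SEL 0x23
#define USER_CS_SEL   0x33

enum frame_cs_kind classify_cs(unsigned long cs)
{
    switch (cs & 0xffff) {        /* selector is in the low 16 bits */
    case KERNEL_CS_SEL: return CS_KERNEL;
    case USER_CS_SEL:   return CS_USER64;
    case USER32_CS_SEL: return CS_USER32;
    default:            return CS_UNKNOWN;   /* candidate "bogus" frame */
    }
}
```

A backtracer that only knew about 0x10 and 0x33 would treat a 32-bit user entry frame as CS_UNKNOWN, which matches the "possibly bogus exception frame" warning that this release stops issuing for 0x23 frames.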

diskdump/x86_64 crash compatibility.

by Neil Horman

Hey all- If I have a RHEL4 crash dump taken from an AMD x86_64 machine, should I be able to examine that vmcore using crash-4.0.1 on a RHEL4 box running on EM64T hardware? I would have assumed that I would be able to, but I keep getting file format errors when trying to read the core. Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman(a)redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/

Michael Holzheu/Germany/IBM is out of the office.

by Michael Holzheu

I will be out of the office starting 08/17/2005 and will not return until 03/01/2006. I am away for six months and will not read my mail during this time. For technical questions please contact Raimund Schroeder (Raimund.Schroeder(a)de.ibm.com).
