September 2007 - Crash-utility - Crash Utility List Archives

Re: [Kgdb-bugreport] Problem getting kgdb to read kernel symbols. addresses shifted?

by Pete/Piet Delaney

Derek Atkins wrote:

Dave, I thought you would likely know what's going on here. How about helping out Derek? Sounds like a RedHat'ism, and I kinda recall your mentioning it and apologizing for it as an unfortunate RedHat directive.

-piet

> ebiederm(a)xmission.com (Eric W. Biederman) writes:
>
>> Derek Atkins <warlord(a)MIT.EDU> writes:
>>
>>> Well, gdb agrees with System.map, so I'm sure that gdb itself is
>>> okay. It's certainly possible that the kgdb stub is weird,
>>> but /proc/kallsyms doesn't match System.map, and THAT'S what's
>>> confusing me most of all.
>> Ok. So we must have a relocatable kernel that figures it has been
>> relocated. Interesting.
>>
>> What is your bootloader?
>
> GRUB
>
>> What is your kernel version?
>
> 2.6.22.5-76_kgdb0.fc7-i686
>
>> What is your kernel config?
>
> See the attached .config file.
>
>> The only time I would expect to see what you are seeing is if
>> you are debugging the kdump kernel, which doesn't sound like
>> the case.
>
> Nope. I started with the Fedora 'i686' config and then patched
> in the kgdb patches and configuration.
>
>> If we actually have a truly offset kernel then while things
>> may not be perfect this is at least expected. I don't think
>> I have heard of anyone handling this case very well.
>
> :( Like I said before, it SEEMS to work okay by telling GDB
> to load in at a different address.
>
>>> Which was how long ago? ;)
>> Long enough ago that I don't remember when ;)
>
> Heh.
>
>> Eric
>
> -derek

invalid regs display in bt

by Richard J Moore

I've been puzzling over why the regs formatted with a backtrace on an IA32 dump are invalid. Here's what I mean:

PID: 2692  TASK: f4656630  CPU: 0  COMMAND: "rmmod"
 #0 [f463ce54] crash_kexec at c044a1f7
 #1 [f463ce9c] die at c040651a
 #2 [f463ced4] do_page_fault at c0603107
 #3 [f463cf14] error_code (via page_fault) at c060190a
    EAX: 00000018  EBX: f8b43400  ECX: f8b4304f  EDX: 00200000
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000000
    SS:  304f      ESP: f8b4302b  EBP: f463c000
    CS:  0060      EIP: f8b43004  ERR: ffffffff  EFLAGS: 00210286

They are supposed to represent a valid set of regs that are presented to do_page_fault, which I presume are meant to be valid at the time the exception occurred. Of course, they can never be a valid set of regs, for the simple reason that the CPL is 0 (CS=60) and the RPL of SS is 3, which is an automatic GPF. Since I manufactured the exception that caused this dump, by causing an unrecoverable page fault in ring 0, I know the CS is correct but SS is bogus.

Furthermore, the error code (ERR), which is stored by the processor as part of the exception stack frame, uses only bits 0-2 for page faults and at most bits 0-15 for other exceptions; the unused bit positions are zero. So ERR is also bogus.

On looking at the code in entry.S at page_fault and the other exception entry points, I see no attempt to save regs to create a pt_regs struct. The fact that do_page_fault takes pt_regs as the first arg is a hack to get at CS:EIP and SS:ESP at the time of the exception. Furthermore, error_code loads the exception error code into edx, then wipes it out from the stack by storing -1 into this location.

I can't actually see a good reason for wiping out the error code. By convention, exceptions and interrupts have a negative integer stored at the error-code location to distinguish them from system calls, but I don't think this is used. signal.c seems to be the only place that looks for an error code >=0, but I don't see how an exception affects signal.c.

Can anyone confirm whether setting the error code to -1 is essential? If it isn't, then I think we should consider leaving the original error code in place.

The long and short of it is: the only things that have any meaning are CS, EIP and EFLAGS, all of which are saved by the processor. SS and ESP are only saved when the exception occurred at a privilege level >0, but these can never generate a panic.

I'd recommend that we change the bt output to format only the three valid regs (possibly SS and ESP, if the CPL at the time of the exception was >0).

Is there any reason why this shouldn't be changed?

Richard
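The plausibility rules described in this post can be sketched in C. This is only an illustration: the function names are hypothetical and are not part of crash or the kernel, and the error-code rule simply encodes the post's statement that at most bits 0-15 are ever used.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low two bits of a segment selector hold its privilege level
 * (the CPL when the selector is CS, the RPL otherwise). */
unsigned selector_privilege(uint16_t selector)
{
    return selector & 3u;
}

/* Hypothetical helper: the processor pushes SS:ESP onto the exception
 * frame only when the exception arrived from a privilege level > 0. */
bool frame_has_ss_esp(uint16_t cs)
{
    return selector_privilege(cs) > 0;
}

/* A live SS must carry the same privilege level as CS, so a mismatch
 * means the SS slot was never filled in by the processor. */
bool ss_plausible(uint16_t cs, uint16_t ss)
{
    return selector_privilege(ss) == selector_privilege(cs);
}

/* Per the post, hardware error codes use at most bits 0-15, so any
 * value with upper bits set (such as the -1 marker) is not genuine. */
bool error_code_plausible(uint32_t err)
{
    return (err & 0xffff0000u) == 0;
}
```

Applied to the dump above, CS=0060 gives CPL 0, so frame_has_ss_esp() is false, ss_plausible(0x0060, 0x304f) is false, and ERR=ffffffff fails error_code_plausible(). That matches the conclusion that only CS, EIP and EFLAGS are meaningful here.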

Re: [Crash-utility] crash version 4.0-4.7 is available

by Luc Chouinard

Yes, let's use __ia64__ for uniformity. I'll patch it in.

Luc

----- Original Message ----
From: Dave Anderson <anderson(a)redhat.com>
To: holzheu(a)linux.vnet.ibm.com; "Discussion list for crash utility usage, maintenance and development" <crash-utility(a)redhat.com>; lucchouina(a)yahoo.com
Sent: Wednesday, September 26, 2007 11:57:37 AM
Subject: Re: [Crash-utility] crash version 4.0-4.7 is available

Michael Holzheu wrote:
> On Wed, 2007-09-26 at 10:03 -0400, Dave Anderson wrote:
>
>>Any results on or s390/s390x?
>
> Great to have sial in crash! On s390/s390x it compiles with the
> following patch:
>

Michael, thanks for the test and patch.

Luc, this one also looks OK, but maybe the ia64 patch should use __ia64__, which is what the crash utility code itself depends upon?

Thanks,
  Dave

> ---
>
> diff -Naurp crash-4.0-4.7/extensions/libsial/sial_api.h crash-4.0-4.7-sial-fix-s390/extensions/libsial/sial_api.h
> --- crash-4.0-4.7/extensions/libsial/sial_api.h 2007-09-25 17:01:56.000000000 +0200
> +++ crash-4.0-4.7-sial-fix-s390/extensions/libsial/sial_api.h 2007-09-26 17:30:58.000000000 +0200
> @@ -13,6 +13,8 @@
>  #define ABI_MIPS 1
>  #define ABI_INTEL_X86 2
>  #define ABI_INTEL_IA 3
> +#define ABI_S390 4
> +#define ABI_S390X 5
>
>  /* types of variables */
>  #define V_BASE 1
> diff -Naurp crash-4.0-4.7/extensions/sial.c crash-4.0-4.7-sial-fix-s390/extensions/sial.c
> --- crash-4.0-4.7/extensions/sial.c 2007-09-25 17:01:56.000000000 +0200
> +++ crash-4.0-4.7-sial-fix-s390/extensions/sial.c 2007-09-26 17:31:05.000000000 +0200
> @@ -807,17 +807,25 @@ _init() /* Register the command set. */
>  /* set api, default size, and default sign for types */
>  #ifdef i386
>  #define SIAL_ABI ABI_INTEL_X86
> -#else
> +#else
>  #ifdef ia64
>  #define SIAL_ABI ABI_INTEL_IA
>  #else
>  #ifdef __x86_64__
>  #define SIAL_ABI ABI_INTEL_IA
>  #else
> +#ifdef __s390__
> +#define SIAL_ABI ABI_S390
> +#else
> +#ifdef __s390x__
> +#define SIAL_ABI ABI_S390X
> +#else
>  #error sial: Unkown ABI
>  #endif
>  #endif
>  #endif
> +#endif
> +#endif
>  sial_apiset(&icops, SIAL_ABI, sizeof(long), 0);
>
>  sial_version();
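For comparison, the nested #ifdef chain in the quoted patch can be written as a flat #if/#elif ladder. The following is a sketch, not the code that went into crash: ABI_UNKNOWN and sial_abi_name() are hypothetical additions so that the fragment compiles on any host, whereas the patch itself stops with #error. It uses the double-underscore compiler macros (__i386__, __ia64__) suggested in the thread. It also tests __s390x__ before __s390__ on the assumption, as I understand GCC's behaviour, that 64-bit s390x builds define both macros.

```c
#include <stddef.h>

#define ABI_UNKNOWN    0   /* hypothetical fallback, not in the patch */
#define ABI_MIPS       1
#define ABI_INTEL_X86  2
#define ABI_INTEL_IA   3
#define ABI_S390       4
#define ABI_S390X      5

/* Pick the ABI from compiler-defined macros, one rung per target. */
#if defined(__i386__)
#define SIAL_ABI ABI_INTEL_X86
#elif defined(__ia64__) || defined(__x86_64__)
#define SIAL_ABI ABI_INTEL_IA
#elif defined(__s390x__)          /* checked before __s390__ on purpose */
#define SIAL_ABI ABI_S390X
#elif defined(__s390__)
#define SIAL_ABI ABI_S390
#else
#define SIAL_ABI ABI_UNKNOWN
#endif

/* Hypothetical helper: map an ABI constant to a printable name. */
const char *sial_abi_name(int abi)
{
    switch (abi) {
    case ABI_MIPS:      return "mips";
    case ABI_INTEL_X86: return "x86";
    case ABI_INTEL_IA:  return "ia64/x86_64";
    case ABI_S390:      return "s390";
    case ABI_S390X:     return "s390x";
    default:            return "unknown";
    }
}
```

A flat ladder avoids having to balance one #endif per architecture, which is where the quoted patch needed its two extra #endif lines.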

To display of local variables with stack frames

by Neeraj kushwaha

Hi All,

This is what I'm trying to do: I have a pretty extensive module for Linux which oopses and panics once in a while. After a panic, I would like to find out from the coredump where the panic happened and why it happened. This requires examining the stack trace of the task that caused the dump, and also the local variables in each function's stack frame.

I can get the stack trace using "trace", which is great, but is there any way to look into the stack at the local variables of each function on the stack, as in gdb?

Can anyone help me with this issue, or give me some idea of how to go about implementing it?

Thanks & Regards,
Neeraj

crash version 4.0-4.7 is available

by Dave Anderson

- Incorporation of Luc Chouinard's SIAL interpreter (Simple Image Access Language) as a crash extension module. When loaded with the "extend" command, the sial.so module provides three commands: "load" to load a SIAL script, "unload" to unload it, and "edit", which unloads the script, brings up an $EDITOR-based edit session of the script, and then loads it again. Also, when the sial.so module is loaded, it will automatically load any SIAL scripts found in the /usr/share/sial/crash or $HOME/.sial directories. Therefore, by putting "extend <path-to>/sial.so" in either ./.crashrc or $HOME/.crashrc, all desired SIAL scripts may be loaded on a particular machine in a hands-off manner. For details, consult the README and README.sial files in the extensions/libsial subdirectory. (lucchouina(a)yahoo.com)

- Removed hardwired dependencies in the top-level and extensions subdirectory Makefiles for building extension modules. Now it is possible to copy an extension module's .c file into the extensions subdirectory, and enter "make extensions" from the top-level to build it. If the build of the module requires special handling, a .mk makefile with the same prefix as the .c file may be provided, and it will be automatically used to build it. (jmoyer(a)redhat.com, anderson(a)redhat.com)

- When a 32-bit x86 xenU guest is run on an x86_64 dom0 host, the new-style xen ELF format dumpfile contains an ELF header with an e_machine type of EM_X86_64 (instead of EM_386). This was getting rejected with the error message "crash: vmcore: not a supported file format". The fix simply accepts the e_machine type mismatch, since the new-style ELF format dumpfiles are 64-bit by default. (anderson(a)redhat.com)

- Enhanced the "kmem <address>" option to also search for task_struct and kernel stack addresses, and report them with the "set" output. Also, fix for when "kmem <vmalloc-address>" was entered, the header for the mem_map data was not displayed. (anderson(a)redhat.com)

- Fix for determining starting rip/rsp backtrace hooks for the panic task in x86_64 xen dom0 kdumps; newer kernels have replaced the call to "xen_machine_kexec" with "machine_kexec", and without this patch may display back-traces with missing frames. Also on x86_64 non-xen kdump panic task backtraces, it is possible that the wrong stack instance of "crash_kexec" is used as the starting hook, which may also lead to missing frames. (anderson(a)redhat.com)

- Fix for ia64 LKCD dumpfiles where it is not possible to read the task structure of the task that follows a task which is in the task address "fixup list", and zeroes are returned instead. (atyson(a)hp.com)

- Fix for potential "mod -[sS]" failures with modules whose object files contain an unusually large number of sections; module loading attempts may issue a "<segmentation violation in gdb>" message followed by the error message: "mod: [module name]: gdb add-symbol-file command failed". (carl.hsieh(a)teradata.com, anderson(a)redhat.com)

- Fix to prevent dumpfile reads beyond EOF when reading new (optimized) xen ELF core xendumps. Without the patch, error messages of the sort: "crash: cannot read index page [number]" may occur during session initialization, with unpredictable run-time results. (yamahata(a)valinux.co.jp)

- In x86_xen_kdump_p2m_create(), the same variable was being used as the for-loop index in both an outer and an embedded inner for-loop. As a result, if debug level was equal to or larger than 7, the outer for-loop was repeated only once. (nishimura(a)mxp.nes.nec.co.jp)

Download from: http://people.redhat.com/anderson
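The shared loop index problem in the last item is easy to reproduce in isolation. The sketch below is not the crash source: both function names and the debug threshold parameter are hypothetical stand-ins, and it assumes the inner loop count is at least as large as the outer one, which is what makes the outer loop stop after a single pass.

```c
/* Buggy shape: the debug-only inner loop reuses `i`, so when it
 * finishes, `i` is already past the outer bound and the outer loop
 * exits after its first pass (assuming inner >= outer). */
int outer_passes_shared_index(int outer, int inner, int debug)
{
    int i, passes = 0, work = 0;

    for (i = 0; i < outer; i++) {
        passes++;
        if (debug >= 7) {
            for (i = 0; i < inner; i++)
                work++;          /* stand-in for debug output */
        }
    }
    (void)work;
    return passes;
}

/* Fixed shape: the inner loop has its own index `j`, so the outer
 * loop always runs `outer` times regardless of the debug level. */
int outer_passes_separate_index(int outer, int inner, int debug)
{
    int i, j, passes = 0, work = 0;

    for (i = 0; i < outer; i++) {
        passes++;
        if (debug >= 7) {
            for (j = 0; j < inner; j++)
                work++;          /* stand-in for debug output */
        }
    }
    (void)work;
    return passes;
}
```

With outer=4 and inner=8, the shared-index version makes one pass at debug level 7 but four passes at debug level 0, which is why a bug of this shape only shows up when debugging output is turned on.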

Re: SIAL on s390

by Luc Chouinard

I was traveling today. Just got back. Sorry.

It's going to take a little while to iron out the bugs. Cliff, Michael and other power users of sial, please interact directly with me while we make it through this debug period. I'll respond as quickly as I can but, of course, the day job takes priority :)

Michael, let's work this one offline.

Thanks,
Luc

----- Original Message ----
From: Dave Anderson <anderson(a)redhat.com>
To: holzheu(a)linux.vnet.ibm.com; "Discussion list for crash utility usage, maintenance and development" <crash-utility(a)redhat.com>; lucchouina(a)yahoo.com
Sent: Wednesday, September 26, 2007 2:13:40 PM
Subject: Re: SIAL on s390

Michael Holzheu wrote:
> On Wed, 2007-09-26 at 10:03 -0400, Dave Anderson wrote:
>
>>Any results on or s390/s390x?
>
> I tried some of our s390 service scripts. It seems that sial in crash
> has some problems accessing kernel datatype information:
>
> We have the following code in one of our sial scripts:
>
> ...
> offset = offsetof(struct klist_node, n_node) +
>          offsetof(struct device, knode_driver) +
>          offsetof(struct subchannel, dev);
>
> crash> devices
> File /root/service/sial/devices.sial, line 171, Error: Unknown member
> name [n_node]
>
> But crash knows that "struct klist_node" has member n_node:
>
> crash> whatis klist_node
> struct klist_node {
>     struct klist *n_klist;
>     struct list_head n_node;
>     struct kref n_ref;
>     struct completion n_removed;
> }
>
> The same script works fine using lcrash.
>
> Michael

I've changed the subject, and forwarded this to Luc. He seems to be absent today, or perhaps is ignoring crash-utility email, or mail with the "crash version 4.0-4.7 is available" subject, or whatever.

In any case, with all sial-related queries, patches, etc., please also cc: your post to Luc's direct email address lucchouina(a)yahoo.com.

Thanks,
  Dave
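The offset arithmetic in the quoted script (summing offsetof() through three levels of embedded structures to get from a list node back to its enclosing object) can be illustrated in plain C. The structure layouts below are simplified, hypothetical stand-ins, not the real kernel definitions, and both helper functions are invented for this sketch.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures named in the script. */
struct list_head  { struct list_head *next, *prev; };
struct klist_node { void *n_klist; struct list_head n_node; };
struct device     { int id; struct klist_node knode_driver; };
struct subchannel { int irq; struct device dev; };

/* Distance in bytes from the start of a subchannel to the list_head
 * buried inside dev.knode_driver, computed one level at a time. */
size_t subchannel_node_offset(void)
{
    return offsetof(struct klist_node, n_node) +
           offsetof(struct device, knode_driver) +
           offsetof(struct subchannel, dev);
}

/* Walk back from an embedded list node to its enclosing subchannel. */
struct subchannel *subchannel_from_node(struct list_head *node)
{
    return (struct subchannel *)((char *)node - subchannel_node_offset());
}
```

A debugger script does the same subtraction on addresses read from the dump, which is why it needs the member offsets of every structure along the chain, including n_node.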

[PATCH] Add file size check at the beginning

by Bernhard Walle

I had a bug report because of a truncated vmcore file. The error message just was

    crash: read error: kernel virtual address: ffff8107f3c3a6c0 type: "cpu_pda entry"

which is very unclear for a 'normal' user. This patch adds checking of the file size according to the ELF header at the beginning so that a clear error message can be printed.

Please consider adding the patch to crash.

Signed-off-by: Bernhard Walle <bwalle(a)suse.de>

---
 defs.h    |    1 +
 netdump.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

--- a/defs.h
+++ b/defs.h
@@ -16,6 +16,7 @@
  * GNU General Public License for more details.
  */
 
+#define _LARGEFILE64_SOURCE 1 /* stat64() */
 #ifndef GDB_COMMON
 
 #include <stdio.h>
--- a/netdump.c
+++ b/netdump.c
@@ -33,6 +33,47 @@ static physaddr_t xen_kdump_p2m(physaddr
 #define ELFREAD 0
 
 #define MIN_PAGE_SIZE (4096)
+
+
+static int
+check_netdump_filesize(char *file)
+{
+	uint64_t max_file_offset = 0;
+	struct pt_load_segment *pls;
+	struct stat64 stat;
+	int i, ret;
+
+
+	/* find the maximum file offset */
+	for (i = 0; i < nd->num_pt_load_segments; i++) {
+		uint64_t end, size;
+
+		pls = &nd->pt_load_segments[i];
+
+		size = pls->phys_end - pls->phys_start;
+		end = pls->file_offset + size;
+
+		if (end > max_file_offset)
+			max_file_offset = end;
+	}
+
+	ret = stat64(file, &stat);
+	if (ret < 0) {
+		fprintf(stderr, "Cannot stat64 on %s: %s\n", file,
+			strerror(errno));
+		return FALSE;
+	}
+
+	if (max_file_offset > stat.st_size) {
+		fprintf(stderr, "File %s is too short:\n"
+			"Must be %lld bytes but is only "
+			"%lld bytes long.\n",
+			file, max_file_offset, stat.st_size);
+		return FALSE;
+	}
+
+	return TRUE;
+}
 
 /*
  * Determine whether a file is a netdump/diskdump/kdump creation,
@@ -267,6 +308,9 @@ is_netdump(char *file, ulong source_quer
 	if (CRASHDEBUG(1))
 		netdump_memory_dump(fp);
 
+	if (!check_netdump_filesize(file))
+		return FALSE;
+
 	return nd->header_size;
 
 bailout:
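The idea behind the patch can be tried outside crash with a small standalone sketch. This is not the patch itself: struct load_segment and both functions are hypothetical simplifications of the netdump code, plain stat() is used instead of stat64() (on 32-bit hosts this assumes building with -D_FILE_OFFSET_BITS=64), and the sizes are printed with PRIu64 so the format matches the argument type on both 32-bit and 64-bit builds.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical, reduced view of one PT_LOAD segment. */
struct load_segment {
    uint64_t file_offset;   /* where the segment starts in the file */
    uint64_t phys_start;    /* first physical address covered */
    uint64_t phys_end;      /* one past the last physical address */
};

/* Largest file offset that any segment needs to be present. */
uint64_t required_file_size(const struct load_segment *segs, size_t n)
{
    uint64_t max_end = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t end = segs[i].file_offset +
                       (segs[i].phys_end - segs[i].phys_start);
        if (end > max_end)
            max_end = end;
    }
    return max_end;
}

/* Returns 1 if the file is at least as long as the segments require,
 * 0 if it is truncated or cannot be examined. */
int file_is_long_enough(const char *path,
                        const struct load_segment *segs, size_t n)
{
    uint64_t need = required_file_size(segs, n);
    struct stat st;

    if (stat(path, &st) < 0) {
        fprintf(stderr, "Cannot stat %s: %s\n", path, strerror(errno));
        return 0;
    }
    if ((uint64_t)st.st_size < need) {
        fprintf(stderr, "File %s is too short:\n"
                "Must be %" PRIu64 " bytes but is only %" PRIu64
                " bytes long.\n", path, need, (uint64_t)st.st_size);
        return 0;
    }
    return 1;
}
```

Running the check once, right after the ELF headers have been parsed, turns a confusing mid-session read error into a single up-front message about truncation.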

crash and SLES 9/10 kernels

by Daniel Li

Just want to share my findings:

From what I saw, the debug smp kernels found in 'kernel-smp-debuginfo' packages for SLES10 (for SLES9SP3, it is 'kernel-smp-debug') from the KOTD ftp site (ftp://ftp.suse.com/pub/projects/kernel/kotd/) work well with 'crash'. All you need to do is feed the debug kernel, along with the corresponding non-debug kernel, to 'crash', whether you are using it on a live system or on dumps.

There must be something I missed while building SMP debug kernels myself from the source, although I still don't know what that was.

Later,
Daniel

RFC: crash extension module handling

by Dave Anderson

Lucio's post of his latest Cell/B.E. SPU commands extension module:

  https://www.redhat.com/archives/crash-utility/2007-September/msg00041.html

leads to a larger discussion of how extension modules should best be handled.

In Lucio's patch, he modified the top-level Makefile and the extensions subdirectory Makefile to build his extension module, following the directions in the extensions/Makefile file itself, which states:

# To add a new extension object:
#
# - add the new source file to the EXTENSION_SOURCE_FILES list
#   in the top-level Makefile
# - add the object file name to the EXTENSION_OBJECT_FILES list
#   in the top-level Makefile
# - create a compile stanza below, typically using "echo.so" as
#   a base template.

Now, currently in the extensions subdirectory are two extension modules, echo.c and dminfo.c, and there are explicit compile lines for the two of them, which are identical in nature:

echo.so: ../defs.h echo.c
	gcc -nostartfiles -shared -rdynamic -o echo.so echo.c -fPIC \
	  -D$(TARGET) $(TARGET_CFLAGS)

dminfo.so: ../defs.h dminfo.c
	gcc -nostartfiles -shared -rdynamic -o dminfo.so dminfo.c -fPIC \
	  -D$(TARGET) $(TARGET_CFLAGS)

and in Lucio's patch, he follows the template:

+spu.so: ../defs.h spu.c
+	gcc -nostartfiles -shared -rdynamic -o spu.so spu.c -fPIC \
+	  -D$(TARGET) $(TARGET_CFLAGS)

That's all well and good, but I've decided that I really don't want the crash source package to be a placeholder of this, and numerous other, extension module source files. My feeling is that the crash package should only have to deal with the *mechanism* of handling dynamic extension modules, and not the modules' source code itself.

A little history first...

The "echo.c" extension module is simply a template from which a new module can be created. The "dminfo.c" extension module is a special-case Red Hat extension module, created so that the device-mapper guys would not have to write their own utility to do the same kind of kernel-memory grok'ing that crash does. To be honest, I'd prefer it if it were not there in the crash source tree, but there was no acceptable alternative at the time.

Furthermore, since that dminfo.c module was added, the crash src.rpm now creates an additional "crash-devel" rpm, which consists of just the defs.h file, which is the only thing needed for building an extension module, and which is installed in /usr/include/crash/defs.h. So, for example, the systemtap folks have their own crash extension module package that they control on their own. They can simply install the crash-devel package -- without having to install the crash source package -- and with their own package Makefile, they compile their stap module, basically doing their own thing "elsewhere". Had that been in place at the time, the dminfo guys could have done the same type of thing.

That being said, it is still convenient for many to be able to do the module building from within a crash source tree's extensions subdirectory.

Now, given that extension modules should be compile'able in precisely the same manner as echo.c and dminfo.c, what I'd like to do is simply have the "make extensions" command from the top-level Makefile cause the extensions/Makefile to pick up *any* C file in the extensions subdirectory, and compile a module automatically -- without having to modify the top-level Makefile or extensions/Makefile.

In other words, just throw your module's C file into the extensions subdirectory, enter "make extensions" from the top-level, and it gets built automatically. No changes required for the top-level Makefile, no changes required for the extensions/Makefile, nor any need to store a myriad of extension modules in the crash source tree.

BTW, I'm perfectly willing to add an "extensions" repository accessible from my people page, where contributors can store their latest-and-greatest.

Anyway, I've been tinkering with the extensions/Makefile to do such a thing, and have a crude addition that does just that, although it does the compile of all "new" C files every time whether they need it or not -- via the additional "contrib" target:

30c30
< all: link_defs $(OBJECTS)
---
> all: link_defs $(OBJECTS) contrib
43a44,50
> contrib:
> 	@for CFILE in `/bin/ls *.c | grep -v echo.c | grep -v dminfo.c | grep -v sial.c`; do \
> 	  OUTPUT=`echo $$CFILE | cut -d. -f1`.so; \
> 	  echo "gcc -nostartfiles -shared -rdynamic -o $$OUTPUT $$CFILE -fPIC -D$(TARGET) $(TARGET_CFLAGS)"; \
> 	  gcc -nostartfiles -shared -rdynamic -o $$OUTPUT $$CFILE -fPIC -D$(TARGET) $(TARGET_CFLAGS); \
> 	done
>

It prevents the re-compilation of echo.c and dminfo.c, and of Luc Chouinard's upcoming sial.c extension module. (SIAL is an alternative crash extension mechanism -- more on that when it's available...)

Anyway, I've tried screwing around with the Makefile to use a generic *.so target, using $@, $(basename ...) and so on, but I'm not a Makefile master, and I cannot get it quite right, although I'm sure it can be done.

So if anybody out there can do it cleaner than the "contrib" target above, I'd like to take a look.

Thanks,
  Dave

Re: [Crash-utility] Cell/B.E. SPU commands extension

by Lucio Correia

> Lucio Correia wrote:
> > Hi,
> >
> > I've developed this crash extension to analyze SPU specific data for
> > Cell/B.E. processor. This extension makes use of some important data
> > saved by this kernel patch (that is not mainline yet)
> > http://ozlabs.org/pipermail/cbe-oss-dev/2007-May/001848.html during the
> > crash dump.
> >
> > I would like to check if there is any issue with the code.
> >
>
> Functionally it looks fine.
>
> I changed the _init() function to just do an error(INFO, ...)
> so I could load the extensions, and the only suggestion
> I can make is purely aesthetic, which would be to make
> the "help" messages 80 characters or less like the regular
> commands are. In other words, the "DESCRIPTION" section
> outputs, and the sentences in the "spuctx" EXAMPLE
> section are kind of ugly the way that they run on with
> no linefeeds.
>
> But like I said before, I don't see any issues/problems
> with the code -- pretty nifty extension...
>
> Dave
>

Thanks for the comments, Dave. I'm correcting these issues.

Regards,
--
Lucio Correia
Software Engineer
IBM LTC Brazil
