Re: [Kgdb-bugreport] Problem getting kgdb to read kernel symbols. addresses shifted?
by Pete/Piet Delaney
Derek Atkins wrote:
Dave, I thought you would likely know what's going on here.
How about helping out Derek? Sounds like a RedHat'ism and
I kinda recall your mentioning it and apologizing for it
as an unfortunate RedHat directive.
-piet
> ebiederm(a)xmission.com (Eric W. Biederman) writes:
>
>> Derek Atkins <warlord(a)MIT.EDU> writes:
>>
>>> Well, gdb agrees with System.map, so I'm sure that gdb itself is
> >>> okay. It's certainly possible that the kgdb stub is weird,
>>> but /proc/kallsyms doesn't match System.map, and THAT'S what's
>>> confusing me most of all.
>> Ok. So we must have a relocatable kernel that figures it has been
>> relocated. Interesting.
>>
>> What is your bootloader?
>
> GRUB
>
>> What is your kernel version?
>
> 2.6.22.5-76_kgdb0.fc7-i686
>
>> What is your kernel config?
>
> See the attached .config file.
>
>> The only time I would expect to see what you are seeing is if
>> you are debugging the kdump kernel, which doesn't sound like
>> the case.
>
> Nope. I started with the Fedora 'i686' config and then patched
> in the kgdb patches and configuration.
>
>> If we actually have a truly offset kernel then while things
>> may not be perfect this is at least expected. I don't think
>> I have heard of anyone handling this case very well.
>
> :( Like I said before, it SEEMS to work okay by telling GDB
> to load in at a different address.
>
>>> Which was how long ago? ;)
>> Long enough ago that I don't remember when ;)
>
> Heh.
>
>> Eric
>
> -derek
> _______________________________________________
> Kgdb-bugreport mailing list
> Kgdb-bugreport(a)lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
17 years, 3 months
invalid regs display in bt
by Richard J Moore
I've been puzzling over why the regs formatted with a backtrace on an IA32
dump are invalid. Here's what I mean:
PID: 2692 TASK: f4656630 CPU: 0 COMMAND: "rmmod"
#0 [f463ce54] crash_kexec at c044a1f7
#1 [f463ce9c] die at c040651a
#2 [f463ced4] do_page_fault at c0603107
#3 [f463cf14] error_code (via page_fault) at c060190a
EAX: 00000018 EBX: f8b43400 ECX: f8b4304f EDX: 00200000
DS: 007b ESI: 00000000 ES: 007b EDI: 00000000
SS: 304f ESP: f8b4302b EBP: f463c000
CS: 0060 EIP: f8b43004 ERR: ffffffff EFLAGS: 00210286
They are supposed to represent a valid set of regs that are presented to
do_page_fault, which I presume are meant to be valid at the time the
exception occurred.
Of course they can never be a set of valid regs, for the simple reason that
the CPL is 0 (CS=60) and the RPL of SS is 3, which is an automatic GPF.
Since I manufactured the exception that caused this dump, by causing an
unrecoverable page fault in ring 0, I know that CS is correct but SS is
bogus.
Furthermore, the error code (ERR), which is stored by the processor as
part of the exception stack frame, uses only bits 0-2 for page faults and
at most bits 0-15 for other exceptions; the unused bit positions are zero.
So ERR is also bogus.
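The two consistency arguments above can be sketched in C. This is only an illustration, not code from crash or the kernel: the helper names are hypothetical, and the "no bits above 15" rule is taken from the statement in this message rather than from a processor manual.

```c
#include <stdint.h>
#include <stdbool.h>

/* The low two bits of an x86 segment selector hold its privilege level
 * (RPL); for CS these bits give the CPL of the interrupted code. */
static unsigned selector_rpl(uint16_t sel)
{
    return sel & 0x3;
}

/* For a register set captured in protected mode, the privilege bits of
 * CS and SS must agree; a mismatch means the values were not saved
 * together by the processor. */
static bool cs_ss_consistent(uint16_t cs, uint16_t ss)
{
    return selector_rpl(cs) == selector_rpl(ss);
}

/* Assumption from the message above: a hardware-pushed error code uses
 * at most bits 0-15, so any higher bit marks the value as overwritten. */
static bool error_code_plausible(uint32_t err)
{
    return (err & 0xffff0000u) == 0;
}
```

With the values from the dump, CS=0x0060 has privilege bits 0 and SS=0x304f has privilege bits 3, so cs_ss_consistent() returns false, and ERR=0xffffffff fails error_code_plausible().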
On looking at the code in entry.S at page_fault and the other exception
entry points I see no attempt to save regs to create a pt_regs struct. The
fact that do_page_fault takes pt_regs as the first arg is a hack to get at
CS:EIP and SS:ESP at the time of exception. Furthermore error_code loads
the exception error code into edx then wipes it out from the stack by
storing -1 into this location. I can't actually see a good reason for
wiping out the error code. By convention exceptions and interrupts have a
-ve integer stored at the error-code location to distinguish them from
system calls, but I don't think this is used. signal.c seems to be the
only place that looks for an error code >=0, but I don't see how an
exception affects signal.c.
Can anyone confirm whether setting the error code to -1 is essential? If
it isn't, then I think we should consider leaving it in place.
The long and short of it is: the only thing that has any meaning is CS,
EIP and EFLAGS. All of which are saved by the processor. SS and ESP are
only saved when the exception occurred at a privilege level >0 but these
can never generate a panic.
I'd recommend that we change the bt output to format only the three valid
regs (possibly SS and ESP, if CPL at time of exception >0). Is there any
reason why this shouldn't be changed?
Richard
17 years, 3 months
Re: [Crash-utility] crash version 4.0-4.7 is available
by Luc Chouinard
Yes, let's use __ia64__ for uniformity.
I'll patch it in.
Luc
----- Original Message ----
From: Dave Anderson <anderson(a)redhat.com>
To: holzheu(a)linux.vnet.ibm.com; "Discussion list for crash utility usage, maintenance and development" <crash-utility(a)redhat.com>; lucchouina(a)yahoo.com
Sent: Wednesday, September 26, 2007 11:57:37 AM
Subject: Re: [Crash-utility] crash version 4.0-4.7 is available
Michael Holzheu wrote:
> On Wed, 2007-09-26 at 10:03 -0400, Dave Anderson wrote:
>
>>Any results on or s390/s390x?
>
>
> Great to have sial in crash! On s390/s390x it compiles with the
> following patch:
>
Michael, thanks for the test and patch.
Luc, this one also looks OK, but maybe the ia64 patch
should use __ia64__, which is what the crash utility code
itself depends upon?
Thanks,
Dave
> ---
>
> diff -Naurp crash-4.0-4.7/extensions/libsial/sial_api.h crash-4.0-4.7-sial-fix-s390/extensions/libsial/sial_api.h
> --- crash-4.0-4.7/extensions/libsial/sial_api.h 2007-09-25 17:01:56.000000000 +0200
> +++ crash-4.0-4.7-sial-fix-s390/extensions/libsial/sial_api.h 2007-09-26 17:30:58.000000000 +0200
> @@ -13,6 +13,8 @@
> #define ABI_MIPS 1
> #define ABI_INTEL_X86 2
> #define ABI_INTEL_IA 3
> +#define ABI_S390 4
> +#define ABI_S390X 5
>
> /* types of variables */
> #define V_BASE 1
> diff -Naurp crash-4.0-4.7/extensions/sial.c crash-4.0-4.7-sial-fix-s390/extensions/sial.c
> --- crash-4.0-4.7/extensions/sial.c 2007-09-25 17:01:56.000000000 +0200
> +++ crash-4.0-4.7-sial-fix-s390/extensions/sial.c 2007-09-26 17:31:05.000000000 +0200
> @@ -807,17 +807,25 @@ _init() /* Register the command set. */
> /* set api, default size, and default sign for types */
> #ifdef i386
> #define SIAL_ABI ABI_INTEL_X86
> -#else
> +#else
> #ifdef ia64
> #define SIAL_ABI ABI_INTEL_IA
> #else
> #ifdef __x86_64__
> #define SIAL_ABI ABI_INTEL_IA
> #else
> +#ifdef __s390__
> +#define SIAL_ABI ABI_S390
> +#else
> +#ifdef __s390x__
> +#define SIAL_ABI ABI_S390X
> +#else
> #error sial: Unkown ABI
> #endif
> #endif
> #endif
> +#endif
> +#endif
> sial_apiset(&icops, SIAL_ABI, sizeof(long), 0);
>
> sial_version();
>
>
17 years, 3 months
To display of local variables with stack frames
by Neeraj kushwaha
Hi All,
This is what I'm trying to do: I have a pretty extensive module for
Linux which oopses and panics once in a while.
After the panic, I would like to find out from the coredump where the
panic happened and why it happened.
This would require examining the stack trace for the task that caused
the dump and also the local variables on the function stack. I can get
the stack trace using "trace", which is great,
but is there any way to get into the stack and look at the local variables
for each of the functions on the stack, like gdb?
Can anyone help me with this issue or give me some idea of how to go
about implementing it?
Thanks & Regards
Neeraj
17 years, 3 months
crash version 4.0-4.7 is available
by Dave Anderson
- Incorporation of Luc Chouinard's SIAL interpreter (Simple Image
Access Language) as a crash extension module. When loaded with
the "extend" command, the sial.so module provides three commands,
"load" to load a SIAL script, "unload" to unload it, and "edit",
which unloads the script, brings up an $EDITOR-based edit session
of the script, and then loads it again. Also, when the sial.so
module is loaded, it will automatically load any SIAL scripts
found in the /usr/share/sial/crash or $HOME/.sial directories.
Therefore, by putting "extend <path-to>/sial.so" in either
./.crashrc or $HOME/.crashrc, all desired SIAL scripts may be
loaded on a particular machine in a hands-off manner. For details,
consult the README and README.sial files in the extensions/libsial
subdirectory. (lucchouina(a)yahoo.com)
- Removed hardwired-dependencies in the top-level and extensions
subdirectory Makefiles for building extension modules. Now it is
possible to copy an extension module's .c file into the extensions
subdirectory, and enter "make extensions" from the top-level to build
it. If the build of the module requires special handling, a .mk
makefile with the same prefix as the .c file may be provided,
and it will be automatically used to build it.
(jmoyer(a)redhat.com, anderson(a)redhat.com)
- When a 32-bit x86 xenU guest is run on an x86_64 dom0 host, the
new-style xen ELF format dumpfile contains an ELF header with an
e_machine type of EM_X86_64 (instead of EM_386). This was getting
rejected with the error message "crash: vmcore: not a supported
file format". The fix simply accepts the e_machine type mismatch,
since the new-style ELF format dumpfiles are 64-bit by default.
(anderson(a)redhat.com)
- Enhanced the "kmem <address>" option to also search for task_struct
and kernel stack addresses, and report them with the "set" output.
Also, fix for when "kmem <vmalloc-address>" was entered, the header
for the mem_map data was not displayed. (anderson(a)redhat.com)
- Fix for determining starting rip/rsp backtrace hooks for the panic
task in x86_64 xen dom0 kdumps; newer kernels have replaced the
call to "xen_machine_kexec" with "machine_kexec", and without this
patch may display back-traces with missing frames. Also on x86_64
non-xen kdump panic task backtraces, it is possible that the wrong
stack instance of "crash_kexec" is used as the starting hook, which
may also lead to missing frames. (anderson(a)redhat.com)
- Fix for ia64 LKCD dumpfiles where it is not possible to read the task
structure of the task that follows a task which is in the task address
"fixup list", and zeroes are returned instead. (atyson(a)hp.com)
- Fix for potential "mod -[sS]" failures with modules whose object
files contain an unusually large number of sections; module
loading attempts may issue a "<segmentation violation in gdb>"
message followed by the error message: "mod: [module name]: gdb
add-symbol-file command failed".
(carl.hsieh(a)teradata.com, anderson(a)redhat.com)
- Fix to prevent dumpfile reads beyond EOF when reading new (optimized)
xen ELF core xendumps. Without the patch, error messages of the sort:
"crash: cannot read index page [number]" may occur during session
initialization, with unpredictable run-time results.
(yamahata(a)valinux.co.jp)
- In x86_xen_kdump_p2m_create(), the same variable was being used as
the for-loop index in both an outer and an embedded inner for-loop.
As a result, if debug level was equal to or larger than 7, the outer
for-loop was repeated only once. (nishimura(a)mxp.nes.nec.co.jp)
Download from: http://people.redhat.com/anderson
17 years, 3 months
Re: SIAL on s390
by Luc Chouinard
I was traveling today. Just got back. Sorry.
It's going to take a little while to iron out the bugs.
Cliff, Michael and other power users of sial, please interact directly with me while we make it through this debug period.
I'll respond as quickly as I can but, of course, the day job takes priority :)
Michael, let's work this one offline.
Thanks,
Luc
----- Original Message ----
From: Dave Anderson <anderson(a)redhat.com>
To: holzheu(a)linux.vnet.ibm.com; "Discussion list for crash utility usage, maintenance and development" <crash-utility(a)redhat.com>; lucchouina(a)yahoo.com
Sent: Wednesday, September 26, 2007 2:13:40 PM
Subject: Re: SIAL on s390
Michael Holzheu wrote:
> On Wed, 2007-09-26 at 10:03 -0400, Dave Anderson wrote:
>
>>Any results on or s390/s390x?
>
>
> I tried some of our s390 service scripts. It seems that sial in crash
> has some problems accessing kernel datatype information:
>
> We have the following code in one of our sial scripts:
>
> ...
> offset = offsetof(struct klist_node, n_node) +
> offsetof(struct device, knode_driver) +
> offsetof(struct subchannel, dev);
>
>
> crash> devices
> File /root/service/sial/devices.sial, line 171, Error: Unknown member
> name [n_node]
>
> But crash knows that "struct klist_node" has member n_node:
>
> crash> whatis klist_node
> struct klist_node {
> struct klist *n_klist;
> struct list_head n_node;
> struct kref n_ref;
> struct completion n_removed;
> }
>
> The same script works fine using lcrash.
>
> Michael
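For readers unfamiliar with the idiom in the script above: summing the three offsetof() values gives the offset of n_node inside the outermost struct subchannel, which lets a list walker step back from an embedded list node to its containing object. A minimal C sketch follows, using hypothetical, heavily simplified structure layouts (the real kernel structures have many more members) and hypothetical helper names:

```c
#include <stddef.h>

/* Hypothetical, simplified layouts; only the nesting matters here. */
struct list_head { struct list_head *next, *prev; };
struct klist_node { void *n_klist; struct list_head n_node; };
struct device { int id; struct klist_node knode_driver; };
struct subchannel { int irq; struct device dev; };

/* Same sum as in the sial script: the offset of n_node measured from
 * the start of the enclosing struct subchannel. */
static size_t n_node_offset_in_subchannel(void)
{
    return offsetof(struct klist_node, n_node) +
           offsetof(struct device, knode_driver) +
           offsetof(struct subchannel, dev);
}

/* Step back from an embedded list node to its containing subchannel. */
static struct subchannel *subchannel_from_n_node(struct list_head *n)
{
    return (struct subchannel *)((char *)n - n_node_offset_in_subchannel());
}
```

This only illustrates what the script computes; it says nothing about why sial failed to resolve the member name.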
I've changed the subject, and forwarded this to Luc.
He seems to be absent today, or perhaps is ignoring
crash-utility email, or mail with the "crash version 4.0-4.7
is available" subject, or whatever.
In any case, with all sial-related queries, patches, etc.,
please also cc: your post to Luc's direct email address
lucchouina(a)yahoo.com.
Thanks,
Dave
17 years, 3 months
[PATCH] Add file size check at the beginning
by Bernhard Walle
I had a bug report because of a truncated vmcore file. The error message was
just
crash: read error: kernel virtual address: ffff8107f3c3a6c0 type: "cpu_pda entry"
which is very unclear for a 'normal' user. This patch adds a check of the file
size against the ELF header at the beginning, so that a clear error message
can be printed.
Please consider adding the patch to crash.
Signed-off-by: Bernhard Walle <bwalle(a)suse.de>
---
defs.h | 1 +
netdump.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
--- a/defs.h
+++ b/defs.h
@@ -16,6 +16,7 @@
* GNU General Public License for more details.
*/
+#define _LARGEFILE64_SOURCE 1 /* stat64() */
#ifndef GDB_COMMON
#include <stdio.h>
--- a/netdump.c
+++ b/netdump.c
@@ -33,6 +33,47 @@ static physaddr_t xen_kdump_p2m(physaddr
#define ELFREAD 0
#define MIN_PAGE_SIZE (4096)
+
+
+static int
+check_netdump_filesize(char *file)
+{
+ uint64_t max_file_offset = 0;
+ struct pt_load_segment *pls;
+ struct stat64 stat;
+ int i, ret;
+
+
+ /* find the maximum file offset */
+ for (i = 0; i < nd->num_pt_load_segments; i++) {
+ uint64_t end, size;
+
+ pls = &nd->pt_load_segments[i];
+
+ size = pls->phys_end - pls->phys_start;
+ end = pls->file_offset + size;
+
+ if (end > max_file_offset)
+ max_file_offset = end;
+ }
+
+ ret = stat64(file, &stat);
+ if (ret < 0) {
+ fprintf(stderr, "Cannot stat64 on %s: %s\n", file,
+ strerror(errno));
+ return FALSE;
+ }
+
+ if (max_file_offset > stat.st_size) {
+ fprintf(stderr, "File %s is too short:\n"
+ "Must be %lld bytes but is only "
+ "%lld bytes long.\n",
+ file, (long long)max_file_offset, (long long)stat.st_size);
+ return FALSE;
+ }
+
+ return TRUE;
+}
/*
* Determine whether a file is a netdump/diskdump/kdump creation,
@@ -267,6 +308,9 @@ is_netdump(char *file, ulong source_quer
if (CRASHDEBUG(1))
netdump_memory_dump(fp);
+ if (!check_netdump_filesize(file))
+ return FALSE;
+
return nd->header_size;
bailout:
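The size check added by the patch above reduces to one computation: the file must extend at least to the largest (file offset + segment size) over all PT_LOAD segments. Below is a stand-alone C sketch of just that computation, separated from crash's internals. The struct is a hypothetical stand-in that mirrors only the three fields the patch reads, and the function names are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for crash's pt_load_segment, reduced to the
 * fields the size check reads. */
struct load_segment {
    uint64_t file_offset;  /* where the segment's data starts in the file */
    uint64_t phys_start;   /* first physical address covered */
    uint64_t phys_end;     /* one past the last physical address covered */
};

/* Smallest file size that still contains every segment in full. */
static uint64_t required_file_size(const struct load_segment *segs, size_t n)
{
    uint64_t max_end = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t size = segs[i].phys_end - segs[i].phys_start;
        uint64_t end = segs[i].file_offset + size;

        if (end > max_end)
            max_end = end;
    }
    return max_end;
}

/* True when the file on disk is shorter than the headers require. */
static bool file_is_truncated(const struct load_segment *segs, size_t n,
                              uint64_t actual_size)
{
    return actual_size < required_file_size(segs, n);
}
```

Keeping the arithmetic free of I/O makes it easy to exercise with made-up segment tables; the caller would supply the actual size from stat64() as the patch does.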
17 years, 3 months
crash and SLES 9/10 kernels
by Daniel Li
Just want to share my findings:
From what I saw, the debug smp kernels found in 'kernel-smp-debuginfo'
packages for SLES10 (for SLES9SP3, it is 'kernel-smp-debug') from the
KOTD ftp site (ftp://ftp.suse.com/pub/projects/kernel/kotd/) work well
with 'crash'. All you need to do is feed it, along with the
corresponding non-debug kernel, to 'crash', no matter whether you are using
it on a live system or on dumps. There must be something I missed while
building SMP debug kernels myself from the source, although I still
don't know what that was.
Later,
Daniel
17 years, 3 months
RFC: crash extension module handling
by Dave Anderson
Lucio's post of his latest Cell/B.E. SPU commands extension module:
https://www.redhat.com/archives/crash-utility/2007-September/msg00041.html
leads to a larger discussion of how extension modules should best
be handled.
In Lucio's patch, he modified the top-level Makefile and the extensions
subdirectory Makefile to build his extension module, following the
directions in the extensions/Makefile file itself, which states:
# To add a new extension object:
#
# - add the new source file to the EXTENSION_SOURCE_FILES list
# in the top-level Makefile
# - add the object file name to the EXTENSION_OBJECT_FILES list
# in the top-level Makefile
# - create a compile stanza below, typically using "echo.so" as
# a base template.
Now, currently in the extensions subdirectory are two extension modules,
echo.c and dminfo.c, and there are explicit compile lines for the two
of them, which are identical in nature:
echo.so: ../defs.h echo.c
gcc -nostartfiles -shared -rdynamic -o echo.so echo.c -fPIC \
-D$(TARGET) $(TARGET_CFLAGS)
dminfo.so: ../defs.h dminfo.c
gcc -nostartfiles -shared -rdynamic -o dminfo.so dminfo.c -fPIC \
-D$(TARGET) $(TARGET_CFLAGS)
and in Lucio's patch, he follows the template:
+spu.so: ../defs.h spu.c
+ gcc -nostartfiles -shared -rdynamic -o spu.so spu.c -fPIC \
+ -D$(TARGET) $(TARGET_CFLAGS)
That's all well and good, but I've decided that I really don't
want the crash source package to be a placeholder of this, and
numerous other, extension module source files.
My feeling is that the crash package should only have to deal with
the *mechanism* of handling dynamic extension modules, and not
the modules' source code itself.
A little history first...
The "echo.c" extension module is simply a template from which
a new module can be created. The "dminfo.c" extension module
is a special-case Red Hat extension module, created as an alternative
to the device-mapper guys having to write their own utility
that would do the same kind of kernel-memory grok'ing that crash
does. To be honest, I'd prefer it if it were not there in the
crash source tree, but there was no acceptable alternative at the
time.
Furthermore, since that dminfo.c module was added, the crash src.rpm now
creates an additional "crash-devel" rpm, which consists of just
the defs.h file, which is the only thing needed for building an
extension module, and which is installed in /usr/include/crash/defs.h.
So, for example, the systemtap folks have their own crash extension
module package that they control on their own. They can simply
install the crash-devel package -- without having to install the
crash source package -- and with their own package Makefile, they
compile their stap module, basically doing their own thing "elsewhere".
Had that been in place at the time, the dminfo guys could have
done the same type of thing.
That being said, it is still convenient for many to be able to
do the module building from within a crash source tree's extensions
subdirectory. Now, given that extension modules should be
compile'able in precisely the same manner as echo.c and dminfo.c,
what I'd like to do is simply have the "make extensions" command
from the top-level Makefile cause the extensions/Makefile to pick
up *any* C file in the extensions subdirectory, and compile a module
automatically -- without having to modify the top-level Makefile or
extensions/Makefile.
In other words, just throw your module's C file into the extensions
subdirectory, enter "make extensions" from the top-level, and it gets
built automatically. No changes required for the top-level Makefile,
no changes required for the extensions/Makefile, nor any need to store
a myriad of extension modules in the crash source tree. BTW, I'm perfectly
willing to add an "extensions" repository accessible from my people page,
where contributors can store their latest-and-greatest.
Anyway, I've been tinkering with the extensions/Makefile to do such
a thing, and have a crude addition that does just that, although
it does the compile of all "new" C files every time whether they
need it or not -- via the additional "contrib" target:
30c30
< all: link_defs $(OBJECTS)
---
> all: link_defs $(OBJECTS) contrib
43a44,50
> contrib:
> @for CFILE in `/bin/ls *.c | grep -v echo.c | grep -v dminfo.c | grep -v sial.c`; do \
> OUTPUT=`echo $$CFILE | cut -d. -f1`.so; \
> echo "gcc -nostartfiles -shared -rdynamic -o $$OUTPUT $$CFILE -fPIC -D$(TARGET) $(TARGET_CFLAGS)"; \
> gcc -nostartfiles -shared -rdynamic -o $$OUTPUT $$CFILE -fPIC -D$(TARGET) $(TARGET_CFLAGS); \
> done
>
It prevents the re-compilation of echo.c and dminfo.c, and of Luc
Chouinard's upcoming sial.c extension module. (SIAL is an alternative
crash extension mechanism -- more on that when it's available...)
Anyway, I've tried screwing around with the Makefile to use a generic
*.so target, using $@, $(basename ...) and so on, but I'm not a Makefile
master, and I cannot get it quite right, although I'm sure it can
be done.
So if anybody out there can do it cleaner than the "contrib" target above,
I'd like to take a look.
Thanks,
Dave
17 years, 3 months
Re: [Crash-utility] Cell/B.E. SPU commands extension
by Lucio Correia
> Lucio Correia wrote:
> > Hi,
> >
> > I've developed this crash extension to analyze SPU specific data for
> > Cell/B.E. processor. This extension makes use of some important data
> > saved by this kernel patch (that is not mainline yet)
> > http://ozlabs.org/pipermail/cbe-oss-dev/2007-May/001848.html during the
> > crash dump.
> >
> > I would like to check if there is any issue with the code.
> >
>
> Functionally it looks fine.
>
> I changed the _init() function to just do an error(INFO, ...)
> so I could load the extensions, and the only suggestion
> I can make is purely aesthetic, which would be to make
> the "help" messages 80 characters or less like the regular
> commands are. In other words, the "DESCRIPTION" section
> outputs, and the sentences in the "spuctx" EXAMPLE
> section are kind of ugly the way that they run on with
> no linefeeds.
>
> But like I said before, I don't see any issues/problems
> with the code -- pretty nifty extension...
>
> Dave
>
Thanks for the comments, Dave. I'm correcting these issues.
Regards,
--
Lucio Correia
Software Engineer
IBM LTC Brazil
17 years, 3 months