----- "Hedi Berriche" <hedi(a)sgi.com> wrote:
Context:
- crash-5.0.1
- glibc 2.4
- vmcore produced by an x86_64 SLES 11 2.6.27.19-5-default kernel
Problem:
crash> mod -s xfs /usr/people/hedi/xfs.ko.debug
mod: xfs: last symbol is not _MODULE_END_xfs?
*** glibc detected *** /tr/x86_64/bin/crash: double free or corruption (!prev): 0x0000000001558760 ***
<segmentation violation in gdb>
mod: /usr/people/hedi/xfs.ko.debug
gdb add-symbol-file command failed
At this point crash hangs solid and has to be killed with SIGKILL.
Grabbing a core reveals the following:
(gdb) bt f
#0 0x00002b628cd0ebb5 in raise () from /lib64/libc.so.6
#1 0x00002b628cd0ffb0 in abort () from /lib64/libc.so.6
#2 0x00002b628cd4a340 in malloc_printerr () from /lib64/libc.so.6
#3 0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
block=<value optimized out>, comma=<value optimized out>,
void_context_p=0, out_subexp=0x7b4760)
at parse.c:1101
        except = {reason = RETURN_ERROR, error = GENERIC_ERROR,
          message = 0x1c790a0 "Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]"}
        old_chain = (struct cleanup *) 0x0
        subexp = <value optimized out>
#4 0x000000060000000b in ?? ()
#5 0x0000000000000000 in ?? ()
(gdb) f 3
#3 0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
block=<value optimized out>, comma=<value optimized out>,
void_context_p=0, out_subexp=0x7b4760)
at parse.c:1101
1101 xfree (expout);
(gdb) list
1096        }
1097      if (except.reason < 0)
1098        {
1099          if (! in_parse_field)
1100            {
1101              xfree (expout);
1102              throw_exception (except);
1103            }
1104        }
1105
I'm not sure (yet) whether the errors
mod: xfs: last symbol is not _MODULE_END_xfs?
Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]
point to a problem in crash or in the xfs.ko.debug objfile, but that's another story;
the problem here is that crash shouldn't crash.
FWIW, this problem is most definitely a regression: crash version 4.0-8.11,
for example, also fails to load the objfile with exactly the same error
message, the notable difference being that it does *not* crash.
Agreed on all counts. It's crashing now because of the gdb-7.0 integration,
and the attached patch should fix that.
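For readers unfamiliar with the failure mode in frame #3: glibc's "double free or corruption" abort means the same heap pointer reached free() twice. The C sketch below is NOT the attached patch (which is not reproduced here) and does not use gdb's real data structures; `struct parse_state`, `release_expout()` and `parse_expression()` are hypothetical names. It only illustrates the common defensive pattern of releasing a buffer once on the error path and clearing the owning pointer, so that a later cleanup on the same object becomes a harmless no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parser's output buffer (not gdb's struct expression). */
struct parse_state {
    char *expout;
};

/* Release the buffer at most once: free(NULL) is a no-op, and clearing the
 * pointer means any later cleanup that calls this again does nothing. */
static void release_expout(struct parse_state *ps)
{
    free(ps->expout);
    ps->expout = NULL;
}

/* Simulated parse: allocate the output buffer, then fail if asked to.
 * Returns 0 on success, -1 on error. */
static int parse_expression(struct parse_state *ps, int fail)
{
    ps->expout = malloc(64);
    if (ps->expout == NULL)
        return -1;
    strcpy(ps->expout, "expression");
    if (fail) {
        /* Error path frees the buffer and clears the pointer. */
        release_expout(ps);
        return -1;
    }
    return 0;
}
```

With this pattern a caller can unconditionally call `release_expout()` in its own cleanup after a failed parse without triggering the glibc abort, because the pointer it sees is already NULL.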
As for the embedded "add-symbol-file" failure to load the module, you're
right, that's another issue. What I can suggest is this:
crash> set debug 1
crash> mod -s xfs /usr/people/hedi/xfs.ko.debug
and you will see the full "add-symbol-file" gdb command string that's
failing.
For that matter you can take that full string, remove crash from the picture
entirely, and just enter it into a gdb session:
$ gdb
...
add-symbol-file arg arg arg...
It looks like some kind of DWARF issue, though, and I can't help with that.
However, at least in a RHEL environment, the argument to the mod command
should be the stripped module.ko file; the module.ko.debug file is found
automatically and the two pieces are put together. For example, for the
"ext3" module, my RHEL5 environment has:
/lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko
/usr/lib/debug/lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko.debug
And when it gets loaded, the base "ext3.ko" file is used as the internal
argument to the gdb "add-symbol-file" command:
crash> mod -s ext3
     MODULE       NAME    SIZE   OBJECT FILE
ffffffff8806ae00  ext3   168017  /lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko
crash>
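Going only by the two ext3 paths listed above, the debuginfo file appears to live at the stripped module's path prefixed with /usr/lib/debug and suffixed with .debug. The C sketch below encodes that mapping under the assumption that this RHEL layout holds in general; it is inferred from the example, not taken from crash's source, and crash's actual lookup may search differently. `debuginfo_path()`, `DEBUGINFO_ROOT` and `DEBUGINFO_SUFFIX` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed RHEL-style layout, inferred from the ext3 example above. */
#define DEBUGINFO_ROOT   "/usr/lib/debug"
#define DEBUGINFO_SUFFIX ".debug"

/* Build the expected debuginfo path for a stripped module path into buf.
 * Returns 0 on success, -1 if the result would not fit in len bytes. */
static int debuginfo_path(const char *module_path, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s%s%s",
                     DEBUGINFO_ROOT, module_path, DEBUGINFO_SUFFIX);
    if (n < 0 || (size_t)n >= len)
        return -1;   /* output error or truncated */
    return 0;
}
```

Applied to /lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko, this yields the /usr/lib/debug/.../ext3.ko.debug path shown above, which is why passing the stripped xfs.ko (rather than xfs.ko.debug) may be worth trying.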
I wonder if you would still see the same issue if you used the base "xfs.ko"
file instead of "xfs.ko.debug"?
The other day, for the first time, I saw one of those (harmless) "last symbol
is not _MODULE_END_xxx" messages on a 2.6.32 x86 kernel. I'll look into that.
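The wording of the warning suggests a sanity check that the last entry in a module's symbol list is a pseudo-symbol named _MODULE_END_<modulename>. The C sketch below is a hypothetical illustration of such a check; `struct mod_symtab` and `last_symbol_ok()` are invented for this example, and the assumption that the list is an address-ordered array of names is mine, not crash's actual data layout. Only the printed message text mirrors the one quoted in the report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical module symbol table: symbol names in address order. */
struct mod_symtab {
    const char *modname;
    const char **names;
    size_t count;
};

/* Return 1 if the last symbol is _MODULE_END_<modname>; otherwise print
 * a warning in the same form as crash's message and return 0. */
static int last_symbol_ok(const struct mod_symtab *st)
{
    char expected[128];
    int n;

    if (st->count == 0)
        return 0;
    n = snprintf(expected, sizeof expected, "_MODULE_END_%s", st->modname);
    if (n < 0 || (size_t)n >= sizeof expected)
        return 0;   /* module name too long for this sketch */
    if (strcmp(st->names[st->count - 1], expected) != 0) {
        fprintf(stderr, "mod: %s: last symbol is not %s?\n",
                st->modname, expected);
        return 0;
    }
    return 1;
}
```

A check like this only reports that the expected end marker is missing; it says nothing about why, which fits the observation that the message itself is harmless.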
And lastly:
P.S. The "last symbol is not _MODULE_END_<modulename>" message was already
reported back in Jan 2009 (albeit with the difference that crash would still
load the objfile despite the error message):
https://www.redhat.com/archives/crash-utility/2009-January/msg00070.html
but I'm not sure the root cause was identified back then; at least, I can't
find any evidence of that in the list archives.
I don't know what the deal was with that...
Dave