Re: [Crash-utility] [PATCH] s390: Fix backtrace code
by Dave Anderson
----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> Hi Dave,
>
> On Fri, 2010-02-26 at 10:42 -0500, Dave Anderson wrote:
> > > I tested vanilla 2.6.32, RHEL5, SLES10 and SLES11.
> > >
> > > But I found a bug with RHEL4:
> >
> > OK good -- I'm glad I asked. I note that RHEL3 doesn't even have
> > a "panic_stack" member. That being the case, this won't work as
> > planned:
> >
> > stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
> > if (stack_addr == 0)
> > return;
> >
> > because MEMBER_OFFSET() will return a -1, which will get used as
> > an offset to add to "lc", and will quietly read the wrong data.
> >
>
> Therefore I do this check before:
>
> if (!MEMBER_EXISTS("_lowcore", stack_name))
> return;
>
> Michael
Of course! Sorry I missed that -- queued for the next release...
Thanks,
Dave
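
To illustrate the hazard discussed above, here is a minimal sketch of why an unchecked "not found" offset is dangerous and how an existence check avoids it. The table, the offset value 8, and the helpers member_offset(), member_exists() and read_member_ulong() are hypothetical stand-ins written for this example; they are not the crash utility's MEMBER_OFFSET()/MEMBER_EXISTS() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout table; names and offsets are illustrative only. */
struct member {
    const char *name;
    long offset;
};

static const struct member lowcore_members[] = {
    { "async_stack", 8 },
};

/* Mimics the behaviour described above: -1 when the member is unknown. */
long member_offset(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(lowcore_members) / sizeof(lowcore_members[0]); i++)
        if (strcmp(lowcore_members[i].name, name) == 0)
            return lowcore_members[i].offset;
    return -1;
}

int member_exists(const char *name)
{
    return member_offset(name) >= 0;
}

/*
 * Copies the pointer-sized value stored at the member's offset into *out
 * and returns 1. If the member does not exist, returns 0 and never forms
 * the address "lc + (-1)".
 */
int read_member_ulong(const char *lc, const char *name, unsigned long *out)
{
    assert(lc != NULL && name != NULL && out != NULL);
    if (!member_exists(name))
        return 0;
    memcpy(out, lc + member_offset(name), sizeof(*out));
    return 1;
}
```

Without the member_exists() test, the read would start one byte before the buffer and silently return unrelated data, which is the failure mode described in the message.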
14 years, 7 months
Re: [Crash-utility] [PATCH] s390: Fix backtrace code
by Dave Anderson
----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> Hi Dave,
>
> On Fri, 2010-02-26 at 09:50 -0500, Dave Anderson wrote:
> > ----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> >
> > > Hi Dave,
> > >
> > > This patch fixes several bugs in the s390 stack backtrace code
> > > * Add panic stack as second interrupt stack
> > > * Fix printing of access registers (4 bytes instead of 8 bytes)
> > > * Use u64 for s390x register 14
> > > * Fix interrupt stack handling for s390x (use 160 byte overhead
> > > instead of 96)
> >
> > The patch looks OK upon first glance -- can you verify that it's
> > absolutely backwards-compatible to earlier kernel versions?
>
> I tested vanilla 2.6.32, RHEL5, SLES10 and SLES11.
>
> But I found a bug with RHEL4:
OK good -- I'm glad I asked. I note that RHEL3 doesn't even have
a "panic_stack" member. That being the case, this won't work as
planned:
stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
if (stack_addr == 0)
return;
because MEMBER_OFFSET() will return a -1, which will get used as
an offset to add to "lc", and will quietly read the wrong data.
Dave
> Older Linux kernels for s390 can be built so that the panic stack is
> not set (CONFIG_CHECK_STACK kernel build option):
>
> *(lowcore_ptr[i]) = S390_lowcore;
> lowcore_ptr[i]->async_stack = stack + (ASYNC_SIZE);
>
> #ifdef CONFIG_CHECK_STACK
> stack = __get_free_pages(GFP_KERNEL,0);
> if (stack == 0ULL)
> panic("smp_boot_cpus failed to allocate memory\n");
> lowcore_ptr[i]->panic_stack = stack + (PAGE_SIZE);
> #endif
>
> RHEL4 does not define CONFIG_CHECK_STACK. The following patch
> therefore adds a check so that the panic stack is only used when
> it has actually been set up.
> ---
> s390.c | 2 ++
> s390x.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> --- a/s390.c
> +++ b/s390.c
> @@ -581,6 +581,8 @@ static void s390_get_int_stack(char *sta
> if (!MEMBER_EXISTS("_lowcore", stack_name))
> return;
> stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
> + if (stack_addr == 0)
> + return;
> readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
> INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
> *start = stack_addr - INT_STACK_SIZE;
> --- a/s390x.c
> +++ b/s390x.c
> @@ -813,6 +813,8 @@ static void s390x_get_int_stack(char *st
> if (!MEMBER_EXISTS("_lowcore", stack_name))
> return;
> stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
> + if (stack_addr == 0)
> + return;
> readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
> INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
> *start = stack_addr - INT_STACK_SIZE;
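
As a rough illustration of why the zero check matters, the sketch below shows what the unguarded subtraction produces when the stack pointer is 0, and a guarded variant that leaves the caller's range untouched. The function names are hypothetical, and 64-bit unsigned arithmetic with a 16384-byte stack is assumed (one of the two s390x sizes mentioned in the patch comments).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unguarded computation: with stack_addr == 0 the subtraction wraps
 * modulo 2^64 and yields an address near the top of the address space.
 */
uint64_t unguarded_start(uint64_t stack_addr, uint64_t stack_size)
{
    return stack_addr - stack_size;
}

/*
 * Guarded computation: returns 0 and leaves *start and *end unchanged
 * when the stack pointer is zero; otherwise fills in the range covered
 * by the stack and returns 1.
 */
int int_stack_range(uint64_t stack_addr, uint64_t stack_size,
                    uint64_t *start, uint64_t *end)
{
    assert(start != NULL && end != NULL);
    if (stack_addr == 0)
        return 0;
    *start = stack_addr - stack_size;
    *end = stack_addr;
    return 1;
}
```

With the guard in place, a caller that initialized start and end to 0 keeps an empty range, so a later test of the form (backchain > start) && (backchain < end) can never select the missing stack.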
14 years, 7 months
[PATCH] s390: Fix backtrace code
by Michael Holzheu
Hi Dave,
This patch fixes several bugs in the s390 stack backtrace code
* Add panic stack as second interrupt stack
* Fix printing of access registers (4 bytes instead of 8 bytes)
* Use u64 for s390x register 14
* Fix interrupt stack handling for s390x (use 160 byte overhead instead of 96)
---
s390.c | 46 ++++++++++++++++++-------------
s390x.c | 94 +++++++++++++++++++++++++++++++++++-----------------------------
2 files changed, 79 insertions(+), 61 deletions(-)
--- a/s390.c
+++ b/s390.c
@@ -37,7 +37,7 @@
#define S390_PTE_INVALID_MASK 0x80000900
#define S390_PTE_INVALID(x) ((x) & S390_PTE_INVALID_MASK)
-#define ASYNC_STACK_SIZE STACKSIZE() // can be 4096 or 8192
+#define INT_STACK_SIZE STACKSIZE() // can be 4096 or 8192
#define KERNEL_STACK_SIZE STACKSIZE() // can be 4096 or 8192
#define LOWCORE_SIZE 4096
@@ -570,20 +570,21 @@ s390_get_lowcore(int cpu, char* lowcore)
FAULT_ON_ERROR);
}
-/*
- * read in the async stack
+/*
+ * Read interrupt stack (either "async_stack" or "panic_stack");
*/
-static void
-s390_get_async_stack(char* lowcore, char* async_stack, unsigned long* start, unsigned long* end)
+static void s390_get_int_stack(char *stack_name, char* lc, char* int_stack,
+ unsigned long* start, unsigned long* end)
{
- unsigned long async_stack_ptr;
+ unsigned long stack_addr;
- async_stack_ptr = ULONG(lowcore +
- MEMBER_OFFSET("_lowcore","async_stack"));
- readmem(async_stack_ptr-ASYNC_STACK_SIZE,KVADDR, async_stack,
- ASYNC_STACK_SIZE, "async_stack", FAULT_ON_ERROR);
- *start=async_stack_ptr-ASYNC_STACK_SIZE;
- *end=async_stack_ptr;
+ if (!MEMBER_EXISTS("_lowcore", stack_name))
+ return;
+ stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
+ readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
+ INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
+ *start = stack_addr - INT_STACK_SIZE;
+ *end = stack_addr;
}
/*
@@ -593,16 +594,18 @@ static void
s390_back_trace_cmd(struct bt_info *bt)
{
char* stack;
- char async_stack[ASYNC_STACK_SIZE];
+ char async_stack[INT_STACK_SIZE];
+ char panic_stack[INT_STACK_SIZE];
long ksp,backchain,old_backchain;
int i=0, r14_offset,bc_offset,r14, skip_first_frame=0;
- unsigned long async_start,async_end, stack_end, stack_start, stack_base;
+ unsigned long async_start = 0, async_end = 0;
+ unsigned long panic_start = 0, panic_end = 0;
+ unsigned long stack_end, stack_start, stack_base;
if (bt->hp && bt->hp->eip) {
error(WARNING,
"instruction pointer argument ignored on this architecture!\n");
}
- async_end = async_start = 0;
ksp = bt->stkptr;
/* print lowcore and get async stack when task has cpu */
@@ -622,9 +625,10 @@ s390_back_trace_cmd(struct bt_info *bt)
s390_print_lowcore(lowcore,bt,0);
return;
}
-
- s390_get_async_stack(lowcore,async_stack,&async_start,
- &async_end);
+ s390_get_int_stack("async_stack", lowcore, async_stack,
+ &async_start, &async_end);
+ s390_get_int_stack("panic_stack", lowcore, panic_stack,
+ &panic_start, &panic_end);
s390_print_lowcore(lowcore,bt,1);
fprintf(fp,"\n");
skip_first_frame=1;
@@ -653,7 +657,7 @@ s390_back_trace_cmd(struct bt_info *bt)
unsigned long r14_stack_off;
int j;
- /* Find stack: Either async stack or task stack */
+ /* Find stack: Either async, panic stack or task stack */
if((backchain > stack_start) && (backchain < stack_end)){
stack = bt->stackbuf;
stack_base = stack_start;
@@ -661,6 +665,10 @@ s390_back_trace_cmd(struct bt_info *bt)
&& s390_has_cpu(bt)){
stack = async_stack;
stack_base = async_start;
+ } else if((backchain > panic_start) && (backchain < panic_end)
+ && s390_has_cpu(bt)){
+ stack = panic_stack;
+ stack_base = panic_start;
} else {
/* invalid stackframe */
break;
--- a/s390x.c
+++ b/s390x.c
@@ -36,7 +36,7 @@
#define S390X_PTE_INVALID_MASK 0x900ULL
#define S390X_PTE_INVALID(x) ((x) & S390X_PTE_INVALID_MASK)
-#define ASYNC_STACK_SIZE STACKSIZE() // can be 8192 or 16384
+#define INT_STACK_SIZE STACKSIZE() // can be 8192 or 16384
#define KERNEL_STACK_SIZE STACKSIZE() // can be 8192 or 16384
#define LOWCORE_SIZE 8192
@@ -803,19 +803,20 @@ s390x_get_lowcore(struct bt_info *bt, ch
}
/*
- * read in the async stack
+ * Read interrupt stack (either "async_stack" or "panic_stack");
*/
-static void
-s390x_get_async_stack(char* lowcore, char* async_stack, unsigned long* start, unsigned long* end)
+static void s390x_get_int_stack(char *stack_name, char* lc, char* int_stack,
+ unsigned long* start, unsigned long* end)
{
- unsigned long async_stack_ptr;
+ unsigned long stack_addr;
- async_stack_ptr = ULONG(lowcore +
- MEMBER_OFFSET("_lowcore","async_stack"));
- readmem(async_stack_ptr-ASYNC_STACK_SIZE,KVADDR, async_stack,
- ASYNC_STACK_SIZE, "async_stack", FAULT_ON_ERROR);
- *start=async_stack_ptr-ASYNC_STACK_SIZE;
- *end=async_stack_ptr;
+ if (!MEMBER_EXISTS("_lowcore", stack_name))
+ return;
+ stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
+ readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
+ INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
+ *start = stack_addr - INT_STACK_SIZE;
+ *end = stack_addr;
}
/*
@@ -825,11 +826,14 @@ static void
s390x_back_trace_cmd(struct bt_info *bt)
{
char* stack;
- char async_stack[ASYNC_STACK_SIZE];
+ char async_stack[INT_STACK_SIZE];
+ char panic_stack[INT_STACK_SIZE];
long ksp,backchain,old_backchain;
- int i=0, r14_offset,bc_offset,r14, skip_first_frame=0;
+ int i=0, r14_offset,bc_offset, skip_first_frame=0;
unsigned long async_start = 0, async_end = 0;
+ unsigned long panic_start = 0, panic_end = 0;
unsigned long stack_end, stack_start, stack_base;
+ unsigned long r14;
if (bt->hp && bt->hp->eip) {
error(WARNING,
@@ -854,9 +858,10 @@ s390x_back_trace_cmd(struct bt_info *bt)
s390x_print_lowcore(lowcore,bt,0);
return;
}
-
- s390x_get_async_stack(lowcore,async_stack,&async_start,
- &async_end);
+ s390x_get_int_stack("async_stack", lowcore, async_stack,
+ &async_start, &async_end);
+ s390x_get_int_stack("panic_stack", lowcore, panic_stack,
+ &panic_start, &panic_end);
s390x_print_lowcore(lowcore,bt,1);
fprintf(fp,"\n");
skip_first_frame=1;
@@ -885,7 +890,7 @@ s390x_back_trace_cmd(struct bt_info *bt)
unsigned long r14_stack_off;
int j;
- /* Find stack: Either async stack or task stack */
+ /* Find stack: Either async, panic stack or task stack */
if((backchain > stack_start) && (backchain < stack_end)){
stack = bt->stackbuf;
stack_base = stack_start;
@@ -893,6 +898,10 @@ s390x_back_trace_cmd(struct bt_info *bt)
&& s390x_has_cpu(bt)){
stack = async_stack;
stack_base = async_start;
+ } else if((backchain > panic_start) && (backchain < panic_end)
+ && s390x_has_cpu(bt)){
+ stack = panic_stack;
+ stack_base = panic_start;
} else {
/* invalid stackframe */
break;
@@ -913,7 +922,7 @@ s390x_back_trace_cmd(struct bt_info *bt)
skip_first_frame=0;
} else {
fprintf(fp," #%i [%08lx] ",i,backchain);
- fprintf(fp,"%s at %x\n", closest_symbol(r14), r14);
+ fprintf(fp,"%s at %lx\n", closest_symbol(r14), r14);
if (bt->flags & BT_LINE_NUMBERS)
s390x_dump_line_number(r14);
i++;
@@ -944,19 +953,20 @@ s390x_back_trace_cmd(struct bt_info *bt)
}
/* Check for interrupt stackframe */
- if((backchain == 0) && (stack == async_stack)){
- unsigned long psw_flags,r15;
+ if((backchain == 0) &&
+ (stack == async_stack || stack == panic_stack)) {
+ int pt_regs_off = old_backchain - stack_base + 160;
+ unsigned long psw_flags;
- psw_flags = ULONG(&stack[old_backchain - stack_base
- +96 +MEMBER_OFFSET("pt_regs","psw")]);
+ psw_flags = ULONG(&stack[pt_regs_off +
+ MEMBER_OFFSET("pt_regs", "psw")]);
if(psw_flags & 0x1000000000000ULL){
/* User psw: should not happen */
break;
}
- r15 = ULONG(&stack[old_backchain - stack_base +
- 96 + MEMBER_OFFSET("pt_regs",
- "gprs") + 15 * S390X_WORD_SIZE]);
- backchain=r15;
+ backchain = ULONG(&stack[pt_regs_off +
+ MEMBER_OFFSET("pt_regs", "gprs") +
+ 15 * S390X_WORD_SIZE]);
fprintf(fp," - Interrupt -\n");
}
} while(backchain != 0);
@@ -1036,28 +1046,28 @@ s390x_print_lowcore(char* lc, struct bt_
fprintf(fp," -access registers:\n");
ptr = lc + MEMBER_OFFSET("_lowcore","access_regs_save_area");
- tmp[0]=ULONG(ptr);
- tmp[1]=ULONG(ptr + 4);
- tmp[2]=ULONG(ptr + 2 * 4);
- tmp[3]=ULONG(ptr + 3 * 4);
+ tmp[0]=UINT(ptr);
+ tmp[1]=UINT(ptr + 4);
+ tmp[2]=UINT(ptr + 2 * 4);
+ tmp[3]=UINT(ptr + 3 * 4);
fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
tmp[0], tmp[1], tmp[2], tmp[3]);
- tmp[0]=ULONG(ptr + 4 * 4);
- tmp[1]=ULONG(ptr + 5 * 4);
- tmp[2]=ULONG(ptr + 6 * 4);
- tmp[3]=ULONG(ptr + 7 * 4);
+ tmp[0]=UINT(ptr + 4 * 4);
+ tmp[1]=UINT(ptr + 5 * 4);
+ tmp[2]=UINT(ptr + 6 * 4);
+ tmp[3]=UINT(ptr + 7 * 4);
fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
tmp[0], tmp[1], tmp[2], tmp[3]);
- tmp[0]=ULONG(ptr + 8 * 4);
- tmp[1]=ULONG(ptr + 9 * 4);
- tmp[2]=ULONG(ptr + 10* 4);
- tmp[3]=ULONG(ptr + 11* 4);
+ tmp[0]=UINT(ptr + 8 * 4);
+ tmp[1]=UINT(ptr + 9 * 4);
+ tmp[2]=UINT(ptr + 10 * 4);
+ tmp[3]=UINT(ptr + 11 * 4);
fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
tmp[0], tmp[1], tmp[2], tmp[3]);
- tmp[0]=ULONG(ptr + 12* 4);
- tmp[1]=ULONG(ptr + 13* 4);
- tmp[2]=ULONG(ptr + 14* 4);
- tmp[3]=ULONG(ptr + 15* 4);
+ tmp[0]=UINT(ptr + 12 * 4);
+ tmp[1]=UINT(ptr + 13 * 4);
+ tmp[2]=UINT(ptr + 14 * 4);
+ tmp[3]=UINT(ptr + 15 * 4);
fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
tmp[0], tmp[1], tmp[2], tmp[3]);
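
The access register change in the patch can be pictured with a small sketch. It assumes a save area holding sixteen 4-byte registers back to back, and uses hypothetical helpers (read_acr32(), read_acr64_overlapping()) that read in host byte order; they stand in for the UINT() and ULONG() reads in the patch and are not the crash utility's macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_ACCESS_REGS 16

/* Reads the idx-th 32-bit register from the raw save area. */
uint32_t read_acr32(const unsigned char *area, int idx)
{
    uint32_t v;

    assert(area != NULL && idx >= 0 && idx < NUM_ACCESS_REGS);
    memcpy(&v, area + idx * 4, sizeof(v));
    return v;
}

/*
 * The pre-patch pattern: an 8-byte read at a 4-byte stride. The result
 * spans register idx and register idx + 1, so it is not the value of
 * either register on its own.
 */
uint64_t read_acr64_overlapping(const unsigned char *area, int idx)
{
    uint64_t v;

    assert(area != NULL && idx >= 0 && idx < NUM_ACCESS_REGS - 1);
    memcpy(&v, area + idx * 4, sizeof(v));
    return v;
}
```

Reading four bytes per register, as the patch does, keeps each printed value confined to a single register.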
14 years, 7 months
Re: [Crash-utility] User-land backtrace?
by Dave Anderson
----- "Darrin Thompson" <darrinth(a)gmail.com> wrote:
> On Wed, Feb 24, 2010 at 12:21 PM, Dave Anderson <anderson(a)redhat.com> wrote:
>
> That's right. That is the stack value that will be restored upon
> return to user-space, and the EIP will be restored to 00f14402.
>
> One thing to make sure of is that when you do the "rd -u", you
> have set the crash utility to the context of the task whose "bt"
> output you're showing. "rd -u" will read the user space of the
> current task (i.e., the task shown if you do a "set" command).
>
> Could that be adapted into a way to produce a userspace core dump that
> we could feed to regular old gdb?
This question comes up from time to time.
If all of a task's user pages were in memory (not swapped out), and
therefore in the vmcore -- which is becoming more and more unlikely
with the use of makedumpfile to skip user pages altogether -- then
theoretically the kernel's elf_core_dump() function could basically
be "ported" to user-space. I think...
Anyway, I'm not particularly interested in doing it. But it would
be an excellent candidate for an extension module if anybody's willing
to take it on.
Dave
14 years, 7 months
Re: [Crash-utility] User-land backtrace?
by Dave Anderson
----- "Gallus" <gall.cwpl(a)gmail.com> wrote:
> Hi,
> is it possible to display a stack trace of a user-space process?
No.
You could do a raw "rd -u" of the user-space stack, but given that
the crash utility has no knowledge of any user-space symbols,
it's probably not going to be very illuminating.
Dave
14 years, 7 months
User-land backtrace?
by Gallus
Hi,
is it possible to display a stack trace of a user-space process?
Gallus
14 years, 7 months
Re: [Crash-utility] [crash-5.0.1] glibc detected: double free or corruption (!prev)
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> Agreed on all counts. It's crashing now because of the gdb-7.0 integration,
> and the attached patch should fix that.
Check that -- the first patch is not enough, because it will retry
the add-symbol-file operation the "old way", which I presume will
also fail. Try this second patch...
Thanks,
Dave
14 years, 7 months
Re: [Crash-utility] [crash-5.0.1] glibc detected: double free or corruption (!prev)
by Dave Anderson
----- "Hedi Berriche" <hedi(a)sgi.com> wrote:
> Context:
>
> - crash-5.0.1
> - glibc 2.4
> - vmcore produced by x86_64 sles11 2.6.27.19-5-default
>
> Problem:
>
> crash> mod -s xfs /usr/people/hedi/xfs.ko.debug
> mod: xfs: last symbol is not _MODULE_END_xfs?
> *** glibc detected *** /tr/x86_64/bin/crash: double free or corruption
> (!prev): 0x0000000001558760 ***
> <segmentation violation in gdb>
> mod: /usr/people/hedi/xfs.ko.debug
> gdb add-symbol-file command failed
>
> hangs solid there and has to be killed with SIGKILL.
>
> Grabbing a core reveals the following:
>
> (gdb) bt f
> #0 0x00002b628cd0ebb5 in raise () from /lib64/libc.so.6
> #1 0x00002b628cd0ffb0 in abort () from /lib64/libc.so.6
> #2 0x00002b628cd4a340 in malloc_printerr () from /lib64/libc.so.6
> #3 0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
> block=<value optimized out>, comma=<value optimized out>,
> void_context_p=0, out_subexp=0x7b4760)
> at parse.c:1101
> except = {reason = RETURN_ERROR, error = GENERIC_ERROR,
> message = 0x1c790a0 "Dwarf Error: Could not find abbrev number 188 [in
> module /usr/people/hedi/xfs.ko.debug]"}
> old_chain = (struct cleanup *) 0x0
> subexp = <value optimized out>
> #4 0x000000060000000b in ?? ()
> #5 0x0000000000000000 in ?? ()
>
> (gdb) f 3
> #3 0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
> block=<value optimized out>, comma=<value optimized out>,
> void_context_p=0, out_subexp=0x7b4760)
> at parse.c:1101
> 1101 xfree (expout);
>
> (gdb) list
> 1096 }
> 1097 if (except.reason < 0)
> 1098 {
> 1099 if (! in_parse_field)
> 1100 {
> 1101 xfree (expout);
> 1102 throw_exception (except);
> 1103 }
> 1104 }
> 1105
>
> Not sure (yet) whether the error
>
> mod: xfs: last symbol is not _MODULE_END_xfs?
> Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]
>
> is a problem in crash or in the xfs.ko.debug objfile but that's another story,
> the problem here is that crash shouldn't crash.
>
>
> FWIW, this problem is most definitely a regression: crash version
> 4.0-8.11, for example, fails to load the objfile with exactly the same
> error message, with the notable difference that it does *not* crash.
Agreed on all counts. It's crashing now because of the gdb-7.0 integration,
and the attached patch should fix that.
As far as the embedded "add-symbol-file" failure to load the module, you're
right, that's another issue, and what I can suggest is this:
crash> set debug 1
crash> mod -s xfs /usr/people/hedi/xfs.ko.debug
and you will see the full "add-symbol-file" gdb command string that's failing.
For that matter you can take that full string, remove crash from the picture
entirely, and just enter it into a gdb session:
$ gdb
...
add-symbol-file arg arg arg...
It looks like some kind of Dwarf issue though, and I can't help with that.
However, at least on a RHEL environment, the argument to the mod command
should be the stripped module.ko file, and the module.ko.debug file gets
found automatically, and the two pieces put together. In other words,
taking the "ext3" module, my RHEL5 environment has:
/lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko
/usr/lib/debug/lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko.debug
And when it gets loaded, the base "ext3.ko" file is used as the internal
argument to the gdb "add-symbol-file" command:
crash> mod -s ext3
MODULE NAME SIZE OBJECT FILE
ffffffff8806ae00 ext3 168017 /lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko
crash>
I wonder if you would still see the same issue if you used the base "xfs.ko"
file instead of "xfs.ko.debug"?
For the first time I saw one of those (harmless) "last symbol is not _MODULE_END_xxx"
messages on a 2.6.32 x86 kernel the other day. I'll look into that.
And lastly:
> P.S. The "last symbol is not _MODULE_END_<modulename>" has been reported
> back in Jan 2009 (albeit with the difference that crash would load the
> objfile despite the error message)
>
> https://www.redhat.com/archives/crash-utility/2009-January/msg00070.html
>
> but I am not sure the root cause was identified back then, or at least I am
> failing to find, in the list archives, any proof of that.
I don't know what the deal was with that...
Dave
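
For readers unfamiliar with this class of failure, the sketch below shows one general way to make a repeated cleanup harmless: release the buffer through a helper that also clears the pointer. This is only an illustration of the pattern; it is not the attached patch, it is not gdb's xfree(), and free_and_clear() is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Frees the buffer that *pp points to and resets *pp to NULL. A second
 * call on the same pointer then reaches free(NULL), which the C standard
 * defines as a no-op, instead of freeing the same block twice.
 */
void free_and_clear(void **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}
```

The helper only protects cleanups that go through the same pointer variable; a stale copy of the pointer held elsewhere would still be unsafe to free.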
14 years, 7 months
[crash-5.0.1] glibc detected: double free or corruption (!prev)
by Hedi Berriche
Context:
- crash-5.0.1
- glibc 2.4
- vmcore produced by x86_64 sles11 2.6.27.19-5-default
Problem:
crash> mod -s xfs /usr/people/hedi/xfs.ko.debug
mod: xfs: last symbol is not _MODULE_END_xfs?
*** glibc detected *** /tr/x86_64/bin/crash: double free or corruption (!prev): 0x0000000001558760 ***
<segmentation violation in gdb>
mod: /usr/people/hedi/xfs.ko.debug
gdb add-symbol-file command failed
hangs solid there and has to be killed with SIGKILL.
Grabbing a core reveals the following:
(gdb) bt f
#0 0x00002b628cd0ebb5 in raise () from /lib64/libc.so.6
#1 0x00002b628cd0ffb0 in abort () from /lib64/libc.so.6
#2 0x00002b628cd4a340 in malloc_printerr () from /lib64/libc.so.6
#3 0x00000000005454af in parse_exp_in_context (stringptr=0x400000000, block=<value optimized out>, comma=<value optimized out>, void_context_p=0, out_subexp=0x7b4760)
at parse.c:1101
except = {reason = RETURN_ERROR, error = GENERIC_ERROR, message = 0x1c790a0 "Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]"}
old_chain = (struct cleanup *) 0x0
subexp = <value optimized out>
#4 0x000000060000000b in ?? ()
#5 0x0000000000000000 in ?? ()
(gdb) f 3
#3 0x00000000005454af in parse_exp_in_context (stringptr=0x400000000, block=<value optimized out>, comma=<value optimized out>, void_context_p=0, out_subexp=0x7b4760)
at parse.c:1101
1101 xfree (expout);
(gdb) list
1096 }
1097 if (except.reason < 0)
1098 {
1099 if (! in_parse_field)
1100 {
1101 xfree (expout);
1102 throw_exception (except);
1103 }
1104 }
1105
Not sure (yet) whether the error
mod: xfs: last symbol is not _MODULE_END_xfs?
Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]
is a problem in crash or in the xfs.ko.debug objfile but that's another story,
the problem here is that crash shouldn't crash.
FWIW, this problem is most definitely a regression: crash version
4.0-8.11, for example, fails to load the objfile with exactly the same
error message, with the notable difference that it does *not* crash.
Cheers,
Hedi.
P.S. The "last symbol is not _MODULE_END_<modulename>" has been reported
back in Jan 2009 (albeit with the difference that crash would load the
objfile despite the error message)
https://www.redhat.com/archives/crash-utility/2009-January/msg00070.html
but I am not sure the root cause was identified back then, or at least I am
failing to find, in the list archives, any proof of that.
--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
14 years, 7 months
[ANNOUNCE] crash version 5.0.1 is available
by Dave Anderson
- Fix for a stream of warning messages during invocation on 2.6.20
and earlier x86 kernels, caused by a change in the x86 disassembler
output from the embedded gdb-7.0 that was introduced in crash
version 5.0.0. Without the patch, the messages indicate
"crash: invalid input: <string>:" and "crash: input string too
large: <string>: (9 vs 8)".
(anderson(a)redhat.com)
- As of glibc 2.11, the mkstemps() function has been introduced as a
versioned symbol. As a result, crash utility binaries built on host
machines with glibc 2.11 or later cannot be run on systems that run
pre-2.11 glibc versions, failing during invocation with the error
message "crash: relocation error: crash: symbol mkstemps, version
GLIBC_2.11 not defined in file libc.so.6 with link time reference".
With the patch, the pre-existing version of mkstemps() from the
built-in libiberty.a library will always be used.
(jmoyer(a)redhat.com)
- Fix for the "irq" command on 2.6.33 and later kernels to account for
the removal of the irqaction.mask structure member. Without the
patch, the "irq" command fails with the error message "irq: invalid
structure member offset: irqaction_mask".
(bernhard(a)bwalle.de)
- Added a defensive mechanism to handle a corrupted "cache_cache"
kmem_cache structure. Without the patch, a vmcore that had such
a corruption caused a failure during invocation with the error
message "crash: zero-size memory allocation!".
(anderson(a)redhat.com)
- Fix for the "swap", "kmem -i", and "vm -p" commands to account for
the 2.6.33 kernel changes to the swap_info_struct data structure and
the swap_info[] array type. Without the patch, "swap" would show
only the command's header, "kmem -i" would show zero swap usage, and
"vm -p" would show "(unknown swap location)" when translating the
swap file name for any swapped-out pages in the task.
(anderson(a)redhat.com)
- Fix for a segmentation violation during session invocation when
running against 2.6.30 or later x86_64 dumpfiles whose kernel is not
configured with CONFIG_SMP.
(anderson(a)redhat.com)
- Fix for the "bt" command on an ia64 "INIT" process that interrupted
a task that was running in user space, but was unable to modify the
original (interrupted) task's stack. Without the patch, the "INIT"
task's backtrace would not display the task that was interrupted,
and would display the error message "bt: unwind: failed to locate
return link (ip=<user-virtual-address>)!". With the patch, the
interrupted task information is displayed in the same manner as if
the original stack had been modified.
(tindoh(a)redhat.com)
- Fix for x86, s390, s390x and ia64 architectures to set the system
cpu count equal to the highest cpu online plus one. Without the
patch, those architectures would use the number of online cpus as
the system's total cpu count, which would be misleading when any
offline cpu number was less than the highest online cpu number.
(anderson(a)redhat.com)
- Fix for package build failure on x86_64 when using gcc-4.5. Without
the patch, these types of errors are generated:
unwind_x86_32_64.c:50:2: error: initializer element is not constant
unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[7].offs')
unwind_x86_32_64.c:50:2: error: initializer element is not constant
unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[8].offs')
(troy.heber(a)hp.com)
- Fix to recognize the symbol type change of per-cpu variables from
'd' or 'D' to 'V'. Without the patch, entering a command of the
form "p per_cpu__<variable>" would fail with the error message
"p: gdb request failed: p per_cpu__<variable>". With the fix,
the symbol is recognized as a per-cpu variable, in which case the
data type of the variable is displayed, followed by a list of the
virtual addresses of each per-cpu instance of the variable.
(anderson(a)redhat.com)
- Fix for the "struct" and "union" commands when passed an address that
is in a valid kernel virtual address region but is either unmapped or
non-existent. Without the patch, the following three error messages
are displayed:
struct <name> struct: invalid kernel virtual address:
<kernel-address> type: "gdb_readmem_callback"
gdb called without error_hook: Cannot access memory at address
<kernel-address>
*** glibc detected *** crash: double free or corruption (!prev):
<crash-address> ***
followed by a backtrace and the crash utility memory map. The session
aborts at that point. With the fix, the commands will fail gracefully
after displaying error messages reporting that the kernel virtual
address cannot be accessed.
(anderson(a)redhat.com)
- Update for 2.6.33 and later s390 and s390x kernels to account for the
"_lowcore" structure member name change from "st_status_fixed_logout"
to "psw_save_area".
(holzheu(a)linux.vnet.ibm.com)
- Fix for very large Xen domU dumpfiles that locate the base offset of
relevant ELF sections beyond the 4GB mark. Without the patch, the
crash session fails with the error messages "crash: cannot find mfn
<number> (0x<number>) in page index" followed by "crash: cannot
read/find cr3 page".
(anderson(a)redhat.com, xiaowei.hu(a)oracle.com)
- Fix for a segmentation violation during the "please wait...
(gathering module symbol data)" phase of a crash session. Without
the patch, the failure could occur when the vmcore came from a
kernel that crashed during a kernel module loading operation.
(john.wright(a)hp.com)
- Fix for a gdb-7.0 regression that causes the line number capability
to fail with certain ranges of x86 base kernel text addresses.
Without the patch, the "dis -l <symbol>" or "sym <symbol>"
commands would fail to show line number information for certain
ranges of base kernel text addresses.
(anderson(a)redhat.com)
- Fix for the "bt" command when run on offline s390/s390x "swapper"
idle tasks. Without the patch, the command fails with the error
message "bt: invalid kernel virtual address: ffffffffffffc000
type: async_stack".
(holzheu(a)linux.vnet.ibm.com)
- Preparation for future s390x ELF dumpfile format.
(holzheu(a)linux.vnet.ibm.com)
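
The cpu count item above (highest online cpu plus one, rather than the number of online cpus) can be sketched as follows. The online map is modelled as a plain byte array and both function names are hypothetical; the crash utility's own data structures are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries marked online; this is the pre-fix notion of the total. */
int online_cpu_total(const unsigned char *online, size_t n)
{
    size_t i;
    int total = 0;

    assert(online != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (online[i])
            total++;
    return total;
}

/* Returns the highest online cpu number plus one, or 0 if none is online. */
int cpu_count_from_online(const unsigned char *online, size_t n)
{
    size_t i;
    int count = 0;

    assert(online != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (online[i])
            count = (int)i + 1;
    return count;
}
```

For a map in which cpu 1 is offline and cpus 0, 2 and 3 are online, the first function gives 3 while the second gives 4, so cpu 3 is still covered by the reported count.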
Download from: http://people.redhat.com/anderson
14 years, 7 months