February 2010 - Crash-utility - Crash Utility List Archives

Re: [Crash-utility] [PATCH] s390: Fix backtrace code

by Dave Anderson

----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:

> Hi Dave,
>
> On Fri, 2010-02-26 at 10:42 -0500, Dave Anderson wrote:
> > > I tested vanilla 2.6.32, RHEL5, SLES10 and SLES11.
> > >
> > > But I found a bug with RHEL4:
> >
> > OK good -- I'm glad I asked. I note that RHEL3 doesn't even have
> > a "panic_stack" member. That being the case, this won't work as
> > planned:
> >
> >         stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
> >         if (stack_addr == 0)
> >                 return;
> >
> > because MEMBER_OFFSET() will return a -1, which will get used as
> > an offset to add to "lc", and will quietly read the wrong data.
>
> Therefore I do this check before:
>
>         if (!MEMBER_EXISTS("_lowcore", stack_name))
>                 return;
>
> Michael

Of course! Sorry I missed that -- queued for the next release...

Thanks,
  Dave
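The pitfall discussed above can be sketched in isolation. This is a minimal illustration, not crash's code: member_offset(), member_exists(), the member table and the offset value 8 are hypothetical stand-ins for MEMBER_OFFSET(), MEMBER_EXISTS() and the real "_lowcore" layout. It only shows why the existence check has to come before the offset is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct member {
    const char *name;
    long offset;
};

/* Hypothetical member table for a kernel whose lowcore has no "panic_stack". */
static const struct member lowcore_members[] = {
    { "async_stack", 8 },   /* made-up offset, for illustration only */
};

/* Stand-in for MEMBER_OFFSET(): byte offset of the member, or -1 if absent. */
static long member_offset(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(lowcore_members) / sizeof(lowcore_members[0]); i++)
        if (strcmp(lowcore_members[i].name, name) == 0)
            return lowcore_members[i].offset;
    return -1;
}

/* Stand-in for MEMBER_EXISTS(). */
static int member_exists(const char *name)
{
    return member_offset(name) >= 0;
}

/*
 * Read a stack address out of a lowcore copy.
 * Returns 1 and stores the value on success, 0 if the member is absent.
 */
static int get_stack_addr(const char *lc, const char *name, unsigned long *addr)
{
    assert(lc != NULL && addr != NULL);
    if (!member_exists(name))
        return 0;   /* without this guard, lc + (-1) would be read silently */
    memcpy(addr, lc + member_offset(name), sizeof(*addr));
    return 1;
}
```

With this table, get_stack_addr(lc, "panic_stack", &addr) returns 0 and leaves addr untouched, instead of reading from one byte before the buffer.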


Re: [Crash-utility] [PATCH] s390: Fix backtrace code

by Dave Anderson

----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:

> Hi Dave,
>
> On Fri, 2010-02-26 at 09:50 -0500, Dave Anderson wrote:
> > ----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> >
> > > Hi Dave,
> > >
> > > This patch fixes several bugs in the s390 stack backtrace code
> > > * Add panic stack as second interrupt stack
> > > * Fix printing of access registers (4 bytes instead of 8 bytes)
> > > * Use u64 for s390x register 14
> > > * Fix interrupt stack handling for s390x (use 160 byte overhead
> > >   instead of 96)
> >
> > The patch looks OK upon first glance -- can you verify that it's
> > absolutely backwards-compatible to earlier kernel versions?
>
> I tested vanilla 2.6.32, RHEL5, SLES10 and SLES11.
>
> But I found a bug with RHEL4:

OK good -- I'm glad I asked. I note that RHEL3 doesn't even have
a "panic_stack" member. That being the case, this won't work as
planned:

        stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
        if (stack_addr == 0)
                return;

because MEMBER_OFFSET() will return a -1, which will get used as
an offset to add to "lc", and will quietly read the wrong data.

Dave

> Older Linux kernels for s390 can be built so that the panic stack is
> not set (CONFIG_CHECK_STACK kernel built option):
>
>         *(lowcore_ptr[i]) = S390_lowcore;
>         lowcore_ptr[i]->async_stack = stack + (ASYNC_SIZE);
>
> #ifdef CONFIG_CHECK_STACK
>         stack = __get_free_pages(GFP_KERNEL,0);
>         if (stack == 0ULL)
>                 panic("smp_boot_cpus failed to allocate memory\n");
>         lowcore_ptr[i]->panic_stack = stack + (PAGE_SIZE);
> #endif
>
> RHEL4 has not defined CONFIG_CHECK_STACK. Therefore the following
> patch adds a check, so that the panic stack is only used, when
> it is there.
> ---
>  s390.c  |    2 ++
>  s390x.c |    2 ++
>  2 files changed, 4 insertions(+)
>
> --- a/s390.c
> +++ b/s390.c
> @@ -581,6 +581,8 @@ static void s390_get_int_stack(char *sta
>          if (!MEMBER_EXISTS("_lowcore", stack_name))
>                  return;
>          stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
> +        if (stack_addr == 0)
> +                return;
>          readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
>                  INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
>          *start = stack_addr - INT_STACK_SIZE;
> --- a/s390x.c
> +++ b/s390x.c
> @@ -813,6 +813,8 @@ static void s390x_get_int_stack(char *st
>          if (!MEMBER_EXISTS("_lowcore", stack_name))
>                  return;
>          stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
> +        if (stack_addr == 0)
> +                return;
>          readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
>                  INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
>          *start = stack_addr - INT_STACK_SIZE;
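To see why returning early on a zero stack address is enough, it may help to look at how the resulting start/end pair is consumed. The following is a simplified sketch under assumptions: struct stack_range, set_int_stack() and find_stack() are hypothetical names, and the per-cpu check that the real backtrace code performs is left out. An unset stack keeps start == end == 0, so the exclusive range test can never match it, whereas computing 0 minus the stack size would wrap around to an address near the top of the address space.

```c
#include <assert.h>
#include <stddef.h>

enum stack_kind { STACK_TASK, STACK_ASYNC, STACK_PANIC, STACK_INVALID };

struct stack_range {
    unsigned long start;
    unsigned long end;
};

/* Derive a range from the stack's top address; stay at 0/0 when it is unset. */
static void set_int_stack(struct stack_range *r, unsigned long stack_addr,
                          unsigned long size)
{
    assert(r != NULL);
    r->start = r->end = 0;
    if (stack_addr == 0)
        return;             /* no such stack: avoid wrapping 0 - size */
    r->start = stack_addr - size;
    r->end = stack_addr;
}

/* Exclusive bounds test, as in "backchain > start && backchain < end". */
static int in_range(unsigned long backchain, const struct stack_range *r)
{
    return backchain > r->start && backchain < r->end;
}

/* Pick the stack a backchain pointer belongs to: task, async, then panic. */
static enum stack_kind find_stack(unsigned long backchain,
                                  const struct stack_range *task,
                                  const struct stack_range *async,
                                  const struct stack_range *panic)
{
    if (in_range(backchain, task))
        return STACK_TASK;
    if (in_range(backchain, async))
        return STACK_ASYNC;
    if (in_range(backchain, panic))
        return STACK_PANIC;
    return STACK_INVALID;   /* invalid stack frame: the walk should stop */
}
```

The design choice here is that "not present" is encoded as an empty range rather than a separate flag, so the selection code needs no special case.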


[PATCH] s390: Fix backtrace code

by Michael Holzheu

Hi Dave,

This patch fixes several bugs in the s390 stack backtrace code
* Add panic stack as second interrupt stack
* Fix printing of access registers (4 bytes instead of 8 bytes)
* Use u64 for s390x register 14
* Fix interrupt stack handling for s390x (use 160 byte overhead
  instead of 96)
---
 s390.c  |   46 ++++++++++++++++++-------------
 s390x.c |   94 +++++++++++++++++++++++++++++++++++-----------------------------
 2 files changed, 79 insertions(+), 61 deletions(-)

--- a/s390.c
+++ b/s390.c
@@ -37,7 +37,7 @@
 #define S390_PTE_INVALID_MASK 0x80000900
 #define S390_PTE_INVALID(x) ((x) & S390_PTE_INVALID_MASK)
 
-#define ASYNC_STACK_SIZE STACKSIZE() // can be 4096 or 8192
+#define INT_STACK_SIZE STACKSIZE() // can be 4096 or 8192
 #define KERNEL_STACK_SIZE STACKSIZE() // can be 4096 or 8192
 
 #define LOWCORE_SIZE 4096
@@ -570,20 +570,21 @@ s390_get_lowcore(int cpu, char* lowcore)
                 FAULT_ON_ERROR);
 }
 
-/*
- * read in the async stack
+/*
+ * Read interrupt stack (either "async_stack" or "panic_stack");
  */
-static void
-s390_get_async_stack(char* lowcore, char* async_stack, unsigned long* start, unsigned long* end)
+static void s390_get_int_stack(char *stack_name, char* lc, char* int_stack,
+                               unsigned long* start, unsigned long* end)
 {
-        unsigned long async_stack_ptr;
+        unsigned long stack_addr;
 
-        async_stack_ptr = ULONG(lowcore +
-                MEMBER_OFFSET("_lowcore","async_stack"));
-        readmem(async_stack_ptr-ASYNC_STACK_SIZE,KVADDR, async_stack,
-                ASYNC_STACK_SIZE, "async_stack", FAULT_ON_ERROR);
-        *start=async_stack_ptr-ASYNC_STACK_SIZE;
-        *end=async_stack_ptr;
+        if (!MEMBER_EXISTS("_lowcore", stack_name))
+                return;
+        stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
+        readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
+                INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
+        *start = stack_addr - INT_STACK_SIZE;
+        *end = stack_addr;
 }
 
 /*
@@ -593,16 +594,18 @@ static void
 s390_back_trace_cmd(struct bt_info *bt)
 {
         char* stack;
-        char async_stack[ASYNC_STACK_SIZE];
+        char async_stack[INT_STACK_SIZE];
+        char panic_stack[INT_STACK_SIZE];
         long ksp,backchain,old_backchain;
         int i=0, r14_offset,bc_offset,r14, skip_first_frame=0;
-        unsigned long async_start,async_end, stack_end, stack_start, stack_base;
+        unsigned long async_start = 0, async_end = 0;
+        unsigned long panic_start = 0, panic_end = 0;
+        unsigned long stack_end, stack_start, stack_base;
 
         if (bt->hp && bt->hp->eip) {
                 error(WARNING,
                 "instruction pointer argument ignored on this architecture!\n");
         }
-        async_end = async_start = 0;
         ksp = bt->stkptr;
 
         /* print lowcore and get async stack when task has cpu */
@@ -622,9 +625,10 @@ s390_back_trace_cmd(struct bt_info *bt)
                         s390_print_lowcore(lowcore,bt,0);
                         return;
                 }
-
-                s390_get_async_stack(lowcore,async_stack,&async_start,
-                                &async_end);
+                s390_get_int_stack("async_stack", lowcore, async_stack,
+                                   &async_start, &async_end);
+                s390_get_int_stack("panic_stack", lowcore, panic_stack,
+                                   &panic_start, &panic_end);
                 s390_print_lowcore(lowcore,bt,1);
                 fprintf(fp,"\n");
                 skip_first_frame=1;
@@ -653,7 +657,7 @@ s390_back_trace_cmd(struct bt_info *bt)
                 unsigned long r14_stack_off;
                 int j;
 
-                /* Find stack: Either async stack or task stack */
+                /* Find stack: Either async, panic stack or task stack */
                 if((backchain > stack_start) && (backchain < stack_end)){
                         stack = bt->stackbuf;
                         stack_base = stack_start;
@@ -661,6 +665,10 @@ s390_back_trace_cmd(struct bt_info *bt)
                           && s390_has_cpu(bt)){
                         stack = async_stack;
                         stack_base = async_start;
+                } else if((backchain > panic_start) && (backchain < panic_end)
+                          && s390_has_cpu(bt)){
+                        stack = panic_stack;
+                        stack_base = panic_start;
                 } else {
                         /* invalid stackframe */
                         break;
--- a/s390x.c
+++ b/s390x.c
@@ -36,7 +36,7 @@
 #define S390X_PTE_INVALID_MASK 0x900ULL
 #define S390X_PTE_INVALID(x) ((x) & S390X_PTE_INVALID_MASK)
 
-#define ASYNC_STACK_SIZE STACKSIZE() // can be 8192 or 16384
+#define INT_STACK_SIZE STACKSIZE() // can be 8192 or 16384
 #define KERNEL_STACK_SIZE STACKSIZE() // can be 8192 or 16384
 
 #define LOWCORE_SIZE 8192
@@ -803,19 +803,20 @@ s390x_get_lowcore(struct bt_info *bt, ch
 }
 
 /*
- * read in the async stack
+ * Read interrupt stack (either "async_stack" or "panic_stack");
  */
-static void
-s390x_get_async_stack(char* lowcore, char* async_stack, unsigned long* start, unsigned long* end)
+static void s390x_get_int_stack(char *stack_name, char* lc, char* int_stack,
+                                unsigned long* start, unsigned long* end)
 {
-        unsigned long async_stack_ptr;
+        unsigned long stack_addr;
 
-        async_stack_ptr = ULONG(lowcore +
-                MEMBER_OFFSET("_lowcore","async_stack"));
-        readmem(async_stack_ptr-ASYNC_STACK_SIZE,KVADDR, async_stack,
-                ASYNC_STACK_SIZE, "async_stack", FAULT_ON_ERROR);
-        *start=async_stack_ptr-ASYNC_STACK_SIZE;
-        *end=async_stack_ptr;
+        if (!MEMBER_EXISTS("_lowcore", stack_name))
+                return;
+        stack_addr = ULONG(lc + MEMBER_OFFSET("_lowcore", stack_name));
+        readmem(stack_addr - INT_STACK_SIZE, KVADDR, int_stack,
+                INT_STACK_SIZE, stack_name, FAULT_ON_ERROR);
+        *start = stack_addr - INT_STACK_SIZE;
+        *end = stack_addr;
 }
 
 /*
@@ -825,11 +826,14 @@ static void
 s390x_back_trace_cmd(struct bt_info *bt)
 {
         char* stack;
-        char async_stack[ASYNC_STACK_SIZE];
+        char async_stack[INT_STACK_SIZE];
+        char panic_stack[INT_STACK_SIZE];
         long ksp,backchain,old_backchain;
-        int i=0, r14_offset,bc_offset,r14, skip_first_frame=0;
+        int i=0, r14_offset,bc_offset, skip_first_frame=0;
         unsigned long async_start = 0, async_end = 0;
+        unsigned long panic_start = 0, panic_end = 0;
         unsigned long stack_end, stack_start, stack_base;
+        unsigned long r14;
 
         if (bt->hp && bt->hp->eip) {
                 error(WARNING,
@@ -854,9 +858,10 @@ s390x_back_trace_cmd(struct bt_info *bt)
                         s390x_print_lowcore(lowcore,bt,0);
                         return;
                 }
-
-                s390x_get_async_stack(lowcore,async_stack,&async_start,
-                                &async_end);
+                s390x_get_int_stack("async_stack", lowcore, async_stack,
+                                    &async_start, &async_end);
+                s390x_get_int_stack("panic_stack", lowcore, panic_stack,
+                                    &panic_start, &panic_end);
                 s390x_print_lowcore(lowcore,bt,1);
                 fprintf(fp,"\n");
                 skip_first_frame=1;
@@ -885,7 +890,7 @@ s390x_back_trace_cmd(struct bt_info *bt)
                 unsigned long r14_stack_off;
                 int j;
 
-                /* Find stack: Either async stack or task stack */
+                /* Find stack: Either async, panic stack or task stack */
                 if((backchain > stack_start) && (backchain < stack_end)){
                         stack = bt->stackbuf;
                         stack_base = stack_start;
@@ -893,6 +898,10 @@ s390x_back_trace_cmd(struct bt_info *bt)
                           && s390x_has_cpu(bt)){
                         stack = async_stack;
                         stack_base = async_start;
+                } else if((backchain > panic_start) && (backchain < panic_end)
+                          && s390x_has_cpu(bt)){
+                        stack = panic_stack;
+                        stack_base = panic_start;
                 } else {
                         /* invalid stackframe */
                         break;
@@ -913,7 +922,7 @@ s390x_back_trace_cmd(struct bt_info *bt)
                         skip_first_frame=0;
                 } else {
                         fprintf(fp," #%i [%08lx] ",i,backchain);
-                        fprintf(fp,"%s at %x\n", closest_symbol(r14), r14);
+                        fprintf(fp,"%s at %lx\n", closest_symbol(r14), r14);
                         if (bt->flags & BT_LINE_NUMBERS)
                                 s390x_dump_line_number(r14);
                         i++;
@@ -944,19 +953,20 @@ s390x_back_trace_cmd(struct bt_info *bt)
                 }
 
                 /* Check for interrupt stackframe */
-                if((backchain == 0) && (stack == async_stack)){
-                        unsigned long psw_flags,r15;
+                if((backchain == 0) &&
+                   (stack == async_stack || stack == panic_stack)) {
+                        int pt_regs_off = old_backchain - stack_base + 160;
+                        unsigned long psw_flags;
 
-                        psw_flags = ULONG(&stack[old_backchain - stack_base
-                                +96 +MEMBER_OFFSET("pt_regs","psw")]);
+                        psw_flags = ULONG(&stack[pt_regs_off +
+                                          MEMBER_OFFSET("pt_regs", "psw")]);
                         if(psw_flags & 0x1000000000000ULL){
                                 /* User psw: should not happen */
                                 break;
                         }
-                        r15 = ULONG(&stack[old_backchain - stack_base +
-                                96 + MEMBER_OFFSET("pt_regs",
-                                "gprs") + 15 * S390X_WORD_SIZE]);
-                        backchain=r15;
+                        backchain = ULONG(&stack[pt_regs_off +
+                                    MEMBER_OFFSET("pt_regs", "gprs") +
+                                    15 * S390X_WORD_SIZE]);
                         fprintf(fp," - Interrupt -\n");
                 }
         } while(backchain != 0);
@@ -1036,28 +1046,28 @@ s390x_print_lowcore(char* lc, struct bt_
         fprintf(fp," -access registers:\n");
         ptr = lc + MEMBER_OFFSET("_lowcore","access_regs_save_area");
-        tmp[0]=ULONG(ptr);
-        tmp[1]=ULONG(ptr + 4);
-        tmp[2]=ULONG(ptr + 2 * 4);
-        tmp[3]=ULONG(ptr + 3 * 4);
+        tmp[0]=UINT(ptr);
+        tmp[1]=UINT(ptr + 4);
+        tmp[2]=UINT(ptr + 2 * 4);
+        tmp[3]=UINT(ptr + 3 * 4);
         fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
                 tmp[0], tmp[1], tmp[2], tmp[3]);
-        tmp[0]=ULONG(ptr + 4 * 4);
-        tmp[1]=ULONG(ptr + 5 * 4);
-        tmp[2]=ULONG(ptr + 6 * 4);
-        tmp[3]=ULONG(ptr + 7 * 4);
+        tmp[0]=UINT(ptr + 4 * 4);
+        tmp[1]=UINT(ptr + 5 * 4);
+        tmp[2]=UINT(ptr + 6 * 4);
+        tmp[3]=UINT(ptr + 7 * 4);
         fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
                 tmp[0], tmp[1], tmp[2], tmp[3]);
-        tmp[0]=ULONG(ptr + 8 * 4);
-        tmp[1]=ULONG(ptr + 9 * 4);
-        tmp[2]=ULONG(ptr + 10* 4);
-        tmp[3]=ULONG(ptr + 11* 4);
+        tmp[0]=UINT(ptr + 8 * 4);
+        tmp[1]=UINT(ptr + 9 * 4);
+        tmp[2]=UINT(ptr + 10 * 4);
+        tmp[3]=UINT(ptr + 11 * 4);
         fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
                 tmp[0], tmp[1], tmp[2], tmp[3]);
-        tmp[0]=ULONG(ptr + 12* 4);
-        tmp[1]=ULONG(ptr + 13* 4);
-        tmp[2]=ULONG(ptr + 14* 4);
-        tmp[3]=ULONG(ptr + 15* 4);
+        tmp[0]=UINT(ptr + 12 * 4);
+        tmp[1]=UINT(ptr + 13 * 4);
+        tmp[2]=UINT(ptr + 14 * 4);
+        tmp[3]=UINT(ptr + 15 * 4);
         fprintf(fp," %#010lx %#010lx %#010lx %#010lx\n",
                 tmp[0], tmp[1], tmp[2], tmp[3]);
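The access-register part of the patch (4 bytes instead of 8 bytes) comes down to the width of each read. The sketch below is an illustration only: read_u32(), read_u64() and read_access_regs() are hypothetical helpers, not crash's UINT()/ULONG() macros, and the buffer stands in for a 16-register save area with 4-byte slots. Reading 8 bytes at a 4-byte stride folds the neighbouring register into each value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_ACCESS_REGS 16

/* Copy a 32-bit value out of a raw buffer without alignment assumptions. */
static uint32_t read_u32(const char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* The too-wide variant: picks up the following register as well. */
static uint64_t read_u64(const char *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Decode 16 consecutive 4-byte access registers from a save area. */
static void read_access_regs(const char *save_area,
                             uint32_t regs[NUM_ACCESS_REGS])
{
    int i;

    assert(save_area != NULL);
    for (i = 0; i < NUM_ACCESS_REGS; i++)
        regs[i] = read_u32(save_area + i * 4);
}
```

When two adjacent registers hold different non-zero values, read_u64() on the first slot yields neither of them, on either byte order, which is the kind of wrong output the 4-byte reads avoid.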


Re: [Crash-utility] User-land backtrace?

by Dave Anderson

----- "Darrin Thompson" <darrinth(a)gmail.com> wrote:

> On Wed, Feb 24, 2010 at 12:21 PM, Dave Anderson <anderson(a)redhat.com>
> wrote:
>
> > That's right. That is the stack value that will be restored upon
> > return to user-space, and the EIP will be restored to 00f14402.
> >
> > One thing to make sure of is that when you do the "rd -u", you
> > have set the crash utility to the context of the task whose "bt"
> > output you're showing. "rd -u" will read the user space of the
> > current task (i.e., the task shown if you do a "set" command).
>
> Could that be adapted into a way to produce a userspace core dump that
> we could feed to regular old gdb?

This question comes up from time to time.

If all of a task's user pages were in memory (not swapped out), and
therefore in the vmcore -- which is becoming more and more unlikely
with the use of makedumpfile to skip user pages altogether -- then
theoretically the kernel's elf_core_dump() function could basically
be "ported" to user-space. I think...

Anyway, I'm not particularly interested in doing it. But it would be
an excellent candidate for an extension module if anybody's willing
to take it on.

Dave


Re: [Crash-utility] User-land backtrace?

by Dave Anderson

----- "Gallus" <gall.cwpl(a)gmail.com> wrote:

> Hi,
> is it possible to display a stack trace of a user space process?

No. You could do a raw "rd -u" of the user-space stack, but given
that the crash utility has no knowledge of any user-space symbols,
it's probably not going to be very illuminating.

Dave


User-land backtrace?

by Gallus

Hi,
is it possible to display a stack trace of a user space process?

Gallus


Re: [Crash-utility] [crash-5.0.1] glibc detected: double free or corruption (!prev)

by Dave Anderson

----- "Dave Anderson" <anderson(a)redhat.com> wrote:

> Agreed on all counts. It's crashing now because of the gdb-7.0 integration,
> and the attached patch should fix that.

Check that -- the first patch is not enough, because it will retry
the add-symbol-file operation the "old way", which I presume will
also fail. Try this second patch...

Thanks,
  Dave


Re: [Crash-utility] [crash-5.0.1] glibc detected: double free or corruption (!prev)

by Dave Anderson

----- "Hedi Berriche" <hedi(a)sgi.com> wrote:

> Context:
>
>  - crash-5.0.1
>  - glibc 2.4
>  - vmcore produced by x86_64 sles11 2.6.27.19-5-default
>
> Problem:
>
> crash> mod -s xfs /usr/people/hedi/xfs.ko.debug
> mod: xfs: last symbol is not _MODULE_END_xfs?
> *** glibc detected *** /tr/x86_64/bin/crash: double free or corruption
> (!prev): 0x0000000001558760 ***
> <segmentation violation in gdb>
> mod: /usr/people/hedi/xfs.ko.debug
> gdb add-symbol-file command failed
>
> hangs solid there and has to be killed with SIGKILL.
>
> Grabbing a core reveals the following:
>
> (gdb) bt f
> #0  0x00002b628cd0ebb5 in raise () from /lib64/libc.so.6
> #1  0x00002b628cd0ffb0 in abort () from /lib64/libc.so.6
> #2  0x00002b628cd4a340 in malloc_printerr () from /lib64/libc.so.6
> #3  0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
>     block=<value optimized out>, comma=<value optimized out>,
>     void_context_p=0, out_subexp=0x7b4760)
>     at parse.c:1101
>         except = {reason = RETURN_ERROR, error = GENERIC_ERROR,
>           message = 0x1c790a0 "Dwarf Error: Could not find abbrev number 188 [in
>           module /usr/people/hedi/xfs.ko.debug]"}
>         old_chain = (struct cleanup *) 0x0
>         subexp = <value optimized out>
> #4  0x000000060000000b in ?? ()
> #5  0x0000000000000000 in ?? ()
>
> (gdb) f 3
> #3  0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
>     block=<value optimized out>, comma=<value optimized out>,
>     void_context_p=0, out_subexp=0x7b4760)
>     at parse.c:1101
> 1101              xfree (expout);
>
> (gdb) list
> 1096        }
> 1097      if (except.reason < 0)
> 1098        {
> 1099          if (! in_parse_field)
> 1100            {
> 1101              xfree (expout);
> 1102              throw_exception (except);
> 1103            }
> 1104        }
> 1105
>
> Not sure (yet) whether the error
>
>     mod: xfs: last symbol is not _MODULE_END_xfs?
>     Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]
>
> is a problem in crash or in the xfs.ko.debug objfile but that's another story,
> the problem here is that crash shouldn't crash.
>
> FWIW, this problem is most definitely a regression, indeed crash version
> 4.-8.11, for example, fails to load the objfile, with exactly the same error
> message, with the notable difference that it does *not* crash.

Agreed on all counts. It's crashing now because of the gdb-7.0 integration,
and the attached patch should fix that.

As far as the embedded "add-symbol-file" failure to load the module,
you're right, that's another issue, and what I can suggest is this:

  crash> set debug 1
  crash> mod -s xfs /usr/people/hedi/xfs.ko.debug

and you will see the full "add-symbol-file" gdb command string
that's failing. For that matter you can take that full string,
remove crash from the picture entirely, and just enter it into
a gdb session:

  $ gdb ...
  add-symbol-file arg arg arg...

It looks like some kind of Dwarf issue though, and I can't help with that.

However, at least on a RHEL environment, the argument to the mod
command should be the stripped module.ko file, and the module.ko.debug
file gets found automatically, and the two pieces put together. In
other words, taking the "ext3" module, my RHEL5 environment has:

  /lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko
  /usr/lib/debug/lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko.debug

And when it gets loaded, the base "ext3.ko" file is used as the
internal argument to the gdb "add-symbol-file" command:

  crash> mod -s ext3
       MODULE       NAME     SIZE  OBJECT FILE
  ffffffff8806ae00  ext3   168017  /lib/modules/2.6.18-128.el5/kernel/fs/ext3/ext3.ko
  crash>

I wonder if you would still see the same issue if you used
the base "xfs.ko" file instead of "xfs.ko.debug"?

For the first time I saw one of those (harmless) "last symbol is
not _MODULE_END_xxx" messages on a 2.6.32 x86 kernel the other day.
I'll look into that.

And lastly:

> P.S. The "last symbol is not _MODULE_END_<modulename>" has been reported
> back in Jan 2009 (albeit with the difference that crash would load the
> objfile despite the error message)
>
>     https://www.redhat.com/archives/crash-utility/2009-January/msg00070.html
>
> but I am not sure the root cause was identified back then, or at least I am
> failing to find, in the list archives, any proof of that.

I don't know what the deal was with that...

Dave


[crash-5.0.1] glibc detected: double free or corruption (!prev)

by Hedi Berriche

Context:

 - crash-5.0.1
 - glibc 2.4
 - vmcore produced by x86_64 sles11 2.6.27.19-5-default

Problem:

crash> mod -s xfs /usr/people/hedi/xfs.ko.debug
mod: xfs: last symbol is not _MODULE_END_xfs?
*** glibc detected *** /tr/x86_64/bin/crash: double free or corruption (!prev): 0x0000000001558760 ***
<segmentation violation in gdb>
mod: /usr/people/hedi/xfs.ko.debug
gdb add-symbol-file command failed

hangs solid there and has to be killed with SIGKILL.

Grabbing a core reveals the following:

(gdb) bt f
#0  0x00002b628cd0ebb5 in raise () from /lib64/libc.so.6
#1  0x00002b628cd0ffb0 in abort () from /lib64/libc.so.6
#2  0x00002b628cd4a340 in malloc_printerr () from /lib64/libc.so.6
#3  0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
    block=<value optimized out>, comma=<value optimized out>,
    void_context_p=0, out_subexp=0x7b4760)
    at parse.c:1101
        except = {reason = RETURN_ERROR, error = GENERIC_ERROR,
          message = 0x1c790a0 "Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]"}
        old_chain = (struct cleanup *) 0x0
        subexp = <value optimized out>
#4  0x000000060000000b in ?? ()
#5  0x0000000000000000 in ?? ()

(gdb) f 3
#3  0x00000000005454af in parse_exp_in_context (stringptr=0x400000000,
    block=<value optimized out>, comma=<value optimized out>,
    void_context_p=0, out_subexp=0x7b4760)
    at parse.c:1101
1101              xfree (expout);

(gdb) list
1096        }
1097      if (except.reason < 0)
1098        {
1099          if (! in_parse_field)
1100            {
1101              xfree (expout);
1102              throw_exception (except);
1103            }
1104        }
1105

Not sure (yet) whether the error

    mod: xfs: last symbol is not _MODULE_END_xfs?
    Dwarf Error: Could not find abbrev number 188 [in module /usr/people/hedi/xfs.ko.debug]

is a problem in crash or in the xfs.ko.debug objfile but that's another story,
the problem here is that crash shouldn't crash.

FWIW, this problem is most definitely a regression, indeed crash version
4.-8.11, for example, fails to load the objfile, with exactly the same error
message, with the notable difference that it does *not* crash.

Cheers,
Hedi.

P.S. The "last symbol is not _MODULE_END_<modulename>" has been reported
back in Jan 2009 (albeit with the difference that crash would load the
objfile despite the error message)

    https://www.redhat.com/archives/crash-utility/2009-January/msg00070.html

but I am not sure the root cause was identified back then, or at least I am
failing to find, in the list archives, any proof of that.

--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
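The backtrace above ends in a free performed on an error path. The actual fix is in a patch that this archive page does not show, so the following is only a generic sketch of how such a double free can arise and of one common guard; parse_buf, release_parse_buf() and fail_and_clean_up() are hypothetical names, and this is not gdb's or crash's code. The idea is that a pointer which both an error path and a later cleanup may release is reset to NULL after the first release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared buffer that two independent code paths may release. */
static char *parse_buf;
static int release_count;   /* counts real releases, for demonstration */

/* Release the buffer at most once; further calls are no-ops. */
static void release_parse_buf(void)
{
    if (parse_buf == NULL)
        return;             /* already released */
    free(parse_buf);
    parse_buf = NULL;       /* makes a second release attempt harmless */
    release_count++;
}

/*
 * Simulate a failed parse: the error path frees the buffer, and a later
 * cleanup step tries again. Returns the number of real releases so far,
 * or -1 if the allocation itself failed.
 */
static int fail_and_clean_up(size_t len)
{
    assert(len > 0);
    parse_buf = malloc(len);
    if (parse_buf == NULL)
        return -1;
    release_parse_buf();    /* error path */
    release_parse_buf();    /* later cleanup: nothing left to free */
    return release_count;
}
```

Without the NULL reset, the second call would hand an already-freed pointer to free(), which is the class of error glibc reports as "double free or corruption".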


[ANNOUNCE] crash version 5.0.1 is available

by Dave Anderson

- Due to a change in the x86 disassembler output from the embedded
  gdb-7.0 that was introduced in crash version 5.0.0, there may be a
  stream of warning messages during invocation that indicate
  "crash: invalid input: <string>:" and "crash: input string too
  large: <string>: (9 vs 8)" on 2.6.20 and earlier x86 kernels.
  (anderson(a)redhat.com)

- As of glibc 2.11, the mkstemps() function has been introduced as a
  versioned symbol. As a result, crash utility binaries built on host
  machines with glibc 2.11 or later cannot be run on systems that run
  pre-2.11 glibc versions, failing during invocation with the error
  message "crash: relocation error: crash: symbol mkstemps, version
  GLIBC_2.11 not defined in file libc.so.6 with link time reference".
  With the patch, the pre-existing version of mkstemps() from the
  built-in libiberty.a library will always be used.
  (jmoyer(a)redhat.com)

- Fix for the "irq" command on 2.6.33 and later kernels to account
  for the removal of the irqaction.mask structure member. Without
  the patch, the "irq" command fails with the error message
  "irq: invalid structure member offset: irqaction_mask".
  (bernhard(a)bwalle.de)

- Added a defensive mechanism to handle a corrupted "cache_cache"
  kmem_cache structure. Without the patch, a vmcore that had such a
  corruption caused a failure during invocation with the error
  message "crash: zero-size memory allocation!".
  (anderson(a)redhat.com)

- Fix for the "swap", "kmem -i", and "vm -p" commands to account for
  the 2.6.33 kernel changes to the swap_info_struct data structure and
  the swap_info[] array type. Without the patch, "swap" would show
  only the command's header, "kmem -i" would show zero swap usage,
  and "vm -p" would show "(unknown swap location)" when translating
  the swap file name for any swapped-out pages in the task.
  (anderson(a)redhat.com)

- Fix for a segmentation violation during session invocation when
  running against 2.6.30 or later x86_64 dumpfiles whose kernel is
  not configured with CONFIG_SMP.
  (anderson(a)redhat.com)

- Fix for the "bt" command on an ia64 "INIT" process that interrupted
  a task that was running in user space, but was unable to modify the
  original (interrupted) task's stack. Without the patch, the "INIT"
  task's backtrace would not display the task that was interrupted,
  and would display the error message "bt: unwind: failed to locate
  return link (ip=<user-virtual-address>)!". With the patch, the
  interrupted task information is displayed in the same manner as if
  the original stack had been modified.
  (tindoh(a)redhat.com)

- Fix for x86, s390, s390x and ia64 architectures to set the system
  cpu count equal to the highest cpu online plus one. Without the
  patch, those architectures would use the number of online cpus as
  the system's total cpu count, which would be misleading when any
  offline cpu number was less than the highest online cpu number.
  (anderson(a)redhat.com)

- Fix for package build failure on x86_64 when using gcc-4.5.
  Without the patch, these types of errors are generated:

    unwind_x86_32_64.c:50:2: error: initializer element is not constant
    unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[7].offs')
    unwind_x86_32_64.c:50:2: error: initializer element is not constant
    unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[8].offs')

  (troy.heber(a)hp.com)

- Fix to recognize the symbol type change of per-cpu variables from
  'd' or 'D' to 'V'. Without the patch, entering a command of the
  form "p per_cpu__<variable>" would fail with the error message
  "p: gdb request failed: p per_cpu__<variable>". With the fix, the
  symbol is recognized as a per-cpu variable, in which case the data
  type of the variable is displayed, followed by a list of the
  virtual addresses of each per-cpu instance of the variable.
  (anderson(a)redhat.com)

- Fix for the "struct" and "union" commands when passed an address
  that is in a valid kernel virtual address region but is either
  unmapped or non-existent. Without the patch, the following three
  error messages are displayed:

    struct <name> struct: invalid kernel virtual address: <kernel-address>
      type: "gdb_readmem_callback"
    gdb called without error_hook: Cannot access memory at address <kernel-address>
    *** glibc detected *** crash: double free or corruption (!prev): <crash-address> ***

  followed by a backtrace and the crash utility memory map. The
  session aborts at that point. With the fix, the commands will fail
  gracefully after displaying error messages reporting that the
  kernel virtual address cannot be accessed.
  (anderson(a)redhat.com)

- Update for 2.6.33 and later s390 and s390x kernels to account for
  the "_lowcore" structure member name change from
  "st_status_fixed_logout" to "psw_save_area".
  (holzheu(a)linux.vnet.ibm.com)

- Fix for very large Xen domU dumpfiles that locate the base offset
  of relevant ELF sections beyond the 4GB mark. Without the patch,
  the crash session fails with the error messages "crash: cannot find
  mfn <number> (0x<number>) in page index" followed by "crash: cannot
  read/find cr3 page".
  (anderson(a)redhat.com, xiaowei.hu(a)oracle.com)

- If a kernel crash occurs during a kernel module loading operation,
  it is possible that a subsequent crash session on the vmcore may
  result in a segmentation violation during the "please wait...
  (gathering module symbol data)" phase.
  (john.wright(a)hp.com)

- Fix for a gdb-7.0 regression that causes the line number capability
  to fail with certain ranges of x86 base kernel text addresses.
  Without the patch, the "dis -l <symbol>" or "sym <symbol>" commands
  would fail to show line number information for certain ranges of
  base kernel text addresses.
  (anderson(a)redhat.com)

- Fix for the "bt" command when run on offline s390/s390x "swapper"
  idle tasks. Without the patch, the command fails with the error
  message "bt: invalid kernel virtual address: ffffffffffffc000
  type: async_stack".
  (holzheu(a)linux.vnet.ibm.com)

- Preparation for future s390x ELF dumpfile format.
  (holzheu(a)linux.vnet.ibm.com)

Download from: http://people.redhat.com/anderson
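The cpu-count item in the changelog (highest online cpu plus one, rather than the number of online cpus) can be illustrated with a small bitmask example. This is a sketch under assumptions: the online set is modelled as a single unsigned long with bit N meaning "cpu N is online", and online_cpus(), cpu_count() and offline_gaps() are hypothetical helper names, not functions from crash.

```c
#include <assert.h>

/* Number of online cpus: count of set bits in the mask. */
static int online_cpus(unsigned long mask)
{
    int n = 0;

    while (mask) {
        n += (int)(mask & 1UL);
        mask >>= 1;
    }
    return n;
}

/* System cpu count: index of the highest online cpu plus one (0 if none). */
static int cpu_count(unsigned long mask)
{
    int count = 0;

    while (mask) {
        count++;
        mask >>= 1;
    }
    return count;
}

/* Number of offline cpus numbered below the highest online one. */
static int offline_gaps(unsigned long mask)
{
    int gaps = cpu_count(mask) - online_cpus(mask);

    assert(gaps >= 0);  /* highest index + 1 is never below the online count */
    return gaps;
}
```

For a mask of 0xD (cpus 0, 2 and 3 online, cpu 1 offline), online_cpus() gives 3 while cpu_count() gives 4; using 3 as the total would leave cpu 3 out of any loop over cpu numbers 0 to count - 1.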

