crash-utility-bounces(a)redhat.com wrote on 15.01.2010 18:54:48:
From: Dave Anderson <anderson(a)redhat.com>
Date: 15.01.2010 18:59
Subject: Re: [Crash-utility] crash-5.0: Segmentation fault with x86_64_get_active_set
Sent by: crash-utility-bounces(a)redhat.com
----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> crash-utility-bounces(a)redhat.com wrote on 14.01.2010 16:08:41:
>
> > From: Dave Anderson <anderson(a)redhat.com>
> >
> > ----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> >
> > > Hello,
> > >
> > > I get a segmentation fault from our 64-bit kernel crash.
> > > This crash is caused by "echo c > /proc/sysrq-trigger".
> > > The reason seems to be that x86_64_cpu_pda_init is
> > > not called; at least gdb does not break there.
> > >
> > > Here is a little patch that fixes it. Everything seems to
> > > work correctly. I'll provide more info if needed.
> > >
> > >
> > > --- crash-5.0.0/x86_64.c 2010-01-06 21:38:27.000000000 +0200
> > > +++ crash-5.0.0-64bit/x86_64.c 2010-01-14 08:24:13.679603706 +0200
> > > @@ -6325,6 +6325,12 @@ x86_64_get_active_set(void)
> > >
> > > ms = machdep->machspec;
> > >
> > > + if (!ms->current) {
> > > + error(INFO, "%s: Cannot get active set, ms->current is NULL\n",
> > > + __func__);
> > > + return;
> > > + }
> > > +
> >
> > That patch just masks the real problem.
> >
> > What kernel version is it?
> >
> > If it's 2.6.30 or later, then x86_64_per_cpu_init() should
> > be called, otherwise x86_64_cpu_pda_init() is called. And
> > whichever one that gets called should allocate the array.
> >
> > 2.6.30 or later kernels should show:
> >
> > crash> struct x8664_pda
> > struct: invalid data structure reference: x8664_pda
> > crash>
> >
> > and they will use x86_64_per_cpu_init().
> >
> > Kernels prior to 2.6.30 should show:
> >
> > crash> struct x8664_pda
> > struct x8664_pda {
> > struct task_struct *pcurrent;
> > long unsigned int data_offset;
> > long unsigned int kernelstack;
> > long unsigned int oldrsp;
> > long unsigned int debugstack;
> > int irqcount;
> > int cpunumber;
> > char *irqstackptr;
> > int nodenumber;
> > unsigned int __softirq_pending;
> > unsigned int __nmi_count;
> > int mmu_state;
> > struct mm_struct *active_mm;
> > unsigned int apic_timer_irqs;
> > }
> > SIZE: 128
> > crash>
> >
> > and they will use x86_64_cpu_pda_init().
> >
> > If you're having trouble with gdb, can you put some fprintf(fp, ...)
> > calls in the relevant function and find out why it isn't doing
> > the calloc() call?
>
>
> Yes I thought so. This is a customized 2.6.31.7 kernel.org
> kernel. This is a UP configuration, i.e. CONFIG_SMP is n.
> I think the problem is that the PER_CPU_OFF is not set.
Ahah -- that would do it. UP x86_64 kernels are so rare
that apparently nobody ever noticed, and I don't have a UP
x86_64 vmcore to even test with. (RHEL5 doesn't even ship
a UP x86_64 kernel.)

Anyway, that change went into 4.0-8.11. And as far as I
can tell, x86_64_per_cpu_init() should still populate the
single "ms->current[0]" task from the "per_cpu__current_task"
symbol on UP kernels -- which doesn't need the PER_CPU_OFF
translation mechanism. In other words, I think you should
be able to do this on your UP kernel:

  crash> px per_cpu__current_task

and it should show the panic task address that comes up as the
current task upon invocation. Is that right?
Yes, this works correctly.
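
The behaviour described above (allocate the per-CPU "current" array in every case, then fill it in with or without a per-CPU offset) can be sketched in isolation. This is only an illustration and not the crash source: struct kernel_info, read_ulong(), per_cpu_init_sketch() and SKETCH_MAX_CPUS are hypothetical names, and the memory reader is a toy that returns a made-up value.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_MAX_CPUS 4   /* hypothetical limit for this sketch only */

/* Hypothetical stand-in for crash's machdep->machspec. */
struct machine_specific {
    unsigned long *current;   /* one active-task address per CPU */
};

/* Hypothetical summary of what was found in the vmlinux/vmcore pair. */
struct kernel_info {
    int cpus;                                       /* CPUs in the dump */
    int has_per_cpu_offsets;                        /* 0 on a UP (CONFIG_SMP=n) kernel */
    unsigned long per_cpu_offset[SKETCH_MAX_CPUS];  /* per-CPU offsets (SMP only) */
    unsigned long current_task_sym;                 /* address of per_cpu__current_task */
};

/* Toy memory reader: pretends the value stored at "addr" is addr + 0x1000. */
unsigned long read_ulong(unsigned long addr)
{
    return addr + 0x1000;
}

/*
 * Allocate ms->current unconditionally, then populate it.
 * SMP: each CPU's copy lives at symbol + per-CPU offset.
 * UP:  the symbol address is used directly, with no offset translation.
 * Returns 0 on success, -1 if the allocation fails.
 */
int per_cpu_init_sketch(struct machine_specific *ms, const struct kernel_info *ki)
{
    int cpus = ki->cpus > 0 ? ki->cpus : 1;
    int i;

    if (cpus > SKETCH_MAX_CPUS)
        cpus = SKETCH_MAX_CPUS;

    ms->current = calloc((size_t)cpus, sizeof(unsigned long));
    if (ms->current == NULL) {
        fprintf(stderr, "per_cpu_init_sketch: calloc failed\n");
        return -1;
    }

    if (!ki->has_per_cpu_offsets) {
        /* UP kernel: only ms->current[0] exists. */
        ms->current[0] = read_ulong(ki->current_task_sym);
        return 0;
    }

    for (i = 0; i < cpus; i++)
        ms->current[i] = read_ulong(ki->current_task_sym + ki->per_cpu_offset[i]);
    return 0;
}
```

The point of the sketch is that the allocation does not depend on the per-CPU offsets being available, so a later reader of ms->current never sees a NULL array on a UP kernel.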
> Btw, the "struct" command caused another segmentation fault.
> Here is the gdb bt:
>
> (gdb) bt
> #0 0x00007f74b3524a92 in strcmp () from /lib/libc.so.6
> #1 0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0
> "x8664_pda")
> at symtab.c:276
> #2 0x00000000005344ed in lookup_symtab (name=0x120e3c0 "x8664_pda")
> at symtab.c:228
> #3 0x000000000060019d in c_lex () at c-exp.y:2149
> #4 0x00000000006008f5 in c_parse_internal () at c-exp.c.tmp:1468
> #5 0x00000000006022dd in c_parse () at c-exp.y:2225
> #6 0x000000000055f614 in parse_exp_in_context
> (stringptr=0x7fffbc2f2260,
> block=<value optimized out>, comma=<value optimized out>,
> void_context_p=0, out_subexp=0x0) at parse.c:1094
> #7 0x000000000055f924 in parse_expression (string=0x7fffbc2f2950
> "x8664_pda")
> at parse.c:1144
> #8 0x000000000053291b in gdb_command_funnel (req=0xca2c00) at
> symtab.c:4992
> #9 0x00000000004c1740 in gdb_interface (req=0xca2c00) at
> gdb_interface.c:407
> #10 0x00000000004e9dca in datatype_info (name=0xb618a7 "x8664_pda",
> member=0x0, dm=0x7fffbc2f3620) at symbols.c:4146
> #11 0x00000000004eb1ee in arg_to_datatype (s=0xb618a7 "x8664_pda",
> dm=0x7fffbc2f3620, flags=524290) at symbols.c:4867
> #12 0x00000000004efa1b in cmd_datatype_common (flags=2048) at
> symbols.c:4664
> #13 0x000000000045efd9 in exec_command () at main.c:644
> #14 0x000000000045f1fa in main_loop () at main.c:603
> #15 0x00000000005452a9 in captured_command_loop (data=0x120e3c0)
> at ./main.c:226
> #16 0x00000000005434e4 in catch_errors (func=0x5452a0
> <captured_command_loop>,
> func_args=0x0, errstring=0x7f9d7c "", mask=<value optimized out>)
> at exceptions.c:520
> #17 0x0000000000544d36 in captured_main (data=<value optimized out>)
> at ./main.c:924
> #18 0x00000000005434e4 in catch_errors (func=0x544340 <captured_main>,
> func_args=0x7fffbc2f38b0, errstring=0x7f9d7c "",
> mask=<value optimized out>) at exceptions.c:520
> #19 0x000000000054412f in gdb_main_entry (argc=<value optimized out>,
> argv=<value optimized out>) at ./main.c:939
> #20 0x000000000045fece in main (argc=3, argv=0x7fffbc2f3a08) at
> main.c:517
> (gdb) frame 1
> #1 0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0
> "x8664_pda")
> at symtab.c:276
> 276 if (FILENAME_CMP (name, pst->filename) == 0)
> (gdb) p name
> $4 = 0x120e3c0 "x8664_pda"
> (gdb) p pst
> $5 = (struct partial_symtab *) 0x14d6040
> (gdb) p pst->filename
> $6 = 0x0
> (gdb) p *pst
> $7 = {next = 0x0, filename = 0x0, fullname = 0x0, dirname = 0x0,
> objfile = 0x0, section_offsets = 0x0, textlow = 0, texthigh = 0,
> dependencies = 0x0, number_of_dependencies = 0, globals_offset = 0,
> n_global_syms = 0, statics_offset = 0, n_static_syms = 0, symtab =
> 0x0,
> read_symtab = 0, read_symtab_private = 0x0, readin = 0 '\0'}
> (gdb)
>
>
> I fixed it with the patch below:
> --- crash-5.0.0/gdb-7.0/gdb/symtab.c 2010-01-15 10:41:00.919973440 +0200
> +++ crash-5.0.0-64bit/gdb-7.0/gdb/symtab.c 2010-01-15 10:19:21.436128740 +0200
> @@ -256,7 +256,7 @@ got_symtab:
> struct partial_symtab *
> lookup_partial_symtab (const char *name)
> {
> - struct partial_symtab *pst;
> + struct partial_symtab *pst = NULL;
> struct objfile *objfile;
> char *full_path = NULL;
> char *real_path = NULL;
> @@ -273,7 +273,7 @@ lookup_partial_symtab (const char *name)
>
> ALL_PSYMTABS (objfile, pst)
> {
> - if (FILENAME_CMP (name, pst->filename) == 0)
> + if (pst->filename && FILENAME_CMP (name, pst->filename) == 0)
> {
> return (pst);
> }
> @@ -311,7 +311,7 @@ lookup_partial_symtab (const char *name)
> if (lbasename (name) == name)
> ALL_PSYMTABS (objfile, pst)
> {
> - if (FILENAME_CMP (lbasename (pst->filename), name) == 0)
> + if (pst->filename && FILENAME_CMP (lbasename (pst->filename), name) == 0)
> return (pst);
> }
Weird -- so you're apparently able to do that when running any
"struct <non-existent>" command from the crash command line?
But I can't reproduce that -- this is what should happen:
crash> struct this_is_junk
struct: invalid data structure reference: this_is_junk
crash>
Yes, before patching I always got a segmentation fault when
using "struct". After the patch everything seems to be fine.
and I don't understand what could be different with your
custom kernel?
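
For reference, the guard added by the symtab.c patch above can be reduced to a small stand-alone sketch. It is not gdb code: struct psymtab_sketch and lookup_psymtab_sketch() are hypothetical names, a plain linked-list walk stands in for ALL_PSYMTABS, and strcmp() stands in for FILENAME_CMP. It only illustrates why an entry with a NULL filename (as in the "p *pst" output) has to be skipped before the comparison.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for gdb's struct partial_symtab. */
struct psymtab_sketch {
    struct psymtab_sketch *next;
    const char *filename;   /* may be NULL, as seen in the gdb session */
};

/*
 * Walk the list and return the first entry whose filename equals "name".
 * Entries with a NULL filename are skipped instead of being handed to
 * strcmp(), which would dereference the NULL pointer.
 */
struct psymtab_sketch *
lookup_psymtab_sketch(struct psymtab_sketch *head, const char *name)
{
    struct psymtab_sketch *pst;

    if (name == NULL)
        return NULL;

    for (pst = head; pst != NULL; pst = pst->next) {
        if (pst->filename != NULL && strcmp(name, pst->filename) == 0)
            return pst;
    }
    return NULL;
}
```

A guard like this avoids the crash but does not explain why a zero-filled entry is on the list in the first place, which is the open question in this thread.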
> >
> > Either that, or if you can make the vmlinux/vmcore pair available
> > for me to download, I can look at it.
>
> I'll arrange this if the above information is not enough.
Yes please -- can you put the vmlinux/vmcore pair somewhere
where I can download it? You can send me the particulars
off-line to anderson(a)redhat.com.
I've sent you an email about the location.
- Ville