----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
crash-utility-bounces(a)redhat.com wrote on 14.01.2010 16:08:41:
> From: Dave Anderson <anderson(a)redhat.com>
>
> ----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
>
> > Hello,
> >
> > I get a segmentation fault from our 64-bit kernel crash.
> > This crash is caused by "echo c > /proc/sysrq-trigger".
> > The reason seems to be that x86_64_cpu_pda_init is
> > not called; at least gdb does not break there.
> >
> > Here is a little patch that fixes it. Everything seems to
> > work correctly. I'll provide more info if needed.
> >
> >
> > --- crash-5.0.0/x86_64.c 2010-01-06 21:38:27.000000000 +0200
> > +++ crash-5.0.0-64bit/x86_64.c 2010-01-14 08:24:13.679603706 +0200
> > @@ -6325,6 +6325,12 @@ x86_64_get_active_set(void)
> >
> > ms = machdep->machspec;
> >
> > + if (!ms->current) {
> > + error(INFO, "%s: Cannot get active set, ms->current is NULL\n",
> > + __func__);
> > + return;
> > + }
> > +
>
> That patch just masks the real problem.
>
> What kernel version is it?
>
> If it's 2.6.30 or later, then x86_64_per_cpu_init() should
> be called; otherwise x86_64_cpu_pda_init() is called. And
> whichever one gets called should allocate the array.
>
> 2.6.30 or later kernels should show:
>
> crash> struct x8664_pda
> struct: invalid data structure reference: x8664_pda
> crash>
>
> and they will use x86_64_per_cpu_init().
>
> Kernels prior to 2.6.30 should show:
>
> crash> struct x8664_pda
> struct x8664_pda {
> struct task_struct *pcurrent;
> long unsigned int data_offset;
> long unsigned int kernelstack;
> long unsigned int oldrsp;
> long unsigned int debugstack;
> int irqcount;
> int cpunumber;
> char *irqstackptr;
> int nodenumber;
> unsigned int __softirq_pending;
> unsigned int __nmi_count;
> int mmu_state;
> struct mm_struct *active_mm;
> unsigned int apic_timer_irqs;
> }
> SIZE: 128
> crash>
>
> and they will use x86_64_cpu_pda_init().
>
> If you're having trouble with gdb, can you put some fprintf(fp, ...)
> calls in the relevant function and find out why it isn't doing
> the calloc() call?
Yes, I thought so. This is a customized 2.6.31.7 kernel.org
kernel. It is a UP configuration, i.e. CONFIG_SMP is n.
I think the problem is that PER_CPU_OFF is not set.
Ahah -- that would do it. UP x86_64 kernels are so rare
that apparently nobody ever noticed, and I don't have a UP
x86_64 vmcore to even test with. (RHEL5 doesn't even ship
a UP x86_64 kernel).
Anyway, that change went into 4.0-8.11. And as far as I
can tell, x86_64_per_cpu_init() should still populate the
single "ms->current[0]" task from the "per_cpu__current_task"
symbol from UP kernels -- which doesn't need the PER_CPU_OFF
translation mechanism. In other words, I think you should
be able to do this on your UP kernel:
crash> px per_cpu__current_task
and it should show the panic task address that comes up as the
current task upon invocation. Is that right?
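To illustrate the point about UP kernels not needing the per-cpu offset translation, here is a minimal sketch in C. It is not crash's actual code: struct percpu_view and percpu_symbol_address() are hypothetical names made up for the example. It only assumes what is described above, namely that on SMP the symbol value must be shifted by the CPU's per-cpu offset, while on UP the symbol value is already the address of the variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of the per-cpu layout of a dumped kernel. */
struct percpu_view {
    int smp;                          /* nonzero if built with CONFIG_SMP */
    const unsigned long *cpu_offsets; /* per-cpu offsets; unused on UP */
    size_t ncpus;                     /* number of CPUs in the dump */
};

/*
 * Return the address of a per-cpu variable for the given CPU,
 * or 0 if the CPU number is out of range.
 */
unsigned long percpu_symbol_address(const struct percpu_view *v,
                                    unsigned long symbol_value, size_t cpu)
{
    assert(v != NULL);

    if (cpu >= v->ncpus)
        return 0;

    /* UP: the symbol value is the variable's address, no translation. */
    if (!v->smp)
        return symbol_value;

    /* SMP: add this CPU's offset to the symbol value. */
    return symbol_value + v->cpu_offsets[cpu];
}
```

With a view whose smp field is 0 and ncpus is 1, percpu_symbol_address(&view, addr, 0) simply returns addr, which is why reading per_cpu__current_task directly should be enough on a UP kernel.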
Btw, the "struct" command caused another segmentation
fault.
Here is the gdb bt:
(gdb) bt
#0 0x00007f74b3524a92 in strcmp () from /lib/libc.so.6
#1 0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:276
#2 0x00000000005344ed in lookup_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:228
#3 0x000000000060019d in c_lex () at c-exp.y:2149
#4 0x00000000006008f5 in c_parse_internal () at c-exp.c.tmp:1468
#5 0x00000000006022dd in c_parse () at c-exp.y:2225
#6 0x000000000055f614 in parse_exp_in_context (stringptr=0x7fffbc2f2260, block=<value optimized out>, comma=<value optimized out>, void_context_p=0, out_subexp=0x0) at parse.c:1094
#7 0x000000000055f924 in parse_expression (string=0x7fffbc2f2950 "x8664_pda") at parse.c:1144
#8 0x000000000053291b in gdb_command_funnel (req=0xca2c00) at symtab.c:4992
#9 0x00000000004c1740 in gdb_interface (req=0xca2c00) at gdb_interface.c:407
#10 0x00000000004e9dca in datatype_info (name=0xb618a7 "x8664_pda", member=0x0, dm=0x7fffbc2f3620) at symbols.c:4146
#11 0x00000000004eb1ee in arg_to_datatype (s=0xb618a7 "x8664_pda", dm=0x7fffbc2f3620, flags=524290) at symbols.c:4867
#12 0x00000000004efa1b in cmd_datatype_common (flags=2048) at symbols.c:4664
#13 0x000000000045efd9 in exec_command () at main.c:644
#14 0x000000000045f1fa in main_loop () at main.c:603
#15 0x00000000005452a9 in captured_command_loop (data=0x120e3c0) at ./main.c:226
#16 0x00000000005434e4 in catch_errors (func=0x5452a0 <captured_command_loop>, func_args=0x0, errstring=0x7f9d7c "", mask=<value optimized out>) at exceptions.c:520
#17 0x0000000000544d36 in captured_main (data=<value optimized out>) at ./main.c:924
#18 0x00000000005434e4 in catch_errors (func=0x544340 <captured_main>, func_args=0x7fffbc2f38b0, errstring=0x7f9d7c "", mask=<value optimized out>) at exceptions.c:520
#19 0x000000000054412f in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at ./main.c:939
#20 0x000000000045fece in main (argc=3, argv=0x7fffbc2f3a08) at main.c:517
(gdb) frame 1
#1 0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:276
276 if (FILENAME_CMP (name, pst->filename) == 0)
(gdb) p name
$4 = 0x120e3c0 "x8664_pda"
(gdb) p pst
$5 = (struct partial_symtab *) 0x14d6040
(gdb) p pst->filename
$6 = 0x0
(gdb) p *pst
$7 = {next = 0x0, filename = 0x0, fullname = 0x0, dirname = 0x0,
objfile = 0x0, section_offsets = 0x0, textlow = 0, texthigh = 0,
dependencies = 0x0, number_of_dependencies = 0, globals_offset = 0,
n_global_syms = 0, statics_offset = 0, n_static_syms = 0, symtab = 0x0,
read_symtab = 0, read_symtab_private = 0x0, readin = 0 '\0'}
(gdb)
I fixed it with the patch below:
--- crash-5.0.0/gdb-7.0/gdb/symtab.c 2010-01-15 10:41:00.919973440 +0200
+++ crash-5.0.0-64bit/gdb-7.0/gdb/symtab.c 2010-01-15 10:19:21.436128740 +0200
@@ -256,7 +256,7 @@ got_symtab:
struct partial_symtab *
lookup_partial_symtab (const char *name)
{
- struct partial_symtab *pst;
+ struct partial_symtab *pst = NULL;
struct objfile *objfile;
char *full_path = NULL;
char *real_path = NULL;
@@ -273,7 +273,7 @@ lookup_partial_symtab (const char *name)
ALL_PSYMTABS (objfile, pst)
{
- if (FILENAME_CMP (name, pst->filename) == 0)
+ if (pst->filename && FILENAME_CMP (name, pst->filename) == 0)
{
return (pst);
}
@@ -311,7 +311,7 @@ lookup_partial_symtab (const char *name)
if (lbasename (name) == name)
ALL_PSYMTABS (objfile, pst)
{
- if (FILENAME_CMP (lbasename (pst->filename), name) == 0)
+ if (pst->filename && FILENAME_CMP (lbasename (pst->filename), name) == 0)
return (pst);
}
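The guard in the patch above can be shown in isolation with a small, self-contained sketch. This is not gdb's code: struct psymtab_entry and find_by_filename() are hypothetical, simplified stand-ins, and plain strcmp() is used in place of FILENAME_CMP. The idea is only that an entry whose filename is NULL is skipped instead of being passed to the string comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a partial symtab list entry. */
struct psymtab_entry {
    struct psymtab_entry *next;
    const char *filename;   /* may be NULL, as in the zeroed entry above */
};

/*
 * Walk the list and return the first entry whose filename equals name.
 * Entries with a NULL filename are skipped rather than compared.
 */
const struct psymtab_entry *find_by_filename(const struct psymtab_entry *head,
                                             const char *name)
{
    const struct psymtab_entry *p;

    assert(name != NULL);

    for (p = head; p != NULL; p = p->next) {
        if (p->filename != NULL && strcmp(name, p->filename) == 0)
            return p;
    }
    return NULL;   /* no match; a NULL filename never counts as a match */
}
```

Note that a guard like this only avoids the crash. It does not explain how an all-zero entry ended up in the list in the first place.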
Weird -- so you're apparently able to do that when running any
"struct <non-existent>" command from the crash command line?
But I can't reproduce that -- this is what should happen:
crash> struct this_is_junk
struct: invalid data structure reference: this_is_junk
crash>
and I don't understand what could be different with your
custom kernel.
>
> Either that, or if you can make the vmlinux/vmcore pair available
> for me to download, I can look at it.
I'll arrange this if the above information is not enough.
Yes please -- can you put the vmlinux/vmcore pair somewhere
where I can download it? You can send me the particulars
off-line to anderson(a)redhat.com.
Thanks,
Dave