-----Original Message-----
From: crash-utility-bounces(a)redhat.com [mailto:crash-utility-bounces(a)redhat.com] On Behalf Of Dave Anderson
Sent: Thursday, August 12, 2010 12:18 PM
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: [Crash-utility] crash: invalid structure member offset
----- "Reinoud Koornstra" <koornstra(a)hp.com> wrote:
> Thanks,
>
> Using crash 5.0.6 worked nicely.
> However, I can't really look at a lot because of a bad EIP code.
>
> [ 726.601381] 802.1Q VLAN Support v1.8 Ben Greear <greearb(a)candelatech.com>
> [ 726.601384] All bugs added by David S. Miller <davem(a)redhat.com>
> [ 726.646757] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [ 726.732410] IP: [<00000000>]
> [ 726.766933] *pdpt = 0000000000431001 *pde = 0000000000000000
> [ 726.766937] Oops: 0010 [#1] SMP
> [ 726.790844] Modules linked in: 8021q iptable_filter ip_tables
> x_tables ip_gre af_packet i2c_dev i2c_qs i2c_algo_bit i2c_core garp
> stp llc ixgbe inet_lro psmouse serio_raw intel_agp shpchp iTCO_wdt
> pci_hotplug iTCO_vendor_support agpgart ext3 jbd mbcache sd_mod
> crc_t10dif sg ata_piix ata_generic ahci libata scsi_mod ehci_hcd
> uhci_hcd usbcore [last unloaded: 8021q]
> [ 726.790844]
> [ 726.790844] Pid: 4, comm: ksoftirqd/0 Tainted: P (2.6.27)
> [ 726.790844] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
> [ 726.790844] EIP is at 0x0
> [ 726.790844] EAX: e7f4c498 EBX: 00000000 ECX: 77470000 EDX: e7f4c498
> [ 726.790844] ESI: 4bd1d300 EDI: 00000007 EBP: f784df88 ESP: f784df78
> [ 726.790844] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 726.790844] Process ksoftirqd/0 (pid: 4, ti=f784c000 task=f783a5b0 task.ti=f784c000)
> [ 726.790844] Stack: 40168080 00000001 403daaa0 4042c500 f784df90 401681bf f784dfb0 4012fe92
> [ 726.790844] 0000000a 00000000 40429340 00000246 00000000 40130120 f784dfbc 4012ff55
> [ 726.790844] 4042c500 f784dfcc 40130182 fffffffc 00000000 f784dfe0 4013e707 4013e6c0
> [ 726.790844] Call Trace:
> [ 726.790844] [<40168080>] ? __rcu_process_callbacks+0x70/0x190
> [ 726.790844] [<401681bf>] ? rcu_process_callbacks+0x1f/0x40
> [ 726.790844] [<4012fe92>] ? __do_softirq+0x82/0x100
> [ 726.790844] [<40130120>] ? ksoftirqd+0x0/0xe0
> [ 726.790844] [<4012ff55>] ? do_softirq+0x45/0x50
> [ 726.790844] [<40130182>] ? ksoftirqd+0x62/0xe0
> [ 726.790844] [<4013e707>] ? kthread+0x47/0x80
> [ 726.790844] [<4013e6c0>] ? kthread+0x0/0x80
> [ 726.790844] [<4010494f>] ? kernel_thread_helper+0x7/0x10
> [ 726.790844] =======================
> [ 726.790844] Code: Bad EIP value.
> [ 726.790844] EIP: [<00000000>] 0x0 SS:ESP 0068:f784df78
>
> So now I can't figure out the piece of code where this dereferencing
> occurred. :(
Yeah, I don't know why the exception frame wasn't displayed below in the
bt output, but I think it may have been confusion due to the kernel text
region starting at 40000000 (instead of the typical 3G/1G user/kernel
virtual address split). I'm guessing your kernel is configured as 1G/3G
user/kernel?
That's right, the kernel is configured as 1G/3G user/kernel.
(I've never seen that before...)
It's a weird config indeed. I'll try rewriting some stuff so it consumes way less
memory so a normal kernel/user split can be used.
Nevertheless, why the pointer became null remains unsolved for the moment. :-)
Would the user/kernel split also be an issue on 64-bit?
Reinoud.
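
To make the split discussed above concrete, here is a minimal sketch that checks whether an address from the oops falls in the kernel region under each layout. The PAGE_OFFSET values are the usual i386 CONFIG_VMSPLIT_* choices and are an assumption here, not something read from this kernel's .config; the helper name is made up for illustration.

```python
# Sketch only: PAGE_OFFSET values are the usual i386 CONFIG_VMSPLIT_* choices,
# assumed here rather than read from this kernel's .config.
PAGE_OFFSET_3G_1G = 0xC0000000  # typical layout: 3G user / 1G kernel
PAGE_OFFSET_1G_3G = 0x40000000  # layout in this thread: 1G user / 3G kernel


def is_kernel_address(addr, page_offset):
    """Return True if a 32-bit virtual address lies at or above page_offset."""
    return page_offset <= addr <= 0xFFFFFFFF


# Addresses copied from the oops in this thread.
samples = [
    ("rcu_process_callbacks+0x1f", 0x401681BF),
    ("ESP", 0xF784DF78),
]

for name, addr in samples:
    print(
        f"{name:28s} {addr:08x}",
        "1G/3G kernel:", is_kernel_address(addr, PAGE_OFFSET_1G_3G),
        "3G/1G kernel:", is_kernel_address(addr, PAGE_OFFSET_3G_1G),
    )
```

With the 1G/3G layout the text address 401681bf counts as kernel space, while a tool that assumes the typical 3G/1G layout would treat it as a user address, which may be the kind of confusion suspected above.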
Anyway, somehow the EIP got zeroed out, and it took a fault trying
to handle that. That can happen if a kernel function corrupts its
own stack by incorrectly writing to its own local stack variables,
and in so doing writes a zero into the return address saved on the
stack. Then when the function returns, that zero is loaded into the
EIP, and you'd see something like the above.
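
As a small cross-check of the "zero loaded into the EIP" reading, the "Oops: 0010" value in the log can be decoded as an x86 page-fault error code. The sketch below uses the standard bit meanings; the table layout and function name are illustrative only.

```python
# x86 page-fault error code bits, as printed in hex after "Oops:" in the log.
# Each entry: (mask, meaning when set, meaning when clear or None to omit).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code):
    """Return a list of human-readable descriptions for a fault error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            out.append(text)
    return out


print(decode_oops_code(0x0010))
# ['page not present', 'not a write', 'kernel mode', 'instruction fetch']
```

An instruction fetch from a not-present page in kernel mode is what you would expect if the CPU tried to execute at address 0, which fits the "EIP is at 0x0" line.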
The exception frame in the log shows that the ESP is f784df78,
and looking at the trace data below, it looks like rcu_process_callbacks()
may have ended up calling something that led to the EIP corruption.
Just a guess though...
Dave
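
One way to line up the raw stack dump with the bt output is to compute the address of each stack word starting from ESP. The following is a rough sketch using only values copied from the log in this thread; the symbol table is just the Call Trace entries typed in by hand, and the function name is made up.

```python
# ESP and raw stack words copied from the oops in this thread.
ESP = 0xF784DF78
STACK_WORDS = """
40168080 00000001 403daaa0 4042c500 f784df90 401681bf f784dfb0 4012fe92
0000000a 00000000 40429340 00000246 00000000 40130120 f784dfbc 4012ff55
4042c500 f784dfcc 40130182 fffffffc 00000000 f784dfe0 4013e707 4013e6c0
""".split()

# Text addresses and symbols copied by hand from the Call Trace in the log.
CALL_TRACE = {
    0x40168080: "__rcu_process_callbacks+0x70",
    0x401681BF: "rcu_process_callbacks+0x1f",
    0x4012FE92: "__do_softirq+0x82",
    0x40130120: "ksoftirqd+0x0",
    0x4012FF55: "do_softirq+0x45",
    0x40130182: "ksoftirqd+0x62",
    0x4013E707: "kthread+0x47",
    0x4013E6C0: "kthread+0x0",
}


def annotate(esp, words, symbols):
    """Pair each 32-bit stack word with its stack address and any known symbol."""
    rows = []
    for i, word in enumerate(words):
        value = int(word, 16)
        rows.append((esp + 4 * i, value, symbols.get(value, "")))
    return rows


rows = annotate(ESP, STACK_WORDS, CALL_TRACE)
for addr, value, sym in rows:
    print(f"{addr:08x}: {value:08x} {sym}")
```

The word 401681bf lands at f784df8c, which matches frame #2 in the bt output, and the word sitting right at ESP resolves to __rcu_process_callbacks+0x70. That would be consistent with the last call having been made from __rcu_process_callbacks, though it does not by itself say why the target was zero.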
>
> crash> bt
> PID: 4 TASK: f783a5b0 CPU: 0 COMMAND: "ksoftirqd/0"
> #0 [f784de88] crash_kexec at 401534a8
> #1 [f784df28] __slab_free at 4019677f
> #2 [f784df8c] rcu_process_callbacks at 401681ba
> #3 [f784df94] __do_softirq at 4012fe90
> #4 [f784dfb4] do_softirq at 4012ff50
> #5 [f784dfd0] kthread at 4013e705
> #6 [f784dfe4] kernel_thread_helper at 4010494d
>
> Thanks,
>
> Reinoud.
>
>
> > -----Original Message-----
> > From: crash-utility-bounces(a)redhat.com [mailto:crash-utility-bounces(a)redhat.com] On Behalf Of Dave Anderson
> > Sent: Thursday, August 12, 2010 6:14 AM
> > To: Discussion list for crash utility usage, maintenance and development
> > Subject: Re: [Crash-utility] crash: invalid structure member offset
> >
> >
> > ----- "Reinoud Koornstra" <koornstra(a)hp.com> wrote:
> >
> > > Hi Everyone,
> > >
> > > I am trying to read a core file into crash, but I've got bad luck as
> > > you can see below.
> > > Is the core file corrupt? It is a vmcore file from a 32-bit kernel that
> > > was compiled with PAE; could that have corrupted things?
> > > Any hints here?
> > > Thanks,
> > >
> > > Reinoud.
> > >
> > > $ crash System.map-2.6.27 ./vmlinux-2.6.27 ./vmcore
> > >
> > > crash 4.0-3.7
> >
> > I don't know if the vmcore is corrupt, but PAE wouldn't be an issue.
> >
> > However, you are running a version of crash that was released almost
> > 4 years ago (13-Oct-2006) against a two-year-old kernel that was
> > released 15-Oct-2008. That's pretty much a guarantee of failure.
> >
> > Try updating to version 5.0.6 and see what happens.
> >
> > And BTW, if the vmlinux file is the exact same kernel as the
> > one that generated the vmcore file, you don't need a System.map
> > argument.
> >
> > Dave
> >
> > > Copyright 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
> > > Copyright 2004, 2005, 2006 IBM Corporation
> > > Copyright 1999-2006 Hewlett-Packard Co
> > > Copyright 2005 Fujitsu Limited
> > > Copyright 2005 NEC Corporation
> > > Copyright 1999, 2002 Silicon Graphics, Inc.
> > > Copyright 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > > This program is free software, covered by the GNU General Public License,
> > > and you are welcome to change it and/or distribute copies of it under
> > > certain conditions. Enter "help copying" to see the conditions.
> > > This program has absolutely no warranty. Enter "help warranty" for
> > > details.
> > >
> > > GNU gdb 6.1
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > > This GDB was configured as "i686-pc-linux-gnu"...
> > >
> > > please wait... (gathering kmem slab cache data)
> > >
> > > crash: invalid structure member offset: kmem_cache_s_c_num
> > > FILE: memory.c LINE: 6891 FUNCTION: kmem_cache_init()
> > >
> > > [/usr/bin/crash] error trace: 80827a9 => 8095398 => 80aa7ef => 8131e88
> > > /usr/bin/nm: /usr/bin/crash: no symbols
> > > /usr/bin/nm: /usr/bin/crash: no symbols
> > > /usr/bin/nm: /usr/bin/crash: no symbols
> > > /usr/bin/nm: /usr/bin/crash: no symbols
> > >
> > > WARNING: Because this kernel was compiled with gcc version 4.1.2, certain
> > >          commands or command options may fail unless crash is invoked with
> > >          the "--readnow" command line option.
> >
> > --
> > Crash-utility mailing list
> > Crash-utility(a)redhat.com
> >
https://www.redhat.com/mailman/listinfo/crash-utility