mm_struct for exiting tasks
by Justin Vreeland
Hello,
I'm Justin Vreeland. I'm currently an intern at Cray, working with the
OS/Kernel group. We use crash frequently to track down various problems,
and sometimes we need to get information about tasks that were exiting
when the dump was taken. Because the mm_struct has already been removed
from the task struct, crash doesn't let you use vtop or vm on such tasks,
so I added a way to specify an mm_struct (with -M) for tasks whose state
is 'Exiting'.
Currently it's a bit hackish: it modifies the task's status and context
to pass all the checks, and then restores both before returning. If this
is something you're interested in, I'd be happy to bring it up to
snuff. The modifications are attached.
--
-Justin
Debugging a 3rd-party module that oopses the system
by Han Pingtian
Hey there,
I'm trying to analyse the vmcore from an oops caused by a module. The module
comes from here:
http://www.linuxforu.com/2011/01/understanding-a-kernel-oops
This web page teaches how to analyse a kernel oops. It provides a
module named 'oops', which triggers a NULL pointer dereference in its
init function.
The problem is that I cannot figure out how to use crash to analyse the vmcore:
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/2.6.18-348.el5/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2013-07-01-04:43/vmcore
CPUS: 20
DATE: Mon Jul 1 04:38:49 2013
UPTIME: 00:33:44
LOAD AVERAGE: 0.22, 0.18, 0.07
TASKS: 482
NODENAME: lawlp3.upt.austin.ibm.com
RELEASE: 2.6.18-348.el5
VERSION: #1 SMP Wed Nov 28 21:23:52 EST 2012
MACHINE: ppc64 (3550 Mhz)
MEMORY: 3.2 GB
PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
PID: 5402
COMMAND: "insmod"
TASK: c0000000cfa35150 [THREAD_INFO: c0000000ce5d0000]
CPU: 15
STATE: TASK_RUNNING (PANIC)
crash> log
... ....
oops: module license 'unspecified' taints kernel.
oops from the module
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xd000000001460060
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA
Modules linked in: oops(PU) nfsd exportfs auth_rpcgss autofs4 hidp nfs nfs_acl rfcomm l2cap bluetooth lockd sunrpc ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_addr ib_cm ib_sa ib_mad iw_cm iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh snd_powermac snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd soundcore i2c_core parport_pc lp parport sg iw_cxgb3 ib_core cxgb3 ibmveth 8021q dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc ibmvfc scsi_transport_fc ibmvscsic sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
NIP: D000000001460060 LR: D000000001460050 CTR: 0000000000000004
REGS: c0000000ce5d39b0 TRAP: 0300 Tainted: P ---- (2.6.18-348.el5)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022482 XER: 00000006
DAR: 0000000000000000, DSISR: 0000000042000000
TASK = c0000000cfa35150[5402] 'insmod' THREAD: c0000000ce5d0000 CPU: 15
GPR00: D000000001460050 C0000000CE5D3C30 D00000000146C930 0000000000000000
GPR04: 8000000000001032 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 C0000000015FBB68 0000000000000000
GPR12: 0000000000000000 C000000000570B80 0000000000000000 D0000000012B1850
GPR16: D0000000012B1810 D0000000014601B0 0000000000000000 0000000000000000
GPR20: 0000000000000028 D0000000012B0CE9 C0000000005A12E8 0000000000000029
GPR24: D0000000012A0000 000000000000002A C0000000CD6F5A80 C0000000CD6F5AB0
GPR28: C0000000005A18C8 D000000001460680 D00000000146C900 D000000001460680
NIP [D000000001460060] .my_oops_init+0x2c/0xd4 [oops]
LR [D000000001460050] .my_oops_init+0x1c/0xd4 [oops]
Call Trace:
[C0000000CE5D3C30] [C000000000098944] .sys_init_module+0x1a88/0x1d18 (unreliable)
[C0000000CE5D3E30] [C0000000000086A4] syscall_exit+0x0/0x40
Instruction dump:
4e800020 7c0802a6 fbc1fff0 ebc28000 f8010010 f821ff81 e87e8008 4800002d
e8410028 39200000 38210080 38600000 <91290000> e8010010 ebc1fff0 7c0803a6
<0>Sending IPI to other cpus...
crash> whatis my_oops_init
whatis: gdb request failed: whatis my_oops_init
crash> mod -s oops
MODULE NAME SIZE OBJECT FILE
d000000001460680 oops 18752 /lib/modules/2.6.18-348.el5/kernel/oops.ko
crash> whatis my_oops_init
int my_oops_init(void);
crash> dis -l .my_oops_init
<no output>
crash> sym -m oops
d000000001460000 MODULE START: oops
d000000001460000 (t) .my_oops_exit
d000000001460000 (t) .cleanup_module
d000000001460034 (t) .my_oops_init
d000000001460034 (t) .init_module
d000000001460130 (r) ____versions
d000000001460130 (r) __versions
d000000001460680 (D) __this_module
d000000001464910 (D) cleanup_module
d000000001464910 (d) my_oops_exit
d000000001464920 (D) init_module
d000000001464920 (d) my_oops_init
d000000001464940 MODULE END: oops
crash> bt
PID: 5402 TASK: c0000000cfa35150 CPU: 15 COMMAND: "insmod"
R0: d000000001460050 R1: c0000000ce5d3c30 R2: d00000000146c930
R3: 0000000000000000 R4: 8000000000001032 R5: 0000000000000000
R6: 0000000000000000 R7: 0000000000000000 R8: 0000000000000000
R9: 0000000000000000 R10: c0000000015fbb68 R11: 0000000000000000
R12: 0000000000000000 R13: c000000000570b80 R14: 0000000000000000
R15: d0000000012b1850 R16: d0000000012b1810 R17: d0000000014601b0
R18: 0000000000000000 R19: 0000000000000000 R20: 0000000000000028
R21: d0000000012b0ce9 R22: c0000000005a12e8 R23: 0000000000000029
R24: d0000000012a0000 R25: 000000000000002a R26: c0000000cd6f5a80
R27: c0000000cd6f5ab0 R28: c0000000005a18c8 R29: d000000001460680
R30: d00000000146c900 R31: d000000001460680
NIP: d000000001460060 MSR: 8000000000009032 OR3: c0000000005a13c0
CTR: 0000000000000004 LR: d000000001460050 XER: 0000000000000006
CCR: 0000000024022482 MQ: c0000000cd6f5ab0 DAR: 0000000000000000
DSISR: 0000000042000000 Syscall Result: 0000000000000000
NIP [d000000001460060] .init_module
LR [d000000001460050] .init_module
#0 [c0000000ce5d3c30] .sys_init_module at c000000000098944
#1 [c0000000ce5d3e30] syscall_exit at c0000000000086a4
syscall [c00] exception frame:
R0: 0000000000000080 R1: 00000000ff91fb60 R2: 000000000fff8eb0
R3: 0000000010020028 R4: 000000000001caf8 R5: 0000000010020018
R6: 000000000000002d R7: fffffffffeff0000 R8: 000000000002ffe0
R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000000 R13: 000000001001959c R14: 0000000000000000
R15: 0000000000000000 R16: 0000000000000000 R17: 0000000000000000
R18: 0000000000000000 R19: 0000000000000000 R20: 0000000000000000
R21: 0000000000000000 R22: 0000000000000000 R23: 0000000000000000
R24: 000000000ffbf280 R25: 00000000ff91fdf0 R26: 0000000010020018
R27: 00000000ff91ff05 R28: 0000000000020000 R29: 000000000001caf8
R30: 0000000010020028 R31: 0000000000000003
NIP: 000000000ff0496c MSR: 000000000000d032 OR3: 0000000010020028
CTR: 000000000ff04964 LR: 0000000010000bf8 XER: 0000000000000000
CCR: 0000000044000484 MQ: 0000000002756c28 DAR: 000000001004002c
DSISR: 0000000042000000 Syscall Result: 0000000000000000
crash>
As you can see, the 'bt' command says the problem is at '.init_module',
but in fact it should be in '.my_oops_init'. However,
'dis -l .my_oops_init' shows nothing, so I cannot use crash to figure out
which line of source code caused the oops. Using gdb as described on the
web page, I can find the source line easily.
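The `sym -m oops` output above lists `.my_oops_init` and `.init_module` at the same address (d000000001460034). That is expected for a module: the kernel's `module_init()` macro declares `init_module` as an alias of the module's init function, so a lookup that prints only one name per address can report either alias, which would explain `bt` showing `.init_module`. A minimal sketch of address-to-symbol resolution (the helper names are hypothetical; the symbol table is copied from the output above) returns both names and the same +0x2c and +0x1c offsets the oops log printed for the NIP and LR:

```python
import bisect

# Text symbols copied from the "sym -m oops" output above.
OOPS_TEXT_SYMBOLS = [
    (0xD000000001460000, ".my_oops_exit"),
    (0xD000000001460000, ".cleanup_module"),
    (0xD000000001460034, ".my_oops_init"),
    (0xD000000001460034, ".init_module"),
]


def resolve(addr, symbols=OOPS_TEXT_SYMBOLS):
    """Return (names, offset) for the closest symbol at or below addr.

    Several names can share one address (aliases), so all of them are
    returned; a tool that prints only one has to pick an alias.
    """
    symbols = sorted(symbols)
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return [], None
    start = starts[i]
    names = [name for s, name in symbols if s == start]
    return names, addr - start


if __name__ == "__main__":
    for label, addr in (("NIP", 0xD000000001460060), ("LR", 0xD000000001460050)):
        names, off = resolve(addr)
        print(f"{label}: {' / '.join(names)}+{off:#x}")
    # NIP: .init_module / .my_oops_init+0x2c
    # LR: .init_module / .my_oops_init+0x1c
```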
Please help. Thanks.
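Even without source-line information, the instruction dump in the log narrows things down. The word marked `<91290000>` sits at the faulting NIP, and PowerPC D-form instructions can be decoded by hand: a 6-bit opcode, two 5-bit register fields, and a signed 16-bit displacement. The sketch below is a hypothetical helper that knows only the two opcodes needed here (14, addi/li, and 36, stw) and prints anything else raw. It decodes the tail of the dump as `li r9,0` followed by `stw r9,0(r9)`, a store through a register that was just zeroed, which matches r9 = 0 in the register dump and the fault at data address 0x00000000. Mapping that back to a C line still requires the module's debug information.

```python
def decode_dform(word):
    """Split a 32-bit PowerPC D-form instruction into its fields.

    Layout (big-endian bit numbering): opcode[0:6] RT/RS[6:11] RA[11:16] D[16:32].
    """
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    d = word & 0xFFFF
    if d & 0x8000:  # D is a signed 16-bit displacement
        d -= 0x10000
    return opcode, rt, ra, d


# Only the two D-form opcodes needed for this dump; others are printed raw.
MNEMONICS = {14: "addi", 36: "stw"}


def disasm(word):
    opcode, rt, ra, d = decode_dform(word)
    name = MNEMONICS.get(opcode)
    if name == "addi" and ra == 0:
        return f"li r{rt},{d}"  # addi with RA=0 means "load immediate"
    if name == "addi":
        return f"addi r{rt},r{ra},{d}"
    if name == "stw":
        return f"stw r{rt},{d}(r{ra})"
    return f".long {word:#010x}"


if __name__ == "__main__":
    # Last five words of the "Instruction dump" line ending in <91290000>.
    dump = [0xE8410028, 0x39200000, 0x38210080, 0x38600000, 0x91290000]
    for word in dump:
        print(f"{word:08x}  {disasm(word)}")
```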
PPC64: vtop of module and user-space virtual addresses fails on 3.10 kernels
by Dave Anderson
This is a request for some help from the IBM'ers on the list...
Starting somewhere in the 3.10 timeframe (I believe), virtual-to-physical
translation of kernel module addresses no longer works on ppc64.
User-space vtop also seems to be broken.
Here's an example using a 3.10.0-0.rc4.59.el7.ppc64 kernel, which
shows the "WARNING: cannot access vmalloc'd module memory" message
during initialization. I also show the results of a "vtop" on the
first and last module addresses in the kernel's "modules" list:
# crash
crash 7.0.1
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
WARNING: cannot access vmalloc'd module memory
KERNEL: /usr/lib/debug/lib/modules/3.10.0-0.rc4.59.el7.ppc64/vmlinux
DUMPFILE: /dev/crash
CPUS: 28
DATE: Mon Jun 24 13:35:45 2013
UPTIME: 01:54:23
LOAD AVERAGE: 0.00, 0.17, 0.23
TASKS: 318
NODENAME: ibm-p730-04-lp4.rhts.eng.bos.redhat.com
RELEASE: 3.10.0-0.rc4.59.el7.ppc64
VERSION: #1 SMP Mon Jun 3 14:42:22 EDT 2013
MACHINE: ppc64 (3550 Mhz)
MEMORY: 8 GB
PID: 30000
COMMAND: "crash"
TASK: c000000192860000 [THREAD_INFO: c0000001c6c00000]
CPU: 5
STATE: TASK_RUNNING (ACTIVE)
crash> p modules
modules = $1 = {
next = 0xd00000000d019080,
prev = 0xd000000001016f10
}
crash> vtop 0xd00000000d019080
VIRTUAL PHYSICAL
d00000000d019080 (not mapped)
PAGE DIRECTORY: c0000000011d0000
L4: c0000000011d0000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba70000
PMD: c0000001fba80000 => fba76808
PTE: fba76808 => 0
crash> vtop 0xd000000001016f10
VIRTUAL PHYSICAL
d000000001016f10 (not mapped)
PAGE DIRECTORY: c0000000011d0000
L4: c0000000011d0000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba70000
PMD: c0000001fba80000 => fba70808
PTE: fba70808 => 0
crash>
I'm not at all familiar with ppc64 page table walk-throughs,
but what little debugging I've tried has turned up nothing other
than module address translations that end up with PTEs that
contain zero, like the above.
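For readers who, like the poster, are new to ppc64 walks: the sketch below splits an address into per-level table indices. The constants (64K pages, 12-bit PTE and PMD indices, 8-byte entries, and a top-nibble region ID such as 0xd that is masked off first) are inferred from the numbers in this thread rather than read from the 3.10 kernel headers, so treat them as assumptions; `split_va` and `entry_offsets` are hypothetical helpers. With these constants, the index arithmetic reproduces the entry offsets crash printed above (PTE entries at offsets 0x6808 and 0x808 in the PTE page, and the user-space L4 entry at offset 8), so the indexing itself appears consistent with crash's output; it does not by itself explain the zero PTEs.

```python
PAGE_SHIFT = 16              # 64K pages (inferred from the vtop output)
PTE_INDEX_BITS = 12          # inferred: PTE offset 0x6808 == 0xd01 * 8 needs 12 bits
PMD_INDEX_BITS = 12          # inferred: consistent with the user-space L4 entry at +8
ENTRY_SIZE = 8               # each table entry is a 64-bit word
REGION_MASK = (1 << 60) - 1  # drop the top-nibble region ID (0xc, 0xd, ...)


def split_va(va):
    """Split a ppc64 virtual address into (pgd, pmd, pte, offset) indices."""
    ea = va & REGION_MASK
    offset = ea & ((1 << PAGE_SHIFT) - 1)
    pte = (ea >> PAGE_SHIFT) & ((1 << PTE_INDEX_BITS) - 1)
    pmd = (ea >> (PAGE_SHIFT + PTE_INDEX_BITS)) & ((1 << PMD_INDEX_BITS) - 1)
    pgd = ea >> (PAGE_SHIFT + PTE_INDEX_BITS + PMD_INDEX_BITS)  # remaining bits
    return pgd, pmd, pte, offset


def entry_offsets(va):
    """Byte offsets of the PGD, PMD and PTE entries within their tables."""
    pgd, pmd, pte, _ = split_va(va)
    return pgd * ENTRY_SIZE, pmd * ENTRY_SIZE, pte * ENTRY_SIZE


if __name__ == "__main__":
    for va in (0xD00000000D019080, 0xD000000001016F10, 0x100133D6F10):
        pgd_off, pmd_off, pte_off = entry_offsets(va)
        print(f"{va:#x}: PGD+{pgd_off:#x} PMD+{pmd_off:#x} PTE+{pte_off:#x}")
```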
The vtop of user-space addresses also seems to be completely
dysfunctional. On the live system above, if I take the user-space
address of the page-table page buffer used by the crash utility itself
and try to do a vtop on it, I get this obviously bogus result:
crash> help -m | grep ptbl:
ptbl: 100133d6f10
crash> vtop 100133d6f10
VIRTUAL PHYSICAL
100133d6f10 (not mapped)
PAGE DIRECTORY: c0000001e0940000
L4: c0000001e0940008 => 0
VMA START END FLAGS FILE
c00000001b7d0000 10012fe0000 100167b0000 100073
crash>
It seems to be something recently introduced, because here's
a 3.9.0-0.rc8.54.el7.ppc64 kernel, which works just fine:
# crash
crash 7.0.1
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/3.9.0-0.rc8.54.el7.ppc64/vmlinux
DUMPFILE: /dev/crash
CPUS: 28
DATE: Mon Jun 24 13:50:38 2013
UPTIME: 00:12:20
LOAD AVERAGE: 0.22, 0.17, 0.16
TASKS: 316
NODENAME: ibm-p730-04-lp4.rhts.eng.bos.redhat.com
RELEASE: 3.9.0-0.rc8.54.el7.ppc64
VERSION: #1 SMP Mon Apr 22 18:30:40 EDT 2013
MACHINE: ppc64 (3550 Mhz)
MEMORY: 8 GB
PID: 7035
COMMAND: "crash"
TASK: c0000001ddf00000 [THREAD_INFO: c0000001bb780000]
CPU: 4
STATE: TASK_RUNNING (ACTIVE)
crash> p modules
modules = $1 = {
next = 0xd00000000c729100,
prev = 0xd000000000fe6ff8
}
crash> vtop 0xd00000000c729100
VIRTUAL PHYSICAL
d00000000c729100 1cfb29100
PAGE DIRECTORY: c000000001190000
L4: c000000001190000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba60000
PMD: c0000001fba80000 => fba66390
PTE: fba66390 => 73ec88000395
PAGE: 1cfb20000
PTE PHYSICAL FLAGS
73ec88000395 1cfb20000 (PRESENT|RW|COHERENT|DIRTY|ACCESSED)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c0000000042c26f0 1cfb20000 0 0 1 73c0400000000
crash> vtop 0xd000000000fe6ff8
VIRTUAL PHYSICAL
d000000000fe6ff8 1eba16ff8
PAGE DIRECTORY: c000000001190000
L4: c000000001190000 => c0000001fba80000
PMD: c0000001fba80000 => c0000001fba60000
PMD: c0000001fba80000 => fba607f0
PTE: fba607f0 => 7ae848000395
PAGE: 1eba10000
PTE PHYSICAL FLAGS
7ae848000395 1eba10000 (PRESENT|RW|COHERENT|DIRTY|ACCESSED)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c000000004482338 1eba10000 0 0 1 7ac0400000000
crash>
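The working 3.9 output above also lets you check the last step of the walk by hand. Assuming 64K pages and a PTE whose real page number starts at bit 30 (both inferred from these numbers, not taken from kernel headers), shifting the PTE right by 30 and back left by 16 gives the PAGE value crash printed, and the low 16 bits of the virtual address supply the byte offset. Treating bit 0 as the present bit is likewise an inference: it is set in every PTE crash labels PRESENT here and clear in the all-zero PTEs from 3.10. `pte_to_phys` is a hypothetical helper.

```python
PAGE_SHIFT = 16      # 64K pages
PTE_RPN_SHIFT = 30   # inferred: 0x73ec88000395 >> 30 == 0x1cfb2
PAGE_PRESENT = 0x1   # inferred: low bit set in every PTE crash labels PRESENT


def pte_to_phys(pte, va):
    """Translate va through a leaf PTE; return None if the PTE is not present."""
    if not pte & PAGE_PRESENT:
        return None  # e.g. the all-zero PTEs seen on 3.10
    pfn = pte >> PTE_RPN_SHIFT
    return (pfn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))


if __name__ == "__main__":
    print(hex(pte_to_phys(0x73EC88000395, 0xD00000000C729100)))  # 0x1cfb29100
    print(hex(pte_to_phys(0x7AE848000395, 0xD000000000FE6FF8)))  # 0x1eba16ff8
    print(pte_to_phys(0x0, 0xD00000000D019080))                  # None
```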
And the user-space vtop example works as expected:
crash> help -m | grep ptbl:
ptbl: 1001d208c80
crash> vtop 1001d208c80
VIRTUAL PHYSICAL
1001d208c80 1a1b18c80
PAGE DIRECTORY: c0000001ec8a9c00
L4: c0000001ec8a9c08 => c0000001f0a50000
PMD: c0000001f0a50008 => c0000001c7f20000
PMD: c0000001f0a50008 => c7f26900
PTE: c7f26900 => 686c48000393
PAGE: 1a1b10000
PTE PHYSICAL FLAGS
686c48000393 1a1b10000 (PRESENT|USER|COHERENT|DIRTY|ACCESSED)
VMA START END FLAGS FILE
c0000001eb523af0 1001ce20000 100206f0000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c000000003fe26b8 1a1b10000 c0000001e93f4c41 1001d20 1 6840400080068
crash>
Any ideas?
Thanks,
Dave