Haren Myneni wrote:
Rachita Kothiyal wrote:
>On Thu, Feb 23, 2006 at 09:49:37AM -0500, Dave Anderson wrote:
>
>
>>Ok, then I guess I'll take that as a thumbs-up.
>>
>>Waiting on Rachita's go-ahead...
>>
>>
>
>Dave,
>
>After the application of the patch (posted by Haren)
>on crash-4.0-2.21, I am now able to open the dump using crash
>for analysis.
>
>The following may be unrelated to the present discussion, but
>it is an observation:
>
>When I do 'bt -a' I get the following error on one of the cpus:
>
>PID: 2871 TASK: c000000161d05800 CPU: 4 COMMAND: "klogd"
>bt: invalid kernel virtual address: ff807a50 type: "Regs NIP value"
>
>
Rachita,
As I mentioned before, this task should have been running in user space.
You should see a similar stack trace even when using GDB. It would be
better to print a proper error message here.
Is ff807a50 typically a legitimate user-space stack address
in ppc64 user VM? You could probably run the address
through IN_TASK_VMA(), and if it is a valid user-space
stack address, just indicate that the process was running
in user-space.
Now I understand why you (ppc64) dump the register set
first, because all the other processor types would show
a stack trace emanating from user-space down into the
reception of the IP interrupt issued by the panicking
processor.
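The IN_TASK_VMA() suggestion above could be modeled roughly as follows.
This is only a standalone sketch, not crash's actual code: it does not
call IN_TASK_VMA() itself, and struct vma_range, addr_in_task_vmas(),
classify_nip() and PPC64_KERNEL_BASE are hypothetical names made up for
illustration. It also assumes that ppc64 kernel virtual addresses start
at 0xc000000000000000, which matches the kernel addresses in the
backtraces in this thread but is not checked against any dump here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one user VMA of a task: [start, end). */
struct vma_range {
    uint64_t start;
    uint64_t end;
};

/* Assumption: ppc64 kernel virtual addresses begin here. */
#define PPC64_KERNEL_BASE 0xc000000000000000ULL

enum nip_kind { NIP_KERNEL, NIP_USER, NIP_INVALID };

/* Return 1 if addr lies inside any of the task's VMAs, else 0. */
int addr_in_task_vmas(const struct vma_range *vmas, size_t n, uint64_t addr)
{
    assert(vmas != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= vmas[i].start && addr < vmas[i].end)
            return 1;
    }
    return 0;
}

/*
 * Classify a saved NIP: a kernel address, an address covered by one of
 * the task's user VMAs (the task was running in user space), or neither.
 */
enum nip_kind classify_nip(const struct vma_range *vmas, size_t n, uint64_t nip)
{
    if (nip >= PPC64_KERNEL_BASE)
        return NIP_KERNEL;
    if (addr_in_task_vmas(vmas, n, nip))
        return NIP_USER;
    return NIP_INVALID;
}
```

With a check like this, bt could report "task was running in user space"
when classify_nip() returns NIP_USER for the saved NIP, and keep the
"invalid kernel virtual address" error only for the NIP_INVALID case.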
About your other issue: I could not reproduce it.
crash 4.0-2.21
Copyright (C) 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005 Fujitsu Limited
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
crash: pglist_data.node_mem_map structure member does not exist.
crash: certain memory-related commands will fail or display invalid data
KERNEL: /home/hbabu/2616-rc2-k1/vmlinux
DUMPFILE: /home/vmcore_2616_rc2_0207
CPUS: 2
DATE: Tue Feb 7 16:56:08 2006
UPTIME: 00:00:09
LOAD AVERAGE: 0.05, 0.24, 0.12
TASKS: 57
NODENAME: elm3a135
RELEASE: 2.6.16-rc2-kexec-k1
VERSION: #6 SMP Tue Feb 7 16:46:10 PST 2006
MACHINE: ppc64 (unknown Mhz)
MEMORY: 2.9 GB
PANIC: "SysRq : Trigger a crashdump"
PID: 11076
COMMAND: "kpanic"
TASK: c00000000bc6d800 [THREAD_INFO: c0000000ac504000]
CPU: 1
STATE: TASK_RUNNING (SYSRQ)
crash> bt
PID: 11076 TASK: c00000000bc6d800 CPU: 1 COMMAND: "kpanic"
R0: 0000000000000000 R1: c0000000ac507970 R2: c00000000077a4a0
R3: c0000000ac5079e0 R4: 0000000000000000 R5: 0000000000000000
R6: 756d700d0a657220 R7: 6120637261736864 R8: 0000000000000000
R9: c0000000007b0fa0 R10: 0000000000000000 R11: c0000000007b0fa8
R12: 8000000000001032 R13: c0000000005a5d80 R14: 0000000000000000
R15: 0000000000000000 R16: 00000000100bbf08 R17: 00000000100bbeb8
R18: 0000000010070000 R19: 0000000000000000 R20: 0000000010046720
R21: 000000000000001f R22: 00000000100040e8 R23: 0000000010004d74
R24: 8000000000009032 R25: 0000000000000000 R26: 0000000000000000
R27: 0000000000000063 R28: 0000000000000009 R29: 0000000000000000
R30: c0000000005e1560 R31: c0000000b96dd000
NIP: c0000000000777a8 MSR: 8000000000001032 OR3: c0000000ac7202f8
CTR: c000000000278b04 LR: c000000000278b18 XER: 0000000000000000
CCR: c0000000ac507b90 MQ: 0000000000000000 DAR: 0000000000000063
DSISR: 0000000000000009 Syscall Result: 0000000000000000
NIP [c0000000000777a8] .crash_kexec
LR [c000000000278b18] .sysrq_handle_crashdump
#0 [c0000000ac507970] .crash_kexec at c0000000000777d0
#1 [c0000000ac507b50] .sysrq_handle_crashdump at c000000000278b18
#2 [c0000000ac507bd0] .__handle_sysrq at c0000000002789c0
#3 [c0000000ac507c80] .write_sysrq_trigger at c000000000105478
#4 [c0000000ac507d00] .vfs_write at c0000000000b72ec
#5 [c0000000ac507d90] .sys_write at c0000000000b74c4
#6 [c0000000ac507e30] syscall_exit at c0000000000086f8
syscall [c01] exception frame:
R0: 0000000000000004 R1: 00000000ffd109d0 R2: 000000004001ee60
R3: 0000000000000001 R4: 000000001004f4a8 R5: 0000000000000002
R6: 000000001004f3a8 R7: 0000000000000011 R8: 000000001004f530
R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000000 R13: 000000001004c9d8
NIP: 000000000ff691e8 MSR: 000000000200f032 OR3: 0000000000000001
CTR: 00000000100040ec LR: 000000001000432c XER: 0000000020000000
CCR: 0000000048008448 MQ: c00000000077a4a0 DAR: 00000000100040ec
DSISR: 0000000040000000 Syscall Result: 0000000000000000
crash> set -c 0
PID: 0
COMMAND: "swapper"
TASK: c0000000005a5050 (1 of 2) [THREAD_INFO: c000000000558000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
crash> bt
PID: 0 TASK: c0000000005a5050 CPU: 0 COMMAND: "swapper"
R0: 0000000000000000 R1: c00000000055bd80 R2: c00000000077a4a0
R3: 0000000000000000 R4: c0000000005a5350 R5: 0000000000000002
R6: 0000000024004042 R7: 0000000000000000 R8: c00000000055ba00
R9: c0000000005a4e88 R10: 0000008000000000 R11: 00003fef00100649
R12: 0000000028004028 R13: c0000000005a5b80
NIP: c000000000018648 MSR: 8000000000009032 OR3: 0000000000000000
CTR: 0000000000000000 LR: c0000000000186b8 XER: 0000000020000000
CCR: 0000000044004042 MQ: c0000000005a5050 DAR: c0000000b780b780
DSISR: c0000000000186b8 Syscall Result: 0000000000000000
NIP [c000000000018648] .default_idle
#0 [c00000000055bd80] .default_idle at c0000000000186b8
#1 [c00000000055be00] .cpu_idle at c0000000000184f4
#2 [c00000000055be70] .rest_init at c0000000000092f4
#3 [c00000000055bef0] .start_kernel at c000000000502760
#4 [c00000000055bf90] .hmt_init at c000000000008574
crash> set -c 1
PID: 11076
COMMAND: "kpanic"
TASK: c00000000bc6d800 [THREAD_INFO: c0000000ac504000]
CPU: 1
STATE: TASK_RUNNING (SYSRQ)
crash> bt
PID: 11076 TASK: c00000000bc6d800 CPU: 1 COMMAND: "kpanic"
R0: 0000000000000000 R1: c0000000ac507970 R2: c00000000077a4a0
R3: c0000000ac5079e0 R4: 0000000000000000 R5: 0000000000000000
R6: 756d700d0a657220 R7: 6120637261736864 R8: 0000000000000000
R9: c0000000007b0fa0 R10: 0000000000000000 R11: c0000000007b0fa8
R12: 8000000000001032 R13: c0000000005a5d80 R14: 0000000000000000
R15: 0000000000000000 R16: 00000000100bbf08 R17: 00000000100bbeb8
R18: 0000000010070000 R19: 0000000000000000 R20: 0000000010046720
R21: 000000000000001f R22: 00000000100040e8 R23: 0000000010004d74
R24: 8000000000009032 R25: 0000000000000000 R26: 0000000000000000
R27: 0000000000000063 R28: 0000000000000009 R29: 0000000000000000
R30: c0000000005e1560 R31: c0000000b96dd000
NIP: c0000000000777a8 MSR: 8000000000001032 OR3: c0000000ac7202f8
CTR: c000000000278b04 LR: c000000000278b18 XER: 0000000000000000
CCR: c0000000ac507b90 MQ: 0000000000000000 DAR: 0000000000000063
DSISR: 0000000000000009 Syscall Result: 0000000000000000
NIP [c0000000000777a8] .crash_kexec
LR [c000000000278b18] .sysrq_handle_crashdump
#0 [c0000000ac507970] .crash_kexec at c0000000000777d0
#1 [c0000000ac507b50] .sysrq_handle_crashdump at c000000000278b18
#2 [c0000000ac507bd0] .__handle_sysrq at c0000000002789c0
#3 [c0000000ac507c80] .write_sysrq_trigger at c000000000105478
#4 [c0000000ac507d00] .vfs_write at c0000000000b72ec
#5 [c0000000ac507d90] .sys_write at c0000000000b74c4
#6 [c0000000ac507e30] syscall_exit at c0000000000086f8
syscall [c01] exception frame:
R0: 0000000000000004 R1: 00000000ffd109d0 R2: 000000004001ee60
R3: 0000000000000001 R4: 000000001004f4a8 R5: 0000000000000002
R6: 000000001004f3a8 R7: 0000000000000011 R8: 000000001004f530
R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000000 R13: 000000001004c9d8
NIP: 000000000ff691e8 MSR: 000000000200f032 OR3: 0000000000000001
CTR: 00000000100040ec LR: 000000001000432c XER: 0000000020000000
CCR: 0000000048008448 MQ: c00000000077a4a0 DAR: 00000000100040ec
DSISR: 0000000040000000 Syscall Result: 0000000000000000
crash>
This issue probably shows up on your system (which has 8 CPUs) and not on
mine because my system has only 2 CPUs. We need to investigate.
That's all I could think of as well. Rachita also didn't mention
whether a "set <task|pid>" of that same task, followed by a
backtrace, works. But a crash-gdb backtrace would be helpful.
Dave, I have tested very few commands on the PPC64 vmcore, whereas Rachita
is doing more testing. We might see some bugs that I have not encountered.
We will get back to you with patches as we find bugs.
That's understood and not a problem -- especially on kernels
that are beyond the RHEL4 era. Do you want me to go ahead
and put out a new release with your paca fix?
Dave