Dave Anderson wrote:
Haren Myneni wrote:
> Rachita Kothiyal wrote:
>
> >On Thu, Feb 23, 2006 at 09:49:37AM -0500, Dave Anderson wrote:
> >
> >
> >>Ok, then I guess I'll take that as a thumbs-up.
> >>
> >>Waiting on Rachita's go-ahead...
> >>
> >>
> >
> >Dave,
> >
> >After the application of the patch (posted by Haren)
> >on crash-4.0-2.21, I am now able to open the dump using crash
> >for analysis.
> >
> >The following may be unrelated to the present discussion, but
> >it is an observation:
> >
> >When I do 'bt -a' I get the following error on one of the cpus:
> >
> >PID: 2871 TASK: c000000161d05800 CPU: 4 COMMAND: "klogd"
> >bt: invalid kernel virtual address: ff807a50 type: "Regs NIP value"
> >
> >
> Rachita,
> As I mentioned before, this task should be running in user space.
> You should see a similar kind of stack trace even when using GDB. It
> would be better to give a proper error message here.
>
>
Is ff807a50 typically a legitimate user-space stack address
in ppc64 user VM? You could probably run the address
through IN_TASK_VMA(), and if it is a valid user-space
stack address, just indicate that the process was running
in user-space.
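
A minimal sketch of the kind of check suggested above, for illustration only. It does not call crash's IN_TASK_VMA(); instead it assumes a hypothetical `struct vma_range` array standing in for a task's user VM areas, and a hypothetical helper `addr_in_task_vmas()` that reports whether an address falls inside any of them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one user VM area of a task: [start, end). */
struct vma_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return true if addr falls inside any of the given VM areas.
 * This only mimics the question IN_TASK_VMA() is meant to answer;
 * it is not crash's implementation.
 */
bool addr_in_task_vmas(const struct vma_range *vmas, size_t count,
                       uint64_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= vmas[i].start && addr < vmas[i].end)
            return true;
    }
    return false;
}
```

If the address from the failing register set is found in one of the task's areas, the backtrace code could report that the process was running in user space rather than fail with "invalid kernel virtual address".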
ff807a50 is in user space. Yes, kernel addresses on PPC64 start at
c000000000000000. Not only should we print a message saying "running
in user space", but we should also display the traces for the other
active tasks. That is a bug too. I will send the fix ASAP.
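
A rough sketch of the behaviour described above, not the actual fix. The kernel base value is the one stated in this thread; `struct active_task`, `is_kernel_addr()` and `report_active_tasks()` are hypothetical names made up for this example. A task whose NIP is below the kernel base is reported as running in user space, and the loop carries on with the remaining CPUs instead of stopping.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Start of the ppc64 kernel virtual address range, as stated in the thread. */
#define PPC64_KERNEL_BASE 0xc000000000000000ULL

/* True if addr lies at or above the kernel base. */
bool is_kernel_addr(uint64_t addr)
{
    return addr >= PPC64_KERNEL_BASE;
}

/* Hypothetical per-CPU record: the active task's PID and saved NIP. */
struct active_task {
    int pid;
    uint64_t nip;
};

/*
 * Walk the active task of every CPU. A user-space NIP produces a message
 * and the loop continues with the next CPU. Returns the number of tasks
 * that would get a kernel backtrace.
 */
int report_active_tasks(const struct active_task *tasks, int ncpus)
{
    int kernel_traces = 0;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (!is_kernel_addr(tasks[cpu].nip)) {
            printf("PID: %d CPU: %d (running in user space)\n",
                   tasks[cpu].pid, cpu);
            continue;
        }
        printf("PID: %d CPU: %d NIP: %016llx\n", tasks[cpu].pid, cpu,
               (unsigned long long)tasks[cpu].nip);
        kernel_traces++;
    }
    return kernel_traces;
}
```

With the NIP value ff807a50 reported for klogd, `is_kernel_addr()` returns false, so that task would be flagged as running in user space while the other CPUs are still processed.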
Now I understand why you (ppc64) dump the register set
first, because all the other processor types would show
a stack trace emanating from user-space down into the
reception of the IP interrupt issued by the panicking
processor.
Yes, displaying regs is already part of the stack trace on other archs.
>
> About your other issue: I could not reproduce it.
>
> crash 4.0-2.21
> Copyright (C) 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005 Fujitsu Limited
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public
> License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for
> details.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "powerpc64-unknown-linux-gnu"...
>
> crash: pglist_data.node_mem_map structure member does not exist.
> crash: certain memory-related commands will fail or display invalid data
>
> KERNEL: /home/hbabu/2616-rc2-k1/vmlinux
> DUMPFILE: /home/vmcore_2616_rc2_0207
> CPUS: 2
> DATE: Tue Feb 7 16:56:08 2006
> UPTIME: 00:00:09
> LOAD AVERAGE: 0.05, 0.24, 0.12
> TASKS: 57
> NODENAME: elm3a135
> RELEASE: 2.6.16-rc2-kexec-k1
> VERSION: #6 SMP Tue Feb 7 16:46:10 PST 2006
> MACHINE: ppc64 (unknown Mhz)
> MEMORY: 2.9 GB
> PANIC: "SysRq : Trigger a crashdump"
> PID: 11076
> COMMAND: "kpanic"
> TASK: c00000000bc6d800 [THREAD_INFO: c0000000ac504000]
> CPU: 1
> STATE: TASK_RUNNING (SYSRQ)
>
> crash> bt
> PID: 11076 TASK: c00000000bc6d800 CPU: 1 COMMAND: "kpanic"
>
> R0: 0000000000000000 R1: c0000000ac507970 R2: c00000000077a4a0
> R3: c0000000ac5079e0 R4: 0000000000000000 R5: 0000000000000000
> R6: 756d700d0a657220 R7: 6120637261736864 R8: 0000000000000000
> R9: c0000000007b0fa0 R10: 0000000000000000 R11: c0000000007b0fa8
> R12: 8000000000001032 R13: c0000000005a5d80 R14: 0000000000000000
> R15: 0000000000000000 R16: 00000000100bbf08 R17: 00000000100bbeb8
> R18: 0000000010070000 R19: 0000000000000000 R20: 0000000010046720
> R21: 000000000000001f R22: 00000000100040e8 R23: 0000000010004d74
> R24: 8000000000009032 R25: 0000000000000000 R26: 0000000000000000
> R27: 0000000000000063 R28: 0000000000000009 R29: 0000000000000000
> R30: c0000000005e1560 R31: c0000000b96dd000
> NIP: c0000000000777a8 MSR: 8000000000001032 OR3: c0000000ac7202f8
> CTR: c000000000278b04 LR: c000000000278b18 XER: 0000000000000000
> CCR: c0000000ac507b90 MQ: 0000000000000000 DAR: 0000000000000063
> DSISR: 0000000000000009 Syscall Result: 0000000000000000
> NIP [c0000000000777a8] .crash_kexec
> LR [c000000000278b18] .sysrq_handle_crashdump
>
> #0 [c0000000ac507970] .crash_kexec at c0000000000777d0
> #1 [c0000000ac507b50] .sysrq_handle_crashdump at c000000000278b18
> #2 [c0000000ac507bd0] .__handle_sysrq at c0000000002789c0
> #3 [c0000000ac507c80] .write_sysrq_trigger at c000000000105478
> #4 [c0000000ac507d00] .vfs_write at c0000000000b72ec
> #5 [c0000000ac507d90] .sys_write at c0000000000b74c4
> #6 [c0000000ac507e30] syscall_exit at c0000000000086f8
> syscall [c01] exception frame:
> R0: 0000000000000004 R1: 00000000ffd109d0 R2: 000000004001ee60
> R3: 0000000000000001 R4: 000000001004f4a8 R5: 0000000000000002
> R6: 000000001004f3a8 R7: 0000000000000011 R8: 000000001004f530
> R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
> R12: 0000000000000000 R13: 000000001004c9d8
> NIP: 000000000ff691e8 MSR: 000000000200f032 OR3: 0000000000000001
> CTR: 00000000100040ec LR: 000000001000432c XER: 0000000020000000
> CCR: 0000000048008448 MQ: c00000000077a4a0 DAR: 00000000100040ec
> DSISR: 0000000040000000 Syscall Result: 0000000000000000
>
> crash> set -c 0
> PID: 0
> COMMAND: "swapper"
> TASK: c0000000005a5050 (1 of 2) [THREAD_INFO: c000000000558000]
> CPU: 0
> STATE: TASK_RUNNING (ACTIVE)
> crash> bt
> PID: 0 TASK: c0000000005a5050 CPU: 0 COMMAND: "swapper"
>
> R0: 0000000000000000 R1: c00000000055bd80 R2: c00000000077a4a0
> R3: 0000000000000000 R4: c0000000005a5350 R5: 0000000000000002
> R6: 0000000024004042 R7: 0000000000000000 R8: c00000000055ba00
> R9: c0000000005a4e88 R10: 0000008000000000 R11: 00003fef00100649
> R12: 0000000028004028 R13: c0000000005a5b80
> NIP: c000000000018648 MSR: 8000000000009032 OR3: 0000000000000000
> CTR: 0000000000000000 LR: c0000000000186b8 XER: 0000000020000000
> CCR: 0000000044004042 MQ: c0000000005a5050 DAR: c0000000b780b780
> DSISR: c0000000000186b8 Syscall Result: 0000000000000000
> NIP [c000000000018648] .default_idle
>
> #0 [c00000000055bd80] .default_idle at c0000000000186b8
> #1 [c00000000055be00] .cpu_idle at c0000000000184f4
> #2 [c00000000055be70] .rest_init at c0000000000092f4
> #3 [c00000000055bef0] .start_kernel at c000000000502760
> #4 [c00000000055bf90] .hmt_init at c000000000008574
> crash> set -c 1
> PID: 11076
> COMMAND: "kpanic"
> TASK: c00000000bc6d800 [THREAD_INFO: c0000000ac504000]
> CPU: 1
> STATE: TASK_RUNNING (SYSRQ)
> crash> bt
> PID: 11076 TASK: c00000000bc6d800 CPU: 1 COMMAND: "kpanic"
>
> R0: 0000000000000000 R1: c0000000ac507970 R2: c00000000077a4a0
> R3: c0000000ac5079e0 R4: 0000000000000000 R5: 0000000000000000
> R6: 756d700d0a657220 R7: 6120637261736864 R8: 0000000000000000
> R9: c0000000007b0fa0 R10: 0000000000000000 R11: c0000000007b0fa8
> R12: 8000000000001032 R13: c0000000005a5d80 R14: 0000000000000000
> R15: 0000000000000000 R16: 00000000100bbf08 R17: 00000000100bbeb8
> R18: 0000000010070000 R19: 0000000000000000 R20: 0000000010046720
> R21: 000000000000001f R22: 00000000100040e8 R23: 0000000010004d74
> R24: 8000000000009032 R25: 0000000000000000 R26: 0000000000000000
> R27: 0000000000000063 R28: 0000000000000009 R29: 0000000000000000
> R30: c0000000005e1560 R31: c0000000b96dd000
> NIP: c0000000000777a8 MSR: 8000000000001032 OR3: c0000000ac7202f8
> CTR: c000000000278b04 LR: c000000000278b18 XER: 0000000000000000
> CCR: c0000000ac507b90 MQ: 0000000000000000 DAR: 0000000000000063
> DSISR: 0000000000000009 Syscall Result: 0000000000000000
> NIP [c0000000000777a8] .crash_kexec
> LR [c000000000278b18] .sysrq_handle_crashdump
>
> #0 [c0000000ac507970] .crash_kexec at c0000000000777d0
> #1 [c0000000ac507b50] .sysrq_handle_crashdump at c000000000278b18
> #2 [c0000000ac507bd0] .__handle_sysrq at c0000000002789c0
> #3 [c0000000ac507c80] .write_sysrq_trigger at c000000000105478
> #4 [c0000000ac507d00] .vfs_write at c0000000000b72ec
> #5 [c0000000ac507d90] .sys_write at c0000000000b74c4
> #6 [c0000000ac507e30] syscall_exit at c0000000000086f8
> syscall [c01] exception frame:
> R0: 0000000000000004 R1: 00000000ffd109d0 R2: 000000004001ee60
> R3: 0000000000000001 R4: 000000001004f4a8 R5: 0000000000000002
> R6: 000000001004f3a8 R7: 0000000000000011 R8: 000000001004f530
> R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
> R12: 0000000000000000 R13: 000000001004c9d8
> NIP: 000000000ff691e8 MSR: 000000000200f032 OR3: 0000000000000001
> CTR: 00000000100040ec LR: 000000001000432c XER: 0000000020000000
> CCR: 0000000048008448 MQ: c00000000077a4a0 DAR: 00000000100040ec
> DSISR: 0000000040000000 Syscall Result: 0000000000000000
>
> crash>
>
> Probably this issue shows up on your system (which has 8 CPUs) and not
> on mine because my system has only 2 CPUs. We need to investigate.
>
>
That's all I could think of as well. Rachita also didn't mention
whether she could do "set <task|pid>" of that same task and then
get a backtrace. A crash-gdb backtrace would be helpful, though.
It looks like her system has 16 CPUs (I believe with SMT). I also
checked whether enabling SMT would cause a problem with
paca[cpu#].dataoffset. Based on the information I have so far, paca[]
entries are created for SMT threads too.
>
> Dave, I tested very few commands on the PPC64 vmcore, whereas Rachita is
> doing more testing. We might see some bugs which I have not encountered.
> We will get back to you with patches as we find bugs.
>
That's understood and not a problem -- especially on kernels
that are beyond the RHEL4 era. Do you want me to go ahead
and put out a new release with your paca fix?
Sure, if you have some other fixes, or fixes for other archs. Otherwise,
can we wait until early next week? I am wondering what is causing
Rachita's issue. Is it related to the same paca.dataoffset patch? I just
want to make sure.
BTW, I ran the crash tool on a RHEL4 vmcore (not the recent RHEL4 update
version) to see whether I am breaking backward compatibility. It needs a
small fix that I somehow overlooked. Sorry. That is probably the reason I
saved one RHEL4 vmcore and the corresponding vmlinux.debug.
Thanks
Haren
Dave
------------------------------------------------------------------------
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility