On Tue, 2006-04-25 at 14:42 -0400, Dave Anderson wrote:
Badari Pulavarty wrote:
> Hi,
>
> I get the following crash warnings on an x86-64 machine. Wondering why?
> Also, it's not showing stacks correctly.
>
> Thanks,
> Badari
>
> # ./crash /var/log/dump/2006-04-24-08:02/vmcore /usr/src/linux/vmlinux
>
> crash 4.0-2.23
> Copyright (C) 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005 Fujitsu Limited
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public
> License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for
> details.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> WARNING: possibly bogus exception frame
> KERNEL: /usr/src/linux/vmlinux
> DUMPFILE: /var/log/dump/2006-04-24-08:02/vmcore
> CPUS: 2
> DATE: Mon Apr 24 08:02:03 2006
> UPTIME: 00:06:31
> LOAD AVERAGE: 0.00, 0.00, 0.00
> TASKS: 63
> NODENAME: elm3a242
> RELEASE: 2.6.16-20-smp
> VERSION: #1 SMP Mon Apr 10 04:51:13 UTC 2006
> MACHINE: x86_64 (3000 Mhz)
> MEMORY: 4.6 GB
> PANIC: "SysRq : Trigger a crashdump"
> PID: 0
> COMMAND: "swapper"
> TASK: ffffffff80335340 (1 of 2) [THREAD_INFO: ffffffff8045c000]
> CPU: 0
> STATE: TASK_RUNNING (ACTIVE)
>
> crash> bt
> PID: 0 TASK: ffffffff80335340 CPU: 0 COMMAND: "swapper"
> #0 [ffffffff8045dee8] schedule at ffffffff802cf6fa
It's hard to debug this from here, but...

Two things look strange: (1) it's not finding the proper starting
point for the panicking (?) idle thread -- and possibly not even finding
the correct panic task, and (2) the "possibly bogus exception
frame" messages are due to the x86_64.c x86_64_eframe_verify()
function finding something irregular in the exception frames (the
pt_regs) of several processes while it searched all
possible processes for the panic task.

I would guess that if you do a "foreach bt", you will see the
"possibly bogus" messages associated with the user-space
exception frames of all processes started from user space
(i.e. not kernel threads). It would be interesting to see what
those frames look like, and why they are considered strange;
perhaps a new cs or ss value that's never been used before?

As for the determination of the panic task, I'm presuming
that this was generated from a kdump dumpfile. The netdump.c
get_netdump_panic_task() function, which has a bunch of
kdump-specific code, is failing to find the panic task from the
data in the ELF header notes. Running "crash -d1 ..." will indicate
how crash is trying to determine the panic task. I don't know
whether the idle task was even the one that took the sysrq,
or whether it just defaulted to that task because it couldn't find
any other likely suspects. You'll have to debug it from your
end, starting from get_netdump_panic_task().

Dave
Dave,

I added a little debug code and found that x86_64_eframe_verify() returns
FALSE due to !(rflags & 0x2) (rflags = 0x200 in this dump).
Given that "crash" runs fine on a live machine, I am going to assume
that it's a problem with the kdump format for now :(
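
To make the failing condition concrete, here is a minimal sketch of that one test. It is not the actual crash source: the helper name rflags_looks_valid() and the macro names are made up for illustration, and the real x86_64_eframe_verify() checks other fields as well. The only facts assumed are architectural: bit 1 of RFLAGS is reserved and always reads as 1, and 0x200 is the interrupt-enable flag (IF).

```c
#include <stdint.h>

/* Architectural x86 RFLAGS bits (names are illustrative, not crash's). */
#define RFLAGS_RESERVED_BIT1 0x2ULL   /* bit 1: reserved, always reads as 1 */
#define RFLAGS_IF            0x200ULL /* bit 9: interrupt-enable flag */

/*
 * Hypothetical helper mirroring the !(rflags & 0x2) test quoted above.
 * Returns 1 if the saved rflags value has the always-set bit 1,
 * 0 if the value cannot have come from a real register save.
 */
int rflags_looks_valid(uint64_t rflags)
{
    return (rflags & RFLAGS_RESERVED_BIT1) != 0;
}
```

With the value from this dump, rflags_looks_valid(0x200) returns 0: IF is set but bit 1 is clear, which the hardware never produces in a saved register image. A value such as 0x202 (IF plus the reserved bit) would pass, so the frames in the vmcore look as if they were not filled in from a genuine register save.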
Thanks,
Badari