- Fix for backtrace failures with kdump ELF vmcores when cpus were
offlined before the crash. Kdump ELF vmcores contain NT_PRSTATUS notes
for online cpus only, so if cpus have been offlined prior to a crash,
there will be fewer notes than the number of cpus in the system, and
therefore there will not be a one-to-one correlation between each cpu
and its associated NT_PRSTATUS note. This caused backtrace failures on
architectures like ppc64 that depend upon the contents of the
NT_PRSTATUS notes for gathering the starting stack location.
(chandru(a)in.ibm.com, anderson(a)redhat.com)
- Fix and enhancement for the "dev" command. When the command was run
against 2.6.26 or later kernels, it would fail with the error message
"dev: invalid structure member offset: char_device_struct_fops".
Additionally, even when the command did work, more often than not it
would fail to determine the file_operations structure associated with
the block or character device, and erroneously display "(none)" or
"(unused)". This patch performs a more comprehensive search for the
file_operations structure, and instead of just displaying its address
and symbolic translation, it will display the address of the data
structure that contains the pointer to the file_operations structure,
along with the symbolic translation of the file_operations structure.
For character devices, the containing structure is a "cdev", and for
block devices the containing structure is a "gendisk". The command
output adds new CDEV and GENDISK columns, and under the OPERATIONS
column is the symbolic translation of its file_operations structure.
(anderson(a)redhat.com, bob.montgomery(a)hp.com)
- Fix for a potential segmentation violation when running "foreach bt"
on a very active live system with many processes starting and ending.
Without the patch, a segmentation violation could occur when a "bt"
was attempted on a task that had become non-existent. This would
happen on x86_64 or ppc64 machines, and was due to the use of a
kernel stack pointer taken from a stale/invalid task_struct. The
command will now recognize the bad stack pointer and display the
error message "bt: task no longer exists" or "bt: invalid/stale
stack pointer for this task: <address>".
(anderson(a)redhat.com)
- Fix to correctly read LKCD Version 8 and later x86 dumpfile headers.
(talk90091e(a)gmail.com)
- If a kdump NMI issued to a non-crashing x86_64 cpu was received while
running in schedule(), after having set the next task as "current" in
the cpu's runqueue, but prior to changing the kernel stack to that of
the next task, then a backtrace would fail to make the transition
from the NMI exception stack back to the process stack, with the
error message "bt: cannot transition from exception stack to current
process stack". This patch will report inconsistencies found between
a task marked as the current task in a cpu's runqueue, and the task
found in the per-cpu x8664_pda "pcurrent" field (2.6.29 and earlier)
or the per-cpu "current_task" variable (2.6.30 and later). If it can
be safely determined that the runqueue setting (used by default) is
premature, then the crash utility's internal per-cpu active task will
be changed to the task indicated by the appropriate
architecture-specific value. Also, a new "set -a <task>" option has been added
to manually set a task to be the "active" task on its cpu.
(anderson(a)redhat.com)
- Fix for x86_64 "bt" command when transitioning from the IRQ stack
back to the process stack on 2.6.29 and later kernels. Without the
patch, the interrupt exception frame address on the process stack
would be incorrectly determined, and its display would typically be
preceded by "[exception RIP: unknown or invalid address]", and the
backtrace would fail from that point on.
(anderson(a)redhat.com)
- Enhancement to the "runq" command to show the current task in each
cpu's runqueue, plus a few formatting changes to make the output
easier to understand.
(anderson(a)redhat.com)
- Fix for a memory leak when running on live systems, due to the
repetitive reallocation of the internal array of active tasks.
(anderson(a)redhat.com)
- Fix for usage with vmlinux debuginfo files using Dwarf 3 format,
for example, the Fedora 2.6.31-0.24.rc0.git18.fc12 kernel. Without
the patch, the crash session fails during initialization with the
error message: "Dwarf Error: wrong version in compilation unit header
(is 3, should be 2) [in module <path-to>/vmlinux]", followed by
the erroneous message "crash: <path-to>/vmlinux: no debugging
data available". The patch simply accepts the Dwarf 3 header, and
the embedded gdb-6.1 version still appears to work with the updated
vmlinux debuginfo file format.
(anderson(a)redhat.com)
- Fix for an invocation failure when a System.map file is used as
an argument with a compressed diskdump or compressed kdump dumpfile.
If the System.map argument appears after the vmcore file on the
command line, as in: "crash vmcore System.map vmlinux", the crash
session fails immediately with the error message: "crash: vmcore:
initialization failed". With the patch, the arguments may be entered
in any order.
(anderson(a)redhat.com)
- Fix for a potential segmentation violation during invocation if a
vmcore file, a System.map file, and a non-matching vmlinux file are
used as command line arguments. The problem is that whenever a
System.map file is used, it is presumed that the user knows what he
is doing, and that the vmlinux file is not the same as the kernel
that generated the vmcore; therefore the vmlinux/vmcore matching and
verification routines are not performed. However, if the kernel data
structures in the non-matching vmlinux vary widely enough from the
kernel that generated the vmcore, all manner of bogus data may be
read and consumed. The reported segmentation violation occurred when
using a vmcore created from a "stock" Red Hat kernel with a vmlinux
file from a Red Hat "debug" kernel, where the kernel data structures
are significantly different. The patch adds several new defensive
mechanisms and displays additional warning messages when invalid or
questionable data is read; as a result, the crash session will fail
in a more reasonable manner.
(anderson(a)redhat.com)
- Adjusted several virtual and physical memory address definitions for
2.6.31 x86_64 kernels: MAX_PHYSMEM_BITS, VMALLOC_START, VMALLOC_END,
VMEMMAP_VADDR, VMEMMAP_END, MODULES_VADDR and MODULES_END. Without
the patch, when run against CONFIG_SPARSEMEM_VMEMMAP 2.6.31 kernels,
the "kmem -i" option would hang, and when run against CONFIG_SLUB and
CONFIG_SPARSEMEM_VMEMMAP 2.6.31 kernels, the "kmem -s" option would
report numerous errors indicating "kmem: read error: kernel virtual
address: <address> type: page inuse", where the <address> was
a legitimate virtual-memmap page structure address.
(anderson(a)redhat.com)
- Improvement for CONFIG_SLUB "kmem -s" or "kmem -S" options when an
invalid slab page link address is encountered. Without the patch,
the commands would fail with a generic "invalid kernel virtual address"
read error message, and "kmem -s" would not display any previously
collected statistics. With the patch, the error message displays
the slab cache name, the list type, and the invalid pointer found,
for example, "kmem: dentry: partial list: page.lru.next: 100100".
(anderson(a)redhat.com)
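The NT_PRSTATUS entry turns on the difference between a note's position in the vmcore and the cpu it belongs to. The following is a minimal C sketch of that mapping, assuming the notes appear in ascending order of online cpu and that an online-cpu mask has already been read from the kernel; `map_prstatus_notes()` and `NR_CPUS` are hypothetical names for illustration, not crash's internal API.

```c
#include <stddef.h>

#define NR_CPUS 8  /* hypothetical: size of the cpu map for this sketch */

/*
 * Map NT_PRSTATUS notes to cpus. Assumes the notes were written in
 * ascending cpu order, one per *online* cpu, so the Nth note belongs to
 * the Nth online cpu rather than to cpu N. Offline cpus get NULL.
 * Returns the number of notes mapped, or -1 if the note count does not
 * match the number of online cpus (the mapping would be ambiguous).
 */
static int map_prstatus_notes(const void *notes[], int num_notes,
                              const int online[NR_CPUS],
                              const void *per_cpu[NR_CPUS])
{
    int online_count = 0;

    /* Count online cpus; each one should have contributed one note. */
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (online[cpu])
            online_count++;
    if (online_count != num_notes)
        return -1;

    /* Hand out notes in order, skipping offline cpus. */
    int next = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        per_cpu[cpu] = online[cpu] ? notes[next++] : NULL;
    return next;
}
```

With cpus 0, 2 and 3 online, three notes map to cpus 0, 2 and 3, and the remaining cpus get NULL; indexing the notes directly by cpu number would instead hand cpu 2's registers to cpu 1.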
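The "foreach bt" entry describes rejecting a task before its kernel stack is walked. The following C sketch shows one way such a check could be structured, assuming a freshly read list of live task addresses and a fixed-size kernel stack; the function names, the enum, and the 8 KB THREAD_SIZE are hypothetical and not taken from crash's source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE 8192UL  /* hypothetical: 8 KB kernel stacks */

enum bt_check { BT_OK, BT_TASK_GONE, BT_STALE_SP };

/* Is the task address still present in a freshly read task list? */
static bool task_exists(uint64_t task, const uint64_t *live, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (live[i] == task)
            return true;
    return false;
}

/* Validate a task and its saved stack pointer before unwinding. */
static enum bt_check check_bt_task(uint64_t task, const uint64_t *live,
                                   size_t n, uint64_t stack_base,
                                   uint64_t sp)
{
    if (!task_exists(task, live, n))
        return BT_TASK_GONE;
    /* sp must fall inside [stack_base, stack_base + THREAD_SIZE);
     * the subtraction form avoids overflowing stack_base + THREAD_SIZE. */
    if (sp < stack_base || sp - stack_base >= THREAD_SIZE)
        return BT_STALE_SP;
    return BT_OK;
}
```

In this sketch, BT_TASK_GONE and BT_STALE_SP would correspond to the "bt: task no longer exists" and "bt: invalid/stale stack pointer for this task" messages; checking membership first means a stack pointer is never trusted from a task_struct that has already been freed.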
Download from:
http://people.redhat.com/anderson