[crash-utility] [lkcd-devel] Patch to add LKCD vmcore validation feature
by Vitaly Kuzmichev
Hello,
Attached is a patch that adds a separate tool for validating LKCD netdumps
and blockdumps.
We are planning to add this feature to our fork of crash-3.10.
Our customers requested this feature, and we have also found that 'crash'
does not print any warnings when someone tries to load an incomplete
vmcore. They need a simple way to verify whether a core file generated by
LKCD is complete.
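A validator like this has to decide when a dump counts as "complete". One possible approach, sketched below, is to walk the page records that follow the dump header and require an end-of-dump record before EOF. The record layout (struct page_rec_hdr) and the REC_FLAG_END value are hypothetical placeholders, not the actual LKCD on-disk definitions; a real tool would use the headers shipped with LKCD.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL page-record header: a placeholder, NOT the real LKCD layout. */
struct page_rec_hdr {
    uint64_t address;  /* physical address of the page */
    uint32_t size;     /* bytes of page data following this header */
    uint32_t flags;    /* record flags */
};

#define REC_FLAG_END 0x4u  /* assumed "end of dump" flag bit (placeholder) */

/*
 * Walk the page records starting at 'off'. Return 1 if an end record is
 * reached and every earlier record is fully present, 0 if the data runs
 * out first (a truncated, i.e. incomplete, dump).
 */
static int dump_stream_is_complete(const unsigned char *buf, size_t len,
                                   size_t off)
{
    struct page_rec_hdr h;

    while (off + sizeof(h) <= len) {
        memcpy(&h, buf + off, sizeof(h));  /* avoid unaligned access */
        if (h.flags & REC_FLAG_END)
            return 1;
        off += sizeof(h);
        if (h.size > len - off)
            return 0;  /* page data cut off mid-record */
        off += h.size;
    }
    return 0;  /* EOF before any end record */
}
```

Run against a truncated file, the walk runs out of data before it reaches the end record and reports the dump as incomplete, which is exactly the warning that crash itself does not print.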
--
Best regards,
Vitaly Kuzmichev, Software Engineer,
MontaVista Software, LLC.
14 years, 5 months
Re: [Crash-utility] crash can't handle virsh dump file
by Dave Anderson
----- "Gui Jianfeng" <guijianfeng(a)cn.fujitsu.com> wrote:
> Paolo Bonzini wrote:
> > On 05/28/2010 11:19 AM, Gui Jianfeng wrote:
> >> Gui Jianfeng wrote:
> >>> Hi all,
> >>>
> >>> I made use of "virsh dump" to generate a dumpfile, but crash seems to fail at initialization time.
> >>> I decoded the dumpfile and found there's a "block" header section, but it seems crash doesn't support
> >>> such a section, so it failed. Am I missing something?
> >>
> >> Can anyone help? How can I use crash to check the dumpfile?
> >
> > Something like the attached should do it (untested because I don't know
> > where the crash upstream repo is, though I have likely asked this already).
>
> Thanks for sharing, I'll try it. :)
>
> Thanks,
> Gui
>
> >
> > Thanks,
> >
> > Paolo
> >
I missed all the excitement by taking today off. Can you let
us know how Paolo's patch worked for you?
Thanks,
Dave
14 years, 7 months
crash can't handle virsh dump file
by Gui Jianfeng
Hi all,
I made use of "virsh dump" to generate a dumpfile, but crash seems to fail at initialization time.
I decoded the dumpfile and found there's a "block" header section, but it seems crash doesn't support
such a section, so it failed. Am I missing something?
--
Regards
Gui Jianfeng
14 years, 7 months
Re: [Crash-utility] crash can't handle virsh dump file
by Gui Jianfeng
Paolo Bonzini wrote:
> On 05/28/2010 11:19 AM, Gui Jianfeng wrote:
>> Gui Jianfeng wrote:
>>> Hi all,
>>>
>>> I made use of "virsh dump" to generate a dumpfile, but crash seems to fail at initialization time.
>>> I decoded the dumpfile and found there's a "block" header section, but it seems crash doesn't support
>>> such a section, so it failed. Am I missing something?
>>
>> Can anyone help? How can I use crash to check the dumpfile?
>
> Something like the attached should do it (untested because I don't know
> where the crash upstream repo is, though I have likely asked this already).
Thanks for sharing, I'll try it. :)
Thanks,
Gui
>
> Thanks,
>
> Paolo
>
14 years, 7 months
Re: [Crash-utility] Why are there two ways of getting register values for active tasks?
by Dave Anderson
----- "Daisuke HATAYAMA" <d.hatayama(a)jp.fujitsu.com> wrote:
> Hi Dave.
>
> Well, I still have a question: does the kdump-compressed format contain
> register values for CPUs?
>
> I've looked into the part of makedumpfile that reads the ELF vmcore but
> haven't found them yet. It appears to me that makedumpfile ignores all note
> info except for vmcoreinfo's location.
That's correct, there are no per-cpu register values. From the crash
utility's perspective, all it gets from the makedumpfile-generated
compressed dumpfile is the disk_dump_header and kdump_sub_header:
struct disk_dump_header {
        char signature[SIG_LEN];        /* = "DISKDUMP" */
        int header_version;             /* Dump header version */
        struct new_utsname utsname;     /* copy of system_utsname */
        struct timeval timestamp;       /* Time stamp */
        unsigned int status;            /* Above flags */
        int block_size;                 /* Size of a block in byte */
        int sub_hdr_size;               /* Size of arch dependent
                                           header in blocks */
        unsigned int bitmap_blocks;     /* Size of Memory bitmap in
                                           block */
        unsigned int max_mapnr;         /* = max_mapnr */
        unsigned int total_ram_blocks;  /* Number of blocks should be
                                           written */
        unsigned int device_blocks;     /* Number of total blocks in
                                         * the dump device */
        unsigned int written_blocks;    /* Number of written blocks */
        unsigned int current_cpu;       /* CPU# which handles dump */
        int nr_cpus;                    /* Number of CPUs */
        struct task_struct *tasks[0];
};

struct kdump_sub_header {
        unsigned long phys_base;
        int dump_level;                 /* header_version 1 and later */
        int split;                      /* header_version 2 and later */
        unsigned long start_pfn;        /* header_version 2 and later */
        unsigned long end_pfn;          /* header_version 2 and later */
};
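Given these header fields, a reader can locate the rest of the file. The sketch below assumes the usual kdump-compressed layout (header in block 0, the kdump_sub_header in the following sub_hdr_size blocks, then bitmap_blocks blocks of memory bitmap, then the page descriptors) and assumes the 8-byte signatures "DISKDUMP" and "KDUMP   "; the helper names are hypothetical, not crash's own functions.

```c
#include <stdint.h>
#include <string.h>

#define SIG_LEN 8  /* assumed: length of "DISKDUMP" / "KDUMP   " */

/* Nonzero if the first SIG_LEN bytes look like a diskdump or a
 * makedumpfile (kdump-compressed) signature. */
static int has_dump_signature(const char *sig)
{
    return memcmp(sig, "DISKDUMP", SIG_LEN) == 0 ||
           memcmp(sig, "KDUMP   ", SIG_LEN) == 0;
}

/* File offsets derived from disk_dump_header fields (assumed layout). */
struct dump_layout {
    uint64_t sub_header;  /* kdump_sub_header */
    uint64_t bitmap;      /* start of the memory bitmap */
    uint64_t page_descs;  /* first page descriptor after the bitmap */
};

static struct dump_layout compute_layout(int block_size, int sub_hdr_size,
                                         unsigned int bitmap_blocks)
{
    struct dump_layout l;
    uint64_t bs = (uint64_t)block_size;

    l.sub_header = bs;                                 /* block 1 */
    l.bitmap     = bs * (1 + (uint64_t)sub_hdr_size);  /* after sub-header */
    l.page_descs = l.bitmap + bs * bitmap_blocks;      /* after bitmap */
    return l;
}

/* Per the struct comments, split/start_pfn/end_pfn are only meaningful
 * when header_version is 2 or later. */
static int sub_header_has_split_info(int header_version)
{
    return header_version >= 2;
}
```

For example, with a 4096-byte block_size, a one-block sub-header and a two-block bitmap, the sub-header starts at offset 4096, the bitmap at 8192 and the page descriptors at 16384.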
Dave
14 years, 7 months
[ANNOUNCE] crash version 5.0.4 is available
by Dave Anderson
- Fix for the x86 "bt" command when a newly-forked task's resumption
EIP address value is set to the "ret_from_fork" entry point by
copy_thread(). Without the patch, the backtrace attempt would
display "bt: cannot resolve stack trace", dump the text symbols on
the stack, and display a possible USER-MODE exception frame.
(anderson(a)redhat.com)
- Fix for the x86 "bt" command if the kdump-generated NMI interrupts
a task running in kernel space at a point in the system_call entry
point code prior to the call to a system call function. Without the
patch, the backtrace attempt would display "bt: cannot resolve stack
trace", dump the text symbols on the kernel stack, and display any
"KERNEL-MODE" exception frames followed by a possible "USER-MODE"
exception frame.
(anderson(a)redhat.com)
- Fix for the "bt" command on 2.6.29 and later x86_64 kernels to
recognize and display exception frames generated by exceptions that
do not result in a stack switch, such as general protection faults.
Without the patch, the backtrace would potentially not display the
exception frames because the "error_exit" assembly-code label in
entry_64.S was replaced by the error_exit() entry point.
(anderson(a)redhat.com)
- The kernel patch for ppc64 CONFIG_SPARSEMEM_VMEMMAP kernels that
stores vmemmap page mapping information so that the crash utility
is able to translate vmemmap'd kernel virtual addresses has been
updated. The crash utility patch that was (preemptively) applied
in 5.0.2 for the initial kernel patch needs this update.
(anderson(a)redhat.com)
- Fix the error message for the "dev -p" command when run on 2.6.26
or later kernels, which no longer have the global "pci_devices"
list head. The patch changes the message to show "dev: -p option
not supported or applicable on this architecture or kernel", instead
of the misleading "dev: no PCI devices found on this system" message.
(anderson(a)redhat.com)
- If a cpu in an s390 or s390x dumpfile is offline, and the "bt"
command receives a backtrace request for the "swapper" task on that
cpu, the command will display "CPU offline".
(holzheu(a)linux.vnet.ibm.com)
- Fix for 2.6.34 and later x86_64 kernels which generate per-cpu
symbols of type 'd' or type 'D' instead of type 'V'. Without the
patch, an x86_64 crash session fails during initialization with the
error message "crash: cannot determine idle task addresses from
init_tasks[] or runqueues[]", followed by "crash: cannot resolve
init_task_union". It is unclear why some kernel builds result in
only type 'V' per-cpu symbols, whereas others result in type 'd'
and 'D', so the patch accepts both.
(Kashyap.Desai(a)lsi.com)
- Fix to prevent a segmentation violation during initialization in
the x86_64_get_active_set() function by verifying that the array
of current tasks in machdep->machspec->current[] has actually been
allocated. Theoretically it should never be NULL, but in the
unlikely event that x86_64_per_cpu_init() fails to find the required
per-cpu symbols, it will return without allocating the array.
(anderson(a)redhat.com)
- Fix to support KVM dumpfiles created with "virsh dump" that create
"cpu" header sections using a QEMU CPU_SAVE_VERSION version greater
than the supported version of 9. Without the patch, the crash
session fails during initialization with the error message "crash:
qemu-load.c:501: cpu_init_load_64: Assertion `version_id >= 4 &&
version_id <= 9' failed." The patch now accepts CPU_SAVE_VERSION
values up to 12.
(anderson(a)redhat.com)
- Fix for x86_64 KVM dumpfiles created with "virsh dump" whose kernels
have a "_text" virtual address higher than __START_KERNEL_map.
Without the patch, the physical base address calculation fails,
making the dumpfile unusable.
(anderson(a)redhat.com, pbonzini(a)redhat.com)
- Implemented a new "map" command that is seen only when running with
KVM guest dumpfiles created with "virsh dump". The layout of this
dumpfile format does not allow system memory to be accessed in a
"random-access" manner. Therefore, during session initialization, a
potentially time-consuming dumpfile scan procedure is required to
create a physical-memory-to-file-offset memory map for use during the
session. The new "map" command allows the user to either append the
memory map to the end of the dumpfile, or to create a discrete memory
map file. In either case, the dumpfile scan will not be required
during subsequent sessions. The command's help page may be seen by
entering "crash -h map".
(anderson(a)redhat.com)
- Fix for an incorrect calculation of the physical base address of a
fully-virtualized x86_64 RHEL6 guest kernel running on a RHEL5 Xen
host. Without the patch, the session failed during initialization
with the error messages "crash: cannot determine base kernel version"
and "crash: vmlinux and vmcore do not match!"
(anderson(a)redhat.com)
- Fix for the "bt" command on inactive (blocked) tasks on 2.6.33 and
later x86_64 kernels, which have the "thread_return" symbol removed
from the embedded "switch_to" macro. Without the patch, when run on
blocked tasks, the command would fail with the error message "bt:
cannot resolve thread_return".
(anderson(a)redhat.com)
- Fix for the "bt" command on 2.6.33 and later x86 kernels, which moved
the "system_call" assembly function to the .kprobes.text section.
Without the patch, the command would typically display two invalid
stack frames, both indicating they were in "ia32_sysenter_target".
(anderson(a)redhat.com)
- Fix for a segmentation violation caused by the "extensions/trace.c"
extension module, as seen when running the "trace show -c <cpu>"
command from that module.
(laijs(a)cn.fujitsu.com)
- Implemented a "trace dump -t" command for the "extensions/trace.c"
extension module. The module already has a "trace show" command
to show what events happened before the system crashed, but it
is only 1000 lines of code and is not as complete as the related
"trace-cmd report" command from trace-cmd(1). The new extension
module command generates a "trace.dat" file, which in turn can be
read by the "trace-cmd report" command of trace-cmd(1). So this
patch improves both the crash trace command and trace-cmd(1),
which can now handle ftrace data even if the kernel crashed.
(laijs(a)cn.fujitsu.com)
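The x86_64 per-cpu symbol fix amounts to widening the set of accepted symbol types. The announcement does not say how 'd'/'D' per-cpu symbols are told apart from ordinary data symbols, so the sketch below assumes an additional address-range check against the per-cpu section (with bounds such as __per_cpu_start/__per_cpu_end); is_percpu_symbol() is a hypothetical helper, not crash's code.

```c
#include <stdint.h>

/*
 * Accept 'V' as well as 'd'/'D' symbol types, but only when the address
 * lies inside the per-cpu data section [percpu_start, percpu_end).
 * The range check is an assumption of this sketch.
 */
static int is_percpu_symbol(char type, uint64_t addr,
                            uint64_t percpu_start, uint64_t percpu_end)
{
    if (type != 'V' && type != 'd' && type != 'D')
        return 0;
    return addr >= percpu_start && addr < percpu_end;
}
```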
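The KVM "cpu" header section fix widens the accepted version range from 4..9 to 4..12. As a hypothetical sketch (the macro and function names are not taken from qemu-load.c), the check can be written as a predicate the loader tests, so an unsupported version can be reported instead of aborting the session; whether crash itself moved away from assert() is not stated in the announcement.

```c
#define CPU_LOAD_MIN_VERSION 4   /* lowest "cpu" section version accepted */
#define CPU_LOAD_MAX_VERSION 12  /* raised from 9 by this fix */

/* Nonzero if a QEMU "cpu" section with this version_id can be parsed. */
static int cpu_section_version_ok(int version_id)
{
    return version_id >= CPU_LOAD_MIN_VERSION &&
           version_id <= CPU_LOAD_MAX_VERSION;
}
```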
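The new "map" command implies a structure that turns a physical address into a dumpfile offset without rescanning the file. The announcement does not describe the map's on-disk format, so the sketch below assumes a sorted array of extents (struct mem_extent and paddr_to_offset() are hypothetical) searched with a binary search. That is what makes the one-time scan pay off: each later lookup costs O(log n) instead of a sequential pass over the dumpfile.

```c
#include <stddef.h>
#include <stdint.h>

/* One extent of a hypothetical physical-memory-to-file-offset map. */
struct mem_extent {
    uint64_t paddr;   /* first physical address covered */
    uint64_t len;     /* number of bytes covered */
    uint64_t offset;  /* file offset of paddr in the dumpfile */
};

/*
 * Binary-search extents sorted by paddr. Return 0 and store the file
 * offset in *off on a hit, or -1 if paddr is not in the dumpfile.
 */
static int paddr_to_offset(const struct mem_extent *map, size_t n,
                           uint64_t paddr, uint64_t *off)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct mem_extent *e = &map[mid];

        if (paddr < e->paddr)
            hi = mid;                 /* search lower extents */
        else if (paddr - e->paddr >= e->len)
            lo = mid + 1;             /* search higher extents */
        else {
            *off = e->offset + (paddr - e->paddr);
            return 0;
        }
    }
    return -1;
}
```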
Download from: http://people.redhat.com/anderson
14 years, 7 months
Re: [Crash-utility] backtrace failure on x86_64 and x86 in 2.6.33/34 kernels due to "thread_return" removal
by Dave Anderson
----- "Masami Hiramatsu" <mhiramat(a)redhat.com> wrote:
> Hi Dave,
>
> Are these issues only in the crash tool, or do they occur in the kernel's function backtrace too?
> And how would you fix them?
They are crash issues only, in having to deal with the shifting sands of
the underlying kernel.
In both cases, the problem has always been that assembly-code labels are
stored as text symbols, which is confusing to the backtrace code. And in
both cases, the new kernel changes interfered with the work-arounds put in
place by the crash utility to handle them.
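To see why such labels confuse backtraces, consider how an address is usually turned into "symbol+offset": the nearest symbol at or below it wins. The toy symbol table below (all addresses invented, resolve() a hypothetical helper) shows the effect described in the thread_return commit: with the label present, an address in the latter half of schedule() resolves to thread_return, and once the label is removed it resolves to schedule.

```c
#include <stddef.h>
#include <stdint.h>

struct ksym {
    uint64_t addr;
    const char *name;
};

/* Return the nearest symbol at or below addr in a table sorted by
 * address, or NULL if addr precedes every symbol. */
static const struct ksym *resolve(const struct ksym *tab, size_t n,
                                  uint64_t addr)
{
    const struct ksym *best = NULL;

    for (size_t i = 0; i < n && tab[i].addr <= addr; i++)
        best = &tab[i];
    return best;
}
```

With { schedule @0x100, thread_return @0x180, next_func @0x300 }, address 0x1a0 resolves to thread_return; drop the label and the same address resolves to schedule.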
In any case, it's not a big deal as it's fixable in the crash utility.
Thanks,
Dave
14 years, 7 months
Re: [Crash-utility] Why are there two ways of getting register values for active tasks?
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Daisuke HATAYAMA" <d.hatayama(a)jp.fujitsu.com> wrote:
>
> > Hi, all.
> >
> > I have a question on the implementation of get_netdump_regs_x86_64().
> >
> > Currently, in order to get register values for active tasks, only the
> > panic task makes use of the note information. On the other hand, the other
> > active tasks search the stack frame for registers saved at the NMI
> > switch. However, the crash dump contains note information for every
> > CPU, so I think it is unnecessary to search the stack frame.
>
> Originally it was done that way because the code was written for
> netdump-generated dumpfiles, which only generated note information
> for the panic task. But if I'm not mistaken, given that recent
> kernels do not store debuginfo data for the user_regs_struct, it
> almost always falls through into x86_64_get_stack_frame().
I take that back -- when it's not in the debuginfo, it hardwires
the user_regs_struct data structure information.
That being the case, I don't remember why it is restricted to the
panic task, but it had to have been put in place based upon actual
dumpfiles where it didn't work correctly for a non-panic task.
If I get the time, I'll remove the restriction and run it on
my set of stashed dumpfile examples to see if I can be more
specific.
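For reference, the per-CPU register notes discussed here are ELF notes in the vmcore's PT_NOTE segment; kdump writes an NT_PRSTATUS note named "CORE" for each CPU that saved its state. A minimal sketch of walking such a segment and counting those notes, assuming the segment has already been read into memory, could look like this; count_prstatus_notes() is a hypothetical helper, and extracting the registers themselves would additionally require the elf_prstatus layout, which is not shown.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* ELF note names and descriptors are padded to 4-byte boundaries. */
static size_t note_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Count "CORE" NT_PRSTATUS notes in a PT_NOTE segment image.
 * Returns -1 if a note runs past the end of the buffer. */
static int count_prstatus_notes(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off + sizeof(Elf64_Nhdr) <= len) {
        Elf64_Nhdr nh;

        memcpy(&nh, buf + off, sizeof(nh));
        off += sizeof(nh);
        size_t name = note_align(nh.n_namesz);
        size_t desc = note_align(nh.n_descsz);
        if (name > len - off || desc > len - off - name)
            return -1;  /* truncated note */
        if (nh.n_type == NT_PRSTATUS && nh.n_namesz == 5 &&
            memcmp(buf + off, "CORE", 5) == 0)
            count++;
        off += name + desc;  /* skip to the next note header */
    }
    return count;
}
```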
Anyway, good question -- sorry for such a weak answer...
Dave
14 years, 7 months
Re: [Crash-utility] backtrace failure on x86_64 and x86 in 2.6.33/34 kernels due to "thread_return" removal
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> I've got a fix for x86_64, whose backtraces have always depended on the existence of
> the "thread_return" label. But I note that x86 backtraces also are not working,
> which I'll take a look at today.
As it turns out, the x86 backtrace failures in 2.6.33/34 are caused by a different
kprobes-related commit, which moved the system_call assembly function to the
.kprobes.text section:
commit a00e817f42663941ea0aa5f85a9d1c4f8b212839
Author: Masami Hiramatsu <mhiramat(a)redhat.com>
Date: Tue Sep 8 12:47:55 2009 -0400
kprobes/x86-32: Move irq-exit functions to kprobes section
Move irq-exit functions to .kprobes.text section to protect against
kprobes recursion.
When I ran kprobe stress test on x86-32, I found below symbols
cause unrecoverable recursive probing:
ret_from_exception
ret_from_intr
check_userspace
restore_all
restore_all_notrace
restore_nocheck
irq_return
And also, I found some interrupt/exception entry points that
cause similar problems.
This patch moves those symbols (including their container functions)
to .kprobes.text section to prevent any kprobes probing.
Signed-off-by: Masami Hiramatsu <mhiramat(a)redhat.com>
Cc: Frederic Weisbecker <fweisbec(a)gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth(a)in.ibm.com>
Cc: Jim Keniston <jkenisto(a)us.ibm.com>
Cc: Ingo Molnar <mingo(a)elte.hu>
LKML-Reference: <20090908164755.24050.81182.stgit(a)dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec(a)gmail.com>
... [ snip ] ...
@@ -513,6 +521,10 @@ sysexit_audit:
PTGS_TO_GS_EX
ENDPROC(ia32_sysenter_target)
+/*
+ * syscall stub including irq exit should be protected against kprobes
+ */
+ .pushsection .kprobes.text, "ax"
# system call handler stub
ENTRY(system_call)
RING0_INT_FRAME # can't unwind into user space anyway
@@ -705,6 +717,10 @@ syscall_badsys:
jmp resume_userspace
END(syscall_badsys)
CFI_ENDPROC
+/*
+ * End of kprobes section
+ */
+ .popsection
I should have a fix tomorrow (if that's the only issue)...
Dave
14 years, 7 months
backtrace failure on x86_64 and x86 in 2.6.33/34 kernels due to "thread_return" removal
by Dave Anderson
Just an FYI -- I'm delaying a new release that I had hoped to do today
because backtraces for blocked x86_64 tasks no longer work with recent
kernels, since this commit removed the "thread_return" label:
commit c12a229bc5971534537a7d0e49e44f9f1f5d0336
Author: Masami Hiramatsu <mhiramat(a)redhat.com>
Date: Thu Nov 5 11:03:59 2009 -0500
x86: Remove unused thread_return label from switch_to()
Remove unused thread_return label from switch_to() macro on
x86-64. Since this symbol cuts into schedule(), backtrace at the
latter half of schedule() was always shown as thread_return().
Signed-off-by: Masami Hiramatsu <mhiramat(a)redhat.com>
Cc: systemtap <systemtap(a)sources.redhat.com>
Cc: DLE <dle-develop(a)lists.sourceforge.net>
LKML-Reference: <20091105160359.5181.26225.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo(a)elte.hu>
diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index f08f973..1a953e2 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -128,8 +128,6 @@ do { \
"movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \
"movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */ \
"call __switch_to\n\t" \
- ".globl thread_return\n" \
- "thread_return:\n\t" \
"movq "__percpu_arg([current_task])",%%rsi\n\t" \
__switch_canary \
"movq %P[thread_info](%%rsi),%%r8\n\t" \
I've got a fix for x86_64, whose backtraces have always depended on the existence of
the "thread_return" label. But I note that x86 backtraces also are not working,
which I'll take a look at today.
Dave
14 years, 7 months