May 2010 - Crash-utility - Crash Utility List Archives

[crash-utility] [lkcd-devel] Patch to add LKCD vmcore validation feature

by Vitaly Kuzmichev

Hello,

Attached is a patch that adds a separate tool for validating LKCD netdumps and blockdumps. We are planning to add this feature to our fork of crash-3.10.

Our customers requested this feature, and we have found that 'crash' does not print any warning when someone tries to load an incomplete vmcore. They need a simple way to verify that a core file generated by LKCD is complete.

--
Best regards,
Vitaly Kuzmichev, Software Engineer, MontaVista Software, LLC.

14 years, 11 months


Re: [Crash-utility] crash can't handle virsh dump file

by Dave Anderson

----- "Gui Jianfeng" <guijianfeng(a)cn.fujitsu.com> wrote:

> Paolo Bonzini wrote:
> > On 05/28/2010 11:19 AM, Gui Jianfeng wrote:
> >> Gui Jianfeng wrote:
> >>> Hi all,
> >>>
> >>> I made use of "virsh dump" to generate a dumpfile, but crash seems to fail at initialization time.
> >>> I decoded the dumpfile and found there's a "block" header section, but it seems crash doesn't support
> >>> such a section, so it failed. Am I missing something?
> >>
> >> Can anyone help? How can I use crash to check the dumpfile?
> >
> > Something like the attached should do it (untested because I don't know
> > where the crash upstream repo is, though I have likely asked this already).
>
> Thanks for sharing, I'll try it. :)
>
> Thanks,
> Gui
>
> >
> > Thanks,
> >
> > Paolo

I missed all the excitement by taking today off. Can you let us know how Paolo's patch worked for you?

Thanks,
Dave

15 years, 1 month


crash can't handle virsh dump file

by Gui Jianfeng

Hi all,

I made use of "virsh dump" to generate a dumpfile, but crash seems to fail at initialization time. I decoded the dumpfile and found there's a "block" header section, but it seems crash doesn't support such a section, so it failed. Am I missing something?

--
Regards
Gui Jianfeng

15 years, 2 months


Re: [Crash-utility] crash can't handle virsh dump file

by Gui Jianfeng

Paolo Bonzini wrote:
> On 05/28/2010 11:19 AM, Gui Jianfeng wrote:
>> Gui Jianfeng wrote:
>>> Hi all,
>>>
>>> I made use of "virsh dump" to generate a dumpfile, but crash seems
>>> to fail at initialization time.
>>> I decoded the dumpfile and found there's a "block" header section, but
>>> it seems crash doesn't support
>>> such a section, so it failed. Am I missing something?
>>
>> Can anyone help? How can I use crash to check the dumpfile?
>
> Something like the attached should do it (untested because I don't know
> where the crash upstream repo is, though I have likely asked this already).

Thanks for sharing, I'll try it. :)

Thanks,
Gui

>
> Thanks,
>
> Paolo
>

15 years, 2 months


Re: [Crash-utility] Why are there two ways of getting register values for active tasks?

by Dave Anderson

----- "Daisuke HATAYAMA" <d.hatayama(a)jp.fujitsu.com> wrote:

> Hi Dave.
>
> Well, I still have a question: Does the kdump-compressed format contain
> register values for CPUs?
>
> I've looked into the part of makedumpfile that reads ELF but have not
> found that yet. It appears to me that makedumpfile ignores all note info
> except for vmcoreinfo's location.

That's correct, there are no per-cpu register values. From the crash utility's perspective, all it gets from the makedumpfile-generated compressed dumpfile is the disk_dump_header and kdump_sub_header:

struct disk_dump_header {
        char                    signature[SIG_LEN];  /* = "DISKDUMP" */
        int                     header_version;      /* Dump header version */
        struct new_utsname      utsname;             /* copy of system_utsname */
        struct timeval          timestamp;           /* Time stamp */
        unsigned int            status;              /* Above flags */
        int                     block_size;          /* Size of a block in byte */
        int                     sub_hdr_size;        /* Size of arch dependent
                                                      * header in blocks */
        unsigned int            bitmap_blocks;       /* Size of Memory bitmap in
                                                      * block */
        unsigned int            max_mapnr;           /* = max_mapnr */
        unsigned int            total_ram_blocks;    /* Number of blocks should be
                                                      * written */
        unsigned int            device_blocks;       /* Number of total blocks in
                                                      * the dump device */
        unsigned int            written_blocks;      /* Number of written blocks */
        unsigned int            current_cpu;         /* CPU# which handles dump */
        int                     nr_cpus;             /* Number of CPUs */
        struct task_struct      *tasks[0];
};

struct kdump_sub_header {
        unsigned long   phys_base;
        int             dump_level;     /* header_version 1 and later */
        int             split;          /* header_version 2 and later */
        unsigned long   start_pfn;      /* header_version 2 and later */
        unsigned long   end_pfn;        /* header_version 2 and later */
};

Dave
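The per-field comments in kdump_sub_header imply that a reader has to check header_version before trusting some fields. A minimal sketch of that gating follows. The enum and function names are hypothetical and are not taken from crash or makedumpfile, and treating phys_base as valid for every version is an assumption based only on the absence of a version comment on that field.

```c
#include <assert.h>

/* Hypothetical flag names for this sketch only. */
enum sub_hdr_field {
    SUB_PHYS_BASE  = 1 << 0,
    SUB_DUMP_LEVEL = 1 << 1,   /* header_version 1 and later */
    SUB_SPLIT      = 1 << 2,   /* header_version 2 and later */
    SUB_PFN_RANGE  = 1 << 3    /* start_pfn and end_pfn, version 2 and later */
};

/*
 * Return a bitmask of the kdump_sub_header fields that are meaningful
 * for the given header_version, following the struct comments.
 * Assumption: phys_base is usable whenever the sub-header exists.
 */
unsigned int sub_header_fields(int header_version)
{
    unsigned int fields = SUB_PHYS_BASE;

    assert(header_version >= 0);
    if (header_version >= 1)
        fields |= SUB_DUMP_LEVEL;
    if (header_version >= 2)
        fields |= SUB_SPLIT | SUB_PFN_RANGE;
    return fields;
}
```

A caller would test the returned mask, for example `sub_header_fields(v) & SUB_PFN_RANGE`, before reading start_pfn or end_pfn.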

15 years, 2 months


[ANNOUNCE] crash version 5.0.4 is available

by Dave Anderson

- Fix for the x86 "bt" command when a newly-forked task's resumption EIP address value is set to the "ret_from_fork" entry point by copy_thread(). Without the patch, the backtrace attempt would display "bt: cannot resolve stack trace", dump the text symbols on the stack, and display a possible USER-MODE exception frame. (anderson(a)redhat.com)

- Fix for the x86 "bt" command if the kdump-generated NMI interrupts a task running in kernel space at a point in the system_call entry point code prior to the call to a system call function. Without the patch, the backtrace attempt would display "bt: cannot resolve stack trace", dump the text symbols on the kernel stack, and display any "KERNEL-MODE" exception frames followed by a possible "USER-MODE" exception frame. (anderson(a)redhat.com)

- Fix for the "bt" command on 2.6.29 and later x86_64 kernels to recognize and display exception frames generated by exceptions that do not result in a stack switch, such as general protection faults. Without the patch, the backtrace would potentially not display the exception frames because the "error_exit" assembly-code label in entry_64.S was replaced by the error_exit() entry point. (anderson(a)redhat.com)

- The kernel patch for ppc64 CONFIG_SPARSEMEM_VMEMMAP kernels that stores vmemmap page mapping information so that the crash utility is able to translate vmemmap'd kernel virtual addresses has been updated. The crash utility patch that was (preemptively) applied in 5.0.2 for the initial kernel patch needs this update. (anderson(a)redhat.com)

- Fix the error message for the "dev -p" command when run on 2.6.26 or later kernels, which no longer have the global "pci_devices" list head. The patch changes the message to show "dev: -p option not supported or applicable on this architecture or kernel", instead of the misleading "dev: no PCI devices found on this system" message. (anderson(a)redhat.com)

- If a cpu in an s390 or s390x dumpfile is offline, and the "bt" command receives a backtrace request for the "swapper" task on that cpu, the command will display "CPU offline". (holzheu(a)linux.vnet.ibm.com)

- Fix for 2.6.34 and later x86_64 kernels which generate per-cpu symbols of type 'd' or type 'D' instead of type 'V'. Without the patch, an x86_64 crash session fails during initialization with the error message "crash: cannot determine idle task addresses from init_tasks[] or runqueues[]", followed by "crash: cannot resolve init_task_union". It is unclear why some kernel builds result in only type 'V' per-cpu symbols, whereas others result in type 'd' and 'D', so the patch accepts both. (Kashyap.Desai(a)lsi.com)

- Fix to prevent a segmentation violation during initialization in the x86_64_get_active_set() function by verifying that the array of current tasks in machdep->machspec->current[] has actually been allocated. Theoretically it should never be NULL, but in the unlikely event that x86_64_per_cpu_init() fails to find the required per-cpu symbols, it will return without allocating the array. (anderson(a)redhat.com)

- Fix to support KVM dumpfiles created with "virsh dump" that create "cpu" header sections using a QEMU CPU_SAVE_VERSION version greater than the supported version of 9. Without the patch, the crash session fails during initialization with the error message "crash: qemu-load.c:501: cpu_init_load_64: Assertion `version_id >= 4 && version_id <= 9' failed." The patch now accepts CPU_SAVE_VERSION values up to 12. (anderson(a)redhat.com)

- Fix for x86_64 KVM dumpfiles created with "virsh dump" whose kernels have a "_text" virtual address higher than __START_KERNEL_map. Without the patch, the physical base address calculation fails, making the dumpfile unusable. (anderson(a)redhat.com, pbonzini(a)redhat.com)

- Implemented a new "map" command that is seen only when running with KVM guest dumpfiles created with "virsh dump". The layout of this dumpfile format does not allow the access of system memory in a "random-access" manner. Therefore, during session initialization, a potentially time-consuming dumpfile scan procedure is required to create a physical-memory-to-file-offset memory map for use during the session. The new "map" command allows the user to either append the memory map to the end of the dumpfile, or to create a discrete memory map file. In either case, the dumpfile scan will not be required during subsequent sessions. The command's help page may be seen by entering "crash -h map". (anderson(a)redhat.com)

- Fix for an incorrect calculation of the physical base address of a fully-virtualized x86_64 RHEL6 guest kernel running on a RHEL5 Xen host. Without the patch, the session failed during initialization with the error messages "crash: cannot determine base kernel version" and "crash: vmlinux and vmcore do not match!" (anderson(a)redhat.com)

- Fix for the "bt" command on inactive (blocked) tasks on 2.6.33 and later x86_64 kernels, which have the "thread_return" symbol removed from the embedded "switch_to" macro. Without the patch, when run on blocked tasks, the command would fail with the error message "bt: cannot resolve thread_return". (anderson(a)redhat.com)

- Fix for the "bt" command on 2.6.33 and later x86 kernels, which moved the "system_call" assembly function to the .kprobes.text section. Without the patch, the command would typically display two invalid stack frames, both indicating they were in "ia32_sysenter_target". (anderson(a)redhat.com)

- Fix for a segmentation violation caused by the "extensions/trace.c" extension module, as seen when running the "trace show -c <cpu>" command from that module. (laijs(a)cn.fujitsu.com)

- Implemented a "trace dump -t" command for the "extensions/trace.c" extension module. The module already has a "trace show" command to show what events had happened before the system crashed, but it is just 1000 lines of code and it is not as complete as the related "trace-cmd report" command from trace-cmd(1). The new extension module command generates a "trace.dat" file, which in turn can be used by the "trace-cmd report" option of trace-cmd(1). So this patch improves both the crash trace command and the trace-cmd(1) as well, which can now handle ftrace even if the kernel crashed. (laijs(a)cn.fujitsu.com)

Download from: http://people.redhat.com/anderson
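The "map" command entry in the changelog above describes a physical-memory-to-file-offset memory map built by scanning the dumpfile. The following is a minimal sketch of what a lookup against such a map could look like. The struct layout and function name are hypothetical and do not reflect the on-disk format or the code crash actually uses; the sketch also assumes the entries are sorted by physical address and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one contiguous run of physical memory and the
 * file offset at which its contents start in the dumpfile. */
struct map_entry {
    uint64_t phys_start;
    uint64_t length;
    uint64_t file_offset;
};

/*
 * Binary-search a map sorted by phys_start (non-overlapping entries).
 * On success store the file offset for paddr and return 0;
 * return -1 if no entry covers paddr.
 */
int phys_to_offset(const struct map_entry *map, size_t count,
                   uint64_t paddr, uint64_t *offset)
{
    size_t lo = 0, hi = count;

    assert(map != NULL || count == 0);
    assert(offset != NULL);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct map_entry *e = &map[mid];

        if (paddr < e->phys_start) {
            hi = mid;                       /* look in lower half */
        } else if (paddr - e->phys_start >= e->length) {
            lo = mid + 1;                   /* look in upper half */
        } else {
            *offset = e->file_offset + (paddr - e->phys_start);
            return 0;
        }
    }
    return -1;
}
```

Once a map like this has been saved, a later session could load it and answer reads with a lookup instead of rescanning the dumpfile, which is the benefit the changelog entry describes.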

15 years, 2 months


Re: [Crash-utility] backtrace failure on x86_64 and x86 in 2.6.33/34 kernels due to "thread_return" removal

by Dave Anderson

----- "Masami Hiramatsu" <mhiramat(a)redhat.com> wrote:

> Hi Dave,
>
> Are these issues only in the crash tool, or do they occur in the kernel's function backtrace too?
> And how would you fix it?

They are crash issues only, in having to deal with the shifting sands of the underlying kernel. In both cases, the problem has always been that assembly-code labels are stored as text symbols, which is confusing to the backtrace code. And in both cases, the new kernel changes interfered with the work-arounds put in place by the crash utility to handle them.

In any case, it's not a big deal as it's fixable in the crash utility.

Thanks,
Dave

15 years, 2 months


Re: [Crash-utility] Why are there two ways of getting register values for active tasks?

by Dave Anderson

----- "Dave Anderson" <anderson(a)redhat.com> wrote:

> ----- "Daisuke HATAYAMA" <d.hatayama(a)jp.fujitsu.com> wrote:
>
> > Hi, all.
> >
> > I have a question on the implementation of get_netdump_regs_x86_64().
> >
> > Currently, in order to get register values for active tasks, only
> > the panic task makes use of note information. On the other hand, other
> > active tasks search the stack frame for registers saved at the nmi
> > switch. However, the crash dump contains the note information for every
> > CPU, so I think it is unnecessary to search the stack frame.
>
> Originally it was done that way because the code was written for
> netdump-generated dumpfiles, which only generated note information
> for the panic task. But if I'm not mistaken, given that recent
> kernels do not store debuginfo data for the user_regs_struct, it
> almost always falls through into x86_64_get_stack_frame().

I take that back -- when it's not in the debuginfo, it hardwires the user_regs_struct data structure information. That being the case, I don't remember why it is restricted to the panic task, but it had to have been put in place based upon actual dumpfiles where it didn't work correctly for a non-panic task.

If I get the time, I'll remove the restriction and run it on my set of stashed dumpfile examples to see if I can be more specific.

Anyway, good question -- sorry for such a weak answer...

Dave
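The behaviour discussed in this exchange can be summarised as a choice between two register sources per active task. The sketch below only restates the thread: the enum, the function name, and the panic_only parameter are hypothetical and are not taken from get_netdump_regs_x86_64(). Setting panic_only to true models the restriction described above; setting it to false models the change Daisuke suggests and Dave says he may try.

```c
#include <stdbool.h>

/* Hypothetical names for the two approaches described in the thread. */
enum reg_source {
    REGS_FROM_NOTE,          /* use the per-CPU note information */
    REGS_FROM_STACK_SEARCH   /* search the stack for NMI-saved registers */
};

/*
 * Pick a register source for an active task.
 * panic_only == true:  only the panic task may use its note.
 * panic_only == false: any active task with a note uses it.
 */
enum reg_source choose_reg_source(bool is_panic_task, bool note_available,
                                  bool panic_only)
{
    if (note_available && (is_panic_task || !panic_only))
        return REGS_FROM_NOTE;
    return REGS_FROM_STACK_SEARCH;
}
```

Keeping the stack search as the fallback matters either way, since Dave notes that the restriction was presumably added because the note data did not work for some non-panic tasks in real dumpfiles.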

15 years, 2 months


Re: [Crash-utility] backtrace failure on x86_64 and x86 in 2.6.33/34 kernels due to "thread_return" removal

by Dave Anderson

----- "Dave Anderson" <anderson(a)redhat.com> wrote:

> I've got a fix for x86_64 -- which have always depended on the existence of
> the "thread_return" label. But I note that x86 backtraces also are not working,
> which I'll take a look at today.

As it turns out, the x86 backtrace failures in 2.6.33/34 are caused by a different kprobes-related commit, which moved the system_call assembly function to the .kprobes.text section:

commit a00e817f42663941ea0aa5f85a9d1c4f8b212839
Author: Masami Hiramatsu <mhiramat(a)redhat.com>
Date:   Tue Sep 8 12:47:55 2009 -0400

    kprobes/x86-32: Move irq-exit functions to kprobes section

    Move irq-exit functions to .kprobes.text section to protect against
    kprobes recursion.

    When I ran kprobe stress test on x86-32, I found below symbols
    cause unrecoverable recursive probing:

    ret_from_exception
    ret_from_intr
    check_userspace
    restore_all
    restore_all_notrace
    restore_nocheck
    irq_return

    And also, I found some interrupt/exception entry points that
    cause similar problems.

    This patch moves those symbols (including their container functions)
    to .kprobes.text section to prevent any kprobes probing.

    Signed-off-by: Masami Hiramatsu <mhiramat(a)redhat.com>
    Cc: Frederic Weisbecker <fweisbec(a)gmail.com>
    Cc: Ananth N Mavinakayanahalli <ananth(a)in.ibm.com>
    Cc: Jim Keniston <jkenisto(a)us.ibm.com>
    Cc: Ingo Molnar <mingo(a)elte.hu>
    LKML-Reference: <20090908164755.24050.81182.stgit(a)dhcp-100-2-132.bos.redhat.com>
    Signed-off-by: Frederic Weisbecker <fweisbec(a)gmail.com>

... [ snip ] ...

@@ -513,6 +521,10 @@ sysexit_audit:
        PTGS_TO_GS_EX
 ENDPROC(ia32_sysenter_target)
+/*
+ * syscall stub including irq exit should be protected against kprobes
+ */
+       .pushsection .kprobes.text, "ax"
        # system call handler stub
 ENTRY(system_call)
        RING0_INT_FRAME                 # can't unwind into user space anyway
@@ -705,6 +717,10 @@ syscall_badsys:
        jmp resume_userspace
 END(syscall_badsys)
        CFI_ENDPROC
+/*
+ * End of kprobes section
+ */
+       .popsection

I should have a fix tomorrow (if that's the only issue)...

Dave

15 years, 2 months


backtrace failure on x86_64 and x86 in 2.6.33/34 kernels due to "thread_return" removal

by Dave Anderson

Just an FYI -- I'm delaying a new release that I had hoped to do today because backtraces for blocked x86_64 tasks no longer work with recent kernels because this commit removed the "thread_return" label:

commit c12a229bc5971534537a7d0e49e44f9f1f5d0336
Author: Masami Hiramatsu <mhiramat(a)redhat.com>
Date:   Thu Nov 5 11:03:59 2009 -0500

    x86: Remove unused thread_return label from switch_to()

    Remove unused thread_return label from switch_to() macro on x86-64.
    Since this symbol cuts into schedule(), backtrace at the latter
    half of schedule() was always shown as thread_return().

    Signed-off-by: Masami Hiramatsu <mhiramat(a)redhat.com>
    Cc: systemtap <systemtap(a)sources.redhat.com>
    Cc: DLE <dle-develop(a)lists.sourceforge.net>
    LKML-Reference: <20091105160359.5181.26225.stgit@harusame>
    Signed-off-by: Ingo Molnar <mingo(a)elte.hu>

diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index f08f973..1a953e2 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -128,8 +128,6 @@ do { \
             "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \
             "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */ \
             "call __switch_to\n\t" \
-            ".globl thread_return\n" \
-            "thread_return:\n\t" \
             "movq "__percpu_arg([current_task])",%%rsi\n\t" \
             __switch_canary \
             "movq %P[thread_info](%%rsi),%%r8\n\t" \

I've got a fix for x86_64 -- which have always depended on the existence of the "thread_return" label. But I note that x86 backtraces also are not working, which I'll take a look at today.

Dave
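The commit message's remark that thread_return "cuts into schedule()" follows from how an address is usually resolved to a name: pick the symbol with the highest address that does not exceed the address in question. The sketch below shows that effect with two tiny symbol tables. All addresses, the "next_func" symbol, and the function name nearest_symbol are made up for illustration; this is not the lookup code used by crash or the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct sym {
    unsigned long addr;
    const char *name;
};

/* Hypothetical tables, sorted by address. The first one contains a
 * global label in the middle of schedule(); the second does not. */
const struct sym with_label[] = {
    { 0x1000, "schedule" },
    { 0x1400, "thread_return" },
    { 0x1800, "next_func" },
};
const struct sym without_label[] = {
    { 0x1000, "schedule" },
    { 0x1800, "next_func" },
};

/*
 * Return the index of the symbol with the highest address <= addr,
 * or -1 if addr lies below the first symbol. Table must be sorted.
 */
int nearest_symbol(const struct sym *tab, int count, unsigned long addr)
{
    int best = -1;

    assert(tab != NULL || count == 0);
    for (int i = 0; i < count && tab[i].addr <= addr; i++)
        best = i;
    return best;
}
```

With these tables, an address such as 0x1500 in the latter half of schedule() resolves to "thread_return" in the first table and to "schedule" in the second, so a tool that searched for the label by name loses its anchor once the label is removed.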

15 years, 2 months

