March 2008 - Crash-utility - Crash Utility List Archives

Re: [Crash-utility] crash aborts with cannot determine idle task

by Dave Anderson

> While running crash-4.0-6.1 on a vmcore , crash is aborting with
>
> --------
> crash: cannot determine idle task addresses from init_tasks[] or runqueues[]
>
> crash: cannot resolve "init_task_union"
> -------
>
>
> during startup. The kernel is later than 2.6.18 . The changelog
> http://people.redhat.com/anderson/crash.changelog.html mentions that this
> is possibly fixed in version 4.0-3.1 . Hence could you pls point me to the
> patch that fixed this problem.
>
> thanks,
> Chandru

That particular two-year-old patch simply recognized and dealt with
the kernel name change from "struct runqueue" to "struct rq":

--- kernel.c 2 Aug 2006 14:34:35 -0000 1.140
+++ kernel.c 2 Aug 2006 18:35:31 -0000 1.141
@@ -55,6 +55,7 @@
         int i;
         char *p1, *p2, buf[BUFSIZE];
         struct syment *sp1, *sp2;
+        char *rqstruct;

         if (pc->flags & KERNEL_DEBUG_QUERY)
                 return;
@@ -158,7 +159,15 @@
                         &kt->__per_cpu_offset[0]);
                 kt->flags |= PER_CPU_OFF;
         }
-        MEMBER_OFFSET_INIT(runqueue_cpu, "runqueue", "cpu");
+        if (STRUCT_EXISTS("runqueue"))
+                rqstruct = "runqueue";
+        else if (STRUCT_EXISTS("rq"))
+                rqstruct = "rq";
+
+        MEMBER_OFFSET_INIT(runqueue_cpu, rqstruct, "cpu");
+        /*
+         * 'cpu' does not exist in 'struct rq'.
+         */
         if (VALID_MEMBER(runqueue_cpu) &&
             (get_array_length("runqueue.cpu", NULL, 0) > 0)) {
                 MEMBER_OFFSET_INIT(cpu_s_curr, "cpu_s", "curr");
@@ -183,17 +192,17 @@
                     "runq_siblings: %d: __cpu_idx and __rq_idx arrays don't exist?\n",
                         kt->runq_siblings);
         } else {
-                MEMBER_OFFSET_INIT(runqueue_idle, "runqueue", "idle");
-                MEMBER_OFFSET_INIT(runqueue_curr, "runqueue", "curr");
+                MEMBER_OFFSET_INIT(runqueue_idle, rqstruct, "idle");
+                MEMBER_OFFSET_INIT(runqueue_curr, rqstruct, "curr");
                 ASSIGN_OFFSET(runqueue_cpu) = INVALID_OFFSET;
         }
-        MEMBER_OFFSET_INIT(runqueue_active, "runqueue", "active");
-        MEMBER_OFFSET_INIT(runqueue_expired, "runqueue", "expired");
-        MEMBER_OFFSET_INIT(runqueue_arrays, "runqueue", "arrays");
+        MEMBER_OFFSET_INIT(runqueue_active, rqstruct, "active");
+        MEMBER_OFFSET_INIT(runqueue_expired, rqstruct, "expired");
+        MEMBER_OFFSET_INIT(runqueue_arrays, rqstruct, "arrays");
         MEMBER_OFFSET_INIT(prio_array_queue, "prio_array", "queue");
         MEMBER_OFFSET_INIT(prio_array_nr_active, "prio_array", "nr_active");
-        STRUCT_SIZE_INIT(runqueue, "runqueue");
+        STRUCT_SIZE_INIT(runqueue, rqstruct);
         STRUCT_SIZE_INIT(prio_array, "prio_array");

         /*

So that patch was required for 2.6.18.

When you say that the "kernel is later than 2.6.18", well, that
doesn't help me much.

Look at the crash function get_idle_threads() in task.c, which is
where you're failing. It runs through the history of the symbols
that Linux has used over the years for the run queues. For the
most recent kernels, it looks for the "per_cpu__runqueues" symbol.
At least on 2.6.25-rc2, the kernel still defines them in
kernel/sched.c like this:

  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);

So if you do an "nm -Bn vmlinux | grep runqueues", you should see:

  # nm -Bn vmlinux-2.6.25-rc1-ext4-1 | grep runqueues
  ffffffff8082b700 d per_cpu__runqueues
  #

I'm guessing that's not the problem -- so presuming that the symbol
*does* exist, find out why it's failing to increment "cnt" in this
part of get_idle_threads():

        if (symbol_exists("per_cpu__runqueues") &&
            VALID_MEMBER(runqueue_idle)) {
                runqbuf = GETBUF(SIZE(runqueue));
                for (i = 0; i < nr_cpus; i++) {
                        if ((kt->flags & SMP) && (kt->flags & PER_CPU_OFF)) {
                                runq = symbol_value("per_cpu__runqueues") +
                                        kt->__per_cpu_offset[i];
                        } else
                                runq = symbol_value("per_cpu__runqueues");

                        readmem(runq, KVADDR, runqbuf,
                                SIZE(runqueue), "runqueues entry (per_cpu)",
                                FAULT_ON_ERROR);
                        tasklist[i] = ULONG(runqbuf + OFFSET(runqueue_idle));
                        if (IS_KVADDR(tasklist[i]))
                                cnt++;
                }
        }

Determine whether it even makes it to the inner for loop, whether
the pre-determined nr_cpus value makes sense, whether the SMP flag
reflects whether the kernel was compiled for SMP, whether the
PER_CPU_OFF flag was set, what address was calculated, etc...

Dave
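The checks suggested above can be tried out in isolation with a small model of the per-cpu lookup. This is only a minimal sketch, not crash source: `runq_address()`, `count_idle_tasks()`, `is_kvaddr()`, the `DEMO_*` flag values and the `kv_start` threshold are all hypothetical stand-ins for the corresponding crash macros and kernel_table fields.

```c
#include <stdint.h>

/* Hypothetical flag bits standing in for the SMP and PER_CPU_OFF flags. */
#define DEMO_SMP          0x1u
#define DEMO_PER_CPU_OFF  0x2u

/* Assumed stand-in for IS_KVADDR(): anything at or above kv_start counts
 * as a kernel virtual address. The real test is architecture-specific. */
static int is_kvaddr(uint64_t addr, uint64_t kv_start)
{
    return addr >= kv_start;
}

/* Runqueue address for one cpu: the symbol value plus the per-cpu offset
 * when both flags are set, otherwise the bare symbol value. */
static uint64_t runq_address(uint64_t symval, unsigned flags,
                             const uint64_t *per_cpu_offset, int cpu)
{
    if ((flags & DEMO_SMP) && (flags & DEMO_PER_CPU_OFF))
        return symval + per_cpu_offset[cpu];
    return symval;
}

/* Count the idle-task pointers that look like kernel virtual addresses;
 * this mirrors the role of "cnt" in the loop quoted above. */
static int count_idle_tasks(const uint64_t *idle_ptrs, int nr_cpus,
                            uint64_t kv_start)
{
    int cnt = 0;

    for (int i = 0; i < nr_cpus; i++)
        if (is_kvaddr(idle_ptrs[i], kv_start))
            cnt++;
    return cnt;
}
```

If either flag is missing, every cpu resolves to the same address, which is one way a multi-cpu dump could end up reading the wrong runqueue for all but one cpu.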


crash version 4.0-6.2 is available

by Dave Anderson

- Implemented a new "rd -S" option which, like the "-s" option, displays
  the symbolic translation of kernel virtual addresses, but also
  recognizes the virtual addresses of slab objects, and when found, the
  address is replaced by the kmem_cache slab name string inside brackets.
  (anderson(a)redhat.com)

- Make the found address displayed by "kmem -[sS] <address>" be the
  address of the containing object if the <address> argument is offset
  from the beginning of the object. This only applies to kernels using
  kernel/slab.c; CONFIG_SLUB kernels currently do display the address of
  the containing object.
  (anderson(a)redhat.com)

- Fix for "kmem -[sS] [address]" in 2.6.25 CONFIG_SLUB kernels, which
  address changes in the kernel's per-slab free list tracking. Without
  the patch, error messages of the type "kmem: invalid kernel virtual
  address: 10700 type: get_freepointer" would be seen when the full list
  of objects in a per-cpu slab was displayed.
  (anderson(a)redhat.com)

- Fix for "kmem -[sS] <slab-address>" in 2.6.25 CONFIG_SLUB kernels, in
  which the slab structure is actually a page struct. Some slab
  addresses would not be recognized as such, and therefore without the
  patch, error messages of the type "kmem: address is not allocated in
  slab subsystem: <slab-address>" would be seen.
  (anderson(a)redhat.com)

- Fix for an initialization-time failure with Ubuntu kernels because of
  a mismatch between the /proc/version string and the linux_banner
  string, due to additional information appended to the linux_banner
  string in Ubuntu kernels.
  (anderson(a)redhat.com, asid(a)hp.com)

- Fix for the "net" command in 2.6.22 and 2.6.23 kernels, where the
  "dev_base" net_device structure was replaced by the "dev_base_head"
  list_head. Without the patch, the "net" command with no arguments
  would fail with the error message: "net: dev_base does not exist!".
  (eteo(a)redhat.com)

- Fix for the "net" command in 2.6.24 and later kernels where the global
  "dev_base_head" list_head has been removed, and the network devices
  are linked from the "init_net" net structure. Without the patch, the
  "net" command with no arguments would fail with the error message:
  "net: dev_base does not exist!".
  (anderson(a)redhat.com)

- For kernels configured with CONFIG_SLUB, "kmem -S" has been updated to
  properly differentiate whether a cache's "full" slabs are tracked but
  whose full list is empty, or whether the full slabs are not tracked at
  all. Without this patch, a cache's full list could be indicated as
  "(empty)" instead of the more correct indication of "(not tracked)".
  (i-kitayama(a)ap.jp.nec.com, anderson(a)redhat.com)

- Fix for the "vm" command when the crash session was invoked with the
  -s command line option. Without the patch, if invoked prior to a
  "set", "ps" or "vtop" command, the "vm" command run against a task
  other than the initial context would mistakenly indicate that the task
  contained no virtual memory.
  (anderson(a)redhat.com, baiwd(a)cn.fujitsu.com)

- Fix/workaround for the "search -k" command option on relocatable
  2.6-era ia64 machines configured with CONFIG_SPARSEMEM. Without the
  patch, an immediate segmentation violation occurs.
  (anderson(a)redhat.com, yzgcsu(a)cn.fujitsu.com)

Download from: http://people.redhat.com/anderson
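The second changelog item, reporting the containing object when the address argument points into the middle of it, comes down to rounding the address down to an object boundary. The following is a rough sketch of that arithmetic only, assuming fixed-size objects laid out contiguously from a known first-object address. `containing_object()` and its parameters are hypothetical names rather than crash functions, and real slab layouts (colouring, per-object metadata) are more involved.

```c
#include <stdint.h>

/* Returns the start address of the object that contains addr, or 0 if
 * addr falls outside the nobjs objects that begin at first_obj.
 * Hypothetical helper for illustration; not part of crash. */
static uint64_t containing_object(uint64_t first_obj, uint64_t obj_size,
                                  uint64_t nobjs, uint64_t addr)
{
    uint64_t index;

    if (obj_size == 0 || addr < first_obj)
        return 0;

    index = (addr - first_obj) / obj_size;   /* which object slot */
    if (index >= nobjs)
        return 0;

    return first_obj + index * obj_size;     /* start of that slot */
}
```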


Re: [Crash-utility] Re: search -k

by Dave Anderson

Dave Anderson wrote:

> I can reproduce it on a bare-metal RHEL5 kernel, so let me
> figure out what's going on...

Hello Yang,

The problem is that the functions that implement the search
command were originally written to be processor-neutral on
machines with a relatively small, contiguous, unity-mapped
kernel/static-data region and a vmalloc region. With the
advent of machines with sparse memory regions, and of
architectures with separately-mapped kernel regions that are
no longer part of their unity-mapped regions, the command does
not scale well because too much time is spent dealing with
inter-region non-existent memory. It's due for a proper
re-write with a machine-dependent next_kpage() assist function.

I was mistaken in stating that your patch would skip the
vmalloc section, but it could only compile on an ia64.

Attached is an ugly patch that does the same thing as
yours did, compiles on all architectures, and would still
work on 2.4-era ia64 kernels whose kernel text/static-data
were still located in the unity-mapped region 7.

Some day I'll get around to a real fix...

Thanks,
  Dave
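One way to picture the machine-dependent assist mentioned above is a helper that, when the search address falls in a hole between regions, jumps straight to the start of the next populated region instead of probing page by page. This is only an illustrative sketch: `struct kregion`, `next_region_start()` and the assumption of a table sorted by start address are hypothetical, and do not reflect the actual next_kpage() signature or any per-architecture data in crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped kernel virtual range [start, end). */
struct kregion {
    uint64_t start;
    uint64_t end;
};

/* Given regions sorted by start address, store in *next the first
 * searchable address at or after addr. Returns 1 on success, or 0 when
 * addr lies beyond every region. */
static int next_region_start(const struct kregion *r, size_t n,
                             uint64_t addr, uint64_t *next)
{
    for (size_t i = 0; i < n; i++) {
        if (addr < r[i].start) {      /* in a hole: skip ahead */
            *next = r[i].start;
            return 1;
        }
        if (addr < r[i].end) {        /* already inside a region */
            *next = addr;
            return 1;
        }
    }
    return 0;
}
```

With a table like this, a search starting at the bottom of an ia64 region would move to the first mapped address in one step, which is the same effect the ia64-only check in the original patch achieves for a single region.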


Re: search -k

by Yang Zhiguo

hi,

i run crash with gdb.

[root@rhel51rc2 crash-4.0-6.1]# gdb ./crash
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

(gdb) b search
Breakpoint 1 at 0x40000000000f8e11: file memory.c, line 11025.
(gdb) r -s
Starting program: /home/yangzg/crash-4.0-6.1/crash -s
crash> search -k 12345

Breakpoint 1, search (start=11529215046068469760, end=18446744073709551615,
    mask=0, memtype=1, value=0x60000fffffe3eab0, vcnt=1) at memory.c:11025
11025   if (start & (sizeof(long)-1)) {
(gdb) n
11030   pagebuf = GETBUF(PAGESIZE());
(gdb)
11031   next = start;
(gdb)
11033   for (pp = VIRTPAGEBASE(start); next < end; next = pp) {
(gdb)
11034   lastpage = (VIRTPAGEBASE(next) == VIRTPAGEBASE(end));
(gdb)
11035   if (LKCD_DUMPFILE())
(gdb)
11038   switch (memtype)
(gdb)
11050   if (!kvtop(CURRENT_CONTEXT(), pp, &paddr, 0) ||
(gdb) s
kvtop (tc=0x6000000001ec1c50, kvaddr=11529215046068469760,
    paddr=0x60000fffffe368e8, verbose=0) at memory.c:2306
2306    return (machdep->kvtop(tc ? tc : CURRENT_CONTEXT(), kvaddr,
(gdb) s
ia64_kvtop (tc=0x6000000001ec1c50, kvaddr=11529215046068469760,
    paddr=0x60000fffffe368e8, verbose=0) at ia64.c:1031
1031    if (!IS_KVADDR(kvaddr))
(gdb) n
1034    if (!vt->vmalloc_start) {
(gdb)
1039    switch (VADDR_REGION(kvaddr))
(gdb)
1054    if (ia64_IS_VMALLOC_ADDR(kvaddr))
(gdb)
1056    *paddr = ia64_VTOP(kvaddr);
(gdb) s
ia64_VTOP (vaddr=11529215046068469760) at ia64.c:3501
3501    ms = &ia64_machine_specific;
(gdb) n
3503    switch (VADDR_REGION(vaddr))
(gdb)
3522    if (ia64_IS_VMALLOC_ADDR(vaddr) ||
(gdb)
3531    paddr = vaddr - ms->kernel_start +
(gdb)
3533    break;
(gdb) p/x paddr
$1 = 0xffffffff04000000      ======>error occured
(gdb) p/x vaddr
$2 = 0xa000000000000000
(gdb) p/x ms->kernel_start
$3 = 0xa000000100000000
(gdb) p/x ms->phys_start
$4 = 0x4000000
(gdb)

Best Regards,
yang
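The bad value in $1 follows directly from unsigned arithmetic: vaddr is below ms->kernel_start, so the subtraction wraps around before the physical base is added. The sketch below reproduces the numbers from the session. `kernel_vtop()` and `kernel_vtop_checked()` are hypothetical names; the second operand of the expression at ia64.c line 3531 is cut off in the gdb listing and is assumed here to be ms->phys_start, which is consistent with the printed values.

```c
#include <stdint.h>

/* Unchecked translation, as the truncated expression appears to compute it
 * (assumption: the missing operand is phys_start). */
static uint64_t kernel_vtop(uint64_t vaddr, uint64_t kernel_start,
                            uint64_t phys_start)
{
    return vaddr - kernel_start + phys_start;   /* wraps if vaddr < kernel_start */
}

/* Guarded variant: refuses addresses below kernel_start instead of
 * returning a wrapped-around physical address. Returns 1 on success. */
static int kernel_vtop_checked(uint64_t vaddr, uint64_t kernel_start,
                               uint64_t phys_start, uint64_t *paddr)
{
    if (vaddr < kernel_start)
        return 0;
    *paddr = vaddr - kernel_start + phys_start;
    return 1;
}
```

Calling `kernel_vtop(0xa000000000000000, 0xa000000100000000, 0x4000000)` yields 0xffffffff04000000, matching $1, while the checked variant rejects the same input.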


crash aborts with cannot determine idle task

by Chandru

While running crash-4.0-6.1 on a vmcore , crash is aborting with

--------
crash: cannot determine idle task addresses from init_tasks[] or runqueues[]

crash: cannot resolve "init_task_union"
-------

during startup. The kernel is later than 2.6.18 . The changelog
http://people.redhat.com/anderson/crash.changelog.html mentions that this
is possibly fixed in version 4.0-3.1 . Hence could you pls point me to the
patch that fixed this problem.

thanks,
Chandru


search -k

by Yang Zhiguo

Hi,

When I use the search command as follows, there is a segmentation fault:

crash> search -k 12345
Segmentation fault

Is the following patch OK?

--- ../crash/crash-4.0-6.1/memory.c 2008-02-29 01:09:10.000000000 +0900
+++ memory.c 2008-03-28 10:32:47.000000000 +0900
@@ -11047,6 +11047,11 @@ search(ulong start, ulong end, ulong mas
                         break;

                 case KVADDR:
+                        if (machine_type("IA64") && (machdep->machspec->kernel_start > pp)) {
+                                pp = machdep->machspec->kernel_start;
+                                continue;
+                        }
+
                         if (!kvtop(CURRENT_CONTEXT(), pp, &paddr, 0) ||
                             !phys_to_page(paddr, &page)) {
                                 if (!next_kpage(pp, &pp))

Best Regards,


crash -s may have some problem

by baiwd

Hi:

When using crash, I ran into something strange.

Executing "vm 1" under crash, I see:

PID: 1  TASK: e0000001ff908000  CPU: 0  COMMAND: "init"
       MM               PGD          RSS    TOTAL_VM
e000000185baac80  e000000185ca8000   256k     4464k
      VMA              START             END         FLAGS  FILE
e000000185abc878                 0             4000   84011
e000000185abca88  2000000000000000 200000000003c000     875  /lib/ld-2.5.so
e000000185abcb38  2000000000048000 2000000000050000  100873  /lib/ld-2.5.so
e000000185abfaa8  2000000000060000 20000000000e0000      75  /lib/libsepol.so.1
e000000185abf738  20000000000e0000 20000000000ec000      70  /lib/libsepol.so.1
e000000185abf948  20000000000ec000 20000000000f0000  100073  /lib/libsepol.so.1
e000000185abf9f8  20000000000f0000 20000000000fc000  100073
e000000185abfb58  20000000000fc000 2000000000124000      75  /lib/libselinux.so.1
e000000185abf898  2000000000124000 2000000000130000      70  /lib/libselinux.so.1
e000000185abf5d8  2000000000130000 2000000000134000  100073  /lib/libselinux.so.1
e000000185abf688  2000000000134000 2000000000138000  100073
e000000185abd218  2000000000138000 20000000003a0000      75  /lib/libc-2.5.so
e000000185abd168  20000000003a0000 20000000003ac000      70  /lib/libc-2.5.so
e000000185abc458  20000000003ac000 20000000003b4000  100073  /lib/libc-2.5.so
e000000185abc718  20000000003b4000 20000000003b8000  100073
e000000185abd798  20000000003b8000 20000000003c0000      75  /lib/libdl-2.5.so
e000000185abc508  20000000003c0000 20000000003cc000      70  /lib/libdl-2.5.so
e000000185abd848  20000000003cc000 20000000003d0000  100073  /lib/libdl-2.5.so
e000000185abd8f8  20000000003d0000 20000000003e8000  100073
e000000185abc928  4000000000000000 4000000000014000    1875  /sbin/init

But when I execute "vm 1" under "crash -s", the second part is missing:

[root@rhel51rc2 crash-4.0-6.1]# crash -s
crash> vm 1
PID: 1  TASK: e0000001ff908000  CPU: 0  COMMAND: "init"
       MM               PGD          RSS    TOTAL_VM
        0                0            0k        0k

I think this is because the IS_ZOMBIE(task) check in memory.c does not
work correctly, which in turn is because the value of _ZOMBIE_ has not
been initialized. _ZOMBIE_ is initialized in initialize_task_state(),
which is ultimately called from show_context(). But when executing
"crash -s", it is not called.

I added the following code in memory.c before _ZOMBIE_ is used, but I
don't know whether it is a good fix. It requires changing
initialize_task_state() to be non-static, and it uses
TASK_STATE_UNINITIALIZED, which is currently defined only in task.c.

"
    if (_ZOMBIE_ == TASK_STATE_UNINITIALIZED)
        initialize_task_state();
"

Best Regards
--
Bai Weidong
EMail: baiwd(a)cn.fujitsu.com
--------------------------------------------------
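The guard proposed above is a lazy-initialization check: before the first use of a value that is normally set up as a side effect of some other command, test for a sentinel and initialize on demand. Below is a minimal sketch of the pattern with placeholder names. `demo_zombie`, `DEMO_STATE_UNINITIALIZED`, `demo_initialize_task_state()` and the 0x10 state bit are assumptions for illustration only, not crash's definitions.

```c
/* Sentinel meaning "not set up yet" (placeholder for TASK_STATE_UNINITIALIZED). */
#define DEMO_STATE_UNINITIALIZED (-1L)

static long demo_zombie = DEMO_STATE_UNINITIALIZED;
static int demo_init_calls;   /* counts how often setup actually ran */

/* Placeholder for the one-time setup; in practice the bit value would be
 * derived from the kernel being analyzed. */
static void demo_initialize_task_state(void)
{
    demo_zombie = 0x10;       /* assumed value, for illustration only */
    demo_init_calls++;
}

/* Every reader goes through the sentinel check, so it no longer matters
 * whether some other command happened to run first. */
static int demo_is_zombie(long task_state)
{
    if (demo_zombie == DEMO_STATE_UNINITIALIZED)
        demo_initialize_task_state();
    return (task_state & demo_zombie) != 0;
}
```

The setup runs at most once no matter how many times the check is reached, so placing the guard at each use site costs only a comparison.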


crash: cannot gather a stable task list via pid_hash (500 retries)

by Eugene Teo

Hi Dave,

I tried to run crash on Fedora 8's kernel 2.6.24.3-12.fc8 x86_64, and it
has errors that look like the following:

[...]
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0

crash: cannot gather a stable task list via pid_hash (500 retries)

I ran crash with -d7, and uploaded the log for debugging:
http://hera.kernel.org/~eugeneteo/crash.log

Thanks,
Eugene


Re: ANN: crash extension for networking stuff

by Dave Anderson

Hi Alex,

Nice. Now that's a serious extension!

I'd like to add a reference to the general Python/Crash API as
a third main section to the
http://people.redhat.com/anderson/extensions.html page, with a
subsection within it that references xportshow as an example.
(And then in the future any new commands made the same way
could be added.)

If you want to tinker with that html page and send me a copy
off-line, please do so -- I just don't want to butcher the
explanation. And if you don't want to do any html hacking
(it's a pretty simple page), just tell me how you'd like to
describe/format it, and I'll take it from there.

Again, really nice work -- thanks,
  Dave


ANN: crash extension for networking stuff

by Alex Sidorenko

Prebuilt extension modules were released for the x86 and x86_64
architectures today. You can download them from

  http://sourceforge.net/projects/pykdump/

The two packages of interest are

  mpykdump-x86 (0.5.1)
  mpykdump-x86_64 (0.5.1)

They can be used immediately on any x86 or x86_64 Linux distribution with
GLIBC 2.3 or later (e.g. RHEL3-5, SLES9-11, Ubuntu etc.)

The extension provides the 'xportshow' command with many options. In
particular, it can produce output similar to 'netstat' with different
options, and print routing tables, ARP tables, statistics, interface
information and so on. Most functions work on all kernels in the
2.4.21-2.6.24 range. Please see examples at

  http://pykdump.wiki.sourceforge.net/xportshow

The extension is built using the PyKdump framework (Python scripting for
crash) but it does not need Python installed. Everything needed to run is
present in a single file that depends only on GLIBC-family libraries.

The source packages on the SF site are very old; if you are interested in
the sources and/or building your own extensions, you can use the 'testing'
branch of SVN. The build instructions can be found in the project's Wiki,

  http://pykdump.wiki.sourceforge.net/

The performance of PyKdump is on par with SIAL (PyKdump is much faster for
data manipulation, SIAL is faster for dump structures access). In general
I find SIAL great for simple scripts and PyKdump for bigger projects (e.g.
automated first-pass dump analysis running many tests).

The choice of functionality was driven by practical needs while working on
problems for HP Linux support (x86, x86_64 and ia64); additional functions
will be added as needed.

--
------------------------------------------------------------------
Alexandre Sidorenko             email: asid(a)hp.com
Global Solutions Engineering: Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

