Re: [Crash-utility] crash aborts with cannot determine idle task
by Dave Anderson
> While running crash-4.0-6.1 on a vmcore, crash is aborting with
>
> --------
> crash: cannot determine idle task addresses from init_tasks[] or runqueues[]
>
> crash: cannot resolve "init_task_union"
> -------
>
>
> during startup. The kernel is later than 2.6.18. The changelog
> http://people.redhat.com/anderson/crash.changelog.html mentions that this
> is possibly fixed in version 4.0-3.1. Could you please point me to the
> patch that fixed this problem?
>
> thanks,
> Chandru
That particular two-year-old patch simply recognized and dealt with the kernel
name change from "struct runqueue" to "struct rq":
--- kernel.c 2 Aug 2006 14:34:35 -0000 1.140
+++ kernel.c 2 Aug 2006 18:35:31 -0000 1.141
@@ -55,6 +55,7 @@
int i;
char *p1, *p2, buf[BUFSIZE];
struct syment *sp1, *sp2;
+ char *rqstruct;
if (pc->flags & KERNEL_DEBUG_QUERY)
return;
@@ -158,7 +159,15 @@
&kt->__per_cpu_offset[0]);
kt->flags |= PER_CPU_OFF;
}
- MEMBER_OFFSET_INIT(runqueue_cpu, "runqueue", "cpu");
+ if (STRUCT_EXISTS("runqueue"))
+ rqstruct = "runqueue";
+ else if (STRUCT_EXISTS("rq"))
+ rqstruct = "rq";
+
+ MEMBER_OFFSET_INIT(runqueue_cpu, rqstruct, "cpu");
+ /*
+ * 'cpu' does not exist in 'struct rq'.
+ */
if (VALID_MEMBER(runqueue_cpu) &&
(get_array_length("runqueue.cpu", NULL, 0) > 0)) {
MEMBER_OFFSET_INIT(cpu_s_curr, "cpu_s", "curr");
@@ -183,17 +192,17 @@
"runq_siblings: %d: __cpu_idx and __rq_idx arrays don't exist?\n",
kt->runq_siblings);
} else {
- MEMBER_OFFSET_INIT(runqueue_idle, "runqueue", "idle");
- MEMBER_OFFSET_INIT(runqueue_curr, "runqueue", "curr");
+ MEMBER_OFFSET_INIT(runqueue_idle, rqstruct, "idle");
+ MEMBER_OFFSET_INIT(runqueue_curr, rqstruct, "curr");
ASSIGN_OFFSET(runqueue_cpu) = INVALID_OFFSET;
}
- MEMBER_OFFSET_INIT(runqueue_active, "runqueue", "active");
- MEMBER_OFFSET_INIT(runqueue_expired, "runqueue", "expired");
- MEMBER_OFFSET_INIT(runqueue_arrays, "runqueue", "arrays");
+ MEMBER_OFFSET_INIT(runqueue_active, rqstruct, "active");
+ MEMBER_OFFSET_INIT(runqueue_expired, rqstruct, "expired");
+ MEMBER_OFFSET_INIT(runqueue_arrays, rqstruct, "arrays");
MEMBER_OFFSET_INIT(prio_array_queue, "prio_array", "queue");
MEMBER_OFFSET_INIT(prio_array_nr_active, "prio_array",
"nr_active");
- STRUCT_SIZE_INIT(runqueue, "runqueue");
+ STRUCT_SIZE_INIT(runqueue, rqstruct);
STRUCT_SIZE_INIT(prio_array, "prio_array");
/*
So that patch was required for 2.6.18.
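The name-selection step in that patch can be sketched on its own. This is a minimal illustration, not crash code: struct_exists() and pick_rq_struct() are hypothetical stand-ins for STRUCT_EXISTS() and the rqstruct assignment, and the list of known structure names is supplied by hand. As shown in the hunk, rqstruct is left unset when neither name is found; the sketch returns NULL in that case instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for crash's STRUCT_EXISTS(): reports whether a
 * structure name appears in a caller-supplied list of known names. */
int struct_exists(const char *const *known, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(known[i], name) == 0)
            return 1;
    return 0;
}

/* Pick the run queue structure name for this kernel: try the older
 * "runqueue" first, then "rq". Returns NULL if neither is present,
 * so the caller can tell that no name was found. */
const char *pick_rq_struct(const char *const *known, size_t n)
{
    if (struct_exists(known, n, "runqueue"))
        return "runqueue";
    if (struct_exists(known, n, "rq"))
        return "rq";
    return NULL;
}
```

A caller would pass the returned name to every later offset lookup, and skip those lookups when the result is NULL.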
When you say that the "kernel is later than 2.6.18", well, that doesn't
help me much.
Look at the crash function get_idle_threads() in task.c, which is where
you're failing. It runs through the history of the symbols that Linux
has used over the years for the run queues. For the most recent kernels,
it looks for the "per_cpu__runqueues" symbol. At least on 2.6.25-rc2,
the kernel still defines them in kernel/sched.c like this:
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
So if you do an "nm -Bn vmlinux | grep runqueues", you should see:
# nm -Bn vmlinux-2.6.25-rc1-ext4-1 | grep runqueues
ffffffff8082b700 d per_cpu__runqueues
#
I'm guessing that's not the problem -- so presuming that the symbol *does*
exist, find out why it's failing to increment "cnt" in this part of
get_idle_threads():
    if (symbol_exists("per_cpu__runqueues") &&
        VALID_MEMBER(runqueue_idle)) {
            runqbuf = GETBUF(SIZE(runqueue));
            for (i = 0; i < nr_cpus; i++) {
                    if ((kt->flags & SMP) && (kt->flags & PER_CPU_OFF)) {
                            runq = symbol_value("per_cpu__runqueues") +
                                    kt->__per_cpu_offset[i];
                    } else
                            runq = symbol_value("per_cpu__runqueues");
                    readmem(runq, KVADDR, runqbuf,
                            SIZE(runqueue), "runqueues entry (per_cpu)",
                            FAULT_ON_ERROR);
                    tasklist[i] = ULONG(runqbuf + OFFSET(runqueue_idle));
                    if (IS_KVADDR(tasklist[i]))
                            cnt++;
            }
    }
Determine whether it even makes it to the inner for loop, whether
the pre-determined nr_cpus value makes sense, whether the SMP flag
reflects whether the kernel was compiled for SMP, whether the PER_CPU_OFF
flag was set, what address was calculated, etc...
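The address calculation in that loop, and the final cnt test, can be tried in isolation. The sketch below is a stand-in under stated assumptions, not crash code: the SK_ flag bits, runq_address(), count_valid_idle() and the kernel_floor comparison (a crude substitute for IS_KVADDR()) are all made up for illustration.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch only; crash defines its own
 * SMP and PER_CPU_OFF values. */
#define SK_SMP         0x1u
#define SK_PER_CPU_OFF 0x2u

/* Address of one CPU's runqueue: the base symbol value, plus that
 * CPU's per-cpu offset when both flags are set; otherwise the base. */
uint64_t runq_address(uint64_t base, const uint64_t *per_cpu_offset,
                      unsigned flags, int cpu)
{
    if ((flags & SK_SMP) && (flags & SK_PER_CPU_OFF))
        return base + per_cpu_offset[cpu];
    return base;
}

/* Count idle-task pointers that look like kernel addresses, mirroring
 * the cnt++ step. kernel_floor is an assumed lower bound standing in
 * for the real IS_KVADDR() test. */
int count_valid_idle(const uint64_t *tasklist, int nr_cpus,
                     uint64_t kernel_floor)
{
    int cnt = 0;

    for (int i = 0; i < nr_cpus; i++)
        if (tasklist[i] >= kernel_floor)
            cnt++;
    return cnt;
}
```

If either flag is missing, every CPU reads the same base address, which is one way the per-CPU idle task pointers could come back identical or invalid.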
Dave
16 years, 7 months
crash version 4.0-6.2 is available
by Dave Anderson
- Implemented a new "rd -S" option which, like the "-s" option,
displays the symbolic translation of kernel virtual addresses,
but also recognizes the virtual addresses of slab objects, and when
found, the address is replaced by the kmem_cache slab name string
inside brackets. (anderson(a)redhat.com)
- Make the found address displayed by "kmem -[sS] <address>" be the
address of the containing object if the <address> argument is
offset from the beginning of the object. This only applies to
kernels using kernel/slab.c; CONFIG_SLUB kernels currently do display
the address of the containing object.
(anderson(a)redhat.com)
- Fix for "kmem -[sS] [address]" in 2.6.25 CONFIG_SLUB kernels, which
addresses changes in the kernel's per-slab free list tracking. Without
the patch, error messages of the type "kmem: invalid kernel virtual
address: 10700 type: get_freepointer" would be seen when the full
list of objects in a per-cpu slab was displayed.
(anderson(a)redhat.com)
- Fix for "kmem -[sS] <slab-address>" in 2.6.25 CONFIG_SLUB kernels,
in which the slab structure is actually a page struct. Some slab
addresses would not be recognized as such, and therefore without the
patch, error messages of the type "kmem: address is not allocated in
slab subsystem: <slab-address>" would be seen.
(anderson(a)redhat.com)
- Fix for an initialization-time failure with Ubuntu kernels because
of a mismatch between the /proc/version string and the linux_banner
string, due to additional information appended to the linux_banner
string in Ubuntu kernels. (anderson(a)redhat.com, asid(a)hp.com)
- Fix for the "net" command in 2.6.22 and 2.6.23 kernels, where the
"dev_base" net_device structure was replaced by the "dev_base_head"
list_head. Without the patch, the "net" command with no arguments
would fail with the error message: "net: dev_base does not exist!".
(eteo(a)redhat.com)
- Fix for the "net" command in 2.6.24 and later kernels where the
global "dev_base_head" list_head has been removed, and the network
devices are linked from the "init_net" net structure. Without the
patch, the "net" command with no arguments would fail with the
error message: "net: dev_base does not exist!".
(anderson(a)redhat.com)
- For kernels configured with CONFIG_SLUB, "kmem -S" has been updated
to properly differentiate whether a cache's "full" slabs are tracked
but whose full list is empty, or whether the full slabs are not
tracked at all. Without this patch, a cache's full list could be
indicated as "(empty)" instead of the more correct indication of
"(not tracked)". (i-kitayama(a)ap.jp.nec.com, anderson(a)redhat.com)
- Fix for the "vm" command when the crash session was invoked with
the -s command line option. Without the patch, if invoked prior to
a "set", "ps" or "vtop" command, the "vm" command run against a
task other than the initial context would mistakenly indicate that
the task contained no virtual memory.
(anderson(a)redhat.com, baiwd(a)cn.fujitsu.com)
- Fix/workaround for the "search -k" command option on relocatable
2.6-era ia64 machines configured with CONFIG_SPARSEMEM. Without
the patch, an immediate segmentation violation occurs.
(anderson(a)redhat.com, yzgcsu(a)cn.fujitsu.com)
Download from: http://people.redhat.com/anderson
16 years, 8 months
Re: [Crash-utility] Re: search -k
by Dave Anderson
Dave Anderson wrote:
> I can reproduce it on a bare-metal RHEL5 kernel, so let me
> figure out what's going on...
Hello Yang,
The problem is that the functions that implement the search
command were originally written to be processor-neutral on
machines with a relatively small, contiguous, unity-mapped
kernel/static-data region and a vmalloc region. With the
advent of machines with sparse memory regions, and of architectures
with separately-mapped kernel regions that are no longer part
of their unity-mapped regions, the command does not scale well
because too much time is spent dealing with inter-region
non-existent memory. It's due for a proper re-write with
a machine-dependent next_kpage() assist function.
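As a rough idea of what such an assist function could do, here is a minimal sketch that jumps straight over the holes between mapped regions instead of probing every page in them. It is an illustration only: struct kregion, next_mapped_page() and the sorted region table are hypothetical and are not part of crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped kernel region: [start, end). */
struct kregion {
    uint64_t start;
    uint64_t end;
};

/* Given the current page address, store the next page worth searching
 * in *next and return 1, or return 0 when no mapped page remains.
 * The regions must be sorted by start address and must not overlap. */
int next_mapped_page(const struct kregion *regions, size_t n,
                     uint64_t cur, uint64_t page_size, uint64_t *next)
{
    uint64_t candidate = cur + page_size;

    if (candidate < cur)        /* wrapped past the top of the address space */
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (candidate < regions[i].start) {
            *next = regions[i].start;   /* jump over the hole */
            return 1;
        }
        if (candidate < regions[i].end) {
            *next = candidate;          /* still inside this region */
            return 1;
        }
    }
    return 0;
}
```

With a table like this per architecture, the search loop would never spend time on addresses that lie between regions.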
I was mistaken in stating that your patch would skip the vmalloc
section, but it could only compile on an ia64. Attached is an
ugly patch that does the same thing as yours did, compiles
on all architectures, and would still work on 2.4-era ia64 kernels
whose kernel text/static-data were still located in the unity-mapped
region 7.
Some day I'll get around to a real fix...
Thanks,
Dave
16 years, 8 months
Re: search -k
by Yang Zhiguo
Hi,
I ran crash under gdb:
[root@rhel51rc2 crash-4.0-6.1]# gdb ./crash
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) b search
Breakpoint 1 at 0x40000000000f8e11: file memory.c, line 11025.
(gdb) r -s
Starting program: /home/yangzg/crash-4.0-6.1/crash -s
crash> search -k 12345
Breakpoint 1, search (start=11529215046068469760, end=18446744073709551615, mask=0, memtype=1, value=0x60000fffffe3eab0, vcnt=1) at memory.c:11025
11025 if (start & (sizeof(long)-1)) {
(gdb) n
11030 pagebuf = GETBUF(PAGESIZE());
(gdb)
11031 next = start;
(gdb)
11033 for (pp = VIRTPAGEBASE(start); next < end; next = pp) {
(gdb)
11034 lastpage = (VIRTPAGEBASE(next) == VIRTPAGEBASE(end));
(gdb)
11035 if (LKCD_DUMPFILE())
(gdb)
11038 switch (memtype)
(gdb)
11050 if (!kvtop(CURRENT_CONTEXT(), pp, &paddr, 0) ||
(gdb) s
kvtop (tc=0x6000000001ec1c50, kvaddr=11529215046068469760, paddr=0x60000fffffe368e8, verbose=0) at memory.c:2306
2306 return (machdep->kvtop(tc ? tc : CURRENT_CONTEXT(), kvaddr,
(gdb) s
ia64_kvtop (tc=0x6000000001ec1c50, kvaddr=11529215046068469760, paddr=0x60000fffffe368e8, verbose=0) at ia64.c:1031
1031 if (!IS_KVADDR(kvaddr))
(gdb) n
1034 if (!vt->vmalloc_start) {
(gdb)
1039 switch (VADDR_REGION(kvaddr))
(gdb)
1054 if (ia64_IS_VMALLOC_ADDR(kvaddr))
(gdb)
1056 *paddr = ia64_VTOP(kvaddr);
(gdb) s
ia64_VTOP (vaddr=11529215046068469760) at ia64.c:3501
3501 ms = &ia64_machine_specific;
(gdb) n
3503 switch (VADDR_REGION(vaddr))
(gdb)
3522 if (ia64_IS_VMALLOC_ADDR(vaddr) ||
(gdb)
3531 paddr = vaddr - ms->kernel_start +
(gdb)
3533 break;
(gdb) p/x paddr
$1 = 0xffffffff04000000 ======> error occurred
(gdb) p/x vaddr
$2 = 0xa000000000000000
(gdb) p/x ms->kernel_start
$3 = 0xa000000100000000
(gdb) p/x ms->phys_start
$4 = 0x4000000
(gdb)
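The value in $1 is consistent with unsigned wraparound. A minimal sketch, assuming the expression that starts on line 3531 finishes by adding ms->phys_start (the trace cuts it off after the "+"); kernel_vtop_sketch() and vaddr_below_kernel_start() are hypothetical names used only here:

```c
#include <stdint.h>

/* Translate a kernel-region virtual address the way the traced
 * expression appears to: vaddr - kernel_start + phys_start.
 * The arithmetic is unsigned 64-bit, so a vaddr below kernel_start
 * wraps around instead of producing a negative number. */
uint64_t kernel_vtop_sketch(uint64_t vaddr, uint64_t kernel_start,
                            uint64_t phys_start)
{
    return vaddr - kernel_start + phys_start;
}

/* Returns 1 when vaddr lies below kernel_start, which is the case
 * where the formula above cannot yield a meaningful physical address. */
int vaddr_below_kernel_start(uint64_t vaddr, uint64_t kernel_start)
{
    return vaddr < kernel_start;
}
```

With the traced values, vaddr is 0x100000000 below ms->kernel_start, so the subtraction wraps to 0xffffffff00000000, and adding 0x4000000 gives the 0xffffffff04000000 shown in $1.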
Best Regards,
yang
16 years, 8 months
crash aborts with cannot determine idle task
by Chandru
While running crash-4.0-6.1 on a vmcore, crash is aborting with
--------
crash: cannot determine idle task addresses from init_tasks[] or runqueues[]
crash: cannot resolve "init_task_union"
-------
during startup. The kernel is later than 2.6.18. The changelog
http://people.redhat.com/anderson/crash.changelog.html mentions that
this is possibly fixed in version 4.0-3.1. Could you please point
me to the patch that fixed this problem?
thanks,
Chandru
16 years, 8 months
search -k
by Yang Zhiguo
hi,
When I use the search command as follows, I get a segmentation fault:
crash> search -k 12345
Segmentation fault
Would the following patch be OK?
--- ../crash/crash-4.0-6.1/memory.c 2008-02-29 01:09:10.000000000 +0900
+++ memory.c 2008-03-28 10:32:47.000000000 +0900
@@ -11047,6 +11047,11 @@ search(ulong start, ulong end, ulong mas
break;
case KVADDR:
+ if (machine_type("IA64") && (machdep->machspec->kernel_start > pp)) {
+ pp = machdep->machspec->kernel_start;
+ continue;
+ }
+
if (!kvtop(CURRENT_CONTEXT(), pp, &paddr, 0) ||
!phys_to_page(paddr, &page)) {
if (!next_kpage(pp, &pp))
Best Regards,
16 years, 8 months
crash -s may have some problem
by baiwd
Hi:
While using crash, I ran into something strange.
Executing "vm 1" under crash, I see:
PID: 1 TASK: e0000001ff908000 CPU: 0 COMMAND: "init"
MM PGD RSS TOTAL_VM
e000000185baac80 e000000185ca8000 256k 4464k
VMA START END FLAGS FILE
e000000185abc878 0 4000 84011
e000000185abca88 2000000000000000 200000000003c000 875 /lib/ld-2.5.so
e000000185abcb38 2000000000048000 2000000000050000 100873 /lib/ld-2.5.so
e000000185abfaa8 2000000000060000 20000000000e0000 75 /lib/libsepol.so.1
e000000185abf738 20000000000e0000 20000000000ec000 70 /lib/libsepol.so.1
e000000185abf948 20000000000ec000 20000000000f0000 100073 /lib/libsepol.so.1
e000000185abf9f8 20000000000f0000 20000000000fc000 100073
e000000185abfb58 20000000000fc000 2000000000124000 75 /lib/libselinux.so.1
e000000185abf898 2000000000124000 2000000000130000 70 /lib/libselinux.so.1
e000000185abf5d8 2000000000130000 2000000000134000 100073 /lib/libselinux.so.1
e000000185abf688 2000000000134000 2000000000138000 100073
e000000185abd218 2000000000138000 20000000003a0000 75 /lib/libc-2.5.so
e000000185abd168 20000000003a0000 20000000003ac000 70 /lib/libc-2.5.so
e000000185abc458 20000000003ac000 20000000003b4000 100073 /lib/libc-2.5.so
e000000185abc718 20000000003b4000 20000000003b8000 100073
e000000185abd798 20000000003b8000 20000000003c0000 75 /lib/libdl-2.5.so
e000000185abc508 20000000003c0000 20000000003cc000 70 /lib/libdl-2.5.so
e000000185abd848 20000000003cc000 20000000003d0000 100073 /lib/libdl-2.5.so
e000000185abd8f8 20000000003d0000 20000000003e8000 100073
e000000185abc928 4000000000000000 4000000000014000 1875 /sbin/init
But when I execute "vm 1" under crash -s, the second part is missing.
[root@rhel51rc2 crash-4.0-6.1]# crash -s
crash> vm 1
PID: 1 TASK: e0000001ff908000 CPU: 0 COMMAND: "init"
MM PGD RSS TOTAL_VM
0 0 0k 0k
I think this is caused by the IS_ZOMBIE(task) check in memory.c giving
the wrong result, because the value of _ZOMBIE_ has not been initialized.
_ZOMBIE_ is initialized in initialize_task_state(), which is ultimately
called from show_context(). But when running "crash -s", it is never called.
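A minimal sketch of that failure mode and of a lazy-initialization guard of the kind proposed in this message. It assumes the sentinel is an all-ones value such as -1 and that the zombie check is a bitwise AND, neither of which the message confirms; every name ending in _sketch, the SKETCH_ constants and their values are made up for illustration.

```c
/* Assumed sentinel value; the real definition lives in task.c. */
#define TASK_STATE_UNINITIALIZED (-1L)

/* Made-up state bits for this sketch only. */
#define SKETCH_INTERRUPTIBLE 1L
#define SKETCH_ZOMBIE        4L

static long zombie_mask = TASK_STATE_UNINITIALIZED;

/* Stand-in for initialize_task_state(): sets the real mask. */
void initialize_task_state_sketch(void)
{
    zombie_mask = SKETCH_ZOMBIE;
}

/* Unguarded check: while the mask still holds the all-ones sentinel,
 * any nonzero task state looks like a zombie. */
int is_zombie_unguarded_sketch(long state)
{
    return (state & zombie_mask) != 0;
}

/* Guarded check: initialize the mask on first use, then test. */
int is_zombie_guarded_sketch(long state)
{
    if (zombie_mask == TASK_STATE_UNINITIALIZED)
        initialize_task_state_sketch();
    return (state & zombie_mask) != 0;
}
```

Under those assumptions, a sleeping task is misreported as a zombie until something initializes the mask, which would match the empty "vm 1" output under "crash -s".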
I added the following code in memory.c before _ZOMBIE_ is used, but
I don't know whether it is a good fix: it requires making
initialize_task_state() non-static and exposing
TASK_STATE_UNINITIALIZED, which is currently defined only in task.c.
"
if (_ZOMBIE_ == TASK_STATE_UNINITIALIZED)
initialize_task_state();
"
Best Regards
--
Bai Weidong
EMail: baiwd(a)cn.fujitsu.com
--------------------------------------------------
16 years, 8 months
crash: cannot gather a stable task list via pid_hash (500 retries)
by Eugene Teo
Hi Dave,
I tried to run crash on Fedora 8's kernel 2.6.24.3-12.fc8 x86_64, and
it has errors that look like the following:
[...]
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: duplicate task in pid_hash: ffff81012f0811d0
crash: cannot gather a stable task list via pid_hash (500 retries)
I ran crash with -d7, and uploaded the log for debugging:
http://hera.kernel.org/~eugeneteo/crash.log
Thanks,
Eugene
16 years, 8 months
Re: ANN: crash extension for networking stuff
by Dave Anderson
Hi Alex,
Nice. Now that's a serious extension!
I'd like to add a reference to the general Python/Crash API as a third main
section to the http://people.redhat.com/anderson/extensions.html page, with
a subsection within it that references xportshow as an example. (And then
in the future any new commands made the same way could be added.)
If you want to tinker with that html page and send me a copy off-line,
please do so -- I just don't want to butcher the explanation. And
if you don't want to do any html hacking (it's a pretty simple page),
just tell me how you'd like to describe/format it, and I'll take it
from there.
Again, really nice work -- thanks,
Dave
16 years, 8 months
ANN: crash extension for networking stuff
by Alex Sidorenko
Prebuilt extension modules were released today for the x86 and x86_64
architectures. You can download them from
http://sourceforge.net/projects/pykdump/; the two packages of interest are
mpykdump-x86 (0.5.1)
mpykdump-x86_64 (0.5.1)
They can be used immediately on any x86 or x86_64 Linux distribution with
GLIBC 2.3 or later (e.g. RHEL3-5, SLES9-11, Ubuntu etc.)
The extension provides an 'xportshow' command with many options. In particular,
it can produce output similar to 'netstat' with different options, print
routing tables, ARP-tables, statistics, interface information and so on.
Most functions work on all kernels in the 2.4.21-2.6.24 range. Please see examples
at
http://pykdump.wiki.sourceforge.net/xportshow
The extension is built using the PyKdump framework (Python scripting for crash)
but it does not need Python installed. Everything needed to run is present in
a single file that depends only on GLIBC-family libraries.
The source packages on the SF site are very old; if you are interested in sources
and/or building your own extensions, you can use the 'testing' branch of SVN.
The build instructions can be found in the project's Wiki,
http://pykdump.wiki.sourceforge.net/
The performance of PyKdump is on par with SIAL (PyKdump is much faster for
data manipulation, SIAL is faster for accessing dump structures). In general I
find SIAL great for simple scripts and PyKdump for bigger projects (e.g.
automated first-pass dump analysis running many tests).
The choice of functionality was driven by practical needs while working on
problems for HP Linux support (x86, x86_64 and ia64); additional functions
will be added as needed.
--
------------------------------------------------------------------
Alexandre Sidorenko email: asid(a)hp.com
Global Solutions Engineering: Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------
16 years, 8 months