----- Original Message -----
ok. I rewrite the patch and they are tested ok in my box.
Thanks
Zhang
My tests weren't so successful this time, and I also have some questions
about the runq -g output.
I tested your latest patches on a sample set of 70 dumpfiles whose
kernels all use CFS runqueues. In 7 of the 70 "runq -g" tests,
the command caused the crash session to fail like so:
crash> runq -g
CPU 0
CURRENT: PID: 0 TASK: ffffffff81c0d020 COMMAND: "swapper/0"
RT PRIO_ARRAY: ffff8802277d4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
[120] PID: 3 TASK: ffff8802221a2670 COMMAND: "ksoftirqd/0"
CPU 1
CURRENT: PID: 0 TASK: ffff8802221e2670 COMMAND: "swapper/1"
RT PRIO_ARRAY: ffff880426fd4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
[no tasks queued]
CPU 2
CURRENT: PID: 0 TASK: ffff880222220000 COMMAND: "swapper/2"
RT PRIO_ARRAY: ffff8802279d4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
[no tasks queued]
CPU 3
CURRENT: PID: 11615 TASK: ffff88020c50a670 COMMAND: "runtest.sh"
RT PRIO_ARRAY: ffff8804271d4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
GROUP CFS RB_ROOT: ffff88041e0d2790
[no tasks queued]
CPU 4
CURRENT: PID: 0 TASK: ffff880222250000 COMMAND: "swapper/4"
RT PRIO_ARRAY: ffff880227bd4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
[no tasks queued]
CPU 5
CURRENT: PID: 0 TASK: ffff880222252670 COMMAND: "swapper/5"
RT PRIO_ARRAY: ffff8804273d4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
[no tasks queued]
CPU 6
CURRENT: PID: 1055 TASK: ffff88020c620000 COMMAND: "rsyslogd"
RT PRIO_ARRAY: ffff880227dd4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
GROUP CFS RB_ROOT: ffff88021df3a790 <system>
GROUP CFS RB_ROOT: ffff88020d556c78 <rsyslog.service>
[no tasks queued]
CPU 7
CURRENT: PID: 33 TASK: ffff880421d48000 COMMAND: "migration/7"
RT PRIO_ARRAY: ffff8804275d4590
[no tasks queued]
CFS RB_ROOT: ffff880227018070
[120] PID: 10653 TASK: ffff88041ead0000 COMMAND: "kworker/7:2"
*** glibc detected *** /var/CVS/crash-6.1.0/crash: free(): invalid pointer:
0x0000000004e2c210 ***
======= Backtrace: =========
/lib64/libc.so.6[0x33ff87bfee]
/var/CVS/crash-6.1.0/crash(freebuf+0x29a)[0x46f327]
/var/CVS/crash-6.1.0/crash[0x4cae70]
/var/CVS/crash-6.1.0/crash[0x4ce4b2]
/var/CVS/crash-6.1.0/crash(cmd_runq+0xfb)[0x4c9b5b]
/var/CVS/crash-6.1.0/crash(exec_command+0x20a)[0x45db7f]
/var/CVS/crash-6.1.0/crash(main_loop+0x281)[0x45d973]
/var/CVS/crash-6.1.0/crash[0x5ad159]
/var/CVS/crash-6.1.0/crash(catch_errors+0x5e)[0x5ac59e]
/var/CVS/crash-6.1.0/crash[0x5adde6]
/var/CVS/crash-6.1.0/crash(catch_errors+0x5e)[0x5ac59e]
/var/CVS/crash-6.1.0/crash(gdb_main+0x24)[0x5ae0b4]
/var/CVS/crash-6.1.0/crash(gdb_main_entry+0x2e)[0x5ae0ee]
/var/CVS/crash-6.1.0/crash(gdb_main_loop+0x263)[0x4ed248]
/var/CVS/crash-6.1.0/crash(main+0x268d)[0x45d6de]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x33ff821735]
/var/CVS/crash-6.1.0/crash[0x45af5d]
======= Memory map: ========
00400000-00a62000 r-xp 00000000 fd:01 3021564
/var/CVS/crash-6.1.0/crash
00c61000-00c85000 rw-p 00661000 fd:01 3021564
/var/CVS/crash-6.1.0/crash
00c85000-00e66000 rw-p 00000000 00:00 0
017ae000-05b7b000 rw-p 00000000 00:00 0 [heap]
33ff400000-33ff420000 r-xp 00000000 fd:01 1869596
/usr/lib64/ld-2.15.so
33ff61f000-33ff620000 r--p 0001f000 fd:01 1869596
/usr/lib64/ld-2.15.so
33ff620000-33ff621000 rw-p 00020000 fd:01 1869596
/usr/lib64/ld-2.15.so
33ff621000-33ff622000 rw-p 00000000 00:00 0
33ff800000-33ff9ac000 r-xp 00000000 fd:01 1869597
/usr/lib64/libc-2.15.so
33ff9ac000-33ffbac000 ---p 001ac000 fd:01 1869597
/usr/lib64/libc-2.15.so
33ffbac000-33ffbb0000 r--p 001ac000 fd:01 1869597
/usr/lib64/libc-2.15.so
33ffbb0000-33ffbb2000 rw-p 001b0000 fd:01 1869597
/usr/lib64/libc-2.15.so
33ffbb2000-33ffbb7000 rw-p 00000000 00:00 0
3400000000-3400003000 r-xp 00000000 fd:01 1838776
/usr/lib64/libdl-2.15.so
3400003000-3400202000 ---p 00003000 fd:01 1838776
/usr/lib64/libdl-2.15.so
3400202000-3400203000 r--p 00002000 fd:01 1838776
/usr/lib64/libdl-2.15.so
3400203000-3400204000 rw-p 00003000 fd:01 1838776
/usr/lib64/libdl-2.15.so
3400800000-3400817000 r-xp 00000000 fd:01 1869608
/usr/lib64/libz.so.1.2.5
3400817000-3400a16000 ---p 00017000 fd:01 1869608
/usr/lib64/libz.so.1.2.5
3400a16000-3400a17000 rw-p 00016000 fd:01 1869608
/usr/lib64/libz.so.1.2.5
3400c00000-3400cfa000 r-xp 00000000 fd:01 1869598
/usr/lib64/libm-2.15.so
3400cfa000-3400ef9000 ---p 000fa000 fd:01 1869598
/usr/lib64/libm-2.15.so
3400ef9000-3400efa000 r--p 000f9000 fd:01 1869598
/usr/lib64/libm-2.15.so
3400efa000-3400efb000 rw-p 000fa000 fd:01 1869598
/usr/lib64/libm-2.15.so
340f800000-340f825000 r-xp 00000000 fd:01 1869641
/usr/lib64/libtinfo.so.5.9
340f825000-340fa24000 ---p 00025000 fd:01 1869641
/usr/lib64/libtinfo.so.5.9
340fa24000-340fa28000 r--p 00024000 fd:01 1869641
/usr/lib64/libtinfo.so.5.9
340fa28000-340fa29000 rw-p 00028000 fd:01 1869641
/usr/lib64/libtinfo.so.5.9
3414800000-3414823000 r-xp 00000000 fd:01 1869642
/usr/lib64/libncurses.so.5.9
3414823000-3414a22000 ---p 00023000 fd:01 1869642
/usr/lib64/libncurses.so.5.9
3414a22000-3414a23000 r--p 00022000 fd:01 1869642
/usr/lib64/libncurses.so.5.9
3414a23000-3414a24000 rw-p 00023000 fd:01 1869642
/usr/lib64/libncurses.so.5.9
3ef8c00000-3ef8c15000 r-xp 00000000 fd:01 1842976
/usr/lib64/libgcc_s-4.7.2-20120921.so.1
3ef8c15000-3ef8e14000 ---p 00015000 fd:01 1842976
/usr/lib64/libgcc_s-4.7.2-20120921.so.1
3ef8e14000-3ef8e15000 rw-p 00014000 fd:01 1842976
/usr/lib64/libgcc_s-4.7.2-20120921.so.1
7fa098c20000-7fa099ce4000 r--p 0705e000 fd:01 3014793
/var/tmp/vmlinux.gz_pnPD6v (deleted)
7fa099ce4000-7fa09a9a5000 rw-p 00000000 00:00 0
7fa09b337000-7fa09b50c000 r--p 06e8a000 fd:01 3014793
/var/tmp/vmlinux.gz_pnPD6v (deleted)
7fa09b50c000-7fa09b9fe000 r--p 06814000 fd:01 3014793
/var/tmp/vmlinux.gz_pnPD6v (deleted)
7fa09b9fe000-7fa09bd46000 r--p 08121000 fd:01 3014793
/var/tmp/vmlinux.gz_pnPD6v (deleted)
7fa09bd46000-7fa09bf93000 r--p 065c8000 fd:01 3014793
/var/tmp/vmlinux.gz_pnPD6v (deleted)
7fa09bf93000-7fa0a100c000 r--p 01550000 fd:01 3014793
/var/tmp/vmlinux.gz_pnPD6v (deleted)
7fa0a100c000-7fa0a1298000 r--p 08468000 fd:01 3014793
/var/tmp/vmlinux.gz_pnPD6v (deleted)
7fa0a1298000-7fa0a25d4000 rw-p 00000000 00:00 0
7fa0a26f8000-7fa0a29d1000 rw-p 00000000 00:00 0
7fa0a2a63000-7fa0a8e90000 r--p 00000000 fd:01 1869461
/usr/lib/locale/locale-archive
7fa0a8e90000-7fa0a90af000 rw-p 00000000 00:00 0
7fa0a90bd000-7fa0a90be000 rw-p 00000000 00:00 0
7fa0a90be000-7fa0a90c5000 r--s 00000000 fd:01 1968224
/usr/lib64/gconv/gconv-modules.cache
7fa0a90c5000-7fa0a90cb000 rw-p 00000000 00:00 0
7fff15db3000-7fff15dd4000 rw-p 00000000 00:00 0 [stack]
7fff15dff000-7fff15e00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
$
Here's the gdb trace of the core above:
$ gdb ./crash core.19908
GNU gdb (GDB) Fedora (7.4.50.20120120-52.fc17)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <
http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<
http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /var/CVS/crash-6.1.0/crash...done.
[New LWP 19908]
Core was generated by `./crash vmcore vmlinux.gz'.
Program terminated with signal 6, Aborted.
#0 0x00000033ff835925 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.15-57.fc17.x86_64
libgcc-4.7.2-2.fc17.x86_64 ncurses-libs-5.9-4.20120204.fc17.x86_64
zlib-1.2.5-7.fc17.x86_64
(gdb) bt
#0 0x00000033ff835925 in raise () from /lib64/libc.so.6
#1 0x00000033ff8370d8 in abort () from /lib64/libc.so.6
#2 0x00000033ff874e6b in __libc_message () from /lib64/libc.so.6
#3 0x00000033ff87bfee in _int_free () from /lib64/libc.so.6
#4 0x000000000046f327 in freebuf (addr=0x3f234a0
"systemd-stdout-syslog-bridge.service") at tools.c:4710
#5 0x00000000004cae70 in free_task_group_info_array () at task.c:7578
#6 0x00000000004ce4b2 in dump_tasks_by_task_group () at task.c:8462
#7 0x00000000004c9b5b in cmd_runq () at task.c:7158
#8 0x000000000045db7f in exec_command () at main.c:764
#9 0x000000000045d973 in main_loop () at main.c:712
#10 0x00000000005ad159 in captured_command_loop (data=data@entry=0x0) at ./main.c:228
#11 0x00000000005ac59e in catch_errors (func=func@entry=0x5ad150
<captured_command_loop>,
func_args=func_args@entry=0x0, errstring=errstring@entry=0x8caaeb "",
mask=mask@entry=6) at exceptions.c:531
#12 0x00000000005adde6 in captured_main (data=data@entry=0x7fff3c694000) at ./main.c:958
#13 0x00000000005ac59e in catch_errors (func=func@entry=0x5ad230 <captured_main>,
func_args=func_args@entry=0x7fff3c694000, errstring=errstring@entry=0x8caaeb
"", mask=mask@entry=6)
at exceptions.c:531
#14 0x00000000005ae0b4 in gdb_main (args=args@entry=0x7fff3c694000) at ./main.c:973
#15 0x00000000005ae0ee in gdb_main_entry (argc=<optimized out>, argv=<optimized
out>) at ./main.c:993
#16 0x00000000004ed248 in gdb_main_loop (argc=3, argv=0x7fff3c694188) at
gdb_interface.c:76
#17 0x000000000045d6de in main (argc=3, argv=0x7fff3c694188) at main.c:617
(gdb) q
In a quick debugging session of your free_task_group_info_array()
I printed out the addresses being FREEBUF()'d, and I noted that
there were numerous instances of the same address being free twice:
static void
free_task_group_info_array(void)
{
int i;
for (i = 0; i < tgi_p; i++) {
if (tgi_array[i]->name)
FREEBUF(tgi_array[i]->name);
FREEBUF(tgi_array[i]);
}
tgi_p = 0;
FREEBUF(tgi_array);
}
I put one of the failing vmlinux/vmcore pairs here for you
to debug:
http://people.redhat.com/anderson/zhangyanfei
Secondly, another question I have is the meaning of the command's output.
First, consider this "runq" output:
crash> runq
CPU 0 RUNQUEUE: ffff8800090436c0
CURRENT: PID: 588 TASK: ffff88007e4877a0 COMMAND: "udevd"
RT PRIO_ARRAY: ffff8800090437c8
[no tasks queued]
CFS RB_ROOT: ffff880009043740
[118] PID: 2110 TASK: ffff88007d470860 COMMAND: "check-cdrom.sh"
[118] PID: 2109 TASK: ffff88007f1247a0 COMMAND: "check-cdrom.sh"
[118] PID: 2114 TASK: ffff88007f20e080 COMMAND: "udevd"
CPU 1 RUNQUEUE: ffff88000905b6c0
CURRENT: PID: 2113 TASK: ffff88007e8ac140 COMMAND: "udevd"
RT PRIO_ARRAY: ffff88000905b7c8
[no tasks queued]
CFS RB_ROOT: ffff88000905b740
[118] PID: 2092 TASK: ffff88007d7a4760 COMMAND: "MAKEDEV"
[118] PID: 1983 TASK: ffff88007e59f140 COMMAND: "udevd"
[118] PID: 2064 TASK: ffff88007e40f7a0 COMMAND: "udevd"
[115] PID: 2111 TASK: ffff88007e4278a0 COMMAND: "kthreadd"
crash>
In the above case, the per-cpu "rq" structure addresses are shown as:
CPU 0 RUNQUEUE: ffff8800090436c0
CPU 1 RUNQUEUE: ffff88000905b6c0
And embedded in each of the rq structures above are these two rb_root
structures:
CFS RB_ROOT: ffff880009043740 (embedded in rq @ffff8800090436c0)
CFS RB_ROOT: ffff88000905b740 (embedded in rq @ffff88000905b6c0)
And starting at those rb_root structures, the tree of tasks are dumped.
Now, your "runq -q" option doesn't show any "starting point"
structure
address, but rather they just show "CPU 0" and "CPU 1":
crash> runq -g
CPU 0
CURRENT: PID: 588 TASK: ffff88007e4877a0 COMMAND: "udevd"
RT PRIO_ARRAY: ffff8800090437c8
[no tasks queued]
CFS RB_ROOT: ffff880009093548
[118] PID: 2110 TASK: ffff88007d470860 COMMAND: "check-cdrom.sh"
[118] PID: 2109 TASK: ffff88007f1247a0 COMMAND: "check-cdrom.sh"
[118] PID: 2114 TASK: ffff88007f20e080 COMMAND: "udevd"
CPU 1
CURRENT: PID: 2113 TASK: ffff88007e8ac140 COMMAND: "udevd"
RT PRIO_ARRAY: ffff88000905b7c8
[no tasks queued]
CFS RB_ROOT: ffff880009093548
[118] PID: 2092 TASK: ffff88007d7a4760 COMMAND: "MAKEDEV"
[118] PID: 1983 TASK: ffff88007e59f140 COMMAND: "udevd"
[118] PID: 2064 TASK: ffff88007e40f7a0 COMMAND: "udevd"
[115] PID: 2111 TASK: ffff88007e4278a0 COMMAND: "kthreadd"
crash>
I would think that there might be a useful address of a per-cpu
structure that could be shown there as well?
And secondly, I'm confused as to why the "CFS RB_ROOT" address for
all cpus is the same address -- for example, above they are both at
ffff880009093548. How can the two rb trees have the same rb_root?
Dave