On Thu, 2019-04-18 at 11:09 -0400, Dave Anderson wrote:
>
> ----- Original Message -----
> > On Thu, 2019-04-18 at 15:02 +0100, Pierguido Lambri wrote:
> > > Hello,
> > >
> > > Today while I was looking into a vmcore, I got suddenly the
> > > message
> > > in $SUBJECT.
> > > It started after I did a search into the process stack pages
> > > (search
> > > -t)
> > > and for each command I run afterwards I kept getting that
> > > message.
> > > For example:
> > >
> > > $ retrace-server-interact 603967269 crash
> > > ...
> > > crash> search -t ffff88040a0d5280
> > >
> > > search: invalid list entry: 0
> > >
> > > search: invalid list entry: 0
> > >
> > > search: invalid list entry: 0
> > > PID: 606 TASK: ffff88082d226eb0 CPU: 5 COMMAND:
> > > "xfsaild/dm-0"
> > > ffff88083ff5b948: ffff88040a0d5280
> > > ffff88083ff5b990: ffff88040a0d5280
> > > ffff88083ff5baa8: ffff88040a0d5280
> > > ffff88083ff5baf0: ffff88040a0d5280
> > > ffff88083ff5bcf0: ffff88040a0d5280
> > > ffff88083ff5bd38: ffff88040a0d5280
> > > ffff88083ff5bd98: ffff88040a0d5280
> > >
> > >
> > > WARNING: malloc/free mismatch (29/32)
> > >
> > > crash> ps -m | grep UN
> > > [ 0 00:00:00.146] [UN] PID: 1811 TASK: ffff880c17bd1fa0 CPU:
> > > 1 COMMAND: "cp"
> > > WARNING: malloc/free mismatch (29/32)
> > >
> > > I guess this comes from a possible corrupted vmcore (I just got
> > > it
> > > from this vmcore),
> > > but I wonder why every new command keeps returning the same
> > > message.
> > >
> > > Thanks,
> > >
> > > Pier
> > >
> > > --
> > > Crash-utility mailing list
> > > Crash-utility(a)redhat.com
> > >
https://www.redhat.com/mailman/listinfo/crash-utility
> >
> > FWIW, I just pulled this up after plambri pinged me. This is the
> > backtrace that is being hit though I've not dug in more:
> >
> > Breakpoint 3, do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > 3820 error(INFO, "\ninvalid list
> > entry:
> > 0\n");
> > (gdb) list
> > 3815 return -1;
> > 3816 }
> > 3817
> > 3818 if (next == 0) {
> > 3819 if (ld->flags & LIST_HEAD_FORMAT) {
> > 3820 error(INFO, "\ninvalid list
> > entry:
> > 0\n");
> > 3821 if (close_hq_on_return)
> > 3822 hq_close();
> > 3823 return -1;
> > 3824 }
> > (gdb) bt
> > #0 do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > #1 0x000000000047ec82 in dump_vmap_area (vi=0x7ffffffed0d0) at
> > memory.c:8724
> > #2 dump_vmlist (vi=0x7ffffffed0d0) at memory.c:8590
> > #3 0x000000000047f3eb in last_vmalloc_address () at memory.c:16792
> > #4 0x0000000000515e6b in x86_64_get_kvaddr_ranges
> > (vrp=0x7fffffffd340) at x86_64.c:8706
> > #5 0x000000000049c6ae in cmd_search () at memory.c:13988
> > #6 0x0000000000465f9c in exec_command () at main.c:879
> > #7 0x00000000004661ca in main_loop () at main.c:826
> > #8 0x00000000006b21a3 in captured_command_loop (data=<value
> > optimized out>) t main.c:258
> > #9 0x00000000006b0a8b in catch_errors (func=0x6b2190
> > <captured_command_loop>, func_args=0x0, errstring=0x90c106 "",
> > mask=6) at exceptions.c:557
> > #10 0x00000000006b3076 in captured_main (data=<value optimized
> > out>) at main.c:1064
> > #11 0x00000000006b0a8b in catch_errors (func=0x6b22b0
> > <captured_main>, func_args=0x7fffffffe2e0, errstring=0x90c106
"",
> > mask=6) at exceptions.c:557
> > #12 0x00000000006b1fa4 in gdb_main (args=<value optimized out>) at
> > main.c:1079
> > #13 0x00000000006b1fde in gdb_main_entry (argc=<value optimized
> > out>, argv=<value optimized out>) at main.c:1099
> > #14 0x0000000000467030 in main (argc=3, argv=0x7fffffffe458) at
> > main.c:707
>
> Hmmm, the vmap_area list is a list_head type list, so there should
> never be
> a NULL "next" pointer.
>
> I'm guessing that "kmem -v" also fails? The last vmap_area entry
> should point back to
> the global "vmap_area_list" list header, for example:
>
> crash> kmem -v | tail
> ffff96e7ecaaca80 ffff96e54c89c400 ffffffffc0e54000 -
> ffffffffc0e5a000 24576
> ffff96e757ffe380 ffff96e4be98f3c0 ffffffffc0e5d000 -
> ffffffffc0e6d000 65536
> ffff96e467b33400 ffff96e6a3ae1a00 ffffffffc0e6d000 -
> ffffffffc0e73000 24576
> ffff96e85cf4e600 ffff96e752c52b40 ffffffffc0e77000 -
> ffffffffc0e7c000 20480
> ffff96e85cf4e380 ffff96e5506c6c00 ffffffffc0e7c000 -
> ffffffffc0e81000 20480
> ffff96e802baa500 ffff96e5506c69c0 ffffffffc0e81000 -
> ffffffffc0e86000 20480
> ffff96e802baac00 ffff96e5506c6cc0 ffffffffc0e86000 -
> ffffffffc0e8c000 24576
> ffff96e574196f80 ffff96e55ffd6c80 ffffffffc0e90000 -
> ffffffffc0e95000 20480
> ffff96e574196680 ffff96e55ffd6880 ffffffffc0e95000 -
> ffffffffc0e9a000 20480
> ffff96e87c222800 ffff96e5496ca680 ffffffffc0e9a000 -
> ffffffffc0ea4000 40960
> crash> vmap_area ffff96e87c222800
> struct vmap_area {
> va_start = 18446744072651120640,
> va_end = 18446744072651161600,
> flags = 4,
> rb_node = {
> __rb_parent_color = 18446628510972342169,
> rb_right = 0x0,
> rb_left = 0xffff96e574196698
> },
> list = {
> next = 0xffffffffae69af90,
> prev = 0xffff96e5741966b0
> },
> purge_list = {
> next = 0x0,
> prev = 0xdead000000000200
> },
> vm = 0xffff96e5496ca680,
> callback_head = {
> next = 0x0,
> func = 0xffff96e71d51aa00
> }
> }
> crash> sym 0xffffffffae69af90
> ffffffffae69af90 (D) vmap_area_list
> crash>
>
> Dave
>
>
>
Yeah kmem -v fails as well:
crash> kmem -v
kmem: invalid list entry: 0
WARNING: malloc/free mismatch (29/30)
crash>
There's no indicating of an error when crash loads though - only after
running these commands. Do you think this a damaged vmcore that is not
obvious?
I don't know it's damaged or if it's a symptom of the kernel crash. Is the
kernel crash happening while the vmlist is being modified?
It's not obvious because it would only be seen when dump_vmlist() is called.
When dump_vmlist() calls dump_vmap_area(), and do_list() returns -1 back to
dump_vmap_area(), it gets used as a loop-ending index, and then causes
presumably bogus values to get returned:
do_vmap_area(struct meminfo *vi)
{
...
ld->end = symbol_value("vmap_area_list");
cnt = do_list(ld);
for (i = 0; i < cnt; i++) {
...
if (vi->flags & GET_HIGHEST)
vi->retval = start+size;
if (vi->flags & GET_VMLIST_COUNT)
vi->retval = count;
if (vi->flags & VMLIST_VERIFY)
vi->retval = verified;
}
Mabye dump_vmap_area() should do a error(FATAL, ...) if cnt is -1? Although,
that would kill all search command attempts. It's hard to say.
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility