----- Original Message -----
> On Thu, 2019-04-18 at 14:05 -0400, Dave Anderson wrote:
> >
> > ----- Original Message -----
> > > On Thu, 2019-04-18 at 11:09 -0400, Dave Anderson wrote:
> > > >
> > > > ----- Original Message -----
> > > > > On Thu, 2019-04-18 at 15:02 +0100, Pierguido Lambri wrote:
> > > > > > Hello,
> > > > > >
> > > > > > Today while I was looking into a vmcore, I suddenly got the message
> > > > > > in $SUBJECT. It started after I did a search of the process stack
> > > > > > pages (search -t), and for every command I ran afterwards I kept
> > > > > > getting that message. For example:
> > > > > >
> > > > > > $ retrace-server-interact 603967269 crash
> > > > > > ...
> > > > > > crash> search -t ffff88040a0d5280
> > > > > >
> > > > > > search: invalid list entry: 0
> > > > > >
> > > > > > search: invalid list entry: 0
> > > > > >
> > > > > > search: invalid list entry: 0
> > > > > > PID: 606 TASK: ffff88082d226eb0 CPU: 5 COMMAND: "xfsaild/dm-0"
> > > > > > ffff88083ff5b948: ffff88040a0d5280
> > > > > > ffff88083ff5b990: ffff88040a0d5280
> > > > > > ffff88083ff5baa8: ffff88040a0d5280
> > > > > > ffff88083ff5baf0: ffff88040a0d5280
> > > > > > ffff88083ff5bcf0: ffff88040a0d5280
> > > > > > ffff88083ff5bd38: ffff88040a0d5280
> > > > > > ffff88083ff5bd98: ffff88040a0d5280
> > > > > >
> > > > > >
> > > > > > WARNING: malloc/free mismatch (29/32)
> > > > > >
> > > > > > crash> ps -m | grep UN
> > > > > > [ 0 00:00:00.146] [UN] PID: 1811 TASK: ffff880c17bd1fa0 CPU: 1 COMMAND: "cp"
> > > > > > WARNING: malloc/free mismatch (29/32)
> > > > > >
> > > > > > I guess this comes from a possibly corrupted vmcore (I have only seen
> > > > > > it with this vmcore), but I wonder why every new command keeps
> > > > > > returning the same message.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Pier
> > > > > >
> > > > > > --
> > > > > > Crash-utility mailing list
> > > > > > Crash-utility(a)redhat.com
> > > > > > https://www.redhat.com/mailman/listinfo/crash-utility
> > > > >
> > > > > FWIW, I just pulled this up after plambri pinged me. This is the
> > > > > backtrace that is being hit, though I've not dug in further:
> > > > >
> > > > > Breakpoint 3, do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > > 3820                            error(INFO, "\ninvalid list entry: 0\n");
> > > > > (gdb) list
> > > > > 3815                    return -1;
> > > > > 3816            }
> > > > > 3817
> > > > > 3818            if (next == 0) {
> > > > > 3819                    if (ld->flags & LIST_HEAD_FORMAT) {
> > > > > 3820                            error(INFO, "\ninvalid list entry: 0\n");
> > > > > 3821                            if (close_hq_on_return)
> > > > > 3822                                    hq_close();
> > > > > 3823                            return -1;
> > > > > 3824                    }
> > > > > (gdb) bt
> > > > > #0  do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > > #1  0x000000000047ec82 in dump_vmap_area (vi=0x7ffffffed0d0) at memory.c:8724
> > > > > #2  dump_vmlist (vi=0x7ffffffed0d0) at memory.c:8590
> > > > > #3  0x000000000047f3eb in last_vmalloc_address () at memory.c:16792
> > > > > #4  0x0000000000515e6b in x86_64_get_kvaddr_ranges (vrp=0x7fffffffd340) at x86_64.c:8706
> > > > > #5  0x000000000049c6ae in cmd_search () at memory.c:13988
> > > > > #6  0x0000000000465f9c in exec_command () at main.c:879
> > > > > #7  0x00000000004661ca in main_loop () at main.c:826
> > > > > #8  0x00000000006b21a3 in captured_command_loop (data=<value optimized out>) at main.c:258
> > > > > #9  0x00000000006b0a8b in catch_errors (func=0x6b2190 <captured_command_loop>, func_args=0x0, errstring=0x90c106 "", mask=6) at exceptions.c:557
> > > > > #10 0x00000000006b3076 in captured_main (data=<value optimized out>) at main.c:1064
> > > > > #11 0x00000000006b0a8b in catch_errors (func=0x6b22b0 <captured_main>, func_args=0x7fffffffe2e0, errstring=0x90c106 "", mask=6) at exceptions.c:557
> > > > > #12 0x00000000006b1fa4 in gdb_main (args=<value optimized out>) at main.c:1079
> > > > > #13 0x00000000006b1fde in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at main.c:1099
> > > > > #14 0x0000000000467030 in main (argc=3, argv=0x7fffffffe458) at main.c:707
> > > >
> > > > Hmmm, the vmap_area list is a list_head type list, so there
> > > > should
> > > > never be
> > > > a NULL "next" pointer.
> > > >
> > > > I'm guessing that "kmem -v" also fails? The last
vmap_area
> > > > entry
> > > > should point back to
> > > > the global "vmap_area_list" list header, for example:
> > > >
> > > > crash> kmem -v | tail
> > > > ffff96e7ecaaca80 ffff96e54c89c400 ffffffffc0e54000 - ffffffffc0e5a000 24576
> > > > ffff96e757ffe380 ffff96e4be98f3c0 ffffffffc0e5d000 - ffffffffc0e6d000 65536
> > > > ffff96e467b33400 ffff96e6a3ae1a00 ffffffffc0e6d000 - ffffffffc0e73000 24576
> > > > ffff96e85cf4e600 ffff96e752c52b40 ffffffffc0e77000 - ffffffffc0e7c000 20480
> > > > ffff96e85cf4e380 ffff96e5506c6c00 ffffffffc0e7c000 - ffffffffc0e81000 20480
> > > > ffff96e802baa500 ffff96e5506c69c0 ffffffffc0e81000 - ffffffffc0e86000 20480
> > > > ffff96e802baac00 ffff96e5506c6cc0 ffffffffc0e86000 - ffffffffc0e8c000 24576
> > > > ffff96e574196f80 ffff96e55ffd6c80 ffffffffc0e90000 - ffffffffc0e95000 20480
> > > > ffff96e574196680 ffff96e55ffd6880 ffffffffc0e95000 - ffffffffc0e9a000 20480
> > > > ffff96e87c222800 ffff96e5496ca680 ffffffffc0e9a000 - ffffffffc0ea4000 40960
> > > > crash> vmap_area ffff96e87c222800
> > > > struct vmap_area {
> > > >   va_start = 18446744072651120640,
> > > >   va_end = 18446744072651161600,
> > > >   flags = 4,
> > > >   rb_node = {
> > > >     __rb_parent_color = 18446628510972342169,
> > > >     rb_right = 0x0,
> > > >     rb_left = 0xffff96e574196698
> > > >   },
> > > >   list = {
> > > >     next = 0xffffffffae69af90,
> > > >     prev = 0xffff96e5741966b0
> > > >   },
> > > >   purge_list = {
> > > >     next = 0x0,
> > > >     prev = 0xdead000000000200
> > > >   },
> > > >   vm = 0xffff96e5496ca680,
> > > >   callback_head = {
> > > >     next = 0x0,
> > > >     func = 0xffff96e71d51aa00
> > > >   }
> > > > }
> > > > crash> sym 0xffffffffae69af90
> > > > ffffffffae69af90 (D) vmap_area_list
> > > > crash>
> > > >
> > > > Dave
> > > >
> > > >
> > > >
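Dave's point can be sketched in a few lines of C. A list_head list is circular, so a walk over it ends when it arrives back at the header, and a zero next pointer is never part of a healthy list. The model below uses a stand-in struct list_head and two hypothetical helpers, count_list() and build_ring(). It illustrates the idea behind the check at tools.c:3818, but it is not crash's do_list().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/*
 * Hypothetical helper: walk a circular list anchored at 'head' and count
 * its entries. A healthy list ends when 'next' leads back to the head; a
 * NULL 'next' can only mean corruption, so report it and return -1.
 */
static int count_list(const struct list_head *head)
{
    int cnt = 0;
    const struct list_head *pos = head->next;

    while (pos != head) {
        if (pos == NULL) {
            fprintf(stderr, "invalid list entry: 0\n");
            return -1;
        }
        cnt++;
        pos = pos->next;
    }
    return cnt;
}

/* Hypothetical helper: link head -> nodes[0] -> ... -> nodes[n-1] -> head. */
static void build_ring(struct list_head *head, struct list_head *nodes, int n)
{
    assert(n > 0);
    head->next = &nodes[0];
    head->prev = &nodes[n - 1];
    for (int i = 0; i < n; i++) {
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : head;
        nodes[i].prev = (i > 0) ? &nodes[i - 1] : head;
    }
}
```

For example, build_ring(&head, nodes, 3) followed by count_list(&head) yields 3. After setting nodes[1].next = NULL, the same call prints "invalid list entry: 0" and returns -1. That corresponds to the -1 that do_list() hands back to dump_vmap_area() in the backtrace above.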
> > >
> > > Yeah, kmem -v fails as well:
> > > crash> kmem -v
> > >
> > > kmem: invalid list entry: 0
> > > WARNING: malloc/free mismatch (29/30)
> > > crash>
> > >
> > >
> > > There's no indication of an error when crash loads, though; it only
> > > shows up after running these commands. Do you think this is a damaged
> > > vmcore in a way that is not obvious?
> >
> > I don't know if it's damaged or if it's a symptom of the kernel crash.
> > Is the kernel crash happening while the vmlist is being modified?
> >
>
> No, there are no active processes modifying the vmap_area_list.
>
> It is a crash due to memory corruption, and 3rd-party modules are
> loaded. But the crash is inside xfs and does not appear to be related
> to this in any way, nor do the 3rd-party modules show up in any
> backtraces.
>
>
> > It's not obvious because it would only be seen when dump_vmlist() is
> > called. When dump_vmlist() calls dump_vmap_area(), and do_list() returns
> > -1 back to dump_vmap_area(), that -1 gets used as the loop-ending index,
> > which presumably causes bogus values to be returned:
> >
> > dump_vmap_area(struct meminfo *vi)
> > {
> >         ...
> >
> >         ld->end = symbol_value("vmap_area_list");
> >         cnt = do_list(ld);
> >
> >         for (i = 0; i < cnt; i++) {
> >                 ...
> >
> >                 if (vi->flags & GET_HIGHEST)
> >                         vi->retval = start+size;
> >
> >                 if (vi->flags & GET_VMLIST_COUNT)
> >                         vi->retval = count;
> >
> >                 if (vi->flags & VMLIST_VERIFY)
> >                         vi->retval = verified;
> >         }
> >
> > Maybe dump_vmap_area() should do an error(FATAL, ...) if cnt is -1?
> > Although that would kill all search command attempts. It's hard to say.
> >
> > Dave
> >
> >
>
> I would consider something like this, which seems to fix the
> persistent malloc/free mismatch errors and still give some results:
>
> crash> kmem -v
>
> kmem: invalid list entry: 0
> crash>
> crash> kmem -v
>
> kmem: invalid list entry: 0
> crash> search 0xdeadbeef
>
> search: invalid list entry: 0
>
> search: invalid list entry: 0
>
> search: invalid list entry: 0
> ffff88078d2cda80: deadbeef
> crash>
> crash> kmem -v
>
> kmem: invalid list entry: 0
> crash> quit
>
>
> $ git diff memory.c
> diff --git a/memory.c b/memory.c
> index 8cdab06..7161d9d 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -8722,6 +8722,11 @@ dump_vmap_area(struct meminfo *vi)
>  	ld->list_head_offset = OFFSET(vmap_area_list);
>  	ld->end = symbol_value("vmap_area_list");
>  	cnt = do_list(ld);
> +	if (cnt < 0) {
> +		vi->retval = 0;
> +		FREEBUF(vmap_area_buf);
> +		return;
> +	}
>  
>  	for (i = 0; i < cnt; i++) {
>  		if (!(pc->curcmd_flags & HEADER_PRINTED) && (i == 0) &&
>
> --
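The patch follows the usual early-return pattern. It checks do_list()'s result before using it as a loop bound, and it has the error path release the same buffer the normal path releases. The toy model below makes the bookkeeping visible. toy_getbuf() and toy_freebuf() only count calls and are not crash's GETBUF()/FREEBUF(), and the 64-byte size is arbitrary. An early return that skipped the free would leave the two counters unequal, which is presumably the sort of imbalance a malloc/free mismatch warning is meant to catch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for GETBUF()/FREEBUF(): they just count calls. */
static int nr_gets, nr_frees;

static void *toy_getbuf(size_t len)
{
    nr_gets++;
    return calloc(1, len);
}

static void toy_freebuf(void *buf)
{
    nr_frees++;
    free(buf);
}

/* Hypothetical stand-in for struct meminfo. */
struct toy_meminfo {
    unsigned long retval;
};

/*
 * Shape of the patched function: 'cnt' plays the role of do_list()'s
 * return value, 'highest' the value the per-entry loop would compute.
 */
static void toy_dump_vmap_area(struct toy_meminfo *vi, int cnt,
                               unsigned long highest)
{
    void *area_buf = toy_getbuf(64);

    assert(vi != NULL);
    if (cnt < 0) {
        /* Corrupted list: report "no result" and still release the buffer. */
        vi->retval = 0;
        toy_freebuf(area_buf);
        return;
    }

    vi->retval = highest;   /* stand-in for the per-entry loop */
    toy_freebuf(area_buf);
}
```

After one failing call (cnt == -1) and one successful call, nr_gets and nr_frees are both 2. The failing call leaves retval at 0, as the patch does.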
I was wondering how the search command would handle its call to
machdep->get_kvaddr_ranges() with the patch above, which would return 0
as the vmalloc address range's "end" address. But given your output
above, it apparently works around it.
Thanks,
Dave
As far as I could tell, the code properly checks for a non-zero
meminfo.retval before proceeding in all instances.
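The final remark says every caller checks that meminfo.retval is non-zero before using it. Below is a minimal sketch of that guard. It assumes 0 is the "could not determine" value the patch stores. Under that assumption, the vmalloc range is simply left out, so search still scans the other ranges rather than a bogus range ending at 0. toy_get_kvaddr_ranges(), struct toy_range and the placeholder bounds are hypothetical and do not reproduce x86_64_get_kvaddr_ranges().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical [start, end) kernel virtual address range. */
struct toy_range {
    unsigned long start, end;
};

/*
 * Hypothetical caller: 'vmalloc_end' is what a last_vmalloc_address()-style
 * lookup returned, with 0 meaning "could not determine". The vmalloc range
 * is only reported when that value is non-zero.
 */
static size_t toy_get_kvaddr_ranges(struct toy_range *out, size_t max,
                                    unsigned long vmalloc_start,
                                    unsigned long vmalloc_end)
{
    size_t n = 0;

    assert(max >= 2);

    /* Placeholder bounds standing in for an always-present range. */
    out[n].start = 0x1000;
    out[n].end = 0x2000;
    n++;

    /* Skip the vmalloc range entirely if its end address is unknown. */
    if (vmalloc_end != 0) {
        out[n].start = vmalloc_start;
        out[n].end = vmalloc_end;
        n++;
    }
    return n;
}
```

With vmalloc_end == 0x9000, the function reports two ranges. With vmalloc_end == 0, it reports only the first, which would be consistent with search still finding ffff88078d2cda80 after the patch.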