----- Original Message -----
 > On Thu, 2019-04-18 at 14:05 -0400, Dave Anderson wrote:
 > > 
 > > ----- Original Message -----
 > > > On Thu, 2019-04-18 at 11:09 -0400, Dave Anderson wrote:
 > > > > 
 > > > > ----- Original Message -----
 > > > > > On Thu, 2019-04-18 at 15:02 +0100, Pierguido Lambri wrote:
 > > > > > > Hello,
 > > > > > > 
 > > > > > > Today while I was looking into a vmcore, I suddenly got the
 > > > > > > message in $SUBJECT.
 > > > > > > It started after I did a search of the process stack pages
 > > > > > > (search -t), and for each command I ran afterwards I kept
 > > > > > > getting that message.
 > > > > > > For example:
 > > > > > > 
 > > > > > > $ retrace-server-interact 603967269 crash
 > > > > > > ...
 > > > > > > crash> search -t ffff88040a0d5280
 > > > > > > 
 > > > > > > search: invalid list entry: 0
 > > > > > > 
 > > > > > > search: invalid list entry: 0
 > > > > > > 
 > > > > > > search: invalid list entry: 0
 > > > > > > PID: 606    TASK: ffff88082d226eb0  CPU: 5   COMMAND: "xfsaild/dm-0"
 > > > > > > ffff88083ff5b948: ffff88040a0d5280
 > > > > > > ffff88083ff5b990: ffff88040a0d5280
 > > > > > > ffff88083ff5baa8: ffff88040a0d5280
 > > > > > > ffff88083ff5baf0: ffff88040a0d5280
 > > > > > > ffff88083ff5bcf0: ffff88040a0d5280
 > > > > > > ffff88083ff5bd38: ffff88040a0d5280
 > > > > > > ffff88083ff5bd98: ffff88040a0d5280
 > > > > > > 
 > > > > > > 
 > > > > > > WARNING: malloc/free mismatch (29/32)
 > > > > > > 
 > > > > > > crash> ps -m | grep UN
 > > > > > > [ 0 00:00:00.146] [UN]  PID: 1811   TASK: ffff880c17bd1fa0  CPU: 1   COMMAND: "cp"
 > > > > > > WARNING: malloc/free mismatch (29/32)
 > > > > > > 
 > > > > > > I guess this comes from a possibly corrupted vmcore (I just
 > > > > > > got it from this vmcore),
 > > > > > > but I wonder why every new command keeps returning the same
 > > > > > > message.
 > > > > > > 
 > > > > > > Thanks,
 > > > > > > 
 > > > > > > Pier
 > > > > > > 
 > > > > > 
 > > > > > FWIW, I just pulled this up after plambri pinged me.  This is
 > > > > > the backtrace that is being hit, though I've not dug in more:
 > > > > > 
 > > > > > Breakpoint 3, do_list (ld=0x7ffffffea6c0) at tools.c:3820
 > > > > > 3820                                    error(INFO, "\ninvalid list entry: 0\n");
 > > > > > (gdb) list
 > > > > > 3815                            return -1;
 > > > > > 3816                    }
 > > > > > 3817
 > > > > > 3818                    if (next == 0) {
 > > > > > 3819                            if (ld->flags & LIST_HEAD_FORMAT) {
 > > > > > 3820                                    error(INFO, "\ninvalid list entry: 0\n");
 > > > > > 3821                                    if (close_hq_on_return)
 > > > > > 3822                                            hq_close();
 > > > > > 3823                                    return -1;
 > > > > > 3824                            }
 > > > > > (gdb) bt
 > > > > > #0  do_list (ld=0x7ffffffea6c0) at tools.c:3820
 > > > > > #1  0x000000000047ec82 in dump_vmap_area (vi=0x7ffffffed0d0) at memory.c:8724
 > > > > > #2  dump_vmlist (vi=0x7ffffffed0d0) at memory.c:8590
 > > > > > #3  0x000000000047f3eb in last_vmalloc_address () at memory.c:16792
 > > > > > #4  0x0000000000515e6b in x86_64_get_kvaddr_ranges (vrp=0x7fffffffd340) at x86_64.c:8706
 > > > > > #5  0x000000000049c6ae in cmd_search () at memory.c:13988
 > > > > > #6  0x0000000000465f9c in exec_command () at main.c:879
 > > > > > #7  0x00000000004661ca in main_loop () at main.c:826
 > > > > > #8  0x00000000006b21a3 in captured_command_loop (data=<value optimized out>) at main.c:258
 > > > > > #9  0x00000000006b0a8b in catch_errors (func=0x6b2190 <captured_command_loop>, func_args=0x0, errstring=0x90c106 "", mask=6) at exceptions.c:557
 > > > > > #10 0x00000000006b3076 in captured_main (data=<value optimized out>) at main.c:1064
 > > > > > #11 0x00000000006b0a8b in catch_errors (func=0x6b22b0 <captured_main>, func_args=0x7fffffffe2e0, errstring=0x90c106 "", mask=6) at exceptions.c:557
 > > > > > #12 0x00000000006b1fa4 in gdb_main (args=<value optimized out>) at main.c:1079
 > > > > > #13 0x00000000006b1fde in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at main.c:1099
 > > > > > #14 0x0000000000467030 in main (argc=3, argv=0x7fffffffe458) at main.c:707
 > > > > 
 > > > > Hmmm, the vmap_area list is a list_head type list, so there
 > > > > should never be a NULL "next" pointer.
 > > > > 
 > > > > I'm guessing that "kmem -v" also fails?  The last vmap_area
 > > > > entry should point back to the global "vmap_area_list" list
 > > > > header, for example:
 > > > > 
 > > > >   crash> kmem -v | tail
 > > > >   ffff96e7ecaaca80  ffff96e54c89c400  ffffffffc0e54000 - ffffffffc0e5a000    24576
 > > > >   ffff96e757ffe380  ffff96e4be98f3c0  ffffffffc0e5d000 - ffffffffc0e6d000    65536
 > > > >   ffff96e467b33400  ffff96e6a3ae1a00  ffffffffc0e6d000 - ffffffffc0e73000    24576
 > > > >   ffff96e85cf4e600  ffff96e752c52b40  ffffffffc0e77000 - ffffffffc0e7c000    20480
 > > > >   ffff96e85cf4e380  ffff96e5506c6c00  ffffffffc0e7c000 - ffffffffc0e81000    20480
 > > > >   ffff96e802baa500  ffff96e5506c69c0  ffffffffc0e81000 - ffffffffc0e86000    20480
 > > > >   ffff96e802baac00  ffff96e5506c6cc0  ffffffffc0e86000 - ffffffffc0e8c000    24576
 > > > >   ffff96e574196f80  ffff96e55ffd6c80  ffffffffc0e90000 - ffffffffc0e95000    20480
 > > > >   ffff96e574196680  ffff96e55ffd6880  ffffffffc0e95000 - ffffffffc0e9a000    20480
 > > > >   ffff96e87c222800  ffff96e5496ca680  ffffffffc0e9a000 - ffffffffc0ea4000    40960
 > > > >   crash> vmap_area ffff96e87c222800
 > > > >   struct vmap_area {
 > > > >     va_start = 18446744072651120640,
 > > > >     va_end = 18446744072651161600,
 > > > >     flags = 4,
 > > > >     rb_node = {
 > > > >       __rb_parent_color = 18446628510972342169,
 > > > >       rb_right = 0x0,
 > > > >       rb_left = 0xffff96e574196698
 > > > >     },
 > > > >     list = {
 > > > >       next = 0xffffffffae69af90,
 > > > >       prev = 0xffff96e5741966b0
 > > > >     },
 > > > >     purge_list = {
 > > > >       next = 0x0,
 > > > >       prev = 0xdead000000000200
 > > > >     },
 > > > >     vm = 0xffff96e5496ca680,
 > > > >     callback_head = {
 > > > >       next = 0x0,
 > > > >       func = 0xffff96e71d51aa00
 > > > >     }
 > > > >   }
 > > > >   crash> sym 0xffffffffae69af90
 > > > >   ffffffffae69af90 (D) vmap_area_list
 > > > >   crash>
 > > > >   
 > > > > Dave
 > > > > 
 > > > > 
 > > > > 
 > > > 
 > > > Yeah kmem -v fails as well:
 > > > crash> kmem -v
 > > > 
 > > > kmem: invalid list entry: 0
 > > > WARNING: malloc/free mismatch (29/30)
 > > > crash>
 > > > 
 > > > 
 > > > There's no indication of an error when crash loads though - only
 > > > after running these commands.  Do you think this is a damaged
 > > > vmcore that is not obvious?
 > > 
 > > I don't know if it's damaged or if it's a symptom of the kernel
 > > crash.  Is the kernel crash happening while the vmlist is being
 > > modified?
 > > 
 > 
 > No, there are no active processes modifying the vmap_area_list.
 > 
 > It is a crash due to memory corruption, and there are 3rd party
 > modules.  But the crash is inside xfs and does not appear in any way
 > related to this, nor are the 3rd party modules in any backtraces.
 > 
 > 
 > > It's not obvious because it would only be seen when dump_vmlist()
 > > is called.
 > > When dump_vmlist() calls dump_vmap_area(), and do_list() returns -1
 > > back to dump_vmap_area(), it gets used as a loop-ending index, and
 > > then causes presumably bogus values to get returned:
 > > 
 > >    dump_vmap_area(struct meminfo *vi)
 > >    {
 > >    ...
 > >   
 > >           ld->end = symbol_value("vmap_area_list");
 > >           cnt = do_list(ld);
 > > 
 > >           for (i = 0; i < cnt; i++) {
 > >    ...
 > >   
 > >           if (vi->flags & GET_HIGHEST)
 > >                   vi->retval = start+size;
 > > 
 > >           if (vi->flags & GET_VMLIST_COUNT)
 > >                   vi->retval = count;
 > > 
 > >           if (vi->flags & VMLIST_VERIFY)
 > >                   vi->retval = verified;
 > >   }
 > > 
 > > Maybe dump_vmap_area() should do an error(FATAL, ...) if cnt is
 > > -1?  Although, that would kill all search command attempts.  It's
 > > hard to say.
 > > 
 > > Dave
 > > 
 > > 
 > 
 > I would maybe consider something like this which seems to fix the
 > persistent malloc/free mismatch errors and give some results?
 > 
 > crash> kmem -v
 > 
 > kmem: invalid list entry: 0
 > crash>
 > crash> kmem -v
 > 
 > kmem: invalid list entry: 0
 > crash> search 0xdeadbeef
 > 
 > search: invalid list entry: 0
 > 
 > search: invalid list entry: 0
 > 
 > search: invalid list entry: 0
 > ffff88078d2cda80: deadbeef
 > crash>
 > crash> kmem -v
 > 
 > kmem: invalid list entry: 0
 > crash> quit
 > 
 > 
 > $ git diff memory.c
 > diff --git a/memory.c b/memory.c
 > index 8cdab06..7161d9d 100644
 > --- a/memory.c
 > +++ b/memory.c
 > @@ -8722,6 +8722,11 @@ dump_vmap_area(struct meminfo *vi)
 >         ld->list_head_offset = OFFSET(vmap_area_list);
 >         ld->end = symbol_value("vmap_area_list");
 >         cnt = do_list(ld);
 > +       if (cnt < 0) {
 > +               vi->retval = 0;
 > +               FREEBUF(vmap_area_buf);
 > +               return;
 > +       }
 >  
 >         for (i = 0; i < cnt; i++) {
 >                 if (!(pc->curcmd_flags & HEADER_PRINTED) && (i == 0) &&
 > 
 > --
 
 I was wondering how the search command would handle its call to
 machdep->get_kvaddr_ranges() with the patch above -- which would
 return 0 as the vmalloc address range's "end" address.
 But given your output above, it apparently works around it.
 
 Thanks,
   Dave
 
  
As far as I could tell, the code properly checks for a non-zero
meminfo.retval before proceeding in all instances.
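
For anyone following along, here is a minimal, self-contained sketch of the pattern discussed above: a list walk that returns -1 when it hits a NULL "next" pointer, a caller that bails out with a zero result instead of using -1 as a loop bound, and a consumer that only proceeds on a non-zero result. All names here (walk_list, highest_end, range_is_usable, struct node, struct info) are hypothetical stand-ins for illustration; this is not the code in tools.c or memory.c, and the "result" computed is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the crash utility's real types. */
struct node {
    struct node *next;
};

struct info {
    unsigned long retval;   /* 0 is treated as "no usable result" */
};

/*
 * Walk a circular list whose last entry should point back at `head`.
 * Returns the number of entries, or -1 if a NULL next pointer is seen
 * (which should never happen in an intact circular list).
 */
static int walk_list(const struct node *head)
{
    int count = 0;
    const struct node *n;

    for (n = head->next; n != head; n = n->next) {
        if (n == NULL)
            return -1;
        count++;
    }
    return count;
}

/*
 * Caller in the spirit of the proposed patch: check for a negative
 * count before it is ever used as a loop bound.
 */
static void highest_end(const struct node *head, struct info *vi)
{
    int cnt = walk_list(head);

    if (cnt < 0) {
        vi->retval = 0;     /* corrupted list: report "nothing found" */
        return;
    }
    /* Placeholder computation standing in for the real result. */
    vi->retval = 0x1000UL * (unsigned long)cnt;
}

/* Consumer side: only proceed when the result is non-zero. */
static int range_is_usable(const struct info *vi)
{
    return vi->retval != 0;
}
```

With an intact list the consumer sees a non-zero value and carries on; with a NULL link somewhere in the chain it sees 0 and skips the range. Note that this sketch does not guard against a corrupted list that loops without ever returning to the head.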