On Thu, 2019-04-18 at 14:05 -0400, Dave Anderson wrote:
>
> ----- Original Message -----
> > On Thu, 2019-04-18 at 11:09 -0400, Dave Anderson wrote:
> > >
> > > ----- Original Message -----
> > > > On Thu, 2019-04-18 at 15:02 +0100, Pierguido Lambri wrote:
> > > > > Hello,
> > > > >
> > > > > Today while I was looking into a vmcore, I got suddenly the
> > > > > message
> > > > > in $SUBJECT.
> > > > > It started after I did a search into the process stack pages
> > > > > (search
> > > > > -t)
> > > > > and for each command I run afterwards I kept getting that
> > > > > message.
> > > > > For example:
> > > > >
> > > > > $ retrace-server-interact 603967269 crash
> > > > > ...
> > > > > crash> search -t ffff88040a0d5280
> > > > >
> > > > > search: invalid list entry: 0
> > > > >
> > > > > search: invalid list entry: 0
> > > > >
> > > > > search: invalid list entry: 0
> > > > > PID: 606 TASK: ffff88082d226eb0 CPU: 5 COMMAND:
> > > > > "xfsaild/dm-0"
> > > > > ffff88083ff5b948: ffff88040a0d5280
> > > > > ffff88083ff5b990: ffff88040a0d5280
> > > > > ffff88083ff5baa8: ffff88040a0d5280
> > > > > ffff88083ff5baf0: ffff88040a0d5280
> > > > > ffff88083ff5bcf0: ffff88040a0d5280
> > > > > ffff88083ff5bd38: ffff88040a0d5280
> > > > > ffff88083ff5bd98: ffff88040a0d5280
> > > > >
> > > > >
> > > > > WARNING: malloc/free mismatch (29/32)
> > > > >
> > > > > crash> ps -m | grep UN
> > > > > [ 0 00:00:00.146] [UN] PID: 1811 TASK:
> > > > > ffff880c17bd1fa0 CPU:
> > > > > 1 COMMAND: "cp"
> > > > > WARNING: malloc/free mismatch (29/32)
> > > > >
> > > > > I guess this comes from a possible corrupted vmcore (I just
> > > > > got
> > > > > it
> > > > > from this vmcore),
> > > > > but I wonder why every new command keeps returning the same
> > > > > message.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Pier
> > > > >
> > > > > --
> > > > > Crash-utility mailing list
> > > > > Crash-utility(a)redhat.com
> > > > >
https://www.redhat.com/mailman/listinfo/crash-utility
> > > >
> > > > FWIW, I just pulled this up after plambri pinged me. This is
> > > > the
> > > > backtrace that is being hit though I've not dug in more:
> > > >
> > > > Breakpoint 3, do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > 3820 error(INFO, "\ninvalid
> > > > list
> > > > entry:
> > > > 0\n");
> > > > (gdb) list
> > > > 3815 return -1;
> > > > 3816 }
> > > > 3817
> > > > 3818 if (next == 0) {
> > > > 3819 if (ld->flags &
> > > > LIST_HEAD_FORMAT) {
> > > > 3820 error(INFO, "\ninvalid
> > > > list
> > > > entry:
> > > > 0\n");
> > > > 3821 if (close_hq_on_return)
> > > > 3822 hq_close();
> > > > 3823 return -1;
> > > > 3824 }
> > > > (gdb) bt
> > > > #0 do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > #1 0x000000000047ec82 in dump_vmap_area (vi=0x7ffffffed0d0) at
> > > > memory.c:8724
> > > > #2 dump_vmlist (vi=0x7ffffffed0d0) at memory.c:8590
> > > > #3 0x000000000047f3eb in last_vmalloc_address () at
> > > > memory.c:16792
> > > > #4 0x0000000000515e6b in x86_64_get_kvaddr_ranges
> > > > (vrp=0x7fffffffd340) at x86_64.c:8706
> > > > #5 0x000000000049c6ae in cmd_search () at memory.c:13988
> > > > #6 0x0000000000465f9c in exec_command () at main.c:879
> > > > #7 0x00000000004661ca in main_loop () at main.c:826
> > > > #8 0x00000000006b21a3 in captured_command_loop (data=<value
> > > > optimized out>) t main.c:258
> > > > #9 0x00000000006b0a8b in catch_errors (func=0x6b2190
> > > > <captured_command_loop>, func_args=0x0, errstring=0x90c106
"",
> > > > mask=6) at exceptions.c:557
> > > > #10 0x00000000006b3076 in captured_main (data=<value optimized
> > > > out>) at main.c:1064
> > > > #11 0x00000000006b0a8b in catch_errors (func=0x6b22b0
> > > > <captured_main>, func_args=0x7fffffffe2e0, errstring=0x90c106
> > > > "",
> > > > mask=6) at exceptions.c:557
> > > > #12 0x00000000006b1fa4 in gdb_main (args=<value optimized
out>)
> > > > at
> > > > main.c:1079
> > > > #13 0x00000000006b1fde in gdb_main_entry (argc=<value optimized
> > > > out>, argv=<value optimized out>) at main.c:1099
> > > > #14 0x0000000000467030 in main (argc=3, argv=0x7fffffffe458) at
> > > > main.c:707
> > >
> > > Hmmm, the vmap_area list is a list_head type list, so there
> > > should
> > > never be
> > > a NULL "next" pointer.
> > >
> > > I'm guessing that "kmem -v" also fails? The last vmap_area
entry
> > > should point back to
> > > the global "vmap_area_list" list header, for example:
> > >
> > > crash> kmem -v | tail
> > > ffff96e7ecaaca80 ffff96e54c89c400 ffffffffc0e54000 -
> > > ffffffffc0e5a000 24576
> > > ffff96e757ffe380 ffff96e4be98f3c0 ffffffffc0e5d000 -
> > > ffffffffc0e6d000 65536
> > > ffff96e467b33400 ffff96e6a3ae1a00 ffffffffc0e6d000 -
> > > ffffffffc0e73000 24576
> > > ffff96e85cf4e600 ffff96e752c52b40 ffffffffc0e77000 -
> > > ffffffffc0e7c000 20480
> > > ffff96e85cf4e380 ffff96e5506c6c00 ffffffffc0e7c000 -
> > > ffffffffc0e81000 20480
> > > ffff96e802baa500 ffff96e5506c69c0 ffffffffc0e81000 -
> > > ffffffffc0e86000 20480
> > > ffff96e802baac00 ffff96e5506c6cc0 ffffffffc0e86000 -
> > > ffffffffc0e8c000 24576
> > > ffff96e574196f80 ffff96e55ffd6c80 ffffffffc0e90000 -
> > > ffffffffc0e95000 20480
> > > ffff96e574196680 ffff96e55ffd6880 ffffffffc0e95000 -
> > > ffffffffc0e9a000 20480
> > > ffff96e87c222800 ffff96e5496ca680 ffffffffc0e9a000 -
> > > ffffffffc0ea4000 40960
> > > crash> vmap_area ffff96e87c222800
> > > struct vmap_area {
> > > va_start = 18446744072651120640,
> > > va_end = 18446744072651161600,
> > > flags = 4,
> > > rb_node = {
> > > __rb_parent_color = 18446628510972342169,
> > > rb_right = 0x0,
> > > rb_left = 0xffff96e574196698
> > > },
> > > list = {
> > > next = 0xffffffffae69af90,
> > > prev = 0xffff96e5741966b0
> > > },
> > > purge_list = {
> > > next = 0x0,
> > > prev = 0xdead000000000200
> > > },
> > > vm = 0xffff96e5496ca680,
> > > callback_head = {
> > > next = 0x0,
> > > func = 0xffff96e71d51aa00
> > > }
> > > }
> > > crash> sym 0xffffffffae69af90
> > > ffffffffae69af90 (D) vmap_area_list
> > > crash>
> > >
> > > Dave
> > >
> > >
> > >
> >
> > Yeah kmem -v fails as well:
> > crash> kmem -v
> >
> > kmem: invalid list entry: 0
> > WARNING: malloc/free mismatch (29/30)
> > crash>
> >
> >
> > There's no indicating of an error when crash loads though - only
> > after
> > running these commands. Do you think this a damaged vmcore that is
> > not
> > obvious?
>
> I don't know it's damaged or if it's a symptom of the kernel
> crash. Is the
> kernel crash happening while the vmlist is being modified?
>
No there are no active processes modifying the vmap_area_list
It is crash due to memory corruption and there are 3rd party modules.
But the crash is inside xfs and does not appear in any way related to
this nor are the 3rd party modules in any backtraces.
> It's not obvious because it would only be seen when dump_vmlist() is
> called.
> When dump_vmlist() calls dump_vmap_area(), and do_list() returns -1
> back to
> dump_vmap_area(), it gets used as a loop-ending index, and then
> causes
> presumably bogus values to get returned:
>
> do_vmap_area(struct meminfo *vi)
> {
> ...
>
> ld->end = symbol_value("vmap_area_list");
> cnt = do_list(ld);
>
> for (i = 0; i < cnt; i++) {
> ...
>
> if (vi->flags & GET_HIGHEST)
> vi->retval = start+size;
>
> if (vi->flags & GET_VMLIST_COUNT)
> vi->retval = count;
>
> if (vi->flags & VMLIST_VERIFY)
> vi->retval = verified;
> }
>
> Mabye dump_vmap_area() should do a error(FATAL, ...) if cnt is
> -1? Although,
> that would kill all search command attempts. It's hard to say.
>
> Dave
>
>
I would maybe consider something like this which seems to fix the
persistent malloc/free mismatch errors and give some results?
crash> kmem -v
kmem: invalid list entry: 0
crash>
crash> kmem -v
kmem: invalid list entry: 0
crash> search 0xdeadbeef
search: invalid list entry: 0
search: invalid list entry: 0
search: invalid list entry: 0
ffff88078d2cda80: deadbeef
crash>
crash> kmem -v
kmem: invalid list entry: 0
crash> quit
$ git diff memory.c
diff --git a/memory.c b/memory.c
index 8cdab06..7161d9d 100644
--- a/memory.c
+++ b/memory.c
@@ -8722,6 +8722,11 @@ dump_vmap_area(struct meminfo *vi)
ld->list_head_offset = OFFSET(vmap_area_list);
ld->end = symbol_value("vmap_area_list");
cnt = do_list(ld);
+ if (cnt < 0) {
+ vi->retval = 0;
+ FREEBUF(vmap_area_buf);
+ return;
+ }
for (i = 0; i < cnt; i++) {
if (!(pc->curcmd_flags & HEADER_PRINTED) && (i == 0)
&&
--
I was wondering how the search command would handle its call to
machdep->get_kvaddr_ranges()
with the patch above -- which would return 0 as the vmalloc address range's
"end" address.
But given your output above, apparently it seems to work around it.
Thanks,
Dave