On Fri, Jul 5, 2024 at 9:25 AM lijiang <lijiang(a)redhat.com> wrote:
On Fri, Jul 5, 2024 at 7:29 AM Tao Liu <ltao(a)redhat.com>
wrote:
> Hi Lianbo,
>
> On Mon, Jul 1, 2024 at 9:15 PM lijiang <lijiang(a)redhat.com> wrote:
> >
> > Hi, Tao
> > On Thu, Jun 27, 2024 at 12:08 PM Tao Liu <ltao(a)redhat.com> wrote:
> >>
> >> Hi Lianbo,
> >>
> >> Thanks for the patch.
> >>
> >> On Fri, Jun 14, 2024 at 3:00 PM Lianbo Jiang <lijiang(a)redhat.com>
> wrote:
> >> >
> >> > Sometimes, in a production environment, there are still some vmcores
> >> > that are incomplete, such as partial header or the data is corrupted.
> >> > When crash tool attempts to parse such vmcores, it may fail as below:
> >> >
> >> > $ ./crash --osrelease vmcore
> >> > Bus error (core dumped)
> >> >
> >> > or
> >> >
> >> > $ crash vmlinux vmcore
> >> > ...
> >> > Bus error (core dumped)
> >> > $
> >> >
> >> > The gdb sees that crash tool reads out a null bitmap from the header
> in
> >> > this vmcore, when executing memcpy(), emits a SIGBUS error as below:
> >> >
> >> > $ gdb /home/lijiang/src/crash/crash /tmp/core.126301
> >> > Core was generated by `./crash --osrelease
> /home/lijiang/src/39317/vmcore'.
> >> > Program terminated with signal SIGBUS, Bus error.
> >> > #0 __memcpy_evex_unaligned_erms () at
> ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
> >> > 831 LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5),
> %VMM(6), %VMM(7))
> >> > (gdb) bt
> >> > #0 __memcpy_evex_unaligned_erms () at
> ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
> >> > #1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at diskdump.c:820
> >> > #2 0x0000000000651cf3 in is_diskdump (file=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at diskdump.c:1042
> >> > #3 0x0000000000502ac9 in get_osrelease (dumpfile=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at main.c:1938
> >> > #4 0x00000000004fb2e8 in main (argc=3, argv=0x7ffc59dde3a8) at
> main.c:271
> >> > (gdb) frame 1
> >> > #1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at diskdump.c:820
> >> > 820 memcpy(dd->dumpable_bitmap, dd->bitmap +
> bitmap_len/2,
> >> > (gdb) p dd->dumpable_bitmap
> >> > $1 = 0x7f8e89800010 ""
> >> > (gdb) p dd->bitmap
> >> > $2 = 0x7f8e87e09000 ""
> >> > (gdb) p dd->bitmap + bitmap_len/2
> >> > $3 = 0x7f8e88a17000 ""
> >> > (gdb) p *(char*)(dd->bitmap+bitmap_len/2)
> >> > $4 = 0 '\000'
> >> > (gdb) p bitmap_len/2
> >> > $5 = 12640256
> >> >
> >> > Let's add a sanity check for such cases to avoid causing a SIGBUS
> error.
> >> >
> >> I didn't really understand the reason why the SIGBUS error occurs. Is
> >> it because bitmap_len/2 is too large(12M), which exceeded the
> >> dd->bitmap memory range?
> >
> >
> > I did not see any limitations on the bitmap memory range.
> >
> > Looks like an unaligned memory address was accessed, please see here:
> >
> > #0 __memcpy_evex_unaligned_erms () at
> ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
> > 831 LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5),
> %VMM(6), %VMM(7))
> >
> > Maybe the vmcore is corrupted, anyway I have never seen it before this
> issue.
> >
> >>
> >> Why *(char*)(dd->bitmap+bitmap_len/2) == '\0' will be considered
as an
> >> error, do we expect a value like "0xff" here?
> >
> >
> > No.
> > If it is a null string, no need to copy it(or it may be an exceptional
> string because of corrupted data).
> >
> I still didn't understand the check here. Assume we have a bitmap like
> the following. The above code only checks if the starting byte is 0.
>
No, as above mentioned, the *(char*)(dd->bitmap+bitmap_len/2) == '\0' is
checking if it is a null string, not a 0 value. They are different.
What about if there are 1s after the leading 0? Do these be considered
> as valid bitmap and should they be copied?
> ... 0 1 1 0 1 1 0 1 1 1 1
> ^
>
This is not a null string. The value of checking
*(char*)(dd->bitmap+bitmap_len/2) == '\0' is 'false', not
'true'.
> ... len/2
>
> I understand that you want to skip the unnecessary memcpy to bypass
>
As I mentioned in the last email, If it is a null string('\0'), no need to
copy it. Let's differentiate between 0 and '\0', they are not the same.
Thanks
Lianbo
the SIGBUS error by setting a check condition. I'm worrying if the
> check may filter out those valid bitmaps.
>
Think it over again, looks like your concern is correct, Tao.
Let's see if there is a better way to fix the current issue.
Thanks
Lianbo
> Thanks,
> Tao Liu
>
> >>
> >>
> >> I guess what we want here is to handle SIGBUS errors nicely right? why
> >> don't we add a SIGBUS handler and process it directly?
> >
> >
> > This may prevent a core dump file from being generated.
> > And users can easily report it with a core dump file, sometimes It's
> hard to reproduce for us.
> >
> > Thanks
> > Lianbo
> >
> >>
> >> Thanks,
> >> Tao Liu
> >>
> >> > With the patch:
> >> > $ crash -s vmlinux vmcore
> >> > crash: vmcore: not a supported file format
> >> > ...
> >> > Enter "crash -h" for details.
> >> >
> >> > $ crash --osrelease vmcore
> >> > unknown
> >> >
> >> > Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
> >> > Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
> >> > ---
> >> > diskdump.c | 6 ++++--
> >> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/diskdump.c b/diskdump.c
> >> > index 1f7118cacfc6..c31eab01aa05 100644
> >> > --- a/diskdump.c
> >> > +++ b/diskdump.c
> >> > @@ -814,10 +814,12 @@ restart:
> >> > madvise(dd->bitmap, bitmap_len, MADV_WILLNEED);
> >> > }
> >> >
> >> > - if (dump_is_partial(header))
> >> > + if (dump_is_partial(header)) {
> >> > + if (*(char*)(dd->bitmap + bitmap_len/2) ==
'\0')
> >> > + goto err;
> >> > memcpy(dd->dumpable_bitmap, dd->bitmap +
> bitmap_len/2,
> >> > bitmap_len/2);
> >> > - else
> >> > + } else
> >> > memcpy(dd->dumpable_bitmap, dd->bitmap,
bitmap_len);
> >> >
> >> > dd->data_offset
> >> > --
> >> > 2.45.1
> >> > --
> >> > Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
> >> > To unsubscribe send an email to
> devel-leave(a)lists.crash-utility.osci.io
> >> > https://${domain_name}/admin/lists/
> devel.lists.crash-utility.osci.io/
> >> > Contribution Guidelines:
https://github.com/crash-utility/crash/wiki
> >>
>
>