[Crash-utility] Re: [PATCH] Fix a "Bus error" issue caused by 'crash --osrelease' or crash loading

Wednesday, 26 June 2024

Hi Lianbo,

Thanks for the patch.

On Fri, Jun 14, 2024 at 3:00 PM Lianbo Jiang <lijiang(a)redhat.com> wrote:
...

 Sometimes, in a production environment, there are still some vmcores
 that are incomplete, such as those with a partial header or corrupted
 data. When the crash tool attempts to parse such a vmcore, it may fail
 as below:

   $ ./crash --osrelease vmcore
   Bus error (core dumped)

 or

   $ crash vmlinux vmcore
   ...
   Bus error (core dumped)
   $

 gdb shows that the crash tool reads a null bitmap from the header of
 this vmcore, and memcpy() then raises a SIGBUS error, as below:

   $ gdb /home/lijiang/src/crash/crash /tmp/core.126301
   Core was generated by `./crash --osrelease /home/lijiang/src/39317/vmcore'.
   Program terminated with signal SIGBUS, Bus error.
   #0  __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
   831             LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7))
   (gdb) bt
   #0  __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
   #1  0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at diskdump.c:820
   #2  0x0000000000651cf3 in is_diskdump (file=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at diskdump.c:1042
   #3  0x0000000000502ac9 in get_osrelease (dumpfile=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at main.c:1938
   #4  0x00000000004fb2e8 in main (argc=3, argv=0x7ffc59dde3a8) at main.c:271
   (gdb) frame 1
   #1  0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at diskdump.c:820
   820                   memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2,
   (gdb) p dd->dumpable_bitmap
   $1 = 0x7f8e89800010 ""
   (gdb) p dd->bitmap
   $2 = 0x7f8e87e09000 ""
   (gdb) p dd->bitmap + bitmap_len/2
   $3 = 0x7f8e88a17000 ""
   (gdb) p *(char*)(dd->bitmap+bitmap_len/2)
   $4 = 0 '\000'
   (gdb) p bitmap_len/2
   $5 = 12640256

 Let's add a sanity check for such cases to avoid causing a SIGBUS error.

I didn't really understand why the SIGBUS error occurs. Is it because
bitmap_len/2 is too large (12M) and exceeds the dd->bitmap memory range?
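On the range question, one way to rule the bitmap range in or out is to compare it with the dump file's size before the mapping is touched. The sketch below is a hypothetical helper (range_fits() and range_within_file() are illustrative names, not crash or patch code). It assumes the bitmap is an mmap()ed view of the dump file at some file offset, which the madvise() call in the patch context suggests but this thread does not confirm. The comparison is written against the remaining bytes so that a corrupted header cannot make offset + len wrap around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: true only if [offset, offset + len) lies entirely
 * inside a file of file_size bytes.
 */
static bool range_fits(off_t file_size, off_t offset, size_t len)
{
	if (offset < 0 || file_size < 0 || offset > file_size)
		return false;
	/* Compare with the bytes left after offset so nothing can overflow. */
	return (uintmax_t)len <= (uintmax_t)(file_size - offset);
}

/* Same check against the file behind fd, using fstat(2) for its size. */
static bool range_within_file(int fd, off_t offset, size_t len)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return false;	/* size unknown: treat the range as invalid */
	return range_fits(st.st_size, offset, len);
}
```

If something like this were used, it would sit before the bitmap is mapped, taking the same descriptor, offset and bitmap_len that the mapping uses; the exact variable names in read_dump_header() are not shown in this thread.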

Why would *(char*)(dd->bitmap+bitmap_len/2) == '\0' be considered an
error? Do we expect a value like 0xff here?
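For reference on the 0xff question: a dump bitmap is normally read as one bit per page frame number (PFN). The sketch below assumes the common LSB-first layout (bit pfn & 7 of byte pfn >> 3); pfn_is_set() is a hypothetical name, not a crash function. Under that assumption, a '\0' byte only says that the eight PFNs it covers are not marked, which a valid dump can plausibly contain, so a zero byte on its own may not prove corruption.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, assuming one bit per PFN with the least
 * significant bit first in each byte: is pfn marked in bitmap?
 */
static bool pfn_is_set(const unsigned char *bitmap, unsigned long pfn)
{
	return (bitmap[pfn >> 3] & (1u << (pfn & 7))) != 0;
}
```

With this layout, a bitmap whose first byte is 0x00 simply excludes PFNs 0 to 7, and the next byte decides PFNs 8 to 15 independently.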

I guess what we want here is to handle SIGBUS errors gracefully, right?
Why don't we add a SIGBUS handler and handle it directly?
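The SIGBUS-handler idea could look roughly like the sketch below, which is hypothetical and not crash's code: install a handler only for the duration of the copy, and have it siglongjmp() back so the caller gets an error instead of a crash. The demo also shows one well-known way to get SIGBUS on Linux: reading a mapped page that lies entirely past the end of a file. Whether that is what happens with this vmcore is an assumption, based on the madvise() call suggesting that the bitmap is mmap()ed.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Jump target for the handler; only meaningful while guarded_copy() runs. */
static sigjmp_buf sigbus_env;

static void sigbus_handler(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);	/* abandon the faulting copy */
}

/*
 * Hypothetical helper: copy n bytes from src to dst, returning false
 * instead of letting the process die if reading src raises SIGBUS.
 * Loads go through a volatile pointer so the compiler cannot drop the
 * reads that may fault.
 */
static bool guarded_copy(void *dst, const void *src, size_t n)
{
	const volatile unsigned char *s = src;
	unsigned char *d = dst;
	struct sigaction sa, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0)
		return false;

	/* savemask = 1: jumping back also unblocks SIGBUS again. */
	if (sigsetjmp(sigbus_env, 1) != 0) {
		sigaction(SIGBUS, &old, NULL);	/* got here from the handler */
		return false;
	}
	for (size_t i = 0; i < n; i++)
		d[i] = s[i];
	sigaction(SIGBUS, &old, NULL);
	return true;
}

/*
 * Demo: map two pages of a file that is only one page long, then copy
 * from each page. Returns 1 if the first copy worked and the second hit
 * SIGBUS, 0 otherwise, -1 on setup failure.
 */
static int demo_truncated_mapping(void)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	unsigned char *map, buf[64];
	int result;

	if (page <= 0 || f == NULL)
		return -1;
	if (ftruncate(fileno(f), page) < 0) {	/* file covers one page */
		fclose(f);
		return -1;
	}
	map = mmap(NULL, 2 * (size_t)page, PROT_READ, MAP_SHARED, fileno(f), 0);
	if (map == MAP_FAILED) {
		fclose(f);
		return -1;
	}
	result = guarded_copy(buf, map, sizeof(buf)) &&		/* inside file */
		 !guarded_copy(buf, map + page, sizeof(buf));	/* past EOF */
	munmap(map, 2 * (size_t)page);
	fclose(f);
	return result;
}
```

Jumping out of a handler for a synchronous fault like this is common practice, but the handler is process-wide, so a multithreaded program would need more care, and it only reports that the read failed, not why. A range check done before the copy may be easier to reason about.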

Thanks,
Tao Liu

...
 With the patch:
   $ crash -s vmlinux vmcore
   crash: vmcore: not a supported file format
   ...
   Enter "crash -h" for details.

   $ crash --osrelease vmcore
   unknown

 Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
 Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
 ---
  diskdump.c | 6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

 diff --git a/diskdump.c b/diskdump.c
 index 1f7118cacfc6..c31eab01aa05 100644
 --- a/diskdump.c
 +++ b/diskdump.c
 @@ -814,10 +814,12 @@ restart:
                 madvise(dd->bitmap, bitmap_len, MADV_WILLNEED);
         }

 -       if (dump_is_partial(header))
 +       if (dump_is_partial(header)) {
 +               if (*(char*)(dd->bitmap + bitmap_len/2) == '\0')
 +                       goto err;
                 memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2,
                        bitmap_len/2);
 -       else
 +       } else
                 memcpy(dd->dumpable_bitmap, dd->bitmap, bitmap_len);

         dd->data_offset
 --
 2.45.1
 --
 Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
 To unsubscribe send an email to devel-leave(a)lists.crash-utility.osci.io
 https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
 Contribution Guidelines: https://github.com/crash-utility/crash/wiki 
