Hi Lianbo,
Thanks for the patch.
On Fri, Jun 14, 2024 at 3:00 PM Lianbo Jiang <lijiang(a)redhat.com> wrote:
In production environments, some vmcores are still incomplete, for
example with a partial header or corrupted data. When the crash tool
attempts to parse such a vmcore, it may fail as below:
$ ./crash --osrelease vmcore
Bus error (core dumped)
or
$ crash vmlinux vmcore
...
Bus error (core dumped)
$
gdb shows that the crash tool reads a null bitmap from the header of
this vmcore, and that memcpy() then raises SIGBUS, as below:
$ gdb /home/lijiang/src/crash/crash /tmp/core.126301
Core was generated by `./crash --osrelease /home/lijiang/src/39317/vmcore'.
Program terminated with signal SIGBUS, Bus error.
#0 __memcpy_evex_unaligned_erms () at
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
831 LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7))
(gdb) bt
#0 __memcpy_evex_unaligned_erms () at
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
#1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f
"/home/lijiang/src/39317/vmcore") at diskdump.c:820
#2 0x0000000000651cf3 in is_diskdump (file=0x7ffc59ddff5f
"/home/lijiang/src/39317/vmcore") at diskdump.c:1042
#3 0x0000000000502ac9 in get_osrelease (dumpfile=0x7ffc59ddff5f
"/home/lijiang/src/39317/vmcore") at main.c:1938
#4 0x00000000004fb2e8 in main (argc=3, argv=0x7ffc59dde3a8) at main.c:271
(gdb) frame 1
#1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f
"/home/lijiang/src/39317/vmcore") at diskdump.c:820
820 memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2,
(gdb) p dd->dumpable_bitmap
$1 = 0x7f8e89800010 ""
(gdb) p dd->bitmap
$2 = 0x7f8e87e09000 ""
(gdb) p dd->bitmap + bitmap_len/2
$3 = 0x7f8e88a17000 ""
(gdb) p *(char*)(dd->bitmap+bitmap_len/2)
$4 = 0 '\000'
(gdb) p bitmap_len/2
$5 = 12640256
Let's add a sanity check for such cases to avoid causing a SIGBUS error.
I don't fully understand why the SIGBUS error occurs. Is it because
bitmap_len/2 is too large (about 12 MB), so the copy runs past the end
of the dd->bitmap memory range?
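One possible explanation, assuming dd->bitmap is a file-backed mmap()
(the madvise() call in the diff context suggests it is): if the vmcore
is truncated, touching mapped pages that lie wholly beyond the end of
the file raises SIGBUS. If that is the case, comparing the bitmap range
against the file size would catch it before memcpy(). A minimal sketch;
range_within_file() is a hypothetical helper, not an existing crash
function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of crash): returns true only if the
 * byte range [offset, offset + len) lies entirely inside the file
 * referred to by fd.
 */
static bool range_within_file(int fd, off_t offset, size_t len)
{
	struct stat st;

	if (offset < 0 || fstat(fd, &st) < 0)
		return false;
	if (offset > st.st_size)
		return false;

	/* st.st_size - offset is non-negative here, so the cast is safe. */
	return (uintmax_t)len <= (uintmax_t)(st.st_size - offset);
}
```

In read_dump_header() this could be called with the dump file
descriptor and the bitmap's file offset and length before the copy,
taking the existing err path when it returns false.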
Why is *(char*)(dd->bitmap + bitmap_len/2) == '\0' treated as an
error? Do we expect a value like 0xff here?
I guess what we want here is to handle SIGBUS errors gracefully, right?
Why don't we add a SIGBUS handler and deal with the error there directly?
Thanks,
Tao Liu
With the patch:
$ crash -s vmlinux vmcore
crash: vmcore: not a supported file format
...
Enter "crash -h" for details.
$ crash --osrelease vmcore
unknown
Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
---
diskdump.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/diskdump.c b/diskdump.c
index 1f7118cacfc6..c31eab01aa05 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -814,10 +814,12 @@ restart:
madvise(dd->bitmap, bitmap_len, MADV_WILLNEED);
}
- if (dump_is_partial(header))
+ if (dump_is_partial(header)) {
+ if (*(char*)(dd->bitmap + bitmap_len/2) == '\0')
+ goto err;
memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2,
bitmap_len/2);
- else
+ } else
memcpy(dd->dumpable_bitmap, dd->bitmap, bitmap_len);
dd->data_offset
--
2.45.1