Re: [Crash-utility] [PATCH 0/5] [RFC] Multi-thread support for search cmd

Tuesday, 4 April 2023

Hi Tao,

On Sat, 25 Mar 2023 12:12:12 +0800
Tao Liu <ltao(a)redhat.com&gt; wrote:

...
 The primary part of the patchset will introduce multithread support
for search
 cmd to improve its performance. A search operation is mainly made up with 2
 steps: 1) readmem data into pagebuf, 2) search specific values within the
 pagebuf. A typical workflow of search is as follows:

 for addr from low to high:
 do
 	readmem(addr, pagebuf)
 	search_value(value, pagebuf)
 	addr += pagesize
 done

 There are 2 points which we can accelerate: 1) readmem don't have to wait
 search_value, when search_value is working, readmem can read the next pagebuf
 at the same time. 2) higher addr don't have to wait lower addr, they can be
 processed at the same time if we carefully arrange the output order.

 For point 1, we introduce zones for pagebuf, e.g. search_value can work on
 zone 0 while readmem can prepare the data for zone 1. For point 2, we introduce
 multiple search_value in threads, e.g. readmem will prepare 100 pages as a
 batch, then we will have 4 threads of search_value, thread 0 handles page 1~25,
 thread 2 handles page 26~50 page, thread 3 handles page 51~75, thread 4 handles
 page 76~100.

 A typical workflow of multithread search implemented in this patchset is as
 follows, which removed thread synchronization:

 pagebuf[ZONE][BATCH]
 zone_index = buf_index = 0
 create_thread(4, search_value)
 for addr from low to high:
 do
 	if buf_index < BATCH
 		readmem(addr, pagebuf[zone_index][buf_index++])
 		addr += pagesize
 	else
 		start_thread(pagebuf[zone_index], 0/4 * BATCH, 1/4 * BATCH)
 		start_thread(pagebuf[zone_index], 1/4 * BATCH, 2/4 * BATCH)
 		start_thread(pagebuf[zone_index], 2/4 * BATCH, 3/4 * BATCH)
 		start_thread(pagebuf[zone_index], 3/4 * BATCH, 4/4 * BATCH)
 		zone_index++
 		buf_index = 0
 	fi
 done

 readmem works in the main process and not multi-threaded, because readmem will
 not only read data from vmcore, decompress it, but walk through page tables if
 virtual address given. It is hard to reimplement it into thread safe version,
 search_value is easier to be made thread-safe. By carefully choose batch size
 and thread num, we can maximize the concurrency.

 The last part of the patchset, is replacing lseek/read to pread for kcore and
 diskdumped vmcore.

 Here is the performance test result chart. Please note the vmcore and
 kcore are tested seperately on 2 different machines. crash-orig is the
 crash compiled with clean upstream code, crash-pread is the code with only
 pread patch applied(patch 5), crash-multi is the code with only multithread
 patches applied(patch 1~4).

 ulong search:

     $ time echo "search abcd" | ./crash-orig vmcore vmlinux > /dev/null
     $ time echo "search abcd -f 4 -n 4" | ./crash-multi vmcore vmlinux >
/dev/null

 			 45G vmcore				64G kcore
 		real        user        sys    		real       user       sys
 crash-orig	16m56.595s  15m57.188s  0m56.698s	1m37.982s  0m51.625s  0m46.266s
 crash-pread	16m46.366s  15m55.790s  0m48.894s	1m9.179s   0m36.646s  0m32.368s
 crash-multi	16m26.713s  19m8.722s   1m29.263s	1m27.661s  0m57.789s  0m54.604s

 string search:

     $ time echo "search -c abcddbca" | ./crash-orig vmcore vmlinux >
/dev/null
     $ time echo "search -c abcddbca -f 4 -n 4" | ./crash-multi vmcore vmlinux
> /dev/null

 			45G vmcore				64G kcore
 		real        user        sys    		real       user       sys
 crash-orig	33m33.481s  32m38.321s  0m52.771s	8m32.034s  7m50.050s  0m41.478s
 crash-pread	33m25.623s  32m35.019s  0m47.394s	8m4.347s   7m35.352s  0m28.479s
 crash-multi	16m31.016s  38m27.456s  1m11.048s	5m11.725s  7m54.224s  0m44.186s

 Discussion:

 1) Either multithread and pread patches can improve the performance a
    bit, so if both patches applied, the performance can be better.

 2) Multi-thread search performs much better in search time consumptive
    tasks, such as string search. 

 Tao Liu (5):
   Introduce multi-thread to search_virtual
   Introduce multi-thread to search_physical
   Introduce multi-thread to string search
   Introduce multi-thread options to search cmd
   Replace lseek/read into pread for kcore and vmcore reading.

  defs.h     |    6 +
  diskdump.c |   11 +-
  help.c     |   17 +-
  memory.c   | 1176 +++++++++++++++++++++++++++++++++++++++++-----------
  netdump.c  |    5 +-
  task.c     |   14 +
  6 files changed, 969 insertions(+), 260 deletions(-)

Great series!

I must admit that I don't get all the details of the implementation,
yet. But for what I understand I agree with Lianbo. In particular that
the last patch is independent and can be merged without needing to wait
for the rest. Furthermore it would be great if you could move the thread
handling and some basic functionality (like work queues etc.) to a
library. This will not only allow other parts of crash to reuse them
when they get multi threaded but it will also make your code a lot
easier to understand. For example look at all the local variables
needed in search_{virtual,physical}. There were already more than a
human brain can handle and you had to add quite a bunch more.

Having that said, I've noticed that you had to make the same changes
over and over again. For me that's usually a sign that multiple
functions are doing the same and should be merged. For example take
search_chars and search_chars_p. The body of functions are
identical. The only difference between them is the return type. Similar
the other search_{ulong,uint,ushort}{_p} functions or search_virtual
and search_physical. This is an old problem of the code but when you
need to touch it anyway it's a great opportunity for a cleanup ;-)

Finally I don't understand the interface you are proposing. Especially
the -f option seems much to complicated to me. Why can't the user
simply pass the batch size (or if you prefer the batch size per
thread)? At least for me that would be much more intuitive compared to
some 'factor' that is later used to calculate the batch size.
Furthermore I don't understand why the -c and -x options are mutual
exclusive. The way I see it there shouldn't be much of a difference
between the single and multi threaded case.

Thanks
Philipp

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [Crash-utility] [PATCH 0/5] [RFC] Multi-thread support for search cmd