Thank you for the patchset, Tao.
On Sat, Mar 25, 2023 at 8:00 PM <crash-utility-request(a)redhat.com> wrote:
 Date: Sat, 25 Mar 2023 12:12:12 +0800
 From: Tao Liu <ltao(a)redhat.com>
 To: crash-utility(a)redhat.com
 Subject: [Crash-utility] [PATCH 0/5] [RFC] Multi-thread support for
         search cmd
 Message-ID: <20230325041217.8184-1-ltao(a)redhat.com>
 Content-Type: text/plain; charset="US-ASCII"; x-default=true
 The primary part of the patchset will introduce multi-thread support for
 the search cmd to improve its performance.

To be honest, I'm not sure it's really worth introducing multi-threading
only for the "search" command: "search" is not commonly used, and the
performance issue only occurs when "search" reads large memory. Let's see
if Kazu has any comments about it.

But [PATCH 5/5] looks good to me, and it should be a separate patch.

Thanks.
Lianbo

 A search operation is mainly made up of 2 steps: 1) readmem data into
 pagebuf, 2) search for specific values within the pagebuf. A typical
 workflow of search is as follows:
 for addr from low to high:
 do
         readmem(addr, pagebuf)
         search_value(value, pagebuf)
         addr += pagesize
 done
 There are 2 points where we can accelerate: 1) readmem doesn't have to
 wait for search_value; while search_value is working, readmem can read
 the next pagebuf at the same time. 2) higher addresses don't have to
 wait for lower addresses; they can be processed at the same time if we
 carefully arrange the output order.
 For point 1, we introduce zones for pagebuf, e.g. search_value can work
 on zone 0 while readmem prepares the data for zone 1. For point 2, we
 introduce multiple search_value threads, e.g. readmem prepares 100 pages
 as a batch, then 4 search_value threads split the batch: thread 0 handles
 pages 1~25, thread 1 handles pages 26~50, thread 2 handles pages 51~75,
 and thread 3 handles pages 76~100.
 A typical workflow of the multi-threaded search implemented in this
 patchset is as follows (thread synchronization omitted for brevity):
 pagebuf[ZONE][BATCH]
 zone_index = buf_index = 0
 create_thread(4, search_value)
 for addr from low to high:
 do
         if buf_index < BATCH
                 readmem(addr, pagebuf[zone_index][buf_index++])
                 addr += pagesize
         else
                 start_thread(pagebuf[zone_index], 0/4 * BATCH, 1/4 * BATCH)
                 start_thread(pagebuf[zone_index], 1/4 * BATCH, 2/4 * BATCH)
                 start_thread(pagebuf[zone_index], 2/4 * BATCH, 3/4 * BATCH)
                 start_thread(pagebuf[zone_index], 3/4 * BATCH, 4/4 * BATCH)
                 zone_index++
                 buf_index = 0
         fi
 done
 readmem works in the main process and is not multi-threaded, because
 readmem will not only read data from the vmcore and decompress it, but
 also walk through page tables if a virtual address is given. It is hard
 to reimplement it in a thread-safe way, while search_value is easier to
 make thread-safe. By carefully choosing the batch size and the thread
 number, we can maximize the concurrency.
 The last part of the patchset replaces lseek/read with pread for kcore
 and diskdump vmcore reading.
 Here is the performance test result chart. Please note the vmcore and
 kcore were tested separately on 2 different machines. crash-orig is
 crash compiled from clean upstream code, crash-pread is the code with
 only the pread patch applied (patch 5), and crash-multi is the code with
 only the multi-thread patches applied (patches 1~4).
 ulong search:
     $ time echo "search abcd" | ./crash-orig vmcore vmlinux > /dev/null
     $ time echo "search abcd -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null

                 45G vmcore                          64G kcore
                 real        user        sys         real        user       sys
 crash-orig     16m56.595s  15m57.188s  0m56.698s   1m37.982s   0m51.625s  0m46.266s
 crash-pread    16m46.366s  15m55.790s  0m48.894s   1m9.179s    0m36.646s  0m32.368s
 crash-multi    16m26.713s  19m8.722s   1m29.263s   1m27.661s   0m57.789s  0m54.604s
 string search:
     $ time echo "search -c abcddbca" | ./crash-orig vmcore vmlinux > /dev/null
     $ time echo "search -c abcddbca -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null

                 45G vmcore                          64G kcore
                 real        user        sys         real        user       sys
 crash-orig     33m33.481s  32m38.321s  0m52.771s   8m32.034s   7m50.050s  0m41.478s
 crash-pread    33m25.623s  32m35.019s  0m47.394s   8m4.347s    7m35.352s  0m28.479s
 crash-multi    16m31.016s  38m27.456s  1m11.048s   5m11.725s   7m54.224s  0m44.186s
 Discussion:
 1) Either the multi-thread or the pread patches alone improve the
    performance a bit, so with both applied the performance is better still.
 2) Multi-thread search performs much better on search-time-intensive
    tasks, such as string search.
 Tao Liu (5):
   Introduce multi-thread to search_virtual
   Introduce multi-thread to search_physical
   Introduce multi-thread to string search
   Introduce multi-thread options to search cmd
   Replace lseek/read into pread for kcore and vmcore reading.
  defs.h     |    6 +
  diskdump.c |   11 +-
  help.c     |   17 +-
  memory.c   | 1176 +++++++++++++++++++++++++++++++++++++++++-----------
  netdump.c  |    5 +-
  task.c     |   14 +
  6 files changed, 969 insertions(+), 260 deletions(-)
 --
 2.33.1