On Tue, Apr 4, 2023 at 10:51 AM Tao Liu <ltao@redhat.com> wrote:
>
> To be honest, I'm not sure if it's really worth introducing multi-threading
> only for a "search" command, as the "search" command is not commonly used,
> and the performance issue only occurs when the "search" command reads a
> large amount of memory. Let's see if Kazu has any comments about it.
>
> But the [PATCH 5/5] looks good to me, and it should be a separate patch.
>
> Thanks for your comments. The original motivation for patch 5/5 is to
> make a thread-safe version of readmem(), which failed, but patch 5/5
> itself is still good for simplifying the reading process.

It's true. Patch 5/5 can be picked up first as a separate patch, and
patches 1-4 still need to be discussed.

> As for the multithread part of patches 1~4, some CEE people at Red Hat
> are using the search cmd more often, and they also complain about other
> crash cmds, such as kmem, suffering from performance issues. They are
> still trying to give me a list of these cmds.

Could you please share more details about the search command scenario?
This may help us make a good choice. Usually crash-utility wouldn't
recommend introducing a new function only for a specific usage/scenario.

> I agree that currently the performance gains and the cost of code
> maintenance don't have a good balance if it is only for the one search
> cmd. I will see if the multi-thread code can work as a common library
> so other commands can benefit from it as well.

You are right, Tao. That is my concern. If the multi-thread code can
easily serve crash-utility startup or other commands (to improve
performance), and is easy to maintain, that would be fine with me.

Thanks.
Lianbo

> Anyway, I want to see how the maintainers think about the performance
> issues, whether multithreading is an acceptable solution, and whether
> there are any suggestions if I want to go further.
>
> Thanks,
> Tao Liu
>
> Thanks.
> Lianbo
>
>>
>> steps: 1) readmem data into pagebuf, 2) search specific values within
>> the pagebuf. A typical workflow of search is as follows:
>>
>> for addr from low to high:
>> do
>>     readmem(addr, pagebuf)
>>     search_value(value, pagebuf)
>>     addr += pagesize
>> done
>>
>> There are 2 points at which we can accelerate: 1) readmem doesn't have to
>> wait for search_value; while search_value is working, readmem can read the
>> next pagebuf at the same time. 2) higher addrs don't have to wait for
>> lower addrs; they can be processed at the same time if we carefully
>> arrange the output order.
>>
>> For point 1, we introduce zones for pagebuf, e.g. search_value can work on
>> zone 0 while readmem prepares the data for zone 1. For point 2, we
>> introduce multiple search_value threads, e.g. readmem will prepare 100
>> pages as a batch, and we will have 4 search_value threads: thread 1
>> handles pages 1~25, thread 2 handles pages 26~50, thread 3 handles pages
>> 51~75, and thread 4 handles pages 76~100.
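>>
>> As a minimal C illustration of that page-range arithmetic (BATCH and
>> NR_THREADS are just placeholder names here, not necessarily the
>> identifiers used in the patches):
>>
>>     #include <stdio.h>
>>
>>     #define BATCH      100   /* pages prepared by readmem per batch */
>>     #define NR_THREADS 4     /* search_value worker threads */
>>
>>     int main(void)
>>     {
>>         /* Each worker owns a contiguous 1/NR_THREADS slice of the batch. */
>>         for (int i = 0; i < NR_THREADS; i++) {
>>             int start = i * BATCH / NR_THREADS;        /* inclusive */
>>             int end   = (i + 1) * BATCH / NR_THREADS;  /* exclusive */
>>             printf("thread %d handles pages %d~%d\n", i + 1, start + 1, end);
>>         }
>>         return 0;
>>     }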
>>
>> A typical workflow of the multithread search implemented in this patchset
>> is as follows (thread synchronization not shown):
>>
>> pagebuf[ZONE][BATCH]
>> zone_index = buf_index = 0
>> create_thread(4, search_value)
>> for addr from low to high:
>> do
>>     if buf_index < BATCH
>>         readmem(addr, pagebuf[zone_index][buf_index++])
>>         addr += pagesize
>>     else
>>         start_thread(pagebuf[zone_index], 0/4 * BATCH, 1/4 * BATCH)
>>         start_thread(pagebuf[zone_index], 1/4 * BATCH, 2/4 * BATCH)
>>         start_thread(pagebuf[zone_index], 2/4 * BATCH, 3/4 * BATCH)
>>         start_thread(pagebuf[zone_index], 3/4 * BATCH, 4/4 * BATCH)
>>         zone_index++
>>         buf_index = 0
>>     fi
>> done
>>
>> readmem works in the main process and is not multi-threaded, because
>> readmem will not only read data from the vmcore and decompress it, but
>> also walk through page tables if a virtual address is given. It is hard
>> to reimplement it as a thread-safe version; search_value is easier to
>> make thread-safe. By carefully choosing the batch size and thread number,
>> we can maximize the concurrency.
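>>
>> To make the zone hand-off concrete, here is a minimal, self-contained
>> pthread sketch of one batch being filled by the main thread and then
>> scanned by 4 worker threads. It only illustrates the scheme described
>> above, not the patch code: the real readmem(), error handling and zone
>> rotation are left out, and all identifiers are placeholders.
>>
>>     #include <pthread.h>
>>     #include <stdint.h>
>>     #include <stdio.h>
>>     #include <string.h>
>>
>>     #define PAGESIZE   4096
>>     #define BATCH      100            /* pages per zone */
>>     #define NR_THREADS 4
>>
>>     struct work {
>>         unsigned char *pages;         /* start of the zone buffer */
>>         int first, last;              /* page range [first, last) owned by this thread */
>>         uint64_t value;               /* value to search for */
>>         long hits;
>>     };
>>
>>     /* Worker: scan this thread's slice of the zone for `value`. */
>>     static void *search_value(void *arg)
>>     {
>>         struct work *w = arg;
>>         for (int p = w->first; p < w->last; p++) {
>>             unsigned char *page = w->pages + (size_t)p * PAGESIZE;
>>             for (size_t off = 0; off + sizeof(uint64_t) <= PAGESIZE; off += sizeof(uint64_t)) {
>>                 uint64_t v;
>>                 memcpy(&v, page + off, sizeof(v));
>>                 if (v == w->value)
>>                     w->hits++;
>>             }
>>         }
>>         return NULL;
>>     }
>>
>>     int main(void)
>>     {
>>         static unsigned char zone[BATCH * PAGESIZE];
>>         pthread_t tid[NR_THREADS];
>>         struct work w[NR_THREADS];
>>
>>         /* Stand-in for readmem(): the main thread fills the zone sequentially. */
>>         for (int p = 0; p < BATCH; p++)
>>             memset(zone + (size_t)p * PAGESIZE, p & 0xff, PAGESIZE);
>>
>>         /* Hand the filled zone to the workers, each owning one page slice;
>>          * no per-page locking is needed because the slices do not overlap. */
>>         for (int i = 0; i < NR_THREADS; i++) {
>>             w[i] = (struct work){ .pages = zone,
>>                                   .first = i * BATCH / NR_THREADS,
>>                                   .last  = (i + 1) * BATCH / NR_THREADS,
>>                                   .value = 0x0101010101010101ULL };
>>             pthread_create(&tid[i], NULL, search_value, &w[i]);
>>         }
>>
>>         long total = 0;
>>         for (int i = 0; i < NR_THREADS; i++) {
>>             pthread_join(tid[i], NULL);
>>             total += w[i].hits;
>>         }
>>         printf("total hits: %ld\n", total);
>>         return 0;
>>     }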
>>
>> The last part of the patchset is replacing lseek/read with pread for
>> kcore and disk-dumped vmcore reading.
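>>
>> For reference, the lseek/read -> pread change boils down to the pattern
>> below. This is a generic sketch of the idea, not the actual diskdump.c /
>> netdump.c hunks from patch 5:
>>
>>     #include <unistd.h>
>>     #include <sys/types.h>
>>
>>     /* Before: two syscalls, and the shared file offset is mutable state. */
>>     static ssize_t read_at_lseek(int fd, void *buf, size_t len, off_t offset)
>>     {
>>         if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
>>             return -1;
>>         return read(fd, buf, len);
>>     }
>>
>>     /* After: a single syscall that does not touch the file offset, which
>>      * is also friendlier to any future concurrent readers. */
>>     static ssize_t read_at_pread(int fd, void *buf, size_t len, off_t offset)
>>     {
>>         return pread(fd, buf, len, offset);
>>     }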
>>
>> Here is the performance test result chart. Please note the vmcore and
>> kcore are tested separately on 2 different machines. crash-orig is crash
>> compiled with clean upstream code, crash-pread is the code with only the
>> pread patch applied (patch 5), and crash-multi is the code with only the
>> multithread patches applied (patches 1~4).
>>
>> ulong search:
>>
>> $ time echo "search abcd" | ./crash-orig vmcore vmlinux > /dev/null
>> $ time echo "search abcd -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null
>>
>>                45G vmcore                           64G kcore
>>                real        user        sys          real       user       sys
>> crash-orig     16m56.595s  15m57.188s  0m56.698s    1m37.982s  0m51.625s  0m46.266s
>> crash-pread    16m46.366s  15m55.790s  0m48.894s    1m9.179s   0m36.646s  0m32.368s
>> crash-multi    16m26.713s  19m8.722s   1m29.263s    1m27.661s  0m57.789s  0m54.604s
>>
>> string search:
>>
>> $ time echo "search -c abcddbca" | ./crash-orig vmcore vmlinux > /dev/null
>> $ time echo "search -c abcddbca -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null
>>
>>                45G vmcore                           64G kcore
>>                real        user        sys          real       user       sys
>> crash-orig     33m33.481s  32m38.321s  0m52.771s    8m32.034s  7m50.050s  0m41.478s
>> crash-pread    33m25.623s  32m35.019s  0m47.394s    8m4.347s   7m35.352s  0m28.479s
>> crash-multi    16m31.016s  38m27.456s  1m11.048s    5m11.725s  7m54.224s  0m44.186s
>>
>> Discussion:
>>
>> 1) Either the multithread or the pread patches alone improve the
>> performance a bit, so with both applied the performance can be better.
>>
>> 2) Multi-thread search performs much better for search-time-intensive
>> tasks, such as string search.
>>
>> Tao Liu (5):
>> Introduce multi-thread to search_virtual
>> Introduce multi-thread to search_physical
>> Introduce multi-thread to string search
>> Introduce multi-thread options to search cmd
>> Replace lseek/read into pread for kcore and vmcore reading.
>>
>>  defs.h     |    6 +
>>  diskdump.c |   11 +-
>>  help.c     |   17 +-
>>  memory.c   | 1176 +++++++++++++++++++++++++++++++++++++++++-----------
>>  netdump.c  |    5 +-
>>  task.c     |   14 +
>>  6 files changed, 969 insertions(+), 260 deletions(-)
>>
>> --
>> 2.33.1