Re: [Crash-utility] Questions on multi-thread for crash

Friday, 10 February 2023

On Fri, Feb 10, 2023 at 12:42:36AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
...
 Hi Tao,

 On 2023/02/08 13:34, Tao Liu wrote:
 > Hello,
 > 
 > Recently I made an attempt to introduce a thread pool for crash utility, to
 > optimize the performance of crash.

 Thank you for the attempt, interesting.
 What data made you try to speed up the collection of member offsets?
 First I'm interested in which routines weigh with crash startup.

 To be honest, personally I'm fairly satisfied with the current
 crash-8 startup time :) with commit cd8954023b (thanks to Hatayama-san)
 and maybe the benefit of GDB's parallel loading, which Andrew said.

 [root@rhel91u ~]# time echo quit | crash > /dev/null

 real    0m2.621s
 user    0m2.574s
 sys     0m0.112s

 So I would like to know first whether it's likely to worth looking
 into the slow part, especially if we have such complexity of
 multi-threading or another way.

 Thanks,
 Kazu

Hi Kazu,

The startup time of crash and drgn comparison:

  $ time echo "quit()" | drgn -c /tmp/maple/vmcore -s /tmp/maple/vmlinux >
/dev/null
  drgn 0.0.22+2.g33c3e36 (using Python 3.10.0, elfutils 0.185, with libkdumpfile)
  For help, type help(drgn).
...
>> import drgn
>> from drgn import NULL, Object, cast, container_of, execscript, offsetof,
reinterpret, sizeof
>> from drgn.helpers.common import *
>> from drgn.helpers.linux import *   warning: could not get debugging
information for:
  kernel modules (could not read depmod: open: /lib/modules/5.19.0-uek+/modules.dep.bin:
  No such file or directory)

  real	0m0.222s
  user	0m1.032s
  sys	0m0.041s

  $ time echo quit | crash /tmp/maple/vmcore /tmp/maple/vmlinux > /dev/null

  real	0m8.266s
  user	0m7.853s
  sys	0m0.248s

As you see, there is a startup time difference between crash and drgn. I haven't
look into drgn's source code to see if drgn have the same amount of startup work
as crash. Then I measured the time consuming status of xx_init() functions of
crash, and gprof the time status:

  $ crash /tmp/maple/vmcore /tmp/maple/vmlinux
  gdb_session_init 		23.761000
  machdep_init POST_RELOC 	0.060000
  show_untrusted_files 		0.000000
  kdump_backup_region_init 	0.001000
  read_in_kernel_config 	1.287000
  kernel_init	 		1724.044000 <<--- a lot of symbols resolving
  machdep_init POST_GDB 	851.425000
  vm_init 			1993.168000 <<--- a lot of symbols resolving                          

  machdep_init POST_VM 		414.680000
  module_init 			287.185000                      
  help_init 			0.005000
  task_init 			459.189000                     
  vfs_init 			301.908000
  net_init 			116.945000
  dev_init 			6.494000
  machdep_init POST_INIT 	9.176000

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 14.47      1.12     1.12                             skip_ws(char const*&, char
const*&, char const*)
 14.08      2.21     1.09                             lookup_partial_symbol(objfile*,
partial_symtab*, lookup_name_info const&, int, domain_enum_tag)
 13.44      3.25     1.04                             strncmp_iw_with_mode(char const*,
char const*, unsigned long, strncmp_iw_mode, language, completion_match_for_lcd*)
  5.81      3.70     0.45                            
symbol_matches_search_name(general_symbol_info const*, lookup_name_info const&)
  4.52      4.05     0.35                             gdb::bcache::insert(void const*,
int, bool*)
  4.26      4.38     0.33                             default_symbol_name_matcher(char
const*, lookup_name_info const&, completion_match_result*)
  4.26      4.71     0.33                            
language_defn::get_symbol_name_matcher(lookup_name_info const&) const
  3.23      4.96     0.25                             symbol_matches_domain(language,
domain_enum_tag, domain_enum_tag)
  2.84      5.18     0.22        1   220.00   220.00  symval_hash_init
  2.20      5.35     0.17                            
read_attribute_value(die_reader_specs const*, attribute*, unsigned int, long, unsigned
char const*, bool*)
  2.20      5.52     0.17                             iterative_hash
  1.81      5.66     0.14                             strcmp_iw_ordered(char const*, char
const*)
  1.81      5.80     0.14                             psym_lookup_symbol(objfile*,
block_enum, char const*, domain_enum_tag)
  1.68      5.93     0.13                             dwarf2_psymtab::readin_p(objfile*)
const
  1.55      6.05     0.12   133543     0.00     0.00  symname_hash_install
...

It looks to me that crash takes a lot of time on symbol resolving, so I thought
maybe I can parallel some of those to shorten the startup time.

Thanks,
Tao Liu

...
 > 
 > One obvious point which can benefit from multi-threading is memory.c:vm_init().
 > There are hundreds of MEMBER_OFFSET_INIT() related symbol resolving functions,
 > and most of the symbols are independent from each other, by careful arrangement,
 > they can be invoked parallelly. By doing so, we can shorten the waiting time of
 > crash starting.
 > 
 > The implementation is abstracted as the following:
 > 
 > Before multi-threading:
 > 	MEMBER_OFFSET_INIT(task_struct_mm, "task_struct", "mm");
 > 	MEMBER_OFFSET_INIT(mm_struct_mmap, "mm_struct", "mmap");
 > 
 > After multi-threading:
 > 	create_threadpool(&pool, 3);
 > 	...
 > 	MEMBER_OFFSET_INIT_PARA(pool, task_struct_mm, "task_struct",
"mm");
 > 	MEMBER_OFFSET_INIT_PARA(pool, mm_struct_mmap, "mm_struct",
"mmap");
 > 	...
 > 	wait_and_destroy_threadpool(pool);
 > 
 > MEMBER_OFFSET_INIT_PARA just append the task to the work queue of thread pool
 > and continues, it's up to the pool to schedule the worker thread to do the
 > symbol resolving work.
 > 
 > However, after enable multi-threading, I noticed there are always random errors
 > from gdb. From segfault to broken stack, it seems gdb is not thread safe at
 > all...
 > 
 > For example one error listed as follows:
 > 
 > 	Thread 10 "crash" received signal SIGSEGV, Segmentation fault.
 > 	[Switching to Thread 0x7fffc4f00640 (LWP 72950)]
 > 	c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
 > 	3250   if (pstate->language ()->la_language != language_cplus
 > 	(gdb) bt
 > 	#0  c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
 > 	#1  c_yyparse () at /sources/up-crash/gdb-10.2/gdb/c-exp.c.tmp:2092
 > 	#2  0x00000000006f62d7 in c_parse (par_state=<optimized out>) at /sources/
 >              up-crash/gdb-10.2/gdb/c-exp.y:3414
 > 	#3  0x0000000000894eac in parse_exp_in_context (stringptr=0x7fffc4efeff8,
 >              pc=<optimized out>, block=<optimized out>, comma=0,
out_subexp=0x0,
 >              tracker=0x7fffc4efef10, cstate=0x0, void_context_p=0) at parse.c:1122
 > 	#4  0x00000000008951d6 in parse_exp_1 (tracker=0x0, comma=0, block=0x0,
 >              pc=0, stringptr=0x7fffc4efeff8) at parse.c:1031
 > 	#5  parse_expression (string=<optimized out>, string@entry=0x7fffc4eff140
 >              "slab_s", tracker=tracker@entry=0x0) at parse.c:1166
 > 	#6  0x000000000092039a in gdb_get_datatype (req=0x7fffc4eff720) at symtab.c:7239
 > 	#7  gdb_command_funnel_1 (req=0x7fffc4eff720) at symtab.c:7018
 > 	#8  0x00000000009206de in gdb_command_funnel (req=0x7fffc4eff720) at symtab.c:6956
 > 	#9  0x00000000005ad137 in gdb_interface (req=0x7fffc4eff720) at
gdb_interface.c:409
 > 	#10 0x00000000005fe76c in datatype_info (name=0xab9700 "slab_s",
 >              member=0xaba8d8 "list", dm=0x0) at symbols.c:5708
 > 	#11 0x0000000000517a85 in member_offset_init_slab_s_list_slab_s_list ()
 >              at memory.c:659
 > 	#12 0x000000000068168f in group_routine (args=<optimized out>) at
thpool.c:81
 > 	#13 0x00007ffff7a48b17 in start_thread () from /lib64/libc.so.6
 > 	  #14 0x00007ffff7acd6c0 in clone3 () from /lib64/libc.so.6
 > 	(gdb) p pstate
 > 	$1 = (parser_state *) 0x0
 > 
 > 	$ cat -n /sources/up-crash/gdb-10.2/gdb/c-exp.y
 > 	66  /* The state of the parser, used internally when we are parsing the
 > 	67     expression.  */
 > 	68
 > 	69  static struct parser_state *pstate = NULL;
 > 
 > pstate is a global variable and not thread safe, the value must be changed by
 > someone else...
 > 
 > Now the project has reached a dead end. Because making gdb thread safe is an
 > impossible mission to me. Is there any advice or suggestions? Thanks in advance!
 > 
 > Thanks!
 > Tao Liu
 > 
 > --
 > Crash-utility mailing list
 > Crash-utility(a)redhat.com
 > https://listman.redhat.com/mailman/listinfo/crash-utility
 > Contribution Guidelines: https://github.com/crash-utility/crash/wiki 

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [Crash-utility] Questions on multi-thread for crash