Hi Andrew,
Thanks a lot for your detailed reply; it really helped me understand
gdb better.
On Thu, Feb 9, 2023 at 10:27 PM Andrew Burgess <aburgess(a)redhat.com> wrote:
Tao Liu <ltao(a)redhat.com> writes:
> CC the discussion to tools-team
>
> Hi lianbo,
>
> Thanks for the reply. Yes, I agree that filtering out the symbols which
> do not exist in the current kernel will reduce the startup work and
> shorten the waiting time. It is a different technical path; however, I
> believe multi-threading can achieve more, including symbol resolution.
> For example, in main.c, several xx_init() functions could also be run in
> parallel after careful arrangement.
>
> Hi tools team,
>
> Sorry for the interruption. I don't know if this topic is worthy of
> discussion: is gdb a thread-safe debugger, or can it be made thread-safe?
> I can provide a little more background here. We are using the crash
> utility for kernel vmcore debugging, and crash is built upon gdb (by
> patching some code into the gdb source, then compiling and linking them
> together).
Hi Tao,
I'd like to try and answer your question.
Apologies if I have misunderstood you, but, it sounds like you are
somehow combining GDB with some additional code to create a new "crash"
utility. Almost like you are trying to use GDB as a debugger library.
Yes, that is exactly the case.
If this is the case, then no, GDB is certainly not thread safe. GDB
isn't a library, and was not written as one. I guess it'll work just
fine if you only try to use a single instance of GDB within "crash", but
if you have multiple threads calling into different parts of GDB then
things are going to go wrong quickly.
OK, thanks for the confirmation! I suspected gdb might not be thread-safe
when my test results went wrong, but I didn't know if my guess was
correct, so I was at a crossroads: whether to continue or to turn back
and take a different path. Now I think it's time for me to choose a new
path.
>
> I notice gdb already supports multi-threading to some extent (in
> gdb-10.2/gdbsupport/thread-pool.cc, and "GDB may use multiple threads
> to speed up certain CPU-intensive operations, such as demangling
> symbol names", as quoted from gdb-10.2/gdb/maint.c). So it seems to
> me that at least some parts of gdb are thread safe.
GDB does make use of multiple threads to speed up some aspects of the
DWARF loading and symbol table building. But this is really a very
limited corner of GDB (though definitely something that is time
consuming, so it benefits from parallelism). The core of GDB, the bit
that uses the symbol table to provide a debugger to the user, is all
single threaded.
At startup, the crash utility resolves hundreds of kernel symbols to
determine the memory layout, module loading, kernel version, and other
information about the current vmcore and vmlinux. So far crash has
achieved this by calling some upper-level functions of gdb.
Hmm... maybe it is worth trying to have crash call the lower-level
DWARF loading functions instead...
> The problem is that I don't
> know where the thread-safe boundary is, so segfaults and broken stacks
> are easily triggered when multi-threading is enabled in the crash utility.
I really don't think what you want is going to work with GDB, so I've
not dug through to find the exact code that runs in the worker threads,
but I know it's limited to DWARF parsing and symbol table creation.
OK, thanks for the explanation!
>
> In addition, if any other modern debugger, such as lldb, provides
> better multi-threading support, then I can give it a try, but I lack
> the related information. Any suggestions and comments are welcome,
> thanks in advance!
I don't know if any other debuggers are written more as a library, so
can't help here, sorry.
No problem, thanks again for your directions and help!
Thanks,
Tao Liu
Thanks,
Andrew
>
> Thanks,
> Tao Liu
>
> On Thu, Feb 9, 2023 at 5:28 PM lijiang <lijiang(a)redhat.com> wrote:
>>
>> On Wed, Feb 8, 2023 at 4:26 PM <crash-utility-request(a)redhat.com> wrote:
>>>
>>> Message: 1
>>> Date: Wed, 8 Feb 2023 12:34:32 +0800
>>> From: Tao Liu <ltao(a)redhat.com>
>>> To: crash-utility(a)redhat.com
>>> Subject: [Crash-utility] Questions on multi-thread for crash
>>> Message-ID: <Y+MmWBi+kq+ZSDqn(a)localhost.localdomain>
>>> Content-Type: text/plain; charset=iso-8859-1
>>>
>>> Hello,
>>>
>>> Recently I made an attempt to introduce a thread pool for crash utility, to
>>> optimize the performance of crash.
>>>
>>
>> Good question, Tao.
>>
>>>
>>> One obvious point which can benefit from multi-threading is
>>> memory.c:vm_init(). There are hundreds of MEMBER_OFFSET_INIT() related
>>> symbol resolving functions, and most of the symbols are independent of
>>> each other, so with careful arrangement they can be invoked in parallel.
>>> By doing so, we can shorten the startup time of crash.
>>>
>>> The implementation is abstracted as the following:
>>>
>>> Before multi-threading:
>>>     MEMBER_OFFSET_INIT(task_struct_mm, "task_struct", "mm");
>>>     MEMBER_OFFSET_INIT(mm_struct_mmap, "mm_struct", "mmap");
>>>
>>> After multi-threading:
>>>     create_threadpool(&pool, 3);
>>>     ...
>>>     MEMBER_OFFSET_INIT_PARA(pool, task_struct_mm, "task_struct", "mm");
>>>     MEMBER_OFFSET_INIT_PARA(pool, mm_struct_mmap, "mm_struct", "mmap");
>>>     ...
>>>     wait_and_destroy_threadpool(pool);
>>>
>>> MEMBER_OFFSET_INIT_PARA just appends the task to the work queue of the
>>> thread pool and continues; it is up to the pool to schedule a worker
>>> thread to do the symbol resolving work.
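
A minimal sketch of the queue-and-workers pattern described above, using
POSIX threads. It is not the code from the crash patch: pool_create(),
pool_submit(), pool_wait_and_destroy() and fake_lookup() are hypothetical
names chosen for this illustration, and fake_lookup() only stands in for
a real symbol lookup. It assumes each task writes to its own result slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One queued unit of work (hypothetical layout). */
struct task {
    void (*fn)(void *);
    void *arg;
    struct task *next;
};

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t wake;        /* signalled on new work or shutdown */
    struct task *head, *tail;   /* FIFO work queue */
    int shutting_down;
    int nthreads;
    pthread_t *threads;
};

/* Worker loop: pop tasks until the queue is empty and shutdown is set. */
static void *worker(void *p)
{
    struct pool *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->head && !pool->shutting_down)
            pthread_cond_wait(&pool->wake, &pool->lock);
        struct task *t = pool->head;
        if (!t) {               /* drained and shutting down */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        pool->head = t->next;
        if (!pool->head)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        t->fn(t->arg);          /* run the task outside the lock */
        free(t);
    }
}

static struct pool *pool_create(int nthreads)
{
    struct pool *pool = calloc(1, sizeof(*pool));
    assert(pool);
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->wake, NULL);
    pool->nthreads = nthreads;
    pool->threads = calloc(nthreads, sizeof(pthread_t));
    assert(pool->threads);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&pool->threads[i], NULL, worker, pool);
        assert(rc == 0);
    }
    return pool;
}

/* Append a task to the queue and return immediately. */
static void pool_submit(struct pool *pool, void (*fn)(void *), void *arg)
{
    struct task *t = malloc(sizeof(*t));
    assert(t);
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->wake);
    pthread_mutex_unlock(&pool->lock);
}

/* Let the workers drain the queue, then join them and free the pool. */
static void pool_wait_and_destroy(struct pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->wake);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->wake);
    free(pool->threads);
    free(pool);
}

/* Stand-in for a symbol lookup: marks its own slot as resolved. */
static void fake_lookup(void *arg)
{
    *(int *)arg = 1;
}
```

A caller would create the pool, submit one fake_lookup() per result slot,
and read the slots only after pool_wait_and_destroy() returns. The pool
itself is safe; the trouble reported below comes from what the tasks call
into, not from the queue.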
>>>
>>> However, after enabling multi-threading, I noticed that there are always
>>> random errors from gdb, ranging from segfaults to broken stacks. It seems
>>> gdb is not thread safe at all...
>>>
>>> For example, one such error is listed below:
>>>
>>> Thread 10 "crash" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7fffc4f00640 (LWP 72950)]
>>> c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
>>> 3250      if (pstate->language ()->la_language != language_cplus
>>> (gdb) bt
>>> #0  c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
>>> #1  c_yyparse () at /sources/up-crash/gdb-10.2/gdb/c-exp.c.tmp:2092
>>> #2  0x00000000006f62d7 in c_parse (par_state=<optimized out>)
>>>     at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3414
>>> #3  0x0000000000894eac in parse_exp_in_context (stringptr=0x7fffc4efeff8,
>>>     pc=<optimized out>, block=<optimized out>, comma=0, out_subexp=0x0,
>>>     tracker=0x7fffc4efef10, cstate=0x0, void_context_p=0) at parse.c:1122
>>> #4  0x00000000008951d6 in parse_exp_1 (tracker=0x0, comma=0, block=0x0,
>>>     pc=0, stringptr=0x7fffc4efeff8) at parse.c:1031
>>> #5  parse_expression (string=<optimized out>, string@entry=0x7fffc4eff140
>>>     "slab_s", tracker=tracker@entry=0x0) at parse.c:1166
>>> #6  0x000000000092039a in gdb_get_datatype (req=0x7fffc4eff720) at symtab.c:7239
>>> #7  gdb_command_funnel_1 (req=0x7fffc4eff720) at symtab.c:7018
>>> #8  0x00000000009206de in gdb_command_funnel (req=0x7fffc4eff720) at symtab.c:6956
>>> #9  0x00000000005ad137 in gdb_interface (req=0x7fffc4eff720) at gdb_interface.c:409
>>> #10 0x00000000005fe76c in datatype_info (name=0xab9700 "slab_s",
>>>     member=0xaba8d8 "list", dm=0x0) at symbols.c:5708
>>> #11 0x0000000000517a85 in member_offset_init_slab_s_list_slab_s_list ()
>>>     at memory.c:659
>>> #12 0x000000000068168f in group_routine (args=<optimized out>) at thpool.c:81
>>> #13 0x00007ffff7a48b17 in start_thread () from /lib64/libc.so.6
>>> #14 0x00007ffff7acd6c0 in clone3 () from /lib64/libc.so.6
>>> (gdb) p pstate
>>> $1 = (parser_state *) 0x0
>>>
>>> $ cat -n /sources/up-crash/gdb-10.2/gdb/c-exp.y
>>>     66  /* The state of the parser, used internally when we are parsing the
>>>     67     expression.  */
>>>     68
>>>     69  static struct parser_state *pstate = NULL;
>>>
>>> pstate is a global variable and is not thread safe; its value must have
>>> been changed by another thread...
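
To illustrate why one file-scope pointer cannot serve several parsing
threads, here is a small sketch that gives each thread its own pointer
through C11 thread-local storage. This is not gdb code and not a proposed
gdb patch. It assumes a parser that publishes its state pointer on entry
and clears it on exit, which would be consistent with the pstate == 0x0
seen above if another thread finished its parse in the meantime.
parser_state_sketch, tls_state, parse_sketch() and run_two_parsers() are
hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a parser's per-parse state. */
struct parser_state_sketch {
    const char *input;
};

/* One pointer per thread rather than one per process. With a plain
 * static pointer, thread B clearing it on exit could leave thread A
 * dereferencing NULL in the middle of its own parse. */
static _Thread_local struct parser_state_sketch *tls_state = NULL;

/* Publish the state, "lex" through the pointer, then clear it again.
 * Returns 1 if the pointer still referred to this call's own state. */
static int parse_sketch(const char *expr)
{
    struct parser_state_sketch st = { expr };
    tls_state = &st;
    int ok = (tls_state != NULL && tls_state->input == expr);
    tls_state = NULL;
    return ok;
}

/* Thread body: parse the same expression many times, count successes. */
static void *parse_many(void *arg)
{
    const char *expr = arg;
    long good = 0;
    for (int i = 0; i < 100000; i++)
        good += parse_sketch(expr);
    return (void *)(intptr_t)good;
}

/* Run two parsing threads concurrently and return the total number of
 * parses that saw their own state. */
static long run_two_parsers(void)
{
    pthread_t a, b;
    void *ra = NULL, *rb = NULL;
    int rc = pthread_create(&a, NULL, parse_many, "slab_s");
    assert(rc == 0);
    rc = pthread_create(&b, NULL, parse_many, "task_struct");
    assert(rc == 0);
    pthread_join(a, &ra);
    pthread_join(b, &rb);
    return (long)(intptr_t)ra + (long)(intptr_t)rb;
}
```

With thread-local storage every parse sees its own state, so
run_two_parsers() counts all 200000 parses as good. This only shows the
mechanism behind one symptom. A real debugger has many more shared
globals, so changing a single variable in this way would not make it
thread safe.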
>>>
>>> Now the project has reached a dead end, because making gdb thread safe
>>> is an impossible mission for me. Is there any advice or suggestions?
>>> Thanks in advance!
>>>
>>
>> Can you try to load only some symbols when crash initializes, and load
>> and cache the rest on demand, when a crash command that needs them is
>> executed for the first time? This may have another issue, though: the
>> crash command might be slow the first time it runs.
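
The load-on-demand idea above can be sketched as a cached value with a
"not resolved yet" sentinel. This is only an illustration, not crash's
actual offset table: OFFSET_UNRESOLVED, struct lazy_offset,
lazy_offset_get() and slow_member_offset() are hypothetical names, and
the value 64 is made up. slow_member_offset() stands in for the expensive
query into gdb.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sentinel: the offset has not been looked up yet.
 * -1 is kept free to mean "looked up, but the member does not exist". */
#define OFFSET_UNRESOLVED (-2L)

/* Counts how often the expensive path runs (for demonstration only). */
static int lookups_performed = 0;

/* Stand-in for the slow symbol query. Returns -1 when not found. */
static long slow_member_offset(const char *type, const char *member)
{
    lookups_performed++;
    if (strcmp(type, "task_struct") == 0 && strcmp(member, "mm") == 0)
        return 64;      /* made-up value for the sketch */
    return -1;
}

struct lazy_offset {
    const char *type;
    const char *member;
    long value;         /* OFFSET_UNRESOLVED until first use */
};

/* Resolve on first use, then serve the cached value. */
static long lazy_offset_get(struct lazy_offset *lo)
{
    if (lo->value == OFFSET_UNRESOLVED)
        lo->value = slow_member_offset(lo->type, lo->member);
    return lo->value;
}
```

Startup then costs nothing for an entry that is never used, and the
first command that touches an entry pays for its lookup, which is the
slowdown mentioned above. The sketch is single-threaded on purpose, so it
does not need gdb to be thread safe.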
>>
>> In addition, can you also try to filter out some old and unused symbols?
>> For example:
>>
>> Some kernel symbols have been removed from the latest kernel. If the
>> current vmcore was generated by the latest kernel version, crash does not
>> need to check or search for these old kernel symbols when initializing.
>> Otherwise, it still loads them. This may save some initialization time.
>>
>> Thanks.
>> Lianbo
>>
>>
>>> Thanks!
>>> Tao Liu