Hi Kazu,
On Wed, Nov 15, 2023 at 4:21 PM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab(a)nec.com> wrote:
On 2023/11/14 17:49, Tao Liu wrote:
> There is an issue that, for kernel modules loaded by mod -s/-S, "dis -rl" fails
> to display the module's code line number data after executing the "bt" command
> in crash.
>
> Without the patch:
> crash> mod -S
> crash> bt
> PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0"
> #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
> ...snip...
> #7 [ff2c9f725c39fc00] page_fault at ffffffff8ea0114e
> [exception RIP: lpfc_nlp_get+210]
> RIP: ffffffffc0f60f82 RSP: ff2c9f725c39fcb0 RFLAGS: 00010046
> RAX: 0000000000000046 RBX: ff2bd8d8ac056000 RCX: 0000000000fffffc
> RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
> RBP: ff2bd8d8ac056090 R8: 0000000000000000 R9: 0000000000000000
> R10: ff2bd90d1f8701c0 R11: 0000000000000001 R12: ff2bd93320482ae0
> R13: ff2bd93051a80524 R14: ff2bd93051a80000 R15: ff2bd9332079fc00
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
> ...snip...
> crash> dis -rl ffffffffc0f60f82
> 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
> 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp
> 0xffffffffc0f60eb6 <lpfc_nlp_get+6>: push %rbx
> 0xffffffffc0f60eb7 <lpfc_nlp_get+7>: test %rdi,%rdi
>
> With the patch:
> crash> mod -S
> crash> bt
> PID: 1500 TASK: ff2bd8b093524000 CPU: 16 COMMAND: "lpfc_worker_0"
> #0 [ff2c9f725c39f9e0] machine_kexec at ffffffff8e0686d3
> ...snip...
> #7 [ff2c9f725c39fc00] page_fault at ffffffff8ea0114e
> [exception RIP: lpfc_nlp_get+210]
> RIP: ffffffffc0f60f82 RSP: ff2c9f725c39fcb0 RFLAGS: 00010046
> RAX: 0000000000000046 RBX: ff2bd8d8ac056000 RCX: 0000000000fffffc
> RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
> RBP: ff2bd8d8ac056090 R8: 0000000000000000 R9: 0000000000000000
> R10: ff2bd90d1f8701c0 R11: 0000000000000001 R12: ff2bd93320482ae0
> R13: ff2bd93051a80524 R14: ff2bd93051a80000 R15: ff2bd9332079fc00
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #8 [ff2c9f725c39fcc0] __lpfc_sli_release_iocbq_s4 at ffffffffc0f2f425 [lpfc]
> ...snip...
> crash> dis -rl ffffffffc0f60f82
> /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6756
> 0xffffffffc0f60eb0 <lpfc_nlp_get>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
> /usr/src/debug/kernel-4.18.0-425.13.1.el8_7/linux-4.18.0-425.13.1.el8_7.x86_64/drivers/scsi/lpfc/lpfc_hbadisc.c: 6759
> 0xffffffffc0f60eb5 <lpfc_nlp_get+5>: push %rbp
>
> The root cause is that, after a kernel module has been loaded by the mod
> command, its symtables are not expanded on the gdb side. The crash commands bt
> and dis each trigger such an expansion. However, the symtable expansion differs
> between the two commands:
>
> The stack trace of "dis -rl" for symtable expanding:
>
> #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile (cu=cu@entry=0xe6a77a0) at symfile.c:2914
> #1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector (this=<optimized out>, static_block=static_block@entry=0xfbe4b60, section=1, expandable=expandable@entry=0) at buildsym.c:1072
> #2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block (this=<optimized out>, static_block=static_block@entry=0xfbe4b60, section=<optimized out>, expandable=expandable@entry=0) at buildsym.c:1106
> #3 0x000000000077e8e9 in process_full_comp_unit (pretend_language=<optimized out>, cu=0x8ee4c60) at /usr/include/c++/8/bits/unique_ptr.h:716
> #4 process_queue (per_objfile=0xc54c870) at dwarf2/read.c:9220
> #5 dw2_do_instantiate_symtab (per_cu=<optimized out>, per_objfile=0xc54c870, skip_partial=<optimized out>) at dwarf2/read.c:2448
> #6 0x000000000077ed67 in dw2_instantiate_symtab (per_cu=0xdd0a320, per_objfile=0xc54c870, skip_partial=<optimized out>) at dwarf2/read.c:2472
> #7 0x000000000077f75e in dw2_expand_all_symtabs (objfile=<optimized out>) at dwarf2/read.c:3768
> #8 0x00000000008f254d in gdb_get_line_number (req=0x7fffffffb1f0) at symtab.c:7112
> #9 0x00000000008f22af in gdb_command_funnel_1 (req=0x7fffffffb1f0) at symtab.c:7023
> #10 0x00000000008f2003 in gdb_command_funnel (req=0x7fffffffb1f0) at symtab.c:6965
> #11 0x00000000005b7f02 in gdb_interface (req=req@entry=0x7fffffffb1f0) at gdb_interface.c:409
> #12 0x00000000005f5bd8 in get_line_number (addr=18446744072651935408, buf=buf@entry=0x7fffffffd460 "", reserved=reserved@entry=0) at symbols.c:4440
> #13 0x000000000059e574 in cmd_dis () at kernel.c:2143
>
> The stack trace of "bt" for symtable expanding:
>
> #0 0x00000000008d8d9f in add_compunit_symtab_to_objfile (cu=cu@entry=0x1ad15630) at symfile.c:2914
> #1 0x00000000006d3293 in buildsym_compunit::end_symtab_with_blockvector (this=<optimized out>, static_block=static_block@entry=0x1db0be30, section=1, expandable=expandable@entry=0) at buildsym.c:1072
> #2 0x00000000006d336a in buildsym_compunit::end_symtab_from_static_block (this=<optimized out>, static_block=static_block@entry=0x1db0be30, section=<optimized out>, expandable=expandable@entry=0) at buildsym.c:1106
> #3 0x000000000077e8e9 in process_full_comp_unit (pretend_language=<optimized out>, cu=0x7465240) at /usr/include/c++/8/bits/unique_ptr.h:716
> #4 process_queue (per_objfile=0xc113810) at dwarf2/read.c:9220
> #5 dw2_do_instantiate_symtab (per_cu=<optimized out>, per_objfile=0xc113810, skip_partial=<optimized out>) at dwarf2/read.c:2448
> #6 0x000000000077ed67 in dw2_instantiate_symtab (per_cu=0xdd069d0, per_objfile=0xc113810, skip_partial=<optimized out>) at dwarf2/read.c:2472
> #7 0x000000000077f8ed in dw2_lookup_symbol (objfile=<optimized out>, block_index=STATIC_BLOCK, name=0x7fffffffc890 "cpumask_t", domain=STRUCT_DOMAIN) at dwarf2/read.c:3669
> #8 0x00000000008e6d03 in lookup_symbol_via_quick_fns (objfile=0xdd277a0, block_index=STATIC_BLOCK, name=0x7fffffffc890 "cpumask_t", domain=STRUCT_DOMAIN) at symtab.c:2392
> #9 0x00000000008e7153 in lookup_symbol_in_objfile (objfile=0xdd277a0, block_index=STATIC_BLOCK, name=0x7fffffffc890 "cpumask_t", domain=STRUCT_DOMAIN) at symtab.c:2541
> #10 0x00000000008e73c6 in lookup_symbol_global_or_static_iterator_cb (objfile=0xdd277a0, cb_data=0x7fffffffc470) at symtab.c:2615
> #11 0x00000000008b99c4 in svr4_iterate_over_objfiles_in_search_order (gdbarch=<optimized out>, cb=0x8e7342 <lookup_symbol_global_or_static_iterator_cb(objfile*, void*)>, cb_data=0x7fffffffc470, current_objfile=0x0) at solib-svr4.c:3248
> #12 0x00000000008e754e in lookup_global_or_static_symbol (name=0x7fffffffc890 "cpumask_t", block_index=STATIC_BLOCK, objfile=0x0, domain=STRUCT_DOMAIN) at symtab.c:2660
> #13 0x00000000008e75da in lookup_static_symbol (name=0x7fffffffc890 "cpumask_t", domain=STRUCT_DOMAIN) at symtab.c:2678
> #14 0x00000000008e632c in lookup_symbol_aux (name=0x7fffffffc890 "cpumask_t", match_type=symbol_name_match_type::FULL, block=0x0, domain=STRUCT_DOMAIN, language=language_c, is_a_field_of_this=0x0) at symtab.c:2122
> #15 0x00000000008e5a7a in lookup_symbol_in_language (name=0x7fffffffc890 "cpumask_t", block=0x0, domain=STRUCT_DOMAIN, lang=language_c, is_a_field_of_this=0x0) at symtab.c:1889
> #16 0x00000000008e5b30 in lookup_symbol (name=0x7fffffffc890 "cpumask_t", block=0x0, domain=STRUCT_DOMAIN, is_a_field_of_this=0x0) at symtab.c:1915
> #17 0x00000000008f2a4a in gdb_get_datatype (req=0x7fffffffc730) at symtab.c:7229
> #18 0x00000000008f22c0 in gdb_command_funnel_1 (req=0x7fffffffc730) at symtab.c:7027
> #19 0x00000000008f2003 in gdb_command_funnel (req=0x7fffffffc730) at symtab.c:6965
> #20 0x00000000005b7f02 in gdb_interface (req=req@entry=0x7fffffffc730) at gdb_interface.c:409
> #21 0x00000000005f8a9f in datatype_info (name=name@entry=0xa8454d "cpumask_t", member=member@entry=0x0, dm=dm@entry=0xfffffffffffffffc) at symbols.c:5715
> #22 0x0000000000599947 in cpu_map_size (type=<optimized out>) at kernel.c:913
> #23 0x00000000005a975d in get_cpus_online () at kernel.c:9556
> #24 0x0000000000637a8b in diskdump_get_prstatus_percpu (cpu=16) at diskdump.c:2277
> #25 0x000000000062f0e4 in get_netdump_regs_x86_64 (bt=0x7fffffffd950, ripp=0x7fffffffd130, rspp=0x7fffffffd138) at netdump.c:3471
> #26 0x000000000059fe68 in back_trace (bt=bt@entry=0x7fffffffd950) at kernel.c:3092
> #27 0x00000000005ab1cb in cmd_bt () at kernel.c:2859
>
> In the "dis -rl" stack trace, dw2_expand_all_symtabs() is called to expand
> all symtables of the objfile, "*.ko.debug" in our case. In the "bt" stack
> trace, however, not all symtables are expanded, only the subset needed to
> find a symbol via dw2_lookup_symbol(). As a result,
> objfile->compunit_symtabs, the head of a singly linked list of
> struct compunit_symtab, is not NULL but does not contain all symtables. It
> will not be reinitialized in gdb_get_line_number() by "dis -rl" because the
> !objfile_has_full_symbols(objfile) check fails, so the proper code line
> number data cannot be displayed.
>
> This patch forces all symtables of a module to be expanded during the
> mod load phase, so no matter which commands follow, objfile->compunit_symtabs
> always contains all symtables.
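
For readers following the root-cause description, here is a minimal toy model of the failure in plain C. It is only a sketch and not gdb code: every toy_* name is hypothetical, and a boolean array stands in for the objfile->compunit_symtabs linked list. It assumes, as the description above states, that the line-number path expands everything only when the objfile has no expanded symtab at all.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_CUS 3   /* hypothetical: compilation units in one module */

/* Toy stand-in for a gdb objfile: one flag per compilation unit,
   set once that unit's full symtab has been built. */
struct toy_objfile {
    bool expanded[TOY_NUM_CUS];
};

/* Models the "compunit_symtabs is not NULL" condition: true as soon as
   any unit is expanded, even if other units are still missing. */
static bool toy_has_full_symbols(const struct toy_objfile *obj)
{
    for (size_t i = 0; i < TOY_NUM_CUS; i++)
        if (obj->expanded[i])
            return true;
    return false;
}

/* Models a symbol lookup (the "bt" path): expands only one unit. */
static void toy_lookup_symbol(struct toy_objfile *obj, size_t cu)
{
    obj->expanded[cu] = true;
}

/* Models expand-all (what the patch does at module load time). */
static void toy_expand_all(struct toy_objfile *obj)
{
    for (size_t i = 0; i < TOY_NUM_CUS; i++)
        obj->expanded[i] = true;
}

/* Models the line-number request (the "dis -rl" path): expand all units
   only when nothing is expanded yet, then report whether line data for
   unit `cu` is available. */
static bool toy_get_line_number(struct toy_objfile *obj, size_t cu)
{
    if (!toy_has_full_symbols(obj))
        toy_expand_all(obj);
    return obj->expanded[cu];
}
```

In this model, calling toy_get_line_number() on a fresh object succeeds for any unit. Calling toy_lookup_symbol() for one unit first makes the request fail for every other unit, which mirrors "bt" followed by "dis -rl". Calling toy_expand_all() at load time makes the order of later calls irrelevant.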
Thank you for looking into this issue.
A question: is "mod -S -r" a workaround for it?
Yes, it works as expected with "mod -S -r". I didn't know the "-r"
parameter could trigger such symbol expansion.
I'm thinking that, if the current gdb's auto expansion is not good for
crash, maybe we can make the behavior of the "mod -r" option the default. The
option adds "-readnow" to the add-symbol-file command, and it looks the same
as your patch to me:
$ vim gdb-10.2/gdb/symfile.c
  /* We now have at least a partial symbol table.  Check to see if the
     user requested that all symbols be read on initial access via either
     the gdb startup command line or on a per symbol file basis.  Expand
     all partial symbol tables for this objfile if so.  */
  if ((flags & OBJF_READNOW))
    {
      if (should_print)
        printf_filtered (_("Expanding full symbols from %ps...\n"),
                         styled_string (file_name_style.style (), name));
      if (objfile->sf)
        objfile->sf->qf->expand_all_symtabs (objfile);
    }
Agreed, they do the same work. Thanks again for your suggestions. Do
you want me to draft the "making mod -r option default" patch now or
later?
Thanks,
Tao Liu
Thanks,
Kazu
>
> Signed-off-by: Tao Liu <ltao(a)redhat.com>
> ---
>
> PS: This patch is standalone and is not a follow-up to
> [PATCH v2] symbols: skip load .init.* sections if module was successfully initialized
>
> ---
> gdb-10.2.patch | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/gdb-10.2.patch b/gdb-10.2.patch
> index d81030d..0a9a4e1 100644
> --- a/gdb-10.2.patch
> +++ b/gdb-10.2.patch
> @@ -3187,3 +3187,14 @@ exit 0
> result = stringtab + symbol_entry->_n._n_n._n_offset;
> }
> else
> +--- gdb-10.2/gdb/symtab.c.orig
> ++++ gdb-10.2/gdb/symtab.c
> +@@ -7537,6 +7537,8 @@ gdb_add_symbol_file(struct gnu_request *req)
> + lm->loaded_objfile = objfile->separate_debug_objfile;
> + else
> + lm->loaded_objfile = objfile;
> ++ if (lm->loaded_objfile->sf)
> ++ lm->loaded_objfile->sf->qf->expand_all_symtabs(lm->loaded_objfile);
> + break;
> + }
> + }