[Crash-utility] Re: [PATCH] gdb: fix the "p" command incorrectly print the value of a global variable

Thursday, 18 April 2024

Hi Lianbo,

Thank you for the fix.

I'm hesitant to merge a patch like this, which changes an initial value 
in gdb, just before we make a new release, but I found no issues as far 
as I tested.  So let's merge this, but..

On 2024/03/06 21:32, lijiang wrote:
...
 On Wed, Mar 6, 2024 at 6:30 PM Daisuke Hatayama (Fujitsu) 
 <d.hatayama(a)fujitsu.com> wrote:

     Lianbo,

     Thank you for your work.

      > Some objects format may potentially support copy relocations, but
      > currently the maybe_copied is always initialized to 0 in the symbol().
      > And the type is 'mst_file_bss', not always the 'mst_bss' or 'mst_data'
      > in the lookup_minimal_symbol_linkage(). For example:
      >
      > (gdb) p *msymbol
      > $42 = {<general_symbol_info> = {m_name = 0x349812f "test_no_static", value = {ivalue = 8, block = 0x8,
      >       bytes = 0x8 <error: Cannot access memory at address 0x8>, address = 8, common_block = 0x8, chain = 0x8}, language_specific = {
      >       obstack = 0x0, demangled_name = 0x0}, m_language = language_auto, ada_mangled = 0, section = 20}, size = 4,
      >   filename = 0x6db3440 "test_sanity.c", type = mst_file_bss, created_by_gdb = 0, target_flag_1 = 0, target_flag_2 = 0, has_size = 1,
      >   maybe_copied = 0, name_set = 1, hash_next = 0x0, demangled_hash_next = 0x0}

     The current description lacks explanation of when this issue
     occurs. Please write that the issue occurs when the corresponding
     kernel is built with CONFIG_CALL_DEPTH_TRACKING=y.

 Thank you for the comment, Hatayama.

 I should describe more background on this issue in the patch log. The 
 current issue can be easily reproduced with the following kernel commit:

 commit 80e4c1cd42fff110bfdae8fce7ac4f22465f9664 (HEAD)
 Author: Thomas Gleixner <tglx(a)linutronix.de>
 Date:   Thu Sep 15 13:11:19 2022 +0200

      x86/retbleed: Add X86_FEATURE_CALL_DEPTH

      Intel SKL CPUs fall back to other predictors when the RSB underflows. The
      only microcode mitigation is IBRS which is insanely expensive. It comes
      with performance drops of up to 30% depending on the workload.

      A way less expensive, but nevertheless horrible mitigation is to track the
      call depth in software and overeagerly fill the RSB when returns underflow
      the software counter.

      Provide a configuration symbol and a CPU misfeature bit.

      Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
      Link: https://lore.kernel.org/r/20220915111147.056176424@infradead.org

 After reverting the above commit, the current issue may disappear. 
 Originally I tried to find out how this kernel commit affects gdb, but 
 I have not found a clue so far. Later I noticed that gdb gets the 
 correct offset address of the global variable 'test_no_static', which 
 is expected behavior from the gdb perspective because of copy 
 relocations; some object files probably support copy relocations, as 
 in this case.

     It would also be good to describe the fact that the issue occurs at
     least on RHEL9 kernel.

 This is an upstream issue, I have reproduced it on the upstream kernel 
 with the above kernel commit changes.

The information that the issue occurs on RHEL9 kernel is helpful to RHEL 
users.  Why don't you add something like this?
---
The issue occurs with Linux 6.2 and later, or with kernels that have 
kernel commit 80e4c1cd42ff ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH"), 
when configured with CONFIG_CALL_DEPTH_TRACKING=y.  This includes 
RHEL9.3 and later kernels.
---
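
To check whether a given kernel matches the condition described above, its build configuration can be scanned for the option. The following is a minimal Python sketch, assuming the configuration is available as plain text; the function name is illustrative and is not part of crash or gdb.

```python
def call_depth_tracking_enabled(config_text: str) -> bool:
    """Return True if the config text sets CONFIG_CALL_DEPTH_TRACKING=y."""
    for line in config_text.splitlines():
        # Kconfig writes enabled options as NAME=y and disabled ones
        # as a comment of the form "# NAME is not set".
        if line.strip() == "CONFIG_CALL_DEPTH_TRACKING=y":
            return True
    return False


enabled_sample = "CONFIG_CALL_DEPTH_TRACKING=y\n"
disabled_sample = "# CONFIG_CALL_DEPTH_TRACKING is not set\n"

print(call_depth_tracking_enabled(enabled_sample))
print(call_depth_tracking_enabled(disabled_sample))
```

On many distributions the text to pass in can be read from a file such as /boot/config-<release>, but that location is an assumption and varies by system.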

...

      > This causes a problem that the 'p' command can not work well as
      > expected, and always gets an error:
      >
      >   crash> mod -s test_sanity /home/test_sanity.ko
      >        MODULE       NAME                         BASE           SIZE  OBJECT FILE
      >   ffffffffc1084040  test_sanity            ffffffffc1082000    16384  /home/test_sanity.ko
      >   crash> p test_no_static
      >   p: gdb request failed: p test_no_static
      >   crash>
      >
      > With the patch:
      >   crash> mod -s test_sanity /home/test_sanity.ko
      >        MODULE       NAME                         BASE           SIZE  OBJECT FILE
      >   ffffffffc1084040  test_sanity            ffffffffc1082000    16384  /home/test_sanity.ko
      >   crash> p test_no_static
      >   test_no_static = $1 = 5
      >   crash>

     It's correct that the p command doesn't work as expected, but it
     doesn't always result in an error. The issue is a failure to calculate
     the relocated address of static symbols. If the calculated address
     happens to be one that can be read successfully, the result is not a
     read error but a bogus value.

 It's true, but the bogus value is not an expected result because of an 
 incorrect address.

I guess he means that the phrase "always gets an error" alone is not 
accurate, because the command can also emit a bogus value.  So wouldn't 
"emits an error or a bogus value" be OK in this case?

I've rebased the patch, modified the commit log according to my comments 
above, and attached it.  Lianbo, is this OK?

Thanks,
Kazu

...
 That is why the maybe_copied flag is initialized to 1; as I mentioned
 above, some objfiles may potentially support copy relocations.

 Thanks.
 Lianbo

     To make this clear, I think it's better to set debug level 4 and
     have the p command output the calculated virtual address in its
     debug messages.

     For example:

          crash> sym -M | grep -E " test_no"
          ffffffffc0da7580 (B) test_no
          ffffffffc0da7584 (b) test_no_static
          crash> set debug 4
          debug: 4
          crash> p test_no
          p: per_cpu_symbol_search(test_no): NULL
          test_no = <readmem: ffffffffc0da7580, KVADDR, "gdb_readmem callback", 4, (ROE), 560d2d483400>
          <read_diskdump: addr: ffffffffc0da7580 paddr: 10b263580 cnt: 4>
          $3 = 5
          crash> p test_no_static
          p: per_cpu_symbol_search(test_no_static): NULL
          test_no_static = <readmem: ffffffffc0d9f004, KVADDR, "gdb_readmem callback", 4, (ROE), 560d2dc9b100>
          <read_diskdump: addr: ffffffffc0d9f004 paddr: 108bfc004 cnt: 4>
          $4 = -1869574000
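
As a quick check of the numbers in the debug output above, the following Python sketch computes how far the address gdb read lies from the address that "sym -M" reports for test_no_static, and shows the raw bytes behind the bogus value. The constant and function names are illustrative only and are not part of crash or gdb; a little-endian 4-byte int is assumed, as on x86_64.

```python
import struct

# Address of test_no_static as listed by "sym -M" in the session above.
EXPECTED_ADDR = 0xFFFFFFFFC0DA7584
# Address that the readmem debug message shows gdb actually read.
READ_ADDR = 0xFFFFFFFFC0D9F004

# Distance between the intended and the actual read address.
delta = EXPECTED_ADDR - READ_ADDR


def as_raw_bytes(value: int) -> bytes:
    """Pack a signed 32-bit value into its little-endian byte form."""
    return struct.pack("<i", value)


# The value printed by "p test_no_static" without the fix.
bogus = as_raw_bytes(-1869574000)

print(hex(delta))   # 0x8580
print(bogus.hex())  # 90909090
```

The read address is 0x8580 bytes below the symbol's listed address, and the bogus value is simply four 0x90 bytes viewed as an int. 0x90 is the single-byte NOP opcode on x86, which may hint that the read landed in padding rather than in the variable, though nothing in this thread confirms that.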

