On Tue, Feb 14, 2023 at 8:43 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
On 2023/02/10 19:36, lijiang wrote:
>> On Fri, Feb 10, 2023 at 2:43 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com <mailto:k-hagio-ab@nec.com>> wrote:
>>
>>     On 2023/02/09 21:35, lijiang wrote:> On Tue, Feb 7, 2023 at 6:08 PM lijiang <lijiang@redhat.com <mailto:lijiang@redhat.com> <mailto:lijiang@redhat.com <mailto:lijiang@redhat.com>>> wrote:
>>     >
>>     >         more important thing is how we determine the irq_eframe_link.
>>     >
>>     >     The following patch can work on upstream kernel vmcore and RHEL9 vmcore.
>>     >     Maybe we can check the symbols asm_common_interrupt and asm_call_on_stack as below:
>>     >
>>     >     diff --git a/x86_64.c b/x86_64.c
>>     >     index 7a5d6f050c89..62036f71f632 100644
>>     >     --- a/x86_64.c
>>     >     +++ b/x86_64.c
>>     >     @@ -3938,6 +3938,11 @@ in_exception_stack:
>>     >               if (irq_eframe) {
>>     >                       bt->flags |= BT_EXCEPTION_FRAME;
>>     >                       i = (irq_eframe - bt->stackbase)/sizeof(ulong);
>>     >     + if (symbol_exists("asm_common_interrupt")) {
>>     >     + i -= 1;
>>     >     + up = (ulong *)(&bt->stackbuf[i*sizeof(ulong)]);
>>     >     + bt->instptr = *up;
>>     >     + }
>>     >                       x86_64_print_stack_entry(bt, ofp, level, i, bt->instptr);
>>     >                       bt->flags &= ~(ulonglong)BT_EXCEPTION_FRAME;
>>     >                       cs = x86_64_exception_frame(EFRAME_PRINT|EFRAME_CS, 0,
>>     >     @@ -6521,6 +6526,16 @@ x86_64_irq_eframe_link_init(void)
>>     >        else
>>     >        return;
>>     >
>>     >     + if (symbol_exists("asm_common_interrupt") && !symbol_exists("asm_call_on_stack")) {
>>     >     + machdep->machspec->irq_eframe_link =-32;
>>     >     + return;
>>     >     + }
>>     >     +
>>     >     + if (symbol_exists("asm_common_interrupt") && symbol_exists("asm_call_on_stack")) {
>>     >     + machdep->machspec->irq_eframe_link =-56;
>>     >     + return;
>>     >     + }
>>     >     +
>>     >        if (THIS_KERNEL_VERSION < LINUX(2,6,9))
>>     >        return;
>>     >
>>     >
>>     > Do you have any other comments about the above changes? Or still looking for a better solution, any thoughts? Kazu.
>>
>>     Thank you for the information.  I've looked for a way to follow
>>     kernel changes, but it would be hard with the current unwinder.
>>
>>     So for now, let's go with this?
>>
>>
>> Agree. I will post the v2 later with your signature together.
>>
>>     +       if (symbol_exists("asm_common_interrupt")) {
>>     +               if (symbol_exists("asm_call_on_stack"))
>>     +                       machdep->machspec->irq_eframe_link = -64;
>>     +               else
>>     +                       machdep->machspec->irq_eframe_link = -32;
>>     +               return;
>>     +       }
>>
>>     Probably the actual irq_eframe_link for your kernel is -64, I
>>     think it's adjusted in x86_64_irq_eframe_link().  Please check
>>
>>
>> You are right. I confirm that the irq_eframe_link will be adjusted in the x86_64_irq_eframe_link().
>>
>> But it should be better to set the irq_eframe_link to -64 in the x86_64_irq_eframe_link_init(), it doesn't need to be adjusted any more.
>>
>>     "help -m".  e.g. a 5.8 vmcore on hand:
>>
>>     crash-dev> bt
>>     ...
>>     crash-dev> help -m | grep irq_eframe
>>                irq_eframe_link: -64
>>
>>
>>     The following is the reason why the irq_eframe_link is -32 on
>>     RHEL9.1 (5.14.0-162.6.1.el9_1.x86_64).
>>
>>
>> Thank you for sharing the following process. Kazu.
>>
>> I get another way to calculate the value of irq eframe, the irq eframe is
>> located at the stack address where the asm_common_interrupt symbol is stored
>> plus 8.
>>
>> Maybe we can use the following formula to calculate the result, but it may need
>> to walk through the whole raw stack data. At least, we do not need to get the
>> eframe address from the ASM code. What do you think?
>>
>> BTW: Maybe we can do it in the future.

That way might be a bit better, but anyways I think they are not
very robust for kernel changes and various interrupts.

I've been working on enhancing the usage of ORC data and currently
it's to determine the frame size of functions, but probably we
will also be able to find pt_regs more exactly with the data.
Until it finishes, let's use this way.


Sounds like a good idea. Looking forward to implementing it in the next release.

Thanks.
Lianbo
 
Thanks,
Kazu