Thank you for the update, Pengfei.
I copied the patch log here and commented on it:

Subject: [PATCH]  arm64: Fix broken/incomplete gdb backtrace and unify output
 format

 This patch fixes multiple issues with 'gdb bt' on ARM64, where the backtrace
 would be interrupted, contain garbage threads, or display fragmented output.

 1. Fix Out-of-Bounds Read in Exception Frame Handling:
    In `arm64_print_exception_frame`, the code previously used `memcpy` to copy
    `sizeof(struct arm64_pt_regs)` bytes from a `struct arm64_stackframe *` source.
    Since `stackframe` is significantly smaller than `pt_regs`, this caused an
    out-of-bounds read, populating the GDB thread registers with stack garbage
    (often resulting in invalid addresses like -3/0xff...fd).
    This is fixed by manually copying only the valid registers (PC, SP, FP, etc.)
    and properly initializing the bitmap.

 2. Bridge the Gap Between IRQ and Process Stacks:
    Previously, GDB unwinding would stop at `call_on_irq_stack` because it could
    not automatically unwind through the assembly trampoline back to the process
    stack.
    Modified `arm64_switch_stack` (and the overflow variant) to "peek" one frame
    ahead (reading the saved FP/PC of the caller) before registering the new
    GDB substack. This effectively bridges the discontinuity, allowing GDB to
    show frames like `do_interrupt_handler` that were previously missing.

 3. Unify and Format GDB Output:
    Modified `gdb_interface.c` to:
    - Strip "Thread <id>" headers to present a continuous backtrace similar to
      the native `crash bt`.
    - Renumber stack frames sequentially (e.g., #0 to #30) instead of resetting
      at each stack switch.
    - Add indentation/alignment for frames where GDB omits the address (e.g.,
      inline functions) to improve readability.

 4. Prevent Invalid Thread Creation:
    Added checks to ensure a GDB substack is only created if the Program Counter
    (PC) is non-zero, preventing the display of "corrupt" or empty threads.
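
 For readers following item 1, here is a minimal sketch of the field-by-field
 copy. The structures `demo_stackframe` and `demo_pt_regs`, the bitmap bit
 positions, and the function name are hypothetical simplifications and not
 the real crash definitions. It only illustrates why copying
 `sizeof(pt_regs)` bytes from a smaller frame reads past the source, and
 what copying the valid registers looks like instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts; the real crash structures differ. */
struct demo_stackframe {
    uint64_t fp;
    uint64_t sp;
    uint64_t pc;
};

struct demo_pt_regs {
    uint64_t regs[31];   /* x0..x30; x29 is the frame pointer */
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
};

/* Bit positions in a hypothetical "register is valid" bitmap. */
enum { DEMO_REG_FP = 29, DEMO_REG_SP = 31, DEMO_REG_PC = 32 };

/*
 * Copy only the registers the stack frame actually holds and report which
 * ones are valid, instead of memcpy()ing sizeof(struct demo_pt_regs) bytes
 * from the much smaller frame (which would read past the end of *src).
 */
uint64_t demo_fill_regs(struct demo_pt_regs *dst,
                        const struct demo_stackframe *src)
{
    uint64_t valid = 0;

    assert(dst != NULL && src != NULL);
    memset(dst, 0, sizeof(*dst));      /* no stack garbage left behind */

    dst->regs[29] = src->fp;
    valid |= 1ULL << DEMO_REG_FP;
    dst->sp = src->sp;
    valid |= 1ULL << DEMO_REG_SP;
    dst->pc = src->pc;
    valid |= 1ULL << DEMO_REG_PC;

    return valid;
}
```

 Every register not present in the frame stays zero and is left out of the
 returned bitmap, so a consumer can tell "unknown" apart from a real value.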


Changes 1, 2, and 4 look fine to me.

The 3rd change involves too many modifications and brings very little benefit. Anyway, let's see if Tao has any concerns about it.

Thanks
Lianbo


    Tested on: Android 6.x ARM64

Signed-off-by: lipengfei28 <lipengfei28@xiaomi.com>

On Thu, Feb 5, 2026 at 2:16 PM 李鹏飞 <lipengfei28@xiaomi.com> wrote:

Hi Lianbo

1. Fixed Thread-Specific Register Retrieval

   * What was changed: In arm64_get_current_task_reg, the index was changed from a global index (extra_stacks_idx - 1) to a thread-specific ID (sid - 1).

   * Why it was fixed: Previously, the code always returned registers for the last added stack segment, regardless of which GDB thread was being inspected. This led to register data corruption when a task had multiple stack transitions (e.g., IRQ stack to Process stack). Now, it correctly maps each GDB thread to its corresponding register set.

 

2. Eliminated Redundant Initial Backtrace for Panic Task

   * What was changed: Added (bt->task != tt->panic_task) checks in both arm64_back_trace_cmd and arm64_back_trace_cmd_v2 before adding the initial context as a GDB substack.

   * Why it was fixed:

       * For the Crashing (Panic) Task, GDB’s main thread (Thread 0) is already initialized with the correct registers from the ELF notes/crash notes.

       * Adding it again as a substack created a "Clone" (Thread 1) with identical registers, causing the backtrace to repeat itself (e.g., showing frames #0-20 and then repeating them as #21-41).

       * By skipping this initial substack for the panic task only, we prevent the duplication while still allowing other tasks to correctly register their contexts.

 

Summary of Benefits:

   * No more duplicate frames: The gdb bt output for the crashing task will now start from frame #0 and continue linearly without repeating the entire stack.

   * Correct Register Data: Multiple stack segments (like IRQ or Exception stacks) will now show the correct register values for their respective frames because the indexing is no longer hardcoded to the "last" segment.
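
To make the indexing change concrete, here is a small standalone sketch. It is not the crash code: the table `demo_extra_stacks`, the counter `demo_extra_stacks_idx`, the `demo_*` functions, and the addresses are all made up for illustration, and it assumes segment IDs are 1-based with Thread 0 reserved for the main thread. It contrasts the old "last added segment" lookup with the per-thread lookup, and also rejects a zero PC as in item 4 of the patch log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_EXTRA_STACKS 4

/* Hypothetical, simplified register set for one stack segment. */
struct demo_regs {
    uint64_t pc;
    uint64_t sp;
};

struct demo_regs demo_extra_stacks[DEMO_MAX_EXTRA_STACKS];
int demo_extra_stacks_idx;   /* number of segments registered so far */

/*
 * Register a stack segment and return its 1-based segment ID.
 * Returns -1 if the table is full or the PC is zero.
 */
int demo_add_substack(uint64_t pc, uint64_t sp)
{
    if (pc == 0 || demo_extra_stacks_idx >= DEMO_MAX_EXTRA_STACKS)
        return -1;
    demo_extra_stacks[demo_extra_stacks_idx].pc = pc;
    demo_extra_stacks[demo_extra_stacks_idx].sp = sp;
    return ++demo_extra_stacks_idx;
}

/* Old lookup: ignores the thread and always returns the last segment. */
const struct demo_regs *demo_regs_last_added(void)
{
    if (demo_extra_stacks_idx == 0)
        return NULL;
    return &demo_extra_stacks[demo_extra_stacks_idx - 1];
}

/* New lookup: maps a segment ID to its own register set. */
const struct demo_regs *demo_regs_for_sid(int sid)
{
    if (sid < 1 || sid > demo_extra_stacks_idx)
        return NULL;
    return &demo_extra_stacks[sid - 1];
}

/* Small self-check showing the difference between the two lookups. */
void demo_self_check(void)
{
    int irq  = demo_add_substack(0xffff800010001000ULL, 0xffff800011000000ULL);
    int proc = demo_add_substack(0xffff800010002000ULL, 0xffff800012000000ULL);

    assert(irq == 1 && proc == 2);
    assert(demo_add_substack(0, 0x1234) == -1);   /* zero PC is rejected */

    /* The old lookup returns the process-stack registers for every thread. */
    assert(demo_regs_last_added()->pc == 0xffff800010002000ULL);

    /* The new lookup returns each thread's own registers. */
    assert(demo_regs_for_sid(irq)->pc == 0xffff800010001000ULL);
    assert(demo_regs_for_sid(proc)->pc == 0xffff800010002000ULL);
}
```

With two segments registered, the old lookup hands the second segment's registers to both threads, which matches the corruption described above; the per-ID lookup keeps them separate.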

 

Could you please test using the attached patch?

Thanks

Li Pengfei

 

 

From: lijiang <lijiang@redhat.com>
Sent: February 5, 2026 10:29
To: 李鹏飞 <lipengfei28@xiaomi.com>
Cc: devel@lists.crash-utility.osci.io
Subject: [External Mail]Re: [PATCH] arm64: Fix broken/incomplete gdb backtrace and unify output format

 


On Wed, Feb 4, 2026 at 6:01 PM 李鹏飞 <lipengfei28@xiaomi.com> wrote:

Hi Lianbo

I've reviewed and tested the patch on my setup, but I wasn't able to reproduce the issue on my side. All relevant tests and scenarios passed as expected.

Since you mentioned that the problem occurs in your environment, it might be related to differences in configuration, kernel version, or other environment-specific factors. Could you share more details about your setup (kernel version, config, workloads, etc.) so we can investigate further?

There is no specific configuration; I tested it on an arm64 virtual machine. The current issue can probably also be reproduced on CentOS 10 or other distributions.

 

Also, could you send me your crash dump and the corresponding vmlinux? I can try debugging it on my side to see what might be causing the issue.

The dump file is hundreds of megabytes, so I cannot share it with you. Sorry about that.

I'm happy to help dig deeper once we have the environment specifics.

Sounds good.

 

Thanks

Lianbo 

Thanks

Li Pengfei