Thank you for the update, Pengfei.
I copied the patch log here and commented it:
Subject: [PATCH] arm64: Fix broken/incomplete gdb backtrace and unify
output format
This patch fixes multiple issues with 'gdb bt' on ARM64, where the backtrace
would be interrupted, contain garbage threads, or display fragmented output.
1. Fix Out-of-Bounds Read in Exception Frame Handling:
In `arm64_print_exception_frame`, the code previously used `memcpy` to copy
`sizeof(struct arm64_pt_regs)` bytes from a `struct arm64_stackframe *`
source. Since `stackframe` is significantly smaller than `pt_regs`, this
caused an out-of-bounds read, populating the GDB thread registers with stack
garbage (often resulting in invalid addresses like -3/0xff...fd).
This is fixed by manually copying only the valid registers (PC, SP, FP, etc.)
and properly initializing the bitmap.
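For illustration, a minimal sketch of the field-by-field copy described in item 1. The `*_sketch` types, the bit positions in the enum, and `fill_regs_from_frame` are hypothetical simplifications, not the actual crash-utility definitions:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts; the real crash-utility structs differ. */
struct stackframe_sketch {
    uint64_t fp;
    uint64_t sp;
    uint64_t pc;
};

struct pt_regs_sketch {
    uint64_t regs[31];   /* x0..x30 */
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
};

/* Assumed bit positions in the "valid registers" bitmap. */
enum { REG_FP = 29, REG_SP = 31, REG_PC = 32 };

struct thread_regs_sketch {
    struct pt_regs_sketch regs;
    uint64_t valid_bitmap;
};

/*
 * Buggy pattern (shown only as a comment):
 *     memcpy(&out->regs, frame, sizeof(struct pt_regs_sketch));
 * This reads sizeof(struct pt_regs_sketch) bytes from an object that is
 * only sizeof(struct stackframe_sketch) bytes long.
 *
 * Safer pattern: zero the destination, copy only the fields the frame
 * really has, and mark exactly those registers as valid.
 */
static void fill_regs_from_frame(struct thread_regs_sketch *out,
                                 const struct stackframe_sketch *frame)
{
    memset(out, 0, sizeof(*out));
    out->regs.regs[REG_FP] = frame->fp;
    out->regs.sp = frame->sp;
    out->regs.pc = frame->pc;
    out->valid_bitmap = (1ULL << REG_FP) | (1ULL << REG_SP) | (1ULL << REG_PC);
}
```

With this shape, any register that the frame does not carry stays zero and is not flagged as valid, so stack garbage cannot reach the GDB thread registers.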
2. Bridge the Gap Between IRQ and Process Stacks:
Previously, GDB unwinding would stop at `call_on_irq_stack` because it could
not automatically unwind through the assembly trampoline back to the process
stack.
Modified `arm64_switch_stack` (and the overflow variant) to "peek" one frame
ahead (reading the saved FP/PC of the caller) before registering the new
GDB substack. This effectively bridges the discontinuity, allowing GDB to
show frames like `do_interrupt_handler` that were previously missing.
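The "peek one frame ahead" step in item 2 could look roughly like the sketch below. It relies on the AAPCS64 frame record layout (the caller's FP at `[fp]`, the return address at `[fp + 8]`). The reader callback, `toy_mem`, and `peek_caller_frame` are hypothetical names used only for this example and do not come from the patch:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a dump memory reader; returns 1 on success. */
typedef int (*read_u64_fn)(uint64_t addr, uint64_t *val, void *ctx);

struct frame_sketch {
    uint64_t fp;
    uint64_t pc;
};

/*
 * Read the frame record that cur->fp points to and return the caller's
 * FP/PC pair. Returns 1 on success, 0 if the record cannot be used.
 */
static int peek_caller_frame(const struct frame_sketch *cur, read_u64_fn rd,
                             void *ctx, struct frame_sketch *caller)
{
    uint64_t saved_fp, saved_lr;

    if (cur->fp == 0 || (cur->fp & 0x7))
        return 0;                       /* null or misaligned frame pointer */
    if (!rd(cur->fp, &saved_fp, ctx) || !rd(cur->fp + 8, &saved_lr, ctx))
        return 0;                       /* frame record is not readable */
    if (saved_lr == 0)
        return 0;                       /* no caller to resume at */

    caller->fp = saved_fp;
    caller->pc = saved_lr;
    return 1;
}

/* Toy memory for demonstration: an array of words at a base address. */
struct toy_mem {
    uint64_t base;
    const uint64_t *words;
    size_t nwords;
};

static int toy_read_u64(uint64_t addr, uint64_t *val, void *ctx)
{
    const struct toy_mem *m = (const struct toy_mem *)ctx;
    size_t idx;

    if (addr < m->base || ((addr - m->base) & 0x7))
        return 0;
    idx = (size_t)((addr - m->base) / 8);
    if (idx >= m->nwords)
        return 0;
    *val = m->words[idx];
    return 1;
}
```

The caller's FP/PC returned here would then seed the new substack, so the unwinder starts on the process stack instead of stopping inside the trampoline.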
3. Unify and Format GDB Output:
Modified `gdb_interface.c` to:
- Strip "Thread <id>" headers to present a continuous backtrace similar to
  the native `crash bt`.
- Renumber stack frames sequentially (e.g., #0 to #30) instead of resetting
  at each stack switch.
- Add indentation/alignment for frames where GDB omits the address (e.g.,
  inline functions) to improve readability.
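As a rough sketch of the first two bullets in item 3, a per-line filter can drop the thread headers and renumber the frames with one running counter. `filter_bt_line` is a hypothetical helper, the exact header text that gdb prints is assumed to start with "Thread ", and the alignment step from the third bullet is not covered:

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/*
 * Rewrite one line of gdb backtrace output into out.
 * Returns 0 if the line should be dropped, 1 if out was filled.
 * *next_frame is a running counter shared across all stack segments.
 */
static int filter_bt_line(const char *line, int *next_frame,
                          char *out, size_t outsz)
{
    /* Assumed header shape: lines that begin with "Thread ". */
    if (strncmp(line, "Thread ", 7) == 0)
        return 0;

    /* Frame line: replace gdb's "#N" with the running frame number. */
    if (line[0] == '#' && isdigit((unsigned char)line[1])) {
        const char *rest = line + 1;

        while (isdigit((unsigned char)*rest))
            rest++;
        snprintf(out, outsz, "#%d%s", (*next_frame)++, rest);
        return 1;
    }

    /* Anything else passes through unchanged. */
    snprintf(out, outsz, "%s", line);
    return 1;
}
```

Because the counter lives outside the function, a second segment that gdb numbers from #0 again continues from where the previous segment ended.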
4. Prevent Invalid Thread Creation:
Added checks to ensure a GDB substack is only created if the Program Counter
(PC) is non-zero, preventing the display of "corrupt" or empty threads.
Changes 1, 2 and 4 look fine to me.
The 3rd change touches too much code and brings very little benefit.
Anyway, let's see if Tao has any concerns about it.
Thanks
Lianbo
Tested on: Android 6.x ARM64
Signed-off-by: lipengfei28 <lipengfei28(a)xiaomi.com>
On Thu, Feb 5, 2026 at 2:16 PM 李鹏飞 <lipengfei28(a)xiaomi.com> wrote:
Hi Lianbo
1. Fixed Thread-Specific Register Retrieval
* What was changed: In arm64_get_current_task_reg, the index was changed
  from a global pointer (extra_stacks_idx - 1) to a thread-specific ID
  (sid - 1).
* Why it was fixed: Previously, the code always returned registers for
  the last added stack segment, regardless of which GDB thread was being
  inspected. This led to register data corruption when a task had
  multiple stack transitions (e.g., IRQ stack to Process stack). Now, it
  correctly maps each GDB thread to its corresponding register set.
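A small sketch of the indexing difference described in item 1. The `*_sketch` types, `MAX_EXTRA_STACKS`, and both helper functions are hypothetical; the only parts taken from the description are the `sid - 1` and "last added" index expressions, and the mapping of thread 0 to the main thread is an assumption:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_EXTRA_STACKS 8   /* hypothetical limit for this sketch */

struct regs_sketch {
    uint64_t pc;
    uint64_t sp;
};

struct extra_stacks_sketch {
    struct regs_sketch regs[MAX_EXTRA_STACKS];
    int count;               /* plays the role of extra_stacks_idx */
};

/* Old behaviour: always the most recently added segment. */
static const struct regs_sketch *
regs_last_added(const struct extra_stacks_sketch *es)
{
    if (es->count < 1)
        return NULL;
    return &es->regs[es->count - 1];
}

/*
 * New behaviour: gdb thread id sid (1-based for substacks; 0 is assumed
 * to be the main thread) selects its own register set at index sid - 1.
 */
static const struct regs_sketch *
regs_for_thread(const struct extra_stacks_sketch *es, int sid)
{
    if (sid < 1 || sid > es->count)
        return NULL;
    return &es->regs[sid - 1];
}
```

With two registered segments, the old lookup returns the second segment for every thread, while the per-thread lookup returns the first segment for thread 1 and the second for thread 2.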
2. Eliminated Redundant Initial Backtrace for Panic Task
* What was changed: Added (bt->task != tt->panic_task) checks in both
  arm64_back_trace_cmd and arm64_back_trace_cmd_v2 before adding the
  initial context as a GDB substack.
* Why it was fixed:
  * For the Crashing (Panic) Task, GDB’s main thread (Thread 0) is
    already initialized with the correct registers from the ELF
    notes/crash notes.
  * Adding it again as a substack created a "Clone" (Thread 1) with
    identical registers, causing the backtrace to repeat itself (e.g.,
    showing frames #0-20 and then repeating them as #21-41).
  * By skipping this initial substack for the panic task only, we
    prevent the duplication while still allowing other tasks to correctly
    register their contexts.
Summary of Benefits:
* No more duplicate frames: The gdb bt output for the crashing task
  will now start from frame #0 and continue linearly without repeating the
  entire stack.
* Correct Register Data: Multiple stack segments (like IRQ or Exception
  stacks) will now show the correct register values for their respective
  frames because the indexing is no longer hardcoded to the "last"
  segment.
Could you please test using the attached patch?
Thanks
Li Pengfei
*From:* lijiang <lijiang(a)redhat.com>
*Sent:* February 5, 2026 10:29
*To:* 李鹏飞 <lipengfei28(a)xiaomi.com>
*Cc:* devel(a)lists.crash-utility.osci.io
*Subject:* [External Mail]Re: [PATCH] arm64: Fix broken/incomplete gdb
backtrace and unify output format
On Wed, Feb 4, 2026 at 6:01 PM 李鹏飞 <lipengfei28(a)xiaomi.com> wrote:
Hi Lianbo
I’ve reviewed and tested the patch on my setup, but I wasn’t able to
reproduce the issue on my side. All relevant tests and scenarios passed as
expected.
Since you mentioned that the problem occurs in your environment, it might
be related to differences in configuration, kernel version, or other
environment-specific factors. Could you share more details about your setup
(kernel version, config, workloads, etc.) so we can investigate further?
There is no specific configuration; I tested it on an arm64 virtual
machine. The issue can probably also be reproduced on CentOS 10 or other
distributions.
Also, could you send me your crash dump and the corresponding vmlinux? I
can try debugging it on my side to see what might be causing the issue.
The dump file is hundreds of megabytes, so I cannot really share it with
you. Sorry about that.
I’m happy to help dig deeper once we have the environment specifics.
Sounds good.
Thanks
Lianbo
Thanks
Li Pengfei