An incorrect stack trace is observed with the bt command:
crash> bt 1
PID: 1 TASK: c000000003714b80 CPU: 2 COMMAND: "systemd"
#0 [c0000000037735c0] _end at c0000000037154b0 (unreliable)
#1 [c000000003773770] __switch_to at c00000000001fa9c
#2 [c0000000037737d0] __schedule at c00000000112e4ec
#3 [c0000000037738b0] schedule at c00000000112ea80
...
Frame #0 is incorrect: a function address should never exceed _end.
The cause is kernel commit cd52414d5a6c ("powerpc/64: ELFv2 use
minimal stack frames in int and switch frame sizes"), which changed the
offset of pt_regs from sp from STACK_FRAME_OVERHEAD, i.e. 112, to
STACK_SWITCH_FRAME_REGS. With CONFIG_PPC64_ELF_ABI_V1 it is still 112, but
with ABI_V2 it is 48, so nip is read from the wrong stack location when
ABI_V2 is enabled.
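The offset change described above can be sketched as follows. This is only an illustration of the arithmetic, not the code in this patch: the enum, the macro names and both helper functions are hypothetical, and the only values taken from the commit message are 112 and 48.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the commit message; the macro names are hypothetical. */
#define REGS_OFFSET_BEFORE_COMMIT  112  /* STACK_FRAME_OVERHEAD */
#define SWITCH_FRAME_REGS_ABIV1    112  /* STACK_SWITCH_FRAME_REGS, ABI v1 */
#define SWITCH_FRAME_REGS_ABIV2     48  /* STACK_SWITCH_FRAME_REGS, ABI v2 */

/* Hypothetical ABI selector, not a crash data structure. */
enum ppc64_abi { PPC64_ABIV1 = 1, PPC64_ABIV2 = 2 };

/*
 * Offset of pt_regs from sp in a switch frame.
 * has_commit is non-zero when the kernel contains commit cd52414d5a6c.
 */
uint64_t switch_frame_regs_offset(enum ppc64_abi abi, int has_commit)
{
    assert(abi == PPC64_ABIV1 || abi == PPC64_ABIV2);
    if (!has_commit)
        return REGS_OFFSET_BEFORE_COMMIT;
    return abi == PPC64_ABIV2 ? SWITCH_FRAME_REGS_ABIV2
                              : SWITCH_FRAME_REGS_ABIV1;
}

/* Address at which pt_regs (and therefore nip) should be looked up. */
uint64_t switch_frame_regs_addr(uint64_t sp, enum ppc64_abi abi, int has_commit)
{
    return sp + switch_frame_regs_offset(abi, has_commit);
}
```

Using 112 unconditionally on an ABI v2 kernel that contains the commit places pt_regs 64 bytes too high, which is consistent with the bogus address reported for frame #0.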
After the patch:
crash> bt 1
PID: 1 TASK: c000000003714b80 CPU: 2 COMMAND: "systemd"
#0 [c0000000037737d0] __schedule at c00000000112e4ec
#1 [c0000000037738b0] schedule at c00000000112ea80
...
Signed-off-by: Tao Liu <ltao(a)redhat.com>
Suggested-by: Aditya Gupta <adityag(a)linux.ibm.com>
---
v1 Discussion:
https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01181.html
v2 No discussion:
https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01170.html
v3 -> v2: Rebased on top of the latest upstream commit
Regarding the v1 discussion: we cannot run an abiv1 program on an abiv2
kernel, because abiv1 programs are typically big-endian and abiv2 programs
little-endian, and an abiv2 (ppc64le) kernel does not support big-endian
programs. An abiv1 program therefore cannot run on it, as shown here:
$ file blkid
blkid: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, Power ELF V1 ABI, version
1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0,
BuildID[sha1]=b36e8a2a5e4d27039591a35fca38fa48735f5540, stripped
$ ~/qemu-10.1.2/build/qemu-ppc64 ./blkid
/dev/mapper/root: UUID="..." TYPE="xfs"
/dev/sda3: UUID="..." TYPE="LVM2_member" PARTUUID="..."
/dev/sda2: UUID="..." TYPE="xfs" PARTUUID="..."
/dev/mapper/swap: UUID="..." TYPE="swap"
/dev/mapper/home: UUID="..." TYPE="xfs"
/dev/sda1: PARTUUID="..."
$ ./blkid
-bash: ./blkid: cannot execute binary file: Exec format error
$ uname -a
Linux 6.12.0-150.el10.ppc64le #1 SMP Fri Oct 31 06:58:14 EDT 2025 ppc64le GNU/Linux
$ file /bin/bash
/bin/bash: ELF 64-bit LSB pie executable, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2
ABI, version 1 (SYSV), dynamically linked, interpreter /lib64/ld64.so.2,
BuildID[sha1]=9ab800028ced16c5974f5b19cb6ed754178802a8, for GNU/Linux 3.10.0, stripped
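The two `file` results above differ in data encoding (MSB vs. LSB) and in ABI version (V1 vs. V2). As a rough illustration of where that information lives, the sketch below reads both from a raw ELF64 header. The helper names and the X_ prefixed macros are hypothetical and not part of crash; their values are local copies of the corresponding elf.h definitions (EI_CLASS, EI_DATA, ELFCLASS64, ELFDATA2LSB/MSB, EM_PPC64, EF_PPC64_ABI).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of elf.h values so the sketch is self-contained. */
#define X_EI_CLASS       4   /* index of the class byte in e_ident */
#define X_EI_DATA        5   /* index of the data-encoding byte in e_ident */
#define X_ELFCLASS64     2
#define X_ELFDATA2LSB    1
#define X_ELFDATA2MSB    2
#define X_EM_PPC64       21
#define X_EF_PPC64_ABI   3   /* low two bits of e_flags hold the ABI version */
#define X_E_MACHINE_OFF  18  /* offset of e_machine in Elf64_Ehdr */
#define X_E_FLAGS_OFF    48  /* offset of e_flags in Elf64_Ehdr */
#define X_EHDR_SIZE      64  /* sizeof(Elf64_Ehdr) */

/* Read an n-byte unsigned field in the file's own byte order. */
static uint32_t read_field(const unsigned char *p, int n, int lsb)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | p[lsb ? n - 1 - i : i];
    return v;
}

/*
 * Returns 1 for ELFv1, 2 for ELFv2, 0 if the ABI is unspecified, and -1 if
 * the buffer is not a 64-bit PowerPC ELF header. When is_lsb is non-NULL it
 * is set to 1 for little-endian and 0 for big-endian files.
 */
int ppc64_elf_abi(const unsigned char *hdr, size_t len, int *is_lsb)
{
    int lsb;

    assert(hdr != NULL);
    if (len < X_EHDR_SIZE || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F' || hdr[X_EI_CLASS] != X_ELFCLASS64)
        return -1;
    if (hdr[X_EI_DATA] == X_ELFDATA2LSB)
        lsb = 1;
    else if (hdr[X_EI_DATA] == X_ELFDATA2MSB)
        lsb = 0;
    else
        return -1;
    if (read_field(hdr + X_E_MACHINE_OFF, 2, lsb) != X_EM_PPC64)
        return -1;
    if (is_lsb)
        *is_lsb = lsb;
    return (int)(read_field(hdr + X_E_FLAGS_OFF, 4, lsb) & X_EF_PPC64_ABI);
}
```

Fed the first 64 bytes of each binary, a helper like this would be expected to report the same V1/MSB and V2/LSB split that `file` prints.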
The abiv1 program blkid cannot run on this machine except with the
help of qemu. So in my view, we don't need to consider the case where an
abiv2 kernel contains an abiv1 program or .ko.
Please feel free to correct me if I'm wrong. @Aditya Gupta
---
defs.h | 3 ++-
netdump.c | 14 ++++++++++----
ppc64.c | 34 +++++++++++++++++++++++++++++++---
symbols.c | 5 +++--
4 files changed, 46 insertions(+), 10 deletions(-)
The patch looks good to me. I also verified it with SLES and other vmcores
I had, with 5.14 and 6.14 kernels, in kdump-compressed and ELF formats.
Sorry for the delayed response, I have not been actively working on crash
for some time. Thanks for the patch, Tao!
Reviewed-by: Aditya Gupta <adityag(a)linux.ibm.com>
Thanks,
- Aditya G