[RFC PATCH v2 0/4] Improve stack unwind on ppc64
by Aditya Gupta
The Problem:
============
Currently crash cannot show function arguments or local variables the way
gdb can, and moving between frames ('up'/'down') does not work in crash.
Crash has 'gdb passthroughs' for things gdb can do, but the passthroughs
'bt', 'frame', 'info locals', 'up' and 'down' do not work either. gdb never
gets the register values, because `crash_target::fetch_registers` relies on
`machdep->get_cpu_reg`, which is not implemented for PPC64.
Proposed Solution:
==================
Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
This way, "gdb mode in crash" supports these commands for both ELF and
kdump-compressed vmcore formats, whereas plain "gdb" supports only the ELF
format.
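To make the idea concrete, here is a minimal sketch of a register-fetch
callback of the kind gdb needs. It is NOT the code from this series: the
struct layout, the register numbers and the function name are all
hypothetical, and the real "machdep->get_cpu_reg" hook takes different
arguments. It only shows the shape of the job: map a register number to a
saved value and copy it out, or report the register as unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified register snapshot for one CPU. */
struct sketch_ppc64_regs {
	uint64_t gpr[32];	/* general-purpose registers r0..r31 */
	uint64_t nip;		/* next instruction pointer (the PC) */
	uint64_t msr;		/* machine state register */
	uint64_t link;		/* link register */
};

/* Hypothetical register numbers, chosen only for this sketch. */
enum {
	SKETCH_R0_REGNUM = 0,
	SKETCH_R31_REGNUM = 31,
	SKETCH_PC_REGNUM = 64,
	SKETCH_MSR_REGNUM = 65,
	SKETCH_LR_REGNUM = 67
};

/*
 * Copy one 64-bit register into 'value'.
 * Returns 1 on success, 0 if the register is unknown or the buffer
 * size does not match, so the caller can treat it as unavailable.
 */
int
sketch_get_cpu_reg(const struct sketch_ppc64_regs *regs, int regno,
		   int size, void *value)
{
	const uint64_t *src = NULL;

	if (regs == NULL || value == NULL || size != (int)sizeof(uint64_t))
		return 0;

	if (regno >= SKETCH_R0_REGNUM && regno <= SKETCH_R31_REGNUM)
		src = &regs->gpr[regno - SKETCH_R0_REGNUM];
	else if (regno == SKETCH_PC_REGNUM)
		src = &regs->nip;
	else if (regno == SKETCH_MSR_REGNUM)
		src = &regs->msr;
	else if (regno == SKETCH_LR_REGNUM)
		src = &regs->link;

	if (src == NULL)
		return 0;

	memcpy(value, src, sizeof(uint64_t));
	return 1;
}
```

A caller would fill the snapshot from the dumpfile once per CPU and then
answer each register request from it. Returning 0 for unknown registers is
a design choice of this sketch, so that a missing register degrades to
"unavailable" rather than to a wrong value.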
Implications on Architectures:
==============================
No architecture other than PPC64 is affected, except for the 'frame'
command.
As mentioned in patch #2, 'frame' is no longer prohibited, so it prints:
crash> frame
#0 <unavailable> in ?? ()
instead of the previous 'prohibited' message:
crash> frame
crash: prohibited gdb command: frame
On PPC64, the default mode ("crash mode") has NO OTHER changes besides
'frame', as mentioned above.
The major change is in 'gdb mode' on PPC64: it now prints the frames and
local variables, instead of failing with errors about no frame or about
being unable to get the PC.
Testing:
========
Git tree with this patch series applied:
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-rfc2
To test gdb passthroughs:
crash> set gdb on
gdb> thread 3 # or any other thread number to change context in gdb
gdb> bt
gdb> frame
gdb> up
gdb> down
gdb> info locals
Known Issues:
=============
1. In gdb mode, 'info threads' might hang for a few seconds and print only 2
threads.
2. In gdb mode, 'bt' might fail to show a backtrace in a few vmcores collected
from older kernels. This is a known issue caused by a register mismatch, and
its fix has been merged upstream:
Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
TODO:
=====
1. Introduce automatic thread selection in gdb mode, to select the crashing
thread in gdb, eliminating the need to manually run "thread <id>" after
switching to gdb mode.
Changelog:
==========
RFC V2:
- removed patch implementing 'frame', 'up', 'down' in crash
- updated the cover letter by removing the mention of those commands other
than the respective gdb passthrough
Aditya Gupta (4):
add generic get_dumpfile_regs to read registers
ppc64: fix gdb passthrough by implementing machdep->get_cpu_reg
remove 'frame' from prohibited commands list
make cpu context change transparent to crash/gdb
defs.h | 125 ++++++++++++++++++++++++++++++++++++++++++++++++
gdb-10.2.patch | 28 +++++++++++
gdb_interface.c | 2 +-
kernel.c | 33 +++++++++++++
ppc64.c | 105 ++++++++++++++++++++++++++++++++++++++--
tools.c | 12 +++--
6 files changed, 298 insertions(+), 7 deletions(-)
--
2.41.0
RISCV64: Use va_kernel_pa_offset in VTOP()
by Song Shuai
Since RISC-V Linux v6.4, commit 3335068f8721 ("riscv: Use
PUD/P4D/PGD pages for the linear mapping") changes
phys_ram_base from kernel_map.phys_addr to the start of DRAM.
Crash's VTOP() still uses phys_ram_base and kernel_map.virt_addr
to translate kernel virtual addresses, which makes Crash fail to start
with Linux v6.4 and later versions.
Have Linux export kernel_map.va_kernel_pa_offset in v6.5, so that Crash can
use "va_kernel_pa_offset" to translate kernel virtual addresses in
VTOP() correctly.
Signed-off-by: Song Shuai <suagrfillet(a)gmail.com>
---
You can check/test the Linux changes from this link:
https://github.com/sugarfillet/linux/commits/6.5-rc3-crash
I'll send the Linux changes to riscv/for-next if you're OK with this patch.
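For reviewers, the translation described above can be sketched as below.
This is a simplified stand-in for the VTOP() macro in this patch, not a
copy of it: the struct and function names are hypothetical, the kernel
version check is left out, and any addresses fed to it are made-up
examples. It shows why subtracting va_kernel_pa_offset stays correct even
when phys_base no longer equals the kernel's load address.

```c
#include <stdint.h>

/* Hypothetical subset of the machine-specific fields used by VTOP(). */
struct sketch_machspec {
	uint64_t kernel_link_addr;	/* start of the kernel image mapping */
	uint64_t kvbase;		/* start of the linear mapping */
	uint64_t phys_base;		/* phys_ram_base read from the dump */
	uint64_t va_kernel_pa_offset;	/* kernel image VA minus its PA */
};

/*
 * Before Linux 6.4, phys_base is the kernel's physical load address,
 * so the offset can be derived without help from the kernel.
 */
uint64_t
sketch_derive_offset(const struct sketch_machspec *ms)
{
	return ms->kernel_link_addr - ms->phys_base;
}

/* Translate a kernel virtual address to a physical address. */
uint64_t
sketch_vtop(const struct sketch_machspec *ms, uint64_t vaddr)
{
	/* Kernel image mapping: one fixed VA-to-PA offset. */
	if (vaddr >= ms->kernel_link_addr)
		return vaddr - ms->va_kernel_pa_offset;

	/* Linear mapping: offset from kvbase, relative to phys_base. */
	return (vaddr - ms->kvbase) + ms->phys_base;
}
```

As a made-up example, take kernel_link_addr = 0xffffffff80000000, DRAM
starting at 0x80000000 and the kernel loaded at 0x80200000. The offset is
then 0xfffffffeffe00000, and 0xffffffff80001000 translates to 0x80201000.
The old formula, (vaddr - kernel_link_addr) + phys_base, would give
0x80001000 once phys_base points at the start of DRAM.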
---
defs.h | 4 ++--
riscv64.c | 22 ++++++++++++++++++++++
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/defs.h b/defs.h
index 358f365..46b9857 100644
--- a/defs.h
+++ b/defs.h
@@ -3662,8 +3662,7 @@ typedef signed int s32;
ulong _X = X; \
(THIS_KERNEL_VERSION >= LINUX(5,13,0) && \
(_X) >= machdep->machspec->kernel_link_addr) ? \
- (((unsigned long)(_X)-(machdep->machspec->kernel_link_addr)) + \
- machdep->machspec->phys_base): \
+ ((unsigned long)(_X)-(machdep->machspec->va_kernel_pa_offset)): \
(((unsigned long)(_X)-(machdep->kvbase)) + \
machdep->machspec->phys_base); \
})
@@ -7021,6 +7020,7 @@ struct machine_specific {
ulong modules_vaddr;
ulong modules_end;
ulong kernel_link_addr;
+ ulong va_kernel_pa_offset;
ulong _page_present;
ulong _page_read;
diff --git a/riscv64.c b/riscv64.c
index 6b9a688..b9e50b4 100644
--- a/riscv64.c
+++ b/riscv64.c
@@ -418,6 +418,27 @@ error:
error(FATAL, "cannot get vm layout\n");
}
+static void
+riscv64_get_va_kernel_pa_offset(struct machine_specific *ms)
+{
+ unsigned long kernel_version = riscv64_get_kernel_version();
+
+ /*
+ * va_kernel_pa_offset is defined in Linux kernel since 6.5.
+ */
+ if (kernel_version >= LINUX(6,5,0)) {
+ char *string;
+ if ((string = pc->read_vmcoreinfo("NUMBER(va_kernel_pa_offset)"))) {
+ ms->va_kernel_pa_offset = htol(string, QUIET, NULL);
+ free(string);
+ } else
+ error(FATAL, "cannot read va_kernel_pa_offset\n");
+ } else if (kernel_version >= LINUX(6,4,0))
+ error(FATAL, "cannot determine va_kernel_pa_offset since Linux 6.4\n");
+ else
+ ms->va_kernel_pa_offset = ms->kernel_link_addr - ms->phys_base;
+}
+
static int
riscv64_is_kvaddr(ulong vaddr)
{
@@ -1352,6 +1373,7 @@ riscv64_init(int when)
riscv64_get_struct_page_size(machdep->machspec);
riscv64_get_va_bits(machdep->machspec);
riscv64_get_va_range(machdep->machspec);
+ riscv64_get_va_kernel_pa_offset(machdep->machspec);
pt_level_alloc(&machdep->pgd, "cannot malloc pgd space.");
pt_level_alloc(&machdep->machspec->p4d, "cannot malloc p4d space.");
--
2.20.1
[RFC PATCH 0/5] Improve stack unwind on ppc64
by Aditya Gupta
The Problem:
============
Currently crash cannot show function arguments or local variables the way
gdb can, and moving between frames ('up'/'down') does not work in crash.
Crash has 'gdb passthroughs' for things gdb can do, but the passthroughs
'bt', 'frame', 'info locals', 'up' and 'down' do not work either. gdb never
gets the register values, because `crash_target::fetch_registers` relies on
`machdep->get_cpu_reg`, which is not implemented for PPC64.
Proposed Solution:
==================
Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
This way, "gdb mode in crash" supports these commands for both ELF and
kdump-compressed vmcore formats, whereas plain "gdb" supports only the ELF
format.
Also, the backtrace can differ slightly between gdb and crash (gdb can
also print inline frames). This can cause confusion when 'bt' runs in
crash context but 'frame'/'up'/'down' run as gdb passthroughs and show
different frames.
This is explained in patch #4.
To prevent this confusion, implement 'frame', 'up' and 'down' as commands
in default crash mode as well, instead of relying on the gdb passthroughs
as they do currently.
So, in default mode, 'bt', 'frame', 'up' and 'down' will now be consistent
with each other.
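One way to picture native 'up'/'down' commands is a cursor over the frames
that 'bt' already produced, so that all of these commands share one frame
list. The sketch below is only an illustration under that assumption, not
the implementation in patch #4. The names are hypothetical, and clamping at
both ends is a choice made for this sketch.

```c
/* Hypothetical cursor over the frames of the current backtrace. */
struct sketch_frame_cursor {
	int current;	/* selected frame, 0 is the innermost */
	int nframes;	/* number of frames in the backtrace */
};

/* Move 'count' frames toward the outermost frame, clamping at the end. */
int
sketch_frame_up(struct sketch_frame_cursor *c, int count)
{
	if (c->nframes <= 0)
		return -1;	/* no backtrace to walk */
	if (count < 0)
		return c->current;
	if (count > c->nframes - 1 - c->current)
		c->current = c->nframes - 1;
	else
		c->current += count;
	return c->current;
}

/* Move 'count' frames toward the innermost frame, clamping at frame 0. */
int
sketch_frame_down(struct sketch_frame_cursor *c, int count)
{
	if (c->nframes <= 0)
		return -1;	/* no backtrace to walk */
	if (count < 0)
		return c->current;
	if (count > c->current)
		c->current = 0;
	else
		c->current -= count;
	return c->current;
}
```

With a six-frame backtrace, "up" followed by "up 4" would land on frame 5
in this sketch, and a further "up 4" would stay there.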
Implications on Architectures:
==============================
No architecture other than PPC64 is affected, except for the 'frame', 'up'
and 'down' commands.
1. frame: As mentioned in patch #2, 'frame' is no longer prohibited, and
prints:
crash> frame
#0 <unavailable> in ?? ()
instead of the previous 'prohibited' message:
crash> frame
crash: prohibited gdb command: frame
2. up/down: These commands now run as native crash commands by
default, instead of showing:
crash> up
crash: ambiguous command: up (symbol and gdb command)
crash> down
crash: ambiguous command: down (symbol and gdb command)
On PPC64, the default mode ("crash mode") has NO OTHER changes besides
'frame', 'up' and 'down', as mentioned above.
The major change is in 'gdb mode' on PPC64: it now prints the frames and
local variables, instead of failing with errors about no frame or about
being unable to get the PC.
Testing:
========
Git tree with this patch series applied: https://github.ibm.com/adityag/crash
(replace this link with github.com later)
To test 'frame'/'up'/'down' in crash (implemented in patch #3):
crash> bt
crash> frame
crash> up
crash> down
crash> up 4
To test 'bt'/'frame'/'up'/'down'/'info locals' gdb passthroughs:
crash> set gdb on
gdb> thread 3 # or any other thread number to change context in gdb
gdb> bt
gdb> frame
gdb> up
gdb> down
gdb> info locals
Known Issues:
=============
1. In gdb mode, 'info threads' might hang for a few seconds and print only 2
threads.
2. In gdb mode, 'bt' might fail to show a backtrace in a few vmcores collected
from older kernels. This is a known issue caused by a register mismatch, and
its fix has been accepted upstream:
Commit: https://github.com/linuxppc/linux/commit/b684c09f09e7a6af3794d4233ef78581...
TODO:
=====
1. Introduce automatic thread selection in gdb mode, to select the crashing
thread in gdb, eliminating the need to manually run "thread <id>" after
switching to gdb mode.
Aditya Gupta (5):
add generic get_dumpfile_regs to read registers
ppc64: fix gdb passthrough by implementing machdep->get_cpu_reg
remove 'frame' from prohibited commands list
implement 'frame', 'up', 'down' inside crash
make cpu context change transparent to crash/gdb
defs.h | 135 ++++++++++++++++++++++++++
gdb-10.2.patch | 28 ++++++
gdb_interface.c | 2 +-
global_data.c | 3 +
help.c | 34 +++++++
kernel.c | 183 +++++++++++++++++++++++++++++++++++
ppc64.c | 250 +++++++++++++++++++++++++++++++++++++++++++++++-
task.c | 1 +
tools.c | 12 ++-
9 files changed, 641 insertions(+), 7 deletions(-)
--
2.41.0
[PATCH v2] Fix the "foreach DE" task identifier displays incorrect state tasks.
by Lianbo Jiang
Currently, the "foreach DE ps -m" command may display "DE" as well as
"ZO" state tasks as below:
crash> foreach DE ps -m
...
[0 00:00:00.040] [ZO] PID: 11458 TASK: ffff91c75680d280 CPU: 7 COMMAND: "ora_w01o_p01mci"
[0 00:00:00.044] [ZO] PID: 49118 TASK: ffff91c7bf3e8000 CPU: 19 COMMAND: "oracle_49118_p0"
[0 00:00:00.050] [ZO] PID: 28748 TASK: ffff91a7cbde3180 CPU: 2 COMMAND: "ora_imr0_p01sci"
[0 00:00:00.050] [DE] PID: 28405 TASK: ffff91a7c8eb0000 CPU: 27 COMMAND: "ora_vktm_p01sci"
[0 00:00:00.051] [ZO] PID: 31716 TASK: ffff91a7f7192100 CPU: 6 COMMAND: "ora_p001_p01sci"
...
That is not the expected behavior; the "foreach" command needs to handle
such cases. Add a check that, when a task state identifier is specified,
compares it against the actual task state identifier, so that tasks in
any other state are filtered out.
With the patch:
crash> foreach DE ps -m
[0 00:00:00.050] [DE] PID: 28405 TASK: ffff91a7c8eb0000 CPU: 27 COMMAND: "ora_vktm_p01sci"
crash>
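The difference between the old and new filters can be sketched as follows.
The bit values are hypothetical and chosen only to show two states sharing
a bit; they are not the kernel's or crash's real values. The name-based
match uses a prefix comparison, which is how I understand STRNEQ() to
behave (an assumption).

```c
#include <string.h>

/* Hypothetical state bits: the two masks share bit 0x80 on purpose. */
#define SKETCH_DEAD_BITS	0x90UL
#define SKETCH_ZOMBIE_BITS	0xa0UL

/* Hypothetical task: raw state bits plus its two-letter identifier. */
struct sketch_task {
	unsigned long state_bits;
	const char *state_name;
};

/* Old-style filter: any shared bit counts as a match. */
int
sketch_match_by_mask(const struct sketch_task *t, unsigned long wanted)
{
	return (t->state_bits & wanted) != 0;
}

/* New-style filter: compare the displayed identifier with the request. */
int
sketch_match_by_name(const struct sketch_task *t, const char *wanted)
{
	return strncmp(t->state_name, wanted, strlen(wanted)) == 0;
}
```

In this sketch a "ZO" task passes the mask filter for "DE" because of the
shared bit, but fails the name filter. Comparing the identifier also keeps
the filter consistent with what "ps" prints in the state column.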
Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
---
defs.h | 2 +-
task.c | 52 +++++++++++++++++++---------------------------------
2 files changed, 20 insertions(+), 34 deletions(-)
diff --git a/defs.h b/defs.h
index 358f365585cf..5ee60f1eb3a5 100644
--- a/defs.h
+++ b/defs.h
@@ -1203,7 +1203,7 @@ struct foreach_data {
char *pattern;
regex_t regex;
} regex_info[MAX_REGEX_ARGS];
- ulong state;
+ const char *state;
char *reference;
int keys;
int pids;
diff --git a/task.c b/task.c
index b9076da35565..20a9ce3aa40b 100644
--- a/task.c
+++ b/task.c
@@ -6636,39 +6636,42 @@ cmd_foreach(void)
STREQ(args[optind], "NE") ||
STREQ(args[optind], "SW")) {
+ ulong state = TASK_STATE_UNINITIALIZED;
+
if (fd->flags & FOREACH_STATE)
error(FATAL, "only one task state allowed\n");
if (STREQ(args[optind], "RU"))
- fd->state = _RUNNING_;
+ state = _RUNNING_;
else if (STREQ(args[optind], "IN"))
- fd->state = _INTERRUPTIBLE_;
+ state = _INTERRUPTIBLE_;
else if (STREQ(args[optind], "UN"))
- fd->state = _UNINTERRUPTIBLE_;
+ state = _UNINTERRUPTIBLE_;
else if (STREQ(args[optind], "ST"))
- fd->state = _STOPPED_;
+ state = _STOPPED_;
else if (STREQ(args[optind], "TR"))
- fd->state = _TRACING_STOPPED_;
+ state = _TRACING_STOPPED_;
else if (STREQ(args[optind], "ZO"))
- fd->state = _ZOMBIE_;
+ state = _ZOMBIE_;
else if (STREQ(args[optind], "DE"))
- fd->state = _DEAD_;
+ state = _DEAD_;
else if (STREQ(args[optind], "SW"))
- fd->state = _SWAPPING_;
+ state = _SWAPPING_;
else if (STREQ(args[optind], "PA"))
- fd->state = _PARKED_;
+ state = _PARKED_;
else if (STREQ(args[optind], "WA"))
- fd->state = _WAKING_;
+ state = _WAKING_;
else if (STREQ(args[optind], "ID"))
- fd->state = _UNINTERRUPTIBLE_|_NOLOAD_;
+ state = _UNINTERRUPTIBLE_|_NOLOAD_;
else if (STREQ(args[optind], "NE"))
- fd->state = _NEW_;
+ state = _NEW_;
- if (fd->state == TASK_STATE_UNINITIALIZED)
+ if (state == TASK_STATE_UNINITIALIZED)
error(FATAL,
"invalid task state for this kernel: %s\n",
args[optind]);
+ fd->state = args[optind];
fd->flags |= FOREACH_STATE;
optind++;
@@ -7039,26 +7042,9 @@ foreach(struct foreach_data *fd)
if ((fd->flags & FOREACH_KERNEL) && !is_kernel_thread(tc->task))
continue;
- if (fd->flags & FOREACH_STATE) {
- if (fd->state == _RUNNING_) {
- if (task_state(tc->task) != _RUNNING_)
- continue;
- } else if (fd->state & _UNINTERRUPTIBLE_) {
- if (!(task_state(tc->task) & _UNINTERRUPTIBLE_))
- continue;
-
- if (valid_task_state(_NOLOAD_)) {
- if (fd->state & _NOLOAD_) {
- if (!(task_state(tc->task) & _NOLOAD_))
- continue;
- } else {
- if ((task_state(tc->task) & _NOLOAD_))
- continue;
- }
- }
- } else if (!(task_state(tc->task) & fd->state))
- continue;
- }
+ if ((fd->flags & FOREACH_STATE) &&
+ (!STRNEQ(task_state_string(tc->task, buf, 0), fd->state)))
+ continue;
if (specified) {
for (j = 0; j < fd->tasks; j++) {
--
2.37.1
[RFC][PATCH 0/1] add loongarch64 platform support.
by Ming Wang
This patch is for the crash utility. It adds support for the loongarch64
architecture and the common commands (bt, p, rd, mod, log, set, dis, and
so on).
Upstream GDB supports the loongarch64 architecture from version 13.1.
See: https://sourceware.org/gdb/download/ANNOUNCEMENT
But crash depends on gdb-10.2, and gdb-10.2 does NOT support loongarch64,
so a patch (gdb-10.2-loongarch.patch) is needed to add that support. I don't
have a better way to deal with this problem at the moment.
I tested this patch on the Loongson 3C50000 processor platform.
...
KERNEL: /usr/lib/debug/lib/modules/5.10.0-60.102.0.128.oe2203.loongarch64/vmlinux
DUMPFILE: /proc/kcore
CPUS: 16
DATE: Thu Jul 27 19:51:21 CST 2023
UPTIME: 06:35:11
LOAD AVERAGE: 0.15, 0.03, 0.01
TASKS: 257
NODENAME: localhost.localdomain
RELEASE: 5.10.0-60.102.0.128.oe2203.loongarch64
VERSION: #1 SMP Fri Jul 14 04:17:09 UTC 2023
MACHINE: loongarch64 (2200 Mhz)
MEMORY: 64 GB
PID: 2964
COMMAND: "crash"
TASK: 9000000098805500 [THREAD_INFO: 9000000094d48000]
CPU: 6
STATE: TASK_RUNNING (ACTIVE)
crash>
crash> dis -l start_kernel
/linux-5.10.0-60.102.0.128.oe2203.loongarch64/init/main.c: 883
0x9000000001030818 <start_kernel>: 0x0141ee40
/linux-5.10.0-60.102.0.128.oe2203.loongarch64/init/main.c: 879
0x900000000103081c <start_kernel+4>: 0x90000000
/linux-5.10.0-60.102.0.128.oe2203.loongarch64/init/main.c: 883
0x9000000001030820 <start_kernel+8>: addu16i.d $zero, $t8, 8179(0x1ff3)
/linux-5.10.0-60.102.0.128.oe2203.loongarch64/init/main.c: 879
...
About the LoongArch64 Architecture:
https://www.kernel.org/doc/html/latest/loongarch/index.html
After this RFC, I will split this big patch into many small patches by
function, like the RISCV64 patch sets.
Ming Wang (1):
loongarch64: Support loongarch64 architecture and common commands
Makefile | 9 +-
README | 4 +-
configure.c | 27 +-
crash.8 | 2 +-
defs.h | 161 +-
diskdump.c | 24 +-
gdb-10.2-loongarch.patch | 15207 +++++++++++++++++++++++++++++++++++++
gdb_interface.c | 1 -
help.c | 9 +-
lkcd_vmdump_v1.h | 2 +-
lkcd_vmdump_v2_v3.h | 5 +-
loongarch64.c | 1347 ++++
main.c | 3 +-
netdump.c | 26 +-
ramdump.c | 2 +
symbols.c | 26 +-
16 files changed, 16832 insertions(+), 23 deletions(-)
create mode 100644 gdb-10.2-loongarch.patch
create mode 100644 loongarch64.c
base-commit: c74f375e0ef7cd9b593fa1d73c47505822c8f2a0
--
2.39.2
[PATCH] Fix the "foreach DE" task identifier displays incorrect state tasks.
by Lianbo Jiang
Currently, the "foreach DE ps -m" command may display "DE" as well as
"ZO" state tasks as below:
crash> foreach DE ps -m
...
[0 00:00:00.040] [ZO] PID: 11458 TASK: ffff91c75680d280 CPU: 7 COMMAND: "ora_w01o_p01mci"
[0 00:00:00.044] [ZO] PID: 49118 TASK: ffff91c7bf3e8000 CPU: 19 COMMAND: "oracle_49118_p0"
[0 00:00:00.050] [ZO] PID: 28748 TASK: ffff91a7cbde3180 CPU: 2 COMMAND: "ora_imr0_p01sci"
[0 00:00:00.050] [DE] PID: 28405 TASK: ffff91a7c8eb0000 CPU: 27 COMMAND: "ora_vktm_p01sci"
[0 00:00:00.051] [ZO] PID: 31716 TASK: ffff91a7f7192100 CPU: 6 COMMAND: "ora_p001_p01sci"
...
That is not the expected behavior; the "foreach" command needs to handle
such cases. Add a check that, when the specified task state identifier
is "DE", requires the task state to be "DE" as well, so that non-"DE"
state tasks are filtered out.
With the patch:
crash> foreach DE ps -m
[0 00:00:00.050] [DE] PID: 28405 TASK: ffff91a7c8eb0000 CPU: 27 COMMAND: "ora_vktm_p01sci"
crash>
Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
---
task.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/task.c b/task.c
index b9076da35565..4f40c396b195 100644
--- a/task.c
+++ b/task.c
@@ -7043,6 +7043,9 @@ foreach(struct foreach_data *fd)
if (fd->state == _RUNNING_) {
if (task_state(tc->task) != _RUNNING_)
continue;
+ } else if (fd->state == _DEAD_) {
+ if (task_state(tc->task) != _DEAD_)
+ continue;
} else if (fd->state & _UNINTERRUPTIBLE_) {
if (!(task_state(tc->task) & _UNINTERRUPTIBLE_))
continue;
--
2.37.1