Thank you for the fix.
Date: Thu, 21 Sep 2023 12:29:05 +0530
From: Aditya Gupta <adityag@linux.ibm.com>
To: crash-utility@redhat.com
Cc: Hari Bathini <hbathini@linux.ibm.com>, Mahesh J Salgaonkar
<mahesh@linux.ibm.com>, Sourabh Jain <sourabhjain@linux.ibm.com>,
d.hatayama@fujitsu.com
Subject: [Crash-utility] [PATCH v1] diskdump: add hook for additional
checks on prstatus notes validity
Message-ID: <20230921065905.1020839-1-adityag@linux.ibm.com>
Content-Type: text/plain; charset="US-ASCII"; x-default=true
Upstream crash reports these warnings on PowerPC64:
WARNING: cpu 0 invalid NT_PRSTATUS note (n_type != NT_PRSTATUS)
...
Apart from these warnings, register values are also invalid.
This warning was found in the commit:
commit db8c030857b4 ("diskdump/netdump: fix segmentation fault
caused by failure of stopping CPUs")
With above commit, crash checks whether 'crash_notes' is initialised,
before mapping PRSTATUS notes.
But some architectures such as PowerPC64, in fadump case
(firmware-assisted dump), don't populate 'crash_notes' since the
registers are already stored in the cpu notes in the vmcore.
Instead of checking 'crash_notes' for all architectures, introduce
a machdep hook ('is_cpu_prstatus_valid'), for architectures to
decide validity checks for PRSTATUS notes
A default hook ('diskdump_is_cpu_prstatus_valid') has also been provided
for all architectures other than PowerPC64, which checks if 'crash_notes'
for a given cpu is valid, maintaining the current behaviour
PowerPC64 doesn't utilise 'crash_notes' to get register values, so no
additional checks are required
Fixes: db8c030857b4 ("diskdump/netdump: fix segmentation fault caused by failure of stopping CPUs")
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
Testing
=======
NOTE: To test this on PowerPC64 with upstream kernel dump, AND on system
with Radix MMU, following patch will also be needed to be applied:
Link: https://listman.redhat.com/archives/crash-utility/2023-September/010961.html
This is due to change in vmemmap address mapping for Radix MMU, since
following patch in the kernel:
368a0590d954: (powerpc/book3s64/vmemmap: switch radix to use a
different vmemmap handling function)
More details about the change are in the linked patch. Basically what
changed is, the address mapping for vmemmap address is now in kernel
page table, in case of Radix MMU, instead of 'vmemmap_list' which is currently
used in crash.
Git Tree for Testing
====================
1. With this patch (diskdump: add hook for additional ...) applied:
https://github.com/adi-g15-ibm/crash/tree/bugzilla-203256-list-v1
2. With both this and the linked patch (ppc64: do page traversal ...) applied:
https://github.com/adi-g15-ibm/crash/tree/bugzilla-203256-withupstreamradix
---
---
defs.h | 1 +
diskdump.c | 15 ++++++++++++---
ppc64.c | 10 ++++++++++
3 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/defs.h b/defs.h
index 96a7a2a31471..f7f56947e5ac 100644
--- a/defs.h
+++ b/defs.h
@@ -1073,6 +1073,7 @@ struct machdep_table {
int (*verify_line_number)(ulong, ulong, ulong);
void (*get_irq_affinity)(int);
void (*show_interrupts)(int, ulong *);
+ int (*is_cpu_prstatus_valid)(int cpu);
I would suggest putting it at the end of this table. Although it may not break the compatibility of the extension module, just like the offset_table/size_table, I get used to doing that if there is no special reason.
int (*is_page_ptr)(ulong, physaddr_t *);
int (*get_cpu_reg)(int, int, const char *, int, void *);
};
diff --git a/diskdump.c b/diskdump.c
index 2c284ff3f97f..ad9a00b08ce1 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -142,13 +142,22 @@ int have_crash_notes(int cpu)
return TRUE;
}
+int diskdump_is_cpu_prstatus_valid(int cpu)
+{
+ static int crash_notes_exists = -1;
+
+ if (crash_notes_exists == -1)
+ crash_notes_exists = kernel_symbol_exists("crash_notes");
+
+ return (!crash_notes_exists || have_crash_notes(cpu));
+}
+
Got a warning as below:
cc -c -g -DX86_64 -DLZO -DGDB_10_2 diskdump.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
diskdump.c:145:5: warning: no previous prototype for ‘diskdump_is_cpu_prstatus_valid’ [-Wmissing-prototypes]
145 | int diskdump_is_cpu_prstatus_valid(int cpu)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void
map_cpus_to_prstatus_kdump_cmprs(void)
{
void **nt_ptr;
int online, i, j, nrcpus;
size_t size;
- int crash_notes_exists;
if (pc->flags2 & QEMU_MEM_DUMP_COMPRESSED) /* notes exist for all cpus */
goto resize_note_pointers;
@@ -171,10 +180,9 @@ map_cpus_to_prstatus_kdump_cmprs(void)
* Re-populate the array with the notes mapping to online cpus
*/
nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
- crash_notes_exists = kernel_symbol_exists("crash_notes");
for (i = 0, j = 0; i < nrcpus; i++) {
- if (in_cpu_map(ONLINE_MAP, i) && (!crash_notes_exists || have_crash_notes(i))) {
+ if (in_cpu_map(ONLINE_MAP, i) && machdep->is_cpu_prstatus_valid(i)) {
dd->nt_prstatus_percpu[i] = nt_ptr[j++];
dd->num_prstatus_notes =
MAX(dd->num_prstatus_notes, i+1);
@@ -1076,6 +1084,7 @@ diskdump_init(char *unused, FILE *fptr)
if (!DISKDUMP_VALID() && !KDUMP_CMPRS_VALID())
return FALSE;
+ machdep->is_cpu_prstatus_valid = diskdump_is_cpu_prstatus_valid;
dd->ofp = fptr;
return TRUE;
}
diff --git a/ppc64.c b/ppc64.c
index fc34006f4863..1159b8c3a8e7 100644
--- a/ppc64.c
+++ b/ppc64.c
@@ -298,6 +298,15 @@ struct machine_specific book3e_machine_specific = {
.is_vmaddr = book3e_is_vmaddr,
};
+/**
+ * No additional checks are required on PPC64, for checking if PRSTATUS notes
+ * is valid
+ */
+int ppc64_is_cpu_prstatus_valid(int cpu)
+{
+ return TRUE;
+}
+
#define SKIBOOT_BASE 0x30000000
/*
@@ -418,6 +427,7 @@ ppc64_init(int when)
break;
case POST_GDB:
+ machdep->is_cpu_prstatus_valid = ppc64_is_cpu_prstatus_valid;
The hook is set in the stage of POST_GDB, I'm wondering if the current warning is still shown in the crash minimal mode(with option --minimal). Can you help to confirm this one?
And other changes are fine to me.
Thanks.
Lianbo
ms = machdep->machspec;
if (!(machdep->flags & BOOK3E)) {
--
2.41.0