Re: [PATCH] remove offline status check for CPU register map
by lijiang
On Mon, Nov 4, 2024 at 4:13 PM <devel-request(a)lists.crash-utility.osci.io>
wrote:
> Date: Fri, 1 Nov 2024 20:35:32 +0800
> From: Guanyou Chen <chenguanyou9338(a)gmail.com>
> Subject: [Crash-utility] [PATCH] remove offline status check for CPU register map
> To: Lianbo <lijiang(a)redhat.com>, Tao Liu <ltao(a)redhat.com>,
> devel(a)lists.crash-utility.osci.io
> Message-ID:
> <CAHS3RMV5tzd2cHR+zniv-39QZE2idjQjXLytFXv5=
> mneizbw5Q(a)mail.gmail.com>
>
> Hi Lianbo, Tao
>
> Remove the offline status check, so that we can query the registers of
> each CPU at any time and obtain its stack.
>
> CPU 0: [OFFLINE]
> X0: 0000000000000000 X1: 0000000000000000 X2: 0000000000000000
> X3: 000000000003fcbc X4: 0000000000000001 X5: 0000000000000000
> X6: 0000000000000000 X7: 0000000000000000 X8: 00000000ffffffff
> X9: ffffffc009e6ae48 X10: ffffffc009e6ae20 X11: 0000000000000000
> X12: 0000000000000002 X13: 0000000000000004 X14: 0000000000000000
> X15: 0000000000004000 X16: 00000000f90f05f6 X17: 00000000f90f05f6
> X18: 0000000000000000 X19: 0000000000000002 X20: ffffffc009e3b008
> X21: ffffffc00a01d020 X22: ffffffc009f798f0 X23: 0000000060001000
> X24: 0000000000000000 X25: 0000000000000000 X26: 0000000000000000
> X27: 0000000000000000 X28: ffffff8111eecb00 X29: ffffffc008003f50
> LR: ffffffc00802df88 SP: ffffffc008003f40 PC: ffffffc00802df94
> PSTATE: 024003c5 FPVALID: 00000000
>
> crash> bt -c 0
> PID: 1842 TASK: ffffff8111eecb00 CPU: 0 COMMAND: "android.bg"
> 00 [ffffffc008003f50] ipi_handler at ffffffc00802df90
> 01 [ffffffc008003f90] handle_percpu_devid_irq at ffffffc008146f50
> 02 [ffffffc008003fd0] generic_handle_domain_irq at ffffffc00813f484
> 03 [ffffffc008003fe0] gic_handle_irq at ffffffc008010140
> --- <IRQ stack> ---
> 04 [ffffffc019c3be20] call_on_irq_stack at ffffffc008016ed4
> 05 [ffffffc019c3be40] do_interrupt_handler at ffffffc008019cb4
> 06 [ffffffc019c3be60] el0_interrupt at ffffffc008f7b848
> 07 [ffffffc019c3be90] __el0_irq_handler_common at ffffffc008f7b368
> 08 [ffffffc019c3bea0] el0t_64_irq_handler at ffffffc008f7b344
> 09 [ffffffc019c3bfe0] el0t_64_irq at ffffffc008011720
> PC: 0000000072415108 LR: 00000000724150d0 SP: 0000007691d2bfa0
> X29: 00000000734f60e0 X28: 000000001a2fa678 X27: 0000000000000063
> X26: 000000001a2fa678 X25: 000000001a2fa678 X24: 000000001a7bb718
> X23: 000000001a7ba198 X22: 000000001a7ba190 X21: b4000076f9a828c8
> X20: 0000000000000000 X19: b4000076f9a82800 X18: 000000768d68a000
> X17: 00000000708f89f8 X16: 00000000000000f0 X15: 0000000000000000
> X14: 0000007691d2bca0 X13: 0000000080100000 X12: 0000000000000000
> X11: 0000000000000000 X10: 0000000000000000 X9: 9636716211228cd4
> X8: 9636716211228cd4 X7: 0000000000000010 X6: 000000001a7bb728
> X5: 0000000070845200 X4: 0000000018a40d38 X3: 00000000707e8f98
> X2: 000000001a2fa678 X1: 000000001a7ba198 X0: 0000000070847aa8
> ORIG_X0: 00000000ffffff9c SYSCALLNO: ffffffff PSTATE: 60001000
>
> Signed-off-by: Guanyou.Chen <chenguanyou(a)xiaomi.com>
> ---
> netdump.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/netdump.c b/netdump.c
> index 435793b..455f90e 100644
> --- a/netdump.c
> +++ b/netdump.c
> @@ -101,7 +101,7 @@ map_cpus_to_prstatus(void)
> nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
>
> for (i = 0; i < nrcpus; i++) {
> - if (in_cpu_map(ONLINE_MAP, i) && machdep->is_cpu_prstatus_valid(i)) {
>
Checking for online CPUs is meaningful; the current modification seems
unreasonable :-)
Please refer to this commit: d5b362edf7d5.
Thanks
Lianbo
> + if (machdep->is_cpu_prstatus_valid(i)) {
> nd->nt_prstatus_percpu[i] = nt_ptr[i];
> nd->num_prstatus_notes =
> MAX(nd->num_prstatus_notes, i+1);
> @@ -2998,15 +2998,10 @@ dump_registers_for_elf_dumpfiles(void)
> return;
> }
>
> - for (c = 0; c < kt->cpus; c++) {
> - if (check_offline_cpu(c)) {
> - fprintf(fp, "%sCPU %d: [OFFLINE]\n", c ? "\n" : "", c);
> - continue;
> - }
> -
> - fprintf(fp, "%sCPU %d:\n", c ? "\n" : "", c);
> - display_regs_from_elf_notes(c, fp);
> - }
> + for (c = 0; c < kt->cpus; c++) {
> + fprintf(fp, "%sCPU %d: %s\n", c ? "\n" : "", c, check_offline_cpu(c) ? "[OFFLINE]" : "[ONLINE]");
> + display_regs_from_elf_notes(c, fp);
> + }
> }
>
> struct x86_64_user_regs_struct {
> --
> 2.34.1
>
> Guanyou.
> Thanks.
>
[PATCH] bugfix map cpus register
by Guanyou Chen
Hi Lianbo, Tao
When some CPUs are offline, the NT_PRSTATUS notes can be mapped to the wrong CPUs.
We need to map each note to the position of its own CPU, one by one.
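The two indexing schemes can be compared with a simplified model. This is not crash's code: `online[]` stands in for `in_cpu_map(ONLINE_MAP, i)`, `notes[k]` stands in for `nt_ptr[k]`, and the `machdep->is_cpu_prstatus_valid()` check is left out. Which scheme is correct depends on how the dump lays out its notes. The compact `j++` scheme assumes notes exist only for online CPUs, as with kdump. The per-index scheme assumes one note per CPU slot, offline CPUs included, which is what the "CPU2" note name in the dump below suggests.

```c
#include <stddef.h>

/*
 * Simplified model of map_cpus_to_prstatus(), not crash's code.
 * online[i] stands in for in_cpu_map(ONLINE_MAP, i); notes[k] stands in
 * for nt_ptr[k], the k-th NT_PRSTATUS note found in the dump.
 */

/* Old scheme: the j-th online CPU receives the j-th note. This lines up
 * only if the dump contains notes for online CPUs alone. */
static void map_compact(const int *online, int nrcpus,
			const char **notes, const char **out)
{
	int i, j;

	for (i = 0, j = 0; i < nrcpus; i++)
		out[i] = online[i] ? notes[j++] : NULL;
}

/* Patched scheme: CPU i receives note i. This lines up only if the dump
 * contains one note per CPU slot, offline CPUs included. */
static void map_by_index(const int *online, int nrcpus,
			 const char **notes, const char **out)
{
	int i;

	for (i = 0; i < nrcpus; i++)
		out[i] = online[i] ? notes[i] : NULL;
}
```

Take four notes with CPUs 0 and 1 offline. `map_compact()` gives CPU 2 the first note, while `map_by_index()` gives it the third. This is consistent with the difference between the "Before" and "After" register dumps.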
Before:
n_namesz: 5 ("CPU2")
n_descsz: 392
n_type: 1 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 3 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
X0: ffffffc000fc8818 X1: 0000000000000000 X2: ffffffc000fc84c8
X3: 0000000000000000 X4: ffffffc0405e37bf X5: ffffffc00a07372f
X6: 322e34323320205b X7: 545b5d3539383334 X8: ffffffc000fc2f0c
X9: 89fece0a9ef8cb00 X10: c0000001001f75f4 X11: 00000001001f75f4
X12: 0000000000000003 X13: 00000000000005f4 X14: ffffffc009eb1210
X15: 0000000000000004 X16: 000000002a4cec24 X17: 000000002a4cec24
X18: ffffffc009e7d140 X19: ffffffc00a04c670 X20: 0000000000000000
X21: 0000000000000000 X22: ffffff8027f22280 X23: 0000000000000009
X24: 0000000000000007 X25: ffffffc009f839c0 X26: ffffffc0090f87f8
X27: 0000000000000000 X28: ffffff80454f3840 X29: ffffffc0405e3b60
LR: ffffffc0080e57fc SP: ffffffc0405e3b60 PC: ffffffc000fc2f84
CPU 0: [OFFLINE]
CPU 1: [OFFLINE]
CPU 2:
X0: 0000000000000000 X1: 0000000000000000 X2: 0000000000000000
X3: 000000000003fcbc X4: 0000000000000001 X5: 0000000000000000
X6: 0000000000000000 X7: 0000000000000000 X8: 00000000ffffffff
X9: ffffffc009e6ae48 X10: ffffffc009e6ae20 X11: 0000000000000000
X12: 0000000000000002 X13: 0000000000000004 X14: 0000000000000000
X15: 0000000000004000 X16: 00000000f90f05f6 X17: 00000000f90f05f6
X18: 0000000000000000 X19: 0000000000000002 X20: ffffffc009e3b008
X21: ffffffc00a01d020 X22: ffffffc009f798f0 X23: 0000000060001000
X24: 0000000000000000 X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: ffffff8111eecb00 X29: ffffffc008003f50
LR: ffffffc00802df88 SP: ffffffc008003f40 PC: ffffffc00802df94
PSTATE: 024003c5 FPVALID: 00000000
After:
CPU 2:
X0: ffffffc000fc8818 X1: 0000000000000000 X2: ffffffc000fc84c8
X3: 0000000000000000 X4: ffffffc0405e37bf X5: ffffffc00a07372f
X6: 322e34323320205b X7: 545b5d3539383334 X8: ffffffc000fc2f0c
X9: 89fece0a9ef8cb00 X10: c0000001001f75f4 X11: 00000001001f75f4
X12: 0000000000000003 X13: 00000000000005f4 X14: ffffffc009eb1210
X15: 0000000000000004 X16: 000000002a4cec24 X17: 000000002a4cec24
X18: ffffffc009e7d140 X19: ffffffc00a04c670 X20: 0000000000000000
X21: 0000000000000000 X22: ffffff8027f22280 X23: 0000000000000009
X24: 0000000000000007 X25: ffffffc009f839c0 X26: ffffffc0090f87f8
X27: 0000000000000000 X28: ffffff80454f3840 X29: ffffffc0405e3b60
LR: ffffffc0080e57fc SP: ffffffc0405e3b60 PC: ffffffc000fc2f84
PSTATE: 600000c5 FPVALID: 00000000
crash> bt
PID: 15959 TASK: ffffff80454f3840 CPU: 2 COMMAND: "AnrConsumer"
[ffffffc0405e3b60] ipanic at ffffffc000fc2f80 [mrdump]
[ffffffc0405e3b70] atomic_notifier_call_chain at ffffffc0080e57f8
[ffffffc0405e3c30] panic at ffffffc008f734d0
[ffffffc0405e3c80] sysrq_handle_crash at ffffffc0087f3c18
[ffffffc0405e3c90] __handle_sysrq at ffffffc0087f3798
[ffffffc0405e3ce0] write_sysrq_trigger at ffffffc0087f49c0
[ffffffc0405e3d00] proc_reg_write at ffffffc00842e4b8
[ffffffc0405e3d80] vfs_write at ffffffc008381eb4
[ffffffc0405e3dd0] ksys_write at ffffffc008382200
[ffffffc0405e3e10] __arm64_sys_write at ffffffc00838228c
[ffffffc0405e3e20] invoke_syscall at ffffffc00802efe0
[ffffffc0405e3e40] el0_svc_common at ffffffc00802eef4
[ffffffc0405e3e70] do_el0_svc at ffffffc00802ede8
[ffffffc0405e3e80] el0_svc at ffffffc008f7a7d0
[ffffffc0405e3ea0] el0t_64_sync_handler at ffffffc008f7a758
[ffffffc0405e3fe0] el0t_64_sync at ffffffc00801157c
PC: 00000077c798ca28 LR: 00000077a82e19f4 SP: 000000761c517af0
X29: 000000761c517b00 X28: 000000761c517db8 X27: 000000761c517c90
X26: 000000761c517c98 X25: 000000761c517bf9 X24: 000000761c519000
X23: 000000761c517be1 X22: 0000000000000001 X21: 00000000000003e3
X20: 000000761c517c11 X19: 000000761c517bf8 X18: 0000007568224000
X17: 00000077c798ca20 X16: 00000077c79b2ae0 X15: b4000077202cc480
X14: 0000000000000000 X13: 000000761c517a70 X12: ffffff80ffffffd0
X11: 000000761c517a40 X10: 0000000000000001 X9: 0000000000000000
X8: 0000000000000040 X7: 7f7f7f7f7f7f7f7f X6: 0000000000000010
X5: 000000761c517c0c X4: ffffffffffffffff X3: ffffffffffffffff
X2: 0000000000000001 X1: 000000761c517c11 X0: 00000000000003e3
ORIG_X0: 00000000000003e3 SYSCALLNO: 40 PSTATE: 00001000
Signed-off-by: Guanyou.Chen <chenguanyou(a)xiaomi.com>
---
netdump.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/netdump.c b/netdump.c
index b4e2a5c..8ea5159 100644
--- a/netdump.c
+++ b/netdump.c
@@ -75,7 +75,7 @@ void
map_cpus_to_prstatus(void)
{
void **nt_ptr;
- int online, i, j, nrcpus;
+ int online, i, nrcpus;
size_t size;
if (pc->flags2 & QEMU_MEM_DUMP_ELF) /* notes exist for all cpus */
@@ -100,9 +100,9 @@ map_cpus_to_prstatus(void)
*/
nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
- for (i = 0, j = 0; i < nrcpus; i++) {
+ for (i = 0; i < nrcpus; i++) {
if (in_cpu_map(ONLINE_MAP, i) && machdep->is_cpu_prstatus_valid(i)) {
- nd->nt_prstatus_percpu[i] = nt_ptr[j++];
+ nd->nt_prstatus_percpu[i] = nt_ptr[i];
nd->num_prstatus_notes =
MAX(nd->num_prstatus_notes, i+1);
}
--
2.34.1
Guanyou.
Thanks.
Re: [PATCH] bugfix command "help -r" segv fault
by lijiang
Hi, Guanyou
Thank you for the fix.
On Mon, Nov 4, 2024 at 4:13 PM <devel-request(a)lists.crash-utility.osci.io>
wrote:
> Date: Fri, 1 Nov 2024 18:01:27 +0800
> From: Guanyou Chen <chenguanyou9338(a)gmail.com>
> Subject: [Crash-utility] [PATCH] bugfix command "help -r" segv fault
> To: Lianbo <lijiang(a)redhat.com>, Tao Liu <ltao(a)redhat.com>,
> devel(a)lists.crash-utility.osci.io
> Message-ID:
> <CAHS3RMU3nuiqW4z=
> Qo9RoufADrUxcaLhyjnxwMCuGODB_+37yQ(a)mail.gmail.com>
>
> Hi Lianbo, Tao
>
> When the ELF notes do not contain the registers of an online CPU,
> attempting to display that CPU's registers causes a segmentation fault.
>
> After:
> CPU 6:
> help: registers not collected for cpu 6
> ...
>
> Signed-off-by: Guanyou.Chen <chenguanyou(a)xiaomi.com>
> ---
> netdump.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/netdump.c b/netdump.c
> index 8ea5159..435793b 100644
> --- a/netdump.c
> +++ b/netdump.c
> @@ -2780,6 +2780,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
>
I copied the code block here:
display_regs_from_elf_notes(int cpu, FILE *ofp)
{
	Elf32_Nhdr *note32;
	Elf64_Nhdr *note64;
	size_t len;
	char *user_regs;
	int c, skipped_count;

	/*
	 * Kdump NT_PRSTATUS notes are only related to online cpus,
	 * so offline cpus should be skipped.
	 */
	if (pc->flags2 & QEMU_MEM_DUMP_ELF)
		skipped_count = 0;
	else {
		for (c = skipped_count = 0; c < cpu; c++) {
			if (check_offline_cpu(c))
				skipped_count++;
		}
	}

	if ((cpu - skipped_count) >= nd->num_prstatus_notes &&
	    !machine_type("MIPS")) {
		error(INFO, "registers not collected for cpu %d\n", cpu);
		return;
	}
	...
Could you please point out why the above check does not work?
BTW: I'm not sure whether this works for you; could you help to try it? Just a
guess.
	if ((cpu < 0 || !nd->nt_prstatus_percpu[cpu] ||
	     (cpu - skipped_count) >= nd->num_prstatus_notes) &&
	    !machine_type("MIPS")) {
		error(INFO, "registers not collected for cpu %d\n", cpu);
		return;
	}
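The suggested guard can be exercised in isolation. This hypothetical standalone version is not crash's code. `percpu` stands in for `nd->nt_prstatus_percpu`, `offline[]` for `check_offline_cpu()`, `num_notes` for `nd->num_prstatus_notes`, and `qemu_dump` for the `QEMU_MEM_DUMP_ELF` flag. The MIPS exception is dropped, and the sketch assumes `percpu` has at least `cpu + 1` entries.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the check at the top of
 * display_regs_from_elf_notes(), including the suggested NULL test.
 * Returns true if registers can be displayed for cpu.
 */
static bool regs_collected(int cpu, void *const *percpu,
			   const bool *offline, int num_notes,
			   bool qemu_dump)
{
	int c, skipped_count = 0;

	if (cpu < 0)
		return false;

	/* Kdump notes cover online CPUs only, so count offline ones below cpu. */
	if (!qemu_dump) {
		for (c = 0; c < cpu; c++)
			if (offline[c])
				skipped_count++;
	}

	/* A missing per-CPU note is rejected before anything dereferences it. */
	if (!percpu[cpu] || (cpu - skipped_count) >= num_notes)
		return false;

	return true;
}
```

The NULL test catches the case that the count-based test alone misses: a CPU whose index passes the `num_notes` bound but whose note pointer was never filled in.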
Thanks
Lianbo
> nd->nt_prstatus_percpu[cpu];
> else
> note64 = (Elf64_Nhdr *)nd->nt_prstatus;
> + if (!note64) {
> + error(INFO, "registers not collected for cpu %d\n", cpu);
> + return;
> + }
> len = sizeof(Elf64_Nhdr);
> len = roundup(len + note64->n_namesz, 4);
> len = roundup(len + note64->n_descsz, 4);
> @@ -2820,6 +2824,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
> nd->nt_prstatus_percpu[cpu];
> else
> note32 = (Elf32_Nhdr *)nd->nt_prstatus;
> + if (!note32) {
> + error(INFO, "registers not collected for cpu %d\n", cpu);
> + return;
> + }
> len = sizeof(Elf32_Nhdr);
> len = roundup(len + note32->n_namesz, 4);
> len = roundup(len + note32->n_descsz, 4);
> @@ -2857,6 +2865,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
> else
> note64 = (Elf64_Nhdr *)nd->nt_prstatus;
>
> + if (!note64) {
> + error(INFO, "registers not collected for cpu %d\n", cpu);
> + return;
> + }
> prs = (struct ppc64_elf_prstatus *)
> ((char *)note64 + sizeof(Elf64_Nhdr) + note64->n_namesz);
> prs = (struct ppc64_elf_prstatus *)roundup((ulong)prs, 4);
> @@ -2903,6 +2915,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
> nd->nt_prstatus_percpu[cpu];
> else
> note64 = (Elf64_Nhdr *)nd->nt_prstatus;
> + if (!note64) {
> + error(INFO, "registers not collected for cpu %d\n", cpu);
> + return;
> + }
> len = sizeof(Elf64_Nhdr);
> len = roundup(len + note64->n_namesz, 4);
> len = roundup(len + note64->n_descsz, 4);
> --
> 2.34.1
>
> Guanyou.
> Thanks
>
[PATCH] vmcoreinfo: read vmcoreinfo using 'vmcoreinfo_data' when unavailable in elf note
by Aditya Gupta
Some vmcores, such as those created using virsh-dump, don't have a
vmcoreinfo ELF note.
On architectures such as PowerPC64, vmcoreinfo is mandatory for fetching
first_vmalloc_address for vmcores of upstream Linux, since crash-utility commit:
commit 5b24e363a898 ("get vmalloc start address from vmcoreinfo")
If the vmcoreinfo that crash tries to read from a diskdump/netdump vmcore is
empty or missing, try reading it from the 'vmcoreinfo_data' symbol instead.
Reading 'vmcoreinfo_data' was already done for a live kernel, and the same
approach can be reused when the vmcoreinfo note is missing, since the
'vmcoreinfo_data' symbol is available in the vmcore too.
Hence rename 'vmcoreinfo_read_string' in kernel.c to
'vmcoreinfo_read_from_memory', and use it in netdump.c and diskdump.c
too.
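The fallback can be modelled as choosing a reader function pointer from the note size, in the style of `pc->read_vmcoreinfo`. This is a hypothetical sketch, not crash's code. Both readers parse `KEY=value` lines from a vmcoreinfo text buffer. In crash, the memory-based reader first has to locate the buffer through the `vmcoreinfo_data` symbol; here that step is replaced by a static string. The names `lookup`, `note_reader`, `memory_reader`, `pick_reader` and both buffers are assumptions.

```c
#include <stdlib.h>
#include <string.h>

typedef char *(*vmcoreinfo_reader)(const char *key);

/* Hypothetical stand-ins for the two vmcoreinfo sources. */
static const char *note_buf = "";	/* empty note, as with virsh-dump */
static const char *memory_buf = "OSRELEASE=6.12.0\nPAGESIZE=65536\n";

/*
 * Return a malloc'd copy of the value for key in a buffer of
 * "KEY=value\n" lines, or NULL if key is absent. Caller frees it.
 */
static char *lookup(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *p = buf;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		/* Require "key=" so that "PAGE" does not match "PAGESIZE=". */
		if (len > klen && !strncmp(p, key, klen) && p[klen] == '=') {
			size_t vlen = len - klen - 1;
			char *val = malloc(vlen + 1);

			if (val) {
				memcpy(val, p + klen + 1, vlen);
				val[vlen] = '\0';
			}
			return val;
		}
		p += len;
		if (*p == '\n')
			p++;
	}
	return NULL;
}

static char *note_reader(const char *key) { return lookup(note_buf, key); }
static char *memory_reader(const char *key) { return lookup(memory_buf, key); }

/* Prefer the ELF note; fall back to memory when the note is empty. */
static vmcoreinfo_reader pick_reader(size_t size_vmcoreinfo)
{
	return size_vmcoreinfo == 0 ? memory_reader : note_reader;
}
```

Callers keep using one function pointer regardless of the source. That is why the patch only has to reassign `pc->read_vmcoreinfo` when `is_vmcoreinfo_empty()` returns true.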
Reported-by: Anushree Mathur <anushree.mathur(a)linux.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur(a)linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag(a)linux.ibm.com>
---
defs.h | 1 +
diskdump.c | 18 ++++++++++++++++++
kernel.c | 9 ++++-----
netdump.c | 19 +++++++++++++++++++
4 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/defs.h b/defs.h
index 2231cb68b804..910264e12314 100644
--- a/defs.h
+++ b/defs.h
@@ -6166,6 +6166,7 @@ void dump_kernel_table(int);
void dump_bt_info(struct bt_info *, char *where);
void dump_log(int);
void parse_kernel_version(char *);
+char *vmcoreinfo_read_from_memory(const char *);
#define LOG_LEVEL(v) ((v) & 0x07)
#define SHOW_LOG_LEVEL (0x1)
diff --git a/diskdump.c b/diskdump.c
index ce3cbb7b12dd..30d0c87f84c1 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -1041,6 +1041,13 @@ pfn_to_pos(ulong pfn)
return desc_pos;
}
+/**
+ * Check if vmcoreinfo in vmcore is missing/empty
+ */
+static bool is_vmcoreinfo_empty(void)
+{
+ return (dd->sub_header_kdump->size_vmcoreinfo == 0);
+}
/*
* Determine whether a file is a diskdump creation, and if TRUE,
@@ -1088,6 +1095,17 @@ is_diskdump(char *file)
pc->read_vmcoreinfo = vmcoreinfo_read_string;
+ /*
+ * vmcoreinfo can be empty in case of dump collected via virsh-dump
+ *
+ * check if vmcoreinfo is not available in vmcore, and try to read
+ * the vmcoreinfo from memory, using the "vmcoreinfo_data" symbol
+ */
+ if (is_vmcoreinfo_empty()) {
+ error(WARNING, "vmcoreinfo is empty, will read from symbols\n");
+ pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
+ }
+
if ((pc->flags2 & GET_LOG) && KDUMP_CMPRS_VALID()) {
pc->dfd = dd->dfd;
pc->readmem = read_diskdump;
diff --git a/kernel.c b/kernel.c
index adb19ad8725d..7d26a5c5a0a1 100644
--- a/kernel.c
+++ b/kernel.c
@@ -99,7 +99,6 @@ static ulong dump_audit_skb_queue(ulong);
static ulong __dump_audit(char *);
static void dump_audit(void);
static void dump_printk_safe_seq_buf(int);
-static char *vmcoreinfo_read_string(const char *);
static void check_vmcoreinfo(void);
static int is_pvops_xen(void);
static int get_linux_banner_from_vmlinux(char *, size_t);
@@ -11852,8 +11851,8 @@ dump_printk_safe_seq_buf(int msg_flags)
* Returns a string (that has to be freed by the caller) that contains the
* value for key or NULL if the key has not been found.
*/
-static char *
-vmcoreinfo_read_string(const char *key)
+char *
+vmcoreinfo_read_from_memory(const char *key)
{
char *buf, *value_string, *p1, *p2;
size_t value_length;
@@ -11918,10 +11917,10 @@ check_vmcoreinfo(void)
switch (get_symbol_type("vmcoreinfo_data", NULL, NULL))
{
case TYPE_CODE_PTR:
- pc->read_vmcoreinfo = vmcoreinfo_read_string;
+ pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
break;
case TYPE_CODE_ARRAY:
- pc->read_vmcoreinfo = vmcoreinfo_read_string;
+ pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
break;
}
}
diff --git a/netdump.c b/netdump.c
index b4e2a5cb2037..c69c7a1e80db 100644
--- a/netdump.c
+++ b/netdump.c
@@ -111,6 +111,14 @@ map_cpus_to_prstatus(void)
FREEBUF(nt_ptr);
}
+/**
+ * Check if vmcoreinfo in vmcore is missing/empty
+ */
+static bool is_vmcoreinfo_empty(void)
+{
+ return (nd->size_vmcoreinfo == 0);
+}
+
/*
* Determine whether a file is a netdump/diskdump/kdump creation,
* and if TRUE, initialize the vmcore_data structure.
@@ -464,6 +472,17 @@ is_netdump(char *file, ulong source_query)
pc->read_vmcoreinfo = vmcoreinfo_read_string;
+ /*
+ * vmcoreinfo can be empty in case of dump collected via virsh-dump
+ *
+ * check if vmcoreinfo is not available in vmcore, and try to read
+ * the vmcoreinfo from memory, using the "vmcoreinfo_data" symbol
+ */
+ if (is_vmcoreinfo_empty()) {
+ error(WARNING, "vmcoreinfo is empty, will read from symbols\n");
+ pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
+ }
+
if ((source_query == KDUMP_LOCAL) &&
(pc->flags2 & GET_OSRELEASE))
kdump_get_osrelease();
--
2.46.2
[PATCH] remove offline status check for CPU register map
by Guanyou Chen
Hi Lianbo, Tao
Remove the offline status check, so that we can query the registers of
each CPU at any time and obtain its stack.
CPU 0: [OFFLINE]
X0: 0000000000000000 X1: 0000000000000000 X2: 0000000000000000
X3: 000000000003fcbc X4: 0000000000000001 X5: 0000000000000000
X6: 0000000000000000 X7: 0000000000000000 X8: 00000000ffffffff
X9: ffffffc009e6ae48 X10: ffffffc009e6ae20 X11: 0000000000000000
X12: 0000000000000002 X13: 0000000000000004 X14: 0000000000000000
X15: 0000000000004000 X16: 00000000f90f05f6 X17: 00000000f90f05f6
X18: 0000000000000000 X19: 0000000000000002 X20: ffffffc009e3b008
X21: ffffffc00a01d020 X22: ffffffc009f798f0 X23: 0000000060001000
X24: 0000000000000000 X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: ffffff8111eecb00 X29: ffffffc008003f50
LR: ffffffc00802df88 SP: ffffffc008003f40 PC: ffffffc00802df94
PSTATE: 024003c5 FPVALID: 00000000
crash> bt -c 0
PID: 1842 TASK: ffffff8111eecb00 CPU: 0 COMMAND: "android.bg"
00 [ffffffc008003f50] ipi_handler at ffffffc00802df90
01 [ffffffc008003f90] handle_percpu_devid_irq at ffffffc008146f50
02 [ffffffc008003fd0] generic_handle_domain_irq at ffffffc00813f484
03 [ffffffc008003fe0] gic_handle_irq at ffffffc008010140
--- <IRQ stack> ---
04 [ffffffc019c3be20] call_on_irq_stack at ffffffc008016ed4
05 [ffffffc019c3be40] do_interrupt_handler at ffffffc008019cb4
06 [ffffffc019c3be60] el0_interrupt at ffffffc008f7b848
07 [ffffffc019c3be90] __el0_irq_handler_common at ffffffc008f7b368
08 [ffffffc019c3bea0] el0t_64_irq_handler at ffffffc008f7b344
09 [ffffffc019c3bfe0] el0t_64_irq at ffffffc008011720
PC: 0000000072415108 LR: 00000000724150d0 SP: 0000007691d2bfa0
X29: 00000000734f60e0 X28: 000000001a2fa678 X27: 0000000000000063
X26: 000000001a2fa678 X25: 000000001a2fa678 X24: 000000001a7bb718
X23: 000000001a7ba198 X22: 000000001a7ba190 X21: b4000076f9a828c8
X20: 0000000000000000 X19: b4000076f9a82800 X18: 000000768d68a000
X17: 00000000708f89f8 X16: 00000000000000f0 X15: 0000000000000000
X14: 0000007691d2bca0 X13: 0000000080100000 X12: 0000000000000000
X11: 0000000000000000 X10: 0000000000000000 X9: 9636716211228cd4
X8: 9636716211228cd4 X7: 0000000000000010 X6: 000000001a7bb728
X5: 0000000070845200 X4: 0000000018a40d38 X3: 00000000707e8f98
X2: 000000001a2fa678 X1: 000000001a7ba198 X0: 0000000070847aa8
ORIG_X0: 00000000ffffff9c SYSCALLNO: ffffffff PSTATE: 60001000
Signed-off-by: Guanyou.Chen <chenguanyou(a)xiaomi.com>
---
netdump.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/netdump.c b/netdump.c
index 435793b..455f90e 100644
--- a/netdump.c
+++ b/netdump.c
@@ -101,7 +101,7 @@ map_cpus_to_prstatus(void)
nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
for (i = 0; i < nrcpus; i++) {
- if (in_cpu_map(ONLINE_MAP, i) && machdep->is_cpu_prstatus_valid(i)) {
+ if (machdep->is_cpu_prstatus_valid(i)) {
nd->nt_prstatus_percpu[i] = nt_ptr[i];
nd->num_prstatus_notes =
MAX(nd->num_prstatus_notes, i+1);
@@ -2998,15 +2998,10 @@ dump_registers_for_elf_dumpfiles(void)
return;
}
- for (c = 0; c < kt->cpus; c++) {
- if (check_offline_cpu(c)) {
- fprintf(fp, "%sCPU %d: [OFFLINE]\n", c ? "\n" : "", c);
- continue;
- }
-
- fprintf(fp, "%sCPU %d:\n", c ? "\n" : "", c);
- display_regs_from_elf_notes(c, fp);
- }
+ for (c = 0; c < kt->cpus; c++) {
+ fprintf(fp, "%sCPU %d: %s\n", c ? "\n" : "", c, check_offline_cpu(c) ? "[OFFLINE]" : "[ONLINE]");
+ display_regs_from_elf_notes(c, fp);
+ }
}
struct x86_64_user_regs_struct {
--
2.34.1
Guanyou.
Thanks.
Re: [PATCH] mod: introduce -v option to display modules with valid version
by lijiang
Hi, Sun Feng
Thank you for the patch.
On Mon, Oct 28, 2024 at 11:32 AM <devel-request(a)lists.crash-utility.osci.io>
wrote:
> Date: Wed, 23 Oct 2024 08:53:58 +0800
> From: Sun Feng <loyou85(a)gmail.com>
> Subject: [Crash-utility] [PATCH] mod: introduce -v option to display
> modules with valid version
> To: devel(a)lists.crash-utility.osci.io
> Cc: Sun Feng <loyou85(a)gmail.com>
> Message-ID: <20241023005358.11328-1-loyou85(a)gmail.com>
>
> With this option, we can easily get module versions from a kdump vmcore,
> which is helpful when developing external modules.
>
This seems to be a rather specific use case?
>
> crash> mod -v
> NAME VERSION
> ahci 3.0
> vxlan 0.1.2.1
> dca 1.12.1
> ...
>
> Signed-off-by: Sun Feng <loyou85(a)gmail.com>
> ---
> defs.h | 3 +++
> help.c | 12 +++++++++++-
> kernel.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
> symbols.c | 44 +++++++++++++++++++++++++++++++++++++++-----
> 4 files changed, 98 insertions(+), 7 deletions(-)
>
> diff --git a/defs.h b/defs.h
> index e2a9278..f14fcdf 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -2244,6 +2244,7 @@ struct offset_table { /* stash of commonly-used offsets */
> long rb_list_head;
> long file_f_inode;
> long page_page_type;
> + long module_version;
> };
>
> struct size_table { /* stash of commonly-used sizes */
> @@ -2935,6 +2936,7 @@ struct symbol_table_data {
>
> #define MAX_MOD_NAMELIST (256)
> #define MAX_MOD_NAME (64)
> +#define MAX_MOD_VERSION (64)
> #define MAX_MOD_SEC_NAME (64)
>
> #define MOD_EXT_SYMS (0x1)
> @@ -2984,6 +2986,7 @@ struct load_module {
> long mod_size;
> char mod_namelist[MAX_MOD_NAMELIST];
> char mod_name[MAX_MOD_NAME];
> + char mod_version[MAX_MOD_VERSION];
> ulong mod_flags;
> struct syment *mod_symtable;
> struct syment *mod_symend;
> diff --git a/help.c b/help.c
> index e95ac1d..1bac5e1 100644
> --- a/help.c
> +++ b/help.c
> @@ -5719,7 +5719,7 @@ NULL
> char *help_mod[] = {
> "mod",
> "module information and loading of symbols and debugging data",
> -"-s module [objfile] | -d module | -S [directory] [-D|-t|-r|-R|-o|-g]",
> +"-s module [objfile] | -d module | -S [directory] [-D|-t|-r|-R|-o|-g|-v]",
> " With no arguments, this command displays basic information of the currently",
> " installed modules, consisting of the module address, name, base address,",
> " size, the object file name (if known), and whether the module was compiled",
> @@ -5791,6 +5791,7 @@ char *help_mod[] = {
> " -g When used with -s or -S, add a module object's section",
> " start and end addresses to its symbol list.",
> " -o Load module symbols with old mechanism.",
> +" -v Display modules with valid version.",
> " ",
> " If the %s session was invoked with the \"--mod <directory>\" option, or",
> " a CRASH_MODULE_PATH environment variable exists, then /lib/modules/<release>",
> @@ -5881,6 +5882,15 @@ char *help_mod[] = {
> " vxglm P(U)",
> " vxgms P(U)",
> " vxodm P(U)",
> +" ",
> +" Display modules with valid version:",
> +" ",
> +" %s> mod -v",
> +" NAME VERSION",
> +" ahci 3.0",
> +" vxlan 0.1.2.1",
> +" dca 1.12.1",
> +" ...",
> NULL
> };
>
Many kernel modules do not have an actual value for the "version" field
(it is NULL), e.g.:
crash> struct module c008000005cb1d00
struct module {
...
version = 0x0,
srcversion = 0xc00000009c3628c0 "7D7FAEDDA764AC772D6F805",
...
Currently, it is also easy to view the version string, for example:
crash> mod
MODULE NAME TEXT_BASE SIZE OBJECT FILE
c008000004400080 libcrc32c c008000004260000 196608 (not loaded) [CONFIG_KALLSYMS]
...
c0080000044a0700 sg c008000004480000 262144 (not loaded) [CONFIG_KALLSYMS]
...
crash> struct module c0080000044a0700|grep -w version
version = 0xc000000009d67f20 "3.5.36",
Could you please explain the background? Why is it needed? As you can
see, it's not hard to get a module's version string with crash's existing
commands.
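The listing that the `-v` patch adds can be modelled on its own. The sketch sizes each column to its widest entry, then prints only modules whose version string is non-empty, so modules whose `version` pointer is NULL (as in the `struct module` output above) are skipped. This is a hypothetical sketch, not the patch's code. `struct mod_info` and `list_versions()` are assumptions standing in for `st->load_modules`, output goes to a caller-supplied buffer instead of `fp`, and `mkstring()` is replaced by `printf`'s `%-*s`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the name/version fields of struct load_module. */
struct mod_info {
	const char *name;
	const char *version;	/* "" when struct module.version is NULL */
};

/*
 * Write a NAME/VERSION table into out (size outsz), listing only modules
 * with a non-empty version. As in the patch, column widths are computed
 * over all modules. Returns the number of modules listed, or -1 on an
 * output error or if out is too small.
 */
static int list_versions(const struct mod_info *mods, int n,
			 char *out, size_t outsz)
{
	int i, w, maxname = 0, maxver = 0, listed = 0;
	size_t used;

	for (i = 0; i < n; i++) {
		if ((int)strlen(mods[i].name) > maxname)
			maxname = (int)strlen(mods[i].name);
		if ((int)strlen(mods[i].version) > maxver)
			maxver = (int)strlen(mods[i].version);
	}

	w = snprintf(out, outsz, "%-*s %-*s\n", maxname, "NAME",
		     maxver, "VERSION");
	if (w < 0 || (size_t)w >= outsz)
		return -1;
	used = (size_t)w;

	for (i = 0; i < n; i++) {
		if (mods[i].version[0] == '\0')
			continue;	/* skip modules without a version */
		w = snprintf(out + used, outsz - used, "%-*s %-*s\n",
			     maxname, mods[i].name, maxver, mods[i].version);
		if (w < 0 || (size_t)w >= outsz - used)
			return -1;
		used += (size_t)w;
		listed++;
	}
	return listed;
}
```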
Thanks
Lianbo
>
> diff --git a/kernel.c b/kernel.c
> index adb19ad..91eef2a 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -3593,6 +3593,9 @@ module_init(void)
> MEMBER_OFFSET_INIT(module_num_gpl_syms, "module",
> "num_gpl_syms");
>
> + if (MEMBER_EXISTS("module", "version"))
> + MEMBER_OFFSET_INIT(module_version, "module", "version");
> +
> if (MEMBER_EXISTS("module", "mem")) { /* 6.4 and later */
> kt->flags2 |= KMOD_MEMORY; /* MODULE_MEMORY() can be used. */
>
> @@ -4043,6 +4046,7 @@ irregularity:
> #define REMOTE_MODULE_SAVE_MSG (6)
> #define REINIT_MODULES (7)
> #define LIST_ALL_MODULE_TAINT (8)
> +#define LIST_ALL_MODULE_VERSION (9)
>
> void
> cmd_mod(void)
> @@ -4117,7 +4121,7 @@ cmd_mod(void)
> address = 0;
> flag = LIST_MODULE_HDR;
>
> - while ((c = getopt(argcnt, args, "Rd:Ds:Sot")) != EOF) {
> + while ((c = getopt(argcnt, args, "Rd:Ds:Sotv")) != EOF) {
> switch(c)
> {
> case 'R':
> @@ -4195,6 +4199,13 @@ cmd_mod(void)
> flag = LIST_ALL_MODULE_TAINT;
> break;
>
> + case 'v':
> + if (flag)
> + cmd_usage(pc->curcmd, SYNOPSIS);
> + else
> + flag = LIST_ALL_MODULE_VERSION;
> + break;
> +
> default:
> argerrs++;
> break;
> @@ -4578,10 +4589,12 @@ do_module_cmd(ulong flag, char *modref, ulong address,
> struct load_module *lm, *lmp;
> int maxnamelen;
> int maxsizelen;
> + int maxversionlen;
> char buf1[BUFSIZE];
> char buf2[BUFSIZE];
> char buf3[BUFSIZE];
> char buf4[BUFSIZE];
> + char buf5[BUFSIZE];
>
> if (NO_MODULES())
> return;
> @@ -4744,6 +4757,37 @@ do_module_cmd(ulong flag, char *modref, ulong address,
> case LIST_ALL_MODULE_TAINT:
> show_module_taint();
> break;
> +
> + case LIST_ALL_MODULE_VERSION:
> + maxnamelen = maxversionlen = 0;
> +
> + for (i = 0; i < kt->mods_installed; i++) {
> + lm = &st->load_modules[i];
> + maxnamelen = strlen(lm->mod_name) > maxnamelen ?
> + strlen(lm->mod_name) : maxnamelen;
> +
> + maxversionlen = strlen(lm->mod_version) > maxversionlen ?
> + strlen(lm->mod_version) : maxversionlen;
> + }
> +
> + fprintf(fp, "%s %s\n",
> + mkstring(buf2, maxnamelen, LJUST, "NAME"),
> + mkstring(buf5, maxversionlen, LJUST, "VERSION"));
> +
> + for (i = 0; i < kt->mods_installed; i++) {
> + lm = &st->load_modules[i];
> + if ((!address || (lm->module_struct == address) ||
> + (lm->mod_base == address)) &&
> + strlen(lm->mod_version)) {
> + fprintf(fp, "%s ", mkstring(buf2, maxnamelen,
> + LJUST, lm->mod_name));
> + fprintf(fp, "%s ", mkstring(buf5, maxversionlen,
> + LJUST, lm->mod_version));
> +
> + fprintf(fp, "\n");
> + }
> + }
> + break;
> }
> }
>
> diff --git a/symbols.c b/symbols.c
> index d00fbd7..9d90df7 100644
> --- a/symbols.c
> +++ b/symbols.c
> @@ -1918,6 +1918,7 @@ store_module_symbols_6_4(ulong total, int mods_installed)
> {
> int i, m, t;
> ulong mod, mod_next;
> + ulong version;
> char *mod_name;
> uint nsyms, ngplsyms;
> ulong syms, gpl_syms;
> @@ -1930,6 +1931,7 @@ store_module_symbols_6_4(ulong total, int mods_installed)
> struct load_module *lm;
> char buf1[BUFSIZE];
> char buf2[BUFSIZE];
> + char mod_version[BUFSIZE];
> char *strbuf = NULL, *modbuf, *modsymbuf;
> struct syment *sp;
> ulong first, last;
> @@ -1980,6 +1982,13 @@ store_module_symbols_6_4(ulong total, int mods_installed)
>
> mod_name = modbuf + OFFSET(module_name);
>
> + BZERO(mod_version, BUFSIZE);
> + if (MEMBER_EXISTS("module", "version")) {
> + version = ULONG(modbuf + OFFSET(module_version));
> + if (version)
> + read_string(version, mod_version, BUFSIZE - 1);
> + }
> +
> lm = &st->load_modules[m++];
> BZERO(lm, sizeof(struct load_module));
>
> @@ -2003,9 +2012,15 @@ store_module_symbols_6_4(ulong total, int mods_installed)
> error(INFO, "module name greater than MAX_MOD_NAME: %s\n", mod_name);
> strncpy(lm->mod_name, mod_name, MAX_MOD_NAME-1);
> }
> + if (strlen(mod_version) < MAX_MOD_VERSION)
> + strcpy(lm->mod_version, mod_version);
> + else {
> + error(INFO, "module version greater than MAX_MOD_VERSION: %s\n", mod_version);
> + strncpy(lm->mod_version, mod_version, MAX_MOD_VERSION-1);
> + }
> if (CRASHDEBUG(3))
> - fprintf(fp, "%lx (%lx): %s syms: %d gplsyms: %d ksyms: %ld\n",
> - mod, lm->mod_base, lm->mod_name, nsyms, ngplsyms, nksyms);
> + fprintf(fp, "%lx (%lx): %s syms: %d gplsyms: %d ksyms: %ld version: %s\n",
> + mod, lm->mod_base, lm->mod_name, nsyms, ngplsyms, nksyms, lm->mod_version);
>
> lm->mod_flags = MOD_EXT_SYMS;
> lm->mod_ext_symcnt = mcnt;
> @@ -2271,6 +2286,7 @@ store_module_symbols_v2(ulong total, int mods_installed)
> {
> int i, m;
> ulong mod, mod_next;
> + ulong version;
> char *mod_name;
> uint nsyms, ngplsyms;
> ulong syms, gpl_syms;
> @@ -2285,6 +2301,7 @@ store_module_symbols_v2(ulong total, int mods_installed)
> char buf2[BUFSIZE];
> char buf3[BUFSIZE];
> char buf4[BUFSIZE];
> + char mod_version[BUFSIZE];
> char *strbuf, *modbuf, *modsymbuf;
> struct syment *sp;
> ulong first, last;
> @@ -2344,6 +2361,13 @@ store_module_symbols_v2(ulong total, int mods_installed)
>
> mod_name = modbuf + OFFSET(module_name);
>
> + BZERO(mod_version, BUFSIZE);
> + if (MEMBER_EXISTS("module", "version")) {
> + version = ULONG(modbuf + OFFSET(module_version));
> + if (version)
> + read_string(version, mod_version, BUFSIZE - 1);
> + }
> +
> lm = &st->load_modules[m++];
> BZERO(lm, sizeof(struct load_module));
> lm->mod_base = ULONG(modbuf + MODULE_OFFSET2(module_module_core, rx));
> @@ -2357,11 +2381,19 @@ store_module_symbols_v2(ulong total, int mods_installed)
> mod_name);
> strncpy(lm->mod_name, mod_name, MAX_MOD_NAME-1);
> }
> + if (strlen(mod_version) < MAX_MOD_VERSION)
> + strcpy(lm->mod_version, mod_version);
> + else {
> + error(INFO,
> + "module version greater than MAX_MOD_VERSION: %s\n",
> + mod_version);
> + strncpy(lm->mod_version, mod_version, MAX_MOD_VERSION-1);
> + }
> if (CRASHDEBUG(3))
> fprintf(fp,
> - "%lx (%lx): %s syms: %d gplsyms: %d ksyms: %ld\n",
> - mod, lm->mod_base, lm->mod_name, nsyms,
> - ngplsyms, nksyms);
> + "%lx (%lx): %s syms: %d gplsyms: %d ksyms: %ld version: %s\n",
> + mod, lm->mod_base, lm->mod_name, nsyms,
> + ngplsyms, nksyms, lm->mod_version);
> lm->mod_flags = MOD_EXT_SYMS;
> lm->mod_ext_symcnt = mcnt;
> lm->mod_init_module_ptr = ULONG(modbuf +
> @@ -10177,6 +10209,8 @@ dump_offset_table(char *spec, ulong makestruct)
> OFFSET(module_next));
> fprintf(fp, " module_name: %ld\n",
> OFFSET(module_name));
> + fprintf(fp, " module_version: %ld\n",
> + OFFSET(module_version));
> fprintf(fp, " module_syms: %ld\n",
> OFFSET(module_syms));
> fprintf(fp, " module_nsyms: %ld\n",
> --
> 2.43.0
>
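The version-copy logic in the hunks above can be sketched as a standalone C function. This is an illustration only: `store_mod_version()` is a hypothetical name, and `MAX_MOD_VERSION` is set to an arbitrary 16 here because the real constant is not shown in this excerpt. Unlike the quoted `strncpy()` call, which relies on `lm` having been cleared by `BZERO()` beforehand, the sketch writes the terminating NUL itself, so it does not depend on the destination being pre-zeroed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed value for illustration only; not crash's actual constant. */
#define MAX_MOD_VERSION 16

/*
 * Copy a module version string into a field of MAX_MOD_VERSION bytes.
 * Returns 0 if it fit, 1 if it had to be truncated (with a warning).
 */
static int store_mod_version(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);

	if (strlen(src) < MAX_MOD_VERSION) {
		strcpy(dst, src);
		return 0;
	}
	fprintf(stderr, "module version greater than MAX_MOD_VERSION: %s\n", src);
	strncpy(dst, src, MAX_MOD_VERSION - 1);
	dst[MAX_MOD_VERSION - 1] = '\0';	/* always terminate explicitly */
	return 1;
}
```

A short version is copied unchanged; a 19-character one is cut to its first 15 characters plus the terminator.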
1 week, 3 days
[PATCH] bugfix command "help -r" segv fault
by Guanyou Chen
Hi Lianbo, Tao
When the ELF notes do not contain the registers of a CPU, attempting to
display that online CPU's registers dereferences a NULL note pointer and
crashes.
After the fix:
CPU 6:
help: registers not collected for cpu 6
...
Signed-off-by: Guanyou.Chen <chenguanyou(a)xiaomi.com>
---
netdump.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/netdump.c b/netdump.c
index 8ea5159..435793b 100644
--- a/netdump.c
+++ b/netdump.c
@@ -2780,6 +2780,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
nd->nt_prstatus_percpu[cpu];
else
note64 = (Elf64_Nhdr *)nd->nt_prstatus;
+ if (!note64) {
+ error(INFO, "registers not collected for cpu %d\n", cpu);
+ return;
+ }
len = sizeof(Elf64_Nhdr);
len = roundup(len + note64->n_namesz, 4);
len = roundup(len + note64->n_descsz, 4);
@@ -2820,6 +2824,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
nd->nt_prstatus_percpu[cpu];
else
note32 = (Elf32_Nhdr *)nd->nt_prstatus;
+ if (!note32) {
+ error(INFO, "registers not collected for cpu %d\n", cpu);
+ return;
+ }
len = sizeof(Elf32_Nhdr);
len = roundup(len + note32->n_namesz, 4);
len = roundup(len + note32->n_descsz, 4);
@@ -2857,6 +2865,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
else
note64 = (Elf64_Nhdr *)nd->nt_prstatus;
+ if (!note64) {
+ error(INFO, "registers not collected for cpu %d\n", cpu);
+ return;
+ }
prs = (struct ppc64_elf_prstatus *)
((char *)note64 + sizeof(Elf64_Nhdr) + note64->n_namesz);
prs = (struct ppc64_elf_prstatus *)roundup((ulong)prs, 4);
@@ -2903,6 +2915,10 @@ display_regs_from_elf_notes(int cpu, FILE *ofp)
nd->nt_prstatus_percpu[cpu];
else
note64 = (Elf64_Nhdr *)nd->nt_prstatus;
+ if (!note64) {
+ error(INFO, "registers not collected for cpu %d\n", cpu);
+ return;
+ }
len = sizeof(Elf64_Nhdr);
len = roundup(len + note64->n_namesz, 4);
len = roundup(len + note64->n_descsz, 4);
--
2.34.1
Guanyou.
Thanks
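Every hunk in this fix follows one pattern: check the per-CPU note pointer before reading `n_namesz`/`n_descsz` from it. The sketch below is a standalone illustration, not crash code; `note64_size()` and `show_cpu_note()` are hypothetical names, and it assumes glibc's `<elf.h>` for `Elf64_Nhdr`. The size computation mirrors the quoted `roundup(..., 4)` steps: the header, then the name padded to 4 bytes, then the descriptor padded to 4 bytes.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* ELF note names and descriptors are padded to 4-byte boundaries. */
static size_t note_pad4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Size of one 64-bit ELF note (header + padded name + padded descriptor).
 * Returns 0 when no note was collected instead of dereferencing NULL.
 */
static size_t note64_size(const Elf64_Nhdr *note)
{
	size_t len;

	if (!note)
		return 0;
	len = sizeof(Elf64_Nhdr);
	len = note_pad4(len + note->n_namesz);
	len = note_pad4(len + note->n_descsz);
	return len;
}

/*
 * Hypothetical caller: percpu[cpu] may be NULL when the dump has no
 * PRSTATUS note for that CPU, which is the case the patch guards.
 */
static int show_cpu_note(const Elf64_Nhdr *const *percpu, int cpu)
{
	size_t len;

	assert(percpu != NULL);
	len = note64_size(percpu[cpu]);
	if (!len) {
		fprintf(stderr, "registers not collected for cpu %d\n", cpu);
		return -1;
	}
	printf("cpu %d: PRSTATUS note spans %zu bytes\n", cpu, len);
	return 0;
}
```

For example, a note with a 5-byte name and a 336-byte descriptor occupies 12 + 8 + 336 = 356 bytes, while a missing note yields the "not collected" message instead of a segfault.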
1 week, 3 days
[PATCH] gdb bt: multiple stacks support (x86_64)
by Alexey Makhalov
The gdb target analyzes only one task at a time, and it backtraces only a
single contiguous C stack until the end of that stack. If stacks were chained
during exceptions or interrupts, 'gdb bt' shows only the topmost one.
Introduce multiple stacks support in the gdb target; the additional stacks
appear as different threads from gdb's perspective.
'gdb info threads' - list the in-kernel stacks of the given task.
'gdb thread <Id>' - switch to one of them.
'gdb bt' - show its backtrace.
The implementation is machine specific. On x86_64, I use cmd_bt() to add the
additional gdb threads (via a gdb_add_substack(stack_id) call). Once they are
added, gdb may call machdep->get_current_task_reg() with the corresponding
stack_id (sid: a new argument).
Note: the crash 'bt' command must be run before the additional threads appear.
Only x86_64 is supported; arm64 and ppc64 get no threads/stacks support.
Example of a #GP fault in the kernel, hit by the SCTP task:
crash> bt
PID: 94228 TASK: ffff96a6766a8000 CPU: 31 COMMAND: "SCTP"
#0 [ffffbb67437e7220] panic at ffffffff99b4f60b
#1 [ffffbb67437e72c0] die_addr at ffffffff99033650
#2 [ffffbb67437e72f0] exc_general_protection at ffffffff99b9194b
#3 [ffffbb67437e7390] asm_exc_general_protection at ffffffff99c00b47
[exception RIP: crypto_aead_encrypt+9]
RIP: ffffffff995ce269 RSP: ffffbb67437e7440 RFLAGS: 00010246
RAX: 0fdd59d2b3d89ecb RBX: 0000000000000000 RCX: 0000000000000c90
RDX: ffff96a368508110 RSI: 0000000000000000 RDI: ffff96a348352060
RBP: ffffbb67437e7650 R8: 0000000000000001 R9: ffff96a3685080c8
R10: ffff96a348351c78 R11: 00000000d5a09e53 R12: 0000000000000008
R13: ffff96a348352010 R14: ffff96a348352000 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffffbb67437e7440] echainiv_encrypt at ffffffffc0ae82c2 [echainiv]
#5 [ffffbb67437e7658] crypto_aead_encrypt at ffffffff995ce27c
#6 [ffffbb67437e7668] esp_output_tail at ffffffffc0add3fc [esp4]
#7 [ffffbb67437e76f8] esp_output at ffffffffc0addedf [esp4]
#8 [ffffbb67437e7760] xfrm_output_resume at ffffffff99a9186a
#9 [ffffbb67437e77e0] xfrm_output at ffffffff99a91fba
#10 [ffffbb67437e7810] __xfrm4_output at ffffffff99a7b0e6
#11 [ffffbb67437e7820] xfrm4_output at ffffffff99a7b172
#12 [ffffbb67437e7890] ip_local_out at ffffffff99a000ef
#13 [ffffbb67437e78b8] __ip_queue_xmit at ffffffff99a0028e
#14 [ffffbb67437e7918] sctp_v4_xmit at ffffffffc0afe0f8 [sctp]
#15 [ffffbb67437e79f0] sctp_packet_singleton at ffffffffc0b0bc47 [sctp]
#16 [ffffbb67437e7a60] sctp_outq_flush at ffffffffc0b0c636 [sctp]
#17 [ffffbb67437e7b08] sctp_outq_uncork at ffffffffc0b0d85c [sctp]
#18 [ffffbb67437e7b18] sctp_do_sm at ffffffffc0afbaa6 [sctp]
#19 [ffffbb67437e7d08] __sctp_connect at ffffffffc0b17893 [sctp]
#20 [ffffbb67437e7d78] __sctp_setsockopt_connectx at ffffffffc0b17a6d [sctp]
#21 [ffffbb67437e7da8] sctp_getsockopt at ffffffffc0b1c892 [sctp]
#22 [ffffbb67437e7eb8] sock_common_getsockopt at ffffffff9993c6e7
#23 [ffffbb67437e7ec8] __sys_getsockopt at ffffffff9993afac
#24 [ffffbb67437e7f18] __x64_sys_getsockopt at ffffffff9993b0bf
#25 [ffffbb67437e7f28] x64_sys_call at ffffffff99004ca5
#26 [ffffbb67437e7f38] do_syscall_64 at ffffffff99b90e34
#27 [ffffbb67437e7f50] entry_SYSCALL_64_after_hwframe at ffffffff99c00126
RIP: 00007f12c63028ea RSP: 00007f10e41d9b28 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 0000000000000050 RCX: 00007f12c63028ea
RDX: 000000000000006f RSI: 0000000000000084 RDI: 0000000000000050
RBP: 00007f10a00009b0 R8: 00007f10e41d9b3c R9: 00007f10ac000a5c
R10: 00007f10e41d9b40 R11: 0000000000000206 R12: 00007f10e41db120
R13: 0000000000000050 R14: 0000000000000010 R15: 000000000289e070
ORIG_RAX: 0000000000000037 CS: 0033 SS: 002b
crash> gdb bt
#0 0xffffffff998eaadf in __inb (port=100) at ./arch/x86/include/asm/shared/io.h:22
#1 i8042_read_status () at drivers/input/serio/i8042-acpipnpio.h:54
#2 i8042_panic_blink (state=<optimized out>) at drivers/input/serio/i8042.c:1137
#3 0xffffffff99b4f60b in panic (fmt=fmt@entry=0xffffffff9a42c4cb "Fatal exception") at kernel/panic.c:460
#4 0xffffffff99b49b84 in oops_end (flags=<optimized out>, flags@entry=582, regs=<optimized out>, regs@entry=0xffffbb67437e7398, signr=<optimized out>) at arch/x86/kernel/dumpstack.c:382
#5 0xffffffff99033650 in die_addr (str=str@entry=0xffffbb67437e7304 "general protection fault, probably for non-canonical address 0xfdd59d2b3d89edb", regs=regs@entry=0xffffbb67437e7398, err=err@entry=0, gp_addr=<optimized out>) at arch/x86/kernel/dumpstack.c:462
#6 0xffffffff99b9194b in __exc_general_protection (error_code=0, regs=0xffffbb67437e7398) at arch/x86/kernel/traps.c:784
#7 exc_general_protection (regs=0xffffbb67437e7398, error_code=0) at arch/x86/kernel/traps.c:729
#8 0xffffffff99c00b47 in asm_exc_general_protection () at ./arch/x86/include/asm/idtentry.h:564
crash> gdb info threads
Id Target Id Frame
* 1 94228 SCTP (stack 0) 0xffffffff998eaadf in __inb (port=100) at ./arch/x86/include/asm/shared/io.h:22
2 94228 SCTP (stack 1) crypto_aead_encrypt (req=req@entry=0xffff96a348352060) at crypto/aead.c:86
crash> gdb thread 2
[Switching to thread 2 (94228 SCTP (stack 1))]
#0 crypto_aead_encrypt (req=req@entry=0xffff96a348352060) at crypto/aead.c:86
86 crypto/aead.c: No such file or directory.
crash> gdb bt
#0 crypto_aead_encrypt (req=req@entry=0xffff96a348352060) at crypto/aead.c:86
#1 0xffffffffc0ae82c2 in echainiv_encrypt (req=0xffff96a348352010) at crypto/echainiv.c:82
#2 0xffffffff995ce27c in crypto_aead_encrypt (req=0xffff96a348352060) at crypto/aead.c:94
#3 0xffffffffc0add3fc in esp_output_tail ()
#4 0xffffffffc0addedf in esp_output ()
#5 0xffffffff99a9186a in xfrm_output_one (err=0, skb=0xffff96a3c852b300) at net/xfrm/xfrm_output.c:553
#6 xfrm_output_resume (sk=sk@entry=0xffff96a348368000, skb=skb@entry=0xffff96a3c852b300, err=<optimized out>, err@entry=1) at net/xfrm/xfrm_output.c:588
#7 0xffffffff99a91fba in xfrm_output2 (skb=0xffff96a3c852b300, sk=0xffff96a348368000, net=0xffff96a365582580) at net/xfrm/xfrm_output.c:615
#8 xfrm_output (sk=0xffff96a348368000, skb=0xffff96a3c852b300) at net/xfrm/xfrm_output.c:765
#9 0xffffffff99a7b0e6 in __xfrm4_output (net=<optimized out>, sk=<optimized out>, skb=<optimized out>) at net/ipv4/xfrm4_output.c:28
#10 0xffffffff99a7b172 in NF_HOOK_COND (pf=2 '\002', hook=4, okfn=0xffffffff99a7b0c0 <__xfrm4_output>, cond=<optimized out>, out=0xffff96a496ff2000, in=0x0, skb=0xffff96a3c852b300, sk=0xffff96a348368000, net=0xffff96a365582580) at ./include/linux/netfilter.h:291
#11 xfrm4_output (net=0xffff96a365582580, sk=0xffff96a348368000, skb=0xffff96a3c852b300) at net/ipv4/xfrm4_output.c:33
#12 0xffffffff99a000ef in dst_output (skb=0xffff96a368508110, sk=0x0, net=0xffff96a348352060) at ./include/net/dst.h:444
#13 ip_local_out (net=0xffff96a348352060, sk=0x0, skb=0xffff96a368508110) at net/ipv4/ip_output.c:126
#14 0xffffffff99a0028e in __ip_queue_xmit (sk=sk@entry=0xffff96a348368000, skb=skb@entry=0xffff96a3c852b300, fl=fl@entry=0xffff96a348351830, tos=tos@entry=186 '\272') at net/ipv4/ip_output.c:532
#15 0xffffffffc0afe0f8 in sctp_v4_xmit (skb=0xffff96a3c852b300, t=0xffff96a348351800) at net/sctp/protocol.c:1071
#16 0xffffffffc0b1f553 in sctp_packet_transmit (packet=packet@entry=0xffffbb67437e79f8, gfp=gfp@entry=3264) at net/sctp/output.c:653
#17 0xffffffffc0b0bc47 in sctp_packet_singleton (transport=<optimized out>, chunk=chunk@entry=0xffff96a34c96f500, gfp=3264) at net/sctp/outqueue.c:783
#18 0xffffffffc0b0c636 in sctp_outq_flush_ctrl (ctx=0xffffbb67437e7aa0) at net/sctp/outqueue.c:914
#19 sctp_outq_flush (q=0xffff96a3483585b8, rtx_timeout=rtx_timeout@entry=0, gfp=<optimized out>) at net/sctp/outqueue.c:1212
#20 0xffffffffc0b0d85c in sctp_outq_uncork (q=q@entry=0xffff96a3483585b8, gfp=gfp@entry=3264) at net/sctp/outqueue.c:764
#21 0xffffffffc0afbaa6 in sctp_cmd_interpreter (state=<optimized out>, status=<optimized out>, gfp=<optimized out>, commands=0xffffbb67437e7b68, event_arg=<optimized out>, asoc=0xffff96a348358000, ep=<optimized out>, subtype=..., event_type=<optimized out>) at net/sctp/sm_sideeffect.c:1819
#22 sctp_side_effects (gfp=<optimized out>, commands=0xffffbb67437e7b68, status=<optimized out>, event_arg=<optimized out>, asoc=<synthetic pointer>, ep=<optimized out>, state=<optimized out>, subtype=..., event_type=<optimized out>) at net/sctp/sm_sideeffect.c:1199
#23 sctp_do_sm (net=<optimized out>, event_type=event_type@entry=SCTP_EVENT_T_PRIMITIVE, subtype=..., subtype@entry=..., state=<optimized out>, ep=<optimized out>, asoc=<optimized out>, event_arg=<optimized out>, gfp=<optimized out>) at net/sctp/sm_sideeffect.c:1170
#24 0xffffffffc0b1e2f0 in sctp_primitive_ASSOCIATE (net=<optimized out>, asoc=asoc@entry=0xffff96a348358000, arg=arg@entry=0x0) at net/sctp/primitive.c:73
#25 0xffffffffc0b17893 in __sctp_connect (sk=sk@entry=0xffff96a348368000, kaddrs=kaddrs@entry=0xffff96a342085030, addrs_size=addrs_size@entry=16, flags=2050, assoc_id=assoc_id@entry=0xffffbb67437e7df4) at ./include/net/net_namespace.h:369
#26 0xffffffffc0b17a6d in __sctp_setsockopt_connectx (sk=sk@entry=0xffff96a348368000, kaddrs=kaddrs@entry=0xffff96a342085030, addrs_size=16, assoc_id=assoc_id@entry=0xffffbb67437e7df4) at net/sctp/socket.c:1334
#27 0xffffffffc0b1c892 in sctp_getsockopt_connectx3 (optlen=0x7f10e41d9b3c, optval=0x7f10e41d9b40 <error: Cannot access memory at address 0x7f10e41d9b40>, len=16, sk=0xffff96a348368000) at net/sctp/socket.c:1419
#28 sctp_getsockopt (sk=0xffff96a348368000, level=<optimized out>, optname=<optimized out>, optval=0x7f10e41d9b40 <error: Cannot access memory at address 0x7f10e41d9b40>, optlen=<optimized out>) at net/sctp/socket.c:8124
#29 0xffffffff9993c6e7 in sock_common_getsockopt (sock=<optimized out>, level=0, optname=1750106384, optval=0xc90 <error: Cannot access memory at address 0xc90>, optlen=0x1) at net/core/sock.c:3652
#30 0xffffffff9993afac in __sys_getsockopt (fd=<optimized out>, level=132, optname=111, optval=0x7f10e41d9b40 <error: Cannot access memory at address 0x7f10e41d9b40>, optlen=<optimized out>) at net/socket.c:2327
#31 0xffffffff9993b0bf in __do_sys_getsockopt (optlen=<optimized out>, optval=<optimized out>, optname=<optimized out>, level=<optimized out>, fd=<optimized out>) at net/socket.c:2342
#32 __se_sys_getsockopt (optlen=<optimized out>, optval=<optimized out>, optname=<optimized out>, level=<optimized out>, fd=<optimized out>) at net/socket.c:2339
#33 __x64_sys_getsockopt (regs=<optimized out>) at net/socket.c:2339
#34 0xffffffff99004ca5 in x64_sys_call (regs=regs@entry=0xffffbb67437e7f58, nr=<optimized out>) at ./arch/x86/include/generated/asm/syscalls_64.h:56
#35 0xffffffff99b90e34 in do_syscall_x64 (nr=<optimized out>, regs=0xffffbb67437e7f58) at arch/x86/entry/common.c:51
#36 do_syscall_64 (regs=0xffffbb67437e7f58, nr=<optimized out>) at arch/x86/entry/common.c:81
#37 0xffffffff99c00126 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:121
Now we can use GDB to see the root cause.
Signed-off-by: Alexey Makhalov <alexey.makhalov(a)broadcom.com>
---
arm64.c | 2 +-
crash_target.c | 25 ++++++++++++++++++----
defs.h | 3 ++-
gdb_interface.c | 6 +++---
ppc64.c | 2 +-
x86_64.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--
6 files changed, 81 insertions(+), 12 deletions(-)
diff --git a/arm64.c b/arm64.c
index 608b19d..62f91d8 100644
--- a/arm64.c
+++ b/arm64.c
@@ -204,7 +204,7 @@ out:
static int
arm64_get_current_task_reg(int regno, const char *name,
- int size, void *value)
+ int size, void *value, int unused)
{
struct bt_info bt_info, bt_setup;
struct task_context *tc;
diff --git a/crash_target.c b/crash_target.c
index 1080976..8b17ef8 100644
--- a/crash_target.c
+++ b/crash_target.c
@@ -27,8 +27,9 @@ void crash_target_init (void);
extern "C" int gdb_readmem_callback(unsigned long, void *, int, int);
extern "C" int crash_get_current_task_reg (int regno, const char *regname,
- int regsize, void *val);
+ int regsize, void *val, int sid);
extern "C" int gdb_change_thread_context (void);
+extern "C" int gdb_add_substack (int sid);
extern "C" void crash_get_current_task_info(unsigned long *pid, char **comm);
/* The crash target. */
@@ -64,9 +65,10 @@ public:
unsigned long pid;
char *comm;
crash_get_current_task_info(&pid, &comm);
- return string_printf ("%ld %s", pid, comm);
+ if (thread_count(this) == 1)
+ return string_printf ("%ld %s", pid, comm);
+ return string_printf ("%ld %s (stack %ld)", pid, comm, ptid.tid());
}
-
};
static void supply_registers(struct regcache *regcache, int regno)
@@ -79,7 +81,7 @@ static void supply_registers(struct regcache *regcache, int regno)
if (regsize > sizeof (regval))
error (_("fatal error: buffer size is not enough to fit register value"));
- if (crash_get_current_task_reg (regno, regname, regsize, (void *)&regval))
+ if (crash_get_current_task_reg (regno, regname, regsize, (void *)&regval, inferior_thread()->ptid.tid()))
regcache->raw_supply (regno, regval);
else
regcache->raw_supply (regno, NULL);
@@ -144,7 +146,22 @@ crash_target_init (void)
extern "C" int
gdb_change_thread_context (void)
{
+ for (thread_info *tp : current_inferior()->threads_safe())
+ if (tp->ptid.tid_p())
+ delete_thread (tp);
target_fetch_registers(get_current_regcache(), -1);
reinit_frame_cache();
return TRUE;
}
+
+/* Add a thread for each additional stack. Use stack ID as a thread ID */
+extern "C" int
+gdb_add_substack (int sid)
+{
+ ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, sid);
+
+ thread_info *tp = find_thread_ptid (current_inferior(), ptid);
+ if (tp == nullptr)
+ add_thread_silent (current_inferior()->process_target(), ptid);
+ return TRUE;
+}
diff --git a/defs.h b/defs.h
index b93a7a6..bb2bc20 100644
--- a/defs.h
+++ b/defs.h
@@ -1081,7 +1081,7 @@ struct machdep_table {
void (*get_irq_affinity)(int);
void (*show_interrupts)(int, ulong *);
int (*is_page_ptr)(ulong, physaddr_t *);
- int (*get_current_task_reg)(int, const char *, int, void *);
+ int (*get_current_task_reg)(int, const char *, int, void *, int);
int (*is_cpu_prstatus_valid)(int cpu);
};
@@ -8301,5 +8301,6 @@ enum ppc64_regnum {
/* crash_target.c */
extern int gdb_change_thread_context (void);
+extern int gdb_add_substack (int sid);
#endif /* !GDB_COMMON */
diff --git a/gdb_interface.c b/gdb_interface.c
index 315711e..c138c94 100644
--- a/gdb_interface.c
+++ b/gdb_interface.c
@@ -1074,12 +1074,12 @@ unsigned long crash_get_kaslr_offset(void)
/* Callbacks for crash_target */
int crash_get_current_task_reg (int regno, const char *regname,
- int regsize, void *value);
+ int regsize, void *value, int sid);
int crash_get_current_task_reg (int regno, const char *regname,
- int regsize, void *value)
+ int regsize, void *value, int sid)
{
if (!machdep->get_current_task_reg)
return FALSE;
- return machdep->get_current_task_reg(regno, regname, regsize, value);
+ return machdep->get_current_task_reg(regno, regname, regsize, value, sid);
}
diff --git a/ppc64.c b/ppc64.c
index 782107b..1cf06e3 100644
--- a/ppc64.c
+++ b/ppc64.c
@@ -2512,7 +2512,7 @@ ppc64_print_eframe(char *efrm_str, struct ppc64_pt_regs *regs,
static int
ppc64_get_current_task_reg(int regno, const char *name, int size,
- void *value)
+ void *value, int unused)
{
struct bt_info bt_info, bt_setup;
struct task_context *tc;
diff --git a/x86_64.c b/x86_64.c
index e7f8fe2..2e7cde4 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -126,7 +126,7 @@ static int x86_64_get_framesize(struct bt_info *, ulong, ulong, char *);
static void x86_64_framesize_debug(struct bt_info *);
static void x86_64_get_active_set(void);
static int x86_64_get_kvaddr_ranges(struct vaddr_range *);
-static int x86_64_get_current_task_reg(int, const char *, int, void *);
+static int x86_64_get_current_task_reg(int, const char *, int, void *, int);
static int x86_64_verify_paddr(uint64_t);
static void GART_init(void);
static void x86_64_exception_stacks_init(void);
@@ -143,6 +143,14 @@ struct machine_specific x86_64_machine_specific = { 0 };
static const char *exception_functions_orig[];
static const char *exception_functions_5_8[];
+/*
+ * Additional stacks entry registers for gdb target.
+ * See 'gdb info threads'
+ */
+#define MAX_STACKS_NUM 5
+ulong stack_idx;
+ulong stacks_regs[MAX_STACKS_NUM][SS_REGNUM + 1];
+
/* Use this hardwired version -- sometimes the
* debuginfo doesn't pick this up even though
* it exists in the kernel; it shouldn't change.
@@ -3551,6 +3559,7 @@ x86_64_low_budget_back_trace_cmd(struct bt_info *bt_in)
irq_eframe = 0;
last_process_stack_eframe = 0;
bt->call_target = NULL;
+ stack_idx = 0;
rsp = bt->stkptr;
ms = machdep->machspec;
@@ -4159,6 +4168,7 @@ x86_64_dwarf_back_trace_cmd(struct bt_info *bt_in)
last_process_stack_eframe = 0;
bt->call_target = NULL;
bt->bptr = 0;
+ stack_idx = 0;
rsp = bt->stkptr;
if (!rsp) {
error(INFO, "cannot determine starting stack pointer\n");
@@ -4799,6 +4809,36 @@ x86_64_exception_frame(ulong flags, ulong kvaddr, char *local,
} else if (machdep->flags & ORC)
bt->bptr = rbp;
+
+ /*
+ * Preserve registers set for each additional in-kernel stack
+ * up to MAX_STACKS_NUM.
+ */
+ if (!(cs & 3) && verified && stack_idx < MAX_STACKS_NUM) {
+ stacks_regs[stack_idx][RAX_REGNUM] = rax;
+ stacks_regs[stack_idx][RBX_REGNUM] = rbx;
+ stacks_regs[stack_idx][RCX_REGNUM] = rcx;
+ stacks_regs[stack_idx][RDX_REGNUM] = rdx;
+ stacks_regs[stack_idx][RSI_REGNUM] = rsi;
+ stacks_regs[stack_idx][RDI_REGNUM] = rdi;
+ stacks_regs[stack_idx][RBP_REGNUM] = rbp;
+ stacks_regs[stack_idx][RSP_REGNUM] = rsp;
+ stacks_regs[stack_idx][R8_REGNUM] = r8;
+ stacks_regs[stack_idx][R9_REGNUM] = r9;
+ stacks_regs[stack_idx][R10_REGNUM] = r10;
+ stacks_regs[stack_idx][R11_REGNUM] = r11;
+ stacks_regs[stack_idx][R12_REGNUM] = r12;
+ stacks_regs[stack_idx][R13_REGNUM] = r13;
+ stacks_regs[stack_idx][R14_REGNUM] = r14;
+ stacks_regs[stack_idx][R15_REGNUM] = r15;
+ stacks_regs[stack_idx][RIP_REGNUM] = rip;
+ stacks_regs[stack_idx][EFLAGS_REGNUM] = rflags;
+ stacks_regs[stack_idx][CS_REGNUM] = cs;
+ stacks_regs[stack_idx][SS_REGNUM] = ss;
+ /* Skip stack 0 (main stack), start with index 1 */
+ gdb_add_substack (stack_idx + 1);
+ stack_idx++;
+ }
if (kvaddr)
FREEBUF(pt_regs_buf);
@@ -9236,7 +9276,7 @@ x86_64_get_kvaddr_ranges(struct vaddr_range *vrp)
static int
x86_64_get_current_task_reg(int regno, const char *name,
- int size, void *value)
+ int size, void *value, int sid)
{
struct bt_info bt_info, bt_setup;
struct task_context *tc;
@@ -9256,6 +9296,17 @@ x86_64_get_current_task_reg(int regno, const char *name,
if (!tc)
return FALSE;
+ /* Non zero stack ID, use saved regs */
+ if (sid && sid <= MAX_STACKS_NUM) {
+ switch (regno) {
+ case RAX_REGNUM ... SS_REGNUM:
+ memcpy(value, &stacks_regs[sid - 1][regno], size > 8 ? 8 : size);
+ return TRUE;
+ default:
+ return FALSE;
+ }
+ }
+
/*
* Task is active, grab CPU's registers
*/
--
2.43.5
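A rough standalone model of the bookkeeping this patch adds: exception frames found during `bt` are saved in a small fixed table, stack ID 0 stays the task's own stack, and IDs 1..N map to `stacks_regs[N - 1]`. Everything here is hypothetical and simplified (three register slots instead of the full `RAX_REGNUM ... SS_REGNUM` range, and helper names such as `save_substack()` invented for illustration). One deliberate difference: the lookup rejects IDs larger than the number of stacks actually saved, whereas the posted hunk checks only `sid <= MAX_STACKS_NUM`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative register slots; the patch indexes by gdb's RAX_REGNUM..SS_REGNUM. */
enum { XREG_RAX, XREG_RSP, XREG_RIP, XREG_COUNT };

#define MAX_STACKS_NUM 5

static uint64_t stacks_regs[MAX_STACKS_NUM][XREG_COUNT];
static int stack_idx;	/* extra stacks recorded by the most recent backtrace */

/* Start of a backtrace: drop stacks recorded for a previous task. */
static void reset_substacks(void)
{
	stack_idx = 0;
}

/*
 * Record the registers of one kernel-mode exception frame.
 * Returns the stack ID assigned to it (1..MAX_STACKS_NUM), or 0 when the
 * table is full. ID 0 is reserved for the task's main stack.
 */
static int save_substack(const uint64_t regs[XREG_COUNT])
{
	assert(regs != NULL);
	if (stack_idx >= MAX_STACKS_NUM)
		return 0;
	memcpy(stacks_regs[stack_idx], regs, sizeof(stacks_regs[0]));
	return ++stack_idx;
}

/*
 * Register fetch for a non-zero stack ID, copying at most 8 bytes.
 * Returns 1 on success, 0 for an unknown stack ID, register or size.
 */
static int get_substack_reg(int sid, int regno, void *value, int size)
{
	if (sid < 1 || sid > stack_idx)
		return 0;
	if (regno < 0 || regno >= XREG_COUNT || size <= 0)
		return 0;
	memcpy(value, &stacks_regs[sid - 1][regno], (size_t)(size > 8 ? 8 : size));
	return 1;
}
```

In the patch itself, `stack_idx + 1` is passed to `gdb_add_substack()` for each saved frame, so gdb creates one thread per recorded stack.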
2 weeks, 3 days
[ANNOUNCE] crash-8.0.6 is available
by lijiang
Hi,
Thank you all for your contributions to the crash-utility. crash-8.0.6 is
now available.
Download from:
https://crash-utility.github.io/
or
https://github.com/crash-utility/crash/releases
The GitHub master branch serves as a development branch that will
contain all patches that are queued for the next release:
$ git clone https://github.com/crash-utility/crash.git
Changelog:
f13853cef53f crash-8.0.5 -> crash-8.0.6
db0077614aae Fix for 'sys' to properly display the PANIC message
ca74157283dd Doc: add doc to state that the --log option is deprecated
968debd0d597 arm64: Add gdb stack unwind support
89ff1e457344 x86_64: Add gdb stack unwind support
6dfda0d22355 ppc64: Add gdb stack unwind support
1fd80c623c20 Preparing for gdb stack unwind support
7c8a7dddda66 vmware_guestdump: Various format versions support
c4db469af091 x86_64: Fix invalid input "=>" for bt command
21e0a345f973 Fix cpumask_t recursive dependence issue
32b03ca26229 Revert "arm64: section_size_bits compatible with macro definitions"
7b5c8bca7d05 X86 64: improve the method of determining whether kaslr is enabled
9babe985a7eb kmem: fix the determination for slab page
0d2ad774532d x86_64: Fix the bug of getting incorrect framesize
17248cf00276 arm64: Support 16K page, 48 VA bits and 4 level page table
19ce5a996ce7 arm64: fix 64K page and 52-bits VA support
3b8f9721e13d arm64: use the same expression to indicate ptrs_per_pgd
2ebf656a4a17 arm64: fix indent issue and refactor PTE_TO_PHYS
f20a94016148 “kmem address” not working properly when redzone is enabled
79b93ecb2e72 Fix a "Bus error" issue caused by 'crash --osrelease' or crash loading
af3d266aeb8c arm64: cleanup the pud description
f93d870f8b6a arm64: fix for 'help -m/-M' to correctly display the pmd description
bcdf0f798d01 arm64: Introduction of support for 16K page with 2-level table support
5218919ec108 s390x: Fix "bt -f/-F" command fail with seek error
321e1e854588 Fix a segfault issue due to the incorrect irq_stack_size on ARM64
5cd1c6ace5fe arm64: fix the determination of vmemmap and struct_page_size
f615f8fab7bf Fix "irq -a" exceeding the memory range issue
38f26cc8b930 LoongArch64: fix incorrect code in the main()
93d7f647c45b arm64: Introduction of support for 16K page with 3-level table support
1c6da3eaff82 arm64: Fix bt command show wrong stacktrace on ramdump source
af895b219876 arm64: fix a potential segfault when unwind frame
ce4ddc742fbd List: enable LIST_HEAD_FORMAT for -r option
3452fe802bf9 Fix "kmem -i" and "swap" commands on Linux 6.10-rc1 and later kernels
196c4b79c13d X86 64: fix a regression issue about kernel stack padding
a20eb05de3c1 Fix for failing to load kernel module
6752571d8d78 X86 64: fix for crash session loading failure
7c2c90d0b06a Fix "kmem -v" option on Linux 6.9 and later kernels
48764a14bc58 x86_64: fix for adding top_of_kernel_stack_padding for kernel stack
3879e9104826 Reflect __{start,end}_init_task kernel symbols rename
568c6f049ad4 arm64: section_size_bits compatible with macro definitions
af2ac4c41df6 Cleanup: replace struct zspage_5_17 with union
a584e9752fb2 Adding the zram decompression algorithm "lzo-rle"
9104e87db44e Mark start of 8.0.6 development phase with version 8.0.5++
Full ChangeLog:
https://crash-utility.github.io/changelog/ChangeLog-8.0.6.txt
or
https://github.com/crash-utility/crash/compare/8.0.5...8.0.6
2 weeks, 6 days
Re: [PATCH] RISCV64: add panic signature to panic_msg to properly display the PANIC message
by lijiang
On Sat, Nov 9, 2024 at 9:14 AM <devel-request(a)lists.crash-utility.osci.io> wrote:
> Date: Fri, 8 Nov 2024 21:13:19 +1300
> From: Tao Liu <ltao(a)redhat.com>
> Subject: [Crash-utility] Re: [PATCH] RISCV64: add panic signature to
> panic_msg to properly display the PANIC message
> To: Austin Kim <austindh.kim(a)gmail.com>
> Cc: lijiang <lijiang(a)redhat.com>, devel(a)lists.crash-utility.osci.io,
> Austin Kim <austindhkim(a)gmail.com>, 김동현 <austin.kim(a)lge.com>
> Message-ID:
> <CAO7dBbWgemin4FPcSkgQ21dbbqW-S8KeFtNuaH8otOZvNWQVSA(a)mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Austin & Lianbo,
>
> On Fri, Nov 8, 2024 at 1:35 AM Austin Kim <austindh.kim(a)gmail.com> wrote:
> >
> > Hello Lianbo,
> >
> > On Wed, Nov 6, 2024 at 12:53 PM lijiang <lijiang(a)redhat.com> wrote:
> > >
> > > Hi, Austin
> > > Thank you for the patch.
> > >
> > > On Fri, Nov 1, 2024 at 5:19 PM <devel-request(a)lists.crash-utility.osci.io> wrote:
> > >>
> > >> Date: Tue, 29 Oct 2024 17:32:07 +0900
> > >> From: Austin Kim <austindh.kim(a)gmail.com>
> > >> Subject: [Crash-utility] [PATCH] RISCV64: add panic signature to
> > >> panic_msg to properly display the PANIC message
> > >> To: devel(a)lists.crash-utility.osci.io
> > >> Cc: austindh.kim(a)gmail.com, austin.kim(a)lge.com
> > >> Message-ID: <20241029083207.GA30130@adminpc-PowerEdge-R7525>
> > >> Content-Type: text/plain; charset=us-ascii
> > >>
> > >> Using the 'sys' command, we can view the panic message along with general
> > >> system information. With a RISCV64-based vmcore, however, the PANIC
> > >> message is not displayed properly.
> > >>
> > >> The reason is that "Unable to handle kernel" is printed first in the
> > >> kernel log when an exception occurs in the RISC-V Linux kernel. The
> > >> corresponding kernel commit is 21733cb518471.
> > >>
> > >> Without the patch:
> > >> crash> sys
> > >> KERNEL: vmlinux [TAINTED]
> > >> DUMPFILE: vmcore
> > >> CPUS: 4
> > >> DATE: Thu Aug 22 16:13:08 KST 2024
> > >> UPTIME: 00:33:25
> > >> LOAD AVERAGE: 0.07, 0.07, 0.02
> > >> TASKS: 385
> > >> NODENAME: starfive
> > >> RELEASE: 6.6.20+
> > >> VERSION: #13 SMP Mon Aug 19 12:58:52 KST 2024
> > >> MACHINE: riscv64 (unknown Mhz)
> > >> MEMORY: 4 GB
> > >> PANIC: ""
> > >>
> > >> With the patch:
> > >> crash> sys
> > >> KERNEL: vmlinux [TAINTED]
> > >> DUMPFILE: vmcore
> > >> CPUS: 4
> > >> DATE: Thu Aug 22 16:13:08 KST 2024
> > >> UPTIME: 00:33:25
> > >> LOAD AVERAGE: 0.07, 0.07, 0.02
> > >> TASKS: 385
> > >> NODENAME: starfive
> > >> RELEASE: 6.6.20+
> > >> VERSION: #13 SMP Mon Aug 19 12:58:52 KST 2024
> > >> MACHINE: riscv64 (unknown Mhz)
> > >> MEMORY: 4 GB
> > >> PANIC: "Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000000"
> > >>
> > >> Signed-off-by: Austin Kim <austindh.kim(a)gmail.com>
> > >> ---
> > >> task.c | 1 +
> > >> 1 file changed, 1 insertion(+)
> > >>
> > >> diff --git a/task.c b/task.c
> > >> index d52ce0b..443f488 100644
> > >> --- a/task.c
> > >> +++ b/task.c
> > >> @@ -6330,6 +6330,7 @@ static const char* panic_msg[] = {
> > >> "[Hardware Error]: ",
> > >> "Bad mode in ",
> > >> "Oops: ",
> > >> + "Unable to handle kernel access ",
> > >
> > >
> > > I would prefer to search for the panic keywords again, as below, which
> > > can cover both the riscv64 and aarch64 cases.
> > >
> > > diff --git a/task.c b/task.c
> > > index c131cc32067d..9613adebab57 100644
> > > --- a/task.c
> > > +++ b/task.c
> > > @@ -6392,6 +6392,9 @@ get_panicmsg(char *buf)
> > > get_symbol_data("sysrq_pressed", sizeof(int), &msg_found);
> > > break;
> > > }
> > > +
> > > + /* try to search panic string with panic keywords*/
> > > + search_panic_task_by_keywords(buf, &msg_found);
> > > }
>
> With this patch applied, I found no regressions, so I think this one works.
>
Thank you for the confirmation, Austin and Tao.
Applied:
https://github.com/crash-utility/crash/commit/db0077614aaeda6d0ed557f2b91...
Lianbo
>
> Thanks,
> Tao Liu
>
> > >
> > > found:
> > >
> > >
> > > What do you think? I haven't tested this one and am not sure whether it
> > > works for you; could you please try it?
> >
> > Thank you for the positive feedback on the patch and for sharing
> > another great idea.
> > I tested the patch you suggested, and it worked well on my side.
> > Here’s the crash message:
> >
> > crash> sys | grep PANIC
> > PANIC: "Unable to handle kernel access to user memory without
> > uaccess routines at virtual address 0000000000000000"
> >
> > This new patch is useful not only for RISC-V but also for a wider range of
> > architectures, and it seems like a better approach than modifying panic_msg[].
> >
> > Best regards,
> > Austin Kim
> >
> > > Tao, can we also run a regression test to double-check whether there are
> > > any risks?
> > >
> > > Thanks
> > > Lianbo
> > >
> > >
> > >>
> > >> };
> > >>
> > >> #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
> > >> --
> > >> 2.17.1
> >
>
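To illustrate the signature-table approach of the original RISCV64 patch: a minimal C sketch that scans log lines for known panic prefixes and returns the text starting at the first match. The four signatures come from the quoted `panic_msg[]` context plus the proposed `"Unable to handle kernel access "` entry. `find_panic_msg()` is a hypothetical name; the sketch does not reproduce crash's `get_panicmsg()`, nor the `search_panic_task_by_keywords()` fallback that was eventually applied instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signatures from the quoted panic_msg[] hunk; the last is the proposed one. */
static const char *const panic_sigs[] = {
	"[Hardware Error]: ",
	"Bad mode in ",
	"Oops: ",
	"Unable to handle kernel access ",
};

/*
 * Return a pointer into the first log line that contains a known panic
 * signature, starting at that signature, or NULL if none matches.
 */
static const char *find_panic_msg(const char *const *lines, size_t nlines)
{
	assert(lines != NULL || nlines == 0);

	for (size_t i = 0; i < nlines; i++) {
		for (size_t j = 0; j < sizeof(panic_sigs) / sizeof(panic_sigs[0]); j++) {
			const char *hit = strstr(lines[i], panic_sigs[j]);

			if (hit)
				return hit;
		}
	}
	return NULL;
}
```

The table approach needs a new entry for every architecture-specific message, which is the drawback the keyword search avoids.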
3 weeks