Hi Guanyou,
Thanks for the fix. However I have a few questions for this patch:
On Fri, Nov 1, 2024 at 10:16 PM Guanyou Chen <chenguanyou9338(a)gmail.com> wrote:
Hi Lianbo, Tao
When CPUs are in an offline state, it can lead to mapping errors.
1. What is the root cause, why cpu offline state will lead to mapping
errors? You didn't make this clear in your commit message.
2. Is this issue cpu arch specific? Since the example you gave is only
arm64 specific, however the code change is generic for all archs. So
I'm a bit worried about whether it will make the result incorrect for
other archs. In addition, I didn't reproduce the mapping error issue
on my x86_64 machine, so I cannot verify the patch locally.
We need to map them to the correct positions one by one.
3. Please make the commit log tidy. I know you want to diff the
outputs of "before" and "after", but we only need the info which are
highly related to this issue. e.g:
Before:
What command line you trigger will get the following outputs?
n_namesz: 5 ("CPU2")
n_descsz: 392
n_type: 1 (NT_PRSTATUS)
si.signo: 0 si.code: 0 si.errno: 0
cursig: 0 sigpend: 0 sighold: 0
pid: 3 ppid: 0 pgrp: 0 sid:0
utime: 0.000000 stime: 0.000000
cutime: 0.000000 cstime: 0.000000
I'm not sure if the above info is related to this issue?
X0: ffffffc000fc8818 X1: 0000000000000000 X2:
ffffffc000fc84c8
X3: 0000000000000000 X4: ffffffc0405e37bf X5: ffffffc00a07372f
X6: 322e34323320205b X7: 545b5d3539383334 X8: ffffffc000fc2f0c
X9: 89fece0a9ef8cb00 X10: c0000001001f75f4 X11: 00000001001f75f4
X12: 0000000000000003 X13: 00000000000005f4 X14: ffffffc009eb1210
X15: 0000000000000004 X16: 000000002a4cec24 X17: 000000002a4cec24
X18: ffffffc009e7d140 X19: ffffffc00a04c670 X20: 0000000000000000
X21: 0000000000000000 X22: ffffff8027f22280 X23: 0000000000000009
X24: 0000000000000007 X25: ffffffc009f839c0 X26: ffffffc0090f87f8
X27: 0000000000000000 X28: ffffff80454f3840 X29: ffffffc0405e3b60
LR: ffffffc0080e57fc SP: ffffffc0405e3b60 PC: ffffffc000fc2f84
We don't need a list of all regs, a few lines of which can indicate
the output mismatch for "Before" and "After" is enough.
CPU 0: [OFFLINE]
CPU 1: [OFFLINE]
CPU 2:
X0: 0000000000000000 X1: 0000000000000000 X2: 0000000000000000
X3: 000000000003fcbc X4: 0000000000000001 X5: 0000000000000000
X6: 0000000000000000 X7: 0000000000000000 X8: 00000000ffffffff
X9: ffffffc009e6ae48 X10: ffffffc009e6ae20 X11: 0000000000000000
X12: 0000000000000002 X13: 0000000000000004 X14: 0000000000000000
X15: 0000000000004000 X16: 00000000f90f05f6 X17: 00000000f90f05f6
X18: 0000000000000000 X19: 0000000000000002 X20: ffffffc009e3b008
X21: ffffffc00a01d020 X22: ffffffc009f798f0 X23: 0000000060001000
X24: 0000000000000000 X25: 0000000000000000 X26: 0000000000000000
X27: 0000000000000000 X28: ffffff8111eecb00 X29: ffffffc008003f50
LR: ffffffc00802df88 SP: ffffffc008003f40 PC: ffffffc00802df94
PSTATE: 024003c5 FPVALID: 00000000
After:
CPU 2:
X0: ffffffc000fc8818 X1: 0000000000000000 X2: ffffffc000fc84c8
X3: 0000000000000000 X4: ffffffc0405e37bf X5: ffffffc00a07372f
X6: 322e34323320205b X7: 545b5d3539383334 X8: ffffffc000fc2f0c
X9: 89fece0a9ef8cb00 X10: c0000001001f75f4 X11: 00000001001f75f4
X12: 0000000000000003 X13: 00000000000005f4 X14: ffffffc009eb1210
X15: 0000000000000004 X16: 000000002a4cec24 X17: 000000002a4cec24
X18: ffffffc009e7d140 X19: ffffffc00a04c670 X20: 0000000000000000
X21: 0000000000000000 X22: ffffff8027f22280 X23: 0000000000000009
X24: 0000000000000007 X25: ffffffc009f839c0 X26: ffffffc0090f87f8
X27: 0000000000000000 X28: ffffff80454f3840 X29: ffffffc0405e3b60
LR: ffffffc0080e57fc SP: ffffffc0405e3b60 PC:
ffffffc000fc2f84
PSTATE: 600000c5 FPVALID: 00000000
crash> bt
PID: 15959 TASK: ffffff80454f3840 CPU: 2 COMMAND: "AnrConsumer"
[ffffffc0405e3b60] ipanic at ffffffc000fc2f80 [mrdump]
[ffffffc0405e3b70] atomic_notifier_call_chain at ffffffc0080e57f8
[ffffffc0405e3c30] panic at ffffffc008f734d0
[ffffffc0405e3c80] sysrq_handle_crash at ffffffc0087f3c18
[ffffffc0405e3c90] __handle_sysrq at ffffffc0087f3798
[ffffffc0405e3ce0] write_sysrq_trigger at ffffffc0087f49c0
[ffffffc0405e3d00] proc_reg_write at ffffffc00842e4b8
[ffffffc0405e3d80] vfs_write at ffffffc008381eb4
[ffffffc0405e3dd0] ksys_write at ffffffc008382200
[ffffffc0405e3e10] __arm64_sys_write at ffffffc00838228c
[ffffffc0405e3e20] invoke_syscall at ffffffc00802efe0
[ffffffc0405e3e40] el0_svc_common at ffffffc00802eef4
[ffffffc0405e3e70] do_el0_svc at ffffffc00802ede8
[ffffffc0405e3e80] el0_svc at ffffffc008f7a7d0
[ffffffc0405e3ea0] el0t_64_sync_handler at ffffffc008f7a758
[ffffffc0405e3fe0] el0t_64_sync at ffffffc00801157c
The stacktrace is meaningless for this patch, we can cut this off.
PC: 00000077c798ca28 LR: 00000077a82e19f4 SP:
000000761c517af0
X29: 000000761c517b00 X28: 000000761c517db8 X27: 000000761c517c90
X26: 000000761c517c98 X25: 000000761c517bf9 X24: 000000761c519000
X23: 000000761c517be1 X22: 0000000000000001 X21: 00000000000003e3
X20: 000000761c517c11 X19: 000000761c517bf8 X18: 0000007568224000
X17: 00000077c798ca20 X16: 00000077c79b2ae0 X15: b4000077202cc480
X14: 0000000000000000 X13: 000000761c517a70 X12: ffffff80ffffffd0
X11: 000000761c517a40 X10: 0000000000000001 X9: 0000000000000000
X8: 0000000000000040 X7: 7f7f7f7f7f7f7f7f X6: 0000000000000010
X5: 000000761c517c0c X4: ffffffffffffffff X3: ffffffffffffffff
X2: 0000000000000001 X1: 000000761c517c11 X0: 00000000000003e3
ORIG_X0: 00000000000003e3 SYSCALLNO: 40 PSTATE: 00001000
ditto.
Maybe a short summary after the output paste, telling us which info
should reviewers pay attention to.
Thanks,
Tao Liu
Signed-off-by: Guanyou.Chen <chenguanyou(a)xiaomi.com>
---
netdump.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/netdump.c b/netdump.c
index b4e2a5c..8ea5159 100644
--- a/netdump.c
+++ b/netdump.c
@@ -75,7 +75,7 @@ void
map_cpus_to_prstatus(void)
{
void **nt_ptr;
- int online, i, j, nrcpus;
+ int online, i, nrcpus;
size_t size;
if (pc->flags2 & QEMU_MEM_DUMP_ELF) /* notes exist for all cpus */
@@ -100,9 +100,9 @@ map_cpus_to_prstatus(void)
*/
nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
- for (i = 0, j = 0; i < nrcpus; i++) {
+ for (i = 0; i < nrcpus; i++) {
if (in_cpu_map(ONLINE_MAP, i) && machdep->is_cpu_prstatus_valid(i)) {
- nd->nt_prstatus_percpu[i] = nt_ptr[j++];
+ nd->nt_prstatus_percpu[i] = nt_ptr[i];
nd->num_prstatus_notes =
MAX(nd->num_prstatus_notes, i+1);
}
--
2.34.1
Guanyou.
Thanks.