[Crash-Utility][PATCH v2 00/13] gdb stack unwinding support for crash utility
by Tao Liu
This patchset is a rebased and merged version of the following 3 patchsets:
1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
3): Clean up on top of one-thread-v2 [3]
A complete description of gdb stack unwinding support for crash can be
found in [1].
This patchset can be divided into the following 2 parts:
1) part1: arch-independent changes, mainly to crash_target.c and
gdb_interface.c, in preparation for the gdb side.
2) part2: arch-specific changes that implement ppc64/x86_64/arm64 gdb
stack unwinding support.
=== part 2
arm64: Add gdb stack unwinding support
Fix cpumask_t recursive dependence issue
Parse stack by inactive_stack_frame priorily if the struct is valid
x86_64: Add gdb stack unwinding support
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
=== part 1
Stop stack unwinding at non-kernel address
Fix gdb_interface: restore gdb's output streams at end of gdb_interface
Print task pid/command instead of CPU index
Rename get_cpu_reg to get_current_task_reg
Let crash change gdb context
Leave only one gdb thread for crash
Remove 'frame' from prohibited commands list
===
v1 -> v2:
1) Added the patch "x86_64: Fix invalid input "=>" for bt command",
thanks to Kazu's testing.
2) Modified the patch "x86_64: Add gdb stack unwinding support": added
pcp_save, spp_save and sp to restore the values, matching the original
code logic.
[1]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00469.html
[2]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00488.html
[3]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00554.html
Aditya Gupta (2):
Remove 'frame' from prohibited commands list
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
Tao Liu (11):
Leave only one gdb thread for crash
Let crash change gdb context
Rename get_cpu_reg to get_current_task_reg
Print task pid/command instead of CPU index
Fix gdb_interface: restore gdb's output streams at end of
gdb_interface
Stop stack unwinding at non-kernel address
x86_64: Add gdb stack unwinding support
Parse stack by inactive_stack_frame priorily if the struct is valid
Fix cpumask_t recursive dependence issue
x86_64: Fix invalid input "=>" for bt command
arm64: Add gdb stack unwinding support
arm64.c | 114 +++++++++++++++++-
crash_target.c | 47 +++++---
defs.h | 187 +++++++++++++++++++++++++++++-
gdb-10.2.patch | 79 +++++++++++++
gdb_interface.c | 33 ++----
kernel.c | 61 ++++++++--
ppc64.c | 163 ++++++++++++++++++++++++--
task.c | 30 +++--
tools.c | 8 +-
x86_64.c | 299 +++++++++++++++++++++++++++++++++++++++++++-----
10 files changed, 916 insertions(+), 105 deletions(-)
--
2.40.1
Re: [PATCH] Reflect __{start,end}_init_task kernel symbols rename
by Lianbo Jiang
Hi, Alexander
Thank you for the early fix.
On 4/10/24 22:15, devel-request(a)lists.crash-utility.osci.io wrote:
> Date: Wed, 10 Apr 2024 14:55:35 +0200
> From: Alexander Gordeev<agordeev(a)linux.ibm.com>
> Subject: [Crash-utility] [PATCH] Reflect __{start,end}_init_task
> kernel symbols rename
> To:devel@lists.crash-utility.osci.io
> Cc: Alexander Egorenkov<egorenar(a)linux.ibm.com>
> Message-ID:<20240410125535.2891355-1-agordeev(a)linux.ibm.com>
>
> Kernel commit 8f69cba096b5 ("x86: Rename __{start,end}_init_task to
> __{start,end}_init_stack") leads to failure:
>
> crash: invalid count request: 0
Could you please point out which command caused this failure, or did it
fail while crash was loading?
Anyway, the code changes look fine to me.
Thanks
Lianbo
> Assume both __{start,end}_init_task and __{start,end}_init_stack
> symbols could exist for backward compatibility.
>
> Signed-off-by: Alexander Gordeev<agordeev(a)linux.ibm.com>
> ---
> task.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/task.c b/task.c
> index ebdb5be..88e1d50 100644
> --- a/task.c
> +++ b/task.c
> @@ -496,10 +496,17 @@ task_init(void)
> ((len = SIZE(thread_union)) != STACKSIZE())) {
> machdep->stacksize = len;
> } else if (!VALID_SIZE(thread_union) && !VALID_SIZE(task_union)) {
> + len = 0;
> if (kernel_symbol_exists("__start_init_task") &&
> kernel_symbol_exists("__end_init_task")) {
> len = symbol_value("__end_init_task");
> len -= symbol_value("__start_init_task");
> + } else if (kernel_symbol_exists("__start_init_stack") &&
> + kernel_symbol_exists("__end_init_stack")) {
> + len = symbol_value("__end_init_stack");
> + len -= symbol_value("__start_init_stack");
> + }
> + if (len) {
> ASSIGN_SIZE(thread_union) = len;
> machdep->stacksize = len;
> }
> -- 2.40.1
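For illustration only (not part of the patch): the fallback described above can be sketched as a lookup that tries the old symbol pair first and the renamed pair second, and leaves the length at 0 when neither exists. The symbol table type and the helpers sym_lookup() and init_stack_size() are hypothetical stand-ins, not crash's kernel_symbol_exists()/symbol_value() API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel symbol table; not crash's real API. */
struct sym {
    const char *name;
    unsigned long value;
};

/* Look up a symbol by name; returns 1 and stores its value if found. */
static int sym_lookup(const struct sym *tab, size_t n, const char *name,
                      unsigned long *value)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            *value = tab[i].value;
            return 1;
        }
    }
    return 0;
}

/*
 * Try each start/end symbol pair in order (old names first, then the
 * renamed ones) and return the size of the first pair that is present.
 * Returns 0 when neither pair exists, so the caller can skip the update.
 */
static unsigned long init_stack_size(const struct sym *tab, size_t n)
{
    static const char *pairs[][2] = {
        { "__start_init_task",  "__end_init_task"  },
        { "__start_init_stack", "__end_init_stack" },
    };

    for (size_t i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++) {
        unsigned long start, end;

        if (sym_lookup(tab, n, pairs[i][0], &start) &&
            sym_lookup(tab, n, pairs[i][1], &end)) {
            assert(end >= start);
            return end - start;
        }
    }
    return 0;
}
```

A caller would only assign the stack size when the returned length is non-zero, which mirrors the "if (len)" guard added by the patch.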
[Question] crash-8.0.5 invalid to parse the assembly code by dis cmd for ARM64 crash dump
by qiwu.chen@transsion.com
Dear sirs,
I found a bug in crash-8.0.5: the dis command fails to disassemble code for an ARM64 crash dump:
$ crash vmlinux dump.202403061305 -d 1
KERNEL: vmlinux [TAINTED]
DUMPFILE: dump.202403061305 [PARTIAL DUMP]
CPUS: 4crash: get_cpus_online: online: 4
DATE: Wed Mar 6 21:04:30 CST 2024
UPTIME: 2135039823346 days, 00:18:07
LOAD AVERAGE: 0.32, 0.40, 0.17
TASKS: 93
NODENAME: benshushu
RELEASE: 5.15.0+
VERSION: #1 SMP Tue Mar 5 16:54:41 CST 2024
MACHINE: aarch64 (unknown Mhz)
MEMORY: 1 GB
PANIC: "Unable to handle kernel paging request at virtual address ffff800809102430"
PID: 494
COMMAND: "bash"
TASK: ffff000007d11a80 [THREAD_INFO: ffff000007d11a80]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 494 TASK: ffff000007d11a80 CPU: 0 COMMAND: "bash"
0: ffff80001022400c (crash_kexec)
#0 [ffff000007ce34d0] crash_kexec at ffff800010224008
#1 [ffff000007ce3570] die at ffff800010030038
#2 [ffff000007ce35e0] die_kernel_fault at ffff80001005d8e8
#3 [ffff000007ce3610] __do_kernel_fault at ffff80001005dbf4
#4 [ffff000007ce3650] do_bad_area at ffff80001005de14
#5 [ffff000007ce36b0] do_translation_fault at ffff800011172f84
#6 [ffff000007ce3700] do_mem_abort at ffff80001005e220
#7 [ffff000007ce3760] el1_abort at ffff800011162210
#8 [ffff000007ce3790] el1h_64_sync_handler at ffff80001116243c
#9 [ffff000007ce38f0] el1h_64_sync at ffff8000100111dc
......
crash> dis do_mem_abort
crash> dis -x ffff80001005e220 -r 8
0xffff80001005e184 <do_mem_abort>:
crash> dis do_mem_abort
0xffff80001005e184 <do_mem_abort>:
crash> dis do_translation_fault
0xffff800011172ed4 <do_translation_fault>:
There is no problem with crash-8.0.4:
crash> dis do_mem_abort
0xffff80001005e184 <do_mem_abort>: mov x9, x30
0xffff80001005e188 <do_mem_abort+4>: nop
0xffff80001005e18c <do_mem_abort+8>: stp x29, x30, [sp, #-96]!
0xffff80001005e190 <do_mem_abort+12>: mov x29, sp
......
Some change must have broken the ARM64 dis function. Please help look into the issue.
Thanks
[PATCH] remove struct zspage_5_17 and use union to resolve issue
by Guanyou Chen
Hi LianBo
We don't need struct zspage_5_17.
---
defs.h | 32 +++++++++++++++-----------------
diskdump.c | 15 ++++++---------
2 files changed, 21 insertions(+), 26 deletions(-)
diff --git a/defs.h b/defs.h
index 3cb8e63..01f316e 100644
--- a/defs.h
+++ b/defs.h
@@ -7407,28 +7407,26 @@ ulong try_zram_decompress(ulonglong pte_val,
unsigned char *buf, ulong len, ulon
#define SECTORS_PER_PAGE (1 << SECTORS_PER_PAGE_SHIFT)
struct zspage {
- struct {
- unsigned int fullness : 2;
- unsigned int class : 9;
- unsigned int isolated : 3;
- unsigned int magic : 8;
+ union {
+ unsigned int flag_bits;
+ struct {
+ unsigned int fullness : 2;
+ unsigned int class : 9;
+ unsigned int isolated : 3;
+ unsigned int magic : 8;
+ } v0;
+ struct {
+ unsigned int huge : 1;
+ unsigned int fullness : 2;
+ unsigned int class : 9;
+ unsigned int isolated : 3;
+ unsigned int magic : 8;
+ } v5_17;
};
unsigned int inuse;
unsigned int freeobj;
};
-struct zspage_5_17 {
- struct {
- unsigned int huge : 1;
- unsigned int fullness : 2;
- unsigned int class : 9;
- unsigned int isolated : 3;
- unsigned int magic : 8;
- };
- unsigned int inuse;
- unsigned int freeobj;
-};
-
/*
* makedumpfile.c
*/
diff --git a/diskdump.c b/diskdump.c
index 3ae7bf2..a928a0e 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -2819,7 +2819,6 @@ zram_object_addr(ulong pool, ulong handle, unsigned
char *zram_buf)
{
ulong obj, off, class, page, zspage;
struct zspage zspage_s;
- struct zspage_5_17 zspage_5_17_s;
physaddr_t paddr;
unsigned int obj_idx, class_idx, size;
ulong pages[2], sizes[2];
@@ -2833,15 +2832,13 @@ zram_object_addr(ulong pool, ulong handle, unsigned
char *zram_buf)
 readmem(page + OFFSET(page_private), KVADDR, &zspage, sizeof(void *), "page_private", FAULT_ON_ERROR);
+ readmem(zspage, KVADDR, &zspage_s, sizeof(struct zspage), "zspage", FAULT_ON_ERROR);
if (VALID_MEMBER(zspage_huge)) {
- readmem(zspage, KVADDR, &zspage_5_17_s,
- sizeof(struct zspage_5_17), "zspage_5_17", FAULT_ON_ERROR);
- class_idx = zspage_5_17_s.class;
- zs_magic = zspage_5_17_s.magic;
+ class_idx = zspage_s.v5_17.class;
+ zs_magic = zspage_s.v5_17.magic;
} else {
- readmem(zspage, KVADDR, &zspage_s, sizeof(struct zspage), "zspage", FAULT_ON_ERROR);
- class_idx = zspage_s.class;
- zs_magic = zspage_s.magic;
+ class_idx = zspage_s.v0.class;
+ zs_magic = zspage_s.v0.magic;
}
if (zs_magic != ZSPAGE_MAGIC)
@@ -2887,7 +2884,7 @@ zram_object_addr(ulong pool, ulong handle, unsigned
char *zram_buf)
out:
if (VALID_MEMBER(zspage_huge)) {
- if (!zspage_5_17_s.huge)
+ if (!zspage_s.v5_17.huge)
return (zram_buf + ZS_HANDLE_SIZE);
} else {
readmem(page, KVADDR, &obj, sizeof(void *), "page flags",
FAULT_ON_ERROR);
--
2.39.0
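For illustration only (not part of the patch): a minimal sketch of the idea, reading the header once and then picking the bitfield view that matches the kernel layout. The struct name zspage_hdr, the field name class_idx (called "class" in the patch) and the helpers are hypothetical, and the has_huge flag stands in for the VALID_MEMBER(zspage_huge) check. Bitfield layout is compiler-defined, so the sketch only reads a field back through the same view it was written through.

```c
#include <assert.h>

/* Hypothetical mirror of the patched struct; "class" is renamed class_idx. */
struct zspage_hdr {
    union {
        unsigned int flag_bits;        /* raw word as read from the dump */
        struct {
            unsigned int fullness  : 2;
            unsigned int class_idx : 9;
            unsigned int isolated  : 3;
            unsigned int magic     : 8;
        } v0;                          /* layout without the "huge" bit */
        struct {
            unsigned int huge      : 1;
            unsigned int fullness  : 2;
            unsigned int class_idx : 9;
            unsigned int isolated  : 3;
            unsigned int magic     : 8;
        } v5_17;                       /* layout with the leading "huge" bit */
    };
    unsigned int inuse;
    unsigned int freeobj;
};

/* Select the class index from the view that matches the kernel layout. */
static unsigned int zspage_class(const struct zspage_hdr *z, int has_huge)
{
    assert(z);
    return has_huge ? z->v5_17.class_idx : z->v0.class_idx;
}

/* Select the magic value the same way. */
static unsigned int zspage_magic(const struct zspage_hdr *z, int has_huge)
{
    assert(z);
    return has_huge ? z->v5_17.magic : z->v0.magic;
}
```

With this shape a single read of the structure serves both layouts, and only the accessors need to know which kernel version produced the dump.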
[PATCH] arm64: section_size_bits compatible with macro definitions
by Guanyou Chen
Hi Kazu,
To be compatible with the Google Android GKI changes:
SECTION_SIZE_BITS = 27 when 4K_PAGES or 16K_PAGES is defined.
SECTION_SIZE_BITS = 29 when 64K_PAGES is defined.
Link:
https://lore.kernel.org/lkml/15cf9a2359197fee0168f820c5c904650d07939e.161...
Link:
https://lore.kernel.org/all/43843c5e092bfe3ec4c41e3c8c78a7ee35b69bb0.1611...
See:
https://cs.android.com/android/_/android/kernel/common/+/673e9ab6b64f9811...
Before android-12-gki:
crash> help -m | grep section_size_bits
section_size_bits: 30
The first PFN is wrong; its physical address should be 0x40000000.
crash> kmem -p
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffffff06e00000 200000000 ffffff80edf4fa12 ffffffff070f3640 1 4000000000002000 private
After android-12-gki:
crash> help -m | grep section
section_size_bits: 27
crash> kmem -p
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffffffeffe00000 40000000 0 0 1 1000 reserved
Signed-off-by: chenguanyou <chenguanyou(a)xiaomi.com>
---
arm64.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arm64.c b/arm64.c
index e36c723..50e22ea 100644
--- a/arm64.c
+++ b/arm64.c
@@ -1629,7 +1629,16 @@ arm64_get_section_size_bits(void)
 if ((ret = get_kernel_config("CONFIG_HOTPLUG_SIZE_BITS", &string)) == IKCONFIG_STR)
machdep->section_size_bits = atol(string);
}
- }
+
+ // arm64: reduce section size for sparsemem
+ if ((ret = get_kernel_config("CONFIG_ARM64_4K_PAGES", NULL)) == IKCONFIG_Y
+ || (ret = get_kernel_config("CONFIG_ARM64_16K_PAGES", NULL)) == IKCONFIG_Y)
+ machdep->section_size_bits = _SECTION_SIZE_BITS_5_12;
+ // arm64/sparsemem: reduce SECTION_SIZE_BITS
+ else if ((ret = get_kernel_config("CONFIG_ARM64_64K_PAGES", NULL)) == IKCONFIG_Y)
+ machdep->section_size_bits = _SECTION_SIZE_BITS_5_12_64K;
+
+ }
if (CRASHDEBUG(1))
fprintf(fp, "SECTION_SIZE_BITS: %ld\n", machdep->section_size_bits);
--
2.39.0
Thanks,
Guanyou.Chen
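For illustration only (not part of the patch): a small sketch of why the section size matters. The values 27 and 29 come from the message above; the helper names and the page_shift parameter are hypothetical. One observation, not confirmed by the author: physical address 0x40000000 falls in section 8 when 27 bits are used, and section 8 reinterpreted with 30 bits starts at 0x200000000, which matches the wrong PHYSICAL value in the "before" output.

```c
#include <assert.h>

/*
 * Pick SECTION_SIZE_BITS from the page size, using the values quoted in
 * the message: 27 for 4K/16K pages, 29 for 64K pages.
 * page_shift is 12 (4K), 14 (16K) or 16 (64K); returns 0 if unknown.
 */
static int section_size_bits_for(int page_shift)
{
    switch (page_shift) {
    case 12:
    case 14:
        return 27;
    case 16:
        return 29;
    default:
        return 0;
    }
}

/* Number of the sparsemem section that contains a physical address. */
static unsigned long long section_nr(unsigned long long paddr, int bits)
{
    assert(bits > 0 && bits < 64);
    return paddr >> bits;
}

/* First physical address covered by a given section number. */
static unsigned long long section_start(unsigned long long nr, int bits)
{
    assert(bits > 0 && bits < 64);
    return nr << bits;
}
```

The same address maps to a different section number for each value of section_size_bits, so a tool using the wrong value will translate page structs to the wrong physical addresses.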
Re: [PATCH v2 00/13] gdb stack unwinding support for crash utility
by Lianbo Jiang
Hi, Tao
Thank you for the update.
I will look at the v2 later, maybe take some time to test again.
Thanks.
Lianbo
On 4/28/24 12:02, devel-request(a)lists.crash-utility.osci.io wrote:
> Date: Sun, 28 Apr 2024 12:01:57 +0800
> From: Tao Liu<ltao(a)redhat.com>
> Subject: [Crash-utility] [Crash-Utility][PATCH v2 00/13] gdb stack
> unwinding support for crash utility
> To:devel@lists.crash-utility.osci.io
> Cc: Tao Liu<ltao(a)redhat.com>
> Message-ID:<20240428040210.11474-1-ltao(a)redhat.com>
> Content-Type: text/plain; charset=UTF-8
>
> This patchset is a rebase/merged version of the following 3 patchsets:
>
> 1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
> 2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
> 3): Clean up on top of one-thread-v2 [3]
>
> A complete description of gdb stack unwinding support for crash can be
> found in [1].
>
> This patchset can be divided into the following 2 parts:
>
> 1) part1: arch independent, mainly modify on the
> crash_target.c/gdb_interface.c files, in preparation of the
> gdb side.
> 2) part2: arch specific part, for implementing ppc64/x86_64/arm64 gdb
> stack unwinding support.
>
> === part 2
> arm64: Add gdb stack unwinding support
> Fix cpumask_t recursive dependence issue
> Parse stack by inactive_stack_frame priorily if the struct is valid
> x86_64: Add gdb stack unwinding support
> ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
>
> === part 1
> Stop stack unwinding at non-kernel address
> Fix gdb_interface: restore gdb's output streams at end of gdb_interface
> Print task pid/command instead of CPU index
> Rename get_cpu_reg to get_current_task_reg
> Let crash change gdb context
> Leave only one gdb thread for crash
> Remove 'frame' from prohibited commands list
> ===
>
> v2 -> v1:
> 1) Added the patch: x86_64: Fix invalid input "=>" for bt command,
> thanks for Kazu's testing.
> 2) Modify the patch: x86_64: Add gdb stack unwinding support, added the
> pcp_save, spp_save and sp, for restoring the value in match of the original
> code logic.
>
> [1]:https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg0046...
> [2]:https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg0048...
> [3]:https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg0055...
>
> Aditya Gupta (2):
> Remove 'frame' from prohibited commands list
> ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
>
> Tao Liu (11):
> Leave only one gdb thread for crash
> Let crash change gdb context
> Rename get_cpu_reg to get_current_task_reg
> Print task pid/command instead of CPU index
> Fix gdb_interface: restore gdb's output streams at end of
> gdb_interface
> Stop stack unwinding at non-kernel address
> x86_64: Add gdb stack unwinding support
> Parse stack by inactive_stack_frame priorily if the struct is valid
> Fix cpumask_t recursive dependence issue
> x86_64: Fix invalid input "=>" for bt command
> arm64: Add gdb stack unwinding support
>
> arm64.c | 114 +++++++++++++++++-
> crash_target.c | 47 +++++---
> defs.h | 187 +++++++++++++++++++++++++++++-
> gdb-10.2.patch | 79 +++++++++++++
> gdb_interface.c | 33 ++----
> kernel.c | 61 ++++++++--
> ppc64.c | 163 ++++++++++++++++++++++++--
> task.c | 30 +++--
> tools.c | 8 +-
> x86_64.c | 299 +++++++++++++++++++++++++++++++++++++++++++-----
> 10 files changed, 916 insertions(+), 105 deletions(-)
>
> -- 2.40.1
[PATCH] crash_target: Support for GDB debugging of all tasks
by Alexey Makhalov
Support GDB debugging of all tasks, both active and inactive.
Before this commit, only active tasks were listed by "info threads",
with "CPU #" as the Target Id.
"info threads" now shows all tasks, similar to "ps". For example:
crash> info threads
Id Target Id Frame
* 1 0 swapper/0 0xffffffffadba19d4 in default_idle () at arch/x86/kernel/process.c:731
2 0 swapper/1 0xffffffffadba19d4 in default_idle () at arch/x86/kernel/process.c:731
3 0 swapper/2 0xffffffffadba19d4 in default_idle () at arch/x86/kernel/process.c:731
4 0 swapper/3 0xffffffffadba19d4 in default_idle () at arch/x86/kernel/process.c:731
5 0 swapper/4 0xffffffffadb97292 in context_switch (rf=0xffffbaf0000f3e88, next=0xffff9ecb04908000, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
...
730 970325 taskset 0xffffffffadb97292 in context_switch (rf=0xffffbaf006a0fd18, next=0xffff9ecb0aec0000, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
731 975217 sleep 0xffffffffadb97292 in context_switch (rf=0xffffbaf005743c20, next=0xffff9ecac0692880, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
732 975228 sleep 0xffffffffadb97292 in context_switch (rf=0xffffbaf00696fb58, next=0xffff9ecac0690000, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
...
876 976084 docker 0xffffffffadb97292 in context_switch (rf=0xffffbaf0153dbd10, next=0xffff9ecac0645100, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
877 976085 systemd-userwor 0xffffffffadb97292 in context_switch (rf=0xffffbaf0153cbc58, next=0xffff9ecac0645100, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
878 976086 systemd-userwor 0xffffffffadb97292 in context_switch (rf=0xffffbaf0153e3c58, next=0xffffffffaec15a40 <init_task>, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
879 976087 systemd-userwor 0xffffffffadb97292 in context_switch (rf=0xffffbaf0153ebc58, next=0xffffffffaec15a40 <init_task>, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
where "Target Id" contains the "PID COMM" of the task.
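For illustration only (not part of the patch): the "PID COMM" text can be sketched with the same "%7ld %s" format string that the patch passes to string_printf(). The helper format_target_id() and the "?" fallback for a missing command name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the Target Id text for one task: the pid right-aligned in a
 * 7-character field, one space, then the command name.
 * Returns what snprintf() returns (the length it wanted to write).
 */
static int format_target_id(char *buf, size_t len, long pid, const char *comm)
{
    assert(buf && len > 0);
    return snprintf(buf, len, "%7ld %s", pid, comm ? comm : "?");
}
```

The fixed-width pid field is what keeps the command names aligned in the "info threads" listing, and it explains the leading space in "[Switching to thread 731 ( 975217 sleep)]".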
Example of debugging "731 975217 sleep", a real case: trying to
figure out why sleep was stuck in uninterruptible sleep.
Backtrace using crash:
crash> ps | grep 975217
975217 969797 3 ffff9ecb3956a880 UN 0.0 0 0 sleep
crash> bt 975217
PID: 975217 TASK: ffff9ecb3956a880 CPU: 3 COMMAND: "sleep"
#0 [ffffbaf005743ba0] __schedule at ffffffffadb97292
#1 [ffffbaf005743c60] schedule at ffffffffadb982b8
#2 [ffffbaf005743c80] rwbase_write_lock at ffffffffadb9aed7
#3 [ffffbaf005743cc0] down_write at ffffffffadb9b133
#4 [ffffbaf005743cd0] unlink_file_vma at ffffffffad2b0e2e
#5 [ffffbaf005743cf8] free_pgtables at ffffffffad2a47b0
#6 [ffffbaf005743d88] exit_mmap at ffffffffad2b3b8d
#7 [ffffbaf005743e80] mmput at ffffffffad08c81f
#8 [ffffbaf005743e98] do_exit at ffffffffad09636c
#9 [ffffbaf005743ef8] do_group_exit at ffffffffad096c78
RIP: 00007f111c70ddf9 RSP: 00007fff451817e8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f111c8089e0 RCX: 00007f111c70ddf9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R8: ffffffffffffff80 R9: 0000000000000000
R10: 00007fff451817b0 R11: 0000000000000246 R12: 00007f111c8089e0
R13: 00007f111c80e2e0 R14: 0000000000000002 R15: 00007f111c80e2c8
ORIG_RAX: 00000000000000e7 CS: 0033 SS: 002b
Backtrace using gdb (note that the task must be selected by thread Id):
crash> thread 731
[Switching to thread 731 ( 975217 sleep)]
5372 switch_to(prev, next, prev);
crash> gdb bt
#0 0xffffffffadb97292 in context_switch (rf=0xffffbaf005743c20, next=0xffff9ecac0692880, prev=<optimized out>, rq=<optimized out>) at kernel/sched/core.c:5372
#1 __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6696
#2 0xffffffffadb982b8 in schedule () at kernel/sched/core.c:6772
#3 0xffffffffadb9aed7 in rwbase_write_lock (rwb=rwb@entry=0xffff9ecaf1831430, state=state@entry=2) at kernel/locking/rwbase_rt.c:259
#4 0xffffffffadb9b133 in __down_write (sem=sem@entry=0xffff9ecaf1831430) at kernel/locking/rwsem.c:1474
#5 down_write (sem=sem@entry=0xffff9ecaf1831430) at kernel/locking/rwsem.c:1574
#6 0xffffffffad2b0e2e in i_mmap_lock_write (mapping=<optimized out>) at ./include/linux/fs.h:466
#7 unlink_file_vma (vma=vma@entry=0xffff9ecadd566090) at mm/mmap.c:127
#8 0xffffffffad2a47b0 in free_pgtables (tlb=tlb@entry=0xffffbaf005743dd0, mt=mt@entry=0xffff9ecb28e3b180, vma=0xffff9ecadd566090, vma@entry=0xffff9ecadd566000, floor=floor@entry=0, ceiling=ceiling@entry=0) at mm/memory.c:431
#9 0xffffffffad2b3b8d in exit_mmap (mm=mm@entry=0xffff9ecb28e3b180) at mm/mmap.c:3237
#10 0xffffffffad08c81f in __mmput (mm=0xffff9ecb28e3b180) at kernel/fork.c:1204
#11 mmput (mm=mm@entry=0xffff9ecb28e3b180) at kernel/fork.c:1226
#12 0xffffffffad09636c in exit_mm () at kernel/exit.c:563
#13 do_exit (code=code@entry=0) at kernel/exit.c:856
#14 0xffffffffad096c78 in do_group_exit (exit_code=0) at kernel/exit.c:1019
#15 0xffffffffad096cf8 in __do_sys_exit_group (error_code=<optimized out>) at kernel/exit.c:1030
#16 __se_sys_exit_group (error_code=<optimized out>) at kernel/exit.c:1028
#17 __x64_sys_exit_group (regs=<optimized out>) at kernel/exit.c:1028
#18 0xffffffffadb8a327 in do_syscall_x64 (nr=<optimized out>, regs=0xffffbaf005743f58) at arch/x86/entry/common.c:51
#19 do_syscall_64 (regs=0xffffbaf005743f58, nr=<optimized out>) at arch/x86/entry/common.c:81
#20 0xffffffffadc000dc in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
#21 0x00007f111c80e2c8 in ?? ()
#22 0x0000000000000002 in ?? ()
#23 0x00007f111c80e2e0 in ?? ()
#24 0x00007f111c8089e0 in ?? ()
#25 0x0000000000000000 in ?? ()
crash> f 3
259 rwbase_schedule();
crash> p *rwb
$1 = {
readers = {
counter = 1
},
rtmutex = {
wait_lock = {
raw_lock = {
{
val = {
counter = 0
},
{
locked = 0 '\000',
pending = 0 '\000'
},
{
locked_pending = 0,
tail = 0
}
}
}
},
waiters = {
rb_root = {
rb_node = 0xffffbaf006977be0
},
rb_leftmost = 0xffffbaf00696fbe0
},
owner = 0xffff9ecb3956a881
}
}
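For illustration only (not part of the patch): since a task has to be selected by thread Id rather than by PID, a lookup from PID to table index can be sketched as below. The struct task_entry and the helper task_index_by_pid() are hypothetical. The mapping "thread Id = index + 1" is an assumption based on the listing above, where the first task appears as thread 1; it is not stated in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of one task table entry. */
struct task_entry {
    unsigned long pid;
    const char *comm;
};

/*
 * Return the 0-based index of the first entry with the given pid,
 * or -1 when no entry matches. Several tasks can share a pid
 * (all swapper tasks have pid 0), so only the first match is reported.
 */
static long task_index_by_pid(const struct task_entry *tab, size_t n,
                              unsigned long pid)
{
    assert(tab || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].pid == pid)
            return (long)i;
    return -1;
}
```

Under the stated assumption, the value to pass to the gdb "thread" command would be the returned index plus one.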
Additional changes:
1. Allow gdb "frame" command.
2. Blacklist useless gdb "gcore" command. Use gcore plugin instead.
3. Move crash_target_init() to a later point, as the crash target requires the
task list to be initialized.
Known issues and TBD items:
1. "info threads" may bail out the first time, throwing errors while trying
to access userspace addresses during unwinding. Subsequent "info threads"
invocations run without issues.
2. Unwinding the stack of an inactive task is supported only on modern Linux
versions that use inactive_task_frame, and only on the x86_64 architecture.
3. The gdb bt unwinder does not stop properly and may show invalid frames
(21-25 in the example above). Not a regression, existed before.
4. The gdb bt unwinder does not work on active tasks in userspace. Not a
regression, existed before.
5. Only the x86_64 architecture is supported. machdep->get_task_reg() must be
implemented for others. Not a regression, existed before.
6. Register fetching for active tasks is implemented only for VMware dumps, see
x86_64_get_task_reg() for more details. Not a regression, existed before.
Signed-off-by: Alexey Makhalov <alexey.makhalov(a)broadcom.com>
---
crash_target.c | 39 ++++++++++++++++++++---------
defs.h | 10 ++++++--
gdb-10.2.patch | 7 ++----
gdb_interface.c | 63 ++++++++++++++++++++++++++++++----------------
help.c | 1 +
main.c | 1 +
task.c | 1 +
x86_64.c | 66 +++++++++++++++++++++++++++++++++++++++++++------
8 files changed, 140 insertions(+), 48 deletions(-)
diff --git a/crash_target.c b/crash_target.c
index 4554806..2fdf203 100644
--- a/crash_target.c
+++ b/crash_target.c
@@ -2,6 +2,8 @@
* crash_target.c
*
* Copyright (c) 2021 VMware, Inc.
+ * Copyright (c) 2024 Broadcom. All Rights Reserved. The term "Broadcom"
+ * refers to Broadcom Inc. and/or its subsidiaries.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -13,7 +15,7 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
- * Author: Alexey Makhalov <amakhalov(a)vmware.com>
+ * Author: Alexey Makhalov <alexey.makhalov(a)broadcom.com>
*/
#include <defs.h>
@@ -23,11 +25,11 @@
#include "regcache.h"
#include "gdbarch.h"
-void crash_target_init (void);
-
+extern "C" void crash_target_init (void);
extern "C" int gdb_readmem_callback(unsigned long, void *, int, int);
-extern "C" int crash_get_nr_cpus(void);
-extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
+extern "C" int crash_get_nr_tasks(void);
+extern "C" void crash_get_task_info(int task_nr, unsigned long *pid, char **comm);
+extern "C" int crash_get_task_reg (int task_nr, int regno, const char *regname,
int regsize, void *val);
@@ -60,7 +62,13 @@ public:
bool has_registers () override { return true; }
bool thread_alive (ptid_t ptid) override { return true; }
std::string pid_to_str (ptid_t ptid) override
- { return string_printf ("CPU %ld", ptid.tid ()); }
+ {
+ unsigned long pid;
+ char *comm;
+
+ crash_get_task_info(ptid.tid(), &pid, &comm);
+ return string_printf ("%7ld %s", pid, comm);
+ }
};
@@ -68,18 +76,25 @@ public:
void
crash_target::fetch_registers (struct regcache *regcache, int regno)
{
+ int r;
gdb_byte regval[16];
- int cpu = inferior_ptid.tid();
+ int task_nr = inferior_ptid.tid();
struct gdbarch *arch = regcache->arch ();
- for (int r = 0; r < gdbarch_num_regs (arch); r++)
+ if (regno >= 0) {
+ r = regno;
+ goto onetime;
+ }
+
+ for (r = 0; regno == -1 && r < gdbarch_num_regs (arch); r++)
{
+onetime:
const char *regname = gdbarch_register_name(arch, r);
int regsize = register_size (arch, r);
if (regsize > sizeof (regval))
error (_("fatal error: buffer size is not enough to fit register value"));
- if (crash_get_cpu_reg (cpu, r, regname, regsize, (void *)&regval))
+ if (crash_get_task_reg (task_nr, r, regname, regsize, (void *)&regval))
regcache->raw_supply (r, regval);
else
regcache->raw_supply (r, NULL);
@@ -107,10 +122,10 @@ crash_target::xfer_partial (enum target_object object, const char *annex,
#define CRASH_INFERIOR_PID 1
-void
+extern "C" void
crash_target_init (void)
{
- int nr_cpus = crash_get_nr_cpus();
+ int nr_tasks = crash_get_nr_tasks();
crash_target *target = new crash_target ();
/* Own the target until it is successfully pushed. */
@@ -119,7 +134,7 @@ crash_target_init (void)
push_target (std::move (target_holder));
inferior_appeared (current_inferior (), CRASH_INFERIOR_PID);
- for (int i = 0; i < nr_cpus; i++)
+ for (int i = 0; i < nr_tasks; i++)
{
thread_info *thread = add_thread_silent (target,
ptid_t(CRASH_INFERIOR_PID, 0, i));
diff --git a/defs.h b/defs.h
index 98650e8..2b3f247 100644
--- a/defs.h
+++ b/defs.h
@@ -1080,7 +1080,7 @@ struct machdep_table {
void (*get_irq_affinity)(int);
void (*show_interrupts)(int, ulong *);
int (*is_page_ptr)(ulong, physaddr_t *);
- int (*get_cpu_reg)(int, int, const char *, int, void *);
+ int (*get_task_reg)(struct task_context *, int, const char *, int, void *);
int (*is_cpu_prstatus_valid)(int cpu);
};
@@ -2263,6 +2263,7 @@ struct size_table { /* stash of commonly-used sizes */
long pt_regs;
long task_struct;
long thread_info;
+ long inactive_task_frame;
long softirq_state;
long desc_struct;
long umode_t;
@@ -8001,9 +8002,14 @@ extern int have_full_symbols(void);
#define XEN_HYPERVISOR_ARCH
#endif
+/*
+ * crash_target.c
+ */
+extern void crash_target_init (void);
+
/*
* Register numbers must be in sync with gdb/features/i386/64bit-core.c
- * to make crash_target->fetch_registers() ---> machdep->get_cpu_reg()
+ * to make crash_target->fetch_registers() ---> machdep->get_task_reg()
* working properly.
*/
enum x86_64_regnum {
diff --git a/gdb-10.2.patch b/gdb-10.2.patch
index a7018a2..ecf673d 100644
--- a/gdb-10.2.patch
+++ b/gdb-10.2.patch
@@ -221,7 +221,7 @@ exit 0
warning (_("\
--- gdb-10.2/gdb/main.c.orig
+++ gdb-10.2/gdb/main.c
-@@ -392,6 +392,14 @@ start_event_loop ()
+@@ -392,6 +392,13 @@ start_event_loop ()
return;
}
@@ -230,7 +230,6 @@ exit 0
+extern "C" void main_loop(void);
+extern "C" unsigned long crash_get_kaslr_offset(void);
+extern "C" int console(const char *, ...);
-+void crash_target_init (void);
+#endif
+
/* Call command_loop. */
@@ -316,7 +315,7 @@ exit 0
}
}
-@@ -1242,6 +1274,16 @@ captured_main (void *data)
+@@ -1242,6 +1274,14 @@ captured_main (void *data)
captured_main_1 (context);
@@ -324,8 +323,6 @@ exit 0
+ /* Relocate the vmlinux. */
+ objfile_rebase (symfile_objfile, crash_get_kaslr_offset());
+
-+ crash_target_init();
-+
+ /* Back to crash. */
+ main_loop();
+#endif
diff --git a/gdb_interface.c b/gdb_interface.c
index b14319c..03178f5 100644
--- a/gdb_interface.c
+++ b/gdb_interface.c
@@ -3,6 +3,8 @@
* Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
* Copyright (C) 2002-2015,2018-2019 David Anderson
* Copyright (C) 2002-2015,2018-2019 Red Hat, Inc. All rights reserved.
+ * Copyright (c) 2024 Broadcom. All Rights Reserved. The term "Broadcom"
+ * refers to Broadcom Inc. and/or its subsidiaries.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -711,7 +713,7 @@ static char *prohibited_list[] = {
"watch", "rwatch", "awatch", "attach", "continue", "c", "fg", "detach",
"finish", "handle", "interrupt", "jump", "kill", "next", "nexti",
"signal", "step", "s", "stepi", "target", "until", "delete",
- "clear", "disable", "enable", "condition", "ignore", "frame", "catch",
+ "clear", "disable", "enable", "condition", "ignore", "gcore", "catch",
"tcatch", "return", "file", "exec-file", "core-file", "symbol-file",
"load", "si", "ni", "shell", "sy",
NULL /* must be last */
@@ -877,6 +879,7 @@ gdb_readmem_callback(ulong addr, void *buf, int len, int write)
switch (len)
{
case SIZEOF_8BIT:
+ fprintf(fp, "%s\n", pc->curcmd);
if (STREQ(pc->curcmd, "bt")) {
if (readmem(addr, memtype, buf, SIZEOF_8BIT,
"gdb_readmem_callback", readflags))
@@ -1063,34 +1066,52 @@ get_frame_offset(ulong pc)
unsigned long crash_get_kaslr_offset(void);
unsigned long crash_get_kaslr_offset(void)
{
- return kt->relocate * -1;
+ return kt->relocate * -1;
}
/* Callbacks for crash_target */
-int crash_get_nr_cpus(void);
-int crash_get_cpu_reg (int cpu, int regno, const char *regname,
+int crash_get_nr_tasks(void);
+void crash_get_task_info(int task_nr, unsigned long *pid, char **comm);
+int crash_get_task_reg (int task_nr, int regno, const char *regname,
int regsize, void *val);
-int crash_get_nr_cpus(void)
+int crash_get_nr_tasks(void)
{
- if (SADUMP_DUMPFILE())
- return sadump_get_nr_cpus();
- else if (DISKDUMP_DUMPFILE())
- return diskdump_get_nr_cpus();
- else if (KDUMP_DUMPFILE())
- return kdump_get_nr_cpus();
- else if (VMSS_DUMPFILE())
- return vmware_vmss_get_nr_cpus();
-
- /* Just CPU #0 */
- return 1;
+ return RUNNING_TASKS();
}
-int crash_get_cpu_reg (int cpu, int regno, const char *regname,
- int regsize, void *value)
+/* Get task information by its index number in TT */
+void crash_get_task_info(int task_nr, unsigned long *pid, char **comm)
{
- if (!machdep->get_cpu_reg)
- return FALSE;
- return machdep->get_cpu_reg(cpu, regno, regname, regsize, value);
+ int i;
+ struct task_context *tc;
+
+ tc = FIRST_CONTEXT();
+ for (i = 0; i < RUNNING_TASKS(); i++, tc++)
+ if (i == task_nr) {
+ *pid = tc->pid;
+ *comm = tc->comm;
+ return;
+ }
+ *pid = 0;
+ *comm = NULL;
+ return;
+}
+
+int crash_get_task_reg (int task_nr, int regno, const char *regname,
+ int regsize, void *value)
+{
+ int i;
+ struct task_context *tc;
+
+ if (!machdep->get_task_reg)
+ return FALSE;
+
+ tc = FIRST_CONTEXT();
+ for (i = 0; i < RUNNING_TASKS(); i++, tc++)
+ if (i == task_nr) {
+ return machdep->get_task_reg(tc, regno, regname, regsize, value);
+ }
+ return FALSE;
}
diff --git a/help.c b/help.c
index a9c4d30..85dbda5 100644
--- a/help.c
+++ b/help.c
@@ -8520,6 +8520,7 @@ char *version_info[] = {
"Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.",
"Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.",
"Copyright (C) 2015, 2021 VMware, Inc.",
+"Copyright (C) 2024 Broadcom, Inc.",
"This program is free software, covered by the GNU General Public License,",
"and you are welcome to change it and/or distribute copies of it under",
"certain conditions. Enter \"help copying\" to see the conditions.",
diff --git a/main.c b/main.c
index 0b6b927..13acd2d 100644
--- a/main.c
+++ b/main.c
@@ -794,6 +794,7 @@ main_loop(void)
} else
SIGACTION(SIGINT, restart, &pc->sigaction, NULL);
+ crash_target_init();
/*
* Display system statistics and current context.
*/
diff --git a/task.c b/task.c
index ebdb5be..5d26c52 100644
--- a/task.c
+++ b/task.c
@@ -298,6 +298,7 @@ task_init(void)
tt->flags |= THREAD_INFO;
}
+ STRUCT_SIZE_INIT(inactive_task_frame, "inactive_task_frame");
MEMBER_OFFSET_INIT(task_struct_state, "task_struct", "state");
MEMBER_SIZE_INIT(task_struct_state, "task_struct", "state");
if (INVALID_MEMBER(task_struct_state)) {
diff --git a/x86_64.c b/x86_64.c
index 502817d..b6e36a5 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -126,7 +126,7 @@ static int x86_64_get_framesize(struct bt_info *, ulong, ulong, char *);
static void x86_64_framesize_debug(struct bt_info *);
static void x86_64_get_active_set(void);
static int x86_64_get_kvaddr_ranges(struct vaddr_range *);
-static int x86_64_get_cpu_reg(int, int, const char *, int, void *);
+static int x86_64_get_task_reg(struct task_context *, int, const char *, int, void *);
static int x86_64_verify_paddr(uint64_t);
static void GART_init(void);
static void x86_64_exception_stacks_init(void);
@@ -195,7 +195,7 @@ x86_64_init(int when)
machdep->machspec->irq_eframe_link = UNINITIALIZED;
machdep->machspec->irq_stack_gap = UNINITIALIZED;
machdep->get_kvaddr_ranges = x86_64_get_kvaddr_ranges;
- machdep->get_cpu_reg = x86_64_get_cpu_reg;
+ machdep->get_task_reg = x86_64_get_task_reg;
if (machdep->cmdline_args[0])
parse_cmdline_args();
if ((string = pc->read_vmcoreinfo("relocate"))) {
@@ -891,7 +891,7 @@ x86_64_dump_machdep_table(ulong arg)
fprintf(fp, " is_page_ptr: x86_64_is_page_ptr()\n");
fprintf(fp, " verify_paddr: x86_64_verify_paddr()\n");
fprintf(fp, " get_kvaddr_ranges: x86_64_get_kvaddr_ranges()\n");
- fprintf(fp, " get_cpu_reg: x86_64_get_cpu_reg()\n");
+ fprintf(fp, " get_task_reg: x86_64_get_task_reg()\n");
fprintf(fp, " init_kernel_pgd: x86_64_init_kernel_pgd()\n");
fprintf(fp, "clear_machdep_cache: x86_64_clear_machdep_cache()\n");
fprintf(fp, " xendump_p2m_create: %s\n", PVOPS_XEN() ?
@@ -6398,6 +6398,9 @@ x86_64_ORC_init(void)
};
struct ORC_data *orc;
+ MEMBER_OFFSET_INIT(inactive_task_frame_bp, "inactive_task_frame", "bp");
+ MEMBER_OFFSET_INIT(inactive_task_frame_ret_addr, "inactive_task_frame", "ret_addr");
+
if (machdep->flags & FRAMEPOINTER)
return;
@@ -6455,9 +6458,6 @@ x86_64_ORC_init(void)
orc->__stop_orc_unwind = symbol_value("__stop_orc_unwind");
orc->orc_lookup = symbol_value("orc_lookup");
- MEMBER_OFFSET_INIT(inactive_task_frame_bp, "inactive_task_frame", "bp");
- MEMBER_OFFSET_INIT(inactive_task_frame_ret_addr, "inactive_task_frame", "ret_addr");
-
orc->has_signal = MEMBER_EXISTS("orc_entry", "signal"); /* added at 6.3 */
orc->has_end = MEMBER_EXISTS("orc_entry", "end"); /* removed at 6.4 */
@@ -9070,14 +9070,64 @@ x86_64_get_kvaddr_ranges(struct vaddr_range *vrp)
}
static int
-x86_64_get_cpu_reg(int cpu, int regno, const char *name,
+x86_64_get_task_reg(struct task_context *tc, int regno, const char *name,
int size, void *value)
{
if (regno >= LAST_REGNUM)
return FALSE;
+ /*
+ * For inactive task, grab rip, rbp, rbx, r12, r13, r14 and r15 from
+ * inactive_task_frame (see __switch_to_asm). Other regs saved on
+ * regular frame.
+ */
+ if (!is_task_active(tc->task)) {
+ int frame_size = STRUCT_SIZE("inactive_task_frame");
+
+ /* Only modern kernels supported. */
+ if (tt->flags & THREAD_INFO && frame_size == 7 * 8) {
+ ulong rsp;
+ int offset = 0;
+ switch (regno) {
+ case RSP_REGNUM:
+ readmem(tc->task + OFFSET(task_struct_thread) +
+ OFFSET(thread_struct_rsp), KVADDR,
+ &rsp, sizeof(void *),
+ "thread_struct rsp", FAULT_ON_ERROR);
+ rsp += frame_size;
+ memcpy(value, &rsp, size);
+ return TRUE;
+ case RIP_REGNUM:
+ offset += 8;
+ case RBP_REGNUM:
+ offset += 8;
+ case RBX_REGNUM:
+ offset += 8;
+ case R12_REGNUM:
+ offset += 8;
+ case R13_REGNUM:
+ offset += 8;
+ case R14_REGNUM:
+ offset += 8;
+ case R15_REGNUM:
+ readmem(tc->task + OFFSET(task_struct_thread) +
+ OFFSET(thread_struct_rsp), KVADDR,
+ &rsp, sizeof(void *),
+ "thread_struct rsp", FAULT_ON_ERROR);
+ readmem(rsp + offset, KVADDR, value, sizeof(void *),
+ "inactive_thread_frame saved regs", FAULT_ON_ERROR);
+ return TRUE;
+ }
+ }
+ /* TBD: older kernels support. */
+ return FALSE;
+ }
+
+ /*
+ * Task is active, grab CPU's registers
+ */
if (VMSS_DUMPFILE())
- return vmware_vmss_get_cpu_reg(cpu, regno, name, size, value);
+ return vmware_vmss_get_cpu_reg(tc->processor, regno, name, size, value);
return FALSE;
}
--
2.39.0
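The fall-through switch in x86_64_get_task_reg() above works because each saved register sits 8 bytes above the previous one, starting at the task's saved stack pointer. The sketch below restates those offsets with a struct and offsetof(). The struct layout is an assumption inferred from the patch's "frame_size == 7 * 8" check and its offset arithmetic, and every name ending in _sketch or starting with SK_ is hypothetical, not part of crash or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout of the frame that sits at thread.sp for a sleeping
 * x86_64 task: seven 8-byte slots, lowest address first.
 */
struct inactive_task_frame_sketch {
	uint64_t r15;      /* offset  0 */
	uint64_t r14;      /* offset  8 */
	uint64_t r13;      /* offset 16 */
	uint64_t r12;      /* offset 24 */
	uint64_t bx;       /* offset 32 */
	uint64_t bp;       /* offset 40 */
	uint64_t ret_addr; /* offset 48, reported as rip */
};

/* Hypothetical register ids, used only by this sketch. */
enum sketch_reg {
	SK_R15, SK_R14, SK_R13, SK_R12, SK_RBX, SK_RBP, SK_RIP, SK_RSP
};

/* Byte offset of a register from the saved sp, or -1 if not in the frame. */
static long sketch_frame_offset(enum sketch_reg reg)
{
	switch (reg) {
	case SK_R15: return (long)offsetof(struct inactive_task_frame_sketch, r15);
	case SK_R14: return (long)offsetof(struct inactive_task_frame_sketch, r14);
	case SK_R13: return (long)offsetof(struct inactive_task_frame_sketch, r13);
	case SK_R12: return (long)offsetof(struct inactive_task_frame_sketch, r12);
	case SK_RBX: return (long)offsetof(struct inactive_task_frame_sketch, bx);
	case SK_RBP: return (long)offsetof(struct inactive_task_frame_sketch, bp);
	case SK_RIP: return (long)offsetof(struct inactive_task_frame_sketch, ret_addr);
	default:     return -1; /* rsp is derived, not stored in the frame */
	}
}

/* rsp as reported for the task: the saved sp with the frame skipped. */
static uint64_t sketch_resumed_rsp(uint64_t saved_sp)
{
	return saved_sp + sizeof(struct inactive_task_frame_sketch);
}
```

Under this assumed layout the offsets match the switch: RIP accumulates six increments of 8 (48), RBP five (40), and R15 none (0), while RSP is the saved sp plus the 56-byte frame size.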
8 months, 1 week
[PATCH v5 ] Adding the zram decompression algorithm "lzo-rle"
by Yulong TANG 汤玉龙
In Linux 5.1, the ZRAM block driver changed its default compressor from "lzo" to "lzo-rle" to enhance LZO compression support. However, crash does not support the improved LZO algorithm, so reading zram-compressed memory fails.
Kernel commit that changed the default compressor: ce82f19fd5809f0cf87ea9f753c5cc65ca0673d6
The issue was discovered when using the 'gcore' extension to generate a process core dump: the resulting core file was incomplete and could not be opened properly with gdb.
This patch enables crash to decompress the "lzo-rle" compression algorithm used in zram. It has been tested with vmcore files from kernel version 5.4 and successfully allows memory compressed with this algorithm to be read.
Testing:
========
Before applying this patch:
crash> gcore -v 0 1
gcore: WARNING: only the lzo compressor is supported
gcore: WARNING: only the lzo compressor is supported
gcore: WARNING: only the lzo compressor is supported
After applying this patch:
crash> gcore -v 0
1 Saved core.1.init
Changelog:
==========
v2: keep the "if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)" code copied from the kernel, but turn the "if defined" macro into a runtime check.
v3: set a default value of HAVE_EFFICIENT_UNALIGNED_ACCESS depending on the architecture, for kernels without ikconfig.
v4: avoid checking CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on every call; move "include lzorle_decompress.h" from defs.h to diskdump.c.
v5: code-style cleanups and removal of redundant code; modified get_unaligned_le16() to handle big-endian data.
Patch:
==========
See attachment.
Thanks and regards,
Yulong
8 months, 1 week
Re: [PATCH v4 ] Adding the zram decompression algorithm "lzo-rle" to support kernel versions >= 5.1
by lijiang
On Fri, Apr 5, 2024 at 2:04 PM <devel-request(a)lists.crash-utility.osci.io>
wrote:
> Date: Fri, 5 Apr 2024 06:01:33 +0000
> From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
> Subject: [Crash-utility] Re: [PATCH v4 ] Adding the zram decompression
> algorithm "lzo-rle" to support kernel versions >= 5.1
> To: Yulong TANG 汤玉龙 <yulong.tang(a)nio.com>,
> "devel(a)lists.crash-utility.osci.io"
> <devel(a)lists.crash-utility.osci.io>
> Message-ID: <ce7328ee-aab4-4b72-85d5-77e5ef3aff9a(a)nec.com>
>
>
Hi, Yulong
I have one question about the following function:
+static uint16_t get_unaligned_le16(const void *p) {
+ uint16_t value;
+ memcpy(&value, p, sizeof(uint16_t));
+ return value;
+}
Is this meant to handle the data on a little-endian machine only? Because it
copies the bytes in host order, it may return different results on
big-endian and little-endian machines. Is that the expected behavior? I just
want to confirm with you.
Thanks
Lianbo
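The endianness question above can be made concrete. A memcpy() into a uint16_t yields the two bytes in host order, so the result depends on the machine running crash. A minimal sketch of a host-independent alternative follows, assuming the input is a little-endian 16-bit field at an arbitrary (possibly unaligned) address. The name get_unaligned_le16_portable is hypothetical and is not the function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read a 16-bit little-endian value from an unaligned pointer.
 * Assembling the value byte by byte avoids both alignment traps and
 * any dependence on the host's byte order.
 */
static uint16_t get_unaligned_le16_portable(const void *p)
{
	const uint8_t *b = (const uint8_t *)p;

	/* b[0] is the least significant byte, b[1] the most significant. */
	return (uint16_t)(b[0] | ((uint16_t)b[1] << 8));
}
```

With this form the bytes 0x34 0x12 decode to 0x1234 on both little-endian and big-endian hosts, whereas the memcpy() version would return 0x3412 on a big-endian host.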
8 months, 1 week
Re: [PATCH] gdb: fix the "p" command incorrectly print the value of a global variable
by lijiang
On Wed, Mar 6, 2024 at 6:30 PM Daisuke Hatayama (Fujitsu) <
d.hatayama(a)fujitsu.com> wrote:
> Lianbo,
>
> Thank you for your work.
>
> > Some objects format may potentially support copy relocations, but
> > currently the maybe_copied is always initialized to 0 in the symbol().
> > And the type is 'mst_file_bss', not always the 'mst_bss' or 'mst_data'
> > in the lookup_minimal_symbol_linkage(). For example:
> >
> > (gdb) p *msymbol
> > $42 = {<general_symbol_info> = {m_name = 0x349812f "test_no_static",
> value = {ivalue = 8, block = 0x8,
> > bytes = 0x8 <error: Cannot access memory at address 0x8>, address
> = 8, common_block = 0x8, chain = 0x8}, language_specific = {
> > obstack = 0x0, demangled_name = 0x0}, m_language = language_auto,
> ada_mangled = 0, section = 20}, size = 4,
> > filename = 0x6db3440 "test_sanity.c", type = mst_file_bss,
> created_by_gdb = 0, target_flag_1 = 0, target_flag_2 = 0, has_size = 1,
> > maybe_copied = 0, name_set = 1, hash_next = 0x0, demangled_hash_next =
> 0x0}
>
> The current description lacks explanation of when this issue
> occurs. Please write that the issue occurs when the corresponding
> kernel is built with CONFIG_CALL_DEPTH_TRACKING=y.
>
>
Thank you for the comment, Hatayama.
I should describe more background on this issue in the patch log. The
current issue can be easily reproduced with the following kernel commit:
commit 80e4c1cd42fff110bfdae8fce7ac4f22465f9664 (HEAD)
Author: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu Sep 15 13:11:19 2022 +0200
x86/retbleed: Add X86_FEATURE_CALL_DEPTH
Intel SKL CPUs fall back to other predictors when the RSB underflows. The
only microcode mitigation is IBRS which is insanely expensive. It comes
with performance drops of up to 30% depending on the workload.
A way less expensive, but nevertheless horrible mitigation is to track the
call depth in software and overeagerly fill the RSB when returns underflow
the software counter.
Provide a configuration symbol and a CPU misfeature bit.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20220915111147.056176424@infradead.org
After reverting the above commit, the issue may disappear. Originally I
tried to find out how this kernel commit affects gdb, but I have not found
a clue so far. Later I noticed that gdb gets the correct offset address of
the global variable 'test_no_static', which is expected behavior from the
gdb perspective because of copy relocations: some object files may
potentially support copy relocations, just like this one.
> It would also be good to describe the fact that the issue occurs at
> least on RHEL9 kernel.
>
This is an upstream issue; I have reproduced it on the upstream kernel with
the above kernel commit applied.
>
> > This causes a problem that the 'p' command can not work well as
> > expected, and always gets an error:
> >
> > crash> mod -s test_sanity /home/test_sanity.ko
> > MODULE NAME BASE SIZE
> OBJECT FILE
> > ffffffffc1084040 test_sanity ffffffffc1082000 16384
> /home/test_sanity.ko
> > crash> p test_no_static
> > p: gdb request failed: p test_no_static
> > crash>
> >
> > With the patch:
> > crash> mod -s test_sanity /home/test_sanity.ko
> > MODULE NAME BASE SIZE
> OBJECT FILE
> > ffffffffc1084040 test_sanity ffffffffc1082000 16384
> /home/test_sanity.ko
> > crash> p test_no_static
> > test_no_static = $1 = 5
> > crash>
>
> It's correct that the p command doesn't work as expected, but it doesn't
> always result in an error. This issue is a failure to calculate the
> relocated address of static symbols. If the calculated address happens
> to be one that can be read successfully, it doesn't result in a
> read error but outputs some bogus value.
>
That's true, but the bogus value is still not the expected result, because
it is read from an incorrect address.
That is why the maybe_copied flag is initialized to 1: as I mentioned
above, some object files may potentially support copy relocations.
Thanks.
Lianbo
>
> To make this clear, I think it's better to set debug level 4 and to
> have p command output calculated virtual address as debug messages.
>
> For example:
>
> crash> sym -M | grep -E " test_no"
> ffffffffc0da7580 (B) test_no
> ffffffffc0da7584 (b) test_no_static
> crash> set debug 4
> debug: 4
> crash> p test_no
> p: per_cpu_symbol_search(test_no): NULL
> test_no = <readmem: ffffffffc0da7580, KVADDR, "gdb_readmem callback",
> 4, (ROE), 560d2d483400>
> <read_diskdump: addr: ffffffffc0da7580 paddr: 10b263580 cnt: 4>
> $3 = 5
> crash> p test_no_static
> p: per_cpu_symbol_search(test_no_static): NULL
> test_no_static = <readmem: ffffffffc0d9f004, KVADDR, "gdb_readmem
> callback", 4, (ROE), 560d2dc9b100>
> <read_diskdump: addr: ffffffffc0d9f004 paddr: 108bfc004 cnt: 4>
> $4 = -1869574000
>
>
>
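The debug session quoted above contains enough numbers for a small worked check. The sketch below takes the address that "sym -M" reports for test_no_static, the address that the readmem callback actually used, and the printed value, then computes how far off the read was and what the value looks like as raw bits. The constants are copied from the quoted output; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted "set debug 4" session. */
#define SYM_ADDR  0xffffffffc0da7584ULL /* test_no_static per "sym -M" */
#define READ_ADDR 0xffffffffc0d9f004ULL /* address passed to readmem   */
#define BOGUS_VAL (-1869574000)         /* value printed as $4         */

/* Distance between the address that should have been read and the one used. */
static uint64_t addr_delta(uint64_t expected, uint64_t actual)
{
	return expected - actual;
}

/* Reinterpret the printed signed int as its raw 32-bit pattern. */
static uint32_t as_raw_bits(int32_t v)
{
	return (uint32_t)v;
}
```

The read lands 0x8580 bytes below the symbol's real address, and the printed value is the bit pattern 0x90909090. The byte 0x90 is the single-byte NOP opcode on x86, which may hint that the read hit padding rather than data, although the thread does not confirm where that address points.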
8 months, 1 week