Re: [PATCH] arm64: fix a potential segfault in arm64_unwind_frame
by lijiang
On Tue, Jul 23, 2024 at 5:25 PM <devel-request(a)lists.crash-utility.osci.io>
wrote:
> Date: Tue, 23 Jul 2024 06:51:36 -0000
> From: qiwu.chen(a)transsion.com
> Subject: [Crash-utility] Re: [PATCH] arm64: fix a potential segfault
> in arm64_unwind_frame
> To: devel(a)lists.crash-utility.osci.io
> Message-ID: <20240723065136.28594.35349(a)lists.crash-utility.osci.io>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Lianbo,
> KASAN is not enabled in the vmcore for my case.
> Thanks
>
>
OK, thank you for trying this. It seems they are different issues.
Could you please post a v2 (improved as we discussed)?
Thanks
Lianbo
Re: [PATCH] arm64: fix a potential segfault in arm64_unwind_frame
by Lianbo Jiang
On 7/18/24 6:21 PM, devel-request(a)lists.crash-utility.osci.io wrote:
> Date: Thu, 18 Jul 2024 07:26:02 -0000
> From:qiwu.chen@transsion.com
> Subject: [Crash-utility] Re: [PATCH] arm64: fix a potential segfault
> in arm64_unwind_frame
> To:devel@lists.crash-utility.osci.io
> Message-ID:<20240718072602.21739.62283(a)lists.crash-utility.osci.io>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Lianbo,
>
> 1. The current issue can be reproduced with arm64_unwind_frame_v2():
Thank you for the confirmation, qiwu.
If so, the same change should be made in arm64_unwind_frame_v2().
What do you think?
Thanks
Lianbo
> crash> bt
> [Detaching after fork from child process 4778]
>
> Thread 1 "crash" received signal SIGSEGV, Segmentation fault.
> 0x0000555555826dae in arm64_unwind_frame_v2 (bt=0x7fffffffd8f0, frame=0x7fffffffd060, ofp=0x555559909970) at arm64.c:3048
> 3048 frame->pc = GET_STACK_ULONG(fp + 8);
> (gdb) bt
> #0 0x0000555555826dae in arm64_unwind_frame_v2 (bt=0x7fffffffd8f0, frame=0x7fffffffd060, ofp=0x555559909970) at arm64.c:3048
> #1 0x0000555555827d99 in arm64_back_trace_cmd_v2 (bt=0x7fffffffd8f0) at arm64.c:3426
> #2 0x00005555557df95e in back_trace (bt=0x7fffffffd8f0) at kernel.c:3240
> #3 0x00005555557dd8b8 in cmd_bt () at kernel.c:2881
> #4 0x000055555573696b in exec_command () at main.c:893
> #5 0x000055555573673e in main_loop () at main.c:840
> #6 0x0000555555aa4a61 in captured_main (data=<optimized out>) at main.c:1284
> #7 gdb_main (args=<optimized out>) at main.c:1313
> #8 0x0000555555aa4ae0 in gdb_main_entry (argc=<optimized out>, argv=<optimized out>) at main.c:1338
> #9 0x00005555558021df in gdb_main_loop (argc=2, argv=0x7fffffffe248) at gdb_interface.c:81
> #10 0x0000555555736401 in main (argc=3, argv=0x7fffffffe248) at main.c:721
> (gdb) p/x *(struct arm64_stackframe *)0x7fffffffd060
> $1 = {fp = 0xffffffc008003f50, sp = 0xffffffc008003f40, pc = 0xffffffdfd669447c}
> (gdb) p/x *(struct bt_info *)0x7fffffffd8f0
> $2 = {task = 0xffffff8118012500, flags = 0x0, instptr = 0xffffffdfd669447c, stkptr = 0xffffffc008003f40, bptr = 0x0, stackbase = 0xffffffc01b5b0000, stacktop = 0xffffffc01b5b4000,
> stackbuf = 0x555556117a80, tc = 0x55557a3b3480, hp = 0x0, textlist = 0x0, ref = 0x0, frameptr = 0xffffffc008003f50, call_target = 0x0,
> machdep = 0x0, debug = 0x0, eframe_ip = 0x0, radix = 0x0, cpumask = 0x0}
>
> 2. The issue can be easily reproduced by "echo c > /proc/sysrq-trigger"
> on an Android GKI-5.10 platform.
> From the reproduced dump we can see that the current fp/sp of the crashing
> cpu1 is outside the task's stack range and is located in the irq stack of cpu0:
> KERNEL: vmlinux [TAINTED]
> DUMPFILE: SYS_COREDUMP
> CPUS: 8 [OFFLINE: 7]
> MACHINE: aarch64 (unknown Mhz)
> MEMORY: 8 GB
> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
> PID: 9089
> COMMAND: "sh"
> TASK: ffffff8118012500 [THREAD_INFO: ffffff8118012500]
> CPU: 1
> STATE: TASK_RUNNING (PANIC)
> crash> help -m |grep irq
> irq_stack_size: 16384
> irq_stacks[0]: ffffffc008000000
> irq_stacks[1]: ffffffc008008000
> irq_stacks[2]: ffffffc008010000
> irq_stacks[3]: ffffffc008018000
> irq_stacks[4]: ffffffc008020000
> irq_stacks[5]: ffffffc008028000
> irq_stacks[6]: ffffffc008030000
> irq_stacks[7]: ffffffc008038000
> crash> task_struct.thread -x ffffff8118012500
> thread = {
> cpu_context = {
> x19 = 0xffffff80c01ea500,
> x20 = 0xffffff8118012500,
> x21 = 0xffffff8118012500,
> x22 = 0xffffff80c01ea500,
> x23 = 0xffffff8118012500,
> x24 = 0xffffff81319ac270,
> x25 = 0xffffffdfd8f87000,
> x26 = 0xffffff8118012500,
> x27 = 0xffffffdfd88ea180,
> x28 = 0xffffffdfd7e1b4b8,
> fp = 0xffffffc01b5b3a10,
> sp = 0xffffffc01b5b3a10,
> pc = 0xffffffdfd667b89c
> },
> crash> bt -S 0xffffffc01b5b3a10
> PID: 9089 TASK: ffffff8118012500 CPU: 1 COMMAND: "sh"
> #0 [ffffffc008003f50] local_cpu_stop at ffffffdfd6694478
> crash> bt -S ffffffc008003f50
> PID: 9089 TASK: ffffff8118012500 CPU: 1 COMMAND: "sh"
> bt: non-process stack address for this task: ffffffc008003f50
> (valid range: ffffffc01b5b0000 - ffffffc01b5b4000)
>
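The observation above (an fp that lies outside the task stack but inside cpu0's IRQ stack) can be made concrete with a minimal C sketch, which is not crash code. The table is copied from the "help -m" output above. The helper name irq_stack_cpu() is hypothetical, and the sketch assumes that each IRQ stack spans [irq_stacks[cpu], irq_stacks[cpu] + irq_stack_size).

```c
#include <stddef.h>
#include <stdint.h>

/* IRQ stack bases copied from the "help -m" output above (8 CPUs). */
static const uint64_t irq_stacks[] = {
	0xffffffc008000000ULL, 0xffffffc008008000ULL,
	0xffffffc008010000ULL, 0xffffffc008018000ULL,
	0xffffffc008020000ULL, 0xffffffc008028000ULL,
	0xffffffc008030000ULL, 0xffffffc008038000ULL,
};
static const uint64_t irq_stack_size = 16384;	/* irq_stack_size above */

/*
 * Hypothetical helper: return the CPU whose IRQ stack contains addr,
 * or -1 if addr is not inside any of them.
 */
static int irq_stack_cpu(uint64_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(irq_stacks) / sizeof(irq_stacks[0]); i++) {
		if (addr >= irq_stacks[i] &&
		    addr < irq_stacks[i] + irq_stack_size)
			return (int)i;
	}
	return -1;
}
```

With this table, irq_stack_cpu(0xffffffc008003f50) returns 0 (cpu0's IRQ stack) even though the panicking task ran on cpu1. The saved cpu_context fp, 0xffffffc01b5b3a10, returns -1 because it lies in the task stack.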
> Starting from the second frame, the backtrace switches to the irq stack of cpu0.
> crash> rd ffffffc008003f50 2
> ffffffc008003f50: ffffffc008003f70 ffffffdfd68125d0 p?.......%......
> crash> dis -x ffffffdfd68125d0
> 0xffffffdfd68125d0 <handle_percpu_devid_fasteoi_ipi+0xb0>: b 0xffffffdfd6812730 <handle_percpu_devid_fasteoi_ipi+0x210>
> crash> rd ffffffc008003f70 2
> ffffffc008003f70: ffffffc008003fb0 ffffffdfd680352c .?......,5......
> crash> dis ffffffdfd680352c -x
> 0xffffffdfd680352c <__handle_domain_irq+0x114>: bl 0xffffffdfd673d204 <__irq_exit_rcu>
> crash> rd ffffffc008003fb0 2
> ffffffc008003fb0: ffffffc008003fe0 ffffffdfd6610380 .?........a.....
> crash> dis -x ffffffdfd6610380
> 0xffffffdfd6610380 <gic_handle_irq.30555+0x6c>: cbz w0, 0xffffffdfd6610348 <gic_handle_irq.30555+0x34>
> crash> rd ffffffc008003fe0 2
> ffffffc008003fe0: ffffffdfd8d83e20 ffffffdfd6612624 >......$&a.....
> crash> dis -x ffffffdfd6612624
> 0xffffffdfd6612624 <el1_irq+0xe4>: mov sp, x19
> crash> rd ffffffdfd8d83e20 2
> ffffffdfd8d83e20: ffffffdfd8d83e80 ffffffdfd768c690 .>........h.....
> crash> dis -x ffffffdfd768c690
> 0xffffffdfd768c690 <cpuidle_enter_state+0x3a4>: tbnz w19, #31, 0xffffffdfd768c720 <cpuidle_enter_state+0x434>
> crash> rd ffffffdfd8d83e80 2
> ffffffdfd8d83e80: ffffffdfd8d83ef0 ffffffdfd67ab4f4 .>........z.....
> crash> dis -x ffffffdfd67ab4f4
> 0xffffffdfd67ab4f4 <do_idle+0x308>: str xzr, [x19, #8]
> crash> rd ffffffdfd8d83ef0 2
> ffffffdfd8d83ef0: ffffffdfd8d83f50 ffffffdfd67ab7e4 P?........z.....
> crash> dis -x ffffffdfd67ab7e4
> 0xffffffdfd67ab7e4 <cpu_startup_entry+0x84>: b 0xffffffdfd67ab7e0 <cpu_startup_entry+0x80>
>
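The rd/dis sequence above follows AArch64 frame records by hand. The 8 bytes at fp hold the caller's fp, and the 8 bytes at fp + 8 hold the return address, which matches the GET_STACK_ULONG(fp + 8) line in the gdb output above.

The following minimal C sketch performs the same walk. It uses a hypothetical in-memory table built from the rd output instead of real dump memory, and the helper names are illustrative only. It shows how bounding every fp by a stack range stops the walk before it wanders into cpu0's IRQ stack.

```c
#include <stddef.h>
#include <stdint.h>

/* One frame record as read with "rd <fp> 2". */
struct frame_record {
	uint64_t fp;		/* address of the record */
	uint64_t next_fp;	/* value at fp: the caller's frame pointer */
	uint64_t lr;		/* value at fp + 8: the return address */
};

/* Records copied from the rd output above. */
static const struct frame_record records[] = {
	{ 0xffffffc008003f50ULL, 0xffffffc008003f70ULL, 0xffffffdfd68125d0ULL },
	{ 0xffffffc008003f70ULL, 0xffffffc008003fb0ULL, 0xffffffdfd680352cULL },
	{ 0xffffffc008003fb0ULL, 0xffffffc008003fe0ULL, 0xffffffdfd6610380ULL },
	{ 0xffffffc008003fe0ULL, 0xffffffdfd8d83e20ULL, 0xffffffdfd6612624ULL },
	{ 0xffffffdfd8d83e20ULL, 0xffffffdfd8d83e80ULL, 0xffffffdfd768c690ULL },
	{ 0xffffffdfd8d83e80ULL, 0xffffffdfd8d83ef0ULL, 0xffffffdfd67ab4f4ULL },
	{ 0xffffffdfd8d83ef0ULL, 0xffffffdfd8d83f50ULL, 0xffffffdfd67ab7e4ULL },
};

/* Stand-in for reading dump memory: find the record stored at fp. */
static const struct frame_record *lookup_record(uint64_t fp)
{
	size_t i;

	for (i = 0; i < sizeof(records) / sizeof(records[0]); i++)
		if (records[i].fp == fp)
			return &records[i];
	return NULL;
}

/*
 * Follow the fp chain while each fp lies in [lo, hi), storing up to max
 * return addresses in out[]. Returns the number of frames collected.
 */
static size_t walk_frames(uint64_t fp, uint64_t lo, uint64_t hi,
			  uint64_t *out, size_t max)
{
	size_t n = 0;

	while (n < max && fp >= lo && fp < hi) {
		const struct frame_record *r = lookup_record(fp);

		if (r == NULL)		/* memory not captured in this example */
			break;
		out[n++] = r->lr;
		fp = r->next_fp;
	}
	return n;
}
```

Walking from 0xffffffc008003f50 with no bound collects the seven return addresses, from handle_percpu_devid_fasteoi_ipi to cpu_startup_entry. Bounding the walk to the task stack [0xffffffc01b5b0000, 0xffffffc01b5b4000) collects none.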
> It is unreasonable for cpu1 to be in cpu0's irq context, which is far from
> the backtrace shown by "bt -T", so we must avoid this case.
> crash> bt -T
> PID: 9089 TASK: ffffff8118012500 CPU: 1 COMMAND: "sh"
> [ffffffc01b5b3238] vsnprintf at ffffffdfd7075c10
> [ffffffc01b5b32b8] sprintf at ffffffdfd707b9e4
> [ffffffc01b5b3398] __sprint_symbol at ffffffdfd68abff4
> [ffffffc01b5b33c8] symbol_string at ffffffdfd70774ac
> [ffffffc01b5b33d8] symbol_string at ffffffdfd7077510
> [ffffffc01b5b34c8] string at ffffffdfd70767a8
> [ffffffc01b5b34d8] vsnprintf at ffffffdfd7075c2c
> [ffffffc01b5b34e8] vsnprintf at ffffffdfd7075fdc
> [ffffffc01b5b3518] vscnprintf at ffffffdfd707b8b4
> [ffffffc01b5b3558] ktime_get_ts64 at ffffffdfd686a2f8
> [ffffffc01b5b3598] data_alloc at ffffffdfd68009b4
> [ffffffc01b5b35d8] prb_reserve at ffffffdfd68011b4
> [ffffffc01b5b35e8] prb_reserve at ffffffdfd68010a0
> [ffffffc01b5b3648] log_store at ffffffdfd67fb024
> [ffffffc01b5b3698] number at ffffffdfd7076ea4
> [ffffffc01b5b36d8] number at ffffffdfd7076ea4
> [ffffffc01b5b3738] vsnprintf at ffffffdfd7075c10
> [ffffffc01b5b3778] number at ffffffdfd7076ea4
> [ffffffc01b5b37c8] number at ffffffdfd7076ea4
> [ffffffc01b5b3828] vsnprintf at ffffffdfd7075c10
> [ffffffc01b5b3868] vsnprintf at ffffffdfd7075c2c
> [ffffffc01b5b3888] number at ffffffdfd7076ea4
> [ffffffc01b5b38e8] vsnprintf at ffffffdfd7075c10
> [ffffffc01b5b3928] vsnprintf at ffffffdfd7075c2c
> [ffffffc01b5b3968] aee_nested_printf at ffffffdfd3d05d7c [mrdump]
> [ffffffc01b5b3a48] mrdump_common_die at ffffffdfd3d05a98 [mrdump]
> [ffffffc01b5b3ac8] ipanic at ffffffdfd3d06078 [mrdump]
> [ffffffc01b5b3ae8] __typeid__ZTSFiP14notifier_blockmPvE_global_addr at ffffffdfd7e3c118
> [ffffffc01b5b3af0] ipanic.cfi_jt at ffffffdfd3d0ab40 [mrdump]
> [ffffffc01b5b3b18] atomic_notifier_call_chain at ffffffdfd678236c
> [ffffffc01b5b3b28] panic at ffffffdfd672aa28
> [ffffffc01b5b3bf8] rcu_read_unlock.34874 at ffffffdfd718ddfc
> [ffffffc01b5b3c58] __handle_sysrq at ffffffdfd718d3c0
> [ffffffc01b5b3c68] write_sysrq_trigger at ffffffdfd718ea20
> [ffffffc01b5b3cb0] __typeid__ZTSFlP4filePKcmPxE_global_addr at ffffffdfd7e2fa88
> [ffffffc01b5b3cc8] proc_reg_write at ffffffdfd6c56f44
> [ffffffc01b5b3cd8] file_start_write at ffffffdfd6b3f884
> [ffffffc01b5b3d08] vfs_write at ffffffdfd6b403b4
> [ffffffc01b5b3da8] ksys_write at ffffffdfd6b40240
> [ffffffc01b5b3df8] __arm64_sys_write at ffffffdfd6b401b4
> [ffffffc01b5b3e20] __typeid__ZTSFlPK7pt_regsE_global_addr at ffffffdfd7e25240
> [ffffffc01b5b3e38] el0_svc_common at ffffffdfd6695980
> [ffffffc01b5b3e48] el0_da at ffffffdfd7d43e30
> [ffffffc01b5b3e58] el0_svc at ffffffdfd7d43d9c
> [ffffffc01b5b3e98] el0_sync_handler at ffffffdfd7d43d10
> [ffffffc01b5b3ea8] el0_sync at ffffffdfd66128b8
>
> Thanks
Re: [PATCH] arm64: fix a potential segfault in arm64_unwind_frame
by lijiang
Hi, qiwu
Thank you for the update.
On Mon, Jul 15, 2024 at 11:52 AM <devel-request(a)lists.crash-utility.osci.io>
wrote:
> Date: Sun, 14 Jul 2024 11:38:27 -0000
> From: qiwu.chen(a)transsion.com
> Subject: [Crash-utility] Re: [PATCH] arm64: fix a potential segfault
> in arm64_unwind_frame
> To: devel(a)lists.crash-utility.osci.io
> Message-ID: <20240714113827.21739.63969(a)lists.crash-utility.osci.io>
> Content-Type: text/plain; charset="utf-8"
>
> Sorry, the patch in my previous mail was a mistake. Please help review the
> patch below, which tests fine:
> diff --git a/arm64.c b/arm64.c
> index b3040d7..b992c01 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -2814,7 +2814,7 @@ arm64_unwind_frame(struct bt_info *bt, struct arm64_stackframe *frame)
> low = frame->sp;
> high = (low + stack_mask) & ~(stack_mask);
>
> - if (fp < low || fp > high || fp & 0xf)
> + if (fp < low || fp > high || fp & 0xf || !INSTACK(fp, bt))
> return FALSE;
>
> frame->sp = fp + 0x10;
>
I see similar code in arm64_unwind_frame_v2(). Can you help check
whether the current issue can be reproduced with bt -o/-O (although
-o/-O may only be used with some old vmcores)? If so, we need to make
the same change in arm64_unwind_frame_v2().
BTW: I cannot reproduce the current issue. Can you share how to
reproduce it (if possible)?
Thanks
Lianbo
>
> Thanks
>
>
Re: [PATCH] Fix "irq -a" exceeding the memory range issue
by Lianbo Jiang
Hi, Tao
On 7/5/24 9:26 AM, devel-request(a)lists.crash-utility.osci.io wrote:
> Date: Thu, 4 Jul 2024 17:00:56 +1200
> From: Tao Liu<ltao(a)redhat.com>
> Subject: [Crash-utility] [PATCH] Fix "irq -a" exceeding the memory
> range issue
> To:devel@lists.crash-utility.osci.io
> Cc: Tao Liu<ltao(a)redhat.com>
> Message-ID:<20240704050056.17375-1-ltao(a)redhat.com>
> Content-Type: text/plain; charset="US-ASCII"; x-default=true
>
> Without the patch, the following error was observed:
>
> crash> irq -a
> IRQ NAME AFFINITY
> 0 timer 0-191
> 4 ttyS0 0-23,96-119
> ...
> 84 smartpqi 72-73,168
> irq: page excluded: kernel virtual address: ffff97d03ffff000 type: "irq_desc affinity"
>
> The reason is that reading the irq affinity exceeds the valid memory range;
> see the following debug info:
>
> Thread 1 "crash" hit Breakpoint 1, generic_get_irq_affinity (irq=85) at kernel.c:7373
> 7375 irq_desc_addr = get_irq_desc_addr(irq);
> (gdb) p/x irq_desc_addr
> $1 = 0xffff97d03f21e800
>
> crash> struct irq_desc 0xffff97d03f21e800
> struct irq_desc {
> irq_common_data = {
> state_use_accessors = 425755136,
> node = 3,
> handler_data = 0x0,
> msi_desc = 0xffff97ca51b83480,
> affinity = 0xffff97d03fffee60,
> effective_affinity = 0xffff97d03fffe6c0
> },
>
> crash> whatis cpumask_t
> typedef struct cpumask {
> unsigned long bits[128];
> } cpumask_t;
> SIZE: 1024
>
> To get the affinity, crash reads the memory range 0xffff97d03fffee60
> ~ 0xffff97d03fffee60 + 1024 (0x400) at this line:
>
> readmem(affinity_ptr, KVADDR, affinity, len,
> "irq_desc affinity", FAULT_ON_ERROR);
>
> However, the read exceeds the valid memory range:
>
> crash> kmem 0xffff97d03fffee60
> CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
> ffff97c900044400 32 123297 162944 1273 4k kmalloc-32
> SLAB MEMORY NODE TOTAL ALLOCATED FREE
> fffffca460ffff80 ffff97d03fffe000 3 128 81 47
> FREE / [ALLOCATED]
> [ffff97d03fffee60]
>
> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> fffffca460ffff80 83fffe000 dead000000000001 ffff97d03fffe340 1 d7ffffe0000800 slab
>
> crash> kmem ffff97d03ffff000
> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> fffffca460ffffc0 83ffff000 0 0 1 d7ffffe0004000 reserved
>
> crash> dmesg
> ...
> [ 0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000fe00ffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffefff] usable
> [ 0.000000] BIOS-e820: [mem 0x000000083ffff000-0x000000083fffffff] reserved
> ...
>
> The starting physical address, 0x83fffe000, is located in the usable
> area and is readable; however, the later physical addresses, starting from
> 0x83ffff000, are located in a reserved region and are not readable. In fact,
> the affinity member is allocated by alloc_cpumask_var_node(); on this 192-CPU
> system the allocated size is only 24 bytes, which is why it sits in
> the kmalloc-32 slab. So it is incorrect to read 1024 bytes (given by
> STRUCT_SIZE("cpumask_t")); 24 bytes are enough.
>
> Since there are plenty of places in crash that take the value of
> STRUCT_SIZE("cpumask_t") and have worked fine in the past, this patch does
> not modify them all, only the place that encountered the issue.
>
> Signed-off-by: Tao Liu<ltao(a)redhat.com>
> ---
> kernel.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel.c b/kernel.c
> index 8a9d498..464e877 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -7362,7 +7362,7 @@ void
> generic_get_irq_affinity(int irq)
> {
> ulong irq_desc_addr;
> - long len;
> + long len, len_cpumask;
> ulong affinity_ptr;
> ulong *affinity;
> ulong tmp_addr;
> @@ -7382,8 +7382,11 @@ generic_get_irq_affinity(int irq)
> if (!action)
> return;
>
> - if ((len = STRUCT_SIZE("cpumask_t")) < 0)
> - len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong);
> + len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong);
> + len_cpumask = STRUCT_SIZE("cpumask_t");
> + if (len_cpumask > 0) {
> + len = len_cpumask > len ? len : len_cpumask;
> + }
>
This change looks good, but I still have two comments below:
[1] Can we drop the evaluation of "STRUCT_SIZE("cpumask_t")" and just
use the size "DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong)"?
Are there any regression issues?
[2] There is a similar case in get_cpumask_buf() (see tools.c);
can you make the same change?
ulong *
get_cpumask_buf(void)
{
int cpulen;
if ((cpulen = STRUCT_SIZE("cpumask_t")) < 0)
cpulen = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
sizeof(ulong);
return (ulong *)GETBUF(cpulen);
}
Any thoughts?
Thanks
Lianbo
> affinity = (ulong *)GETBUF(len);
> if (VALID_MEMBER(irq_common_data_affinity))
> -- 2.40.1
[PATCH v2] Fix "irq -a" exceeding the memory range issue
by Tao Liu
Without the patch, the following error was observed:
crash> irq -a
IRQ NAME AFFINITY
0 timer 0-191
4 ttyS0 0-23,96-119
...
84 smartpqi 72-73,168
irq: page excluded: kernel virtual address: ffff97d03ffff000 type: "irq_desc affinity"
The reason is that reading the irq affinity exceeds the valid memory range;
see the following debug info:
Thread 1 "crash" hit Breakpoint 1, generic_get_irq_affinity (irq=85) at kernel.c:7373
7375 irq_desc_addr = get_irq_desc_addr(irq);
(gdb) p/x irq_desc_addr
$1 = 0xffff97d03f21e800
crash> struct irq_desc 0xffff97d03f21e800
struct irq_desc {
irq_common_data = {
state_use_accessors = 425755136,
node = 3,
handler_data = 0x0,
msi_desc = 0xffff97ca51b83480,
affinity = 0xffff97d03fffee60,
effective_affinity = 0xffff97d03fffe6c0
},
crash> whatis cpumask_t
typedef struct cpumask {
unsigned long bits[128];
} cpumask_t;
SIZE: 1024
To get the affinity, crash reads the memory range 0xffff97d03fffee60
~ 0xffff97d03fffee60 + 1024 (0x400) at this line:
readmem(affinity_ptr, KVADDR, affinity, len,
"irq_desc affinity", FAULT_ON_ERROR);
However, the read exceeds the valid memory range:
crash> kmem 0xffff97d03fffee60
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff97c900044400 32 123297 162944 1273 4k kmalloc-32
SLAB MEMORY NODE TOTAL ALLOCATED FREE
fffffca460ffff80 ffff97d03fffe000 3 128 81 47
FREE / [ALLOCATED]
[ffff97d03fffee60]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffffca460ffff80 83fffe000 dead000000000001 ffff97d03fffe340 1 d7ffffe0000800 slab
crash> kmem ffff97d03ffff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffffca460ffffc0 83ffff000 0 0 1 d7ffffe0004000 reserved
crash> dmesg
...
[ 0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000fe00ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffefff] usable
[ 0.000000] BIOS-e820: [mem 0x000000083ffff000-0x000000083fffffff] reserved
...
The starting physical address, 0x83fffe000, is located in the usable
area and is readable; however, the later physical addresses, starting from
0x83ffff000, are located in a reserved region and are not readable. In fact,
the affinity member is allocated by alloc_cpumask_var_node(); on this 192-CPU
system the allocated size is only 24 bytes, which is why it sits in
the kmalloc-32 slab. So it is incorrect to read 1024 bytes (given by
STRUCT_SIZE("cpumask_t")); 24 bytes are enough.
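The page arithmetic behind this analysis can be checked with a small C sketch. It assumes 4 KB pages, which holds on x86_64 and is consistent with the kmem output above. page_base() and crosses_page() are hypothetical helpers, not crash functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KB pages */

/* Page-aligned base of addr. */
static uint64_t page_base(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* True if a len-byte read starting at start touches more than one page (len > 0). */
static bool crosses_page(uint64_t start, uint64_t len)
{
	return page_base(start) != page_base(start + len - 1);
}
```

A 1024-byte read from 0xffff97d03fffee60 ends in page 0xffff97d03ffff000, which is the reserved page reported in the "page excluded" error. A 24-byte read stays within the slab page 0xffff97d03fffe000.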
Since there are plenty of places in crash that take the value of
STRUCT_SIZE("cpumask_t") and have worked fine in the past, this patch does
not modify them all. It modifies only the one that encountered this issue (the hunk in
kernel.c) and the one with the same DIV_ROUND_UP() pattern (the hunk in tools.c).
Signed-off-by: Tao Liu <ltao(a)redhat.com>
---
v1 -> v2: modify the same hunk in tools.c
---
kernel.c | 9 ++++++---
tools.c | 11 ++++++++---
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/kernel.c b/kernel.c
index 8a9d498..464e877 100644
--- a/kernel.c
+++ b/kernel.c
@@ -7362,7 +7362,7 @@ void
generic_get_irq_affinity(int irq)
{
ulong irq_desc_addr;
- long len;
+ long len, len_cpumask;
ulong affinity_ptr;
ulong *affinity;
ulong tmp_addr;
@@ -7382,8 +7382,11 @@ generic_get_irq_affinity(int irq)
if (!action)
return;
- if ((len = STRUCT_SIZE("cpumask_t")) < 0)
- len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong);
+ len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong);
+ len_cpumask = STRUCT_SIZE("cpumask_t");
+ if (len_cpumask > 0) {
+ len = len_cpumask > len ? len : len_cpumask;
+ }
affinity = (ulong *)GETBUF(len);
if (VALID_MEMBER(irq_common_data_affinity))
diff --git a/tools.c b/tools.c
index 1022d57..fab0f83 100644
--- a/tools.c
+++ b/tools.c
@@ -6718,9 +6718,14 @@ swap64(uint64_t val, int swap)
ulong *
get_cpumask_buf(void)
{
- int cpulen;
- if ((cpulen = STRUCT_SIZE("cpumask_t")) < 0)
- cpulen = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong);
+ int cpulen, len_cpumask;
+
+ cpulen = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) * sizeof(ulong);
+ len_cpumask = STRUCT_SIZE("cpumask_t");
+ if (len_cpumask > 0) {
+ cpulen = len_cpumask > cpulen ? cpulen : len_cpumask;
+ }
+
return (ulong *)GETBUF(cpulen);
}
--
2.40.1
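The length selection in v2 can be sketched as a standalone C function. The sketch makes the following assumptions:

- It assumes a 64-bit host, so BITS_PER_LONG is 64.
- It defines its own DIV_ROUND_UP() and BITS_PER_LONG instead of using crash's headers.
- It uses a non-positive struct_size to model STRUCT_SIZE("cpumask_t") being unavailable.

cpumask_read_len() is a hypothetical name.

```c
/* Local stand-ins for the crash macros (assumed equivalent). */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define BITS_PER_LONG		((long)(8 * sizeof(unsigned long)))

/*
 * Bytes to read for a CPU mask: the size implied by the CPU count,
 * capped by the debuginfo struct size when that size is known (> 0).
 */
static long cpumask_read_len(long nr_cpus, long struct_size)
{
	long len = DIV_ROUND_UP(nr_cpus, BITS_PER_LONG) *
		   (long)sizeof(unsigned long);

	if (struct_size > 0 && struct_size < len)
		len = struct_size;
	return len;
}
```

With 192 CPUs and the 1024-byte cpumask_t from the debuginfo, the read length becomes 24 bytes, which matches the kmalloc-32 allocation described above.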
[PATCH] arm64: fix a potential segfault in arm64_unwind_frame
by qiwu.chen@transsion.com
The range of frame->fp is checked insufficiently, which may lead to a wrong
next fp. As a result, bt->stackbuf is accessed out of range, causing a segfault.
crash> bt
[Detaching after fork from child process 11409]
PID: 7661 TASK: ffffff81858aa500 CPU: 4 COMMAND: "sh"
#0 [ffffffc008003f50] local_cpu_stop at ffffffdd7669444c
Thread 1 "crash" received signal SIGSEGV, Segmentation fault.
0x00005555558266cc in arm64_unwind_frame (bt=0x7fffffffd8f0, frame=0x7fffffffd080) at arm64.c:2821
2821 frame->fp = GET_STACK_ULONG(fp);
(gdb) bt
#0 0x00005555558266cc in arm64_unwind_frame (bt=0x7fffffffd8f0, frame=0x7fffffffd080) at arm64.c:2821
#1 0x0000555555827527 in arm64_back_trace_cmd (bt=0x7fffffffd8f0) at arm64.c:3306
#2 0x00005555557df7ee in back_trace (bt=0x7fffffffd8f0) at kernel.c:3240
#3 0x00005555557dd748 in cmd_bt () at kernel.c:2881
#4 0x00005555557367fb in exec_command () at main.c:893
#5 0x00005555557365ce in main_loop () at main.c:840
#6 0x0000555555aa4801 in captured_main (data=<optimized out>) at main.c:1284
#7 gdb_main (args=<optimized out>) at main.c:1313
#8 0x0000555555aa4880 in gdb_main_entry (argc=<optimized out>, argv=<optimized out>) at main.c:1338
#9 0x000055555580206f in gdb_main_loop (argc=2, argv=0x7fffffffe248) at gdb_interface.c:81
#10 0x0000555555736291 in main (argc=3, argv=0x7fffffffe248) at main.c:721
(gdb) p /x *(struct bt_info*) 0x7fffffffd8f0
$3 = {task = 0xffffff81858aa500, flags = 0x0, instptr = 0xffffffdd76694450, stkptr = 0xffffffc008003f40, bptr = 0x0, stackbase = 0xffffffc027288000,
stacktop = 0xffffffc02728c000, stackbuf = 0x555556115a40, tc = 0x55559d16fdc0, hp = 0x0, textlist = 0x0, ref = 0x0, frameptr = 0xffffffc008003f50,
call_target = 0x0, machdep = 0x0, debug = 0x0, eframe_ip = 0x0, radix = 0x0, cpumask = 0x0}
(gdb) p /x *(struct arm64_stackframe*) 0x7fffffffd080
$4 = {fp = 0xffffffc008003f50, sp = 0xffffffc008003f60, pc = 0xffffffdd76694450}
crash> bt -S 0xffffffc008003f50
PID: 7661 TASK: ffffff81858aa500 CPU: 4 COMMAND: "sh"
bt: non-process stack address for this task: ffffffc008003f50
(valid range: ffffffc027288000 - ffffffc02728c000)
Check the frame->fp value sufficiently before accessing it. Only a frame->fp within
the range of bt->stackbase and bt->stacktop is regarded as valid.
Signed-off-by: qiwu.chen <qiwu.chen(a)transsion.com>
---
arm64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arm64.c b/arm64.c
index b3040d7..b992c01 100644
--- a/arm64.c
+++ b/arm64.c
@@ -2814,7 +2814,7 @@ arm64_unwind_frame(struct bt_info *bt, struct arm64_stackframe *frame)
low = frame->sp;
high = (low + stack_mask) & ~(stack_mask);
- if (fp < low || fp > high || fp & 0xf)
+ if (fp < low || fp > high || fp & 0xf || !arm64_is_kernel_exception_frame(bt, fp))
return FALSE;
frame->sp = fp + 0x10;
--
2.25.1
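The intent of this check can be illustrated with a standalone C sketch. The follow-up diff in this thread uses INSTACK(fp, bt) as the extra condition.

The sketch makes the following assumptions:

- The struct and helper names are hypothetical.
- stack_mask is 16 KB - 1 (0x3fff).
- The starting sp is taken to be bt->stkptr (0xffffffc008003f40) from the gdb output above.
- in_task_stack() is meant to behave like crash's INSTACK(), i.e. stackbase <= addr < stacktop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the bt_info fields used by the check. */
struct stack_bounds {
	uint64_t stackbase;	/* like bt->stackbase */
	uint64_t stacktop;	/* like bt->stacktop */
};

/* Assumed to behave like INSTACK(): stackbase <= addr < stacktop. */
static bool in_task_stack(uint64_t addr, const struct stack_bounds *b)
{
	return addr >= b->stackbase && addr < b->stacktop;
}

/* Pre-patch condition: fp between sp and the next stack boundary, 16-byte aligned. */
static bool fp_passes_old_check(uint64_t fp, uint64_t sp, uint64_t stack_mask)
{
	uint64_t low = sp;
	uint64_t high = (low + stack_mask) & ~stack_mask;

	return !(fp < low || fp > high || (fp & 0xf));
}

/* The old condition plus the task-stack range check. */
static bool fp_is_valid(uint64_t fp, uint64_t sp, uint64_t stack_mask,
			const struct stack_bounds *b)
{
	return fp_passes_old_check(fp, sp, stack_mask) && in_task_stack(fp, b);
}
```

With the values from the report, fp 0xffffffc008003f50 passes the old check because it is aligned and within 16 KB of sp. It is nevertheless rejected once the task stack [0xffffffc027288000, 0xffffffc02728c000) is enforced. A made-up in-stack pair, such as fp 0xffffffc02728bd40 with sp 0xffffffc02728bd00, is still accepted.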
Re: arm64 16KB page size support feasibility?
by Lianbo Jiang
Hi, jarvis
On 7/11/24 6:42 AM, devel-request(a)lists.crash-utility.osci.io wrote:
> Date: Mon, 08 Jul 2024 12:25:43 -0000
> From:jarvis0922@gmail.com
> Subject: [Crash-utility] arm64 16KB page size support feasibility?
> To:devel@lists.crash-utility.osci.io
> Message-ID:<20240708122543.30295.97706(a)lists.crash-utility.osci.io>
> Content-Type: text/plain; charset="utf-8"
>
> Hi crash utility developers,
>
> It seems AOSP has started to support a 16KB page size for arm64:
> https://source.android.com/docs/core/architecture/16kb-page-size/16kb-dev...
>
> But https://github.com/crash-utility/crash/blob/master/arm64.c#L265 has been there for 5+ years.
> Is there any plan for the crash utility to support it?
Thanks for your attention.
For the time being, I have no plans to support it.
However, if you are interested and would like to post a patch, it is
welcome, and we can help review it.
Thanks
Lianbo
> Thanks.
arm64: Fix bt command show wrong stacktrace on ramdump source
by cb126yx
Hi all,
I may found a potential bug when using qcom arm64 ramdump to parse backtrace.
Unfortunately, I actually found no processes can use the bt command correctly.
Ex: when start crash tool to do analyse: # crash vmlinux --kaslr=xxx DDRCS0_0.BIN@0x0000000080000000,... --machdep vabits_actual=39
Then seen below misleading backtrace information :
crash> bt 16930
PID: 16930 TASK: ffffff89b3eada00 CPU: 2 COMMAND: "Firebase Backgr"
#0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
#1 [ffffffc034c43850] __kvm_nvhe_$d.2314 at 6be732e004cf05a0
#2 [ffffffc034c438b0] __kvm_nvhe_$d.2314 at 86c54c6004ceff80
#3 [ffffffc034c43950] __kvm_nvhe_$d.2314 at 55d6f96003a7b120
#4 [ffffffc034c439f0] __kvm_nvhe_$d.2314 at 9ccec46003a80a64
#5 [ffffffc034c43ac0] __kvm_nvhe_$d.2314 at 8cf41e6003a945c4
#6 [ffffffc034c43b10] __kvm_nvhe_$d.2314 at a8f181e00372c818
#7 [ffffffc034c43b40] __kvm_nvhe_$d.2314 at 6dedde600372c0d0
#8 [ffffffc034c43b90] __kvm_nvhe_$d.2314 at 62cc07e00373d0ac
#9 [ffffffc034c43c00] __kvm_nvhe_$d.2314 at 72fb1de00373bedc
...
PC: 00000073f5294840 LR: 00000070d8f39ba4 SP: 00000070d4afd5d0
X29: 00000070d4afd600 X28: b4000071efcda7f0 X27: 00000070d4afe000
X26: 0000000000000000 X25: 00000070d9616000 X24: 0000000000000000
X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000000000
X20: b40000728fd27520 X19: b40000728fd27550 X18: 000000702daba000
X17: 00000073f5294820 X16: 00000070d940f9d8 X15: 00000000000000bf
X14: 0000000000000000 X13: 00000070d8ad2fac X12: b40000718fce5040
X11: 0000000000000000 X10: 0000000000000070 X9: 0000000000000001
X8: 0000000000000062 X7: 0000000000000020 X6: 0000000000000000
X5: 0000000000000000 X4: 0000000000000000 X3: 0000000000000000
X2: 0000000000000002 X1: 0000000000000080 X0: b40000728fd27550
ORIG_X0: b40000728fd27550 SYSCALLNO: ffffffff PSTATE: 40001000
Checking the raw data below, we can see that the lr (fp+8) values are pointers whose upper bits have already been replaced by a PAC.
crash> bt -f
PID: 16930 TASK: ffffff89b3eada00 CPU: 2 COMMAND: "Firebase Backgr"
#0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
ffffffc034c437f0: ffffffc034c43850 6be732e004cf05a4
ffffffc034c43800: ffffffe006186108 a0ed07e004cf09c4
ffffffc034c43810: ffffff8a1a340000 ffffff8a8d343c00
ffffffc034c43820: ffffff89b3eada00 ffffff8b780db540
ffffffc034c43830: ffffff89b3eada00 0000000000000000
ffffffc034c43840: 0000000000000004 712b828118484a00
#1 [ffffffc034c43850] __kvm_nvhe_$d.2314 at 6be732e004cf05a0
ffffffc034c43850: ffffffc034c438b0 86c54c6004ceff84
ffffffc034c43860: 000000708070f000 ffffffc034c43938
ffffffc034c43870: ffffff88bd822878 ffffff89b3eada00
...
So we check CONFIG_ARM64_PTR_AUTH and CONFIG_ARM64_PTR_AUTH_KERNEL to confirm whether the PAC mechanism is enabled in this ramdump.
Then we use vabits to strip the PAC bits.
With the fix, the correct backtrace is shown:
crash> bt 16930
PID: 16930 TASK: ffffff89b3eada00 CPU: 2 COMMAND: "Firebase Backgr"
#0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
#1 [ffffffc034c43850] __schedule at ffffffe004cf05a0
#2 [ffffffc034c438b0] preempt_schedule_common at ffffffe004ceff80
#3 [ffffffc034c43950] unmap_page_range at ffffffe003a7b120
#4 [ffffffc034c439f0] unmap_vmas at ffffffe003a80a64
#5 [ffffffc034c43ac0] exit_mmap at ffffffe003a945c4
#6 [ffffffc034c43b10] __mmput at ffffffe00372c818
#7 [ffffffc034c43b40] mmput at ffffffe00372c0d0
#8 [ffffffc034c43b90] exit_mm at ffffffe00373d0ac
#9 [ffffffc034c43c00] do_exit at ffffffe00373bedc
PC: 00000073f5294840 LR: 00000070d8f39ba4 SP: 00000070d4afd5d0
X29: 00000070d4afd600 X28: b4000071efcda7f0 X27: 00000070d4afe000
X26: 0000000000000000 X25: 00000070d9616000 X24: 0000000000000000
X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000000000
X20: b40000728fd27520 X19: b40000728fd27550 X18: 000000702daba000
X17: 00000073f5294820 X16: 00000070d940f9d8 X15: 00000000000000bf
X14: 0000000000000000 X13: 00000070d8ad2fac X12: b40000718fce5040
X11: 0000000000000000 X10: 0000000000000070 X9: 0000000000000001
X8: 0000000000000062 X7: 0000000000000020 X6: 0000000000000000
X5: 0000000000000000 X4: 0000000000000000 X3: 0000000000000000
X2: 0000000000000002 X1: 0000000000000080 X0: b40000728fd27550
ORIG_X0: b40000728fd27550 SYSCALLNO: ffffffff PSTATE: 40001000
Let's use GENMASK to strip the PAC from the pointer to fix it.
The related GKI commit is here:
https://lore.kernel.org/all/20230412160134.306148-4-mark.rutland@arm.com/
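A minimal sketch of the GENMASK idea for kernel text addresses follows, assuming vabits_actual = 39 as on the command line above. A kernel address has all bits from vabits_actual up to 63 set, so OR-ing them back in removes the PAC.

GENMASK_ULL() is defined locally to mirror the Linux macro, and strip_kernel_pac() is a hypothetical name. User-space pointers would need different handling and are not covered here.

```c
#include <stdint.h>

/* Local equivalent of the kernel's GENMASK_ULL(h, l): bits l..h set. */
#define GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Restore the all-ones upper bits of a PAC-signed kernel address. */
static uint64_t strip_kernel_pac(uint64_t addr, unsigned int vabits)
{
	return addr | GENMASK_ULL(63, vabits);
}
```

Applied to the bt -f values 6be732e004cf05a4 and 86c54c6004ceff84, this yields ffffffe004cf05a4 and ffffffe004ceff84. These are 4 bytes past the __schedule and preempt_schedule_common addresses in the fixed backtrace.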
[PATCH] Fix a "Bus error" issue caused by 'crash --osrelease' or crash loading
by Lianbo Jiang
Sometimes, in a production environment, there are still some vmcores
that are incomplete, for example with a partial header or corrupted data.
When the crash tool attempts to parse such vmcores, it may fail as below:
$ ./crash --osrelease vmcore
Bus error (core dumped)
or
$ crash vmlinux vmcore
...
Bus error (core dumped)
$
gdb shows that the crash tool reads a null bitmap from the header of
this vmcore, and memcpy() then raises a SIGBUS error as below:
$ gdb /home/lijiang/src/crash/crash /tmp/core.126301
Core was generated by `./crash --osrelease /home/lijiang/src/39317/vmcore'.
Program terminated with signal SIGBUS, Bus error.
#0 __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
831 LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7))
(gdb) bt
#0 __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
#1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at diskdump.c:820
#2 0x0000000000651cf3 in is_diskdump (file=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at diskdump.c:1042
#3 0x0000000000502ac9 in get_osrelease (dumpfile=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at main.c:1938
#4 0x00000000004fb2e8 in main (argc=3, argv=0x7ffc59dde3a8) at main.c:271
(gdb) frame 1
#1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f "/home/lijiang/src/39317/vmcore") at diskdump.c:820
820 memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2,
(gdb) p dd->dumpable_bitmap
$1 = 0x7f8e89800010 ""
(gdb) p dd->bitmap
$2 = 0x7f8e87e09000 ""
(gdb) p dd->bitmap + bitmap_len/2
$3 = 0x7f8e88a17000 ""
(gdb) p *(char*)(dd->bitmap+bitmap_len/2)
$4 = 0 '\000'
(gdb) p bitmap_len/2
$5 = 12640256
Let's add a sanity check for such cases to avoid causing a SIGBUS error.
With the patch:
$ crash -s vmlinux vmcore
crash: vmcore: not a supported file format
...
Enter "crash -h" for details.
$ crash --osrelease vmcore
unknown
Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
---
diskdump.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/diskdump.c b/diskdump.c
index 1f7118cacfc6..c31eab01aa05 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -814,10 +814,12 @@ restart:
madvise(dd->bitmap, bitmap_len, MADV_WILLNEED);
}
- if (dump_is_partial(header))
+ if (dump_is_partial(header)) {
+ if (*(char*)(dd->bitmap + bitmap_len/2) == '\0')
+ goto err;
memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2,
bitmap_len/2);
- else
+ } else
memcpy(dd->dumpable_bitmap, dd->bitmap, bitmap_len);
dd->data_offset
--
2.45.1
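The control flow of the new check can be sketched in isolation. copy_dumpable_bitmap() is hypothetical and works on an ordinary buffer rather than crash's header data. It returns -1 where the patch jumps to err. Like the patch, it treats a NUL first byte in the second half of a partial dump's bitmap as corrupt.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the dumpable bitmap into dst: the second half for a partial dump,
 * the whole bitmap otherwise. Returns 0 on success, -1 if a partial dump's
 * second half starts with a NUL byte (the case the patch rejects).
 */
static int copy_dumpable_bitmap(char *dst, const char *bitmap,
				size_t bitmap_len, int is_partial)
{
	if (is_partial) {
		if (bitmap[bitmap_len / 2] == '\0')
			return -1;	/* the patch takes "goto err" here */
		memcpy(dst, bitmap + bitmap_len / 2, bitmap_len / 2);
	} else {
		memcpy(dst, bitmap, bitmap_len);
	}
	return 0;
}
```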