On Mon, Apr 17, 2017 at 09:05:12AM -0400, Dave Anderson wrote:
>
>
> ----- Original Message -----
> > Hi All,
> >
> > I tried to use `bt -a' on an arm64 platform, and crash hit a segmentation
> > fault. My crash binary is built from the source code hosted on github,
> > and my kernel version is 4.4.35.
>
> I note your reference to github, but what version of crash are you using?
> The only thing that comes to mind is this fix that went into
> crash-7.1.8:
>
> - Fix for the ARM64 "bt" command in Linux 4.10 and later kernels that
> are configured with CONFIG_THREAD_INFO_IN_TASK. Without the patch,
> the "bt" command will fail for active tasks in dumpfiles that were
> generated by the kdump facility.
> (takahiro.akashi(a)linaro.org)
>
> But since you are using kernel version 4.4.35, that is presumably not
> the problem.
Thank you for the rapid response.
I'm using the most current code, which already contains this patch.
Its version is 7.1.8++.
>
> > I tried to use gdb to examine this problem. Some information is shown
> > below:
> >
> > (gdb) bt
> > #0 arm64_is_kernel_exception_frame (bt=bt@entry=0x7ffeba6577e0, stkptr=stkptr@entry=18446743803091823872) at arm64.c:1504
> > #1 0x00000000004fbda8 in arm64_back_trace_cmd (bt=0x7ffeba6577e0) at arm64.c:2259
> > #2 0x00000000004d415c in back_trace (bt=bt@entry=0x7ffeba6577e0) at kernel.c:3063
> > #3 0x00000000004dee87 in cmd_bt () at kernel.c:2701
> > [...]
> > (gdb) p/x stkptr
> > $14 = 0xffffffc0fded2d00
> > (gdb) p/x bt->stackbase
> > $15 = 0xffffff8008dcc000
> >
> > As shown, (stkptr - bt->stackbase) is far too large, which makes the
> > index in bt->stackbuf[(ulong)(STACK_OFFSET_TYPE(stkptr))] go out of
> > bounds. This stack belongs to swapper/0. I'm not sure whether it is a bug.
> > Could anyone give me some advice on how to solve this problem? Thank you!
>
> The closest sample arm64 kernel I have available is 4.5-based, and looking
> at the kernel virtual address space, both the stkptr and stackbase values
> above are out of range, so I'm not sure what's going on:
>
> crash> mach
> MACHINE TYPE: aarch64
> MEMORY SIZE: 16 GB
> CPUS: 1
> HZ: 1000
> PAGE SIZE: 65536
> KERNEL VIRTUAL BASE: ffff800000000000
> KERNEL VMALLOC BASE: ffff000000000000
> KERNEL MODULES BASE: ffff7ffffc000000
> KERNEL VMEMMAP BASE: ffff7fbfe0000000
> KERNEL STACK SIZE: 16384
> IRQ STACK SIZE: 16384
> IRQ STACKS:
> CPU 0: ffff8003ffe30020
> CPU 1: ffff8003ffe60020
> CPU 2: ffff8003ffe90020
> CPU 3: ffff8003ffec0020
> CPU 4: ffff8003ffef0020
> CPU 5: ffff8003fff20020
> CPU 6: ffff8003fff50020
> CPU 7: ffff8003fff80020
> crash>
I'm afraid I don't follow you. Do you mean you cannot reproduce this
behavior? Since the index is (stkptr - bt->stackbase), I think it
should be OK as long as both are in the same range (both larger than
PAGE_OFFSET or both smaller than PAGE_OFFSET). On further inspection,
only `bt 0' crashes; bt on other threads is OK. I suspect swapper
should also use a stack address above PAGE_OFFSET (for my board,
PAGE_OFFSET is 0xffffffc000000000).
No, I cannot reproduce it, but I don't have a 4.4-based dumpfile.
On my 4.5-based kernel, the thread_info pointers of each thread
of pid 0, which are their stackbase addresses, are these:
crash> for 0 set | grep THREAD_INFO
TASK: ffff800000c32100 (1 of 8) [THREAD_INFO: ffff800000bf0000]
TASK: ffff8003dc0c9100 (1 of 8) [THREAD_INFO: ffff8003dc128000]
TASK: ffff8003dc0c9f80 (1 of 8) [THREAD_INFO: ffff8003dc12c000]
TASK: ffff8003dc0cae00 (1 of 8) [THREAD_INFO: ffff8003dc130000]
TASK: ffff8003dc0cbc80 (1 of 8) [THREAD_INFO: ffff8003dc134000]
TASK: ffff8003dc0ccb00 (1 of 8) [THREAD_INFO: ffff8003dc138000]
TASK: ffff8003dc0cd980 (1 of 8) [THREAD_INFO: ffff8003dc13c000]
TASK: ffff8003dc0ce800 (1 of 8) [THREAD_INFO: ffff8003dc140000]
crash>
The stackbase of cpu 0 points to the static "init_thread_union"
structure, and the stacks of the other 7 cpus are kmalloc'd. They are
all in unity-mapped virtual memory (starting at ffff800000000000, which
is based upon a VA_BITS value of 48), and all of the IRQ stacks are
in that memory segment as well.
So does your bt->stackbase of 0xffffff8008dcc000 reflect the
address of "init_thread_union" (presuming it is pid 0 on cpu 0)?
And does your stkptr value of 0xffffffc0fded2d00 point into the
cpu's IRQ stack? If not, it should point into the regular stack
for pid 0 on that cpu.
So anyway, it looks like the "stackframe.fp" value used at line 2259
of arm64.c is invalid for use as the stack pointer. But without
the dumpfile in front of me, I can't figure it out. If you can
make the vmlinux/vmcore pair available to me, please send me
instructions offline on how to download them.
Dave