And a fix for the phantom exception frame issue has been checked in:
https://github.com/crash-utility/crash/commit/14b3eadfd8cfafa19115c06aa4e...
Fix for the ARM64 "bt" command in Linux 4.5 and later kernels which
are not configured with CONFIG_FUNCTION_GRAPH_TRACER. Without the
patch, backtraces that originate from a per-cpu IRQ stack will dump
an invalid exception frame before transitioning to the process stack.
(anderson(a)redhat.com)
OK, thanks.
But those two patches contain trailing whitespace at the end of
some lines :)
I also have some concerns about the "bt -f" output.
I will send out a separate patch for that issue.
Thanks,
-Takahiro AKASHI
Dave
----- Original Message -----
>
>
> Hello Takahiro,
>
> I went ahead and checked in a fix for the user-space backtrace issue here:
>
>
https://github.com/crash-utility/crash/commit/2d53b97a476e71bfd5e2054d64a...
>
> Fix for the ARM64 "bt" command in Linux 4.5 and later kernels which
> use per-cpu IRQ stacks. Without the patch, if an active non-crashing
> task was running in user space when it received the shutdown IPI from
> the crashing task, the "-- <IRQ stack> ---" transition marker from
> the IRQ stack to the process stack is not displayed, and a message
> indicating "bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp:
> <address> fp: 0 (?)" gets displayed.
> (anderson(a)redhat.com)
>
>
> The "phantom" exception frames in your 4.7 kernel vmcore are seen because
> your kernel doesn't have CONFIG_FUNCTION_GRAPH_TRACER configured, and
> therefore __in_irqentry_text() is a no-op:
>
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> static inline int __in_irqentry_text(unsigned long ptr)
> {
>         extern char __irqentry_text_start[];
>         extern char __irqentry_text_end[];
>
>         return ptr >= (unsigned long)&__irqentry_text_start &&
>                ptr < (unsigned long)&__irqentry_text_end;
> }
> #else
> static inline int __in_irqentry_text(unsigned long ptr)
> {
>         return 0;
> }
> #endif
>
> static inline int in_exception_text(unsigned long ptr)
> {
>         extern char __exception_text_start[];
>         extern char __exception_text_end[];
>         int in;
>
>         in = ptr >= (unsigned long)&__exception_text_start &&
>              ptr < (unsigned long)&__exception_text_end;
>
>         return in ? : __in_irqentry_text(ptr);
> }
>
> In my Linux 4.5 kernel, CONFIG_FUNCTION_GRAPH_TRACER is configured,
> and as a result, "gic_handle_irq" is outside of the range from
> __exception_text_start to __exception_text_end:
>
> crash> sym -l
> ... [ cut ] ...
> ffff800000090000 (T) __exception_text_start
> ffff800000090000 (T) _stext
> ffff800000090000 (T) do_undefinstr
> ffff8000000901d8 (T) do_debug_exception
> ffff800000090294 (T) do_mem_abort
> ffff800000090348 (T) do_sp_pc_abort
> ffff800000090428 (T) __exception_text_end
> ffff800000090428 (T) __irqentry_text_start
> ffff800000090428 (t) gic_handle_irq
> ffff8000000904e0 (t) gic_handle_irq
> ffff800000090670 (T) __entry_text_start
> ffff800000090670 (T) __irqentry_text_end
> ...
>
> In your Linux 4.7 kernel, gic_handle_irq is located within
> the range, and as a result, the phantom exception frame gets
> dumped:
>
> crash> sym -l
> ... [ cut ] ...
> ffff000008081000 (T) __exception_text_start
> ffff000008081000 (T) _stext
> ffff000008081000 (T) do_undefinstr
> ffff000008081000 (t) efi_header_end
> ffff000008081248 (T) do_mem_abort
> ffff0000080812e8 (T) do_sp_pc_abort
> ffff0000080813c0 (T) do_debug_exception
> ffff000008081460 (t) sun4i_handle_irq
> ffff0000080814d0 (t) gic_handle_irq
> ffff000008081580 (t) gic_handle_irq
> ffff0000080816e0 (T) __exception_text_end
> ...
>
> The crash utility's in_exception_frame() function is based upon an older
> kernel version of this check, from before __irqentry_text_start and
> __irqentry_text_end existed.
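>
> As a rough sketch, the newer kernel's range check could be mirrored on
> the crash side with the symbol values, assuming crash's symbol_exists()
> and symbol_value() helpers (the function name here is hypothetical, not
> the checked-in code):
>
>         static int
>         arm64_in_exception_text_sketch(ulong ptr)
>         {
>                 ulong start, end;
>
>                 start = symbol_value("__exception_text_start");
>                 end = symbol_value("__exception_text_end");
>                 if ((ptr >= start) && (ptr < end))
>                         return TRUE;
>
>                 /* Also honor the __irqentry_text range when it exists. */
>                 if (symbol_exists("__irqentry_text_start") &&
>                     symbol_exists("__irqentry_text_end")) {
>                         start = symbol_value("__irqentry_text_start");
>                         end = symbol_value("__irqentry_text_end");
>                         if ((ptr >= start) && (ptr < end))
>                                 return TRUE;
>                 }
>
>                 return FALSE;
>         }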
>
> So two things need to be fixed in the crash utility:
>
> (1) the __irqentry_text_start and __irqentry_text_end range must
> be checked by in_exception_text() if they exist, and
> (2) this IRQ stack kludge that was added to the kernel's dump_backtrace()
> function needs to be handled the same way in the crash utility:
>
> if (in_exception_text(where)) {
>         /*
>          * If we switched to the irq_stack before calling this
>          * exception handler, then the pt_regs will be on the
>          * task stack. The easiest way to tell is if the large
>          * pt_regs would overlap with the end of the irq_stack.
>          */
>         if (stack < irq_stack_ptr &&
>             (stack + sizeof(struct pt_regs)) > irq_stack_ptr)
>                 stack = IRQ_STACK_TO_TASK_STACK(irq_stack_ptr);
>
>         dump_mem("", "Exception stack", stack,
>                  stack + sizeof(struct pt_regs), false);
> }
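>
> For reference, a minimal sketch of how that overlap test might be
> expressed with the values the crash utility tracks during an unwind
> (the helper name and parameters are illustrative, not the actual patch):
>
>         static int
>         exception_frame_on_task_stack(ulong stack, ulong irq_stack_ptr,
>                                       long sizeof_pt_regs)
>         {
>                 /*
>                  * If a pt_regs placed at "stack" would straddle the end
>                  * of the per-cpu IRQ stack, the exception frame really
>                  * lives on the process stack; the unwind should continue
>                  * there, where the kernel's IRQ_STACK_TO_TASK_STACK()
>                  * macro finds the saved task stack pointer.
>                  */
>                 return (stack < irq_stack_ptr) &&
>                        ((stack + sizeof_pt_regs) > irq_stack_ptr);
>         }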
>
> I'm working on a patch for the above as we speak.
>
> Thanks,
> Dave
>
>
> ----- Original Message -----
> >
> >
> >
> > ----- Original Message -----
> >
> > > > But I'm not sure what happens when an arm64 IRQ exception occurs when
> > > > the task is running in user space. Does it lay an exception frame down
> > > > on the process stack and then make the transition? (and therefore the
> > > > user-space frame above is legitimate?) Or does the user-space frame
> > > > get laid down directly on the IRQ stack? Unfortunately I don't know
> > > > enough about arm64 exception handling.
> > >
> > > Since I reviewed this IRQ stack patch on LAK-ML, I will be able to help
> > > you, but I don't have enough time to explain it in detail this week.
> >
> > That's good news; your help will be greatly appreciated.
> >
> > > > In any case, the bt should display "-- <IRQ stack> ...", and then dump
> > > > the user-to-kernel-space exception frame, wherever it lies, i.e.,
> > > > either on the normal process stack or (maybe?) on the IRQ stack.
> > > >
> > > > Anyway, can you make the vmlinux/vmcore pair available for me to
> > > > download?
> > > > You can send the details to me offline.
> > >
> > > I sent you a message which contains the link to those binaries.
> >
> > Got them -- thanks!
> >
> > Also, I was finally able to generate a vmcore on a RHEL7 4.5.0-based
> > kernel, where the crash occurred on cpu 1 and the other 7 cpus were
> > running in user space. I do see the same problem with respect to the
> > IRQ-stack-to-user-space transition.
> >
> > However, I do not see the "phantom" exception frame dumps on the IRQ
> > stacks that your dumpfile displays on the 7 non-crashing cpus, regardless
> > of whether they came from kernel or user space.
> >
> > Here is the output:
> >
> > crash> sys
> > KERNEL: ../vmlinux
> > DUMPFILE: ../vmcore [PARTIAL DUMP]
> > CPUS: 8 [OFFLINE: 7]
> > DATE: Thu Jun 2 15:09:34 2016
> > UPTIME: 05:06:18
> > LOAD AVERAGE: 7.56, 3.49, 1.38
> > TASKS: 202
> > NODENAME: apm-mustang-ev3-07.khw.lab.eng.bos.redhat.com
> > RELEASE: 4.5.0-0.38.el7.aarch64
> > VERSION: #1 SMP Thu May 19 15:37:24 EDT 2016
> > MACHINE: aarch64 (unknown Mhz)
> > MEMORY: 16 GB
> > PANIC: "sysrq: SysRq : Trigger a crash"
> > crash> bt -a
> > PID: 2546 TASK: ffff8003d5ab9600 CPU: 0 COMMAND: "spin"
> > #0 [ffff8003ffe33d60] crash_save_cpu at ffff800000148444
> > #1 [ffff8003ffe33dc0] handle_IPI at ffff80000009c8d0
> > #2 [ffff8003ffe33f80] gic_handle_irq at ffff8000000904c8
> > #3 [ffff8003ffe33fd0] el0_irq_naked at ffff80000009180c
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003d5b73ed0
> > fp: 0 (?)
> > PC: 00000000004005b0 LR: 0000ffff911b0c94 SP: 0000fffffee69ca0
> > X29: 0000fffffee69ca0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000000000 X24: 0000000000000000
> > X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000400450
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000fffffee69bb0
> > X17: 0000000000420000 X16: 0000ffff911b0ba4 X15: 00000000001815e7
> > X14: 0000ffff9136ffb8 X13: 000000000000000f X12: 0000000000000090
> > X11: 0000000090000000 X10: 00000000ffffffff X9: 0000000000000018
> > X8: 2f2f2f2f2f2f2f2f X7: b0bca0bdbeb3ff91 X6: 0000000000000000
> > X5: da16a3a21e08b5bc X4: 0000000000000000 X3: 00000000004005b0
> > X2: 0000fffffee69df8 X1: 0000fffffee69de8 X0: 0000000000000001
> > ORIG_X0: 0000ffff91310000 SYSCALLNO: ffffffffffffffff PSTATE: 60000000
> >
> > PID: 2513 TASK: ffff8003d925d000 CPU: 1 COMMAND: "bash"
> > #0 [ffff8003dbf238d0] crash_kexec at ffff8000001486cc
> > #1 [ffff8003dbf23a20] die at ffff80000009731c
> > #2 [ffff8003dbf23a50] __do_kernel_fault at ffff8000000a7210
> > #3 [ffff8003dbf23a90] do_page_fault at ffff80000077b244
> > #4 [ffff8003dbf23ac0] do_mem_abort at ffff8000000902e8
> > #5 [ffff8003dbf23b30] el1_da at ffff800000091368
> > PC: ffff8000004970e4 [sysrq_handle_crash+36]
> > LR: ffff800000497c5c [__handle_sysrq+296]
> > SP: ffff8003dbf23cf0 PSTATE: 60000145
> > X29: ffff8003dbf23cf0 X28: ffff8003dbf20000 X27: ffff800000792000
> > X26: 0000000000000040 X25: 000000000000011e X24: 0000000000000007
> > X23: 0000000000000000 X22: ffff800000ce4000 X21: 0000000000000063
> > X20: ffff800000c50000 X19: ffff800000ce4888 X18: 0000000000000000
> > X17: 0000ffff7d780e20 X16: ffff800000237848 X15: 00192ea0bab15d05
> > X14: 0000000000000000 X13: 0000000000000000 X12: ffff800000c50000
> > X11: 0000000000000000 X10: 00000000000001d3 X9: 00000000000001d4
> > X8: ffff80000121ce10 X7: 0000000000008d88 X6: ffff8000012140b8
> > X5: 0000000000000000 X4: 0000000000000000 X3: 0000000000000000
> > X2: ffff8003ffe76448 X1: 0000000000000000 X0: 0000000000000001
> > ORIG_X0: 00000000000001d3 SYSCALLNO: 0
> > #6 [ffff8003dbf23d00] __handle_sysrq at ffff800000497c5c
> > #7 [ffff8003dbf23d10] write_sysrq_trigger at ffff8000004980d4
> > #8 [ffff8003dbf23d50] proc_reg_write at ffff80000029b934
> > #9 [ffff8003dbf23d70] __vfs_write at ffff800000235fd0
> > #10 [ffff8003dbf23db0] vfs_write at ffff800000236d54
> > #11 [ffff8003dbf23e40] sys_write at ffff80000023789c
> > #12 [ffff8003dbf23e90] __sys_trace_return at ffff800000091a8c
> > PC: 0000ffff7d7dbda8 LR: 0000ffff7d7835d4 SP: 0000fffff90fe1b0
> > X29: 0000fffff90fe1b0 X28: 0000000000000000 X27: 00000000004fb000
> > X26: 00000000004bb420 X25: 0000000000000001 X24: 00000000004f8000
> > X23: 0000000000000000 X22: 0000000000000002 X21: 0000ffff7d881168
> > X20: 0000ffff76e30000 X19: 0000000000000002 X18: 0000000000000000
> > X17: 0000ffff7d780e20 X16: 0000000000000000 X15: 00192ea0bab15d05
> > X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000000001
> > X11: 000000001c1fc6a0 X10: 00000000004fd000 X9: 0000fffff90fe130
> > X8: 0000000000000040 X7: 0000000000000001 X6: 0000ffff7d759a98
> > X5: 0000000000000001 X4: 00000000fbad2a84 X3: 0000000000000000
> > X2: 0000000000000002 X1: 0000ffff76e30000 X0: 0000000000000001
> > ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 20000000
> >
> > PID: 2545 TASK: ffff8003d5901d00 CPU: 2 COMMAND: "spin"
> > #0 [ffff8003ffe93d60] crash_save_cpu at ffff800000148444
> > #1 [ffff8003ffe93dc0] handle_IPI at ffff80000009c8d0
> > #2 [ffff8003ffe93f80] gic_handle_irq at ffff8000000904c8
> > #3 [ffff8003ffe93fd0] el0_irq_naked at ffff80000009180c
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003db4f3ed0
> > fp: 0 (?)
> > PC: 00000000004005b0 LR: 0000ffffb50f0c94 SP: 0000ffffe48b4910
> > X29: 0000ffffe48b4910 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000000000 X24: 0000000000000000
> > X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000400450
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000ffffe48b4820
> > X17: 0000000000420000 X16: 0000ffffb50f0ba4 X15: 00000000001815e7
> > X14: 0000ffffb52affb8 X13: 000000000000000f X12: 0000000000000090
> > X11: 0000000090000000 X10: 00000000ffffffff X9: 0000000000000018
> > X8: 2f2f2f2f2f2f2f2f X7: b0bca0bdbeb3ff91 X6: 0000000000000000
> > X5: 46c7b691c219cb7a X4: 0000000000000000 X3: 00000000004005b0
> > X2: 0000ffffe48b4a68 X1: 0000ffffe48b4a58 X0: 0000000000000001
> > ORIG_X0: 0000ffffb5250000 SYSCALLNO: ffffffffffffffff PSTATE: 60000000
> >
> > PID: 2541 TASK: ffff8003d917b300 CPU: 3 COMMAND: "usex"
> > #0 [ffff8003ffec3d60] crash_save_cpu at ffff800000148444
> > #1 [ffff8003ffec3dc0] handle_IPI at ffff80000009c8d0
> > #2 [ffff8003ffec3f80] gic_handle_irq at ffff8000000904c8
> > #3 [ffff8003ffec3fd0] el0_irq_naked at ffff80000009180c
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003dbecbed0
> > fp: 0 (?)
> > PC: 00000000004361e0 LR: 0000000000435be0 SP: 0000ffffcee64ac0
> > X29: 0000ffffcee64af0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000000000 X24: 0000000000000000
> > X23: 0000000000000000 X22: 0000000000000000 X21: 00000000004037b0
> > X20: 0000ffff8c790000 X19: 0000000000062a44 X18: 0000ffffcee64980
> > X17: 0000ffff8c891b9c X16: 00000000004602b8 X15: 002c4612d8986fa7
> > X14: 0000000000000000 X13: 00000003e8000000 X12: 0000000000000018
> > X11: 00000000000b5585 X10: 000000005750846e X9: 00000000001ecba2
> > X8: 0000000000000099 X7: 0000000000000000 X6: 0000ffff8c8946ec
> > X5: 0000ffff8c894768 X4: 0000000000000032 X3: 0000000000000007
> > X2: 0000000000000007 X1: 0000000000000005 X0: 00000000004b21cc
> > ORIG_X0: 0000ffffcee64b18 SYSCALLNO: ffffffffffffffff PSTATE: 80000000
> >
> > PID: 2544 TASK: ffff8003d9176a80 CPU: 4 COMMAND: "usex"
> > #0 [ffff8003ffef3d60] crash_save_cpu at ffff800000148444
> > #1 [ffff8003ffef3dc0] handle_IPI at ffff80000009c8d0
> > #2 [ffff8003ffef3f80] gic_handle_irq at ffff8000000904c8
> > #3 [ffff8003ffef3fd0] el0_irq_naked at ffff80000009180c
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003dbea7ed0
> > fp: 0 (?)
> > PC: 0000000000435e38 LR: 0000000000435c94 SP: 0000ffffcee64af0
> > X29: 0000ffffcee64af0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000000000 X24: 0000000000000000
> > X23: 0000000000000000 X22: 0000000000000000 X21: 00000000004037b0
> > X20: 0000ffff8c790000 X19: 0000000000041b82 X18: 0000ffffcee64980
> > X17: 0000ffff8c894590 X16: 0000000000460008 X15: 0034caab9974abe0
> > X14: 0000000000000000 X13: 00000003e8000000 X12: 0000000000000018
> > X11: 00000000000d83c1 X10: 000000005750846e X9: 00000000001eccc0
> > X8: 0000000000000099 X7: 0000000000000000 X6: 0000ffff8c8946ec
> > X5: 0000ffff8c894768 X4: 000000000000474e X3: 0000000000435ff8
> > X2: 0000000000000042 X1: 000000000000002a X0: 0000ffffcee64b84
> > ORIG_X0: 0000ffffcee64b18 SYSCALLNO: ffffffffffffffff PSTATE: 20000000
> >
> > PID: 2547 TASK: ffff8003d5906580 CPU: 5 COMMAND: "spin"
> > #0 [ffff8003fff23d60] crash_save_cpu at ffff800000148444
> > #1 [ffff8003fff23dc0] handle_IPI at ffff80000009c8d0
> > #2 [ffff8003fff23f80] gic_handle_irq at ffff8000000904c8
> > #3 [ffff8003fff23fd0] el0_irq_naked at ffff80000009180c
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003db4efed0
> > fp: 0 (?)
> > PC: 00000000004005b0 LR: 0000ffffb33d0c94 SP: 0000ffffe5813a70
> > X29: 0000ffffe5813a70 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000000000 X24: 0000000000000000
> > X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000400450
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000ffffe5813980
> > X17: 0000000000420000 X16: 0000ffffb33d0ba4 X15: 00000000001815e7
> > X14: 0000ffffb358ffb8 X13: 000000000000000f X12: 0000000000000090
> > X11: 0000000090000000 X10: 00000000ffffffff X9: 0000000000000018
> > X8: 2f2f2f2f2f2f2f2f X7: b0bca0bdbeb3ff91 X6: 0000000000000000
> > X5: f72609a0900e9af5 X4: 0000000000000000 X3: 00000000004005b0
> > X2: 0000ffffe5813bc8 X1: 0000ffffe5813bb8 X0: 0000000000000001
> > ORIG_X0: 0000ffffb3530000 SYSCALLNO: ffffffffffffffff PSTATE: 60000000
> >
> > PID: 2542 TASK: ffff8003d9178780 CPU: 6 COMMAND: "usex"
> > #0 [ffff8003fff53d60] crash_save_cpu at ffff800000148444
> > #1 [ffff8003fff53dc0] handle_IPI at ffff80000009c8d0
> > #2 [ffff8003fff53f80] gic_handle_irq at ffff8000000904c8
> > #3 [ffff8003fff53fd0] el0_irq_naked at ffff80000009180c
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003dbebbed0
> > fp: 0 (?)
> > PC: 0000000000435e10 LR: 0000000000435ddc SP: 0000ffffcee64ad0
> > X29: 0000ffffcee64ad0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000000000 X24: 0000000000000000
> > X23: 0000000000000000 X22: 0000000000000000 X21: 00000000004037b0
> > X20: 0000ffff8c790000 X19: 0000000000054c83 X18: 0000ffffcee64980
> > X17: 0000ffff8c894590 X16: 0000000000460008 X15: 0033b1eeb687fcdc
> > X14: 0000000000000000 X13: 00000003e8000000 X12: 0000000000000018
> > X11: 00000000000d3be2 X10: 000000005750846e X9: 00000000001ecc9a
> > X8: 0000000000000099 X7: 0000000000000000 X6: 0000ffff8c8946ec
> > X5: 0000ffff8c894768 X4: 000000000000474e X3: 0000000000435ff8
> > X2: 000000003693b600 X1: 000000003693b5f0 X0: 0000000000000006
> > ORIG_X0: 0000ffffcee64b18 SYSCALLNO: ffffffffffffffff PSTATE: 80000000
> >
> > PID: 2548 TASK: ffff8003d5ab4d80 CPU: 7 COMMAND: "spin"
> > #0 [ffff8003fff83d60] crash_save_cpu at ffff800000148444
> > #1 [ffff8003fff83dc0] handle_IPI at ffff80000009c8d0
> > #2 [ffff8003fff83f80] gic_handle_irq at ffff8000000904c8
> > #3 [ffff8003fff83fd0] el0_irq_naked at ffff80000009180c
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003d5b63ed0
> > fp: 0 (?)
> > PC: 00000000004005b0 LR: 0000ffffae060c94 SP: 0000ffffcf219e20
> > X29: 0000ffffcf219e20 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000000000 X24: 0000000000000000
> > X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000400450
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000ffffcf219d30
> > X17: 0000000000420000 X16: 0000ffffae060ba4 X15: 00000000001815e7
> > X14: 0000ffffae21ffb8 X13: 000000000000000f X12: 0000000000000090
> > X11: 0000000090000000 X10: 00000000ffffffff X9: 0000000000000018
> > X8: 2f2f2f2f2f2f2f2f X7: b0bca0bdbeb3ff91 X6: 0000000000000000
> > X5: aa704cb48aa4536a X4: 0000000000000000 X3: 00000000004005b0
> > X2: 0000ffffcf219f78 X1: 0000ffffcf219f68 X0: 0000000000000001
> > ORIG_X0: 0000ffffae1c0000 SYSCALLNO: ffffffffffffffff PSTATE: 60000000
> > crash>
> >
> > Given that the links at the top of each of the IRQ stacks back to the
> > kernel-entry-from-user-space exception frames look to be legitimate,
> > perhaps the "fp: 0" could be used as a key to recognizing the
> > IRQ-while-in-user-space scenario? It also doesn't appear that the
> > phantom exception frames dumped in your vmcore mistakenly generate
> > an fp of 0.
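> >
> > A sketch of that heuristic, just to make the idea concrete (the helper
> > name is hypothetical and this is untested against the unwinder):
> >
> >         /*
> >          * While unwinding on a per-cpu IRQ stack, a zero frame pointer
> >          * would flag the kernel-entry-from-user-space case, so bt could
> >          * print the IRQ-stack transition marker and dump the user-mode
> >          * exception frame from the process stack instead of issuing the
> >          * WARNING above.
> >          */
> >         static int
> >         irq_entry_from_user_space(int on_irq_stack, ulong next_fp)
> >         {
> >                 return on_irq_stack && (next_fp == 0);
> >         }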
> >
> > Thanks,
> > Dave
> >
> >
>
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility