Nakahiro,
As I had last proposed, I have kept the original backtrace method in place,
while making your redesigned method available as a bt option accessible by
using "bt -o", or by using "bt -O" to toggle the default method from
the
original to the optional.
There were a couple bugs in your backtrace method.
(1) Although I can't find a sample vmcore where this piece gets executed,
this part was certainly incorrect:
+ /*
+ * Keep a buffer for a while until
+ * displaying the last frame on IRQ stack
+ * at next arm64_print_stackframe_entry()
+ */
+ if (bt->flags | BT_FULL)
+ ms->irq_stackbuf = bt->stackbuf;
(2) And on "older" kernels, your backtrace generates a segmentation violation
on any user-space task. I'm not sure when things changed, but at least on
3.19 kernels, the fp of the fp/pc stackframe pair that references the
kernel-entry exception is not 0, but rather contains the user-space stack
address.
For example, in a 4.5-based kernel, here's a typical syscall-generated
fp/pc stackframe, where the fp is 0:
crash> bt -F 2513
... [ cut ] ...
#11 [ffff8003dbf23e80] sys_write at ffff800000237898
ffff8003dbf23e80: 0000000000000000 __sys_trace_return <==== fp 0
ffff8003dbf23e90: 0000000000000200 0000ffff76e30000
ffff8003dbf23ea0: ffffffffffffffff __sys_trace+28
ffff8003dbf23eb0: 0000000000000002 0000ffff76e30000
ffff8003dbf23ec0: ffffffffffffffff 0000000000000000
#12 [ffff8003dbf23ed0] __sys_trace at ffff800000091a88
PC: 0000ffff7d7dbda8 LR: 0000ffff7d7835d4 SP: 0000fffff90fe1b0
X29: 0000fffff90fe1b0 X28: 0000000000000000 X27: 00000000004fb000
X26: 00000000004bb420 X25: 0000000000000001 X24: 00000000004f8000
X23: 0000000000000000 X22: 0000000000000002 X21: 0000ffff7d881168
X20: 0000ffff76e30000 X19: 0000000000000002 X18: 0000000000000000
X17: 0000ffff7d780e20 X16: 0000000000000000 X15: 00192ea0bab15d05
X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000000001
X11: 000000001c1fc6a0 X10: 00000000004fd000 X9: 0000fffff90fe130
X8: 0000000000000040 X7: 0000000000000001 X6: 0000ffff7d759a98
X5: 0000000000000001 X4: 00000000fbad2a84 X3: 0000000000000000
X2: 0000000000000002 X1: 0000ffff76e30000 X0: 0000000000000001
ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 20000000
crash>
But on a 3.19 kernel, it looks like this, where the fp is the user-space
stack address seen in the exception frame:
crash> bt -F 27755
... [ cut ] ...
#10 [fffffe035de9be80] sys_write at fffffe00001e8720
fffffe035de9be80: 000003ffed468290 __sys_trace_return <== fp 3ffed468290
fffffe035de9be90: 0000000000000200 000003ff955f0000
fffffe035de9bea0: ffffffffffffffff __sys_trace+28
fffffe035de9beb0: 0000000000000002 000003ff955f0000
fffffe035de9bec0: ffffffffffffffff 0000000000000000
#11 [fffffe035de9bed0] __sys_trace at fffffe00000939c8
PC: 000003ff9bf9c668 LR: 000003ff9bf439cc SP: 000003ffed468290
X29: 000003ffed468290 X28: 0000000000000000 X27: 00000000004fb000
X26: 00000000004bb330 X25: 0000000000000001 X24: 00000000004f8000
X23: 0000000000000000 X22: 0000000000000002 X21: 000003ff9c041178
X20: 000003ff955f0000 X19: 0000000000000002 X18: 0000000000000000
X17: 000003ff9bf41218 X16: 0000000000000000 X15: 0016431976760606
X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000000001
X11: 0000000000c48820 X10: 00000000004fd000 X9: 000003ffed468220
X8: 0000000000000040 X7: 0000000000000001 X6: 000003ff9bf19a98
X5: 0000000000000001 X4: 00000000fbad2a84 X3: 0000000000000000
X2: 0000000000000002 X1: 000003ff955f0000 X0: 0000000000000001
ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 20000000
Same thing if the kernel was entered via interrupt -- here the 4.5-based kernel:
crash> bt -F 1
... [ cut ] ...
#6 [ffff8003dc103e70] sys_epoll_pwait at ffff8000002806c8
ffff8003dc103e70: 0000000000000000 el0_svc_naked+36 <=== fp 0
ffff8003dc103e80: 0000000000000000 0000000000000000
ffff8003dc103e90: ffffffffffffffff 0000ffff9455a48c
ffff8003dc103ea0: 0000000060000000 0000000039000a80
ffff8003dc103eb0: 0000000000000008 0000000000000000
ffff8003dc103ec0: 0000000000000014 000000000ecb0180
#7 [ffff8003dc103ed0] el0_svc_naked at ffff800000091a2c
PC: 0000ffff9455a48c LR: 0000aaaad308fb48 SP: 0000fffff45629b0
X29: 0000fffff45629b0 X28: 0000fffff4562ec8 X27: 0000aaaad313f000
X26: 0000aaaad30bfd70 X25: 0000aaaad313f000 X24: 0000aaaad30c0248
X23: 0000000000000001 X22: 0000aaaaebf2ea50 X21: 0000fffff45629f0
X20: 0000000000000000 X19: 0000000000000004 X18: 0000aaaad30e0050
X17: 0000ffff9455a228 X16: 0000aaaad313f640 X15: 00000000000000d8
X14: 00e53b496f58b0a8 X13: 0000000000000000 X12: 0000000000000010
X11: 0000000000000067 X10: 0000000000000065 X9: 0000000000000074
X8: 0000000000000016 X7: 000000000000cbf6 X6: 000000000003d090
X5: 0000000000000008 X4: 0000000000000000 X3: ffffffffffffffff
X2: 0000000000000022 X1: 0000fffff45629f0 X0: 0000000000000004
ORIG_X0: 0000000000000004 SYSCALLNO: 16 PSTATE: 60000000
crash>
And then here on the 3.19 kernel:
crash> bt -F 1
... [ cut ] ...
#6 [fffffe035c083e80] sys_epoll_pwait at fffffe0000229bc8
fffffe035c083e80: 000003ffe63d3610 el0_svc_naked+36 <=== fp 3ffe63d3610
fffffe035c083e90: 0000000000000000 0000000000000000
fffffe035c083ea0: ffffffffffffffff 000003ff7bc9ad4c
fffffe035c083eb0: 000003ffe63d3470 work_pending+28
fffffe035c083ec0: 0000000000000000 0000000000000000
#7 [fffffe035c083ed0] el0_svc_naked at fffffe000009396c
PC: 000003ff7bc9ad4c LR: 000003ff7c11711c SP: 000003ffe63d3610
X29: 000003ffe63d3610 X28: 0000000000000001 X27: 000003ff7c21f000
X26: 000003ff7c195618 X25: 000003ff7c195f88 X24: 000003ff7c1943e8
X23: 000003ff8e21dc20 X22: 0000000000000000 X21: 000003ffe63d3780
X20: 0000000000000000 X19: 0000000000000004 X18: 0000000000000800
X17: 000003ff7bc9aae8 X16: 000003ff7c21f598 X15: 003b9aca00000000
X14: 002ca73218000000 X13: ffffffffa90e8b35 X12: 0000000000000018
X11: 00000000127f8e50 X10: 0000000000001102 X9: 000000000001f418
X8: 0000000000000016 X7: 0000000000000000 X6: 000003ff7c090000
X5: 0000000000000008 X4: 0000000000000000 X3: ffffffffffffffff
X2: 0000000000000001 X1: 000003ffe63d3780 X0: 0000000000000004
ORIG_X0: 0000000000000004 SYSCALLNO: 16 PSTATE: 60000000
Anyway, that kernel anomoly caused a segmentation violation in your
arm64_gen_hidden_frame() function when it tries to use the user
space stack address in STACK_OFFSET_TYPE(). That turned out to be
my first attempt at maintenance of your patch, and while I've got
it to work without a SIGSEGV, my hack/kludge of a fix doesn't fit
very nicely into your arm64_back_trace_cmd_v2 for() loop. And
on the old kernel, "bt -f" does not display stack memory at and
above the fp/pc stackframe that contain the user-stack fp. Please
feel free to send a follow-up patch.
Other changes I made to your backtrace method:
(1) I removed the "--- <Exception in kernel> ---", "--- <Exception
in user> ---",
lines. I felt that they are unnecessary given that that it's obvious which
is which, and no other arches print such lines.
(2) I removed the "#0 [user space]" line, which was meant for non-kdump kernels
that
did not have a user-space exception frame available, so instead of printing
nothing for a task that was in user-space, it just showed that single line.
(3) I removed the empty linefeed after in-kernel exception frames so that each task's
backtrace is a single blurb without any intervening linefeeds.
I did make a couple changes to the original backtrace method that
adopts code from your new optional method:
(1) ORIG_X0 and SYSCALLNO registers are not displayed in kernel
exception frames.
(2) stackframe entry text locations are modified to be the PC address
of the branch instruction instead of the subsequent "return" PC
address.
That being the case, these are the essential differences between
the original and optional methods:
(1) original: the starting point of backtraces for the active, non-crashing,
tasks, will continue to have crash_save_cpu() on the IRQ stack as the
starting point;
optional: backtrace will start with the IPI exception frame located
on the process stack
(2) optional: exception entry stackframe adjusted to be located farther
down in the IRQ stack
(3) optional: bt -f does not display IRQ stack memory above the adjusted
exception entry stackframe
(4) optional: may display "(Next exception frame might be wrong)"
The combined patchset is queued for crash-7.1.6:
----- Original Message -----
Takahiro,
Sorry for the previous empty response -- I sent it by mistake while
putting this 3-message collaborative response together.
> Dave,
>
> If you don't like my patches, that is OK.
> Applying them or not is totally up to you.
> I will *never* submit my patches again.
Well, that's certainly unfortunate.
> Having said so, I think I would better explain my intentions on the code.
>
> On Wed, Jun 29, 2016 at 04:09:01PM -0400, Dave Anderson wrote:
> >
> >
> > Hi Takahiro,
> >
> > I applied patches 1/2 and 2/2 from the v5 patchset. But I can't
> > believe the results are what you intended?
> >
> > For example, taking the 4.6 vmcore that you gave to me, here is the
> > current crash utility's output of "bt -a", where the crashing
task
> > entered crash_kexec() via the sysrq-c page fault exception, and the
> > tasks on the other cpus have all entered crash_save_cpu() on their
> > IRQ stack as a result of the shutdown IPI, one from user-space and
> > the others from the kernel:
> >
> > crash> bt -a
> > PID: 0 TASK: ffff000008dcd900 CPU: 0 COMMAND: "swapper/0"
> > #0 [ffff800022f42e50] crash_save_cpu at ffff00000812ae44
> > #1 [ffff800022f43010] handle_IPI at ffff00000808e718
> > #2 [ffff800022f43040] gic_handle_irq at ffff0000080815f8
> > #3 [ffff800022f43080] el1_irq at ffff000008084720
> > --- <IRQ stack> ---
>
> This part of dump shown above is the result of "kdump" mechanisms,
> and is *not* useful for you to understand what was happening on this
> CPU when a crash occurred on the other CPU, #3 in this case.
>
> > PC: ffff0000080857c0 [arch_cpu_idle+16]
> > LR: ffff0000080857bc [arch_cpu_idle+12]
> > SP: ffff000008dc3f10 PSTATE: 60400149
> > X29: ffff000008dc3f10 X28: ffff000008dc0000 X27: 0000000000000000
> > X26: ffff000008dc5000 X25: ffff000008dc5c1c X24: ffff000008dc5000
> > X23: ffff000008dc0000 X22: ffff000008bd0270 X21: ffff000008dc0000
> > X20: ffff000008dc5b88 X19: 0000000000000000 X18: 00000000a632f641
> > X17: 0000ffff7da57880 X16: ffff0000081d9838 X15: 00000000383a0a79
> > X14: 00000000b2b0b162 X13: 000000005f2cbeec X12: 0000000000045a9e
> > X11: ffff8000213bd800 X10: 0000000000000850 X9: ffff000008dc0000
> > X8: 000000010000aa07 X7: 000000000000003d X6: 0015752a00000000
> > X5: 0100000000000000 X4: 00000000000001c0 X3: 0000000000000000
> > X2: 0000000000000001 X1: 0000000000000000 X0: 0000000000000000
> > ORIG_X0: 0000000000000000 SYSCALLNO: 7fffffffffffffff
> > #4 [ffff000008dc3f10] arch_cpu_idle at ffff0000080857c0
> > #5 [ffff000008dc3f20] cpu_startup_entry at ffff0000080f26cc
> > #6 [ffff000008dc3f80] rest_init at ffff0000087c7930
> > #7 [ffff000008dc3fa0] start_kernel at ffff000008b10b70
>
> All you need to know is that CPU#0 was idle at the crash point.
> Whatever else do you want to know?
Things like:
(1) I want to know whether the active cpu was properly shut down by
the kdump mechanism.
(2) I would like to actually "test" that the crash utility's transition
from
the IRQ stack back to the process stack works as expected. (especially
given
the relative rarity of kernel crashes that actually occur while operating
on IRQ stacks)
(3) There are several other dumping mechanisms that can be utilized on ARM64
systems, so again, I'd like it to be obvious that this is kdump.
(4) When doing a filtered "foreach bt" it would be helpful to simply make it
obvious that a task was an active, non-crashing, task -- with more
information
than a "hanging" exception frame dump.
(5) All of the other architectures do it.
... [ cut ] ...
> > But why do you think that it is an improvement to leave out the
> > transition
> > to the IRQ stack?
>
> I explained the reason in my previous e-mail.
> I still believe that it is an improvement.
OK, we're just going to have to agree to disagree.
> > In any case, I haven't even started looking at the "bt -f" part
of the
> > patch
> > because quite frankly, this patchset is so complex that I haven't even
> > begun to try to understand it. When you said it would be "easier to
> > maintain", well, perhaps for *you* maybe, but certainly not for me!
>
> Do you think so?
> It is because you are the author.
Maybe so, but I absolutely do think so.
> The changes I made were big and look complex to you,
> but the resulting code is simple enough for others IMO.
>
> I tried to make the main loop of arm64_back_trace_cmd() simple and
> quite resemble the counterpart of dump_backtrace() in the kernel.
>From my perspective, the current arm64_back_trace_cmd() is far simpler,
and is essentially a backport of the kernel's dump_backtrace() code.
> The differences come mainly from the facts:
> 1.you (crash util) need to display not only a one-line frame summary
> (with a value of stack/frame pointer), but also a full dump of stack
Right, but the current loop remains basically the same as the kernel's.
> 2.your current code will miss a *very important* stack frame that is
> the one exactly when an interrupt takes place.
>
> I believe that the latter is a big improvement.
I'm not following when you say it "will miss" a frame. It's just
following
the links established by the architecture. If you line up the backtrace
output
of the current vs. your patch, there are no missing frames, but rather you
essentially move the "el1_da()" stack frame entry to a lower stack address,
and your "bt -f" implementation doesn't dump the raw exception frame data.
> Yeah, due to the changes, arm64_unwind_frame() may get a bit complicated,
> but it is very naive implementation based on the nature of the kernel's
> stack usage (or PCS for ARM), the complexity is not the result of my poor
> skill of coding.
> In other words, the complexity is now *encapsulated* in that function.
I understand what you're saying, and certainly appreciate the depth of
knowledge that you have, but quite frankly I don't find the changes
significant enough to reinvent the wheel.
> In addition, I wrote down that function step-by-step, from generic cases
> to exceptional cases, with a bunch of comments for better understandings.
> And you will be able to remove such exceptional cases *if* you don't like
> them.
>
> Given those stuffs, I think that my code is easier to maintain.
>
> Again, surely up to you.
>
> > Without the kernel's backtrace code from which the current code is based,
> > I don't
> > have anything to work from anymore. I'm really not sure whether the
> > coverage of
> > the "corner cases" you referred to make this effort worth it. I
would
> > like to
> > see examples of how the current code fails. But anyway, I will continue
> > to test it to
> > see if there actually is any significant upgrade from what we already
> > have in
> > place. From a kernel debugging perspective, all we really need is a
> > basic unwinder,
> > and optionally the full dump of the stack data in between those frames.
> > And the
> > current code does do that at a minimum, and there's much to be said for
> > simplicity.
> >
... [ cut ] ...
> > Here is another thing that I would prefer not to change/omit.
> >
> > In the current code, the raw exception frame data is dumped as
> > part of the "bt -[fF]" output, just prior to it being translated
> > as an exception frame:
> >
> > crash> bt -F
> > PID: 1223 TASK: ffff800020ef5780 CPU: 3 COMMAND: "sh"
> > ... [ cut ] ...
> > #5 [ffff800020b0bb70] do_mem_abort at ffff00000808128c
> > ffff800020b0bb70: ffff800020b0bd40 el1_da+24
> > ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063
> > ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32
> > ffff800020b0bba0: 0000000000000002 textbuf.34610
> > ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128
> > ffff800020b0bbc0: 000000000000000f 0000000100000000
> > ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56
> > ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063
> > ffff800020b0bbf0: sysrq_crash_op 0000000000000009
> > ffff800020b0bc00: 0000000000000000 0000000000000015
> > ffff800020b0bc10: 0000000000000120 0000000000000040
> > ffff800020b0bc20: 0000000000000001 0000000000000000
> > ffff800020b0bc30: log_wait+8 0000000000000000
> > ffff800020b0bc40: 0000000000000000 00000000000047d4
> > ffff800020b0bc50: ffff800022f337a4 0000000000000000
> > ffff800020b0bc60: 0000000000000106 0000000000000001
> > ffff800020b0bc70: 0000000000000002 0000000000000106
> > ffff800020b0bc80: log_buf_len cont
> > ffff800020b0bc90: 0000ffff83cc28f0 text.34829+13
> > ffff800020b0bca0: sys_write 0000ffff83d266c0
> > ffff800020b0bcb0: 0000000000000006 cpu_cgrp_subsys+152
> > ffff800020b0bcc0: 0000000000000063 sysrq_crash_op
> > ffff800020b0bcd0: 0000000000000009 0000000000000000
> > ffff800020b0bce0: 0000000000000015 0000000000000120
> > ffff800020b0bcf0: 0000000000000040 sys_call_table
> > ffff800020b0bd00: ffff800020b08000 ffff800020b0bd40
> > ffff800020b0bd10: sysrq_handle_crash+12 ffff800020b0bd40
> > ffff800020b0bd20: sysrq_handle_crash+32 0000000060400149
> > ffff800020b0bd30: cpu_cgrp_subsys+152 [kmalloc-1024]
> > #6 [ffff800020b0bd40] el1_da at ffff000008084568
> > PC: ffff000008457fc8 [sysrq_handle_crash+32]
> > LR: ffff000008457fb4 [sysrq_handle_crash+12]
> > SP: ffff800020b0bd40 PSTATE: 60400149
> > X29: ffff800020b0bd40 X28: ffff800020b08000 X27: ffff0000087e2000
> > X26: 0000000000000040 X25: 0000000000000120 X24: 0000000000000015
> > X23: 0000000000000000 X22: 0000000000000009 X21: ffff000008e071b0
> > X20: 0000000000000063 X19: ffff000008dda000 X18: 0000000000000006
> > X17: 0000ffff83d266c0 X16: ffff0000081c68b8 X15: ffff000008e6cc95
> > X14: 0000ffff83cc28f0 X13: ffff000008e6c758 X12: ffff000008dda7a0
> > X11: 0000000000000106 X10: 0000000000000002 X9: 0000000000000001
> > X8: 0000000000000106 X7: 0000000000000000 X6: ffff800022f337a4
> > X5: 00000000000047d4 X4: 0000000000000000 X3: 0000000000000000
> > X2: ffff000008dda7b8 X1: 0000000000000000 X0: 0000000000000001
> > ORIG_X0: ffff000008dda000 SYSCALLNO: ffff80002104d418
> > ...
> >
> > whereas with the v5 patchset, the exception frame only gets translated,
> > but the actual raw memory never gets dumped:
>
> I surely remember that you said that would not be an issue
> when I submitted older version, maybe v1 or v2.
Well, yes, that's true in the case of an exception frame generated
by the IPI interrupt (or any interrupt), where the eframe contents are
incidental to the interrupt's arrival, and not nearly as interesting
as an in-line kernel exception frame generated by a NULL pointer
reference, BUG(), etc., where the contents of the exception frame
are completely related to the problem/bug at hand. And in that case,
the current code does dump the raw contents just before the register
translation.
> Do you think that those symbolic display are still useful
> though it is not quite easy to recognize which register has what value?
It's not a major item, but it certainly just dumps the raw memory
in between frame like anywhere else in the "bt -f" output. Seems
strange to just omit it there.
>
> Even more, <ffff800020b0bb80-ffff800020b0bc10> is *not* a stack for
> do_mem_abort(). It is just wrong and will confuse people.
> So this is another example of improvement on my patches.
I'm not following -- your patch (as well as the current version) both
show ffff800020b0bb80-ffff800020b0bc10 below do_mem_abort():
...
#5 [ffff800020b0bb70] do_mem_abort at ffff000008081288
ffff800020b0bb70: ffff800020b0bd40 el1_da+24
ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063
ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32
ffff800020b0bba0: 0000000000000002 textbuf.34610
ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128
ffff800020b0bbc0: 000000000000000f 0000000100000000
ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56
ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063
ffff800020b0bbf0: sysrq_crash_op 0000000000000009
ffff800020b0bc00: 0000000000000000 0000000000000015
ffff800020b0bc10: 0000000000000120 0000000000000040
#6 [ffff800020b0bc20] el1_da at ffff000008084564
--- <Exception in kernel> ---
PC: ffff000008457fc8 [sysrq_handle_crash+32]
LR: ffff000008457fb4 [sysrq_handle_crash+12]
SP: ffff800020b0bd40 PSTATE: 60400149
...
The current code does the above, but dumps the raw exception frame
before printing a stack entry for el1_da():
...
#5 [ffff800020b0bb70] do_mem_abort at ffff00000808128c
ffff800020b0bb70: ffff800020b0bd40 el1_da+24
ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063
ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32
ffff800020b0bba0: 0000000000000002 textbuf.34610
ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128
ffff800020b0bbc0: 000000000000000f 0000000100000000
ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56
ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063
ffff800020b0bbf0: sysrq_crash_op 0000000000000009
ffff800020b0bc00: 0000000000000000 0000000000000015
ffff800020b0bc10: 0000000000000120 0000000000000040
ffff800020b0bc20: 0000000000000001 0000000000000000
ffff800020b0bc30: log_wait+8 0000000000000000
ffff800020b0bc40: 0000000000000000 00000000000047d4
ffff800020b0bc50: ffff800022f337a4 0000000000000000
ffff800020b0bc60: 0000000000000106 0000000000000001
ffff800020b0bc70: 0000000000000002 0000000000000106
ffff800020b0bc80: log_buf_len cont
ffff800020b0bc90: 0000ffff83cc28f0 text.34829+13
ffff800020b0bca0: sys_write 0000ffff83d266c0
ffff800020b0bcb0: 0000000000000006 cpu_cgrp_subsys+152
ffff800020b0bcc0: 0000000000000063 sysrq_crash_op
ffff800020b0bcd0: 0000000000000009 0000000000000000
ffff800020b0bce0: 0000000000000015 0000000000000120
ffff800020b0bcf0: 0000000000000040 sys_call_table
ffff800020b0bd00: ffff800020b08000 ffff800020b0bd40
ffff800020b0bd10: sysrq_handle_crash+12 ffff800020b0bd40
ffff800020b0bd20: sysrq_handle_crash+32 0000000060400149
ffff800020b0bd30: cpu_cgrp_subsys+152 [kmalloc-1024]
#6 [ffff800020b0bd40] el1_da at ffff000008084568
PC: ffff000008457fc8 [sysrq_handle_crash+32]
LR: ffff000008457fb4 [sysrq_handle_crash+12]
SP: ffff800020b0bd40 PSTATE: 60400149
...
Yes, again I understand that you feel that the el1_da() frame
should be at ffff800020b0bc20, but on the other hand, the
link register/pc under do_mem_abort() points to ffff800020b0bd40,
and I kind of prefer to show that. I *know* you disagree.
... [ cut ] ...
> IMO I think that you'd better improve the output of
> arm64_print_exception_frame() for bt -F.
> And this is an totally independent issue from other parts of my patches.
I don't see where "bt -F" gets involved in arm64_print_exception_frame()?
Are you referring to this, which I agree is OK:
@@ -2130,10 +2343,11 @@ arm64_print_exception_frame(struct bt_info *bt, ulong
pt_regs, int mode, FILE *o
}
if (is_64_bit) {
- fprintf(ofp, "ORIG_X0: %016lx SYSCALLNO: %lx",
- (ulong)regs->orig_x0, (ulong)regs->syscallno);
- if (mode == USER_MODE)
+ if (mode == USER_MODE) {
+ fprintf(ofp, "ORIG_X0: %016lx SYSCALLNO: %lx",
+ (ulong)regs->orig_x0,
(ulong)regs->syscallno);
fprintf(ofp, " PSTATE: %08lx",
(ulong)regs->pstate);
+ }
fprintf(ofp, "\n");
}
Anyway, I'm going to be away for the next few days, and will revisit
this all again next week.
But here's a thought:
There already are "bt -o" and "bt -O" for x86 and x86_64, which
allow
for the existence of coexisting but separate back trace methods.
I'm thinking that we can keep the current code, while also introducing
your patchset as an "optional" back trace method. There would be
arm64_back_trace_cmd_v1() and arm64_back_trace_cmd_v2(),
arm64_unwind_frame_v1()
and arm64_unwind_frame_v2(), and any other functions that can't comfortably
be utilized by both methods. And then "bt -o" would use the optional
version, or the default method selection could be toggled with "bt -O".
And again, I hope we don't lose you. During the KASLR negotiations, you've
already seen what an obstinate ass I can be. But it's really part of my
job as a gatekeeper.
Thanks,
Dave
"Simplicity, simplicity, simplicity."
- Henry David Thoreau
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility