Hi Ogawa,
I've checked in a patch for storing the correct 32-bit x86 hard/soft IRQ stack
addresses:
https://github.com/crash-utility/crash/commit/5959759058e53035fb26b9f2b59...
Fix for Linux 3.15 and later 32-bit X86 kernels containing kernel
commit 198d208df4371734ac4728f69cb585c284d20a15, titled "x86: Keep
thread_info on thread stack in x86_32". Without the patch, incorrect
addresses of each per-cpu hardirq_stack and softirq_stack were saved
for usage by the "bt" command.
(hirofumi(a)mail.parknet.co.jp, anderson(a)redhat.com)
I don't have a vmcore that I can test whether the stack transition still works,
but at least it's got the correct addresses, and I believe I have the correct
values set up in restore_stack().
Also, there are also several other 32-bit x86 backtrace-related patches that I've
checked in the last few days for other issues that you may have run into:
commit 222e784fb2d16f28ab702a7f9a1889e0b18ca152
Author: Dave Anderson <anderson(a)redhat.com>
Date: Fri Feb 10 15:07:58 2017 -0500
Fix for 32-bit X86 kernels configured with CONFIG_RANDOMIZE_BASE.
Without the patch, an invalid kernel PAGE_OFFSET value is calculated
and as a result the session fails during session initialization just
after the patching of the gdb minimal_symbol values message, showing
the warning message "WARNING: cannot read linux_banner string",
followed by "crash: /vmlinux and /dev/crash do not match!". This
patch also adds a new "--machdep page_offset=<value>" option that
can be used if the CONFIG_PAGE_OFFSET value is not the default
address of 0xc0000000.
(anderson(a)redhat.com)
commit 69b577e423c1adb5f06d19bccc790063a500aac4
Author: Dave Anderson <anderson(a)redhat.com>
Date: Tue Feb 14 14:25:37 2017 -0500
Fix for the "bt" command on Linux 4.9 and later 32-bit X86 kernels
containing kernel commit 0100301bfdf56a2a370c7157b5ab0fbf9313e1cd,
subject "sched/x86: Rewrite the switch_to() code". Without the
patch, backtraces for inactive (sleeping) tasks fail with the message
"bt: invalid structure member offset: task_struct_thread_eip".
(anderson(a)redhat.com)
commit ec1a9b967da19acaa0ec6b6a904d16267ba6a409
Author: Dave Anderson <anderson(a)redhat.com>
Date: Tue Feb 14 15:23:01 2017 -0500
Fix for a "[-Wmisleading-indentation]" compiler warning and the
associated bug that is generated by lkcd_x86_trace.c when building
32-bit X86 with "make warn" with gcc-6.3.1.
(anderson(a)redhat.com)
commit 63dd57685e5e897ae343a213b58cb1e6ae4ed282
Author: Dave Anderson <anderson(a)redhat.com>
Date: Tue Feb 14 16:23:56 2017 -0500
Fix for an invalid "bt" warning on a 32-bit X86 idle (swapper) task.
Without the patch, the backtrace displays the "cannot resolve stack
trace" warning, dumps the backtrace, and then the text symbols:
crash> bt
PID: 0 TASK: f0962180 CPU: 6 COMMAND: "swapper/6"
bt: cannot resolve stack trace:
#0 [f095ff1c] __schedule at c0b6ef8d
#1 [f095ff58] schedule at c0b6f4a9
#2 [f095ff64] schedule_preempt_disabled at c0b6f728
#3 [f095ff6c] cpu_startup_entry at c04b0310
#4 [f095ff94] start_secondary at c04468c0
bt: text symbols on stack:
[f095ff1c] __schedule at c0b6ef8d
[f095ff58] schedule at c0b6f4ae
[f095ff64] schedule_preempt_disabled at c0b6f72d
[f095ff6c] cpu_startup_entry at c04b0315
[f095ff94] start_secondary at c04468c5
crash>
The backtrace shown is actually correct.
(anderson(a)redhat.com)
commit 8323490fd962ba1b6aa745deea2ec817d837862d
Author: Dave Anderson <anderson(a)redhat.com>
Date: Wed Feb 15 15:04:07 2017 -0500
Another fix for a similar "bt: cannot resolve stack trace" warning
on a 32-bit X86 idle/swapper task, but when running on cpu 0.
(anderson(a)redhat.com)
commit 345da6dc904cb88a3abfae36a990150a2036b95d
Author: Dave Anderson <anderson(a)redhat.com>
Date: Wed Feb 15 15:39:13 2017 -0500
Remove two one-time warning messages that are displayed when running
the "bt" command on Linux 4.2 and later 32-bit X86 kernels. Without
the patch, the first "bt" command that is executed will be preceded
by "bt: WARNING: "system_call" symbol does not exist", followed
by
"bt: WARNING: neither "ret_from_sys_call" nor
"syscall_badsys"
symbols exist".
(anderson(a)redhat.com)
It's been quite some time since I've done any 32-bit x86 maintenance,
and any help/time you can afford would be appreciated.
Thanks,
Dave
----- Original Message -----
Dave Anderson <anderson(a)redhat.com> writes:
>> > Please review the attached update to your patch, which adds support for
>> > the irq_ctx-to-irq_stack transition.
>> > - if ((hard_sp = per_cpu_symbol_search("per_cpu__hardirq_ctx")))
{
>> > + if ((hard_sp = per_cpu_symbol_search("per_cpu__hardirq_ctx"))
||
>> > + (hard_sp =
per_cpu_symbol_search("per_cpu__hardirq_stack"))) {
>> > if ((kt->flags & SMP) && (kt->flags &
PER_CPU_OFF)) {
>> > for (i = 0; i < NR_CPUS; i++) {
>> > + ulong ptr;
>> > +
>>
>> [...]
>>
>> Right, this will work, maybe, several stuffs. However hardirq_tasks[] is
>> missing, so this will not be fully. For now, I don't have dump of other
>> than v4.9 (stack format was changed), so I'm not sure which commands are
>> not working (user of hardirq_tasks[]).
>
> I haven't looked at the recent kernel to check out how the linkage is done
> without the temporary thread_info at the bottom of the hard/soft irq
> stacks,
> but looking at a live 4.9 system, it looks like the first word at the base
> of
> the hard and soft irq stack now contains a pointer into the stack of the
> last caller? If that's true, then stkptr_to_task() could be used to
> populate the hardirq_tasks[] and softirq_tasks[] -- but the initialization
> would have to be delayed until near the end of task_init().
Right. At least on v4.9, with quick check, looks like the x86 and x86_64
has different (x86 is for hardirq/softirq, x86_64 is for
hardirq/exception).
linux/arch/x86/kernel/dumpstack_{32,64}.c is how get previous stack from
sp.
x86:
in_hardirq_stack(stack, info)
info->next_sp = (unsigned long *)*begin;
in_softirq_stack(stack, info)
info->next_sp = (unsigned long *)*begin;
both is same as you said (at start of stack pages).
x86_64
in_exception_stack(stack, info)
info->next_sp = (unsigned long *)regs->sp;
in_irq_stack(stack, info)
info->next_sp = (unsigned long *)*(end - 1);
exception has pt_regs at top of stack (at end of stack pages)
irq has previous sp at top of stack (at end of stack pages)
But I'm still not checking when switched to this format, and how
consolidate/coding those in crash.
--
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>