On Fri, Oct 7, 2022 at 9:57 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
wrote:
On 2022/10/06 9:42, lijiang wrote:
> Hi, Kazu
> Thank you for the comment.
> On Wed, Oct 5, 2022 at 5:11 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com
>
> wrote:
>
>> Hi Lianbo,
>>
>> On 2022/10/05 11:22, Lianbo Jiang wrote:
>>> Currently crash will fail and then exit, if the initialization of
>>> the emergency stacks information fails. In real customer environments,
>>> sometimes, a vmcore may be partially damaged, although such vmcores
>>> are rare. For example:
>>>
>>> # ./crash ../3.10.0-1127.18.2.el7.ppc64le/vmcore
>> ../3.10.0-1127.18.2.el7.ppc64le/vmlinux -s
>>> crash: invalid kernel virtual address: 38 type:
"paca->emergency_sp"
>>> #
>>>
>>> Lets try to keep loading vmcore if such issues happen, so call
>>> the readmem() with the RETURN_ON_ERROR instead of FAULT_ON_ERROR,
>>> which allows the crash move on.
>>
>> Just to confirm, can I have the error messages printed with the patch
>> for the vmcore?
>>
>
> Yes. Crash has the following error messages printed after applying this
> patch:
>
> # ./crash ../vmcore ../vmlinux -s
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: fc4 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 1000138 type:
"paca->emergency_sp"
> crash: invalid kernel virtual address: 100000040 type:
"paca->emergency_sp"
> ...
Thanks, I see. These errors can be printed for the number of CPUs,
but better than nothing.
Looking at the users of ms->*emergency_sp, when the errors occur,
1. ppc64_in_emergency_stack() proceed with the initialized value 0,
and the following condition will happen to fail:
Good understanding, Kazu.
if (addr >= base && addr < top) {
but I would like to add this to check it in advance:
--- a/ppc64.c
+++ b/ppc64.c
@@ -1947,7 +1947,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool
verbose)
if (cpu < 0 || cpu >= kt->cpus)
return NONE_STACK;
- if (ms->emergency_sp) {
+ if (ms->emergency_sp && ms->emergency_sp[cpu]) {
Is it better to use the IS_KVADDR(ms->emergency_sp[cpu]) ? For example:
+ if (ms->emergency_sp && IS_KVADDR(ms->emergency_sp[cpu]) ) {
Thanks.
Lianbo
top = ms->emergency_sp[cpu];
base = top - STACKSIZE();
if (addr >= base && addr < top) {
@@ -1957,7 +1957,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool
verbose)
}
}
- if (ms->nmi_emergency_sp) {
+ if (ms->nmi_emergency_sp && ms->nmi_emergency_sp[cpu]) {
top = ms->nmi_emergency_sp[cpu];
base = top - STACKSIZE();
if (addr >= base && addr < top) {
@@ -1967,7 +1967,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool
verbose)
}
}
- if (ms->mc_emergency_sp) {
+ if (ms->mc_emergency_sp && ms->mc_emergency_sp[cpu]) {
top = ms->mc_emergency_sp[cpu];
base = top - STACKSIZE();
if (addr >= base && addr < top) {
2. ppc64_set_bt_emergency_stack() is called after
ppc64_in_emergency_stack()
returning other than NONE_STACK, so it should have a valid value. No need
to check it again.
So, with the additional change above,
Acked-by: Kazuhito Hagio <k-hagio-ab(a)nec.com>
Lianbo, if the above is ok for you, please apply with it.
Thanks,
Kazu