On Fri, Oct 7, 2022 at 9:57 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:

On 2022/10/06 9:42, lijiang wrote:
> Hi, Kazu
> Thank you for the comment.
> On Wed, Oct 5, 2022 at 5:11 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com>
> wrote:
>
>> Hi Lianbo,
>>
>> On 2022/10/05 11:22, Lianbo Jiang wrote:
>>> Currently crash will fail and then exit, if the initialization of
>>> the emergency stacks information fails. In real customer environments,
>>> sometimes, a vmcore may be partially damaged, although such vmcores
>>> are rare. For example:
>>>
>>>     # ./crash ../3.10.0-1127.18.2.el7.ppc64le/vmcore
>> ../3.10.0-1127.18.2.el7.ppc64le/vmlinux  -s
>>>     crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
>>>     #
>>>
>>> Lets try to keep loading vmcore if such issues happen, so call
>>> the readmem() with the RETURN_ON_ERROR instead of FAULT_ON_ERROR,
>>> which allows the crash move on.
>>
>> Just to confirm, can I have the error messages printed with the patch
>> for the vmcore?
>>
>
> Yes. Crash has the following error messages printed after applying this
> patch:
>
> # ./crash ../vmcore ../vmlinux -s
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: fc4  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 1000138  type: "paca->emergency_sp"
> crash: invalid kernel virtual address: 100000040  type: "paca->emergency_sp"
> ...

Thanks, I see.  These errors can be printed for the number of CPUs,
but better than nothing.

Looking at the users of ms->*emergency_sp, when the errors occur,

1. ppc64_in_emergency_stack() proceed with the initialized value 0,
and the following condition will happen to fail:


Good understanding, Kazu.
 
   if (addr >= base && addr < top) {

but I would like to add this to check it in advance:

--- a/ppc64.c
+++ b/ppc64.c
@@ -1947,7 +1947,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool verbose)
         if (cpu < 0  || cpu >= kt->cpus)
                 return NONE_STACK;

-       if (ms->emergency_sp) {
+       if (ms->emergency_sp && ms->emergency_sp[cpu]) {

Is it better to use the IS_KVADDR(ms->emergency_sp[cpu]) ? For example:

+       if (ms->emergency_sp && IS_KVADDR(ms->emergency_sp[cpu]) ) {

Thanks.
Lianbo

                 top = ms->emergency_sp[cpu];
                 base =  top - STACKSIZE();
                 if (addr >= base && addr < top) {
@@ -1957,7 +1957,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool verbose)
                 }
         }

-       if (ms->nmi_emergency_sp) {
+       if (ms->nmi_emergency_sp && ms->nmi_emergency_sp[cpu]) {
                 top = ms->nmi_emergency_sp[cpu];
                 base = top - STACKSIZE();
                 if (addr >= base && addr < top) {
@@ -1967,7 +1967,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool verbose)
                 }
         }

-       if (ms->mc_emergency_sp) {
+       if (ms->mc_emergency_sp && ms->mc_emergency_sp[cpu]) {
                 top = ms->mc_emergency_sp[cpu];
                 base =  top - STACKSIZE();
                 if (addr >= base && addr < top) {


2. ppc64_set_bt_emergency_stack() is called after ppc64_in_emergency_stack()
returning other than NONE_STACK, so it should have a valid value.  No need
to check it again.

So, with the additional change above,

Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Lianbo, if the above is ok for you, please apply with it.

Thanks,
Kazu