On 2022/10/06 9:42, lijiang wrote:
Hi, Kazu
Thank you for the comment.
On Wed, Oct 5, 2022 at 5:11 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
wrote:
> Hi Lianbo,
>
> On 2022/10/05 11:22, Lianbo Jiang wrote:
>> Currently crash will fail and then exit, if the initialization of
>> the emergency stacks information fails. In real customer environments,
>> sometimes, a vmcore may be partially damaged, although such vmcores
>> are rare. For example:
>>
>> # ./crash ../3.10.0-1127.18.2.el7.ppc64le/vmcore
> ../3.10.0-1127.18.2.el7.ppc64le/vmlinux -s
>> crash: invalid kernel virtual address: 38 type:
"paca->emergency_sp"
>> #
>>
>> Lets try to keep loading vmcore if such issues happen, so call
>> the readmem() with the RETURN_ON_ERROR instead of FAULT_ON_ERROR,
>> which allows the crash move on.
>
> Just to confirm, can I have the error messages printed with the patch
> for the vmcore?
>
Yes. Crash has the following error messages printed after applying this
patch:
# ./crash ../vmcore ../vmlinux -s
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: fc4 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 38 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 1000138 type: "paca->emergency_sp"
crash: invalid kernel virtual address: 100000040 type:
"paca->emergency_sp"
...
Thanks, I see. These errors can be printed for the number of CPUs,
but better than nothing.
Looking at the users of ms->*emergency_sp, when the errors occur,
1. ppc64_in_emergency_stack() proceed with the initialized value 0,
and the following condition will happen to fail:
if (addr >= base && addr < top) {
but I would like to add this to check it in advance:
--- a/ppc64.c
+++ b/ppc64.c
@@ -1947,7 +1947,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool verbose)
if (cpu < 0 || cpu >= kt->cpus)
return NONE_STACK;
- if (ms->emergency_sp) {
+ if (ms->emergency_sp && ms->emergency_sp[cpu]) {
top = ms->emergency_sp[cpu];
base = top - STACKSIZE();
if (addr >= base && addr < top) {
@@ -1957,7 +1957,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool verbose)
}
}
- if (ms->nmi_emergency_sp) {
+ if (ms->nmi_emergency_sp && ms->nmi_emergency_sp[cpu]) {
top = ms->nmi_emergency_sp[cpu];
base = top - STACKSIZE();
if (addr >= base && addr < top) {
@@ -1967,7 +1967,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool verbose)
}
}
- if (ms->mc_emergency_sp) {
+ if (ms->mc_emergency_sp && ms->mc_emergency_sp[cpu]) {
top = ms->mc_emergency_sp[cpu];
base = top - STACKSIZE();
if (addr >= base && addr < top) {
2. ppc64_set_bt_emergency_stack() is called after ppc64_in_emergency_stack()
returning other than NONE_STACK, so it should have a valid value. No need
to check it again.
So, with the additional change above,
Acked-by: Kazuhito Hagio <k-hagio-ab(a)nec.com>
Lianbo, if the above is ok for you, please apply with it.
Thanks,
Kazu