On Mon, Jun 12, 2023 at 9:03 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
wrote:
Hi Lianbo,
sorry, my company was closed last Friday.
No worry, Kazu.
On 2023/06/08 20:46, lijiang wrote:
> On Thu, Jun 8, 2023 at 5:56 PM lijiang
<lijiang(a)redhat.com> wrote:
>
>> On Thu, Jun 8, 2023 at 5:02 PM HAGIO KAZUHITO(萩尾 一仁) <
k-hagio-ab(a)nec.com>
>> wrote:
>>
>>> On 2023/06/08 13:31, Lianbo Jiang wrote:
>>>> Crash tool will fail to load vmcore with the following error:
>>>>
>>>> #crash vmlinux /var/crash/127.0.0.1-2023-06-07-22\:03\:24/vmcore -s
>>>> crash: invalid structure size: note_buf
>>>> FILE: diskdump.c LINE: 121 FUNCTION: have_crash_notes()
>>>>
>>>> [./crash] error trace: 101859ac => 10291798 => 10291450 =>
10266038
>>>>
>>>> 10266038: SIZE_verify+156
>>>> 10291450: have_crash_notes+308
>>>> 10291798: map_cpus_to_prstatus_kdump_cmprs+448
>>>> 101859ac: task_init+11980
>>>>
>>>> The reason is that the note_buf is not intialized before using the
>>>> SIZE(note_buf) in the have_crash_notes(). Let's initialize the
variable
>>>> note_buf in the ppc64_init() to fix this issue.
>>>
>>> Thanks for the patch.
>>>
>>> Most of the architectures already have it, but some still do not have,
>>
>>
>> I don't have a 4 cpus s390x machine on hand, and am still trying to find
>> it.
>>
>
> I checked the crash code. For the s390x, it doesn't invoke the
> map_cpus_to_prstatus{_kdump_cmprs}(), so this patch should not affect
the
> s390x arch.
My thought is that the original commit db8c030857b4 should have added
the initialization of SIZE(note_buf) in task_init(), because there might
be some architectures that do not initialize it and it's better to
initialize it when it's used, e.g. like this:
--- a/task.c
+++ b/task.c
@@ -675,6 +675,8 @@ task_init(void)
tt->this_task = pid_to_task(active_pid);
}
else {
+ if (INVALID_SIZE(note_buf))
+ STRUCT_SIZE_INIT(note_buf, "note_buf_t");
if (KDUMP_DUMPFILE())
map_cpus_to_prstatus();
else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
Ok, fine to me. But, generally it should be good to be initialized in the
machdep_init() because the note_buf is strongly related to the
crash_notes, and the crash_notes contains machine specific information such
as cpu registers at the time of crash.
To be exact, maybe we should check also whether the size is valid:
- crash_notes_exists = kernel_symbol_exists("crash_notes");
+ crash_notes_exists = kernel_symbol_exists("crash_notes") &&
+ VALID_SIZE(note_buf);
but if crash_notes is defined, note_buf_t also should be defined. So I
think the above VALID_SIZE(note_buf) is not necessary.
Yes, as I mentioned above, they are closely related.
/* Per cpu memory for storing cpu states in case of system crash.
*/
note_buf_t __percpu *crash_notes;
What do you think?
Agree, I will post v2 later.
Thanks
Lianbo
Thanks,
Kazu
>
> Mikhail Zaslonko: Can you help to confirm this issue or do some tests?
> Thank you in advance.
>
> Lianbo
>
>
>>
>>>
>>> e.g. is s390x ok? It might be better to put it in task_init() because
>>
>>
>> It was initialized in the machdep_init(POST_GDB), see the following
code:
>> ...
>> read_in_kernel_config(IKCFG_INIT);
>> kernel_init();
>> machdep_init(POST_GDB); <---
>> vm_init();
>> machdep_init(POST_VM);
>> module_init();
>> help_init();
>> task_init();
>> vfs_init();
>> net_init();
>> dev_init();
>> machdep_init(POST_INIT);
>> ...
>> I would suggest putting it at the end of the kernel_init(), so that the
>> original initialization order will basically not be changed
>>
>>
>>> it's used there?
>>>
>>
>> Do you mean we need to move the existing initialization(all) to the
>> task_init()?
>>
>>>
>>> $ git grep 'STRUCT_SIZE_INIT(note_buf'
>>> arm.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> arm64.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> mips.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> mips64.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> ppc.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> riscv64.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> x86.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> x86_64.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>> xen_hyper.c: XEN_HYPER_STRUCT_SIZE_INIT(note_buf_t,
"note_buf_t");
>>>
>>> Thanks,
>>> Kazu
>>>
>>>>
>>>> Fixes: db8c030857b4 ("diskdump/netdump: fix segmentation fault
caused
>>> by failure of stopping CPUs")
>>>> Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
>>>> ---
>>>> ppc64.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/ppc64.c b/ppc64.c
>>>> index b95a621d8fe4..ff7f0fca3a95 100644
>>>> --- a/ppc64.c
>>>> +++ b/ppc64.c
>>>> @@ -695,6 +695,7 @@ ppc64_init(int when)
>>>> }
>>>>
>>>> ppc64_init_paca_info();
>>>> + STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>>>>
>>>> if (!machdep->hz) {
>>>> machdep->hz = HZ;
>>
>>
>>
>> On Thu, Jun 8, 2023 at 5:56 PM lijiang <lijiang(a)redhat.com
>> <mailto:lijiang@redhat.com>> wrote:
>>
>> On Thu, Jun 8, 2023 at 5:02 PM HAGIO KAZUHITO(萩尾 一仁)
>> <k-hagio-ab(a)nec.com <mailto:k-hagio-ab@nec.com>> wrote:
>>
>> On 2023/06/08 13:31, Lianbo Jiang wrote:
>> > Crash tool will fail to load vmcore with the following error:
>> >
>> > #crash vmlinux
>> /var/crash/127.0.0.1-2023-06-07-22\:03\:24/vmcore -s
>> > crash: invalid structure size: note_buf
>> > FILE: diskdump.c LINE: 121 FUNCTION:
>> have_crash_notes()
>> >
>> > [./crash] error trace: 101859ac => 10291798 => 10291450
>> => 10266038
>> >
>> > 10266038: SIZE_verify+156
>> > 10291450: have_crash_notes+308
>> > 10291798: map_cpus_to_prstatus_kdump_cmprs+448
>> > 101859ac: task_init+11980
>> >
>> > The reason is that the note_buf is not intialized before
>> using the
>> > SIZE(note_buf) in the have_crash_notes(). Let's initialize
>> the variable
>> > note_buf in the ppc64_init() to fix this issue.
>>
>> Thanks for the patch.
>>
>> Most of the architectures already have it, but some still do
>> not have,
>>
>>
>> I don't have a 4 cpus s390x machine on hand, and am still trying
>> to find it.
>>
>>
>> I checked the crash code. For the s390x, it doesn't invoke the
>> map_cpus_to_prstatus{_kdump_cmprs}(), so this patch should not affect
>> the s390x arch.
>>
>> Mikhail Zaslonko: Can you help to confirm this issue or do some tests?
>> Thank you in advance.
>>
>> Lianbo
>>
>>
>> e.g. is s390x ok? It might be better to put it in task_init()
>> because
>>
>> It was initialized in the machdep_init(POST_GDB), see the
>> following code:
>> ...
>> read_in_kernel_config(IKCFG_INIT);
>> kernel_init();
>> machdep_init(POST_GDB); <---
>> vm_init();
>> machdep_init(POST_VM);
>> module_init();
>> help_init();
>> task_init();
>> vfs_init();
>> net_init();
>> dev_init();
>> machdep_init(POST_INIT);
>> ...
>> I would suggest putting it at the end of the kernel_init(), so
>> that the original initialization order will basically not be changed
>>
>>
>> it's used there?
>>
>> Do you mean we need to move the existing initialization(all) to
>> the task_init()?
>>
>>
>> $ git grep 'STRUCT_SIZE_INIT(note_buf'
>> arm.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>> arm64.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>> mips.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>> mips64.c: STRUCT_SIZE_INIT(note_buf,
"note_buf_t");
>> ppc.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>> riscv64.c: STRUCT_SIZE_INIT(note_buf,
"note_buf_t");
>> x86.c: STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>> x86_64.c: STRUCT_SIZE_INIT(note_buf,
"note_buf_t");
>> xen_hyper.c: XEN_HYPER_STRUCT_SIZE_INIT(note_buf_t,
>> "note_buf_t");
>>
>> Thanks,
>> Kazu
>>
>> >
>> > Fixes: db8c030857b4 ("diskdump/netdump: fix segmentation
>> fault caused by failure of stopping CPUs")
>> > Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com
>> <mailto:lijiang@redhat.com>>
>> > ---
>> > ppc64.c | 1 +
>> > 1 file changed, 1 insertion(+)
>> >
>> > diff --git a/ppc64.c b/ppc64.c
>> > index b95a621d8fe4..ff7f0fca3a95 100644
>> > --- a/ppc64.c
>> > +++ b/ppc64.c
>> > @@ -695,6 +695,7 @@ ppc64_init(int when)
>> > }
>> >
>> > ppc64_init_paca_info();
>> > + STRUCT_SIZE_INIT(note_buf, "note_buf_t");
>> >
>> > if (!machdep->hz) {
>> > machdep->hz = HZ;
>>