On Thu, Sep 21, 2023 at 3:59 PM Aditya Gupta <adityag(a)linux.ibm.com> wrote:
On 21/09/23 11:21, lijiang wrote:
On Mon, Sep 18, 2023 at 7: 34 PM lijiang <lijiang@ redhat. com>
<lijiang(a)%E2%80%8Aredhat.%E2%80%8Acom> wrote: Hi, Aditya Thank you for
the patch. On Mon, Sep 11, 2023 at 8: 00 PM <crash-utility-request@
redhat. com> <crash-utility-request(a)%E2%80%8Aredhat.%E2%80%8Acom> wrote:
From: Aditya Gupta <adityag@ linux. ibm. com>
<adityag(a)%E2%80%8Alinux.%E2%80%8Aibm.%E2%80%8Acom> To:
On Mon, Sep 18, 2023 at 7:34 PM lijiang <lijiang(a)redhat.com> wrote:
> Hi, Aditya
> Thank you for the patch.
>
> On Mon, Sep 11, 2023 at 8:00 PM <crash-utility-request(a)redhat.com> wrote:
>
>> From: Aditya Gupta <adityag(a)linux.ibm.com>
>> To: crash-utility(a)redhat.com
>> Cc: Hari Bathini <hbathini(a)linux.ibm.com>, Mahesh J Salgaonkar
>> <mahesh(a)linux.ibm.com>, Sourabh Jain <sourabhjain(a)linux.ibm.com
>> >,
>> "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
>> Subject: [Crash-utility] [PATCH] ppc64: do page traversal if
>> vmemmap_list not populated
>> Message-ID: <20230911093032.419388-1-adityag(a)linux.ibm.com>
>> Content-Type: text/plain; charset="US-ASCII"; x-default=true
>>
>> Currently 'crash-tool' fails on vmcore collected on upstream kernel on
>> PowerPC64 with the error:
>>
>> crash: invalid kernel virtual address: 0 type: "first list entry
>>
>> Presently the address translation for vmemmap addresses is done using
>> the vmemmap_list. But with the below commit in Linux, vmemmap_list can
>> be empty, in case of Radix MMU on PowerPC64
>>
>> 368a0590d954: (powerpc/book3s64/vmemmap: switch radix to use a
>> different vmemmap handling function)
>>
>> In case vmemmap_list is empty, then it's head is NULL, which crash tries
>> to access and fails due to accessing NULL.
>>
>> Instead of depending on 'vmemmap_list' for address translation for
>> vmemmap addresses, do a kernel pagetable walk to get the physical
>> address associated with given virtual address
>>
>> Reviewed-by: Hari Bathini <hbathini(a)linux.ibm.com>
>> Signed-off-by: Aditya Gupta <adityag(a)linux.ibm.com>
>>
>> ---
>>
>> Testing
>> =======
>>
>> Git tree with patch applied:
>>
https://github.com/adi-g15-ibm/crash/tree/bugzilla-203296-list-v1
>>
>> This can be tested with '/proc/vmcore' as the vmcore, since makedumpfile
>
> also fails in absence of 'vmemmap_list' in upstream linux
>>
>> The fix for makedumpfile will also been sent to upstream
>>
>> ---
>> ---
>> ppc64.c | 47 +++++++++++++++++++++++++++++++++--------------
>> 1 file changed, 33 insertions(+), 14 deletions(-)
>>
>> diff --git a/ppc64.c b/ppc64.c
>> index fc34006f4863..1e84c5f56773 100644
>> --- a/ppc64.c
>> +++ b/ppc64.c
>> @@ -1280,8 +1280,30 @@ void ppc64_vmemmap_init(void)
>> long backing_size, virt_addr_offset, phys_offset, list_offset;
>> ulong *vmemmap_list;
>> char *vmemmap_buf;
>> - struct machine_specific *ms;
>> -
>> + struct machine_specific *ms = machdep->machspec;
>> +
>> + ld = &list_data;
>> + BZERO(ld, sizeof(struct list_data));
>> +
>> + /*
>> + * vmemmap_list is missing or set to 0 in the kernel would imply
>> + * vmemmap region is mapped in the kernel pagetable. So, read
>> vmemmap_list
>> + * anyway and use the translation method accordingly.
>> + */
>> + readmem(symbol_value("vmemmap_list"), KVADDR,
&ld->start,
>> sizeof(void *),
>> + "vmemmap_list", RETURN_ON_ERROR);
>> + if (!ld->start) {
>> + /*
>> + * vmemmap_list is set to 0 or missing. Do kernel
>> pagetable walk
>> + * for vmemmamp address translation.
>> + */
>> + ms->vmemmap_list = NULL;
>> + ms->vmemmap_cnt = 0;
>> +
>> + machdep->flags |= VMEMMAP_AWARE;
>> + return;
>> + }
>> +
>>
>
I would suggest putting the
'readmem(symbol_value("vmemmap_list"),...)'
after the 'kernel_symbol_exists("vmemmap_list")', and also check the
returned value of readmem().
Yes, I considered that, but the if condition checking for
'kernel_symbol_exists("vmemmap_list")' returns from the function if
the
symbol is not found, same with the readmem return value check earlier, we
wanted to initialise the ms->vmemmap_list and machdep->flags before that.
It's true.
For the current patch, if the symbol 'vmemmap_list' is not found, the
symbol_value() will invoke the error(FATAL, ...) and ultimately exit from
this function. It doesn't have any chance to execute the initialising code.
And, the initialization code relies on the result of ld->start, once the
readmem() fails due to some reasons, it may also enter the
initialization code. I'm not sure if that is expected behavior(if yes, the
"RETURN_ON_ERROR|QUIET " would be good).
If so, the kernel_symbol_exists("vmemmap_list") may be redundant. That can
be removed this time. What do you think?
The readmem used to be like this, which returns from function if readmem
failed:
- if (!readmem(symbol_value("vmemmap_list"),
- KVADDR, &ld->start, sizeof(void *), "vmemmap_list",
- RETURN_ON_ERROR))
- return;
So, intentionally I put it above the if and removed the return value
check, since we don't want to depend on vmemmap_list symbol being there for
newer kernels anymore.
And to be future proof in case the address mapping moves to page table for
Hash MMU case also, and since vmemmap_list is PowerPC specific, the symbol
might also be removed.
Ok, got it, thanks for the information.
But, if it's a coding guidelines requirement to handle the return
values,
I can think of how to handle the return value of readmem, any suggestions ?
No, it's not a coding guidelines requirement. :-)
And other changes are fine to me.
Thank you for your reviews Lianbo :)
No problem.
Thanks.
Lianbo
Thanks,
- Aditya Gupta