On Tue, Sep 19, 2023 at 2:23 PM Aditya Gupta <adityag@linux.ibm.com> wrote:

Hello lijiang,

On Mon, Sep 18, 2023 at 07:34:04PM +0800, lijiang wrote:
> Hi, Aditya
> Thank you for the patch.
>
> On Mon, Sep 11, 2023 at 8:00 PM <crash-utility-request@redhat.com> wrote:
>
> > ...
> >
> > Currently 'crash-tool' fails on vmcore collected on upstream kernel on
> > PowerPC64 with the error:
> >
> > crash: invalid kernel virtual address: 0 type: "first list entry"
> >
> > Presently the address translation for vmemmap addresses is done using
> > the vmemmap_list. But with the below commit in Linux, vmemmap_list can
> > be empty in the case of Radix MMU on PowerPC64:
> >
> > 368a0590d954: (powerpc/book3s64/vmemmap: switch radix to use a
> > different vmemmap handling function)
> >
> > If vmemmap_list is empty, its head is NULL, which crash tries to
> > access, and it fails on the NULL address.
> >
> > Instead of depending on 'vmemmap_list' for address translation for
> > vmemmap addresses, do a kernel pagetable walk to get the physical
> > address associated with the given virtual address.
> >
> > Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
> > Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> >
> > ---
> >
> > Testing
> > =======
> >
> > Git tree with patch applied:
> > https://github.com/adi-g15-ibm/crash/tree/bugzilla-203296-list-v1
> >
> > This can be tested with '/proc/vmcore' as the vmcore, since makedumpfile
> >
>
> Can you describe in detail how to reproduce this issue? Or does this
> require any kernel configs to be enabled first? I could not reproduce
> the issue with '/proc/kcore' or with a vmcore (copied via cp).
>
> Test kernel commit: ce9ecca0238b ("Linux 6.6-rc2")
>
> # ./crash /home/linux/vmlinux

Thanks for testing it.

This issue occurs only in case of Radix MMU.

Overall, these are all the requirements:
1. Upstream Linux (master branch); your commit, ce9ecca0238b, will also work.
2. 'CONFIG_PPC_BOOK3S_64' should be 'y' in the kernel config (the default
configs should already set it)

# grep "CONFIG_PPC_BOOK3S_64" /home/linux/.config

CONFIG_PPC_BOOK3S_64=y

3. Check whether the dmesg of the crashed kernel prints 'hash-mmu' or
'radix-mmu'. It should be 'radix-mmu'.

# dmesg|grep mmu
[ 0.000000] hash-mmu: Page sizes from device-tree:
[ 0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[ 0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
[ 0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
[ 0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[ 0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
[ 0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
[ 0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
[ 0.000000] hash-mmu: Initializing hash mmu with SLB
[ 0.000000] mmu_features = 0xfc006e01
[ 0.000000] hash-mmu: ppc64_pft_size = 0x1b
[ 0.000000] hash-mmu: htab_hash_mask = 0xfffff

I guess the system that crashed was using 'hash-mmu'.
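
Requirements 2 and 3 above can be checked with a small script. This is only
a rough sketch: the function names 'has_book3s_64' and 'mmu_type' are made up
for illustration and are not part of crash or the kernel tree. It assumes the
dmesg lines carry the 'hash-mmu:' or 'radix-mmu:' prefix as in the output
above.

```shell
#!/bin/sh
# Illustrative helpers only (hypothetical names, not part of crash).

# Succeeds if the given kernel config file sets CONFIG_PPC_BOOK3S_64=y.
has_book3s_64() {
    grep -q '^CONFIG_PPC_BOOK3S_64=y$' "$1"
}

# Reads dmesg text on stdin and prints "radix", "hash" or "unknown",
# depending on which MMU prefix appears in the log.
mmu_type() {
    log=$(cat)
    case "$log" in
        *radix-mmu:*) echo radix ;;
        *hash-mmu:*)  echo hash ;;
        *)            echo unknown ;;
    esac
}

# Example usage on the crashed system (config path taken from the mail):
#   has_book3s_64 /home/linux/.config && dmesg | mmu_type
```

If 'mmu_type' prints anything other than "radix", the vmemmap_list failure
described in the patch is not expected to show up.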

> also fails in absence of 'vmemmap_list' in upstream linux

Yes, it will fail in the Hash MMU case, since we depend on 'vmemmap_list'
there: with Hash MMU, the virtual-to-physical address mapping is not
available in the page table.

Only in the Radix MMU case will it still work even if 'vmemmap_list' is
removed, since the mappings are in the kernel page table, which this patch uses.

Let me know if the issue still doesn't reproduce even after using a system with
Radix MMU.

Yes, it still does not reproduce on my side. But it looks like we have the
same system with Radix MMU, which is strange.

Thanks.

Lianbo

Thanks,
- Aditya Gupta