Hello lijiang,
On Wed, Sep 20, 2023 at 10:21:18AM +0800, lijiang wrote:
On Tue, Sep 19, 2023 at 2:23 PM Aditya Gupta
<adityag(a)linux.ibm.com> wrote:
> Hello lijiang,
>
> On Mon, Sep 18, 2023 at 07:34:04PM +0800, lijiang wrote:
> > Hi, Aditya
> > Thank you for the patch.
> >
> > ...
> >
> > Test kernel commit: ce9ecca0238b ("Linux 6.6-rc2")
> >
> > # ./crash /home/linux/vmlinux
>
> Thanks for testing it.
>
> This issue occurs only in case of Radix MMU.
>
> Overall, these are all the requirements:
> 1. Upstream linux (master branch) (your commit will also work, ce9ecca0238b)
> 2. 'CONFIG_PPC_BOOK3S_64' should be 'y' in kernel config (this should be
>    there in default configs)
>
# grep "CONFIG_PPC_BOOK3S_64" /home/linux/.config
CONFIG_PPC_BOOK3S_64=y
> 3. Check in dmesg of the crashed kernel, if it prints 'hash-mmu' or
>    'radix-mmu'. It should be 'radix-mmu'.
>
>
# dmesg|grep mmu
[ 0.000000] hash-mmu: Page sizes from device-tree:
[ 0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[ 0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
[ 0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
[ 0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[ 0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
[ 0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
[ 0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
[ 0.000000] hash-mmu: Initializing hash mmu with SLB
[ 0.000000] mmu_features = 0xfc006e01
[ 0.000000] hash-mmu: ppc64_pft_size = 0x1b
[ 0.000000] hash-mmu: htab_hash_mask = 0xfffff

This system seems to be using Hash MMU, hence the error doesn't come up.
Since 'vmemmap_list' is NOT empty in the Hash MMU case, crash works as expected.

Can you try it on a system with Radix MMU? (Rainier/Denali systems might have
that by default.)

The 'dmesg | grep mmu' you did is a good way to check whether the system is
using 'radix-mmu'.
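
If it helps, the same check can be wrapped in a small helper. This is only a
rough sketch: the function name 'mmu_type' is made up for illustration, and it
assumes the boot messages carry the 'hash-mmu' or 'radix-mmu' prefix as in the
log above.

```shell
# Hypothetical helper (not part of crash or the kernel): classify the MMU
# type from kernel log text read on stdin.
mmu_type() {
    log=$(cat)                          # read the whole log once
    case "$log" in
        *radix-mmu*) echo radix ;;      # Radix MMU messages present
        *hash-mmu*)  echo hash ;;       # Hash MMU messages present
        *)           echo unknown ;;    # boot messages missing or rotated out
    esac
}

# Usage on the system under test (reading dmesg may need root):
#   dmesg | mmu_type
```

An "unknown" result only means the early boot messages are no longer in the
log buffer, not that the MMU type could not be determined some other way.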
> I guess, the system that was crashed might be using 'hash-mmu'.
>
> > also fails in absence of 'vmemmap_list' in upstream linux
>
> Yes, it will fail in Hash MMU case, as we depend on 'vmemmap_list' in that
> case, as the virtual to physical address mapping is not available in page
> table, in case of Hash-MMU.
>
> Only in radix MMU case, it will still work, even if 'vmemmap_list' is
> removed, since we have the mappings in kernel page table, which is used by
> this patch.
>
> Let me know if the issue still doesn't reproduce even after using a system
> with Radix MMU.
>
>
Yes, it still does not reproduce on my side. But it looks like we have the same
system with Radix MMU, which is strange.

Actually, I meant that the current MMU should be Radix MMU. According to the
system logs above, the system is using Hash MMU.

On a system whose current MMU is Radix MMU, the error should occur, since the
following upstream kernel commit changed the way the vmemmap address mapping
is stored for Radix MMU:

368a0590d954 ("powerpc/book3s64/vmemmap: switch radix to use a different vmemmap handling function")

In case of Radix MMU, we now have the vmemmap address mapping only in the
kernel page tables. Hence 'vmemmap_list' is empty.

In case of Hash MMU, the vmemmap address mapping is still stored in
'vmemmap_list', which crash uses, hence the error will not occur.

For the same reason, if we crash a system which is using Hash MMU, the kernel
still populates 'vmemmap_list', so we need that symbol. With Radix MMU, crash
will still work after this patch whether the symbol is present or missing.
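
To summarize the cases above, here is a simplified sketch of where crash can
get the vmemmap mapping from once this patch is applied. The function name and
the "yes"/"no" argument are made up for illustration and are not crash code;
the radix case assumes a kernel that includes commit 368a0590d954.

```shell
# Hypothetical summary of the cases discussed above (not crash source code).
#   $1: MMU type, "radix" or "hash"
#   $2: "yes" if the 'vmemmap_list' symbol exists in the vmlinux, else "no"
vmemmap_lookup() {
    case "$1:$2" in
        radix:*)  echo "page-table" ;;     # mapping lives in kernel page tables
        hash:yes) echo "vmemmap_list" ;;   # kernel still populates the list
        hash:no)  echo "unavailable" ;;    # no list, and no page-table mapping
        *)        echo "unknown" ;;
    esac
}

# Example: vmemmap_lookup radix no
```

The only failing combination is Hash MMU without the 'vmemmap_list' symbol,
which is why the symbol is still needed for Hash MMU.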
Thanks
- Aditya Gupta