On Fri, 22 Jan 2021 18:23:28 +0100
Jiri Bohac <jbohac(a)suse.cz> wrote:
> On Fri, Jan 22, 2021 at 08:48:55AM +0100, Petr Tesarik wrote:
> > IIUC the only reason for having a physical mask at all is that all page
> > table bits beyond the current architectural limit are marked as
> > reserved by Intel, so they could theoretically be used for any purpose
> > in a future revision of the architecture. Intel has always recommended
> > to initialize reserved bits to zero, but I think that crash utility
> > developers are quite conservative and afraid of introducing regressions
> > for cases that are known to work, so whenever they introduce a new
> > feature (such as 5-level paging), they try to preserve the existing
> > working cases with absolutely no change.
>
> Well the bits above bit 51 need to be masked out. The AMD64
> Architecture Programmer’s Manual (Volume 2: System Programming)
> shows these as "Available", meaning "not interpreted by the
> processor" and the OS can use them for whatever purpose. And bit
> 63 is NX. So we do need to have a physical mask only covering
> bits 12-51 which is the physical address of the next level page
> table.
This part is clear. BTW bits 59-62 may already be used as a protection
key if CR4.PKE=1.
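
To make the bit layout concrete, here is a minimal sketch in C of how a
64-bit page table entry splits into the fields discussed above. The macro
and function names are made up for this illustration; they are not taken
from the crash utility or from the kernel.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define PAGE_SHIFT      12
#define PHYS_BITS_MAX   52      /* architectural maximum on x86-64 */

/* Bits 12-51: physical address of the next-level table or page. */
#define PTE_ADDR_MASK \
	(((UINT64_C(1) << PHYS_BITS_MAX) - 1) & \
	 ~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* Bits 59-62: protection key (only meaningful if CR4.PKE=1). */
#define PTE_PKEY_SHIFT  59
#define PTE_PKEY_MASK   (UINT64_C(0xf) << PTE_PKEY_SHIFT)

/* Bit 63: NX (no-execute). */
#define PTE_NX          (UINT64_C(1) << 63)

static uint64_t pte_phys_addr(uint64_t pte)
{
	return pte & PTE_ADDR_MASK;
}

static unsigned int pte_pkey(uint64_t pte)
{
	return (unsigned int)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}

static int pte_nx(uint64_t pte)
{
	return (pte & PTE_NX) != 0;
}
```

With this layout, an entry that has NX and a protection key set still
yields a clean physical address, because everything above bit 51 is
dropped by PTE_ADDR_MASK.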
Since the original implementation by AMD supported only 40-bit physical
addresses, I suspected that additional bits were underspecified in the
original manuals. However, I have just consulted my copy of the AMD64
Architecture Programmer's Manual from 2003 (oldest I could find), and
it clearly states:
    If a processor implementation supports fewer than the full 52-bit
    physical address, software must clear the unimplemented high-order
    translation-table base-address bits to 0.
In short, you're right, relying on these bits being zero has always
been totally safe.
> The question is why have masks shorter than 52 bits for systems
> with physical addresses shorter than 52 bits. These, AFAICT need
> to be 0 in the PTE and thus don't need to be masked out to obtain
> the address.
These shorter values were taken from the Linux kernel's
__PHYSICAL_MASK_SHIFT macro. IIRC the early virtual memory layout did
not reserve enough virtual addresses in the direct mapping to allow
full 52-bit addressing, so the OS limit made some sense for the Linux
kernel, but it surely doesn't make sense for the crash utility.
BTW it also took the kernel engineers quite some time to realize that
they could always use 52 for the physical address mask (cf. kernel commit
b83ce5ee91471d19c403ff91227204fb37c95fb2).
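
The argument can be checked with a small sketch in C. It assumes only
what the manual states, namely that the unimplemented high-order address
bits are zero in a valid entry. The helper names are hypothetical, and
the shift of 46 in the usage note below is just an example of a narrower
mask.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/*
 * Mask selecting bits PAGE_SHIFT..(shift - 1) of a page table entry.
 * Hypothetical helper; shift must be greater than PAGE_SHIFT and
 * less than 64.
 */
static uint64_t phys_mask(unsigned int shift)
{
	return ((UINT64_C(1) << shift) - 1) &
	       ~((UINT64_C(1) << PAGE_SHIFT) - 1);
}

/*
 * Returns nonzero if a narrower mask and the full 52-bit mask extract
 * the same physical address from the given entry.
 */
static int masks_agree(uint64_t pte, unsigned int narrow_shift)
{
	return (pte & phys_mask(narrow_shift)) == (pte & phys_mask(52));
}
```

For any entry whose address bits between the narrower shift and bit 51
are zero, masks_agree(pte, 46) is true, so the 52-bit mask loses
nothing. The two masks only disagree for an entry that actually has one
of those bits set, which is exactly the case where the narrower mask
would silently return a wrong address.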
Cheers,
Petr T