Hi Daisuke,
On Oct 28, 2020, at 4:37 AM, d.hatayama@fujitsu.com wrote:
/*
+ * Find virtual (VA) and physical (PA) addresses of kernel start
+ *
+ * va:
+ * Actual address of the kernel start (_stext), placed
+ * randomly by the kaslr feature. To be more accurate,
+ * VA = _stext(from vmlinux) + kaslr_offset
+ *
+ * pa:
+ * Physical address where the kernel is placed.
+ *
+ * In the nokaslr case, VA = _stext (from vmlinux).
+ * In the kaslr case, the virtual address of the kernel placement
+ * falls in this range: ffffffff80000000..ffffffff9fffffff, or
+ * __START_KERNEL_map..+512MB
+ *
+ *
+ * https://www.ker...
+ *
+ * Randomized VA will be the first valid page starting from
+ * ffffffff80000000 (__START_KERNEL_map). The page table entry of
+ * this page will contain the PA of the kernel start.
I hadn't come up with this natural idea; it is better in that
IDTR is unnecessary.
+ *
+ * NOTES:
+ * 1. This method does not support the PTI (Page Table Isolation)
+ * case where CR3 points to the isolated page table.
calc_kaslr_offset() already deals with PTI here:
	if (st->pti_init_vmlinux || st->kaiser_init_vmlinux)
		pgd = cr3 & ~(CR3_PCID_MASK|PTI_USER_PGTABLE_MASK);
	else
		pgd = cr3 & ~CR3_PCID_MASK;
Thus it's OK to assume that CR3 points at the kernel counterpart.
Good point, thanks!
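For the record, my understanding of why clearing that bit is sufficient (a sketch using the
usual definitions as I remember them, not quoted from the tree): with PTI the kernel and
user PGDs are allocated as an adjacent pair of 4K pages, the user copy sitting at
kernel copy + PAGE_SIZE, so the PAGE_SHIFT bit selects the half.

	/* sketch only, assumed definitions; not part of the patch */
	#define CR3_PCID_MASK          0xFFFull      /* low 12 bits of CR3 hold the PCID */
	#define PTI_USER_PGTABLE_MASK  (1ULL << 12)  /* PAGE_SHIFT bit: user vs kernel PGD */

	/* clearing both always lands on the kernel half of the PGD pair */
	uint64_t kernel_pgd = cr3 & ~(CR3_PCID_MASK | PTI_USER_PGTABLE_MASK);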
+ * 2. 4-level paging support only, as caller (calc_kaslr_offset)
+ * does not support 5-level paging.
According to mm.txt, the address range for kernel text appears to be
the same with 5-level paging. What is the reason not to cover 5-level
paging in this patch? Is there something that cannot be assumed with
5-level paging?
There are no technical challenges; I just do not have an Ice Lake machine to test it on.
Do you know how I can get a dump/vmlinux with 5-level paging enabled?
Once 5-level paging support is done, this method can be used as the default, as there are
no limitations.
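To illustrate, the extra step would be a single additional lookup between the pgd and pud
levels, roughly the fragment below inside the pgd loop. This is completely untested, and the
p4d_index()/VM_5LEVEL usage is an assumption on my side, just to show the shape of it:

	/* untested sketch: would sit inside the pgd loop, before the pud walk */
	if (machdep->flags & VM_5LEVEL) {
		int p4d_idx = p4d_index(__START_KERNEL_map);	/* assumed helper */
		uint64_t p4d_pte = 0;

		/* with 5 levels the present pgd entry points at a p4d page */
		readmem((pgd_pte & PHYSICAL_PAGE_MASK) + p4d_idx * sizeof(uint64_t),
			PHYSADDR, &p4d_pte, sizeof(p4d_pte),
			"p4d entry", FAULT_ON_ERROR);
		if (!(p4d_pte & _PAGE_PRESENT))
			continue;
		pgd_pte = p4d_pte;	/* then continue to pud/pmd/pte as before */
	}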
+ */
+static int
+find_kernel_start(ulong *va, ulong *pa)
+{
+	int i, pgd_idx, pud_idx, pmd_idx, pte_idx;
+	uint64_t pgd_pte, pud_pte, pmd_pte, pte;
+
+	pgd_idx = pgd_index(__START_KERNEL_map);
+	pud_idx = pud_index(__START_KERNEL_map);
+	pmd_idx = pmd_index(__START_KERNEL_map);
+	pte_idx = pte_index(__START_KERNEL_map);
+
+	for (; pgd_idx < PTRS_PER_PGD; pgd_idx++) {
+		pgd_pte = ULONG(machdep->pgd + pgd_idx * sizeof(uint64_t));
machdep->pgd is not guaranteed to be aligned to PAGE_SIZE.
This could refer to the pgd for userland that resides in the next page.
I guess it's necessary to get the 1st pgd entry in the page that
machdep->pgd belongs to.
Like this?

	pgd_pte = ULONG((machdep->pgd & PHYSICAL_PAGE_MASK) + pgd_idx * sizeof(uint64_t));
As I understand it, machdep->pgd is a buffer holding a cached copy of some pgd table from
the dump, so machdep->pgd does not have to be page-aligned in memory.
We just need to read at the offset "pgd_idx * sizeof(uint64_t)" to get our pgd_pte.
I think "& PHYSICAL_PAGE_MASK" is not needed here.
Let me know if I'm wrong.
But I'm going to introduce a pgd prefetch inside find_kernel_start() so that it does not
depend on a prefetch done by the caller. The caller must then provide the top pgd physical
address:
@@ -350,7 +350,7 @@ quit:
  * does not support 5-level paging.
  */
 static int
-find_kernel_start(ulong *va, ulong *pa)
+find_kernel_start(uint64_t pgd, ulong *va, ulong *pa)
 {
 	int i, pgd_idx, pud_idx, pmd_idx, pte_idx;
 	uint64_t pgd_pte, pud_pte, pmd_pte, pte;
@@ -361,6 +358,7 @@ find_kernel_start(ulong *va, ulong *pa)
 	pmd_idx = pmd_index(__START_KERNEL_map);
 	pte_idx = pte_index(__START_KERNEL_map);
 
+	FILL_PGD(pgd & PHYSICAL_PAGE_MASK, PHYSADDR, PAGESIZE());
 	for (; pgd_idx < PTRS_PER_PGD; pgd_idx++) {
 		pgd_pte = ULONG(machdep->pgd + pgd_idx * sizeof(uint64_t));
 		if (pgd_pte & _PAGE_PRESENT)
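For reference, this is roughly how I picture the caller side in calc_kaslr_offset() once the
pgd is passed in. Just a sketch from memory; it assumes find_kernel_start() returns TRUE on
success, and the surrounding variable names are partly assumed:

	uint64_t pgd;
	ulong kernel_text_va, kernel_text_pa;

	if (st->pti_init_vmlinux || st->kaiser_init_vmlinux)
		pgd = cr3 & ~(CR3_PCID_MASK | PTI_USER_PGTABLE_MASK);
	else
		pgd = cr3 & ~CR3_PCID_MASK;

	if (find_kernel_start(pgd, &kernel_text_va, &kernel_text_pa)) {
		/* kaslr offset: shift of the running _stext from its vmlinux address */
		*kaslr_offset = kernel_text_va - st->_stext_vmlinux;
		/* phys_base: physical load offset relative to __START_KERNEL_map */
		*phys_base = kernel_text_pa - (kernel_text_va - __START_KERNEL_map);
	}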
Thanks for the review. Again, I will wait for a 5-level paging dump/machine to become
available and then send the improved patch.
Do you want me to switch to this method as the default (i.e., to use it before the IDTR
method)?
Regards,
—Alexey