New subject: increase __PHYSICAL_MASK_SHIFT_XEN?

Friday, 22 January 2021

Hi, Jiri
On 2021-01-22 01:00, crash-utility-request(a)redhat.com wrote:
...
 Date: Thu, 21 Jan 2021 17:18:29 +0100
 From: Jiri Bohac <jbohac(a)suse.cz>
 To: crash-utility(a)redhat.com
 Cc: anderson(a)redhat.com
 Subject: [Crash-utility] increase __PHYSICAL_MASK_SHIFT_XEN?
 Message-ID: <20210121161829.vizm27sb6pfrgmrl(a)dwarf.suse.cz>
 Content-Type: text/plain; charset=us-ascii

 Hi,

 I've just come across a situation where crash failed to open a
 dump generated by a 4.12 XEN PV dom0 kernel, terminating with
 this message:

 crash: read error: physical address: ffffffffffffffff  type: "pud page"

 The problem is a failed machine-to-physical translation.
 xen_m2p() returns an error (-1UL) and x86_64_pud_offset() then
 uses that value as a physical address.

 I debugged the problem by running crash inside gdb. The backtrace was:
  #0  xen_m2p (machine=973135872) at kernel.c:9714
  #1  0x000000000053ed4a in x86_64_pud_offset (pgd_pte=3299508019303, vaddr=18446744072642223552, verbose=0, IS_XEN=1) at x86_64.c:1889
  #2  0x0000000000540c8a in x86_64_kvtop_xen_wpt (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0) at x86_64.c:2523
  #3  0x00000000005407d0 in x86_64_kvtop (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0) at x86_64.c:2413
  #4  0x0000000000491a97 in kvtop (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0) at memory.c:3062
  #5  0x000000000048f3f0 in readmem (addr=18446744072642223552, memtype=1, buffer=0xba63a0 <shared_bufs>, size=832, type=0x92e1f2 "module struct", error_handle=6) at memory.c:2314
  #6  0x00000000005071e2 in module_init () at kernel.c:3699
  ....

 I tracked the problem to a wrong value of
 __PHYSICAL_MASK_SHIFT_XEN. The current value of 40 does not
 correspond to the current kernel value of 52 since kernel commit
 6f0e8bf16730a36ff6773802d8c8df56d10e60cd (xen: support 52 bit
 physical addresses in pv guests).

 The result is visible in the above backtrace:
 x86_64_pud_offset() is called with pgd_pte=0x3003a00e067 and that
 value is wrongly masked by "pud_paddr = pgd_pte &
 PHYSICAL_PAGE_MASK" and passed to xen_m2p() as 0x3a00e000 instead
 of 0x3003a00e000, causing the m2p translation to fail.

Thank you for sharing the details.
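
To make the arithmetic in the report concrete, here is a minimal sketch of the masking step. It is not the crash source: physical_page_mask() and entry_to_maddr() are hypothetical helpers standing in for what PHYSICAL_PAGE_MASK does, and 4 KiB pages are assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12  /* assumption: 4 KiB pages */

/*
 * Hypothetical stand-in for PHYSICAL_PAGE_MASK: keep the address bits
 * below `shift` and clear the low 12 flag/offset bits.
 */
static uint64_t physical_page_mask(unsigned shift)
{
    uint64_t phys_mask = (UINT64_C(1) << shift) - 1;
    uint64_t page_mask = ~((UINT64_C(1) << PAGE_SHIFT_4K) - 1);

    return phys_mask & page_mask;
}

/* Extract the machine address of the next-level table from an entry. */
static uint64_t entry_to_maddr(uint64_t entry, unsigned shift)
{
    return entry & physical_page_mask(shift);
}
```

With the pgd_pte from the backtrace, entry_to_maddr(0x3003a00e067, 40) gives 0x3a00e000 (the machine=973135872 seen in frame #0), while a shift of 52 gives 0x3003a00e000. For an entry whose address already fits in 40 bits, both shifts give the same result.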

...
 Setting __PHYSICAL_MASK_SHIFT_XEN to 52 fixes the problem with
 this dump.

 But I am not confident it's a safe change. My understanding is
 that it should be safe, as all the unused bits of the physical
 address inside the PTEs should be 0 and thus having the mask wider
 than necessary should not hurt. But I am not sure my
 reasoning is correct. Why does crash go to such trouble
 differentiating between different kernels and set
 machdep->machspec->physical_mask_shift dynamically to one of
 __PHYSICAL_MASK_SHIFT_XEN (40), __PHYSICAL_MASK_SHIFT_2_6 (46), 
The reason is that the supported kernel virtual address size can be
configured on other architectures, and it still depends on the
hardware capability.

...
 and __PHYSICAL_MASK_SHIFT_5LEVEL (52)? Would something break if
 it were always set to 52? The commit adding the logic is
 307e7f35.
It's time to update the physical mask for Xen, but could you help
test the address translation with commands such as vtop and ptov?
Alternatively, could you follow the 5-level paging changes and check
whether the capability is present (cr4 & CR4_LA57)?
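
As a rough sketch of the capability check mentioned above: la57_enabled() is a hypothetical helper, and how the CR4 value would be obtained from a dump is an assumption left out of scope. The three shift values are the ones quoted in this thread.

```c
#include <stdint.h>

/* CR4 bit 12 (LA57) is set when 5-level paging is enabled on x86_64. */
#define CR4_LA57 (UINT64_C(1) << 12)

/* Shift values as quoted in this thread. */
enum {
    PHYSICAL_MASK_SHIFT_XEN    = 40,
    PHYSICAL_MASK_SHIFT_2_6    = 46,
    PHYSICAL_MASK_SHIFT_5LEVEL = 52
};

/* Hypothetical helper: returns 1 if the given CR4 value has LA57 set. */
static int la57_enabled(uint64_t cr4)
{
    return (cr4 & CR4_LA57) != 0;
}
```

This only shows the bit test; which shift to choose for a Xen PV dump once the bit is known is the open question here.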

Thanks.
Lianbo

...
 Thanks,

 -- Jiri Bohac <jbohac(a)suse.cz> SUSE Labs, Prague, Czechia
