----- Original Message -----
Hi Dave,
On 2/20/2018 11:32 AM, Dave Anderson wrote:
...
>>>>> Another suggestion/question -- if is_page_ptr() is called with a NULL phys
>>>>> argument (as is done most of the time), could it skip the "if IS_SPARSEMEM()"
>>>>> section at the top, and still utilize the part at the bottom, where it walks
>>>>> through the vt->node_table[x] array?  I'm not sure about the "ppend" calculation
>>>>> though -- even if there are holes in the node's address space, is it still
>>>>> a contiguous chunk of page structure addresses per-node?
>>>>
>>>> I'm still investigating and not sure yet, but I think that SPARSEMEM using
>>>> mem_section instead of node_mem_map means page structures could be
>>>> non-contiguous per-node, depending on the architecture or configuration.
>>>>
>>>> typedef struct pglist_data {
>>>> ...
>>>> #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
>>>> struct page *node_mem_map;
>>>>
>>>> I'll continue to check it.
>>>
>>> You are right, but in the case where pglist_data.node_mem_map does *not* exist,
>>> the crash utility initializes each vt->node_table[node].mem_map with the node's
>>> starting mem_map address by using the return value from phys_to_page() of the
>>> node's starting physical address -- which uses the sparsemem functions.
>>>
>>> The question is whether the current "ppend" calculation is correct for the
>>> last physical page in a node.  If it is not correct, then perhaps a
>>> "mem_map_end" value can be added to the node_table structure, initialized by
>>> using phys_to_page() to get the page address of the last physical address in
>>> the node.  And then in that case, the question is whether the mem_map range
>>> of virtual addresses is contiguous -- even if there are holes in the mem_map
>>> virtual address range.
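(For reference, the calculation being discussed is in the per-node walk at the
bottom of is_page_ptr().  A paraphrased sketch, not the verbatim crash source:

    for (n = 0; n < vt->numnodes; n++) {
            struct node_table *nt = &vt->node_table[n];
            ulong node_size = nt->size;   /* from pglist_data.node_spanned_pages */
            ulong ppstart = nt->mem_map;  /* page struct of the node's first pfn */
            ulong ppend = ppstart + (node_size * SIZE(page));  /* assumes contiguity */

            if ((addr < ppstart) || (addr >= ppend))
                    continue;

            /* addr lies in [ppstart, ppend): accepted as a page struct of
             * this node, and optionally converted to a physical address. */
            return TRUE;
    }
    return FALSE;

so "ppend" is meaningful only if the node's page structures really are one
contiguous array of node_spanned_pages entries.)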
>>
>> "node_size" is set to pglist_data.node_spanned_pages, which includes
>> holes.
>> So I think that if VMEMMAP, which a page address is linear against its
>> pfn,
>> the current "ppend" calculation is correct for the last page in a
node.
>> But if not VMEMMAP, since there is no guarantee of the linearity, the
>> calculation could be incorrect.
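(With VMEMMAP the linearity comes from the kernel's memory model, roughly:

    /* Illustrative only: under CONFIG_SPARSEMEM_VMEMMAP the kernel maps page
     * structs at a fixed virtual base, linear in the pfn:
     *
     *   #define vmemmap            ((struct page *)VMEMMAP_START)
     *   #define __pfn_to_page(pfn) (vmemmap + (pfn))
     *
     * so nt->mem_map + node_spanned_pages * SIZE(page) really is the end of
     * the node's page structs, even when the node contains memory holes. */

Without VMEMMAP, each mem_section carries its own mem_map pointer, so no such
linearity is guaranteed.)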
>>
>> I found an example with RHEL5:
>>
>> crash> help -o
>> ...
>> size_table:
>> page: 56
>> ...
>> crash> kmem -n
>> NODE SIZE PGLIST_DATA BOOTMEM_DATA NODE_ZONES
>> 0 524279 ffff810000014000 ffffffff804e1900 ffff810000014000
>> ffff810000014b00
>> ffff810000015600
>> ffff810000016100
>> MEM_MAP START_PADDR START_MAPNR
>> ffff8100007da000 0 0
>>
>> ZONE NAME SIZE MEM_MAP START_PADDR START_MAPNR
>> 0 DMA 4096 ffff8100007da000 0 0
>> 1 DMA32 520183 ffff810000812000 1000000 4096
>> 2 Normal 0 0 0 0
>> 3 HighMem 0 0 0 0
>>
>> -------------------------------------------------------------------
>>
>> NR SECTION CODED_MEM_MAP MEM_MAP PFN
>> 0 ffff810009000000 ffff8100007da000 ffff8100007da000 0
>> 1 ffff810009000008 ffff8100007da000 ffff81000099a000 32768
>> 2 ffff810009000010 ffff8100007da000 ffff810000b5a000 65536
>> 3 ffff810009000018 ffff8100007da000 ffff810000d1a000 98304   <= there is a
>> 4 ffff810009000020 ffff810008901000 ffff810009001000 131072  <= mem_map gap.
>> 5 ffff810009000028 ffff810008901000 ffff8100091c1000 163840
>> :
>> 14 ffff810009000070 ffff810008901000 ffff81000a181000 458752
>> 15 ffff810009000078 ffff810008901000 ffff81000a341000 491520
>> crash>
>>
>> In this case, the "ppend" will be
>>
>> 0xffff8100007da000 + (524279 * 56)
>> = 0xffff8100023d9e08
>>
>> but it looks like the actual value is around 0xffff81000a501000
>> (= 0xffff81000a341000, the NR=15 MEM_MAP, + 32768 pages * 56 bytes,
>> i.e. the end of the last section).
>
> Right, I understand that the current "ppend" calculation wouldn't work.
>
>> And also, we can see the gap between NR=3 and 4.  This means that even if the
>> correct "mem_map_end" is added to the node_table structure, it would not be
>> enough to check whether an address is a page structure.
>
> Why?  Wouldn't it still give us an ascending range of page structure addresses
> on a per-node basis?  (even if there was a physical and/or virtual memory hole?)
> AFAICT, for each section NR, the MEM_MAP and PFN values always increment.
Sorry if I misunderstood something.

First, I assume that we are talking about the case of kernels with SPARSEMEM,
using the vt->numnodes loop after skipping the IS_SPARSEMEM() section.

The "mem_map_end" I mean here is the page address of the last physical address
in the node, and the example system has only one node.  So I think that the
"kmem -n" output above suggests that the loop could return TRUE for an incoming
"addr" that falls between the end of NR=3 and the start of NR=4, even though
such an address is not a page address.
 NR      MEM_MAP
  0  +---------+ ffff8100007da000 = nt->mem_map
  :  | pages.. | :
  2  +---------+ ffff810000b5a000
  3  +---------+ ffff810000d1a000
     +---------+ ffff810000eda000 = ffff810000d1a000 + (32768 * 56)
     |   ???   | <-- for an "addr" here, it could return TRUE.
  4  +---------+ ffff810009001000
  5  +---------+ ffff8100091c1000
  :  | pages.. | :
 15  +---------+ ffff81000a341000
     +---------+ ffff81000a501000 = nt->mem_map_end
Because of such mem_map holes within a node, I don't think that the vt->numnodes
loop can be utilized as-is for kernels with SPARSEMEM.

Is this "mem_map_end" different from the one you assumed?
No.
I understand that a page address in the "???" section above would return
true (unless a "phys" argument was passed in). Checking whether an incoming
address was between nt->mem_map and nt->mem_map_end would be slightly more
refined as compared to adding a new simple function that would check whether
the incoming address was between VMEMMAP_VADDR and VMEMMAP_END, which we
discussed earlier.
So I'm suggesting that a vmemmap page address could be checked for validity by:

(1) verifying that the incoming address is located in the vmemmap address range, and
(2) verifying that it is accessible() -- roughly along the lines of the sketch below.
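A rough sketch of that idea (illustrative only, not an actual patch; the function
name, the alignment test and the phys conversion are just placeholders, and it
assumes the existing IS_SPARSEMEM(), SIZE(page), accessible() and PTOB() helpers
plus the VMEMMAP_VADDR/VMEMMAP_END range discussed above):

    static int
    is_vmemmap_page_ptr(ulong addr, physaddr_t *phys)
    {
            if (!IS_SPARSEMEM())
                    return FALSE;

            /* (1) the address must fall inside the vmemmap virtual range
             *     and sit on a page-struct boundary. */
            if ((addr < VMEMMAP_VADDR) || (addr >= VMEMMAP_END))
                    return FALSE;
            if ((addr - VMEMMAP_VADDR) % SIZE(page))
                    return FALSE;

            /* (2) the address must be backed by readable memory, which
             *     filters out holes in the vmemmap range. */
            if (!accessible(addr))
                    return FALSE;

            if (phys)  /* the pfn is the linear offset into the vmemmap array */
                    *phys = PTOB((addr - VMEMMAP_VADDR) / SIZE(page));

            return TRUE;
    }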
Dave
Thanks,
Kazuhito Hagio
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility