Re: [Crash-utility] [PATCH] Speed up "kmem -[sS]" by optimizing is_page_ptr()

Tuesday, 20 February 2018

Hi Dave,

On 2/20/2018 11:32 AM, Dave Anderson wrote:
...
...
>>>> Another suggestion/question -- if is_page_ptr() is called with a NULL
>>>> phys argument (as is done most of the time), could it skip the
>>>> "if IS_SPARSEMEM()" section at the top, and still utilize the part at
>>>> the bottom, where it walks through the vt->node_table[x] array?  I'm
>>>> not sure about the "ppend" calculation though -- even if there are
>>>> holes in the node's address space, is it still a contiguous chunk of
>>>> page structure addresses per-node?
>>>
>>> I'm still investigating and not sure yet, but I think that SPARSEMEM
>>> using mem_section instead of node_mem_map means that page structures
>>> could be non-contiguous per-node, depending on the architecture or
>>> conditions.
>>>
>>>   typedef struct pglist_data {
>>>   ...
>>>   #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
>>>           struct page *node_mem_map;
>>>
>>> I'll continue to check it.
>>
>> You are right, but in the case where pglist_data.node_mem_map does *not*
>> exist, the crash utility initializes each vt->node_table[node].mem_map
>> with the node's starting mem_map address by using the return value from
>> phys_to_page() of the node's starting physical address -- which uses the
>> sparsemem functions.
>>
>> The question is whether the current "ppend" calculation is correct for
>> the last physical page in a node.  If it is not correct, then perhaps a
>> "mem_map_end" value can be added to the node_table structure, initialized
>> by using phys_to_page() to get the page address of the last physical
>> address in the node.  And then in that case, the question is whether the
>> mem_map range of virtual addresses is contiguous -- even if there are
>> holes in the mem_map virtual address range.
>
> "node_size" is set to pglist_data.node_spanned_pages, which includes holes.
> So I think that with VMEMMAP, where a page address is linear with respect
> to its pfn, the current "ppend" calculation is correct for the last page
> in a node.  But without VMEMMAP, since there is no guarantee of linearity,
> the calculation could be incorrect.
>
> I found an example with RHEL5:
>
> crash> help -o
> ...
>                     size_table:
>                           page: 56
> ...
> crash> kmem -n
> NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
>   0    524279   ffff810000014000  ffffffff804e1900  ffff810000014000
>                                                     ffff810000014b00
>                                                     ffff810000015600
>                                                     ffff810000016100
>     MEM_MAP       START_PADDR  START_MAPNR
> ffff8100007da000       0            0
>
> ZONE  NAME         SIZE       MEM_MAP      START_PADDR  START_MAPNR
>   0   DMA          4096  ffff8100007da000            0            0
>   1   DMA32      520183  ffff810000812000      1000000         4096
>   2   Normal          0                 0            0            0
>   3   HighMem         0                 0            0            0
>
> -------------------------------------------------------------------
>
> NR      SECTION        CODED_MEM_MAP        MEM_MAP       PFN
>  0  ffff810009000000  ffff8100007da000  ffff8100007da000  0
>  1  ffff810009000008  ffff8100007da000  ffff81000099a000  32768
>  2  ffff810009000010  ffff8100007da000  ffff810000b5a000  65536
>  3  ffff810009000018  ffff8100007da000  ffff810000d1a000  98304   <= there is a
>  4  ffff810009000020  ffff810008901000  ffff810009001000  131072  <= mem_map gap.
>  5  ffff810009000028  ffff810008901000  ffff8100091c1000  163840
>  :
> 14  ffff810009000070  ffff810008901000  ffff81000a181000  458752
> 15  ffff810009000078  ffff810008901000  ffff81000a341000  491520
> crash>
>
> In this case, the "ppend" will be
>
>   0xffff8100007da000 + (524279 * 56)
>   = 0xffff8100023d9e08
>
> but it looks like the actual value is around 0xffff81000a501000.
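
The arithmetic above can be reproduced with a small standalone C sketch.
It is only an illustration, not code from the crash utility: the constants
are copied from the "help -o" and "kmem -n" output above, the 32768 pages
per section is inferred from the PFN step between sections, and the
function names are hypothetical.  The last section may be only partially
populated, so last_section_end() is an upper bound, matching the "around"
in the estimate above.

```c
#include <stdint.h>
#include <stdio.h>

/* Values copied from the "help -o" and "kmem -n" output above. */
#define PAGE_STRUCT_SIZE   56ULL                  /* size_table: page */
#define NODE_MEM_MAP       0xffff8100007da000ULL  /* node 0 MEM_MAP */
#define NODE_SIZE          524279ULL              /* node 0 SIZE, in pages */
#define LAST_SECTION_MAP   0xffff81000a341000ULL  /* MEM_MAP of NR=15 */
#define PAGES_PER_SECTION  32768ULL               /* assumed: PFN step per section */

/* End address if the node's page structures were one linear array. */
static uint64_t linear_ppend(void)
{
    return NODE_MEM_MAP + NODE_SIZE * PAGE_STRUCT_SIZE;
}

/* End address of the last section's mem_map chunk, assuming a full section. */
static uint64_t last_section_end(void)
{
    return LAST_SECTION_MAP + PAGES_PER_SECTION * PAGE_STRUCT_SIZE;
}

/* Hypothetical helper: print both values side by side. */
static void print_ppend_comparison(void)
{
    printf("linear ppend:     %#llx\n", (unsigned long long)linear_ppend());
    printf("last section end: %#llx\n", (unsigned long long)last_section_end());
}
```

Calling print_ppend_comparison() shows that the linear "ppend" falls well
short of the end of the last section's mem_map.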

 Right, I understand that the current "ppend" calculation wouldn't work.

> And also, we can see the gap between NR=3 and 4.  This means that even if
> the correct "mem_map_end" is added to the node_table structure, it would
> not be enough to check whether an address is a page structure.

 Why?  Wouldn't it still give us an ascending range of page structure addresses
 on a per-node basis?  (even if there was a physical and/or virtual memory hole?) 
 AFAICT, for each section NR, the MEM_MAP and PFN values always increment.

Sorry if I misunderstood something.
First, I assume that we are talking about the case of kernels with SPARSEMEM
and using the vt->numnodes loop after skipping the IS_SPARSEMEM() section.

The "mem_map_end" I mean here is the page address of the last physical
address in the node, and the example system has only one node.  So I think
that the "kmem -n" output above suggests that it could return TRUE for an
incoming "addr" between the end of NR=3 and the start of NR=4, but it's
not a page address.

 NR                 MEM_MAP
  0 +---------+ ffff8100007da000 = nt->mem_map
  : | pages.. |        :
  2 +---------+ ffff810000b5a000
  3 +---------+ ffff810000d1a000
    +---------+ ffff810000eda000 = ffff810000d1a000 + (32768 * 56)
    |   ???   |            <-- for an "addr" here, it could return TRUE.
  4 +---------+ ffff810009001000
  5 +---------+ ffff8100091c1000
  : | pages.. |        :
 15 +---------+ ffff81000a341000
    +---------+ ffff81000a501000 = nt->mem_map_end

Because of such mem_map holes in a node, I don't think that the vt->numnodes
loop could be utilized as it is for kernels with SPARSEMEM.
Is this "mem_map_end" different from the one you assumed?
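
To make the concern concrete, here is a minimal C sketch that contrasts a
single start-to-"mem_map_end" range check with a per-section check.  It is
not the is_page_ptr() implementation: the function names and the table are
hypothetical, only the sections actually listed in the "kmem -n" output are
included (the elided ones are left out), the 32768 pages per section is
inferred from the PFN column, and alignment checks are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_STRUCT_SIZE   56ULL     /* size_table: page */
#define PAGES_PER_SECTION  32768ULL  /* assumed: PFN step per section */
#define SECTION_MAP_BYTES  (PAGES_PER_SECTION * PAGE_STRUCT_SIZE)

/* MEM_MAP values listed in the "kmem -n" output; elided sections omitted. */
static const uint64_t section_mem_map[] = {
    0xffff8100007da000ULL, /* NR 0  */
    0xffff81000099a000ULL, /* NR 1  */
    0xffff810000b5a000ULL, /* NR 2  */
    0xffff810000d1a000ULL, /* NR 3  */
    0xffff810009001000ULL, /* NR 4  */
    0xffff8100091c1000ULL, /* NR 5  */
    0xffff81000a181000ULL, /* NR 14 */
    0xffff81000a341000ULL, /* NR 15 */
};
#define NUM_SECTIONS (sizeof(section_mem_map) / sizeof(section_mem_map[0]))

/* Single-range check: from nt->mem_map up to a correct "mem_map_end". */
static int in_node_range(uint64_t addr)
{
    uint64_t start = section_mem_map[0];
    uint64_t end = section_mem_map[NUM_SECTIONS - 1] + SECTION_MAP_BYTES;

    return addr >= start && addr < end;
}

/* Per-section check: addr must fall inside one section's mem_map chunk. */
static int in_any_section(uint64_t addr)
{
    for (size_t i = 0; i < NUM_SECTIONS; i++) {
        uint64_t start = section_mem_map[i];

        if (addr >= start && addr < start + SECTION_MAP_BYTES)
            return 1;
    }
    return 0;
}
```

For an address in the hole between NR=3 and NR=4, such as
0xffff810001000000, in_node_range() accepts it while in_any_section()
rejects it, which is the false positive described above.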

Thanks,
Kazuhito Hagio
