Hi Dave,
On 2/15/2018 12:38 PM, Dave Anderson wrote:
...
>>> Because of your questions about ppc64, possible backwards-compatibility
>>> issues, or potential future changes to page.flags usage, this permanent
>>> change to the is_page_ptr() function solely for the purposes of SLUB's
>>> count_partial() function makes me nervous.
>>
>> Sorry, my explanation was not sufficient.  Let me supplement a little.
>> As far as I know, count_partial() is the function that benefits most from
>> this patch.  But is_page_ptr() can be called many times through other
>> functions as well, so this will improve them, too.  Moreover, the loop in
>> is_page_ptr() can iterate up to NR_MEM_SECTIONS() times, which is 33554432
>> on x86_64 (5-level) if I understand correctly, in the worst case.
>>
>> So I thought that ideally we should improve is_page_ptr() itself if
>> possible, rather than find the callers which could call it many times and
>> change them.  Also, I am willing to drop the definition of VMEMMAP_VADDR
>> for ppc64 this time.
>
> OK, I understand your point.  But what concerns me is that the function's
> purpose is to absolutely identify whether the incoming page structure address
> is a correct page structure address.  But if an invalid address gets passed
> into is_page_ptr(), your patch would take the invalid address, calculate an
> invalid "nr", and continue from there, right? 
Yes.  If the invalid "nr" refers to a section that does not exist,
valid_section_nr() will return 0.  Even if it happens to refer to an existing
section, the invalid "addr" will not fall between mem_map and end_mem_map, or
will not be page-aligned, because an address that passes both of those checks
is a page structure address.
Also, note that without this patch, when an invalid address comes in, the loop
can try many invalid "nr"s, up to NR_MEM_SECTIONS() of them.
I hope this answers your concern.
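
To illustrate, here is a simplified sketch of the checks a given address goes
through with the patch applied (is_page_ptr_sketch() is an illustrative name;
the real code inlines these steps in is_page_ptr()):

  /*
   * Simplified sketch of the validation flow with the patch applied;
   * not the actual patch code.
   */
  static int
  is_page_ptr_sketch(ulong addr)
  {
          ulong nr, sec_addr, coded_mem_map, mem_map, end_mem_map;

          /* 1. Derive a candidate section number from the address. */
          nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));

          /* 2. An invalid address usually yields a nonexistent section. */
          if (!(sec_addr = valid_section_nr(nr)))
                  return FALSE;

          /* 3. Even if the section exists by accident, the address must
           *    lie within its mem_map and be aligned to a page struct. */
          coded_mem_map = section_mem_map_addr(sec_addr);
          mem_map = sparse_decode_mem_map(coded_mem_map, nr);
          end_mem_map = mem_map + (PAGES_PER_SECTION() * SIZE(page));

          if ((addr < mem_map) || (addr >= end_mem_map))
                  return FALSE;
          if ((addr - mem_map) % SIZE(page))
                  return FALSE;

          return TRUE;
  }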
 
> Another suggestion/question -- if is_page_ptr() is called with a NULL phys
> argument (as is done most of the time), could it skip the "if IS_SPARSEMEM()"
> section at the top, and still utilize the part at the bottom, where it walks
> through the vt->node_table[x] array?  I'm not sure about the "ppend"
> calculation though -- even if there are holes in the node's address space, is
> it still a contiguous chunk of page structure addresses per-node?
I'm still investigating and am not sure yet, but I think that SPARSEMEM's use
of mem_section instead of node_mem_map means that page structures could be
non-contiguous per-node, depending on the architecture or configuration:
  typedef struct pglist_data {
  ...
  #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
          struct page *node_mem_map;
  #endif
  ...
I'll continue to check it.
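
For reference, my rough reading of that bottom part of is_page_ptr() is
something like the following simplified sketch; it assumes each node's page
structures form one contiguous chunk from nt->mem_map to ppend, which is
exactly the assumption in question:

  /*
   * Simplified sketch of the per-node walk at the bottom of
   * is_page_ptr() (error handling and special cases omitted).
   */
  int n;
  struct node_table *nt;
  ulong ppstart, ppend;

  for (n = 0; n < vt->numnodes; n++) {
          nt = &vt->node_table[n];
          ppstart = nt->mem_map;
          ppend = ppstart + (nt->size * SIZE(page));

          /* Accept addr only if it falls inside node n's mem_map and
           * is aligned to a page structure boundary. */
          if ((addr >= ppstart) && (addr < ppend) &&
              !((addr - ppstart) % SIZE(page)))
                  return TRUE;
  }
  return FALSE;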
Thanks,
Kazuhito Hagio
 
>
> Dave
>
>>
>>>
>>> There is really no compelling reason that count_partial() absolutely
>>> *must* use
>>> is_page_ptr(), and so I'm thinking that perhaps you could come up with a
>>> less
>>> heavy-handed method for simply testing whether a page.lru entry points to
>>> another
>>> vmemmap'd page.  Something along the lines of adding this for
>>> vmemmap-enabled kernels:
>>>
>>>   #define IN_VMEMMAP_RANGE(page) ((page >= VMEMMAP_VADDR) && (page <= VMEMMAP_END))
>>>
>>> and then have count_partial() replace the is_page_ptr() call with another
>>> slub function that does something like this for vmemmap-enabled kernels:
>>>
>>>    (IN_VMEMMAP_RANGE(next) && accessible(next))
>>>
>>> Or instead of accessible(), it could read "next" as a list_head with
>>> RETURN_ON_ERROR, and verify that next->prev points back to the current
>>> list_head.
>>>
>>> Non-vmemmap-enabled kernels could still use is_page_ptr().
>>>
>>> What do you think of doing something like that?
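
For what it's worth, a minimal sketch of that second variant, assuming
VMEMMAP_END is available for the architecture and using slub_page_sane() as a
placeholder name, could look like:

  /*
   * Minimal sketch; slub_page_sane() is a placeholder name.  "lh" is
   * the address of the current list_head and "next" its next pointer,
   * as read from page.lru.
   */
  static int
  slub_page_sane(ulong lh, ulong next)
  {
          struct kernel_list_head list;

          if ((next < VMEMMAP_VADDR) || (next > VMEMMAP_END))
                  return FALSE;

          /* Read "next" as a list_head, failing quietly on error. */
          if (!readmem(next, KVADDR, &list, sizeof(list), "list_head",
              RETURN_ON_ERROR|QUIET))
                  return FALSE;

          /* A sane list entry's next->prev points back at itself. */
          return ((ulong)list.prev == lh);
  }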
>>
>> Given the possible compatibility issues you mentioned, I think that the way
>> you suggested might well be enough for now.  I'll try a method like the
>> above.
>>
>> Thanks,
>> Kazuhito Hagio
>>
>>>
>>> Dave
>>>
>>>
>>>       
>>>
>>> ----- Original Message -----
>>>> Hi,
>>>>
>>>> The "kmem -[sS]" commands can take several minutes to complete
with
>>>> the following conditions:
>>>>   * The system has a lot of memory sections with CONFIG_SPARSEMEM.
>>>>   * The kernel uses SLUB and it has a very long partial slab list.
>>>>
>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
>>>>   10:18:34 CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
>>>>   10:19:41 ffff88017fc78a00 dentry                   192    9038949  10045728 239184     8k
>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
>>>>   334
>>>>
>>>> One of the causes is that is_page_ptr() in count_partial() checks if
>>>> a given slub page address is a page struct by searching all memory
>>>> sections linearly for the one which includes it.
>>>>
>>>>         nr_mem_sections = NR_MEM_SECTIONS();
>>>>         for (nr = 0; nr < nr_mem_sections ; nr++) {
>>>>                 if ((sec_addr = valid_section_nr(nr))) {
>>>>                         ...
>>>>
>>>> With CONFIG_SPARSEMEM{_VMEMMAP}, we can calculate the memory section
>>>> which includes a page struct from its page.flags, or from its address and
>>>> VMEMMAP_VADDR.  With this patch doing so, the amount of computation can be
>>>> reduced significantly in that case.
>>>>
>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
>>>>   10:34:55 CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
>>>>   10:34:55 ffff88017fc78a00 dentry                   192    9038949  10045728 239184     8k
>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
>>>>   2
>>>>
>>>> This patch uses VMEMMAP_VADDR.  It is not defined on PPC64, but it looks
>>>> like PPC64 supports the VMEMMAP flag and machdep->machspec->vmemmap_base
>>>> holds that address, so this patch also defines it for PPC64.  This might
>>>> need some help from PPC folks.
>>>>
>>>> Signed-off-by: Kazuhito Hagio <k-hagio(a)ab.jp.nec.com>
>>>> ---
>>>>  defs.h   |  2 ++
>>>>  memory.c | 15 +++++++++++++++
>>>>  2 files changed, 17 insertions(+)
>>>>
>>>> diff --git a/defs.h b/defs.h
>>>> index aa17792..84e68ca 100644
>>>> --- a/defs.h
>>>> +++ b/defs.h
>>>> @@ -3861,6 +3861,8 @@ struct efi_memory_desc_t {
>>>>  #define IS_VMALLOC_ADDR(X) machdep->machspec->is_vmaddr(X)
>>>>  #define KERNELBASE      machdep->pageoffset
>>>>  
>>>> +#define VMEMMAP_VADDR   (machdep->machspec->vmemmap_base)
>>>> +
>>>>  #define PGDIR_SHIFT     (machdep->pageshift + (machdep->pageshift -3) + (machdep->pageshift - 2))
>>>>  #define PMD_SHIFT       (machdep->pageshift + (machdep->pageshift - 3))
>>>>  
>>>> diff --git a/memory.c b/memory.c
>>>> index 0df8ecc..0696763 100644
>>>> --- a/memory.c
>>>> +++ b/memory.c
>>>> @@ -13348,10 +13348,25 @@ is_page_ptr(ulong addr, physaddr_t *phys)
>>>>  	ulong nr_mem_sections;
>>>>  	ulong coded_mem_map, mem_map, end_mem_map;
>>>>  	physaddr_t section_paddr;
>>>> +#ifdef VMEMMAP_VADDR
>>>> +	ulong flags;
>>>> +#endif
>>>>  
>>>>  	if (IS_SPARSEMEM()) {
>>>>  		nr_mem_sections = NR_MEM_SECTIONS();
>>>> +#ifdef VMEMMAP_VADDR
>>>> +		nr = nr_mem_sections;
>>>> +		if (machdep->flags & VMEMMAP)
>>>> +			nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
>>>> +		else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
>>>> +			sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET))
>>>> +			nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT())
>>>> +				& ((1UL << SECTIONS_SHIFT()) - 1));
>>>> +
>>>> +		if (nr < nr_mem_sections) {
>>>> +#else
>>>>  	        for (nr = 0; nr < nr_mem_sections ; nr++) {
>>>> +#endif
>>>>  	                if ((sec_addr = valid_section_nr(nr))) {
>>>>  	                        coded_mem_map = section_mem_map_addr(sec_addr);
>>>>  	                        mem_map = sparse_decode_mem_map(coded_mem_map, nr);
>>>> --
>>>> 1.8.3.1
>>>>
>>>
>>
>
 
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility