----- Original Message -----
Hi Dave,
On 2/15/2018 12:38 PM, Dave Anderson wrote:
...
>>>> Because of your questions about ppc64, possible backwards-compatibility
>>>> issues, or potential future changes to page.flags usage, this permanent
>>>> change to the is_page_ptr() function solely for the purposes of SLUB's
>>>> count_partial() function makes me nervous.
>>>
>>> Sorry, my explanation was not enough; let me supplement it a little.
>>> As far as I know, count_partial() is the function that benefits most
>>> from this patch. But is_page_ptr() can be called many times through
>>> other functions as well, so this will improve them, too. Moreover, the
>>> loop in is_page_ptr() can iterate up to NR_MEM_SECTIONS() times, which
>>> is 33554432 on x86_64 (5-level) if I understand correctly, in the
>>> worst case.
>>>
>>> So I thought that ideally we should improve is_page_ptr() itself if
>>> possible, rather than find the callers that may call it many times and
>>> change them. Also, I am willing to drop the definition of
>>> VMEMMAP_VADDR for ppc64 this time.
>>
>> OK, I understand your point. But what concerns me is that the function's
>> purpose is to absolutely identify whether the incoming page structure
>> address is a correct page structure address. But if an invalid address
>> gets passed into is_page_ptr(), your patch would take the invalid
>> address, calculate an invalid "nr", and continue from there, right?
Yes, if an invalid "nr" is a number for which no section exists,
valid_section_nr() would return 0. Even if it happens to be a number for
which a section does exist, the invalid "addr" will not fall between
mem_map and end_mem_map, or will not be page-aligned, because if it
satisfied both, it would be a valid page structure address.
Also, without this patch, when an invalid address comes in, the loop can
try many invalid "nr"s, up to NR_MEM_SECTIONS() of them.
I hope this answers your concern.
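
For illustration, the rejection points above look roughly like this (a
simplified sketch following the patched is_page_ptr() path in the patch
below, not the exact code):

    nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
    if (nr >= nr_mem_sections)              /* no such section number */
            return FALSE;
    if (!(sec_addr = valid_section_nr(nr))) /* section does not exist */
            return FALSE;

    mem_map = sparse_decode_mem_map(section_mem_map_addr(sec_addr), nr);
    end_mem_map = mem_map + (PAGES_PER_SECTION() * SIZE(page));

    if ((addr < mem_map) || (addr >= end_mem_map))
            return FALSE;           /* outside the section's mem_map */
    if ((addr - mem_map) % SIZE(page))
            return FALSE;           /* not page-aligned */

    return TRUE;    /* only a real page structure address gets here */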
>
> Another suggestion/question -- if is_page_ptr() is called with a NULL
> phys argument (as is done most of the time), could it skip the "if
> IS_SPARSEMEM()" section at the top, and still utilize the part at the
> bottom, where it walks through the vt->node_table[x] array? I'm not
> sure about the "ppend" calculation though -- even if there are holes
> in the node's address space, is it still a contiguous chunk of page
> structure addresses per-node?
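>
> For reference, the node_table walk in question looks roughly like this
> (a simplified sketch, not the exact source; the "ppend" line is the
> part in question):
>
>     for (n = 0; n < vt->numnodes; n++) {
>             nt = &vt->node_table[n];
>             ppstart = nt->mem_map;
>             /* assumes the node's page structs form one contiguous run */
>             ppend = ppstart + (nt->size * SIZE(page));
>
>             if ((addr < ppstart) || (addr >= ppend))
>                     continue;
>
>             /* addr lands inside this node's mem_map chunk */
>             return TRUE;
>     }
>     return FALSE;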
I'm still investigating and not sure yet, but I think the fact that
SPARSEMEM uses mem_section instead of node_mem_map means that page
structures could be non-contiguous per node, depending on the
architecture or configuration.

    typedef struct pglist_data {
    ...
    #ifdef CONFIG_FLAT_NODE_MEM_MAP	/* means !SPARSEMEM */
    	struct page *node_mem_map;

I'll continue to check it.
You are right, but in the case where pglist_data.node_mem_map does *not*
exist, the crash utility initializes each vt->node_table[node].mem_map
with the node's starting mem_map address by using the return value from
phys_to_page() on the node's starting physical address -- which uses the
sparsemem functions.

The question is whether the current "ppend" calculation is correct for
the last physical page in a node. If it is not correct, then perhaps a
"mem_map_end" value can be added to the node_table structure,
initialized by using phys_to_page() to get the page address of the last
physical address in the node. And then in that case, the question is
whether the mem_map range of virtual addresses is contiguous -- even if
there are holes in the mem_map virtual address range.
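
Roughly, the idea might look like this (a hypothetical sketch;
"mem_map_end" is the proposed new field, the other fields follow the
existing node_table layout):

    struct node_table {
            int node_id;
            ulong pgdat;
            ulong mem_map;
            ulong mem_map_end;  /* proposed: end of this node's mem_map */
            physaddr_t start_paddr;
            ulong size;
            ulong present;
    };

    /* at initialization, using the existing phys_to_page() helper */
    physaddr_t last_paddr;
    ulong pp;

    last_paddr = nt->start_paddr + ((nt->size - 1) * PAGESIZE());
    if (phys_to_page(last_paddr, &pp))
            nt->mem_map_end = pp + SIZE(page);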
Thanks,
Dave
Thanks,
Kazuhito Hagio
>
>>
>> Dave
>>
>>>
>>>>
>>>> There is really no compelling reason that count_partial() absolutely
>>>> *must* use is_page_ptr(), and so I'm thinking that perhaps you could
>>>> come up with a less heavy-handed method for simply testing whether a
>>>> page.lru entry points to another vmemmap'd page. Something along the
>>>> lines of adding this for vmemmap-enabled kernels:
>>>>
>>>>   #define IN_VMEMMAP_RANGE(page) ((page >= VMEMMAP_VADDR) && (page <= VMEMMAP_END))
>>>>
>>>> and then have count_partial() replace the is_page_ptr() call with
>>>> another slub function that does something like this for
>>>> vmemmap-enabled kernels:
>>>>
>>>>   (IN_VMEMMAP_RANGE(next) && accessible(next))
>>>>
>>>> Or instead of accessible(), it could read "next" as a list_head with
>>>> RETURN_ON_ERROR, and verify that next->prev points back to the
>>>> current list_head.
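>>>>
>>>> A rough sketch of that list_head check (hypothetical helper; assumes
>>>> a VMEMMAP_END counterpart to VMEMMAP_VADDR exists):
>>>>
>>>>   static int
>>>>   page_lru_entry_valid(ulong next, ulong curr)
>>>>   {
>>>>           struct kernel_list_head lh;
>>>>
>>>>           if (!IN_VMEMMAP_RANGE(next))
>>>>                   return FALSE;
>>>>
>>>>           /* read "next" as a list_head; fail quietly if unreadable */
>>>>           if (!readmem(next, KVADDR, &lh, sizeof(lh), "page.lru",
>>>>               RETURN_ON_ERROR|QUIET))
>>>>                   return FALSE;
>>>>
>>>>           /* the back-pointer must return to the current list_head */
>>>>           return ((ulong)lh.prev == curr);
>>>>   }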
>>>>
>>>> Non-vmemmap-enabled kernels could still use is_page_ptr().
>>>>
>>>> What do you think of doing something like that?
>>>
>>> Given the possible compatibility issues you mentioned, I think that
>>> the way you suggested might well be enough for now. I'll try a method
>>> like the above.
>>>
>>> Thanks,
>>> Kazuhito Hagio
>>>
>>>>
>>>> Dave
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> Hi,
>>>>>
>>>>> The "kmem -[sS]" commands can take several minutes to
complete with
>>>>> the following conditions:
>>>>> * The system has a lot of memory sections with CONFIG_SPARSEMEM.
>>>>> * The kernel uses SLUB and it has a very long partial slab list.
>>>>>
>>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
>>>>>   10:18:34 CACHE             NAME    OBJSIZE  ALLOCATED     TOTAL   SLABS  SSIZE
>>>>>   10:19:41 ffff88017fc78a00  dentry      192    9038949  10045728  239184     8k
>>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
>>>>>   334
>>>>>
>>>>> One of the causes is that is_page_ptr() in count_partial() checks
>>>>> whether a given slub page address is a page struct by searching all
>>>>> memory sections linearly for the one which includes it.
>>>>>
>>>>>     nr_mem_sections = NR_MEM_SECTIONS();
>>>>>     for (nr = 0; nr < nr_mem_sections ; nr++) {
>>>>>             if ((sec_addr = valid_section_nr(nr))) {
>>>>>             ...
>>>>>
>>>>> With CONFIG_SPARSEMEM{_VMEMMAP}, we can calculate the memory section
>>>>> which includes a page struct from its page.flags, or from its address
>>>>> and VMEMMAP_VADDR. With this patch doing so, the amount of computation
>>>>> can be significantly reduced in that case.
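>>>>>
>>>>> To illustrate the two derivations (a sketch of the same arithmetic
>>>>> the patch below performs):
>>>>>
>>>>>   /* vmemmap case: page structs form a linear array starting at
>>>>>      VMEMMAP_VADDR, so the pfn is simply the array index */
>>>>>   nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
>>>>>
>>>>>   /* otherwise: the section number is kept in the topmost
>>>>>      SECTIONS_SHIFT() bits of page.flags */
>>>>>   nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT()))
>>>>>        & ((1UL << SECTIONS_SHIFT()) - 1);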
>>>>>
>>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
>>>>>   10:34:55 CACHE             NAME    OBJSIZE  ALLOCATED     TOTAL   SLABS  SSIZE
>>>>>   10:34:55 ffff88017fc78a00  dentry      192    9038949  10045728  239184     8k
>>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
>>>>>   2
>>>>>
>>>>> This patch uses VMEMMAP_VADDR. It is not defined on PPC64, but it
>>>>> looks like PPC64 supports the VMEMMAP flag and
>>>>> machdep->machspec->vmemmap_base is it, so this patch also defines it
>>>>> for PPC64. This might need some help from PPC folks.
>>>>>
>>>>> Signed-off-by: Kazuhito Hagio <k-hagio(a)ab.jp.nec.com>
>>>>> ---
>>>>>  defs.h   |  2 ++
>>>>>  memory.c | 15 +++++++++++++++
>>>>>  2 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/defs.h b/defs.h
>>>>> index aa17792..84e68ca 100644
>>>>> --- a/defs.h
>>>>> +++ b/defs.h
>>>>> @@ -3861,6 +3861,8 @@ struct efi_memory_desc_t {
>>>>>  #define IS_VMALLOC_ADDR(X) machdep->machspec->is_vmaddr(X)
>>>>>  #define KERNELBASE machdep->pageoffset
>>>>>  
>>>>> +#define VMEMMAP_VADDR (machdep->machspec->vmemmap_base)
>>>>> +
>>>>>  #define PGDIR_SHIFT (machdep->pageshift + (machdep->pageshift -3) + (machdep->pageshift - 2))
>>>>>  #define PMD_SHIFT (machdep->pageshift + (machdep->pageshift - 3))
>>>>>
>>>>> diff --git a/memory.c b/memory.c
>>>>> index 0df8ecc..0696763 100644
>>>>> --- a/memory.c
>>>>> +++ b/memory.c
>>>>> @@ -13348,10 +13348,25 @@ is_page_ptr(ulong addr, physaddr_t *phys)
>>>>>  	ulong nr_mem_sections;
>>>>>  	ulong coded_mem_map, mem_map, end_mem_map;
>>>>>  	physaddr_t section_paddr;
>>>>> +#ifdef VMEMMAP_VADDR
>>>>> +	ulong flags;
>>>>> +#endif
>>>>>  
>>>>>  	if (IS_SPARSEMEM()) {
>>>>>  		nr_mem_sections = NR_MEM_SECTIONS();
>>>>> +#ifdef VMEMMAP_VADDR
>>>>> +		nr = nr_mem_sections;
>>>>> +		if (machdep->flags & VMEMMAP)
>>>>> +			nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
>>>>> +		else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
>>>>> +		    sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET))
>>>>> +			nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT())
>>>>> +				& ((1UL << SECTIONS_SHIFT()) - 1));
>>>>> +
>>>>> +		if (nr < nr_mem_sections) {
>>>>> +#else
>>>>>  		for (nr = 0; nr < nr_mem_sections ; nr++) {
>>>>> +#endif
>>>>>  			if ((sec_addr = valid_section_nr(nr))) {
>>>>>  				coded_mem_map = section_mem_map_addr(sec_addr);
>>>>>  				mem_map = sparse_decode_mem_map(coded_mem_map, nr);
>>>>> --
>>>>> 1.8.3.1
>>>>>
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility