在 2023/12/1 22:34, Shijie Huang 写道:
在 2023/12/1 16:08, HAGIO KAZUHITO(萩尾 一仁) 写道:
> On 2023/11/30 17:26, Shijie Huang wrote:
>> Hi Kazu,
>>
>> 在 2023/11/30 15:29, HAGIO KAZUHITO(萩尾 一仁) 写道:
>>> On 2023/11/30 16:21, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>>> thanks for the patch.
>>>>
>>>> On 2023/11/28 12:41, Huang Shijie wrote:
>>>>> If the kernel exports the vmmemap then we can use that symbol in
>>>>> crash to optimize access. vmmemap is just an array of page structs
>>>>> after all.
>>>>>
>>>>> This patch tries to get the "vmemmap" from the vmcore
file.
>>>>> If we can use the "vmemmap", use the vmemmap_is_page_ptr to
replace
>>>>> the generic_is_page_ptr().
>>>> I recalled that I also did a similar thing for x86_64 [1] in the
>>>> past..
>>>> so it's not so slow on x86_64 and it has valid section check too.
>> Maybe your vmlinux is not so big as mine(441M)...
> It's not so small :)
>
> crash> files | grep -m1 vmlinux
> 5 ffff97c21fa2cf00 ffff97c2400e5200 ffff97c0f5cf5ec8 REG
> /usr/lib/debug/usr/lib/modules/4.18.0-305.el8.x86_64/vmlinux
> crash> !ls -lh
> /usr/lib/debug/usr/lib/modules/4.18.0-305.el8.x86_64/vmlinux
> -rwxr-xr-x. 3 root root 826M Apr 29 2021
> /usr/lib/debug/usr/lib/modules/4.18.0-305.el8.x86_64/vmlinux
> crash> files -n ffff97c0f5cf5ec8 | ts.py
> 16:42:30.902 START
> 16:42:30.902 INODE NRPAGES
> 16:42:30.902 ffff97c0f5cf5ec8 211283
> 16:42:30.902
> 16:42:31.106 NODE PAGES
> 16:42:31.106 0 211283
> 16:42:31.106 END (ELAPS 0:00:00.204)
>
> (This crash has your "files -n" v2 patch and is_page_ptr() check at the
> beginning of summary_inode_page)
>
>>>> [1]
>>>>
https://github.com/crash-utility/crash/commit/4141373d9de3fea29f5d2b58f60...
>>>>
>>>>
>>>> I don't object to add an entry to vmcoreinfo as a way of getting the
>>>> vmemmap value. Seems like arm64 has another value from
>>>> VMEMMAP_START.
>>>>
>>>> but there are some concerns in crash:
>>>>
>>>> - machdep->flags already has VMEMMAP flag. It's used like
"if
>>>> (IS_SPARSEMEM() && (machdep->flags & VMEMMAP)" e.g.
[1]. Adding
>>>> the new
>>>> flag is very confusing and inconsistent.
>>>>
>>>> - Do all architectures have VMEMMAP_VADDR/END values in crash with
>>>> CONFIG_SPARSEMEM_VMEMMAP configured?
>> yes. your concern are right. I found some arch does not support the
>> VMEMMAP_VADDR/END.
>>
>>
>> So it is better to add this into ARM64 first. (I use the arm64 machine
>> all the time..)
>>
>>
>>>> we cannot test all archs, so this kind of arch-independent change
>>>> makes
>>>> me nervous. I would like to suggest an arch-specific use first.
>>>> e.g.
>>>> how about making x86_64_is_page_ptr more generic and use it?
>>>>
>> I will add arm64 version code. I am not familar with x86, and I do not
>> have x86 machine for the testing.
>>
>> so it's better to not change it.
> OK, I see.
>
>>
>>>> What happens if you use it on arm64? maybe the valid section can
>>>> check
>>>> it with VMEMMAP_VADDR without the vmemmap value. Just an idea
>>>> though.
>>> hmm sorry, probably this does not work, I missed this.
>>>
>>> + pfn = (addr - vt->vmemmap) / size;
>> The code is converted by the linux code:
>>
>>
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/i...
>>
>>
>> #define __page_to_pfn(page) (unsigned long)((page) - vmemmap)
>>
>>
>> So it's correct.
>>
>>
>>> but can VMEMMAP_VADDR and phys_offset can be used to get vmemmap value
>>> on arm64?
>> On arm64, I can get vmemmap by parsing "memstart_addr".
>>
>> But I cannot know if CONFIG_SPARSEMEM_VMEMMAP is enabled.
> on x86_64, vmemmap_populate is used:
>
> if (symbol_exists("vmemmap_populate"))
> machdep->flags |= VMEMMAP;
>
> on arm64, it's set unconditionally, is it ok?
>
> arm64_init(int when)
> {
> ...
> case PRE_GDB:
> ...
> machdep->stacksize = ARM64_STACK_SIZE;
> machdep->flags |= VMEMMAP;
>
>>
>> So if the kernel can export "vmemmap", everything become easy.
>>
>> All the archs supporting the VMEMMAP can benifit from it, such as
>> arm64,
>> ppc,loongarch,riscv..
> Yes but we're not sure whether the users of the other archs have the
> same issue (e.g. x86_64 is fast enough) and it's hard to test all archs.
I suddenly realised that the reason why x86_64 is fast is because the
x86 has already implement
the vmemmap (or part of the vmemmap).
I read the x86 code, and find this:
https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/pgtabl...
#define vmemmap ((struct page *)VMEMMAP_START)
In crash code, the x86_64_is_page_ptr():
--
1812:x86_64_is_page_ptr(ulong addr, physaddr_t *phys)
1813-{
1814- ulong pfn, nr;
1815-
1816- if (IS_SPARSEMEM() && (machdep->flags & VMEMMAP) &&
1817- (addr >= VMEMMAP_VADDR && addr <= VMEMMAP_END) &&
1818- !((addr - VMEMMAP_VADDR) % SIZE(page))) {
1819-
1820- pfn = (addr - VMEMMAP_VADDR) / SIZE(page);
1821- nr = pfn_to_section_nr(pfn);
1822- if (valid_section_nr(nr)) {
1823- if (phys)
1824- *phys = PTOB(pfn);
1825- return TRUE;
1826- }
1827- }
1828- return FALSE;
1829-}
In x86, the VMEMMAP_VADDR is equal to "vmemmap". Since the current
kernel does not export "vmemmap",
x86 cannot use the vmemmap too, so it uses the VMEMMAP_VADDR in the code.
In logic, the x86_64_is_page_ptr() is wrong, it gets the correct result
by accident.
If the "vmemmap" is exported by kernel, the x86_64_is_page_ptr() and
arm64_vmemmap_is_page_ptr() can merge into
one function, and rename it to "vmemmap_is_page_ptr()", and move it to a
common place, such as memory.c.
Thanks
Huang Shijie