Hello Tao Liu,
Thanks for your comments, I'll double check my patch and resend the patch later today
(fixing also make warn).
With respect to the performance issue, it is more a scaling issue.
For 32GB, dump_mem_map() walks through 8,388,608 struct page, taking 3 seconds
For 64GB, dump_mem_map() walks through 16,777,216 struct page, taking 6 seconds
For 128GB, dump_mem_map() walks through 33,554,432 struct page, taking 12 seconds
For 256GB, dump_mem_map() walks through 67,108,864 struct page, taking 24 seconds
...
For 32TB , dump_mem_map() walks through 8,589,934,592 struct page, taking 50 minutes
Taking a few seconds is acceptable, and it was not too noticeable for small systems,
but for 32TB it does not scale at all, and our highest-end system goes up to 64TB ...
The purpose of "kmem -i" is to provide general memory information (mainly
physical ram, free memory, and slabs)
as to get an initial assessment on memory usage. Support engineers doing initial crash
assessments are typically
spending a few minutes to get the big picture (crash backtrace, memory usage, and so on),
and waiting 50 minutes
to get "kmem -i" is not acceptable.
To be honest, SHARED does not tell anything useful, it tells just that we have some
physical pages with a reference
count greater than 1. Such shared physical pages could be used by shared virtual objects
(eg. shared map files, sysV shm),
but not all shared virtual objects do have physical pages with a refcount greater than 1
(eg. single mmap, or single
shm attach), and not all shared virtual objects are fully populated with physical pages,
and shared virtual objects having
multiple attach/mmap users may not be used through the same physical ranges, and so on,
hence it looks like
we can't really make any correlation between shared virtual objects and SHARED
physical pages. So, I would be in favor
of just dropping SHARED, at least, we should try to skip paying the SHARED mem map walk
high price in the "kmem -i" path.
Thanks,
Georges
|-----Original Message-----
|From: Tao Liu <ltao(a)redhat.com>
|Sent: Thursday, December 12, 2024 3:42 AM
|To: Aureau, Georges (Kernel Tools ERT) <georges.aureau(a)hpe.com>
|Cc: devel(a)lists.crash-utility.osci.io
|Subject: Re: [Crash-utility] [PATCH] "kmem -i" extremely slow on dumps from
|large memory systems
|
|Hi Aureau,
|
|Thanks for your improvements.
|
|Please check your patch, I guess there are formatting issues, so I cannot apply
|it directly by command, but only copy & paste code hunk manually...
|
|On Mon, Dec 9, 2024 at 7:50 PM Aureau, Georges (Kernel Tools ERT)
|<georges.aureau(a)hpe.com> wrote:
|>
|> The “kmem -i” command is extremely slow (appears to hang) on dumps
|from large memory systems.
|>
|>
|>
|> For example, on 120GB crash dump from a high-end server with 32TB of
|> RAM (ie. 8Giga Pages),
|>
|> the “kmem -i” command is taking over 50 minutes to execute on a DL380
|> Gen10. To report basic
|>
|> general memory usage figures, we should only be reading global
|> counters, without having to walk
|>
|> the full flat/sparse mem map page table. Hence, dump_kmeminfo() should
|> first be reading globals,
|>
|> and then only call dump_mem_map() if important information (ie. slabs or
|total ram) is missing.
|>
|
|After applying your patch, I see the "SHARED 3942872 15 GB
|23% of TOTAL MEM" line is missing of my "kmem -i" outputs. Though the
test
|case I use doesn't have the performance issue as you mentioned.
|So my question is, is it necessary for the output change for the non-
|performance influenced vmcores?
|
|In addition, I see your approach for improving the performance is to bypass
|the dump_mem_map() calling. Can we get dump_mem_map() optimized
|instead, so we can still call dump_mem_map() and leave the output
|unchanged? E.g. if the performance bottleneck is due to too much memory
|pages reading, can we skip some of unnecessary page reading, or increase
|memory cache to decrease memory page decompressing or else?
|Frankly you didn't point where exactly the performance bottleneck is in your
|patch.
|
|Thanks,
|Tao Liu
|
|>
|>
|> Signed-off-by: Georges Aureau <georges.aureau(a)hpe.com>
|>
|> --
|>
|> diff --git a/memory.c b/memory.c
|>
|> index 8c01ed0..8e2b2e9 100644
|>
|> --- a/memory.c
|>
|> +++ b/memory.c
|>
|> @@ -8546,18 +8546,19 @@ dump_kmeminfo(void)
|>
|> ulong get_buffers;
|>
|> ulong get_slabs;
|>
|> char buf[BUFSIZE];
|>
|> + ulong flags;
|>
|> -
|>
|> - BZERO(&meminfo, sizeof(struct meminfo));
|>
|> - meminfo.flags = GET_ALL;
|>
|> - dump_mem_map(&meminfo);
|>
|> - get_totalram = meminfo.get_totalram;
|>
|> - shared_pages = meminfo.get_shared;
|>
|> - get_buffers = meminfo.get_buffers;
|>
|> - get_slabs = meminfo.get_slabs;
|>
|> + /*
|>
|> + * By default, we will no longer call dump_mem_map() as
|> + this is too
|>
|> + * slow for large memory systems. If we have to call it
|> + (eg. missing
|>
|> + * important information such as slabs or total ram), we
|> + will also
|>
|> + * collect shared pages. Otherwise, we won't print shared pages.
|>
|> + */
|>
|> + flags = GET_SHARED_PAGES;
|>
|> + shared_pages = 0;
|>
|> /*
|>
|> - * If vm_stat array exists, override page search info.
|>
|> + * If vm_stat array does not exists, then set mem map flag.
|>
|> */
|>
|> if (vm_stat_init()) {
|>
|> if (dump_vm_stat("NR_SLAB", &nr_slab,
0))
|>
|> @@ -8571,10 +8572,11 @@ dump_kmeminfo(void)
|>
|> get_slabs = nr_slab;
|>
|> if
|> (dump_vm_stat("NR_SLAB_UNRECLAIMABLE_B", &nr_slab, 0))
|>
|> get_slabs +=
|> nr_slab;
|>
|> + } else {
|>
|> + flags |= GET_SLAB_PAGES;
|>
|> }
|>
|> }
|>
|> - fprintf(fp, "%s", kmeminfo_hdr);
|>
|> /*
|>
|> * Get total RAM based upon how the various versions of
|> si_meminfo()
|>
|> * have done it, latest to earliest:
|>
|> @@ -8586,9 +8588,32 @@ dump_kmeminfo(void)
|>
|> symbol_exists("_totalram_pages")) {
|>
|> totalram_pages = vt->totalram_pages ?
|>
|> vt->totalram_pages :
|> get_totalram;
|>
|> - } else
|>
|> - totalram_pages = get_totalram;
|>
|> + } else {
|>
|> + flags |= GET_TOTALRAM_PAGES;
|>
|> + }
|>
|> + /*
|>
|> + * If we want more than just shared pages (missing slabs
|> + or total ram)
|>
|> + * then call dump_mem_map() and also collect buffers.
|>
|> + */
|>
|> + if (flags != GET_SHARED_PAGES) {
|>
|> + BZERO(&meminfo, sizeof(struct meminfo));
|>
|> + flags |= GET_BUFFERS_PAGES;
|>
|> + meminfo.flags = flags;
|>
|> + dump_mem_map(&meminfo);
|>
|> + /* Update the missing information */
|>
|> + if (flags & GET_SLAB_PAGES) {
|>
|> + get_slabs =
|> + meminfo.get_slabs;
|>
|> + }
|>
|> + if (flags & GET_TOTALRAM_PAGES) {
|>
|> + get_totalram =
|> + meminfo.get_totalram;
|>
|> + totalram_pages =
|> + get_totalram;
|>
|> + }
|>
|> + shared_pages = meminfo.get_shared;
|>
|> + get_buffers = meminfo.get_buffers;
|>
|> + }
|>
|> +
|>
|> + fprintf(fp, "%s", kmeminfo_hdr);
|>
|> fprintf(fp, "%13s %7ld %11s ----\n", "TOTAL
MEM",
|>
|> totalram_pages,
|> pages_to_size(totalram_pages, buf));
|>
|> @@ -8613,9 +8638,11 @@ dump_kmeminfo(void)
|>
|> * differently than the kernel -- it just tallies the
|> non-reserved
|>
|> * pages that have a count of greater than 1.
|>
|> */
|>
|> - pct = (shared_pages * 100)/totalram_pages;
|>
|> - fprintf(fp, "%13s %7ld %11s %3ld%% of TOTAL MEM\n",
|>
|> - "SHARED", shared_pages,
pages_to_size(shared_pages, buf),
|pct);
|>
|> + if (shared_pages) {
|>
|> + pct = (shared_pages * 100)/totalram_pages;
|>
|> + fprintf(fp, "%13s %7ld %11s %3ld%% of TOTAL MEM\n",
|>
|> + "SHARED", shared_pages,
|> + pages_to_size(shared_pages, buf), pct);
|>
|> + }
|>
|> subtract_buffer_pages = 0;
|>
|> if (symbol_exists("buffermem_pages")) {
|>
|> @@ -8634,7 +8661,7 @@ dump_kmeminfo(void)
|>
|> fprintf(fp, "%13s %7ld %11s %3ld%% of TOTAL MEM\n",
|>
|> "BUFFERS", buffer_pages,
|> pages_to_size(buffer_pages, buf), pct);
|>
|> - if (CRASHDEBUG(1))
|>
|> + if (CRASHDEBUG(1)&&(flags&GET_BUFFERS_PAGES))
|>
|> error(NOTE, "pages with buffers: %ld\n", get_buffers);
|>
|> /*
|>
|>
|>
|> --
|> Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io To
|> unsubscribe send an email to devel-leave(a)lists.crash-utility.osci.io
|> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
|> Contribution Guidelines:
https://github.com/crash-utility/crash/wiki