August 2018 - Crash-utility - Crash Utility List Archives

Re: [Crash-utility] Kernel Crash Analysis on Android

by Shankar, AmarX

Hi Dave,

Thanks for the information about the kexec tool. I am unable to download kexec from the link below:

http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/kexec-too...

It returns "HTTP 404 Page Not Found". Could you please guide me on this?

Thanks & Regards,
Amar Shankar

> On Wed, Mar 21, 2012 at 06:00:00PM +0000, Shankar, AmarX wrote:
>
> > I want to do kernel crash Analysis on Android Merrifield Target.
> >
> > Could someone please help me how to do it?
>
> Merrifield is pretty much similar to Medfield, e.g. it has an x86 core. So I
> guess you can follow the instructions on how to set up kdump on x86 (see
> Documentation/kdump/kdump.txt) unless you already have that configured.
>
> crash should support this directly, presuming you have vmlinux/vmcore files to
> feed it. You can configure crash to support x86 on an x86_64 host by running:
>
> % make target=X86
> % make
>
> (or something along those lines).

Right -- just the first make command will suffice, i.e., when running
on an x86_64 host:

  $ wget http://people.redhat.com/anderson/crash-6.0.4.tar.gz
  $ tar xzf crash-6.0.4.tar.gz
  ...
  $ cd crash-6.0.4
  $ make target=X86
  ...
  $ ./crash <path-to>/vmlinux <path-to>/vmcore

Dave

From: Shankar, AmarX
Sent: Wednesday, March 21, 2012 11:30 PM
To: 'crash-utility(a)redhat.com'
Subject: Kernel Crash Analysis on Android

Hi,

I want to do kernel crash Analysis on Android Merrifield Target.

Could someone please help me how to do it?

Thanks & Regards,
Amar Shankar

2 years, 4 months


[PATCH] kmem, snap: iomem/ioport display and vmcore snapshot support

by HATAYAMA Daisuke

Some days ago I had to convert a vmcore in kvmdump format into ELF, because an extension module we maintain locally works only with a relatively old crash utility (around version 4), and such an old crash utility cannot handle the kvmdump format.

To make the conversion convenient, I used the snap command with some modifications so that it uses the iomem information in the vmcore instead of the host's /proc/iomem. This patch is the cleaned-up version.

During this work I naturally ended up writing an interface for accessing resource objects, so together with the snap command's patch I also extended the kmem command with iomem/ioport support. Specifically:

kmem -r displays /proc/iomem

    crash> kmem -r
    00000000-0000ffff : reserved
    00010000-0009dbff : System RAM
    0009dc00-0009ffff : reserved
    000c0000-000c7fff : Video ROM
    ...

and kmem -R displays /proc/ioports

    crash> kmem -R
    0000-001f : dma1
    0020-0021 : pic1
    0040-0043 : timer0
    0050-0053 : timer1
    ...

Looking back through old versions of the kernel source, the resource structure has been unchanged since linux-2.4.0. I borrowed the way of walking the resource tree in this patch from the latest v3.3-rc series, but I guess the logic is also applicable to old kernels. I am counting on Dave's regression test suite here.

Also, there may be another command more suitable for iomem/ioport. If necessary, I'll repost the patch.

---

HATAYAMA Daisuke (4):
      Add vmcore snapshot support
      Add kmem -r and -R options
      Add dump iomem/ioport functions; a helper for resource objects
      Add a helper function for iterating resource objects

 defs.h            |    9 ++++
 extensions/snap.c |   54 ++++++++++++++++++++++-
 help.c            |    2 +
 memory.c          |  122 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 180 insertions(+), 7 deletions(-)

--
Thanks.
HATAYAMA Daisuke
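For readers unfamiliar with how a resource tree is walked, here is a minimal sketch. It is not the code from this patch: struct res is a hypothetical, simplified stand-in for the kernel's struct resource, and res_next(), res_depth() and dump_resources() are illustrative names. It assumes the usual parent/sibling/child linkage and visits a child first, then a sibling, then climbs back toward the root.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct res {
    unsigned long start, end;
    const char *name;
    struct res *parent, *sibling, *child;
};

/*
 * Pre-order successor of p within the tree rooted at root:
 * first child if any, else next sibling, else the next sibling
 * of the nearest ancestor below root. Returns NULL when done.
 */
static struct res *res_next(struct res *p, const struct res *root)
{
    if (p->child)
        return p->child;
    while (!p->sibling) {
        p = p->parent;
        if (!p || p == root)
            return NULL;
    }
    return p->sibling;
}

/* Number of ancestors above p (the root has depth 0). */
static int res_depth(const struct res *p)
{
    int d = 0;
    while (p->parent) {
        p = p->parent;
        d++;
    }
    return d;
}

/* Print every descendant of root in a /proc/iomem-like layout; returns the count. */
static int dump_resources(struct res *root, FILE *out)
{
    int n = 0;
    for (struct res *p = root->child; p; p = res_next(p, root)) {
        /* Indent two spaces per nesting level below the root's children. */
        fprintf(out, "%*s%08lx-%08lx : %s\n",
                (res_depth(p) - 1) * 2, "", p->start, p->end, p->name);
        n++;
    }
    return n;
}
```

In crash itself the pointers would be kernel virtual addresses read with readmem() rather than host pointers, so a real implementation looks different; the traversal order is the part this sketch is meant to show.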

2 years, 4 months


Re: [Crash-utility] question about phys_base

by Dave Anderson

----- Original Message -----
> >
> > OK, so then I don't understand what you mean by "may be the same"?
> >
> > You didn't answer my original question, but if I understand you correctly,
> > it would be impossible for the qemu host to create a PT_LOAD segment that
> > describes an x86_64 guest's __START_KERNEL_map region, because the host
> > doesn't know what kind of kernel the guest is running.
>
> Yes. Even if the guest is linux, it is still impossible to do it, because
> the guest may be in the second kernel.
>
> qemu-dump walks all of the guest's page tables and collects the virtual address
> to physical address mappings. If a page is not used by the guest, the virtual
> address is set to 0. I create PT_LOAD segments according to this mapping. So if
> the guest is linux, there may be a PT_LOAD segment that describes the
> __START_KERNEL_map region. But the information stored in PT_LOAD may be for the
> second kernel. If crash uses it, crash will see the second kernel, not the
> first kernel.

Just to be clear -- what do you mean by the "second" kernel? Do you
mean that a guest kernel crashed, did a kdump operation, and that
second kdump kernel failed somehow, and now you're trying to do a
"virsh dump" on the kdump kernel?

Dave

2 years, 4 months


[PATCH makedumpfile 0/2] LZO Compression Support

by HATAYAMA Daisuke

The following series adds LZO compression support to makedumpfile. LZO produces output of about the same size as ZLIB but is far faster, which reduces downtime during crash dump generation and refiltering.

The RFC discussion:

  http://lists.infradead.org/pipermail/kexec/2011-November/005783.html
  http://lists.infradead.org/pipermail/kexec/2011-December/005868.html

How to build:

  1. Get the lzo libraries (lzo, lzo-devel and lzo-minilzo) from either of the following:

     1) The original author's website: http://www.oberhumer.com/opensource/lzo/
     2) The yum framework on Fedora. Older releases don't have the packages.

  2. Apply the patch set to makedumpfile v1.4.2.

  3. Run make.

How to use:

  A new -l option is introduced. If the user specifies it, makedumpfile generates a dumpfile compressed page by page with lzo compression.

  Example)
    $ makedumpfile -l vmcore dumpfile

Performance evaluation:

  - Kumagai-san's evaluation simulating actual working servers:
    http://lists.infradead.org/pipermail/kexec/2011-December/005868.html

  - My evaluation focusing on the worst cases:
    http://lists.infradead.org/pipermail/kexec/2011-November/005783.html

LZO Support for crash:

  I'll post the LZO support patch for crash after makedumpfile merges these patches.

---

HATAYAMA Daisuke (2):
      Add help and manual messages about LZO compression support
      Add LZO Support

 Makefile       |    2 +-
 diskdump_mod.h |    3 ++-
 makedumpfile.8 |    6 +++---
 makedumpfile.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++---------
 makedumpfile.h |    2 ++
 print_info.c   |   16 ++++++++--------
 6 files changed, 64 insertions(+), 22 deletions(-)

--
HATAYAMA Daisuke

2 years, 4 months


question about phys_base

by Wen Congyang

Hi, Dave

I am implementing a new dump command in qemu. The vmcore's format is ELF (like kdump), and I try to provide phys_base in the PT_LOAD segments. But if the OS uses the first vcpu to do kdump, the value of phys_base is wrong.

I found a function x86_64_virt_phys_base() in crash's code. Is it OK to call this function first? If the function succeeds, we do not calculate phys_base from PT_LOAD.

Thanks
Wen Congyang
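As background for the question above, here is a rough sketch of how phys_base can be derived from a PT_LOAD segment that maps the x86_64 kernel text region. It is an illustration only, not crash's or qemu's actual code: struct load_seg, KERNEL_MAP_SIZE and phys_base_from_segments() are hypothetical names, and the 512 MiB window is an assumption that does not hold for every kernel configuration. The arithmetic relies on the x86_64 relation that a kernel-text virtual address v maps to physical address v - __START_KERNEL_map + phys_base.

```c
#include <stdint.h>

/* x86_64 __START_KERNEL_map: base of the kernel text mapping. */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/* ASSUMPTION: size of the kernel text window; real kernels may differ. */
#define KERNEL_MAP_SIZE  (512ULL << 20)

/* Hypothetical stand-in for the Elf64_Phdr fields used here. */
struct load_seg {
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_memsz;
};

/*
 * Scan the PT_LOAD segments for one whose virtual address falls in the
 * kernel text window. On success store the derived phys_base and return 1;
 * return 0 if no such segment exists.
 */
static int phys_base_from_segments(const struct load_seg *segs, int n,
                                   uint64_t *phys_base)
{
    for (int i = 0; i < n; i++) {
        uint64_t v = segs[i].p_vaddr;

        if (v >= START_KERNEL_MAP && v - START_KERNEL_MAP < KERNEL_MAP_SIZE) {
            /* paddr = vaddr - START_KERNEL_MAP + phys_base, solved for phys_base */
            *phys_base = segs[i].p_paddr - (v - START_KERNEL_MAP);
            return 1;
        }
    }
    return 0;
}
```

Note that a value derived this way only reflects whichever kernel's page tables were walked to build the segments, which is exactly the concern raised in this thread about the kdump (second) kernel.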

2 years, 4 months


[PATCH] runq: search current task's runqueue explicitly

by HATAYAMA Daisuke

Currently, the runq sub-command doesn't take into account that a CFS runqueue's current task is removed from the CFS runqueue. Because of this, the remaining CFS runqueues that follow the current task's are not displayed. This patch fixes this by making the runq sub-command search the current task's runqueue explicitly.

Note that a CFS runqueue exists for each task group, and so does a CFS runqueue's current task, so the above search needs to be done recursively.

Test
====

On the vmcore I made 7 task groups:

  root group --- A --- AA --- AAA
                    +      +- AAB
                    |
                    +- AB --- ABA
                           +- ABB

and then I ran three CPU bound tasks, each exactly the same as

  int main(void) { for (;;) continue; return 0; }

for each task group, including the root group; so 24 tasks in total. For readability, I annotated each task name with the name of the group it belongs to. For example, loop.ABA belongs to task group ABA.

Look at the CPU0 column below. [before] lacks 8 tasks, and [after] successfully shows all tasks on the runqueue, which is identical to the result of [sched debug], which is expected to output the correct result.

I'll send this vmcore later.

[before]

crash> runq | cat
CPU 0 RUNQUEUE: ffff88000a215f80
  CURRENT: PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
  RT PRIO_ARRAY: ffff88000a216098
     [no tasks queued]
  CFS RB_ROOT: ffff88000a216010
     [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
<cut>

[after]

crash_fix> runq
CPU 0 RUNQUEUE: ffff88000a215f80
  CURRENT: PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
  RT PRIO_ARRAY: ffff88000a216098
     [no tasks queued]
  CFS RB_ROOT: ffff88000a216010
     [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
     [120] PID: 28271  TASK: ffff8800787a8b40  COMMAND: "loop.ABB"
     [120] PID: 28272  TASK: ffff880037afd580  COMMAND: "loop.ABB"
     [120] PID: 28245  TASK: ffff8800785e8b00  COMMAND: "loop.AB"
     [120] PID: 28246  TASK: ffff880078628ac0  COMMAND: "loop.AB"
     [120] PID: 28241  TASK: ffff880078616b40  COMMAND: "loop.AA"
     [120] PID: 28239  TASK: ffff8800785774c0  COMMAND: "loop.AA"
     [120] PID: 28240  TASK: ffff880078617580  COMMAND: "loop.AA"
     [120] PID: 28232  TASK: ffff880079b5d4c0  COMMAND: "loop.A"
<cut>

[sched debug]

crash> runq -d
CPU 0
     [120] PID: 28232  TASK: ffff880079b5d4c0  COMMAND: "loop.A"
     [120] PID: 28239  TASK: ffff8800785774c0  COMMAND: "loop.AA"
     [120] PID: 28240  TASK: ffff880078617580  COMMAND: "loop.AA"
     [120] PID: 28241  TASK: ffff880078616b40  COMMAND: "loop.AA"
     [120] PID: 28245  TASK: ffff8800785e8b00  COMMAND: "loop.AB"
     [120] PID: 28246  TASK: ffff880078628ac0  COMMAND: "loop.AB"
     [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
     [120] PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
     [120] PID: 28271  TASK: ffff8800787a8b40  COMMAND: "loop.ABB"
     [120] PID: 28272  TASK: ffff880037afd580  COMMAND: "loop.ABB"
<cut>

Diff stat
=========

 defs.h |    1 +
 task.c |   37 +++++++++++++++++--------------------
 2 files changed, 18 insertions(+), 20 deletions(-)

Thanks.
HATAYAMA, Daisuke
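The recursive search described in this patch can be illustrated with a small model. This is a sketch under simplifying assumptions, not the code in task.c: struct entity and struct cfs_rq here are hypothetical reductions of the kernel's sched_entity and cfs_rq, a plain array stands in for the rb-tree, and dump_cfs_rq() and visit() are made-up names. The point it shows is that each runqueue's current entity is not among the queued entries, so it must be visited explicitly, and a group entity must be descended into.

```c
#include <stdio.h>
#include <stddef.h>

struct cfs_rq;

/* Hypothetical simplified entity: a task if my_q is NULL, otherwise a task group. */
struct entity {
    const char *comm;
    struct cfs_rq *my_q;
};

/* Hypothetical simplified CFS runqueue. */
struct cfs_rq {
    struct entity *curr;      /* running entity; NOT present in queued[] */
    struct entity **queued;   /* stand-in for the rb-tree of queued entities */
    int nr_queued;
};

static int dump_cfs_rq(struct cfs_rq *rq, struct entity *cpu_curr, FILE *out);

/* Visit one entity: descend into groups, print tasks; returns tasks printed. */
static int visit(struct entity *se, struct entity *cpu_curr, FILE *out)
{
    if (se->my_q)
        return dump_cfs_rq(se->my_q, cpu_curr, out);
    if (se == cpu_curr)
        return 0;   /* the CPU's CURRENT task is reported separately */
    fprintf(out, "COMMAND: \"%s\"\n", se->comm);
    return 1;
}

/* Walk a runqueue: its current entity first, then everything queued on it. */
static int dump_cfs_rq(struct cfs_rq *rq, struct entity *cpu_curr, FILE *out)
{
    int n = 0;

    if (rq->curr)
        n += visit(rq->curr, cpu_curr, out);
    for (int i = 0; i < rq->nr_queued; i++)
        n += visit(rq->queued[i], cpu_curr, out);
    return n;
}
```

If the rq->curr step is dropped, every task queued under the running group hierarchy disappears from the output, which matches the kind of omission shown in the [before] listing.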

2 years, 4 months


[RFC] makedumpfile, crash: LZO compression support

by HATAYAMA Daisuke

Hello,

This is an RFC patch set that adds LZO compression support to makedumpfile and the crash utility. LZO produces output of about the same size as ZLIB but is far faster, which reduces downtime during crash dump generation and refiltering.

How to build:

  1. Get the LZO library, which is provided as the lzo-devel package on recent Linux distributions and is also available on the author's website: http://www.oberhumer.com/opensource/lzo/.

  2. Apply the patch set to makedumpfile v1.4.0 and crash v6.0.0.

  3. Build both using make. For crash, do the following for now:

     $ make CFLAGS="-llzo2"

How to use:

  I've used a new -l option for lzo compression in this patch. So, for example:

  $ makedumpfile -l vmcore dumpfile
  $ crash vmlinux dumpfile

Request for a configure-like feature in the crash utility:

  I would like a configure-like feature in the crash utility so that users can select at build time whether to include the LZO feature, that is: ./configure --enable-lzo or ./configure --disable-lzo.

  The reason is that support staff often download and install the latest version of the crash utility on machines where the lzo library is not provided.

  Looking at the source code, it looks to me as though crash does some kind of configuration processing in its own way, around configure.c, and I guess it's difficult to use the autoconf tools directly.

  Or is there a better way?

Performance Comparison:

  Sample Data

    Ideally, I should have measured performance on a sufficiently large set of vmcores generated from machines that were actually running, but I don't have enough sample vmcores at the moment, so I couldn't. This comparison therefore doesn't answer the question of I/O time improvement. This is a TODO for now.

    Instead, I chose the worst and best cases with regard to compression ratio and speed only. Specifically, the former is /dev/urandom and the latter is /dev/zero.

    I generated sample data of 10MB, 100MB and 1GB like this:

    $ dd bs=4096 count=$((1024*1024*1024/4096)) if=/dev/urandom of=urandom.1GB

  How to measure

    I then compressed each 4096-byte block and measured the total compression time and output size. See the attached mycompress.c.

  Result

    See the attached file result.txt.

  Discussion

    For both kinds of data, lzo's compression was considerably quicker than zlib's. The ratio of lzo's compression time to zlib's is about 37% for urandom data and about 8.5% for zero data. The actual state of physical memory would be somewhere between the two cases, so I guess the average compression time ratio is between 37% and 8.5%.

    Although it is beyond the topic of this patch set, we can estimate the worst-case compression time for larger data sizes, since compression is performed block by block and the compression time increases linearly. The estimated worst-case time for 2TB of memory is about 15 hours for lzo and about 40 hours for zlib. In that case the compressed data is larger than the original, so it is not actually used, and the time spent compressing is entirely wasted. I think compression must be done in parallel, and I'll post such a patch later.

Diffstat

  * makedumpfile

 diskdump_mod.h |    3 +-
 makedumpfile.c |   98 +++++++++++++++++++++++++++++++++++++++++++++++++------
 makedumpfile.h |   12 +++++++
 3 files changed, 101 insertions(+), 12 deletions(-)

  * crash

 defs.h     |    1 +
 diskdump.c |   20 +++++++++++++++++++-
 diskdump.h |    3 ++-
 3 files changed, 22 insertions(+), 2 deletions(-)

TODO

  * evaluation including I/O time using actual vmcores

Thanks.
HATAYAMA, Daisuke
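The linear scaling argument in the Discussion can be written down as a one-line helper. This is only an arithmetic sketch: scale_time() is a hypothetical name, and treating "2TB" as 2 TiB is an assumption. Since result.txt is not reproduced here, the usage note works backwards from the 15-hour and 40-hour estimates rather than from measured data.

```c
#include <stdint.h>

/*
 * Scale a compression time linearly from one data size to another.
 * This is valid only under the stated premise that compression is done
 * block by block, so total time grows in proportion to the data size.
 */
static double scale_time(double known_sec, uint64_t known_bytes,
                         uint64_t target_bytes)
{
    return known_sec * ((double)target_bytes / (double)known_bytes);
}
```

For example, scaling the 15-hour lzo estimate (54000 seconds) from 2 TiB down to 1 GiB gives 54000 / 2048, roughly 26.4 seconds per GiB of worst-case data, and the 40-hour zlib estimate gives 144000 / 2048, roughly 70.3 seconds per GiB. The quotient of the two, 15 / 40 = 37.5%, is consistent with the "about 37%" time ratio quoted for urandom data.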

2 years, 4 months


Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format

by tachibana＠mxm.nes.nec.co.jp

Hi Hatayama-san,

On 2011/06/29 12:12:18 +0900, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
> Date: Tue, 28 Jun 2011 08:57:42 -0400 (EDT)
>
> >
> >
> > ----- Original Message -----
> >> Fujitsu has stand-alone dump mechanism based on firmware level
> >> functionality, which we call SADUMP, in short.
> >>
> >> We've maintained utility tools internally but now we're thinking that
> >> the best is crash utility and makedumpfile supports the sadump format
> >> for the viewpoint of both portability and maintainability.
> >>
> >> We'll be of course responsible for its maintenance in a continuous
> >> manner. The sadump dump format is very similar to diskdump format and
> >> so kdump (compressed) format, so we estimate patch set would be a
> >> relatively small size.
> >>
> >> Could you tell me whether crash utility and makedumpfile can support
> >> the sadump format? If OK, we'll start to make patchset.

I think it's not a bad idea for makedumpfile to support sadump. However, I have several questions.

- Do you want to use makedumpfile to shrink an existing file that sadump has dumped?
- It isn't possible to support the same form as the kdump-compressed format now, is it?
- When the information that makedumpfile reads from a note of /proc/vmcore (or a header of the kdump-compressed format) is extended by an extension of makedumpfile, do you need to modify sadump?

Thanks
tachibana

> >
> > Sure, yes, the crash utility can always support another dumpfile format.
> >
>
> Thanks. It helps a lot.
>
> > It's unclear to me how similar SADUMP is to diskdump/compressed-kdump.
> > Does your internal version patch diskdump.c, or do you maintain your
> > own "sadump.c"? I ask because if your patchset is at all intrusive,
> > I'd prefer it be kept in its own file, primarily for maintainability,
> > but also because SADUMP is essentially a black-box to anybody outside
> > Fujitsu.
>
> What I meant when I used ``similar'' is both literally and
> logically. The format consists of diskdump header-like header, two
> kinds of bitmaps used for the same purpose as those in diskdump format,
> and memory data. They can be handled in common with the existing data
> structure, diskdump_data, non-intrusively, so I hope they are placed
> in diskdump.c.
>
> On the other hand, there's a code to be placed at such specific
> area. sadump is triggered depending on kdump's progress and so
> register values to be contained in vmcore varies according to the
> progress: If crash_notes has been initialized when sadump is
> triggered, sadump packs the register values in crash_notes; if not
> yet, packs registers gathered by firmware. This is sadump specific
> processing, so I think putting it in specific sadump.c file is a
> natural and reasonable choice.
>
> Anyway, I have not made any patch set for this. I'll post a patch set
> when I complete.
>
> Again, thanks a lot for the positive answer.
>
> Thanks.
> HATAYAMA, Daisuke
>
>
> _______________________________________________
> kexec mailing list
> kexec(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

2 years, 4 months


[PATCH] Fix for 4.19-rc1 and later "relative __ksymtab entries"

by Dominique Martinet

kernels which have CONFIG_HAVE_ARCH_PREL32_RELOCATIONS will now store module symbol values relative to themselves since linux kernel's commit 7290d580957 ("module: use relative references for __ksymtab entries") Do the lookup one way or another depending on the kernel's kernel_symbol struct, the case without the ARCH_PREL32_RELOCATIONS is the same as the old behaviour --- Hi! This is a follow-up on the bz I opened yesterday[1] I'll let you commit your part of the tasks_struct pids, with both patches I can start crash normally and could check that address of symbols within a module look correct when compared to /proc/kallsyms, so I guess this isn't too bad. I also tried with an older kernel without problem. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1623127 defs.h | 1 + symbols.c | 98 +++++++++++++++++++++++++++++++++++++------------------ 2 files changed, 68 insertions(+), 31 deletions(-) diff --git a/defs.h b/defs.h index 8687ff1..166ee48 100644 --- a/defs.h +++ b/defs.h @@ -2036,6 +2036,7 @@ struct offset_table { /* stash of commonly-used offsets */ long memcg_cache_params___root_caches_node; long memcg_cache_params_children; long memcg_cache_params_children_node; + long kernel_symbol_value; }; struct size_table { /* stash of commonly-used sizes */ diff --git a/symbols.c b/symbols.c index 2e6713a..7e73bad 100644 --- a/symbols.c +++ b/symbols.c @@ -1595,12 +1595,40 @@ store_module_symbols_v1(ulong total, int mods_installed) st->flags |= INSMOD_BUILTIN; } -struct kernel_symbol -{ - unsigned long value; - const char *name; +union kernel_symbol { + struct kernel_symbol_v1 { + unsigned long value; + const char *name; + } v1; + /* kernel 4.19 introduced relative symbol positionning */ + struct kernel_symbol_v2 { + int value_offset; + int name_offset; + } v2; }; +static ulong +modsym_name(ulong syms, union kernel_symbol *modsym, int i) +{ + if (VALID_MEMBER(kernel_symbol_value)) + return (ulong)modsym->v1.name; + + return syms + i * sizeof(struct kernel_symbol_v2) 
+ + offsetof(struct kernel_symbol_v2, name_offset) + + modsym->v2.name_offset; +} + +static ulong +modsym_value(ulong syms, union kernel_symbol *modsym, int i) +{ + if (VALID_MEMBER(kernel_symbol_value)) + return (ulong)modsym->v1.value; + + return syms + i * sizeof(struct kernel_symbol_v2) + + offsetof(struct kernel_symbol_v2, value_offset) + + modsym->v2.value_offset; +} + void store_module_symbols_v2(ulong total, int mods_installed) { @@ -1614,7 +1642,8 @@ store_module_symbols_v2(ulong total, int mods_installed) long strbuflen; ulong size; int mcnt, lm_mcnt; - struct kernel_symbol *modsym; + union kernel_symbol *modsym; + size_t kernel_symbol_size; struct load_module *lm; char buf1[BUFSIZE]; char buf2[BUFSIZE]; @@ -1639,6 +1668,13 @@ store_module_symbols_v2(ulong total, int mods_installed) "re-initialization of module symbols not implemented yet!\n"); } + MEMBER_OFFSET_INIT(kernel_symbol_value, "kernel_symbol", "value"); + if (VALID_MEMBER(kernel_symbol_value)) { + kernel_symbol_size = sizeof(struct kernel_symbol_v1); + } else { + kernel_symbol_size = sizeof(struct kernel_symbol_v2); + } + if ((st->ext_module_symtable = (struct syment *) calloc(total, sizeof(struct syment))) == NULL) error(FATAL, "v2 module syment space malloc (%ld symbols): %s\n", @@ -1750,20 +1786,20 @@ store_module_symbols_v2(ulong total, int mods_installed) } if (nsyms) { - modsymbuf = GETBUF(sizeof(struct kernel_symbol)*nsyms); + modsymbuf = GETBUF(kernel_symbol_size*nsyms); readmem((ulong)syms, KVADDR, modsymbuf, - nsyms * sizeof(struct kernel_symbol), + nsyms * kernel_symbol_size, "module symbols", FAULT_ON_ERROR); } for (i = first = last = 0; i < nsyms; i++) { - modsym = (struct kernel_symbol *) - (modsymbuf + (i * sizeof(struct kernel_symbol))); + modsym = (union kernel_symbol *) + (modsymbuf + (i * kernel_symbol_size)); if (!first - || first > (ulong)modsym->name) - first = (ulong)modsym->name; - if ((ulong)modsym->name > last) - last = (ulong)modsym->name; + || first > 
modsym_name(syms, modsym, i)) + first = modsym_name(syms, modsym, i); + if (modsym_name(syms, modsym, i) > last) + last = modsym_name(syms, modsym, i); } if (last > first) { @@ -1787,21 +1823,21 @@ store_module_symbols_v2(ulong total, int mods_installed) for (i = 0; i < nsyms; i++) { - modsym = (struct kernel_symbol *) - (modsymbuf + (i * sizeof(struct kernel_symbol))); + modsym = (union kernel_symbol *) + (modsymbuf + (i * kernel_symbol_size)); BZERO(buf1, BUFSIZE); if (strbuf) strcpy(buf1, - &strbuf[(ulong)modsym->name - first]); + &strbuf[modsym_name(syms, modsym, i) - first]); else - read_string((ulong)modsym->name, buf1, + read_string(modsym_name(syms, modsym, i), buf1, BUFSIZE-1); if (strlen(buf1)) { st->ext_module_symtable[mcnt].value = - modsym->value; + modsym_value(syms, modsym, i); st->ext_module_symtable[mcnt].type = '?'; st->ext_module_symtable[mcnt].flags |= MODULE_SYMBOL; strip_module_symbol_end(buf1); @@ -1823,21 +1859,21 @@ store_module_symbols_v2(ulong total, int mods_installed) FREEBUF(strbuf); if (ngplsyms) { - modsymbuf = GETBUF(sizeof(struct kernel_symbol) * + modsymbuf = GETBUF(kernel_symbol_size * ngplsyms); readmem((ulong)gpl_syms, KVADDR, modsymbuf, - ngplsyms * sizeof(struct kernel_symbol), + ngplsyms * kernel_symbol_size, "module gpl symbols", FAULT_ON_ERROR); } for (i = first = last = 0; i < ngplsyms; i++) { - modsym = (struct kernel_symbol *) - (modsymbuf + (i * sizeof(struct kernel_symbol))); + modsym = (union kernel_symbol *) + (modsymbuf + (i * kernel_symbol_size)); if (!first - || first > (ulong)modsym->name) - first = (ulong)modsym->name; - if ((ulong)modsym->name > last) - last = (ulong)modsym->name; + || first > modsym_name(gpl_syms, modsym, i)) + first = modsym_name(gpl_syms, modsym, i); + if (modsym_name(gpl_syms, modsym, i) > last) + last = modsym_name(gpl_syms, modsym, i); } if (last > first) { @@ -1860,21 +1896,21 @@ store_module_symbols_v2(ulong total, int mods_installed) for (i = 0; i < ngplsyms; i++) { - modsym = (struct 
kernel_symbol *) - (modsymbuf + (i * sizeof(struct kernel_symbol))); + modsym = (union kernel_symbol *) + (modsymbuf + (i * kernel_symbol_size)); BZERO(buf1, BUFSIZE); if (strbuf) strcpy(buf1, - &strbuf[(ulong)modsym->name - first]); + &strbuf[modsym_name(gpl_syms, modsym, i) - first]); else - read_string((ulong)modsym->name, buf1, + read_string(modsym_name(gpl_syms, modsym, i), buf1, BUFSIZE-1); if (strlen(buf1)) { st->ext_module_symtable[mcnt].value = - modsym->value; + modsym_value(gpl_syms, modsym, i); st->ext_module_symtable[mcnt].type = '?'; st->ext_module_symtable[mcnt].flags |= MODULE_SYMBOL; strip_module_symbol_end(buf1); -- 2.17.1
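To make the address arithmetic in modsym_value() and modsym_name() above easier to follow, here is a standalone sketch of the same computation. It mirrors the patch's logic but is not part of it: struct ksym_rel, ksym_rel_value() and ksym_rel_name() are illustrative names, and the addresses in the usage note are made up. With relative __ksymtab entries, each 32-bit field holds an offset from its own address, so the target is the field's kernel virtual address plus the signed offset stored there.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of a relative __ksymtab entry, as in kernel_symbol_v2 in the patch. */
struct ksym_rel {
    int value_offset;
    int name_offset;
};

/*
 * Resolve entry i's symbol value. syms is the kernel virtual address of
 * the table, and e points to a local copy of entry i.
 */
static uint64_t ksym_rel_value(uint64_t syms, const struct ksym_rel *e, int i)
{
    /* Kernel virtual address of this entry's value_offset field. */
    uint64_t field = syms + (uint64_t)i * sizeof(struct ksym_rel)
                   + offsetof(struct ksym_rel, value_offset);

    /* The stored offset is signed and relative to the field itself. */
    return field + (uint64_t)(int64_t)e->value_offset;
}

/* Resolve entry i's name pointer in the same way. */
static uint64_t ksym_rel_name(uint64_t syms, const struct ksym_rel *e, int i)
{
    uint64_t field = syms + (uint64_t)i * sizeof(struct ksym_rel)
                   + offsetof(struct ksym_rel, name_offset);

    return field + (uint64_t)(int64_t)e->name_offset;
}
```

As a worked case with invented numbers: for a table at 0xffffffffc0001000, entry 1 lives at 0xffffffffc0001008; a value_offset of -0x100 resolves to 0xffffffffc0000f08, and a name_offset of 0x20 (stored 4 bytes further in) resolves to 0xffffffffc000102c.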

7 years, 6 months


[PATCH RFC] Add "kmem -r" to display accumulated slab statistics like /proc/slabinfo

by Kazuhito Hagio

Nowadays, the "kmem -s" output can become very long vertically too, due to the memcg kmem caches. It look like the longer a system has run, the longer it becomes. crash> kmem -s | wc -l 19855 On the other hand, since /proc/slabinfo accumulates the values of each slab_root_caches and its children, it's still short relatively. And I think there are many cases that support folks want to see the accumulated values like /proc/slabinfo from vmcore, in order to grasp the overview of slab activity quickly. We can use something like the attached script to accumulate them, but I believe it would be more useful to implement it in crash. This patch introduces the "kmem -r" option to imitate /proc/slabinfo, but it is limited to CONFIG_SLUB for now. I tested this patch with the kmem-s2r.awk script: crash> kmem -s | awk -f kmem-s2r.awk > kmem-s2r.txt crash> kmem -r > kmem-r.txt # diff -u kmem-s2r.txt kmem-r.txt Supported: crash> kmem -r crash> kmem -r list crash> kmem -r <slab name> Signed-off-by: Kazuhito Hagio <k-hagio(a)ab.jp.nec.com> --- defs.h | 5 ++ help.c | 10 +-- memory.c | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------ symbols.c | 9 +++ 4 files changed, 220 insertions(+), 23 deletions(-) diff --git a/defs.h b/defs.h index 6fdb478..8687ff1 100644 --- a/defs.h +++ b/defs.h @@ -2032,6 +2032,10 @@ struct offset_table { /* stash of commonly-used offsets */ long bpf_prog_aux_user; long user_struct_uid; long idr_cur; + long kmem_cache_memcg_params; + long memcg_cache_params___root_caches_node; + long memcg_cache_params_children; + long memcg_cache_params_children_node; }; struct size_table { /* stash of commonly-used sizes */ @@ -2438,6 +2442,7 @@ struct vm_table { /* kernel VM-related data */ #define PAGEFLAGS (0x4000000) #define SLAB_OVERLOAD_PAGE (0x8000000) #define SLAB_CPU_CACHE (0x10000000) +#define SLAB_ROOT_CACHES (0x20000000) #define IS_FLATMEM() (vt->flags & FLATMEM) #define IS_DISCONTIGMEM() (vt->flags & DISCONTIGMEM) diff --git a/help.c b/help.c 
index aeeb056..ee8b999 100644 --- a/help.c +++ b/help.c @@ -6448,7 +6448,7 @@ char *help_kmem[] = { "kmem", "kernel memory", "[-f|-F|-c|-C|-i|-v|-V|-n|-z|-o|-h] [-p | -m member[,member]]\n" -" [[-s|-S] [slab] [-I slab[,slab]]] [-g [flags]] [[-P] address]]", +" [[-s|-S|-r] [slab] [-I slab[,slab]]] [-g [flags]] [[-P] address]]", " This command displays information about the use of kernel memory.\n", " -f displays the contents of the system free memory headers.", " also verifies that the page count equals nr_free_pages.", @@ -6490,10 +6490,12 @@ char *help_kmem[] = { " slab data for each per-cpu slab is displayed, along with the", " address of each kmem_cache_node, its count of full and partial", " slabs, and a list of all tracked slabs.", -" slab when used with -s or -S, limits the command to only the slab cache", -" of name \"slab\". If the slab argument is \"list\", then", +" -r displays accumulated kmalloc() slab data of each slab_root_caches", +" and its children. Available only if CONFIG_SLUB for now.", +" slab when used with -s, -S or -r, limits the command to only the slab", +" cache of name \"slab\". 
If the slab argument is \"list\", then", " all slab cache names and addresses are listed.", -" -I slab when used with -s or -S, one or more slab cache names in a", +" -I slab when used with -s, -S or -r, one or more slab cache names in a", " comma-separated list may be specified as slab caches to ignore.", " -g displays the enumerator value of all bits in the page structure's", " \"flags\" field.", diff --git a/memory.c b/memory.c index e02ba68..1501b21 100644 --- a/memory.c +++ b/memory.c @@ -167,12 +167,12 @@ static int kmem_cache_downsize(void); static int ignore_cache(struct meminfo *, char *); static char *is_kmem_cache_addr(ulong, char *); static char *is_kmem_cache_addr_common(ulong, char *); -static void kmem_cache_list(void); +static void kmem_cache_list(struct meminfo *); static void dump_kmem_cache(struct meminfo *); static void dump_kmem_cache_percpu_v1(struct meminfo *); static void dump_kmem_cache_percpu_v2(struct meminfo *); static void dump_kmem_cache_slub(struct meminfo *); -static void kmem_cache_list_common(void); +static void kmem_cache_list_common(struct meminfo *); static ulong get_cpu_slab_ptr(struct meminfo *, int, ulong *); static unsigned int oo_order(ulong); static unsigned int oo_objects(ulong); @@ -276,6 +276,8 @@ static int generic_read_dumpfile(ulonglong, void *, long, char *, ulong); static int generic_write_dumpfile(ulonglong, void *, long, char *, ulong); static int page_to_nid(ulong); static int get_kmem_cache_list(ulong **); +static int get_kmem_cache_root_list(ulong **); +static int get_kmem_cache_child_list(ulong **, ulong); static int get_kmem_cache_slub_data(long, struct meminfo *); static ulong compound_head(ulong); static long count_partial(ulong, struct meminfo *, ulong *); @@ -815,6 +817,23 @@ vm_init(void) "kmem_slab_s", "s_magic"); } + if (kernel_symbol_exists("slab_root_caches")) { + MEMBER_OFFSET_INIT(kmem_cache_memcg_params, + "kmem_cache", "memcg_params"); + MEMBER_OFFSET_INIT(memcg_cache_params___root_caches_node, 
+		"memcg_cache_params", "__root_caches_node");
+		MEMBER_OFFSET_INIT(memcg_cache_params_children,
+			"memcg_cache_params", "children");
+		MEMBER_OFFSET_INIT(memcg_cache_params_children_node,
+			"memcg_cache_params", "children_node");
+
+		if (VALID_MEMBER(kmem_cache_memcg_params)
+		    && VALID_MEMBER(memcg_cache_params___root_caches_node)
+		    && VALID_MEMBER(memcg_cache_params_children)
+		    && VALID_MEMBER(memcg_cache_params_children_node))
+			vt->flags |= SLAB_ROOT_CACHES;
+	}
+
 	if (!kt->kernel_NR_CPUS) {
 		if (enumerator_value("WORK_CPU_UNBOUND", (long *)&value1))
 			kt->kernel_NR_CPUS = (int)value1;
@@ -4713,6 +4732,7 @@ get_task_mem_usage(ulong task, struct task_mem_usage *tm)
 #define SLAB_OVERLOAD_PAGE_PTR (ADDRESS_SPECIFIED << 24)
 #define SLAB_BITFIELD (ADDRESS_SPECIFIED << 25)
 #define SLAB_GATHER_FAILURE (ADDRESS_SPECIFIED << 26)
+#define GET_SLAB_ROOT_CACHES (ADDRESS_SPECIFIED << 27)
 
 #define GET_ALL \
 	(GET_SHARED_PAGES|GET_TOTALRAM_PAGES|GET_BUFFERS_PAGES|GET_SLAB_PAGES)
@@ -4724,6 +4744,7 @@ cmd_kmem(void)
 	int c;
 	int sflag, Sflag, pflag, fflag, Fflag, vflag, zflag, oflag, gflag;
 	int nflag, cflag, Cflag, iflag, lflag, Lflag, Pflag, Vflag, hflag;
+	int rflag;
 	struct meminfo meminfo;
 	ulonglong value[MAXARGS];
 	char buf[BUFSIZE];
@@ -4733,13 +4754,13 @@ cmd_kmem(void)
 	spec_addr = 0;
 	sflag = Sflag = pflag = fflag = Fflag = Pflag = zflag = oflag = 0;
 	vflag = Cflag = cflag = iflag = nflag = lflag = Lflag = Vflag = 0;
-	gflag = hflag = 0;
+	gflag = hflag = rflag = 0;
 	escape = FALSE;
 	BZERO(&meminfo, sizeof(struct meminfo));
 	BZERO(&value[0], sizeof(ulonglong)*MAXARGS);
 	pc->curcmd_flags &= ~HEADER_PRINTED;
 
-	while ((c = getopt(argcnt, args, "gI:sSFfm:pvczCinl:L:PVoh")) != EOF) {
+	while ((c = getopt(argcnt, args, "gI:sSrFfm:pvczCinl:L:PVoh")) != EOF) {
 		switch(c)
 		{
 		case 'V':
@@ -4775,11 +4796,15 @@ cmd_kmem(void)
 			break;
 
 		case 's':
-			sflag = 1; Sflag = 0;
+			sflag = 1; Sflag = rflag = 0;
 			break;
 
 		case 'S':
-			Sflag = 1; sflag = 0;
+			Sflag = 1; sflag = rflag = 0;
+			break;
+
+		case 'r':
+			rflag = 1; sflag = Sflag = 0;
 			break;
 
 		case 'F':
@@ -4859,12 +4884,13 @@ cmd_kmem(void)
 		cmd_usage(pc->curcmd, SYNOPSIS);
 
 	if ((sflag + Sflag + pflag + fflag + Fflag + Vflag + oflag +
-	    vflag + Cflag + cflag + iflag + lflag + Lflag + gflag + hflag) > 1) {
+	    vflag + Cflag + cflag + iflag + lflag + Lflag + gflag +
+	    hflag + rflag) > 1) {
 		error(INFO, "only one flag allowed!\n");
 		cmd_usage(pc->curcmd, SYNOPSIS);
 	}
 
-	if (sflag || Sflag || !(vt->flags & KMEM_CACHE_INIT))
+	if (sflag || Sflag || rflag || !(vt->flags & KMEM_CACHE_INIT))
 		kmem_cache_init();
 
 	while (args[optind]) {
@@ -4881,7 +4907,7 @@ cmd_kmem(void)
 				escape = TRUE;
 			} else
 				meminfo.reqname = args[optind];
-			if (!sflag && !Sflag)
+			if (!sflag && !Sflag && !rflag)
 				cmd_usage(pc->curcmd, SYNOPSIS);
 		}
 
@@ -4994,7 +5020,7 @@ cmd_kmem(void)
 	 * no value arguments allowed!
 	 */
 	if (zflag || nflag || iflag || Fflag || Cflag || Lflag ||
-	    Vflag || oflag || hflag) {
+	    Vflag || oflag || hflag || rflag) {
 		error(INFO,
 		    "no address arguments allowed with this option\n");
 		cmd_usage(pc->curcmd, SYNOPSIS);
@@ -5030,9 +5056,17 @@ cmd_kmem(void)
 	if (hflag == 1)
 		dump_hstates();
 
-	if (sflag == 1) {
+	if (sflag == 1 || rflag == 1) {
+		if (rflag) {
+			if (!((vt->flags & KMALLOC_SLUB)
+			    && (vt->flags & SLAB_ROOT_CACHES)))
+				error(FATAL,
+				    "-r option doesn't support this kernel\n");
+
+			meminfo.flags = GET_SLAB_ROOT_CACHES;
+		}
 		if (!escape && STREQ(meminfo.reqname, "list"))
-			kmem_cache_list();
+			kmem_cache_list(&meminfo);
 		else if (vt->flags & KMEM_CACHE_UNAVAIL)
 			error(FATAL,
 			    "kmem cache slab subsystem not available\n");
@@ -5042,7 +5076,7 @@ cmd_kmem(void)
 
 	if (Sflag == 1) {
 		if (STREQ(meminfo.reqname, "list"))
-			kmem_cache_list();
+			kmem_cache_list(&meminfo);
 		else if (vt->flags & KMEM_CACHE_UNAVAIL)
 			error(FATAL,
 			    "kmem cache slab subsystem not available\n");
@@ -5092,7 +5126,8 @@ cmd_kmem(void)
 
 	if (!(sflag + Sflag + pflag + fflag + Fflag + vflag +
 	    Vflag + zflag + oflag + cflag + Cflag + iflag +
-	    nflag + lflag + Lflag + gflag + hflag + meminfo.calls))
+	    nflag + lflag + Lflag + gflag + hflag + rflag +
+	    meminfo.calls))
 		cmd_usage(pc->curcmd, SYNOPSIS);
 
 }
@@ -9117,7 +9152,7 @@ is_kmem_cache_addr(ulong vaddr, char *kbuf)
  * dumps all slab cache names and their addresses.
  */
 static void
-kmem_cache_list(void)
+kmem_cache_list(struct meminfo *mi)
 {
 	ulong cache, cache_cache, name;
 	long next_offset, name_offset;
@@ -9132,7 +9167,7 @@ kmem_cache_list(void)
 	}
 
 	if (vt->flags & (KMALLOC_SLUB|KMALLOC_COMMON)) {
-		kmem_cache_list_common();
+		kmem_cache_list_common(mi);
 		return;
 	}
 
@@ -13564,6 +13599,8 @@ dump_vm_table(int verbose)
 		fprintf(fp, "%sSLAB_OVERLOAD_PAGE", others++ ? "|" : "");\
 	if (vt->flags & SLAB_CPU_CACHE)
 		fprintf(fp, "%sSLAB_CPU_CACHE", others++ ? "|" : "");\
+	if (vt->flags & SLAB_ROOT_CACHES)
+		fprintf(fp, "%sSLAB_ROOT_CACHES", others++ ? "|" : "");\
 	if (vt->flags & USE_VMAP_AREA)
 		fprintf(fp, "%sUSE_VMAP_AREA", others++ ? "|" : "");\
 	if (vt->flags & CONFIG_NUMA)
@@ -18044,14 +18081,17 @@ kmem_cache_init_slub(void)
 }
 
 static void
-kmem_cache_list_common(void)
+kmem_cache_list_common(struct meminfo *mi)
 {
 	int i, cnt;
 	ulong *cache_list;
 	ulong name;
 	char buf[BUFSIZE];
 
-	cnt = get_kmem_cache_list(&cache_list);
+	if (mi->flags & GET_SLAB_ROOT_CACHES)
+		cnt = get_kmem_cache_root_list(&cache_list);
+	else
+		cnt = get_kmem_cache_list(&cache_list);
 
 	for (i = 0; i < cnt; i++) {
 		fprintf(fp, "%lx ", cache_list[i]);
@@ -18087,7 +18127,11 @@ dump_kmem_cache_slub(struct meminfo *si)
 	}
 
 	order = objects = 0;
-	si->cache_count = get_kmem_cache_list(&si->cache_list);
+	if (si->flags & GET_SLAB_ROOT_CACHES)
+		si->cache_count = get_kmem_cache_root_list(&si->cache_list);
+	else
+		si->cache_count = get_kmem_cache_list(&si->cache_list);
+
 	si->cache_buf = GETBUF(SIZE(kmem_cache));
 
 	if (VALID_MEMBER(page_objects) &&
@@ -18168,6 +18212,79 @@ dump_kmem_cache_slub(struct meminfo *si)
 		    !get_kmem_cache_slub_data(GET_SLUB_OBJECTS, si))
 			si->flags |= SLAB_GATHER_FAILURE;
 
+		/* accumulate children's slabinfo */
+		if (si->flags & GET_SLAB_ROOT_CACHES) {
+			struct meminfo *mi;
+			int j;
+			char buf2[BUFSIZE];
+
+			mi = (struct meminfo *)GETBUF(sizeof(struct meminfo));
+			memcpy(mi, si, sizeof(struct meminfo));
+
+			mi->cache_count = get_kmem_cache_child_list(&mi->cache_list,
+				si->cache_list[i]);
+
+			if (!mi->cache_count)
+				goto no_children;
+
+			mi->cache_buf = GETBUF(SIZE(kmem_cache));
+
+			for (j = 0; j < mi->cache_count; j++) {
+				BZERO(mi->cache_buf, SIZE(kmem_cache));
+				if (!readmem(mi->cache_list[j], KVADDR, mi->cache_buf,
+				    SIZE(kmem_cache), "kmem_cache buffer",
+				    RETURN_ON_ERROR|RETURN_PARTIAL))
+					continue;
+
+				name = ULONG(mi->cache_buf + OFFSET(kmem_cache_name));
+				if (!read_string(name, buf2, BUFSIZE-1))
+					sprintf(buf2, "(unknown)");
+
+				objsize = UINT(mi->cache_buf + OFFSET(kmem_cache_objsize));
+				size = UINT(mi->cache_buf + OFFSET(kmem_cache_size));
+				offset = UINT(mi->cache_buf + OFFSET(kmem_cache_offset));
+				if (VALID_MEMBER(kmem_cache_objects)) {
+					objects = UINT(mi->cache_buf +
+						OFFSET(kmem_cache_objects));
+					order = UINT(mi->cache_buf + OFFSET(kmem_cache_order));
+				} else if (VALID_MEMBER(kmem_cache_oo)) {
+					oo = ULONG(mi->cache_buf + OFFSET(kmem_cache_oo));
+					objects = oo_objects(oo);
+					order = oo_order(oo);
+				} else
+					error(FATAL, "cannot determine "
+						"kmem_cache objects/order values\n");
+
+				mi->cache = mi->cache_list[j];
+				mi->curname = buf2;
+				mi->objsize = objsize;
+				mi->size = size;
+				mi->objects = objects;
+				mi->slabsize = (PAGESIZE() << order);
+				mi->inuse = mi->num_slabs = 0;
+				mi->slab_offset = offset;
+				mi->random = VALID_MEMBER(kmem_cache_random) ?
+					ULONG(mi->cache_buf + OFFSET(kmem_cache_random)) : 0;
+
+				if (!get_kmem_cache_slub_data(GET_SLUB_SLABS, mi) ||
+				    !get_kmem_cache_slub_data(GET_SLUB_OBJECTS, mi)) {
+					si->flags |= SLAB_GATHER_FAILURE;
+					continue;
+				}
+
+				si->inuse += mi->inuse;
+				si->free += mi->free;
+				si->num_slabs += mi->num_slabs;
+
+				if (CRASHDEBUG(1))
+					dump_kmem_cache_info(mi);
+			}
+			FREEBUF(mi->cache_buf);
+			FREEBUF(mi->cache_list);
+no_children:
+			FREEBUF(mi);
+		}
+
 		DUMP_KMEM_CACHE_INFO();
 
 		if (si->flags & SLAB_GATHER_FAILURE) {
@@ -18964,6 +19081,70 @@ get_kmem_cache_list(ulong **cache_buf)
 	return cnt;
 }
 
+static int
+get_kmem_cache_root_list(ulong **cache_buf)
+{
+	int cnt;
+	ulong vaddr;
+	struct list_data list_data, *ld;
+
+	get_symbol_data("slab_root_caches", sizeof(void *), &vaddr);
+
+	ld = &list_data;
+	BZERO(ld, sizeof(struct list_data));
+	ld->flags |= LIST_ALLOCATE;
+	ld->start = vaddr;
+	ld->list_head_offset = OFFSET(kmem_cache_memcg_params)
+		+ OFFSET(memcg_cache_params___root_caches_node);
+	ld->end = symbol_value("slab_root_caches");
+	if (CRASHDEBUG(3))
+		ld->flags |= VERBOSE;
+
+	cnt = do_list(ld);
+	*cache_buf = ld->list_ptr;
+
+	return cnt;
+}
+
+static int
+get_kmem_cache_child_list(ulong **cache_buf, ulong root)
+{
+	int cnt;
+	ulong vaddr, children;
+	struct list_data list_data, *ld;
+
+	children = root + OFFSET(kmem_cache_memcg_params)
+		+ OFFSET(memcg_cache_params_children);
+
+	readmem(children, KVADDR, &vaddr, sizeof(ulong),
+		"kmem_cache.memcg_params.children",
+		FAULT_ON_ERROR);
+
+	/*
+	 * When no children, since there is the difference of offset
+	 * of children list between root and child, do_list returns
+	 * an incorrect cache_buf[0]. So we determine whether it has
+	 * children or not with the value of list_head.next.
+	 */
+	if (children == vaddr)
+		return 0;
+
+	ld = &list_data;
+	BZERO(ld, sizeof(struct list_data));
+	ld->flags |= LIST_ALLOCATE;
+	ld->start = vaddr;
+	ld->list_head_offset =
+		OFFSET(kmem_cache_memcg_params)
+		+ OFFSET(memcg_cache_params_children_node);
+	ld->end = children;
+	if (CRASHDEBUG(3))
+		ld->flags |= VERBOSE;
+
+	cnt = do_list(ld);
+	*cache_buf = ld->list_ptr;
+
+	return cnt;
+}
 
 /*
  * Get the address of the head page of a compound page.
diff --git a/symbols.c b/symbols.c
index bee60ba..2e6713a 100644
--- a/symbols.c
+++ b/symbols.c
@@ -9451,6 +9451,15 @@ dump_offset_table(char *spec, ulong makestruct)
 	fprintf(fp, " kmem_cache_flags: %ld\n",
 		OFFSET(kmem_cache_flags));
 
+	fprintf(fp, " kmem_cache_memcg_params: %ld\n",
+		OFFSET(kmem_cache_memcg_params));
+	fprintf(fp, "memcg_cache_params___root_caches_node: %ld\n",
+		OFFSET(memcg_cache_params___root_caches_node));
+	fprintf(fp, " memcg_cache_params_children: %ld\n",
+		OFFSET(memcg_cache_params_children));
+	fprintf(fp, " memcg_cache_params_children_node: %ld\n",
+		OFFSET(memcg_cache_params_children_node));
+
 	fprintf(fp, " net_device_next: %ld\n",
 		OFFSET(net_device_next));
 	fprintf(fp, " net_device_name: %ld\n",
-- 
1.8.3.1
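
The comment in get_kmem_cache_child_list() explains why the empty case is tested before do_list() is called: the list head (children) lives in the root cache, while each entry is linked through a different member (children_node). The standalone sketch below is only a toy model of that reasoning, not crash code. struct toy_cache, list_init(), list_add_tail(), collect_children() and naive_first_addr() are hypothetical names made up for this illustration, and it walks live pointers instead of reading dumped kernel memory the way do_list() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Toy cache: "children" is the list head kept in a root cache,
 * "children_node" is the link used when the cache is a child. */
struct toy_cache {
	const char *name;
	struct list_head children;
	struct list_head children_node;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Address obtained by blindly subtracting the children_node offset
 * from the first link. For an empty list the first link is the head
 * itself, so the result is neither the root nor a real child. */
static uintptr_t naive_first_addr(const struct toy_cache *root)
{
	return (uintptr_t)root->children.next
	    - offsetof(struct toy_cache, children_node);
}

/* Collect up to max children of root into out[]; returns the count.
 * The empty case is detected first by comparing the head's address
 * with head->next, mirroring the "children == vaddr" test. */
static int collect_children(struct toy_cache *root,
			    struct toy_cache **out, int max)
{
	struct list_head *head = &root->children;
	struct list_head *pos;
	int cnt = 0;

	assert(out != NULL && max > 0);

	if (head->next == head)
		return 0;

	for (pos = head->next; pos != head && cnt < max; pos = pos->next)
		out[cnt++] = (struct toy_cache *)((char *)pos
		    - offsetof(struct toy_cache, children_node));

	return cnt;
}
```

In this model, calling naive_first_addr() on a root with no children produces an address offset from the root by the distance between the two members, which is the kind of bogus first entry the patch sidesteps by returning 0 early.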


