[PATCH] runq: search current task's runqueue explicitly
by HATAYAMA Daisuke
Currently, the runq sub-command does not take into account that a CFS
runqueue's current task has been removed from that runqueue. As a result,
the remaining CFS runqueues that follow the current task's are not
displayed. This patch fixes this by making the runq sub-command search
the current task's runqueue explicitly.
Note that a CFS runqueue exists for each task group, and so does that
runqueue's current task, so the above search needs to be done
recursively.
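The recursion described above can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_entity`, `toy_cfs_rq` and `dump_cfs_rq` are hypothetical names, not the kernel's structures and not the code in task.c. The running task itself is assumed to be reported separately (as on the CURRENT line), so only a current entity that is a group is descended into.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins; NOT the kernel's or crash's real structures. */
struct toy_cfs_rq;

struct toy_entity {
    const char *name;         /* task name; NULL for a group entity */
    struct toy_cfs_rq *my_q;  /* the group's own runqueue; NULL for a task */
};

struct toy_cfs_rq {
    struct toy_entity *curr;     /* running entity, removed from queued[] */
    struct toy_entity **queued;  /* entities still queued on this runqueue */
    size_t nr_queued;
};

static int dump_entity(const struct toy_entity *se, FILE *out);

/* Print every queued task reachable from rq; return how many were printed. */
int dump_cfs_rq(const struct toy_cfs_rq *rq, FILE *out)
{
    int total = 0;

    assert(rq != NULL);

    /* The current entity is not in queued[], so visit it explicitly.
     * If it is a group, its own runqueue may still hold queued tasks. */
    if (rq->curr && rq->curr->my_q)
        total += dump_cfs_rq(rq->curr->my_q, out);

    for (size_t i = 0; i < rq->nr_queued; i++)
        total += dump_entity(rq->queued[i], out);

    return total;
}

static int dump_entity(const struct toy_entity *se, FILE *out)
{
    if (se->my_q)                       /* group: recurse into its runqueue */
        return dump_cfs_rq(se->my_q, out);
    fprintf(out, "task %s\n", se->name);
    return 1;
}
```

Without the explicit visit of `curr`, every task queued beneath the running task's group would be skipped, which matches the symptom in the [before] output.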
Test
====
On vmcore I made 7 task groups:
root group --- A --- AA --- AAA
               +         +- AAB
               |
               +- AB --- ABA
                      +- ABB
and then I ran three CPU-bound tasks, each exactly the same as
int main(void) { for (;;) continue; return 0; }
in each task group, including the root group, for a total of 24 tasks. For
readability, I annotated each task name with the name of the group it
belongs to. For example, loop.ABA belongs to task group ABA.
Look at the CPU0 column below. [before] lacks 8 tasks, while [after]
successfully shows all tasks on the runqueue, which is identical to
the result of [sched debug], which is expected to output the correct result.
I'll send this vmcore later.
[before]
crash> runq | cat
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
<cut>
[after]
crash_fix> runq
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
<cut>
[sched debug]
crash> runq -d
CPU 0
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
<cut>
Diff stat
=========
defs.h | 1 +
task.c | 37 +++++++++++++++++--------------------
2 files changed, 18 insertions(+), 20 deletions(-)
Thanks.
HATAYAMA, Daisuke
1 year, 1 month
[RFC] makedumpfile, crash: LZO compression support
by HATAYAMA Daisuke
Hello,
This is an RFC patch set that adds LZO compression support to
makedumpfile and the crash utility. LZO is about as good as ZLIB in
compressed size but far faster, which reduces downtime during crash
dump generation and refiltering.
How to build:
1. Get LZO library, which is provided as lzo-devel package on recent
linux distributions, and is also available on author's website:
http://www.oberhumer.com/opensource/lzo/.
2. Apply the patch set to makedumpfile v1.4.0 and crash v6.0.0.
3. Build both using make. For crash, for now build it as follows:
$ make CFLAGS="-llzo2"
How to use:
This patch introduces a new -l option for LZO compression. For
example:
$ makedumpfile -l vmcore dumpfile
$ crash vmlinux dumpfile
Request of configure-like feature for crash utility:
I would like a configure-like feature in the crash utility so that users
can select at build time whether to include LZO support,
that is: ./configure --enable-lzo or ./configure --disable-lzo.
The reason is that support staff often download and install the
latest version of the crash utility on machines where the LZO library is
not provided.
Looking at the source code, it looks to me that crash does some kind
of configuration processing in a local manner, around configure.c,
and I guess it's difficult to use autoconf tools directly.
Or is there another better way?
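One possible shape for such a build-time switch, independent of how it is selected, is a preprocessor guard around the LZO calls. The sketch below is an assumption rather than part of the patch set: the macro name `LZO` and the functions `decompress_lzo_page` and `lzo_supported` are hypothetical. The calls inside the guard (`lzo_init`, `lzo1x_decompress_safe`) are from the LZO library's lzo1x.h.

```c
#include <stddef.h>

#ifdef LZO                      /* hypothetical switch, e.g. make CFLAGS="-DLZO" */
#include <lzo/lzo1x.h>
#endif

/* Report whether this binary was built with LZO support. */
int lzo_supported(void)
{
#ifdef LZO
    return 1;
#else
    return 0;
#endif
}

/*
 * Decompress one LZO-compressed block.
 * *dst_len holds the capacity of dst on entry and the output size on success.
 * Returns 0 on success, -1 on failure or when LZO support is compiled out.
 */
int decompress_lzo_page(const unsigned char *src, size_t src_len,
                        unsigned char *dst, size_t *dst_len)
{
#ifdef LZO
    lzo_uint out_len = *dst_len;

    if (lzo_init() != LZO_E_OK)
        return -1;
    if (lzo1x_decompress_safe(src, src_len, dst, &out_len, NULL) != LZO_E_OK)
        return -1;
    *dst_len = out_len;
    return 0;
#else
    /* Built without LZO: report failure instead of failing to link. */
    (void)src; (void)src_len; (void)dst; (void)dst_len;
    return -1;
#endif
}
```

With this layout a binary built on a machine without the LZO library still compiles and links; it simply cannot read LZO-compressed dumpfiles.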
Performance Comparison:
Sample Data
Ideally, I would have measured the performance on a sufficient number
of vmcores generated from machines that were actually running, but I
don't have enough sample vmcores at the moment, so I couldn't. This
comparison therefore doesn't answer the question of I/O time
improvement. This is a TODO for now.
Instead, I chose the worst and best cases with regard to compression
ratio and speed only. Specifically, the former is /dev/urandom and
the latter is /dev/zero.
I generated sample data of 10MB, 100MB and 1GB like this:
$ dd bs=4096 count=$((1024*1024*1024/4096)) if=/dev/urandom of=urandom.1GB
How to measure
Then I performed compression for each block, 4096 bytes, and
measured total compression time and output size. See attached
mycompress.c.
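For readers without the attachment, the measurement loop can be sketched roughly as follows. This is not mycompress.c: `measure_blocks`, `compress_fn` and `store_compress` are hypothetical names, and the stand-in compressor only copies its input so that the sketch builds without zlib or LZO. A real measurement would plug in a wrapper around the library's compression call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BLOCK_SIZE 4096

/* A compressor callback: returns the number of bytes written to out. */
typedef size_t (*compress_fn)(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_cap);

struct measure_result {
    size_t total_out;   /* sum of the compressed sizes of all blocks */
    double seconds;     /* CPU time spent in the loop */
};

/* Compress data block by block and accumulate output size and time. */
struct measure_result measure_blocks(const unsigned char *data, size_t len,
                                     compress_fn fn)
{
    unsigned char out[BLOCK_SIZE * 2];  /* room for incompressible blocks */
    struct measure_result r = { 0, 0.0 };
    clock_t start = clock();

    assert(data != NULL && fn != NULL);

    for (size_t off = 0; off < len; off += BLOCK_SIZE) {
        size_t n = (len - off < BLOCK_SIZE) ? len - off : BLOCK_SIZE;
        r.total_out += fn(data + off, n, out, sizeof out);
    }
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return r;
}

/* Stand-in "compressor" that stores the block unchanged. */
size_t store_compress(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    assert(in_len <= out_cap);
    memcpy(out, in, in_len);
    return in_len;
}
```

Because each block is compressed independently, the total time is simply the sum of the per-block times, which is what makes the linear extrapolation in the discussion reasonable.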
Result
See attached file result.txt.
Discussion
For both kinds of data, lzo's compression was considerably quicker
than zlib's. lzo's compression time relative to zlib's is about 37% for
urandom data and about 8.5% for zero data. The actual contents of
physical memory would fall between the two cases, so I guess the
average compression time ratio is between 8.5% and 37%.
Although beyond the topic of this patch set, we can estimate the
worst-case compression time for larger data sizes, since compression
is performed block by block and the compression time increases
linearly. The estimated worst-case time for 2TB of memory is about
15 hours for lzo and about 40 hours for zlib. In this case the
compressed data is larger than the original and so is not actually
used, which makes the compression time entirely wasted. I think
compression must be done in parallel, and I'll post such a patch later.
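The linear extrapolation can be written out as a small sketch. The function names are hypothetical, and no per-GiB rate is taken from result.txt: any rate used here has to be back-computed from the 15-hour and 40-hour figures quoted above.

```c
#include <assert.h>

/* Per-GiB compression time implied by a total time over a given size. */
double seconds_per_gib(double total_hours, double total_gib)
{
    assert(total_gib > 0.0);
    return total_hours * 3600.0 / total_gib;
}

/* Linear estimate: time grows in proportion to the amount of data. */
double estimate_hours(double secs_per_gib, double total_gib)
{
    return secs_per_gib * total_gib / 3600.0;
}
```

For example, 15 hours over 2TB (2048 GiB) corresponds to `seconds_per_gib(15.0, 2048.0)`, roughly 26 seconds per GiB, and 40 hours to roughly 70 seconds per GiB. The ratio 15/40 = 0.375 is consistent with the roughly 37% figure given for urandom data.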
Diffstat
* makedumpfile
diskdump_mod.h | 3 +-
makedumpfile.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++------
makedumpfile.h | 12 +++++++
3 files changed, 101 insertions(+), 12 deletions(-)
* crash
defs.h | 1 +
diskdump.c | 20 +++++++++++++++++++-
diskdump.h | 3 ++-
3 files changed, 22 insertions(+), 2 deletions(-)
TODO
* evaluation including I/O time using actual vmcores
Thanks.
HATAYAMA, Daisuke
1 year, 1 month
Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
by tachibana@mxm.nes.nec.co.jp
Hi Hatayama-san,
On 2011/06/29 12:12:18 +0900, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
> Date: Tue, 28 Jun 2011 08:57:42 -0400 (EDT)
>
> >
> >
> > ----- Original Message -----
> >> Fujitsu has stand-alone dump mechanism based on firmware level
> >> functionality, which we call SADUMP, in short.
> >>
> >> We've maintained utility tools internally but now we're thinking that
> >> the best is crash utility and makedumpfile supports the sadump format
> >> for the viewpoint of both portability and maintainability.
> >>
> >> We'll be of course responsible for its maintenance in a continuous
> >> manner. The sadump dump format is very similar to diskdump format and
> >> so kdump (compressed) format, so we estimate patch set would be a
> >> relatively small size.
> >>
> >> Could you tell me whether crash utility and makedumpfile can support
> >> the sadump format? If OK, we'll start to make patchset.
I think it's not bad to support sadump by makedumpfile. However I have
several questions.
- Do you want to use makedumpfile to shrink an existing file that sadump
has dumped?
- It isn't possible to support the same form as kdump-compressed format
now, is it?
- When the information that makedumpfile reads from a note of /proc/vmcore
(or a header of kdump-compressed format) is added by an extension of
makedumpfile, do you need to modify sadump?
Thanks
tachibana
> >
> > Sure, yes, the crash utility can always support another dumpfile format.
> >
>
> Thanks. It helps a lot.
>
> > It's unclear to me how similar SADUMP is to diskdump/compressed-kdump.
> > Does your internal version patch diskdump.c, or do you maintain your
> > own "sadump.c"? I ask because if your patchset is at all intrusive,
> > I'd prefer it be kept in its own file, primarily for maintainability,
> > but also because SADUMP is essentially a black-box to anybody outside
> > Fujitsu.
>
> What I meant when I used ``similar'' is both literally and
> logically. The format consists of diskdump header-like header, two
> kinds of bitmaps used for the same purpose as those in diskdump format,
> and memory data. They can be handled in common with the existing data
> structure, diskdump_data, non-intrusively, so I hope they are placed
> in diskdump.c.
>
> On the other hand, there's a code to be placed at such specific
> area. sadump is triggered depending on kdump's progress and so
> register values to be contained in vmcore varies according to the
> progress: If crash_notes has been initialized when sadump is
> triggered, sadump packs the register values in crash_notes; if not
> yet, packs registers gathered by firmware. This is sadump specific
> processing, so I think putting it in specific sadump.c file is a
> natural and reasonable choice.
>
> Anyway, I have not made any patch set for this. I'll post a patch set
> when I complete.
>
> Again, thanks a lot for the positive answer.
>
> Thanks.
> HATAYAMA, Daisuke
>
>
> _______________________________________________
> kexec mailing list
> kexec(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
1 year, 1 month
[PATCH] display MCNT and PRIVATE when using kmem -p
by Qiao Nuohan
Hello Dave,
I was using ‘kmem -p’ to get the status of memory, but on 2.6.x and later
kernels it only shows "PAGE PHYSICAL MAPPING INDEX CNT FLAGS", which I
find insufficient. So I propose also displaying
‘page._mapcount’ and ‘page.private’ when using ‘kmem -p’.
When adding these two items, I found ‘page._count’ is declared to be
atomic_t whose definition is:
typedef struct {
volatile int counter;
} atomic_t;
However, the current crash code uses UINT to get the value of ‘page._count’.
The first patch (0001-kmem_p_6.0.2.patch) changes UINT to INT,
and the second one (0002-kmem_p_6.0.2.patch) adds the items mentioned
above. Both patches are based on crash 6.0.2.
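Why the signedness matters can be seen from a small standalone sketch. The helpers `read_int` and `read_uint` are hypothetical stand-ins for reading a 4-byte counter out of a page buffer, not crash's actual macros. The point is only that a negative counter, such as a `_mapcount` of -1 on an unmapped page, prints as a huge number when read as unsigned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for reading a counter out of a raw buffer. */
static int read_int(const void *buf)
{
    int v;
    memcpy(&v, buf, sizeof v);
    return v;
}

static unsigned int read_uint(const void *buf)
{
    unsigned int v;
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Format the same 4 bytes both ways; returns snprintf's result. */
int format_counter(const void *buf, char *out, size_t outlen)
{
    assert(buf != NULL && out != NULL);
    return snprintf(out, outlen, "signed=%d unsigned=%u",
                    read_int(buf), read_uint(buf));
}
```

Given a buffer holding the int value -1 (assuming a 32-bit int), this yields "signed=-1 unsigned=4294967295".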
BTW, I have tested these two patches on RHEL6.2_x86_64, RHEL6.2_i386,
RHEL5.8_x86_64 and RHEL5.8_i386.
--
Regards
Qiao Nuohan
12 years, 10 months
[PATCH] [PPC32] Fix vmalloc address translation for BookE
by Suzuki K. Poulose
This patch fixes the vmalloc address translation for BookE. This
patch is based on the PPC44x definitions and may not work correctly on
other systems.
crash> mod
mod: cannot access vmalloc'd module memory
crash>
After the patch :
crash> mod
MODULE NAME SIZE OBJECT FILE
d1018fd8 mbcache 6023 (not loaded) [CONFIG_KALLSYMS]
d1077190 jbd 58360 (not loaded) [CONFIG_KALLSYMS]
d107ca98 llc 4525 (not loaded) [CONFIG_KALLSYMS]
d1130de4 ext3 203186 (not loaded) [CONFIG_KALLSYMS]
d114bbac squashfs 26129 (not loaded) [CONFIG_KALLSYMS]
On ppc44x, the virtual-address is split as below :
Bits |0 10|11 19|20 31|
-----------------------------------
| PGD | PMD | PAGE_OFFSET |
-----------------------------------
The PAGE_BASE_ADDR is a 64bit value(of type phys_addr_t).
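The split above can be illustrated with a short sketch. It assumes the diagram uses IBM bit numbering (bit 0 is the most significant bit), so the fields are 11, 9 and 12 bits wide, which matches a PGDIR_SHIFT of 21 and a PTRS_PER_PTE of 512. The `TOY_` constants, `struct va_parts` and `split_va` are hypothetical names for illustration, not code from ppc.c.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT    12   /* low 12 bits: offset within the page */
#define TOY_PGDIR_SHIFT   21   /* top 11 bits: index into the page directory */
#define TOY_PTRS_PER_PTE  512  /* middle 9 bits: index into the PTE page */

struct va_parts {
    uint32_t pgd_index;    /* bits 0..10 in the diagram */
    uint32_t pte_index;    /* bits 11..19 in the diagram */
    uint32_t page_offset;  /* bits 20..31 in the diagram */
};

/* Split a 32-bit kernel virtual address into its three fields. */
struct va_parts split_va(uint32_t va)
{
    struct va_parts p;

    p.pgd_index   = va >> TOY_PGDIR_SHIFT;
    p.pte_index   = (va >> TOY_PAGE_SHIFT) & (TOY_PTRS_PER_PTE - 1);
    p.page_offset = va & ((1u << TOY_PAGE_SHIFT) - 1);
    return p;
}
```

For the module address d1018fd8 from the listing above, this gives a directory index of 0x688, a PTE index of 0x18 and a page offset of 0xfd8.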
Note : I am not sure how we should distinguish the different values (PGDIR_SHIFT etc)
for different PPC32 systems. Since there are a lot of different platforms
under PPC32, we need some mechanism to dynamically determine the PGDIR and PTE
shift values. One option is to put the information in the VMCOREINFO.
Otherwise we would have to hard code these values for each platform and
compile crash for a particular platform.
Thoughts ?
Signed-off-by: Suzuki K. Poulose <suzuki(a)in.ibm.com>
---
defs.h | 4 ++--
ppc.c | 20 ++++++++++++--------
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/defs.h b/defs.h
index 82d51e5..844f369 100755
--- a/defs.h
+++ b/defs.h
@@ -2603,8 +2603,8 @@ struct load_module {
#define VTOP(X) ((unsigned long)(X)-(machdep->kvbase))
#define IS_VMALLOC_ADDR(X) (vt->vmalloc_start && (ulong)(X) >= vt->vmalloc_start)
-#define PGDIR_SHIFT (22)
-#define PTRS_PER_PTE (1024)
+#define PGDIR_SHIFT (21)
+#define PTRS_PER_PTE (512)
#define PTRS_PER_PGD (1024)
#define _PAGE_PRESENT 0x001 /* software: pte contains a translation */
diff --git a/ppc.c b/ppc.c
index 2a10fac..6a1db2a 100755
--- a/ppc.c
+++ b/ppc.c
@@ -381,8 +381,8 @@ ppc_kvtop(struct task_context *tc, ulong kvaddr, physaddr_t *paddr, int verbose)
ulong *page_dir;
ulong *page_middle;
ulong *page_table;
- ulong pgd_pte;
- ulong pte;
+ ulong pgd_pte;
+ unsigned long long pte; /* PTE is 64 bit */
if (!IS_KVADDR(kvaddr))
return FALSE;
@@ -404,9 +404,13 @@ ppc_kvtop(struct task_context *tc, ulong kvaddr, physaddr_t *paddr, int verbose)
fprintf(fp, "PAGE DIRECTORY: %lx\n", (ulong)pgd);
page_dir = pgd + (kvaddr >> PGDIR_SHIFT);
-
- FILL_PGD(PAGEBASE(pgd), KVADDR, PAGESIZE());
- pgd_pte = ULONG(machdep->pgd + PAGEOFFSET(page_dir));
+ /*
+ * The (kvaddr >> PGDIR_SHIFT) may exceed PAGESIZE().
+ * Use PAGEBASE(page_dir) to read the page containing the
+ * translation.
+ */
+ FILL_PGD(PAGEBASE(page_dir), KVADDR, PAGESIZE());
+ pgd_pte = ULONG((unsigned long)machdep->pgd + PAGEOFFSET(page_dir));
if (verbose)
fprintf(fp, " PGD: %lx => %lx\n", (ulong)page_dir, pgd_pte);
@@ -417,7 +421,7 @@ ppc_kvtop(struct task_context *tc, ulong kvaddr, physaddr_t *paddr, int verbose)
page_middle = (ulong *)pgd_pte;
if (machdep->flags & CPU_BOOKE)
- page_table = page_middle + (BTOP(kvaddr) & (PTRS_PER_PTE - 1));
+ page_table = (unsigned long long *)page_middle + (BTOP(kvaddr) & (PTRS_PER_PTE - 1));
else {
page_table = (ulong *)((pgd_pte & (ulong)machdep->pagemask) + machdep->kvbase);
page_table += ((ulong)BTOP(kvaddr) & (PTRS_PER_PTE-1));
@@ -428,10 +432,10 @@ ppc_kvtop(struct task_context *tc, ulong kvaddr, physaddr_t *paddr, int verbose)
(ulong)page_table);
FILL_PTBL(PAGEBASE(page_table), KVADDR, PAGESIZE());
- pte = ULONG(machdep->ptbl + PAGEOFFSET(page_table));
+ pte = ULONGLONG((unsigned long)machdep->ptbl + PAGEOFFSET(page_table));
if (verbose)
- fprintf(fp, " PTE: %lx => %lx\n", (ulong)page_table, pte);
+ fprintf(fp, " PTE: %lx => %llx\n", (ulong)page_table, pte);
if (!(pte & _PAGE_PRESENT)) {
if (pte && verbose) {
12 years, 11 months
crash-6.0.2 won't do module source lines on our kernel
by Bob Montgomery
I've reverted back to crash-5.1.9 and applied my kmem patch to that for
our use here. We're using a 3.1-based kernel, and need the kmem patch
so crash can deal with the change in CONFIG_SLAB, but we're building
with gcc-4.4.5 and don't really need the new gdb in crash-6.0.2, and
crash-6.0.2 is not giving module source line numbers for us with "dis
-l".
This is just a heads up. I don't know why 6.0.2 is failing this, and
since I found the last module source line number problem, it's not my
turn ;-)
Bob Montgomery
12 years, 11 months
introduce a new command to display the disk's information
by Wen Congyang
Hi, Dave
When we investigate disk I/O problems, we want to get the disk's
gendisk address and request queue address easily, and the number of
requests is also important.
The attached patch introduces a new command, diskio, to display such information.
Thanks
Wen Congyang
12 years, 11 months
[PATCH] ARM: corrupted pages tables vs unwind
by Rabin Vincent
I have access to a system whose crashdump extraction mechanism
unfortunately trashes the first few hundred kilobytes of physical
memory, which includes the page tables at swapper_pg_dir.
crash does chug along and remains quite useful without these page tables
since much of the interesting information is in the direct mapped
region, but it disables the use of the unwind tables because it
fails to read the module unwind tables, which are placed at a
non-direct-mapped address.
The patch below allows unwind tables to be used only for core kernel
addresses if the module tables are inaccessible.
Alternatively, we could perhaps not attempt to read the
module unwind tables when --no_modules is specified.
Rabin
diff --git a/unwind_arm.c b/unwind_arm.c
index 6554804..a21c592 100644
--- a/unwind_arm.c
+++ b/unwind_arm.c
@@ -148,8 +148,6 @@ init_unwind_tables(void)
if (!init_module_unwind_tables()) {
error(WARNING,
"UNWIND: failed to initialize module unwind tables\n");
- free_kernel_unwind_table();
- return FALSE;
}
/*
@@ -347,6 +345,7 @@ fail:
}
free(module_unwind_tables);
+ module_unwind_tables = NULL;
return FALSE;
}
@@ -536,7 +535,7 @@ search_table(ulong ip)
*/
if (is_core_kernel_text(ip)) {
return kernel_unwind_table;
- } else {
+ } else if (module_unwind_tables) {
struct unwind_table *tbl;
for (tbl = &module_unwind_tables[0]; tbl->idx; tbl++) {
12 years, 11 months
Problem in command net -s
by Karlsson, Jan
Hi Dave
I found a problem with the net -s command. It concerns line 1451 in net.c
struct_socket = inode - SIZE(socket);
As I understand it we have the type
struct socket_alloc {
struct socket socket;
struct inode vfs_inode;
}
and we have the address of the second field and want the address of the first. The calculation used in net.c, which subtracts the size of the socket struct, requires that the second field be placed directly after the first field with no padding. This is unfortunately not true in cases I have seen. By changing the line 1451 to:
struct_socket = inode - MEMBER_OFFSET("socket_alloc", "vfs_inode");
things work better.
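The difference between the two calculations can be reproduced in a standalone sketch. The `toy_` structures are hypothetical, and the 16-byte alignment is forced with C11 `_Alignas` purely to create padding; it is not a claim about the real layout of struct inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; the alignment is chosen only to force padding. */
struct toy_socket { int state; };
struct toy_inode  { _Alignas(16) long i_ino; };

struct toy_socket_alloc {
    struct toy_socket socket;
    struct toy_inode  vfs_inode;  /* starts at offset 16, not sizeof(socket) */
};

/* Correct: subtract the member's offset within the containing struct. */
uintptr_t socket_addr_by_offset(uintptr_t inode_addr)
{
    assert(inode_addr >= offsetof(struct toy_socket_alloc, vfs_inode));
    return inode_addr - offsetof(struct toy_socket_alloc, vfs_inode);
}

/* Fragile: only right when there is no padding after the first member. */
uintptr_t socket_addr_by_size(uintptr_t inode_addr)
{
    return inode_addr - sizeof(struct toy_socket);
}
```

Here `sizeof(struct toy_socket)` is smaller than the offset of `vfs_inode`, so the size-based calculation lands inside the padding, while the offset-based one returns the start of the containing structure.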
Is this something you would like to change in Crash? I assume you will move the offset calculation to somewhere else so it is only performed once.
Jan
12 years, 11 months
crash failing with CentOS 5 under VMware
by Brian Reichert
I'm trying to explore crash dumps under these conditions:
- VMware workstation 7.1.5 build-491717
- CentOS 5.7 +updates, as of today
- kernel 2.6.18-274.17.1.el5 x86
- crash 6.0.2
I've successfully enabled kdump to generate crash dumps, but the
'crash' utility can neither find the vmlinux image for the live
system, nor match the crash dump with the vmlinux image I direct
it to.
Misc RPMs installed:
[root@172-20-1-25 modules]# uname -r; rpm -qa | grep kernel
2.6.18-274.17.1.el5
kernel-2.6.18-274.17.1.el5
kernel-debug-2.6.18-274.17.1.el5
kernel-debuginfo-2.6.18-274.17.1.el5.centos.plus
kernel-debuginfo-common-2.6.18-274.17.1.el5.centos.plus
kernel-devel-2.6.18-274.17.1.el5
Here are various invocations I've attempted; can anyone suggest what's
causing these failures, and how to work around them?
Please let me know if there are any details I can provide to assist.
Additional notes:
- I've tried both crashkernel=64M@16M and crashkernel=128M@16M as
kernel arguments, with the same results.
- I've attempted a symlink to match the debug modules directly to
the name of the kernel, but that didn't change the results:
2.6.18-274.17.1.el5.centos.plus -> 2.6.18-274.17.1.el5
-------------------------------
Against the live system:
[root@172-20-1-25 modules]# crash
crash 6.0.2
[...]
crash: cannot find booted kernel -- please enter namelist argument
-------------------------------
Against the vmlinux file supplied by the kernel-debuginfo RPM:
[root@172-20-1-25 modules]# crash /usr/lib/debug/lib/modules/2.6.18-274.17.1.el5.centos.plus/vmlinux
crash 6.0.2
[...]
WARNING: /usr/lib/debug/lib/modules/2.6.18-274.17.1.el5.centos.plus/vmlinux
and /proc/version do not match!
WARNING: /proc/version indicates kernel version: 2.6.18-274.17.1.el5
crash: please use the vmlinux file for that kernel version, or try using
the System.map for that kernel version as an additional argument.
-------------------------------
Against my crash dump:
[root@172-20-1-25 modules]# crash /usr/lib/debug/lib/modules/2.6.18-274.17.1.el5.centos.plus/vmlinux /home/crash/127.0.0.1-2012-01-26-19\:19\:19/vmcore
crash 6.0.2
[...]
crash: cannot determine base kernel version
WARNING: cannot read linux_banner string
crash: /usr/lib/debug/lib/modules/2.6.18-274.17.1.el5.centos.plus/vmlinux and /home/crash/127.0.0.1-2012-01-26-19:19:19/vmcore do not match!
--
Brian Reichert <reichert(a)numachi.com>
BSD admin/developer at large
12 years, 11 months