Re: [Crash-utility] Kernel Crash Analysis on Android
by Shankar, AmarX
Hi Dave,
Thanks for the information about the kexec tool.
I am unable to download kexec from the link below.
http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/kexec-too...
It says HTTP 404 Page Not Found.
Could you please guide me on this?
Thanks & Regards,
Amar Shankar
> On Wed, Mar 21, 2012 at 06:00:00PM +0000, Shankar, AmarX wrote:
>
> > I want to do kernel crash Analysis on Android Merrifield Target.
> >
> > Could someone please help me how to do it?
>
> Merrifield is pretty much similar to Medfield, i.e. it has an x86 core. So I
> guess you can follow the instructions on how to set up kdump on x86 (see
> Documentation/kdump/kdump.txt) unless you already have that configured.
>
> crash should support this directly, presuming you have vmlinux/vmcore files to
> feed it. You can configure crash to support x86 on an x86_64 host by running:
>
> % make target=X86
> % make
>
> (or something along those lines).
Right -- just the first make command will suffice, i.e., when running
on an x86_64 host:
$ wget http://people.redhat.com/anderson/crash-6.0.4.tar.gz
$ tar xzf crash-6.0.4.tar.gz
...
$ cd crash-6.0.4
$ make target=X86
...
$ ./crash <path-to>/vmlinux <path-to>/vmcore
Dave
From: Shankar, AmarX
Sent: Wednesday, March 21, 2012 11:30 PM
To: 'crash-utility(a)redhat.com'
Subject: Kernel Crash Analysis on Android
Hi,
I want to do kernel crash Analysis on Android Merrifield Target.
Could someone please help me how to do it?
Thanks & Regards,
Amar Shankar
[PATCH] kmem, snap: iomem/ioport display and vmcore snapshot support
by HATAYAMA Daisuke
A few days ago I had to convert a vmcore in kvmdump format into ELF,
because an extension module we have locally works only with a relatively
old crash utility, around version 4, and such an old crash utility cannot
handle the kvmdump format.
To make the conversion easy, I used the snap command with some modifications
so that it uses the iomem information in the vmcore instead of the host's
/proc/iomem. This patch is a cleaned-up version of that work.
While developing this, I also ended up writing an interface for accessing
resource objects, so together with the snap command's patch I extended
the kmem command with iomem/ioport support. Specifically:
kmem -r displays /proc/iomem
crash> kmem -r
00000000-0000ffff : reserved
00010000-0009dbff : System RAM
0009dc00-0009ffff : reserved
000c0000-000c7fff : Video ROM
...
and kmem -R displays /proc/ioports
crash> kmem -R
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
...
Looking back through older kernel source code, the resource structure
has been unchanged since linux-2.4.0. I borrowed the way of walking the
resource tree in this patch from the latest v3.3-rc series, but I
guess the logic is also applicable to old kernels. I am counting on Dave's
regression test suite to confirm this.
Also, there may be another command more suitable for iomem/ioport.
If necessary, I'll repost the patch.
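For readers unfamiliar with the resource tree, here is a minimal sketch of a pre-order walk over parent/sibling/child links, modeled on the iteration used for /proc/iomem in the kernel's kernel/resource.c. The struct below is a simplified stand-in, and dump_resources() is a hypothetical helper written for this illustration; it is not the code in the patch, which reads these fields out of the vmcore rather than following live pointers.

```c
#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct resource; only the
 * fields needed for the walk are kept. */
struct resource {
    unsigned long start;
    unsigned long end;
    const char *name;
    struct resource *parent, *sibling, *child;
};

/* Pre-order successor: first child, else next sibling, else the
 * next sibling of the nearest ancestor that has one. */
struct resource *next_resource(struct resource *p)
{
    if (p->child)
        return p->child;
    while (!p->sibling && p->parent)
        p = p->parent;
    return p->sibling;
}

/* Hypothetical helper: print everything below root in /proc/iomem
 * style, indenting by nesting depth. Assumes root itself has no
 * sibling. Returns the number of entries printed. */
int dump_resources(struct resource *root, FILE *out)
{
    int count = 0;

    for (struct resource *p = root->child; p; p = next_resource(p)) {
        int depth = 0;

        for (struct resource *q = p->parent; q && q != root; q = q->parent)
            depth++;
        fprintf(out, "%*s%08lx-%08lx : %s\n",
                depth * 2, "", p->start, p->end, p->name);
        count++;
    }
    return count;
}
```

The walk needs no recursion and no explicit stack, which is convenient when each pointer dereference is really a read from a dump file.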
---
HATAYAMA Daisuke (4):
Add vmcore snapshot support
Add kmem -r and -R options
Add dump iomem/ioport functions; a helper for resource objects
Add a helper function for iterating resource objects
defs.h | 9 ++++
extensions/snap.c | 54 ++++++++++++++++++++++-
help.c | 2 +
memory.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++--
4 files changed, 180 insertions(+), 7 deletions(-)
--
Thanks.
HATAYAMA Daisuke
Re: [Crash-utility] question about phys_base
by Dave Anderson
----- Original Message -----
> >
> > OK, so then I don't understand what you mean by "may be the same"?
> >
> > You didn't answer my original question, but if I understand you correctly,
> > it would be impossible for the qemu host to create a PT_LOAD segment that
> > describes an x86_64 guest's __START_KERNEL_map region, because the host
> > doesn't know that what kind of kernel the guest is running.
>
> Yes. Even if the guest is linux, it is still impossible to do it, because
> the guest may be in the second kernel.
>
> qemu-dump walks all of the guest's page tables and collects the mapping
> between virtual and physical addresses. If a page is not used by the guest,
> its virtual address is set to 0. I create the PT_LOAD segments according to
> this mapping. So if the guest is linux, there may be a PT_LOAD segment that
> describes the __START_KERNEL_map region. But the information stored in that
> PT_LOAD may be for the second kernel. If crash uses it, crash will see the
> second kernel, not the first kernel.
Just to be clear -- what do you mean by the "second" kernel? Do you
mean that a guest kernel crashed and did a kdump operation, that the
second (kdump) kernel then failed somehow, and that you are now trying
to do a "virsh dump" on the kdump kernel?
Dave
question about phys_base
by Wen Congyang
Hi, Dave
I am implementing a new dump command in qemu. The vmcore's format is
ELF (like kdump), and I try to provide phys_base in the PT_LOAD
headers. But if the OS uses the first vcpu to do kdump, the value of
phys_base is wrong.
I found the function x86_64_virt_phys_base() in crash's code.
Is it OK to call this function first? If the function succeeds,
we would not calculate phys_base from the PT_LOAD.
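As background for the question, the relationship between phys_base and a PT_LOAD header can be sketched as follows. On x86_64 Linux a kernel-text virtual address maps to physical memory as vaddr - __START_KERNEL_map + phys_base, so a PT_LOAD header that covers the kernel text mapping implies a phys_base value. The function names below are hypothetical and are not taken from crash or qemu; the sketch also assumes the header really describes the kernel you want to analyze, which is exactly what goes wrong when the guest is already in its kdump kernel.

```c
#include <stdint.h>

/* Base of the x86_64 kernel text mapping (__START_KERNEL_map). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/* Hypothetical helper: derive phys_base from the p_vaddr/p_paddr pair
 * of a PT_LOAD header that covers the kernel text mapping. */
uint64_t phys_base_from_load(uint64_t p_vaddr, uint64_t p_paddr)
{
    return p_paddr - (p_vaddr - START_KERNEL_MAP);
}

/* Hypothetical helper: translate a kernel-text virtual address to a
 * physical address once phys_base is known. */
uint64_t kernel_text_to_phys(uint64_t vaddr, uint64_t phys_base)
{
    return vaddr - START_KERNEL_MAP + phys_base;
}
```

If the dump was taken while a second kernel was running, the same arithmetic yields that kernel's phys_base, which is why a check against the memory contents is attractive before trusting the header.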
Thanks
Wen Congyang
[PATCH] runq: search current task's runqueue explicitly
by HATAYAMA Daisuke
Currently, the runq sub-command does not take into account that a CFS
runqueue's current task is removed from the CFS runqueue. Because of
this, the remaining CFS runqueues that follow the current task's are
not displayed. This patch fixes this by making the runq sub-command
search the current task's runqueue explicitly.
Note that a CFS runqueue exists for each task group, and so does a CFS
runqueue's current task, so the above search needs to be done
recursively.
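The recursion described above can be sketched on a toy model. The field names curr and my_q mirror the kernel's cfs_rq and sched_entity, but everything else here is an assumption made for illustration: the queued array stands in for the red-black tree, dump_cfs_rq() is a hypothetical name, and this is not the code in the patch.

```c
#include <stdio.h>
#include <stddef.h>

struct sched_entity;

/* Toy CFS runqueue: 'queued' stands in for the rb-tree of waiting
 * entities; 'curr' is the running entity, which is not in the tree. */
struct cfs_rq {
    struct sched_entity *curr;
    struct sched_entity **queued;
    int nr_queued;
};

/* An entity is either a task (my_q == NULL) or a task group that
 * owns its own runqueue (my_q != NULL). */
struct sched_entity {
    const char *name;
    struct cfs_rq *my_q;
};

/* Print every queued task reachable from rq and return how many were
 * printed. The running task itself is skipped, since runq reports it
 * separately on its CURRENT line. */
int dump_cfs_rq(const struct cfs_rq *rq, FILE *out)
{
    int count = 0;

    /* Without this step, tasks queued below the running group are lost. */
    if (rq->curr && rq->curr->my_q)
        count += dump_cfs_rq(rq->curr->my_q, out);

    for (int i = 0; i < rq->nr_queued; i++) {
        const struct sched_entity *se = rq->queued[i];

        if (se->my_q) {
            count += dump_cfs_rq(se->my_q, out);
        } else {
            fprintf(out, "  %s\n", se->name);
            count++;
        }
    }
    return count;
}
```

Dropping the check on rq->curr reproduces the symptom in the [before] output: every task that is queued under the currently running group disappears from the listing.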
Test
====
On the vmcore I made 7 task groups:
  root group --- A --- AA --- AAA
                 +      +- AAB
                 |
                 +- AB --- ABA
                        +- ABB
and then I ran three CPU-bound tasks, each exactly the same as
int main(void) { for (;;) continue; return 0; }
in each task group, including the root group, for a total of 24 tasks. For
readability, I annotated each task name with the name of the group it
belongs to. For example, loop.ABA belongs to task group ABA.
Look at the CPU 0 column below. [before] lacks 8 tasks, while [after]
successfully shows all tasks on the runqueue, which is identical to
the result of [sched debug], which is expected to output the correct result.
I'll send this vmcore later.
[before]
crash> runq | cat
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
<cut>
[after]
crash_fix> runq
CPU 0 RUNQUEUE: ffff88000a215f80
CURRENT: PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
RT PRIO_ARRAY: ffff88000a216098
[no tasks queued]
CFS RB_ROOT: ffff88000a216010
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
<cut>
[sched debug]
crash> runq -d
CPU 0
[120] PID: 28232 TASK: ffff880079b5d4c0 COMMAND: "loop.A"
[120] PID: 28239 TASK: ffff8800785774c0 COMMAND: "loop.AA"
[120] PID: 28240 TASK: ffff880078617580 COMMAND: "loop.AA"
[120] PID: 28241 TASK: ffff880078616b40 COMMAND: "loop.AA"
[120] PID: 28245 TASK: ffff8800785e8b00 COMMAND: "loop.AB"
[120] PID: 28246 TASK: ffff880078628ac0 COMMAND: "loop.AB"
[120] PID: 28262 TASK: ffff880037cc40c0 COMMAND: "loop.ABA"
[120] PID: 28263 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
[120] PID: 28271 TASK: ffff8800787a8b40 COMMAND: "loop.ABB"
[120] PID: 28272 TASK: ffff880037afd580 COMMAND: "loop.ABB"
<cut>
Diff stat
=========
defs.h | 1 +
task.c | 37 +++++++++++++++++--------------------
2 files changed, 18 insertions(+), 20 deletions(-)
Thanks.
HATAYAMA, Daisuke
[RFC] makedumpfile, crash: LZO compression support
by HATAYAMA Daisuke
Hello,
This is an RFC patch set that adds LZO compression support to
makedumpfile and the crash utility. LZO is about as good as ZLIB in
compressed size but far better in speed, which reduces downtime during
crash dump generation and refiltering.
How to build:
1. Get LZO library, which is provided as lzo-devel package on recent
linux distributions, and is also available on author's website:
http://www.oberhumer.com/opensource/lzo/.
2. Apply the patch set to makedumpfile v1.4.0 and crash v6.0.0.
3. Build both using make. But for crash, do the following now:
$ make CFLAGS="-llzo2"
How to use:
This patch introduces a new -l option for lzo compression. For
example, do as follows:
$ makedumpfile -l vmcore dumpfile
$ crash vmlinux dumpfile
Request for a configure-like feature in the crash utility:
I would like a configure-like feature in the crash utility that lets users
select at build time whether or not to include the LZO feature,
that is: ./configure --enable-lzo or ./configure --disable-lzo.
The reason is that support staff often download and install the
latest version of the crash utility on machines where the lzo library is
not provided.
Looking at the source code, it seems that crash does some kind
of configuration processing of its own, around configure.c,
and I guess it's difficult to use the autoconf tools directly.
Or is there a better way?
Performance Comparison:
Sample Data
Ideally, I would have measured the performance on a large enough
number of vmcores generated from machines that were actually running,
but I don't have enough sample vmcores at the moment, so I couldn't.
This comparison therefore doesn't answer the question of I/O time
improvement. This is a TODO for now.
Instead, I chose the worst and best cases regarding compression
ratio and speed only. Specifically, the former is /dev/urandom and
the latter is /dev/zero.
I generated sample data of 10MB, 100MB and 1GB like this:
$ dd bs=4096 count=$((1024*1024*1024/4096)) if=/dev/urandom of=urandom.1GB
How to measure
Then I performed compression for each block, 4096 bytes, and
measured total compression time and output size. See attached
mycompress.c.
Result
See attached file result.txt.
Discussion
For both kinds of data, lzo's compression was considerably quicker
than zlib's. The ratio of lzo's compression time to zlib's is about
37% for urandom data and about 8.5% for zero data. The actual contents
of physical memory would fall between these two cases, so I guess the
average compression time ratio is between 37% and 8.5%.
Although beyond the topic of this patch set, we can estimate the worst
compression time for larger data sizes, since compression is performed
block by block and the compression time increases
linearly. The estimated worst time for 2TB of memory is about 15 hours for
lzo and about 40 hours for zlib. In this case, the compressed data
is larger than the original and so is not actually used, which makes
the compression time entirely wasted. I think compression must be
done in parallel, and I'll post such a patch later.
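The linear extrapolation used for the 2TB estimate can be written down as a small sketch. It assumes that every 4096-byte block costs the same time to compress, which is the assumption stated above; block_count() and estimate_hours() are hypothetical helper names, and no measured values from result.txt are reproduced here.

```c
#include <stdint.h>

/* Number of fixed-size blocks needed to cover 'bytes', rounded up. */
uint64_t block_count(uint64_t bytes, uint64_t block_size)
{
    return (bytes + block_size - 1) / block_size;
}

/* Scale a measured compression time linearly by block count and
 * return the estimate in hours. */
double estimate_hours(double measured_seconds,
                      uint64_t measured_blocks,
                      uint64_t target_blocks)
{
    double scale = (double)target_blocks / (double)measured_blocks;

    return measured_seconds * scale / 3600.0;
}
```

Working backwards from the figures quoted above, 15 hours for 2TB corresponds to roughly 26 seconds per GB for lzo, and 40 hours to roughly 70 seconds per GB for zlib. These per-GB values are back-calculated from the estimate, not measured.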
Diffstat
* makedumpfile
diskdump_mod.h | 3 +-
makedumpfile.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++------
makedumpfile.h | 12 +++++++
3 files changed, 101 insertions(+), 12 deletions(-)
* crash
defs.h | 1 +
diskdump.c | 20 +++++++++++++++++++-
diskdump.h | 3 ++-
3 files changed, 22 insertions(+), 2 deletions(-)
TODO
* evaluation including I/O time using actual vmcores
Thanks.
HATAYAMA, Daisuke
Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
by tachibana@mxm.nes.nec.co.jp
Hi Hatayama-san,
On 2011/06/29 12:12:18 +0900, HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
> From: Dave Anderson <anderson(a)redhat.com>
> Subject: Re: [Crash-utility] [RFI] Support Fujitsu's sadump dump format
> Date: Tue, 28 Jun 2011 08:57:42 -0400 (EDT)
>
> >
> >
> > ----- Original Message -----
> >> Fujitsu has stand-alone dump mechanism based on firmware level
> >> functionality, which we call SADUMP, in short.
> >>
> >> We've maintained utility tools internally but now we think it would be
> >> best for the crash utility and makedumpfile to support the sadump format,
> >> from the viewpoint of both portability and maintainability.
> >>
> >> We will of course be responsible for maintaining it on an ongoing
> >> basis. The sadump dump format is very similar to the diskdump format and
> >> hence to the kdump (compressed) format, so we estimate the patch set would
> >> be relatively small.
> >>
> >> Could you tell me whether crash utility and makedumpfile can support
> >> the sadump format? If OK, we'll start to make patchset.
I think it's not a bad idea to support sadump in makedumpfile. However, I have
several questions.
- Do you want to use makedumpfile to shrink an existing file that sadump has
dumped?
- It isn't possible for sadump to use the same format as the kdump-compressed
format now, is it?
- When the information that makedumpfile reads from a note of /proc/vmcore
(or a header of the kdump-compressed format) is extended as makedumpfile
evolves, would you need to modify sadump?
Thanks
tachibana
> >
> > Sure, yes, the crash utility can always support another dumpfile format.
> >
>
> Thanks. It helps a lot.
>
> > It's unclear to me how similar SADUMP is to diskdump/compressed-kdump.
> > Does your internal version patch diskdump.c, or do you maintain your
> > own "sadump.c"? I ask because if your patchset is at all intrusive,
> > I'd prefer it be kept in its own file, primarily for maintainability,
> > but also because SADUMP is essentially a black-box to anybody outside
> > Fujitsu.
>
> What I meant when I used ``similar'' is both literally and
> logically. The format consists of a header like the diskdump header, two
> kinds of bitmaps used for the same purpose as those in the diskdump format,
> and memory data. They can be handled in common with the existing data
> structure, diskdump_data, non-intrusively, so I hope they can be placed
> in diskdump.c.
>
> On the other hand, there is code that should be placed in its own
> sadump-specific area. sadump is triggered depending on kdump's progress, so
> the register values contained in the vmcore vary according to that
> progress: if crash_notes has been initialized when sadump is
> triggered, sadump packs the register values from crash_notes; if not,
> it packs the registers gathered by firmware. This is sadump-specific
> processing, so I think putting it in a separate sadump.c file is a
> natural and reasonable choice.
>
> Anyway, I have not made any patch set for this. I'll post a patch set
> when I complete.
>
> Again, thanks a lot for the positive answer.
>
> Thanks.
> HATAYAMA, Daisuke
>
>
> _______________________________________________
> kexec mailing list
> kexec(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
crash: vz extension for OpenVZ kernels
by Vasily Averin
Dear Dave,
I've prepared a vz crash extension with commands
useful for troubleshooting OpenVZ kernel crashes.
Could you please review the patch and advise on the best way
to distribute this extension?
The vz extension uses OpenVZ-specific kernel structures
and works on OpenVZ kernels only.
It implements the vzlist, vzps and ctid commands:
vzlist -- shows the list of OpenVZ containers, with the
Container ID and references to the ve_struct
and the parent task inside each container.
crash> vzlist
CTID VE_STRUCT TASK PID COMM
121 ffff8801491e7000 ffff8801493d0ff0 95990 init
123 ffff880135a37000 ffff8803fb0a3470 95924 init
321 ffff88045a778000 ffff880400616300 95923 init
700 ffff88019ddae000 ffff88019ddd4fb0 95882 init
503 ffff88045a84e800 ffff8803c3c782c0 95902 init
122 ffff8804004ea000 ffff88045612afb0 95886 init
600 ffff88016e467000 ffff880459d653f0 95885 init
0 ffffffff81aaa220 ffff88045e530b30 1 init
vzps -- shows the list of processes inside the specified container
crash> vzps -E 121
CTID PID TASK COMM
121 95990 ffff8801493d0ff0 init
121 95996 ffff8803c3e3b0f0 kthreadd/121
121 95997 ffff8803cd3aacb0 khelper/121
121 97267 ffff880405e4b2f0 udevd
121 99341 ffff8803c3fd2440 syslogd
121 99404 ffff880405e0c2c0 klogd
121 99424 ffff8803fb0f68c0 sshd
121 99445 ffff8801493d0500 xinetd
121 99557 ffff8804599f9230 sendmail
121 99568 ffff8804591b00c0 sendmail
121 99583 ffff880203e56f30 httpd
121 99594 ffff88016e4e01c0 crond
121 99614 ffff8803fb26cf70 xfs
121 99624 ffff88045a6ce2c0 saslauthd
121 99625 ffff8801ce134ff0 saslauthd
121 248691 ffff88040e2ee9c0 httpd
ctid -- shows the Container ID for the given task
crash> ctid 99614
CTID PID TASK COMM
121 99614 ffff8803fb26cf70 xfs
Thank you,
Vasily Averin
[PATCH v4] files: support dump file memory mapping
by yangoliver
Hi Dave,
This is the v4 patch for files memory mapping dump support.
The major changes in this version are:
1. Your alignment patch for NRPAGES
2. Changed files -a to files -p
Changed the output to display INODE, ADDRESS_SPACE, NRPAGES
at the beginning.
3. Updated help.c and added example outputs for the new options.
4. Some minor code cleanup, for function names defined in defs.h
Here is my patch:
Added two options to the files command:
1. -m option, which dumps the file mapping and
page count for each file
2. -p option, which dumps each page within
the mapping for a given inode address
The foreach command also works with -m, so
that we can easily find which processes/files hold
the biggest page cache within the system.
Signed-off-by: Yong Yang <yangoliver(a)gmail.com>
---
defs.h | 6 +++
filesys.c | 166 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
help.c | 39 ++++++++++++++-
memory.c | 61 +++++++++++++++++++++++
symbols.c | 2 +
task.c | 19 +++++--
6 files changed, 271 insertions(+), 22 deletions(-)
diff --git a/defs.h b/defs.h
index b25b505..ba4e0d8 100644
--- a/defs.h
+++ b/defs.h
@@ -1940,6 +1940,7 @@ struct offset_table { /* stash of commonly-used offsets */
long task_struct_thread_reg31;
long pt_regs_regs;
long pt_regs_cp0_badvaddr;
+ long address_space_page_tree;
};
struct size_table { /* stash of commonly-used sizes */
@@ -2598,6 +2599,7 @@ struct load_module {
#define PRINT_SINGLE_VMA (0x80)
#define PRINT_RADIX_10 (0x100)
#define PRINT_RADIX_16 (0x200)
+#define PRINT_PAGES (0x400)
#define MIN_PAGE_SIZE (4096)
@@ -4707,6 +4709,8 @@ void alter_stackbuf(struct bt_info *);
int vaddr_type(ulong, struct task_context *);
char *format_stack_entry(struct bt_info *bt, char *, ulong, ulong);
int in_user_stack(ulong, ulong);
+void dump_file_addr_mapping(ulong);
+long get_file_mapping_nrpages(ulong);
/*
* filesys.c
@@ -4743,6 +4747,7 @@ int is_readable(char *);
#define RADIX_TREE_SEARCH (2)
#define RADIX_TREE_DUMP (3)
#define RADIX_TREE_GATHER (4)
+#define RADIX_TREE_DUMP_CB (5)
struct radix_tree_pair {
ulong index;
void *value;
@@ -4753,6 +4758,7 @@ int file_dump(ulong, ulong, ulong, int, int);
#define DUMP_INODE_ONLY 2
#define DUMP_DENTRY_ONLY 4
#define DUMP_EMPTY_FILE 8
+#define DUMP_FILE_PAGE 16
#endif /* !GDB_COMMON */
int same_file(char *, char *);
#ifndef GDB_COMMON
diff --git a/filesys.c b/filesys.c
index 0573fe6..a54576f 100644
--- a/filesys.c
+++ b/filesys.c
@@ -49,7 +49,7 @@ static void *radix_tree_lookup(ulong, ulong, int);
static int match_file_string(char *, char *, char *);
static ulong get_root_vfsmount(char *);
static void check_live_arch_mismatch(void);
-
+static void dump_file_addr_space(ulong);
#define DENTRY_CACHE (20)
#define INODE_CACHE (20)
@@ -2167,6 +2167,50 @@ show_hit_rates:
}
}
+static void
+dump_file_addr_space(ulong inode)
+{
+ char *inode_buf;
+ ulong i_mapping;
+ ulong nrpages;
+ char header[BUFSIZE];
+ char buf1[BUFSIZE];
+ char buf2[BUFSIZE];
+ char buf3[BUFSIZE];
+
+ inode_buf = GETBUF(SIZE(inode));
+ readmem(inode, KVADDR, inode_buf, SIZE(inode), "inode buffer",
+ FAULT_ON_ERROR);
+
+ i_mapping = ULONG(inode_buf + OFFSET(inode_i_mapping));
+ nrpages = get_file_mapping_nrpages(i_mapping);
+
+ sprintf(header, "%s%s%s%sNRPAGES\n",
+ mkstring(buf1, VADDR_PRLEN, CENTER|LJUST, "INODE"),
+ space(MINSPACE),
+ mkstring(buf2, VADDR_PRLEN, CENTER|LJUST, "MAPPING"),
+ space(MINSPACE));
+ fprintf(fp, "%s", header);
+
+ fprintf(fp, "%s%s%s%s%s\n\n",
+ mkstring(buf1, VADDR_PRLEN,
+ CENTER|RJUST|LONG_HEX,
+ MKSTR(inode)),
+ space(MINSPACE),
+ mkstring(buf2, VADDR_PRLEN,
+ CENTER|RJUST|LONG_HEX,
+ MKSTR(i_mapping)),
+ space(MINSPACE),
+ mkstring(buf3, strlen("NRPAGES"),
+ RJUST|LONG_DEC,
+ MKSTR(nrpages)));
+
+ dump_file_addr_mapping(i_mapping);
+
+ FREEBUF(inode_buf);
+ return;
+}
+
/*
* This command displays information about the open files of a context.
* For each open file descriptor the file descriptor number, a pointer
@@ -2187,11 +2231,12 @@ cmd_files(void)
int subsequent;
struct reference reference, *ref;
char *refarg;
+ int open_flags = 0;
ref = NULL;
refarg = NULL;
- while ((c = getopt(argcnt, args, "d:R:")) != EOF) {
+ while ((c = getopt(argcnt, args, "d:R:p:m")) != EOF) {
switch(c)
{
case 'R':
@@ -2210,6 +2255,24 @@ cmd_files(void)
display_dentry_info(value);
return;
+ case 'p':
+ if (VALID_MEMBER(address_space_page_tree) &&
+ VALID_MEMBER(inode_i_mapping)) {
+ value = htol(optarg, FAULT_ON_ERROR, NULL);
+ dump_file_addr_space(value);
+ } else {
+ option_not_supported('p');
+ }
+ return;
+
+ case 'm':
+ if (VALID_MEMBER(address_space_page_tree) &&
+ VALID_MEMBER(inode_i_mapping))
+ open_flags |= PRINT_PAGES;
+ else
+ option_not_supported('m');
+ break;
+
default:
argerrs++;
break;
@@ -2222,7 +2285,9 @@ cmd_files(void)
if (!args[optind]) {
if (!ref)
print_task_header(fp, CURRENT_CONTEXT(), 0);
- open_files_dump(CURRENT_TASK(), 0, ref);
+
+ open_files_dump(CURRENT_TASK(), open_flags, ref);
+
return;
}
@@ -2241,7 +2306,7 @@ cmd_files(void)
for (tc = pid_to_context(value); tc; tc = tc->tc_next) {
if (!ref)
print_task_header(fp, tc, subsequent);
- open_files_dump(tc->task, 0, ref);
+ open_files_dump(tc->task, open_flags, ref);
fprintf(fp, "\n");
}
break;
@@ -2249,7 +2314,7 @@ cmd_files(void)
case STR_TASK:
if (!ref)
print_task_header(fp, tc, subsequent);
- open_files_dump(tc->task, 0, ref);
+ open_files_dump(tc->task, open_flags, ref);
break;
case STR_INVALID:
@@ -2321,6 +2386,7 @@ open_files_dump(ulong task, int flags, struct reference *ref)
char buf4[BUFSIZE];
char root_pwd[BUFSIZE];
int root_pwd_printed = 0;
+ int file_dump_flags = 0;
BZERO(root_pathname, BUFSIZE);
BZERO(pwd_pathname, BUFSIZE);
@@ -2329,15 +2395,26 @@ open_files_dump(ulong task, int flags, struct reference *ref)
fdtable_buf = GETBUF(SIZE(fdtable));
fill_task_struct(task);
- sprintf(files_header, " FD%s%s%s%s%s%s%sTYPE%sPATH\n",
- space(MINSPACE),
- mkstring(buf1, VADDR_PRLEN, CENTER|LJUST, "FILE"),
- space(MINSPACE),
- mkstring(buf2, VADDR_PRLEN, CENTER|LJUST, "DENTRY"),
- space(MINSPACE),
- mkstring(buf3, VADDR_PRLEN, CENTER|LJUST, "INODE"),
- space(MINSPACE),
- space(MINSPACE));
+ if (flags & PRINT_PAGES) {
+ sprintf(files_header, " FD%s%s%s%s%sNRPAGES%sTYPE%sPATH\n",
+ space(MINSPACE),
+ mkstring(buf1, VADDR_PRLEN, CENTER|LJUST, "INODE"),
+ space(MINSPACE),
+ mkstring(buf2, VADDR_PRLEN, CENTER|LJUST, "MAPPING"),
+ space(MINSPACE),
+ space(MINSPACE),
+ space(MINSPACE));
+ } else {
+ sprintf(files_header, " FD%s%s%s%s%s%s%sTYPE%sPATH\n",
+ space(MINSPACE),
+ mkstring(buf1, VADDR_PRLEN, CENTER|LJUST, "FILE"),
+ space(MINSPACE),
+ mkstring(buf2, VADDR_PRLEN, CENTER|LJUST, "DENTRY"),
+ space(MINSPACE),
+ mkstring(buf3, VADDR_PRLEN, CENTER|LJUST, "INODE"),
+ space(MINSPACE),
+ space(MINSPACE));
+ }
tc = task_to_context(task);
@@ -2523,6 +2600,10 @@ open_files_dump(ulong task, int flags, struct reference *ref)
return;
}
+ file_dump_flags = DUMP_FULL_NAME | DUMP_EMPTY_FILE;
+ if (flags & PRINT_PAGES)
+ file_dump_flags |= DUMP_FILE_PAGE;
+
j = 0;
for (;;) {
unsigned long set;
@@ -2539,8 +2620,7 @@ open_files_dump(ulong task, int flags, struct reference *ref)
if (ref && file) {
open_tmpfile();
- if (file_dump(file, 0, 0, i,
- DUMP_FULL_NAME|DUMP_EMPTY_FILE)) {
+ if (file_dump(file, 0, 0, i, file_dump_flags)) {
BZERO(buf4, BUFSIZE);
rewind(pc->tmpfile);
ret = fgets(buf4, BUFSIZE,
@@ -2558,8 +2638,7 @@ open_files_dump(ulong task, int flags, struct reference *ref)
fprintf(fp, "%s", files_header);
header_printed = 1;
}
- file_dump(file, 0, 0, i,
- DUMP_FULL_NAME|DUMP_EMPTY_FILE);
+ file_dump(file, 0, 0, i, file_dump_flags);
}
}
i++;
@@ -2754,6 +2833,8 @@ file_dump(ulong file, ulong dentry, ulong inode, int fd, int flags)
char buf1[BUFSIZE];
char buf2[BUFSIZE];
char buf3[BUFSIZE];
+ ulong i_mapping = 0;
+ ulong nrpages = 0;
file_buf = NULL;
@@ -2863,6 +2944,28 @@ file_dump(ulong file, ulong dentry, ulong inode, int fd, int flags)
type,
space(MINSPACE),
pathname+1);
+ } else if (flags & DUMP_FILE_PAGE) {
+ i_mapping = ULONG(inode_buf + OFFSET(inode_i_mapping));
+ nrpages = get_file_mapping_nrpages(i_mapping);
+
+ fprintf(fp, "%3d%s%s%s%s%s%s%s%s%s%s\n",
+ fd,
+ space(MINSPACE),
+ mkstring(buf1, VADDR_PRLEN,
+ CENTER|RJUST|LONG_HEX,
+ MKSTR(inode)),
+ space(MINSPACE),
+ mkstring(buf2, VADDR_PRLEN,
+ CENTER|RJUST|LONG_HEX,
+ MKSTR(i_mapping)),
+ space(MINSPACE),
+ mkstring(buf3, strlen("NRPAGES"),
+ RJUST|LONG_DEC,
+ MKSTR(nrpages)),
+ space(MINSPACE),
+ type,
+ space(MINSPACE),
+ pathname);
} else {
fprintf(fp, "%3d%s%s%s%s%s%s%s%s%s%s\n",
fd,
@@ -3870,6 +3973,9 @@ ulong RADIX_TREE_MAP_MASK = UNINITIALIZED;
* limit the number of returned entries by putting the array size
* (max count) in the rtp->index field of the first structure
* in the passed-in array.
+ * RADIX_TREE_DUMP_CB - Similar with RADIX_TREE_DUMP, but for each
+ * radix tree entry, a user defined callback at rtp->value will
+ * be invoked.
*
* rtp: Unused by RADIX_TREE_COUNT and RADIX_TREE_DUMP.
* A pointer to a radix_tree_pair structure for RADIX_TREE_SEARCH.
@@ -3877,6 +3983,8 @@ ulong RADIX_TREE_MAP_MASK = UNINITIALIZED;
* RADIX_TREE_GATHER; the dimension (max count) of the array may
* be stored in the index field of the first structure to avoid
* any chance of an overrun.
+ * For RADIX_TREE_DUMP_CB, the rtp->value need to be initialized as
+ * callback function. The callback prototype must be int (*)(ulong);
*/
ulong
do_radix_tree(ulong root, int flag, struct radix_tree_pair *rtp)
@@ -3889,6 +3997,7 @@ do_radix_tree(ulong root, int flag, struct radix_tree_pair *rtp)
struct radix_tree_pair *r;
ulong root_rnode;
void *ret;
+ int (*cb)(ulong) = NULL;
count = 0;
@@ -3993,6 +4102,27 @@ do_radix_tree(ulong root, int flag, struct radix_tree_pair *rtp)
}
break;
+ case RADIX_TREE_DUMP_CB:
+ if (rtp->value == NULL) {
+ error(FATAL, "do_radix_tree: need set callback function");
+ return -EINVAL;
+ }
+ cb = (int (*)(ulong))rtp->value;
+ for (index = count = 0; index <= maxindex; index++) {
+ if ((ret =
+ radix_tree_lookup(root_rnode, index, height))) {
+ /* Caller defined operation */
+ if (cb((ulong)ret) != 0) {
+ error(FATAL, "do_radix_tree: dump "
+ "operation failed, count: %ld\n",
+ count);
+ return -EIO;
+ }
+ count++;
+ }
+ }
+ break;
+
default:
error(FATAL, "do_radix_tree: invalid flag: %lx\n", flag);
}
diff --git a/help.c b/help.c
index f36316f..25df6e5 100644
--- a/help.c
+++ b/help.c
@@ -6488,7 +6488,7 @@ NULL
char *help_files[] = {
"files",
"open files",
-"[-d dentry] | [-R reference] [pid | taskp] ... ",
+"[-d dentry] | [-p inode] | [-m] [-R reference] [pid | taskp] ... ",
" This command displays information about open files of a context.",
" It prints the context's current root directory and current working",
" directory, and then for each open file descriptor it prints a pointer",
@@ -6501,6 +6501,10 @@ char *help_files[] = {
" specific, and only shows the data requested.\n",
" -d dentry given a hexadecimal dentry address, display its inode,",
" super block, file type, and full pathname.",
+" -p inode given a hexadecimal inode address, dump all memory pages in",
+" its address space.",
+" -m show inode memory mapping information, including mapping",
+" address, page counts within the mapping.",
" -R reference search for references to this file descriptor number,",
" filename, or dentry, inode, or file structure address.",
" pid a process PID.",
@@ -6578,6 +6582,39 @@ char *help_files[] = {
" %s> files -d f745fd60",
" DENTRY INODE SUPERBLK TYPE PATH",
" f745fd60 f7284640 f73a3e00 REG /var/spool/lpd/lpd.lock",
+" ",
+" Show all tasks file mappings for REG file type:\n",
+" %s> foreach files -m -R REG",
+" PID: 1 TASK: f5c94000 CPU: 0 COMMAND: \"systemd\"",
+" ROOT: / CWD: /",
+" FD INODE MAPPING NRPAGES TYPE PATH",
+" 29 f5b7f338 f5b7f404 0 REG /proc/1/mountinfo",
+" 32 f5b728f0 f5b729bc 0 REG /proc/swaps",
+" ",
+" PID: 241 TASK: f5fcb020 CPU: 0 COMMAND: \"systemd-journal\"",
+" ROOT: / CWD: /",
+" FD INODE MAPPING NRPAGES TYPE PATH",
+" 16 f560a820 f560a8ec 1359 REG /var/log/journal/1f05.../system.journal",
+" 32 f3e42fb8 f3e43084 3 REG /var/log/journal/1f05.../user-42.journal",
+" 38 f577efb8 f577f084 438 REG /var/log/journal/1f05.../user-1000.journal",
+" <...snipped...>",
+" ",
+" PID: 280 TASK: f5d17020 CPU: 0 COMMAND: \"systemd-udevd\"",
+" ROOT: / CWD: /",
+" FD INODE MAPPING NRPAGES TYPE PATH",
+" 6 ea5adc0c ea5adcd8 1 REG /run/udev/queue.bin",
+" 11 f554efb8 f554f084 0 REG /etc/udev/hwdb.bin",
+" ",
+" Display file mapping and pages information about the inode at address f3e42fb8:\n",
+" %s> files -p f3e42fb8",
+" INODE MAPPING NRPAGES",
+" f3e42fb8 f3e43084 3",
+" ",
+" PAGE PHYSICAL MAPPING INDEX CNT FLAGS",
+" f71d4e60 1ebf3000 f3e43084 0 3 4002002c referenced,uptodate,lru,mappedtodisk",
+" f6eabf80 577c000 f3e43084 394 2 4002006c referenced,uptodate,lru,active,mappedtodisk",
+" f6e6fd60 396b000 f3e43084 396 2 4002006c referenced,uptodate,lru,active,mappedtodisk",
+" ",
NULL
};
diff --git a/memory.c b/memory.c
index 765732b..973d4eb 100644
--- a/memory.c
+++ b/memory.c
@@ -292,6 +292,7 @@ static void dump_per_cpu_offsets(void);
static void dump_page_flags(ulonglong);
static ulong kmem_cache_nodelists(ulong);
static void dump_hstates(void);
+static int dump_file_page(ulong);
/*
* Memory display modes specific to this file.
@@ -476,6 +477,7 @@ vm_init(void)
MEMBER_OFFSET_INIT(block_device_bd_list, "block_device", "bd_list");
MEMBER_OFFSET_INIT(block_device_bd_disk, "block_device", "bd_disk");
MEMBER_OFFSET_INIT(inode_i_mapping, "inode", "i_mapping");
+ MEMBER_OFFSET_INIT(address_space_page_tree, "address_space", "page_tree");
MEMBER_OFFSET_INIT(address_space_nrpages, "address_space", "nrpages");
if (INVALID_MEMBER(address_space_nrpages))
MEMBER_OFFSET_INIT(address_space_nrpages, "address_space", "__nrpages");
@@ -6465,6 +6467,65 @@ translate_page_flags(char *buffer, ulong flags)
}
/*
+ * Radix page tree dump callback.
+ */
+static int
+dump_file_page(ulong page)
+{
+ struct meminfo meminfo;
+
+ BZERO(&meminfo, sizeof(struct meminfo));
+ meminfo.spec_addr = page;
+ meminfo.memtype = KVADDR;
+ meminfo.flags = ADDRESS_SPECIFIED;
+ dump_mem_map(&meminfo);
+
+ return 0;
+}
+
+/*
+ * The address space file mapping radix tree walker.
+ */
+void
+dump_file_addr_mapping(ulong i_mapping)
+{
+ ulong root_rnode;
+ struct radix_tree_pair rtp;
+
+ root_rnode = i_mapping + OFFSET(address_space_page_tree);
+
+ rtp.index = 0;
+ rtp.value = (void *)&dump_file_page;
+
+ /* Dump each pages in radix tree */
+ (void) do_radix_tree(root_rnode, RADIX_TREE_DUMP_CB, &rtp);
+
+ return;
+}
+
+/*
+ * Get the page count for the specific mapping
+ */
+long
+get_file_mapping_nrpages(ulong i_mapping)
+{
+ ulong address_space = i_mapping;
+ char *address_space_buf;
+ ulong nrpages = 0;
+
+ address_space_buf = GETBUF(SIZE(address_space));
+
+ readmem(address_space, KVADDR, address_space_buf,
+ SIZE(address_space), "address_space buffer",
+ FAULT_ON_ERROR);
+ nrpages = ULONG(address_space_buf + OFFSET(address_space_nrpages));
+
+ FREEBUF(address_space_buf);
+
+ return nrpages;
+}
+
+/*
* dump_page_hash_table() displays the entries in each page_hash_table.
*/
diff --git a/symbols.c b/symbols.c
index 6acfcae..984cb55 100644
--- a/symbols.c
+++ b/symbols.c
@@ -8634,6 +8634,8 @@ dump_offset_table(char *spec, ulong makestruct)
OFFSET(block_device_bd_disk));
fprintf(fp, " address_space_nrpages: %ld\n",
OFFSET(address_space_nrpages));
+ fprintf(fp, " address_space_page_tree: %ld\n",
+ OFFSET(address_space_page_tree));
fprintf(fp, " gendisk_major: %ld\n",
OFFSET(gendisk_major));
fprintf(fp, " gendisk_fops: %ld\n",
diff --git a/task.c b/task.c
index 3a88d68..5fe650b 100644
--- a/task.c
+++ b/task.c
@@ -6234,6 +6234,13 @@ foreach(struct foreach_data *fd)
print_header = FALSE;
break;
+ case FOREACH_FILES:
+ if (fd->flags & FOREACH_p_FLAG)
+ error(FATAL,
+ "foreach files command does not "
+ "support -p option\n");
+ break;
+
case FOREACH_TEST:
break;
}
@@ -6460,9 +6467,15 @@ foreach(struct foreach_data *fd)
case FOREACH_FILES:
pc->curcmd = "files";
- open_files_dump(tc->task,
- fd->flags & FOREACH_i_FLAG ?
- PRINT_INODES : 0,
+ cmdflags = 0;
+
+ if (fd->flags & FOREACH_i_FLAG)
+ cmdflags |= PRINT_INODES;
+ if (fd->flags & FOREACH_m_FLAG)
+ cmdflags |= PRINT_PAGES;
+
+ open_files_dump(tc->task,
+ cmdflags,
fd->reference ? ref : NULL);
break;
--
2.4.0
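The RADIX_TREE_DUMP_CB mode in the patch above passes a callback to the tree walker through the void *value field of struct radix_tree_pair. The pattern can be sketched in isolation as below. This is an illustration only: walk_slots_cb() is a hypothetical stand-in that iterates a flat array instead of a real radix tree, and sum_page() is a made-up callback. Converting between a function pointer and void * is not guaranteed by ISO C, although it works on the POSIX systems that crash targets.

```c
#include <stddef.h>

/* Same shape as crash's struct radix_tree_pair. */
struct radix_tree_pair {
    unsigned long index;
    void *value;
};

/* Hypothetical stand-in for the RADIX_TREE_DUMP_CB walk: calls the
 * callback stored in rtp->value for every non-empty slot. Returns the
 * number of entries visited, or -1 if the callback is missing or
 * reports a failure. */
long walk_slots_cb(const unsigned long *slots, size_t nslots,
                   const struct radix_tree_pair *rtp)
{
    int (*cb)(unsigned long);
    long count = 0;

    if (rtp == NULL || rtp->value == NULL)
        return -1;
    cb = (int (*)(unsigned long))rtp->value;

    for (size_t i = 0; i < nslots; i++) {
        if (slots[i] == 0)
            continue;           /* empty slot, nothing to report */
        if (cb(slots[i]) != 0)
            return -1;          /* caller-defined operation failed */
        count++;
    }
    return count;
}

/* Made-up callback: accumulates the addresses it is shown. */
unsigned long page_sum;

int sum_page(unsigned long page)
{
    page_sum += page;
    return 0;
}
```

A caller sets rtp.value to (void *)sum_page before the walk, in the same way that dump_file_addr_mapping() stores dump_file_page in the patch.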