What are the differences among CFI versions 1, 3 and 4?
by HATAYAMA Daisuke
The crash utility has a DWARF unwinder that is derived from the Linux
kernel, just like the one in systemtap. I'm using it in the gcore
extension module for the crash utility, an ELF core dumper for
user-mode tasks, on x86_64 to restore callee-saved register values
saved in the first stack frame on the kernel stack.
The problem is that the DWARF unwinder doesn't work with RHEL6's
debuginfo, since it supports only version 1 while RHEL6's debuginfo
contains CIEs with version 3.
I want to make the crash utility support at least version 3. BTW, I've
already confirmed that systemtap supports all versions, i.e. 1, 3 and 4,
and appears to work fine. So I think the simplest way to achieve this
is to borrow that code.
However, I don't know what was added from version 1 to 3, or from
version 3 to 4. Could anyone explain the differences, or point me to
documentation, other than the original DWARF specifications, that is
useful for understanding them?
Thanks.
HATAYAMA Daisuke
1 year
[PATCH 0/6] gcore: a bug fix and improvements on register restoration for active tasks
by HATAYAMA Daisuke
Hello Dave,
This patch series for the gcore extension module fixes a bug and
improves register restoration for active tasks. Could you review it?
I'm going to release it after your review.
In summary,
- The first patch fixes a bug where the target task context is wrongly
specified when getting NT_PRSTATUS from an ELF vmcore. Without this
patch, the register values of all tasks except the one the user
manually specifies are unintentionally identical to each other.
- The remaining patches implement register restoration for active
tasks from the exception frame and from KVMDUMP notes (a rough sketch
of the exception-frame reading step follows below). With this,
register values for active tasks can be correctly restored for the
KDUMP, NETDUMP, DISKDUMP and KVMDUMP formats.
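The idea for the exception-frame case is, roughly, that the registers an
active task pushed when it entered the kernel have the layout of struct
pt_regs on its kernel stack, so it is enough to locate that frame and copy
it out. A hypothetical sketch of the reading step only: the frame address
is assumed to come from the existing backtrace logic, readmem() and its
flags are the crash utility's interfaces, and the remaining names are
illustrative, not the actual patch code:

#include "defs.h"       /* crash utility definitions: ulong, readmem(), ... */

/* Mirrors the usual x86_64 struct pt_regs layout of an exception frame. */
struct x86_64_exception_frame {
        ulong r15, r14, r13, r12, bp, bx;
        ulong r11, r10, r9, r8, ax, cx, dx, si, di;
        ulong orig_ax, ip, cs, flags, sp, ss;
};

/*
 * Hypothetical sketch: copy the exception frame found on an active task's
 * kernel stack.  frame_addr is assumed to have been located already, for
 * example by the same logic the backtrace code uses.
 */
static int
read_exception_frame(ulong frame_addr, struct x86_64_exception_frame *frame)
{
        return readmem(frame_addr, KVADDR, frame, sizeof(*frame),
                       "exception frame", RETURN_ON_ERROR);
}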
Thanks.
HATAYAMA Daisuke
1 year
Re: [Crash-utility] [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
by HATAYAMA Daisuke
Hello Dave,
I've downloaded the vmcore and vmlinux. Thanks a lot.
From: Dave Anderson <anderson(a)redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Mon, 31 Jan 2011 08:51:04 -0500 (EST)
>
> Hello Daisuke,
>
> The test dump can be found here:
>
> http://people.redhat.com/anderson/.gcore_test_dump
>
> One important thing to note -- the dumpfile was taken with
> the "snap.so" extension module while running live. It
> selects the "crash" process that was doing the live dump
> as the panic task. So when you do a backtrace on it, it
> looks like this:
>
> crash> bt
> PID: 2080 TASK: ffff880079ed2480 CPU: 0 COMMAND: "crash"
> #0 [ffff88007a615b08] schedule at ffffffff81480533
> #1 [ffff88007a615bf0] rcu_read_unlock at ffffffff811edfd3
> #2 [ffff88007a615c00] avc_has_perm_noaudit at ffffffff811eea76
> #3 [ffff88007a615c90] avc_has_perm at ffffffff811eeae3
> #4 [ffff88007a615d10] inode_has_perm at ffffffff811f2815
> #5 [ffff88007a615de8] might_fault at ffffffff810f22ec
> #6 [ffff88007a615e80] might_fault at ffffffff810f2335
> #7 [ffff88007a615eb0] crash_read at ffffffffa004f103 [crash]
> #8 [ffff88007a615f00] vfs_read at ffffffff8112115b
> #9 [ffff88007a615f40] sys_read at ffffffff81121278
> #10 [ffff88007a615f80] system_call_fastpath at ffffffff81009c72
> RIP: 000000333a0d41b0 RSP: 00007fffac23a7f0 RFLAGS: 00000206
> RAX: 0000000000000000 RBX: ffffffff81009c72 RCX: 0000000000000000
> RDX: 0000000000001000 RSI: 0000000000ca5440 RDI: 0000000000000004
> RBP: 0000000000000004 R8: 000000007a615000 R9: 0000000000000006
> R10: 00000000fffffff8 R11: 0000000000000246 R12: 0000000000ca5440
> R13: 0000000000001000 R14: 0000000000001000 R15: 000000007a615000
> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> crash>
>
> Now, when using "snap.so" to create a dumpfile, all of the "active"
> backtraces are not legitimate, because they were *running* when their
> kernel stacks were being read. So, for example, the "snap.so" code
> was running -- doing a read() -- when the "crash" stack was read. But
> since it had not panicked, there were no legitimate starting RIP/RSP
> values to use for starting points for the backtrace. So frame #'s 0
> through #7 above should not be accepted as "real". But I presume that
> starting from frame #7 would be correct.
Ah, there's no method to obtain active registers...
If register values are unavailable for an active task, gcore currently
treats it the same way as a sleeping task. This means gcore chooses
the RIP and RSP that the scheduler saved last time.
Applying this explanation here, it seems to me that the old logic of
restore_frame_pointer() can indeed result in non-termination around
frame #7, since at that point the old stack frame is being switched to
a new one and the chain of frame pointers is not connected.
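For illustration, the kind of guard the frame-pointer walk needs would look
something like the following hypothetical sketch (readmem(), IS_KVADDR()
and their flags are crash interfaces; the function and limit names are made
up, not the actual gcore code):

#define FRAME_WALK_LIMIT 128    /* arbitrary safety bound */

/*
 * Hypothetical sketch of a bounded frame-pointer walk.  The guards keep it
 * from looping forever when the saved-rbp chain on the kernel stack is
 * broken, as it is for frames #0-#7 above:
 *   - give up if rbp leaves the task's kernel stack,
 *   - stop and return as soon as a saved value is a user-space address,
 *   - require the chain to move upward on the kernel stack,
 *   - cap the number of iterations.
 */
static ulong
walk_frame_pointers(ulong rbp, ulong kstack_base, ulong kstack_top)
{
        ulong next;
        int i;

        for (i = 0; i < FRAME_WALK_LIMIT; i++) {
                if (rbp < kstack_base || rbp >= kstack_top)
                        break;                  /* not on this kernel stack */

                if (!readmem(rbp, KVADDR, &next, sizeof(next),
                             "saved rbp", RETURN_ON_ERROR))
                        break;

                if (!IS_KVADDR(next))
                        return next;            /* reached a user-space rbp */

                if (next <= rbp)
                        break;                  /* no upward progress: chain broken */

                rbp = next;
        }
        return 0;                               /* could not restore rbp */
}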
I'll now verify this by looking at the vmcore you gave me.
Thanks,
HATAYAMA Daisuke
1 year
[RFC v2] gcore: process core dump feature for crash utility
by HATAYAMA Daisuke
Hello.
This is RFC version 2 of the gcore sub-command, which provides a
process core dump feature for the crash utility.
Since RFC v1, I have investigated how to restore user-mode register
values. The patch reflects that investigation.
Any comments or suggestions are welcome.
Changes in short
================
The changes include:
1) implement collection of user-space register values more
appropriately, though not yet ideally.
2) re-design the gcore sub-command as an extension module.
With (1), GDB's bt command displays backtraces normally.
diffstat output
Makefile | 6 +-
defs.h | 2 +
extensions/gcore.c | 21 +
extensions/gcore.mk | 48 +
extensions/libgcore/2.6.34/x86_64/gcore.c | 2033 +++++++++++++++++++++++++++++
extensions/libgcore/2.6.34/x86_64/gcore.h | 651 +++++++++
netdump.c | 27 +
tools.c | 1 -
8 files changed, 2787 insertions(+), 2 deletions(-)
Current Status
==============
I've continued to develop the gcore sub-command, but this version is
still under development.
Ultimately, I'm going to implement gcore as I described in RFC v1 and
as I explain in ``Detailed Changes and Issues'' below.
How to build and use
====================
I've attached the patchset to this mail.
- crash-gcore-RFCv2.patch
Please use crash version 5.0.5 on x86_64.
Follow these instructions:
$ tar xf crash-5.0.5.tar.gz
$ cd crash-5.0.5/
$ patch -p 1 < crash-gcore-RFCv2.patch
$ make
$ make extensions
$ crash <debuginfo> <vmcore> .... (*)
crash> extend gcore.so
At step (*), gcore.so has already been generated under the extensions/ directory.
Detailed Changes and Issues
===========================
1) implement collection of user-space register values more
appropriately, though not yet ideally
The previous version doesn't retrieve appropriate register values
because it doesn't consider the save/restore operations done at
interrupts in the kernel at all.
I've added restore operations that depend on the kind of interrupt
through which the target task entered kernel mode. See fill_pr_reg()
in gcore.c.
But unfortunately, the current version is still not ideal, since doing
it properly would take some more time.
More precisely, not all user-mode registers are always restored. The
full register set is saved only at exceptions, NMIs and some kinds of
system calls. At other kinds of interrupts, all register values are
saved except the 6 callee-saved registers: rbp, rbx, r12, r13, r14
and r15.
In theory, these can be restored using the Call Frame Information the
compiler generates as part of the debugging information (the
.debug_frame section), which tells us the offsets at which the
respective callee-saved registers are saved.
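Conceptually, once the CFA of a frame has been computed and the CFI rule
for a callee-saved register says it was saved at some offset from the CFA,
the restore itself is just one memory read. A hypothetical sketch of that
last step only (the rule type and names below are illustrative, not the
crash unwinder's; readmem() and its flags are the crash interfaces):

enum cfi_rule_type { CFI_RULE_UNDEFINED, CFI_RULE_SAME_VALUE, CFI_RULE_OFFSET };

struct cfi_reg_rule {
        enum cfi_rule_type type;
        long offset;            /* already multiplied by the data alignment factor */
};

/*
 * Hypothetical sketch: restore one callee-saved register given its CFI
 * rule and the frame's CFA.  Interpreting the .debug_frame instructions
 * to produce the rule and the CFA is the part that still needs a library.
 */
static int
restore_callee_saved_reg(ulong cfa, const struct cfi_reg_rule *rule, ulong *value)
{
        switch (rule->type) {
        case CFI_RULE_OFFSET:           /* DW_CFA_offset: saved at CFA + offset */
                return readmem(cfa + rule->offset, KVADDR, value,
                               sizeof(*value), "callee-saved register",
                               RETURN_ON_ERROR);
        case CFI_RULE_SAME_VALUE:       /* never clobbered: caller keeps its value */
                return TRUE;
        default:
                return FALSE;           /* no rule, cannot restore */
        }
}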
But I don't do this yet, since I haven't found any useful library for
it. I think I could implement it manually, but that would take some
time. I did find unwind_x86_32_64.c, which provides related
functionality, but it looks unfinished to me.
On the other hand, the frame pointer, rbp, can be restored by
unwinding it repeatedly until its value reaches a user-space address.
2) re-design gcore sub-command as an extension module
In response to my previous post, Dave suggested that the gcore
subcommand should be provided as an extension module per kernel
version and architecture, since the process core dump feature
inherently depends on kernel data structures.
I agreed with the suggestion and have tried to redesign the patchset.
Although the current patchset merely moves the gcore files into the
./extensions directory, I've also considered a better design. That is,
(1) the architecture- and kernel-version-independent part is provided
directly under ./extensions, and
(2) only the architecture- or kernel-version-specific part is provided
as a separate extension module.
The following directory structure illustrates this briefly:
crash-5.0.5/
extensions/
gcore.mk
gcore.c ... (1)
libgcore/ ... (2)
2.6.34/
x86_64/
gcore_note.h
gcore_note.c
I think this is relatively easy to achieve by porting the kernel's
regset interface, which is used to implement the ptrace feature and
hides implementation details across many architectures.
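To give an idea of what I mean, a minimal sketch of such a ported
regset-like interface, modeled loosely on the kernel's struct user_regset
(the names are illustrative, not the code I will actually post; GETBUF()
and FREEBUF() are crash's buffer helpers):

/*
 * Minimal sketch of a regset-like interface, modeled loosely on the
 * kernel's struct user_regset.  Each architecture/kernel-version module
 * would provide a table of these, and the generic code would walk the
 * table to emit the corresponding ELF notes.  Illustrative names only.
 */
struct gcore_regset {
        unsigned int core_note_type;    /* e.g. NT_PRSTATUS, NT_PRFPREG */
        unsigned int size;              /* size of one register         */
        unsigned int n;                 /* number of registers          */
        int (*active)(ulong task);      /* does this regset apply?      */
        int (*get)(ulong task, void *buf, unsigned int len);
};

/* The generic, architecture-neutral part would then look like this: */
static void
write_register_notes(ulong task, const struct gcore_regset *regsets, int nr)
{
        void *buf;
        int i;

        for (i = 0; i < nr; i++) {
                if (regsets[i].active && !regsets[i].active(task))
                        continue;
                buf = GETBUF(regsets[i].size * regsets[i].n);
                if (regsets[i].get(task, buf, regsets[i].size * regsets[i].n)) {
                        /* emit an ELF note of type core_note_type here */
                }
                FREEBUF(buf);
        }
}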
It also helps port kernel code into gcore and maintain source code
spanning a variety of kernel versions on multiple architectures in a
uniform way.
I'm going to re-implement it this way in the next version. From that
version on, I won't change the gcore source code dramatically; I will
change it only when adding new extension modules.
Thanks
--
HATAYAMA Daisuke
1 year
Patches for zram, swap cache fixes
by Johan.Erlandsson@sony.com
Hi
Sharing 3 changes for zram regarding swap cache handling. Please have a look.
Subject: [PATCH 1/3] zram, swap cache missing page tree offset
Subject: [PATCH 2/3] zram, swap cache entries are pointer to struct page
Subject: [PATCH 3/3] zram, exclude shadow entries from swap cache lookup
Thanks
Johan
1 year
Re: Mailling list migration complete
by lijiang
On Thu, Nov 2, 2023 at 8:11 PM <devel-request(a)lists.crash-utility.osci.io>
wrote:
> From: Michael Scherer <mscherer(a)redhat.com>
> Subject: [Crash-utility] Mailling list migration complete
> To: devel(a)lists.crash-utility.osci.io
> Message-ID: <CABwDVZWO2aZKUyRbVd-YJCvkaGqFRQqp7Uvh85ZheG+fbz9-dw(a)mail.gmail.com>
>
> Hi,
>
> if you receive this email, then you have been successfully migrated to the
> new list server for the crash utility project.
>
>
Thank you so much for helping with the mailing list migration, Michael.
Excellent!
> As announced by Lianbo (
> https://listman.redhat.com/archives/crash-utility/2023-October/011087.html
> ), we migrated from the RH server listman.redhat.com based on mailman 2 to
> a new dedicated VM hosted by a different RH team.
>
> An automated responder has been set up, so if you send to the wrong address,
> you should get an error message quite quickly.
>
>
Nice.
> Due to technical limitations of the mailman 2 to mailman 3 migration, we
> were not able to migrate some specific settings. If the list does not behave
> like it used to (mostly digest settings, etc.), you may need to go to the web
> interface and change the settings.
BTW: I just noticed that all members' settings are not set, for example:
Members -> Member Options -> Delivery status: Enabled / Disabled
I'm not sure if this behaviour is expected (maybe it affects mail delivery?),
or whether the settings still need to be changed via the web interface.
> Contrary to what I said a few days ago (and that was in the announce mail),
> the reset password feature does not work unless you have created an account
> first, sorry about the wrong information. So if you need to change anything,
> you first need to create an account, using the same email address that is
> subscribed to the list.
>
>
This works well for me.
> We are still working on the final touches of the migration, but the list
> should be officially usable. If there is anything wrong or weird, please
> contact me or my team using the alias comminfra(a)osci.io (I am not
> subscribed to the list directly), and we will take a look. If mail from
> the list also ends up in spam, we would be interested to know, as it might
> be some misconfiguration on our side.
>
> We hope the migration didn't disturb the ongoing work too much.
>
Thank you for the great job, Michael.
Thanks
Lianbo
1 year
[PATCH v3] add "files -n" command for an inode
by Huang Shijie
On a NUMA machine, it is useful to know the memory distribution of
an inode's page cache:
How many pages are in node 0?
How many pages are in node 1?
Add the "files -n" command to get this memory distribution information:
1.) Add a new argument to dump_inode_page_cache_info().
2.) Make page_to_nid() a global function.
3.) Add summary_inode_page() to check each page's node
information.
4.) Use print_inode_summary_info() to print the
memory distribution information of an inode.
Signed-off-by: Huang Shijie <shijie(a)os.amperecomputing.com>
---
v2 --> v3:
1.) Always return 1 from summary_inode_page().
2.) Add more comments for help_files.
---
defs.h | 1 +
filesys.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++-----
help.c | 14 +++++++++++++-
memory.c | 3 +--
4 files changed, 63 insertions(+), 8 deletions(-)
diff --git a/defs.h b/defs.h
index 788f63a..1fe2d0b 100644
--- a/defs.h
+++ b/defs.h
@@ -5750,6 +5750,7 @@ int dump_inode_page(ulong);
ulong valid_section_nr(ulong);
void display_memory_from_file_offset(ulonglong, long, void *);
void swap_info_init(void);
+int page_to_nid(ulong);
/*
* filesys.c
diff --git a/filesys.c b/filesys.c
index 1d0ee7f..2c7cc74 100644
--- a/filesys.c
+++ b/filesys.c
@@ -49,7 +49,7 @@ static int match_file_string(char *, char *, char *);
static ulong get_root_vfsmount(char *);
static void check_live_arch_mismatch(void);
static long get_inode_nrpages(ulong);
-static void dump_inode_page_cache_info(ulong);
+static void dump_inode_page_cache_info(ulong, void *callback);
#define DENTRY_CACHE (20)
#define INODE_CACHE (20)
@@ -2192,8 +2192,31 @@ get_inode_nrpages(ulong i_mapping)
return nrpages;
}
+/* Used to collect the numa information for an inode */
+static ulong *numa_node;
+
+static void
+print_inode_summary_info(void)
+{
+ int i;
+
+ fprintf(fp, " NODE PAGES\n");
+ for (i = 0; i < vt->numnodes; i++)
+ fprintf(fp, " %2d %8ld\n", i, numa_node[i]);
+}
+
+static int
+summary_inode_page(ulong page)
+{
+ int node = page_to_nid(page);
+
+ if (0 <= node && node < vt->numnodes)
+ numa_node[node]++;
+ return 1;
+}
+
static void
-dump_inode_page_cache_info(ulong inode)
+dump_inode_page_cache_info(ulong inode, void *callback)
{
char *inode_buf;
ulong i_mapping, nrpages, root_rnode, xarray, count;
@@ -2236,7 +2259,7 @@ dump_inode_page_cache_info(ulong inode)
root_rnode = i_mapping + OFFSET(address_space_page_tree);
lp.index = 0;
- lp.value = (void *)&dump_inode_page;
+ lp.value = callback;
if (root_rnode)
count = do_radix_tree(root_rnode, RADIX_TREE_DUMP_CB, &lp);
@@ -2276,7 +2299,7 @@ cmd_files(void)
ref = NULL;
refarg = NULL;
- while ((c = getopt(argcnt, args, "d:R:p:c")) != EOF) {
+ while ((c = getopt(argcnt, args, "d:n:R:p:c")) != EOF) {
switch(c)
{
case 'R':
@@ -2295,11 +2318,31 @@ cmd_files(void)
display_dentry_info(value);
return;
+ case 'n':
+ if (VALID_MEMBER(address_space_page_tree) &&
+ VALID_MEMBER(inode_i_mapping)) {
+ value = htol(optarg, FAULT_ON_ERROR, NULL);
+
+ /* Allocate the array for this inode */
+ numa_node = malloc(sizeof(ulong) * vt->numnodes);
+ BZERO(numa_node, sizeof(ulong) * vt->numnodes);
+
+ dump_inode_page_cache_info(value, (void *)&summary_inode_page);
+
+ /* Print out the NUMA node information for this inode */
+ print_inode_summary_info();
+
+ free(numa_node);
+ numa_node = NULL;
+ } else
+ option_not_supported('n');
+ return;
+
case 'p':
if (VALID_MEMBER(address_space_page_tree) &&
VALID_MEMBER(inode_i_mapping)) {
value = htol(optarg, FAULT_ON_ERROR, NULL);
- dump_inode_page_cache_info(value);
+ dump_inode_page_cache_info(value, (void *)&dump_inode_page);
} else
option_not_supported('p');
return;
diff --git a/help.c b/help.c
index cc7ab20..e9e28b7 100644
--- a/help.c
+++ b/help.c
@@ -7850,7 +7850,7 @@ NULL
char *help_files[] = {
"files",
"open files",
-"[-d dentry] | [-p inode] | [-c] [-R reference] [pid | taskp] ... ",
+"[-d dentry] | [-p inode] | [-n inode] | [-c] [-R reference] [pid | taskp] ... ",
" This command displays information about open files of a context.",
" It prints the context's current root directory and current working",
" directory, and then for each open file descriptor it prints a pointer",
@@ -7863,6 +7863,8 @@ char *help_files[] = {
" specific, and only shows the data requested.\n",
" -d dentry given a hexadecimal dentry address, display its inode,",
" super block, file type, and full pathname.",
+" -n inode given a hexadecimal inode address, check all the pages",
+" in the page cache, and display a NUMA node distribution.",
" -p inode given a hexadecimal inode address, dump all of its pages",
" that are in the page cache.",
" -c for each open file descriptor, prints a pointer to its",
@@ -7974,6 +7976,16 @@ char *help_files[] = {
" ca1ddde0 2eeef000 f59b91ac 3 2 82c referenced,uptodate,lru,private",
" ca36b300 3b598000 f59b91ac 4 2 82c referenced,uptodate,lru,private",
" ca202680 30134000 f59b91ac 5 2 82c referenced,uptodate,lru,private",
+" ",
+" For the inode at address ffff07ff8c6f97f8, display the NUMA node",
+" distribution of its pages that are in the page cache:",
+" %s> files -n ffff07ff8c6f97f8",
+" INODE NRPAGES",
+" ffff07ff8c6f97f8 25240",
+" ",
+" NODE PAGES",
+" 0 25240",
+" 1 0",
" ",
NULL
};
diff --git a/memory.c b/memory.c
index 86ccec5..ed1a4fb 100644
--- a/memory.c
+++ b/memory.c
@@ -300,7 +300,6 @@ static int dump_vm_event_state(void);
static int dump_page_states(void);
static int generic_read_dumpfile(ulonglong, void *, long, char *, ulong);
static int generic_write_dumpfile(ulonglong, void *, long, char *, ulong);
-static int page_to_nid(ulong);
static int get_kmem_cache_list(ulong **);
static int get_kmem_cache_root_list(ulong **);
static int get_kmem_cache_child_list(ulong **, ulong);
@@ -19846,7 +19845,7 @@ is_kmem_cache_addr_common(ulong vaddr, char *kbuf)
/*
* Kernel-config-neutral page-to-node evaluator.
*/
-static int
+int
page_to_nid(ulong page)
{
int i;
--
2.40.1
1 year