Hi Dave,
In a recurring series of crashes on kernel 3.10.0-693.17.1.el7.x86_64 (RHEL 7.4), the
problem was triggered when a process deleted a file. That process had this stack trace:
#0 [ffff88115b0ef798] __schedule at ffffffff816ab2ac
#1 [ffff88115b0ef828] schedule at ffffffff816ab8a9
#2 [ffff88115b0ef838] jbd2_log_wait_commit at ffffffffc0177455 [jbd2]
#3 [ffff88115b0ef8b0] jbd2_log_do_checkpoint at ffffffffc017405f [jbd2]
#4 [ffff88115b0ef918] __jbd2_log_wait_for_space at ffffffffc017445f [jbd2]
#5 [ffff88115b0ef960] add_transaction_credits at ffffffffc016e3d3 [jbd2]
#6 [ffff88115b0ef9c0] start_this_handle at ffffffffc016e5e1 [jbd2]
#7 [ffff88115b0efa58] jbd2__journal_restart at ffffffffc016ec9e [jbd2]
#8 [ffff88115b0efa98] jbd2_journal_restart at ffffffffc016ed13 [jbd2]
#9 [ffff88115b0efaa8] ext4_truncate_restart_trans at ffffffffc027be7e [ext4]
#10 [ffff88115b0efad8] ext4_free_branches at ffffffffc02bc9d7 [ext4]
#11 [ffff88115b0efb38] ext4_free_branches at ffffffffc02bc887 [ext4]
#12 [ffff88115b0efb98] ext4_free_branches at ffffffffc02bc887 [ext4]
#13 [ffff88115b0efbf8] ext4_ind_truncate at ffffffffc02bda5e [ext4]
#14 [ffff88115b0efcb8] ext4_truncate at ffffffffc0280ca8 [ext4]
#15 [ffff88115b0efcf0] ext4_evict_inode at ffffffffc02818e0 [ext4]
#16 [ffff88115b0efd10] evict at ffffffff8121f879
#17 [ffff88115b0efd38] iput at ffffffff81220189
#18 [ffff88115b0efd68] rfs_d_iput at ffffffffc03f7d10 [redirfs]
#19 [ffff88115b0efe00] dentry_kill at ffffffff8121a90c
#20 [ffff88115b0efe30] dput at ffffffff8121a9ce
#21 [ffff88115b0efe50] path_put at ffffffff8120d576
#22 [ffff88115b0efe68] gsch_nd_release at ffffffffc043733e [gsch]
#23 [ffff88115b0efe78] gsch_unlink_hook_fn at ffffffffc043
For our troubleshooting, it was useful to identify the file being deleted in each of these
crashes. dentry_kill() takes a dentry pointer as its only argument:
struct dentry *dentry_kill(struct dentry *);
The dentry pointer can be found with the 'fregs' command in PyKdump, or by digging it off
the stack:
#19 dentry_kill called from 0xffffffff8121a9ce <dput+94>
+R12: 0xffff880328399858
+R13: 0x0
+R14: 0xffff880a0fb975b8
+RBP: 0xffff88115b0efe48
+RBX: 0xffff880328399800
1 RDI: 0xffff880328399800 arg0 struct dentry *
'files -d' on this dentry doesn't return the path:
crash64> files -d 0xffff880328399800
DENTRY INODE SUPERBLK TYPE PATH
ffff880328399800 0 0 N/A
This is because dentry.d_inode is null; at this point in the removal process,
dentry_iput() has cleared it.
crash64> dentry.d_inode 0xffff880328399800
d_inode = 0x0
display_dentry_info() in crash (filesys.c) gives up in that case:
	if (!inode || !superblock)
		goto nopath;
But the dentry still contains all the information needed to find the path:
crash64> dentry.d_sb,d_name ffff880328399800
d_sb = 0xffff8817daaf6800
d_name = {
{
{
hash = 3169988838,
len = 30
},
hash_len = 132019007718
},
name = 0xffff880328399838 "MRAQ0431_1_10357_982129979.arc"
So I made the following changes. In this diff output, lines marked '<' are the modified
code and lines marked '>' are the original:
(defs.h)
2069d2068
< 	long dentry_d_sb;
(filesys.c)
1698,1702c1698,1702
< 	} else {
< 		inode_buf = NULL;
< 	}
< 
< 	superblock = ULONG(dentry_buf + OFFSET(dentry_d_sb));
---
> 		superblock = ULONG(inode_buf + OFFSET(inode_i_sb));
> 	} else {
> 		inode_buf = NULL;
> 		superblock = 0;
> 	}
1704c1704
< 	if (!superblock)
---
> 	if (!inode || !superblock)
2018d2017
< 	MEMBER_OFFSET_INIT(dentry_d_sb, "dentry", "d_sb");
With this patch, 'files -d' correctly returns the path:
crash> files -d ffff880328399800
DENTRY INODE SUPERBLK TYPE PATH
ffff880328399800 0 ffff8817daaf6800 N/A
/u02/oraarch/MRAQ0431/MRAQ0431_1_10357_982129979.arc
Can this be included as a patch to crash?
Thanks,
Martin
Martin Moore
Linux/Tru64 RTCC Engineer
CSC Americas
HPE Technology Services
Hewlett Packard Enterprise