----- Original Message -----
Hi,
I need the stack traces of the tasks that are on-proc as well as the
tasks that are not. "bt" fails for the on-proc tasks, even though there
is a backup mechanism for finding the stack: the "stack" field of the
task structure. Even if it is a bit out-of-date, it is better than an
"I dunno" message. Perhaps augment the stack trace with a "this
might be slightly out-of-date because the task was running when
the kernel crashed" message.
Example:
crash> foreach bt
[...]
PID: 20311 TASK: ffff8803ff654140 CPU: 9 COMMAND: "xtnhc"
bt: cannot determine starting stack pointer
[...]
crash> ps | egrep '^>'
> 0 0 4 ffff880205f6b0c0 RU 0.0 0 0 [swapper]
> 0 0 5 ffff880205f77870 RU 0.0 0 0 [swapper]
> 0 0 7 ffff880205d557f0 RU 0.0 0 0 [swapper]
> 0 0 10 ffff880205d5c080 RU 0.0 0 0 [swapper]
> 2982 2 11 ffff8801fd3b07f0 RU 0.0 0 0 [ldlm_cb_00]
> 2983 2 8 ffff880205548080 RU 0.0 0 0 [ldlm_cb_01]
> 20250 20245 1 ffff880202deb0c0 RU 0.0 82388 2372 fcntl17
> 20251 20245 2 ffff88020537b7b0 RU 0.0 82388 2396 fcntl17
> 20252 20245 3 ffff8801fd3b4770 RU 0.0 82388 2376 fcntl17
> 20264 20249 0 ffff8801fd444830 RU 0.0 0 0 fcntl17
> 20290 1 6 ffff8803fe86f7b0 RU 0.0 14044 516 xtnhc
> 20311 20305 9 ffff8803ff654140 RU 0.0 14044 516 xtnhc
crash> set ffff8803ff654140
PID: 20311
COMMAND: "xtnhc"
TASK: ffff8803ff654140 [THREAD_INFO: ffff8803fd85a000]
CPU: 9
STATE: TASK_RUNNING (ACTIVE)
crash> p task->stack
p: gdb request failed: p task->stack
crash> task
PID: 20311 TASK: ffff8803ff654140 CPU: 9 COMMAND: "xtnhc"
struct task_struct {
state = 0,
stack = 0xffff8803fd85a000,
[...]
crash> bt -S 0xffff8803fd85a000
PID: 20311 TASK: ffff8803ff654140 CPU: 9 COMMAND: "xtnhc"
#0 [ffff8803fd85a000] schedule at ffffffff81297bc5
#1 [ffff8803fd85b830] ldlm_resource_get at ffffffffa0269380 [ptlrpc]
#2 [ffff8803fd85b900] ldlm_lock_match at ffffffffa0267359 [ptlrpc]
#3 [ffff8803fd85ba10] mdc_revalidate_lock at ffffffffa0423a8e [mdc]
#4 [ffff8803fd85bac0] mdc_intent_lock at ffffffffa042723f [mdc]
#5 [ffff8803fd85bbc0] __ll_inode_revalidate_it at ffffffffa04a79c2 [lustre]
#6 [ffff8803fd85bcf0] ll_inode_permission at ffffffffa04a8266 [lustre]
#7 [ffff8803fd85bd90] inode_permission at ffffffff810f0a09
#8 [ffff8803fd85bda0] may_open at ffffffff810f14d7
#9 [ffff8803fd85bdd0] do_filp_open at ffffffff810f5294
#10 [ffff8803fd85bf20] do_sys_open at ffffffff810e5850
#11 [ffff8803fd85bf70] sys_open at ffffffff810e596b
#12 [ffff8803fd85bf80] system_call_fastpath at ffffffff81002eab
RIP: 00007ffff78f2f80 RSP: 00007fffffffd818 RFLAGS: 00010202
RAX: 0000000000000002 RBX: ffffffff81002eab RCX: 00000000006130f0
RDX: 00000000000001b6 RSI: 0000000000000000 RDI: 000000000060f960
RBP: 0000000000000008 R8: 0000000000000008 R9: 0000000000000001
R10: 000000000040a261 R11: 0000000000000246 R12: ffffffff810e596b
R13: ffff8803fd85bf78 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
crash>
You could also try "bt -t" or "bt -T".
But what kind of dumpfile was this anyway? I'm wondering why you aren't
getting any stack traces at all for the active tasks.
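
In the meantime, the manual steps from the session above (pick out the
active tasks from "ps", read the "stack" member from the "task" output,
then run "bt -S" on it) could be scripted over captured output. The
following is only a rough sketch under assumptions: the function names
are hypothetical, it works on saved text rather than talking to crash
itself, and it assumes the "ps" column layout shown in the session
(PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM).

```python
import re

# Assumed layout of an active-task line from "ps", e.g.
# "> 20311 20305 9 ffff8803ff654140 RU 0.0 14044 516 xtnhc"
ACTIVE_RE = re.compile(
    r"^>\s*(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<cpu>\d+)\s+"
    r"(?P<task>[0-9a-f]+)\s+\S+\s+\S+\s+\d+\s+\d+\s+(?P<comm>.+)$"
)

# Matches the "stack = 0x..." member line in the "task" output.
STACK_RE = re.compile(r"^\s*stack\s*=\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def active_tasks(ps_output):
    """Return (pid, cpu, task_address, command) for each line marked '>'."""
    tasks = []
    for line in ps_output.splitlines():
        m = ACTIVE_RE.match(line.strip())
        if m:
            tasks.append(
                (int(m["pid"]), int(m["cpu"]), m["task"], m["comm"].strip())
            )
    return tasks


def stack_from_task_output(task_output):
    """Return the value of the 'stack' member, or None if it is absent."""
    m = STACK_RE.search(task_output)
    return m.group(1) if m else None


def bt_command(stack_address):
    """Build the fallback command used in the session above."""
    return "bt -S " + stack_address
```

Each string returned by bt_command() would still have to be run by hand
(or from a crash input file) with the matching task set as the current
context, and the resulting trace may be slightly out of date for the
reason given in the original message.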
Dave