Rachita Kothiyal wrote:
Not really. I don't understand why IRQ exception frame contents
> Hi Rachita,
>
> I've only been able to test this on a live system that has __start_unwind
> and __end_unwind symbols, so I don't know what a backtrace with an
> in-kernel exception frame, or a backtrace with a transition to the x86_64
> IRQ stack or x86_64 exception stacks, would look like. If you have
> an example, I'd be interested in seeing how they get handled.Hi Dave
I was trying to use LKDTM to create various scenarios for crash dump.
To start with is the case of panic() in an interrupt context. Here I
am inducing a panic in handle_IRQ_event(), where I am registering a
jprobe. jp_handle_irq_event() is the jprobe handler which in turn calls
lkdtm_handler(). Running crash on the dump gives the following:crash> bt
PID: 3898 TASK: ffff81022e988e20 CPU: 0 COMMAND: "slapd"
#0 [ffffffff8064bcf8] crash_kexec at ffffffff80152211
#1 [ffffffff8064bd40] machine_kexec at ffffffff8011a739
#2 [ffffffff8064bd80] crash_kexec at ffffffff8015222d
#3 [ffffffff8064be08] crash_kexec at ffffffff80152211
#4 [ffffffff8064be30] bust_spinlocks at ffffffff8011fd6d
#5 [ffffffff8064be40] panic at ffffffff80131410
#6 [ffffffff8064beb0] cdrom_pc_intr at ffffffff802ebe68
#7 [ffffffff8064bef0] ide_intr at ffffffff802df26f
#8 [ffffffff8064bf30] lkdtm_handler at ffffffff8800230d
#9 [ffffffff8064bf40] jp_handle_irq_event at ffffffff880023e8
#10 [ffffffff8064bf50] __do_IRQ at ffffffff801544f4
#11 [ffffffff8064bf58] __do_softirq at ffffffff80136b8f
#12 [ffffffff8064bf90] do_IRQ at ffffffff8010bda1
--- <IRQ stack> ---
#13 [ffff810229fd5f80] ret_from_intr at ffffffff80109b95
[exception RIP: unknown or invalid address]
RIP: 0000000000000000 RSP: 0000000000000000 RFLAGS: 00000000
RAX: ffffffffffffffff RBX: 00002afe35608c98 RCX: 00002afe359f7be4
RDX: 0000000000000033 RSI: 0000000000000202 RDI: 00007fff754bfbe0
RBP: 000000000000000a R8: 000055555590bca0 R9: 0000000000000000
R10: 00002afe35608c98 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 00002afe3597b1e0 R15: 000055555590b760
ORIG_RAX: 000000000000002b CS: 0000 SS: 0000
bt: WARNING: possibly bogus exception frame
RIP: 00002afe359f7be4 RSP: 00007fff754bfbe0 RFLAGS: 00000202
RAX: 00002afe35608c98 RBX: 000055555590b760 RCX: 0000000000000001
RDX: 00002afe35608c98 RSI: 0000000000000000 RDI: 000055555590bca0
RBP: ffffffff80109c0b R8: 000000000000000a R9: 0000000000000000
R10: 0000000000000000 R11: 00002afe3597b1e0 R12: 000055555590b760
R13: 00007fff754bfd38 R14: 0000000000000001 R15: 000055555590b760
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
crash> set unwind on
unwind: on
crash> bt
PID: 3898 TASK: ffff81022e988e20 CPU: 0 COMMAND: "slapd"
#0 [ffffffff8064bd88] crash_kexec at ffffffff80152211
#1 [ffffffff8064be48] panic at ffffffff80131410
#2 [ffffffff8064bf38] lkdtm_handler at ffffffff8800230d
--- <IRQ stack> ---
#3 [ffff810229fd5f80] ret_from_intr at ffffffff80109b95
[exception RIP: unknown or invalid address]
RIP: 0000000000000000 RSP: 0000000000000000 RFLAGS: 00000000
RAX: ffffffffffffffff RBX: 00002afe35608c98 RCX: 00002afe359f7be4
RDX: 0000000000000033 RSI: 0000000000000202 RDI: 00007fff754bfbe0
RBP: 000000000000000a R8: 000055555590bca0 R9: 0000000000000000
R10: 00002afe35608c98 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 00002afe3597b1e0 R15: 000055555590b760
ORIG_RAX: 000000000000002b CS: 0000 SS: 0000
bt: WARNING: possibly bogus exception frame
#4 [ffff810229fd5f80] common_interrupt at ffffffff80109b95
RIP: 00002afe359f7be4 RSP: 00007fff754bfbe0 RFLAGS: 00000202
RAX: 00002afe35608c98 RBX: 000055555590b760 RCX: 0000000000000001
RDX: 00002afe35608c98 RSI: 0000000000000000 RDI: 000055555590bca0
RBP: ffffffff80109c0b R8: 000000000000000a R9: 0000000000000000
R10: 0000000000000000 R11: 00002afe3597b1e0 R12: 000055555590b760
R13: 00007fff754bfd38 R14: 0000000000000001 R15: 000055555590b760
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
crash>Comments?
If you do a "help -m", the very last line shows the base address
of each per-cpu interrupt stack, as in:
crash> help -m
...
ibase[cpus]:
ffffffff80453d00 0000010037e28000 0000010037e60000
0000010037e94000
crash>
Each of the interrupt stacks are 16k long, so you can
look at them with a "rd" with a count of 2048. The linkage
between the interrupt stack and the stack from where it came
is supposed to be at the first word just below the top 64 words
on that stack.
So if I take cpu 2's interrupt stack from above, which is at
10037e60000, and dump it using "rd -s", I see this:
crash> rd -s 10037e60000 2048
...
10037e63f90: 0000007fbffff4b0 0000007fbffff410
10037e63fa0: 0000000000000046 0000000000000046
10037e63fb0: apic_timer_interrupt+0x85
000001003741df58
10037e63fc0: 0000000000000000 0000000000000000
10037e63fd0: 0000000000000000 0000000000000000
10037e63fe0: 0000000000000000 0000000000000000
10037e63ff0: 0000000000000000 0000000000000000
crash>
The linkage to the previous stack is at IRQ stack address
10037e63fb8, which points back to a process stack address of
1003741df58. That process stack address is also a pointer
to the exception frame that was laid down there:
crash> pt_regs 000001003741df58
struct pt_regs {
r15 = 0x0,
r14 = 0x1003741df58,
r13 = 0x7fbfffe840,
r12 = 0xa8,
rbp = 0x17dbfc0,
rbx = 0xffffffff801103ce,
r11 = 0x246,
r10 = 0x31d072f928,
r9 = 0x4587d9,
r8 = 0x3741df58,
rax = 0xffffffffffffffda,
rcx = 0xffffffffffffffff,
rdx = 0xa8,
rsi = 0x17dbfc0,
rdi = 0x4,
orig_rax = 0x0,
rip = 0x31d05b85b2,
cs = 0x33,
eflags = 0x246,
rsp = 0x7fbfffe808,
ss = 0x2b
}
So the last time an interrupt occurred on this
cpu, a program was running in user space (cs 0x33)
when a timer interrupt occurred.
If it had occurred in kernel mode, a kernel mode
exception frame should be there, So for example,
if I take cpu 3's interrupt stack from above, it
shows this:
crash> rd -s 0000010037e94000 2048
...
10037e97f90: 0000000000000000 0000000000000000
................
10037e97fa0: 0000000000000046 0000000000000046
F.......F.......
10037e97fb0: apic_timer_interrupt+0x85
0000010037e83e98 .........>.7....
10037e97fc0: 0000000000000000 0000000000000000
................
10037e97fd0: 0000000000000000 0000000000000000
................
10037e97fe0: 0000000000000000 0000000000000000
................
10037e97ff0: 0000000000000000 0000000000000000
................
crash>
And in this case, the timer interrupt occurred while running
in kernel mode (cs 0x10):
crash> pt_regs 0000010037e83e98
struct pt_regs {
r15 = 0x0,
r14 = 0x0,
r13 = 0x0,
r12 = 0x0,
rbp = 0x1,
rbx = 0x0,
r11 = 0x5,
r10 = 0x0,
r9 = 0x8,
r8 = 0x10037e82000,
rax = 0x0,
rcx = 0x0,
rdx = 0x0,
rsi = 0x1003f047030,
rdi = 0x1000102a7e0,
orig_rax = 0xffffffffffffffef,
rip = 0xffffffff8010e84c,
cs = 0x10,
eflags = 0x246,
rsp = 0x10037e83f48,
ss = 0x18
}
crash>
The rip above shows that the cpu was in idle:
crash> sym 0xffffffff8010e84c
ffffffff8010e84c (t) mwait_idle+0x56 include/asm/thread_info.h:
63
crash>
So I cannot explain what's going on in your kernel,
where it looks like that basic stack-linkage assumption
is being violated, or has changed somehow? Maybe kdump
or jprobes has done something? I don't know what else to
tell you.
I don't know. Maybe since the jprobes procedure mucks with
In the stacktrace with 'unwind on', I was expecting to see jp_handle_irq_event
appear too (as frame 3)..Could my using a module to register the probe be the
reason ?
Helpless,
Dave