Dave
We have made some observations about x86_64 dumps and wanted to know your
opinion on them.
On x86_64 dumps, for active processes we read the register
content from the ELF notes, and we found that it doesn't match
the output of the bt command. The SP and IP values we get
from the ELF notes break stack unwinding when we use the
information in the DWARF section, whereas the unwinding (at least the
first stage) works with the SP and IP that bt finds.
gdb shows the same problem: it too breaks when unwinding is
attempted on this dump.
I wanted to know what you think about this and how we can proceed.
1. Is it reliable to parse through the stack frame looking for valid
addresses, as is done in 'x86_64_get_dumpfile_stack_frame'? Is it the
right/safe way to do it, and does any x86 ABI document talk about it?
I do it that way because I've never wanted to depend upon the ELF prstatus
note section: netdump/diskdump only have the panic cpu's info, and
kdump's sections can be difficult to match to a cpu if there has been any
cpu hot-plugging. Not to mention that several other dumpfile
formats are supported. You can do it any way you'd like.
2. It looks like we can't safely rely on ELF_NOTES. Is this a
known issue with kexec dumping?
The "sp: ffff88020f471dc8 Breaks unwinding" issue that you're seeing
is a result of using "echo c > /proc/sysrq-trigger", or of panic() being
called; in either case crash_kexec() is called with a NULL pt_regs pointer.
When that's the case, a "fake" register set is hand-created in
crash_setup_regs(), which is what you are seeing. Check out the kernel
code in crash_setup_regs() -- it just reads the rsp as it was in that
function, and populates the IP with current_text_addr().
3. If parsing the stack frame is the right thing to do, can we modify the
bt_cmd routines so that we can reuse some of them to repopulate our
register contents, especially esp/eip?
Sorry -- I don't understand what you're asking.
Dave
Scenario We are Facing
------------------------
Register content from ELF_NOTES (matches the gdb output):
crash> local display
IP: ffffffff80255d7b
ax: 1
bx: 0
cx: 6237
dx: 0
sp: ffff88020f471dc8 <=== Breaks unwinding
bp: 0
si: 0
di: ffffffff80596ec0
cs: 10
orig_ax: 8241000001b6
flags: 46
ip: ffffffff80255d7b
r8: 0
r9: ffff880028080c80
r10: ffff880028080c80
r11: d805926f0
r12: 63
r13: 0
------------------------
crash> bt
PID: 4814 TASK: ffff8802104397f0 CPU: 3 COMMAND: "bash"
#0 [ffff88020f471cf0] machine_kexec at ffffffff8021db38
#1 [ffff88020f471dc0] crash_kexec at ffffffff80255d9c
#2 [ffff88020f471e80] __handle_sysrq at ffffffff80385756
#3 [ffff88020f471ec0] write_sysrq_trigger at ffffffff802d291b
#4 [ffff88020f471ed0] proc_reg_write at ffffffff802cca2d
#5 [ffff88020f471f10] vfs_write at ffffffff8029125d
#6 [ffff88020f471f40] sys_write at ffffffff802916e5
#7 [ffff88020f471f80] system_call_fastpath at ffffffff8020be0b
RIP: 000000311bcc4150 RSP: 00007fff976f40d0 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffffff8020be0b RCX: 00000000000003e4
RDX: 0000000000000002 RSI: 00007f428f6ec000 RDI: 0000000000000001
RBP: 0000000000000002 R8: 00000000ffffffff R9: 00007f428f6d86e0
R10: 0000000000000072 R11: 0000000000000246 R12: 000000311bf4d760
R13: 00007f428f6ec000 R14: 0000000000000002 R15: 000000008f6ec000
Regards
Sharyathi N