Thank you very much for the info; it is really helpful and very much
appreciated. I have a few follow-up questions:
1. When the page fault occurs, are some of the registers (which might
contain parameters passed to the offending function) trampled on? If yes,
is there a document, or would you happen to know, which registers (in the
worst case) are written to?
The reason I ask is that in my dump below, the register RDI (used to pass the
first param to ahaann()) should be zero (to have caused the page fault),
but it is not.
From the register dump after the panic:
RBX: 0000000000000000 RDI: ffff88035daef4e0 (I expect this to be zero
per the disassembly).
Reverse disassembly from the RIP when the panic occurred:
crash> dis -r ffffffffa06ce48f
0xffffffffa06ce460 <ahaann>: push %rbp
0xffffffffa06ce461 <ahaann+1>: mov %rsp,%rbp
0xffffffffa06ce464 <ahaann+4>: push %r12
0xffffffffa06ce466 <ahaann+6>: push %rbx
0xffffffffa06ce467 <ahaann+7>: nopl 0x0(%rax,%rax,1)
0xffffffffa06ce46c <ahaann+12>: mov $0xffffffffa092c548,%rdx
0xffffffffa06ce473 <ahaann+19>: movzwl %si,%ecx
0xffffffffa06ce476 <ahaann+22>: mov %rdi,%rbx <==========
0xffffffffa06ce479 <ahaann+25>: mov %esi,%r12d
0xffffffffa06ce47c <ahaann+28>: mov $0xffffffffa092e5f0,%rdi
0xffffffffa06ce483 <ahaann+35>: xor %esi,%esi
0xffffffffa06ce485 <ahaann+37>: callq 0xffffffffa06cd860 <ahahtl>
0xffffffffa06ce48a <ahaann+42>: test %rax,%rax
0xffffffffa06ce48d <ahaann+45>: jne 0xffffffffa06ce500 <ahaann+160>
0xffffffffa06ce48f <ahaann+47>: mov (%rbx),%rdi <==========
2. Does Linux (specifically crash) treat access to an invalid address and a NULL
ptr dereference the same way, as in calling them both a page fault? (In one
of my past workplaces, the crash dump was explicit in stating when a NULL
ptr dereference occurred, and I am wondering now if that was due to a
customization in crash.)
3. Expanding on the meaning of the address in [] at the beginning of each
line of the bt:
[addr0] function0 at addr2
[addr1] function1 at addr2
addr1 - 8: starting address of the stack frame of function1, which extends up
to addr0. I can use this info to peek into the values of function-local
variables pushed onto the stack (specifically the function's stack frame).
Thank you,
Ahmed.
On Thu, Jan 24, 2013 at 7:34 AM, Dave Anderson <anderson(a)redhat.com> wrote:
----- Original Message -----
> >
> > I am using crash version: 6.0.4-2.el6 on CentOS 6.3 (kernel
> > 2.6.32-279.el6.x86_64). I apologize for my newbie questions, but
> > googling did not help much.
> >
> > When analyzing a kernel dump, I am getting the following bt.
> >
> > crash> bt
> > PID: 12663 TASK: ffff88036304f500 CPU: 0 COMMAND: "bash"
> > #0 [ffff88035b949570] machine_kexec at ffffffff8103281b
> > #1 [ffff88035b9495d0] crash_kexec at ffffffff810ba662
> > #2 [ffff88035b9496a0] oops_end at ffffffff81501290
> > #3 [ffff88035b9496d0] no_context at ffffffff81043bab
> > #4 [ffff88035b949720] __bad_area_nosemaphore at ffffffff81043e35
> > #5 [ffff88035b949770] bad_area at ffffffff81043f5e
> > #6 [ffff88035b9497a0] __do_page_fault at ffffffff81044710
> > #7 [ffff88035b9498c0] do_page_fault at ffffffff8150326e
> > #8 [ffff88035b9498f0] page_fault at ffffffff81500625
> > [exception RIP: ahaann+47]
> > RIP: ffffffffa06ce48f RSP: ffff88035b9499a8 RFLAGS: 00010246
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88035daef4e0
> > RBP: ffff88035b9499b8 R8: 0000000004a47daf R9: ffffffffa06dae99
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000007
> > R13: 00007fc82f4b8000 R14: 000000000000000a R15: 0000000000000000
> > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > #9 [ffff88035b9499c0] ahaecho at ffffffffa06d2899 [ahadrv]
> > #10 [ffff88035b949a00] writectl at ffffffffa06c366e [ahadrv]
> > #11 [ffff88035b949e40] writeaha at ffffffffa06d3e7b [ahadrv]
> > #12 [ffff88035b949e60] proc_file_write at ffffffff811e6e44
> > #13 [ffff88035b949ea0] proc_reg_write at ffffffff811e0abe
> > #14 [ffff88035b949ef0] vfs_write at ffffffff8117b068
> > #15 [ffff88035b949f30] sys_write at ffffffff8117ba81
> > #16 [ffff88035b949f80] system_call_fastpath at ffffffff8100b0f2
> > RIP: 0000003a29ada3c0 RSP: 00007ffffaec6830 RFLAGS: 00010202
> > RAX: 0000000000000001 RBX: ffffffff8100b0f2 RCX: 0000000000000065
> > RDX: 000000000000000a RSI: 00007fc82f4b8000 RDI: 0000000000000001
> > RBP: 00007fc82f4b8000 R8: 000000000000000a R9: 00007fc82f4aa700
> > R10: 00000000fffffff7 R11: 0000000000000246 R12: 000000000000000a
> > R13: 0000003a29d8c780 R14: 000000000000000a R15: 0000000001e18460
> > ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
> > crash>
> >
> >
> > 1. Are the hex addr in [] right before the function name the stack
> > frame ptr for that function?
>
> On x86_64 machines, the "at <address>" shown is the address in that frame's
> function where the call instruction that it has made will return to. So for
> example, taking frame #15, where "sys_write at ffffffff8117ba81" has called
> vfs_write(), you can disassemble all instructions from the beginning of
> sys_write() to that address like this example:
>
> crash> dis -r ffffffff80016e6b
> 0xffffffff80016e26 <sys_write>: push %r13
> 0xffffffff80016e28 <sys_write+2>: mov %rsi,%r13
> 0xffffffff80016e2b <sys_write+5>: push %r12
> 0xffffffff80016e2d <sys_write+7>: mov $0xfffffffffffffff7,%r12
> 0xffffffff80016e34 <sys_write+14>: push %rbp
> 0xffffffff80016e35 <sys_write+15>: mov %rdx,%rbp
> 0xffffffff80016e38 <sys_write+18>: push %rbx
> 0xffffffff80016e39 <sys_write+19>: sub $0x18,%rsp
> 0xffffffff80016e3d <sys_write+23>: lea 0x14(%rsp),%rsi
> 0xffffffff80016e42 <sys_write+28>: callq 0xffffffff8000b5b4 <fget_light>
> 0xffffffff80016e47 <sys_write+33>: test %rax,%rax
> 0xffffffff80016e4a <sys_write+36>: mov %rax,%rbx
> 0xffffffff80016e4d <sys_write+39>: je 0xffffffff80016e86 <sys_write+96>
> 0xffffffff80016e4f <sys_write+41>: mov 0x38(%rax),%rax
> 0xffffffff80016e53 <sys_write+45>: lea 0x8(%rsp),%rcx
> 0xffffffff80016e58 <sys_write+50>: mov %rbp,%rdx
> 0xffffffff80016e5b <sys_write+53>: mov %r13,%rsi
> 0xffffffff80016e5e <sys_write+56>: mov %rbx,%rdi
> 0xffffffff80016e61 <sys_write+59>: mov %rax,0x8(%rsp)
> 0xffffffff80016e66 <sys_write+64>: callq 0xffffffff800164d0 <vfs_write>
> 0xffffffff80016e6b <sys_write+69>: mov %rax,%r12
> crash>
>
> And the stack address of the frame contains that return address location.
Just to clarify -- the answer to your question is that the
address in the [brackets] is the stack address that contains
the return address.
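
As a rough illustration of that pairing, here is a minimal Python sketch that
parses a few frame lines copied from the bt output quoted in this thread and
reports, for each frame, the bracketed stack address, the return address
stored there, and the distance to the next frame's bracketed address. The
helper names (parse_frames, frame_gaps) are made up for this example and are
not part of crash. How much of each span holds locals versus saved registers
depends on the compiled code, so treat the numbers only as an upper bound on
the frame contents.

```python
import re

# Frame lines copied from the bt output quoted in this thread.
BT_LINES = """\
 #9 [ffff88035b9499c0] ahaecho at ffffffffa06d2899 [ahadrv]
#10 [ffff88035b949a00] writectl at ffffffffa06c366e [ahadrv]
#11 [ffff88035b949e40] writeaha at ffffffffa06d3e7b [ahadrv]
"""

# "#N [stack_addr] function at return_addr"
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def parse_frames(text):
    """Return one dict per bt frame line found in text (hypothetical helper)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "level": int(m.group(1)),
                # Stack location that holds the return address.
                "stack_addr": int(m.group(2), 16),
                "func": m.group(3),
                # Address inside func that the callee returns to.
                "ret_addr": int(m.group(4), 16),
            })
    return frames


def frame_gaps(frames):
    """Bytes from each frame's return-address slot to the next frame's slot."""
    return [(a["func"], b["stack_addr"] - a["stack_addr"])
            for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    frames = parse_frames(BT_LINES)
    for f in frames:
        print(f"#{f['level']:<2} slot {f['stack_addr']:#x} "
              f"holds return address {f['ret_addr']:#x} ({f['func']})")
    for func, gap in frame_gaps(frames):
        print(f"{func}: {gap} bytes to the next frame's slot")
```

The actual contents of those stack locations still have to be read from the
dump itself; the sketch only does the address arithmetic.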
> > 2. I am assuming the panic occurred in function ahaann() (and not in
> > ahaecho() ). Is that right?
>
> That's correct. The exception occurred precisely when executing the
> instruction here: [exception RIP: ahadrv], which is at RIP
> ffffffffa06ce48f.
And to clarify the above -- where I made a cut-and-paste error -- I meant
to state:
The exception occurred precisely when executing the instruction
here: [exception RIP: ahaann+47], which is at RIP ffffffffa06ce48f
Sorry for any confusion...
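
The "symbol+offset" form can be checked by hand against the disassembly in
your message. A small Python sketch, assuming the offset is printed in decimal
(as in the dis output) and taking the start address of ahaann from your
listing; split_symbol_offset is just a name invented for this example:

```python
def split_symbol_offset(text):
    """Split 'name+offset' (decimal offset) into (name, offset)."""
    name, _, offset = text.partition("+")
    return name, int(offset) if offset else 0


# Start of <ahaann>, taken from the "dis -r" listing in this thread.
AHAANN_START = 0xffffffffa06ce460

name, offset = split_symbol_offset("ahaann+47")
fault_rip = AHAANN_START + offset

if __name__ == "__main__":
    # Should match the RIP shown in the exception frame of the bt output.
    print(name, hex(fault_rip))
```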
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility