Re: [Crash-utility] [PATCH] bug on get_be_long() and improvement of bt
by Dave Anderson
----- "Hu Tao" <hutao(a)cn.fujitsu.com> wrote:
> Hi Dave,
>
> These are updated patches tested with SMP system and panic task.
>
> When testing an x86 guest, I found another bug in reading cpu
> registers from the dumpfile. The QEMU-emulated system is x86_64
> (qemu-system-x86_64), and the guest OS is x86. When crash reads cpu registers
> from the dumpfile, it uses cpu_load_32(), which reads the gp registers with
> get_be_long(fp, 32), that is, it treats them as 32 bits wide. But in fact,
> qemu-system-x86_64 saves 64 bits for each of them (although the guest OS
> uses only the lower 32 bits). As a result, crash gets wrong cpu gp
> register values.
As I understand it, you're running a 32-bit guest on a 64-bit host.
If you were to read 64-bit register values instead of 32-bit register
values, wouldn't that cause the subsequent get_xxx() calls in cpu_load()
to read from the wrong file offsets? And then
that would leave the ending file offset incorrect, such that the
qemu_load() loop would fail to find the next device?
In other words, the cpu_load() function, which is used for both
32-bit and 64-bit guests, must be reading the correct amount of
data from the "cpu" device, or else qemu_load() would fail to
find the next device in the next location in the dumpfile.
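
To illustrate the offset concern above, here is a minimal C sketch. It is not crash's code: read_be() and offset_after_regs() are hypothetical helpers standing in for the get_be_long()-style readers, and the register count used in the note below is arbitrary. It only shows that reading the same number of registers at 64 bits instead of 32 leaves the stream at a different offset, which is where the next device would then be looked for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a big-endian field reader; NOT crash's
 * actual get_be_long(). Reads bits/8 bytes and advances the stream. */
static uint64_t read_be(FILE *fp, int bits)
{
    uint64_t value = 0;

    assert(bits == 32 || bits == 64);
    for (int i = 0; i < bits / 8; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            break;              /* short read: caller should check feof() */
        value = (value << 8) | (uint64_t)c;
    }
    return value;
}

/* Read `count` registers of the given width and report where the
 * stream ends up; the next device is expected at that offset. */
static long offset_after_regs(FILE *fp, int count, int bits)
{
    for (int i = 0; i < count; i++)
        (void)read_be(fp, bits);
    return ftell(fp);
}
```

Starting from offset 0, reading 8 registers ends at offset 32 with bits == 32 and at offset 64 with bits == 64, so only one of the two widths can line up with the device that follows.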
> Is there any way we can know from the dumpfile whether these gp
> registers (and similar registers) are 32 bits or 64 bits?
I don't know. If what you say is true, when would those registers
ever be 32-bit values?
Dave
14 years, 1 month
Re: [Crash-utility] [patch] crash on a KVM-generated dump
by Sami Liedes
On Mon, Oct 18, 2010 at 05:01:39PM -0400, Dave Anderson wrote:
> Hi Sami,
>
> Can you try the attached patch on your dumpfile containing the "slirp" device?
>
> Given that the crash utility really only cares about the "ram", "cpu" and
> "cpu_common" devices, when the patch encounters a device like "slirp" that's
> not in the existing devices table, it just skips it and searches for the next
> "known" device.
Of course. Seems to work fine.
Sami
14 years, 1 month
Re: [Crash-utility] trace.so failing to load on newer kernels
by Dave Anderson
----- "Jeff Moyer" <jmoyer(a)redhat.com> wrote:
> Hi,
>
> I was trying to use the trace.so extension module, but it was bailing
> out early with no explanation. I tracked it down to the fact that the
> system member of the trace_event_call structure no longer exists. It
> was moved up to the class structure. The change was introduced in this
> upstream commit:
>
> commit 8f0820183056ad26dabc0202115848a92f1143fc
> Author: Steven Rostedt <srostedt(a)redhat.com>
> Date: Tue Apr 20 10:47:33 2010 -0400
>
> tracing: Create class struct for events
>
> I don't have the cycles to fix this up right now, so I was hoping
> someone else would. ;-) Bonus points for printing useful error messages
> when the module fails to load for some reason.
>
> Cheers,
> Jeff
(I've added the author Lai Jiangshan to the cc: list to address this issue.)
Hello Lai,
Can I also make a couple suggestions/requests when you fix this issue?
(1) There should be a protection mechanism in place to prevent the use
of a bogus structure member offset in any virtual address calculation.
(2) I've also run into the frustration of trying to figure out which
of the multiple possible failure reasons applies when the
ftrace_init command fails to load, so I agree with Jeff that it would be
very helpful to put some error messages in place.
(3) When you make the fix for the movement of the "system" member
from the ftrace_event_call to the ftrace_event_class structure,
please make it backwards-compatible so that the module still
works for earlier kernels.
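
The three requests above can be sketched together in C. This is only an illustration under stated assumptions: the struct and member names come from this thread, but the lookup table, the offset values used with it, INVALID_OFFSET, and the function names are hypothetical and do not reflect the crash or trace.so API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INVALID_OFFSET (-1L)    /* hypothetical "member not found" marker */

/* Hypothetical debuginfo table standing in for a real symbol lookup. */
struct member_info {
    const char *type;
    const char *member;
    long offset;
};

static long lookup_offset(const struct member_info *tbl, size_t n,
                          const char *type, const char *member)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].type, type) == 0 &&
            strcmp(tbl[i].member, member) == 0)
            return tbl[i].offset;
    return INVALID_OFFSET;
}

enum system_location { SYSTEM_IN_CALL, SYSTEM_IN_CLASS, SYSTEM_NOT_FOUND };

/* Try the older layout first, then the newer one. The offset is never
 * handed back as usable unless one of the lookups succeeded. */
static enum system_location locate_system_member(const struct member_info *tbl,
                                                 size_t n, long *offset_out)
{
    long off;

    assert(offset_out != NULL);

    off = lookup_offset(tbl, n, "ftrace_event_call", "system");
    if (off != INVALID_OFFSET) {
        *offset_out = off;
        return SYSTEM_IN_CALL;
    }

    off = lookup_offset(tbl, n, "ftrace_event_class", "system");
    if (off != INVALID_OFFSET) {
        *offset_out = off;
        return SYSTEM_IN_CLASS;
    }

    /* Say why initialization cannot continue instead of failing silently. */
    fprintf(stderr, "ftrace_init: no \"system\" member in ftrace_event_call "
                    "or ftrace_event_class\n");
    *offset_out = INVALID_OFFSET;
    return SYSTEM_NOT_FOUND;
}
```

A caller would refuse to compute any virtual address when SYSTEM_NOT_FOUND is returned, which covers the protection request, and the result tells it which structure to read the member from on a given kernel.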
Thanks,
Dave
14 years, 1 month
Re: [Crash-utility] [patch] crash on a KVM-generated dump
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Sami Liedes" <sliedes(a)cc.hut.fi> wrote:
>
> > On Fri, Oct 08, 2010 at 02:48:11PM -0400, Dave Anderson wrote:
> > > Can you send me the -d1 output from that dumpfile session with
> > > your slirp-patch applied? Like this:
> > >
> > > # crash -d1 vmlinux dumpfile > /tmp/junk
> > > q
> > > #
> >
> > Of course. Attached.
>
> OK, I was hoping that perhaps it would show up after the cpu and cpu_common
> devices, because if that were the case, we could just bail out on reading
> any more devices.
>
> As far as the slirp device handling, I got this response from Paolo:
>
> > slirp is not supported on RHEL, so I never encountered it.
> >
> > Looking at QEMU's source code, SLIRP savevm/loadvm is a mess, and 131 is
> > definitely not ok in general because there are several variable-length
> > fields. Looking at it later.
> >
> > Paolo
Hi Sami,
Can you try the attached patch on your dumpfile containing the "slirp" device?
Given that the crash utility really only cares about the "ram", "cpu" and
"cpu_common" devices, when the patch encounters a device like "slirp" that's
not in the existing devices table, it just skips it and searches for the next
"known" device.
Dave
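
The skip-and-search behavior described in the message above can be sketched in C as follows. This is not the attached patch: it assumes, purely for illustration, that each device name in the dump is stored as a one-byte length followed by the name bytes, and find_next_known_device() is a hypothetical helper that works on an in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The device names the message says crash cares about. */
static const char *const known_devices[] = { "ram", "cpu", "cpu_common" };

/* Return the offset of the next length-prefixed known device name at or
 * after `start`, or -1 if none is found. On success *name_out points at
 * the matching entry in known_devices. */
static long find_next_known_device(const unsigned char *buf, size_t len,
                                   size_t start, const char **name_out)
{
    const size_t ndev = sizeof known_devices / sizeof known_devices[0];

    assert(buf != NULL && name_out != NULL);

    for (size_t pos = start; pos < len; pos++) {
        for (size_t d = 0; d < ndev; d++) {
            size_t n = strlen(known_devices[d]);

            /* The length byte must match, so "cpu" is not mistaken for
             * the first three bytes of "cpu_common". */
            if ((size_t)buf[pos] == n && pos + 1 + n <= len &&
                memcmp(buf + pos + 1, known_devices[d], n) == 0) {
                *name_out = known_devices[d];
                return (long)pos;
            }
        }
    }
    return -1;
}
```

With this approach the bytes of an unrecognized device such as "slirp" are simply stepped over, so its size never needs to be known. The trade-off is that a byte pattern inside the skipped data could be mistaken for a device name.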
14 years, 1 month
Another backtrace problem when running into exception stack(x86_64)
by Hu Tao
Hi Dave,
When we run into an exception stack (test module attached) and take a
dumpfile with virsh dump, crash doesn't seem to show the backtrace
properly.
crash shows:
crash> bt -a
PID: 1115 TASK: ffff88001e082d60 CPU: 0 COMMAND: "bash"
#0 [ffff88001f8a3c58] schedule at ffffffff813e9a41
#1 [ffff88001f8a3c60] _raw_spin_unlock at ffffffff813eb486
#2 [ffff88001f8a3c70] vt_console_print at ffffffff8123fc17
#3 [ffff88001f8a3cf0] _raw_spin_unlock_irqrestore at ffffffff813eb4c3
#4 [ffff88001f8a3d10] _raw_spin_unlock_irqrestore at ffffffff813eb4c3
#5 [ffff88001f8a3d20] release_console_sem at ffffffff8103c6a6
#6 [ffff88001f8a3d50] vprintk at ffffffff8103cca0
#7 [ffff88001f8a3df0] printk at ffffffff813e90e4
#8 [ffff88001f8a3e50] __handle_sysrq at ffffffff8124585d
#9 [ffff88001f8a3e90] write_sysrq_trigger at ffffffff8124593f
#10 [ffff88001f8a3eb0] proc_reg_write at ffffffff8112b9f0
#11 [ffff88001f8a3f00] vfs_write at ffffffff810e9101
#12 [ffff88001f8a3f40] sys_write at ffffffff810e9213
#13 [ffff88001f8a3f80] system_call_fastpath at ffffffff81002a82
RIP: 00007f26509b2200 RSP: 00007fff188c9900 RFLAGS: 00010206
RAX: 0000000000000001 RBX: ffffffff81002a82 RCX: 0000000000000400
RDX: 0000000000000002 RSI: 00007f2651293000 RDI: 0000000000000001
RBP: 00007f2651293000 R8: 000000000000000a R9: 00007f2651296700
R10: 00000000ffffffff R11: 0000000000000246 R12: 00007f2650c57780
R13: 0000000000000002 R14: 0000000000000002 R15: 0000000000000000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
The module's output:
[root@localhost ~]# insmod km_kprobe.ko
kprobe registered
[root@localhost ~]# echo q > /proc/sysrq-trigger
SysRq : Show clockevent devices & pending hrtimers (no others)
post_handler will loop.
Pid: 1116, comm: bash Not tainted 2.6.36-rc6-00140-g183b469-dirty #5
Call Trace:
<#DB> ffff880001a09da0 [<ffffffff8124552e>] ? sysrq_handle_show_timers+0x1/0xb
ffff880001a09dc0 [<ffffffffa00ea017>] handler_post+0x17/0x1c [km_kprobe]
ffff880001a09dd0 [<ffffffff813edb44>] kprobe_exceptions_notify+0x376/0x460
ffff880001a09df0 [<ffffffff8124552d>] ? sysrq_handle_show_timers+0x0/0xb
ffff880001a09e00 [<ffffffff813ed971>] ? kprobe_exceptions_notify+0x1a3/0x460
ffff880001a09e40 [<ffffffff813ee7e7>] notifier_call_chain+0x32/0x5e
ffff880001a09e80 [<ffffffff813ee850>] __atomic_notifier_call_chain+0x3d/0x6b
ffff880001a09ec0 [<ffffffff813ee88d>] atomic_notifier_call_chain+0xf/0x11
ffff880001a09ed0 [<ffffffff813ee8bd>] notify_die+0x2e/0x30
ffff880001a09f00 [<ffffffff813ec035>] do_debug+0x93/0x156
ffff880001a09f50 [<ffffffff813ebbb8>] debug+0x28/0x40
ffff880001a09fd8 [<ffffffff8124552e>] ? sysrq_handle_show_timers+0x1/0xb
<<EOE>> ffff88001d99fe50 [<ffffffff8124585d>] ? __handle_sysrq+0xba/0x156
ffff88001d99fe90 [<ffffffff8124593f>] write_sysrq_trigger+0x46/0x4e
ffff88001d99fea0 [<ffffffff812458f9>] ? write_sysrq_trigger+0x0/0x4e
ffff88001d99feb0 [<ffffffff8112b9f0>] proc_reg_write+0x8d/0xac
ffff88001d99ff00 [<ffffffff810e9101>] vfs_write+0xa9/0x105
ffff88001d99ff40 [<ffffffff810e9213>] sys_write+0x45/0x69
ffff88001d99ff80 [<ffffffff81002a82>] system_call_fastpath+0x16/0x1b
Notice that the two backtraces differ: crash doesn't show the
exception stack frames that precede <<EOE>>. Although crash does check the
exception stacks against sp, the problem seems to be that we can't get the
right sp value (ffff880001a09da0 in this example) from the dump file. Am I
right? Is there any way we can get the right sp value from the dump file?
I manually checked these stack frames using crash on the dump
file (rd, then sym), and the stack contents are the same as the module's
output.
BTW, bt -S ffff880001a09da0 causes crash to seg fault.
--
Thanks,
Hu Tao
14 years, 1 month
trace.so failing to load on newer kernels
by Jeff Moyer
Hi,
I was trying to use the trace.so extension module, but it was bailing
out early with no explanation. I tracked it down to the fact that the
system member of the trace_event_call structure no longer exists. It
was moved up to the class structure. The change was introduced in this
upstream commit:
commit 8f0820183056ad26dabc0202115848a92f1143fc
Author: Steven Rostedt <srostedt(a)redhat.com>
Date: Tue Apr 20 10:47:33 2010 -0400
tracing: Create class struct for events
I don't have the cycles to fix this up right now, so I was hoping
someone else would. ;-) Bonus points for printing useful error messages
when the module fails to load for some reason.
Cheers,
Jeff
14 years, 1 month
Re: [Crash-utility] [patch] crash on a KVM-generated dump
by Dave Anderson
----- "Sami Liedes" <sliedes(a)cc.hut.fi> wrote:
> On Fri, Oct 08, 2010 at 02:48:11PM -0400, Dave Anderson wrote:
> > Can you send me the -d1 output from that dumpfile session with
> > your slirp-patch applied? Like this:
> >
> > # crash -d1 vmlinux dumpfile > /tmp/junk
> > q
> > #
>
> Of course. Attached.
OK, I was hoping that perhaps it would show up after the cpu and cpu_common
devices, because if that were the case, we could just bail out on reading
any more devices.
> I also got this on stderr:
>
> ------------------------------------------------------------
> WARNING: Because this kernel was compiled with gcc version 4.4.5, certain
> commands or command options may fail unless crash is invoked with
> the "--readnow" command line option.
> ------------------------------------------------------------
>
That debug message is essentially harmless/useless. There was a gcc version
in the 3.4.0 timeframe that required the embedded gdb to have --readnow passed
to it in order to gather the required debuginfo data. It was fixed subsequently,
although I'm not sure in which gcc version. I should probably at least cap the
message at gcc 4.x.x or something like that.
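
Capping the warning could look roughly like the C sketch below. It is an assumption-laden illustration, not crash's code: the function names are hypothetical, and both bounds are passed in by the caller because the thread does not establish which gcc release fixed the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct gcc_version {
    int major, minor, patch;
};

/* Parse "X.Y.Z"; returns false if the string does not have three parts. */
static bool parse_gcc_version(const char *s, struct gcc_version *v)
{
    assert(s != NULL && v != NULL);
    return sscanf(s, "%d.%d.%d", &v->major, &v->minor, &v->patch) == 3;
}

/* Returns <0, 0 or >0, like strcmp(). */
static int version_cmp(const struct gcc_version *a, const struct gcc_version *b)
{
    if (a->major != b->major)
        return a->major - b->major;
    if (a->minor != b->minor)
        return a->minor - b->minor;
    return a->patch - b->patch;
}

/* Warn only for versions in [first_affected, cap). */
static bool should_warn_readnow(const struct gcc_version *v,
                                const struct gcc_version *first_affected,
                                const struct gcc_version *cap)
{
    return version_cmp(v, first_affected) >= 0 && version_cmp(v, cap) < 0;
}
```

If the bounds were set to 3.4.0 and 4.0.0 (the second being a guess at "gcc 4.x.x"), a kernel built with gcc 4.4.5 would no longer trigger the message, while one built with 3.4.6 still would.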
As far as the slirp device handling, I got this response from Paolo:
> slirp is not supported on RHEL, so I never encountered it.
>
> Looking at QEMU's source code, SLIRP savevm/loadvm is a mess, and 131 is
> definitely not ok in general because there are several variable-length
> fields. Looking at it later.
>
> Paolo
Thanks,
Dave
14 years, 1 month
Re: [Crash-utility] [patch] crash on a KVM-generated dump
by Sami Liedes
On Fri, Oct 08, 2010 at 02:48:11PM -0400, Dave Anderson wrote:
> Can you send me the -d1 output from that dumpfile session with
> your slirp-patch applied? Like this:
>
> # crash -d1 vmlinux dumpfile > /tmp/junk
> q
> #
Of course. Attached.
I also got this on stderr:
------------------------------------------------------------
WARNING: Because this kernel was compiled with gcc version 4.4.5, certain
commands or command options may fail unless crash is invoked with
the "--readnow" command line option.
------------------------------------------------------------
Sami
14 years, 1 month
Re: [Crash-utility] [patch] crash on a KVM-generated dump
by Dave Anderson
----- "Sami Liedes" <sliedes(a)cc.hut.fi> wrote:
> On Fri, Oct 08, 2010 at 11:26:35AM -0400, Dave Anderson wrote:
> > Looking at the qemu-kvm sources, it's not obvious to me what the size
> of the "slirp" device would be in the dumpfile. And apparently
> > Red Hat kernels don't use that device or somebody else would have
> > bumped into it, but I'll check with Paolo Bonzini to verify the number.
>
> I actually ran into it with KVM under virsh. The section disappears if
> there's no -net user option to the kvm.
>
> Sami
Can you send me the -d1 output from that dumpfile session with
your slirp-patch applied? Like this:
# crash -d1 vmlinux dumpfile > /tmp/junk
q
#
Thanks,
Dave
14 years, 1 month
Re: [Crash-utility] [patch] crash on a KVM-generated dump
by Sami Liedes
On Fri, Oct 08, 2010 at 11:26:35AM -0400, Dave Anderson wrote:
> Looking at the qemu-kvm sources, it's not obvious to me what the size
> of the "slirp" device would be in the dumpfile. And apparently
> Red Hat kernels don't use that device or somebody else would have
> bumped into it, but I'll check with Paolo Bonzini to verify the number.
I actually ran into it with KVM under virsh. The section disappears if
there's no -net user option to the kvm.
Sami
14 years, 1 month