October 2010 - Crash-utility - Crash Utility List Archives

Re: [Crash-utility] [PATCH] bug on get_be_long() and improvement of bt

by Dave Anderson

----- "Hu Tao" <hutao(a)cn.fujitsu.com> wrote:

> Hi Dave,
>
> These are updated patches tested with SMP system and panic task.
>
> When testing an x86 guest, I found another bug about reading cpu
> registers from dumpfile. Qemu simulated system is x86_64
> (qemu-system-x86_64), guest OS is x86. When crash reads cpu registers
> from dumpfile, it uses cpu_load_32(), this will read gp registers by
> get_be_long(fp, 32), that is, treat them as 32 bits. But in fact,
> qemu-system-x86_64 saves 64 bits for each of them (although guest OS
> uses only lower 32 bits). As a result, crash gets wrong cpu gp
> register values.

As I understand it, you're running a 32-bit guest on a 64-bit host.
If you were to read 64-bit register values instead of 32-bit register
values, wouldn't that cause the subsequent get_xxx() calls in cpu_load()
to read from the wrong file offsets? And then that would leave the
ending file offset incorrect, such that the qemu_load() loop would fail
to find the next device?

In other words, the cpu_load() function, which is used for both 32-bit
and 64-bit guests, must be reading the correct amount of data from the
"cpu" device, or else qemu_load() would fail to find the next device in
the next location in the dumpfile.

> Is there any way we can know from dumpfile that these gp
> registers (and those similar registers) are 32 bits or 64 bits?

I don't know. If what you say is true, when would those registers
ever be 32-bit values?

Dave
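
The offset argument above can be illustrated with a small sketch. This is not crash's code (crash is written in C); the helper names `get_be` and `load_cpu`, the five-word record layout, and the `MARKER` value are all hypothetical, chosen only to show how reading registers at the wrong width shifts every later read.

```python
import io
import struct

MARKER = 0xCAFEF00D  # hypothetical field that follows the registers


def get_be(fp, bits):
    """Read one big-endian unsigned integer of the given width from fp."""
    fmt = {16: ">H", 32: ">I", 64: ">Q"}[bits]
    size = struct.calcsize(fmt)
    data = fp.read(size)
    if len(data) != size:
        raise EOFError("short read")
    return struct.unpack(fmt, data)[0]


def load_cpu(buf, reg_bits):
    """Read two registers of reg_bits each, then a 32-bit field.

    Returns (registers, field, ending file offset).
    """
    fp = io.BytesIO(buf)
    regs = [get_be(fp, reg_bits) for _ in range(2)]
    field = get_be(fp, 32)
    return regs, field, fp.tell()


# Hypothetical record written with 32-bit registers, followed by MARKER
# and two further words that belong to whatever comes next in the file.
buf = struct.pack(">IIIII", 0x11111111, 0x22222222, MARKER,
                  0x33333333, 0x44444444)

regs32, field32, end32 = load_cpu(buf, 32)  # widths match the writer
regs64, field64, end64 = load_cpu(buf, 64)  # widths do not match
```

With matching widths the field after the registers is `MARKER` and the read ends at offset 12. With 64-bit reads the two "registers" swallow the marker and the following word, the next field is read from the wrong place, and the ending offset is 20, so anything that expects the next device at offset 12 would miss it.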

Re: [Crash-utility] [patch] crash on a KVM-generated dump

by Sami Liedes

On Mon, Oct 18, 2010 at 05:01:39PM -0400, Dave Anderson wrote:
> Hi Sami,
>
> Can you try the attached patch on your dumpfile containing the "slirp" device?
>
> Given that the crash utility really only cares about the "ram", "cpu" and
> "cpu_common" devices, when the patch encounters a device like "slirp" that's
> not in the existing devices table, it just skips it and searches for the next
> "known" device.

Of course. Seems to work fine.

Sami

Re: [Crash-utility] trace.so failing to load on newer kernels

by Dave Anderson

----- "Jeff Moyer" <jmoyer(a)redhat.com> wrote:

> Hi,
>
> I was trying to use the trace.so extension module, but it was bailing
> out early with no explanation. I tracked it down to the fact that the
> system member of the trace_event_call structure no longer exists. It
> was moved up to the class structure. The change was introduced in this
> upstream commit:
>
> commit 8f0820183056ad26dabc0202115848a92f1143fc
> Author: Steven Rostedt <srostedt(a)redhat.com>
> Date:   Tue Apr 20 10:47:33 2010 -0400
>
>     tracing: Create class struct for events
>
> I don't have the cycles to fix this up right now, so I was hoping
> someone else would. ;-) Bonus points for printing useful error messages
> when the module fails to load for some reason.
>
> Cheers,
> Jeff

(I've added the author Lai Jiangshan to the cc: list to address this issue.)

Hello Lai,

Can I also make a couple suggestions/requests when you fix this issue?

(1) There should be a protection mechanism in place to prevent the use
    of a bogus structure member offset in any virtual address calculation.
(2) I've also run into the frustration of trying to figure out which of
    the multiple possible failure reasons has occurred when the
    ftrace_init command fails to load, so I agree with Jeff that it
    would be very helpful to put some error messages in place.
(3) When you make the fix for the movement of the "system" member from
    the ftrace_event_call to the ftrace_event_class structure, please
    make it backwards-compatible so that the module still works for
    earlier kernels.

Thanks,
Dave
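
The three requests above (guard against bogus offsets, report why loading failed, stay backwards-compatible) can be sketched together. This is only an illustration in Python, not the trace.so extension's code: the offset tables, the numeric offsets, and the helper names `member_offset`, `locate_system_member` and `member_address` are hypothetical. Only the structure and member names come from the thread.

```python
INVALID_OFFSET = -1  # hypothetical sentinel for "member not found"


def member_offset(table, struct_name, member):
    """Look up a member offset; INVALID_OFFSET if the kernel lacks it."""
    return table.get((struct_name, member), INVALID_OFFSET)


def locate_system_member(table):
    """Find where the "system" member lives for this kernel.

    Tries the older location (ftrace_event_call) first, then the newer
    one (ftrace_event_class), and fails with a descriptive message.
    """
    for struct_name in ("ftrace_event_call", "ftrace_event_class"):
        offset = member_offset(table, struct_name, "system")
        if offset != INVALID_OFFSET:
            return struct_name, offset
    raise LookupError(
        "ftrace_init: no 'system' member in ftrace_event_call "
        "or ftrace_event_class"
    )


def member_address(base, offset):
    """Refuse to compute an address from an offset that was never set."""
    if offset < 0:
        raise ValueError("invalid structure member offset: %d" % offset)
    return base + offset


# Made-up offset tables standing in for an older and a newer kernel.
old_kernel = {("ftrace_event_call", "system"): 16}
new_kernel = {("ftrace_event_class", "system"): 0}
neither = {}
```

The lookup order makes the same module work on both layouts, and a missing member produces a message naming what was searched for instead of a silent early exit. On a newer kernel the caller would still have to follow the event's class pointer before applying the offset; that step is left out here.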

Re: [Crash-utility] [patch] crash on a KVM-generated dump

by Dave Anderson

----- "Dave Anderson" <anderson(a)redhat.com> wrote:

> ----- "Sami Liedes" <sliedes(a)cc.hut.fi> wrote:
>
> > On Fri, Oct 08, 2010 at 02:48:11PM -0400, Dave Anderson wrote:
> > > Can you send me the -d1 output from that dumpfile session with
> > > your slirp-patch applied? Like this:
> > >
> > > # crash -d1 vmlinux dumpfile > /tmp/junk
> > > q
> > > #
> >
> > Of course. Attached.
>
> OK, I was hoping that perhaps it would show up after the cpu and cpu_common
> devices, because if that were the case, we could just bail out on reading
> any more devices.
>
> As far as the slirp device handling, I got this response from Paolo:
>
> > slirp is not supported on RHEL, so I never encountered it.
> >
> > Looking at QEMU's source code, SLIRP savevm/loadvm is a mess, and 131 is
> > definitely not ok in general because there are several variable-length
> > fields. Looking at it later.
> >
> > Paolo

Hi Sami,

Can you try the attached patch on your dumpfile containing the "slirp" device?

Given that the crash utility really only cares about the "ram", "cpu" and
"cpu_common" devices, when the patch encounters a device like "slirp" that's
not in the existing devices table, it just skips it and searches for the next
"known" device.

Dave
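
The skip-and-search idea described above can be sketched as follows. This is not the attached patch, which is not reproduced in the archive. The buffer layout is a toy one: it assumes each device name is stored with a one-byte length prefix, and the function name `find_next_known` is hypothetical. Only the three device names come from the message.

```python
# Device names the crash utility cares about, per the message above.
KNOWN_DEVICES = (b"ram", b"cpu", b"cpu_common")


def find_next_known(buf, start=0):
    """Return (offset, name) of the earliest known device at or after start.

    Assumes (for this toy layout only) that each device name is preceded
    by a one-byte length, which also keeps "cpu" from matching inside
    "cpu_common". Returns None if no known device follows.
    """
    best = None
    for name in KNOWN_DEVICES:
        needle = bytes([len(name)]) + name
        pos = buf.find(needle, start)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, name.decode())
    return best


# Toy dump: a "ram" section, an unknown "slirp" section of a size the
# reader cannot compute, then a "cpu" section.
buf = (b"\x03ram" + b"\x00" * 4 +
       b"\x05slirp" + b"\xff" * 7 +
       b"\x03cpu" + b"\x01\x02")

first = find_next_known(buf, 0)
after_ram = find_next_known(buf, 4)  # skips over "slirp" entirely
```

Because the reader never needs the size of the unknown section, variable-length devices such as "slirp" stop being a problem. The cost is that a plain byte search can in principle match device-name bytes that happen to occur inside another section's payload.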

Another backtrace problem when running into exception stack(x86_64)

by Hu Tao

Hi Dave,

When we run into an exception stack (test module attached) and take a
dumpfile by virsh dump, it seems crash doesn't show the backtrace
properly. crash shows:

crash> bt -a
PID: 1115   TASK: ffff88001e082d60  CPU: 0   COMMAND: "bash"
 #0 [ffff88001f8a3c58] schedule at ffffffff813e9a41
 #1 [ffff88001f8a3c60] _raw_spin_unlock at ffffffff813eb486
 #2 [ffff88001f8a3c70] vt_console_print at ffffffff8123fc17
 #3 [ffff88001f8a3cf0] _raw_spin_unlock_irqrestore at ffffffff813eb4c3
 #4 [ffff88001f8a3d10] _raw_spin_unlock_irqrestore at ffffffff813eb4c3
 #5 [ffff88001f8a3d20] release_console_sem at ffffffff8103c6a6
 #6 [ffff88001f8a3d50] vprintk at ffffffff8103cca0
 #7 [ffff88001f8a3df0] printk at ffffffff813e90e4
 #8 [ffff88001f8a3e50] __handle_sysrq at ffffffff8124585d
 #9 [ffff88001f8a3e90] write_sysrq_trigger at ffffffff8124593f
#10 [ffff88001f8a3eb0] proc_reg_write at ffffffff8112b9f0
#11 [ffff88001f8a3f00] vfs_write at ffffffff810e9101
#12 [ffff88001f8a3f40] sys_write at ffffffff810e9213
#13 [ffff88001f8a3f80] system_call_fastpath at ffffffff81002a82
    RIP: 00007f26509b2200  RSP: 00007fff188c9900  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: ffffffff81002a82  RCX: 0000000000000400
    RDX: 0000000000000002  RSI: 00007f2651293000  RDI: 0000000000000001
    RBP: 00007f2651293000   R8: 000000000000000a   R9: 00007f2651296700
    R10: 00000000ffffffff  R11: 0000000000000246  R12: 00007f2650c57780
    R13: 0000000000000002  R14: 0000000000000002  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

The module's output:

[root@localhost ~]# insmod km_kprobe.ko
kprobe registered
[root@localhost ~]# echo q > /proc/sysrq-trigger
SysRq : Show clockevent devices & pending hrtimers (no others)
post_handler will loop.
Pid: 1116, comm: bash Not tainted 2.6.36-rc6-00140-g183b469-dirty #5
Call Trace:
 <#DB>
 ffff880001a09da0 [<ffffffff8124552e>] ? sysrq_handle_show_timers+0x1/0xb
 ffff880001a09dc0 [<ffffffffa00ea017>] handler_post+0x17/0x1c [km_kprobe]
 ffff880001a09dd0 [<ffffffff813edb44>] kprobe_exceptions_notify+0x376/0x460
 ffff880001a09df0 [<ffffffff8124552d>] ? sysrq_handle_show_timers+0x0/0xb
 ffff880001a09e00 [<ffffffff813ed971>] ? kprobe_exceptions_notify+0x1a3/0x460
 ffff880001a09e40 [<ffffffff813ee7e7>] notifier_call_chain+0x32/0x5e
 ffff880001a09e80 [<ffffffff813ee850>] __atomic_notifier_call_chain+0x3d/0x6b
 ffff880001a09ec0 [<ffffffff813ee88d>] atomic_notifier_call_chain+0xf/0x11
 ffff880001a09ed0 [<ffffffff813ee8bd>] notify_die+0x2e/0x30
 ffff880001a09f00 [<ffffffff813ec035>] do_debug+0x93/0x156
 ffff880001a09f50 [<ffffffff813ebbb8>] debug+0x28/0x40
 ffff880001a09fd8 [<ffffffff8124552e>] ? sysrq_handle_show_timers+0x1/0xb
 <<EOE>>
 ffff88001d99fe50 [<ffffffff8124585d>] ? __handle_sysrq+0xba/0x156
 ffff88001d99fe90 [<ffffffff8124593f>] write_sysrq_trigger+0x46/0x4e
 ffff88001d99fea0 [<ffffffff812458f9>] ? write_sysrq_trigger+0x0/0x4e
 ffff88001d99feb0 [<ffffffff8112b9f0>] proc_reg_write+0x8d/0xac
 ffff88001d99ff00 [<ffffffff810e9101>] vfs_write+0xa9/0x105
 ffff88001d99ff40 [<ffffffff810e9213>] sys_write+0x45/0x69
 ffff88001d99ff80 [<ffffffff81002a82>] system_call_fastpath+0x16/0x1b

Notice that the two backtrace outputs differ in the frames above
<<EOE>>: crash doesn't show the exception stack frames. Although crash
does check the exception stack against sp, the problem seems to be that
we can't get the right sp value (ffff880001a09da0 in this example) from
the dump file. Am I right? Is there any way we can get the right sp
value from the dump file?

I did a manual check on these stack frames using crash on the dump
file (rd then sym), and the stack contents are the same as the module's
output.

BTW, bt -S ffff880001a09da0 causes crash to seg fault.

--
Thanks,
Hu Tao

trace.so failing to load on newer kernels

by Jeff Moyer

Hi,

I was trying to use the trace.so extension module, but it was bailing
out early with no explanation. I tracked it down to the fact that the
system member of the trace_event_call structure no longer exists. It
was moved up to the class structure. The change was introduced in this
upstream commit:

commit 8f0820183056ad26dabc0202115848a92f1143fc
Author: Steven Rostedt <srostedt(a)redhat.com>
Date:   Tue Apr 20 10:47:33 2010 -0400

    tracing: Create class struct for events

I don't have the cycles to fix this up right now, so I was hoping
someone else would. ;-) Bonus points for printing useful error messages
when the module fails to load for some reason.

Cheers,
Jeff

Re: [Crash-utility] [patch] crash on a KVM-generated dump

by Dave Anderson

----- "Sami Liedes" <sliedes(a)cc.hut.fi> wrote:

> On Fri, Oct 08, 2010 at 02:48:11PM -0400, Dave Anderson wrote:
> > Can you send me the -d1 output from that dumpfile session with
> > your slirp-patch applied? Like this:
> >
> > # crash -d1 vmlinux dumpfile > /tmp/junk
> > q
> > #
>
> Of course. Attached.

OK, I was hoping that perhaps it would show up after the cpu and cpu_common
devices, because if that were the case, we could just bail out on reading
any more devices.

> I also got this on stderr:
>
> ------------------------------------------------------------
> WARNING: Because this kernel was compiled with gcc version 4.4.5, certain
> commands or command options may fail unless crash is invoked with
> the "--readnow" command line option.
> ------------------------------------------------------------
>

That debug message is essentially harmless/useless. There was a gcc version
in the 3.4.0 timeframe that required the embedded gdb to have --readnow
passed to it in order to gather the required debuginfo data. It was fixed
subsequently, although I'm not sure in which gcc version. I should probably
at least cap the message at gcc 4.x.x or something like that.

As far as the slirp device handling, I got this response from Paolo:

> slirp is not supported on RHEL, so I never encountered it.
>
> Looking at QEMU's source code, SLIRP savevm/loadvm is a mess, and 131 is
> definitely not ok in general because there are several variable-length
> fields. Looking at it later.
>
> Paolo

Thanks,
Dave

Re: [Crash-utility] [patch] crash on a KVM-generated dump

by Sami Liedes

On Fri, Oct 08, 2010 at 02:48:11PM -0400, Dave Anderson wrote:
> Can you send me the -d1 output from that dumpfile session with
> your slirp-patch applied? Like this:
>
> # crash -d1 vmlinux dumpfile > /tmp/junk
> q
> #

Of course. Attached.

I also got this on stderr:

------------------------------------------------------------
WARNING: Because this kernel was compiled with gcc version 4.4.5, certain
commands or command options may fail unless crash is invoked with
the "--readnow" command line option.
------------------------------------------------------------

Sami

Re: [Crash-utility] [patch] crash on a KVM-generated dump

by Dave Anderson

----- "Sami Liedes" <sliedes(a)cc.hut.fi> wrote:

> On Fri, Oct 08, 2010 at 11:26:35AM -0400, Dave Anderson wrote:
> > Looking at the qemu-kvm sources, it's not obvious to me what the size
> > of the "slirp" device would be in the dumpfile. And apparently
> > Red Hat kernels don't use that device or somebody else would have
> > bumped into it, but I'll check with Paolo Bonzini to verify the number.
>
> I actually ran into it with KVM under virsh. The section disappears if
> there's no -net user option to the kvm.
>
> Sami

Can you send me the -d1 output from that dumpfile session with
your slirp-patch applied? Like this:

# crash -d1 vmlinux dumpfile > /tmp/junk
q
#

Thanks,
Dave

Re: [Crash-utility] [patch] crash on a KVM-generated dump

by Sami Liedes

On Fri, Oct 08, 2010 at 11:26:35AM -0400, Dave Anderson wrote:
> Looking at the qemu-kvm sources, it's not obvious to me what the size
> of the "slirp" device would be in the dumpfile. And apparently
> Red Hat kernels don't use that device or somebody else would have
> bumped into it, but I'll check with Paolo Bonzini to verify the number.

I actually ran into it with KVM under virsh. The section disappears if
there's no -net user option to the kvm.

Sami
