Re: [PATCH 0/3] Display local variables & function parameters from stack frames
by Dave Anderson
----- "Sharyathi Nagesh" <sharyath(a)in.ibm.com> wrote:
> Dave
> Attaching the patches we have developed so far.
> We have tried to accommodate your suggestions regarding Makefiles in
> these patches. We have tried to provide unwinding support on the ppc64,
> x86_64 and x86 architectures. The feature has been tested on ppc64 with dumps
> taken through panic and echo c > /proc/sysrq-trigger. We have observed that
> the variable information is often optimized out, so accessing it is not
> possible (volatile variables are shown correctly); IMHO this behavior matches
> what gdb shows. Please share your thoughts on this. The implementation works
> reasonably on ppc64: we were able to unwind the stack and get variable
> information. But we are still facing problems on x86 and x86_64: our generic
> implementation of stack unwinding using DWARF is not succeeding, since the
> frame pointer is optimized out by default. IMHO the stack unwinding described
> in the ABIs is not valid for the same reason. But if you enable
> CONFIG_FRAME_POINTER in the kernel, then unwinding via bp becomes possible.
>
> Please let me know your thoughts in general
Well, it's not like I didn't warn you several months ago when you decided
to undertake this task...
My general thoughts are still the same:
- Testing with just panic() and "echo c > /proc/sysrq-trigger" covers only
fairly exceptional ways of bringing the kernel down. Typically a "real"
crash is due to a BUG() or a segmentation violation, or some scenario where
an exception frame is laid down on the stack -- and perhaps a switch is
made to another kernel stack entirely. Coming from panic() or "echo c",
the trace will stay on the kernel stack and no exception will be raised
(unless you're using a RHEL4 or earlier kernel, which called BUG() if either
netdump or diskdump was in play). In any case, without an exception, you're
making it "easy" for the unwinder. Your code is going to have to deal with
exception frames on the stack and with potential stack switches.
- These days you'd be hard-pressed to find a distribution kernel that is
compiled with CONFIG_FRAME_POINTER. And if it is, the handling can
be folded into the currently-existing backtracer. (In fact, the original
x86 backtrace code does have a separate code path for kernels that
were built without -fomit-frame-pointer.) But by the time x86_64 came
around, -fomit-frame-pointer was pretty much the default, and I don't think
I've ever seen an x86_64 dumpfile with frame-pointers, certainly not
from a distributor. So yes, it's nice that you've got something that works
with that configuration, but it's pretty unrealistic.
- The problems you're running into getting local variables are not surprising.
And without a rock-solid backtrace as a prerequisite, it doesn't make any
sense to even try getting local variables.
- Your dependency on getting the starting register set from the ELF header
presumes just that -- i.e., that you've got them and that they are meaningful.
I'm not sure how/if you've worked around the "hand-carved" pt_regs created
when no exception frame has been laid down (i.e. when panic() or "echo c"
are used). But the bigger picture is that there are multiple types of
dumpfiles, and while kdump ELF vmcores are prevalent, they are often only
the starting point. With the huge cumbersome memory sizes of modern machines,
it's more likely that the ELF vmcore seen by the secondary kdump kernel will
be run through "makedumpfile -c ..." into the compressed kdump format
prior to the dump ever being seen by whoever analyzes it. At Red Hat, the
support organization pretty much makes all customers use "makedumpfile -c ..."
by default. And of course with the compressed kdump format, there is no
register set as your starting point.
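As an aside on where that starting register set actually lives: in a kdump ELF
vmcore it sits in NT_PRSTATUS entries inside the PT_NOTE segment. The sketch
below is a standalone illustration of walking the standard ELF note layout to
locate those entries; it is not crash's own get_regs_from_elf_notes(), and
decoding the architecture-specific elf_prstatus payload is left to the caller.

/*
 * Standalone sketch (not crash's get_regs_from_elf_notes()): walk the
 * raw note data from a kdump ELF vmcore's PT_NOTE segment and report
 * each NT_PRSTATUS entry, which is where a cpu's register set lives.
 */
#include <elf.h>
#include <stdio.h>
#include <stddef.h>

#define NOTE_ALIGN(x) (((x) + 3UL) & ~3UL)

void list_prstatus_notes(const unsigned char *notes, size_t len)
{
        size_t off = 0;

        while (off + sizeof(Elf64_Nhdr) <= len) {
                const Elf64_Nhdr *nhdr = (const Elf64_Nhdr *)(notes + off);
                size_t desc_off = off + sizeof(*nhdr) + NOTE_ALIGN(nhdr->n_namesz);
                size_t next_off = desc_off + NOTE_ALIGN(nhdr->n_descsz);

                if (next_off > len || next_off <= off)
                        break;
                if (nhdr->n_type == NT_PRSTATUS)
                        printf("NT_PRSTATUS: %u bytes of prstatus data at note offset %zu\n",
                               (unsigned int)nhdr->n_descsz, desc_off);
                off = next_off;
        }
}

Which is why, as noted above, the compressed kdump format leaves you with no
register set to start from.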
That all being said, here's what I can do for you. I will take your new
functions that you've added to netdump.c, and their protos in defs.h,
and apply them to the next crash utility release. Since they are not being
used by anybody but your module at the moment, it's harmless to add them,
and then you will not have to patch the crash sources at all in your
subsequent module patches/posts.
Dave
>
> Here I am attaching 3 patches
> 1. Display-local-variables-and-function-parameters.patch
> Provides a feature to print local variables and arguments in the current
> stack frame using DWARF information.
>
> 2. Provide-stack-unwinding-feature.patch
> Provides an option to unwind the stack using DWARF information; works
> on ppc64.
>
> 3. unwind_x86_64.patch
> Provides an unwinding feature on x86_64 without using DWARF information,
> but requires CONFIG_FRAME_POINTER to be enabled.
15 years, 6 months
[PATCH 0/3] Display local variables & function parameters from stack frames
by Sharyathi Nagesh
Dave
Attaching the patches we have developed so far.
We have tried to accommodate your suggestions regarding Makefiles in
these patches.
We have tried to provide unwinding support on the ppc64, x86_64 and x86
architectures. The feature has been tested on ppc64 with dumps taken through
panic and echo c > /proc/sysrq-trigger. We have observed that the variable
information is often optimized out, so accessing it is not possible
(volatile variables are shown correctly); IMHO this behavior matches what
gdb shows. Please share your thoughts on this.
The implementation works reasonably on ppc64: we were able to unwind
the stack and get variable information.
But we are still facing problems on x86 and x86_64: our generic
implementation of stack unwinding using DWARF is not succeeding,
since the frame pointer is optimized out by default. IMHO the stack unwinding
described in the ABIs is not valid for the same reason. But if you
enable CONFIG_FRAME_POINTER in the kernel, then unwinding via bp becomes
possible.
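For reference, the bp-based unwinding just mentioned amounts to walking the
saved-rbp chain: with CONFIG_FRAME_POINTER, each frame stores the caller's rbp
at [rbp] and the return address at [rbp + 8]. The sketch below is illustrative
only and is not code from the attached patches; read_u64() is a hypothetical
helper standing in for whatever routine reads memory out of the dump.

/*
 * Illustrative sketch of rbp-chain unwinding on x86_64, assuming the
 * kernel was built with CONFIG_FRAME_POINTER.  Not taken from the
 * attached patches; read_u64() is a hypothetical dump-memory reader.
 */
#include <stdint.h>
#include <stdbool.h>

extern bool read_u64(uint64_t addr, uint64_t *val);   /* hypothetical helper */

void walk_rbp_chain(uint64_t bp, uint64_t stack_base, uint64_t stack_top,
                    void (*report)(uint64_t ip, uint64_t bp))
{
        while (bp >= stack_base && bp + 16 <= stack_top) {
                uint64_t saved_bp, ret_ip;

                if (!read_u64(bp, &saved_bp) ||       /* caller's rbp saved at [rbp] */
                    !read_u64(bp + 8, &ret_ip))       /* return address at [rbp + 8] */
                        break;
                report(ret_ip, bp);
                if (saved_bp <= bp)                   /* chain must move up the stack */
                        break;
                bp = saved_bp;
        }
}

Without the frame pointer there is no such chain, which is why the generic
DWARF-based approach was attempted in the first place.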
Please let me know your thoughts in general
Here I am attaching 3 patches
1. Display-local-variables-and-function-parameters.patch
Provides a feature to print local variables and arguments in the current
stack frame using DWARF information.
2. Provide-stack-unwinding-feature.patch
Provides an option to unwind the stack using DWARF information; works on
ppc64.
3. unwind_x86_64.patch
Provides an unwinding feature on x86_64 without using DWARF information,
but requires CONFIG_FRAME_POINTER to be enabled.
Note: It has been tested with elfutils-0.137 and above.
Thank you
Sharyathi Nagesh
15 years, 6 months
Re: [PATCH 0/2] Display local variables & function parameters from stack frames
by Dave Anderson
> Dave
> We had some observations with x86_64 dumps and wanted to know your
> opinion on them.
> In x86_64 dumps, for active processes we are reading the register
> contents from the ELF notes, and we found that they don't match
> the output of the bt command. The SP and IP values we get
> from the ELF notes break our stack unwinding based on the
> DWARF section information, while the unwinding, at least the first
> stage, works with the SP and IP obtained the way bt does.
> gdb has the same issue; it too breaks when unwinding is
> attempted on this dump.
>
> I wanted to know what you think about this and how we can proceed.
>
> 1. Is it reliable to parse through the stack frame looking for a valid
> address, as is done in 'x86_64_get_dumpfile_stack_frame'? Is it the
> right/safe way to do it, and does any x86 ABI describe it?
I do it that way because I've never wanted to depend upon the ELF prstatus
note section: netdump/diskdump only has the panic cpu's info, and
kdump's sections can be difficult to match to a cpu if there has been any
cpu hot-plugging. Not to mention that there are several other dumpfile
formats supported. You can do it any way you'd like.
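For what it's worth, the "scan the stack for a plausible text address" approach
boils down to something like the sketch below. The helpers are placeholders
rather than actual crash internals, and the real x86_64_get_dumpfile_stack_frame()
does more than this; it only shows the basic idea.

/*
 * Simplified sketch of scanning a task's kernel stack for a value that
 * looks like a kernel text address, to use as a candidate (sp, ip)
 * starting point.  read_u64() and is_kernel_text() are placeholders,
 * not actual crash internals.
 */
#include <stdint.h>
#include <stdbool.h>

extern bool read_u64(uint64_t addr, uint64_t *val);   /* hypothetical dump reader */
extern bool is_kernel_text(uint64_t addr);            /* e.g. falls within _stext.._etext */

bool find_candidate_frame(uint64_t stack_base, uint64_t stack_top,
                          uint64_t *sp_out, uint64_t *ip_out)
{
        for (uint64_t slot = stack_base; slot + 8 <= stack_top; slot += 8) {
                uint64_t word;

                if (!read_u64(slot, &word))
                        continue;
                if (is_kernel_text(word)) {           /* first plausible return address */
                        *sp_out = slot;
                        *ip_out = word;
                        return true;
                }
        }
        return false;
}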
> 2. It looks like we can't safely rely on the ELF notes; is this a known
> issue with kexec dumping?
The "sp: ffff88020f471dc8 Breaks unwinding" issue that you're seeing
is a result of using "echo c > /proc/sysrq-trigger", or if panic() was
called, in which case crash_kexec() is called with a NULL pt_regs pointer.
When that's the case, a "fake" register set is hand-created in
crash_setup_regs(), which is what you are seeing. Check out the kernel
code in crash_setup_regs() -- it just reads the rsp as it was in that
function, and populates the IP with current_text_addr().
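A heavily simplified paraphrase of that fallback path is shown below; it is not
the verbatim kernel source, but it captures the point: with no exception
pt_regs passed in, the registers are snapshotted on the spot, so the saved
sp/ip describe crash_setup_regs()/crash_kexec() rather than the original point
of failure.

/*
 * Heavily simplified paraphrase (not the verbatim kernel source) of the
 * x86_64 crash_setup_regs() fallback: when crash_kexec() is entered
 * with a NULL pt_regs, a "fake" register set is captured in place.
 */
#include <linux/ptrace.h>     /* struct pt_regs */
#include <linux/string.h>     /* memcpy */
#include <asm/processor.h>    /* current_text_addr(), as it existed then */

static inline void crash_setup_regs_sketch(struct pt_regs *newregs,
                                           struct pt_regs *oldregs)
{
        if (oldregs) {
                memcpy(newregs, oldregs, sizeof(*newregs));   /* real exception frame */
                return;
        }
        /* "hand-carved" register set: snapshot the CPU state right here */
        asm volatile("movq %%rsp,%0" : "=m"(newregs->sp));
        asm volatile("movq %%rbp,%0" : "=m"(newregs->bp));
        /* ... the remaining general-purpose registers are captured the same way ... */
        newregs->ip = (unsigned long)current_text_addr();     /* an address inside this code */
}

That is why the IP in the register dump above (ffffffff80255d7b) falls right
next to the crash_kexec address shown by bt (ffffffff80255d9c), rather than
anywhere near the sysrq handler.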
> 3. If parsing the stack frame is the right thing to do, can we modify
> the bt_cmd routines so we can reuse some of them to repopulate our
> register contents, especially esp/eip?
Sorry -- I don't understand what you're asking.
Dave
>
> Scenario we are facing
> ------------------------
> Register contents from the ELF notes (matches gdb output):
>
> crash> local display
>
> IP: ffffffff80255d7b
> ax: 1
> bx: 0
> cx: 6237
> dx: 0
> sp: ffff88020f471dc8 <=== Breaks unwinding
> bp: 0
> si: 0
> di: ffffffff80596ec0
> cs: 10
> oirg_ax: 8241000001b6
> flags: 46
> ip: ffffffff80255d7b
> r8: 0
> r9: ffff880028080c80
> r10: ffff880028080c80
> r11: d805926f0
> r12: 63
> r13: 0
> ------------------------
> crash> bt
> PID: 4814 TASK: ffff8802104397f0 CPU: 3 COMMAND: "bash"
> #0 [ffff88020f471cf0] machine_kexec at ffffffff8021db38
> #1 [ffff88020f471dc0] crash_kexec at ffffffff80255d9c
> #2 [ffff88020f471e80] __handle_sysrq at ffffffff80385756
> #3 [ffff88020f471ec0] write_sysrq_trigger at ffffffff802d291b
> #4 [ffff88020f471ed0] proc_reg_write at ffffffff802cca2d
> #5 [ffff88020f471f10] vfs_write at ffffffff8029125d
> #6 [ffff88020f471f40] sys_write at ffffffff802916e5
> #7 [ffff88020f471f80] system_call_fastpath at ffffffff8020be0b
> RIP: 000000311bcc4150 RSP: 00007fff976f40d0 RFLAGS: 00010202
> RAX: 0000000000000001  RBX: ffffffff8020be0b  RCX: 00000000000003e4
> RDX: 0000000000000002  RSI: 00007f428f6ec000  RDI: 0000000000000001
> RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00007f428f6d86e0
> R10: 0000000000000072  R11: 0000000000000246  R12: 000000311bf4d760
> R13: 00007f428f6ec000  R14: 0000000000000002  R15: 000000008f6ec000
>
> Regards
> Sharyathi N
15 years, 6 months
Re: [PATCH 0/2] Display local variables & function parameters from stack frames
by Sharyathi Nagesh
Dave
We had some observations with x86_64 dumps and wanted to know your
opinion on them.
In x86_64 dumps, for active processes we are reading the register
contents from the ELF notes, and we found that they don't match
the output of the bt command. The SP and IP values we get
from the ELF notes break our stack unwinding based on the
DWARF section information, while the unwinding, at least the first
stage, works with the SP and IP obtained the way bt does.
gdb has the same issue; it too breaks when unwinding is
attempted on this dump.
I wanted to know what you think about this and how we can proceed.
1. Is it reliable to parse through the stack frame looking for a valid
address, as is done in 'x86_64_get_dumpfile_stack_frame'? Is it the
right/safe way to do it, and does any x86 ABI describe it?
2. It looks like we can't safely rely on the ELF notes; is this a known
issue with kexec dumping?
3. If parsing the stack frame is the right thing to do, can we modify
the bt_cmd routines so we can reuse some of them to repopulate our
register contents, especially esp/eip?
Scenario we are facing
------------------------
Register contents from the ELF notes (matches gdb output):
crash> local display
IP: ffffffff80255d7b
ax: 1
bx: 0
cx: 6237
dx: 0
sp: ffff88020f471dc8 <=== Breaks unwinding
bp: 0
si: 0
di: ffffffff80596ec0
cs: 10
oirg_ax: 8241000001b6
flags: 46
ip: ffffffff80255d7b
r8: 0
r9: ffff880028080c80
r10: ffff880028080c80
r11: d805926f0
r12: 63
r13: 0
------------------------
crash> bt
PID: 4814 TASK: ffff8802104397f0 CPU: 3 COMMAND: "bash"
#0 [ffff88020f471cf0] machine_kexec at ffffffff8021db38
#1 [ffff88020f471dc0] crash_kexec at ffffffff80255d9c
#2 [ffff88020f471e80] __handle_sysrq at ffffffff80385756
#3 [ffff88020f471ec0] write_sysrq_trigger at ffffffff802d291b
#4 [ffff88020f471ed0] proc_reg_write at ffffffff802cca2d
#5 [ffff88020f471f10] vfs_write at ffffffff8029125d
#6 [ffff88020f471f40] sys_write at ffffffff802916e5
#7 [ffff88020f471f80] system_call_fastpath at ffffffff8020be0b
RIP: 000000311bcc4150 RSP: 00007fff976f40d0 RFLAGS: 00010202
RAX: 0000000000000001  RBX: ffffffff8020be0b  RCX: 00000000000003e4
RDX: 0000000000000002  RSI: 00007f428f6ec000  RDI: 0000000000000001
RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00007f428f6d86e0
R10: 0000000000000072  R11: 0000000000000246  R12: 000000311bf4d760
R13: 00007f428f6ec000  R14: 0000000000000002  R15: 000000008f6ec000
Regards
Sharyathi N
15 years, 6 months
Fwd: [Crash-utility] crash without namelist?
by Dave Anderson
Thanks Cai -- the capability will be in RHEL5.4 and it
looks to have been put in place in the upstream kernel
and sourceforge makedumpfile already.
The bugzillas referenced below are restricted access so I
removed the numbers, but the option allows you to extract
the log buffer from a kdump vmcore into an output file
like this:
# makedumpfile --dump-dmesg /proc/vmcore outputfile
Dave
----- Forwarded Message -----
From: "CAI Qian" <caiqian(a)redhat.com>
To: anderson(a)redhat.com
Cc: caiqian(a)redhat.com
Sent: Monday, May 11, 2009 7:56:36 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Crash-utility] crash without namelist?
Hi Dave,
From: Dave Anderson <anderson(a)redhat.com>
Subject: Re: [Crash-utility] crash without namelist?
Date: Mon, 11 May 2009 16:55:00 -0400 (EDT)
>
> ----- "urgrue" <urgrue(a)bulbous.org> wrote:
>
>> ----- "Jun Koi" <junkoi2004 gmail com> wrote:
>> >> But I think sometimes we only have the System.map file in hand,
>> >> without the namelist. Can we do anything with that? Because in many
>> >> cases only some restricted operations are enough to debug the crashed
>> >> dump.
>> >
>> >The basic design of the crash utility is that it is essentially
>> >a huge wrapper around the embedded gdb module, taking advantage
>> >of gdb's allowance of an alternative user interface. So during
>>
>> But is it really so that one can not get _any_ useful info from the dump
>> without the namelist?
>> Most of the time just the trace is all I need (i.e. that which is
>> displayed on the console in a panic). That info is not accessible
>> conveniently because a) the console can only show the last bits of it
>> and b) I'd rather autoreboot on a panic.
>> I know I can use netconsole, but it would just be more convenient if I
>> could read this from the core file easily.
>
> I understand, but that's not how the crash utility is designed.
> You basically want to roll your own utility that doesn't have all
> the gdb dependencies that crash has.
>
> If I'm not mistaken, wasn't there a proposed makedumpfile feature that
> would just pull the log buffer from a vmcore? Or is that something
> from the diskdumputils package? I know I've heard talk of it, but
> I can't recall where.
>
Yes, it is in RHEL5.4.
This is the kexec-tools part of changes,
https://bugzilla.redhat.com/show_bug.cgi?id=xxxxxx
This is the kernel part of changes,
https://bugzilla.redhat.com/show_bug.cgi?id=xxxxxx
Thanks,
CAI Qian
15 years, 6 months
Re: [Crash-utility] crash without namelist?
by Dave Anderson
----- "urgrue" <urgrue(a)bulbous.org> wrote:
> ----- "Jun Koi" <junkoi2004 gmail com> wrote:
> >> But I think sometimes we only have the System.map file in hand,
> >> without the namelist. Can we do anything with that? Because in many
> >> cases only some restricted operations are enough to debug the crashed
> >> dump.
> >
> >The basic design of the crash utility is that it is essentially
> >a huge wrapper around the embedded gdb module, taking advantage
> >of gdb's allowance of an alternative user interface. So during
>
> But is it really so that one can not get _any_ useful info from the dump
> without the namelist?
> Most of the time just the trace is all I need (i.e. that which is
> displayed on the console in a panic). That info is not accessible
> conveniently because a) the console can only show the last bits of it
> and b) I'd rather autoreboot on a panic.
> I know I can use netconsole, but it would just be more convenient if I
> could read this from the core file easily.
I understand, but that's not how the crash utility is designed.
You basically want to roll your own utility that doesn't have all
the gdb dependencies that crash has.
If I'm not mistaken, wasn't there a proposed makedumpfile feature that
would just pull the log buffer from a vmcore? Or is that something
from the diskdumputils package? I know I've heard talk of it, but
I can't recall where.
Dave
15 years, 6 months
Re: [Crash-utility] crash without namelist?
by urgrue
----- "Jun Koi" <junkoi2004 gmail com> wrote:
>> But I think sometimes we only have the System.map file in hand,
>> without the namelist. Can we do anything with that? Because in many
>> cases only some restricted operations are enough to debug the crashed
>> dump.
>
>The basic design of the crash utility is that it is essentially
>a huge wrapper around the embedded gdb module, taking advantage
>of gdb's allowance of an alternative user interface. So during
But is it really so that one can not get _any_ useful info from the dump
without the namelist?
Most of the time just the trace is all I need (i.e. that which is
displayed on the console in a panic). That info is not accessible
conveniently because a) the console can only show the last bits of it
and b) I'd rather autoreboot on a panic.
I know I can use netconsole, but it would just be more convenient if I
could read this from the core file easily.
15 years, 6 months
Re: [PATCH 0/2] Display local variables & function parameters from stack frames
by Sharyathi Nagesh
Dave
Excuse me for the late response; my laptop hardware crashed and I had
some difficulty accessing mail.
Packaging and x86_64/x86 support are still work in progress; we are
hitting some unwinding issues with the x86_64 code and trying to fix them
along with the local.mk changes. More help on the packaging front /
rewriting local.mk would be an added favor :)
Dave Anderson wrote:
> ----- "Sharyathi Nagesh" <sharyath(a)in.ibm.com> wrote:
>
>> Hi
>> Mohan, Sachin and I have implemented this feature in crash to
>> display local variables and arguments from vmcore dumps. The feature
>> introduces a new command, 'local', in the crash utility, which provides
>> an interface for stack unwinding along with an option to display local
>> variables and arguments. The patch is based on crash utility
>> crash-4.0-8.9. It has a dependency on the libdw/libelf libraries
>> provided by the elfutils package.
>> This has been tested on dumps taken on a ppc64 machine. We were able to
>> unwind the stack as well as display local variables and arguments. It
>> currently displays values for non-optimized variables only (this follows
>> gdb's convention).
>>
>> TODO Items:
>> 1. Support for x86_64 and x86 needs to be implemented/tested
>> 2. The Makefile needs to be updated to help package this feature
>>
>> Regards
>> Sharyathi Nagesh
>
> A couple suggestions -- move the get_netdump_arch() and get_regs_from_elf_notes()
> prototypes to defs.h under the others listed for netdump.c.
>
> Then remove this from local.c:
>
> + #include <../netdump.h>
>
> By removing the netdump.h inclusion, you can build your package with just
> the "defs.h" file like so:
Sure that can be done
>
> # make -f local.mk TARGET=X86_64
> gcc -nostartfiles -shared -g -rdynamic -o local.so local.c unwind_dw.c -fPIC \
> -ldw -L ../../elfutils-0.137/libdw -I ../../elfutils-0.137/libdw \
> -I ../../elfutils-0.137/libelf/ -DX86_64 -Wall;
> #
>
> Also, the TARGET_FLAGS setting you have in your local.mk doesn't do anything.
> I see that you copied it from the sial.mk file, where it's used
> to replace the suggested "-D$(TARGET) $(TARGET_CFLAGS)" part of the
> compile line. For x86_64 nothing is needed in TARGET_CFLAGS -- these are what
> the supported arches need:
Oops, my apologies for overlooking this; yes, we will correct it.
> #define TARGET_CFLAGS_X86 "TARGET_CFLAGS=-D_FILE_OFFSET_BITS=64"
> #define TARGET_CFLAGS_ALPHA "TARGET_CFLAGS="
> #define TARGET_CFLAGS_PPC "TARGET_CFLAGS=-D_FILE_OFFSET_BITS=64"
> #define TARGET_CFLAGS_IA64 "TARGET_CFLAGS="
> #define TARGET_CFLAGS_S390 "TARGET_CFLAGS=-D_FILE_OFFSET_BITS=64"
> #define TARGET_CFLAGS_S390X "TARGET_CFLAGS="
> #define TARGET_CFLAGS_PPC64 "TARGET_CFLAGS=-m64"
> #define TARGET_CFLAGS_X86_64 "TARGET_CFLAGS="
OK, we will include this after verifying it.
>
> So I presume you do need the -m64 for ppc64, but I don't see how your
> local.mk file would pick it up? I also don't understand where your extra
> $ADD_CFLAGS is supposed to get set up?
This is again copied from some other code and needs to be removed.
>
> For that matter, the additional -L and -I for the elfutils stuff you've added
> seem to be unnecessary; just -ldw seems to suffice:
>
> # make -f local2.mk TARGET=X86_64
> gcc -nostartfiles -shared -g -rdynamic -o local.so local.c unwind_dw.c -fPIC -ldw -DX86_64 -Wall;
> #
Yes, you are right. Along with that, the code has some requirements:
libdw/libelf must already be installed, and the installed elfutils version
must be > 0.125, as the library otherwise has a bug that breaks the code;
this needs to be checked.
Thanks for the information; we will try our best to fix the issues we
are facing and incorporate these changes.
Thanks
Yeehaw
15 years, 6 months
Re: [Crash-utility] x86_64 bt
by Dave Anderson
----- "Shahar Luxenberg" <shahar(a)checkpoint.com> wrote:
> Hi,
>
>
>
> I've bumped into two issues while using crash's bt command on the x86_64
> architecture:
>
> 1. Incomplete disassembly in gdb: gdb's x/i command was unable to
> decode the nopl machine instruction (opcode 0x0f) - the output was
> "(bad)". This resulted in an incorrect stack backtrace since the
> frame size couldn't be calculated correctly. As a quick test, I
> replaced some gdb files with newer versions taken from binutils
> (i386-dis.c for example), which solved the problem. Is there a plan
> to update the gdb version, or part of it?
No, not at this time. If the gdb code can be safely patched to
recognize a new instruction, that sounds do-able. If you can
pare down the requirement, please forward a patch.
BTW, the wholesale replacement of the embedded gdb code is a massive
undertaking. And since its primary purpose is for gathering structure
data type information and text disassembly, a patch to the existing
version is preferable.
> 2. x86_64_get_framesize() is very naïve. It bails out once
> the 'retq' instruction is seen. Is this issue going to be addressed?
Well, continuing on from that point would most likely end up calculating
a framesize that is too large, so it bails out on the "short" side.
Dave
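For illustration, the "bail out at retq" heuristic being discussed amounts to
something like the toy scanner below. This is only a sketch of the idea, not
the actual x86_64_get_framesize() code: it reads disassembly text, counts
pushes and explicit rsp adjustments, and gives up at the first "retq",
erring on the "short" side.

/*
 * Toy illustration of the framesize heuristic -- not the actual
 * x86_64_get_framesize() code.  Scans disassembly text for pushes and
 * explicit rsp adjustments, and bails out at the first "retq".
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

long estimate_framesize(FILE *disasm)          /* stream of disassembly lines */
{
        char line[256];
        long framesize = 0;

        while (fgets(line, sizeof(line), disasm)) {
                if (strstr(line, "retq"))
                        break;                  /* give up at the first return */
                if (strstr(line, "push"))
                        framesize += 8;         /* each push grows the frame by 8 */
                char *p = strstr(line, "sub");  /* e.g. "sub $0x28,%rsp" */
                if (p && strstr(p, ",%rsp")) {
                        char *imm = strstr(p, "$0x");
                        if (imm)
                                framesize += strtol(imm + 3, NULL, 16);
                }
        }
        return framesize;
}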
15 years, 6 months
x86_64 bt
by Shahar Luxenberg
Hi,
I've bumped into two issues while using crash's bt command on the x86_64 architecture:
1. Incomplete disassembly in gdb: gdb's x/i command was unable to decode the nopl machine instruction (opcode 0x0f) - the output was "(bad)". This resulted in an incorrect stack backtrace since the frame size couldn't be calculated correctly. As a quick test, I replaced some gdb files with newer versions taken from binutils (i386-dis.c for example), which solved the problem. Is there a plan to update the gdb version, or part of it?
2. x86_64_get_framesize() is very naïve. It bails out once the 'retq' instruction is seen. Is this issue going to be addressed?
Thanks,
Shahar.
15 years, 6 months