August 2010 - Crash-utility - Crash Utility List Archives

by Bob Montgomery

I'm working on a dump of a system that did not have a PID 1. I don't
think it's relevant to the crash itself, but it does cause crash to get
a seg fault.

crash> ps | head
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      0      0   0  ffffffff805144c0  RU   0.0       0      0  [swapper]
      0     -1   1  ffff81012bc0a100  RU   0.0       0      0  [swapper]
      2     -1   0  ffff81012bd3c040  IN   0.0       0      0  [migration/0]
      3     -1   0  ffff81012bd3e7c0  RU   0.0       0      0  [ksoftirqd/0]
      4     -1   0  ffff81012bd3e080  IN   0.0       0      0  [watchdog/0]
      5     -1   1  ffff81012bd3f800  IN   0.0       0      0  [migration/1]
      6     -1   1  ffff81012bd3f0c0  RU   0.0       0      0  [ksoftirqd/1]
      7     -1   1  ffff81012bc0a840  IN   0.0       0      0  [watchdog/1]
      8     -1   0  ffff81012af02880  IN   0.0       0      0  [events/0]
crash> mount
Segmentation fault (core dumped)

In cmd_mount, this returns null and subsequent use causes the seg fault:

        namespace_context = pid_to_context(1);

I don't know if it was important to have the context of pid 1 for
reporting mounts, or just any context, but this hack makes the problem
go away, although it is not a very efficient way to find the lowest
existing PID above 0.

--- filesys.c.orig      2010-08-18 14:03:26.000000000 -0600
+++ filesys.c   2010-08-18 14:10:02.000000000 -0600
@@ -1153,8 +1153,12 @@ cmd_mount(void)
        ulong vfsmount = 0;
        int flags = 0;
        int save_next;
+       ulong pid;

-       namespace_context = pid_to_context(1);
+       /* find a context */
+       pid = 1;
+       while ((namespace_context = pid_to_context(pid)) == NULL)
+               pid++;

        while ((c = getopt(argcnt, args, "ifn:")) != EOF) {
                switch(c)

Bob Montgomery
At HP
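
Editorial sketch: the loop in the patch above keeps incrementing pid until a lookup succeeds, so it would not terminate on a dump with no usable task above PID 0. A bounded variant of the same search might look like the following. Everything named here is hypothetical: context_lookup_fn stands in for crash's pid_to_context(), max_pid is an assumed upper limit chosen by the caller, and demo_lookup() only mimics the ps listing above (PIDs 0 and 2 through 8 present, PID 1 missing).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pid_to_context(): returns NULL when no task
 * with that PID exists in the dump. */
typedef void *(*context_lookup_fn)(unsigned long pid);

/* Return the context of the lowest PID in [start, max_pid] that has one,
 * or NULL if none does. The upper bound keeps the search finite. */
static void *first_context_from(context_lookup_fn lookup,
                                unsigned long start, unsigned long max_pid)
{
    unsigned long pid = start;

    assert(lookup != NULL);
    while (pid <= max_pid) {
        void *ctx = lookup(pid);
        if (ctx != NULL)
            return ctx;
        if (pid == max_pid)     /* avoid wrapping past the limit */
            break;
        pid++;
    }
    return NULL;
}

/* Illustrative data only: the PIDs visible in the ps output above. */
static unsigned long demo_tasks[] = { 0, 2, 3, 4, 5, 6, 7, 8 };

/* Illustrative lookup over demo_tasks; the "context" is just a pointer
 * to the matching table entry. */
static void *demo_lookup(unsigned long pid)
{
    for (size_t i = 0; i < sizeof demo_tasks / sizeof demo_tasks[0]; i++)
        if (demo_tasks[i] == pid)
            return &demo_tasks[i];
    return NULL;
}
```

With the demo table, a search starting at PID 1 lands on the entry for PID 2, and a search over a range with no tasks returns NULL, which the caller can then report instead of dereferencing.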

14 years, 11 months

Re: [Crash-utility] crash: invalid structure member offset

by Dave Anderson

----- "Reinoud Koornstra" <koornstra(a)hp.com> wrote:

> > -----Original Message-----
> > From: crash-utility-bounces(a)redhat.com [mailto:crash-utility-
> > bounces(a)redhat.com] On Behalf Of Dave Anderson
> > Sent: Thursday, August 12, 2010 12:18 PM
> > To: Discussion list for crash utility usage, maintenance and
> > development
> > Subject: Re: [Crash-utility] crash: invalid structure member offset
> >
> >
> > ----- "Reinoud Koornstra" <koornstra(a)hp.com> wrote:
> >
> > > Thanks,
> > >
> > > Using crash 5.0.6 worked nicely.
> > > However, I can't really look at a lot because of a bad EIP code.
> > >
> > > [ 726.601381] 802.1Q VLAN Support v1.8 Ben Greear <greearb(a)candelatech.com>
> > > [ 726.601384] All bugs added by David S. Miller <davem(a)redhat.com>
> > > [ 726.646757] BUG: unable to handle kernel NULL pointer dereference at 00000000
> > > [ 726.732410] IP: [<00000000>]
> > > [ 726.766933] *pdpt = 0000000000431001 *pde = 0000000000000000
> > > [ 726.766937] Oops: 0010 [#1] SMP
> > > [ 726.790844] Modules linked in: 8021q iptable_filter ip_tables
> > > x_tables ip_gre af_packet i2c_dev i2c_qs i2c_algo_bit i2c_core garp
> > > stp llc ixgbe inet_lro psmouse serio_raw intel_agp shpchp iTCO_wdt
> > > pci_hotplug iTCO_vendor_support agpgart ext3 jbd mbcache sd_mod
> > > crc_t10dif sg ata_piix ata_generic ahci libata scsi_mod ehci_hcd
> > > uhci_hcd usbcore [last unloaded: 8021q]
> > > [ 726.790844]
> > > [ 726.790844] Pid: 4, comm: ksoftirqd/0 Tainted: P (2.6.27)
> > > [ 726.790844] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
> > > [ 726.790844] EIP is at 0x0
> > > [ 726.790844] EAX: e7f4c498 EBX: 00000000 ECX: 77470000 EDX: e7f4c498
> > > [ 726.790844] ESI: 4bd1d300 EDI: 00000007 EBP: f784df88 ESP: f784df78
> > > [ 726.790844] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > > [ 726.790844] Process ksoftirqd/0 (pid: 4, ti=f784c000 task=f783a5b0 task.ti=f784c000)
> > > [ 726.790844] Stack: 40168080 00000001 403daaa0 4042c500 f784df90 401681bf f784dfb0 4012fe92
> > > [ 726.790844] 0000000a 00000000 40429340 00000246 00000000 40130120 f784dfbc 4012ff55
> > > [ 726.790844] 4042c500 f784dfcc 40130182 fffffffc 00000000 f784dfe0 4013e707 4013e6c0
> > > [ 726.790844] Call Trace:
> > > [ 726.790844] [<40168080>] ? __rcu_process_callbacks+0x70/0x190
> > > [ 726.790844] [<401681bf>] ? rcu_process_callbacks+0x1f/0x40
> > > [ 726.790844] [<4012fe92>] ? __do_softirq+0x82/0x100
> > > [ 726.790844] [<40130120>] ? ksoftirqd+0x0/0xe0
> > > [ 726.790844] [<4012ff55>] ? do_softirq+0x45/0x50
> > > [ 726.790844] [<40130182>] ? ksoftirqd+0x62/0xe0
> > > [ 726.790844] [<4013e707>] ? kthread+0x47/0x80
> > > [ 726.790844] [<4013e6c0>] ? kthread+0x0/0x80
> > > [ 726.790844] [<4010494f>] ? kernel_thread_helper+0x7/0x10
> > > [ 726.790844] =======================
> > > [ 726.790844] Code: Bad EIP value.
> > > [ 726.790844] EIP: [<00000000>] 0x0 SS:ESP 0068:f784df78
> > >
> > > So now I can't figure out the piece of code where this dereferencing
> > > occurred. :(
> >
> > Yeah, I don't know why the exception frame wasn't displayed below in the
> > bt output, but I think it may have been confusion due to the kernel text
> > region starting at 40000000 (instead of the typical 3G/1G user/kernel virtual
> > address split). I'm guessing your kernel is configured as 1G/3G user-kernel?
>
> That's right, the kernel is configured as 1G/3G user/kernel.
>
> > (I've never seen that before...)
>
> It's a weird config indeed. I'll try rewriting some stuff so it
> consumes way less memory so a normal kernel/user split can be used.
> Nevertheless, why the pointer became null remains unsolved for the moment. :-)
> Would the user/kernel split also be an issue in 64 bit?

I wouldn't expect you'd ever need to modify the user-kernel split
in x86_64, if that's what you're asking? The 64-bit virtual address
range is so vast that it's hard to conceive of a need to do anything
like that.

Dave
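
Editorial sketch: the split discussed above decides which 32-bit virtual addresses belong to the kernel. The snippet below assumes the usual 32-bit x86 convention that everything at or above PAGE_OFFSET is kernel space, with PAGE_OFFSET taken to be 0xC0000000 for the typical 3G/1G user/kernel split and 0x40000000 for the 1G/3G configuration in this thread. The macro and function names are made up for illustration and are not crash or kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PAGE_OFFSET values for the two configurations discussed. */
#define PAGE_OFFSET_3G_1G 0xC0000000u   /* 3G user / 1G kernel (typical) */
#define PAGE_OFFSET_1G_3G 0x40000000u   /* 1G user / 3G kernel           */

/* True when vaddr falls in the kernel part of the address space for
 * the given split. */
static bool is_kernel_vaddr(uint32_t vaddr, uint32_t page_offset)
{
    return vaddr >= page_offset;
}
```

Under these assumptions, a text address from the call trace such as 0x40168080 counts as a kernel address only with the 1G/3G split; a tool that assumed the typical split would treat it as a user-space address, which fits the kind of confusion Dave suspects.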

14 years, 11 months

Re: [Crash-utility] crash: invalid structure member offset

by Dave Anderson

----- "Reinoud Koornstra" <koornstra(a)hp.com> wrote:

> Thanks,
>
> Using crash 5.0.6 worked nicely.
> However, I can't really look at a lot because of a bad EIP code.
>
> [ 726.601381] 802.1Q VLAN Support v1.8 Ben Greear <greearb(a)candelatech.com>
> [ 726.601384] All bugs added by David S. Miller <davem(a)redhat.com>
> [ 726.646757] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [ 726.732410] IP: [<00000000>]
> [ 726.766933] *pdpt = 0000000000431001 *pde = 0000000000000000
> [ 726.766937] Oops: 0010 [#1] SMP
> [ 726.790844] Modules linked in: 8021q iptable_filter ip_tables
> x_tables ip_gre af_packet i2c_dev i2c_qs i2c_algo_bit i2c_core garp
> stp llc ixgbe inet_lro psmouse serio_raw intel_agp shpchp iTCO_wdt
> pci_hotplug iTCO_vendor_support agpgart ext3 jbd mbcache sd_mod
> crc_t10dif sg ata_piix ata_generic ahci libata scsi_mod ehci_hcd
> uhci_hcd usbcore [last unloaded: 8021q]
> [ 726.790844]
> [ 726.790844] Pid: 4, comm: ksoftirqd/0 Tainted: P (2.6.27)
> [ 726.790844] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
> [ 726.790844] EIP is at 0x0
> [ 726.790844] EAX: e7f4c498 EBX: 00000000 ECX: 77470000 EDX: e7f4c498
> [ 726.790844] ESI: 4bd1d300 EDI: 00000007 EBP: f784df88 ESP: f784df78
> [ 726.790844] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 726.790844] Process ksoftirqd/0 (pid: 4, ti=f784c000 task=f783a5b0 task.ti=f784c000)
> [ 726.790844] Stack: 40168080 00000001 403daaa0 4042c500 f784df90 401681bf f784dfb0 4012fe92
> [ 726.790844] 0000000a 00000000 40429340 00000246 00000000 40130120 f784dfbc 4012ff55
> [ 726.790844] 4042c500 f784dfcc 40130182 fffffffc 00000000 f784dfe0 4013e707 4013e6c0
> [ 726.790844] Call Trace:
> [ 726.790844] [<40168080>] ? __rcu_process_callbacks+0x70/0x190
> [ 726.790844] [<401681bf>] ? rcu_process_callbacks+0x1f/0x40
> [ 726.790844] [<4012fe92>] ? __do_softirq+0x82/0x100
> [ 726.790844] [<40130120>] ? ksoftirqd+0x0/0xe0
> [ 726.790844] [<4012ff55>] ? do_softirq+0x45/0x50
> [ 726.790844] [<40130182>] ? ksoftirqd+0x62/0xe0
> [ 726.790844] [<4013e707>] ? kthread+0x47/0x80
> [ 726.790844] [<4013e6c0>] ? kthread+0x0/0x80
> [ 726.790844] [<4010494f>] ? kernel_thread_helper+0x7/0x10
> [ 726.790844] =======================
> [ 726.790844] Code: Bad EIP value.
> [ 726.790844] EIP: [<00000000>] 0x0 SS:ESP 0068:f784df78
>
> So now I can't figure out the piece of code where this dereferencing
> occurred. :(

Yeah, I don't know why the exception frame wasn't displayed below in the
bt output, but I think it may have been confusion due to the kernel text
region starting at 40000000 (instead of the typical 3G/1G user/kernel virtual
address split). I'm guessing your kernel is configured as 1G/3G user-kernel?
(I've never seen that before...)

Anyway, somehow the EIP got zeroed out, and it took a fault trying to
handle that. That can happen if a kernel function corrupts its own
stack by incorrectly writing to its own local stack variables, and
in so doing writes a zero into the return address saved on the stack.
Then when the function returns, that zero is loaded into the EIP, and
you'd see something like the above.

The exception frame in the log shows that the ESP is f784df78, and
looking at the trace data below, it looks like rcu_process_callbacks()
may have ended up calling something that led to the EIP corruption.
Just a guess though...

Dave

>
> crash> bt
> PID: 4 TASK: f783a5b0 CPU: 0 COMMAND: "ksoftirqd/0"
> #0 [f784de88] crash_kexec at 401534a8
> #1 [f784df28] __slab_free at 4019677f
> #2 [f784df8c] rcu_process_callbacks at 401681ba
> #3 [f784df94] __do_softirq at 4012fe90
> #4 [f784dfb4] do_softirq at 4012ff50
> #5 [f784dfd0] kthread at 4013e705
> #6 [f784dfe4] kernel_thread_helper at 4010494d
>
> Thanks,
>
> Reinoud.

14 years, 11 months

Re: [Crash-utility] crash: invalid structure member offset

by Dave Anderson

----- "Reinoud Koornstra" <koornstra(a)hp.com> wrote:

> Hi Everyone,
>
> I am trying to read a core file into crash, but I've got bad luck as you can see below.
> Is core file corrupt? It is a vmcore file from a 32 bits kernel that
> was compiled with PAE, could that have corrupted things?
> Any hints here?
> Thanks,
>
> Reinoud.
>
> $ crash System.map-2.6.27 ./vmlinux-2.6.27 ./vmcore
>
> crash 4.0-3.7

I don't know if the vmcore is corrupt, but PAE wouldn't be an issue.

However, you are running a version of crash that was released almost
4 years ago (13-Oct-2006) against a two-year-old kernel that was
released 15-Oct-2008. That's pretty much a guarantee of failure.

Try updating to version 5.0.6 and see what happens.

And BTW, if the vmlinux file is the exact same kernel as the
one that generated the vmcore file, you don't need a System.map
argument.

Dave

15-Oct-2008

> Copyright 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
> Copyright 2004, 2005, 2006 IBM Corporation
> Copyright 1999-2006 Hewlett-Packard Co
> Copyright 2005 Fujitsu Limited
> Copyright 2005 NEC Corporation
> Copyright 1999, 2002 Silicon Graphics, Inc.
> Copyright 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for
> details.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...
>
> please wait... (gathering kmem slab cache data)
>
> crash: invalid structure member offset: kmem_cache_s_c_num
> FILE: memory.c LINE: 6891 FUNCTION: kmem_cache_init()
>
> [/usr/bin/crash] error trace: 80827a9 => 8095398 => 80aa7ef =>
> 8131e88
> /usr/bin/nm: /usr/bin/crash: no symbols
> /usr/bin/nm: /usr/bin/crash: no symbols
> /usr/bin/nm: /usr/bin/crash: no symbols
> /usr/bin/nm: /usr/bin/crash: no symbols
>
> WARNING: Because this kernel was compiled with gcc version 4.1.2, certain
> commands or command options may fail unless crash is invoked with
> the "--readnow" command line option.

14 years, 11 months

Re: [Crash-utility] [RFC] gcore subcommand: a process coredump feature

by anderson＠prospeed.net

>
> Hello,
>
> For some weeks I've developed gcore subcommand for crash utility which
> provides process coredump feature for crash kernel dump, strongly
> demanded by users who want to investigate user-space applications
> contained in kernel crash dump.
>
> I've now finished making a prototype version of gcore and found out
> what are the issues to be addressed intensely. Could you give me any
> comments and suggestions on this work?

Hello Daisuke,

As I mentioned in my previous email re: cpu numbering, I am currently
on vacation, and cannot spend much time looking at this issue until I
get back on August 9th.

However, I think that this could be a useful feature, and I did take
a quick look at how it could be done several months ago when it was
brought up on this mailing list. However, as you discovered, I also
noted that the user-space core dump code in the kernel has undergone
significant changes over time, and so the implementation by the crash
utility would have to adapt to the kernel data structures used by the
various kernel versions.

And because of that, I don't want to put it into the base crash binary,
but rather it should be maintained as one or more extension modules,
which can be located in the "extensions" subdirectory in the crash
source package, as well as stored in the "extensions" web page link
from the crash "people" web site.

It is quite simple to re-adapt your patch as an extension module.
Check the "snap.c" and "snap.mk" files in the extensions subdirectory
as templates for your "gcore" command.

As to the other questions below, I will get back to you after August 9th.

Thanks,
  Dave

> Motivation
> ==========
>
> It's a relatively familiar technique that in a cluster system a
> currently running node triggers crash kernel dump mechanism when
> detecting a kind of a critical error in order for the running, error
> detecting server to cease as soon as possible. Consequently, the
> residual crash kernel dump contains a process image for the erroneous
> user application. At the case, developers are interested in user
> space, rather than kernel space.
>
> There's also a merit of gcore that it allows us to use several
> userland debugging tools, such as GDB and binutils, in order to
> analyze user space memory.
>
>
> Current Status
> ==============
>
> I confirm the prototype version runs on the following configuration:
>
>   Linux Kernel Version: 2.6.34
>   Supporting Architecture: x86_64
>   Crash Version: 5.0.5
>   Dump Format: ELF
>
> I'm planning to widen a range of support as follows:
>
>   Linux Kernel Version: Any
>   Supporting Architecture: i386, x86_64 and IA64
>   Dump Format: Any
>
>
> Issues
> ======
>
> Currently, I have issues below.
>
> 1) Retrieval of appropriate register values
>
> The prototype version retrieves register values from a _wrong_
> location: a top of the kernel stack, into which register values are
> saved at any preemption context switch. On the other hand, the
> register values that should be included here are the ones saved at
> user-to-kernel context switch on any interrupt event.
>
> I've yet to implement this. Specifically, I need to do the following
> task from now.
>
>   (1) list all entries from user-space to kernel-space execution path.
>
>   (2) divide the entries according to where and how the register
>       values from user-space context are saved.
>
>   (3) compose a program that retrieves the saved register values from
>       appropriate locations that is traced by means of (1) and (2).
>
> Ideally, I think it's best if crash library provides any means of
> retrieving this kind of register values, that is, ones saved on
> various stack frames. Is there such a plan to do?
>
>
> 2) Getting a signal number for a task which was during core dump
>    process at kernel crash
>
> If a target task is halfway of core dump process, it's better to know
> a signal number in order to know why the task was about to be core
> dumped.
>
> Unfortunately, I have no choice but backtrace the kernel stack to
> retrieve a signal number saved there as an argument of, for example,
> do_coredump().
>
>
> 3) Kernel version compatibility
>
> crash's policy is to support all kernel versions by the latest crash
> package. On the other hand, the prototype is based on kernel 2.6.34.
> This means more kernel versions need to be supported.
>
> Well, the question is: to what versions do I need to really test in
> addition to the latest upstream kernel? I think it's practically
> enough to support RHEL4, RHEL5 and RHEL6.
>
>
> Build Instruction
> =================
>
>   $ tar xf crash-5.0.5.tar.gz
>   $ cd crash-5.0.5/
>   $ patch -p 1 < gcore.patch
>   $ make
>
>
> Usage
> =====
>
> Use help subcommand of crash utility as ``help gcore''.
>
>
> Attached File
> =============
>
>   * gcore.patch
>
>     A patch implementing gcore subcommand for crash-5.0.5.
>
>     The diffstat output is as follows.
>
>     $ diffstat gcore.patch
>      Makefile      |   10 +-
>      defs.h        |   15 +
>      gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>      gcore.h       |  639 ++++++++++++++++++++
>      global_data.c |    3 +
>      help.c        |   28 +
>      netdump.c     |   27 +
>      tools.c       |   37 ++
>      8 files changed, 2615 insertions(+), 2 deletions(-)
>
> --
> HATAYAMA Daisuke
> d.hatayama(a)jp.fujitsu.com
>

14 years, 11 months

[crash-utility] [lkcd-devel] Patch to add LKCD vmcore validation feature

by Vitaly Kuzmichev

Hello,

Attached is the patch to add a separate tool for validating LKCD netdumps
and blockdumps. We are planning to add this feature in our fork of
crash-3.10.

Our customers requested this feature, but we have found that 'crash'
does not print any warnings when someone tries to load an incomplete
vmcore. They need a simple way to verify whether a core file generated
by LKCD is complete.

--
Best regards,
Vitaly Kuzmichev, Software Engineer,
MontaVista Software, LLC.

14 years, 11 months

Re: [Crash-utility] crash fails to start with RHEL4/ia64 vmcore

by Dave Anderson

----- "Mark Goodwin" <mgoodwin(a)redhat.com> wrote:

> [re-send now that I'm subscribed]
>
> Any help would be appreciated - I have an IA64 vmcore
> from a RHEL4 system, and crash doesn't like it :
>
> # file vmcore
> vmcore: ELF 64-bit LSB core file IA-64, version 1 (SYSV), SVR4-style, from 'vmlinux'
>
> # crash usr__2.6.9-89.EL/lib/debug/lib/modules/2.6.9-89.EL/vmlinux vmcore
>
> crash 5.0.6
> .. copyright stuff omitted ..
> GNU gdb (GDB) 7.0
> ...
> WARNING: invalid linux_banner pointer: 8a653075c6c36178
> crash: usr__2.6.9-89.EL/lib/debug/lib/modules/2.6.9-89.EL/vmlinux and vmcore do not match!
> Usage: ...
>
> I've tried hacking the 5.0.6 src to avoid the linux_banner check,
> but it just breaks further down the track :
>
> WARNING: boot command line memory limit: 59ab0c85d99b2fa9
> crash: ia64_VTOP(b929ee8856a6b891): unexpected region 5 address
>
> So I'm stuck. The customer assures me the vmcore is from their 2.6.9-89.EL
> ia64 system. At the very least it would help if I could extract some or all
> of the dmesg buffer.

It appears that there's simply a mismatch between the vmlinux
and vmcore file. If it's not a compressed diskdump vmcore, you
can verify that easily enough by just doing:

  $ strings vmlinux | grep "Linux version"
  $ strings vmcore | grep "Linux version"

and comparing the output.

> Dave, the vmcore is on an internal Red Hat RHEL4/ia64 system - I can
> give you access details off-list if you think it will help.

Sure -- if the strings output doesn't give you a clue, please
point me to the files.

Thanks,
  Dave
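
Editorial sketch: the strings/grep comparison above can be approximated in C by scanning each image for the first printable run that starts with "Linux version" and comparing the two results. This is only an illustration over in-memory buffers; find_linux_banner() and banners_match() are hypothetical names, not part of crash, and as with the strings approach it assumes the dump is not a compressed diskdump.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the first "Linux version ..." string found in buf into out
 * (NUL-terminated, truncated to fit outlen). Returns 1 if found, 0 if not. */
static int find_linux_banner(const unsigned char *buf, size_t len,
                             char *out, size_t outlen)
{
    static const char key[] = "Linux version";
    const size_t klen = sizeof key - 1;

    assert(buf != NULL && out != NULL);
    if (outlen == 0 || len < klen)
        return 0;
    for (size_t i = 0; i + klen <= len; i++) {
        if (memcmp(buf + i, key, klen) != 0)
            continue;
        size_t n = 0;
        /* Like strings(1), stop at the first non-printable byte. */
        while (i + n < len && n + 1 < outlen && isprint(buf[i + n])) {
            out[n] = (char)buf[i + n];
            n++;
        }
        out[n] = '\0';
        return 1;
    }
    return 0;
}

/* 1 if both images carry the same banner, 0 if the banners differ or
 * either image has none. */
static int banners_match(const unsigned char *a, size_t alen,
                         const unsigned char *b, size_t blen)
{
    char ba[256], bb[256];

    return find_linux_banner(a, alen, ba, sizeof ba) &&
           find_linux_banner(b, blen, bb, sizeof bb) &&
           strcmp(ba, bb) == 0;
}
```

A caller would read (or mmap) the vmlinux and vmcore files into buffers and pass them to banners_match(); a result of 0 would suggest the kind of mismatch Dave describes, though it is not proof on its own.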

14 years, 11 months

crash fails to start with RHEL4/ia64 vmcore

by Mark Goodwin

[re-send now that I'm subscribed]

Any help would be appreciated - I have an IA64 vmcore
from a RHEL4 system, and crash doesn't like it :

# file vmcore
vmcore: ELF 64-bit LSB core file IA-64, version 1 (SYSV), SVR4-style, from 'vmlinux'

# crash usr__2.6.9-89.EL/lib/debug/lib/modules/2.6.9-89.EL/vmlinux vmcore

crash 5.0.6
.. copyright stuff omitted ..
GNU gdb (GDB) 7.0
...
WARNING: invalid linux_banner pointer: 8a653075c6c36178
crash: usr__2.6.9-89.EL/lib/debug/lib/modules/2.6.9-89.EL/vmlinux and vmcore do not match!
Usage: ...

I've tried hacking the 5.0.6 src to avoid the linux_banner check,
but it just breaks further down the track :

WARNING: boot command line memory limit: 59ab0c85d99b2fa9
crash: ia64_VTOP(b929ee8856a6b891): unexpected region 5 address

So I'm stuck. The customer assures me the vmcore is from their 2.6.9-89.EL
ia64 system. At the very least it would help if I could extract some or all
of the dmesg buffer.

Dave, the vmcore is on an internal Red Hat RHEL4/ia64 system - I can
give you access details off-list if you think it will help.

Thanks
-- Mark Goodwin

14 years, 11 months

Re: [Crash-utility] Question on online/present/possible CPUS

by anderson＠prospeed.net

>
> Hi all,
>
> before making a larger cleanup, I want to ask here for your opinion. It
> seems that there is quite a bit of confusion about the meaning of CPU
> count printed out by the crash utility.
>
> 1. Number of CPUs
>
> Some people think that crash should always output the number of CPUs in
> the system (ie. a quad-core server should always output 'CPUS: 4'),
> while other people think that only online CPUs should be counted.
>
> 2. CPU numbering
>
> For example, if there are 4 CPUs in the system, but some of them are
> taken offline (e.g. CPU 1 and CPU 3), _and_ crash output the number of
> online CPUs, it would print out 'CPUS: 2'. It's not easy to find out
> that valid CPU numbers are 0 and 2 in this case.

Hi Petr,

For all but ppc64, the number shown by the initial banner and the "sys"
command is essentially "the-highest-cpu-number-plus-one". For ppc64
(as requested and implemented by the IBM/ppc64 maintainers), it shows
the number of online cpus. There's reasons for doing it either of the
two ways, but I'm on vacation now, and you can research the list archives
for the various arguments for-and-against doing it either way. Check
the changelog.html for when it was changed for ppc64, and then
cross-reference the revision date with the list archives.

> 3. Examining offline CPU
>
> Sometimes, it may be useful to examine the state of an offline CPU. Now,
> I know that the saved state is most likely stale, but it can be useful
> in some cases (e.g. a crash after dropping to kdb). The crash utility
> currently refuses to select an offline CPU with 'set -c #'. Are there
> any concerns about allowing it?

I tend to agree with you, but the only thing that's useful and available
from an offline cpu is the swapper task for that cpu and the runqueue for
that cpu. And both of those entities are readily accessible if you really
need them. Although I don't know anything about kdb status, so maybe
there's something of per-cpu interest, but I don't know why it would be
necessary to "set" that cpu?

In any case, like I said before, I'm just temporarily online while on
vacation, and will be back to work on the 9th.

Thanks,
  Dave
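
Editorial sketch: the two counting conventions described above give different answers as soon as a CPU in the middle of the range is offline. The snippet assumes a hypothetical 64-bit mask in which bit n is set when CPU n is online; the thread does not say which kernel data crash actually consults, and both function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of CPUs marked online in the mask (bit n set = CPU n online). */
static unsigned online_cpu_count(uint64_t online_mask)
{
    unsigned count = 0;

    /* Clear the lowest set bit once per iteration. */
    for (; online_mask != 0; online_mask &= online_mask - 1)
        count++;
    return count;
}

/* Highest online CPU number plus one; 0 when the mask is empty. */
static unsigned highest_cpu_plus_one(uint64_t online_mask)
{
    unsigned n = 0;

    for (; online_mask != 0; online_mask >>= 1)
        n++;
    return n;
}
```

For Petr's example (CPUs 0 and 2 online, CPUs 1 and 3 offline, mask 0x5) the first function yields 2 and the second yields 3, so neither figure alone tells the user which CPU numbers are valid.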

14 years, 11 months

Question on online/present/possible CPUs

by Petr Tesarik

Hi all,

before making a larger cleanup, I want to ask here for your opinion. It
seems that there is quite a bit of confusion about the meaning of CPU
count printed out by the crash utility.

1. Number of CPUs

Some people think that crash should always output the number of CPUs in
the system (ie. a quad-core server should always output 'CPUS: 4'),
while other people think that only online CPUs should be counted.

2. CPU numbering

For example, if there are 4 CPUs in the system, but some of them are
taken offline (e.g. CPU 1 and CPU 3), _and_ crash output the number of
online CPUs, it would print out 'CPUS: 2'. It's not easy to find out
that valid CPU numbers are 0 and 2 in this case.

3. Examining offline CPU

Sometimes, it may be useful to examine the state of an offline CPU. Now,
I know that the saved state is most likely stale, but it can be useful
in some cases (e.g. a crash after dropping to kdb). The crash utility
currently refuses to select an offline CPU with 'set -c #'. Are there
any concerns about allowing it?

Regards,
Petr Tesarik

14 years, 11 months
