netdump starting problem
by Anirudh Srinivasan
Hello friends,
I was setting up a netdump server at my workplace. I followed this
procedure:
Server Configuration:
1. Verify that the netdump server is installed: rpm -q netdump-server. If it
   is not installed, install it by running: up2date netdump-server
2. After the netdump-server package is installed, change the password for the
   "netdump" user to something that you know: passwd netdump
3. Enable the netdump server: chkconfig netdump-server on
4. Start the netdump server: service netdump-server start
Client Configuration:
1. Verify that the netdump client is installed: rpm -q netdump. If it is not
   installed, install it by running: up2date netdump
2. Edit /etc/sysconfig/netdump and add the following line:
   NETDUMPADDR=192.168.0.5
   (Change 192.168.0.5 to the IP address of the netdump server.)
3. Enter the following command and give the netdump password when
   prompted: service netdump propagate
4. Enable the netdump client: chkconfig netdump on
5. Start the netdump client: service netdump start
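Before starting the client, it can help to confirm that the NETDUMPADDR line from client step 2 really holds a literal IPv4 address. The following is only a minimal sketch. The helper names read_netdumpaddr and is_valid_ipv4 are hypothetical, and the parsing assumes simple KEY=value lines where the last assignment wins.

```python
import ipaddress


def read_netdumpaddr(config_text):
    """Return the NETDUMPADDR value from sysconfig-style text, or None."""
    value = None
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, rest = line.partition("=")
        if sep and key.strip() == "NETDUMPADDR":
            # Assumption: the last assignment wins, as when a shell sources the file.
            value = rest.strip().strip("\"'")
    return value


def is_valid_ipv4(value):
    """True if value parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


sample = "# netdump client settings\nNETDUMPADDR=192.168.0.5\n"
addr = read_netdumpaddr(sample)
print(addr, is_valid_ipv4(addr))
```

On a real client the text would come from /etc/sysconfig/netdump. A valid address only rules out a typo in the file; it does not show that the server is reachable.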
After doing this, I get the following messages:
# service netdump start
netdump: cannot arp <ipaddress>
netdump: cannot find <ipaddress> in arp cache
netdump: can't resolve <ipaddress> MAC address
netdump server address resolution [FAILED]
What could be the reason for this? How can I solve it?
Thanks
Anirudh Srinivasan
15 years, 6 months
live crash(4.0-5.0.3) invocation fails on rhel5
by Nipul Gandhi
Hi all -
What am I doing wrong here?
[root@wal-rhel5-04 kern]# uname -a
Linux wal-rhel5-04 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Using the vmlinux installed by these debuginfo RPMs:
kernel-debuginfo-common-2.6.18-92.el5
kernel-debug-debuginfo-2.6.18-92.el5
[root@wal-rhel5-04 kern]# crash /usr/lib/debug/lib/modules/2.6.18-92.el5debug/vmlinux
:
:
WARNING: /usr/lib/debug/lib/modules/2.6.18-92.el5debug/vmlinux
and /proc/version do not match!
WARNING: /proc/version indicates kernel version: 2.6.18-92.el5
crash: please use the vmlinux file for that kernel version, or try using
the System.map for that kernel version as an additional argument.
[root@wal-rhel5-04 tmp]# cat /proc/version
Linux version 2.6.18-92.el5 (brewbuilder@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 29 13:16:15 EDT 2008
I also tried passing the System.map as an additional argument, but then crash segfaulted.
# crash /usr/lib/debug/lib/modules/2.6.18-92.el5debug/vmlinux /boot/System.map-2.6.18-92.el5
Segmentation fault (core dumped)
Thanks in advance for any help.
-Nipul
15 years, 6 months
Re: [Crash-utility] Question about timestamp output for "sys"
by Dave Anderson
----- "James Washer" <washer(a)trlp.com> wrote:
> The time is aware of MY timezone (easily tested), but I'm still not
> sure if it is the time of the panic... or some later time
>
> On Mon, 2009-03-30 at 12:08 -0700, James Washer wrote:
> > If I run 'sys', I see timestamps such as
> > DATE: Thu Mar 26 08:53:13 2009
> >
> > What "time" is this: the time the panic occurred? The time the dump was
> > "collected"? Is it in the Zulu time zone, my (the crash investigator's)
> > time zone, or the time zone of the system that crashed?
It's a ctime() translation of the contents of the kernel's "xtime" timespec
structure. So running on a live system, you can see it change.
On a dumpfile, that's a good question, because thinking about it, it may have
slightly different meanings depending upon the dumpfile-creation mechanism used.
So, for example, on a netdump or diskdump it's whatever was last there when the
kernel memory containing the data structure was copied to disk or over the
network. With a kdump, it would still be getting bumped up until the point
where the kernel transitions/kexec's into the secondary kernel, right?
Anyway, it's *somewhere* around the time of the panic...
Dave
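To illustrate the ctime() point: ctime() formats a seconds-since-epoch value in the local time zone of the process that calls it, so the same stored seconds value prints differently depending on where the dump is analyzed. The sketch below is only an illustration. render_xtime is a hypothetical helper, fixed UTC offsets stand in for a real local time zone, and the sample value is built by treating the DATE line from the question as if it were UTC.

```python
import calendar
from datetime import datetime, timedelta, timezone


def render_xtime(tv_sec, utc_offset_hours):
    """Format a seconds-since-epoch value the way ctime() would in a given zone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(tv_sec, tz).ctime()


# Assumption for illustration only: read the quoted DATE line as UTC.
tv_sec = calendar.timegm((2009, 3, 26, 8, 53, 13, 0, 0, 0))

print(render_xtime(tv_sec, 0))    # Thu Mar 26 08:53:13 2009
print(render_xtime(tv_sec, -7))   # Thu Mar 26 01:53:13 2009
```

The underlying seconds value is the same in both calls; only the rendering changes with the viewer's zone.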
15 years, 6 months
Question about timestamp output for "sys"
by James Washer
If I run 'sys', I see timestamps such as
DATE: Thu Mar 26 08:53:13 2009
What "time" is this: the time the panic occurred? The time the dump was
"collected"? Is it in the Zulu time zone, my (the crash investigator's)
time zone, or the time zone of the system that crashed?
Thanks
- jim
15 years, 6 months
crash version 4.0-8.8 is available
by Dave Anderson
- If a live kernel crash session fails during initialization due to
read errors, and it appears to be because the running kernel was
configured with CONFIG_STRICT_DEVMEM, display this warning message:
"crash: This kernel may be configured with CONFIG_STRICT_DEVMEM,
which renders /dev/mem unusable as a live memory source."
(anderson(a)redhat.com)
- Fix for the "bt" command to prevent a segmentation violation seen
with an x86_64 Egenera/LKCD dumpfile where the starting stack hooks
for the active tasks in the dumpfile header were nonsensical.
(anderson(a)redhat.com)
- Fix for the chronological display of the kernel printk buffer data
by the "log" output if the administrator has cleared the buffer
with syslog() or klogctl(). (oomichi(a)mxs.nes.nec.co.jp)
- Change the message displayed when supplying a non-process stack
address as an argument to "bt -S". Because the supplied address
is typically valid, such as a hard or soft IRQ stack address,
the message will indicate "non-process address" instead of
"invalid stack address". (anderson(a)redhat.com)
- The crash-<release>.src.rpm will create an additional binary
crash-extensions-<release>.rpm file containing the sial.so and
dminfo.so extension modules. The modules will be installed in the
/usr/lib[64]/crash/extensions directory.
(holzheu(a)linux.vnet.ibm.com, anderson(a)redhat.com)
- If a shared-object filename passed to the "extend" command is not
expressed with a fully-qualified pathname, the following directories
will be searched in the order shown, and the first instance of the
file that is found will be selected:
  1. the current working directory
  2. the directory specified in the CRASH_EXTENSIONS shell environment variable
  3. /usr/lib64/crash/extensions (64-bit architectures)
  4. /usr/lib/crash/extensions
The same rules will be applied when unloading shared object files
with "extend -u <shared-object>". Without the patch, only files
in the current directory or those specified with a fully-qualified
pathname were accepted. (anderson(a)redhat.com)
- Changed the manner in which the "bt" command determines which PID 0
swapper task was interrupted by an ia64 INIT or MCA exception.
There is an existing ia64 INIT/MCA handler bug which incorrectly
writes the pseudo task's command name in its comm[] name string
such that the cpu number may not be part of the string. If that
happens without this patch, the "bt" command fails to make the link
back to the interrupted task, and displays the error message:
"bt: unwind: failed to locate return link (ip=0x0)!"
(anderson(a)redhat.com)
- Removed an unused initialized variable in get_task_mem_usage().
(junkoi2004(a)gmail.com)
- Added a debug-level 8 statement in readmem() that will display the
current input address and its translated physical address under the
existing debug-level 4 "<readmem: ...>" debug line, put in place to
aid in debugging read and/or seek errors.
(anderson(a)redhat.com)
Download from: http://people.redhat.com/anderson
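The "extend" search order described in the changelog above can be sketched as follows. This is not the crash utility's own code. The function names are hypothetical, "fully-qualified pathname" is assumed to mean an absolute path, and the existence check is injectable so the logic can be tried without touching the filesystem.

```python
import os


def extension_search_dirs(cwd, env, is_64bit):
    """List the directories to search, in the order given in the changelog."""
    dirs = [cwd]
    ext_dir = env.get("CRASH_EXTENSIONS")
    if ext_dir:
        dirs.append(ext_dir)
    if is_64bit:
        dirs.append("/usr/lib64/crash/extensions")
    dirs.append("/usr/lib/crash/extensions")
    return dirs


def resolve_extension(name, cwd, env, is_64bit, exists=os.path.isfile):
    """Return the first matching path for a shared-object name, or None."""
    if os.path.isabs(name):
        # Assumption: an absolute path bypasses the search entirely.
        return name if exists(name) else None
    for directory in extension_search_dirs(cwd, env, is_64bit):
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None


# Pretend filesystem: sial.so exists in two of the candidate directories.
present = {"/opt/ext/sial.so", "/usr/lib64/crash/extensions/sial.so"}
print(resolve_extension("sial.so", "/tmp/work",
                        {"CRASH_EXTENSIONS": "/opt/ext"}, True,
                        exists=present.__contains__))
```

With the pretend filesystem above, the CRASH_EXTENSIONS directory wins over /usr/lib64/crash/extensions because it comes earlier in the list.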
15 years, 6 months
Fwd: crash seek error
by Dave Anderson
----- Forwarded Message -----
From: "Dharmosoth Seetharam" <seetharam_21(a)yahoo.com>
To: "Dave Anderson" <anderson(a)redhat.com>
Sent: Wednesday, March 11, 2009 2:12:01 PM GMT -05:00 US/Canada Eastern
Subject: Re: crash seek error
Hi Dave,
I have compiled the latest crash tool and tried it with the dump file; it looks good.
Thanks for your quick suggestion.
Sure, I will also include the mailing list.
Thanks a lot.
regards,
Seetharam
15 years, 7 months
Re: [Crash-utility] Re: crash seek error
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Dharmosoth Seetharam" <seetharam_21(a)yahoo.com> wrote:
> > dump_header:
> > dh_magic_number: 618f23ed (DUMP_MAGIC_NUMBER)
> > dh_version: 8 (LKCD_DUMP_V8)
> > dh_header_size: 734
> > dh_dump_level: f
> > (DUMP_LEVEL_HEADER|DUMP_LEVEL_KERN|DUMP_LEVEL_USED|DUMP_LEVEL_ALL)
> > dh_page_size: 4096
> > dh_memory_size: 524153
> > dh_memory_start: c0000000
> > dh_memory_end: 618f23ed
> > dh_num_pages: 524153
> > dh_panic_string: Compulsory dump(stat of mkexec was set as 2).
> > dh_time: Tue Mar 10 17:48:00 2009
> > dh_utsname_sysname: Linux
> > dh_utsname_nodename: Assam
> > dh_utsname_release: 2.6.12-5MKEXEC
> > dh_utsname_version: #7 SMP Thu Mar 5 15:25:22 IST 2009
> > dh_utsname_machine: i686
> > dh_utsname_domainname: (none)
> > dh_current_task: efc4a020
> > dh_dump_compress: 0 (DUMP_COMPRESS_NONE)
> > dh_dump_flags: 80000000 ()
> > dh_dump_device: 0
> > unknown page flag in dump: 2de
> > found DUMP_DH_END
> > <readmem: 8015564b, KVADDR, "x86_omit_frame_pointer", 8, (ROE), 7fbbe228>
> > crash: seek error: kernel virtual address: 8015564b type: "x86_omit_frame_pointer"
> > <readmem: 804b1210, KVADDR, "xtime", 8, (FOE), 834d234>
> > crash: seek error: kernel virtual address: 804b1210 type: "xtime"
> > [root@Assam ~]#
> >
> > Can you please help me with this?
One other thing to look at...
> > dh_memory_start: c0000000
The failing kernel virtual addresses are 8015564b and 804b1210, so apparently
you're running a kernel configured with a 2G/2G split? I'm not sure
whether the crash utility even works with that configuration. Crash does
support the old RHEL4 "hugemem" 4G/4G kernels, but I've never worked with
a 2G/2G kernel. In any case, it may work by dumb luck -- to be sure, first
try to run crash on the live system.
Anyway, even though the dump header advertises a kernel configured with
the traditional 3G/1G split (with kernel memory starting at c0000000),
that "dh_memory_start" field is not used by the crash utility.
Dave
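The reasoning about the split can be checked with simple arithmetic: both failing addresses lie below c0000000, so they cannot be kernel addresses under a 3G/1G split. The sketch below assumes a kernel base of 0x80000000 for a 2G/2G configuration. That value is an assumption made here for illustration and is not stated in the thread.

```python
PAGE_OFFSET_3G_1G = 0xC0000000  # kernel base advertised by dh_memory_start
PAGE_OFFSET_2G_2G = 0x80000000  # assumed kernel base for a 2G/2G split


def is_kernel_address(kvaddr, page_offset):
    """True if a 32-bit virtual address lies at or above the kernel base."""
    return page_offset <= kvaddr <= 0xFFFFFFFF


# The two failing addresses from the readmem debug lines.
failing = [0x8015564B, 0x804B1210]

for addr in failing:
    print(hex(addr),
          "3G/1G:", is_kernel_address(addr, PAGE_OFFSET_3G_1G),
          "2G/2G:", is_kernel_address(addr, PAGE_OFFSET_2G_2G))
```

Both addresses come out as kernel addresses only under the assumed 2G/2G base, which matches the suspicion in the reply above.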
15 years, 7 months
Re: crash seek error
by Dave Anderson
----- "Dharmosoth Seetharam" <seetharam_21(a)yahoo.com> wrote:
> Hi,
>
> I have configured the linux kernel 2.6.12 to support the kernel crash dump using the
> "mini kernel dump" method.
Sorry, but I have no clue what the "mini kernel dump" method is.
Although from the output below, it looks to be an LKCD derivative.
>
> I have a few questions; please help me.
>
> details:
> kernel : linux 2.6.12
> arch : i386
> distr: centOS
> System RAM : 8G
>
> 1) While writing the dump to a block device, it hung after writing 4GB.
>
> 2) I reduced my system RAM to 2G and tried again; it dumped 2G to the block device.
>
> But the crash tool is unable to read it.
> The following is the error:
>
> ----
> [root@Assam ~]# crash -d7 /root/linux-2.6.12/vmlinux
> /scratch/dump/20090310175831/lkcd_dump
> crash 4.0-2.15
Your crash version is remarkably old -- 3+ years old -- and it's
always worth your while to update to the latest version.
> Copyright (C) 2002, 2003, 2004, 2005 Red Hat, Inc.
> Copyright (C) 2004, 2005 IBM Corporation
> Copyright (C) 1999-2005 Hewlett-Packard Co
> Copyright (C) 2005 Fujitsu Limited
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
> crash: diskdump: dump does not have panic dump header
> dump_header:
> dh_magic_number: 618f23ed (DUMP_MAGIC_NUMBER)
> dh_version: 8 (LKCD_DUMP_V8)
> dh_header_size: 734
> dh_dump_level: f
> (DUMP_LEVEL_HEADER|DUMP_LEVEL_KERN|DUMP_LEVEL_USED|DUMP_LEVEL_ALL)
> dh_page_size: 4096
> dh_memory_size: 524153
> dh_memory_start: c0000000
> dh_memory_end: 618f23ed
> dh_num_pages: 524153
> dh_panic_string: Compulsory dump(stat of mkexec was set as 2).
> dh_time: Tue Mar 10 17:48:00 2009
> dh_utsname_sysname: Linux
> dh_utsname_nodename: Assam
> dh_utsname_release: 2.6.12-5MKEXEC
> dh_utsname_version: #7 SMP Thu Mar 5 15:25:22 IST 2009
> dh_utsname_machine: i686
> dh_utsname_domainname: (none)
> dh_current_task: efc4a020
> dh_dump_compress: 0 (DUMP_COMPRESS_NONE)
> dh_dump_flags: 80000000 ()
> dh_dump_device: 0
> unknown page flag in dump: 2de
> found DUMP_DH_END
> <readmem: 8015564b, KVADDR, "x86_omit_frame_pointer", 8, (ROE), 7fbbe228>
> crash: seek error: kernel virtual address: 8015564b type: "x86_omit_frame_pointer"
> <readmem: 804b1210, KVADDR, "xtime", 8, (FOE), 834d234>
> crash: seek error: kernel virtual address: 804b1210 type: "xtime"
> [root@Assam ~]#
>
> Can you please help me with this?
Maybe, maybe not...
Seek errors are meant to indicate that the translation from
kernel virtual address to physical address to dumpfile location
either:
(1) ended up with a dumpfile offset that was not accessible, or
(2) could not find the physical page associated with the virtual
address in the dumpfile.
I can't really help you with LKCD particulars, and like I mentioned
above, I don't know what the "mini kernel dump" version of LKCD is,
but I do note above that the dumpfile is being recognized as version
LKCD_DUMP_V8. And http://people.redhat.com/anderson/crash.changelog.html
contains this change to 4.0-2.17 that fixed something in your version
4.0-2.15:
4.0-2.17 - Fix to resurrect LKCD version 8 support, inadvertently broken in
4.0-2.15. (troy.heber(a)hp.com)
- Fix for "net -S" failures in certain 2.6 kernels that failed with
"net: cannot determine what an inet_sock structure is" message;
shows embedded sock structure instead of failing. (anonymous donor)
- Fix for erroneous "net -s" source/destination address and port
values in certain 2.6 kernels; added "net -s" source/destination
address and port values for IPv6 sockets. (anderson(a)redhat.com)
(12/16/05)
4.0-2.16 - Fix for the x86_64 backtrace code to search all of the exception
stacks for the origin of the active tasks' backtrace when the
information is not available in the dumpfile header. Up until now,
the search was made in the process stack, the per-cpu IRQ stack,
and the per-cpu NMI exception stack; this patch looks at all 3
exception stacks in 2.4 kernels (NMI, STACKFAULT and DOUBLEFAULT),
and all 5 exception stacks in 2.6 kernels (NMI, STACKFAULT,
DOUBLEFAULT, DEBUG and MCE).
- Fix to remove erroneous warning message re: the task cpu not being
the same as the IRQ or exception stack cpu, which was displayed when
doing a non-context-sensitive "bt -E" on an x86_64.
(12/12/05)
4.0-2.15 - Applied Kurt Rader's (kdrader(a)us.ibm.com) patch for SUSE SLES 9
"bigsmp" kernel LKCD dumpfiles, to fix "conflicting page" abort
caused by a dumpfile header that is larger than the formerly
hard-wired header size.
- Fix for ppc64-only segmentation violation when running "bt" on the
panic task when run against a dumpfile created by the diskdump
facility's new compressed format.
(12/02/05)
Perhaps upgrading to the latest version (4.0-7.7) will help?
Dave
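Whether an installed crash version is at or past the 4.0-2.17 fix can be checked by comparing version strings field by field. This is only a sketch. The helper names are hypothetical, and it assumes crash versions always consist of integers separated by "." and "-" and order numerically.

```python
import re


def parse_crash_version(version):
    """Split a version string such as "4.0-2.15" into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


# Release that restored LKCD version 8 support, per the changelog excerpt.
LKCD_V8_FIX = parse_crash_version("4.0-2.17")


def has_lkcd_v8_fix(version):
    """True if the given version is at or after the 4.0-2.17 fix."""
    return parse_crash_version(version) >= LKCD_V8_FIX


for v in ("4.0-2.15", "4.0-2.17", "4.0-7.7"):
    print(v, has_lkcd_v8_fix(v))
```

The check only says whether a version is at or after the fix. The changelog states that the breakage was introduced in 4.0-2.15, so a False result for an older release does not by itself mean LKCD version 8 support is broken there.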
> thanks in advance.
>
> regards,
> Seetharam
15 years, 7 months