crashdc: Memory issues when running in kexec context on SLES11
by Louis Bouchard
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello (again),
I have been able to get <insert your favorite name here> to run on
RHEL5, SLES10 and SLES11.
RHEL5 runs without any modification to the standard install.
SLES10 requires the crashkernel parameter to be increased from 64M to
128M so crash can run correctly.
My concern is with SLES11, which requires a non-trivial modification of
the crashkernel parameter to run correctly.
When I use the standard crashkernel parameter, crash simply doesn't run.
When I increase the reserved memory to 256M as indicated by Bernhard, it
runs, but I get pipe errors and some of the commands do not complete.
When I use crashkernel=256M-:256M@16M everything works fine and I have
no problem with the pipes.
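For reference, the value 256M-:256M@16M uses the kernel's extended crashkernel syntax, <range>:<size>[,<range>:<size>,...][@offset], where each range is written start-[end]. The Python sketch below is only an illustration of how such a value can be read; the function names are hypothetical and this is not the kernel's own parser. It returns the reservation that would apply for a given amount of system RAM.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(text):
    """Convert a size such as '256M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]

def pick_reservation(value, system_ram):
    """Return (size, offset) in bytes for the first matching range, or None.

    offset is None when the value carries no '@offset' part.
    """
    spec, _, offset_text = value.partition("@")
    offset = parse_size(offset_text) if offset_text else None
    if ":" not in spec:  # simple form, e.g. 128M or 128M@16M
        return parse_size(spec), offset
    for entry in spec.split(","):
        range_text, _, size_text = entry.partition(":")
        start_text, _, end_text = range_text.partition("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # open-ended range
        # A range matches when start <= RAM < end (end may be absent).
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size_text), offset
    return None
```

Read this way, 256M-:256M@16M asks for 256M at offset 16M on any machine with at least 256M of RAM, and for nothing on smaller machines.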
I ran "cat /proc/meminfo" right after the script executed on both RHEL and
SLES11, but I do not see any evidence of a memory limitation.
I have /proc/meminfo for all 3 runs if someone wants to have a look.
I feel that requiring 256M for crashkernel is a bit large, but I may be
wrong. What also puzzles me is the difference between SLES10 and SLES11.
TIA,
Kind Regards,
- --
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center,
Linux Ambassador, HP
HP Services 1 Ave du Canada
HP France Z.A. de Courtaboeuf
louis.bouchard(a)hp.com 91 947 Les Ulis
http://www.hp.com/go/linux France
http://www.hp.com/fr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkrMlJMACgkQDvqokHrhnCzZAwCglNp3dd0raoNvpwuNj67DaIXE
B20AnigVBnqhtY3MzZCSeF0Tyt2JjakE
=IUVg
-----END PGP SIGNATURE-----
15 years, 3 months
[ANNOUNCE] crash version 4.1.0 is available
by Dave Anderson
- Fix for s390x and x86 "extend" command regression created by the
"crash -x" option introduced in crash version 4.0.9. Without the
patch, the "extend" command on s390x and x86 machines fails with the
error message: "extend: <module>.so: not an ELF format object file".
(holzheu(a)linux.vnet.ibm.com, anderson(a)redhat.com)
- Cleanup of top-level source files to address compiler warnings
generated by the CFLAGS used in the Fedora build environment:
main.c ppc64.c tools.c symbols.c defs.h qemu-load.c qemu.c
xen_hyper_command.c xendump.c netdump.c s390_dump.c lkcd_common.c
remote.c cmdline.c x86_64.c net.c dev.c kernel.c task.c filesys.c
memory.c lkcd_x86_trace.c ppc64.c x86.c s390.c s390x.c s390dbf.c
Only two bugs (s390/s390x) were discovered as a result of this
exercise. The vast majority of the warnings were primarily benign
"may be used uninitialized in this function" false-positive warnings,
but were addressed nonetheless. A few "dereferencing type-punned
pointer will break strict-aliasing rules" warnings still exist, but a
fix attempt may prove more troublesome or dangerous than it's worth.
(anderson(a)redhat.com)
- Fix for "pte" command on s390 and s390x machines if the pte value
argument evaluates as not present. Without the patch, the command
would not display the pte value, but would either print random stack
data (if ASCII), or in the worst case, cause a segmentation violation.
(anderson(a)redhat.com)
- Allow command redirection to pipes or files when using gdb commands
alone on the command line without preceding the command string with
"gdb". Without the patch, the pipe/redirection data on the command
line would be appended to the command string passed to gdb, leading
to bizarre results when gdb attempts to evaluate the redirection
pieces of the command string.
(bob.montgomery(a)hp.com)
- Fix for the processing of bit fields on big endian systems in the
SIAL extension module. Without the patch, bits are not copied to
the correct position and are not shifted the right way.
(holzheu(a)linux.vnet.ibm.com)
- Fix for "dis -l" to properly display line-number information for
2.6.21 and later x86_64 kernel module text addresses. Without the
patch, a single erroneous file/line-number indication would be
displayed prior to the disassembly output, typically from the file
"include/linux/cpumask.h". This was due to an abnormal text block
descriptor from a function in hpet.c, which starts in the kernel
text segment and extends up into the vsyscall FIXMAP region,
effectively encompassing all kernel module address space.
(john.wright(a)hp.com)
- Related to the line number patch above, fix to prevent querying the
embedded gdb module for line numbers of kernel module text addresses
if the module's debuginfo data has not been loaded. Without the
patch, the same erroneous file/line-number could be displayed by
commands like "dis -l" or "bt -l" when a module's debuginfo data
has not been loaded, on 2.6.21 and later x86_64 kernels.
(anderson(a)redhat.com)
- Implemented a new "ps -G" option, which restricts the process status
output to show only the data of the thread group leader of a thread
group. The original request was to avoid the display of redundant
RSS data shared by many threads.
(anderson(a)redhat.com)
- Several fixes for the "repeat" command when used in conjunction
with an input file. Without the patch:
(1) Depending upon the command executed from the input file, a
SIGINT would kill the command currently being executed from
the input file, but the "repeat" command would then restart it.
(2) If a command in the input file redirected its output to a pipe,
the repeat operation could stop prematurely after executing
that particular command.
(3) If a command in the input file redirected its output to a pipe,
the zombies of the command being piped to would not be cleaned
up until the repeat command was stopped.
(4) If the last command in the input file redirected its output to a
pipe, all subsequent executions of the input file would only
display the output of that last command.
(anderson(a)redhat.com)
- Added "trace.c" to the extensions subdirectory, where it will get
built automatically when "make extensions" is run from the top-level
source directory. The trace.so extension module has also been added
to the crash-extensions-<version>.rpm subpackage that is created
by the crash-<version>.src.rpm, which installs extension modules
in the /usr/lib[64]/crash/extensions directory.
(anderson(a)redhat.com)
- Fix for a potential failure to initialize the kmem slab cache
subsystem on 2.6.22 and later CONFIG_SLAB kernels if the dumpfile
has pages excluded by the makedumpfile facility. Without the patch,
the following error message would be displayed during initialization:
"crash: page excluded: kernel virtual address: <address> type:
kmem_cache_s buffer", followed by "crash: unable to initialize kmem
slab cache subsystem".
(anderson(a)redhat.com)
- Fix for a potential session initialization failure on x86_64 kernels
if the dumpfile has pages excluded by the makedumpfile facility.
Without the patch, the following error message would be displayed:
"crash: page excluded: kernel virtual address: <address> type:
tss_struct ist array".
(anderson(a)redhat.com)
- Fix for "kmem -z" option on 2.6.29 and later kernels. Without the
patch, against 2.6.29 and 2.6.30 kernels, the embedded zone VM_STAT
contents would not be displayed after the top line showing the SIZE,
PRESENT, /MIN/LOW/HIGH and FREE page counts; on 2.6.31 kernels, the
command would fail with the error message: "kmem: invalid (optional)
structure member offsets: zone_pages_min or zone_struct_pages_min".
(anderson(a)redhat.com)
- Fix for "irq" command on 2.6.29 and later CONFIG_SPARSE_IRQ kernels.
Without the patch, the "irq [number]" command would fail on x86_64
with the error message: "irq: x86_64_dump_irq: irq_desc[] does not
exist?", on ia64: "ia64_dump_irq: neither irq_desc or _irq_desc
exist", and on the other architectures: "irq: neither irq_desc nor
_irq_desc symbols exist".
(anderson(a)redhat.com)
- Fix for the "kmem -i" option on 2.6.31 kernels. Without the patch
the SHARED column may erroneously indicate 0 pages.
(anderson(a)redhat.com)
- Fix for the "kmem -i" option on 2.6.26 through 2.6.30 x86_64 kernels.
Without the patch, the swap page information would not be displayed,
and the error message "kmem: swap_info[0].swap_map at <address> is
unaccessible" would be displayed.
(anderson(a)redhat.com)
- Fix for "kmem -p" option on older 64-bit kernels that have a 32-bit
page.flags field. Without the patch, the page.count field in the
page structure would get merged with the page.flags field, and the
result displayed as a 64-bit value in the FLAGS column.
(anderson(a)redhat.com)
- Fix for "kmem -i" option on older 64-bit kernels whose page.count
unreferenced value was -1 (instead of 0). Without the patch, the
SHARED column would contain invalid values.
(anderson(a)redhat.com)
- Change the cursor location when cycling through the command history
when in "vi" editing mode (the default). When using the arrow keys,
or when using CTRL-n and CTRL-p, the cursor will be placed after the
last character in each line, and will be in "insert" mode. When
using ESC followed by j or k, the cursor will be placed on the last
character in the line, and will be in "command" mode. Without the
patch, the cursor would be placed on the first character in the line
regardless of the keys used to cycle through the history.
(anderson(a)redhat.com)
Download from: http://people.redhat.com/anderson
15 years, 3 months
Re: [Crash-utility] crashdc needs to change its name
by Dave Anderson
----- "Louis Bouchard" <louis.bouchard(a)hp.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello everyone,
>
> It did not occur to me that the name crashdc, which is identical to the
> name of the Tru64 script, might appear to some people as a reference to
> the 9/11 events at the Pentagon.
>
> It was not obvious to me, a Canadian living in France, but there is
> potential for a misunderstanding. So I decided to take the early
> opportunity to change the name to something else.
>
> Right now, I have a few potential names for it:
>
> - crashcoll
> - forensic
> - forensics
>
> crashcoll has a more "unix/linux" sounding name and its name refers to crash.
>
> forensic(s) is more related to the work that the script intends to do
> and the information it is meant to gather.
>
> None of them seems to exist already in the Linux realm.
>
> Does anyone have comments or suggestions?
Wow -- I would never have even made that connection. Did somebody actually
bring that up to you or to HP?
IMHO I think you're being way too oversensitive...
Dave
15 years, 3 months
crashdc needs to change its name
by Louis Bouchard
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello everyone,
It did not occur to me that the name crashdc, which is identical to the
name of the Tru64 script, might appear to some people as a reference to
the 9/11 events at the Pentagon.
It was not obvious to me, a Canadian living in France, but there is
potential for a misunderstanding. So I decided to take the early
opportunity to change the name to something else.
Right now, I have a few potential names for it:
- crashcoll
- forensic
- forensics
crashcoll has a more "unix/linux" sounding name and its name refers to
crash.
forensic(s) is more related to the work that the script intends to do
and the information it is meant to gather.
None of them seems to exist already in the Linux realm.
Does anyone have comments or suggestions?
Kind Regards,
- --
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center,
Linux Ambassador, HP
HP Services 1 Ave du Canada
HP France Z.A. de Courtaboeuf
louis.bouchard(a)hp.com 91 947 Les Ulis
http://www.hp.com/go/linux France
http://www.hp.com/fr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkrMjDUACgkQDvqokHrhnCwtxQCeNpyjVGI6HmrpyO6ahHQXKxrI
svoAn2MsvXwOzB1uKoOhjXvKeF9lcZTD
=535i
-----END PGP SIGNATURE-----
15 years, 3 months
Re: Symbol info from System.map vs. DLKM-debuginfo
by Dave Anderson
----- "Alex Sidorenko" <asid(a)hp.com> wrote:
> Hi Dave,
>
> while working on an NFS problem (Ubuntu/Hardy) I needed to get definitions
> of 'struct svc_cacherep'. I have compiled nfsd.ko with debugging for an older
> kernel.
>
> Current kernel: 2.6.24-24-generic
> nfsd.ko: 2.6.24-21-generic
>
> The definition of 'struct svc_cacherep' is the same. To avoid rebuilding
> nfsd.ko I decided to specify /boot/System.map-2.6.24-generic explicitly.
> Before loading 'nfsd.ko' the address of the 'lru_head' variable is correct, but
> after loading the old nfsd.ko it changes:
>
> crash32> sym lru_head
> f908b988 (b) lru_head
> crash32> mod -s nfsd /usr/lib/debug/lib/modules/2.6.24-21-generic/nfsd.ko
> MODULE NAME SIZE OBJECT FILE
> f908a280 nfsd 228976 /usr/lib/debug/lib/modules/2.6.24-21-generic/nfsd.ko
> crash32> sym lru_head
> f908b984 (b) lru_head
>
> Is this expected? It would be nice to be able to give precedence to symbols as
> defined in System.map (maybe a startup option or internal set option).
I'm not sure I completely understand, but there shouldn't be any reference at
all in your System.map file to any kernel module symbols, i.e., it should only
contain symbols from the vmlinux file. So the System.map file should have no
bearing upon the change in the nfsd module's "lru_head" address.
What has happened is that the original "lru_head" address of f908b988 came
from the address that was exported to the kernel from the nfsd.ko module
when the module was insmod'd (i.e., it has nothing to do with the crash utility).
The crash utility was then brought up, and it found that the f908b988 address
had been exported to the kernel via the "good" nfsd.ko module structure data. The
first "sym lru_head" command shows that address.
However, then you loaded (via an internal "add-symbol-file" gdb operation) the
older nfsd.ko module's information into the crash utility -- and it overwrote the
original symbol value data with that from the (non-matching) nfsd.ko that contains
the incorrect f908b984 address. So the crash utility is just doing what you told
it to do. There's no support for loading a module that doesn't match reality, and
continuing to use the old (correct) symbol addresses.
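The sequence described above can be pictured with a toy symbol table. This Python sketch is purely illustrative: the dictionary and the function name are hypothetical and do not reflect the crash utility's internal data structures. Only the two addresses are taken from the session quoted above.

```python
# Address as exported to the running kernel by the module that was insmod'd.
symbols = {"lru_head": 0xf908b988}

def load_module_debuginfo(table, debuginfo_symbols):
    """Overwrite existing entries with the values from the object file."""
    table.update(debuginfo_symbols)

before = symbols["lru_head"]
# Value derived from the older, non-matching nfsd.ko.
load_module_debuginfo(symbols, {"lru_head": 0xf908b984})
after = symbols["lru_head"]

print(hex(before), hex(after))  # prints: 0xf908b988 0xf908b984
```

Once the entry has been replaced, nothing in this toy table remembers the original exported address, which mirrors the behaviour seen with the second "sym lru_head" command.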
Dave
15 years, 3 months
Symbol info from System.map vs. DLKM-debuginfo
by Alex Sidorenko
Hi Dave,
while working on an NFS problem (Ubuntu/Hardy) I needed to get definitions
of 'struct svc_cacherep'. I have compiled nfsd.ko with debugging for an older
kernel.
Current kernel: 2.6.24-24-generic
nfsd.ko: 2.6.24-21-generic
The definition of 'struct svc_cacherep' is the same. To avoid rebuilding
nfsd.ko I decided to specify /boot/System.map-2.6.24-generic explicitly.
Before loading 'nfsd.ko' the address of the 'lru_head' variable is correct, but
after loading the old nfsd.ko it changes:
crash32> sym lru_head
f908b988 (b) lru_head
crash32> mod -s nfsd /usr/lib/debug/lib/modules/2.6.24-21-generic/nfsd.ko
MODULE NAME SIZE OBJECT FILE
f908a280 nfsd 228976 /usr/lib/debug/lib/modules/2.6.24-21-generic/nfsd.ko
crash32> sym lru_head
f908b984 (b) lru_head
Is this expected? It would be nice to be able to give precedence to symbols as
defined in System.map (maybe a startup option or internal set option).
Regards,
Alex
--
------------------------------------------------------------------
Alexandre Sidorenko email: asid(a)hp.com
WTEC Linux Hewlett-Packard (Canada)
------------------------------------------------------------------
15 years, 3 months
Re: crash-4.0.9 cannot load extensions on i686
by Dave Anderson
----- "Alex Sidorenko" <asid(a)hp.com> wrote:
> On October 5, 2009 01:04:43 pm Dave Anderson wrote:
> > Thanks -- I'll take a look. It appears that the extension module
> > is passing this test in is_shared_object():
> >
> > if (elf64->e_ident[EI_CLASS] == ELFCLASS64 ...
> >
> > What does "readelf -a" show for the echo.so file? (I don't have
> > a handy x86 machine readily available at the moment...)
>
> Hi Dave,
>
> I have attached both the output of "readelf -a" and 32-bit echo.so
> file.
>
> Cheers,
> Alex
4.0.9 regression -- patch attached.
Thanks for the report,
Dave
15 years, 3 months
Re: crash-4.0.9 cannot load extensions on i686
by Dave Anderson
----- "Alex Sidorenko" <asid(a)hp.com> wrote:
> Hi Dave,
>
> I have found that crash-4.0.9 cannot load any extensions on 32-bit hosts, even
> echo.so. Running crash with -d5 on a live 32-bit kernel I can see that it
> fails as
>
> crash> extend extensions/echo.so
> extend: ./extensions/echo.so: machine type mismatch: 3
> extend: ./extensions/echo.so: not an ELF format object file
>
> if (!is_shared_object(ext->filename)) {
> error(INFO, "%s: not an ELF format object file\n",
>
>
> Something is wrong in the is_shared_object() logic on IA32. It is interesting that
> there are no problems with crash-4.0.9 and extensions on 64-bit systems. As
> most of us are running 64 bits, this probably explains why nobody has reported
> this bug yet.
Thanks -- I'll take a look. It appears that the extension module
is passing this test in is_shared_object():
if (elf64->e_ident[EI_CLASS] == ELFCLASS64 ...
What does "readelf -a" show for the echo.so file? (I don't have
a handy x86 machine readily available at the moment...)
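As a rough illustration of the kind of header check involved (this is not the actual is_shared_object() code), the Python sketch below reads the ELF identification bytes and the e_type and e_machine fields from the start of a file. The helper name is an assumption made for the example; the field offsets and constants come from the ELF specification. In the error quoted above, machine type 3 corresponds to EM_386.

```python
import struct

ELFCLASS = {1: "ELFCLASS32", 2: "ELFCLASS64"}

def describe_elf_header(header):
    """Return (class_name, e_type, e_machine) from the first 20 header bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF format object file")
    elf_class = header[4]                        # e_ident[EI_CLASS]
    byte_order = "<" if header[5] == 1 else ">"  # e_ident[EI_DATA]
    # e_type and e_machine are 16-bit fields at offsets 16 and 18
    # in both the 32-bit and the 64-bit header layouts.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return ELFCLASS.get(elf_class, "unknown"), e_type, e_machine
```

It could be tried on the module with something like describe_elf_header(open("extensions/echo.so", "rb").read(20)). A 32-bit x86 shared object would be expected to report ELFCLASS32, an e_type of 3 (ET_DYN) and an e_machine of 3 (EM_386), while an x86_64 one would report ELFCLASS64 and an e_machine of 62 (EM_X86_64).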
Dave
15 years, 3 months
Re: crash-4.0.9 cannot load extensions on i686
by Alex Sidorenko
Hi Dave,
I have found that crash-4.0.9 cannot load any extensions on 32-bit hosts, even
echo.so. Running crash with -d5 on a live 32-bit kernel I can see that it
fails as
crash> extend extensions/echo.so
extend: ./extensions/echo.so: machine type mismatch: 3
extend: ./extensions/echo.so: not an ELF format object file
if (!is_shared_object(ext->filename)) {
error(INFO, "%s: not an ELF format object file\n",
Something is wrong in the is_shared_object() logic on IA32. It is interesting that
there are no problems with crash-4.0.9 and extensions on 64-bit systems. As
most of us are running 64 bits, this probably explains why nobody has reported
this bug yet.
Regards,
Alex
------------------------------------------------------------------
Alexandre Sidorenko email: asid(a)hp.com
WTEC Linux Hewlett-Packard (Canada)
------------------------------------------------------------------
15 years, 3 months
irq: x86_64_dump_irq: irq_desc[] does not exist?
by CAI Qian
Hello!
When analyzing vmcores from a 2.6.31 kernel, the irq sub-command does not
work.
crash> irq
irq: x86_64_dump_irq: irq_desc[] does not exist?
Is that because the irq data structures in the kernel changed from
statically to dynamically allocated?
Thanks!
CAI Qian
15 years, 3 months